Updates to Tracing Paper

- Noted need to rewrite/fix ID tracking section - Addition to conclusion section - Mention of the need for a detailed description of the system - Reworded some of the confusing language
paw10003 · Feb 19, 2015 · 8d5c74a
1 parent 7810bdd
commit 8d5c74a
Showing 5 changed files with 53 additions and 37 deletions.
diff --git a/TracingPaper.aux b/TracingPaper.aux
@@ -74,17 +74,15 @@
 \newlabel{SMB}{{4.1}{4}}
 \@writefile{toc}{\contentsline {subsection}{\numberline {4.2}ID Tracking}{4}}
 \newlabel{ID Tracking}{{4.2}{4}}
+\citation{Traeger2008}
 \@writefile{toc}{\contentsline {subsection}{\numberline {4.3}System Information and Predictions}{5}}
 \newlabel{System Information and Predictions}{{4.3}{5}}
-\@writefile{toc}{\contentsline {subsection}{\numberline {4.4}Run Patterns}{5}}
-\newlabel{Run Patterns}{{4.4}{5}}
-\bibcite{Leung2008}{1}
-\bibcite{Ellard2003}{2}
-\bibcite{EllardLedlie2003}{3}
-\bibcite{Anderson2004}{4}
-\bibcite{Orosz2013}{5}
-\bibcite{Dabir2008}{6}
-\bibcite{Narayan2010}{7}
+\citation{Douceur1999}
+\citation{RuemmlerWilkes1993}
+\citation{Bolosky2007}
+\citation{EllardLedlie2003}
+\@writefile{toc}{\contentsline {subsection}{\numberline {4.4}Run Patterns}{6}}
+\newlabel{Run Patterns}{{4.4}{6}}
 \@writefile{toc}{\contentsline {subsection}{\numberline {4.5}Locating Performance Bottlenecks}{6}}
 \newlabel{Locating Performance Bottlenecks}{{4.5}{6}}
 \@writefile{toc}{\contentsline {section}{\numberline {5}Intuition Confirm/Change}{6}}
@@ -93,6 +91,13 @@
 \newlabel{Characterizations of Different Packet Types}{{5.1}{6}}
 \@writefile{toc}{\contentsline {section}{\numberline {6}Conclusion}{6}}
 \newlabel{Conclusion}{{6}{6}}
+\bibcite{Leung2008}{1}
+\bibcite{Ellard2003}{2}
+\bibcite{EllardLedlie2003}{3}
+\bibcite{Anderson2004}{4}
+\bibcite{Orosz2013}{5}
+\bibcite{Dabir2008}{6}
+\bibcite{Narayan2010}{7}
 \bibcite{Skopko2012}{8}
 \bibcite{MS-CIFS}{9}
 \bibcite{MS-SMB}{10}
@@ -105,3 +110,5 @@
 \bibcite{Kavalanekar2009}{17}
 \bibcite{Douceur1999}{18}
 \bibcite{Ruemmler1993}{19}
+\bibcite{RuemmlerWilkes1993}{20}
+\bibcite{Bolosky2007}{21}
diff --git a/TracingPaper.log b/TracingPaper.log
@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9 64-bit) (preloaded format=pdflatex 2012.11.13)  18 FEB 2015 09:42
+This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9 64-bit) (preloaded format=pdflatex 2012.11.13)  19 FEB 2015 17:17
 entering extended mode
 **C:/Users/rundeMT/Documents/UConn/TracingPaper/TracingPaper.tex
 (C:/Users/rundeMT/Documents/UConn/TracingPaper/TracingPaper.tex
@@ -172,61 +172,62 @@ Underfull \hbox (badness 2269) in paragraph at lines 102--103
 LaTeX Font Info:    Font shape `OT1/ptm/bx/it' in size <10> not available
 (Font)              Font shape `OT1/ptm/b/it' tried instead on input line 157.
  [4]
-Underfull \hbox (badness 10000) in paragraph at lines 194--195
+Underfull \hbox (badness 10000) in paragraph at lines 195--196
 
  []
 
-LaTeX Font Info:    Try loading font information for OMS+ptm on input line 201.
+LaTeX Font Info:    Try loading font information for OMS+ptm on input line 202.
 
 ("C:\Program Files\MiKTeX 2.9\tex\latex\psnfss\omsptm.fd"
 File: omsptm.fd 
 )
 LaTeX Font Info:    Font shape `OMS/ptm/m/n' in size <10> not available
-(Font)              Font shape `OMS/cmsy/m/n' tried instead on input line 201.
+(Font)              Font shape `OMS/cmsy/m/n' tried instead on input line 202.
  [5]
-Underfull \hbox (badness 1077) in paragraph at lines 366--367
+Underfull \vbox (badness 5578) has occurred while \output is active []
+
+
+Underfull \hbox (badness 1077) in paragraph at lines 368--369
 \OT1/ptm/m/n/10 not only pull out in-for-ma-tion per-ta-nent to the
  []
 
 [6]
-Underfull \hbox (badness 10000) in paragraph at lines 400--401
+Underfull \hbox (badness 10000) in paragraph at lines 402--403
 []\OT1/ptm/m/it/10 Common In-ter-net File Sys-tem (CIFS) Pro-
  []
 
 
-Underfull \hbox (badness 10000) in paragraph at lines 400--401
+Underfull \hbox (badness 10000) in paragraph at lines 402--403
 \OT1/ptm/m/it/10 to-col\OT1/ptm/m/n/10 , urlhttp://msdn.microsoft.com/en-
  []
 
 
-Underfull \hbox (badness 10000) in paragraph at lines 402--403
+Underfull \hbox (badness 10000) in paragraph at lines 404--405
 []\OT1/ptm/m/it/10 Server Mes-sage Block (SMB) Pro-to-
  []
 
 
-Underfull \hbox (badness 10000) in paragraph at lines 402--403
+Underfull \hbox (badness 10000) in paragraph at lines 404--405
 \OT1/ptm/m/it/10 col\OT1/ptm/m/n/10 , urlhttp://msdn.microsoft.com/en-
  []
 
-[7
-
-] (C:\Users\rundeMT\Documents\UConn\TracingPaper\TracingPaper.aux) ) 
+[7] (C:\Users\rundeMT\Documents\UConn\TracingPaper\TracingPaper.aux) ) 
 Here is how much of TeX's memory you used:
- 1476 strings out of 494049
- 19894 string characters out of 3146058
- 78677 words of memory out of 3000000
- 4768 multiletter control sequences out of 15000+200000
+ 1478 strings out of 494049
+ 19927 string characters out of 3146058
+ 79685 words of memory out of 3000000
+ 4770 multiletter control sequences out of 15000+200000
  20443 words of font info for 42 fonts, out of 3000000 for 9000
  715 hyphenation exceptions out of 8191
- 34i,8n,21p,2182b,435s stack positions out of 5000i,500n,10000p,200000b,50000s
-{C:/Program
- Files/MiKTeX 2.9/fonts/enc/dvips/fontname/8r.enc}<C:/Program Files/MiKTeX 2.9/
-fonts/type1/public/amsfonts/cm/cmsy10.pfb><C:/Program Files/MiKTeX 2.9/fonts/ty
-pe1/urw/courier/ucrr8a.pfb><C:/Program Files/MiKTeX 2.9/fonts/type1/urw/times/u
-tmb8a.pfb><C:/Program Files/MiKTeX 2.9/fonts/type1/urw/times/utmbi8a.pfb><C:/Pr
-ogram Files/MiKTeX 2.9/fonts/type1/urw/times/utmr8a.pfb><C:/Program Files/MiKTe
-X 2.9/fonts/type1/urw/times/utmri8a.pfb>
-Output written on TracingPaper.pdf (7 pages, 109988 bytes).
+ 34i,8n,21p,2172b,435s stack positions out of 5000i,500n,10000p,200000b,50000s
+{C:/Progr
+am Files/MiKTeX 2.9/fonts/enc/dvips/fontname/8r.enc}<C:/Program Files/MiKTeX 2.
+9/fonts/type1/public/amsfonts/cm/cmsy10.pfb><C:/Program Files/MiKTeX 2.9/fonts/
+type1/urw/courier/ucrr8a.pfb><C:/Program Files/MiKTeX 2.9/fonts/type1/urw/times
+/utmb8a.pfb><C:/Program Files/MiKTeX 2.9/fonts/type1/urw/times/utmbi8a.pfb><C:/
+Program Files/MiKTeX 2.9/fonts/type1/urw/times/utmr8a.pfb><C:/Program Files/MiK
+TeX 2.9/fonts/type1/urw/times/utmri8a.pfb>
+Output written on TracingPaper.pdf (7 pages, 112306 bytes).
 PDF statistics:
  51 PDF objects out of 1000 (max. 8388607)
  0 named destinations out of 1000 (max. 500000)

diff --git a/TracingPaper.pdf b/TracingPaper.pdf
diff --git a/TracingPaper.synctex.gz b/TracingPaper.synctex.gz
diff --git a/TracingPaper.tex b/TracingPaper.tex
@@ -158,6 +158,7 @@ \subsection{SMB}
 
 \subsection{ID Tracking}
 \label{ID Tracking}
+\textit{\textbf{Note:} This system is currently disabled because its original implementation was poorly written.  The new concept for ID tracking is to combine MID/PID/TID/UID tuple tracking with FID tracking to know which files are opened and by whom (i.e. tuple identification), and to track file sizes for files that are created with an initial size of zero.  The purpose of this tracking is to follow the habits of individual users.  While initially simplistic (drawing a connection between FIDs and tuple IDs), this aspect of the research will be developed in future work to be more inclusive.} \\
 All commands sent over the network are coupled to an identifying MID/PID/TID/UID tuple.  Since the only commands being examined are read or write commands, the identifying characteristic distinguishing a request command packet from a response command packet is the addition of an FID field with the sent packet.  It is examination of the packets for this FID field that allows the analysis code to distinguish between request \& response command packets.  The pairing is done by examining the identifying tuple and assuming that each tuple-identified system will only send one command at a time (awaiting a response before sending the next command of that same type).
 \\Following these process IDs is a way to check for intercommunication between two or more processes.  In particular, we examine the compute time \& I/O (input/output) time (i.e. time spent in communication; between information arrivals).  This is done by examining the inter-arrival times (IAT) between the server \& the client.  This is interesting because this information will give us a realistic sense of the data transit time of the network connections being used (e.g. Ethernet, FireWire, fibre, etc.).  Other pertinent information would be how often the client makes requests \& how often this event occurs per client process ID, identifiable by its PID/MID tuple.  One could also track the amount of sharing that is occurring between users.  The PID is the process identifier and the MID is the multiplex identifier, which is set by the client and is to be used for identifying groups of commands belonging to the same logical thread of operation on the client node.
 \\The per-client process ID can be used to map the activity of given programs, thus allowing for finer granularity in the produced benchmark (e.g. control down to process types run by individual clients).  Other features of interest are the time between an open \& close, or how many opens/closes occurred in a window (e.g. a period of time).  This information could be used as a gauge of current-day trends in filesystem usage \& its consequent taxation on the surrounding network.  It would also allow for greater insight into the r/w habits of users on a network along with a rough comparison between other registered events that occur on the network.  Lastly, though no less important, it would allow us to look at how many occurrences there are of shared files between different users.  One must note that the type of sharing may differ and there can be an issue of resource locking (e.g. shared files) that needs to be taken into account.  This is preliminarily addressed by monitoring any oplock flags that are sent for reads \& writes.  This information also helps provide a preliminary mapping of how the network is used and what sort of traffic populates the communication.
@@ -189,7 +190,7 @@ \subsection{ID Tracking}
 
 \subsection{System Information and Predictions}
 \label{System Information and Predictions}
-The following is an explination the UITS system from which trace1 pulls it's packet traffic information along with predicitions of how the data will look along with the reasoning behind the shape of the information.
+It is important to describe any benchmarking system in detail so that, when the results of one's research are examined, they can be properly understood with the background information and understanding that led the original author to those results~\cite{Traeger2008}.  The following is an explanation of the UITS system from which trace1 pulls its packet traffic information, along with predictions of how the data will look and the reasoning behind the shape of the information.
 
 The UITS system consists of five Microsoft file server cluster nodes.  These blade servers are used to host home directories for all UConn users within a list of 88 departments.  These home directories are used to provide personal drive share space to faculty, staff and students, along with at least one small group of users.  Each server is capable of handling 1Gb/s of traffic in each direction (i.e. outbound and inbound traffic).  All together the five blade server system can in theory handle 10Gb/s of receiving and transmitting data.  Some of these blade servers have local storage but the majority do not have any.  To the understanding of this paper, the blade servers are purposed purely for dealing with incoming traffic to the SAN storage nodes that sit behind them.  This system does not currently implement load balancing; instead, the servers are set up to spread the traffic load among four active cluster nodes while the fifth node is passive and purposed to take over in the case that any of the other nodes goes down (e.g. becomes inoperable or crashes). \\
 
@@ -357,8 +358,9 @@ \subsection{Characterizations of Different Packet Types}
 \section{Conclusion}
 \label{Conclusion}
 \textit{Do the results show a continuation in the trend of traditional computer science workloads?}
-On the outset of this work it was believed that the data collected and analyzed would follow similar behavior patterns seen in previous papers \textit{Cite?}.  One of these oddities was that during the day one would see a greater increase in writes instead of reads.  The first assumption was that this is due to the system and how users interact with everything.
-I belive that the greater number of writes comes from students doing intro work for different classes in which students are constantly saving their work while reading instructions from a single source.  The early traffic is most likely due to professors preparing for classes.  One must also recall that this data itself has limited interpretation because only a small three week windows of infomration is being examined.  A better, and far more complete, image can be constructed using data captured from the following months, or more ideally, from an entire year's worth of data.  An other limitation of the results is the scope of the analysis is curbed and does not yet fully dissect all of the fields being passed in network communication.
+At the outset of this work it was believed that the data collected and analyzed would follow behavior patterns similar to those seen in previous papers~\cite{Douceur1999, RuemmlerWilkes1993, Bolosky2007, EllardLedlie2003}.  The expectation is that certain aspects of the data, such as transfer/buffer sizes, will produce a bell-shaped distribution centered on a larger size than previous papers' findings.  The number of I/O operations was expected to peak during nocturnal hours and fall during daytime hours.  On top of that, the expectation is that a greater number of reads will be seen over the course of a day, while the majority of writes will occur near the expected times of UITS' backup (e.g. 2am to 6am).  Granted, one must recall that the expectation is that any backup traffic that is seen will be due to a fetching of users' caches in order to preserve fidelity of any shared data.\\
+One oddity was that during the day one would see a greater increase in writes than in reads.  The first assumption was that this is due to the system and how users interact with everything.
+I believe that the greater number of writes comes from students doing intro work for different classes in which students are constantly saving their work while reading instructions from a single source.  The early traffic is most likely due to professors preparing for classes.  One must also recall that this data itself has limited interpretation because only a small three-week window of information is being examined.  A better, and far more complete, image can be constructed using data captured from the following months, or more ideally, from an entire year's worth of data.  Another limitation of the results is that the scope of the analysis is curbed and does not yet fully dissect all of the fields being passed in network communication.
 The future work of this project would be to
 \begin{itemize}
 	\item 1. Complete the dissection analysis to include all captured fields from the originating pcap files.  
@@ -426,6 +428,12 @@ \section{Conclusion}
 \bibitem{Ruemmler1993} Chris Ruemmler and John Wilkes, \emph{
 UNIX disk access patterns}, Winter USENIX 1993 (January 1993)
 
+\bibitem{RuemmlerWilkes1993} Chris Ruemmler and John Wilkes, \emph{
+A trace-driven analysis of working set sizes}, Hewlett-Packard Laboratories (5 April 1993)
+
+\bibitem{Bolosky2007} Nitin Agrawal and William J.~Bolosky and John R.~Douceur and Jacob R.~Lorch, \emph{
+A Five-Year Study of File-System Metadata}, ACM Transactions on Storage (TOS) Volume 3 Issue 3 (October 2007)
+
 \end{thebibliography}
 
 \end{document}
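
The request/response pairing described in the ID Tracking subsection could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's analysis code: the `Packet` record, its field names, and the `pair_packets` helper are hypothetical, and the sketch takes the text at its word that a request carries an FID while a response does not, and that each MID/PID/TID/UID tuple has at most one outstanding command of a given type.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical record for one dissected read/write packet."""
    timestamp: float            # capture time in seconds
    command: str                # "read" or "write"
    mid: int
    pid: int
    tid: int
    uid: int
    fid: Optional[int] = None   # assumed present only on requests


def pair_packets(packets: List[Packet]) -> List[Tuple[Packet, Packet, float]]:
    """Pair each response with the pending request from the same tuple.

    Returns (request, response, elapsed_seconds) triples. Responses with
    no pending request are skipped.
    """
    pending = {}   # (mid, pid, tid, uid, command) -> outstanding request
    pairs = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        key = (pkt.mid, pkt.pid, pkt.tid, pkt.uid, pkt.command)
        if pkt.fid is not None:
            # Request: remember it until the matching response arrives.
            pending[key] = pkt
        elif key in pending:
            # Response: close out the outstanding request for this tuple.
            request = pending.pop(key)
            pairs.append((request, pkt, pkt.timestamp - request.timestamp))
    return pairs


if __name__ == "__main__":
    trace = [
        Packet(0.0, "read", 1, 2, 3, 4, fid=7),   # request
        Packet(0.5, "read", 1, 2, 3, 4),          # its response
        Packet(0.6, "write", 1, 2, 3, 4),         # response with no request
    ]
    for request, response, elapsed in pair_packets(trace):
        print(request.fid, elapsed)
```

Keying the pending table on the tuple plus the command type mirrors the one-command-at-a-time assumption; if a client pipelines several requests of the same type, a later request would overwrite an earlier one and the sketch would need a queue per key instead.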