diff --git a/TracingPaper.aux b/TracingPaper.aux index 72ee472..33f5f93 100644 --- a/TracingPaper.aux +++ b/TracingPaper.aux @@ -6,22 +6,20 @@ \citation{Dabir2008} \citation{Skopko2012} \citation{Anderson2004} -\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}} -\newlabel{Introduction}{{1}{1}} -\@writefile{toc}{\contentsline {subsection}{\numberline {1.1}Issues with Tracing}{1}} -\newlabel{Issues with Tracing}{{1.1}{1}} -\citation{Anderson2004} -\citation{Traeger2008} -\citation{Vogels1999} -\citation{Dabir2008} \citation{Orosz2013} -\citation{Skopko2012} +\citation{PFRINGMan} \citation{Ellard2003} \citation{EllardLedlie2003} +\citation{Anderson2004} +\citation{Orosz2013} +\citation{Dabir2008} +\citation{Skopko2012} +\citation{Vogels1999} +\citation{Traeger2008} \citation{Ruemmler1993} \citation{Roselli2000} -\citation{Ruemmler1993} \citation{Traeger2008} +\citation{Ruemmler1993} \citation{Ellard2003} \citation{EllardLedlie2003} \citation{Douceur1999} @@ -34,6 +32,10 @@ \citation{Anderson2004} \citation{Roselli2000} \citation{Vogels1999} +\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}} +\newlabel{Introduction}{{1}{1}} +\@writefile{toc}{\contentsline {subsection}{\numberline {1.1}Issues with Tracing}{1}} +\newlabel{Issues with Tracing}{{1.1}{1}} \citation{Orosz2013} \citation{Dabir2008} \citation{Skopko2012} @@ -43,20 +45,20 @@ \citation{Anderson2004} \@writefile{toc}{\contentsline {subsection}{\numberline {1.2}Previous Advances Due to Testing}{2}} \newlabel{Previous Advances Due to Testing}{{1.2}{2}} -\@writefile{toc}{\contentsline {subsection}{\numberline {1.3}The Need for a New Study}{2}} -\newlabel{The Need for a New Study}{{1.3}{2}} \@writefile{toc}{\contentsline {section}{\numberline {2}Methodology}{2}} \newlabel{Methodology}{{2}{2}} -\@writefile{toc}{\contentsline {subsection}{\numberline {2.1}System Limitations}{2}} -\newlabel{System Limitations}{{2.1}{2}} +\@writefile{toc}{\contentsline 
{subsection}{\numberline {2.1}Interesting Aspects of Research}{2}} +\newlabel{Interesting Aspects of Research}{{2.1}{2}} +\@writefile{toc}{\contentsline {subsection}{\numberline {2.2}System Limitations}{2}} +\newlabel{System Limitations}{{2.2}{2}} \citation{Leung2008} \citation{Ellard2003} -\@writefile{toc}{\contentsline {subsection}{\numberline {2.2}Main Challenges}{3}} -\newlabel{Main Challenges}{{2.2}{3}} -\@writefile{toc}{\contentsline {subsection}{\numberline {2.3}Interpretation of Data}{3}} -\newlabel{Interpretation of Data}{{2.3}{3}} -\@writefile{toc}{\contentsline {subsection}{\numberline {2.4}Scope of Interpretation}{3}} -\newlabel{Scope of Interpretation}{{2.4}{3}} +\@writefile{toc}{\contentsline {subsection}{\numberline {2.3}General Challenges}{3}} +\newlabel{General Challenges}{{2.3}{3}} +\@writefile{toc}{\contentsline {subsection}{\numberline {2.4}Interpretation of Data}{3}} +\newlabel{Interpretation of Data}{{2.4}{3}} +\@writefile{toc}{\contentsline {subsection}{\numberline {2.5}Scope of Interpretation}{3}} +\newlabel{Scope of Interpretation}{{2.5}{3}} \@writefile{toc}{\contentsline {section}{\numberline {3}Tracing System}{3}} \newlabel{Tracing System}{{3}{3}} \@writefile{toc}{\contentsline {subsection}{\numberline {3.1}Stages of Trace}{3}} @@ -106,9 +108,10 @@ \bibcite{Vogels1999}{13} \bibcite{Meyer2012}{14} \bibcite{PFRING}{15} -\bibcite{Traeger2008}{16} -\bibcite{Kavalanekar2009}{17} -\bibcite{Douceur1999}{18} -\bibcite{Ruemmler1993}{19} -\bibcite{RuemmlerWilkes1993}{20} -\bibcite{Bolosky2007}{21} +\bibcite{PFRINGMan}{16} +\bibcite{Traeger2008}{17} +\bibcite{Kavalanekar2009}{18} +\bibcite{Douceur1999}{19} +\bibcite{Ruemmler1993}{20} +\bibcite{RuemmlerWilkes1993}{21} +\bibcite{Bolosky2007}{22} diff --git a/TracingPaper.log b/TracingPaper.log index e05274f..cb8a803 100644 --- a/TracingPaper.log +++ b/TracingPaper.log @@ -1,4 +1,4 @@ -This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9 64-bit) (preloaded format=pdflatex 2012.11.13) 5 MAR 2015 
13:22 +This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9 64-bit) (preloaded format=pdflatex 2012.11.13) 16 MAR 2015 13:35 entering extended mode **C:/Users/rundeMT/Documents/UConn/TracingPaper/TracingPaper.tex (C:/Users/rundeMT/Documents/UConn/TracingPaper/TracingPaper.tex @@ -142,81 +142,92 @@ LaTeX Font Info: External font `cmex10' loaded for size (Font) <5> on input line 76. LaTeX Font Info: Font shape `OT1/ptm/bx/n' in size <12> not available (Font) Font shape `OT1/ptm/b/n' tried instead on input line 78. +LaTeX Font Info: Font shape `OT1/ptm/bx/n' in size <10> not available +(Font) Font shape `OT1/ptm/b/n' tried instead on input line 79. + +Underfull \hbox (badness 10000) in paragraph at lines 79--84 + + [] + Missing character: There is no â in font ptmr7t! Missing character: There is no € in font ptmr7t! Missing character: There is no ś in font ptmr7t! Missing character: There is no â in font ptmr7t! Missing character: There is no € in font ptmr7t! Missing character: There is no ť in font ptmr7t! -LaTeX Font Info: Font shape `OT1/ptm/bx/n' in size <10> not available -(Font) Font shape `OT1/ptm/b/n' tried instead on input line 94. - [1{C:/ProgramData/MiKTeX/2.9/pdftex/config/pdftex.map} +Underfull \hbox (badness 10000) in paragraph at lines 105--109 -] -Underfull \hbox (badness 10000) in paragraph at lines 96--97 -[]\OT1/ptm/m/n/10 [CLOSING SEN-TENCES?]While per-form-ing a [] -Underfull \hbox (badness 1596) in paragraph at lines 96--97 -\OT1/ptm/m/n/10 ma-chines com-mu-ni-cat-ing with each other, and the - [] - +Underfull \vbox (badness 1436) has occurred while \output is active [] -Underfull \hbox (badness 2269) in paragraph at lines 102--103 -\OT1/ptm/m/n/10 no pa-per an re-ally see the whole scope of trac- - [] + [1{C:/ProgramData/MiKTeX/2.9/pdftex/config/pdftex.map} -[2] [3] -LaTeX Font Info: Font shape `OT1/ptm/bx/it' in size <10> not available -(Font) Font shape `OT1/ptm/b/it' tried instead on input line 157. 
- [4] -Underfull \hbox (badness 10000) in paragraph at lines 195--196 - [] +] +LaTeX Font Info: Try loading font information for OMS+ptm on input line 129. -LaTeX Font Info: Try loading font information for OMS+ptm on input line 202. ("C:\Program Files\MiKTeX 2.9\tex\latex\psnfss\omsptm.fd" File: omsptm.fd ) LaTeX Font Info: Font shape `OMS/ptm/m/n' in size <10> not available -(Font) Font shape `OMS/cmsy/m/n' tried instead on input line 202. - [5] -Underfull \vbox (badness 5578) has occurred while \output is active [] +(Font) Font shape `OMS/cmsy/m/n' tried instead on input line 129. + [2] [3] +LaTeX Font Info: Font shape `OT1/ptm/bx/it' in size <10> not available +(Font) Font shape `OT1/ptm/b/it' tried instead on input line 192. + [4] +Underfull \vbox (badness 10000) has occurred while \output is active [] -Underfull \hbox (badness 1077) in paragraph at lines 368--369 +Underfull \hbox (badness 10000) in paragraph at lines 230--231 + + [] + +[5] +Underfull \hbox (badness 1077) in paragraph at lines 403--404 \OT1/ptm/m/n/10 not only pull out in-for-ma-tion per-ta-nent to the [] [6] -Underfull \hbox (badness 10000) in paragraph at lines 402--403 +Underfull \hbox (badness 10000) in paragraph at lines 445--446 []\OT1/ptm/m/it/10 Common In-ter-net File Sys-tem (CIFS) Pro- [] -Underfull \hbox (badness 10000) in paragraph at lines 402--403 +Underfull \hbox (badness 10000) in paragraph at lines 445--446 \OT1/ptm/m/it/10 to-col\OT1/ptm/m/n/10 , urlhttp://msdn.microsoft.com/en- [] -Underfull \hbox (badness 10000) in paragraph at lines 404--405 +Underfull \hbox (badness 10000) in paragraph at lines 447--448 []\OT1/ptm/m/it/10 Server Mes-sage Block (SMB) Pro-to- [] -Underfull \hbox (badness 10000) in paragraph at lines 404--405 +Underfull \hbox (badness 10000) in paragraph at lines 447--448 \OT1/ptm/m/it/10 col\OT1/ptm/m/n/10 , urlhttp://msdn.microsoft.com/en- [] + +Underfull \hbox (badness 10000) in paragraph at lines 462--464 +[]\OT1/ptm/m/it/10 PF[]RING User 
Guide\OT1/ptm/m/n/10 , url- + [] + + +Overfull \hbox (61.33023pt too wide) in paragraph at lines 462--464 +\OT1/ptm/m/n/10 https://svn.ntop.org/svn/ntop/trunk/PF[]RING/doc/UsersGuide.pdf + + [] + [7] (C:\Users\rundeMT\Documents\UConn\TracingPaper\TracingPaper.aux) ) Here is how much of TeX's memory you used: - 1478 strings out of 494049 - 19927 string characters out of 3146058 - 79685 words of memory out of 3000000 - 4770 multiletter control sequences out of 15000+200000 + 1480 strings out of 494049 + 19972 string characters out of 3146058 + 79689 words of memory out of 3000000 + 4772 multiletter control sequences out of 15000+200000 20443 words of font info for 42 fonts, out of 3000000 for 9000 715 hyphenation exceptions out of 8191 34i,8n,21p,2172b,435s stack positions out of 5000i,500n,10000p,200000b,50000s @@ -227,7 +238,7 @@ type1/urw/courier/ucrr8a.pfb> -Output written on TracingPaper.pdf (7 pages, 112306 bytes). +Output written on TracingPaper.pdf (7 pages, 114523 bytes). PDF statistics: 51 PDF objects out of 1000 (max. 8388607) 0 named destinations out of 1000 (max. 500000) diff --git a/TracingPaper.pdf b/TracingPaper.pdf index b915b5c..6faf505 100644 Binary files a/TracingPaper.pdf and b/TracingPaper.pdf differ diff --git a/TracingPaper.synctex.gz b/TracingPaper.synctex.gz index fce47d9..03af1f1 100644 Binary files a/TracingPaper.synctex.gz and b/TracingPaper.synctex.gz differ diff --git a/TracingPaper.tex b/TracingPaper.tex index 1fec586..d435a7f 100644 --- a/TracingPaper.tex +++ b/TracingPaper.tex @@ -86,7 +86,7 @@ \section{Introduction} \label{Introduction} Traces are important for the purpose of developing and taking accurate metrics of current technologies. One must determine which aspects of the trace are most representative of what occurred during the tracing of the system, while figuring out which are represntative of the habits and patterns of said system. 
This discovered information is used to produce a benchmark, either by running a repeat of the captured traces or by using synthetic benchmark created from the trends detailed within the captured tracing data~\cite{Anderson2004}. -As seen in previous trace work done [Leund et al, ellard et al, roselli et al], the general perceptions of how computer systems are being used versus their initial purpose have allowed for great strides in eliminating actual bottlenecks rather than spending unnecessary time working on imagined bottlenecks. Leung's \textit{et. al.} work led to a series of obervations, from the fact that files are rarely re-opened to finding that read-write access patterns are more frequent ~\cite{Leung2008}. Without illumination of these underlying actions (e.g. read-write ratios, file death rates, file access rates) these issues can not be readily tackled. +As seen in previous trace work [Leung et al, Ellard et al, Roselli et al], the general perceptions of how computer systems are being used versus their initial purpose have allowed for great strides in eliminating actual bottlenecks rather than spending unnecessary time working on imagined bottlenecks. The work of Leung \textit{et al.} led to a series of observations, from the fact that files are rarely re-opened to the finding that read-write access patterns are more frequent~\cite{Leung2008}. Without illumination of these underlying actions (e.g. read-write ratios, file death rates, file access rates) these issues cannot be readily tackled. \\ \textbf{NOT SURE IF KEEP OR NEEDED} I/O benchmarking, the process of comparing I/O systems by subjecting them to known workloads, is a widespread pratice in the storage industry and serves as the basis for purchasing decisions, performance tuning studies, and marketing campaigns ~\cite{Anderson2004}. 
@@ -95,22 +95,25 @@ \section{Introduction} \subsection{Issues with Tracing} \label{Issues with Tracing} \textbf{REWORD TO REMOVE MENTION OF BENCHMARKS}\\ -The majority of benchmarks are attempts to represent a known system and structure on which some “original” design/system was tested. While this is all well and good, there are many issues with this sort of approach; temporal \& spatial scaling concerns, timestamping and buffer copying, as well as driver operation for capturing packets~\cite{Orosz2013,Dabir2008,Skopko2012}. Each of these aspects contribute to the inital problems with dissection and analysis of the captured information. Inaccuracies in scheduling I/Os may result in as much as a factor of 3.5 differences in measured response time and factor of 26 in measured queue sizes; differences that are too large to ignore~\cite{Anderson2004}. Inaccuracies in packet timestamping can be caused due to overhead in generic kernel-time based solutions, as well as use of the kernel data structures ~\cite{Orosz2013,PFRINGMan}. +The majority of benchmarks are attempts to represent a known system and structure on which some ``original'' design/system was tested. While this is all well and good, there are many issues with this sort of approach: temporal \& spatial scaling concerns, timestamping and buffer copying, as well as driver operation for capturing packets~\cite{Orosz2013,Dabir2008,Skopko2012}. Each of these aspects contributes to the initial problems with dissection and analysis of the captured information. For example, inaccuracies in scheduling I/Os may result in as much as a factor of 3.5 difference in measured response time and a factor of 26 in measured queue sizes; differences that are too large to ignore~\cite{Anderson2004}. -Temporal scaling refers to the need to account for the nuances of timing with respect to the run time of commands; consiting of computation, communication \& service.
A temporally scalable benchmarking system would take these subtleties into account when expanding its operation across multiple machines in a network. While these temporal issues have been tackled for a single processor (and even somewhat for cases of multi-processor), these same timing issues are not properly handles when dealing with inter-network communication. Spatial scaling refers to the need to account for the nuances of expanding a benchmark to incorporate a number of (\textbf{n}) machines over a network. A system that properly incorporates spatial scaling is one that would be able to inccorporate communication (even in varying intensities) between all the machines on a system, thus stress testing all communicative actions and aspects (e.g. resource ocks, queueing) on the network. +Temporal scaling refers to the need to account for the nuances of timing with respect to the run time of commands, consisting of computation, communication \& service. A temporally scalable benchmarking system would take these subtleties into account when expanding its operation across multiple machines in a network. While these temporal issues have been tackled for a single processor (and even somewhat for cases of multi-processor), these same timing issues are not properly handled when dealing with inter-network communication. Inaccuracies in packet timestamping can be caused by overhead in generic kernel-time based solutions, as well as use of the kernel data structures~\cite{Orosz2013,PFRINGMan}.\\ +Spatial scaling refers to the need to account for the nuances of expanding a benchmark to incorporate a number (\textbf{n}) of machines over a network. A system that properly incorporates spatial scaling is one that would be able to incorporate communication (even in varying intensities) between all the machines on a system, thus stress testing all communicative actions and aspects (e.g. resource locks, queueing) on the network. 
\subsection{Previous Advances Due to Testing} \label{Previous Advances Due to Testing} -Previous tracing work has shown that one of the largest \& broadest hurdles to tackle is that benchmarks (and traces) must be tailored (to every extent) to the system being tested. There are always some generalizations taken into account but these generalizations can also be a major source of error~\cite{Anderson2004,Traeger2008,Vogels1999,Dabir2008,Orosz2013,Skopko2012,Ellard2003,EllardLedlie2003,Ruemmler1993}. To produce a benchmark with high fidelity one needs to understand not only the technology being used but how it is being implemented within the system to trace \& benchmark~\cite{Roselli2000,Ruemmler1993,Traeger2008}. All of these aspects will lend to the behavior of the system; from timing \& resource elements to how the managing software governs~\cite{Ellard2003,EllardLedlie2003,Douceur1999}. Further more, in pursuing this work one may find unexpected results and learn new things through examination~\cite{Leung2008,Ellard2003,Roselli2000}. - -\subsection{The Need for a New Study} -\label{The Need for a New Study} -Tracing collection and analysis has proved its worth in time from previous studies where can be seen important lessons pulled from the research; change in behavior of read/write events, overhead concerns originating in system implementation, bottlenecks in communication, and other revelations found in the traces \textbf{CITE PAPERS HERE}. GRAB TEXT FROM OTHER SECTION WRITTEN TO STATE WHY TRACES/BENCHMARKS MATTER. -Certain elements of this research are purposed to improve tracing knowledge along with general understanding of networked systems. One such element is the delevopment of tracking IO inter-arrival times along with processor times. This allows for more realistic replay of commands run due to more complete temporal considerations; time taken for computation and "travel" time. 
Another elements is the PID/MID/TID/UID tracking which allows for following command executions between a given client and server. This element paired with the previous development helps expand the understanding and replay-ability of temporal scaling. -Things to make sure to say: Need trace of CIFS behavior. +Trace collection and analysis have proved their worth over time: previous studies drew important lessons from the research, including changes in the behavior of read/write events, overhead concerns originating in system implementation, bottlenecks in communication, and other revelations found in the traces. Previous tracing work has shown that one of the largest \& broadest hurdles to tackle is that traces (and benchmarks) must be tailored (to every extent) to the system being tested. There are always some generalizations taken into account but these generalizations can also be a major source of error~\cite{Ellard2003,EllardLedlie2003,Anderson2004,Orosz2013,Dabir2008,Skopko2012,Vogels1999,Traeger2008,Ruemmler1993}. To produce a benchmark with high fidelity one needs to understand not only the technology being used but how it is being implemented within the system being traced \& benchmarked~\cite{Roselli2000,Traeger2008,Ruemmler1993}. All of these aspects contribute to the behavior of the system, from timing \& resource elements to how the managing software governs actions~\cite{Ellard2003,EllardLedlie2003,Douceur1999}. Furthermore, in pursuing this work one may find unexpected results and learn new things through examination~\cite{Leung2008,Ellard2003,Roselli2000}. \\ These studies are required in order to evaluate the development of technologies and methodologies along with furthering knowledge of different system aspects and capabilities. \\ -As has been pointed out by past work, the design of systems is usually guided by an understanding of the file system workloads and user behavior~\cite{Leung2008}.
It is for that reason that new studies are constantly performed by the science community, from large scale studies to individual protocol studies~\cite{Leung2008,Ellard2003,Anderson2004,Roselli2000,Vogels1999}. Even within these studies, the information gleaned is only as meaningful as the considerations of how the data is handled. The following are issues that our work hopes to alleviate: there has been no large scale study done on networks for some time, there has been no study on CIFS(Common Internet File System)/SMB(Server Message Block) protocols for even longer, and most importantly these studies have not tackled lower level aspects of the trace, such as spacial \& temporal scaling idiosyncrasies of network communication. It is for these reasons that we have developed this tracing system and have developed new studies for lower level aspects of communication network. A detailed overview of the tracings and analysis system can be seen in section ~\ref{Tracing System}. The hope is to further the progress made with benchmarks \& tracing in the hope that it too will lend to improving and deepening the knowledge and understanding of these systems so that as a result the technology and methodology is bettered as a whole. +As has been pointed out by past work, the design of systems is usually guided by an understanding of the file system workloads and user behavior~\cite{Leung2008}. It is for that reason that new studies are constantly performed by the science community, from large scale studies to individual protocol studies~\cite{Leung2008,Ellard2003,Anderson2004,Roselli2000,Vogels1999}. Even within these studies, the information gleaned is only as meaningful as the considerations of how the data is handled. \\ + +A detailed overview of the tracing and analysis system can be seen in Section~\ref{Tracing System}.
The hope is that furthering the progress made with benchmarks \& tracing will improve and deepen the knowledge and understanding of these systems so that, as a result, the technology and methodology are bettered as a whole. + +%\subsection{My Work} +%\label{My Work} +%Certain elements of this research are purposed to improve tracing knowledge along with general understanding of networked systems. One such element is the delevopment of tracking IO inter-arrival times along with processor times. This allows for more realistic replay of commands run due to more complete temporal considerations; time taken for computation and "travel" time. Another elements is the PID/MID/TID/UID tracking which allows for following command executions between a given client and server. This element paired with the previous development helps expand the understanding and replay-ability of temporal scaling. +%Things to make sure to say: Need trace of CIFS behavior. +%The following are issues that our work hopes to alleviate: there has been no large scale study done on networks for some time, there has been no study on CIFS(Common Internet File System)/SMB(Server Message Block) protocols for even longer, and most importantly these studies have not tackled lower level aspects of the trace, such as spacial \& temporal scaling idiosyncrasies of network communication. It is for these reasons that we have developed this tracing system and have developed new studies for lower level aspects of communication network.
\\ \section{Methodology} \label{Methodology} @@ -118,34 +121,19 @@ \section{Methodology} \subsection{Interesting Aspects of Research} \label{Interesting Aspects of Research} \textbf{RENAME THIS SECTION SOMETHING MORE INTELLIGENT} \\ -Key components of the tracing system are as follows: -\begin{enumerate} -\item PF\_RING lends to the tracing system by minimizing copying of packets which allows for more accurate timestamping of incoming traffic packets being captured \textbf{CITE Orosoz and PF\_RING here}. - \begin{itemize} - \item PF\_RING license are free for students doing research (how licenses were obtained) - \item PF\_RING makes use of a memory ring allocated at creation time. Incoming packets are copied by the kernel module to the memory ring, and read by the user-space applications. This aids in minimizing packet loss/timestamping issues by not passing packets through the kernel data structures (straight from the PF\_RING user manual). - \end{itemize} -\item Setup of trace1 to intake upto 10Gb/s of traffic that comes from a network tap on the UITS system. - \begin{itemize} - \item The use of PF\_RING software aids in allowing for the 10Gb/s rate - \end{itemize} -\item Code written to convert CIFS protocol traffic into DataSeries format. Specific fields were chosen to be the interesting fields to be kept for analysis. It should be noted that this was done arbitrarily and changes/additions have been made as the value of certain fields are determined to be worth examining. -\item Code written to analyze the captured DataSeries format packets \& other aspects of the analysis. - \begin{itemize} - \item Packet dissection for R/W events. - \item ID tracking (PID/TID/MID/UID) - \item OpLock information. - \item \textbf{Note:} Future work - combine ID tracking with OpLock info to track resource sharing. 
- \end{itemize} -\end{enumerate} +Out of all the elements that make up the tracing system used for this research, there are a few key aspects that are worth covering due to their uniqueness within the system. These key components of the tracing system are the use of PF\_RING to mitigate timing and resource concerns, the use of proper hardware and software to handle incoming data, and the tweaking of DataSeries code to create analysis tools for the captured data. +% PF\_RING section +The addition of PF\_RING benefits the tracing system by minimizing the copying of packets which, in turn, allows for more accurate timestamping of incoming traffic packets being captured~\cite{Orosz2013,Skopko2012,PFRING,PFRINGMan}. PF\_RING acts as a kernel module which allows for kernel-based capture and sampling that limits packet loss and timestamping overhead, leading to faster packet capture while efficiently preserving CPU cycles~\cite{PFRING}. This aids in minimizing packet loss/timestamping issues by not passing packets through the kernel data structures~\cite{PFRINGMan}. The other reason PF\_RING is instrumental is that it functions with the 10Gb/s hardware that was installed into the Trace1 server, allowing for full throughput from the network tap on the UITS system. \\ +% DataSeries + Code section +The tweaks and code additions to the existing DataSeries work are filtering for specific CIFS/SMB protocol fields along with the writing of analysis tools to parse and dissect the captured packets. Specific fields were chosen to be the interesting fields to be kept for analysis. It should be noted that this was done arbitrarily and changes/additions have been made as the value of certain fields is determined to be worth examining. \textbf{ADD BIT ABOUT FIELDS' VALUE AND WORTH/IMPACT}. The code written for analysis of the captured DataSeries format packets focuses on read/write events, ID tracking (PID/MID/TID/UID), and OpLock information.
The future vision for this information is to combine ID tracking with the OpLock information in order to track resource sharing of the different clients on the network. \subsection{System Limitations} \label{System Limitations} -When initially designing the tracing system used in this paper, different aspects were taken into account, such as space limitations of the tracing system, packet capture limitations (e.g. file size), and speed limitations of the hardware. The major space limitation that is dealt with in this work is the amount of space that the system has for storing the captured packets, including the resulting DataSeries-file compressions. One limitation encountered in the packet capture system deals with the functional pcap (packet capture file) size. The concern being that the pcap files only need to be held until they have been filtered for specific protocol information and then compressed using the DataSeries format, but still allow for room for the DataSeries files being created to be stored. Other limitation concerns came from the software and packages used to collect the network traffic data~\cite{Orosz2013,Dabir2008,Skopko2012}. These ranged from timestamp resolution provided by the tracing system's kernel~\cite{Orosz2013} to how the packet capturing drivers and programs (such as dumpcap and tshark) operate along with how many copies are performed and how often. These aspects were tackled by installing PF\_RING, which is a kernel module which allows for kernel-based capture and sampling with the idea that this will limit packets loss and timestamp overhead leading to faster packet capture while efficiently preserving CPU cycles~\cite{PFRING}. The speed limitations of the hardware are dictated by the hardware being used (e.g. GB capture interface) and the software that makes use of this hardware (e.g. PF\_RING). After all, our data can only be as accurate as the information being captured~\cite{Ellard2003,Anderson2004}. 
-Other concerns deal with the whether or not the system would be able to function optimally during periods of high network traffic. All apsects of the system, from the hardware to the software, have been altered to help combat these concerns and allow for the most accurate packet capturing possible. +When initially designing the tracing system used in this paper, different aspects were taken into account, such as space limitations of the tracing system, packet capture limitations (e.g. file size), and speed limitations of the hardware. One limitation encountered in the packet capture system deals with the functional pcap (packet capture file) size. The concern is that the pcap files only need to be held until they have been filtered for specific protocol information and then compressed using the DataSeries format, but room must still be left for storing the DataSeries files being created. Other limitation concerns came from the software and packages used to collect the network traffic data~\cite{Orosz2013,Dabir2008,Skopko2012}. These ranged from timestamp resolution provided by the tracing system's kernel~\cite{Orosz2013} to how the packet capturing drivers and programs (such as dumpcap and tshark) operate along with how many copies are performed and how often. The speed limitations of the hardware are dictated by the hardware being used (e.g. GB capture interface) and the software that makes use of this hardware (e.g. PF\_RING). After all, our data can only be as accurate as the information being captured~\cite{Ellard2003,Anderson2004}. +Another concern was whether or not the system would be able to function optimally during periods of high network traffic. All aspects of the system, from the hardware to the software, have been altered to help combat these concerns and allow for the most accurate packet capturing possible. 
-\subsection{Main Challenges} -\label{Main Challenges} +\subsection{General Challenges} +\label{General Challenges} Challenges include: Interpretation of data, selective importance of information, arbitrary distribution of collected information. One glaring challenge with building this tracing system was using code written by others; tshark \& DataSeries. While these programs are used within the tracing structure (which will be further examined in section ~\ref{Tracing System}) there are some issues when working with them. These issues ranged from data type limitations of the code to hash value \& checksum miscalculations due to encryption of specific fields/data. Attempt was made to dig and correct these issues, but they were so inherrent to the code being worked with that hacks and workaround were developed to minimize their effect. Other challenges centralize around selection, intrepretations and distribution scope of the data collected. Which fields should be filtered out from the original packet capture? What data is most prophetic to the form and function of the network being traced? What should be the scope, with respect to time, of the data being examined? Where will the most interesting information appear? As each obstacle was tackled, new information and ways of examining the data reveal themselves and with each development different alterations \& corrections are made. @@ -167,7 +155,7 @@ \subsection{Stages of Trace} \subsubsection{Capture} \label{Capture} -The packet capturing aspect of the tracing system is fairly straight forward. On top of the previously mentioned alterations to the system (e.g. PF\_RING), the capture of packets is done through the use of \textit{tshark}, \textit{pcap2ds}, and \textit{inotify} programs. The broad strokes are that incoming SMB/CIFS information comes from the university's network. 
All packet and transaction information is passed through a duplicating switch that then allows for the tracing system to capture these packet transactions over a 10 Gb port. The reason for using 10Gb hardware is to help ensure that the system is able to capture and \& all information on the network. These packets are then passed along to the \textit{tshark} packet collection program (which is the terminal version of wireshark) which records these packets into a cyclical capturing ring. A watchdog program (called \textit{inotify}) watches the directory where all of these packet-capture (pcap) files are being stored and as a new pcap file is completed \textit{inotify} passes the file to \textit{pcap2ds} along with what protocol is being examined (i.e. SMB). The \textit{pcap2ds} program reads through the given pcap files, filters out any data fields deemed important or interesting for the passed protocol type, then the results are written in DataSeries format and these compressed files are then collected and stored. Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients \& servers that function over the network (e.g. read \& write transactions). It should also be noted that all sensitive information being captured by the tracing system in encrypted to proect the users whose information is be examined by this tracing system. +The packet capturing aspect of the tracing system is fairly straightforward. On top of the previously mentioned alterations to the system (e.g. PF\_RING), the capture of packets is done through the use of \textit{tshark}, \textit{pcap2ds}, and \textit{inotify} programs. The broad strokes are that incoming SMB/CIFS information comes from the university's network.
All packet and transaction information is passed through a duplicating switch that then allows for the tracing system to capture these packet transactions over a 10 Gb port. The reason for using 10Gb hardware is to help ensure that the system is able to capture any and all information on the network. These packets are passed along to the \textit{tshark} packet collection program which records these packets into a cyclical capturing ring. A watchdog program (\textit{inotify}) watches the directory where all of these packet-capture (pcap) files are being stored. As a new pcap file is completed \textit{inotify} passes the file to \textit{pcap2ds} along with what protocol is being examined (i.e. SMB). The \textit{pcap2ds} program reads through the given pcap files, filters out any data fields deemed important or interesting for the passed protocol type, then the results are written in DataSeries format; these compressed files are then collected and stored. Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients \& servers that function over the network (e.g. read \& write transactions). It should also be noted that all sensitive information being captured by the tracing system is encrypted to protect the users whose information is being examined by this tracing system. \subsubsection{Collection} \label{Collection}