diff --git a/TracingPaper.aux b/TracingPaper.aux
index 3ee5c29..a173217 100644
--- a/TracingPaper.aux
+++ b/TracingPaper.aux
@@ -76,12 +76,13 @@
 \newlabel{Trace Analysis}{{3}{5}}
 \@writefile{toc}{\contentsline {subsection}{\numberline {3.1}System Information and Predictions}{5}}
 \newlabel{System Information and Predictions}{{3.1}{5}}
+\@writefile{toc}{\contentsline {subsection}{\numberline {3.2}Run Patterns}{5}}
+\newlabel{Run Patterns}{{3.2}{5}}
 \citation{Douceur1999}
 \citation{RuemmlerWilkes1993}
 \citation{Bolosky2007}
 \citation{EllardLedlie2003}
-\@writefile{toc}{\contentsline {subsection}{\numberline {3.2}Run Patterns}{6}}
-\newlabel{Run Patterns}{{3.2}{6}}
+\bibcite{Leung2008}{1}
 \@writefile{toc}{\contentsline {subsection}{\numberline {3.3}Locating Performance Bottlenecks}{6}}
 \newlabel{Locating Performance Bottlenecks}{{3.3}{6}}
 \@writefile{toc}{\contentsline {section}{\numberline {4}Intuition Confirm/Change}{6}}
@@ -90,7 +91,6 @@
 \newlabel{Characterizations of Different Packet Types}{{4.1}{6}}
 \@writefile{toc}{\contentsline {section}{\numberline {5}Conclusion}{6}}
 \newlabel{Conclusion}{{5}{6}}
-\bibcite{Leung2008}{1}
 \bibcite{Ellard2003}{2}
 \bibcite{EllardLedlie2003}{3}
 \bibcite{Anderson2004}{4}
diff --git a/TracingPaper.log b/TracingPaper.log
index 56194f3..46efb4e 100644
--- a/TracingPaper.log
+++ b/TracingPaper.log
@@ -1,4 +1,4 @@
-This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9 64-bit) (preloaded format=pdflatex 2012.11.13) 17 MAR 2015 16:57
+This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9 64-bit) (preloaded format=pdflatex 2012.11.13) 20 MAR 2015 10:25
 entering extended mode
 **C:/Users/rundeMT/Documents/UConn/TracingPaper/TracingPaper.tex
 (C:/Users/rundeMT/Documents/UConn/TracingPaper/TracingPaper.tex
@@ -189,10 +189,6 @@ Underfull \hbox (badness 10000) in paragraph at lines 159--161
 []
 
-
-LaTeX Warning: Reference `Tracing System' on page 4 undefined on input line 164
-.
-
 [4]
 Underfull \hbox (badness 10000) in paragraph at lines 207--208
@@ -206,9 +202,6 @@ File: omsptm.fd
 LaTeX Font Info:    Font shape `OMS/ptm/m/n' in size <10> not available
 (Font)              Font shape `OMS/cmsy/m/n' tried instead on input line 214.
 [5]
-Underfull \vbox (badness 10000) has occurred while \output is active []
-
-
 Underfull \hbox (badness 1077) in paragraph at lines 242--243
 \OT1/ptm/m/n/10 not only pull out in-for-ma-tion per-ta-nent to the
 []
@@ -263,7 +256,7 @@ s/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmsy10.pfb>
-Output written on TracingPaper.pdf (7 pages, 115073 bytes).
+Output written on TracingPaper.pdf (7 pages, 114508 bytes).
 PDF statistics:
 51 PDF objects out of 1000 (max. 8388607)
 0 named destinations out of 1000 (max. 500000)
diff --git a/TracingPaper.pdf b/TracingPaper.pdf
index 38cbe3d..935a1fe 100644
Binary files a/TracingPaper.pdf and b/TracingPaper.pdf differ
diff --git a/TracingPaper.synctex.gz b/TracingPaper.synctex.gz
index 1d91e00..1ce2799 100644
Binary files a/TracingPaper.synctex.gz and b/TracingPaper.synctex.gz differ
diff --git a/TracingPaper.tex b/TracingPaper.tex
index 48047ac..a072937 100644
--- a/TracingPaper.tex
+++ b/TracingPaper.tex
@@ -161,15 +161,15 @@ An other concern was whether or not the system would be able to function optimal
 %About Challenges of system
 Challenges include: Interpretation of data, selective importance of information, arbitrary distribution of collected information.
-One glaring challenge with building this tracing system was using code written by others; tshark \& DataSeries. While these programs are used within the tracing structure (which will be further examined in section ~\ref{Tracing System}) there are some issues when working with them. These issues ranged from data type limitations of the code to hash value \& checksum miscalculations due to encryption of specific fields/data. Attempt was made to dig and correct these issues, but they were so inherrent to the code being worked with that hacks and workaround were developed to minimize their effect. Other challenges centralize around selection, intrepretations and distribution scope of the data collected. Which fields should be filtered out from the original packet capture? What data is most prophetic to the form and function of the network being traced? What should be the scope, with respect to time, of the data being examined? Where will the most interesting information appear? As each obstacle was tackled, new information and ways of examining the data reveal themselves and with each development different alterations \& corrections are made.
+One glaring challenge with building this tracing system was using code written by others: tshark \& DataSeries. While these programs are used within the tracing structure, there are some issues when working with them. These issues ranged from data type limitations of the code to hash value \& checksum miscalculations due to encryption of specific fields/data. Attempts were made to dig in and correct these issues, but they were so inherent to the code being worked with that hacks and workarounds were developed to minimize their effect. Other challenges center around the selection, interpretation, and distribution scope of the data collected. Which fields should be filtered out from the original packet capture? What data is most indicative of the form and function of the network being traced? What should be the scope, with respect to time, of the data being examined? Where will the most interesting information appear? As each obstacle was tackled, new information and ways of examining the data revealed themselves, and with each development different alterations \& corrections were made.
 %About interpretation of data
-Unfortunately benchmarks require that the person(s) creating the benchmark determines the interpretation of the data collected. To some degree these interpretations are easy to make (e.g. file system behavior \& user behavior~\cite{Leung2008}) while others are more complicated (e.g. temporal scaling of occurances of read/write), but in all scenarios there is still the requirment for human interpretation of the data. While having humans do the interpretations can be adventageous, a lack of all the "background" information can also lead to incorrectly interpreting the information. The hope of this project is that, despite the possible pitfall of incorrect data interpretation, we will be able to not only find out more about the workings and uses of a network but also produce a meaningful benchmark that will more accurately represent the low level aspects of large communication networks.
+To some degree these interpretations are easy to make (e.g. file system behavior \& user behavior~\cite{Leung2008}) while others are more complicated (e.g. temporal scaling of occurrences of reads/writes), but in all scenarios there is still the requirement for human interpretation of the data. While having humans do the interpretations can be advantageous, a lack of all the ``background'' information can also lead to incorrectly interpreting the information.
 %About scope of interpretation (affect of time on data seen)
-Expanding on the previous point about interpretation of data, another human factor of benchmark creation is selecting which information is important or which information will give the greatest insight to the workings on the network. As stated earlier too little information can lead to incorrect conclusions being drawn about the workings on the system, while too much information (and not knowing which information is pertinent) can lead to erroneous conclusions as well. Thus there is a need to strike a balance between what information is important enough to capture (so as not to slow down the capturing process through needless processing) while still obtaining enough information to acquire the bigger picture of what is going on. Unfortunately every step of the tracing process requires a degree of human input to decide what network information will end up providing the most complete picture of the network communication and how to interpret that data into meaningful graphs and tables. This can lead to either finds around the focus of the work being done, or even lead to discoveries of other phenomena that end up having far more impact on the overall performance of the system~\cite{Ellard2003}.
+Another human factor of benchmark creation is selecting which information is important or which information will give the greatest insight into the workings of the network. Too little information can lead to incorrect conclusions being drawn about the workings of the system, while too much information (and not knowing which information is pertinent) can lead to erroneous conclusions as well. There is a need to strike a balance between what information is important enough to capture (so as not to slow down the capturing process through needless processing) while still obtaining enough information to acquire the bigger picture of what is going on. Every step of the tracing process requires a degree of human input to decide what network information will end up providing the most complete picture of the network communication and how to interpret that data into meaningful graphs and tables. This can lead either to findings around the focus of the work being done, or even to discoveries of other phenomena that end up having far more impact on the overall performance of the system~\cite{Ellard2003}.
 
-Even when all the information is collected and the most important data has been selected, there is still the issue of what lens should be used to view this information. Because the data being collected is from an active network, there will be differing activity depending on the time of day, week, and scholastic year. For example, although the first week or so of the year may contain a lot of traffic, this does not mean that trends of that period of time will occur for every week of the year (except perhaps the final week of the semester). The trends and habits of the network will change based on the time of year, time of day, and even depend on the exam schedule. For these reasons one will see different trends depending on the distribution of the data used for analysis, and the truly interesting examination of data requires looking at all different periods of time to see how all these factors play into the communications of the network.
+Even when all the information is collected and the most important data has been selected, there is still the issue of what lens should be used to view this information. Because the data being collected is from an active network, there will be differing activity depending on the time of day, week, and scholastic year. For example, although the first week or so of the year may contain a lot of traffic, this does not mean that trends of that period will occur for every week of the year (except perhaps the final week of the semester). The trends and habits of the network will change based on the time of year, time of day, and even the exam schedule. A truly interesting examination of the data requires looking at all these different periods of time to see how each of these factors plays into the communications of the network.
 \section{Trace Analysis}
 \label{Trace Analysis}
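Editor's note: the TracingPaper.tex hunk above asks "Which fields should be filtered out from the original packet capture?" For readers unfamiliar with the tooling, field selection with tshark is typically done along these lines. This is an illustrative sketch only; the capture filename, output filename, and the particular field choices are assumptions, not taken from the paper's repository.

```shell
# Read an existing capture and emit only selected fields as CSV.
# example_trace.pcap / filtered_fields.csv are hypothetical names.
tshark -r example_trace.pcap -T fields \
    -e frame.time_epoch \
    -e ip.src -e ip.dst \
    -e smb.cmd \
    -E separator=, -E header=y > filtered_fields.csv
```

Narrowing the `-e` field list at capture-processing time is one way to strike the balance the paper describes: fewer fields keep processing cheap, but dropping too many risks losing the bigger picture.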