Updates to tables and some writing

paw10003 · Jan 19, 2020 · 75dc2b4
1 parent beb7cb7
Showing 1 changed file with 25 additions and 22 deletions.
diff --git a/trackingPaper.tex b/trackingPaper.tex
@@ -249,7 +249,7 @@ \section{Background}
 \subsection{Server Message Block}
 The Server Message Block (SMB) is an application-layer network protocol mainly used for providing shared access to files, shared access to printers, shared access to serial ports, miscellaneous communications between nodes on the network, as well as providing an authenticated inter-process communication mechanism.  
 %The majority of usage for the SMB protocol involves Microsfot Windows.  Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources
-The SMB 1.0 protocol has been found to have high/significant impact on performance due to latency issues.  Monitoring revealed a high degree of ``chattiness'' and disregard of network latency between hosts.  Solutions to this problem were included in the updated SMB 2.0 protocol which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen. Additional changes, most significantly being increased security, were implemented in SMB 3.0 protocol (previously named SMB 2.2). % XXX citations for SMB specs for different versions?
+The SMB 1.0 protocol~\cite{SMB1Spec} has been found to suffer significant performance penalties due to latency.  Monitoring revealed a high degree of ``chattiness'' and a disregard for network latency between hosts.  Solutions to this problem were included in the updated SMB 2.0 protocol, which reduces ``chattiness'' by cutting the number of commands and sub-commands from over a hundred to nineteen.  Additional changes, most notably increased security, were implemented in the SMB 3.0 protocol (previously named SMB 2.2)~\cite{SMB2Spec}. % XXX citations for SMB specs for different versions?
 %\textcolor{red}{\textbf{Add information about SMB 2.X/3?}}
 
 The rough order of communication for SMB session file interaction consists of about five steps.  First, a negotiation determines which Microsoft SMB Protocol dialect is used.  Next, a session is established to determine the share-level security.  After this, the Tree ID (TID) is determined for the share to be connected to, as well as a file ID (FID) for a file requested by the client.  Once these are established, I/O operations are performed using the FID given in the previous step.
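The five steps above can be sketched as a small state model. This is a hypothetical illustration, not part of the tracing system. The class SessionModel and its methods are invented names. The IDs are plain counters rather than values a real server would assign. Only the ordering constraints described in the text are enforced.

```python
from itertools import count
from typing import Dict, Optional, Set


class SessionModel:
    """Hypothetical, simplified model of one client's SMB session.

    It only enforces the ordering of the steps described in the text and
    hands out illustrative integer IDs; it does not speak the SMB protocol.
    """

    def __init__(self) -> None:
        self._ids = count(1)              # source of illustrative IDs
        self.dialect: Optional[str] = None
        self.uid: Optional[int] = None
        self.tids: Set[int] = set()
        self.fids: Dict[int, int] = {}    # FID -> TID of the share it lives on

    def negotiate(self, dialect: str) -> None:
        # Step 1: client and server agree on a protocol dialect.
        self.dialect = dialect

    def session_setup(self) -> int:
        # Step 2: establish the session; the server assigns a UID.
        if self.dialect is None:
            raise RuntimeError("session setup before dialect negotiation")
        self.uid = next(self._ids)
        return self.uid

    def tree_connect(self) -> int:
        # Step 3: connect to a share; the server assigns a TID.
        if self.uid is None:
            raise RuntimeError("tree connect before session setup")
        tid = next(self._ids)
        self.tids.add(tid)
        return tid

    def open_file(self, tid: int) -> int:
        # Step 4: open a file on a connected share; the server assigns a FID.
        if tid not in self.tids:
            raise RuntimeError("file open on a share that is not connected")
        fid = next(self._ids)
        self.fids[fid] = tid
        return fid

    def io(self, fid: int, op: str, nbytes: int) -> str:
        # Step 5: reads and writes reference the FID obtained in step 4.
        if fid not in self.fids:
            raise RuntimeError("I/O on an unknown FID")
        return f"{op} {nbytes} B uid={self.uid} tid={self.fids[fid]} fid={fid}"
```

Here is how the calls fit together on a fresh SessionModel. Calling negotiate, session_setup, tree_connect, and open_file in order yields a UID of 1, a TID of 2, and a FID of 3. After that, io(3, "read", 64) succeeds. Calling tree_connect before session_setup raises RuntimeError.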
@@ -258,11 +258,11 @@ \subsection{Server Message Block}
 The only data that needs to be tracked from the SMB traces are the UID (User ID) and TID for each session.  The SMB commands also include a MID (Multiplex ID) value that is used for tracking individual packets in each established session, and a PID (Process ID) that tracks the process running the command or series of commands on a host.  
 For the purposes of our tracing, we do not track the MID or PID information.
 
-Some nuances of SMB protocol I/O to note are:
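+One nuance of SMB protocol I/O is the direction in which data travels: for SMB/SMB2 writes, the request packet pushes the bytes over the wire and the response only confirms their arrival and use, whereas for SMB/SMB2 reads, the request only asks for the data and it is the response packet that carries the bytes.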
-\begin{itemize}
+%\begin{itemize}
-	\item SMB/SMB2 write request is the command that pushes bytes over the wire.  \textbf{Note:} the response packet only confirms their arrival and use (e.g. writing).
+%	\item SMB/SMB2 write request is the command that pushes bytes over the wire.  \textbf{Note:} the response packet only confirms their arrival and use (e.g. writing).
-	\item SMB/SMB2 read response is the command that pushes bytes over the wire.  \textbf{Note:} The request packet only asks for the data.
+%	\item SMB/SMB2 read response is the command that pushes bytes over the wire.  \textbf{Note:} The request packet only asks for the data.
-\end{itemize}
+%\end{itemize}
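Only one packet of each read or write exchange carries the data. Byte totals should therefore be taken from write requests and read responses only. The sketch below assumes packets have already been dissected into records. SmbIoPacket, its fields, and bytes_on_wire are hypothetical names, not a DataSeries or tshark API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SmbIoPacket:
    """Hypothetical, already-dissected read/write packet (illustrative fields)."""
    command: str       # "read" or "write"
    is_response: bool  # False for a request, True for a response
    length: int        # byte count reported in the packet: requested or carried


def bytes_on_wire(packets: Iterable[SmbIoPacket]) -> Dict[str, int]:
    """Sum data bytes, counting only the packets that actually carry file data.

    Write requests carry the data (the response just acknowledges it);
    read responses carry the data (the request only names a length).
    Counting both halves of an exchange would double-count the transfer.
    """
    totals = {"read": 0, "write": 0}
    for p in packets:
        if p.command == "write" and not p.is_response:
            totals["write"] += p.length
        elif p.command == "read" and p.is_response:
            totals["read"] += p.length
    return totals
```

A 100-byte write exchange and a 64-byte read exchange would each appear twice in a naive sum over all packets. This function counts each of them once.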
 % Make sure to detail here how exactly IAT/RT are each calculated
 
 \begin{figure}
@@ -321,7 +321,7 @@ \subsection{High-speed Packet Capture}
 
 The \texttt{.pcap} files from \texttt{tshark} do not lend themselves to easy data analysis, so we translate these files into the DataSeries~\cite{DataSeries} format.  HP developed DataSeries, an XML-based structured data format designed to be self-descriptive, storage- and access-efficient, and highly flexible.
 The system for taking captured \texttt{.pcap} files and writing them into the DataSeries format (i.e., \texttt{.ds}) first creates a structure based on a pre-written specification of the data to capture.  Once this structure is built, the code reads through the captured traffic packets, dissecting them and filling in the prepared structure with the desired information and format.  
-Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients and servers that interact over the network (i.e. I/O transactions).  It should also be noted that all sensitive information being captured by the tracing system is hashed to protect the users whose information is examined by the tracing system.  Furthermore, the DataSeries file retains only the first XXX bytes of the SMB packet - enough to capture the SMB header information  that contains the I/O information we seek, while the body of the SMB traffic is not retained in order to better ensure security of the university's network communications.  It is worth noting that in the case of larger SMB headers, some information is lost, but this is a trade-off by the university to provide, on average, the correct sized SMB header but does lead to scenarios where some information may be captured incompletely.
+Given the focus of this work, there is no need to track every piece of information that is exchanged, only the information that illuminates the behavior of the clients and servers interacting over the network (i.e., I/O transactions).  All sensitive information captured by the tracing system is hashed to protect the users whose traffic is examined.  Furthermore, the DataSeries file retains only the first 512 bytes of each SMB packet, which is enough to capture the SMB header information containing the I/O details we seek; the body of the SMB traffic is not retained, to better ensure the security of the university's network communications.  When an SMB header exceeds this limit, some of its information is lost.  This is a trade-off made by the university: the limit captures a complete SMB header in the typical case, but it does lead to scenarios where some information is captured incompletely.
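The retention policy above amounts to a fixed snap length plus hashing of sensitive fields. The sketch below assumes raw SMB packet bytes are available. It uses salted SHA-256 purely for illustration, because the text does not name the hash function. truncate_smb and anonymize are hypothetical helpers, not part of the DataSeries tooling.

```python
import hashlib

SNAP_LEN = 512  # bytes of each SMB packet kept in the DataSeries file


def truncate_smb(packet: bytes, snap_len: int = SNAP_LEN) -> bytes:
    """Keep only the leading snap_len bytes (the header region) and drop
    the rest of the body; a header longer than snap_len is cut short."""
    return packet[:snap_len]


def anonymize(value: str, salt: bytes) -> str:
    """Replace a sensitive field (e.g. a user or file name) with a salted
    SHA-256 hex digest. The hash function and salting are assumptions:
    the tracing system is only described as hashing sensitive fields."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
```

Because the digest is deterministic for a given salt, the same user or file still maps to the same token across records. Analyses can therefore correlate activity without seeing the original value.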
 
 \subsection{DataSeries Analysis}
 
@@ -441,7 +441,7 @@ \subsection{I/O Data Request Sizes}
 %	\label{fig:IO-R+W}
 %\end{figure}
 Each SMB Read and Write command is associated with a data request size that indicates how many bytes are to be read or written as part of that command.
-Figures~\ref{fig:PDF-Bytes-Read} and~\ref{fig:PDF-Bytes-Write} show the probability density function (PDF) of the different sizes of bytes transferred for read and write I/O operations respectively.  The most noticeable aspect of these graphs are that the majority of bytes transferred for read and write operations is around 64 bytes.  It is worth noting that write I/O also have a larger number of very small transfer amounts.  This is unexpected in terms of the amount of data passed in a frame.  Our belief is that this is due to a large number of long term calculations/scripts being run that only require small but frequent updates. This assumption was later validated in part when examining the files transferred, as some were related to running scripts creating a large volume of files.
+Figures~\ref{fig:PDF-Bytes-Read} and~\ref{fig:PDF-Bytes-Write} show the probability density function (PDF) of the transfer sizes for read and write I/O operations, respectively.  The most noticeable aspect of these graphs is that the majority of read and write transfers are around 64 bytes.  Write I/O also includes a larger number of very small transfers.  This is unexpected given how little of a frame's capacity such transfers use.  We believe this is due to a large number of long-running calculations or scripts that require only small but frequent updates.  This hypothesis was partially validated by examining the files transferred, some of which were related to scripts creating a large volume of files; however, stronger support came from the behavior observed with common applications.  For example, Microsoft Word was seen to perform a large number of small reads at ever-growing offsets.  We interpret this as Word loading the next few lines of text as a user scrolls through a document viewed over the network, which causes ``loading times'' during use.
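The Word behavior described above consists of many small reads at ever-growing offsets on one file. It can be flagged with a simple heuristic. This is a hypothetical sketch, not the method used in the study. It assumes the reads for one FID are available as (offset, length) pairs in capture order. The thresholds max_len and min_run are illustrative.

```python
from typing import List, Optional, Tuple


def growing_small_reads(reads: List[Tuple[int, int]],
                        max_len: int = 64, min_run: int = 3) -> bool:
    """True if reads contain at least min_run consecutive small reads
    (length <= max_len) whose offsets strictly increase.

    reads: (offset, length) pairs for a single FID, in capture order.
    """
    run = 0
    prev_offset: Optional[int] = None
    for offset, length in reads:
        if length > max_len:
            run = 0                      # a large read breaks the pattern
        elif prev_offset is not None and offset > prev_offset and run > 0:
            run += 1                     # small read further into the file
        else:
            run = 1                      # small read that starts a new run
        prev_offset = offset
        if run >= min_run:
            return True
    return False
```

For instance, three 64-byte reads at offsets 0, 64, and 128 are flagged. The same reads repeated at a fixed offset are not.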
 %This could also be attributed to simple reads relating to metadata\textcolor{red}{???}
 
 %\begin{figure}
@@ -486,7 +486,10 @@ \subsection{I/O Data Request Sizes}
 %	\label{fig:CDF-Bytes-RW}
 %\end{figure}
 Figures~\ref{fig:CDF-Bytes-Read} and~\ref{fig:CDF-Bytes-Write} show the cumulative distribution functions (CDFs) for bytes read and bytes written.  As can be seen, almost no read transfers are smaller than 32 bytes, whereas 20\% of writes are below 32 bytes.  Table~\ref{fig:transferSizes} shows a tabular view of this data.  For reads, $34.97$\% are between 64 and 512 bytes, with another $28.86$\% at 64 byte request sizes.  A negligible percentage of read requests are larger than 512 bytes.
-This read data is similar to what was observed by Leung et al.  Writes, on the other hand, are very different.  Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size.  In our data, however, we see that only $11.16$\% of writes are less than 4K, $52.41$\% are 64K requests, and only $43.63$\% of requests are less than 64K writes.
+These read sizes are roughly a factor of 4 smaller than those observed by Leung et al.
+%This read data is similar to what was observed by Leung et al, however at an order of magnitude smaller.  
+The observed writes also differ from previous studies of the protocol's usage.  % are very different.  
+Leung et al. showed that $60$--$70$\% of writes were less than 4K in size and $90$\% less than 64K in size.  In our data, however, only $11.16$\% of writes are smaller than 4 bytes, $52.41$\% are 64 byte requests, and only $43.63$\% are smaller than 64 bytes.
 In the ten years since the last study, it is clear that writes have become significantly larger.  This may be explained by the fact that large files, and multiple files, are being written as standardized blocks better suited to the larger data sets and disk space available.  This could be an effort to improve the fidelity of data across the network, to allow for better real-time data consistency between client and backup locations, or could simply be due to a large number of scripts being run that create and update a series of relatively small documents.
 %\textbf{Note: It seems like a change in the order of magnitude that is being passed per packet.  What would this indicate?}\textcolor{red}{Answer the question. Shorter reads/writes = better?}
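The write percentages above are simple fractions over all write requests. The sketch below assumes the data request size of every write is available as a list of byte counts. write_size_summary is a hypothetical helper, and the bucket names are illustrative.

```python
from typing import Dict, Iterable


def write_size_summary(sizes: Iterable[int]) -> Dict[str, float]:
    """Percentage of write requests in the size buckets quoted in the text.

    sizes: data request size, in bytes, of each write request.
    """
    values = list(sizes)
    if not values:
        raise ValueError("no write requests")

    def pct(count: int) -> float:
        return 100.0 * count / len(values)

    return {
        "below_4B": pct(sum(1 for s in values if s < 4)),
        "exactly_64B": pct(sum(1 for s in values if s == 64)),
        "below_64B": pct(sum(1 for s in values if s < 64)),
    }
```

The "exactly 64" and "below 64" buckets are disjoint, while "below 4" is a subset of "below 64". The figures quoted in the text therefore imply that the remaining 100 - 52.41 - 43.63 = 3.96% of writes exceed 64 bytes.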
 
@@ -794,7 +797,7 @@ \subsection{I/O Response Times}
 %\end{figure}
 
 \subsection{File Extensions}
-Tables~\ref{tab:top10SMB2FileExts} and~\ref{tab:commonSMB2FileExts} show a summary of the various file extensions that were seen within the three-week capture period.  The easier to understand is Table~\ref{tab:commonSMB2FileExts}, which illustrates the number of common file extensions (e.g. doc, ppt, xls, pdf) that were part of the data.  
+Tables~\ref{tab:top10SMB2FileExts} and~\ref{tab:commonSMB2FileExts} summarize the file extensions seen in the SMB2 traffic during the three-week capture period, as extracted from the \texttt{smb2.filename} field.  The easier of the two to interpret is Table~\ref{tab:commonSMB2FileExts}, which shows the number of common file extensions (e.g., doc, ppt, xls, pdf) that appeared in the data.  
 %The greatest point of note is that the highest percentage is ``.xml'' with $0.54$\%, which is found to be surprising result.  
 Originally we expected that these common file extensions would account for a much larger share of the traffic.  However, as seen in Table~\ref{tab:commonSMB2FileExts}, these common file extensions were less than $2$\% of total files seen.  The top ten extensions that we saw (Table~\ref{tab:top10SMB2FileExts}) comprised approximately $84$\% of the total seen.  
 Furthermore, the majority of extensions are not readily identified.  
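The percentages in the two extension tables can be reproduced from the filenames alone. The sketch below assumes the smb2.filename values have already been exported as one string per occurrence. It also assumes an extension is whatever follows the last dot of the final path component. The text does not say how case, repeated names, or names without a dot were handled, so those choices here are assumptions as well.

```python
from collections import Counter
from pathlib import PureWindowsPath
from typing import Iterable, List, Tuple


def extension_shares(filenames: Iterable[str],
                     top: int = 10) -> List[Tuple[str, float]]:
    """Most common extensions as (extension, percent of all filenames seen).

    Names are parsed as Windows paths, since SMB uses backslash separators;
    the extension is lower-cased and names without one count as "(none)".
    """
    counts: Counter = Counter()
    for name in filenames:
        suffix = PureWindowsPath(name).suffix.lower()   # "" when no extension
        counts[suffix.lstrip(".") or "(none)"] += 1
    total = sum(counts.values())
    return [(ext, 100.0 * n / total) for ext, n in counts.most_common(top)]
```

Summing the percentages returned for top=10 gives the kind of "top ten share" figure quoted above.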
@@ -866,7 +869,7 @@ \subsection{Distribution Models}
 %	\item Read + Write command RT CDF, shown in Figure~\ref{fig:CDF-RT-RW}, has $R^2$ Value of $0.7837$.
 %\end{itemize}
 
-\begin{table}
+\begin{table*}
 \centering
 \begin{tabular}{|l|c|c|c||c|c|c|}
 \hline
@@ -875,23 +878,23 @@ \subsection{Distribution Models}
 CDF			  & \multicolumn{3}{|c|}{$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-\mu}{\sigma}}e^{\frac{-t^2}{2}}dt$}
 			  & \multicolumn{3}{|c|}{$1 - e^{-(x/\lambda)^k}$} \\ \hline \hline
 I/O Operation & $\mu$ & \multicolumn{2}{|c|}{$\sigma$} & $k$ & \multicolumn{2}{|c|}{$\lambda$} \\ \hline
-General IAT   &   786.72    &  \multicolumn{2}{|c|}{10329.6}        & 0.9031   &   \multicolumn{2}{|c|}{743.2075}        \\
+General IAT   &   786.72$\pm$2.79  &  \multicolumn{2}{|c|}{10329.6$\pm$2}        & 0.9031$\pm$0.0002   &   \multicolumn{2}{|c|}{743.2075$\pm$0.2341}        \\
-General RT    &   3606.66    &  \multicolumn{2}{|c|}{2.74931e+06}        & 0.5652   &   \multicolumn{2}{|c|}{980.9721}        \\
+General RT    &   3606.66$\pm$742.44    &  \multicolumn{2}{|c|}{2.74931e+06$\pm$530}        & 0.5652$\pm$0.0001  &   \multicolumn{2}{|c|}{980.9721$\pm$0.4975}        \\
-Read RT       &   44718.5    &  \multicolumn{2}{|c|}{1.72776e+07}        & 0.0004   &   \multicolumn{2}{|c|}{1.5517}        \\
+Read RT       &   44718.5$\pm$11715    &  \multicolumn{2}{|c|}{1.72776e+07$\pm$8300}        & 0.0004$\pm$0.0   &   \multicolumn{2}{|c|}{1.5517$\pm$0.0028}        \\
-Read IAT       &   24146    &  \multicolumn{2}{|c|}{1.189e+07}        & 0.0005   &   \multicolumn{2}{|c|}{3.8134}        \\
+Read IAT       &   24146$\pm$8062    &  \multicolumn{2}{|c|}{1.189e+07$\pm$5700}        & 0.0005$\pm$0.0   &   \multicolumn{2}{|c|}{3.8134$\pm$0.0057}        \\
-Write RT      &   379.823    &  \multicolumn{2}{|c|}{4021.72}        &  0.8569  &   \multicolumn{2}{|c|}{325.2856}        \\
+Write RT      &   379.823$\pm$2.809    &  \multicolumn{2}{|c|}{4021.72$\pm$1.99}        &  0.8569$\pm$0.0004  &   \multicolumn{2}{|c|}{325.2856$\pm$0.2804}        \\
-Write IAT      &    25785.7   &  \multicolumn{2}{|c|}{1.22491e+07}        &  0.0004  &   \multicolumn{2}{|c|}{3.1287}        \\
+Write IAT      &    25785.7$\pm$8556.6   &  \multicolumn{2}{|c|}{1.22491e+07$\pm$6000}        &  0.0004$\pm$0.0  &   \multicolumn{2}{|c|}{3.1287$\pm$0.0052}        \\
-Create RT     &   502.084    &  \multicolumn{2}{|c|}{21678.4}        & 0.9840 &   \multicolumn{2}{|c|}{496.9497}        \\
+Create RT     &   502.084$\pm$5.756    &  \multicolumn{2}{|c|}{21678.4$\pm$4.1}        & 0.9840$\pm$0.0002 &   \multicolumn{2}{|c|}{496.9497$\pm$0.1403}        \\
-Create IAT     &    3694.82   &  \multicolumn{2}{|c|}{4.65553e+06}        & 0.0008 &   \multicolumn{2}{|c|}{2.3504}        \\ \hline
+Create IAT     &    3694.82$\pm$1236.16   &  \multicolumn{2}{|c|}{4.65553e+06$\pm$880}        & 0.0008$\pm$0.0 &   \multicolumn{2}{|c|}{2.3504$\pm$0.0009}        \\ \hline
 %R+W RT	      &	  \textcolor{red}{0.8045}    &  \multicolumn{2}{|c|}{\textcolor{red}{0.2122}}        & \textcolor{red}{5.103} &   \multicolumn{2}{|c|}{\textcolor{red}{0.3937}}       \\ \hline
 %R+W Byte Transfer	  &	\textcolor{red}{0.3744}	&	\multicolumn{2}{|c|}{\textcolor{red}{0.2983}}	&	\textcolor{red}{1.153}	&	\multicolumn{2}{|c|}{\textcolor{red}{0.3937}}	\\
-Read Buff Transfer	&	82.9179	&	\multicolumn{2}{|c|}{1117.9}	&	1.0548	&	\multicolumn{2}{|c|}{85.2525}	\\
+Read Buff Transfer	&	82.9179$\pm$0.7641	&	\multicolumn{2}{|c|}{1117.9$\pm$0.54}	&	1.0548$\pm$0.0003	&	\multicolumn{2}{|c|}{85.2525$\pm$0.0575}	\\
-Write Buff Transfer	&	46.2507	&	\multicolumn{2}{|c|}{640.621}	&	1.0325	&	\multicolumn{2}{|c|}{46.8707}	\\	\hline
+Write Buff Transfer	&	46.2507$\pm$0.4475	&	\multicolumn{2}{|c|}{640.621$\pm$0.316}	&	1.0325$\pm$0.0004	&	\multicolumn{2}{|c|}{46.8707$\pm$0.0328}	\\	\hline
 \end{tabular}
 \caption{\label{tbl:curveFitting}Comparison of %$R^2$ 
 $\mu$, $\sigma$, $k$, and $\lambda$ Values for Curve Fitting Equations on CDF Graphs}
 \vspace{-3em}
-\end{table}
+\end{table*}
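Each row of the table gives the parameters of two candidate fits to an empirical CDF. One is a Normal CDF with parameters (mu, sigma). The other is a two-parameter Weibull CDF with parameters (k, lambda). The sketch below evaluates both. It uses the standard Weibull form 1 - exp(-(x/lambda)^k) and writes the Normal integral through the error function. The input x must be in the same units as the fitted data, which the table does not state.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF Phi((x - mu) / sigma), expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def weibull_cdf(x: float, k: float, lam: float) -> float:
    """Two-parameter Weibull CDF 1 - exp(-(x / lam)^k), zero for x < 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** k))


# Example: the Write RT row of the table (mu, sigma, k, lambda as listed).
write_rt = {"mu": 379.823, "sigma": 4021.72, "k": 0.8569, "lam": 325.2856}
```

Two quick sanity checks apply to any row. The Normal CDF equals 0.5 at x = mu. The Weibull CDF equals 1 - e^(-1), about 0.632, at x = lambda for any k.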
 
 %The graphs created by the dissection script are:
 %\begin{itemize}