Updates to tables and some writing
Duncan committed Jan 19, 2020
1 parent beb7cb7 commit 75dc2b4
Showing 1 changed file with 25 additions and 22 deletions.
47 changes: 25 additions & 22 deletions trackingPaper.tex
@@ -249,7 +249,7 @@ As seen in previous trace work~\cite{leung2008measurement,roselli2000comparison,
\subsection{Server Message Block} \subsection{Server Message Block}
The Server Message Block (SMB) is an application-layer network protocol mainly used for providing shared access to files, shared access to printers, shared access to serial ports, miscellaneous communications between nodes on the network, as well as providing an authenticated inter-process communication mechanism. The Server Message Block (SMB) is an application-layer network protocol mainly used for providing shared access to files, shared access to printers, shared access to serial ports, miscellaneous communications between nodes on the network, as well as providing an authenticated inter-process communication mechanism.
%The majority of usage for the SMB protocol involves Microsfot Windows. Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources %The majority of usage for the SMB protocol involves Microsfot Windows. Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources
The SMB 1.0 protocol has been found to have high/significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and disregard of network latency between hosts. Solutions to this problem were included in the updated SMB 2.0 protocol which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen. Additional changes, most significantly being increased security, were implemented in SMB 3.0 protocol (previously named SMB 2.2). % XXX citations for SMB specs for different versions? The SMB 1.0 protocol~\cite{SMB1Spec} has been found to have high/significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and disregard of network latency between hosts. Solutions to this problem were included in the updated SMB 2.0 protocol which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen. Additional changes, most significantly being increased security, were implemented in SMB 3.0 protocol (previously named SMB 2.2)~\cite{SMB2Spec}. % XXX citations for SMB specs for different versions?
%\textcolor{red}{\textbf{Add information about SMB 2.X/3?}} %\textcolor{red}{\textbf{Add information about SMB 2.X/3?}}


The rough order of communication for SMB session file interaction contains about five steps. First is a negotiation where a Microsoft SMB Protocol dialect is determined. Next a session is established to determine the share-level security. After this the Tree ID (TID) is determined for the share to be connected to as well as a file ID (FID) for a file requested by the client. From this establishment, I/O operations are performed using the FID given in the previous step. The rough order of communication for SMB session file interaction contains about five steps. First is a negotiation where a Microsoft SMB Protocol dialect is determined. Next a session is established to determine the share-level security. After this the Tree ID (TID) is determined for the share to be connected to as well as a file ID (FID) for a file requested by the client. From this establishment, I/O operations are performed using the FID given in the previous step.
@@ -258,11 +258,11 @@ The rough order of communication for SMB session file interaction contains about
The only data that needs to be tracked from the SMB traces are the UID (User ID) and TID for each session. The SMB commands also include a MID (Multiplex ID) value that is used for tracking individual packets in each established session, and a PID (Process ID) that tracks the process running the command or series of commands on a host. The only data that needs to be tracked from the SMB traces are the UID (User ID) and TID for each session. The SMB commands also include a MID (Multiplex ID) value that is used for tracking individual packets in each established session, and a PID (Process ID) that tracks the process running the command or series of commands on a host.
For the purposes of our tracing, we do not track the MID or PID information. For the purposes of our tracing, we do not track the MID or PID information.


Some nuances of SMB protocol I/O to note are: Some nuances of SMB protocol I/O to note are that SMB/SMB2 write requests are the actions that push bytes over the wire while for SMB/SMB2 read operations it is the response packets.
\begin{itemize} %\begin{itemize}
\item SMB/SMB2 write request is the command that pushes bytes over the wire. \textbf{Note:} the response packet only confirms their arrival and use (e.g. writing). % \item SMB/SMB2 write request is the command that pushes bytes over the wire. \textbf{Note:} the response packet only confirms their arrival and use (e.g. writing).
\item SMB/SMB2 read response is the command that pushes bytes over the wire. \textbf{Note:} The request packet only asks for the data. % \item SMB/SMB2 read response is the command that pushes bytes over the wire. \textbf{Note:} The request packet only asks for the data.
\end{itemize} %\end{itemize}
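
The direction rule stated in this hunk (write requests and read responses are the packets that carry file data) can be illustrated with a short Python sketch. The record layout (`command`, `is_response`, `data_length`) is a hypothetical simplification chosen for illustration, not the schema used by the tracing system.

```python
from dataclasses import dataclass


@dataclass
class SmbPacket:
    # Hypothetical, simplified view of one dissected SMB/SMB2 packet.
    command: str       # "read" or "write"
    is_response: bool  # True for a server response, False for a client request
    data_length: int   # length value reported in the packet header


def payload_bytes(pkt: SmbPacket) -> int:
    """Return the file-data bytes this packet pushes over the wire."""
    # Write data travels in the request; the response only confirms it.
    if pkt.command == "write" and not pkt.is_response:
        return pkt.data_length
    # Read data travels in the response; the request only asks for it.
    if pkt.command == "read" and pkt.is_response:
        return pkt.data_length
    return 0


def total_payload(packets):
    """Sum transferred bytes per operation type without double counting."""
    totals = {"read": 0, "write": 0}
    for pkt in packets:
        if pkt.command in totals:
            totals[pkt.command] += payload_bytes(pkt)
    return totals
```

Counting only one side of each request/response pair avoids attributing the same transfer twice when both packets report a length.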
% Make sure to detail here how exactly IAT/RT are each calculated % Make sure to detail here how exactly IAT/RT are each calculated


\begin{figure} \begin{figure}
@@ -321,7 +321,7 @@ The filesize used was in a ring buffer where each file captured was 64000 kB.


The \texttt{.pcap} files from \texttt{tshark} do not lend themselves to easy data analysis, so we translate these files into the DataSeries~\cite{DataSeries} format. HP developed DataSeries, an XML-based structured data format, that was designed to be self-descriptive, storage and access efficient, and highly flexible. The \texttt{.pcap} files from \texttt{tshark} do not lend themselves to easy data analysis, so we translate these files into the DataSeries~\cite{DataSeries} format. HP developed DataSeries, an XML-based structured data format, that was designed to be self-descriptive, storage and access efficient, and highly flexible.
The system for taking captured \texttt{.pcap} files and writing them into the DataSeries format (i.e. \texttt{.ds}) does so by first creating a structure (based on a pre-written determination of the data desired to capture). Once the code builds this structure, it then reads through the capture traffic packets while dissecting and filling in the prepared structure with the desired information and format. The system for taking captured \texttt{.pcap} files and writing them into the DataSeries format (i.e. \texttt{.ds}) does so by first creating a structure (based on a pre-written determination of the data desired to capture). Once the code builds this structure, it then reads through the capture traffic packets while dissecting and filling in the prepared structure with the desired information and format.
Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients and servers that interact over the network (i.e. I/O transactions). It should also be noted that all sensitive information being captured by the tracing system is hashed to protect the users whose information is examined by the tracing system. Furthermore, the DataSeries file retains only the first XXX bytes of the SMB packet - enough to capture the SMB header information that contains the I/O information we seek, while the body of the SMB traffic is not retained in order to better ensure security of the university's network communications. It is worth noting that in the case of larger SMB headers, some information is lost, but this is a trade-off by the university to provide, on average, the correct sized SMB header but does lead to scenarios where some information may be captured incompletely. Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients and servers that interact over the network (i.e. I/O transactions). It should also be noted that all sensitive information being captured by the tracing system is hashed to protect the users whose information is examined by the tracing system. Furthermore, the DataSeries file retains only the first 512 bytes of the SMB packet - enough to capture the SMB header information that contains the I/O information we seek, while the body of the SMB traffic is not retained in order to better ensure security of the university's network communications. It is worth noting that in the case of larger SMB headers, some information is lost, but this is a trade-off by the university to provide, on average, the correct sized SMB header but does lead to scenarios where some information may be captured incompletely.
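
A minimal Python sketch of the two privacy measures described in this paragraph: keeping only the first 512 bytes of each SMB packet and hashing sensitive values. The paragraph does not say which hash function is used, so SHA-256 and the optional `salt` parameter are assumptions made purely for illustration, as are the function names.

```python
import hashlib

SNAP_LENGTH = 512  # bytes of each SMB packet retained, as stated in the text


def truncate_packet(raw: bytes, snap_length: int = SNAP_LENGTH) -> bytes:
    """Keep only the leading bytes; larger headers are cut off and the body is dropped."""
    return raw[:snap_length]


def anonymize(value: str, salt: bytes = b"") -> str:
    """Replace a sensitive value with a stable digest (SHA-256 is an assumption)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
```

Because the digest is deterministic, the same user or file name maps to the same token across the trace, so per-user behavior can still be grouped without exposing the original value.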


\subsection{DataSeries Analysis} \subsection{DataSeries Analysis}


@@ -441,7 +441,7 @@ Oplock Break & \multicolumn{2}{|c|}{22397} & 0.008\% \\ \hline
% \label{fig:IO-R+W} % \label{fig:IO-R+W}
%\end{figure} %\end{figure}
Each SMB Read and Write command is associated with a data request size that indicates how many bytes are to be read or written as part of that command. Each SMB Read and Write command is associated with a data request size that indicates how many bytes are to be read or written as part of that command.
Figures~\ref{fig:PDF-Bytes-Read} and~\ref{fig:PDF-Bytes-Write} show the probability density function (PDF) of the different sizes of bytes transferred for read and write I/O operations respectively. The most noticeable aspect of these graphs are that the majority of bytes transferred for read and write operations is around 64 bytes. It is worth noting that write I/O also have a larger number of very small transfer amounts. This is unexpected in terms of the amount of data passed in a frame. Our belief is that this is due to a large number of long term calculations/scripts being run that only require small but frequent updates. This assumption was later validated in part when examining the files transferred, as some were related to running scripts creating a large volume of files. Figures~\ref{fig:PDF-Bytes-Read} and~\ref{fig:PDF-Bytes-Write} show the probability density function (PDF) of the different sizes of bytes transferred for read and write I/O operations respectively. The most noticeable aspect of these graphs are that the majority of bytes transferred for read and write operations is around 64 bytes. It is worth noting that write I/O also have a larger number of very small transfer amounts. This is unexpected in terms of the amount of data passed in a frame. Our belief is that this is due to a large number of long term calculations/scripts being run that only require small but frequent updates. This assumption was later validated in part when examining the files transferred, as some were related to running scripts creating a large volume of files, however the more affirming finding was the behavior observed with common applications. For example, it was seen that Microsoft Word would perform a large number of small reads at ever growing offsets. This was interpreted as when a user is viewing a document over the network and Word would load the next few lines of text as the user scrolled down the document; causing ``loading times'' amid use.
%This could also be attributed to simple reads relating to metadata\textcolor{red}{???} %This could also be attributed to simple reads relating to metadata\textcolor{red}{???}


%\begin{figure} %\begin{figure}
@@ -486,7 +486,10 @@ Figures~\ref{fig:PDF-Bytes-Read} and~\ref{fig:PDF-Bytes-Write} show the probabil
% \label{fig:CDF-Bytes-RW} % \label{fig:CDF-Bytes-RW}
%\end{figure} %\end{figure}
Figures~\ref{fig:CDF-Bytes-Read} and~\ref{fig:CDF-Bytes-Write} show cumulative distribution functions (CDF) for bytes read and bytes written. As can be seen, almost no read transfer sizes are less than 32 bytes, whereas 20\% writes below 32 bytes. Table~\ref{fig:transferSizes} shows a tabular view of this data. For reads, $34.97$\% are between 64 and 512 bytes, with another $28.86$\% at 64 byte request sizes. There are a negligible percentage of read requests larger than 512. Figures~\ref{fig:CDF-Bytes-Read} and~\ref{fig:CDF-Bytes-Write} show cumulative distribution functions (CDF) for bytes read and bytes written. As can be seen, almost no read transfer sizes are less than 32 bytes, whereas 20\% writes below 32 bytes. Table~\ref{fig:transferSizes} shows a tabular view of this data. For reads, $34.97$\% are between 64 and 512 bytes, with another $28.86$\% at 64 byte request sizes. There are a negligible percentage of read requests larger than 512.
This read data is similar to what was observed by Leung et al. Writes, on the other hand, are very different. Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size. In our data, however, we see that only $11.16$\% of writes are less than 4K, $52.41$\% are 64K requests, and only $43.63$\% of requests are less than 64K writes. This read data differs from the size of reads observed by Leung et al. by a factor of 4 smaller.
%This read data is similar to what was observed by Leung et al, however at an order of magnitude smaller.
Writes observed also differ from previous inspection of the protocol's usage. % are very different.
Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size. In our data, however, we see that only $11.16$\% of writes are less than 4 bytes, $52.41$\% are 64 byte requests, and only $43.63$\% of requests are less than 64 byte writes.
In the ten years since the last study, it is clear that writes have become significantly larger. This may be explained by the fact that large files, and multiple files, are being written as standardized blocks more fitting to the larger data-sets and disk space available. This could be as an effort to improve the fidelity of data across the network, allow for better realtime data consistency between client and backup locations, or could just be due to a large number of scripts being run that create and update a series of relatively smaller documents. In the ten years since the last study, it is clear that writes have become significantly larger. This may be explained by the fact that large files, and multiple files, are being written as standardized blocks more fitting to the larger data-sets and disk space available. This could be as an effort to improve the fidelity of data across the network, allow for better realtime data consistency between client and backup locations, or could just be due to a large number of scripts being run that create and update a series of relatively smaller documents.
%\textbf{Note: It seems like a change in the order of magnitude that is being passed per packet. What would this indicate?}\textcolor{red}{Answer the question. Shorter reads/writes = better?} %\textbf{Note: It seems like a change in the order of magnitude that is being passed per packet. What would this indicate?}\textcolor{red}{Answer the question. Shorter reads/writes = better?}
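
The percentages quoted in this hunk (share of transfers below a threshold, at an exact size, or within a range) are simple counts over the observed request sizes. The following Python sketch shows one way such figures could be computed; the helper names and the sample list are illustrative assumptions, not the analysis code or data behind the paper.

```python
def percent_below(sizes, limit):
    """Percentage of transfers strictly smaller than `limit` bytes."""
    return 100.0 * sum(1 for s in sizes if s < limit) / len(sizes)


def percent_equal(sizes, value):
    """Percentage of transfers of exactly `value` bytes."""
    return 100.0 * sum(1 for s in sizes if s == value) / len(sizes)


def percent_between(sizes, low, high):
    """Percentage of transfers with low < size <= high."""
    return 100.0 * sum(1 for s in sizes if low < s <= high) / len(sizes)


# Made-up sample of request sizes in bytes, for demonstration only.
sample = [16, 64, 64, 128, 512, 1024, 32, 64]
```

Stating whether each boundary is inclusive or exclusive, as the docstrings do here, keeps the buckets from overlapping when the results are tabulated.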


@@ -794,7 +797,7 @@ However, one should notice that the response time on read operations grows at a
%\end{figure} %\end{figure}


\subsection{File Extensions} \subsection{File Extensions}
Tables~\ref{tab:top10SMB2FileExts} and~\ref{tab:commonSMB2FileExts} show a summary of the various file extensions that were seen within the three-week capture period. The easier to understand is Table~\ref{tab:commonSMB2FileExts}, which illustrates the number of common file extensions (e.g. doc, ppt, xls, pdf) that were part of the data. Tables~\ref{tab:top10SMB2FileExts} and~\ref{tab:commonSMB2FileExts} show a summary of the various file extensions that were seen within the SMB2 traffic during the three-week capture period; following the \textit{smb2.filename} field. The easier to understand is Table~\ref{tab:commonSMB2FileExts}, which illustrates the number of common file extensions (e.g. doc, ppt, xls, pdf) that were part of the data.
%The greatest point of note is that the highest percentage is ``.xml'' with $0.54$\%, which is found to be surprising result. %The greatest point of note is that the highest percentage is ``.xml'' with $0.54$\%, which is found to be surprising result.
Originally we expected that these common file extensions would be a much larger total of traffic. However, as seen in Table~\ref{tab:commonSMB2FileExts}, these common file extensions were less than $2$\% of total files seen. The top ten extensions that we saw (Table~\ref{tab:top10SMB2FileExts}) comprised approximately $84$\% of the total seen. Originally we expected that these common file extensions would be a much larger total of traffic. However, as seen in Table~\ref{tab:commonSMB2FileExts}, these common file extensions were less than $2$\% of total files seen. The top ten extensions that we saw (Table~\ref{tab:top10SMB2FileExts}) comprised approximately $84$\% of the total seen.
Furthermore, the majority of extensions are not readily identified. Furthermore, the majority of extensions are not readily identified.
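
As a rough illustration of how extension shares such as those in the two tables might be tallied from file names (for example, values of the \textit{smb2.filename} field mentioned in this hunk), consider the Python sketch below. The function names are hypothetical, and the use of `ntpath` assumes Windows-style backslash-separated paths.

```python
from collections import Counter
import ntpath


def extension_of(filename: str) -> str:
    """Return the lower-cased extension without the dot, or '' if there is none."""
    base = ntpath.basename(filename)  # handles backslash-separated paths
    _, ext = ntpath.splitext(base)
    return ext.lower().lstrip(".")


def extension_shares(filenames, top_n=10):
    """Return (extension, percentage of all files) for the most common extensions."""
    counts = Counter(extension_of(name) for name in filenames)
    total = sum(counts.values())
    return [(ext, 100.0 * n / total) for ext, n in counts.most_common(top_n)]
```

Files without an extension are grouped under the empty string, which makes the share of unidentifiable names visible instead of silently dropping them.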
@@ -866,7 +869,7 @@ Table~\ref{tbl:curveFitting} shows best-fit parametrized distributions for the m
% \item Read + Write command RT CDF, shown in Figure~\ref{fig:CDF-RT-RW}, has $R^2$ Value of $0.7837$. % \item Read + Write command RT CDF, shown in Figure~\ref{fig:CDF-RT-RW}, has $R^2$ Value of $0.7837$.
%\end{itemize} %\end{itemize}


\begin{table} \begin{table*}
\centering \centering
\begin{tabular}{|l|c|c|c||c|c|c|} \begin{tabular}{|l|c|c|c||c|c|c|}
\hline \hline
@@ -875,23 +878,23 @@ Model & \multicolumn{3}{|c|}{Gaussian}
CDF & \multicolumn{3}{|c|}{$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-\mu}{\sigma}}e^{\frac{-t^2}{2}}dt$} CDF & \multicolumn{3}{|c|}{$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-\mu}{\sigma}}e^{\frac{-t^2}{2}}dt$}
& \multicolumn{3}{|c|}{$1 - e^{-(x/\lambda)^k}$} \\ \hline \hline & \multicolumn{3}{|c|}{$1 - e^{-(x/\lambda)^k}$} \\ \hline \hline
I/O Operation & $\mu$ & \multicolumn{2}{|c|}{$\sigma$} & $k$ & \multicolumn{2}{|c|}{$\lambda$} \\ \hline I/O Operation & $\mu$ & \multicolumn{2}{|c|}{$\sigma$} & $k$ & \multicolumn{2}{|c|}{$\lambda$} \\ \hline
General IAT & 786.72 & \multicolumn{2}{|c|}{10329.6} & 0.9031 & \multicolumn{2}{|c|}{743.2075} \\ General IAT & 786.72$\pm$2.79 & \multicolumn{2}{|c|}{10329.6$\pm$2} & 0.9031$\pm$0.0002 & \multicolumn{2}{|c|}{743.2075$\pm$0.2341} \\
General RT & 3606.66 & \multicolumn{2}{|c|}{2.74931e+06} & 0.5652 & \multicolumn{2}{|c|}{980.9721} \\ General RT & 3606.66$\pm$742.44 & \multicolumn{2}{|c|}{2.74931e+06$\pm$530} & 0.5652$\pm$0.0001 & \multicolumn{2}{|c|}{980.9721$\pm$0.4975} \\
Read RT & 44718.5 & \multicolumn{2}{|c|}{1.72776e+07} & 0.0004 & \multicolumn{2}{|c|}{1.5517} \\ Read RT & 44718.5$\pm$11715 & \multicolumn{2}{|c|}{1.72776e+07$\pm$8300} & 0.0004$\pm$0.0 & \multicolumn{2}{|c|}{1.5517$\pm$0.0028} \\
Read IAT & 24146 & \multicolumn{2}{|c|}{1.189e+07} & 0.0005 & \multicolumn{2}{|c|}{3.8134} \\ Read IAT & 24146$\pm$8062 & \multicolumn{2}{|c|}{1.189e+07$\pm$5700} & 0.0005$\pm$0.0 & \multicolumn{2}{|c|}{3.8134$\pm$0.0057} \\
Write RT & 379.823 & \multicolumn{2}{|c|}{4021.72} & 0.8569 & \multicolumn{2}{|c|}{325.2856} \\ Write RT & 379.823$\pm$2.809 & \multicolumn{2}{|c|}{4021.72$\pm$1.99} & 0.8569$\pm$0.0004 & \multicolumn{2}{|c|}{325.2856$\pm$0.2804} \\
Write IAT & 25785.7 & \multicolumn{2}{|c|}{1.22491e+07} & 0.0004 & \multicolumn{2}{|c|}{3.1287} \\ Write IAT & 25785.7$\pm$8556.6 & \multicolumn{2}{|c|}{1.22491e+07$\pm$6000} & 0.0004$\pm$0.0 & \multicolumn{2}{|c|}{3.1287$\pm$0.0052} \\
Create RT & 502.084 & \multicolumn{2}{|c|}{21678.4} & 0.9840 & \multicolumn{2}{|c|}{496.9497} \\ Create RT & 502.084$\pm$5.756 & \multicolumn{2}{|c|}{21678.4$\pm$4.1} & 0.9840$\pm$0.0002 & \multicolumn{2}{|c|}{496.9497$\pm$0.1403} \\
Create IAT & 3694.82 & \multicolumn{2}{|c|}{4.65553e+06} & 0.0008 & \multicolumn{2}{|c|}{2.3504} \\ \hline Create IAT & 3694.82$\pm$1236.16 & \multicolumn{2}{|c|}{4.65553e+06$\pm$880} & 0.0008$\pm$0.0 & \multicolumn{2}{|c|}{2.3504$\pm$0.0009} \\ \hline
%R+W RT & \textcolor{red}{0.8045} & \multicolumn{2}{|c|}{\textcolor{red}{0.2122}} & \textcolor{red}{5.103} & \multicolumn{2}{|c|}{\textcolor{red}{0.3937}} \\ \hline %R+W RT & \textcolor{red}{0.8045} & \multicolumn{2}{|c|}{\textcolor{red}{0.2122}} & \textcolor{red}{5.103} & \multicolumn{2}{|c|}{\textcolor{red}{0.3937}} \\ \hline
%R+W Byte Transfer & \textcolor{red}{0.3744} & \multicolumn{2}{|c|}{\textcolor{red}{0.2983}} & \textcolor{red}{1.153} & \multicolumn{2}{|c|}{\textcolor{red}{0.3937}} \\ %R+W Byte Transfer & \textcolor{red}{0.3744} & \multicolumn{2}{|c|}{\textcolor{red}{0.2983}} & \textcolor{red}{1.153} & \multicolumn{2}{|c|}{\textcolor{red}{0.3937}} \\
Read Buff Transfer & 82.9179 & \multicolumn{2}{|c|}{1117.9} & 1.0548 & \multicolumn{2}{|c|}{85.2525} \\ Read Buff Transfer & 82.9179$\pm$0.7641 & \multicolumn{2}{|c|}{1117.9$\pm$0.54} & 1.0548$\pm$0.0003 & \multicolumn{2}{|c|}{85.2525$\pm$0.0575} \\
Write Buff Transfer & 46.2507 & \multicolumn{2}{|c|}{640.621} & 1.0325 & \multicolumn{2}{|c|}{46.8707} \\ \hline Write Buff Transfer & 46.2507$\pm$0.4475 & \multicolumn{2}{|c|}{640.621$\pm$0.316} & 1.0325$\pm$0.0004 & \multicolumn{2}{|c|}{46.8707$\pm$0.0328} \\ \hline
\end{tabular} \end{tabular}
\caption{\label{tbl:curveFitting}Comparison of %$R^2$ \caption{\label{tbl:curveFitting}Comparison of %$R^2$
$\mu$, $\sigma$, $k$, and $\lambda$ Values for Curve Fitting Equations on CDF Graphs} $\mu$, $\sigma$, $k$, and $\lambda$ Values for Curve Fitting Equations on CDF Graphs}
\vspace{-3em} \vspace{-3em}
\end{table} \end{table*}
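
To make the two fitted models in this table easier to check, here is a small Python sketch that evaluates a Gaussian CDF and a Weibull CDF for given parameters. The Weibull CDF is written in its standard form, 1 - exp(-(x/lambda)^k); the function names are illustrative, and the example parameters are the "Write RT" values copied from the table.

```python
import math


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF via the error function: Phi((x - mu) / sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def weibull_cdf(x: float, k: float, lam: float) -> float:
    """Standard Weibull CDF, 1 - exp(-(x / lam) ** k), defined as 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** k))


# "Write RT" row of the table: mu, sigma for the Gaussian; k, lambda for the Weibull.
WRITE_RT = {"mu": 379.823, "sigma": 4021.72, "k": 0.8569, "lam": 325.2856}
```

Two quick sanity checks follow from the definitions: the Gaussian CDF equals 0.5 at x = mu, and the Weibull CDF equals 1 - 1/e (about 0.632) at x = lambda regardless of k.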


%The graphs created by the dissection script are: %The graphs created by the dissection script are:
%\begin{itemize} %\begin{itemize}