diff --git a/images/smb_create_iats_cdf.png b/images/smb_create_iats_cdf.png
index 3b2954b..70a7702 100644
Binary files a/images/smb_create_iats_cdf.png and b/images/smb_create_iats_cdf.png differ
diff --git a/images/smb_create_iats_pdf.png b/images/smb_create_iats_pdf.png
index ab854f1..6385ad5 100644
Binary files a/images/smb_create_iats_pdf.png and b/images/smb_create_iats_pdf.png differ
diff --git a/images/smb_create_rts_cdf.png b/images/smb_create_rts_cdf.png
index 9720b6d..1bca71e 100644
Binary files a/images/smb_create_rts_cdf.png and b/images/smb_create_rts_cdf.png differ
diff --git a/images/smb_create_rts_pdf.png b/images/smb_create_rts_pdf.png
index d00602a..3124e9a 100644
Binary files a/images/smb_create_rts_pdf.png and b/images/smb_create_rts_pdf.png differ
diff --git a/images/smb_general_iats_cdf.png b/images/smb_general_iats_cdf.png
index 2663321..ae683e2 100644
Binary files a/images/smb_general_iats_cdf.png and b/images/smb_general_iats_cdf.png differ
diff --git a/images/smb_general_iats_pdf.png b/images/smb_general_iats_pdf.png
index 43ae6b2..e2c70c3 100644
Binary files a/images/smb_general_iats_pdf.png and b/images/smb_general_iats_pdf.png differ
diff --git a/images/smb_general_rts_cdf.png b/images/smb_general_rts_cdf.png
index 892d3c7..eafa498 100644
Binary files a/images/smb_general_rts_cdf.png and b/images/smb_general_rts_cdf.png differ
diff --git a/images/smb_general_rts_pdf.png b/images/smb_general_rts_pdf.png
index aed19b2..f7573a0 100644
Binary files a/images/smb_general_rts_pdf.png and b/images/smb_general_rts_pdf.png differ
diff --git a/images/smb_read_bytes_cdf.png b/images/smb_read_bytes_cdf.png
index cc02d79..2a8f4ea 100644
Binary files a/images/smb_read_bytes_cdf.png and b/images/smb_read_bytes_cdf.png differ
diff --git a/images/smb_read_bytes_pdf.png b/images/smb_read_bytes_pdf.png
index 4e45a2b..221039b 100644
Binary files a/images/smb_read_bytes_pdf.png and b/images/smb_read_bytes_pdf.png differ
diff --git a/images/smb_read_iats_cdf.png b/images/smb_read_iats_cdf.png
index 9b56e66..590f8cd 100644
Binary files a/images/smb_read_iats_cdf.png and b/images/smb_read_iats_cdf.png differ
diff --git a/images/smb_read_iats_pdf.png b/images/smb_read_iats_pdf.png
index 023c835..c435a47 100644
Binary files a/images/smb_read_iats_pdf.png and b/images/smb_read_iats_pdf.png differ
diff --git a/images/smb_read_rts_cdf.png b/images/smb_read_rts_cdf.png
index 8f98351..334a8bf 100644
Binary files a/images/smb_read_rts_cdf.png and b/images/smb_read_rts_cdf.png differ
diff --git a/images/smb_read_rts_pdf.png b/images/smb_read_rts_pdf.png
index ca6c943..519a294 100644
Binary files a/images/smb_read_rts_pdf.png and b/images/smb_read_rts_pdf.png differ
diff --git a/images/smb_write_bytes_cdf.png b/images/smb_write_bytes_cdf.png
index 01e4dba..a3ae6f6 100644
Binary files a/images/smb_write_bytes_cdf.png and b/images/smb_write_bytes_cdf.png differ
diff --git a/images/smb_write_bytes_pdf.png b/images/smb_write_bytes_pdf.png
index 0195ad5..12dd811 100644
Binary files a/images/smb_write_bytes_pdf.png and b/images/smb_write_bytes_pdf.png differ
diff --git a/images/smb_write_iats_cdf.png b/images/smb_write_iats_cdf.png
index e015314..38e2a2c 100644
Binary files a/images/smb_write_iats_cdf.png and b/images/smb_write_iats_cdf.png differ
diff --git a/images/smb_write_iats_pdf.png b/images/smb_write_iats_pdf.png
index bcb3d1a..56148d5 100644
Binary files a/images/smb_write_iats_pdf.png and b/images/smb_write_iats_pdf.png differ
diff --git a/images/smb_write_rts_cdf.png b/images/smb_write_rts_cdf.png
index 0c504c4..e229fdb 100644
Binary files a/images/smb_write_rts_cdf.png and b/images/smb_write_rts_cdf.png differ
diff --git a/images/smb_write_rts_pdf.png b/images/smb_write_rts_pdf.png
index 2d8608c..110acd4 100644
Binary files a/images/smb_write_rts_pdf.png and b/images/smb_write_rts_pdf.png differ
diff --git a/trackingPaper.tex b/trackingPaper.tex
index c151d7e..48b6ba5 100644
--- a/trackingPaper.tex
+++ b/trackingPaper.tex @@ -191,7 +191,7 @@ DataSeries was modified to filter specific SMB protocol fields along with the wr The DataSeries data format allowed us to create data analysis code that focuses on I/O events and ID tracking (TID/UID). The future vision for this information is to combine ID tracking with the OpLock information in order to track resource sharing of the different clients on the network. As well as using IP information to recreate communication in a larger network trace to establish a better benchmark. %Focus should be aboiut analysis and new traces -The contributions of this work are the new traces of SMB traffic over a larger university network as well as new analysis of this traffic. Our new examination of the captured data reveals that despite the streamlining of the CIFS/SMB protocol to be less "chatty", the majority of SMB communication is still metadata based I/O rather than actual data I/O. We found that read operations occur in greater numbers and cause a larger overall number of bytes to pass over the network. However, the average number of bytes transferred for each write I/O is greater than that of the average read operation. We also find that the current standard for modeling network I/O holds for the majority of operations, while a more representative model needs to be developed for reads. +The contributions of this work are the new traces of SMB traffic over a larger university network as well as a new analysis of this traffic. Our examination of the captured data reveals that, despite the streamlining of the CIFS/SMB protocol to be less ``chatty'', the majority of SMB communication is still metadata-based I/O rather than actual data I/O. We found that read operations occur in greater numbers than writes and cause a larger overall number of bytes to pass over the network. Additionally, the average number of bytes transferred per write I/O is smaller than that of the average read operation.
\textcolor{red}{We also find that the current standard for modeling network I/O holds for the majority of operations, while a more representative model needs to be developed for reads.} \subsection{Related Work} In this section we discuss previous studies examining traces and testing that has advanced benchmark development. We summarize major works in trace study in Table~\ref{tbl:studySummary}. In addition we examine issues that occur with traces and the assumptions in their study. @@ -249,7 +249,7 @@ As seen in previous trace work~\cite{leung2008measurement,roselli2000comparison, \subsection{Server Message Block} The Server Message Block (SMB) is an application-layer network protocol mainly used for providing shared access to files, shared access to printers, shared access to serial ports, miscellaneous communications between nodes on the network, as well as providing an authenticated inter-process communication mechanism. %The majority of usage for the SMB protocol involves Microsfot Windows. Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources -The SMB 1.0 protocol~\cite{SMB1Spec} has been found to have high/significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and disregard of network latency between hosts. Solutions to this problem were included in the updated SMB 2.0 protocol which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen. Additional changes, most significantly being increased security, were implemented in SMB 3.0 protocol (previously named SMB 2.2)~\cite{SMB2Spec}. % XXX citations for SMB specs for different versions? +The SMB 1.0 protocol~\cite{SMB1Spec} has been found to have a significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and a disregard for network latency between hosts.
Solutions to this problem were included in the updated SMB 2.0 protocol, which reduces ``chattiness'' by cutting the number of commands and sub-commands from over a hundred to nineteen~\cite{SMB2Spec}. Additional changes, most significantly increased security, were implemented in the SMB 3.0 protocol (previously named SMB 2.2). % XXX citations for SMB specs for different versions? %\textcolor{red}{\textbf{Add information about SMB 2.X/3?}} The rough order of communication for SMB session file interaction contains about five steps. First is a negotiation where a Microsoft SMB Protocol dialect is determined. Next a session is established to determine the share-level security. After this the Tree ID (TID) is determined for the share to be connected to as well as a file ID (FID) for a file requested by the client. From this establishment, I/O operations are performed using the FID given in the previous step. @@ -257,7 +257,7 @@ The rough order of communication for SMB session file interaction contains about % Information relating to the capturing of SMB information The only data that needs to be tracked from the SMB traces are the UID (User ID) and TID for each session. The SMB commands also include a MID (Multiplex ID) value that is used for tracking individual packets in each established session, and a PID (Process ID) that tracks the process running the command or series of commands on a host. For the purposes of our tracing, we do not track the MID or PID information. - +% Some nuances of SMB protocol I/O to note are that SMB/SMB2 write requests are the actions that push bytes over the wire while for SMB/SMB2 read operations it is the response packets. %\begin{itemize} % \item SMB/SMB2 write request is the command that pushes bytes over the wire. \textbf{Note:} the response packet only confirms their arrival and use (e.g. writing). @@ -332,6 +332,8 @@ Sessions are any communication where a valid UID and TID is used.
%\subsection{Python Dissection} %The final step of our SMB/SMB2 traffic analysis system is the dissection of the \texttt{AnalysisModule} output using the pandas data analysis library~\cite{pandasPythonWebsite}. The pandas library is a python implementation similar to R. In this section of the analysis structure, the generated text file is tokenized and placed into specific DataFrames representing the data seen for each 15 minute period. The python code is used for the analysis and further dissection of the data. This is where the cumulative distribution frequency and graphing of collected data is performed. Basic analysis and aggregation is also performed in this part of the code. This analysis includes the summation of individual session I/O (e.g. reads, writes, creates) as well as the collection of inter arrival time data and response time data. +\textcolor{red}{Continue edits from here} + \section{Data Analysis} \label{sec:data-analysis}
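The SMB hunk above notes that for writes the data bytes travel in the request packets, whereas for reads they travel in the response packets, and this change updates the bytes, inter-arrival-time (IAT), and response-time CDF/PDF figures. The following Python snippet is a minimal sketch of that byte accounting and of an empirical IAT CDF, not the paper's DataSeries `AnalysisModule` or pandas code. The `SmbPacket` record and its fields, the helpers `data_bytes`, `inter_arrival_times`, and `empirical_cdf`, and the choice to measure IATs between consecutive requests of one command type are all hypothetical assumptions made for illustration.

```python
"""Minimal sketch: SMB data-byte accounting and inter-arrival-time CDFs.

The packet record below is a hypothetical simplification, not the paper's
DataSeries or AnalysisModule output format.
"""
from dataclasses import dataclass


@dataclass
class SmbPacket:
    ts: float         # capture timestamp in seconds (hypothetical field)
    command: str      # "read" or "write" (hypothetical simplification)
    is_request: bool  # True for a client request, False for a server response
    payload: int      # data bytes carried by this packet


def data_bytes(packets):
    """Total data bytes per command, counting only the packets that carry
    file data: write requests and read responses."""
    totals = {"read": 0, "write": 0}
    for p in packets:
        if p.command == "write" and p.is_request:
            totals["write"] += p.payload
        elif p.command == "read" and not p.is_request:
            totals["read"] += p.payload
    return totals


def inter_arrival_times(packets, command):
    """Gaps in seconds between consecutive requests of one command type
    (an assumed definition of inter-arrival time for this sketch)."""
    times = sorted(p.ts for p in packets if p.command == command and p.is_request)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def empirical_cdf(values):
    """Empirical CDF as step points: the i-th smallest value (1-based)
    is paired with the cumulative fraction i/n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


if __name__ == "__main__":
    # Toy trace: two read exchanges and one write exchange.
    trace = [
        SmbPacket(0.00, "read", True, 0),
        SmbPacket(0.01, "read", False, 4096),
        SmbPacket(0.50, "write", True, 1024),
        SmbPacket(0.51, "write", False, 0),
        SmbPacket(0.80, "read", True, 0),
        SmbPacket(0.82, "read", False, 8192),
    ]
    print(data_bytes(trace))  # {'read': 12288, 'write': 1024}
    print(empirical_cdf(inter_arrival_times(trace, "read")))
```

Plain lists keep the sketch dependency-free; according to the commented-out Python Dissection paragraph, the paper's own pipeline performs this kind of aggregation on pandas DataFrames for each 15-minute period.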