Commit 689a230 --- Text coloring and edits
Duncan committed Apr 20, 2020 (1 parent: adee6b9)
Showing 1 changed file with 31 additions and 29 deletions: trackingPaper.tex
@@ -128,7 +128,7 @@
\begin{abstract}
Storage system traces are important for examining real-world applications, studying potential bottlenecks, as well as driving benchmarks in the evaluation of new system designs.
While file system traces have been well-studied in earlier work, it has been some time since the last examination of the SMB network file system.
The purpose of this work is to continue previous SMB studies to better understand the use of the protocol in a real-world production system in use at \textcolor{red}{the University of Connecticut}.
The main contribution of our work is the exploration of I/O behavior in modern file system workloads as well as new examinations of the inter-arrival times and run times for I/O events.
We further investigate if the recent standard models for traffic remain accurate.
Our findings reveal interesting data relating to the number of read and write events. We notice that read and write events are significantly less frequent than creates and \textcolor{blue}{that the average number of bytes exchanged per I/O has decreased.}
@@ -162,7 +162,7 @@
Since an SMB-based trace study has not been undertaken
recently, we took a look at its current implementation and use in a large university network.
%Due to the sensitivity of the captured information, we ensure that all sensitive information is hashed and that the original network captures are not saved.

Our study is based on network packet traces collected on \textcolor{red}{the University of Connecticut}'s centralized storage facility over a period of three weeks in May 2019. This trace-driven analysis can help in the design of future storage products as well as providing data for future performance benchmarks.
%Benchmarks are important for the purpose of developing technologies as well as taking accurate metrics. The reasoning behind this tracing capture work is to eventually better develop accurate benchmarks for network protocol evaluation.
Benchmarks allow for the stress testing of various aspects of a system (e.g., the network or a single system). Aggregate analysis of data collected from traces can lead to the development of synthetic benchmarks. Traces can also expose system patterns that can be reflected in synthetic benchmarks. Finally, the traces themselves can drive system simulations that can be used to evaluate prospective storage architectures.

@@ -182,7 +182,7 @@
% \end{enumerate}
%\end{itemize}

We created a new tracing system to collect data from the \textcolor{red}{UConn} storage network system. The tracing system was built around the high-speed PF\_RING packet capture system and required the use of proper hardware and software to handle incoming data\textcolor{blue}{; however, interaction with third-party code later in the pipeline required a re-design of the information processing}. We also created a new trace capture format derived from the DataSeries structured data format developed by HP~\cite{DataSeries}.
% PF\_RING section
%The addition of PF\_RING lends to the tracing system by minimizing the copying of packets which, in turn, allows for more accurate timestamping of incoming traffic packets being captured ~\cite{Orosz2013,skopko2012loss,pfringWebsite,PFRINGMan}.
PF\_RING acts as a kernel module that aids in minimizing packet loss/timestamping issues by not passing packets through the kernel data structures~\cite{PFRINGMan}.
@@ -302,8 +302,8 @@
\label{fig:captureTopology}
\end{figure*}

\subsection{\textcolor{red}{UITS} System Overview}
We collected traces from \textcolor{red}{the University of Connecticut University Information Technology Services (UITS)} centralized storage server. The \textcolor{red}{UITS system} consists of five Microsoft file server cluster nodes. These blade servers host SMB file shares for various departments at \textcolor{red}{UConn} as well as personal drive share space for faculty, staff, and students, along with at least one small group of users. Each server is capable of handling 1~Gb/s of traffic in each direction (i.e., outbound and inbound). Altogether, the five-blade server system can in theory handle 5~Gb/s of data traffic in each direction.
%Some of these blade servers have local storage but the majority do not have any.
The blade servers serve as SMB heads, but the actual storage is served by SAN storage nodes that sit behind them. This system does not currently implement load balancing. Instead, the servers are set up to spread the traffic load with a static distribution among four of the active cluster nodes while the fifth node is passive and purposed to take over in the case that any of the other nodes go down (e.g. become inoperable or crash).
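The static distribution described above (no dynamic load balancing; a passive fifth node held as failover) can be sketched as follows. The node names and the byte-sum mapping are illustrative assumptions, not the actual UITS configuration:

```python
def assign_node(share_name, active_nodes, passive_node, down=frozenset()):
    """Sketch of a static (non-load-balanced) traffic distribution.

    Each share maps deterministically to one of the active cluster
    nodes; the passive node takes over only when an assigned node is
    down. Names and mapping are hypothetical, for illustration only.
    """
    # Deterministic, stable mapping (unlike hash(), which is salted per run).
    idx = sum(share_name.encode()) % len(active_nodes)
    node = active_nodes[idx]
    return passive_node if node in down else node
```

For example, with four active nodes and `"node5"` passive, every lookup of the same share name lands on the same active node until that node is marked down, at which point traffic shifts to the passive node.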

@@ -338,6 +338,8 @@
This step also creates an easily digestible output that can be used to re-create all tuple information for SMB/SMB2 sessions that are witnessed over the entire time period.
A session is any communication in which a valid UID (user ID) and TID (tree ID) pair is used.
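The session bookkeeping described above can be sketched as a simple grouping step. The `uid`, `tid`, and `command` fields are hypothetical stand-ins for the dissector's real output:

```python
from collections import defaultdict

def group_sessions(messages):
    """Group dissected SMB/SMB2 messages into sessions keyed by (UID, TID).

    `messages` is an iterable of dicts with hypothetical `uid`, `tid`,
    and `command` fields. Messages lacking a valid UID/TID pair are
    treated as non-session traffic.
    """
    sessions = defaultdict(list)
    non_session = []
    for msg in messages:
        uid, tid = msg.get("uid"), msg.get("tid")
        if uid and tid:  # valid UID and TID => belongs to a session
            sessions[(uid, tid)].append(msg)
        else:
            non_session.append(msg)
    return sessions, non_session
```

This mirrors the tuple re-creation step only in spirit; the actual analysis code operates on the DataSeries-derived capture format.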

\textcolor{red}{Add information about whether the code will be publicly shared?}

%\subsection{Python Dissection}
%The final step of our SMB/SMB2 traffic analysis system is the dissection of the \texttt{AnalysisModule} output using the pandas data analysis library~\cite{pandasPythonWebsite}. The pandas library is a python implementation similar to R. In this section of the analysis structure, the generated text file is tokenized and placed into specific DataFrames representing the data seen for each 15 minute period. The python code is used for the analysis and further dissection of the data. This is where the cumulative distribution frequency and graphing of collected data is performed. Basic analysis and aggregation is also performed in this part of the code. This analysis includes the summation of individual session I/O (e.g. reads, writes, creates) as well as the collection of inter arrival time data and response time data.

@@ -354,15 +356,14 @@
%\textcolor{red}{Maximum Sessions in 15-min Window} & 35 \\ %\hline
%Maximum Non-Session in 15-min Window & 2 \\ \hline
Total Days & 21 \\ %\hline
Total Sessions & 2,413,589 \\ %\hline
%Total Non-Sessions & 279006484 \\ \hline
Number of SMB Operations & 281,419,686 (100\%) \\ %\hline
Number of General SMB Operations & 210,705,867 (74.87\%) \\ %\hline
Number of Creates & 54,486,043 (19.36\%) \\ %\hline
Number of Read I/Os & 8,355,557 (2.97\%) \\ %\hline
Number of Write I/Os & 7,872,219 (2.80\%) \\ %\hline
R:W I/O Ratio & 1.06 \\ \hline
Total Data Read (GB) & 0.97 \\ %\hline
Total Data Written (GB) & 0.6 \\ %\hline
Average Read Size (B) & 144 \\ %\hline
@@ -394,14 +395,14 @@
\begin{tabular}{|l|c|c|c|}
\hline
I/O Operation & SMB & SMB2 & Both \\ \hline
General Operations & 2,418,980 & 208,286,887 & 210,705,867 \\
General \% & 99.91\% & 74.66\% & 74.87\% \\ %\hline
Create Operations & 0 & 54,486,043 & 54,486,043 \\
Create \% & 0.00\% & 19.53\% & 19.36\% \\
Read Operations & 1,931 & 8,353,626 & 8,355,557 \\
Read \% & 0.08\% & 2.99\% & 2.97\% \\
Write Operations & 303 & 7,871,916 & 7,872,219 \\
Write \% & 0.01\% & 2.82\% & 2.80\% \\ \hline
Combined Protocol Operations & 2,421,214 & 278,998,472 & 281,419,686 \\
Combined Protocols \% & 0.86\% & 99.14\% & 100\% \\ \hline
%\end{tabular}
@@ -412,22 +413,23 @@
%\begin{tabular}{|l|c|c|}
\hline \hline
SMB2 General Operation & \multicolumn{2}{|c|}{Occurrences} & Percentage of Total \\ \hline
Close & \multicolumn{2}{|c|}{80,114,256} & 28.71\% \\
Tree Connect & \multicolumn{2}{|c|}{48,414,491} & 17.35\% \\
Query Info & \multicolumn{2}{|c|}{27,155,528} & 9.73\% \\
Negotiate & \multicolumn{2}{|c|}{25,276,447} & 9.06\% \\
Tree Disconnect & \multicolumn{2}{|c|}{9,773,361} & 3.50\% \\
IOCtl & \multicolumn{2}{|c|}{4,475,494} & 1.60\% \\
Set Info & \multicolumn{2}{|c|}{4,447,218} & 1.59\% \\
Query Directory & \multicolumn{2}{|c|}{3,443,491} & 1.23\% \\
Session Setup & \multicolumn{2}{|c|}{2,041,208} & 0.73\% \\
Lock & \multicolumn{2}{|c|}{1,389,250} & 0.50\% \\
Flush & \multicolumn{2}{|c|}{972,790} & 0.35\% \\
Change Notify & \multicolumn{2}{|c|}{612,850} & 0.22\% \\
Logoff & \multicolumn{2}{|c|}{143,592} & 0.05\% \\
Oplock Break & \multicolumn{2}{|c|}{22,397} & 0.008\% \\
Echo & \multicolumn{2}{|c|}{4,715} & 0.002\% \\
Cancel & \multicolumn{2}{|c|}{0} & 0.00\% \\
\hline
\end{tabular}
\caption{\label{tbl:SMBCommands}Percentage of SMB and SMB2 Protocol Commands from April 30th, 2019 to May 20th, 2019. Breakdown of General Operations for SMB2}
\vspace{-2em}

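The totals and percentages in the summary table can be sanity-checked with a quick standalone sketch (counts copied from the table; this is not part of the paper's analysis code):

```python
# Operation counts copied from the summary table above.
ops = {
    "General": 210_705_867,
    "Create":  54_486_043,
    "Read":     8_355_557,
    "Write":    7_872_219,
}
total = sum(ops.values())  # should equal 281,419,686 SMB/SMB2 operations
share = {name: round(100 * count / total, 2) for name, count in ops.items()}
rw_ratio = round(ops["Read"] / ops["Write"], 2)  # R:W I/O ratio
```

The computed shares (74.87%, 19.36%, 2.97%, 2.80%) and the R:W ratio of 1.06 match the reported figures.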