
Text coloring and edits

Duncan committed Apr 20, 2020
1 parent adee6b9 commit 689a230e03ad6a4bef4056a99d2973975179d1ec
Showing with 31 additions and 29 deletions.
  1. +31 −29 trackingPaper.tex
@@ -128,7 +128,7 @@
\begin{abstract}
Storage system traces are important for examining real-world applications, studying potential bottlenecks, as well as driving benchmarks in the evaluation of new system designs.
While file system traces have been well-studied in earlier work, it has been some time since the last examination of the SMB network file system.
The purpose of this work is to continue previous SMB studies to better understand the use of the protocol in a real-world production system in use at the University of Connecticut.
The purpose of this work is to continue previous SMB studies to better understand the use of the protocol in a real-world production system in use at \textcolor{red}{the University of Connecticut}.
The main contribution of our work is the exploration of I/O behavior in modern file system workloads as well as new examinations of the inter-arrival times and run times for I/O events.
We further investigate if the recent standard models for traffic remain accurate.
Our findings reveal interesting data relating to the number of read and write events. We notice that the number of read and write events is significantly lower than the number of creates and \textcolor{blue}{that the average number of bytes exchanged per I/O has decreased.}
@@ -162,7 +162,7 @@ \section{Introduction}
recently, we took a look at its current implementation and use in a large university network.
%Due to the sensitivity of the captured information, we ensure that all sensitive information is hashed and that the original network captures are not saved.

Our study is based on network packet traces collected on the University of Connecticut's centralized storage facility over a period of three weeks in May 2019. This trace-driven analysis can help in the design of future storage products as well as providing data for future performance benchmarks.
Our study is based on network packet traces collected on \textcolor{red}{the University of Connecticut}'s centralized storage facility over a period of three weeks in May 2019. This trace-driven analysis can help in the design of future storage products as well as providing data for future performance benchmarks.
%Benchmarks are important for the purpose of developing technologies as well as taking accurate metrics. The reasoning behind this tracing capture work is to eventually better develop accurate benchmarks for network protocol evaluation.
Benchmarks allow for the stress testing of various aspects of a system (e.g. network, single system). Aggregate data analysis collected from traces can lead to the development of synthetic benchmarks. Traces can also expose system patterns that can be reflected in synthetic benchmarks. Finally, the traces themselves can drive system simulations that can be used to evaluate prospective storage architectures.

@@ -182,7 +182,7 @@ \section{Introduction}
% \end{enumerate}
%\end{itemize}

We created a new tracing system to collect data from the UConn storage network system. The tracing system was built around the high-speed PF\_RING packet capture system and required the use of proper hardware and software to handle incoming data. We also created a new trace capture format derived on the DataSeries structured data format developed by HP~\cite{DataSeries}.
We created a new tracing system to collect data from the \textcolor{red}{UConn} storage network system. The tracing system was built around the high-speed PF\_RING packet capture system and required the use of proper hardware and software to handle incoming data\textcolor{blue}{; however, interaction with later third-party code required a redesign of how the information is processed}. We also created a new trace capture format derived from the DataSeries structured data format developed by HP~\cite{DataSeries}.
% PF\_RING section
%The addition of PF\_RING lends to the tracing system by minimizing the copying of packets which, in turn, allows for more accurate timestamping of incoming traffic packets being captured ~\cite{Orosz2013,skopko2012loss,pfringWebsite,PFRINGMan}.
PF\_RING acts as a kernel module that aids in minimizing packet loss/timestamping issues by not passing packets through the kernel data structures~\cite{PFRINGMan}.
@@ -302,8 +302,8 @@ \section{Packet Capturing System}
\label{fig:captureTopology}
\end{figure*}

\subsection{UITS System Overview}
We collected traces from the University of Connecticut University Information Technology Services (UITS) centralized storage server. The UITS system consists of five Microsoft file server cluster nodes. These blade servers are used to host SMB file shares for various departments at UConn as well as personal drive share space for faculty, staff and students, along with at least one small group of users. Each server is capable of handling 1~Gb/s of traffic in each direction (e.g. outbound and inbound traffic). Altogether, the five-blade server system can in theory handle 5~Gb/s of data traffic in each direction.
\subsection{\textcolor{red}{UITS} System Overview}
We collected traces from \textcolor{red}{the University of Connecticut University Information Technology Services (UITS)} centralized storage server. The \textcolor{red}{UITS system} consists of five Microsoft file server cluster nodes. These blade servers are used to host SMB file shares for various departments at \textcolor{red}{UConn} as well as personal drive share space for faculty, staff and students, along with at least one small group of users. Each server is capable of handling 1~Gb/s of traffic in each direction (e.g. outbound and inbound traffic). Altogether, the five-blade server system can in theory handle 5~Gb/s of data traffic in each direction.
%Some of these blade servers have local storage but the majority do not have any.
The blade servers serve as SMB heads, but the actual storage is served by SAN storage nodes that sit behind them. This system does not currently implement load balancing. Instead, the servers are set up to spread the traffic load with a static distribution among four of the active cluster nodes while the fifth node is passive and purposed to take over in the case that any of the other nodes go down (e.g. become inoperable or crash).

@@ -338,6 +338,8 @@ \subsection{DataSeries Analysis}
This step also creates an easily digestible output that can be used to re-create all tuple information for SMB/SMB2 sessions that are witnessed over the entire time period.
Sessions are any communication where a valid UID and TID is used.

\textcolor{red}{Add information about whether the code will be publicly shared?}

%\subsection{Python Dissection}
%The final step of our SMB/SMB2 traffic analysis system is the dissection of the \texttt{AnalysisModule} output using the pandas data analysis library~\cite{pandasPythonWebsite}. The pandas library is a python implementation similar to R. In this section of the analysis structure, the generated text file is tokenized and placed into specific DataFrames representing the data seen for each 15 minute period. The python code is used for the analysis and further dissection of the data. This is where the cumulative distribution frequency and graphing of collected data is performed. Basic analysis and aggregation is also performed in this part of the code. This analysis includes the summation of individual session I/O (e.g. reads, writes, creates) as well as the collection of inter arrival time data and response time data.

@@ -354,15 +356,14 @@ \section{Data Analysis}
%\textcolor{red}{Maximum Sessions in 15-min Window} & 35 \\ %\hline
%Maximum Non-Session in 15-min Window & 2 \\ \hline
Total Days & 21 \\ %\hline
Total Sessions & 2413589 \\ %\hline
Total Sessions & 2,413,589 \\ %\hline
%Total Non-Sessions & 279006484 \\ \hline
Number of SMB Operations & 281419686 \\ %\hline
Number of Read I/Os & 8355557
\\ %\hline
Number of Write I/Os & 7872219 \\ %\hline
R:W I/O Ratio & 1.06 \\ %\hline
Number of Creates & 54486043 \\ %\hline
Number of General SMB Operations & 210705867 \\ \hline
Number of SMB Operations & 281,419,686 (100\%)\\ %\hline
Number of General SMB Operations & 210,705,867 (74.87\%) \\ %\hline
Number of Creates & 54,486,043 (19.36\%) \\ %\hline
Number of Read I/Os & 8,355,557 (2.97\%) \\ %\hline
Number of Write I/Os & 7,872,219 (2.80\%) \\ %\hline
R:W I/O Ratio & 1.06 \\ \hline
Total Data Read (GB) & 0.97 \\ %\hline
Total Data Written (GB) & 0.6 \\ %\hline
Average Read Size (B) & 144 \\ %\hline
@@ -394,14 +395,14 @@ \section{Data Analysis}
\begin{tabular}{|l|c|c|c|}
\hline
I/O Operation & SMB & SMB2 & Both \\ \hline
General Operations & 2418980 & 208286887 & 210705867 \\
General \% & 99.91\% & 74.66\% & 74.87\% \\ %\hline
Create Operations & 0 & 54486043 & 54486043 \\
Create \% & 0.00\% & 19.53\% & 19.36\% \\
Read Operations & 1931 & 8353626 & 8355557 \\
Read \% & 0.08\% & 2.99\%& 2.97\%\\
Write Operations & 303 & 7871916 & 7872219 \\
Write \% & 0.01\% & 2.82\% & 2.80\% \\
Create Operations & 0 & 54486043 & 54486043 \\
Create \% & 0.00\% & 19.53\% & 19.36\% \\
General Operations & 2418980 & 208286887 & 210705867 \\
General \% & 99.91\% & 74.66\% & 74.87\% \\ \hline
Write \% & 0.01\% & 2.82\% & 2.80\% \\ \hline
Combined Protocol Operations & 2421214 & 278998472 & 281419686 \\
Combined Protocols \% & 0.86\% & 99.14\% & 100\% \\ \hline
%\end{tabular}
@@ -412,22 +413,23 @@ \section{Data Analysis}
%\begin{tabular}{|l|c|c|}
\hline \hline
SMB2 General Operation & \multicolumn{2}{|c|}{Occurrences} & Percentage of Total \\ \hline
Negotiate & \multicolumn{2}{|c|}{25276447} & 9.06\% \\
Session Setup & \multicolumn{2}{|c|}{2041208} & 0.73\%\\
Logoff & \multicolumn{2}{|c|}{143592} & 0.05\% \\
Close & \multicolumn{2}{|c|}{80114256} & 28.71\% \\
Tree Connect & \multicolumn{2}{|c|}{48414491} & 17.35\% \\
Query Info & \multicolumn{2}{|c|}{27155528} & 9.73\% \\
Negotiate & \multicolumn{2}{|c|}{25276447} & 9.06\% \\
Tree Disconnect & \multicolumn{2}{|c|}{9773361} & 3.5\% \\
Close & \multicolumn{2}{|c|}{80114256} & 28.71\% \\
Flush & \multicolumn{2}{|c|}{972790} & 0.35\% \\
Lock & \multicolumn{2}{|c|}{1389250} & 0.5\% \\
IOCtl & \multicolumn{2}{|c|}{4475494} & 1.6\% \\
Cancel & \multicolumn{2}{|c|}{0} & 0.00\% \\
Echo & \multicolumn{2}{|c|}{4715} & 0.002\% \\
Set Info & \multicolumn{2}{|c|}{4447218} & 1.59\% \\
Query Directory & \multicolumn{2}{|c|}{3443491} & 1.23\% \\
Session Setup & \multicolumn{2}{|c|}{2041208} & 0.73\%\\
Lock & \multicolumn{2}{|c|}{1389250} & 0.5\% \\
Flush & \multicolumn{2}{|c|}{972790} & 0.35\% \\
Change Notify & \multicolumn{2}{|c|}{612850} & 0.22\% \\
Query Info & \multicolumn{2}{|c|}{27155528} & 9.73\% \\
Set Info & \multicolumn{2}{|c|}{4447218} & 1.59\% \\
Oplock Break & \multicolumn{2}{|c|}{22397} & 0.008\% \\ \hline
Logoff & \multicolumn{2}{|c|}{143592} & 0.05\% \\
Oplock Break & \multicolumn{2}{|c|}{22397} & 0.008\% \\
Echo & \multicolumn{2}{|c|}{4715} & 0.002\% \\
Cancel & \multicolumn{2}{|c|}{0} & 0.00\% \\
\hline
\end{tabular}
\caption{\label{tbl:SMBCommands}Percentage of SMB and SMB2 Protocol Commands from April 30th, 2019 to May 20th, 2019. Breakdown of General Operations for SMB2}
\vspace{-2em}
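
The percentages this commit adds to the summary table can be cross-checked from the raw counts. The following is a minimal sketch, assuming each share is the operation count divided by the total number of SMB operations and rounded to two decimal places; the names `OPERATION_COUNTS`, `percentage_shares`, and `read_write_ratio` are hypothetical and are not part of the paper's analysis code.

```python
# Counts copied from the summary table in the diff above.
OPERATION_COUNTS = {
    "general": 210_705_867,
    "create": 54_486_043,
    "read": 8_355_557,
    "write": 7_872_219,
}


def percentage_shares(counts):
    """Return each count as a percentage of the summed total (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def read_write_ratio(counts):
    """Return the read-to-write I/O ratio rounded to 2 decimals."""
    return round(counts["read"] / counts["write"], 2)


TOTAL_OPERATIONS = sum(OPERATION_COUNTS.values())
SHARES = percentage_shares(OPERATION_COUNTS)
RW_RATIO = read_write_ratio(OPERATION_COUNTS)

if __name__ == "__main__":
    print("total operations:", TOTAL_OPERATIONS)
    for name, share in SHARES.items():
        print(f"{name}: {share:.2f}%")
    print("R:W ratio:", RW_RATIO)
```

Under these assumptions the four categories sum to the reported total of 281,419,686 operations, and the computed shares and ratio agree with the values in the table.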
