
Push of merge; question on x-axis

Duncan
Duncan committed Apr 23, 2020
1 parent 9e7f302 commit 428ce4b9649360fe55330e69dfc82acaef298157
Showing with 17 additions and 38 deletions.
  1. +17 −38 trackingPaper.tex
@@ -261,7 +261,7 @@ \section{Background}
The SMB 1.0 protocol~\cite{SMB1Spec} has been found to have a significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and disregard for network latency between hosts. Solutions to this problem were included in the updated SMB 2.0 protocol, which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen~\cite{SMB2Spec}. Additional changes, most significantly increased security, were implemented in the SMB 3.0 protocol (previously named SMB 2.2). % XXX citations for SMB specs for different versions?
%\textcolor{red}{\textbf{Add information about SMB 2.X/3?}}

The rough order of communication for SMB session file interaction contains five steps. First is a negotiation where a Microsoft SMB Protocol dialect is determined. Next, a session is established to determine the share-level security. After this, the Tree ID (TID) is determined for the share to be connected to as well as a file ID (FID) for a file requested by the client. From this establishment, I/O operations are performed using the FID given in the previous step. \textcolor{green}{The SMB packet header is shown in Figure~\ref{fig:smbPacket}.}
The rough order of communication for SMB session file interaction contains five steps. First is a negotiation where a Microsoft SMB Protocol dialect is determined. Next, a session is established to determine the share-level security. After this, the Tree ID (TID) is determined for the share to be connected to as well as a file ID (FID) for a file requested by the client. From this establishment, I/O operations are performed using the FID given in the previous step. %\textcolor{green}{The SMB packet header is shown in Figure~\ref{fig:smbPacket}.}

% Information relating to the capturing of SMB information
The only data that needs to be tracked from the SMB traces are the UID (User ID) and TID for each session. The SMB commands also include a MID (Multiplex ID) value that is used for tracking individual packets in each established session, and a PID (Process ID) that tracks the process running the command or series of commands on a host.
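The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used for the paper: the \texttt{SmbRecord} type, its field names, and the \texttt{response\_times} helper are hypothetical, and it assumes that a response carries the same MID as the request it answers, so that a (UID, TID) pair identifies the session and the MID pairs packets within it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SmbRecord:
    """Hypothetical, simplified view of one captured SMB packet."""
    timestamp: float   # capture time in seconds
    uid: int           # User ID
    tid: int           # Tree ID
    mid: int           # Multiplex ID
    pid: int           # Process ID (carried along, not used for pairing)
    is_response: bool


def response_times(records):
    """Group response times by session, where a session is a (UID, TID) pair.

    Assumption: a response reuses the MID of its request, so the key
    (session, MID) matches each response to its pending request.
    """
    pending = {}                      # (session, mid) -> request timestamp
    per_session = defaultdict(list)   # session -> list of response times
    for rec in sorted(records, key=lambda r: r.timestamp):
        session = (rec.uid, rec.tid)
        key = (session, rec.mid)
        if not rec.is_response:
            pending[key] = rec.timestamp
        elif key in pending:
            per_session[session].append(rec.timestamp - pending.pop(key))
    return dict(per_session)


# Made-up example trace: two requests answered in session (1, 7),
# one request in session (2, 7) that never receives a response.
trace = [
    SmbRecord(0.000, uid=1, tid=7, mid=10, pid=400, is_response=False),
    SmbRecord(0.002, uid=1, tid=7, mid=11, pid=400, is_response=False),
    SmbRecord(0.005, uid=1, tid=7, mid=10, pid=400, is_response=True),
    SmbRecord(0.006, uid=2, tid=7, mid=10, pid=512, is_response=False),
    SmbRecord(0.009, uid=1, tid=7, mid=11, pid=400, is_response=True),
]
result = response_times(trace)
```

Keying the pending table on the session as well as the MID keeps identical MID values from different sessions apart, as with MID 10 in the example trace.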
@@ -276,37 +276,11 @@ \section{Background}

%\textcolor{red}{Add writing about the type of packets used by SMB. Include information about the response time of R/W/C/General (to introduce them formally; not sure what this means.... Also can bring up the relation between close and other requests.}

<<<<<<< Updated upstream
\textcolor{blue}{It is worth noting that for the SMB2 protocol, the close request packet is used by clients to close instances of file that \textcolor{green}{were opened} with a previous create request packet.}

\begin{figure}
\includegraphics[width=0.5\textwidth]{./images/smbPacket.jpg}
\caption{SMB Packet \textcolor{green}{Header Format}}
\label{fig:smbPacket}
\end{figure}

\subsection{Issues with Tracing}
\label{Issues with Tracing}
There are three general approaches to creating a benchmark based on a trade-off between experimental complexity and resemblance to the original application. (1) Connect the system to a production test environment, run the application, and measure the application metrics. (2) Collect traces from running the application and replay them (after possible modification) back on the test I/O system. (3) Generate a synthetic workload and measure the system performance.

The majority of benchmarks attempt to represent a known system and structure on which some ``original'' design/system was tested. While this is all well and good, there are many issues with this sort of approach; temporal and spatial scaling concerns, timestamping and buffer copying, as well as driver operation for capturing packets~\cite{Orosz2013,dabir2007bottleneck,skopko2012loss}. Each of these aspects contribute to the initial problems with dissection and analysis of the captured information. For example, inaccuracies in scheduling I/Os may result in as much as a factor of 3.5 differences in measured response time and factor of 26 in measured queue sizes; differences that are too large to ignore~\cite{anderson2004buttress}.
Dealing with timing accuracy and high throughput involves three challenges. (1) Designing for dealing with peak performance requirements. (2) Coping with OS timing inaccuracies. (3) Working around unpredictable OS behavior; e.g. mechanisms to keep time and issue I/Os or performance effects due to interrupts.

Temporal scaling refers to the need to account for the nuances of timing with respect to the run time of commands; consisting of computation, communication and service. A temporally scalable benchmarking system would take these subtleties into account when expanding its operation across multiple machines in a network. While these temporal issues have been tackled for a single processor (and even somewhat for cases of multi-processor), these same timing issues are not properly handled when dealing with inter-network communication. Inaccuracies in packet timestamping can be caused due to overhead in generic kernel-time based solutions, as well as use of the kernel data structures ~\cite{PFRINGMan,Orosz2013}.

\begin{figure*}
\includegraphics[width=\textwidth]{./images/packetcapturetopology.png}
\caption{Visualization of Packet Capturing System}
\label{fig:captureTopology}
\end{figure*}

Spatial scaling refers to the need to account for the nuances of expanding a benchmark to incorporate a number of machines over a network. A system that properly incorporates spatial scaling is one that would be able to incorporate communication (even in varying intensities) between all the machines on a system, thus stress testing all communicative actions and aspects (e.g. resource locks, queueing) on the network.
=======
%\textcolor{blue}{It is worth noting that for the SMB2 protocol, the close request packet is used by clients to close instances of file that was openned with a previous create request packet.}
%\textcolor{blue}{It is worth noting that for the SMB2 protocol, the close request packet is used by clients to close instances of file that \textcolor{green}{were opened} with a previous create request packet.}

%\begin{figure}
% \includegraphics[width=0.5\textwidth]{./images/smbPacket.jpg}
% \caption{Visualization of SMB Packet}
% \caption{SMB Packet \textcolor{green}{Header Format}}
% \label{fig:smbPacket}
%\end{figure}

@@ -318,9 +292,14 @@ \subsection{Issues with Tracing}
%Dealing with timing accuracy and high throughput involves three challenges. (1) Designing for dealing with peak performance requirements. (2) Coping with OS timing inaccuracies. (3) Working around unpredictable OS behavior; e.g. mechanisms to keep time and issue I/Os or performance effects due to interrupts.
%
%Temporal scaling refers to the need to account for the nuances of timing with respect to the run time of commands; consisting of computation, communication and service. A temporally scalable benchmarking system would take these subtleties into account when expanding its operation across multiple machines in a network. While these temporal issues have been tackled for a single processor (and even somewhat for cases of multi-processor), these same timing issues are not properly handled when dealing with inter-network communication. Inaccuracies in packet timestamping can be caused due to overhead in generic kernel-time based solutions, as well as use of the kernel data structures ~\cite{PFRINGMan,Orosz2013}.
%
%Spatial scaling refers to the need to account for the nuances of expanding a benchmark to incorporate a number of machines over a network. A system that properly incorporates spatial scaling is one that would be able to incorporate communication (even in varying intensities) between all the machines on a system, thus stress testing all communicative actions and aspects (e.g. resource locks, queueing) on the network.
>>>>>>> Stashed changes

\begin{figure*}
\includegraphics[width=\textwidth]{./images/packetcapturetopology.png}
\caption{Visualization of Packet Capturing System}
\label{fig:captureTopology}
\end{figure*}

%Spatial scaling refers to the need to account for the nuances of expanding a benchmark to incorporate a number of machines over a network. A system that properly incorporates spatial scaling is one that would be able to incorporate communication (even in varying intensities) between all the machines on a system, thus stress testing all communicative actions and aspects (e.g. resource locks, queueing) on the network.

\section{Packet Capturing System}
In this section, we describe the packet capturing system as well as decisions made that influence its capabilities. We illustrate the existing university network filesystem as well as our methods for ensuring high-speed packet capture. Then, we discuss the analysis code we developed for examining the captured data.
@@ -369,7 +348,7 @@ \subsection{DataSeries Analysis}
This step also creates an easily digestible output that can be used to re-create all tuple information for SMB/SMB2 sessions that are witnessed over the entire time period.
A session is any communication in which a valid UID and TID are used.

\textcolor{red}{Add information about whether the code will be publicly shared?}
%\textcolor{red}{Add information about whether the code will be publicly shared?}

%\subsection{Python Dissection}
%The final step of our SMB/SMB2 traffic analysis system is the dissection of the \texttt{AnalysisModule} output using the pandas data analysis library~\cite{pandasPythonWebsite}. The pandas library is a python implementation similar to R. In this section of the analysis structure, the generated text file is tokenized and placed into specific DataFrames representing the data seen for each 15 minute period. The python code is used for the analysis and further dissection of the data. This is where the cumulative distribution frequency and graphing of collected data is performed. Basic analysis and aggregation is also performed in this part of the code. This analysis includes the summation of individual session I/O (e.g. reads, writes, creates) as well as the collection of inter arrival time data and response time data.
@@ -530,7 +509,7 @@ \subsection{I/O Data Request Sizes}
%\end{figure}

\begin{figure}[t]
\includegraphics[width=0.5\textwidth]{./images/smb_2020_bytes_pdf.png}
\includegraphics[width=0.5\textwidth]{./images/smb_2019_bytes_pdf.png}
\vspace{-2em}
\caption{PDF and CDF of Bytes Transferred for Read and Write I/O}
\label{fig:SMB-Bytes-IO}
@@ -641,28 +620,28 @@ \subsection{I/O Response Times}
%\end{figure}

\begin{figure}[t!]
\includegraphics[width=0.5\textwidth]{./images/smb_2020_rts_cdf.png}
\includegraphics[width=0.5\textwidth]{./images/smb_2019_rts_cdf.png}
\caption{CDF of Response Time for SMB I/O}
\label{fig:CDF-RT-SMB}
%\vspace{-2em}
\end{figure}

\begin{figure}[t!]
\includegraphics[width=0.5\textwidth]{./images/smb_2020_rts_pdf.png}
\includegraphics[width=0.5\textwidth]{./images/smb_2019_rts_pdf.png}
\caption{PDF of Response Time for SMB I/O}
\label{fig:PDF-RT-SMB}
%\vspace{-2em}
\end{figure}

\begin{figure}[t!]
\includegraphics[width=0.5\textwidth]{./images/smb_2020_iats_cdf.png}
\includegraphics[width=0.5\textwidth]{./images/smb_2019_iats_cdf.png}
\caption{CDF of Inter-Arrival Time for SMB I/O}
\label{fig:CDF-IAT-SMB}
%\vspace{-2em}
\end{figure}

\begin{figure}[t!]
\includegraphics[width=0.5\textwidth]{./images/smb_2020_iats_pdf.png}
\includegraphics[width=0.5\textwidth]{./images/smb_2019_iats_pdf.png}
\caption{PDF of Inter-Arrival Time for SMB I/O}
\label{fig:PDF-IAT-SMB}
%\vspace{-2em}
