diff --git a/trackingPaper.tex b/trackingPaper.tex index 8b0624b..79e40fb 100644 --- a/trackingPaper.tex +++ b/trackingPaper.tex @@ -128,7 +128,7 @@ \begin{abstract} Storage system traces are important for examining real-world applications, studying potential bottlenecks, as well as driving benchmarks in the evaluation of new system designs. While file system traces have been well-studied in earlier work, it has been some time since the last examination of the SMB network file system. -The purpose of this work is to continue previous SMB studies to better understand the use of the protocol in a real-world production system in use at \textcolor{green}{a major research university}.%\textcolor{red}{the University of Connecticut}. +The purpose of this work is to continue previous SMB studies to better understand the use of the protocol in a real-world production system in use at \textcolor{green}{a major research university}. %\textcolor{red}{the University of Connecticut}. The main contribution of our work is the exploration of I/O behavior in modern file system workloads as well as new examinations of the inter-arrival times and run times for I/O events. We further investigate if the recent standard models for traffic remain accurate. Our findings reveal interesting data relating to the number of read and write events. We notice that the number of read and write events is significantly less than creates and \textcolor{green}{the} \textcolor{blue}{average number of bytes exchanged per I/O} \textcolor{green}{is much smaller than what has been seen in previous studies}. @@ -187,7 +187,7 @@ Benchmarks allow for the stress testing of various aspects of a system (e.g. net We created a new tracing system to collect data from the \textcolor{green}{university} %\textcolor{red}{UConn} storage network system. 
The tracing system was built around the high-speed PF\_RING packet capture system and required the use of proper hardware and software to handle incoming data%\textcolor{blue}{; however interaction with later third-party code did require re-design for processing of the information} -. We also created a new trace capture format derived on the DataSeries structured data format developed by HP~\cite{DataSeries}. +. We also created a new trace capture format based on the DataSeries structured data format developed by HP~\cite{DataSeries}. % PF\_RING section %The addition of PF\_RING lends to the tracing system by minimizing the copying of packets which, in turn, allows for more accurate timestamping of incoming traffic packets being captured ~\cite{Orosz2013,skopko2012loss,pfringWebsite,PFRINGMan}. PF\_RING acts as a kernel module that aids in minimizing packet loss/timestamping issues by not passing packets through the kernel data structures~\cite{PFRINGMan}. @@ -198,13 +198,12 @@ DataSeries was modified to filter specific SMB protocol fields along with the wr The DataSeries data format allowed us to create data analysis code that focuses on I/O events and ID tracking (TID/UID). The future vision for this information is to combine ID tracking with the OpLock information in order to track resource sharing of the different clients on the network. As well as using IP information to recreate communication in a larger network trace to establish a better benchmark. %Focus should be aboiut analysis and new traces -The contributions of this work are the new traces of SMB traffic over a larger university network as well as new analysis of this traffic. Our new examination of the captured data reveals that despite the streamlining of the CIFS/SMB protocol to be less "chatty", the majority of SMB communication is still metadata based I/O rather than actual data I/O. 
We found that read operations occur in greater numbers and cause a larger overall number of bytes to pass over the network. Additionally, the average number of bytes transferred for each write I/O is smaller than that of the average read operation. We also find that the current standard for modeling network I/O holds for the majority of operations, while a more representative model needs to be developed for reads. +The contributions of this work are the new traces of SMB traffic over a large university network as well as new analysis of this traffic. Our new examination of the captured data reveals that despite the streamlining of the CIFS/SMB protocol to be less ``chatty'', the majority of SMB communication is still metadata-based I/O rather than actual data I/O. We found that read operations occur in greater numbers and cause a larger overall number of bytes to pass over the network. Additionally, the average number of bytes transferred for each write I/O is smaller than that of the average read operation. We also find that the current standard for modeling network I/O holds for the majority of operations, while a more representative model needs to be developed for reads. %\textcolor{red}{Add information about releasing the code?} -\subsection{Related Work} -In this section we discuss previous studies examining traces and testing that has advanced benchmark development. We summarize major works in trace study in Table~\ref{tbl:studySummary}. In addition we examine issues that occur with traces and the assumptions in their study.
-\begin{table*}[] +\section{Related Work} +\begin{table*}[h] \centering \begin{tabular}{|r|c|c|c|c|c|} \hline @@ -231,44 +230,72 @@ This paper & 2020 & SMB & x & Dynamic & \vspace{-2em} \end{table*} \label{Previous Advances Due to Testing} -Tracing collection and analysis has proved its worth in time from previous studies where one can see important lessons pulled from the research; change in behavior of read/write events, overhead concerns originating in system implementation, bottlenecks in communication, and other revelations found in the traces. \\ +%In this section we discuss previous studies examining traces and testing that has advanced benchmark development. +We summarize major works in trace study in Table~\ref{tbl:studySummary}. +%In addition we examine issues that occur with traces and the assumptions in their study. +Tracing collection and analysis \textcolor{green}{from previous studies have provided important insights and lessons, such as observations of read/write event changes}, overhead concerns originating in system implementation, bottlenecks in communication, and other revelations found in the traces. Previous tracing work has shown that one of the largest and broadest hurdles to tackle is that traces (and benchmarks) must be tailored to the system being tested. There are always some generalizations taken into account, but these generalizations can also be a major source of error \textcolor{blue}{(e.g. timing, accuracy, resource usage)}~\cite{vogels1999file,malkani2003passive,seltzer2003nfs,anderson2004buttress,Orosz2013,dabir2007bottleneck,skopko2012loss,traeger2008nine,ruemmler1992unix}. -To produce a benchmark with high fidelity one needs to understand not only the technology being used but how it is being implemented within the system~\cite{roselli2000comparison,traeger2008nine,ruemmler1992unix}.
All of these aspects will lend to the behavior of the system; from timing and resource elements to how the managing software governs actions~\cite{douceur1999large,malkani2003passive,seltzer2003nfs}. Furthermore, in pursuing this work one may find unexpected results and learn new things through examination~\cite{leung2008measurement,roselli2000comparison,seltzer2003nfs}. \\ -These studies are required in order to evaluate the development of technologies and methodologies along with furthering knowledge of different system aspects and capabilities. As has been pointed out by past work, the design of systems is usually guided by an understanding of the file system workloads and user behavior~\cite{leung2008measurement}. It is for that reason that new studies are constantly performed by the science community, from large scale studies to individual protocol studies~\cite{leung2008measurement,vogels1999file,roselli2000comparison,seltzer2003nfs,anderson2004buttress}. Even within these studies, the information gleaned is only as meaningful as the considerations of how the data is handled. - -The work done by Leung et al.~\cite{leung2008measurement} found observations related to the infrequency of files to be shared by more than one client. Over 67\% of files were never open by more than one client. -Leung's \textit{et al.} work led to a series of observations, from the fact that files are rarely re-opened to finding that read-write access patterns are more frequent ~\cite{leung2008measurement}. +To produce a benchmark with high fidelity one needs to understand not only the technology being used but how it is being implemented within the system~\cite{roselli2000comparison,traeger2008nine,ruemmler1992unix}. All these aspects contribute to the behavior of the system, from timing and resource elements to how the managing software governs actions~\cite{douceur1999large,malkani2003passive,seltzer2003nfs}.
Furthermore, in pursuing this work one may find unexpected results and learn new things through examination~\cite{leung2008measurement,roselli2000comparison,seltzer2003nfs}. +These studies are required in order to evaluate the development of technologies and methodologies along with furthering knowledge of different system aspects and capabilities. As has been pointed out by past work, the design of systems is usually guided by an understanding of the file system workloads and user behavior~\cite{leung2008measurement}. +%It is for that reason that new studies are constantly performed by the science community, from large scale studies to individual protocol studies~\cite{leung2008measurement,vogels1999file,roselli2000comparison,seltzer2003nfs,anderson2004buttress}. Even within these studies, the information gleaned is only as meaningful as the considerations of how the data is handled. + +%The work done by +Leung et al.~\cite{leung2008measurement} found \textcolor{green}{that} +%observations related to the infrequency of files to be shared by more than one client. +over 67\% of files were never opened by more than one client, +%Work by Leung \textit{et al.} led to a series of observations, from the fact that files are rarely re-opened to finding +and that read-write access patterns are more frequent~\cite{leung2008measurement}. %If files were shared it was rarely concurrently and usually as read-only; where 5\% of files were opened by multiple clients concurrently and 90\% of the file sharing was read only. %Concerns of the accuracy achieved of the trace data was due to using standard system calls as well as errors in issuing I/Os leading to substantial I/O statistical errors. % Anderson Paper -The 2004 paper by Anderson et al.~~\cite{anderson2004buttress} has the following observations. A source of decreased precision came from the Kernel overhead for providing timestamp resolution.
This would introduce substantial errors in the observed system metrics due to the use inaccurate tools when benchmarking I/O systems. These errors in perceived I/O response times can range from +350\% to -15\%. +%The 2004 paper by +Anderson et al.~\cite{anderson2004buttress} \textcolor{green}{found that a } +%has the following observations. A + source of decreased precision came from the kernel overhead for providing timestamp resolution. This would introduce substantial errors in the observed system metrics due to the use of inaccurate tools when benchmarking I/O systems. These errors in perceived I/O response times can range from +350\% to -15\%. %I/O benchmarking widespread practice in storage industry and serves as basis for purchasing decisions, performance tuning studies and marketing campaigns. Issues of inaccuracies in scheduling I/O can result in as much as a factor 3.5 difference in measured response time and factor of 26 in measured queue sizes. These inaccuracies pose too much of an issue to ignore. -Orosz and Skopko examined the effect of the kernel on packet loss in their 2013 paper~\cite{Orosz2013}. Their work showed that when taking network measurements the precision of the timestamping of packets is a more important criterion than low clock offset, especially when measuring packet inter-arrival times and round-trip delays at a single point of the network. One \textcolor{blue}{solution for network capture is the tool Dumpcap. However the} concern \textcolor{blue}{with} Dumpcap is \textcolor{blue}{that it is a} single threaded application and was suspected to be unable to handle new arriving packets due to \textcolor{green}{the} small size of the kernel buffer. Work by Dabir and Matrawy, in 2008~\cite{dabir2007bottleneck}, attempted to overcome this limitation by using two semaphores to buffer incoming strings and improve the writing of packet information to disk.
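The inter-arrival metric that these capture studies stress is trivial to compute but inherits any jitter the capture stack adds to its input, which is why Orosz and Skopko weight timestamp precision over clock offset. A minimal illustrative sketch (Python, like our dissection code, but not the actual analysis tooling):

```python
def inter_arrival_times(timestamps):
    """Inter-arrival times (IATs) between consecutive packets.

    Input: capture timestamps (seconds); output units match the input.
    Any timestamping jitter introduced by the capture stack propagates
    directly into these deltas, so timestamp precision matters more than
    absolute clock offset for this metric.
    """
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]
```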
- -Narayan and Chandy examined the concerns of distributed I/O and the different models of parallel application I/O. +Orosz and Skopko examined the effect of the kernel on packet loss and +%in their 2013 paper~\cite{Orosz2013}. Their work +showed that when taking network measurements the precision of the timestamping of packets is a more important criterion than low clock offset, especially when measuring packet inter-arrival times and round-trip delays at a single point of the network. One \textcolor{blue}{solution for network capture is the tool Dumpcap. However the} concern \textcolor{blue}{with} Dumpcap is \textcolor{blue}{that it is a} single threaded application and was suspected to be unable to handle new arriving packets due to \textcolor{green}{the} small size of the kernel buffer. Work by +Dabir and Matrawy%, in 2008 +~\cite{dabir2007bottleneck} attempted to overcome this limitation by using two semaphores to buffer incoming strings and improve the writing of packet information to disk. +%Narayan and Chandy examined the concerns of distributed I/O and the different models of parallel application I/O. %There are five major models of parallel application I/O. (1) Single output file shared by multiple nodes. (2) Large sequential reads by a single node at the beginning of computation and large sequential writes by a single node at the end of computation. (3) Checkpointing of states. (4) Metadata and read intensive (e.g. small data I/O and frequent directory lookups for reads). -Due to the striping of files across multiple nodes, this can cause any read or write to access all the nodes; which does not decrease the inter-arrival times (IATs) seen. As the number of I/O operations increases and the number of nodes increases, the IAT times decreased. -Observations from Skopko in a 2012 paper~\cite{skopko2012loss} examined the nuance concerns of software based capture solutions. 
The main observation was software solutions relied heavily on OS packet processing mechanisms. Further more, depending on the mode of operation (e.g. interrupt or polling), the timestamping of packets would change. +%Due to the striping of files across multiple nodes, this can cause any read or write to access all the nodes; which does not decrease the inter-arrival times (IATs) seen. As the number of I/O operations increases and the number of nodes increases, the IAT times decreased. +%Observations from +Skopk\'o +%in a 2012 paper +~\cite{skopko2012loss} examined the concerns of software-based capture solutions \textcolor{green}{and observed that } +%. The main observation was + software solutions relied heavily on OS packet processing mechanisms. Furthermore, depending on the mode of operation (e.g. interrupt or polling), the timestamping of packets would change. As seen in previous trace work~\cite{leung2008measurement,roselli2000comparison,seltzer2003nfs}, the general perceptions of how computer systems are being used versus their initial purpose have allowed for great strides in eliminating actual bottlenecks rather than spending unnecessary time working on imagined bottlenecks. Without illumination of these underlying actions (e.g. read-write ratios, file death rates, file access rates) these issues cannot be readily tackled. -\\ + \section{Background} %\subsection{Server Message Block} The Server Message Block (SMB) is an application-layer network protocol mainly used for providing shared access to files, shared access to printers, shared access to serial ports, miscellaneous communications between nodes on the network, as well as providing an authenticated inter-process communication mechanism. %The majority of usage for the SMB protocol involves Microsoft Windows.
Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources -The SMB 1.0 protocol~\cite{SMB1Spec} has been found to have high/significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and disregard of network latency between hosts. Solutions to this problem were included in the updated SMB 2.0 protocol which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen~\cite{SMB2Spec}. Additional changes, most significantly being increased security, were implemented in SMB 3.0 protocol (previously named SMB 2.2). % XXX citations for SMB specs for different versions? +The SMB 1.0 protocol~\cite{SMB1Spec} has been found to have a significant impact on performance due to latency issues. Monitoring revealed a high degree of ``chattiness'' and disregard of network latency between hosts. Solutions to this problem were included in the updated SMB 2.0 protocol which decreases ``chattiness'' by reducing commands and sub-commands from over a hundred to nineteen~\cite{SMB2Spec}. Additional changes, most significantly increased security, were implemented in the SMB 3.0 protocol (previously named SMB 2.2). % XXX citations for SMB specs for different versions? %\textcolor{red}{\textbf{Add information about SMB 2.X/3?}} +\begin{figure*}[ht!] + \includegraphics[width=\textwidth]{./images/packetcapturetopology.png} + \caption{Visualization of Packet Capturing System} + \label{fig:captureTopology} +\end{figure*} + + The rough order of communication for SMB session file interaction contains five steps. First is a negotiation where a Microsoft SMB Protocol dialect is determined. Next, a session is established to determine the share-level security. After this, the Tree ID (TID) is determined for the share to be connected to as well as a file ID (FID) for a file requested by the client.
From this establishment, I/O operations are performed using the FID given in the previous step. %\textcolor{green}{The SMB packet header is shown in Figure~\ref{fig:smbPacket}.} % Information relating to the capturing of SMB information The only data that needs to be tracked from the SMB traces are the UID (User ID) and TID for each session. The SMB commands also include a MID (Multiplex ID) value that is used for tracking individual packets in each established session, and a PID (Process ID) that tracks the process running the command or series of commands on a host. For the purposes of our tracing, we do not track the MID or PID information. % -Some nuances of SMB protocol I/O to note are that SMB/SMB2 write requests are the actions that push bytes over the wire while for SMB/SMB2 read operations it is the response packets. +Some nuances of the SMB protocol I/O to note are that SMB/SMB2 write requests are the actions that push bytes over the wire, while for SMB/SMB2 read operations it is the response packets that carry the data. + + %\begin{itemize} % \item SMB/SMB2 write request is the command that pushes bytes over the wire. \textbf{Note:} the response packet only confirms their arrival and use (e.g. writing). % \item SMB/SMB2 read response is the command that pushes bytes over the wire. \textbf{Note:} The request packet only asks for the data. @@ -294,16 +321,10 @@ Some nuances of SMB protocol I/O to note are that SMB/SMB2 write requests are the actions that push bytes over the wire while for SMB/SMB2 read operations it is the response packets. % %Temporal scaling refers to the need to account for the nuances of timing with respect to the run time of commands; consisting of computation, communication and service. A temporally scalable benchmarking system would take these subtleties into account when expanding its operation across multiple machines in a network. While these temporal issues have been tackled for a single processor (and even somewhat for cases of multi-processor), these same timing issues are not properly handled when dealing with inter-network communication.
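Since only the UID (session) and TID fields need to be tracked per command, the dissection reduces to pulling a few fixed-offset fields out of each SMB2 header. A hedged sketch of that extraction, with the field layout taken from the public MS-SMB2 specification (the command table is abbreviated, and our real dissector is built on tshark/DataSeries, not this code):

```python
import struct

SMB2_MAGIC = b"\xfeSMB"

# Abbreviated command table from the MS-SMB2 specification.
SMB2_COMMANDS = {0x00: "NEGOTIATE", 0x01: "SESSION_SETUP", 0x03: "TREE_CONNECT",
                 0x05: "CREATE", 0x06: "CLOSE", 0x08: "READ", 0x09: "WRITE",
                 0x10: "QUERY_INFO"}

def parse_smb2_sync_header(data: bytes) -> dict:
    """Extract tracking fields (command, MID, TID, session id) from the
    fixed 64-byte header of a synchronous SMB2 message."""
    if len(data) < 64 or data[:4] != SMB2_MAGIC:
        raise ValueError("not an SMB2 message")
    # Little-endian fields after the 4-byte magic; layout per MS-SMB2.
    (_size, _credit_charge, _status, command, _credits, _flags,
     _next_cmd, message_id, _reserved, tree_id, session_id) = struct.unpack(
        "<HHIHHIIQIIQ", data[4:48])
    return {"command": SMB2_COMMANDS.get(command, hex(command)),
            "message_id": message_id,  # MID (not retained in our traces)
            "tree_id": tree_id,        # TID: which share the I/O targets
            "session_id": session_id}  # maps to the authenticated user (UID)
```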
Inaccuracies in packet timestamping can be caused by overhead in generic kernel-time based solutions, as well as use of the kernel data structures~\cite{PFRINGMan,Orosz2013}. -\begin{figure*} - \includegraphics[width=\textwidth]{./images/packetcapturetopology.png} - \caption{Visualization of Packet Capturing System} - \label{fig:captureTopology} -\end{figure*} - %Spatial scaling refers to the need to account for the nuances of expanding a benchmark to incorporate a number of machines over a network. A system that properly incorporates spatial scaling is one that would be able to incorporate communication (even in varying intensities) between all the machines on a system, thus stress testing all communicative actions and aspects (e.g. resource locks, queueing) on the network. \section{Packet Capturing System} -In this section, we describe the packet capturing system as well as decisions made that influence its capabilities. We illustrate the existing university network filesystem as well as our methods for ensuring high-speed packet capture. Then, we discuss the analysis code we developed for examining the captured data. +%In this section, we describe the packet capturing system as well as decisions made that influence its capabilities. We illustrate the existing university network filesystem as well as our methods for ensuring high-speed packet capture. Then, we discuss the analysis code we developed for examining the captured data. % and on the python dissection code we wrote for performing traffic analysis. @@ -316,7 +337,7 @@ centralized storage server%The \textcolor{red}{UITS system} %\textcolor{red}{UConn} as well as personal drive share space for faculty, staff and students, along with at least one small group of users. Each server is capable of handling 1~Gb/s of traffic in each direction (e.g. outbound and inbound traffic). Altogether, the five-blade server system can in theory handle 5~Gb/s of data traffic in each direction.
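The static, non-load-balanced spread of shares across the blade servers can be illustrated as follows. This is a toy sketch: node names and the hash-based mapping rule are hypothetical stand-ins, since the production mapping is a fixed administrative configuration that the paper does not detail.

```python
import zlib

ACTIVE_NODES = ["blade1", "blade2", "blade3", "blade4"]  # hypothetical names
PASSIVE_NODE = "blade5"

def assign_node(share_name: str, down=frozenset()):
    """Map a share to its statically assigned active node; the passive
    fifth node only takes over when the assigned node is down."""
    node = ACTIVE_NODES[zlib.crc32(share_name.encode()) % len(ACTIVE_NODES)]
    return PASSIVE_NODE if node in down else node
```

The key property is that the assignment never changes with load: a share always lands on the same node unless that node fails.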
%Some of these blade servers have local storage but the majority do not have any. -The blade servers serve as SMB heads, but the actual storage is served by SAN storage nodes that sit behind them. This system does not currently implement load balancing. Instead, the servers are set up to spread the traffic load with a static distribution among four of the active cluster nodes while the fifth node is passive and purposed to take over in the case that any of the other nodes go down (e.g. become inoperable or crash). +The blade servers serve as SMB heads, but the actual storage is provided by SAN storage nodes that sit behind them. This system does not currently implement load balancing. Instead, the servers are set up to spread the load with a static distribution across four of the active cluster nodes while the passive fifth node takes over in case any of the other nodes go down.% (e.g. become inoperable or crash). The actual tracing was performed with a tracing server connected to a switch outfitted with a packet duplicating element as shown in the topology diagram in Figure~\ref{fig:captureTopology}. A 10~Gbps network tap was installed in the file server switch, allowing our storage server to obtain a copy of all network traffic going to the 5 file servers. The reason for using 10~Gbps hardware is to help ensure that the system is able to capture information on the network at peak theoretical throughput. @@ -339,8 +360,8 @@ The filesize used was in a ring buffer where each file captured was 64000 kB. % This causes tshark to switch to the next file after it reaches a determined size. %To simplify this aspect of the capturing process, the entirety of the capturing, dissection, and permanent storage was all automated through watch-dog scripts. -The \texttt{.pcap} files from \texttt{tshark} do not lend themselves to easy data analysis, so we translate these files into the DataSeries~\cite{DataSeries} format.
HP developed DataSeries, an XML-based structured data format, that was designed to be self-descriptive, storage and access efficient, and highly flexible. -The system for taking captured \texttt{.pcap} files and writing them into the DataSeries format (i.e. \texttt{.ds}) does so by first creating a structure (based on a pre-written determination of the data desired to capture). Once the code builds this structure, it then reads through the capture traffic packets while dissecting and filling in the prepared structure with the desired information and format. +The \texttt{.pcap} files from \texttt{tshark} do not lend themselves to easy data analysis, so we translate these files into the DataSeries~\cite{DataSeries} format, an XML-based structured data format designed to be self-descriptive, storage and access efficient, and highly flexible. +%The system for taking captured \texttt{.pcap} files and writing them into the DataSeries format (i.e. \texttt{.ds}) does so by first creating a structure (based on a pre-written determination of the data desired to capture). Once the code builds this structure, it then reads through the capture traffic packets while dissecting and filling in the prepared structure with the desired information and format. Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients and servers that interact over the network (i.e. I/O transactions). It should also be noted that all sensitive information being captured by the tracing system is hashed to protect the users whose information is examined by the tracing system. 
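The hashing of sensitive fields can be sketched as below. The salt value and the truncation length are illustrative assumptions only; the actual anonymization scheme on the capture server is not specified beyond "hashed".

```python
import hashlib

SALT = b"per-capture-secret"  # hypothetical; a real salt must stay private

def anonymize(field: str) -> str:
    """Replace a sensitive value (username, share, path) with a stable
    salted hash so the same client can still be correlated across packets
    without revealing the original string."""
    return hashlib.sha256(SALT + field.encode("utf-8")).hexdigest()[:16]
```

Determinism is the point of the design: the same input always maps to the same token, so session-level analysis survives anonymization.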
Furthermore, the DataSeries file retains only the first 512 bytes of the SMB packet - enough to capture the SMB header information that contains the I/O information we seek, while the body of the SMB traffic is not retained in order to better ensure \textcolor{green}{the privacy} of the university's network communications. \textcolor{blue}{The reasoning for this limit was to allow for capture of longer SMB AndX message chains due to negotiated \textit{MaxBufferSize}.} It is worth noting that in the case of larger SMB headers, some information is lost, however this is a trade-off by the university to provide, on average, the correct sized SMB header but does lead to scenarios where some information may be captured incompletely. \textcolor{blue}{This scenario only occurs in the cases of large AndX Chains in the SMB protocol, since the SMB header for SMB 2 is fixed at 72 bytes. In those scenarios the AndX messages specify only a single SMB header with the rest of the AndX Chain attached in a series of block pairs.} \subsection{DataSeries Analysis} @@ -397,11 +418,11 @@ Our examination of the collected network filesystem data revealed interesting pa %While CIFS/SMB protocol has less metadata operations, this is due to a depreciation of the SMB protocol commands, therefore we would expect to see less total operations (e.g. $0.04$\% of total operations). %The infrequency of file activity is further strengthened by our finding that within a week long window of time there are no Read or Write inter arrival times that can be calculated. %\textcolor{red}{XXX we are going to get questioned on this. its not likely that there are no IATs for reads and writes} -General operations happen at very high frequency with inter arrival times that were found to be relatively short (1317$\mu$s on average), as shown in Table~\ref{tbl:PercentageTraceSummary}. 
+%General operations happen at very high frequency with inter arrival times that were found to be relatively short (1317$\mu$s on average), as shown in Table~\ref{tbl:PercentageTraceSummary}. Taking a deeper look at the SMB2 operations, shown in %the bottom half of Table~\ref{tbl:SMBCommands2}, we see that $9.06$\% of the general operations are negotiate commands. These are commands sent by the client to notify the server which dialects of the SMB2 protocol the client can understand. The three most common commands are close, tree connect, and query info. -The latter two relate to metadata information of shares and files accessed, however the close operation relates to the create operations relayed over the network. Note that the create command is also used as an open file. The first thing one will notice is that the number of closes is greater than the total number of create operations; by $9.35$\%. These extra close operations are most likely due to applications doing multiple closes that do not need to be done. +The latter two relate to metadata information of shares and files accessed. However, the close operation corresponds to create operations. Note that the create command is also used to open a file. Notice that the number of closes is greater than the total number of create operations by $9.35$\%. These extra close operations are most likely due to applications doing multiple closes that do not need to be performed. \begin{table} \centering @@ -465,7 +486,10 @@ Cancel & \multicolumn{2}{|c|}{0} & 0.00\% \\ %\end{figure} Each SMB Read and Write command is associated with a data request size that indicates how many bytes are to be read or written as part of that command. Figure~\ref{fig:SMB-Bytes-IO} %and~\ref{fig:PDF-Bytes-Write} -shows the probability density function (PDF) of the different sizes of bytes transferred for read and write I/O operations respectively.
The most noticeable aspect of these graphs are that the majority of bytes transferred for read and write operations is around 64 bytes. It is worth noting that write I/O also have a larger number of very small transfer amounts. This is unexpected in terms of the amount of data passed in a frame. Our belief is that this is due to a large number of long term calculations/scripts being run that only require small but frequent updates. This assumption was later validated in part when examining the files transferred, as some were related to running scripts creating a large volume of files, however the more affirming finding was the behavior observed with common applications. For example, it was seen that Microsoft Word would perform a large number of small reads at ever growing offsets. This was interpreted as when a user is viewing a document over the network and Word would load the next few lines of text as the user scrolled down the document; causing ``loading times'' amid use. A large degree of small writes were observed to be related to application cookies or other such smaller data communications. +shows the probability density function (PDF) of the different sizes of bytes transferred for read and write I/O operations respectively. The most noticeable aspect of these graphs is that the majority of bytes transferred for read and write operations is around 64 bytes. It is worth noting that write I/Os also have a larger number of very small transfer amounts. This is unexpected in terms of the amount of data passed in a frame. \textcolor{green}{Part of the reason} is a large number of long-term %calculations/ +scripts that only require small but frequent updates, \textcolor{green}{as we observed several} +%. This assumption was later validated in part when examining the files transferred, as some were related to +running scripts creating a large volume of files.
\textcolor{green}{A more significant reason was that we noticed} Microsoft Word would perform a large number of small reads at ever-growing offsets. We interpreted this as a user viewing a document over the network, with Word loading the next few lines of text as the user scrolled down the document, causing ``loading times'' during use. \textcolor{green}{Finally,} a large number of small writes were observed to be related to application cookies or other such smaller data communications. %This could also be attributed to simple reads relating to metadata\textcolor{red}{???} %\textcolor{blue}{Reviewing of the SMB and SMB2 leads to some confusion in understanding this behavior. According to the specification the default ``MaxBuffSize'' for reads and writes should be between 4,356 bytes and 16,644 bytes depending on the use of either a client version of server version of Windows; respectively. In the SMB2 protocol specification, specific version of Windows (e.g. Vista SP1, Server 2008, 7, Server 2008 R2, 8, Server 2012, 8.1, Server 2012 R2) disconnect if the ``MaxReadSize''/``MaxWriteSize'' value is less than 4096. However, further examination of the specification states that for SMB2 the read length and write length can be zero. Thus, this seems to conflict that the size has to be greater than 4096 but allows for it to also be zero. It is due to this protocol specification of allowing zero that supports the smaller read/write sizes seen in the captured traffic. The author's assumption here is that the university's configuration allows for smaller traffic to be exchanged without the disconnection for sizes smaller than 4096.} @@ -523,12 +547,12 @@ % \label{fig:CDF-Bytes-RW} %\end{figure} Figure~\ref{fig:SMB-Bytes-IO} %and~\ref{fig:CDF-Bytes-Write} -shows cumulative distribution functions (CDF) for bytes read and bytes written.
As can be seen, almost no read transfer sizes are less than 32 bytes, whereas 20\% writes below 32 bytes. Table~\ref{fig:transferSizes} shows a tabular view of this data. For reads, $34.97$\% are between 64 and 512 bytes, with another $28.86$\% at 64 byte request sizes. There are a negligible percentage of read requests larger than 512. +shows cumulative distribution functions (CDF) for bytes read and bytes written. As can be seen, almost no read transfer sizes are less than 32 bytes, whereas 20\% of the writes are smaller than 32 bytes. Table~\ref{fig:transferSizes} shows a tabular view of this data. For reads, $34.97$\% are between 64 and 512 bytes, with another $28.86$\% at 64 byte request sizes. There is a negligible percentage of read requests larger than 512 bytes. The reads we observed are roughly a factor of four smaller than those observed by Leung et al. %This read data is similar to what was observed by Leung et al, however at an order of magnitude smaller. -Writes observed also differ from previous inspection of the protocol's usage. % are very different. -Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size. In our data, however, we see that almost all writes are less than 1K in size. In fact, $11.16$\% of writes are less than 4 bytes, $52.41$\% are 64 byte requests, and $43.63$\% of requests are less than 64 byte writes. -In the ten years since the last study, it is clear that writes have become significantly smaller. In our analysis of a subset of the writes, we found that a significant part of the write profile was writes to cookies which are necessarily small files. The preponderance of web applications and the associated tracking is a major change in how computers and data storage is used compared to over 10 years ago. These small data reads and writes significantly alter the assumptions that most network storage systems are designed for.
+%Writes observed also differ from previous inspection of the protocol's usage. % are very different. +Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size. In our data, however, we see that almost all writes are less than 1K in size. In fact, $11.16$\% of writes are less than 4 bytes, $52.41$\% are 64 byte requests, and $43.63$\% of requests are less than 64 bytes. +In the ten years since the last study, it is clear that writes have become significantly smaller. In our analysis of a subset of the writes, we found that a significant part of the write profile was writes to cookies which are necessarily small files. The preponderance of web applications and the associated tracking is a major change in how computers and data storage are used compared to a decade ago. These small data reads and writes significantly alter the assumptions that most network storage systems are designed for. %This may be explained by the fact that large files, and multiple files, are being written as standardized blocks more fitting to the frequent update of larger data-sets and disk space available. This could be as an effort to improve the fidelity of data across the network, allow for better realtime data consistency between client and backup locations, or could just be due to a large number of scripts being run that create and update a series of relatively smaller documents. %\textbf{Note: It seems like a change in the order of magnitude that is being passed per packet. What would this indicate?}\textcolor{red}{Answer the question. Shorter reads/writes = better?} @@ -590,7 +614,7 @@ files when files are modified. Furthermore, read operations account for the lar %~!~ Addition since Chandy writing ~!~% Most previous tracing work has not reported I/O response times or command latency which is generally proportional to data request size, but under load, the response times give an indication of server load. 
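Response times such as those discussed here can be recovered by pairing each SMB request with its reply on the message ID; the sketch below assumes an illustrative event layout of (timestamp in microseconds, message ID, is-request) tuples, not the actual DataSeries schema:

```python
def response_times(events):
    """Pair requests with replies by message ID; return response times in us.

    Each event is (timestamp_us, msg_id, is_request) -- an illustrative
    layout, not the trace's actual DataSeries fields.
    """
    pending = {}  # msg_id -> request timestamp
    rts = []
    for ts, mid, is_request in events:
        if is_request:
            pending[mid] = ts
        elif mid in pending:
            rts.append(ts - pending.pop(mid))  # reply matched to its request
    return rts

# Two interleaved command/response pairs (hypothetical timestamps).
rts = response_times([(0, 1, True), (150, 2, True), (200, 1, False), (500, 2, False)])
```

Unmatched requests (e.g. replies lost by the capture) simply remain in `pending` and contribute no sample, which is one simple way to keep dropped packets from skewing the averages.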
In Table~\ref{tbl:PercentageTraceSummary} we show a summary of the response times for read, write, create, and general commands. We note that general (metadata) operations occur at high frequency and run relatively slowly. -Other observations of the data show that the number of writes is very close to the number of reads, although the write response time for their operations is very small - most likely because the storage server caches the write without actually committing to disk. Reads on the other hand are in most cases probably not going to hit in the cache and require an actual read from the storage media. Although read operations are only a few percentage of the total operations they have the greatest average response time; more than general I/O. As noted above, creates happen more frequently, but have a slightly slower response time, because of the extra metadata operations required for a create as opposed to a simple write. +We also observe that the number of writes is very close to the number of reads. The write response time is very small, most likely because the storage server caches the write without actually committing it to disk. Reads, on the other hand, are in most cases probably not going to hit in the cache and require an actual read from the storage media. Although read operations are only a small percentage of all operations, they have the highest average response time. As noted above, creates happen more frequently, but have a slightly slower response time, because of the extra metadata operations required for a create as opposed to a simple write. % Note: RT + IAT time CDFs exist in data output @@ -623,30 +647,30 @@ Other observations of the data show that the number of writes is very close to t %\end{figure} \begin{figure}[t!]
- \includegraphics[width=0.5\textwidth]{./images/smb_2019_rts_cdf.png} - \caption{CDF of Response Time for SMB I/O} - \label{fig:CDF-RT-SMB} + \includegraphics[width=0.5\textwidth]{./images/smb_2019_iats_cdf.png} + \caption{CDF of Inter-Arrival Time for SMB I/O} + \label{fig:CDF-IAT-SMB} %\vspace{-2em} \end{figure} \begin{figure}[t!] - \includegraphics[width=0.5\textwidth]{./images/smb_2019_rts_pdf.png} - \caption{PDF of Response Time for SMB I/O} - \label{fig:PDF-RT-SMB} + \includegraphics[width=0.5\textwidth]{./images/smb_2019_iats_pdf.png} + \caption{PDF of Inter-Arrival Time for SMB I/O} + \label{fig:PDF-IAT-SMB} %\vspace{-2em} \end{figure} \begin{figure}[t!] - \includegraphics[width=0.5\textwidth]{./images/smb_2019_iats_cdf.png} - \caption{CDF of Inter-Arrival Time for SMB I/O} - \label{fig:CDF-IAT-SMB} + \includegraphics[width=0.5\textwidth]{./images/smb_2019_rts_cdf.png} + \caption{CDF of Response Time for SMB I/O} + \label{fig:CDF-RT-SMB} %\vspace{-2em} \end{figure} \begin{figure}[t!] - \includegraphics[width=0.5\textwidth]{./images/smb_2019_iats_pdf.png} - \caption{PDF of Inter-Arrival Time for SMB I/O} - \label{fig:PDF-IAT-SMB} + \includegraphics[width=0.5\textwidth]{./images/smb_2019_rts_pdf.png} + \caption{PDF of Response Time for SMB I/O} + \label{fig:PDF-RT-SMB} %\vspace{-2em} \end{figure} @@ -751,10 +775,12 @@ Avg IAT ($\mu$s) & 33220.8 & \multicolumn{1}{r|}{35260.4} & \multicolumn{1} %% \end{itemize} %%\end{enumerate} % -Figure~\ref{fig:CDF-IAT-SMB} shows the inter arrival times CDF for general I/O. As can be seen, SMB commands happen very frequently - $85$\% of commands are issued less than 1024~$\mu s$ apart. As was mentioned above, the SMB protocol is known to be very chatty, and it is clear that servers must spend a lot of time dealing with these commands. For the most part, most of these commands are also serviced fairly quickly as -seen in Figure~\ref{fig:CDF-RT-SMB}. 
Interestingly, the response/return time (RT) for the general metadata operations follows a similar curve to the inter-arrival times. +Figures~\ref{fig:CDF-IAT-SMB} and~\ref{fig:PDF-IAT-SMB} show the inter-arrival time CDFs and PDFs. As can be seen, SMB commands happen very frequently: $85$\% of commands are issued less than 1000~$\mu s$ apart. As mentioned above, SMB is known to be very chatty, and it is clear that servers must spend a lot of time dealing with these commands. Most of these commands are also serviced fairly quickly as +seen in Figures~\ref{fig:CDF-RT-SMB} and~\ref{fig:PDF-RT-SMB}. Interestingly, the response time for the general metadata operations follows a similar curve to the inter-arrival times. -Next we examine the response time (RT) of the read, write, and create I/O operations that occur over the SMB network filesystem. The response time for write operations (shown in Figure~\ref{fig:CDF-RT-SMB}) does not follow the step function similar to the bytes written CDF in Figure~\ref{fig:SMB-Bytes-IO}. This is understandable as the response time for a write would be expected to be a more standardized action and not necessarily proportional to the number of bytes written. However, the read response time (Figure~\ref{fig:CDF-RT-SMB}) is smoother than the bytes read CDF (Figure~\ref{fig:SMB-Bytes-IO}). This is most likely due to the fact that some of the reads are satisfied by server caches, thus eliminating some long access times to persistent storage. +%Next we examine the response time (RT) of the read, write, and create I/O operations that occur over the SMB network filesystem. +The response time for write operations (shown in Figure~\ref{fig:CDF-RT-SMB}) does not follow a step function similar to the bytes-written CDF in Figure~\ref{fig:SMB-Bytes-IO}. This is understandable, as the response time for a write is expected to be a more standardized action and not necessarily proportional to the number of bytes written.
However, the read response time %(Figure~\ref{fig:CDF-RT-SMB}) + is smoother than the bytes read CDF (Figure~\ref{fig:SMB-Bytes-IO}). This is most likely because some of the reads are satisfied by server caches, thus eliminating some long access times to persistent storage. One should notice, however, that the response time on read operations grows at a rate similar to that of write operations. This, again, shows a form of standardization in the communication patterns, although some read I/Os take far longer due to larger amounts of read data being sent over several standardized-size packets. %While the RT for Write operations are not included (due to their step function behavior) Figure~\ref{fig:CDF-RT-Read} and Figure~\ref{fig:CDF-RT-RW} show the response times for Read and Read+Write operations respectively. T %\textcolor{red}{The write I/O step function behavior is somewhat visible in the CDF of both reads and writes in Figures~\ref{fig:CDF-RT-Read}~and~\ref{fig:CDF-RT-Write}. Moreover, this shows that the majority ($80$\%) of read (and write) operations occur within 2~$ms$, the average access time for enterprise storage disks. As would be expected, this is still an order of magnitude greater than the general I/O.} @@ -882,7 +908,7 @@ Originally we expected that these common file extensions would be a much larger Furthermore, the majority of extensions are not readily identified. Upon closer examination of the tracing system it was determined that %these file extensions are an artifact of how Windows interprets file extensions. The Windows operating system merely guesses the file type based on the assumed extension (e.g. whatever characters follow after the final `.'). -many files simply do not have a valid extension. These range from linux-based library files, manual pages, odd naming schemes as part of scripts or back-up files, as well as date-times and IPs as file names.
There are undoubtedly a larger number more, but exhaustive determination of all variations is seen as out of scope for this work. +many files simply do not have a valid extension. These range from Linux-based library files, man pages, odd naming schemes as part of scripts or back-up files, as well as date-times and IPs as file names. There are undoubtedly more, but exhaustive determination of all variations is seen as out of scope for this work. %\textcolor{red}{Add in information stating that the type of OS in use in the university environment range from Windows, Unix, BSD, as well as other odd operating systems used by the engineering department.} @@ -1004,7 +1030,7 @@ Our comparison of the existing standard use of a exponential distribution to mod % had better $R^2$ result than the exponential equivalent for write operations. This is not surprising due to the step-function shape of the Figure~\ref{fig:CDF-RT-Write} CDF. Examining the $R^2$ results for the read + write I/O operations we find that the exponential distribution is far more accurate at modeling this combined behavior. for write and create operations are similar, while those for read operations are not. Furthermore, there is less similarity between the modeled behavior of general operation inter-arrival times and their response times, showing the need for a more refined model for each aspect of the network filesystem interactions. One should also notice that the general operation model is most similar to that of the creates. -This makes sense since the influence of create operations are found to dominate the I/O behavior of the network filesystem, which aligns well with the number of existing close operations. +This makes sense since create operations are found to dominate the network filesystem I/O, which aligns well with the number of existing close operations.
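The exponential-distribution comparison above reduces to fitting a rate parameter and scoring the fit against the empirical CDF; the following is a simplified sketch with hypothetical samples (a maximum-likelihood fit plus an $R^2$-style score), not the paper's exact fitting procedure:

```python
import math

def fit_exponential(samples):
    """MLE-fit an exponential distribution (rate = 1/mean) and score it
    with an R^2-style statistic against the empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    lam = n / sum(xs)  # maximum-likelihood rate parameter
    empirical = [(i + 1) / n for i in range(n)]     # empirical CDF at each sample
    model = [1.0 - math.exp(-lam * x) for x in xs]  # fitted exponential CDF
    mean_e = sum(empirical) / n
    ss_res = sum((e - m) ** 2 for e, m in zip(empirical, model))
    ss_tot = sum((e - mean_e) ** 2 for e in empirical)
    return lam, 1.0 - ss_res / ss_tot

# Hypothetical inter-arrival samples (us), not trace data.
lam, r2 = fit_exponential([10.0, 20.0, 40.0, 80.0, 160.0])
```

Comparing the $R^2$ values produced for each operation class (read, write, create, general) against one another is what drives the per-class modeling conclusions above.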
%improves the ability of a exponential distribution to model the combined behavior.} %Observations: %\begin{itemize} @@ -1030,21 +1056,23 @@ Due to the large number of metadata operations, the use of smart storage solutio \subsection{System Limitations and Challenges} \label{System Limitations and Challenges} -When initially designing the tracing system used in this paper, different aspects were taken into account, such as space limitations of the tracing system, packet capture limitations (e.g. file size), and speed limitations of the hardware. One limitation encountered in the packet capture system deals with the functional pcap (packet capture file) size. The concern being that the pcap files only need to be held until they have been filtered for specific protocol information and then compressed using the DataSeries format, but still allow for room for the DataSeries files being created to be stored. Other limitation concerns came from the software and packages used to collect the network traffic data~\cite{Orosz2013,dabir2007bottleneck,skopko2012loss}. These ranged from timestamp resolution provided by the tracing system's kernel~\cite{Orosz2013} to how the packet capturing drivers and programs (such as dumpcap and tshark) operate along with how many copies are performed and how often. The speed limitations of the hardware are dictated by the hardware being used (e.g. Gb capture interface) and the software that makes use of this hardware (e.g. PF\_RING). After all, our data can only be as accurate as the information being captured~\cite{seltzer2003nfs,anderson2004buttress}. +When initially designing the tracing system used in this paper, different aspects were taken into account, such as space limitations of the tracing system, packet capture limitations (e.g. file size), and speed limitations of the hardware. One limitation encountered in the packet capture system deals with the functional pcap (packet capture file) size. 
The pcap files only need to be held until they have been filtered for specific protocol information and compressed into the DataSeries format, while still leaving room to store the DataSeries files being created. %Other limitation concerns came from the software and packages used to collect the network traffic data~\cite{Orosz2013,dabir2007bottleneck,skopko2012loss}. These ranged from timestamp resolution provided by the tracing system's kernel~\cite{Orosz2013} to how the packet capturing drivers and programs (such as dumpcap and tshark) operate along with how many copies are performed and how often. The speed limitations of the hardware are dictated by the hardware being used (e.g. Gb capture interface) and the software that makes use of this hardware (e.g. PF\_RING). +%After all, our data can only be as accurate as the information being captured~\cite{seltzer2003nfs,anderson2004buttress}. Another concern was whether the system would be able to function optimally during periods of high network traffic. All aspects of the system, from the hardware to the software, have been altered to help combat these concerns and allow for the most accurate packet capturing possible. %About Challenges of system -While the limitations of the system were concerns, there were other challenges that were tackled in the development of this research. -One glaring challenge with building this tracing system was using code written by others; tshark and DataSeries. While these programs are used within the tracing structure there are some issues when working with them. These issues ranged from data type limitations of the code to hash value and checksum miscalculations due to encryption of specific fields/data. Attempt was made to dig and correct these issues, but they were so inherent to the code being worked with that hacks and workarounds were developed to minimize their effect.
Other challenges centralize around selection, interpretations and distribution scope of the data collected. Which fields should be filtered out from the original packet capture? What data is most prophetic to the form and function of the network being traced? What should be the scope, with respect to time, of the data being examined? Where will the most interesting information appear? As each obstacle was tackled, new information and ways of examining the data reveal themselves and with each development different alterations and corrections are made. +%While the limitations of the system were concerns, there were other challenges that were tackled in the development of this research. +%One glaring challenge with building this tracing system was using code written by others; tshark and DataSeries. While these programs are used within the tracing structure there are some issues when working with them. These issues ranged from data type limitations of the code to hash value and checksum miscalculations due to encryption of specific fields/data. Attempt was made to dig and correct these issues, but they were so inherent to the code being worked with that hacks and workarounds were developed to minimize their effect. Other challenges center on the selection, interpretation, and scope of the data collected. Which fields should be filtered out from the original packet capture? What data is most indicative of the form and function of the network being traced? What should be the scope, with respect to time, of the data being examined? Where will the most interesting information appear? As each obstacle was tackled, new information and ways of examining the data revealed themselves, and with each development different alterations and corrections were made. -Even when all the information is collected and the most important data has been selected, there is still the issue of what lens should be used to view this information.
Because the data being collected is from an active network, there will be differing activity depending on the time of day, week, and scholastic year. For example, although the first week or so of the year may contain a lot of traffic, this does not mean that trends of that period of time will occur for every week of the year (except perhaps the final week of the semester). The trends and habits of the network will change based on the time of year, time of day, and even depend on the exam schedule. Truly interesting examination of data requires looking at all different periods of time to see how all these factors play into the communications of the network. +%Even when all the information is collected and the most important data has been selected, there is still the issue of what lens should be used to view this information. +Because the data is being collected from an active network, there will be differing activity depending on the time of day, week, and scholastic year. For example, although the first week or so of the year may contain a lot of traffic, this does not mean that trends of that period of time will occur for every week of the year (except perhaps the final week of the semester). The trends and habits of the network will change based on the time of year, time of day, and even depend on the exam schedule. A comprehensive examination requires looking at all different periods of time to see how all these factors play into the storage system utilization. % DataSeries Challenge -A complication of this process is that the DataSeries code makes use of a push-pop stack for iterating through packet information. This means that if information can not be re-read then errors occur. This can manifest in the scenario where a produced \texttt{.ds} file is corrupted or incomplete, despite the fact that the original \texttt{.pcap} being fine. 
+%A complication of this process is that the DataSeries code makes use of a push-pop stack for iterating through packet information. This means that if information can not be re-read then errors occur. This can manifest in the scenario where a produced \texttt{.ds} file is corrupted or incomplete, despite the fact that the original \texttt{.pcap} being fine. %This manifested as an approximate loss of \textbf{????} out of every 100,000 files. -Normally, one could simply re-perform the conversion process to a DataSeries file, but due to the rate of the packets being captured and security concerns of the data being captured, we are unable to re-run any captured information. +%Normally, one could simply re-perform the conversion process to a DataSeries file, but due to the rate of the packets being captured and security concerns of the data being captured, we are unable to re-run any captured information. \section{Conclusions and Future Work} -Our analysis of this university network filesystem illustrated the current implementation and use of the CIFS/SMB protocol in a large academic setting. We notice the effect of caches on the ability of the filesystem to limit the number of accesses to persistant storage. The effect of enterprise storage disks access time can be seen in the response time for read and write I/O. The majority of network communication is dominated by metadata operation, which is of less surprise since SMB is a known chatty protocol. We do notice that the CIFS/SMB protocol continues to be chatty with metadata I/O operations regardless of the version of SMB being implemented; $74.66$\% of I/O being metadata operations for SMB2. +Our analysis of this university network filesystem illustrated the current implementation and use of the CIFS/SMB protocol in a large academic setting. We notice the effect of caches on the ability of the filesystem to limit the number of accesses to persistent storage.
The effect of enterprise storage disk access times can be seen in the response time for read and write I/O. Metadata operations dominate the network communication, which is of little surprise since SMB is a known chatty protocol. We do notice that the CIFS/SMB protocol continues to be chatty with metadata I/O operations regardless of the version of SMB being implemented; $74.66$\% of I/O being metadata operations for SMB2. We also find that read and write transfer sizes are significantly smaller than would be expected and require further study as to their impact on current storage systems. %operations happen in greater number than write operations (at a ratio of 1.06) and the size of their transfers are is also greater by a factor of about 2. %However, the average write operation includes a larger number of relatively smaller writes. @@ -1053,7 +1081,7 @@ Examination of the return times for these different I/O operations shows that ex Our work finds that write and create response times can be modeled similarly, but that the read response times require the alteration of the general model. However, the general I/O can be modeled using the same standard, which has similar shape and scale to that of the write and create operations. -\subsection{Future Work} +%\subsection{Future Work} The analysis work will eventually incorporate oplocks and other aspects of resource sharing on the network to gain a more complete picture of the network's usage and bottlenecks. Network filesystem usage from an individual user scope has become simple and does not contain a great deal of read, write, and create operations. Further analysis will be made in examining how the determined metrics change when examined at the scope of a per share (i.e. TID) or per user (i.e. UID).
At this level of examination we will be able to obtain a better idea of how each share is interacted with, as well as how files and directories are shared and access control is implemented.
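The per-share (TID) and per-user (UID) breakdown described above amounts to grouping I/O records by those IDs; a sketch using an illustrative record schema (the field names are assumptions, not the trace format):

```python
from collections import defaultdict

def per_id_op_counts(records, key):
    """Count I/O operations grouped by an ID field (e.g. 'tid' or 'uid')."""
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        counts[rec[key]][rec["op"]] += 1
    return {k: dict(v) for k, v in counts.items()}

# Hypothetical trace records; real ones would carry TID/UID from SMB headers.
records = [
    {"tid": 1, "uid": 7, "op": "read"},
    {"tid": 1, "uid": 7, "op": "create"},
    {"tid": 2, "uid": 8, "op": "read"},
]
by_share = per_id_op_counts(records, "tid")
```

The same grouping, keyed on `"uid"`, gives the per-user view; the earlier per-class metrics (inter-arrival times, response times) can then be recomputed within each group.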