various edits
joc02012 committed Feb 3, 2020
1 parent dceb37c commit 37da015
Showing 1 changed file with 8 additions and 5 deletions.
13 changes: 8 additions & 5 deletions trackingPaper.tex
@@ -131,7 +131,7 @@ While file system traces have been well-studied in earlier work, it has been som
The purpose of this work is to continue previous SMB studies to better understand the use of the protocol in a real-world production system in use at the University of Connecticut.
The main contribution of our work is the exploration of I/O behavior in modern file system workloads as well as new examinations of the inter-arrival times and run times for I/O events.
We further investigate if the recent standard models for traffic remain accurate.
-Our findings reveal interesting data relating to the number of read and write events. We notice that the number of read events exceeds writes and that the average of bytes transferred over the wire is greater for reads as well. Furthermore we find an increase in the use of metadata for overall network communication that can be taken advantage of through the use of smart storage devices.
+Our findings reveal interesting data relating to the number of read and write events. We notice that the number of read and write events is significantly less than creates that the average of bytes transferred over the wire is much smaller than what has been seen in previous studies. Furthermore we find an increase in the use of metadata for overall network communication that can be taken advantage of through the use of smart storage devices.
\end{abstract}


\section{Introduction}
@@ -491,8 +491,9 @@ Figures~\ref{fig:CDF-Bytes-Read} and~\ref{fig:CDF-Bytes-Write} show cumulative d
This read data differs from the size of reads observed by Leung et al. by a factor of 4 smaller.
%This read data is similar to what was observed by Leung et al, however at an order of magnitude smaller.
Writes observed also differ from previous inspection of the protocol's usage. % are very different.
-Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size. In our data, however, we see that only $11.16$\% of writes are less than 4 bytes, $52.41$\% are 64 byte requests, and only $43.63$\% of requests are less than 64 byte writes.
+Leung et al. showed that $60$-$70$\% of writes were less than 4K in size and $90$\% less than 64K in size. In our data, however, we see that almost all writes are less than 1K in size. In fact, $11.16$\% of writes are less than 4 bytes, $52.41$\% are 64 byte requests, and $43.63$\% of requests are less than 64 byte writes.
-In the ten years since the last study, it is clear that writes have become significantly smaller. This may be explained by the fact that large files, and multiple files, are being written as standardized blocks more fitting to the frequent update of larger data-sets and disk space available. This could be as an effort to improve the fidelity of data across the network, allow for better realtime data consistency between client and backup locations, or could just be due to a large number of scripts being run that create and update a series of relatively smaller documents.
+In the ten years since the last study, it is clear that writes have become significantly smaller. In our analysis of a subset of the writes, we found that a significant part of the write profile was writes to cookies which are necessarily small files. The preponderance of web applications and the associated tracking is a major change in how computers and data storage is used compared to over 10 years ago. These small data reads and writes significantly alter the assumptions that most network storage systems are designed for.
+%This may be explained by the fact that large files, and multiple files, are being written as standardized blocks more fitting to the frequent update of larger data-sets and disk space available. This could be as an effort to improve the fidelity of data across the network, allow for better realtime data consistency between client and backup locations, or could just be due to a large number of scripts being run that create and update a series of relatively smaller documents.
%\textbf{Note: It seems like a change in the order of magnitude that is being passed per packet. What would this indicate?}\textcolor{red}{Answer the question. Shorter reads/writes = better?}


\begin{table}[]
@@ -978,8 +979,10 @@ Normally, one could simply re-perform the conversion process to a DataSeries fil


\section{Conclusions and Future Work}
Our analysis of this university network filesystem illustrated the current implementation and use of the CIFS/SMB protocol in a large academic setting. We notice the effect of caches on the ability of the filesystem to limit the number of accesses to persistant storage. The effect of enterprise storage disks access time can be seen in the response time for read and write I/O. The majority of network communication is dominated by metadata operation, which is of less surprise since SMB is a known chatty protocol. We do notice that the CIFS/SMB protocol continues to be chatty with metadata I/O operations regardless of the version of SMB being implemented; $74.66$\% of I/O being metadata operations for SMB2.
-We also find that read operations happen in greater number than write operations (at a ratio of 1.06) and the size of their transfers are is also greater by a factor of about 2.
+We also find that read and write transfer sizes are significantly smaller than would be expected and requires further study as to the impact on current storage systems.
-However, the average write operation includes a larger number of relatively smaller writes. Examination of the return times for these different I/O operations shows that exponential distribution curve fitting equation is most accurate at modeling the CDF of the various I/O operations. This shows that the current model is still effective for the majority of I/O, but that for read operations there needs to be further research in modeling their behavior.
+%operations happen in greater number than write operations (at a ratio of 1.06) and the size of their transfers are is also greater by a factor of about 2.
+%However, the average write operation includes a larger number of relatively smaller writes.
+Examination of the return times for these different I/O operations shows that exponential distribution curve fitting equation is most accurate at modeling the CDF of the various I/O operations. This shows that the current model is still effective for the majority of I/O, but that for read operations there needs to be further research in modeling their behavior.
%Our work finds that a single term Gaussian distribution has an $R^2$ value of $0.7797$, but further work needs to be made in order to refine the model.
Our work finds that write and create response times can be modeled similarly, but that the read response times require the alteration of the general model.
However, the general I/O can be modeled using the same standard; which has similar shape and scale to that of the write and create operations.