# paw10003 / TracingPaper

few more edits

@@ -870,7 +870,7 @@ \subsection{Distribution Models}
 For simulations and analytic modeling, it is often useful to have models that describe storage systems I/O behavior. In this section, we attempt to map traditional probabilistic distributions to the data that we have observed. Specifically, taking the developed CDF graphs, we perform curve fitting to determine the applicability of Gaussian and Weibull distributions to the the network filesystem I/O behavior. Note that an exponential distribution, typically used to model interarrival times and response times, is a special case of a Weibull distribution where $k=1$.
-Table~\ref{tbl:curveFitting} shows best-fit parametrized distributions for the measured data. % along with $R^2$ fitness values.
+Table~\ref{tbl:curveFitting} shows best-fit parametrized distributions for the measured data. The error bounds give an indication of how well the model fits the CDF. % along with $R^2$ fitness values.
 %Based on the collected IAT and RT data, the following are the best fit curve representation equations with supporting $R^{2}$ values. In the case of each, it was found that the equation used to model the I/O behavior was a Gaussian equation with a single term.
 %$$f(x) = a_1 * e^{-((x-b_1)/c_1)^2)}$$
@@ -934,9 +934,9 @@ \subsection{Distribution Models}
 %Examination of the Response Time (RT) and Inter Arrival Times (IAT) revealed the speed and frequency with which metadata operations are performed, as well as the infrequency of individual users and sessions to interact with a given share.
 %% NEED: Run the matlab curve fitting to complete this section of the writing
-Our comparison of the existing standard use of a exponential distribution to model network interarrival and response times is still valid. One should notice that the Gaussian distributions
+The curve-fitting data shows that the use of an exponential distribution to model network interarrival and response times is still valid.
+One should notice that the Gaussian distributions
 % had better $R^2$ result than the exponential equivalent for write operations. This is not surprising due to the step-function shape of the Figure~\ref{fig:CDF-RT-Write} CDF. Examining the $R^2$ results for the read + write I/O operations we find that the exponential distribution is far more accurate at modeling this combined behavior.
-for write and create operations are similar, while those for read operations are not. Further more there is less similarity between the modeled behavior of general operation inter arrival times and their response times, showing the need for a more refined model for each aspect of the network filesystem interactions.
+for write and create operations are similar, while those for read operations are not. Furthermore there is less similarity between the modeled behavior of general operation inter arrival times and their response times, showing the need for a more refined model for each aspect of the network filesystem interactions.
 One should also notice that the general operation model is more closely similar to that of the creates. This makes sense since the influence of create operations are found to dominate the I/O behavior of the network filesystem, which aligns well with the number of existing close operations.
 %improves the ability of a exponential distribution to model the combined behavior.}
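The changed text relies on the fact that an exponential distribution is the $k=1$ case of a Weibull distribution, and on fitting a parametrized curve to an observed CDF. The following is a minimal sketch of both ideas, not the authors' method: the comments in the diff point to MATLAB curve fitting, whereas this sketch substitutes a simple least-squares fit on the linearized Weibull CDF. The function names (`weibull_cdf`, `exponential_cdf`, `fit_weibull`) and the sample points are hypothetical and chosen only for illustration; no measured data from the paper is used.

```python
import math


def weibull_cdf(x, k, lam):
    """Weibull CDF F(x) = 1 - exp(-(x/lam)^k) for x >= 0 (shape k, scale lam)."""
    return 1.0 - math.exp(-((x / lam) ** k))


def exponential_cdf(x, rate):
    """Exponential CDF F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x)


def fit_weibull(xs, cdf_values):
    """Estimate (k, lam) from CDF samples via ordinary least squares.

    Uses the linearization ln(-ln(1 - F)) = k*ln(x) - k*ln(lam).
    This is an illustrative stand-in, not the fitting procedure used in the paper.
    """
    # Keep only points where both logarithms are defined.
    pts = [
        (math.log(x), math.log(-math.log(1.0 - f)))
        for x, f in zip(xs, cdf_values)
        if x > 0 and 0.0 < f < 1.0
    ]
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    k = sxy / sxx                      # slope is the shape parameter
    intercept = mean_v - k * mean_u    # intercept equals -k * ln(lam)
    lam = math.exp(-intercept / k)
    return k, lam


# Hypothetical, synthetic CDF samples generated from an exponential
# distribution with rate 0.5, i.e. a Weibull with k = 1 and lam = 2.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
cdf_values = [exponential_cdf(x, 0.5) for x in xs]
k_hat, lam_hat = fit_weibull(xs, cdf_values)
```

On these synthetic points the recovered shape parameter should be close to 1, which is the sense in which a Weibull fit subsumes the exponential model. With measured CDF data the linearized fit weights the tails differently from a direct nonlinear fit, so the parameters it returns may differ from those a curve-fitting toolbox would report.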