Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
Small word change
  • Loading branch information
Duncan committed Feb 3, 2020
1 parent c5e9953 commit 8ac7cf2
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion trackingPaper.tex
Expand Up @@ -321,7 +321,7 @@ The filesize used was in a ring buffer where each file captured was 64000 kB.

The \texttt{.pcap} files from \texttt{tshark} do not lend themselves to easy data analysis, so we translate these files into the DataSeries~\cite{DataSeries} format. HP developed DataSeries, an XML-based structured data format, that was designed to be self-descriptive, storage and access efficient, and highly flexible.
The system for taking captured \texttt{.pcap} files and writing them into the DataSeries format (i.e. \texttt{.ds}) does so by first creating a structure (based on a pre-written determination of the data desired to capture). Once the code builds this structure, it then reads through the capture traffic packets while dissecting and filling in the prepared structure with the desired information and format.
Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients and servers that interact over the network (i.e. I/O transactions). It should also be noted that all sensitive information being captured by the tracing system is hashed to protect the users whose information is examined by the tracing system. Furthermore, the DataSeries file retains only the first 512 bytes of the SMB packet - enough to capture the SMB header information that contains the I/O information we seek, while the body of the SMB traffic is not retained in order to better ensure security of the university's network communications. It is worth noting that in the case of larger SMB headers, some information is lost, but this is a trade-off by the university to provide, on average, the correct sized SMB header but does lead to scenarios where some information may be captured incompletely.
Due to the fundamental nature of this work, there is no need to track every piece of information that is exchanged, only that information which illuminates the behavior of the clients and servers that interact over the network (i.e. I/O transactions). It should also be noted that all sensitive information being captured by the tracing system is hashed to protect the users whose information is examined by the tracing system. Furthermore, the DataSeries file retains only the first 512 bytes of the SMB packet - enough to capture the SMB header information that contains the I/O information we seek, while the body of the SMB traffic is not retained in order to better ensure security of the university's network communications. It is worth noting that in the case of larger SMB headers, some information is lost, however this is a trade-off by the university to provide, on average, the correct sized SMB header but does lead to scenarios where some information may be captured incompletely.

\subsection{DataSeries Analysis}

Expand Down

0 comments on commit 8ac7cf2

Please sign in to comment.