Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
pvfs2-osd/doc/design/storage-interface.tex
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
777 lines (643 sloc)
24.3 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
% | |
% flow design | |
% | |
\documentclass[10pt]{article} % FORMAT CHANGE | |
\usepackage[dvips]{graphicx} | |
\usepackage{times} | |
\graphicspath{{./}{figs/}} | |
% | |
% GET THE MARGINS RIGHT, THE UGLY WAY | |
% | |
% \topmargin 0.2in | |
% \textwidth 6.5in | |
% \textheight 8.75in | |
% \columnsep 0.25in | |
% \oddsidemargin 0.0in | |
% \evensidemargin 0.0in | |
% \headsep 0.0in | |
% \headheight 0.0in | |
\pagestyle{plain} | |
\addtolength{\hoffset}{-2cm} | |
\addtolength{\textwidth}{4cm} | |
\addtolength{\voffset}{-1.5cm} | |
\addtolength{\textheight}{3cm} | |
\setlength{\parindent}{0pt} | |
\setlength{\parskip}{11pt} | |
\title{Trove: The PVFS2 Storage Interface} | |
\author{PVFS Development Team} | |
% \date{December 2000} | |
\begin{document} | |
\maketitle | |
\section{Motivation and Goals} | |
The Trove storage interface will be the lowest level interface used by the | |
PVFS server for storing both file data and metadata. It will be used by | |
individual servers (and servers only) to keep track of locally stored | |
information. There are several goals and ideas that we should keep in | |
mind when discussing this interface: | |
\begin{itemize} | |
\item \textbf{Multiple storage instances}: This interface is intended to | |
hide the use of multiple storage instances for storage of data. This | |
data can be roughly categorized into two types, bytestream and keyval | |
spaces, which are described in further detail below. | |
\item \textbf{Contiguous and noncontiguous data access}: The first cut | |
of this interface will probably only handle contiguous data access. | |
However, we would like to also support some form of noncontiguous access. We | |
think that this will be done through list I/O type operations, as we don't | |
necessarily want anything more complicated at this level. | |
\item \textbf{Metadata storage}: This | |
interface will be used as a building block for storing metadata in | |
addition to file data. This includes extended metadata. | |
\item \textbf{Nonblocking semantics}: This interface will be completely | |
nonblocking for both file data and metadata operations. The usual | |
argument for scalability and flexible interaction with other I/O | |
devices applies here. We should try to provide this | |
functionality without sacrificing latency if possible. | |
\emph{The interface will not require interface calls to be made in order | |
for progress to occur.} This implies that threads will be used | |
underneath where necessary. | |
\item \textbf{Compatibility with flows}: Flows will almost certainly be | |
built on top of this interface. Both the default BMI flow | |
implementation and custom implementations should be able to use this | |
interface. | |
\item \textbf{Consistency semantics}: If we are going to support | |
consistency, locking, etc, then we need to be able to enforce | |
consistency semantics at the storage interface level. The interface | |
will provide the option for serializing access to a dataspace and a vtag | |
interface. | |
\item \textbf{Error recovery}: The system must detect and report errors | |
occurring while accessing data storage. The system may or may not | |
implement redundancy, journaling, etc. for recovering from errors | |
resulting in data loss. | |
\end{itemize} | |
Our first cut implementation of this interface will have the following | |
restrictions: | |
\begin{itemize} | |
\item only one type of storage for bytestreams and one type for keyvals | |
will be supported | |
\item consistency semantics will not be implemented | |
\item errors will be reported, but no measures will be taken to recover | |
\item noncontiguous access will not be enabled | |
\item only one process/thread will be accessing a given storage instance | |
through this interface at a time | |
\end{itemize} | |
\emph{PARTIAL COMPLETION SEMANTICS NEED MUCH WORK!!!} | |
\section{Storage space concepts} | |
A server controls one storage space. | |
Within this storage space are some number of \emph{collections}, which | |
are akin to file systems. Collections serve as a mechanism for | |
supporting multiple traditional file systems on a single server and for | |
separating the use of various physical resources. (Collections can span | |
multiple underlying storage devices, and hints would be used in that case to | |
specify the device on which to place files. This concept might be used in | |
systems that can migrate data from slow storage to faster storage as well). | |
Two collections will be created for each file system: one collection will | |
support the dataspaces needed for the file system's data and metadata objects. | |
A second collection will be created for administrative purposes. If the | |
underlying implementation needs to perform disk i/o, for example, it can use | |
bstream and keyval objects from the administration collection. | |
A collection id will be used in conjunction with other parameters in | |
order to specify a unique entity on a server to access or modify, just as a | |
file system ID might be used. | |
% | |
% DATASPACE CONCEPTS | |
% | |
\section{Dataspace concepts} | |
This storage interface stores and accesses what we will call | |
\emph{dataspaces}. These are logical collections of data organized in one | |
of two possible ways. The first organization for a dataspace is the | |
traditional ``byte stream''. This term refers to arbitrary binary data | |
that can be referenced using offsets and sizes. The second organization | |
is ``keyword/value'' data. This term refers to information that is | |
accessed in a simplified database-like manner. The data is indexed by way of a | |
variable length key rather than an offset and size. Both keyword and | |
value are arbitrary byte arrays with a length parameter (i.e.~ need not | |
be readable strings). We will refer to a | |
dataspace organized as a byte stream as a bytestream dataspace or simply | |
a \emph{bytestream space}, and a dataspace organized by keyword/value | |
pairs as a keyval dataspace or \emph{keyval space}. | |
Each dataspace will have an identifier that is unique to its server, | |
which we will simply call a \emph{handle}. Physically these dataspaces | |
may be stored in any number of ways on underlying storage. | |
Here are some potential uses of each type: | |
\begin{itemize} | |
\item Byte stream | |
\begin{itemize} | |
\item traditional file data | |
\item binary metadata storage (as is currently done in PVFS 1) | |
\end{itemize} | |
\item Key/value | |
\begin{itemize} | |
\item extended metadata attributes | |
\item directory entries | |
\end{itemize} | |
\end{itemize} | |
% All storage interface level data will be stored in discrete \emph{objects}. | |
% Each object can be accessed using both key/value and byte stream | |
% methods. These two halves of the object are known as \emph{data | |
% spaces}. The same handle is used for each data space. The implementor | |
% may choose to access either data space depending on what data is needed. | |
In our design thus far (reference the system interface documents) we have | |
defined four types of \emph{system level objects}. These are data files, | |
metadata files, directories, and symlinks. All four of these will be | |
implemented using a combination of bytestream and/or keyval dataspaces. | |
At the storage interface level there is no real distinction between | |
different types of system level objects. | |
% | |
% VTAG CONCEPTS | |
% | |
\section{Vtag concepts} | |
Vtags are a capability that can be used to implement atomic updates in | |
shared storage systems. In this case they can be used to implement | |
atomic access to a set of shared storage devices through the storage | |
interface. To clarify, these would be of particular use when multiple | |
threads are using the storage interface to access local storage or when | |
multiple servers are accessing shared storage devices such as a MySQL | |
database or SAN storage. | |
This section can be skipped if you are not interested in consistency | |
semantics. Vtags will probably not be implemented in the first cut | |
anyway. | |
\subsection{Phil's poor explanation} | |
Vtags are an approach to ensuring consistency for multiple readers | |
and writers that avoids the use of locks and their associated problems | |
within a distributed environment. These problems include complexity, | |
poor performance in the general case, and awkward error recovery. | |
A vtag fundamentally provides a version number for any region of a byte | |
stream or any individual key/value pair. This allows the implementation | |
of an optimistic approach to consistency. Take the example of a | |
read-modify-write operation. The caller first reads a data region, | |
obtaining a version tag in the process. It then modifies it's own copy | |
of the data. When it writes the data back, it gives the vtag back to | |
the storage interface. The storage interface compares the given vtag against | |
the current vtag for the region. If the vtags match, it indicates that | |
the data has not been modified since it was read by the caller, and the | |
operation succeeds. If the vtags do not match, then the operation fails | |
and the caller must retry the operation. | |
This is an optimistic approach in that the caller always assumes that | |
the region has not been modified. | |
Many different locking primitives can be built upon the vtag concept... | |
\subsection{Use of vtags} | |
Layers above trove can take advantage of vtags as a way to simplify the | |
enforcement of consistency semantics (rather than keeping complicated lists of | |
concurrent operations, simply use the vtag facility to ensure that operations | |
occur atomically). Alternatively they could be used to handle the case of | |
trove resources shared by multiple upper layers. Finally they might be used | |
in conjunction with higher level consistency control in some complimentary | |
fashion (dunno yet...). | |
% | |
% STORAGE INTERFACE | |
% | |
\section{The storage interface} | |
In this section we describe all the functions that make up the storage | |
interface. The storage interface functions can be divided into four | |
categories: dataspace management functions, bytestream access functions, | |
keyval access functions, and completion test functions. The access | |
functions can be further subdivided into contiguous and noncontiguous | |
access capabilities. | |
First we describe the return values and error values for the interface. | |
Then we describe special vtag values and the implementation of keys. | |
Next we describe the dataspace management functions. Next we | |
describe the contiguous and noncontiguous dataspace access functions. | |
Finally we cover the completion test functions. | |
\subsection{Return values} | |
Unless otherwise noted, all functions return an integer with three | |
possible values: | |
\begin{itemize} | |
\item 0: Success. If the operation was nonblocking, then this return | |
value indicates the caller must test for completion later. | |
\item 1: Success with immediate completion. No later testing is required, and | |
no handle is returned for use in testing. | |
\item -errno: Failure. The error code is encoded in the negative return | |
value. | |
\end{itemize} | |
\subsection{Error values} | |
Table~\ref{table:storage_errors} shows values. All values will be returned as | |
integers in the native format (size and byte order). | |
\emph{Needs to be fleshed out. Need to pick a reasonable prefix.} | |
\emph{Phil: Once this is fleshed out, can we apply the same sort of | |
scheme to BMI? BMI doesn't have a particularly informative error | |
reporting mechanism.} | |
\emph{Rob: Definitely. I would really like to make sure that in | |
addition to getting error values back, the error values actually make | |
sense :). This was (and still is in some cases) a real problem for | |
PVFS1.} | |
\begin{table} | |
\caption{Error values for storage interface} | |
\begin{center} | |
\begin{tabular}{|l|l|} | |
\hline | |
Value & Meaning \\ | |
\hline | |
TROVE\_ENOENT & no such dataspace \\ | |
TROVE\_EIO & I/O error \\ | |
TROVE\_ENOSPC & no space on storage device \\ | |
TROVE\_EVTAG & vtag didn't match \\ | |
TROVE\_ENOMEM & unable to allocate memory for operation \\ | |
TROVE\_EINVAL & invalid input parameter \\ | |
\hline | |
\end{tabular} | |
\end{center} | |
\label{table:storage_errors} | |
\end{table} | |
\subsection{Flags related to vtags} | |
As mentioned earlier, the usage of vtags is not manditory. Therefore we | |
define two flags values that can be used to control the behavior of the calls | |
with respect to vtags: | |
\emph{TODO: pick a reasonable prefix for our flags.} | |
\begin{itemize} | |
\item \textbf{FLAG\_VTAG}: Indicates that the vtag is valid. | |
The caller does not have a valid vtag for input, nor does he desire a | |
valid vtag in response. | |
\item \textbf{FLAG\_VTAG\_RETURN}: Indicates that the caller wishes to obtain | |
a vtag from the operation. However, the caller does not wish to use a | |
vtag for input. | |
\end{itemize} | |
By default calls ignore vtag values on input and do not create vtag values for output. | |
\subsection {Implementation of keys, values, and hints} | |
\emph{TODO: sync. with code on data\_sz element.} | |
\begin{verbatim} | |
struct TROVE_keyval { | |
void * buffer; | |
int32_t buffer_sz; | |
int32_t data_sz; | |
}; | |
typedef struct TROVE_keyval TROVE_keyval_s; | |
\end{verbatim} | |
Keys, values, and hints are all implemented with the same TROVE\_keyval | |
structure (do we want a different name?), shown above. Keys and values used | |
in keyval spaces are arbitrary binary data values with an associated length. | |
Hint keys and values have the additional constraint of being null-terminated, | |
readable strings. This makes them very similar to MPI\_Info key/value pairs. | |
\emph{TODO: we should build hints out of a pair of the TROVE\_keyvals. We'll | |
call them a TROVE\_hint\_s in here for now.} | |
\subsection{Functions} | |
\emph{Note: need to add valid error values for each function.} | |
\emph{TODO: find a better format for function descriptions.} | |
\subsubsection{IDs} | |
In this context, IDs are unique identifiers assigned to each storage interface | |
operation. They are used as handles to test for completion of operations once | |
they have been submitted. If an operation completes immediately, then the ID | |
field should be ignored. | |
These IDs are only unique in the context of the storage interface, so upper | |
layers may have to handle management of multiple ID spaces (if working with | |
both a storage interface and a network interface, for instance). | |
The type for these IDs is TROVE\_op\_id. | |
\subsubsection{User pointers} | |
Each function allows the user to pass in a pointer value (void *). This value | |
is returned by the test functions, and it allows for quick reference to user | |
data structures associated with the completed operation. | |
To motivate, normally there is some data at the caller's level that | |
corresponds with the trove operation. Without some help, the caller would | |
have to map IDs for completed operations back to the caller data structures | |
manually. By providing a parameter that the caller can pass in, they can | |
directly reference these structures on trove operation completion. | |
% | |
% DATASPACE MANAGEMENT | |
% | |
\subsubsection{Dataspace management} | |
\begin{itemize} | |
\item \textbf{ds\_create( | |
[in]coll\_id, | |
[in/out]handle, | |
[in]bitmask, | |
[in]type, | |
[in/out]hint, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Creates a new storage interface object. The interface will | |
fill any any portion of the handle that is not already filled in and | |
ensure that it is unique. For example, if the caller wants to | |
specify the first 16 bits of the handle, it may do so by setting the | |
appropriate bits and then specifying with the bitmask that the storage | |
interface should not modify those bits. | |
The type field can be used by the caller to assign an arbitrary integer | |
type to the object. This may, for example, be used to distinguish | |
between directories, symlinks, datafiles, and metadata files. The storage | |
interface does not assign any meaning to the type value. \emph{Do we even need | |
this type field?} | |
The hint field may be used to specify what type of underlying storage | |
should be used for this dataspace in the case where multiple potential | |
underlying storage methods are available. | |
\item \textbf{ds\_remove( | |
[in]handle, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Removes an existing object from the system. | |
\item \textbf{ds\_verify( | |
[in]coll\_id, | |
[in]handle, | |
[out]type, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Verifies that an object exists with the specified handle. If the object | |
does exist, then the type of the object is also returned. Useful for | |
verifying sanity of handles provided by client. | |
\item \textbf{ds\_getattr( | |
[in]coll\_id, | |
[in]handle, | |
[out]ds\_attr, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Obtains statistics about the given dataspace that aren't actually stored | |
within the dataspace. This may include information such as number of | |
key/value pairs, size of byte stream, access statistics, on what medium | |
it is stored, etc. | |
\item \textbf{ds\_setattr() ???} | |
\item \textbf{ds\_hint( | |
[in]coll\_id, | |
[in]handle, | |
[in/out]hint | |
);}: | |
Passes a hint to the underlying trove implementation. Used to indicate | |
caching needs, access patterns, begin/end of use, etc. | |
\item \textbf{ds\_migrate( | |
[in]coll\_id, | |
[in]handle, | |
[in/out]hint, | |
[in]user\_ptr, | |
[out]id | |
);}: | |
Used to indicate that a dataspace should be migrated to another medium. | |
\emph{could this be done with just the hint call? having an id in this case | |
is particularly useful ... so we know the operation is completed...} | |
\end{itemize} | |
% | |
% BYTESTREAM ACCESS | |
% | |
\subsubsection{Byte stream access} | |
Parameters in read and write at calls are ordered similarly to pread and pwrite. | |
\begin{itemize} | |
\item \textbf{bstream\_read\_at( | |
[in]coll\_id, | |
[in]handle, | |
[in]buffer, | |
[in]size, | |
[in]offset, | |
[in]flags, | |
[out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Reads a contiguous region from bytestream. Most of the arguments are self | |
explanatory. The flags are not yet defined, but may include such | |
possibilities as specifying atomic operations. The vtag returned from this | |
function applies to the region of the byte stream defined by the requested | |
offset and size. \underline{A flag can be passed in if the caller does not | |
want a vtag returned.} This allows the underlying implementation to avoid the | |
overhead of calculating the value. | |
\emph{The size is [in/out] in code? Figure out semantics!!!} | |
\item \textbf{bstream\_write\_at( | |
[in]coll\_id, | |
[in]handle, | |
[in]buffer, | |
[in]size, | |
[in]offset, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Writes a contiguous region to the bytestream. Same arguments as | |
read\_bytestream, except that the vtag is an in/out parameter. | |
\emph{The size is [in/out] in code? Figure out semantics!!!} | |
\item \textbf{bstream\_resize( | |
[in]coll\_id, | |
[in]handle, | |
[in]size, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Used to truncate or allocate storage for a bytestream. Flags are used | |
to specify if preallocation is desired. | |
\item \textbf{bstream\_validate( | |
[in]handle, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
This function may be used to check for modification of a particular | |
bytestream. | |
\emph{Flags?} | |
\end{itemize} | |
% | |
% KEYVAL ACCESS | |
% | |
\subsubsection{Key/value access} | |
An important call for keyval spaces is the iterator function. The | |
iterator function is used to obtain all keyword/value pairs from the | |
keyval space with a sequence of calls from the client. The iterator | |
function returns a logical, opaque ``position'' value that allows a | |
client to continue reading pairs from the keyval space where it last | |
left off. | |
\begin{itemize} | |
\item \textbf{keyval\_read( | |
[in]coll\_id, | |
[in]handle, | |
[in]key, | |
[out]val, | |
[in]flags, | |
[out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Reads the value corresponding to a given key. Fails if the key does | |
not exist. A buffer is provided for the value to be placed in (the | |
value may be an arbitrary type). | |
The amount of data actually placed in the value buffer should be indicated by | |
the data\_sz element of the structure. | |
\item \textbf{keyval\_write( | |
[in]coll\_id, | |
[in]handle, | |
[in]key, | |
[in]val, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Writes out a value for a given key. If the key does not exist, it is | |
added. | |
\item \textbf{keyval\_remove( | |
[in]coll\_id, | |
[in]handle, | |
[in]key, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Removes a key/value pair from the keyval data space. | |
\item \textbf{keyval\_validate( | |
[in]coll\_id, | |
[in]handle, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Used to check for modification of a particular key/value pair. | |
\item \textbf{keyval\_iterate( | |
[in]coll\_id, | |
[in]handle, | |
[in/out]position, | |
[out]key\_array, | |
[out]val\_array, | |
[in/out]count, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Reads count keyword/value pairs from the provided logical position in | |
the keyval space. Fails if the vtag doesn't match. The position | |
SI\_START\_POSITION is used to start at the beginning, and a new | |
position is returned allowing the caller to continue where they left | |
off. | |
keyval\_iterate will always read \emph{count} items, unless it hits the end | |
of the keyval space (EOK). After hitting EOK, \emph{count} will be set to | |
the number of pairs processed. Thus, callers must compare \emph{count} | |
after calling and compare with the value it had before the function call: | |
if they are different, EOK has been reached. If there are N items left in | |
the keyspace, and keyval\_iterate requests N items, there will be no | |
indication that EOK has been reached and only after making another call | |
will the caller know he is at EOK. The value of \emph{position} is not | |
meaningful after reaching EOK. | |
\item \textbf{keyval\_iterate\_keys( | |
[in]coll\_id, | |
[in]handle, | |
[in/out]position, | |
[out]key\_array, | |
[in]count, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
Similar to above, but only returns keys, not corresponding values. \emph{need | |
to fix parameters} | |
\end{itemize} | |
\subsubsection{Noncontiguous (list) access} | |
These functions are used to read noncontiguous byte stream regions or | |
multiple key/value pairs. | |
\emph{How do vtags work with noncontiguous calls?} | |
The byte stream functions will implement simple listio style | |
noncontiguous access. Any more advanced data types should be unrolled | |
into flat regions before reaching this interface. The process for unrolling | |
is outside the scope of this document, but examples are available in the ROMIO code. | |
\emph{TODO: SEMANTICS!!!!!} | |
\emph{TODO: how to we report partial success for listio calls?} | |
\begin{itemize} | |
\item \textbf{bstream\_read\_list( | |
[in]coll\_id, | |
[in]handle, | |
[in]mem\_offset\_array, | |
[in]mem\_size\_array, | |
[in]mem\_count, | |
[in]stream\_offset\_array, | |
[in]stream\_size\_array, | |
[in]stream\_count, | |
[in]flags, | |
[out]vtag(?), | |
[in]user\_ptr, | |
[out]id | |
)}: | |
\item \textbf{bstream\_write\_list( | |
[in]coll\_id, | |
[in]handle, | |
[in]mem\_offset\_array, | |
[in]mem\_size\_array, | |
[in]mem\_count, | |
[in]stream\_offset\_array, | |
[in]stream\_size\_array, | |
[in]stream\_count, | |
[in]flags, | |
[in/out]vtag(?), | |
[in]user\_ptr, | |
[out]id | |
)}: | |
\item \textbf{keyval\_read\_list( | |
[in]coll\_id, | |
[in]handle, | |
[in]key\_array, | |
[in]value\_array, | |
[in]count, | |
[in]flags, | |
[out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
\item \textbf{keyval\_write\_list( | |
[in]coll\_id, | |
[in]handle, | |
[in]key\_array, | |
[in]value\_array, | |
[in]count, | |
[in]flags, | |
[in/out]vtag, | |
[in]user\_ptr, | |
[out]id | |
)}: | |
\end{itemize} | |
\subsubsection{Testing for completion} | |
\emph{Do we need coll\_ids here?} | |
\begin{itemize} | |
\item \textbf{test( | |
[in]coll\_id, | |
[in]id, | |
[out]count, | |
[out]vtag, | |
[out]user\_ptr, | |
[out]state | |
)}: | |
Tests for completion of a storage interface operation. The count field | |
indicates how many operations completed (in this case either 1 or 0). | |
If an operation completes, then the final status of the operation should | |
be checked using the state parameter. Note the vtag output argument | |
here; it is used to provide vtags for operations that did not complete | |
immediately. | |
% \item \textbf{testglobal( | |
% [in]coll\_id, | |
% [in/out]id\_array, | |
% [in/out]count, | |
% [out]vtag\_array, | |
% [out]user\_ptr\_array, | |
% [out]state\_array | |
% )}: | |
% Tests for completion of any storage interface operation(s). The | |
% id\_array is filled in with some number of completed operation ids. The | |
% number must not exceed the input value of count, and count is set to the | |
% number of returned ids on exit. \emph{this name is nonintuitive to me}. | |
\item \textbf{testsome( | |
[in]coll\_id, | |
[in/out]id\_array, | |
[in/out]count, | |
[out]vtag\_array, | |
[out]user\_ptr\_array, | |
[out]state\_array | |
)}: | |
Tests for completion of one or more trove operations. The id\_array lists | |
operations to test on. A value of TROVE\_OP\_ID\_NULL will be ignored. Count is | |
set to the number of completed items on return. | |
\emph{TODO: fix up semantics for testsome; look at MPI functions for ideas.} | |
\emph{wait function for testing purposes if nothing else?} | |
\end{itemize} | |
\emph{Note: need to discuss completion queue, internal or external?} | |
\emph{Phil: See pvfs2-internal email at \\ | |
http://beowulf-underground.org/pipermail/pvfs2-internal/2001-October/000010.html | |
for my thoughts on this topic.} | |
\subsubsection{Batch operations} | |
Batch operations are used to perform a sequence of operations possibly as an | |
atomic whole. These will be handled at a higher level. | |
\section{Optimizations} | |
This section lists some potential optimizations that might be applied at this | |
layer or that are related to this layer. | |
\subsection{Metadata Stuffing} | |
In many file systems ``inode stuffing'' is used to store the data for small | |
files in the space used to store pointers to indirect blocks. The analogous | |
approach for PVFS2 would be to store the data for small files in the | |
bytestream space associated with the metafile. | |
\end{document} | |