%
% server design
%
\documentclass[11pt]{article}
\usepackage[dvips]{graphicx}
\usepackage{times}
\graphicspath{{./}{figs/}}
\pagestyle{plain}
\addtolength{\hoffset}{-2cm}
\addtolength{\textwidth}{4cm}
\addtolength{\voffset}{-1.5cm}
\addtolength{\textheight}{3cm}
\setlength{\parindent}{0pt}
\setlength{\parskip}{11pt}
\title{PVFS2 Distribution Design Notes}
\author{PVFS Development Team}
\date{May 2004}
\begin{document}
\maketitle
\section{Introduction}
This document is intended to serve as a reference for the design of the
PVFS2 file distributions. This should (eventually) include a description
of the mechanism and a guide on developing new distribution methods.

Distributions in PVFS are a mapping from a logical sequence of bytes
to a physical sequence of bytes on each of several I/O servers. To
be of use to PVFS system code, this mapping is expressed as a set of
methods.

Files in PVFS appear as a linear sequence of bytes. A specific byte
in a file is identified by its offset from the start of this sequence.
This is referred to here as a \emph{logical offset}. A contiguous
sequence of bytes can be specified with a logical offset and an extent.

Requests for access to file data can be sent to PVFS servers using various
request formats. Regardless of the format, the same data request is
sent to all PVFS servers that store part of the requested data. These
formats must be decoded to produce a series of contiguous sequences of
bytes, each with a logical offset and extent.

PVFS servers store some part of the logical byte sequence of each file
in a linear sequence of bytes, or byte stream, within a data space
associated with the file.
Bytes within this byte stream are identified by their offset from the
start of the byte stream, referred to here as a \emph{physical offset}.

On the server, the PVFS distribution methods are used to determine which
portion of the requested data is stored on the server, and where in
the associated byte stream the data is stored.
\section{System Interface Distributions}
PVFS2 users should be able to utilize distributions effectively through
the system interface. APIs are exposed that allow users to create files
with a user-specified distribution. If no distribution is
specified (i.e. the NULL distribution is specified), the default distribution,
simple stripe, is used. The system interface must be initialized before
distributions may be accessed.

The external distribution API is exposed to users via the following data types
and functions:
\begin{verbatim}
struct PVFS_sys_dist;
\end{verbatim}
The system interface distribution structure. It contains the distribution
identifier (i.e. the name) and a pointer to an instance of the distribution
parameters for this type of distribution. In general, the user should not
modify the data within this struct.
\begin{verbatim}
int PVFS_sys_create( char* entry_name,
                     PVFS_object_ref ref,
                     PVFS_sys_attr attr,
                     PVFS_credentials credentials,
                     PVFS_sys_dist* dist,
                     PVFS_sysresp_create* resp );
\end{verbatim}
Creates a file using the specified distribution. If no distribution is
specified, the default distribution \emph{simple\_stripe} is used during
creation. The distribution used during file creation is stored with the
file and may not be changed later. Altering the distribution used to
store the file contents could result in data corruption.
\begin{verbatim}
PVFS_sys_dist* PVFS_sys_dist_lookup( const char* name );
\end{verbatim}
Allocates a new distribution instance by copying the internal distribution
registered for the supplied name. Note that the internal distribution has
additional data not exposed through the system interface, but that should be
fully configurable through the distribution parameters.
\begin{verbatim}
int PVFS_sys_dist_free( PVFS_sys_dist* dist );
\end{verbatim}
Deallocates all system interface resources allocated during distribution
lookup.
\begin{verbatim}
int PVFS_sys_dist_setparam( PVFS_sys_dist* dist,
                            const char* param,
                            void* value );
\end{verbatim}
Sets the distribution parameter specified by the string \emph{param} to
\emph{value}. The strings used to specify parameters are distribution-defined
but should generally correspond to the field name in the distribution's
parameter struct. All parameters must be set before the distribution is used
in file creation. Once a file is created, there is no safe way to modify
the distribution parameters for that file.
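
The descriptions above suggest a fixed order of calls: look up a distribution
by name, set its parameters, create the file, and then free the distribution.
The sketch below walks through that order. It does not link against PVFS2;
the three system interface functions are replaced by small stand-ins with the
signatures listed above so that the listing compiles on its own. The layout of
\emph{PVFS\_sys\_dist}, the parameter name \emph{strip\_size}, the 64-bit
parameter type, the default value, and the convention that 0 means success are
all assumptions made for illustration.
\begin{verbatim}
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so the sketch compiles without the PVFS2 headers.
 * The struct layout and the "strip_size" parameter are assumptions. */
struct example_params { int64_t strip_size; };
typedef struct PVFS_sys_dist { const char* name; void* params; } PVFS_sys_dist;

PVFS_sys_dist* PVFS_sys_dist_lookup( const char* name )
{
    if (strcmp(name, "simple_stripe") != 0)
        return NULL;                      /* unknown distribution name */
    PVFS_sys_dist* d = malloc(sizeof *d);
    struct example_params* p = malloc(sizeof *p);
    if (d == NULL || p == NULL) { free(d); free(p); return NULL; }
    p->strip_size = 65536;                /* assumed default value */
    d->name = "simple_stripe";
    d->params = p;
    return d;
}

int PVFS_sys_dist_setparam( PVFS_sys_dist* dist,
                            const char* param,
                            void* value )
{
    if (strcmp(param, "strip_size") != 0)
        return -1;                        /* unknown parameter name */
    ((struct example_params*)dist->params)->strip_size = *(int64_t*)value;
    return 0;
}

int PVFS_sys_dist_free( PVFS_sys_dist* dist )
{
    free(dist->params);
    free(dist);
    return 0;
}

/* Hypothetical helper: look up the distribution and configure it.
 * The caller passes the result to PVFS_sys_create() and afterwards
 * releases it with PVFS_sys_dist_free(). */
PVFS_sys_dist* example_prepare_dist( int64_t strip_size )
{
    PVFS_sys_dist* dist = PVFS_sys_dist_lookup("simple_stripe");
    if (dist == NULL)
        return NULL;
    if (PVFS_sys_dist_setparam(dist, "strip_size", &strip_size) != 0) {
        PVFS_sys_dist_free(dist);
        return NULL;
    }
    return dist;
}
\end{verbatim}
Setting every parameter before the create call matters because, as noted
above, the parameters cannot safely be changed once the file exists.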
\section{Distribution Initialization}
All distributions are registered during PVFS2 initialization. Although there
has been some discussion about having distributions function as loadable
modules, there is currently no support for that feature within PVFS2. All
available distributions are loaded into a registration table during
initialization and registered with the distribution name as the key. When a
user later wishes to create a distribution, a lookup can be performed
with the distribution name, and a copy of the registered distribution is
returned. The registered distribution itself is never actually modified after
registration. The only opportunity to modify the registered distribution is
during the registration itself. Each distribution implements a callback
method named \emph{registration\_init} that is called during registration. The
function signature is described completely below; for now we merely want to
note that this function is called exactly once (at registration time), and
it is generally used by distributions to set up the distribution parameter
strings (for use in PVFS\_sys\_dist\_setparam), and to set default parameter
values.

Distribution initialization is performed by the function
PINT\_dist\_initialize() in pint-dist-utils.h. In order to add a new
distribution to the table of registered distributions, it will be necessary to
modify this function.
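
The register-once, copy-on-lookup behaviour described in this section can be
sketched with a small table keyed by name. This is a toy model, not the PVFS2
registration code: \emph{example\_dist}, \emph{example\_register},
\emph{example\_lookup}, \emph{example\_initialize}, the table size, and the
default strip size are all hypothetical.
\begin{verbatim}
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_TABLE_MAX 8

struct example_params { int64_t strip_size; };

struct example_dist {
    const char* name;
    struct example_params params;
    void (*registration_init)( void* params );
};

static struct example_dist example_table[EXAMPLE_TABLE_MAX];
static int example_table_count = 0;

/* Callback run once at registration time; sets an assumed default. */
static void example_stripe_init( void* params )
{
    ((struct example_params*)params)->strip_size = 65536;
}

/* Copy the distribution into the table and run its callback once. */
int example_register( const struct example_dist* d )
{
    if (example_table_count >= EXAMPLE_TABLE_MAX)
        return -1;
    struct example_dist* slot = &example_table[example_table_count++];
    *slot = *d;
    if (slot->registration_init != NULL)
        slot->registration_init(&slot->params);
    return 0;
}

/* Return a heap-allocated copy, so callers never touch the table entry. */
struct example_dist* example_lookup( const char* name )
{
    for (int i = 0; i < example_table_count; i++) {
        if (strcmp(example_table[i].name, name) == 0) {
            struct example_dist* copy = malloc(sizeof *copy);
            if (copy != NULL)
                *copy = example_table[i];
            return copy;
        }
    }
    return NULL;
}

/* Adding a distribution means adding one more registration here. */
int example_initialize( void )
{
    struct example_dist stripe = { "simple_stripe", { 0 },
                                   example_stripe_init };
    return example_register(&stripe);
}
\end{verbatim}
Because each lookup returns a copy, a caller that changes the strip size on
its own instance leaves the registered defaults untouched for later lookups.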
\section{Internal Distribution Representation}
PVFS2 distributions are internally represented with the struct PINT\_dist.
This structure contains a pointer to the distribution name, methods,
parameters and various sizes. The internal distributions are used on both the
clients and the metadata server, as well as being stored physically with the
file metadata.
When a user creates a file, the system distribution supplied, or the default
distribution, is exchanged for a corresponding PINT\_dist structure. It is this
structure that will be used for any further operations performed on the file
and stored in the metadata for the file.

The client and server both use the distribution methods to fulfill the request
from the client to the server to locate a specific byte range in a specific
file. All this processing is performed within the PINT request for the file
and byte range. The main difference between client and server processing is
the way segments are built: on the client, segments represent the distribution
of data from the various servers, not the distribution of data on a single
server.

Distribution parameters are defined in the exported header for the
distribution (e.g. for the simple stripe distribution, the header file is
pvfs2-dist-simple-stripe.h). The distribution methods are usually defined in
a corresponding implementation file in the io/description subsystem (e.g. the
simple stripe implementation is in io/description/dist-simple-stripe.c).
The methods defined for each distribution allow it to completely specify how
the file data is mapped to the PVFS2 disk abstraction, the data file object.
The one possible exception to this is that distributions cannot currently
assert their preference in how data file objects are mapped to data servers.
This is planned in the near future; however, there is no current consensus on
how to improve upon the current round robin mapping approach (see
PINT\_bucket\_get\_next\_io).
\section{Distribution Parameters}
The parameters for each distribution are defined in a struct defined
specifically for the distribution, and an individual instance of the
parameters is stored in the metadata of every file.
Both the PVFS\_sys\_dist and PINT\_dist data structures maintain a pointer to
the same distribution parameters. The parameters are passed into every call to
distribution code so that the distribution can modify its behavior as necessary.
The distribution provider can also provide a method for setting the
distribution parameters explicitly as described in the distribution methods
below.
\section{Distribution Methods}
The distribution methods are the individual functions used by each distribution to
perform mappings between the logical file data and the data file objects. The
methods also provide a mechanism for encoding/decoding the distribution
parameters, determining the number of data file objects to create for a file,
modifying distribution parameters, and distribution registration tasks. For
some of the methods a default implementation is available that may be
acceptable for most distributions.
\begin{verbatim}
PVFS_offset logical_to_physical_offset( void* params,
                                        uint32_t dfile_nr,
                                        uint32_t dfile_ct,
                                        PVFS_offset logical_offset );
\end{verbatim}
Given a logical offset, return the physical offset that corresponds to
that logical offset. Returns a physical offset. The return value rounds
down to the largest physical offset held by the I/O server if the
logical offset does not map to a physical offset on that server.
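
As a hedged illustration of this mapping, the sketch below implements
\emph{logical\_to\_physical\_offset} for a plain round-robin striping scheme.
It is not the PVFS2 source: the \emph{example\_stripe\_params} struct, its
\emph{strip\_size} field, and the 64-bit typedefs are assumptions made so the
listing compiles on its own. For a logical offset that falls in another data
file's strip, this sketch returns the number of bytes the data file holds
below that offset.
\begin{verbatim}
#include <stdint.h>

/* Assumed typedefs and parameter struct; illustration only. */
typedef int64_t PVFS_offset;
typedef int64_t PVFS_size;
struct example_stripe_params { PVFS_size strip_size; };

PVFS_offset logical_to_physical_offset( void* params,
                                        uint32_t dfile_nr,
                                        uint32_t dfile_ct,
                                        PVFS_offset logical_offset )
{
    const struct example_stripe_params* p = params;
    PVFS_size strip = p->strip_size;
    PVFS_size stripe = strip * dfile_ct;    /* one strip per data file */
    PVFS_offset full_stripes = logical_offset / stripe;
    PVFS_offset leftover = logical_offset - full_stripes * stripe;
    PVFS_offset my_start = (PVFS_offset)dfile_nr * strip;
    PVFS_offset physical = full_stripes * strip;

    if (leftover >= my_start + strip)
        physical += strip;                  /* our strip lies wholly below */
    else if (leftover >= my_start)
        physical += leftover - my_start;    /* offset is inside our strip */
    return physical;
}
\end{verbatim}
With a strip size of 4 bytes and three data files, data file 1 holds logical
bytes 4 to 7 at physical offsets 0 to 3, so logical offset 5 maps to physical
offset 1.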
\begin{verbatim}
PVFS_offset physical_to_logical_offset( void* params,
                                        uint32_t dfile_nr,
                                        uint32_t dfile_ct,
                                        PVFS_offset physical_offset );
\end{verbatim}
Given a physical offset, return the logical offset that corresponds to
that physical offset. Returns a logical offset. The input value is
assumed to be on the current PVFS server.
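
The inverse mapping can be sketched the same way for round-robin striping.
The parameter struct, its \emph{strip\_size} field, and the typedefs are again
assumptions for illustration rather than the PVFS2 definitions.
\begin{verbatim}
#include <stdint.h>

/* Assumed typedefs and parameter struct; illustration only. */
typedef int64_t PVFS_offset;
typedef int64_t PVFS_size;
struct example_stripe_params { PVFS_size strip_size; };

PVFS_offset physical_to_logical_offset( void* params,
                                        uint32_t dfile_nr,
                                        uint32_t dfile_ct,
                                        PVFS_offset physical_offset )
{
    const struct example_stripe_params* p = params;
    PVFS_size strip = p->strip_size;
    PVFS_offset strip_nr = physical_offset / strip;  /* which local strip */
    PVFS_offset within = physical_offset % strip;    /* offset inside it  */

    /* Local strip n sits in stripe n, after dfile_nr earlier strips. */
    return (strip_nr * dfile_ct + dfile_nr) * strip + within;
}
\end{verbatim}
With a strip size of 4 bytes and three data files, physical offset 5 on data
file 1 lies one byte into its second strip, which starts at logical offset 16,
so the result is 17.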
\begin{verbatim}
PVFS_offset next_mapped_offset( void* params,
                                uint32_t dfile_nr,
                                uint32_t dfile_ct,
                                PVFS_offset logical_offset );
\end{verbatim}
Given a logical offset, find the smallest logical offset greater than or equal
to the given logical offset that maps to a physical offset on the current
PVFS server. Returns a logical offset.
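
For round-robin striping, the search reduces to three cases within the stripe
that contains the offset. The sketch below shows one way to write it; the
parameter struct and typedefs are assumptions for illustration.
\begin{verbatim}
#include <stdint.h>

/* Assumed typedefs and parameter struct; illustration only. */
typedef int64_t PVFS_offset;
typedef int64_t PVFS_size;
struct example_stripe_params { PVFS_size strip_size; };

PVFS_offset next_mapped_offset( void* params,
                                uint32_t dfile_nr,
                                uint32_t dfile_ct,
                                PVFS_offset logical_offset )
{
    const struct example_stripe_params* p = params;
    PVFS_size strip = p->strip_size;
    PVFS_size stripe = strip * dfile_ct;
    PVFS_offset stripe_start = (logical_offset / stripe) * stripe;
    PVFS_offset my_start = stripe_start + (PVFS_offset)dfile_nr * strip;

    if (logical_offset < my_start)
        return my_start;            /* our strip is still ahead */
    if (logical_offset < my_start + strip)
        return logical_offset;      /* already inside our strip */
    return my_start + stripe;       /* our strip in the next stripe */
}
\end{verbatim}
With a strip size of 4 bytes and three data files, logical offset 8 belongs to
data file 2, so for data file 1 the next mapped offset is 16, the start of its
strip in the following stripe.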
\begin{verbatim}
PVFS_size contiguous_length( void* params,
                             uint32_t dfile_nr,
                             uint32_t dfile_ct,
                             PVFS_offset physical_offset );
\end{verbatim}
Beginning at a given physical offset, return the number of contiguous
bytes in the physical byte stream on the current PVFS server that map
to contiguous bytes in the logical byte sequence. Returns a length in bytes.
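
Under round-robin striping, a run of logically contiguous bytes on one data
file ends at the next strip boundary, so the length is simply the remainder of
the current strip. The sketch below assumes a hypothetical parameter struct
with a single \emph{strip\_size} field.
\begin{verbatim}
#include <stdint.h>

/* Assumed typedefs and parameter struct; illustration only. */
typedef int64_t PVFS_offset;
typedef int64_t PVFS_size;
struct example_stripe_params { PVFS_size strip_size; };

PVFS_size contiguous_length( void* params,
                             uint32_t dfile_nr,
                             uint32_t dfile_ct,
                             PVFS_offset physical_offset )
{
    const struct example_stripe_params* p = params;
    (void)dfile_nr;     /* every data file has the same strip layout */
    (void)dfile_ct;
    return p->strip_size - (physical_offset % p->strip_size);
}
\end{verbatim}
With a strip size of 4 bytes, physical offset 5 is one byte into a strip,
which leaves 3 contiguous bytes.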
\begin{verbatim}
int get_num_dfiles( void* params,
                    uint32_t num_servers_requested,
                    uint32_t num_dfiles_requested );
\end{verbatim}
Returns the number of data file objects to use for the requested file. The
number of servers requested and number of data files requested are hints from
the user that the distribution can ignore if necessary. A default
implementation of this function is provided in pint-dist-utils.h that returns
the number of servers requested (which is usually the number of data servers
in the system).
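
The following sketch contrasts the default behaviour described above with a
variant that honours the data file hint. Both function names are hypothetical,
and the second policy is only one possible way a distribution might treat the
hints.
\begin{verbatim}
#include <stdint.h>

/* Mirrors the described default: one data file per requested server. */
int example_default_get_num_dfiles( void* params,
                                    uint32_t num_servers_requested,
                                    uint32_t num_dfiles_requested )
{
    (void)params;
    (void)num_dfiles_requested;     /* hint ignored */
    return (int)num_servers_requested;
}

/* Hypothetical variant: use the data file hint when it is set and
 * does not exceed the number of servers requested. */
int example_hinted_get_num_dfiles( void* params,
                                   uint32_t num_servers_requested,
                                   uint32_t num_dfiles_requested )
{
    (void)params;
    if (num_dfiles_requested > 0 &&
        num_dfiles_requested < num_servers_requested)
        return (int)num_dfiles_requested;
    return (int)num_servers_requested;
}
\end{verbatim}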
\begin{verbatim}
int set_param( const char* dist_name, void* params,
               const char* param_name, void* value );
\end{verbatim}
Set the distribution parameter described by \emph{param\_name} to
\emph{value}. A default implementation is provided in pint-dist-utils.h that
can handle parameters that have been previously registered.
\begin{verbatim}
void encode_lebf( char** pptr, void* params );
\end{verbatim}
Write \emph{params} into the data stream pptr in little endian byte format.
\begin{verbatim}
void decode_lebf( char** pptr, void* params );
\end{verbatim}
Read \emph{params} from the data stream pptr in little endian byte format.
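
A minimal sketch of an encode/decode pair is given below for a parameter
struct holding a single 64-bit strip size. The struct, the 8-byte wire layout,
and the choice to advance \emph{*pptr} past the bytes written or read are
assumptions made for illustration; the text above does not specify them.
\begin{verbatim}
#include <stdint.h>

/* Assumed parameter struct; illustration only. */
struct example_stripe_params { int64_t strip_size; };

/* Write strip_size as 8 bytes, least significant byte first. */
void encode_lebf( char** pptr, void* params )
{
    const struct example_stripe_params* p = params;
    uint64_t v = (uint64_t)p->strip_size;
    for (int i = 0; i < 8; i++)
        (*pptr)[i] = (char)((v >> (8 * i)) & 0xff);
    *pptr += 8;     /* assumed: leave the cursor after the data */
}

/* Read 8 bytes, least significant byte first, into strip_size. */
void decode_lebf( char** pptr, void* params )
{
    struct example_stripe_params* p = params;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)(unsigned char)(*pptr)[i] << (8 * i);
    p->strip_size = (int64_t)v;
    *pptr += 8;
}
\end{verbatim}
Assembling the value byte by byte keeps the stored format independent of the
byte order of the machine that runs the code.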
\begin{verbatim}
void registration_init( void* params );
\end{verbatim}
Called when the distribution is registered (i.e. once). Used to set default
distribution values, register parameters, or any other initialization activity
needed by the distribution.
\end{document}