%
% server design
%
\documentclass[11pt]{article}
\usepackage[dvips]{graphicx}
\usepackage{times}
\graphicspath{{./}{figs/}}
\pagestyle{plain}
\addtolength{\hoffset}{-2cm}
\addtolength{\textwidth}{4cm}
\addtolength{\voffset}{-1.5cm}
\addtolength{\textheight}{3cm}
\setlength{\parindent}{0pt}
\setlength{\parskip}{11pt}
\title{PVFS2 Distribution Design Notes}
\author{PVFS Development Team}
\date{May 2004}
\begin{document}
\maketitle

\section{Introduction}

This document is intended to serve as a reference for the design of the
PVFS2 file distributions. This should (eventually) include a description
of the mechanism and a guide on developing new distribution methods.

Distributions in PVFS are a mapping from a logical sequence of bytes
to a physical sequence of bytes on each of several I/O servers. To
be of use to PVFS system code, this mapping is expressed as a set of
methods.

Files in PVFS appear as a linear sequence of bytes. A specific byte
in a file is identified by its offset from the start of this sequence.
This is referred to here as a \emph{logical offset}. A contiguous
sequence of bytes can be specified with a logical offset and an extent.

Requests for access to file data can be sent to PVFS servers using various
request formats. Regardless of the format, the same data request is
sent to all PVFS servers that store part of the requested data. These
formats must be decoded to produce a series of contiguous sequences of
bytes, each with a logical offset and extent.

PVFS servers store some part of the logical byte sequence of each file
in a linear sequence of bytes, or byte stream, within a data space
associated with the file.
Bytes within this byte stream are identified by their offset from the
start of the byte stream, referred to here as a \emph{physical offset}.

On the server, the PVFS distribution methods are used to determine which
portion of the requested data is stored on the server, and where in
the associated byte stream the data is stored.
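
The server-side step just described can be illustrated with a minimal C
sketch. It assumes a simple round-robin layout of fixed-size strips (strip
$i$ of the file is stored on server $i \bmod n$, where $n$ is the number of
servers) rather than reproducing any particular PVFS2 distribution; the names
\emph{server\_pieces} and \emph{struct piece} are hypothetical. Given one
decoded logical offset and extent, it reports the pieces held by one server
together with their physical offsets.
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PVFS offset/size types (assumed 64-bit signed). */
typedef int64_t PVFS_offset;
typedef int64_t PVFS_size;

/* One contiguous piece of a request that lands on a given server. */
struct piece {
    PVFS_offset logical;   /* where the piece starts in the file */
    PVFS_offset physical;  /* where it starts in the server's byte stream */
    PVFS_size   length;    /* number of bytes in the piece */
};

/* Hypothetical round-robin striping (strip_size > 0): global strip i is
 * stored on server i % server_ct as that server's strip i / server_ct.
 * Writes up to max_pieces pieces into out and returns how many it found. */
size_t server_pieces(PVFS_size strip_size, uint32_t server_nr,
                     uint32_t server_ct, PVFS_offset offset,
                     PVFS_size extent, struct piece *out, size_t max_pieces)
{
    size_t n = 0;
    PVFS_offset end = offset + extent;
    PVFS_offset pos = offset;

    while (pos < end && n < max_pieces) {
        PVFS_offset strip = pos / strip_size;        /* global strip index */
        PVFS_offset strip_end = (strip + 1) * strip_size;
        if (strip_end > end)
            strip_end = end;                         /* clip to the request */
        if ((uint32_t)(strip % server_ct) == server_nr) {
            out[n].logical = pos;
            out[n].physical = (strip / server_ct) * strip_size
                              + (pos - strip * strip_size);
            out[n].length = strip_end - pos;
            n++;
        }
        pos = strip_end;   /* advance to the next strip boundary (or end) */
    }
    return n;
}
\end{verbatim}
For example, with 4-byte strips on two servers, the logical range starting at
offset 2 with extent 12 yields two pieces on server 1: logical offset 4
(physical offset 0, 4 bytes) and logical offset 12 (physical offset 4, 2 bytes).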

\section{System Interface Distributions}

PVFS2 users should be able to utilize distributions effectively through
the system interface. APIs are exposed that allow users to create files
with a user-specified distribution. If no distribution is
specified (i.e. the NULL distribution is specified), the default distribution,
simple stripe, is used. The system interface must be initialized before
distributions may be accessed.

The external distribution API is exposed to users via the following data types
and functions:
\begin{verbatim}
struct PVFS_sys_dist;
\end{verbatim}
The system interface distribution structure. It contains the distribution
identifier (i.e. the name) and a pointer to an instance of the distribution
parameters for this type of distribution. In general, the user should not
modify the data within this struct.

\begin{verbatim}
int PVFS_sys_create( char* entry_name,
                     PVFS_object_ref ref,
                     PVFS_sys_attr attr,
                     PVFS_credentials credentials,
                     PVFS_sys_dist* dist,
                     PVFS_sysresp_create* resp );
\end{verbatim}
Creates a file using the specified distribution. If no distribution is
specified, the default distribution \emph{simple\_stripe} is used during
creation. The distribution used during file creation is stored with the
file and may not be changed later. Altering the distribution used to
store the file contents could result in data corruption.

\begin{verbatim}
PVFS_sys_dist* PVFS_sys_dist_lookup( const char* name );
\end{verbatim}
Allocates a new distribution instance by copying the internal distribution
registered for the supplied name. Note that the internal distribution has
additional data not exposed through the system interface, but that data
should be fully configurable through the distribution parameters.

\begin{verbatim}
int PVFS_sys_dist_free( PVFS_sys_dist* dist );
\end{verbatim}
Deallocates all system interface resources allocated during distribution
lookup.

\begin{verbatim}
int PVFS_sys_dist_setparam( PVFS_sys_dist* dist,
                            const char* param,
                            void* value );
\end{verbatim}
Sets the distribution parameter specified by the string \emph{param} to
\emph{value}. The strings used to specify parameters are distribution defined
but should generally correspond to the field names in the distribution's
parameter struct. All parameters must be set before the distribution is used
in file creation. Once a file is created, there is no safe way to modify
the distribution parameters for that file.

\section{Distribution Initialization}

All distributions are registered during PVFS2 initialization. Although there
has been some discussion about having distributions function as loadable
modules, there is currently no support for that feature within PVFS2. All
available distributions are loaded into a registration table during
initialization and registered with the distribution name as the key. When a
user later wishes to create a distribution, a lookup can be performed
with the distribution name, and a copy of the registered distribution is
returned. The registered distribution itself is never modified after
registration. The only opportunity to modify the registered distribution is
during the registration itself. Each distribution implements a callback
method named \emph{registration\_init} that is called during registration. The
function signature is described completely below; for now we merely want to
note that this function is called exactly once (at registration time), and
it is generally used by distributions to set up the distribution parameter
strings (for use in PVFS\_sys\_dist\_setparam) and to set default parameter
values.
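
The registration scheme described above can be modelled in a few lines of C.
This is a minimal, hypothetical sketch rather than the PVFS2 code:
\emph{struct dist}, \emph{dist\_register}, \emph{dist\_lookup},
\emph{dist\_free} and the \emph{demo\_stripe} example are invented for
illustration, and a fixed-size array stands in for the real registration
table. It shows the two properties the text relies on: the
\emph{registration\_init} callback runs exactly once, at registration, and
lookups return copies, so the registered entry is never modified afterwards.
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for an internal distribution. */
struct dist {
    const char *name;                         /* registration key */
    void (*registration_init)(void *params);  /* called once at registration */
    void *params;                             /* default parameter values */
    size_t params_size;
};

#define MAX_DISTS 8
static struct dist *table[MAX_DISTS];         /* the registration table */
static int table_ct = 0;

/* Register once during initialization; the callback may set defaults.
 * This is the only point at which the registered entry is modified. */
int dist_register(struct dist *d)
{
    if (table_ct == MAX_DISTS)
        return -1;
    if (d->registration_init)
        d->registration_init(d->params);
    table[table_ct++] = d;
    return 0;
}

/* Look up by name and return a private copy (including its parameters),
 * so callers can never modify the registered entry. */
struct dist *dist_lookup(const char *name)
{
    for (int i = 0; i < table_ct; i++) {
        if (strcmp(table[i]->name, name) != 0)
            continue;
        struct dist *copy = malloc(sizeof *copy);
        if (!copy)
            return NULL;
        *copy = *table[i];
        copy->params = malloc(copy->params_size);
        if (!copy->params) {
            free(copy);
            return NULL;
        }
        memcpy(copy->params, table[i]->params, copy->params_size);
        return copy;
    }
    return NULL;                              /* name not registered */
}

void dist_free(struct dist *d)
{
    if (d) {
        free(d->params);
        free(d);
    }
}

/* Example distribution whose callback sets a default strip size. */
struct demo_params { long strip_size; };
static struct demo_params demo_defaults;
static void demo_init(void *p) { ((struct demo_params *)p)->strip_size = 65536; }
struct dist demo_dist = { "demo_stripe", demo_init, &demo_defaults,
                          sizeof demo_defaults };
\end{verbatim}
A caller that changes the parameters of a looked-up copy (for example, setting
the strip size of \emph{demo\_stripe} to 4096) leaves the registered defaults
untouched, so the next lookup again sees the value set by \emph{demo\_init}.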

Distribution initialization is performed by the function
PINT\_dist\_initialize() in pint-dist-utils.h. In order to add a new
distribution to the table of registered distributions, it will be necessary to
modify this function.

\section{Internal Distribution Representation}

PVFS2 distributions are internally represented with the struct PINT\_dist.
This structure contains a pointer to the distribution name, methods,
parameters and various sizes. The internal distributions are used on both the
clients and the metadata server, as well as being stored physically with the
file metadata.

When a user creates a file, the supplied system distribution (or the default
distribution) is exchanged for a corresponding PINT\_dist structure. It is this
structure that will be used for any further operations performed on the file
and stored in the metadata for the file.

The client and server both use the distribution methods to fulfill a client
request for a specific byte range in a specific file. All this processing is
performed within the PINT request for the file and byte range. The main
difference between client and server processing is the way segments are
built: on the client, the segments describe how the requested data is
distributed across the various servers, whereas on a server they describe only
the portion of the data stored on that server.

Distribution parameters are defined in the exported header for the
distribution (e.g. for the simple stripe distribution, the header file is
pvfs2-dist-simple-stripe.h). The distribution methods are usually defined in
a corresponding implementation file in the io/description subsystem (e.g. the
simple stripe implementation is in io/description/dist-simple-stripe.c).

The methods defined for each distribution allow it to completely specify how
the file data is mapped to the PVFS2 disk abstraction, the data file object.
The one possible exception is that distributions cannot currently
assert their preference in how data file objects are mapped to data servers.
This is planned for the near future; however, there is no current consensus on
how to improve upon the current round-robin mapping approach (see
PINT\_bucket\_get\_next\_io).

\section{Distribution Parameters}

The parameters for each distribution are defined in a struct specific to that
distribution, and an individual instance of the parameters is stored in the
metadata of every file.

Both the PVFS\_sys\_dist and PINT\_dist data structures maintain a pointer to
the same distribution parameters. The parameters are passed into every call to
distribution code so that the distribution can modify its behavior as necessary.
The distribution provider can also provide a method for setting the
distribution parameters explicitly, as described in the distribution methods
below.

\section{Distribution Methods}

The distribution methods are the functions each distribution uses to
perform mappings between the logical file data and the data file objects. The
methods also provide a mechanism for encoding/decoding the distribution
parameters, determining the number of data file objects to create for a file,
modifying distribution parameters, and performing distribution registration
tasks. For some of the methods a default implementation is available that may
be acceptable for most distributions.

\begin{verbatim}
PVFS_offset logical_to_physical_offset( void* params,
                                        uint32_t dfile_nr,
                                        uint32_t dfile_ct,
                                        PVFS_offset logical_offset );
\end{verbatim}
Given a logical offset, return the physical offset that corresponds to
that logical offset. Returns a physical offset. The return value rounds
down to the largest physical offset held by the I/O server if the
logical offset does not map to a physical offset on that server.
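
As an illustration, the following C sketch implements this method for a
hypothetical round-robin distribution with fixed-size strips (in the spirit
of, but not copied from, simple stripe). It reads \emph{dfile\_nr} as the index
of this data file and \emph{dfile\_ct} as the number of data files;
\emph{struct stripe\_params} and the offset typedefs are stand-ins. For an
offset held by another data file it returns the number of this data file's
bytes that lie logically before it; treat that rounding convention as an
assumption of the sketch.
\begin{verbatim}
#include <stdint.h>

typedef int64_t PVFS_offset;   /* stand-in: assumed 64-bit signed */
typedef int64_t PVFS_size;

/* Hypothetical parameters for a round-robin striped distribution. */
struct stripe_params {
    PVFS_size strip_size;
};

/* Map a logical offset to a physical offset on data file dfile_nr of
 * dfile_ct.  If the byte lives on another data file, the result is the
 * count of this data file's bytes that precede it logically (assumed
 * rounding convention). */
PVFS_offset logical_to_physical_offset(void *params, uint32_t dfile_nr,
                                       uint32_t dfile_ct,
                                       PVFS_offset logical_offset)
{
    const struct stripe_params *p = params;
    PVFS_size stripe = p->strip_size * dfile_ct;       /* one full round */
    PVFS_offset full = logical_offset / stripe;         /* complete rounds */
    PVFS_offset leftover = logical_offset - full * stripe;
    PVFS_offset mine = (PVFS_offset)dfile_nr * p->strip_size; /* our strip */
    PVFS_offset physical = full * p->strip_size;

    if (leftover >= mine + p->strip_size)
        physical += p->strip_size;      /* past our strip in this round */
    else if (leftover >= mine)
        physical += leftover - mine;    /* inside our strip */
    /* otherwise the offset is before our strip: nothing more to add */
    return physical;
}
\end{verbatim}
With 4-byte strips over two data files, logical offset 5 maps to physical
offset 1 on data file 1, and logical offset 13 maps to physical offset 5.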

\begin{verbatim}
PVFS_offset physical_to_logical_offset( void* params,
                                        uint32_t dfile_nr,
                                        uint32_t dfile_ct,
                                        PVFS_offset physical_offset );
\end{verbatim}
Given a physical offset, return the logical offset that corresponds to
that physical offset. Returns a logical offset. The input value is
assumed to be on the current PVFS server.
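
A sketch of the inverse mapping for a hypothetical round-robin layout with
fixed-size strips, reading \emph{dfile\_nr} as this data file's index and
\emph{dfile\_ct} as the number of data files (\emph{struct stripe\_params} and
the typedefs are stand-ins). The $k$-th strip stored on data file $d$ is global
strip $k \cdot c + d$ of the file, where $c$ is the data file count.
\begin{verbatim}
#include <stdint.h>

typedef int64_t PVFS_offset;   /* stand-in: assumed 64-bit signed */
typedef int64_t PVFS_size;

/* Hypothetical parameters for a round-robin striped distribution. */
struct stripe_params {
    PVFS_size strip_size;
};

/* Map a physical offset on data file dfile_nr back to its logical offset.
 * As the text states, the offset is assumed to be on this server. */
PVFS_offset physical_to_logical_offset(void *params, uint32_t dfile_nr,
                                       uint32_t dfile_ct,
                                       PVFS_offset physical_offset)
{
    const struct stripe_params *p = params;
    PVFS_offset strip = physical_offset / p->strip_size;  /* our k-th strip */
    PVFS_offset within = physical_offset % p->strip_size; /* offset inside */
    /* Our k-th strip is global strip k * dfile_ct + dfile_nr. */
    return (strip * dfile_ct + dfile_nr) * p->strip_size + within;
}
\end{verbatim}
With 4-byte strips over two data files, physical offset 5 on data file 1 maps
back to logical offset 13.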

\begin{verbatim}
PVFS_offset next_mapped_offset( void* params,
                                uint32_t dfile_nr,
                                uint32_t dfile_ct,
                                PVFS_offset logical_offset );
\end{verbatim}
Given a logical offset, find the smallest logical offset greater than or equal
to it that maps to a physical offset on the current PVFS server. Returns a
logical offset.
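
For a hypothetical round-robin layout with fixed-size strips, the next mapped
offset is either the offset itself (if it already falls in one of this data
file's strips), the start of this data file's strip later in the same round of
strips, or the start of its strip in the following round. A sketch, in which
\emph{struct stripe\_params} and the typedefs are stand-ins:
\begin{verbatim}
#include <stdint.h>

typedef int64_t PVFS_offset;   /* stand-in: assumed 64-bit signed */
typedef int64_t PVFS_size;

/* Hypothetical parameters for a round-robin striped distribution. */
struct stripe_params {
    PVFS_size strip_size;
};

PVFS_offset next_mapped_offset(void *params, uint32_t dfile_nr,
                               uint32_t dfile_ct, PVFS_offset logical_offset)
{
    const struct stripe_params *p = params;
    PVFS_size stripe = p->strip_size * dfile_ct;   /* one full round */
    PVFS_offset round_start = (logical_offset / stripe) * stripe;
    PVFS_offset mine = round_start + (PVFS_offset)dfile_nr * p->strip_size;

    if (logical_offset < mine)
        return mine;                  /* our strip comes later this round */
    if (logical_offset < mine + p->strip_size)
        return logical_offset;        /* already stored on this data file */
    return mine + stripe;             /* our strip in the next round */
}
\end{verbatim}
With 4-byte strips over two data files, logical offset 2 advances to 4 on data
file 1, while logical offset 5 advances to 8 on data file 0.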

\begin{verbatim}
PVFS_size contiguous_length( void* params,
                             uint32_t dfile_nr,
                             uint32_t dfile_ct,
                             PVFS_offset physical_offset );
\end{verbatim}
Beginning at a given physical location, return the number of contiguous
bytes in the physical byte stream on the current PVFS server that map
to contiguous bytes in the logical byte sequence. Returns a length in bytes.
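
For a hypothetical round-robin layout with fixed-size strips, the contiguous
run ends at the end of the current strip, because the next strip stored in
this byte stream lies \emph{dfile\_ct} strips further on in the file. A sketch,
in which \emph{struct stripe\_params} and the typedefs are stand-ins:
\begin{verbatim}
#include <stdint.h>

typedef int64_t PVFS_offset;   /* stand-in: assumed 64-bit signed */
typedef int64_t PVFS_size;

/* Hypothetical parameters for a round-robin striped distribution. */
struct stripe_params {
    PVFS_size strip_size;
};

PVFS_size contiguous_length(void *params, uint32_t dfile_nr,
                            uint32_t dfile_ct, PVFS_offset physical_offset)
{
    const struct stripe_params *p = params;
    (void)dfile_nr;   /* the strip layout is the same on every data file */
    (void)dfile_ct;
    /* Bytes left in the current strip.  With dfile_ct == 1 consecutive
     * strips are also logically adjacent, so this answer is conservative. */
    return p->strip_size - physical_offset % p->strip_size;
}
\end{verbatim}
With 4-byte strips, a run starting at physical offset 5 is 3 bytes long.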

\begin{verbatim}
int get_num_dfiles( void* params,
                    uint32_t num_servers_requested,
                    uint32_t num_dfiles_requested );
\end{verbatim}
Returns the number of data file objects to use for the requested file. The
number of servers requested and the number of data files requested are hints
from the user that the distribution can ignore if necessary. A default
implementation of this function is provided in pint-dist-utils.h that returns
the number of servers requested (which is usually the number of data servers
in the system).
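
The default behaviour described above amounts to ignoring the data file hint.
The sketch below shows that default alongside a hypothetical distribution that
honours a nonzero data file request, capped at the number of servers
requested; both function names are invented for illustration and are not the
pint-dist-utils.h code.
\begin{verbatim}
#include <stdint.h>

/* Default behaviour as described: one data file per requested server. */
int default_get_num_dfiles(void *params, uint32_t num_servers_requested,
                           uint32_t num_dfiles_requested)
{
    (void)params;
    (void)num_dfiles_requested;       /* the hint is ignored */
    return (int)num_servers_requested;
}

/* Hypothetical variant: use the data file hint when it is nonzero and
 * does not exceed the number of servers requested. */
int capped_get_num_dfiles(void *params, uint32_t num_servers_requested,
                          uint32_t num_dfiles_requested)
{
    (void)params;
    if (num_dfiles_requested == 0 ||
        num_dfiles_requested > num_servers_requested)
        return (int)num_servers_requested;
    return (int)num_dfiles_requested;
}
\end{verbatim}
For eight requested servers and two requested data files, the default returns
8 and the capped variant returns 2.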

\begin{verbatim}
int set_param( const char* dist_name, void* params,
               const char* param_name, void* value );
\end{verbatim}
Set the distribution parameter described by \emph{param\_name} to
\emph{value}. A default implementation is provided in pint-dist-utils.h that
can handle parameters that have been previously registered.
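
One way a default \emph{set\_param} can handle previously registered
parameters is to keep a table that maps each parameter string to the offset
and size of the matching field in the parameter struct. The sketch below
assumes that design; the table, \emph{struct param\_entry} and the single
\emph{strip\_size} field are hypothetical, and the real pint-dist-utils.h
implementation may differ.
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter struct for a striped distribution. */
struct stripe_params {
    int64_t strip_size;
};

/* One registered parameter: its name and where its field lives. */
struct param_entry {
    const char *name;     /* string accepted by set_param */
    size_t offset;        /* offsetof the field in the params struct */
    size_t size;          /* size of the field in bytes */
};

static const struct param_entry stripe_param_table[] = {
    { "strip_size", offsetof(struct stripe_params, strip_size),
      sizeof(int64_t) },
};

/* Copy *value into the field registered under param_name.
 * Returns 0 on success, -1 if the name was never registered. */
int set_param(const char *dist_name, void *params,
              const char *param_name, void *value)
{
    (void)dist_name;  /* a fuller version would pick the table by name */
    size_t n = sizeof stripe_param_table / sizeof stripe_param_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(stripe_param_table[i].name, param_name) == 0) {
            memcpy((char *)params + stripe_param_table[i].offset, value,
                   stripe_param_table[i].size);
            return 0;
        }
    }
    return -1;
}
\end{verbatim}
Recording \emph{offsetof} values when parameters are registered lets one
generic routine serve every distribution, at the cost of copying values
byte for byte without any type checking.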

\begin{verbatim}
void encode_lebf( char** pptr, void* params );
\end{verbatim}
Write \emph{params} into the data stream \emph{pptr} in little-endian byte
format.

\begin{verbatim}
void decode_lebf( char** pptr, void* params );
\end{verbatim}
Read \emph{params} from the data stream \emph{pptr} in little-endian byte
format.
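
A minimal sketch of these two methods for a hypothetical parameter struct with
a single 64-bit field. It assumes, as the \emph{char**} argument suggests, that
each method advances \emph{*pptr} past the bytes it wrote or read; the helper
functions and the struct are invented for illustration. Encoding byte by byte
with shifts yields little-endian output regardless of the host's byte order.
\begin{verbatim}
#include <stdint.h>

/* Hypothetical parameter struct with a single 64-bit field. */
struct stripe_params {
    int64_t strip_size;
};

/* Write v as 8 little-endian bytes at *pptr and advance *pptr. */
static void encode_int64_le(char **pptr, int64_t v)
{
    uint64_t u = (uint64_t)v;
    for (int i = 0; i < 8; i++)
        (*pptr)[i] = (char)((u >> (8 * i)) & 0xff);
    *pptr += 8;
}

/* Read 8 little-endian bytes at *pptr into *v and advance *pptr. */
static void decode_int64_le(char **pptr, int64_t *v)
{
    uint64_t u = 0;
    for (int i = 0; i < 8; i++)
        u |= (uint64_t)(unsigned char)(*pptr)[i] << (8 * i);
    *v = (int64_t)u;
    *pptr += 8;
}

void encode_lebf(char **pptr, void *params)
{
    struct stripe_params *p = params;
    encode_int64_le(pptr, p->strip_size);
}

void decode_lebf(char **pptr, void *params)
{
    struct stripe_params *p = params;
    decode_int64_le(pptr, &p->strip_size);
}
\end{verbatim}
For a strip size of 65536 (0x10000) the encoded bytes are
00 00 01 00 00 00 00 00.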

\begin{verbatim}
void registration_init( void* params );
\end{verbatim}
Called when the distribution is registered (i.e. once). Used to set default
distribution values, register parameters, or perform any other initialization
needed by the distribution.
\end{document}