%
% pvfs2-client design
%
\documentclass[11pt]{article}
\usepackage[dvips]{graphicx}
\usepackage{times}
\graphicspath{{./}{figs/}}
\pagestyle{plain}
\addtolength{\hoffset}{-2cm}
\addtolength{\textwidth}{4cm}
\addtolength{\voffset}{-1.5cm}
\addtolength{\textheight}{3cm}
\setlength{\parindent}{0pt}
\setlength{\parskip}{12pt}
\title{\texttt{pvfs2-client} Design Document (DRAFT)}
\author{PVFS Development Team}
\date{April 2003}
\begin{document}
\maketitle
\section{Introduction}
The primary role of the pvfs2-client daemon is to efficiently {\it
marshal} operation requests and data from the kernel's VFS ({\it
Virtual File System}, or {\it Virtual Filesystem Switch}) layer to
the pvfs2-server, and return responses from the pvfs2-server(s)
back to the VFS layer. This involves waiting for file system and
I/O requests, performing operations against the pvfs2-server
application(s), and passing responses back to the Linux kernel's VFS
layer. The data medium for communication between the VFS layer and
the pvfs2-client application is the /dev/pvfs2 device node. An
interface that allows unexpected incoming requests to be received
from the /dev/pvfs2 device node is required, and using the existing
BMI interface for this purpose is preferred.
\begin{figure*}
\begin{center}
\includegraphics[scale=0.4]{pvfs2-vfs.eps}
\end{center}
\caption{High Level PVFS2 Architecture}
\label{figure:arch}
\end{figure*}
Figure~\ref{figure:arch} illustrates the architecture of several components of
PVFS2. This document will focus specifically on the pvfs2-client
application.
\section{Motivation for the \texttt{pvfs2-client} Application}
Currently, our entire code base exists as user space code. This
includes all of our networking support (through the {\it BMI} and {\it
Flow Interfaces}), and our non-blocking request handling architecture
through the {\it Job Interface}. The pvfs2-server already uses these
interfaces to manage multiple simultaneous operations in flight.
Similarly, it is highly desirable to have a pvfs2-client application
that can issue and manage multiple simultaneous operations when
communicating with the pvfs2-servers. Therefore, at least in the
short term, it would be most appropriate to leverage as much of our
existing code as possible. A user-space application is required to
make use of this code; hence the need for the pvfs2-client
application to bridge the gap between the Linux kernel's VFS layer
and the {\it System Interface}.
\section{\texttt{pvfs2-client} Application Architecture}
The pvfs2-client application consists of a set of state machines
roughly corresponding to all file system and I/O operations that can
be requested from the VFS. At a high level, the pvfs2-client
application appears to share a common architecture with the
pvfs2-server application. The most notable distinction between the
pvfs2-client architecture and the pvfs2-server architecture is the
source of the unexpected requests. On the pvfs2-server, unexpected
requests arrive over the network through the BMI Interface. The
pvfs2-client receives unexpected messages from the /dev/pvfs2 device
node. It would be ideal if the BMI Interface could be used to monitor
the /dev/pvfs2 device node.
One responsibility of the pvfs2-client application is to wait for jobs
in progress to complete. Waiting on pending jobs is implemented as a
non-blocking operation against the existing job interface using the
call job\_testcontext. This call returns a list containing both
unexpected requests and completed jobs that were previously submitted
by states of the various operation state machines.
For each job returned from job\_testcontext, the pvfs2-client
application checks if the job is an unexpected request. If the
job {\it is} an unexpected request, it initializes an appropriate
state machine for that job. Regardless of whether or not the job was
unexpected, each job's state machine is then advanced from state to
state until a blocking operation is encountered.
Unexpected requests are delivered to the pvfs2-client application only
from the /dev/pvfs2 device node that the pvfs2-client application
monitors through the job interface. These requests are generated and
passed up from the Linux kernel's VFS layer by the PVFS2 kernel module
that implements the VFS operations.
The pvfs2-client has a processing loop similar to that of the pvfs2-server:
\begin{verbatim}
while (pvfs2-client application is running)
{
    ...
    wait on pending jobs in progress and unexpected requests
    ...
    foreach job returned
        if job is an unexpected request
            initialize appropriate operation state machine
        end if
        ...
        while completions occur immediately
            advance to next state in state machine
        end while
    end foreach
}
\end{verbatim}
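A minimal C sketch of this loop follows. The job\_testcontext call
belongs to the existing Job Interface, although its exact prototype is
approximated here; is\_unexpected, start\_op\_sm, and advance\_sm are
hypothetical helpers introduced only for illustration:
\begin{verbatim}
/* Sketch only: the job_testcontext argument list is approximated,
 * and is_unexpected(), start_op_sm(), and advance_sm() are
 * hypothetical helpers used for illustration. */
job_id_t     id_array[MAX_CLIENT_JOBS];
job_status_s status_array[MAX_CLIENT_JOBS];
int count, i, ret;

while (client_running)
{
    count = MAX_CLIENT_JOBS;

    /* non-blocking test for completed jobs and unexpected requests */
    ret = job_testcontext(id_array, &count, status_array,
                          JOB_TIMEOUT_MS, context);
    if (ret < 0)
        break;  /* unrecoverable job interface failure */

    for (i = 0; i < count; i++)
    {
        /* unexpected requests get a fresh operation state machine */
        if (is_unexpected(&status_array[i]))
            start_op_sm(&status_array[i]);

        /* advance states until a blocking operation is posted */
        while (advance_sm(id_array[i], &status_array[i]) ==
               SM_IMMEDIATE_COMPLETION)
            ;
    }
}
\end{verbatim}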
\section{Limitations of the Existing System Interface}
Currently, all client interaction to a pvfs2-server is done through
the {\it System Interface} API. This interface provides a set of file
system and I/O operations to be performed against the pvfs2-server(s),
but suffers from several major limitations in its current state.
These limitations can be described briefly as:
\begin{itemize}
\item \emph{Semantic Limitations}: the current implementation
provides a blocking interface to all operations. We already know
that a non-blocking interface is required for efficient access
through other existing non-blocking interfaces such as ROMIO (see
the sketch following this list).
\item \emph{Reusability Limitations}: the current implementation
performs many blocking operations. This cannot be used {\it as
is} in the proposed non-blocking state-machine oriented
architecture of the pvfs2-client.
\end{itemize}
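To make the semantic limitation concrete, consider the difference
between the existing blocking call style and a hypothetical
non-blocking post/test pair. PVFS\_sys\_getattr below stands for an
existing blocking System Interface call (its argument list is
abbreviated); the non-blocking names are illustrative only and not
part of the current API:
\begin{verbatim}
/* Existing blocking style (arguments approximated): the caller
 * stalls until the pvfs2-server(s) respond. */
ret = PVFS_sys_getattr(ref, attrmask, credentials, &resp);

/* Hypothetical non-blocking style (names illustrative only):
 * post the operation, do other work, then test for completion. */
ret = PVFS_isys_getattr(ref, attrmask, credentials, &resp, &op_id);
/* ... service other requests while the operation progresses ... */
ret = PVFS_sys_test(op_id, &completed);
\end{verbatim}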
A proposed redesign of the System Interface implemented in terms of
reusable state machines can solve these limitations, as discussed
below.
\section{\texttt{pvfs2-client} Request Servicing}
\begin{figure*}
\begin{center}
\includegraphics[scale=0.4]{core-sm.eps}
\end{center}
\caption{Operation Servicing State Machine (with nested core state machine)}
\label{figure:generic-sm}
\end{figure*}
Operation request servicing in the pvfs2-client application will be
implemented by state machines. That is, for each type of request that
can be handed up from the PVFS2 kernel module, a matching state
machine will exist to service it. The types of operation requests
required will roughly correspond to all of the possible operations
available through the System Interface API. For the proposed
pvfs2-client architecture, it is clear that a non-blocking
implementation of the System Interface is desirable. Further, to
encourage code re-use, each
operation in the {\it System Interface} can be expressed as a state
machine. Implementing the core functionality of the System Interface
methods in terms of state machines allows an opportunity for blocking
{\it and} non-blocking interface implementations, heavier code re-use,
and design simplicity.
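For instance, the pvfs2-client might select a state machine based on
the type of the request handed up from the kernel module. The
constant and variable names in this sketch are hypothetical:
\begin{verbatim}
/* Hypothetical dispatch of an upcall read from /dev/pvfs2 to a
 * matching operation state machine; names are illustrative. */
switch (upcall.type)
{
    case VFS_OP_LOOKUP:
        sm = &client_lookup_sm;
        break;
    case VFS_OP_CREATE:
        sm = &client_create_sm;
        break;
    case VFS_OP_IO:
        sm = &client_io_sm;
        break;
    default:
        return -ENOSYS;   /* unsupported operation */
}
\end{verbatim}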
We can think of all pvfs2-client operations as having a similar
structure, as depicted in Figure~\ref{figure:generic-sm}. What we see
here is a generic state machine implementing an operation. For all
operations there will be a {\it use-specific} initialization,
execution of some core routines (i.e.\ functionality provided by the
current System Interface), and a use-specific notification of status
and completion. If the core functionality of each System Interface
routine were implemented in terms of a state machine, the execution
of a core routine could be embedded as a nested state machine within
the operation-specific state machine.
Figure~\ref{figure:generic-sm} shows a complete operation state
machine, along with the embedded (nested) state machine that
implements the core functionality of a System Interface call. The
first state, called {\it init}, represents the use-specific
initialization state. Each operation may have a different
initialization phase, but at the very least, the source and target
endpoints for the Flow (to be performed inside the nested state
machine) are selected. Following initialization, the nested state
machine is executed, performing the core operation requested. After
this, the operation state machine checks the status of the performed
operation so that errors can be properly reported. Finally, the
state machine is advanced back to its initial state, which is the
default action once the operation has completed.
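Expressed in a notation like the pvfs2-server's state machine
description language, the generic operation machine might look like
the following sketch. The machine, state, and function names here
are hypothetical, introduced only to illustrate the structure of
Figure~\ref{figure:generic-sm}:
\begin{verbatim}
/* Hypothetical sketch; all names are illustrative only. */
machine pvfs2_client_op_sm
{
    state init
    {
        run op_specific_init;        /* select Flow endpoints, etc. */
        default => core;
    }
    state core
    {
        jump sys_interface_core_sm;  /* nested core state machine */
        default => check_status;
    }
    state check_status
    {
        run op_check_status;         /* error checking and reporting */
        default => init;             /* return to the initial state */
    }
}
\end{verbatim}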
In order to represent the core functionality of a System Interface
method as a re-usable state machine, we must take advantage of the
source and target endpoint specifications allowed by the existing {\it
Flow Interface}. Assuming it is possible to know the source and
target endpoints of the Flow prior to executing the System Interface
core functionality, it can be re-used by embedding it as a nested
state machine in the pvfs2-client architecture, {\it and} shared
between the blocking and non-blocking System Interface
implementations. The requirement for this is that the source and
target endpoints of the Flow be established before using the core
functionality state machine. In Figure~\ref{figure:generic-sm}, for example, the
pvfs2-client application may specify that the Flow's target endpoint
should be the /dev/pvfs2 device node.
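As a sketch of how the endpoints might be established before the core
state machine runs, consider the following. The descriptor fields and
endpoint constants here only approximate the Flow Interface and are
not exact; the device endpoint type in particular is hypothetical:
\begin{verbatim}
/* Approximation of a Flow Interface descriptor; field and
 * constant names are illustrative, not exact. */
flow_descriptor *d = get_flow_descriptor();

/* data arrives over the network from a pvfs2-server */
d->src.endpoint_id = BMI_ENDPOINT;
d->src.u.bmi.address = server_addr;

/* ...and is delivered to the kernel via /dev/pvfs2 */
d->dest.endpoint_id = DEV_ENDPOINT;   /* hypothetical endpoint type */
\end{verbatim}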
\section{Non-blocking and Blocking System Interface Implementations}
Non-blocking and blocking System Interface methods (as shown in Figure
3) can use the same core functionality once implemented as a state
machine. The blocking version will manually advance the state machine
internal to the call and not return until the operation has completed.
The non-blocking implementation will start the state machine and offer
a mechanism for testing operation completion. For the non-blocking
interface, some method of asynchronous progress must be provided.
This can be done either with a background thread or by making
progress on outstanding operations during each test for completion.
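A rough sketch of how the two implementations might drive the shared
core state machine follows. All names here are hypothetical
placeholders for a state machine driver API that is not yet defined:
\begin{verbatim}
/* Hypothetical sketch: sm_start/sm_post/sm_advance/sm_completed
 * are illustrative names, not a real API. */
int sys_op_blocking(struct op_args *args)
{
    struct sm_frame *frame = sm_start(&sys_op_core_sm, args);

    /* manually advance the shared core machine to completion */
    while (!sm_completed(frame))
        sm_advance(frame);   /* may wait on jobs internally */

    return sm_error_code(frame);
}

int sys_op_nonblocking(struct op_args *args, op_id_t *id)
{
    /* start the same core machine; completion is detected later by
     * a test call that can also perform work on pending operations */
    *id = sm_post(&sys_op_core_sm, args);
    return 0;
}
\end{verbatim}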
\end{document}