% | |
% | |
\documentclass[11pt, letterpaper]{article} | |
\usepackage[dvips]{graphicx} | |
\usepackage{epsfig} | |
\usepackage{rotating} | |
\pagestyle{plain} | |
% | |
% GET THE MARGINS RIGHT, THE UGLY WAY | |
% | |
\topmargin 0.0in | |
\textwidth 6.5in | |
\textheight 9.0in | |
\columnsep 0.25in | |
\oddsidemargin 0.0in | |
\evensidemargin 0.0in | |
\headsep 0.0in | |
\headheight 0.0in | |
\title{Parallel Architecture Research Laboratory\\Developer Guidelines} | |
\author{ PVFS Development Team } | |
\date{ Feb 12, 2001 } | |
% | |
% BEGINNING OF DOCUMENT | |
% | |
\begin{document} | |
\maketitle | |
\tableofcontents | |
\newpage | |
\thispagestyle{plain} | |
% \setlength{\parindent}{0.0cm} | |
\section{TODO} | |
\begin{itemize} | |
\item reorganize this to be PVFS-centric | |
\item document the build system | |
\end{itemize} | |
\section{Introduction} | |
This document is intended to serve as an introduction and set of | |
guidelines for programming style and development tools that are commonly | |
used in the Parallel Architecture Research Lab at Clemson University. | |
We do not claim that this | |
is the best or only way to effectively structure code or use development | |
tools. However, adhering to these guidelines should assist in the | |
maintenance, debugging, and documentation of code that must be | |
consistent over the lifetime of a large software engineering project. | |
Most of this documentation was motivated by the PVFS project, but | |
hopefully it is applicable to other projects as well. | |
The coding manual is intended for use by anyone who will be regularly | |
contributing development effort to software projects within the Parallel | |
Architecture Research Lab. | |
The first portion of this document provides a brief introduction to | |
some of the most common development tools used in the lab. All of | |
these tools are available on the lab workstations. They are also | |
available on most other UNIX-like platforms. | |
The second portion of this document covers formatting and writing | |
standard C code. Some of these guidelines are provided to maintain consistency in | |
the appearance of the code, while others actually assist in writing | |
``correct'' code or making reasonable design decisions while coding. | |
The third section provides the Java counterpart to these guidelines. | |
The final section contains information that is very specific to PVFS | |
and may not be applicable to other projects. | |
\section{Editors} | |
\label{sec:editors} | |
In the PARL lab we advocate the use of either {\tt vi} | |
or {\tt emacs}. These are the most common editors used for writing | |
software and are therefore the most likely to be available on any given | |
development platform. It is also useful to have everyone on a | |
particular project use the same set of editors so that it is easier to | |
maintain consistent formatting. | |
\subsection{Vi} | |
The | |
general use of vi is beyond the scope of this document, but you can find | |
out enough to get started by either asking a colleague or starting vi | |
and typing ``:help''. There is also a tutorial available on many systems | |
that can be started by typing ``vimtutor'' at the command line. | |
\subsubsection{Vi variants} | |
There are a few variations of the basic vi editor that support different | |
features. The most popular is {\tt vim}, or ``Vi IMproved''. vim adds | |
several important features to vi, including multilevel undo, visual | |
selection, and multiple buffers. It also is fully compatible with the | |
original vi editor. Some machines utilize a version of vim that also | |
contains optional context highlighting, while others provide a separate | |
binary with this feature that is called {\tt vimx}. | |
This may seem a little confusing, but you do not have to worry about it
if you set up your environment to
use the version that you prefer on any given system. The easiest
way to do this is to add a conditional statement to your login | |
configuration that aliases the ``vi'' command to start whichever version is | |
available. This example illustrates how to do this if you use | |
the tcsh shell. Add the following lines to your \textasciitilde/.cshrc file: | |
\begin{verbatim} | |
if ( -X vimx ) then | |
alias vi vimx | |
else if ( -X vim ) then | |
alias vi vim | |
endif | |
\end{verbatim} | |
The next time you log in to the system, it will alias the vi command
to either vimx or vim if they are available. Otherwise the vi command will simply
start the original vi editor.
\subsubsection{Syntax highlighting} | |
Syntax highlighting is an editor feature that uses various colors to
mark different parts of the syntax (dependent upon which language you
are writing in). Vim has rule sets for several languages ranging from C
to \LaTeX. This can be extremely useful when trying to quickly
read code. It is also helpful in catching a few minor coding errors.
You can control these options for vi by way of a file in your home directory named
``.vimrc''. This file can also be used to control other settings in vi,
as an alternative to the environment variable used in section \ref{sec:vi_env}. The following is an example
of some color settings for vi taken from a .vimrc file:
\begin{verbatim} | |
set background=dark | |
if has("syntax") | |
syntax on | |
hi! Comment ctermfg=darkgreen | |
hi Type NONE | |
hi Structure NONE | |
hi! Operator NONE | |
hi! Include ctermfg=darkblue | |
hi! PreCondit ctermfg=darkcyan | |
hi! cIncluded ctermfg=darkblue | |
hi! Statement ctermfg=brown | |
hi! Conditional ctermfg=brown | |
hi! Todo ctermfg=yellow | |
hi! Operator ctermfg=NONE | |
hi! Constant ctermfg=NONE | |
hi! cCppOut ctermfg=darkred | |
hi! cSpecial ctermfg=darkmagenta | |
endif | |
\end{verbatim} | |
\subsection{Emacs} | |
\emph{I have no idea.} | |
%======================================================================= | |
\section{CVS tutorial} | |
CVS is a network-aware version control system. Several of the larger
projects in the PARL lab use CVS to manage source code. These are some of the | |
basic capabilities that it provides: | |
\begin{itemize} | |
\item automatically tracks changes to code so that | |
old versions are backed up | |
\item allows you to document incremental changes and browse this | |
documentation | |
\item multiple users can make changes simultaneously | |
\item keeps multiple copies synchronized between | |
different users | |
\item allows you to work remotely | |
\end{itemize} | |
You can find out more information about CVS at http://www.cvshome.org. | |
\subsection{Overview} | |
CVS stores revision history for each file in your project. Each time | |
you ``commit'' a file to CVS, a snapshot of its state at that time is | |
stored in CVS. This allows you to back up to old versions at any time, | |
or browse old versions to see how the code evolved. | |
It is important to note that CVS stores version information | |
independently for each file. | |
There is no global version number
associated with your project at any time unless you manually | |
assign it (see the ``tag'' feature, which is beyond the scope of this | |
document). Version numbers for each file revision are assigned automatically by | |
CVS; there is no way to control this numbering. | |
If you wish to assign a version number or logical name to the state of | |
the entire project at one time (for example, release version 1.0), then | |
you must do this manually. | |
\subsection{Basic user commands} | |
Before using CVS, you must set an environment variable that tells it | |
where to look for the CVS repositories. If you are using tcsh, you can | |
do something like this (and add it to your .cshrc file): | |
\begin{verbatim} | |
setenv CVSROOT /projects/cvsroot | |
\end{verbatim} | |
and for bash: | |
\begin{verbatim} | |
export CVSROOT=/projects/cvsroot
\end{verbatim} | |
\emph{Hmm. Should we maybe have a sample project in cvs that anyone can | |
check out? Might be nice to be able to play with this stuff without | |
hurting a real project } | |
\begin{itemize} | |
\item \textbf{export:} If you wish to get a copy of a source | |
tree with no intention of modifying it or manipulating it with CVS, | |
you can just export it. This command creates a directory for the | |
project and populates it with the source code. To export the most | |
recent copy of a project: | |
\begin{verbatim} | |
cvs export -D today <projectname> | |
\end{verbatim} | |
(where $<$projectname$>$ is the name of the project). When you are done with the | |
code you can simply delete the directory. | |
\item \textbf{check out:} The check out command is used to obtain a | |
copy of the source code that will be tracked by CVS. This will let | |
you make changes and additions to the project. You must have | |
appropriate permissions in order to perform this operation: | |
\begin{verbatim} | |
cvs co <projectname> | |
\end{verbatim} | |
This will create a directory for the project that contains | |
source code and CVS information. See the ``commit'' command for | |
information on submitting modifications, and the ``release'' command to | |
get rid of the directory when you are done. | |
The checkout command can be run repeatedly without any harmful side | |
effects. This is useful for updating your copy of a project to make | |
sure that it matches the most recent modifications before adding any | |
modifications of your own. | |
\item \textbf{release:} When you are done with a project, you may | |
release the directory. This will warn you if you have made any | |
modifications to the source code that have not been committed to CVS. The options listed below will also | |
cause your copy of the project directory to be deleted. Note that | |
it is perfectly fine to leave code checked out for long periods of | |
time if you are going to be working on it regularly. | |
\begin{verbatim} | |
cvs release -d <projectname> | |
\end{verbatim} | |
\item \textbf{commit:} Once you have made any changes to the project, | |
you must perform a commit operation to record the changes in CVS and make | |
them available to other users. It is usually advisable to only | |
commit code that compiles and works correctly, unless you are the | |
only person working on the project. Otherwise, you may interfere | |
with someone else's work. This command should be carried out within | |
the project directory: | |
\begin{verbatim} | |
cvs commit | |
\end{verbatim} | |
When you do this, CVS will open up a vi session that allows you to | |
write a brief summary of the changes that you have made. Note that | |
this summary will be associated with all of the files that you have | |
changed since the last checkin, not just one. The commit will | |
complete when you close the vi session. | |
\item \textbf{add:} The add command is used to add new files to the | |
project. You must create the new file first. Then run the following | |
command from within the project directory that contains the new file: | |
\begin{verbatim} | |
cvs add <newfilename> | |
\end{verbatim} | |
You must follow this up with a ``commit'' in order for other users to | |
see the new file. This command may also be used to add new | |
directories to a project. Add directories with caution, however. | |
Unlike files, subdirectories are very difficult to remove from a CVS | |
project. | |
\item \textbf{update:} You will often find it necessary to | |
update your copy of the project | |
to reflect the most recent changes. Rather than use the update | |
command, however, it is | |
often better to simply run the checkout command again. It will | |
detect if your project is already checked out and will update if | |
needed. | |
\item \textbf{status:} The status command will tell you the status of | |
all of the files within a particular project directory. This is useful for | |
determining if your copy is up to date or if it has been modified but | |
not checked in: | |
\begin{verbatim} | |
cvs status | |
\end{verbatim} | |
If you wish to filter the output from this command so that it only | |
shows you files that are not up to date you may do the following | |
(you can make this an alias or script if you wish to use this | |
regularly): | |
\begin{verbatim} | |
cvs status |& grep Status: | grep -v Up-to-date | |
\end{verbatim} | |
\end{itemize} | |
There are many other CVS commands and features, but the ones listed | |
above should be enough to get you started. You can find out more | |
information in the CVS man pages or by reading documentation at | |
http://www.cvshome.org. | |
\subsection{Starting a new project} | |
Lab users can also create new CVS directories to keep up with | |
software projects. These can later be merged in as part of another | |
project if desired. To create a new CVS entry, go into the source | |
directory and delete all of the binary or object files (assuming that | |
you only wish to track the source code). Then run this command: | |
\begin{verbatim} | |
cvs import -m "Imported sources" <projectname> PARL start | |
\end{verbatim} | |
(for the sake of clarity, $<$projectname$>$ should probably match the name of the directory that | |
contains the source) | |
\subsection{Remote access} | |
You can also access the PARL CVS repository remotely (outside of the | |
lab) if you have an internet connection and the CVS and ssh programs | |
installed on your remote machine. It works exactly like using CVS | |
locally. You can use the same commands listed above, except substitute | |
the following for the word ``cvs'' in the command lines:
\begin{verbatim} | |
cvs -d :ext:<userid>@cvs.parl.clemson.edu:/projects/cvsroot | |
\end{verbatim} | |
(where $<$userid$>$ is your lab user login id). You may wish to create an
alias or script to do this for convenience. CVS will prompt you for a
password in this situation.
%====================================================================== | |
\section{Compiler flags} | |
\label{sec:gcc} | |
Gcc is the standard C compiler used for PARL software. The general use
of gcc is beyond the scope of this document, but there are a few | |
guidelines that should be followed when building software with gcc: | |
\begin{itemize} | |
\item Always use the -Wall command line flag to gcc. This turns on | |
most of the warnings that gcc is capable of generating at compile | |
time. The warnings tend to point out | |
bad coding habits and ambiguous statements. Use the -Wall option
from the very beginning of your project. It is often overwhelming to
try to apply it after the code base has gotten large because of the
sheer number of warnings that it will probably find at that point.
\item Always use the -Wstrict-prototypes command line flag to gcc. | |
This turns on additional warnings that enforce the use of proper | |
prototypes for all functions. See section \ref{sec:proto} for more | |
information. | |
\item Make use of the -g option to gcc in development code. This enables debugging | |
symbols for use with gdb (section \ref{sec:gdb}). When code is | |
released to the public, it may be the case that this option will be | |
removed in order to reduce binary size or increase optimization, but | |
it is invaluable during the development cycle. | |
\end{itemize} | |
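
As a rough illustration of what these flags are meant to catch, the
fragment below contrasts an old-style declaration with a proper
prototype. The function name {\tt scale\_value} is hypothetical and is
not taken from any PARL code. With -Wstrict-prototypes, gcc should warn
about a declaration like the commented-out one, because an empty
parameter list in C says nothing about the arguments.
\begin{verbatim}
/* Old-style declaration (left commented out): with -Wstrict-prototypes
 * gcc warns that this is not a prototype, because the empty
 * parentheses do not describe the arguments.
 *
 * int scale_value();
 */

/* Proper prototype: argument types are checked at every call site. */
int scale_value(int value, int factor);

/* scale_value()
 *
 * Multiplies value by factor.
 *
 * returns the product
 */
int scale_value(int value, int factor)
{
    int result = 0;

    result = value * factor;
    return(result);
}
\end{verbatim}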
\section{Debugging tutorial} | |
\label{sec:gdb} | |
Gdb is a debugger for C and C++ programs. It lets you control the
execution of a program, see what line of source code is being executed | |
at any given time, and inspect data structures while the program is | |
running. There are several other debuggers or interfaces built on top | |
of gdb, but gdb is still one of the most popular and most flexible. | |
\subsection{Starting gdb} | |
In order to use gdb, the code in question must have been compiled with | |
the debugging option turned on. For the gcc compiler (see section | |
\ref{sec:gcc}), this just means using the -g option: | |
\begin{verbatim} | |
gcc -g test.c -o test | |
\end{verbatim} | |
In order to start debugging, just launch gdb with the name of the
program you wish to debug as the only argument:
\begin{verbatim} | |
gdb test | |
\end{verbatim} | |
If you start gdb by this mechanism, then gdb will present you with a | |
prompt at which you can enter commands to gdb. At this point, the | |
program you wish to debug has not been started and must be launched with | |
the {\tt run} command outlined below. | |
Alternatively, you can attach to a process that is already running if
you know its pid. In this case, the command line arguments to gdb are
the program name followed by the pid of the running program.
\begin{verbatim} | |
> ./test & | |
> ps | grep test | |
23716 ttyp4 00:00:00 test | |
> gdb test 23716 | |
\end{verbatim} | |
In this scenario, gdb will still present you with a prompt for entering | |
gdb commands. However, the program that you are debugging will be in | |
the middle of execution but will have stopped running. In order to make | |
it continue where it left off you must use the {\tt continue} command | |
outlined below. | |
\subsection{Common gdb commands} | |
This is a short list of the most common commands that you may wish to use with | |
gdb. You can find out more specific information by using the {\tt help} | |
command or by looking at http://sources.redhat.com/gdb/\#documentation. | |
\begin{itemize} | |
\item \textbf{help}: Typing just help at the prompt will give you a list | |
of command classes that you can find out more information about. You | |
can also get documentation for a particular command if you know its | |
name. | |
\item \textbf{break}: Break is used for specifying where you would like | |
to pause execution of a program. You can either specify a function name | |
or a line number with break. The next time you {\tt run} or {\tt continue} your | |
program, it will stop at this breakpoint and wait for another gdb | |
command. Once it stops you can either step through execution one line | |
at a time, look at variables, or continue again. This is useful for | |
debugging a particular area of interest in your program without having | |
to step through the entire program. You can also specify multiple break | |
points if you would like the execution to stop at more than one | |
location. | |
\item \textbf{run}: This command begins execution of your program. If | |
you have not specified any breakpoints, the program will run until it | |
exits normally or an error occurs. If you do specify a breakpoint, it | |
will also stop when it hits that breakpoint. | |
\item \textbf{continue}: Continue causes execution to resume if you are
currently at a breakpoint or have been stepping through execution one
line at a time. Continue will let program execution progress until it
hits either another breakpoint, a normal exit, or a critical error.
\item \textbf{step}: Step lets you execute one line of source code at a | |
time. It will tell you what line it is on, as well as what line of code | |
is about to be executed. This allows you to examine the effects of | |
particular statements in your code. | |
\item \textbf{next}: Next is very similar to step, except that it | |
treats subroutine calls as a single instruction rather than stepping | |
into all subroutines. This is very helpful for skipping over functions | |
that you do not wish to inspect the internals of (such as {\tt printf} | |
for example). It is helpful to be able to skip over complex functions | |
that are either known to work or were not compiled with debugging | |
symbols. | |
\item \textbf{print}: The print command is the primary mechanism for
inspecting data structures and variables. Print can be used to show the
contents of a single variable by using the variable name as the
argument. If you print a structure, it will attempt to print a
comma-separated list of the elements of that structure. Print
understands a C-like syntax for specifying struct elements and pointers, so
you can refer to elements and variables using struct.element,
struct-$>$element and *pointer notation. Parentheses are often required
in situations where they would not be in C, however.
\item \textbf{where}: This command shows your current location in the
execution stack. It prints the line number of the current instruction,
as well as the hierarchy of subroutine line numbers that were passed
through to get to this point.
\item \textbf{list}: List is used to list the source code surrounding a
particular instruction. By default, it lists the 10 lines
surrounding the current instruction. You can also use it to list source
code surrounding a particular line number or function definition.
\end{itemize} | |
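
The commands above are easier to learn on a small piece of code. The
following is a minimal sketch of a function to practice on; the names
{\tt struct point} and {\tt sum\_points} are made up for this example.
If it is compiled with -g and called from a test program, a breakpoint
can be set with {\tt break sum\_points}, and the expressions in the
comment can then be tried with {\tt print} while stepping through the
loop with {\tt next}.
\begin{verbatim}
#include <stddef.h>

struct point
{
    int x;
    int y;
};

/* sum_points()
 *
 * Adds up the x and y fields of every element in pts.
 *
 * returns the total, or 0 if pts is NULL
 */
int sum_points(const struct point *pts, int count)
{
    int total = 0;
    int i = 0;

    if(pts == NULL)
    {
        return(0);
    }
    for(i=0; i<count; i++)
    {
        /* while stopped here in gdb, try:
         *   print i
         *   print *pts
         *   print pts[i].x
         *   print (pts+i)->y
         */
        total += pts[i].x + pts[i].y;
    }
    return(total);
}
\end{verbatim}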
\section{Makefile tutorial} | |
\emph{We may be able to get this from the MPICH coding docs or from | |
Walt's web page?} | |
\section{Electric Fence tutorial} | |
Electric Fence is a tool used for debugging buffer overruns and
underruns that can occur when manipulating dynamically allocated memory.
It is implemented as a static library that can be linked
into your code without modifying the source in any way. It
works by replacing the standard malloc function with a modified malloc that
surrounds any new memory regions with protected areas. If a process
attempts to write into such a protected area, it will cause a segmentation
violation. Without Electric Fence, buffer overruns can occur without
being immediately obvious, which makes debugging difficult.

Take note that Electric Fence does not help at all with problems that
occur with statically allocated memory. It also does not indicate
memory leaks. Other tools should be used for debugging those types of
problems.
\subsection{Using Electric Fence} | |
To use Electric Fence, you just need to link in the efence library | |
during the last stage of linking (or compilation, if you do not have a | |
separate link step): | |
\begin{verbatim} | |
gcc -g -Wall -Wstrict-prototypes test.c -lefence
\end{verbatim} | |
When you run your program, it should print a message to the screen | |
indicating that Electric Fence is in use. If your program segfaults, it | |
will not show you where it occurred, but you can then debug the program with | |
gdb to determine this information (section \ref{sec:gdb}). | |
\subsection{Electric Fence options} | |
There are several helpful Electric Fence options that can be controlled | |
by way of environment variables. The following list summarizes the most | |
useful ones. They can be turned on in tcsh by typing ``setenv VARIABLE | |
value'' or in bash by typing ``export VARIABLE=value'', where VARIABLE is | |
the option you wish to control and value is the value that you wish to | |
set it to. | |
\begin{itemize} | |
\item EF\_ALIGNMENT: This controls the alignment of dynamically
allocated memory. By default, this alignment is equal to your
machine's word size. This means that small overruns might go
unnoticed because extra memory has been allocated for certain
buffers. To make sure that this does not happen, set alignment to 1.
This ensures that even the small overruns will be caught.
\item EF\_PROTECT\_BELOW: When this option is set to 1, it tells
Electric Fence to place the protected area before each allocation, so
that buffer underruns are detected instead of buffer overruns.
\item EF\_PROTECT\_FREE: When this option is set to 1, Electric
Fence will check to be sure that memory is not being accessed after
it has been released with the {\tt free()} function.
\end{itemize} | |
Turning on all of these options is helpful in debugging dynamic memory | |
problems. Note that using Electric Fence (especially with the stricter | |
options) will cause the memory utilization of your application to | |
increase dramatically. | |
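
The following sketch shows the kind of one-byte overrun that these
options are intended to expose. The function {\tt duplicate\_name} is a
hypothetical example, not part of any PARL project, and the code is
shown in its corrected form; the mistake is described in the comment.
With the mistaken allocation, a program linked against Electric Fence
would be expected to segfault at the {\tt strcpy()} call when
EF\_ALIGNMENT is set to 1, while with the default alignment the stray
byte may go unnoticed.
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* duplicate_name()
 *
 * Makes a newly allocated copy of the string src.  The caller is
 * responsible for releasing the copy with free().
 *
 * returns pointer to the copy on success, NULL on failure
 */
char *duplicate_name(const char *src)
{
    char *copy = NULL;

    if(src == NULL)
    {
        return(NULL);
    }
    /* A common mistake is to write malloc(strlen(src)) here.  That
     * leaves no room for the terminating '\0', so strcpy() below would
     * write one byte past the end of the buffer.
     */
    copy = malloc(strlen(src) + 1);
    if(copy == NULL)
    {
        return(NULL);
    }
    strcpy(copy, src);
    return(copy);
}
\end{verbatim}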
%==================================================================== | |
\section{C Programming} | |
\subsection{Formatting} | |
This section will provide guidelines so that multiple users on a given | |
project can write code with consistent appearance. This makes the code | |
easier to maintain and audit in a group environment. | |
\subsubsection{GPL} | |
\label{sec:gpl} | |
The GPL, or General Public License, is a software license created by the
GNU Project (http://www.gnu.org). You can find out more
information about it at their web site. A brief summary is that the GPL | |
insists that the source code be distributed with any software released | |
under the GPL. Furthermore, anyone who modifies and redistributes GPL | |
code must release their modifications under the GPL as well. This is | |
convenient for the research community because it encourages the sharing | |
of ideas and also legally protects your developments. | |
Any PARL project code that you release to the community should include an electronic | |
copy of the GPL. The easiest way to do this is to create a file in the | |
project's top level directory called ``COPYING'' which contains the full | |
text of the GPL version 2 as obtained from http://www.gnu.org. Then in | |
\emph{every} source code file in the project, include the following text at | |
the very top of the file: | |
\begin{verbatim} | |
/* | |
* (C) 2001 Clemson University. | |
* | |
* See COPYING in top-level directory. | |
*/ | |
\end{verbatim} | |
Other organizations that have contributed to the project may be listed
in the copyright line as well (see section \ref{sec:pvfs-copyright} for
information on how to do this in PVFS code). If you wish to credit particular developers or
provide contact information, please do so in the README file located in
the top level directory.
\subsubsection{Other source code header information} | |
In addition to the copyright comments, it is usually
helpful to provide a brief description of what is contained in
each source file. This should just be a few summary lines below the
copyright information.
\subsubsection{Commenting} | |
\label{sec:comments} | |
Commenting your code effectively is very important! Please comment | |
important sections of your code clearly and concisely as you write it. | |
The habit of commenting after completing the code often leads to poor | |
comments. | |
Do not use C++ style comment delimiters ( // ) in C code. Some C
compilers do not accept this as a comment delimiter, and it is not
part of the C89 language specification (it was only added in C99).
For single line comments (or brief comments trailing a line of code), | |
just use the /* and */ delimiters. If the comment is longer than one | |
line, use this format: | |
\begin{verbatim} | |
/* This code does lots of cool things. It is also written perfectly and | |
* will never break. It is fast, robust, extensible, and resistant to | |
* rust and corrosion. | |
*/ | |
\end{verbatim} | |
This makes it easy to tell where the comment begins and ends. | |
Comments that describe the operation of a particular function should be | |
listed just above the function definition, not the prototype. The | |
comment should give the function name, what it does, what any potential | |
side effects are, and the range of return values. This is one example: | |
\begin{verbatim} | |
/* MC_finalize() | |
* | |
* This function shuts down the method control subsystem. It is | |
* responsible for tearing down internal data structures, shutting down | |
* individual method devices, and gracefully removing any unfinished | |
* operations. | |
* | |
* returns 0 on success, -errno on failure | |
*/ | |
int MC_finalize(void) | |
{ | |
... | |
} | |
\end{verbatim} | |
If you are working on the PVFS project, then you should adhere to the | |
function comments described in section \ref{sec:pvfs-comments}. | |
\subsubsection{Brackets} | |
Brackets are of course used to delineate blocks of code contained within | |
loops, conditional statements, or functions. For clarity, \emph{any} | |
statement executed within a conditional or loop should be enclosed in | |
brackets, even if it is just one line. For example: | |
\begin{verbatim} | |
if(something true) | |
{ | |
do something; | |
} | |
\end{verbatim} | |
and \emph{not} | |
\begin{verbatim} | |
if(something true) | |
do something; | |
\end{verbatim} | |
Also note that each bracket gets its own line in the source code.
\subsubsection{Indentation} | |
\label{sec:indent} | |
Indentation is also very important to writing clear code. The easiest | |
rule to remember is that any new set of brackets should add a level of | |
indentation for the code contained within it. This holds for functions, | |
loops, and conditionals. The following is an example: | |
\begin{verbatim} | |
int foofunction(int x) | |
{ | |
int y = 0; | |
if(x <= 0) | |
{ | |
do some stuff; | |
} | |
else | |
{ | |
for(y=0; y<x; y++) | |
{ | |
do lots of stuff; | |
} | |
} | |
return(0); | |
} | |
\end{verbatim} | |
\subsection{Hints for writing maintainable code} | |
\subsubsection{General code layout} | |
\label{sec:proto} | |
These are a few general guidelines for how to organize your code: | |
\begin{itemize} | |
\item Group similar functions together into the same .c file. | |
\item If a function will only be called from within the .c file where | |
it is defined, then include the prototype for the function in the | |
same .c file near the top. (see section \ref{sec:static} for | |
information on static declarations) | |
\item If a function will be called from outside of the .c file in | |
which it is declared, then put the prototype in a header file | |
separate from the .c file. This header should be included in any | |
other .c file where the function will be called. | |
\item Put comments describing the behavior of the function just | |
before its definition, not with the prototype (see section | |
\ref{sec:comments} for more detailed information about commenting | |
functions). | |
\item Header files should \emph{only} contain prototypes and structures | |
that are needed by external pieces of code. It helps to encapsulate things by not providing extraneous information in the | |
header files. | |
\end{itemize} | |
\subsubsection{Length of functions} | |
Try not to make extremely long functions. A good rule of thumb is to | |
limit your functions to 100 lines or less. If a function is longer than | |
this, then it should probably be broken apart into smaller subfunctions. | |
Exceptions to this rule are rare. | |
\subsubsection{Preventing double inclusion} | |
If you are using a header file in several locations, it is easy to
create a situation in which the same header file is indirectly included
twice in a single compilation. This causes compilation errors because
of function, variable, or type redefinition. In order to ensure that
this does not happen, you should always wrap your header files in
preprocessor macros that prevent the code from being read more than once
by the compiler. This may be done by creating a special define that can
be detected the second time the code is included. The name of this
define should stand out so that it does not conflict with other
variables or definitions in your code. It is usually safe to pick the
filename of the header, convert it to all uppercase, and replace
punctuation with underscores. The example below also prepends two
underscores to the name; be aware that the C standard reserves
identifiers beginning with two underscores for the implementation, so a
plain {\tt BMI\_H} is the more portable choice. Here is an example for a
header file called bmi.h:
\begin{verbatim} | |
/* | |
* (C) 2001 Clemson University and The University of Chicago | |
* | |
* See COPYING in top-level directory. | |
*/ | |
/* This file contains the primary application interface to the BMI | |
* library. | |
*/ | |
#ifndef __BMI_H /* these macros tell the compiler to skip the */ | |
#define __BMI_H /* following code if it hits it a second time */ | |
/* now do whatever you would normally do in your header: */ | |
#include<bmi_types.h> | |
struct foo{ | |
int x; | |
int y; | |
}; | |
int foo_function(double a, double b); | |
/* don't forget to end your header with this statement */ | |
#endif /* __BMI_H */ | |
\end{verbatim} | |
\subsubsection{Static declarations} | |
\label{sec:static} | |
Any function or variable that is declared at file scope in a particular
.c file but not referenced in any other .c file should be declared
static. This helps to keep the symbol name space from becoming
cluttered. It also ensures that local functions are not accidentally
called somewhere that they were not intended to be called.
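
As a minimal sketch of this guideline, consider the following .c file.
The names {\tt call\_count}, {\tt clamp\_to\_byte}, and
{\tt demo\_store\_byte} are hypothetical and are not part of PVFS. Only
the last function is visible to other .c files:
\begin{verbatim}
#include <stddef.h>

/* visible only inside this .c file */
static int call_count = 0;

/* local helper: other .c files cannot link against this symbol */
static int clamp_to_byte(int value)
{
    if(value < 0)
    {
        return(0);
    }
    if(value > 255)
    {
        return(255);
    }
    return(value);
}

/* the only externally visible symbol; its prototype belongs in a
 * header file.  Returns the number of successful calls so far, or -1
 * on error.
 */
int demo_store_byte(int value, unsigned char *out)
{
    if(out == NULL)
    {
        return(-1);
    }
    *out = (unsigned char)clamp_to_byte(value);
    call_count++;
    return(call_count);
}
\end{verbatim}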
\subsubsection{Initializing variables} | |
Initialize all variables when they are declared. Even if it is a trivial
scalar variable, go ahead and initialize it. Integers and floats can
typically be initialized to -1 or 0, while pointers can be initialized
to NULL. This simple habit can help uncover many problems that occur
when the validity of a value is not checked before it is used. There is
no guarantee what the value of a local (automatic) variable will be when
it is created. Picking a known initial value to start out with can
prevent garbage data from being interpreted as valid information.
A similar argument applies to memory regions that are dynamically | |
allocated. Any dynamically allocated structure or variable should at | |
least be zeroed out before being used in the code. This can be done | |
with the {\tt memset()} function: | |
\begin{verbatim} | |
foopointer = (struct foostruct *)malloc(sizeof(struct foostruct));
if(foopointer == NULL)
{
    /* alloc failed */
    return(-1); /* or whatever error value your function uses */
}
memset(foopointer, 0, sizeof(struct foostruct));
\end{verbatim} | |
If there are sentinel values other than 0 for elements contained in your
struct, they should be set as well.
\subsubsection{Allocating and deallocating complex structures} | |
If there is a particular structure that you are frequently dynamically | |
allocating or deallocating, it usually pays off to go ahead and create | |
functions to handle those operations. This is especially helpful if | |
there are further dynamically allocated structures within the original | |
structure. Encapsulating all of this memory management in a pair of | |
functions aids in debugging and makes your code more readable overall. | |
A good naming convention is: | |
\begin{verbatim}
/* returns a pointer to new structure on success, null on failure */
struct foo* alloc_foo(void);

/* no return value */
void dealloc_foo(struct foo*);
\end{verbatim}
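
The following sketch shows what such a pair of functions might look like
for a structure that contains a second dynamically allocated region. The
structure {\tt demo\_msg} and its fields are hypothetical and exist only
for illustration:
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

struct demo_msg
{
    char *buffer;   /* nested dynamic allocation */
    size_t size;    /* size of buffer in bytes */
    int tag;        /* -1 means "no tag assigned yet" */
};

/* returns a pointer to new structure on success, null on failure */
struct demo_msg *alloc_demo_msg(size_t size)
{
    struct demo_msg *msg = NULL;

    if(size == 0)
    {
        return(NULL);
    }
    msg = (struct demo_msg *)malloc(sizeof(struct demo_msg));
    if(msg == NULL)
    {
        return(NULL);
    }
    memset(msg, 0, sizeof(struct demo_msg));
    msg->tag = -1;  /* sentinel value other than 0 */

    msg->buffer = (char *)malloc(size);
    if(msg->buffer == NULL)
    {
        /* do not leak the outer structure on partial failure */
        free(msg);
        return(NULL);
    }
    memset(msg->buffer, 0, size);
    msg->size = size;
    return(msg);
}

/* no return value; safe to call with a null pointer */
void dealloc_demo_msg(struct demo_msg *msg)
{
    if(msg == NULL)
    {
        return;
    }
    free(msg->buffer);
    free(msg);
}
\end{verbatim}
Keeping the cleanup of partially built structures inside the allocation
function means that callers only ever have to check a single return
value.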
\subsubsection{Keeping up with work in progress} | |
There are often questionable issues, or even issues that you don't have
time to deal with at the moment, that come up when writing large pieces
of code. It is generally helpful to document these questions or
``todo'' items in a known location so that they are not forgotten.
There are two recommended ways of handling this. Keep larger or more
important items listed in a file called ``TODO'' in the top level
directory of your project. This file can be added to CVS so that other
developers can see a quick list of known bugs or issues that need
resolution. As items on this list are corrected, you may wish to log
them in another file at the top level called ``Changelog''. Smaller
issues, perhaps only important from a stylistic point of view, can be
commented in the code and marked with the text string ``TODO'' within
the comment. This string is highlighted in a special color by vi syntax
highlighting, and can easily be found with the {\tt grep} tool later.
\subsubsection{Choosing good variable and function names} | |
Try to pick descriptive names for variables and functions, rather than
saving keystrokes by picking obscure abbreviations. This makes it easier
for people who look at your code afterwards to understand what is going
on. If your function or variable name is composed of more than one
word, then separate the words with underscores. If a collection of
functions are related, or collectively form a common interface, then
prepend an identifier to each function so that it is obvious that they
belong together:
\begin{verbatim}
int test_control_open(void);
int test_control_close(void);
int test_control_read(void);
\end{verbatim}
Function and variable naming issues specific to PVFS can be found in
section \ref{sec:pvfs-naming}.
\subsection{Advanced topics} | |
\subsubsection{Checking for interrupted system calls} | |
If a system call fails, always check {\tt errno} to see if it was set
to EINTR. If so, the system call was interrupted by a signal and
probably did not actually fail; it just needs to be restarted. This is
a fairly common situation when calling {\tt read}, {\tt write}, or
{\tt poll}. You can restart an operation either by wrapping it in a
while loop that tries again if EINTR occurs, or by using a goto and a
label to jump back to the system call you wish to repeat.
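
The loop approach can be sketched as follows. The wrapper name
{\tt demo\_read\_retry} is hypothetical; it simply repeats the
{\tt read} call for as long as the failure is caused by EINTR:
\begin{verbatim}
#include <errno.h>
#include <unistd.h>

/* behaves like read(), but restarts the call if it was interrupted
 * by a signal before any data was transferred
 */
ssize_t demo_read_retry(int fd, void *buffer, size_t count)
{
    ssize_t ret = -1;

    do
    {
        ret = read(fd, buffer, count);
    } while(ret < 0 && errno == EINTR);

    return(ret);
}
\end{verbatim}
Any other error is still returned to the caller with {\tt errno} left
intact, so normal error handling is unaffected.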
\subsubsection{Constant arguments} | |
If you are passing in pointers as arguments to a function, but \emph{do
not} wish for the data that they point to to be modified, then it is
sometimes helpful to declare the arguments as pointers to const. This
makes the compiler present a warning or an error if the data is
accidentally modified within your function. This technique is
especially useful when one of the arguments to your function is a
string. In this case, you will probably be passing in a char* argument
for convenience. However, passing in a string in this manner allows the
function to modify the contents of the string, which may not be
desirable. Using a const char* argument can prevent this. Example:
\begin{verbatim} | |
int string_key(const char *key, const char *id_string) | |
{ | |
/* within this function it is now impossible to accidentally modify | |
* the character strings pointed to by key or id_string | |
*/ | |
return(0); | |
} | |
\end{verbatim} | |
\subsubsection{Obscure coding practices} | |
By all means, try to avoid the use of obscure coding tricks when writing
software as part of a group. This is especially true when there is an
equally valid but much clearer method of accomplishing your goal.
Obscure coding practices include but are not limited to:
\begin{itemize}
\item the : ? conditional operator
\item unnecessary goto statements
\item nested switches
\item implicit type conversion
\item placing too much emphasis on making code small
\end{itemize}
\subsubsection{Locking data structures} | |
If you are programming in a multithreaded or reentrant environment, it
is very important to use locking mechanisms effectively. Access to any
global variable should be protected by a lock in this type of
environment. The pthread library contains almost any sort of portable
primitive you may need for a single application. It is also helpful to
wrap these calls behind an interface that allows you to turn locking on
or off at compile time. The ability to disable locking can be useful
during development or when running code on a system that does not
require locking. Look in the pvfs-locks CVS module for an example.
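
The idea of a compile time switch can be sketched as follows. This is
not the interface from the pvfs-locks module; the names
{\tt demo\_lock\_t}, {\tt demo\_lock}, {\tt demo\_unlock}, and the
{\tt DEMO\_DISABLE\_LOCKING} define are assumptions made for
illustration, and return values from the pthread calls are ignored to
keep the example short:
\begin{verbatim}
#include <pthread.h>

#ifdef DEMO_DISABLE_LOCKING
/* locking compiled out: the calls become no-ops */
typedef int demo_lock_t;
#define DEMO_LOCK_INITIALIZER 0
#define demo_lock(lock_ptr)   do { (void)(lock_ptr); } while(0)
#define demo_unlock(lock_ptr) do { (void)(lock_ptr); } while(0)
#else
/* locking enabled: map directly onto pthread mutexes */
typedef pthread_mutex_t demo_lock_t;
#define DEMO_LOCK_INITIALIZER PTHREAD_MUTEX_INITIALIZER
#define demo_lock(lock_ptr)   pthread_mutex_lock(lock_ptr)
#define demo_unlock(lock_ptr) pthread_mutex_unlock(lock_ptr)
#endif

/* a global variable and the lock that protects it */
static demo_lock_t counter_lock = DEMO_LOCK_INITIALIZER;
static int shared_counter = 0;

/* increments the shared counter and returns its new value */
int demo_increment_counter(void)
{
    int new_value = -1;

    demo_lock(&counter_lock);
    shared_counter++;
    new_value = shared_counter;
    demo_unlock(&counter_lock);

    return(new_value);
}
\end{verbatim}
Code that uses the wrapper does not change when locking is disabled;
only the compiler flags do.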
\subsubsection{Select vs. poll} | |
Try to avoid using the select system call and use poll in its place.
Poll scales more efficiently because it is passed only the descriptors
of interest, rather than fixed-size bit sets that must be scanned up to
the highest descriptor number. It is also the most direct function call
for accomplishing the desired task on modern Linux kernels because
select is implemented on top of the kernel's poll function.
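
For readers who have only used select, a minimal poll call looks like
the sketch below. The helper name {\tt demo\_wait\_readable} is
hypothetical, and a real caller may also want to retry when
{\tt errno} is EINTR:
\begin{verbatim}
#include <poll.h>

/* waits up to timeout_ms milliseconds for fd to become readable;
 * returns 1 if readable, 0 on timeout, and -1 on error or if only an
 * error or hangup condition was reported for the descriptor
 */
int demo_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret = -1;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if(ret < 0)
    {
        return(-1);
    }
    if(ret == 0)
    {
        return(0);
    }
    if(pfd.revents & POLLIN)
    {
        return(1);
    }
    return(-1);
}
\end{verbatim}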
\subsubsection{String parsing} | |
Be careful about which functions you use when doing simple string
parsing. Some of the functions provided in {\tt string.h} and
{\tt stdlib.h} are dangerous to use, either because they do not return
error values, or because they alter their arguments. Most of these
issues are documented in the man pages. One common example occurs when
an integer value must be read out of a string. In this case, it is
better to use sscanf than atoi:
\begin{verbatim} | |
char number_string[] = "300";
int my_number = -1;
int ret = -1;
/* if you use sscanf, you can check the return value */
ret = sscanf(number_string, "%d", &my_number);
if(ret < 1)
{
    return(-1); /* no integer could be parsed */
}
/* as opposed to atoi, which will not tell you if it fails */
my_number = atoi(number_string);
\end{verbatim} | |
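
Note that sscanf still accepts input such as ``300abc'' and does not
report values that are too large for an int. Where that matters, strtol
allows both cases to be detected. The following is a sketch of one way
to do so; the helper name {\tt demo\_parse\_int} is hypothetical:
\begin{verbatim}
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* returns 0 and stores the value in *result on success, or -1 if
 * the string is not a valid decimal int
 */
int demo_parse_int(const char *string, int *result)
{
    char *end = NULL;
    long value = 0;

    if(string == NULL || result == NULL)
    {
        return(-1);
    }

    errno = 0;
    value = strtol(string, &end, 10);
    if(end == string || *end != '\0')
    {
        /* no digits found, or trailing characters after the number */
        return(-1);
    }
    if(errno == ERANGE || value < INT_MIN || value > INT_MAX)
    {
        /* value does not fit in an int */
        return(-1);
    }

    *result = (int)value;
    return(0);
}
\end{verbatim}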
\subsubsection{Abstraction} | |
When you are designing new interfaces, think carefully about how to
create an abstraction for what you want the interface to do. The
important idea here is to avoid being tied to a particular
implementation below your interface because you made the interface too
restrictive. For example, suppose that you wish to create an interface
for storing and retrieving a large number of independent objects. One
way to implement this may be to use a hash table. However, most
people find it much quicker to get a simple linked list
working. If you abstract the interface correctly, you can implement the
functionality with a linked list for now just to get your program
working and then come back later and plug in a hash table
implementation. This is only possible with a good abstraction, however.
If your first interface has functions such as ``add\_to\_head'' or
``create\_new\_list'' that pass around pointers to lists, then it will of
course be difficult to change this interface to use a hash table. It
would be better to use functions such as ``store\_item'' or
``create\_new\_container'' and use opaque types to keep up with your data
structure.
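
A sketch of such an interface is shown below, with a linked list as the
first implementation. Apart from ``store\_item'' and
``create\_new\_container'', all names ({\tt demo\_container},
{\tt demo\_node}, {\tt retrieve\_item}, {\tt destroy\_container}) and
the use of integer keys are assumptions made for this example:
\begin{verbatim}
#include <stdlib.h>

/* ---- interface (would normally live in a header file) ---- */
struct demo_container;   /* opaque: callers never see the fields */

struct demo_container *create_new_container(void);
int store_item(struct demo_container *c, int key, void *item);
void *retrieve_item(struct demo_container *c, int key);
void destroy_container(struct demo_container *c);

/* ---- implementation (private to a single .c file) ---- */
struct demo_node
{
    int key;
    void *item;
    struct demo_node *next;
};

struct demo_container
{
    struct demo_node *head;
};

struct demo_container *create_new_container(void)
{
    struct demo_container *c = NULL;

    c = (struct demo_container *)malloc(sizeof(struct demo_container));
    if(c != NULL)
    {
        c->head = NULL;
    }
    return(c);
}

/* returns 0 on success, -1 on failure */
int store_item(struct demo_container *c, int key, void *item)
{
    struct demo_node *node = NULL;

    if(c == NULL)
    {
        return(-1);
    }
    node = (struct demo_node *)malloc(sizeof(struct demo_node));
    if(node == NULL)
    {
        return(-1);
    }
    node->key = key;
    node->item = item;
    node->next = c->head;
    c->head = node;
    return(0);
}

/* returns the stored item, or NULL if the key is not present */
void *retrieve_item(struct demo_container *c, int key)
{
    struct demo_node *node = NULL;

    if(c == NULL)
    {
        return(NULL);
    }
    for(node = c->head; node != NULL; node = node->next)
    {
        if(node->key == key)
        {
            return(node->item);
        }
    }
    return(NULL);
}

/* frees the container itself, but not the items stored in it */
void destroy_container(struct demo_container *c)
{
    struct demo_node *node = NULL;
    struct demo_node *next = NULL;

    if(c == NULL)
    {
        return;
    }
    node = c->head;
    while(node != NULL)
    {
        next = node->next;
        free(node);
        node = next;
    }
    free(c);
}
\end{verbatim}
Because callers only hold a pointer to an incomplete type, the list
could later be replaced with a hash table by changing the private half
of this file alone.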
\subsubsection{Function pointers} | |
Function pointers can be useful when creating modular code. They allow | |
you to pick which function will be used to perform a given task at run | |
time rather than compile time. This is not really any harder than | |
manipulating pointers to variables: | |
\begin{verbatim} | |
/* this is the first way to send a message */ | |
int send_message_one(void* data, int size); | |
/* this is the second way to send a message */ | |
int send_message_two(void* data, int size); | |
/* this is a pointer to the preferred method */
int (*send_message_generic)(void*, int) = NULL;
...
if(use_method_one) /* stands for any condition evaluated at run time */
{ | |
send_message_generic = send_message_one; | |
} | |
else | |
{ | |
send_message_generic = send_message_two; | |
} | |
... | |
/* We don't care which method the user chose. We know that it can be | |
* accessed through this function pointer without us modifying our code. | |
*/ | |
send_message_generic(my_data, sizeof(my_data)); | |
\end{verbatim} | |
\subsubsection{Typedefs and opaque types} | |
Choosing appropriate types for objects passed around in your code can be | |
very important in some situations. There are a couple of different | |
issues here: | |
\begin{itemize} | |
\item \textbf{Platform dependence:} Different architectures use a
different number of bytes for some variable types. This means that
it can sometimes be very helpful to explicitly choose the size of
some variables to aid portability. This is especially true if the
data is going to be passed over a network, although there are more
issues (such as big-endian vs. little-endian) to worry about in those
situations. It is often a good idea to use typedefs to create new
type names that have a known, fixed size:
\begin{verbatim}
#include <stdint.h>  /* provides int32_t */
typedef int32_t pvfs_flag_t;
\end{verbatim}
This guarantees that when a pvfs\_flag\_t variable is declared, it
will be a 32 bit integer, regardless of the host architecture.
\item \textbf{Opaque types:} Sometimes you wish to have an | |
interface operate in terms of a specific type. If you are not | |
certain of what type should be used for this purpose in the long term, you can hide it behind a | |
typedef'd opaque type. That way, if you change the type later, you | |
may not have to change every reference to it in the code. You just | |
have to change the initial typedef statement. This can be done for | |
structs or scalar types. | |
\emph{Guess I need an example here...} | |
\end{itemize} | |
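
As one possible illustration of an opaque scalar type, the sketch below
hides an integer handle behind a typedef. The names
{\tt demo\_handle\_t}, {\tt DEMO\_HANDLE\_NULL}, and the two helper
functions are hypothetical and are not taken from PVFS:
\begin{verbatim}
#include <stdint.h>

/* the only line that changes if a different integer type is chosen */
typedef int64_t demo_handle_t;

/* sentinel value meaning "no handle" */
#define DEMO_HANDLE_NULL ((demo_handle_t)-1)

/* returns nonzero if the handle refers to something */
int demo_handle_is_valid(demo_handle_t handle)
{
    return(handle != DEMO_HANDLE_NULL);
}

/* returns nonzero if both handles refer to the same thing */
int demo_handles_equal(demo_handle_t a, demo_handle_t b)
{
    return(a == b);
}
\end{verbatim}
Routing comparisons through helper functions like these means that
calling code would not need to change even if the handle later became a
struct, for which the {\tt ==} operator is not available.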
%======================================================================= | |
\section{Specific PVFS issues} | |
\subsection{Copyright information} | |
\label{sec:pvfs-copyright} | |
Copyright information at the top of source code in PVFS should include | |
the University of Chicago. (The University of Chicago is affiliated | |
with Argonne National Lab, where several key PVFS developers are | |
located). | |
\begin{verbatim} | |
/* | |
* (C) 2001 Clemson University and The University of Chicago. | |
* | |
* See COPYING in top-level directory. | |
*/ | |
\end{verbatim} | |
\subsection{Function commenting} | |
\label{sec:pvfs-comments} | |
\emph{Figure out what standard should be to match up with autodocument | |
tools. This is on hold until we settle on such a tool, so use standard | |
put forth in section \ref{sec:comments} for now.} | |
\subsection{Function naming} | |
\label{sec:pvfs-naming} | |
Interface functions and global variables in PVFS should use a standard | |
naming convention for clarity. Here are a few guidelines: | |
\begin{itemize} | |
\item The letters ``PVFS'' should only be prepended to functions and | |
global variables that exist as part of an application level interface. | |
Some examples would be {\tt PVFS\_open} and {\tt PVFS\_read}. For | |
clarity, do not use this naming scheme for interfaces internal to PVFS. | |
\item Well defined internal PVFS interfaces should use the prefix | |
``PINT'' (this is short for ``PVFS interface''). This should then be | |
followed by an identifier for the interface, and then a description of | |
what the particular function does. Some examples are {\tt | |
PINT\_flow\_alloc}, {\tt PINT\_flow\_free}, and {\tt PINT\_flow\_post}. | |
\item There are exceptions to the above rule. Well defined
interfaces that exist within a very distinct module of PVFS may use a
different prefix. For example, the method functions within the BMI
layer of PVFS communications have names such as {\tt METH\_tcp\_send}
and {\tt METH\_tcp\_recv}.
\item Any variables that are globally visible should follow the rules
listed above as well; this naming convention applies to both functions
and global variables.
\end{itemize} | |
\subsection{Error logging with Gossip} | |
Gossip is a simple library for logging both errors and debugging messages.
It allows you to send logging messages to stderr, syslog, or a
text file.
Gossip uses a {\tt debug mask} to determine which messages get logged.
You may specify a mask level with each debugging call. These messages
can then be toggled on or off depending on what the global mask value
is. This allows you to turn debugging on or off just for specific parts of
your software at run time. The global mask may be made up of several
individual mask values logically OR'd together in order to enable logging for
multiple parts of your software simultaneously.
Gossip also allows you to send error messages. These are similar to | |
debugging messages, except that they get logged regardless of the mask | |
value and whether debugging is turned on or off. These error messages | |
should only be used in situations in which a critical error should be | |
recorded. | |
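
The mask test described above can be illustrated with a small,
self-contained sketch. This is not Gossip source code: the mask names,
the 64 bit mask type, and the {\tt demo\_} functions are all assumptions
chosen only to show how OR'd mask values select which messages are
logged:
\begin{verbatim}
#include <stdint.h>

/* hypothetical mask values, one bit per part of the software */
#define DEMO_DEBUG_NETWORK ((uint64_t)1 << 0)
#define DEMO_DEBUG_STORAGE ((uint64_t)1 << 1)
#define DEMO_DEBUG_REQUEST ((uint64_t)1 << 2)

static int demo_debug_on = 0;
static uint64_t demo_global_mask = 0;

/* turns debugging on or off and records the global mask */
void demo_set_debug_mask(int debug_on, uint64_t mask)
{
    demo_debug_on = debug_on;
    demo_global_mask = mask;
}

/* returns 1 if a debugging message with this mask would be logged */
int demo_debug_enabled(uint64_t message_mask)
{
    if(!demo_debug_on)
    {
        return(0);
    }
    return((demo_global_mask & message_mask) != 0);
}
\end{verbatim}
With this sketch, calling
{\tt demo\_set\_debug\_mask(1, DEMO\_DEBUG\_NETWORK | DEMO\_DEBUG\_REQUEST)}
would enable network and request messages while leaving storage
messages disabled.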
The following is a list of functions provided in the Gossip library: | |
\begin{itemize} | |
\item \textbf{gossip\_enable\_stderr()}: Directs logging messages to | |
stderr. | |
\item \textbf{gossip\_enable\_file(filename, mode)}: Directs logging | |
messages to a specified file. The arguments are the same as those | |
taken by the {\tt fopen()} function. | |
\item \textbf{gossip\_enable\_syslog(priority)}: Directs logging | |
to syslog. The priority argument is the same as that given to the | |
{\tt syslog()} function. | |
\item \textbf{gossip\_set\_debug\_mask(debug\_on, mask)}: Turns | |
debugging messages on or off and specifies the mask value to use if | |
turned on. | |
\item \textbf{gossip\_disable()}: Gracefully shuts down the Gossip | |
logging facilities. | |
\item \textbf{gossip\_debug(mask, format, ...)}: Logs a debugging | |
message. Uses the same format syntax as the {\tt printf()} function | |
call. It will only print if debugging is turned on and the mask | |
value matches the global mask specified with | |
gossip\_set\_debug\_mask(). | |
\item \textbf{gossip\_ldebug(mask, format, ...)}: Same as above, | |
except that it prepends each message with the file name and line | |
number of the source code that invoked it. | |
\item \textbf{gossip\_err(format, ...)}: Logs error messages. These | |
will print regardless of the mask and whether debugging is turned on | |
or off. | |
\item \textbf{gossip\_lerr(format, ...)}: Same as above, except that | |
it prepends each message with the file name and line number of the | |
source code that invoked it. | |
\end{itemize} | |
Examples of how to use Gossip can be found in the {\tt gossip/examples} | |
directory of the Gossip source code. This code can be found in the | |
{\tt pvfs2/src/common/gossip} directory within the PVFS 2 source tree. | |
\subsection{Suggested error handling} | |
\subsubsection{Traditional application error handling with errno} | |
Most unix system calls set a global variable called {\tt errno} when an
error condition occurs. Since this is a global variable, it may be
overwritten by any later system call or library function. This means
that it must be checked immediately following the failure of the system
call in question. The errno values correspond to various error
conditions, such as ``bad file descriptor'' or ``permission denied.''
One can print out a textual description of these error values using the
{\tt perror()} or {\tt strerror()} functions. More information about
the use of {\tt errno} can be found in the man pages for {\tt errno},
{\tt perror}, and {\tt strerror}.
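
A minimal sketch of this pattern is shown below. The helper name
{\tt demo\_check\_readable} is hypothetical; the point is that
{\tt errno} is copied into a local variable before any other call has a
chance to change it:
\begin{verbatim}
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* returns 0 if the file can be opened for reading, or the errno
 * value describing why it could not be
 */
int demo_check_readable(const char *path)
{
    int fd = -1;
    int saved_errno = 0;

    fd = open(path, O_RDONLY);
    if(fd < 0)
    {
        /* save errno first: the fprintf() call below may change it */
        saved_errno = errno;
        fprintf(stderr, "open(%s) failed: %s\n", path,
            strerror(saved_errno));
        return(saved_errno);
    }
    close(fd);
    return(0);
}
\end{verbatim}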
The use of errno in this manner is fine for small applications, but | |
becomes more tedious when building larger software projects. The | |
problem is that you must store the error value somewhere when passing | |
the error back through multiple abstraction layers. This tends to cause | |
confusion in large projects. | |
\end{document} | |