####################################################################
# TODO list for pvfs2 project as a whole
#
#
NOTE: Some (dated) status information can be found in doc/pvfs2-status.tex
improving robustness of I/O apis:
====================================================================
- our internal APIs should be able to handle the following cases:
a) operations posted before initialize() should return error
b) operations posted after finalize() has started should return error
c) finalize() should gracefully terminate pending operations, although
those operations will have undefined results
- these APIs in particular need updating in that regard:
- dbpf-attr-cache DONE
- trove
- bmi
- flow
- job
- request scheduler
- device interface
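
A minimal sketch of cases (a) through (c) above, assuming a single-threaded
component with no locking. The names comp_state, component, comp_initialize,
comp_post and comp_finalize are hypothetical and are not taken from the pvfs2
source; a real component would also need to guard the state with a mutex.

```c
#include <errno.h>

/* Hypothetical lifecycle states for one component (illustration only). */
enum comp_state {
    COMP_UNINITIALIZED,
    COMP_RUNNING,
    COMP_FINALIZING
};

struct component {
    enum comp_state state;
    int pending;            /* operations posted but not yet completed */
};

/* Bring the component up; fails if it is not in the uninitialized state. */
int comp_initialize(struct component *c)
{
    if (c->state != COMP_UNINITIALIZED)
        return -EINVAL;
    c->state = COMP_RUNNING;
    c->pending = 0;
    return 0;
}

/* Cases (a) and (b): a post outside the running window returns an error. */
int comp_post(struct component *c)
{
    if (c->state != COMP_RUNNING)
        return -EINVAL;
    c->pending++;
    return 0;
}

/* Case (c): stop accepting work, drop pending operations, and report
 * how many were cancelled. Their results are undefined to the caller. */
int comp_finalize(struct component *c)
{
    int cancelled;

    if (c->state != COMP_RUNNING)
        return -EINVAL;
    c->state = COMP_FINALIZING;   /* posts are rejected from here on */
    cancelled = c->pending;
    c->pending = 0;
    c->state = COMP_UNINITIALIZED;
    return cancelled;
}
```

The point of the separate COMP_FINALIZING state is that a post racing with
shutdown gets an error instead of touching structures that are being torn down.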
server operations:
====================================================================
- not started:
- eattrib (set/get)
- unfinished:
- general error handling
- performance monitoring (need more metrics)
general server functionality:
====================================================================
- attributes (permissions, etc.) on datafiles
- finishing file system semantics documentation
- don't forget to define semantics for access times
request scheduler:
====================================================================
- more generic implementation
- smarter concurrency rules
system interface functionality:
====================================================================
- not started:
- eattrib (set/get)?
- unfinished:
- thread safety
- way to pass in consistency semantics (timeout values, etc.)
- define how configuration info should be passed in
(how to do paths, fstab, url stuff, whatever)
- define how to pass in distribution and number of datafiles for
cases in which the caller wants to override the defaults
- add nonblocking api for some functions
- clean up API (in particular fstab parsing / initialize path, and removal of
deprecated terminology)
- make input pointer arguments to system interface be declared const
- make sure that system interface functions return an error, rather than
asserting, if the caller tries to operate on a bogus handle (one case occurs
in assertions following PINT_bucket_map_to_server())
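
A rough sketch of the last two items above: input pointers declared const, and
a bogus handle reported through an error code instead of an assertion. The
handle_range table and map_handle_to_server() are hypothetical stand-ins, not
the real PINT_bucket_map_to_server() interface, whose signature is not shown
here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle-range entry; a stand-in for the real mapping table. */
struct handle_range {
    uint64_t first;     /* first handle in the range (inclusive) */
    uint64_t last;      /* last handle in the range (inclusive) */
    int server_id;      /* server that owns the range */
};

/* Look up the server that owns a handle.
 * Returns 0 and sets *server_id on success, -EINVAL for NULL arguments,
 * and -ENOENT when no range covers the handle. It never asserts. */
int map_handle_to_server(const struct handle_range *table, size_t count,
                         uint64_t handle, int *server_id)
{
    size_t i;

    if (table == NULL || server_id == NULL)
        return -EINVAL;

    for (i = 0; i < count; i++) {
        if (handle >= table[i].first && handle <= table[i].last) {
            *server_id = table[i].server_id;
            return 0;
        }
    }
    return -ENOENT;     /* bogus handle: let the caller decide what to do */
}
```

Callers then propagate the negative return value upward, so the behavior is
the same whether or not assertions are compiled in.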
kernel/vfs interface
====================================================================
performance tuning:
====================================================================
- instrumenting
- steal what we can from mpich2
- architecture specific locking, etc.
- thread tuning
- memory allocation cache
- do some benchmarking of thread context switches to help decide
how trove/job/flow interfaces should interact
- figure out how to make i/o faster
request encoding:
====================================================================
- come up with a mechanism for handling requests that go beyond
the BMI defined limit for unexpected messages (mainly an issue
on read/write with complex datatypes, but also potentially a
problem on setattr)
error codes:
====================================================================
- converting to new error code format (everywhere)
- documenting valid error codes from functions
I/O path:
====================================================================
- buffer cache on top of trove
- clean up buffer management in BMI to be more useful for I/O buffer
cache, maybe push to a separate component
- optimizing small reads and writes (packing data into req/ack messages)
- native GM flowprotocol
- general optimizations (lock granularity, immediate completion, etc.)
- ability to unpost, correct use of timeouts, preposting operations
- semantics of short read and write operations
- bmi_tcp scalability and robustness
- ability to toggle synch behavior in trove
- use better buffer size in default flow protocol
- bmi shmem implementation
- many items in BMI and flow TODO files
- ability to compile out device support, or at least prevent device thread
from spawning if not used
- ability to fail over with multiple bmi transports
correctness/performance testing
====================================================================
- a comprehensive test suite of the system interface API
- more pts tests
- profiling code paths
- eliminate memory leaks
- handle server or client failures in a reasonable way (log and exit instead
of segfault, perhaps)
system management utilities
====================================================================
- pvfs2-fsck (serial tool done, evolve into parallel tool)
- decide what we want/need here?
- health monitoring
- system recovery
- system statistics (raid stat, mem used, etc.)
- etc.
- performance monitoring:
- more metrics
- more viz tools
- end user documentation
- better logging systems
- maybe make pvfs2-ping compute a cksum on the fs.conf from all
servers and issue a warning if they don't all match?
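
One possible shape for the fs.conf check above, as a sketch only. It uses the
32-bit FNV-1a hash as a stand-in checksum (the TODO does not say which
checksum to use), and conf_cksum() and count_mismatches() are hypothetical
helper names. Fetching the config text from each server is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a buffer; chosen here only because it is short.
 * Any checksum that all servers compute identically would do. */
uint32_t conf_cksum(const char *data, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Count servers whose checksum differs from server 0's.
 * A result of 0 means every server reported the same config. */
size_t count_mismatches(const uint32_t *sums, size_t nservers)
{
    size_t i, bad = 0;

    for (i = 1; i < nservers; i++) {
        if (sums[i] != sums[0])
            bad++;
    }
    return bad;
}
```

A ping-style tool could collect one checksum per server and print a warning
whenever count_mismatches() returns a nonzero value.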
documentation:
====================================================================
- come up with an automated way to document the wire packet format
- also document headers that bmi tacks on, at least for bmi_tcp
- update the coding guidelines
- document config file options
- automate faq publishing
- mechanism for exporting to html
- update all design docs!
- review
code cleanup:
====================================================================
- remove some of the stuff from the test subdir for "make dist" target
- in particular, test/common (partial), test/io, test/proto, test/server
- put in header file wrappers to make them work with c++
- audit code to make sure that all error paths are handled when
assertions are turned off
- maybe make a checklist for each pvfs2 component to use as we clean
up each section of the code? (items to check for each component
could include stuff like symbol names, PVFS_error code usage,
proper error handling when assertions are off, etc.)
- consistent formatting
- consistent function naming
- consistent header file inclusion
- come up with more named values like TROVE_HANDLE_NULL to use in
other parts of the code
- try to clean up flow / I/O path some, in particular so we don't have
to do so much mallocing to set up from client side
- maybe do things like embed file_data struct in flow desc.
- make permission checking in prelude.sm neater, maybe assert on
unknown op types so we don't forget to add new ones here
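
For the "header file wrappers to make them work with c++" item above, the
usual pattern is an extern "C" guard around the declarations. The header name
and the function example_component_version() below are made up for
illustration; the guard itself is standard.

```c
/* --- would live in a header, e.g. a hypothetical example-component.h --- */
#ifndef EXAMPLE_COMPONENT_H
#define EXAMPLE_COMPONENT_H

#ifdef __cplusplus
extern "C" {            /* give the declarations C linkage under a C++ compiler */
#endif

/* Hypothetical prototype; all of the header's C declarations go here. */
int example_component_version(void);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_COMPONENT_H */

/* --- would live in the matching .c file --- */
int example_component_version(void)
{
    return 1;
}
```

A C compiler never defines __cplusplus, so the guard has no effect on existing
C builds.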
fault tolerance:
=====================================================================
- what does the API look like
- data redundancy
- failover
testing:
=====================================================================
- run common test programs and benchmarks, like:
- flash
- iozone
- dbench
- ior
- bonnie
- make kernel
- mpiiotest
- John May's tests?
- piobench
- more pts tests
- more datatype testing
- remember example of ub < lb
rob's random list:
=====================================================================
- do something about the weird PINT_sys_wait and PINT_mgmt_wait macros in
client-state-machine.h