PVFS2-GLOBAL-TODO.txt

####################################################################
# TODO list for pvfs2 project as a whole
# 
#

NOTE: Some (dated) status information can be found in doc/pvfs2-status.tex

improving robustness of I/O APIs:
====================================================================
- our internal APIs should be able to handle the following cases:
  a) operations posted before initialize() should return an error
  b) operations posted after finalize() has started should return an error
  c) finalize() should gracefully terminate pending operations, although 
     those operations will have undefined results
- these APIs in particular need updating in that regard:
  - dbpf-attr-cache DONE
  - trove
  - bmi
  - flow
  - job
  - request scheduler
  - device interface
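
A minimal sketch of cases (a) through (c) above, assuming a single-threaded
component that tracks its own lifecycle. Every name here (example_state,
example_initialize, example_post, example_finalize) is hypothetical and not
taken from the PVFS2 sources; a real component would also need locking and
its own error codes.

```c
#include <errno.h>

/* Hypothetical lifecycle states; not taken from the PVFS2 sources. */
enum example_state { EX_UNINITIALIZED, EX_RUNNING, EX_FINALIZING };

static enum example_state ex_state = EX_UNINITIALIZED;

/* Move the component into the running state exactly once. */
int example_initialize(void)
{
    if (ex_state != EX_UNINITIALIZED)
        return -EINVAL;
    ex_state = EX_RUNNING;
    return 0;
}

/* Cases (a) and (b): refuse work outside the running window. */
int example_post(int op_id)
{
    (void)op_id; /* a real component would queue the operation here */
    if (ex_state != EX_RUNNING)
        return -EINVAL;
    return 0;
}

/* Case (c): mark the component as finalizing before tearing it down. */
int example_finalize(void)
{
    if (ex_state != EX_RUNNING)
        return -EINVAL;
    ex_state = EX_FINALIZING;
    /* a real component would cancel or drain pending operations here;
     * posts that arrive in the meantime are rejected by example_post() */
    ex_state = EX_UNINITIALIZED;
    return 0;
}
```

The point of the separate EX_FINALIZING state is that posts are rejected as
soon as finalize() starts, not only once it has finished.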

server operations:
====================================================================
- not started:
  - eattrib (set/get)
  
- unfinished:
  - general error handling
  - performance monitoring (need more metrics)

general server functionality:
====================================================================
- attributes (permissions, etc.) on datafiles
- finishing file system semantics documentation
- don't forget to define semantics for access times

request scheduler:
====================================================================
- more generic implementation
- smarter concurrency rules

system interface functionality:
====================================================================
- not started:
  - eattrib (set/get)?

- unfinished:
  - thread safety
  - way to pass in consistency semantics (timeout values, etc.)

- define how configuration info should be passed in
  (how to do paths, fstab, url stuff, whatever)
- define how to pass in distribution and number of datafiles for 
  cases in which the caller wants to override the defaults
- add nonblocking api for some functions
- clean up API (in particular fstab parsing / initialize path, and removal of
  deprecated terminology)
- make input pointer arguments to system interface be declared const
- make sure that system interface functions return an error, rather than
  asserting, if the caller tries to operate on a bogus handle (one case occurs
  in assertions following PINT_bucket_map_to_server())
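
A minimal sketch of the last two items above: const input pointers, and an
error return in place of an assertion when a handle cannot be mapped. The
types and functions (example_handle, example_range, example_map_to_server,
example_sys_op) are hypothetical stand-ins and do not reflect the real
signature of PINT_bucket_map_to_server().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t example_handle; /* hypothetical handle type */

/* Hypothetical table entry: handles first..last live on server_id. */
struct example_range {
    example_handle first;
    example_handle last;
    int server_id;
};

/* Hypothetical lookup. Returns 0 and fills in *server_id on success,
 * or -EINVAL when the handle falls outside every known range. */
int example_map_to_server(const struct example_range *ranges, size_t count,
                          example_handle h, int *server_id)
{
    size_t i;

    if (ranges == NULL || server_id == NULL)
        return -EINVAL;
    for (i = 0; i < count; i++) {
        if (h >= ranges[i].first && h <= ranges[i].last) {
            *server_id = ranges[i].server_id;
            return 0;
        }
    }
    return -EINVAL;
}

/* Caller propagates the failure instead of doing assert(ret == 0). */
int example_sys_op(const struct example_range *ranges, size_t count,
                   example_handle h)
{
    int server_id;
    int ret = example_map_to_server(ranges, count, h, &server_id);

    if (ret < 0)
        return ret;
    (void)server_id; /* a real operation would contact this server */
    return 0;
}
```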

kernel/vfs interface:
====================================================================

performance tuning:
====================================================================
- instrumenting
- steal what we can from mpich2
- architecture specific locking, etc.
- thread tuning
- memory allocation cache
- do some benchmarking of thread context switches to help decide
  how trove/job/flow interfaces should interact
- figure out how to make i/o faster

request encoding:
====================================================================
- come up with a mechanism for handling requests that go beyond
  the BMI defined limit for unexpected messages (mainly an issue
  on read/write with complex datatypes, but also potentially a 
  problem on setattr)

error codes:
====================================================================
- converting to new error code format (everywhere)
- documenting valid error codes from functions

I/O path:
====================================================================
- buffer cache on top of trove
- clean up buffer management in BMI to be more useful for I/O buffer 
  cache, maybe push to a separate component
- optimizing small reads and writes (packing data into req/ack messages)
- native GM flowprotocol
- general optimizations (lock granularity, immediate completion, etc.)
- ability to unpost, correct use of timeouts, preposting operations
- semantics of short read and write operations
- bmi_tcp scalability and robustness
- ability to toggle sync behavior in trove
- use better buffer size in default flow protocol
- bmi shmem implementation
- many items in BMI and flow TODO files
- ability to compile out device support, or at least prevent device thread
  from spawning if not used
- ability to fail over with multiple bmi transports

correctness/performance testing:
====================================================================
- a comprehensive test suite of the system interface API
- more pts tests
- profiling code paths
- eliminate memory leaks
- handle server or client failures in a reasonable way (log and exit instead
  of segfault, perhaps)

system management utilities:
====================================================================
- pvfs2-fsck (serial tool done, evolve into parallel tool)
- decide what we want/need here?
  - health monitoring
  - system recovery
  - system statistics (raid stat, mem used, etc.)
  - etc.
- performance monitoring:
  - more metrics
  - more viz tools
- end user documentation
- better logging systems
- maybe make pvfs2-ping compute a cksum on the fs.conf from all
  servers and issue a warning if they don't all match?
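
A minimal sketch of the pvfs2-ping idea above, assuming the fs.conf text
from each server has already been fetched into memory. The function names
are hypothetical, and the 32-bit FNV-1a hash is only a stand-in for
whatever checksum would actually be chosen.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum (32-bit FNV-1a) over a NUL-terminated config text. */
static uint32_t example_cksum(const char *text)
{
    uint32_t h = 2166136261u;

    while (*text) {
        h ^= (unsigned char)*text++;
        h *= 16777619u;
    }
    return h;
}

/* Compare every server's config against the first one.
 * Returns -1 if all checksums match (or the list is empty), otherwise
 * the index of the first config whose checksum differs from entry 0. */
int example_first_mismatch(const char *const *configs, size_t count)
{
    size_t i;
    uint32_t ref;

    if (count == 0)
        return -1;
    ref = example_cksum(configs[0]);
    for (i = 1; i < count; i++) {
        if (example_cksum(configs[i]) != ref)
            return (int)i;
    }
    return -1;
}
```

A caller could print a warning naming the server at the returned index
whenever the result is not -1.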

documentation:
====================================================================
- come up with an automated way to document the wire packet format
  - also document headers that bmi tacks on, at least for bmi_tcp
- update the coding guidelines
- document config file options
- automate faq publishing
- mechanism for exporting to html
- update all design docs!
- review

code cleanup:
====================================================================
- remove some of the stuff from the test subdir for "make dist" target
  - in particular, test/common (partial), test/io, test/proto, test/server
- put in header file wrappers to make them work with c++
- audit code to make sure that all error paths are handled when
  assertions are turned off
- maybe make a checklist for each pvfs2 component to use as we clean 
  up each section of the code?  (items to check for each component
  could include stuff like symbol names, PVFS_error code usage,
  proper error handling when assertions are off, etc.)
- consistent formatting
- consistent function naming
- consistent header file inclusion
- come up with more named values like TROVE_HANDLE_NULL to use in 
  other parts of the code
- try to clean up flow / I/O path some, in particular so we don't have
  to do so much mallocing to set up from client side
  - maybe do things like embed file_data struct in flow desc.
- make permission checking in prelude.sm neater, maybe assert on 
  unknown op types so we don't forget to add new ones here
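
A minimal sketch of the C++ header wrapper item above, using the standard
extern "C" guard, together with one named constant in the spirit of
TROVE_HANDLE_NULL. The header name, the constant and its value, and the
prototypes are all hypothetical.

```c
#ifndef EXAMPLE_COMPONENT_H
#define EXAMPLE_COMPONENT_H

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical named value; the value 0 is assumed for illustration. */
#define EXAMPLE_HANDLE_NULL 0

/* Hypothetical prototypes; a C++ caller sees them with C linkage. */
int example_component_initialize(void);
int example_component_finalize(void);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_COMPONENT_H */
```

With the guard in place the same header can be included from C and C++
translation units without name-mangling mismatches at link time.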

fault tolerance:
====================================================================
- what does the API look like
- data redundancy
- failover

testing:
====================================================================
- run common test programs and benchmarks, like:
  - flash
  - iozone
  - dbench
  - ior
  - bonnie
  - make kernel
  - mpiiotest
  - John May's tests?
  - piobench
- more pts tests
- more datatype testing
  - remember example of ub < lb

rob's random list:
====================================================================
- do something about the weird PINT_sys_wait and PINT_mgmt_wait macros in
  client-state-machine.h