README.txt
PyTables 0.9.1
http://pytables.sf.net/
Nov 26, 2004
--------------------------------------

PyTables is a hierarchical database package designed to efficiently
manage very large amounts of data.

It is built on top of the HDF5 library and the numarray package. It
features an object-oriented interface that, combined with C extensions
for the performance-critical parts of the code (generated using Pyrex),
makes it a fast, yet extremely easy to use, tool for interactively
saving and retrieving very large amounts of data. One important
feature of PyTables is that it optimizes memory and disk resources so
that data takes much less space (by a factor of 3 to 5, and more if
the data is compressible) than in other solutions, such as relational
or object-oriented databases.

PyTables is not designed to work as a relational database competitor,
but rather as a teammate. If you want to work with large datasets of
multidimensional data (for example, for multidimensional analysis), or
just provide a categorized structure for some portions of your
cluttered RDBS, then give PyTables a try. It works well for
storing data from data acquisition systems (DAS), simulation
software, network data monitoring systems (for example, traffic
measurements of IP packets on routers), or as a centralized repository
for system logs, to name only a few possible uses.

A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type.  The terms
"fixed-length" and strict "data types" may seem like strange
requirements for an interpreted language like Python, but they serve a
useful function if the goal is to save very large quantities of data
(such as is generated by many scientific applications, for example) in
an efficient manner that reduces demand on CPU time and I/O.
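
To see why fixed-length fields help, here is a minimal sketch that
uses only the standard struct module (it does not use PyTables or
HDF5). The record layout and the names RECORD, pack_record and
read_record are made up for this illustration. Because every record
has the same size, record N can be located by simple arithmetic
instead of by scanning the data.

```python
import struct

# Hypothetical record layout: 16-byte name, 32-bit integer, 64-bit float.
# "<" selects little-endian byte order with no padding between fields.
RECORD = struct.Struct("<16sid")


def pack_record(name, counts, energy):
    """Encode one record; struct pads the name with NUL bytes up to 16 bytes."""
    return RECORD.pack(name.encode("ascii"), counts, energy)


def read_record(buffer, index):
    """Decode record number `index` by jumping straight to its byte offset."""
    name, counts, energy = RECORD.unpack_from(buffer, index * RECORD.size)
    return name.rstrip(b"\x00").decode("ascii"), counts, energy


rows = [("alpha", 1, 0.5), ("beta", 2, 1.5), ("gamma", 3, 2.5)]
data = b"".join(pack_record(*row) for row in rows)

# Every record occupies RECORD.size bytes, so the third record starts
# at byte offset 2 * RECORD.size.
print(read_record(data, 2))
```

The same idea, applied at a much larger scale and with proper
buffering, is what keeps the CPU and I/O cost per record low.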

In order to emulate records (C structs in HDF5) in Python, PyTables
implements a special metaclass that detects errors in field
assignments. PyTables also provides an easy to use, yet powerful,
interface to process table data.
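
The following sketch shows, in general terms, how a metaclass can
catch errors in field assignments. It is not the PyTables
implementation: the names Field, RecordMeta, Record and Particle are
assumptions made for this example, and it is written for current
Python versions rather than Python 2.2.

```python
class Field:
    """Hypothetical field declaration: remembers the required Python type."""

    def __init__(self, pytype):
        self.pytype = pytype


class RecordMeta(type):
    """Collect Field declarations into a `_fields` mapping at class creation."""

    def __new__(mcls, name, bases, namespace):
        fields = {}
        for base in bases:
            fields.update(getattr(base, "_fields", {}))
        fields.update({k: v for k, v in namespace.items() if isinstance(v, Field)})
        # Drop the Field objects from the class body so that they do not
        # shadow the values later stored on instances.
        clean = {k: v for k, v in namespace.items() if not isinstance(v, Field)}
        clean["_fields"] = fields
        return super().__new__(mcls, name, bases, clean)


class Record(metaclass=RecordMeta):
    def __setattr__(self, name, value):
        field = self._fields.get(name)
        if field is None:
            raise AttributeError("unknown field %r" % name)
        if not isinstance(value, field.pytype):
            raise TypeError(
                "field %r expects %s, got %s"
                % (name, field.pytype.__name__, type(value).__name__)
            )
        object.__setattr__(self, name, value)


class Particle(Record):
    name = Field(str)
    counts = Field(int)
    energy = Field(float)


p = Particle()
p.name = "muon"
p.counts = 3
p.energy = 105.7
```

With these definitions, an assignment such as p.counts = "three"
raises TypeError, and p.colour = 1 raises AttributeError, so mistakes
surface at the point of assignment rather than when data is written.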

There are other useful objects like arrays, enlargeable arrays or
variable-length arrays that can serve different needs in your
project (see the reference library chapter in the User's Manual:
http://pytables.sourceforge.net/html-doc/usersguide4.html).

Quite a bit of effort has been invested to make browsing the
hierarchical data structure a pleasant experience. PyTables implements
a few easy-to-use methods for browsing. See the documentation (located
in the doc/ directory) for more details.
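
As a rough illustration of what browsing a hierarchy involves, the
sketch below walks a tree of groups and leaves and reports the path
of every leaf. Nested dictionaries stand in for the hierarchy here;
the function walk and the sample tree are assumptions made for this
example and are not the PyTables browsing methods, which are
described in the documentation.

```python
def walk(node, path="/"):
    """Yield (path, leaf) pairs for every leaf below `node`, depth first."""
    for name, child in node.items():
        child_path = path.rstrip("/") + "/" + name
        if isinstance(child, dict):
            # A dictionary plays the role of a group: descend into it.
            yield from walk(child, child_path)
        else:
            # Anything else plays the role of a leaf: report it.
            yield child_path, child


# Hypothetical hierarchy with two top-level entries.
tree = {
    "detector": {
        "readout": [1, 2, 3],
        "calibration": {"gain": [0.9, 1.1]},
    },
    "logs": ["started", "stopped"],
}

paths = [leaf_path for leaf_path, _ in walk(tree)]
for leaf_path in paths:
    print(leaf_path)
```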

One of the principal objectives of PyTables is to be user-friendly.
To that end, the improvements introduced in Python 2.2 (such as
generators, slots and metaclasses in new-style classes) have been
used.  In addition, iterators have been implemented where the context
was appropriate so as to make interactive work as productive as
possible.  Python 2.2 is also required in order to allow PyTables to
make use of Pyrex, a convenient tool to access C libraries from Python
using Python syntax. For these reasons, you will need to use Python
2.2 or higher (Python 2.3.3 or better recommended) to take advantage
of PyTables.
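
The benefit of generators for this kind of work can be sketched with
the standard library alone. The example below is not PyTables code:
the row layout ROW and the function iter_rows are assumptions made
for illustration, and the syntax is that of current Python versions.
Rows are read in chunks and handed out one at a time, so a selection
can run over a large stream without loading all of it into memory.

```python
import io
import struct

# Hypothetical row layout: 32-bit integer id followed by a 64-bit float.
ROW = struct.Struct("<id")


def iter_rows(stream, chunk_rows=1024):
    """Yield (row_id, value) tuples lazily, reading `chunk_rows` rows per call."""
    while True:
        chunk = stream.read(ROW.size * chunk_rows)
        if not chunk:
            return
        # iter_unpack requires the chunk length to be a multiple of
        # ROW.size, which holds when the stream contains whole rows only.
        yield from ROW.iter_unpack(chunk)


# An in-memory stream of ten rows stands in for a large file.
stream = io.BytesIO(b"".join(ROW.pack(i, i * 0.5) for i in range(10)))

# Select the values of the rows with an even id.
selected = [value for row_id, value in iter_rows(stream, chunk_rows=4)
            if row_id % 2 == 0]
print(selected)
```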

To compile PyTables you will need, at least, a recent version of the
HDF5 (C flavor) library, the Zlib compression library and the numarray
package. If you also want support for the LZO and UCL compression
libraries, you will need recent versions of them as well. These two
compression libraries are, however, optional.

I've tested this PyTables version with HDF5 1.6.2 and 1.6.3-patch
and with numarray 1.1, and you *need* to use these versions, or
higher, to make use of PyTables. Although you won't need Numeric
Python in order to compile PyTables, it is supported; you will only
need a reasonably recent version of it (>= 21.x). PyTables has been
successfully tested against Numeric 21.3, 22.0, 23.0 and 23.1.

I'm using Linux on top of Intel as the main development platform, but
PyTables should be easy to compile/install on other UNIX machines.
This package has also been successfully compiled and tested on an
UltraSparc platform with Solaris 7 and Solaris 8, an SGI Origin2000
with MIPS R12000 processors running IRIX 6.5 (with both gcc and
MIPSPro compilers), Microsoft Windows and MacOSX (10.2 although 10.3
should work fine as well). In particular, it has been thoroughly
tested on 64-bit platforms, like Linux-64 on top of an Intel Itanium,
AMD Opteron (in 64-bit mode) or PowerPC G5 (in 64-bit mode) where all
the tests pass successfully.

Nonetheless, caveat emptor: more testing is needed to achieve complete
portability. I'd appreciate input on how it compiles and installs on
other platforms.


Installation
------------

The Python Distutils are used to build and install PyTables, so it is
fairly simple to get things ready to go. Following are very simple
instructions on how to proceed. However, more detailed instructions,
including a section on binary installation for Windows users, are
available in Chapter 2 of the User's Manual (doc/usersguide.pdf or
http://pytables.sourceforge.net/html-doc/usersguide.html).

1. First, make sure that you have HDF5 and numarray installed (you
   will need at least HDF5 1.6.2 and numarray 1.0). If you don't, you
   can find them at http://hdf.ncsa.uiuc.edu/HDF5 and
   http://sourceforge.net/projects/numpy/. Compile/install them.

   Caveat emptor: HDF5 1.6.3 has a bug that causes a segmentation
   fault when deleting a chunked dataset. This bug is exposed in
   PyTables when deleting indexes. If you are having this problem,
   then go to ftp://hdf.ncsa.uiuc.edu/HDF5/hdf5-1.6.3/src/patches and
   download the patched version (1.6.3-patch) that fixes it.

   Optionally, consider installing the excellent LZO and UCL
   compression libraries from http://www.oberhumer.com/opensource/.

2. From the main PyTables distribution directory run this command
   (plus any extra flags you need):

        python setup.py build_ext --inplace

3. To run the test suite change into the test directory, set the
   PYTHONPATH environment variable to include the ".." directory and
   issue the command:
   
        python test_all.py

   If you would like to see some verbose output from the tests simply
   add the flag "-v" and/or the word "verbose" to the command line.
   You can also run just the tests in a particular test module.  
   For example:

        python test_types.py -v

   If any test does not pass, please run the failing test module
   with all verbosity enabled (flags -v verbose), and send the output
   back to me.

   If you run into problems because Python can't load HDF5 or any
   other shared library, see the detailed installation instructions
   in Chapter 2 of the User's Manual.

4. To install the entire PyTables Python package, change back to the
   root distribution directory and run this command as the root user
   (remember to add any extra flags needed):

        python setup.py install
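
If Python cannot load HDF5 or another shared library (see step 3),
a quick first check is whether the system can locate the library at
all. The sketch below uses ctypes.util.find_library from the standard
library of current Python versions; it is not part of PyTables. The
library names "hdf5" and "z" are assumptions that may differ on your
platform, and the lookup only approximates what the dynamic loader
does at import time.

```python
import ctypes.util


def find_shared_libraries(names=("hdf5", "z")):
    """Map each library name to what the system lookup finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


report = find_shared_libraries()
for name, location in report.items():
    print(name, "->", location or "NOT FOUND")
```

A library reported as NOT FOUND is probably installed in a directory
that the system does not search by default.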


That's it!  Good luck, and let me know of any bugs, suggestions, gripes,
kudos, etc. you may have.

-- Francesc Altet
falted@pytables.org
