One of my projects at work involves collecting a large number of small, fixed-length data records. Billions of records per day, less than 50 bytes each. We currently store the data in hourly binary files arranged in a relatively simple directory hierarchy that gives us cheap, fast indexing by data type, collection date/time, and so on.
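To make that concrete, here's a rough sketch of the kind of layout I'm describing. The struct fields, sizes, and path scheme below are made up for illustration; they aren't our actual format:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical record layout -- the real fields differ, but
     * every record is fixed-length and under 50 bytes. */
    #pragma pack(push, 1)
    typedef struct {
        uint32_t timestamp;   /* seconds since epoch */
        uint16_t source_id;   /* which collector produced it */
        uint16_t type;        /* record type */
        uint8_t  payload[40]; /* opaque measurement data */
    } record_t;               /* 48 bytes */
    #pragma pack(pop)

    /* Scan one hourly file, e.g. data/<type>/2006/03/14/09.bin,
     * invoking a callback for each record it contains. */
    static void scan_file(const char *path,
                          void (*cb)(const record_t *))
    {
        FILE *f = fopen(path, "rb");
        record_t rec;
        if (!f)
            return;
        while (fread(&rec, sizeof rec, 1, f) == 1)
            cb(&rec);
        fclose(f);
    }

Since every record is the same size, a file never needs any internal index; seeking to record N is just N * sizeof(record_t).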

This works very well for the command line tools we provide to analyze the data, but some users want a richer query interface. Several of our users have actually gone to the trouble of importing our data into an RDBMS to get more advanced queries, but that approach is obviously only practical for a small portion of the data we collect. Adding some kind of SQL interface to our data has been suggested, and I'm wondering if anyone has been down this road before and could offer some advice.

My boss is currently talking with the folks who sell c-tree, which purports to do some of what we want, but we'd prefer something free and open source, since our tools are released that way. It's also important that adding the SQL layer require as few changes as possible to our current on-disk format. The goal is to support SQL queries of our data as efficiently as possible without using an RDBMS as our storage engine.
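For what it's worth, the kind of efficiency I have in mind is that the SQL layer should exploit the directory hierarchy the same way our command line tools do: a timestamp range in a WHERE clause should translate into a short list of hourly files to scan, rather than a scan of everything. Roughly like this (path scheme hypothetical, as above):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Map a [start, end) time range onto the hourly files that
     * could contain matching records, so a query only touches
     * those. Hypothetical path scheme:
     *   data/<type>/YYYY/MM/DD/HH.bin */
    static void files_for_range(const char *type,
                                time_t start, time_t end)
    {
        /* Round start down to the hour, then step hour by hour. */
        for (time_t t = start - (start % 3600); t < end; t += 3600) {
            struct tm tm;
            char path[256];
            gmtime_r(&t, &tm);
            snprintf(path, sizeof path,
                     "data/%s/%04d/%02d/%02d/%02d.bin",
                     type, tm.tm_year + 1900, tm.tm_mon + 1,
                     tm.tm_mday, tm.tm_hour);
            puts(path);  /* in real code: open and scan this file */
        }
    }

In other words, the directory tree already is the index; whatever provides the SQL front end just needs a way to push predicates on type and time down into file selection.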

Ideas, anyone?
_________________________
- Tony C
my empeg stuff