Quote:
Well, there are some potential intermediate steps, such as implementing some sort of indexing, but you're talking about reimplementing portions of a database engine. But I think you're basically right.

There are libraries that will let us do B-tree indexing with very little fuss, and we've experimented with them. The problem in our case is that with records this small and this numerous, indexing gets a bit crazy: the indexes end up nearly as large as the data they're indexing.

Really, though, this is a bit of a tangent, since our customers don't have a problem with the speed of the current system. What they want is added flexibility on the query side.

Quote:
Tony, have you looked at MySQL's CSV engine? It doesn't actually provide any indexing, though, so it'll still be a big churn, albeit a somewhat automated one.


I don't think that's the right way to go. Our on-disk representation is as compact as possible, whereas CSV is very wasteful. There would also be a ton of conversion back and forth between the textual representations of numbers, timestamps, IP addresses, etc. and the native C representations we use in our tools.

The goal here is to keep the existing record data structure as intact as possible. Adding stuff to each file or each record is acceptable, changing it to some file format that's managed by the database probably is not.

Quote:
Why would you need to partition the data across multiple database files?


Because that is how our collection system currently works, and I'm trying to provide parity with what we have without rewriting all of our tools. As I mentioned above, the hierarchical directory structure, with files broken down at this granularity, gives us a composite index of (type, class, year/month/day, sensor) at virtually no cost, and probably 80% of user queries follow this pattern. This way, we optimize for the most common case by leaning on the filesystem's very efficient data structures.

I had given some thought to SQLite, but my experience with it in the past was that it didn't scale to the level we need. Maybe I'll play around with a scaled-down set of our data to see whether it's gotten better since then. I was hoping for something that wouldn't force us to adopt someone else's storage engine, but if we do have to drink the RDBMS kool-aid, SQLite is probably the least onerous to work with on disk.
_________________________
- Tony C
my empeg stuff