Unofficial empeg BBS

#100928 - 23/06/2002 16:11 Archiving Maildir mail?
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
So here I am, months after a friend showed me the wonders of Maildir and a decent IMAP server, and now some of the wonder is disappearing. Why? Certain folders now take my webmail interface literally minutes to open, and searching for a message from just two weeks ago takes forever. So I need a way to archive my mail. My old method was simply to zip up Eudora, reinstall it, and put my filters and settings back in place. That worked well, assuming I wanted to unzip and run a program any time I needed something out of the archive.

I'd prefer a program that runs monthly and moves all the mail into an archive folder named with the month and year. I'd also like it to preserve the folder structure as it exists at the time. I've seen some promising things, but nothing that does quite everything I need. The other solution would be a script that is time-consuming (for both me and the server), but I assume someone else with Maildir-formatted mail has needed this as well.

And a side question about mail: does anyone know how to get procmail to scan a folder of messages and filter them?

#100929 - 23/06/2002 17:45 Re: Archiving Maildir mail? [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Well, Pine does that for you.

You might also want to consider using an actually decent IMAP server.
_________________________
Bitt Faulk

#100930 - 24/06/2002 08:24 Re: Archiving Maildir mail? [Re: wfaulk]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
I guess I could run Pine manually every month, but I was hoping for something more automated, to avoid being tied to any particular format.

And personally, I prefer the Maildir format, and would rather not switch back to an mbox-style IMAP server. That was even more painful.

#100931 - 24/06/2002 15:24 Re: Archiving Maildir mail? [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Cyrus stores its mail in a proprietary format (it's really just one file per message plus a database file to speed indexing). It's about a million times faster than any single-file message store I've seen. Of course, that prevents you from accessing the mail store directly, but I can't come up with a good reason you'd want to do that anyway.

Let me put it this way. I've got a Cyrus server running on a Pentium 166. I have a particular mailbox with 45,000 messages in it. For a client that hasn't cached any information about it, it takes approximately (hang on, let's see...) 6 minutes to load over a 100Mbps network. Of course, that's loading headers for all 45,000 messages, something I hope your webmail client isn't doing. Loading a given number of messages takes the same amount of time, no matter which set it is. And on-server searching is fast, too: searching for the word "test" in the subject took on the order of ten seconds (and returned about 300 results). And that's all on a Pentium 166 that's doing other stuff, too. It's also running on OpenBSD, which has an mmap bug that slows it down significantly compared to a Solaris or Linux machine.

Which, of course, doesn't help your archiving problem. You could write a little script of some nature that performed these IMAP commands:
. LOGIN username password
. SELECT INBOX
. SEARCH SINCE "1-Jun-2002"
<get message ids from output>
. CREATE June-Archive
. COPY <message ids> June-Archive
. STORE <message ids> FLAGS \Deleted
. EXPUNGE
That should be pretty easy using Expect. There are also IMAP modules for Perl (and probably also for Python, Tcl, Ruby, etc.) that should make that even easier. (My server responded to the above search in a fraction of a second on that 45,000 message mailbox, BTW.)
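As a rough illustration of the same sequence, here is a minimal Python sketch using the standard-library `imaplib` module. It assumes an already logged-in connection; the `archive_month` helper, the `Archive-YYYY-MM` folder name, the use of UIDs instead of sequence numbers, and `+FLAGS` instead of `FLAGS` are illustrative choices, not anything Cyrus requires. It selects by the server's internal (arrival) date with `SINCE`/`BEFORE`, and only flags messages `\Deleted` after the `COPY` has succeeded.

```python
import imaplib
from datetime import date

# IMAP search dates use English month abbreviations regardless of locale.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(day: date) -> str:
    """Format a date as an IMAP search date, e.g. 1-Jun-2002."""
    return f"{day.day}-{_MONTHS[day.month - 1]}-{day.year}"


def archive_month(conn: imaplib.IMAP4, mailbox: str, year: int, month: int) -> int:
    """Move every message in `mailbox` whose internal date falls in the given
    month into a hypothetical 'Archive-YYYY-MM' folder. Returns the count moved."""
    first = date(year, month, 1)
    next_first = date(year + month // 12, month % 12 + 1, 1)
    archive = f"Archive-{year:04d}-{month:02d}"

    typ, _ = conn.select(mailbox)
    if typ != "OK":
        raise RuntimeError(f"cannot select {mailbox}")

    # UID SEARCH SINCE <first> BEFORE <next_first> limits results to that month.
    typ, data = conn.uid("SEARCH", "SINCE", imap_date(first),
                         "BEFORE", imap_date(next_first))
    uids = data[0].split() if typ == "OK" and data and data[0] else []
    if not uids:
        return 0
    uid_set = b",".join(uids).decode()

    conn.create(archive)  # a NO reply usually just means the folder exists
    typ, _ = conn.uid("COPY", uid_set, archive)
    if typ != "OK":
        raise RuntimeError(f"COPY to {archive} failed; nothing deleted")
    conn.uid("STORE", uid_set, "+FLAGS", r"(\Deleted)")
    conn.expunge()
    return len(uids)
```

Against a real server this might be driven as `conn = imaplib.IMAP4_SSL("mail.example.com")`, `conn.login(user, password)`, `archive_month(conn, "INBOX", 2002, 6)`, where the hostname is hypothetical. Note that `EXPUNGE` also removes anything else already flagged `\Deleted` in that folder.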
_________________________
Bitt Faulk

#100932 - 24/06/2002 17:29 Re: Archiving Maildir mail? [Re: wfaulk]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Hmm, OK, Cyrus seems to use something similar to the standard Maildir format, just in a proprietary way. Can it still store mail in home directories? My root directory is rather small because everything goes into users' home directories.

Also, how does procmail work with it? For me, procmail simply manipulates the Maildir file structure to put the mail in the right mailbox. And what exactly is Cyrus? Just the IMAP server, or does it replace Sendmail, for example?

#100933 - 24/06/2002 19:36 Re: Archiving Maildir mail? [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
It uses something similar to the MH format, where each message is contained in its own file, plus a database. Maildir, unless I'm mistaken, is still a single file per mailbox.

Cyrus is just the IMAP server. It does not replace your MTA (sendmail, qmail, postfix, exim, etc.). It does not write data into home directories, as all messages are stored in Cyrus's own store, but you can specify the directory that's in (so you could put it in /home/cyrus if you wanted), and you can actually have multiple stores, but I've never had a server quite large enough to bother dealing with that.

It doesn't work with procmail by default (in fact, I'll bet in your case it will take procmail's place, since you're probably using procmail as your MDA). However, it provides a facility called Sieve that performs the same type of filtering. In addition, you can usually set up your MTA to deliver directly into specific mailboxes based on extended email addresses. For example, the address [email protected] could be extended to [email protected] to have mail delivered directly to the user's "box" maildrop, assuming the permissions on it were set up correctly. Which brings me to the fact that you can set permissions on your mailboxes so that other IMAP users can use them, including mailboxes not owned by any user that can be used as bulletin boards.
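To make the extended-address idea concrete, here is a hypothetical Python helper that splits a local part such as `user+box` into a user and a target folder. The `+` separator, the fallback to `INBOX`, and the Cyrus-style `user.<name>.<folder>` mailbox naming are assumptions for the sketch; whether delivery actually succeeds still depends on the permissions on that mailbox.

```python
def split_extended_address(address: str, separator: str = "+") -> tuple[str, str]:
    """Split 'user+box@host' into ('user', 'box'); plain 'user@host' maps to INBOX."""
    local, _, _host = address.partition("@")
    user, sep, folder = local.partition(separator)
    # An empty extension ('user+@host') is treated like no extension at all.
    return user, (folder if sep and folder else "INBOX")


def cyrus_mailbox(user: str, folder: str) -> str:
    """Hypothetical mapping to a Cyrus-style name using '.' as the separator."""
    return f"user.{user}" if folder == "INBOX" else f"user.{user}.{folder}"
```

For example, `cyrus_mailbox(*split_extended_address("alice+lists@example.com"))` yields `user.alice.lists`, while a plain address lands in the user's inbox.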

It is, by far, IMHO, the best IMAP server out there, and has been since its introduction. It is somewhat complicated to set up, especially initial user administration (since Cyrus users are not necessarily tied to Unix users, which is nice, since that means fewer potential security holes), but it's worth it.
_________________________
Bitt Faulk

#100934 - 24/06/2002 23:18 Re: Archiving Maildir mail? [Re: wfaulk]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Maildir, unless I'm mistaken, is still a single file per mailbox.

Nope, with Maildir each message is a file, and each mailbox is three folders: cur, new, and tmp. new is where mail is delivered, cur is where it's moved when a mail program touches it (not reads it, just touches it), and tmp is self-explanatory. There are several more rules in the definition, like the filesystem date being the date the message arrived, but that pretty much covers it.
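Given that layout, a monthly archiver can work directly on the files without involving the IMAP server. The following Python function is a hypothetical sketch, not an existing tool. It treats each message file's modification time as its arrival date, assumes subfolders are Maildir++-style directories (such as `.Sent`) sitting directly under the top-level Maildir, and writes into a separate `archive_root/YYYY-MM/<folder>` tree of ordinary Maildirs, which an IMAP server would not see unless configured to. It leaves `tmp` alone, since that holds deliveries still in progress.

```python
import os
import shutil
import time

SUBDIRS = ("cur", "new", "tmp")


def is_maildir(path: str) -> bool:
    """A directory counts as a Maildir if it has cur/, new/ and tmp/."""
    return all(os.path.isdir(os.path.join(path, d)) for d in SUBDIRS)


def archive_maildir(maildir: str, archive_root: str, before: float) -> int:
    """Move messages older than `before` (a Unix timestamp) out of `maildir`
    into archive_root/YYYY-MM/<folder>/{cur,new}. Returns the count moved."""
    # "" is the top-level inbox; Maildir++-style subfolders (e.g. ".Sent")
    # are assumed to sit directly underneath it.
    folders = [""] + sorted(
        name for name in os.listdir(maildir)
        if is_maildir(os.path.join(maildir, name)))
    moved = 0
    for folder in folders:
        for sub in ("cur", "new"):  # tmp holds deliveries in progress: skip it
            src_dir = os.path.join(maildir, folder, sub)
            for name in os.listdir(src_dir):
                src = os.path.join(src_dir, name)
                mtime = os.path.getmtime(src)  # treated as the arrival date
                if mtime >= before:
                    continue
                month = time.strftime("%Y-%m", time.localtime(mtime))
                dest = os.path.join(archive_root, month, folder or "INBOX")
                for d in SUBDIRS:  # keep each archive folder a valid Maildir
                    os.makedirs(os.path.join(dest, d), exist_ok=True)
                # Same filename, so Maildir flags like ":2,S" survive; shutil.move
                # falls back to copy-with-metadata across filesystems.
                shutil.move(src, os.path.join(dest, sub, name))
                moved += 1
    return moved
```

Run from cron once a month with a cutoff such as `time.mktime((2002, 6, 1, 0, 0, 0, 0, 0, -1))`, it would sweep everything that arrived before 1 June 2002 into per-month folders while keeping the original folder names.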

in fact, I'll bet in your case it will take procmail's place, since you're probably using procmail as your MDA

Hmm, OK. My question, then, is whether there's a really good spam filter for it. The only reason I'm tied to procmail is the filter from www.spambouncer.org. It works really well and saves me and my other users a ton of time. If I can replace it easily, I'll be happy to dump procmail, as I find it a pain for anything else. I was dreading the web interface I was thinking of creating for it.

Ok, so far you have convinced me to convert as long as you can answer two questions.

1. Can I install it alongside another IMAP server and change its port for testing and such? I want to try it without disturbing the existing setup. Plus, the existing setup will probably be phased out slowly, with a complete switch when I reload the box on new hardware later this year.

2. What about archiving? I still want an easy way to do this, and working with a proprietary mailbox format could complicate things.

#100935 - 25/06/2002 12:32 Re: Archiving Maildir mail? [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Spam is usually dealt with at the MTA level. And, honestly, I usually don't deal with that sort of thing (my personal feeling is that one mistakenly flagged message isn't worth any number of correctly flagged ones). However, I cannot imagine anything that procmail could do that Sieve couldn't (but I'm not an expert at either one, so I could be wrong). Check out more information at http://www.cyrusoft.com/sieve/

You can certainly install it alongside another IMAP server, and since it won't be using the same store, you won't even have any concurrency issues. However, you'll have a minor problem getting mail to it. It's easily solvable, but you'll need to figure out the best method for your testing (certain addresses delivered there, a separate SMTP server for it, etc.).

I think the best way to go about archiving is to set up a script that moves all the mail around using IMAP commands, like I suggested above. In fact, if you're looking to do this for multiple users, you could set up permissions on everyone's mailboxes so that a particular administrative pseudo-user has access to do this for everyone. Automating it should be trivial.
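The permissions step could look roughly like this with Python's `imaplib`, which exposes `setacl()` for the IMAP ACL extension. The pseudo-user name `archiver`, the `user.<name>` mailbox naming, and the rights string `lrsicd` (lookup, read, keep seen state, insert, create, delete, as defined in RFC 2086) are assumptions for the sketch. It would have to run as an account allowed to administer those mailboxes, and existing subfolders may need the same call.

```python
import imaplib

# RFC 2086 rights: l=lookup, r=read, s=keep seen state, i=insert (COPY target),
# c=create (the archive folder), d=delete (STORE \Deleted and EXPUNGE).
ARCHIVER_RIGHTS = "lrsicd"


def grant_archiver(conn: imaplib.IMAP4, users: list[str],
                   archiver: str = "archiver") -> list[str]:
    """Grant a hypothetical 'archiver' pseudo-user rights on each user's inbox.
    Returns the mailboxes where the server refused the SETACL."""
    refused = []
    for user in users:
        mailbox = f"user.{user}"  # assumed Cyrus naming with '.' as separator
        typ, _ = conn.setacl(mailbox, archiver, ARCHIVER_RIGHTS)
        if typ != "OK":
            refused.append(mailbox)
    return refused
```

Once the grants are in place, the archiving job can log in as that single pseudo-user and loop over everyone's mailboxes, rather than needing each user's password.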
_________________________
Bitt Faulk

#100936 - 02/07/2002 15:35 Re: Archiving Maildir mail? [Re: wfaulk]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868

Spam is usually dealt with at the MTA level. And, honestly, I usually don't deal with that sort of thing (my personal feeling is that one mistakenly flagged message isn't worth any number of correctly flagged ones). However, I cannot imagine anything that procmail could do that Sieve couldn't (but I'm not an expert at either one, so I could be wrong).


Well, I agree about the mistakenly flagged part, but I have my spam configuration set to never delete messages or reply automatically. Basically, the Spambouncer package assigns a score to each e-mail: the higher the score, the more likely it's spam. So I have four basic folders: Inbox, bulk, blocked, and spam. The spam folder gets the highest-scoring messages, and I usually get hit with 1-3 a day there. Very few of the messages that land there are actually legitimate. Blocked is where lower-scoring ones go, and I get about 20 or so new messages there. These do contain a legitimate message here and there, but still a small percentage. Bulk gets any mail not addressed directly to me (based on listing all my addresses in the config), and Inbox gets the rest. It works out well, because I don't miss people's e-mail even if it gets filtered wrongly. I just visit the spam folders a few times a week, scan the subjects for anything I'm expecting, and trash the rest.

Legitimate messages get filtered before the scoring system runs, saving both CPU time and my time, since they typically go to their own folder. (The CPU savings come from skipping the massive number of filters and the DNS lookups it does against blacklist servers.)
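The routing described above can be modeled as a small Python decision function. This is only a model of the behavior, not Spambouncer's procmail recipes: the threshold numbers are placeholders, the score is passed in as a callable so it is computed only for mail that isn't whitelisted (mirroring the CPU saving), and checking the score bands before the bulk rule is an assumed ordering.

```python
from typing import Callable

# Placeholder cut-offs for illustration; not Spambouncer's actual values.
SPAM_THRESHOLD = 20
BLOCKED_THRESHOLD = 10


def choose_folder(sender: str, recipients: list[str], score: Callable[[], int],
                  whitelist: dict[str, str], my_addresses: set[str]) -> str:
    """Pick a destination folder; addresses are compared in lower case."""
    # 1. Known-good senders are filed first, so the expensive scoring step
    #    (many filters plus blacklist DNS lookups) never runs for them.
    if sender.lower() in whitelist:
        return whitelist[sender.lower()]
    # 2. Score bands: the highest scores go to spam, lower ones to blocked.
    points = score()
    if points >= SPAM_THRESHOLD:
        return "spam"
    if points >= BLOCKED_THRESHOLD:
        return "blocked"
    # 3. Mail not addressed to any of my addresses goes to bulk.
    if not my_addresses & {r.lower() for r in recipients}:
        return "bulk"
    return "Inbox"
```

Because nothing is ever deleted, a misjudged message just ends up in a folder that still gets skimmed a few times a week.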

Replacing procmail with Sieve would be a pain, because Spambouncer is 477k worth of procmail scripts, typically updated at least once a month.

I'll play around with the IMAP server you're recommending on a spare machine. I just don't see the benefits outweighing the work involved, considering it still doesn't solve my desire to archive mail easily.
