It always makes sense to hang onto a good selection of spam messages, just in case you need to move to another system in the future, or in case your current system craps out and corrupts its database or something. It is much easier to get up and running with any of these Bayesian filters if you have spam and non-spam to train them on from the start.

I keep all my spam and non-spam. Anything that ends up in my spam folder gets archived away each day, so that it sits outside my mailboxes and doesn't slow things down. Any mail that I put in the trash also gets archived away as non-spam (I never put spam in the trash folder).
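For anyone wanting to do something similar, here is a rough sketch of that daily archiving step in Python. It is not the script I actually run. It assumes each message is a separate file in the folder (maildir style), and the function name `archive_folder`, the `label` naming scheme and the `max_bytes` cut-off are all made up for the example.

```python
import zipfile
from datetime import date
from pathlib import Path


def archive_folder(folder, archive_dir, label, max_bytes=None, today=None):
    """Move every message file in `folder` into a dated zip archive.

    Assumption: one message per file. Messages larger than `max_bytes`
    (a hypothetical cut-off) are deleted without being archived.
    Returns the archive path and the names of the archived messages.
    """
    folder = Path(folder)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    stamp = (today or date.today()).isoformat()
    target = archive_dir / f"{label}-{stamp}.zip"

    archived = []
    # Mode "a" creates the archive if it does not exist yet.
    with zipfile.ZipFile(target, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        for msg in sorted(p for p in folder.iterdir() if p.is_file()):
            if max_bytes is None or msg.stat().st_size <= max_bytes:
                zf.write(msg, arcname=msg.name)
                archived.append(msg.name)
            # Remove the original either way so the mail folder stays small.
            msg.unlink()
    return target, archived
```

Run once a day from cron or similar, for example `archive_folder("Mail/spam", "archive", "spam")` for the spam folder and the same call with `label="trash"` and a `max_bytes` limit for the trash. The paths here are placeholders.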

I then have a bunch of data that I can feed to BogoFilter every nine months or so, when it comes time to rebuild my spam/non-spam databases because the developers have come up with a better way to analyse the data. I feed in the current contents of my filter folder and my archived trash as non-spam, and my archived spam as spam.
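The rebuild itself can be sketched roughly like this. It assumes the archives have been unpacked into directories with one message per file, and it registers each message by piping it to `bogofilter -s` (spam) or `bogofilter -n` (non-spam). The helper names `training_plan` and `retrain` are invented for the example, and it is a slow one-process-per-message approach rather than anything clever.

```python
import subprocess
from pathlib import Path


def training_plan(spam_dirs, ham_dirs):
    """Pair every message file with the bogofilter flag it should be fed with.

    "-s" registers a message as spam, "-n" as non-spam.
    Assumption: each directory holds one message per file.
    """
    plan = []
    for flag, dirs in (("-s", spam_dirs), ("-n", ham_dirs)):
        for d in dirs:
            for msg in sorted(Path(d).iterdir()):
                if msg.is_file():
                    plan.append((flag, msg))
    return plan


def retrain(plan, bogofilter="bogofilter"):
    """Feed each message to bogofilter on stdin, one process per message."""
    for flag, msg in plan:
        with open(msg, "rb") as fh:
            subprocess.run([bogofilter, flag], stdin=fh, check=True)
```

With placeholder directory names, my setup would look something like `retrain(training_plan(["unpacked/spam"], ["Mail/filter", "unpacked/trash"]))`: archived spam goes in as spam, and the filter folder plus archived trash go in as non-spam.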

That comes to 150MB of zipped spam so far and 60MB of zipped trash (if I trash something big, it doesn't get archived).

I really don't know how I managed before Bayesian filtering came along.
_________________________
Remind me to change my signature to something more interesting someday