A mildly crufty script to slurp mail from an mbox into a Django model. I use a variant of this script to pull the contents of my scammy-spam mbox into the database displayed at http://purportal.com/spam/
#!/usr/bin/env python
"""
This code presumes this model:

class Message(models.Model):
    subject = models.CharField(maxlength=250)
    date = models.DateField()
    body = models.TextField()
    raw = models.TextField()
"""
import os, mailbox, email, datetime, shutil, sys
from email.Utils import parsedate  # change to email.utils for Python 2.5

# The settings module must be set before any Django models are imported
os.environ["DJANGO_SETTINGS_MODULE"] = "YOURPROJECT.settings"
# set sys.path as needed

from models import Message
from MySQLdb import OperationalError

MAILBOX = '/path/to/mbox'

mbox = file(MAILBOX, 'rb')
for message in mailbox.PortableUnixMailbox(mbox, email.message_from_file):
    try:
        date = datetime.datetime(*parsedate(message['date'])[:6])
    except (TypeError, ValueError):
        # badly-formed or out-of-range date: fall back to the current time
        date = datetime.datetime.now()

    try:
        msg = Message(
            subject=message['subject'],
            date=date,
            body=message.get_payload(decode=False),
            raw=message.as_string(),
        )
        print "Adding: %s..." % msg.subject[:40]
        msg.save()
    except OperationalError:
        print "Trouble parsing message (%s...)" % msg.subject[:40]

print "Archive now contains %s messages" % Message.objects.count()

# Depending on your application, you might clear the mbox now:
# open(MAILBOX, "w").write("")
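For readers on Python 3, where `file()`, the `print` statement and `mailbox.PortableUnixMailbox` are no longer available, the same loop can be sketched with `mailbox.mbox`. This is only a rough equivalent and not the script that feeds the site: the helper names `parse_date` and `read_mbox` are made up for illustration, the Django part is left out, and each message comes back as a plain dict whose keys match the model fields above.

```python
import datetime
import mailbox
from email.utils import parsedate


def parse_date(header_value, fallback):
    """Turn a Date header into a datetime, or return fallback if it is unusable."""
    parts = parsedate(str(header_value)) if header_value else None
    if parts is None:
        # Missing or unparseable header
        return fallback
    try:
        return datetime.datetime(*parts[:6])
    except ValueError:
        # Parsed, but with out-of-range fields (e.g. month 13)
        return fallback


def read_mbox(path):
    """Yield one dict per message, keyed like the Message model fields."""
    now = datetime.datetime.now()
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            yield {
                "subject": str(message["subject"] or ""),
                "date": parse_date(message["date"], now),
                # As in the original script, a multipart message yields a list here
                "body": message.get_payload(decode=False),
                "raw": message.as_string(),
            }
    finally:
        box.close()
```

Each dict could then be handed to the model, for example `Message(**row).save()`, assuming the field names are unchanged.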
Comments
Great idea! But it seems to break one of Python's main strengths: one language, many platforms (IMHO, of course :). This will be usable only on Unix-like OSes. An option for me is to connect over POP3 and fetch all the data from there.
#
Yes, it's definitely Unix-specific. I'll leave writing a generalized version as an exercise for the reader, since it would likely quadruple in size (and not be any more useful to me personally!).
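For anyone who wants to try the POP3 route, a minimal sketch with the standard-library `poplib` might look like the following. It is written for Python 3 and is not part of the original script: `message_from_retr` and `fetch_pop3` are hypothetical helper names, the server is assumed to accept SSL connections, and nothing is deleted from the mailbox.

```python
import email
import poplib


def message_from_retr(lines):
    """Build a Message from the list of byte lines that POP3 RETR returns."""
    return email.message_from_bytes(b"\n".join(lines))


def fetch_pop3(host, user, password):
    """Return every message on the server, parsed; leaves the mailbox untouched."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        # POP3 message numbers start at 1; retr() returns (response, lines, octets)
        return [message_from_retr(conn.retr(i)[1]) for i in range(1, count + 1)]
    finally:
        conn.quit()
```

The returned messages support the same `['subject']`, `['date']`, `get_payload()` and `as_string()` calls used in the script, so the rest of the loop would not need to change. A call would look like `fetch_pop3("pop.example.com", "user", "secret")`, with placeholder values.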
#
Oh, and it also has a gratuitous hardcoded reference to MySQLdb, simply because the DB was frequently barfing on bad dates (the mail I'm processing with this is spam). One could certainly find a way to catch that without resorting to a DB-specific reference.
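One way to drop the driver-specific import, sketched here under assumptions rather than taken from the script: DB-API 2.0 drivers expose a common exception hierarchy rooted at the module's `Error` class, so the class to catch can be passed in instead of hardcoded. The example uses `sqlite3` only because it ships with Python; `save_rows`, the `message` table and the `?` placeholder style are illustrative. In Django code, later releases also provide a backend-neutral `django.db.DatabaseError` that could be caught around `msg.save()`.

```python
import sqlite3


def save_rows(conn, rows, db_error):
    """Insert (subject, date) rows one by one, skipping any the database rejects.

    db_error is the exception class that means "this row is bad", for example
    the driver module's Error class. Returns the subjects that were skipped.
    """
    skipped = []
    cur = conn.cursor()
    for subject, date in rows:
        try:
            cur.execute(
                "INSERT INTO message (subject, date) VALUES (?, ?)",
                (subject, date),
            )
        except db_error:
            skipped.append(subject)
    conn.commit()
    return skipped


# Demo against an in-memory database; the second row violates NOT NULL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (subject TEXT, date TEXT NOT NULL)")
skipped = save_rows(
    conn,
    [("ok", "2007-03-01"), ("bad date", None)],
    sqlite3.Error,
)
print(skipped)
```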
#