This is form field for PDF or Microsoft Word Document (both .doc and .docx) It will validate the file uploaded as a valid PDF and MS Word Document.
It extends a forms.FileField, so you can put all the arguments relevant to FileField.
IMPORTANT NOTE: The method of validation is actually run thru *nix OS shell command 'file', therefore,
- only *nix system can use this class.
- The file uploaded must be saved on disk, meaning you need to set your upload handler to use TempoaryFileUploadHandler Only.
(i.e. put this in your settings.py)
FILE_UPLOAD_HANDLERS = ( "django.core.files.uploadhandler.TemporaryFileUploadHandler", )
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | from django import forms
from django.utils.translation import ugettext_lazy as _
class UploadedFileInMemoryError(Exception):
pass
class DocField(forms.FileField):
"""
This is form field for PDF or Microsoft Word Document (both .doc and .docx)
It will validate the file uploaded as a valid PDF and MS Word Document.
~~~~~~~~~
Usage:
import DocField
doc = models.DocField()
~~~~~~~~~
It extends a forms.FileField, so you can put all the arguments relevant to FileField.
IMPORTANT NOTE: The method of validation is actually run thru *nix OS shell command 'file',
therefore, 1. only *nix system can use this class.
2. The file uploaded must be saved on disk, meaning you need to set your upload handler to use TempoaryFileUploadHandler Only.
# (i.e. put this in your settings.py)
FILE_UPLOAD_HANDLERS = (
"django.core.files.uploadhandler.TemporaryFileUploadHandler",
)
"""
default_error_messages = {
'invalid': _(u"No file was submitted. Check the encoding type on the form."),
'missing': _(u"No file was submitted."),
'empty': _(u"The submitted file is empty."),
'not_doc': _(u"Upload a valid document. The file you uploaded was not a acceptable document or a corrupted document."),
}
def clean(self, data, initial=None):
super(DocField, self).clean(initial or data)
#before save check if the writing sample is valid
import os, re
from django.forms.util import ValidationError
match = r'PDF document|Microsoft Office Document|Zip archive data'
if hasattr(data, 'temporary_file_path'):
file = data.temporary_file_path()
else:
# throw an error because uploaded file in memory
raise UploadedFileInMemoryError('The file uploaded is stored in memory instead of disk and the validation cannot be performed.')
out = os.popen('file %s' % file)
ck = re.search(match, out.read())
if ck == None:
raise ValidationError(self.error_messages['not_doc'])
# check further for docx file as it's zip file
if ck.group(0)[0] == 'Z':
import zipfile
docx = 'word/document.xml'
if not zipfile.is_zipfile(file):
raise ValidationError(self.error_messages['not_doc'])
zf = zipfile.ZipFile(file)
if not docx in zf.namelist():
raise ValidationError(self.error_messages['not_doc'])
return data
|
More like this
- Template tag - list punctuation for a list of items by shapiromatron 3 months, 2 weeks ago
- JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 3 months, 3 weeks ago
- Serializer factory with Django Rest Framework by julio 10 months, 2 weeks ago
- Image compression before saving the new model / work with JPG, PNG by Schleidens 11 months, 1 week ago
- Help text hyperlinks by sa2812 12 months ago
Comments
In an effort to make this snippet portable, wouldn't it make more sense to look for the magic pattern in the file to identify it as a PDF or doc file?
For example, with PDF, make sure the file begins with "%PDF-" For a Word document, check that it being with "\x31\xbe\x00\x00" or "PO^Q`"
Those are merely taken from a /etc/gnome-vfg-mime-magic file on a Fedora 8 box, but given that specifications for both file formats are now openly available, I'm sure you can verify the true file magic necessary to identify these files.
#
Older Office documents are identified by:
So that should be added to the list.
#
Please login first before commenting.