Improved Pickled Object Field

# --------------------------------------- fields.py  --------------------------------------- #

from copy import deepcopy
from base64 import b64encode, b64decode
from zlib import compress, decompress
try:
    from cPickle import loads, dumps
except ImportError:
    from pickle import loads, dumps

from django.db import models
from django.utils.encoding import force_unicode

class PickledObject(str):
    """
    A subclass of string so it can be told whether a string is a pickled
    object or not (if the object is an instance of this class then it must
    [well, should] be a pickled one).
    
    Only really useful for passing pre-encoded values to ``default``
    with ``dbsafe_encode``, not that doing so is necessary. If you
    remove PickledObject and its references, you won't be able to pass
    in pre-encoded values anymore, but you can always just pass in the
    python objects themselves.
    
    """
    pass

def dbsafe_encode(value, compress_object=False, pickle_protocol=2):
    """
    We use deepcopy() here to avoid a problem with cPickle, where dumps
    can generate different character streams for the same lookup value
    if they are referenced differently.
    
    The reason this is important is because we do all of our lookups as
    simple string matches, thus the character streams must be the same
    for the lookups to work properly. See tests.py for more information.
    
    The pickle protocol is passed to dumps explicitly (2 by default) so
    that the stream does not depend on the interpreter's default protocol.
    """
    pickled = dumps(deepcopy(value), pickle_protocol)
    if compress_object:
        pickled = compress(pickled)
    return PickledObject(b64encode(pickled))

def dbsafe_decode(value, compress_object=False):
    if not compress_object:
        value = loads(b64decode(value))
    else:
        value = loads(decompress(b64decode(value)))
    return value

class PickledObjectField(models.Field):
    """
    A field that will accept *any* python object and store it in the
    database. PickledObjectField will optionally compress its values if
    declared with the keyword argument ``compress=True``.
    
    Does not actually encode and compress ``None`` objects (although you
    can still do lookups using None). This way, it is still possible to
    use the ``isnull`` lookup type correctly. Because of this, the field
    defaults to ``null=True``, as otherwise it wouldn't be able to store
    None values since they aren't pickled and encoded.
    
    """
    __metaclass__ = models.SubfieldBase
    
    def __init__(self, *args, **kwargs):
        self.compress = kwargs.pop('compress', False)
        self.protocol = kwargs.pop('protocol', 2)
        kwargs.setdefault('null', True)
        kwargs.setdefault('editable', False)
        super(PickledObjectField, self).__init__(*args, **kwargs)
    
    def get_default(self):
        """
        Returns the default value for this field.
        
        The default implementation on models.Field calls force_unicode
        on the default, which means you can't set arbitrary Python
        objects as the default. To fix this, we just return the value
        without calling force_unicode on it. Note that if you set a
        callable as a default, the field will still call it. It will
        *not* try to pickle and encode it.
        
        """
        if self.has_default():
            if callable(self.default):
                return self.default()
            return self.default
        # If the field doesn't have a default, then we punt to models.Field.
        return super(PickledObjectField, self).get_default()

    def to_python(self, value):
        """
        B64decode and unpickle the object, optionally decompressing it.
        
        If an error is raised in de-pickling and we're sure the value is
        a definite pickle, the error is allowed to propagate. If we
        aren't sure if the value is a pickle or not, then we catch the
        error and return the original value instead.
        
        """
        if value is not None:
            try:
                value = dbsafe_decode(value, self.compress)
            except Exception:
                # If the value is a definite pickle and an error is raised in
                # de-pickling, it should be allowed to propagate.
                if isinstance(value, PickledObject):
                    raise
        return value

    def get_db_prep_value(self, value):
        """
        Pickle and b64encode the object, optionally compressing it.
        
        The pickling protocol is specified explicitly (by default 2),
        rather than as -1 or HIGHEST_PROTOCOL, because we don't want the
        protocol to change over time. If it did, ``exact`` and ``in``
        lookups would likely fail, since pickle would now be generating
        a different string. 
        
        """
        if value is not None and not isinstance(value, PickledObject):
            # We call force_unicode here explicitly, so that the encoded string
            # isn't rejected by the postgresql_psycopg2 backend. Alternatively,
            # we could have just registered PickledObject with the psycopg
            # marshaller (telling it to store it like it would a string), but
            # since both of these methods result in the same value being stored,
            # doing things this way is much easier.
            value = force_unicode(dbsafe_encode(value, self.compress))
        return value

    def value_to_string(self, obj):
        value = self._get_val_from_obj(obj)
        return self.get_db_prep_value(value)

    def get_internal_type(self): 
        return 'TextField'
    
    def get_db_prep_lookup(self, lookup_type, value):
        if lookup_type not in ['exact', 'in', 'isnull']:
            raise TypeError('Lookup type %s is not supported.' % lookup_type)
        # The Field model already calls get_db_prep_value before doing the
        # actual lookup, so all we need to do is limit the lookup types.
        return super(PickledObjectField, self).get_db_prep_lookup(lookup_type, value)

# --------------------------------------- tests.py  --------------------------------------- #

"""Unit testing for this module."""

from django.test import TestCase
from django.db import models
from fields import PickledObjectField

class TestingModel(models.Model):
    pickle_field = PickledObjectField()
    compressed_pickle_field = PickledObjectField(compress=True)
    default_pickle_field = PickledObjectField(default=({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))

class TestCustomDataType(str):
    pass

class PickledObjectFieldTests(TestCase):
    def setUp(self):
        self.testing_data = (
            {1:2, 2:4, 3:6, 4:8, 5:10},
            'Hello World',
            (1, 2, 3, 4, 5),
            [1, 2, 3, 4, 5],
            TestCustomDataType('Hello World'),
        )
        return super(PickledObjectFieldTests, self).setUp()
    
    def testDataIntegrity(self):
        """
        Tests that data remains the same when saved to and fetched from
        the database, whether compression is enabled or not.
        
        """
        for value in self.testing_data:
            model_test = TestingModel(pickle_field=value, compressed_pickle_field=value)
            model_test.save()
            model_test = TestingModel.objects.get(id__exact=model_test.id)
            # Make sure that both the compressed and uncompressed fields return
            # the same data, even though it's stored differently in the DB.
            self.assertEquals(value, model_test.pickle_field)
            self.assertEquals(value, model_test.compressed_pickle_field)
            model_test.delete()
        
        # Make sure the default value for default_pickle_field gets stored
        # correctly and that it isn't converted to a string.
        model_test = TestingModel()
        model_test.save()
        model_test = TestingModel.objects.get(id__exact=model_test.id)
        self.assertEquals(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]), model_test.default_pickle_field)


    def testLookups(self):
        """
        Tests that lookups can be performed on data once stored in the
        database, whether compression is enabled or not.
        
        One problem with cPickle is that it will sometimes output
        different streams for the same object, depending on how they are
        referenced. It should be noted though, that this does not happen
        for every object, but usually only with more complex ones.
                
        >>> from pickle import dumps
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, \
        ... 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, \
        ... 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
        "((dp0\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np1\n(I1\nI2\nI3\nI4\nI5\ntp2\n(lp3\nI1\naI2\naI3\naI4\naI5\natp4\n."
        >>> dumps(t)
        "((dp0\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np1\n(I1\nI2\nI3\nI4\nI5\ntp2\n(lp3\nI1\naI2\naI3\naI4\naI5\natp4\n."
        >>> # Both dumps() are the same using pickle.

        >>> from cPickle import dumps
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> dumps(t)
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."
        >>> # But with cPickle the two dumps() are not the same!
        >>> # Both will generate the same object when loads() is called though.

        We can solve this by calling deepcopy() on the value before
        pickling it, as this copies everything to a brand new data
        structure.
        
        >>> from cPickle import dumps
        >>> from copy import deepcopy
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(deepcopy(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> dumps(deepcopy(t))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> # Using deepcopy() beforehand means that now both dumps() are identical.
        >>> # It may not be necessary, but deepcopy() ensures that lookups will always work.
        
        Unfortunately calling copy() alone doesn't seem to fix the
        problem as it lies primarily with complex data types.
        
        >>> from cPickle import dumps
        >>> from copy import copy
        >>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        >>> dumps(copy(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
        >>> dumps(copy(t))
        "((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."

        """
        for value in self.testing_data:
            model_test = TestingModel(pickle_field=value, compressed_pickle_field=value)
            model_test.save()
            # Make sure that we can do an ``exact`` lookup by both the
            # pickle_field and the compressed_pickle_field.
            model_test = TestingModel.objects.get(pickle_field__exact=value, compressed_pickle_field__exact=value)
            self.assertEquals(value, model_test.pickle_field)
            self.assertEquals(value, model_test.compressed_pickle_field)
            # Make sure that ``in`` lookups also work correctly.
            model_test = TestingModel.objects.get(pickle_field__in=[value], compressed_pickle_field__in=[value])
            self.assertEquals(value, model_test.pickle_field)
            self.assertEquals(value, model_test.compressed_pickle_field)
            # Make sure that ``isnull`` lookups are working.
            self.assertEquals(1, TestingModel.objects.filter(pickle_field__isnull=False).count())
            self.assertEquals(0, TestingModel.objects.filter(pickle_field__isnull=True).count())
            model_test.delete()
        
        # Make sure that lookups of the same value work, even when referenced
        # differently. See the above docstring for more info on the issue.
        value = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
        model_test = TestingModel(pickle_field=value, compressed_pickle_field=value)
        model_test.save()
        # Test lookup using an assigned variable.
        model_test = TestingModel.objects.get(pickle_field__exact=value)
        self.assertEquals(value, model_test.pickle_field)
        # Test lookup using direct input of a matching value.
        model_test = TestingModel.objects.get(
            pickle_field__exact = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]),
            compressed_pickle_field__exact = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]),
        )
        self.assertEquals(value, model_test.pickle_field)
        model_test.delete()
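
The encode and decode helpers in fields.py can be tried outside Django. What follows is a minimal Python 3 sketch rather than the snippet's own code: it assumes the standard pickle module in place of cPickle, and the names encode, decode and PROTOCOL are made up for illustration. It pins the pickle protocol to 2, as the get_db_prep_value docstring describes, and returns text instead of bytes.

```python
from base64 import b64decode, b64encode
from copy import deepcopy
from pickle import dumps, loads
from zlib import compress, decompress

PROTOCOL = 2  # assumption: pinned to 2, matching the field's default


def encode(value, compress_object=False):
    """Pickle a deep copy of value, optionally compress it, then base64 it."""
    data = dumps(deepcopy(value), PROTOCOL)
    if compress_object:
        data = compress(data)
    return b64encode(data).decode("ascii")


def decode(text, compress_object=False):
    """Reverse encode(); compress_object must match the flag used to encode."""
    data = b64decode(text)
    if compress_object:
        data = decompress(data)
    return loads(data)


value = ({1: 1, 2: 4}, "Hello World", (1, 2, 3), [1, 2, 3])
plain = encode(value)
packed = encode(value, compress_object=True)
```

Because lookups compare the encoded strings, two values only match when they pickle to the same bytes. In Python 3.7 and later, for example, equal dicts built in a different key order produce different strings.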


Comments

jamesgpearce (on August 30, 2009):

I have a baffling problem with this. I am trying to save an (unconnected) model instance into the field.

It works when the instance is in a field in a new record: it gets nicely pickled in the INSERT and I can see it in the database.

But it does not work on update and NULL gets written to the database (regardless of my default). Programmatically, the field contains the instance prior to (and post) the save - but somewhere between there and the UPDATE SQL it goes missing.

(Everything works perfectly if the pickled instance is not of a subclass of models.Model. Even models.Manager can be pickled!)

I am no field-extension expert and I am having trouble tracking this down. In the meantime any thoughts?

#

jamesgpearce (on August 30, 2009):

To (mostly) answer my own question.

My issue is way down at the bottom of the code, just before the SQL execution of an UPDATE:

if hasattr(val, 'prepare_database_save'):
    val = val.prepare_database_save(field)
else:
    val = field.get_db_prep_save(val)

(It doesn't do this for INSERTs for reasons I don't quite understand... but that's why the inserts DO work)

Of course all models implement prepare_database_save (in order to get the ID for a foreign key relationship), and so the value turns into that key at the last minute (instead of going through your pickling code in get_db_prep_save).

And because my model is 'abstract' - in the sense that it hasn't gone into the database in the traditional way - it has no ID. Hence 'NULL' for the PickledObjectField value after an update.

Hard to find... not too hard to fix. (These 'picklable' models just need to derive from a super class that overrides that method to do get_db_prep_save instead).

Thought I'd go to the effort of writing it up, since I've seen at least one other person trying to do something similar (for an undo stack of model state).

Otherwise, a wonderful snippet.

#

taavi223 (on September 2, 2009):

James,

That's a nice find! I mainly use the field for storing dictionary data that is arbitrary and that I don't need to query against, so I probably never would have found that error.

I spent a little bit of time trying to find a true solution, but was unable to come up with one. An easy workaround however, is to wrap the model object inside of a list or tuple. Since the list/tuple would not have the prepare_database_save method, it will call the field's get_db_prep_value as usual. Not fully transparent, but it does prevent the problem from occurring.
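
The dispatch James describes, and the effect of wrapping the instance, can be sketched without Django. Everything here is a hypothetical stand-in: FakeField, FakeModel, PicklableFakeModel and prepare_for_update are illustration names, and the "pickled:" string only marks which code path ran.

```python
class FakeField:
    """Hypothetical stand-in for PickledObjectField."""

    def get_db_prep_save(self, value):
        # The real field would pickle and encode here.
        return "pickled:%r" % (value,)


class FakeModel:
    """Hypothetical stand-in for an unsaved models.Model instance."""

    pk = None  # never saved, so there is no primary key

    def prepare_database_save(self, field):
        # Models hand back their key for foreign key relationships.
        return self.pk

    def __repr__(self):
        return "%s()" % type(self).__name__


class PicklableFakeModel(FakeModel):
    """Overrides the hook so the field's own preparation is used."""

    def prepare_database_save(self, field):
        return field.get_db_prep_save(self)


def prepare_for_update(field, val):
    # Mirrors the check quoted from the UPDATE path.
    if hasattr(val, "prepare_database_save"):
        return val.prepare_database_save(field)
    return field.get_db_prep_save(val)


field = FakeField()
bare = prepare_for_update(field, FakeModel())          # key wins: None
wrapped = prepare_for_update(field, (FakeModel(),))    # tuple has no hook
overridden = prepare_for_update(field, PicklableFakeModel())
```

The bare instance collapses to its missing key, while the tuple and the overriding subclass both reach the field's own preparation step.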

Another possibility is to write a proxy class for the model you wish to store, like so:

from django.db import models
from fields import dbsafe_encode, PickledObjectField

class MyClass(models.Model):
    pickled_field = PickledObjectField()

class MyProxyClass(MyClass):

    def prepare_database_save(self, field):
        return dbsafe_encode(self, field.compress)

    class Meta:
        proxy=True

You can then use the proxy class when assigning a model to the PickledObjectField and it should work as expected (although I haven't tested this out explicitly). This probably won't work well if you're trying to store an arbitrary model though, since you'd need a proxy class for each and every model.

Let me know if you find any other problems; I'll do my best to help solve them.

In other news I've fixed a few bugs with the snippet. Despite my best efforts, a change I thought I made somehow wasn't included (although the docstrings mentioned it--so where did it go!?). To fix this, I've once again added the deepcopy function into dbsafe_encode, so now lookups should work in all cases.

Second, I fixed the snippet's value_to_string method, so that serialization should now actually work as expected. Before, serializing a model with a PickledObjectField would return not the encoded object as expected, but the encoded __repr__ of the object. Can't believe I missed that.

Finally, I've changed the field to now be editable=False by default. I had changed it to this earlier, but somehow (like with deepcopy) it managed to disappear. Having the object editable in the admin is a bad idea, since any stored object will be converted to a string for display and then upon save, the string will be written to the database instead of the original object.

#

ivankirigin (on September 13, 2009):

I just got this error: ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '})' at line 1")

I'm working with MySQL on OS X, and possibly a version of Django that is too old. I got the same unicode errors others got with the original snippet, and I'm trying to solve them.

Any ideas on what the hell is going on?

Thanks

#

taavi223 (on September 13, 2009):

ivankirigin,

Can you post the actual traceback you're getting and what data you're trying to save into the PickledObjectField? Without some more details I don't know what the problem could be. Also, are you getting both the ProgrammingError and the DjangoUnicodeError? More details would really help with troubleshooting this...

#

erussell (on September 30, 2009):

I made a form field to edit PickledObjectFields as JSON in the admin. This doesn't work if you're storing objects in your pickled field that can't be JSON-encoded. But for simple objects like dictionaries, it works very well. Add the following to the PickledObjectField class:

def formfield(self, **kwargs):
    defaults = {'form_class': JSONField}
    defaults.update(kwargs)
    return super(PickledObjectField, self).formfield(**defaults)

Then add this code to fields.py:

from django import forms
from django.forms import widgets
from django.forms.util import flatatt, ValidationError
from django.utils import simplejson
from django.utils.safestring import mark_safe
from django.utils.html import conditional_escape

class JSONWidget (widgets.Widget):

    def __init__(self, attrs=None):
        self.attrs = {'cols': '84', 'rows': '5'}
        if attrs:
            self.attrs.update(attrs)

    def render (self, name, value, attrs=None, choices=()):
        if not isinstance(value, unicode):
            value = simplejson.dumps(value)
        final_attrs = self.build_attrs(attrs, name=name)
        return mark_safe(
                u'<textarea%s>%s</textarea>' % 
                    ( flatatt(final_attrs), conditional_escape(force_unicode(value)) ) 
            )

    def value_from_datadict(self, data, files, name):
        return data.get(name, u'{ }')

class JSONField (forms.Field):

    widget = JSONWidget

    default_error_messages = {
        'invalid': u'Enter a valid JSON string.'
    }

    def __init__(self, max_value=None, min_value=None, *args, **kwargs):
        super(JSONField, self).__init__(*args, **kwargs)

    def clean (self, value):
        super(JSONField, self).clean(value)
        if value is None or value == '':
            return { }
        try:
            value = simplejson.loads(value) 
        except ValueError:
            raise ValidationError(self.error_messages['invalid'])
        return value
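
The validation in clean can be exercised on its own. This is a rough Python 3 sketch under stated assumptions: it uses the standard json module instead of django.utils.simplejson, and InvalidJSON and clean_json are hypothetical names standing in for ValidationError and JSONField.clean.

```python
import json


class InvalidJSON(ValueError):
    """Hypothetical stand-in for Django's ValidationError."""


def clean_json(value):
    """Return the parsed value, treating empty input as an empty dict."""
    if value is None or value == "":
        return {}
    try:
        return json.loads(value)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        raise InvalidJSON("Enter a valid JSON string.")


empty = clean_json("")
parsed = clean_json('{"a": [1, 2]}')
```

Anything json.loads rejects surfaces as the single "invalid" error, so the admin user sees one message regardless of where the syntax error is.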

#

jkafader (on November 26, 2009):

This may be obvious (to you) but the information may save somebody some time down the road: if you use erussell's code for turning on editing via JSON above, you must also delete the line

 kwargs.setdefault('editable', False)

in the original class from above, otherwise you'll get an error in the admin that is difficult to debug.

#

sachmonkey (on December 24, 2009):

I found an issue that I was hoping you could take a look at.

I have a model that has a PickledObjectField, which works fine. But then I run a QuerySet operation in which I defer() the PickledObjectField, then make a few edits to the model, and then perform a model.save().

But now when I attempt to read the PickledObjectField value in all future QuerySet operations, I get returned the raw base64 pickled string instead of a python object! It seems like the data has been corrupted somehow. But sometimes I can manually call dbsafe_decode to get back the python object, but I shouldn't have to do that. But even using dbsafe_decode only works sometimes.

When I looked into the SQL queries being executed on the model.save(), it runs a SELECT to get the value of the pickledobjectfield from the db first so that it has the full model which is then saved. It appears that the full pickled object field string is appropriately saved, but for some reason it still isn't working.

Any help would be great!
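
One thing worth keeping in mind when debugging reports like this: the snippet's to_python swallows decoding errors for plain strings and hands the input back unchanged, so a value that fails to decode shows up as the raw base64 text. The sketch below illustrates only that fallback, in Python 3 with the standard pickle module; whether it explains the defer() case is not established here. The names decode and to_python_like are hypothetical, and the PickledObject re-raise is left out for brevity.

```python
from base64 import b64decode, b64encode
from pickle import dumps, loads
from zlib import compress, decompress


def decode(text, compress_object=False):
    data = b64decode(text)
    if compress_object:
        data = decompress(data)
    return loads(data)


def to_python_like(value, compress_object=False):
    """Decode if possible; otherwise return the input untouched."""
    if value is None:
        return None
    try:
        return decode(value, compress_object)
    except Exception:
        return value


original = {"undo": [1, 2, 3]}
stored = b64encode(compress(dumps(original, 2))).decode("ascii")

right = to_python_like(stored, compress_object=True)   # decodes normally
wrong = to_python_like(stored, compress_object=False)  # falls back to the string
```

Here the mismatch in the compress flag is just a convenient way to force a decoding failure; any value that cannot be unpickled takes the same path.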

#
