ByteSplitterField

Author: Lacour
Posted: August 24, 2011
Language: Python
Version: 1.3
Tags: db field custom-model-field model custom model-field database multibit-field IntegerField
Score: 0 (after 0 ratings)

When you want to save integers to the db, you usually have the choice between 16-, 32- and 64-bit integers (also 8- and 24-bit for MySQL). If that doesn't fit your needs and you want to use your db storage more efficiently, this field might come in handy. Imagine you have 3 numbers, but need only 10 bits to encode each (e.g. values from 0 to 1000; 10 bits cover 0 to 1023). Instead of creating 3 smallint fields (48 bits), you can create one 'ByteSplitterField' which implements 3 'subfields' and automatically encodes them inside a 32-bit integer. You don't have to care how each 10-bit chunk is encoded into the 32-bit integer; it's all handled by the field (see also the field's description). Additionally, the field lets you set decimal_places for each of your subfields. These are 'binary decimal places', meaning the stored integer is automatically divided by 2, 4, 8, etc. when you fetch the value from the field. You can also specify how values are rounded ('round' parameter) and what happens when you try to save a value that is out of range ('overflow' parameter).
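As a rough illustration of the packing described above, here is a standalone sketch in plain Python. It is not the field's own code, and the helper names `pack` and `unpack` are made up for this example. Like the field, it puts the first subfield into the highest bits:

```python
def pack(values, lengths):
    """Pack non-negative ints into one int, first value in the highest bits."""
    packed = 0
    for value, length in zip(values, lengths):
        if not 0 <= value < (1 << length):
            raise ValueError("%d does not fit into %d bits" % (value, length))
        packed = (packed << length) | value
    return packed


def unpack(packed, lengths):
    """Reverse of pack(): read the chunks back, starting from the lowest bits."""
    values = []
    for length in reversed(lengths):
        values.append(packed & ((1 << length) - 1))  # mask out one chunk
        packed >>= length
    return list(reversed(values))


lengths = [10, 10, 10]           # three subfields, 10 bits each
packed = pack([1000, 0, 513], lengths)
assert packed < 2 ** 32          # 30 bits in total, fits a 32-bit integer
assert unpack(packed, lengths) == [1000, 0, 513]
```

The three values occupy 30 bits together, so a single 32-bit column is enough where three smallint columns would take 48 bits.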

Not implemented (maybe in the future, should I ever need it):

  • Signed values. All values are non-negative right now!
  • Real (base-10) decimal places (you could probably use DecimalFields directly for that).
  • Further space optimization, e.g. saving into a CharField whose length can be chosen byte-wise.
from __future__ import division
import decimal
from django.core import exceptions
from django.db import models

class ByteSplitterField(models.IntegerField):
    description = """
    A field that stores multiple (positive) number values inside virtual 'subfields'
    of an IntegerField. These numbers can be integers or decimals with a defined
    precision of n binary digits, giving a precision of 2 ** (-n).
    Remember that you will NOT be able to select or filter by any of the subfields!
    You can also only save the whole bunch of fields, not single subfields to the db.
    Configure the field with these keywords:
        - 'subfield_names' (default=['value']): iterable of names for your subfields.
        - 'subfield_lengths' (default=[64]): iterable of lengths in bits, one per subfield
        - 'subfield_decimal_places' (default=[0, ..., 0]): binary decimal places
                for each of the fields, omit to set 0 for each field (=int-field)
        - 'round' (default=int): choose how floats are rounded before saving:
                floor-round <int> or round to nearest <round>
        - 'overflow' (default='error'): choose what happens with out-of-range values:
                'error' raises a ValidationError, 'overflow' wraps the value around
                (only the lowest bits are kept), 'truncate' clamps it to the valid range

    Usage example of a field that stores 3 numbers from [0, 8[ with precision 1, 0.5 and 0.25
    and another field ('otherfield') that stores a single value with 3 binary digits (precision 0.125):
    >>> class MyModel(models.Model):
    >>>     multifield = ByteSplitterField(subfield_names=[0, 1, 'x'], subfield_lengths=[3, 4, 5],
    >>>         subfield_decimal_places=[0, 1, 2], round=round) #will be represented as 'smallint'
    >>>     otherfield = ByteSplitterField(subfield_lengths=13, subfield_decimal_places=3,
    >>>         overflow='truncate') # single field that saves number 0, 0.125, ... 1023.875
    >>> # set the fields on 'instance', which is an instance of MyModel()
    >>> instance.multifield = {0: 3, 1: 2.3, 'x': 6.78}   # if you omit a subfield, it will be zero
    >>> instance.otherfield = 2000  # you shouldn't do that, because...
    >>> instance.otherfield         # accessing the attribute again will return {'value': 250.0}
    >>> instance.otherfield['value'] = 2000   # this is the correct way
    >>> instance.save()
    >>> # after you fetch it again from the db, you can access the fields
    >>> instance.multifield   # will return {0: 3.0, 1: 2.5, 'x': 6.75}
    >>>     # notice that key 1 would be 2.0 for round=int
    >>> instance.multifield[0]  # will return 3.0
    >>> instance.otherfield['value']   # will return 1023.875 because of truncation
    """
    __metaclass__ = models.SubfieldBase

    def __init__(self, *args, **kwargs):
        self.subfield_names = kwargs.pop('subfield_names', ['value'])
        if not hasattr(self.subfield_names, '__iter__'):
            self.subfield_names = [self.subfield_names]

        self.subfield_lengths = kwargs.pop('subfield_lengths', 64)
        if not hasattr(self.subfield_lengths, '__iter__'):
            self.subfield_lengths = [self.subfield_lengths]
        for length in self.subfield_lengths:
            assert type(length) is int, "Please provide 'subfield_lengths': an iterable of type(int): %s" %self.subfield_lengths
            assert length >= 1, "Length of each subfield must be >= 1 bit, but is %d bits" %length

        self.subfield_decimal_places = kwargs.pop('subfield_decimal_places', None)
        if self.subfield_decimal_places is None:
            self.subfield_decimal_places = [0 for i in range(len(self.subfield_lengths))]
        if not hasattr(self.subfield_decimal_places, '__iter__'):
            self.subfield_decimal_places = [self.subfield_decimal_places]
        for dec in self.subfield_decimal_places:
            assert type(dec) is int, "'subfield_decimal_places' must be an iterable of type(int): %s" %self.subfield_decimal_places
        assert len(self.subfield_names) == len(self.subfield_lengths) == len(self.subfield_decimal_places),\
            "'subfield_lengths' must have the same length as 'subfield_names' (and also 'subfield_decimal_places' if you pass this keyword)"
        self.n_bits = sum(self.subfield_lengths)   #required length in bits
        assert self.n_bits <= 64,\
            "Sorry, but currently a maximum of 64 bit is supported (stored as 'bigint' on the db-backend), you requested a total of %d bits" %self.n_bits

        self.round = kwargs.pop('round', int)
        assert self.round in (int, round), "Please provide the built-in function <int> or <round> with the keyword 'round', not: %s" %self.round
        if self.round == round:
            self.round = lambda x: int(round(x))

        self.overflow = kwargs.pop('overflow', 'error')
        assert self.overflow in ('error', 'overflow', 'trunc', 'truncate'),\
            "'overflow' must be set to 'error', 'overflow' or 'truncate', not '%s'" % self.overflow
        super(ByteSplitterField, self).__init__(*args, **kwargs)

    def db_type(self, connection):
        if self.n_bits <= 8 and connection.settings_dict['ENGINE'] == 'django.db.backends.mysql':
            return "tinyint unsigned"
        elif self.n_bits <= 16:
            return connection.creation.data_types['PositiveSmallIntegerField']
        elif self.n_bits <= 24 and connection.settings_dict['ENGINE'] == 'django.db.backends.mysql':
            return "mediumint unsigned"
        elif self.n_bits <= 32:
            return connection.creation.data_types['PositiveIntegerField']
        # since django doesn't know a PositiveBigIntegerField, the db-representation is made manually (only tested for MySQL!)
        elif self.n_bits <= 64 and connection.settings_dict['ENGINE'] == 'django.db.backends.mysql':
            return "bigint unsigned"
        elif self.n_bits <= 64 and connection.settings_dict['ENGINE'] == 'django.db.backends.oracle':
            return "NUMBER(19) CHECK (%(qn_column)s >= 0)"
        elif self.n_bits <= 64 and connection.settings_dict['ENGINE'] == 'django.db.backends.postgresql':
            return 'bigint CHECK ("%(column)s" >= 0)'
        elif self.n_bits <= 64 and connection.settings_dict['ENGINE'] == 'django.db.backends.sqlite3':
            return "bigint unsigned"

    def to_python(self, value):
        """
        Returns a dict of {'subfield_name': subfield_value, ...}
        """
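        if value is None:
            return None
        #if the value is a string-representation, try to parse it as a python literal
        #(ast.literal_eval only accepts literals, while eval would execute arbitrary code)
        if type(value) in (str, unicode):
            try:
                import ast
                value = ast.literal_eval(value)
            except Exception:
                raise exceptions.ValidationError(self.error_messages['invalid'])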
        #if value is a dict, check if it fits the model (i.e. contains all subfield_names and has values of correct type)
        if type(value) is dict:
            for k, v in value.items():
                if k not in self.subfield_names:
                    raise exceptions.ValidationError("This is not a valid subfield_name: '%s'" %k)
                if type(v) not in (int, float, decimal.Decimal):
                    raise exceptions.ValidationError("subfield_name '%s' must be of type int, float or Decimal, but is %s" %(k, type(v)))
            return value  #valid dict is directly returned
        #now the value must be convertable to int (comes from db or from python via model's __setattr__,
        #which is something that shouldn't be done (see field's description) but sadly can't be distinguished here)
        try:
            value = int(value)
        except (TypeError, ValueError):
            raise exceptions.ValidationError(self.error_messages['invalid'])
        #standard-case: value is an int from db: process it
        result = {}
        for i in range(len(self.subfield_names) -1, -1, -1):   #reverse range
            result[self.subfield_names[i]] = (value % (2 ** self.subfield_lengths[i])) / (2 ** self.subfield_decimal_places[i])
            value >>= self.subfield_lengths[i]
        return result


    def get_prep_value(self, value):
        if not isinstance(value, dict):
            return None
        for k in value.keys():
            if k not in self.subfield_names:
                #list the configured subfield names as valid choices
                raise exceptions.ValidationError("This subfield_name doesn't exist: %s. Choices are %s" % (k, self.subfield_names))
        result = 0
        for i in range(len(self.subfield_names)):
            result <<= self.subfield_lengths[i]
            number = value.get(self.subfield_names[i], None)
            if number is not None:
                value_db = self.round(number * (2 ** self.subfield_decimal_places[i]))
                value_db_max = (2 ** self.subfield_lengths[i] - 1)
                if value_db > value_db_max or value_db < 0:
                    if self.overflow == 'error':
                        raise exceptions.ValidationError(
                            "The value %(value)f for field '%(field)s' (rounded to %(value_round)f) is out of specified range: [0, %(range)f]"
                            %{'value': value[self.subfield_names[i]], 'value_round': value_db / (2 ** self.subfield_decimal_places[i]),
                            'field': self.subfield_names[i], 'range': value_db_max / (2 ** self.subfield_decimal_places[i])})
                    elif self.overflow in ('truncate', 'trunc'):
                        result += max(0, min(value_db_max, value_db))
                    elif self.overflow == 'overflow':
                        value_db &= (2 ** self.subfield_lengths[i] - 1)
                        result += value_db
                else:
                    result += value_db
        return result
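
The rounding and overflow handling of get_prep_value can also be tried out without Django. The following is a minimal standalone sketch, not part of the field itself: `encode` and `decode` are hypothetical helper names, and their `rounder` and `overflow` arguments are meant to mirror the field's 'round' and 'overflow' keywords.

```python
def encode(number, length, places, rounder=int, overflow='error'):
    """Scale 'number' by 2**places and fit the result into 'length' bits."""
    raw = int(rounder(number * (1 << places)))
    raw_max = (1 << length) - 1
    if 0 <= raw <= raw_max:
        return raw
    if overflow == 'truncate':
        return max(0, min(raw_max, raw))   # clamp to the nearest bound
    if overflow == 'overflow':
        return raw & raw_max               # keep only the lowest 'length' bits
    raise ValueError("%r is out of range for %d bits" % (number, length))


def decode(raw, places):
    """Undo the scaling: divide the stored integer by 2**places."""
    return raw / float(1 << places)


# 13 bits with 3 binary decimal places: steps of 0.125 up to 1023.875
assert decode(encode(2000, 13, 3, overflow='truncate'), 3) == 1023.875
# 4 bits with 1 binary decimal place: 2.3 is stored as 2.5 (round) or 2.0 (int)
assert decode(encode(2.3, 4, 1, rounder=round), 1) == 2.5
assert decode(encode(2.3, 4, 1, rounder=int), 1) == 2.0
```

Keep in mind that round() treats exact halves differently across Python versions: Python 2 rounds them away from zero, Python 3 rounds them to the nearest even integer.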
