Login

CSVImport

Author:
jonathanholst
Posted:
March 9, 2008
Language:
Python
Version:
.96
Score:
1 (after 1 ratings)

Importing data from other sources than SQL can be an annoyance. This script serves as a general tool for importing data from CSV files.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
"""
CSV Import 0.1
==============

Import a Comma Separated Values file and map it to a Django model.

I apologise for the rather clumsy format, but there does not really seem to
be an intuitive way to do this. I hope the documentation will suffice.

Sparse documentation
====================

CSV Import is run from the commandline. It has a few options: --csv, --help,
--mappings, --model, --modelspy, --name-indexes and --test.

--csv
-----

The path reference to the CSV file from which the script will import. Can be
either relative or absolute, as a regular UNIX file reference.

--help
------

Self-explanatory, shows documentation of these options.

--mappings
----------

Definition of the column-to-field mapping. Written in the format 
column=field(ForeignKeyName|field). The foreign key part is optional, and
may be omitted completely, leaving column=field.

--model
-------

The model in which the contents of the CSV file will be inserted.

--modelspy
----------

The path of the models.py file in which the model is located.

--name-indexes
--------------

Whether or not to use the first line of the CSV file as an index, giving the
possibility to refer to a column by its name. If omitted, column references 
are written as [column number]=field, e.g. 1=field.

Example runs
============

An example could look like the following. Please note that all occurences of
-> on the end of a line indicate a hard break, where none should be present.

We have an import.csv file with these contents:
"Contents of first column, first row","Contents of second column, first ->
row","Contents of third column, first row"
"Contents of first column, second row","Contents of second column, second -> 
row","Contents of third column, second row"
"Contents of first column, third row","Contents of second column, third -> 
row","Contents of third column, third row"

We have a models.py with a TestModel looking like this:

class TestModel(models.Model):
    field1 = models.CharField(max_length=255)
    field2 = models.CharField(max_length=255)
    field3 = models.CharField(max_length=255)

Then we run it:

$ python csvimport.py --csv import.csv --mappings 1=field1,2=field2, ->
3=field3 --model TestModel --modelspy .
TestModel.field1 = "Contents of first column, first row"
TestModel.field2 = "Contents of second column, first row"
TestModel.field3 = "Contents of third column, first row"
--------------------
TestModel.field1 = "Contents of first column, second row"
TestModel.field2 = "Contents of second column, second row"
TestModel.field3 = "Contents of third column, second row"
--------------------
TestModel.field1 = "Contents of first column, third row"
TestModel.field2 = "Contents of second column, third row"
TestModel.field3 = "Contents of third column, third row"

Using named indexes
-------------------

It is custom to have the first line of a CSV file serve as a sort of index, 
naming the columns. CSV Import supports this through the --name-indexes 
option. Using this, we can alter the command a bit, the --mappings option 
specifically:

$ python csvimport.py --csv import.csv --mappings column1=field1, -> 
column2=field2,column3=field3 --model TestModel --modelspy .

This asserts that the first line of import.csv looks like this:
"column1","column2","column3"

Version history
===============

*   0.1, 9 March 2008: Initial Release.

Credits
=======

Jonathan Holst <jonathan.holst [at] gmta.info>

Founded in 2003, GMTA ApS is the partnership between three
Danish computer enthusiasts with a strong interest in the
web-phenomenon.

Visit our website: http://www.gmta.info

License
=======

Creative Commons Attribution-Share Alike 3.0 License
http://creativecommons.org/licenses/by-sa/3.0/

When attributing this work, you must maintain the Credits
paragraph above.
"""

__version__ = '0.1'

import sys, os, csv, re

class CSVImport:
    """
    Parse and map a CSV resource to a Django model.
    
    Notice that the doc tests are merely illustrational, and will not run 
    as is.
    """
    def __init__(self, csvfile, mappings, model, modelspy, nameindexes=False):
        # This setting can be overriden at any time through an 
        # instance.debug = True, but this is for the hardcore crowd, and
        # should not polute the API
        self.debug = False
        self.errors = []
    
        self.csvfile = self.__csvfile(csvfile)
        self.mappings = self.__mappings(mappings)
        self.modelspy = self.__modelspy(modelspy)
        self.model = self.__model(model)
        self.nameindexes = bool(nameindexes)
    
    def run(self):
        if self.nameindexes:
            indexes = self.csvfile.pop(0)
            
        for row in self.csvfile:
            model_instance = getattr(self.models, self.model)()
            
            for (column, field, foreignkey) in self.mappings:
                if self.nameindexes:
                    column = indexes.index(column)
                else:
                    column = int(column)-1
                
                if foreignkey:
                    fk_key, fk_field = foreignkey
                try:
                    fk = getattr(self.models, fk_key)
                except AttributeError:
                    self.error('Referenced foreign keys must be in specified models.py', 0)
                
                # If there is corresponding data in the model already,
                # we do not need to add more, since we are dealing with
                # foreign keys, therefore foreign data
                matches = fk.objects.filter(**{fk_field+'__exact': 
                row[column]})
                
                if not matches:
                    key = fk()
                    key.__setattr__(fk_field, row[column])
                    key.save()
                
                row[column] = fk.objects.filter(**{fk_field+'__exact': 
                row[column]})[0]
            
                print '%s.%s = "%s"' % (self.model, field, row[column])
                model_instance.__setattr__(field, row[column])
        
        model_instance.save()
        
        print '-' * 20
        
    def error(self, message, type=1):
        """
        Types:
            0. A fatal error. The most drastic one. Will quit the program.
            1. A notice. Some minor thing is in disorder.
        """
        
        types = (
            ('Fatal error', FatalError),
            ('Notice', None),
        )
        
        self.errors.append((message, type))
        
        if type == 0:
            # There is nothing to do. We have to quite at this point
            raise types[0][1], message
        elif self.debug == True:
            print "%s: %s" % (types[type][0], message)
    
    def __csvfile(self, datafile):
        try:
            csvfile = file(datafile, 'r')
        except IOError:
            self.error('Could not open specified csv file, %s, or it does not exist' % datafile, 0)
        else:
            # CSV Reader returns an iterable, but as we possibly need to
            # perform list commands and since list is an acceptable iterable, 
            # we'll just transform it.
            return list(csv.reader(csvfile))
    
    def __modelspy(self, modelspy):
        # We need the path containing the models.py, so if it's in the
        # specified path, we'll just remove it.
        if modelspy.endswith('models.py'):
            self.error('Specified models.py path has "models.py" in it.', 1)
            modelspy = modelspy[:-9]
        
        if not os.path.exists(modelspy):
            self.error('Specified directory does not contain a models.py', 0)
        
        return modelspy
    
    def __model(self, model):
        # In order to properly import the models, and figure out what settings 
        # to use, we need to figure out the application and project names.
        a_dir = os.path.abspath(self.modelspy)
        a_name = os.path.basename(a_dir)
        p_dir = os.path.dirname(a_dir)
        p_name = os.path.basename(p_dir)
        
        # To import the models.py, it needs to be in sys.path
        sys.path.append(os.path.abspath(self.modelspy))
        # The project path should too
        sys.path.append(os.path.dirname(p_dir))
        
        os.environ['DJANGO_SETTINGS_MODULE'] = '%s.settings' % p_name
        
        try:
            self.models = __import__('%s.%s.models' % (p_name, a_name),
            globals(), locals(), '%s.%s' % (p_name, p_name))
        except ImportError:
            self.error('Specified directory does not exist')
        else:
            return model
    
        def __mappings(mappings):
            """
            Parse the mappings, and return a list of them.
            """
            
            def parse_mapping(args):
                """
                Parse the custom mapping syntax (column1=field1(ForeignKey|field),
                etc.)
                
                >>> parse_mapping('a=b(c|d)')
                [('a', 'b', '(c|d)')]
                """
                
                pattern = re.compile(r'(\w+)=(\w+)(\(\w+\|\w+\))?')
                mappings = pattern.findall(args)
                
                mappings = list(mappings)
                for mapping in mappings:
                    m = mappings.index(mapping)
                    mappings[m] = list(mappings[m])
                    mappings[m][2] = parse_foreignkey(mapping[2])
                    mappings[m] = tuple(mappings[m])
                mappings = list(mappings)
                
                return mappings
                
            def parse_foreignkey(key):
                """
                Parse the foreignkey syntax (Key|field)
                
                >>> parse_foreignkey('(a|b)')
                ('a', 'b')
                """
                
                pattern = re.compile(r'(\w+)\|(\w+)', re.U)
                if key.startswith('(') and key.endswith(')'):
                    key = key[1:-1]
                    
                found = pattern.search(key)
                
                return (found.group(1), found.group(2))
                
            mappings = mappings.replace(',', ' ')
            return parse_mapping(mappings)

class FatalError(Exception):
    """
    Something really bad happened.
    """
    def __init__(self, value):
        self.value = value
        
    def __str__(self):
        return repr(self.value)

if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser(usage='csvimport.py [args]', version='CSV Import %s' % __version__)
    
    try:
        sys.argv[1]
    except IndexError:
        parser.error('No arguments specified')
        
    parser.add_option('-n', '--name-indexes',
    action='store_true', dest='nameindexes',
    help='If this flag is on, the first line of the CSV file will be used as an index reference, meaning the mapping will appear as columnname=modelname. If it is not on, columns should instead be referred to by their position (first column is 1, second is 2, etc.)')
    
    parser.add_option('-c', '--csv', 
    action='store', type='string', dest='csvfile',
    help='The desired CSV file to import from')
    
    parser.add_option('-p', '--modelspy',
    action='store', type='string', dest='modelspy', 
    help='The mapping, to specify which columns from the CSV file that go with which models.')
    
    parser.add_option('-m', '--model',
    action='store', type='string', dest='model',
    help='The model to perform the mapping on. Must be in the specified models.py')
    
    parser.add_option('-a', '--mappings', 
    action='store', type='string', dest='mappings', 
    help='The mapping, to specify which columns from the CSV file that go with which models. Written as a list of column=field, eg. column1=field1,column2=field2. No spaces. Foreign keys are specified by appending (key|field), where key is the name of the foreign key, and field is the attribute in the foreign key, that holds the data specified in the CSV column.')
    
    opts, args = parser.parse_args()
    
    if not opts.csvfile or not opts.modelspy or not opts.model or not\
    opts.mappings:
        parser.error('Not enough arguments specified.')
    
    try:
        c = CSVImport(opts.csvfile, opts.mappings, opts.model, opts.modelspy,
        opts.nameindexes)
        c.run()
    except FatalError, e:
        parser.error(e)

More like this

  1. Template tag - list punctuation for a list of items by shapiromatron 11 months, 2 weeks ago
  2. JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 11 months, 3 weeks ago
  3. Serializer factory with Django Rest Framework by julio 1 year, 6 months ago
  4. Image compression before saving the new model / work with JPG, PNG by Schleidens 1 year, 7 months ago
  5. Help text hyperlinks by sa2812 1 year, 8 months ago

Comments

rachid (on March 11, 2008):

i am not sure, but line 233

return list(csv.reader(datafile))

should be

return list(csv.reader(csvfile))

#

jonathanholst (on March 16, 2008):

@ rachid

Sure thing, I must have missed it when doing a bit of manual refactoring. Thanks for pointing it out.

#

ryan.flores (on March 31, 2008):

I am having trouble using this script. I am pretty new to Django so any help at all is appreciated:

$ python csvimport.py --csv import.csv --mappings column1=title,column2=slug,column3=intro,column4=body,column5=url --model NewsArticle --modelspy .

Traceback (most recent call last): File "csvimport.py", line 352, in <module> opts.nameindexes) File "csvimport.py", line 147, in init self.mappings = self.__mappings(mappings) AttributeError: CSVImport instance has no attribute '_CSVImport__mappings'

I did Tried:

dir(CSVImport) ['_CSVImport__csvfile', '_CSVImport__model', '_CSVImport__modelspy', 'doc', 'init', 'module', 'error', 'run']

And I don't see mappings there, not sure why...

#

Please login first before commenting.