Login

CSVImport

Author:
sharkus
Posted:
June 6, 2008
Language:
Python
Version:
.96
Score:
1 (after 1 ratings)

This is an awesome script! The original can be found here: http://www.djangosnippets.org/snippets/633/

I had to make a couple minor changes to get it working for me, so I thought I would share the love...

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
"""
CSV Import 0.2
==============
This script is based upon the following:
http://www.djangosnippets.org/snippets/633/

Modified with the following changes:
* Several problems due to wrong indention (at least with my editor/pc)
* Don't completely fail if exception is caught during save() operation
* Remove trailing white space from data
* Added basic keyfield example

CSV Import 0.1
==============

Import a Comma Separated Values file and map it to a Django model.

I apologise for the rather clumsy format, but there does not really seem to
be an intuitive way to do this. I hope the documentation will suffice.

Sparse documentation
====================

CSV Import is run from the commandline. It has a few options: --csv, --help,
--mappings, --model, --modelspy, --name-indexes and --test.

--csv
-----

The path reference to the CSV file from which the script will import. Can be
either relative or absolute, as a regular UNIX file reference.

--help
------

Self-explanatory, shows documentation of these options.

--mappings
----------

Definition of the column-to-field mapping. Written in the format 
column=field(ForeignKeyName|field). The foreign key part is optional, and
may be omitted completely, leaving column=field.

--model
-------

The model in which the contents of the CSV file will be inserted.

--modelspy
----------

The path of the models.py file in which the model is located.

--name-indexes
--------------

Whether or not to use the first line of the CSV file as an index, giving the
possibility to refer to a column by its name. If omitted, column references 
are written as [column number]=field, e.g. 1=field.

Example runs
============

An example could look like the following. Please note that all occurences of
-> on the end of a line indicate a hard break, where none should be present.

We have an import.csv file with these contents:
"Contents of first column, first row","Contents of second column, first ->
row","Contents of third column, first row"
"Contents of first column, second row","Contents of second column, second -> 
row","Contents of third column, second row"
"Contents of first column, third row","Contents of second column, third -> 
row","Contents of third column, third row"

We have a models.py with a TestModel looking like this:

class TestModel(models.Model):
    field1 = models.CharField(max_length=255)
    field2 = models.CharField(max_length=255)
    field3 = models.CharField(max_length=255)

Then we run it:

$ python csvimport.py --csv import.csv --mappings 1=field1,2=field2, ->
3=field3 --model TestModel --modelspy .
TestModel.field1 = "Contents of first column, first row"
TestModel.field2 = "Contents of second column, first row"
TestModel.field3 = "Contents of third column, first row"
--------------------
TestModel.field1 = "Contents of first column, second row"
TestModel.field2 = "Contents of second column, second row"
TestModel.field3 = "Contents of third column, second row"
--------------------
TestModel.field1 = "Contents of first column, third row"
TestModel.field2 = "Contents of second column, third row"
TestModel.field3 = "Contents of third column, third row"

Using named indexes
-------------------

It is custom to have the first line of a CSV file serve as a sort of index, 
naming the columns. CSV Import supports this through the --name-indexes 
option. Using this, we can alter the command a bit, the --mappings option 
specifically:

$ python csvimport.py --csv import.csv --mappings column1=field1, -> 
column2=field2,column3=field3 --model TestModel --modelspy .

This asserts that the first line of import.csv looks like this:
"column1","column2","column3"

Using KeyFields
---------------

Here's an example of using key fields.  In this case the csv file contains three
columns, last_name, first_name, and the manager's last_name.  The resource model
contains a ForeignKey field named "Manager" that is a self referencing.

Note, the mappings string must be contiain in quotes to prevent the | character
from acting like a pipe (at least under windows).

$ python CSVImport.py --csv engineers.csv --mappings "1=last_name 2=first_name -> 
3=manager(Resource|last_name)" --model Resource --modelspy .

Version history
===============

*   0.1, 9 March 2008: Initial Release.

Credits
=======

Jonathan Holst <jonathan.holst [at] gmta.info>

Founded in 2003, GMTA ApS is the partnership between three
Danish computer enthusiasts with a strong interest in the
web-phenomenon.

Visit our website: http://www.gmta.info

License
=======

Creative Commons Attribution-Share Alike 3.0 License
http://creativecommons.org/licenses/by-sa/3.0/

When attributing this work, you must maintain the Credits
paragraph above.
"""

__version__ = '0.1'

import sys, os, csv, re

class CSVImport:
    """
    Parse and map a CSV resource to a Django model.
    
    Notice that the doc tests are merely illustrational, and will not run 
    as is.
    """
    def __init__(self, csvfile, mappings, model, modelspy, nameindexes=False):
        # This setting can be overriden at any time through an 
        # instance.debug = True, but this is for the hardcore crowd, and
        # should not polute the API
        self.debug = False
        self.errors = []
    
        self.csvfile = self.__csvfile(csvfile)
        self.mappings = self.__mappings(mappings)
        self.modelspy = self.__modelspy(modelspy)
        self.model = self.__model(model)
        self.nameindexes = bool(nameindexes)
    
    def run(self):
        if self.nameindexes:
            indexes = self.csvfile.pop(0)
            
        for row in self.csvfile:
            model_instance = getattr(self.models, self.model)()
            
            for (column, field, foreignkey) in self.mappings:
                if self.nameindexes:
                    column = indexes.index(column)
                else:
                    column = int(column)-1
                    
                row[column] = row[column].strip()
                
                if foreignkey:
                    fk_key, fk_field = foreignkey
                    try:
                        fk = getattr(self.models, fk_key)
                    except AttributeError:
                        self.error('Referenced foreign keys must be in specified models.py', 0)
                
                    # If there is corresponding data in the model already,
                    # we do not need to add more, since we are dealing with
                    # foreign keys, therefore foreign data
                    matches = fk.objects.filter(**{fk_field+'__exact': 
                    row[column]})
                    
                    if not matches:
                        key = fk()
                        key.__setattr__(fk_field, row[column])
                        key.save()
                    
                    row[column] = fk.objects.filter(**{fk_field+'__exact': 
                    row[column]})[0]
            
                print '%s.%s = "%s"' % (self.model, field, row[column])
                model_instance.__setattr__(field, row[column])

            try:
                model_instance.save()
            except:
                print 'Exception found... Instance not saved.'

            print '-' * 20
        
    def error(self, message, type=1):
        """
        Types:
            0. A fatal error. The most drastic one. Will quit the program.
            1. A notice. Some minor thing is in disorder.
        """
        
        types = (
            ('Fatal error', FatalError),
            ('Notice', None),
        )
        
        self.errors.append((message, type))
        
        if type == 0:
            # There is nothing to do. We have to quite at this point
            raise types[0][1], message
        elif self.debug == True:
            print "%s: %s" % (types[type][0], message)
    
    def __csvfile(self, datafile):
        try:
            csvfile = file(datafile, 'r')
        except IOError:
            self.error('Could not open specified csv file, %s, or it does not exist' % datafile, 0)
        else:
            # CSV Reader returns an iterable, but as we possibly need to
            # perform list commands and since list is an acceptable iterable, 
            # we'll just transform it.
            return list(csv.reader(csvfile))
    
    def __modelspy(self, modelspy):
        # We need the path containing the models.py, so if it's in the
        # specified path, we'll just remove it.
        if modelspy.endswith('models.py'):
            self.error('Specified models.py path has "models.py" in it.', 1)
            modelspy = modelspy[:-9]
        
        if not os.path.exists(modelspy):
            self.error('Specified directory does not contain a models.py', 0)
        
        return modelspy
    
    def __model(self, model):
        # In order to properly import the models, and figure out what settings 
        # to use, we need to figure out the application and project names.
        a_dir = os.path.abspath(self.modelspy)
        a_name = os.path.basename(a_dir)
        p_dir = os.path.dirname(a_dir)
        p_name = os.path.basename(p_dir)
        
        # To import the models.py, it needs to be in sys.path
        sys.path.append(os.path.abspath(self.modelspy))
        # The project path should too
        sys.path.append(os.path.dirname(p_dir))
        
        os.environ['DJANGO_SETTINGS_MODULE'] = '%s.settings' % p_name
        
        try:
            self.models = __import__('%s.%s.models' % (p_name, a_name),
            globals(), locals(), '%s.%s' % (p_name, p_name))
        except ImportError:
            self.error('Specified directory does not exist')
        else:
            return model
    
    def __mappings(self, mappings):
        """
        Parse the mappings, and return a list of them.
        """
        
        def parse_mapping(args):
            """
            Parse the custom mapping syntax (column1=field1(ForeignKey|field),
            etc.)
            
            >>> parse_mapping('a=b(c|d)')
            [('a', 'b', '(c|d)')]
            """
            
            pattern = re.compile(r'(\w+)=(\w+)(\(\w+\|\w+\))?')
            mappings = pattern.findall(args)
            
            mappings = list(mappings)
            for mapping in mappings:
                m = mappings.index(mapping)
                mappings[m] = list(mappings[m])
                mappings[m][2] = parse_foreignkey(mapping[2])
                mappings[m] = tuple(mappings[m])
            mappings = list(mappings)
            
            return mappings
            
        def parse_foreignkey(key):
            """
            Parse the foreignkey syntax (Key|field)
            
            >>> parse_foreignkey('(a|b)')
            ('a', 'b')
            """
            
            pattern = re.compile(r'(\w+)\|(\w+)', re.U)
            if key.startswith('(') and key.endswith(')'):
                key = key[1:-1]
                
            found = pattern.search(key)
            
            if found != None:
                return (found.group(1), found.group(2))
            else:
                return None
            
        mappings = mappings.replace(',', ' ')
        return parse_mapping(mappings)

class FatalError(Exception):
    """
    Something really bad happened.
    """
    def __init__(self, value):
        self.value = value
        
    def __str__(self):
        return repr(self.value)

if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser(usage='csvimport.py [args]', version='CSV Import %s' % __version__)
    
    try:
        sys.argv[1]
    except IndexError:
        parser.error('No arguments specified')
        
    parser.add_option('-n', '--name-indexes',
    action='store_true', dest='nameindexes',
    help='If this flag is on, the first line of the CSV file will be used as an index reference, meaning the mapping will appear as columnname=modelname. If it is not on, columns should instead be referred to by their position (first column is 1, second is 2, etc.)')
    
    parser.add_option('-c', '--csv',
    action='store', type='string', dest='csvfile',
    help='The desired CSV file to import from')
    
    parser.add_option('-p', '--modelspy',
    action='store', type='string', dest='modelspy',
    help='The mapping, to specify which columns from the CSV file that go with which models.')
    
    parser.add_option('-m', '--model',
    action='store', type='string', dest='model',
    help='The model to perform the mapping on. Must be in the specified models.py')
    
    parser.add_option('-a', '--mappings',
    action='store', type='string', dest='mappings',
    help='The mapping, to specify which columns from the CSV file that go with which models. Written as a list of column=field, eg. column1=field1,column2=field2. No spaces. Foreign keys are specified by appending (key|field), where key is the name of the foreign key, and field is the attribute in the foreign key, that holds the data specified in the CSV column.')
    
    opts, args = parser.parse_args()
    
    if not opts.csvfile or not opts.modelspy or not opts.model or not\
    opts.mappings:
        parser.error('Not enough arguments specified.')
    
    try:
        c = CSVImport(opts.csvfile, opts.mappings, opts.model, opts.modelspy,
                      opts.nameindexes)
        c.run()
    except FatalError, e:
        parser.error(e)

More like this

  1. Template tag - list punctuation for a list of items by shapiromatron 1 year ago
  2. JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 1 year ago
  3. Serializer factory with Django Rest Framework by julio 1 year, 7 months ago
  4. Image compression before saving the new model / work with JPG, PNG by Schleidens 1 year, 8 months ago
  5. Help text hyperlinks by sa2812 1 year, 8 months ago

Comments

giulio.galiero (on May 7, 2010):

I had the same problem, after some debugging I found out it is related to sqlite not finding the table when saving your model (in the script, try placing the "model_instance.save()" outside the try block and you'll get the full exception stacktrace).

Anyway, switching to a mysql backend solved the issue. Hope this helps.

#

rossdavidh (on May 18, 2010):

Hi, when I ran this script I had the following result:

python csvimport.py --csv moneymade2010.csv --mappings "1=salesdate,2=cashsales,3=visasales,4=mastercardsales,5=taxfreesales,6=salestaxcollected" --model Dailysales --modelspy . Traceback (most recent call last): File "csvimport.py", line 385, in <module> c.run() File "csvimport.py", line 181, in run model_instance = getattr(self.models, self.model)() AttributeError: CSVImport instance has no attribute 'models'

Anybody experience anything similar?

#

andrejas (on September 28, 2011):

That's a great snippet. It saved me some time. Thanks!

#

andrejas (on October 1, 2011):

@rossdavidh It sounds like you may not have run syncdb to create the actual db tables according to your model.

#

Please login first before commenting.