- Author:
- jonathanholst
- Posted:
- March 9, 2008
- Language:
- Python
- Version:
- .96
- Score:
- 1 (after 1 ratings)
Importing data from other sources than SQL can be an annoyance. This script serves as a general tool for importing data from CSV files.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 | """
CSV Import 0.1
==============
Import a Comma Separated Values file and map it to a Django model.
I apologise for the rather clumsy format, but there does not really seem to
be an intuitive way to do this. I hope the documentation will suffice.
Sparse documentation
====================
CSV Import is run from the commandline. It has a few options: --csv, --help,
--mappings, --model, --modelspy, --name-indexes and --test.
--csv
-----
The path reference to the CSV file from which the script will import. Can be
either relative or absolute, as a regular UNIX file reference.
--help
------
Self-explanatory, shows documentation of these options.
--mappings
----------
Definition of the column-to-field mapping. Written in the format
column=field(ForeignKeyName|field). The foreign key part is optional, and
may be omitted completely, leaving column=field.
--model
-------
The model in which the contents of the CSV file will be inserted.
--modelspy
----------
The path of the models.py file in which the model is located.
--name-indexes
--------------
Whether or not to use the first line of the CSV file as an index, giving the
possibility to refer to a column by its name. If omitted, column references
are written as [column number]=field, e.g. 1=field.
Example runs
============
An example could look like the following. Please note that all occurences of
-> on the end of a line indicate a hard break, where none should be present.
We have an import.csv file with these contents:
"Contents of first column, first row","Contents of second column, first ->
row","Contents of third column, first row"
"Contents of first column, second row","Contents of second column, second ->
row","Contents of third column, second row"
"Contents of first column, third row","Contents of second column, third ->
row","Contents of third column, third row"
We have a models.py with a TestModel looking like this:
class TestModel(models.Model):
field1 = models.CharField(max_length=255)
field2 = models.CharField(max_length=255)
field3 = models.CharField(max_length=255)
Then we run it:
$ python csvimport.py --csv import.csv --mappings 1=field1,2=field2, ->
3=field3 --model TestModel --modelspy .
TestModel.field1 = "Contents of first column, first row"
TestModel.field2 = "Contents of second column, first row"
TestModel.field3 = "Contents of third column, first row"
--------------------
TestModel.field1 = "Contents of first column, second row"
TestModel.field2 = "Contents of second column, second row"
TestModel.field3 = "Contents of third column, second row"
--------------------
TestModel.field1 = "Contents of first column, third row"
TestModel.field2 = "Contents of second column, third row"
TestModel.field3 = "Contents of third column, third row"
Using named indexes
-------------------
It is custom to have the first line of a CSV file serve as a sort of index,
naming the columns. CSV Import supports this through the --name-indexes
option. Using this, we can alter the command a bit, the --mappings option
specifically:
$ python csvimport.py --csv import.csv --mappings column1=field1, ->
column2=field2,column3=field3 --model TestModel --modelspy .
This asserts that the first line of import.csv looks like this:
"column1","column2","column3"
Version history
===============
* 0.1, 9 March 2008: Initial Release.
Credits
=======
Jonathan Holst <jonathan.holst [at] gmta.info>
Founded in 2003, GMTA ApS is the partnership between three
Danish computer enthusiasts with a strong interest in the
web-phenomenon.
Visit our website: http://www.gmta.info
License
=======
Creative Commons Attribution-Share Alike 3.0 License
http://creativecommons.org/licenses/by-sa/3.0/
When attributing this work, you must maintain the Credits
paragraph above.
"""
__version__ = '0.1'
import sys, os, csv, re
class CSVImport:
"""
Parse and map a CSV resource to a Django model.
Notice that the doc tests are merely illustrational, and will not run
as is.
"""
def __init__(self, csvfile, mappings, model, modelspy, nameindexes=False):
# This setting can be overriden at any time through an
# instance.debug = True, but this is for the hardcore crowd, and
# should not polute the API
self.debug = False
self.errors = []
self.csvfile = self.__csvfile(csvfile)
self.mappings = self.__mappings(mappings)
self.modelspy = self.__modelspy(modelspy)
self.model = self.__model(model)
self.nameindexes = bool(nameindexes)
def run(self):
if self.nameindexes:
indexes = self.csvfile.pop(0)
for row in self.csvfile:
model_instance = getattr(self.models, self.model)()
for (column, field, foreignkey) in self.mappings:
if self.nameindexes:
column = indexes.index(column)
else:
column = int(column)-1
if foreignkey:
fk_key, fk_field = foreignkey
try:
fk = getattr(self.models, fk_key)
except AttributeError:
self.error('Referenced foreign keys must be in specified models.py', 0)
# If there is corresponding data in the model already,
# we do not need to add more, since we are dealing with
# foreign keys, therefore foreign data
matches = fk.objects.filter(**{fk_field+'__exact':
row[column]})
if not matches:
key = fk()
key.__setattr__(fk_field, row[column])
key.save()
row[column] = fk.objects.filter(**{fk_field+'__exact':
row[column]})[0]
print '%s.%s = "%s"' % (self.model, field, row[column])
model_instance.__setattr__(field, row[column])
model_instance.save()
print '-' * 20
def error(self, message, type=1):
"""
Types:
0. A fatal error. The most drastic one. Will quit the program.
1. A notice. Some minor thing is in disorder.
"""
types = (
('Fatal error', FatalError),
('Notice', None),
)
self.errors.append((message, type))
if type == 0:
# There is nothing to do. We have to quite at this point
raise types[0][1], message
elif self.debug == True:
print "%s: %s" % (types[type][0], message)
def __csvfile(self, datafile):
try:
csvfile = file(datafile, 'r')
except IOError:
self.error('Could not open specified csv file, %s, or it does not exist' % datafile, 0)
else:
# CSV Reader returns an iterable, but as we possibly need to
# perform list commands and since list is an acceptable iterable,
# we'll just transform it.
return list(csv.reader(csvfile))
def __modelspy(self, modelspy):
# We need the path containing the models.py, so if it's in the
# specified path, we'll just remove it.
if modelspy.endswith('models.py'):
self.error('Specified models.py path has "models.py" in it.', 1)
modelspy = modelspy[:-9]
if not os.path.exists(modelspy):
self.error('Specified directory does not contain a models.py', 0)
return modelspy
def __model(self, model):
# In order to properly import the models, and figure out what settings
# to use, we need to figure out the application and project names.
a_dir = os.path.abspath(self.modelspy)
a_name = os.path.basename(a_dir)
p_dir = os.path.dirname(a_dir)
p_name = os.path.basename(p_dir)
# To import the models.py, it needs to be in sys.path
sys.path.append(os.path.abspath(self.modelspy))
# The project path should too
sys.path.append(os.path.dirname(p_dir))
os.environ['DJANGO_SETTINGS_MODULE'] = '%s.settings' % p_name
try:
self.models = __import__('%s.%s.models' % (p_name, a_name),
globals(), locals(), '%s.%s' % (p_name, p_name))
except ImportError:
self.error('Specified directory does not exist')
else:
return model
def __mappings(mappings):
"""
Parse the mappings, and return a list of them.
"""
def parse_mapping(args):
"""
Parse the custom mapping syntax (column1=field1(ForeignKey|field),
etc.)
>>> parse_mapping('a=b(c|d)')
[('a', 'b', '(c|d)')]
"""
pattern = re.compile(r'(\w+)=(\w+)(\(\w+\|\w+\))?')
mappings = pattern.findall(args)
mappings = list(mappings)
for mapping in mappings:
m = mappings.index(mapping)
mappings[m] = list(mappings[m])
mappings[m][2] = parse_foreignkey(mapping[2])
mappings[m] = tuple(mappings[m])
mappings = list(mappings)
return mappings
def parse_foreignkey(key):
"""
Parse the foreignkey syntax (Key|field)
>>> parse_foreignkey('(a|b)')
('a', 'b')
"""
pattern = re.compile(r'(\w+)\|(\w+)', re.U)
if key.startswith('(') and key.endswith(')'):
key = key[1:-1]
found = pattern.search(key)
return (found.group(1), found.group(2))
mappings = mappings.replace(',', ' ')
return parse_mapping(mappings)
class FatalError(Exception):
"""
Something really bad happened.
"""
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
if __name__ == '__main__':
import optparse
parser = optparse.OptionParser(usage='csvimport.py [args]', version='CSV Import %s' % __version__)
try:
sys.argv[1]
except IndexError:
parser.error('No arguments specified')
parser.add_option('-n', '--name-indexes',
action='store_true', dest='nameindexes',
help='If this flag is on, the first line of the CSV file will be used as an index reference, meaning the mapping will appear as columnname=modelname. If it is not on, columns should instead be referred to by their position (first column is 1, second is 2, etc.)')
parser.add_option('-c', '--csv',
action='store', type='string', dest='csvfile',
help='The desired CSV file to import from')
parser.add_option('-p', '--modelspy',
action='store', type='string', dest='modelspy',
help='The mapping, to specify which columns from the CSV file that go with which models.')
parser.add_option('-m', '--model',
action='store', type='string', dest='model',
help='The model to perform the mapping on. Must be in the specified models.py')
parser.add_option('-a', '--mappings',
action='store', type='string', dest='mappings',
help='The mapping, to specify which columns from the CSV file that go with which models. Written as a list of column=field, eg. column1=field1,column2=field2. No spaces. Foreign keys are specified by appending (key|field), where key is the name of the foreign key, and field is the attribute in the foreign key, that holds the data specified in the CSV column.')
opts, args = parser.parse_args()
if not opts.csvfile or not opts.modelspy or not opts.model or not\
opts.mappings:
parser.error('Not enough arguments specified.')
try:
c = CSVImport(opts.csvfile, opts.mappings, opts.model, opts.modelspy,
opts.nameindexes)
c.run()
except FatalError, e:
parser.error(e)
|
More like this
- Template tag - list punctuation for a list of items by shapiromatron 10 months, 2 weeks ago
- JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 10 months, 3 weeks ago
- Serializer factory with Django Rest Framework by julio 1 year, 5 months ago
- Image compression before saving the new model / work with JPG, PNG by Schleidens 1 year, 6 months ago
- Help text hyperlinks by sa2812 1 year, 7 months ago
Comments
i am not sure, but line 233
should be
#
@ rachid
Sure thing, I must have missed it when doing a bit of manual refactoring. Thanks for pointing it out.
#
I am having trouble using this script. I am pretty new to Django so any help at all is appreciated:
$ python csvimport.py --csv import.csv --mappings column1=title,column2=slug,column3=intro,column4=body,column5=url --model NewsArticle --modelspy .
Traceback (most recent call last): File "csvimport.py", line 352, in <module> opts.nameindexes) File "csvimport.py", line 147, in init self.mappings = self.__mappings(mappings) AttributeError: CSVImport instance has no attribute '_CSVImport__mappings'
I did Tried:
And I don't see mappings there, not sure why...
#
Please login first before commenting.