Login

CSV to JSON Fixture

Author:
briangershon
Posted:
August 11, 2009
Language:
Python
Version:
1.1
Score:
4 (after 4 ratings)

This script converts a CSV file into a JSON file ready to be imported via manage.py loaddata like any other fixture data.

It can be used manually to do a one-time conversion (for placing into a /fixtures folder), or used in a fabric script that automatically converts CSV to JSON live then runs loaddata to import as fixture data.

To run script:

csv2json.py input_file_name model_name

e.g. csv2json.py airport.csv app_airport.Airport

Note: input_file_name should be a path relative to where this script is.

Scripts depends on simplejson module. The module can just be placed in a sub-folder to the script to make it easy to import. If you use the same Python binary that you use for your Django site, you could use the Django import instead: from django.utils import simplejson

File Input/Ouptut formats:

Assumes CSV files are saved with LF line endings, and that first line has field values. First column is the model's pk field.

Sample CSV input:

id,ident,name,city,state
1,00C,Animas Air Park,Durango,CO
6,00V,Meadow Lake,Colorado Springs,CO
7,00W,Lower Granite State,Colfax,WA
12,01J,Hilliard Airpark,Hilliard,FL

Output file name is input name + ".json" extension.

Sample JSON output:

[
    {
        "pk": 1, 
        "model": "app_airport.Airport", 
        "fields": {
            "name": "Animas Air Park", 
            "city": "Durango", 
            "ident": "00C", 
            "state": "CO", 
        }
    }
]

Debugging Conversion Problems

If JSON import errors out with "ValidationError: This value must be an integer", you probably have a blank in an Integer field within your CSV file, but if can't figure out, try setting a breakpoint in file:

./django/django/db/models/fields/__init__.py

e.g.

688             try:
689                 return int(value)
690             except (TypeError, ValueError):
691                 import pdb; pdb.set_trace()
692  ->             raise exceptions.ValidationError(
693                     _("This value must be an integer."))

To figure out what field caused the error, while in the debugger:

(Pdb) u                 <- to go UP the callstack
(Pdb) field.name
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
# csv2json.py
# 
# Copyright 2009 Brian Gershon -- briang at webcollective.coop
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
import getopt
import csv
from os.path import dirname
import simplejson

try:
    script, input_file_name, model_name = sys.argv
except ValueError:
    print "\nRun via:\n\n%s input_file_name model_name" % sys.argv[0]
    print "\ne.g. %s airport.csv app_airport.Airport" % sys.argv[0]
    print "\nNote: input_file_name should be a path relative to where this script is."
    sys.exit()

in_file = dirname(__file__) + input_file_name
out_file = dirname(__file__) + input_file_name + ".json"

print "Converting %s from CSV to JSON as %s" % (in_file, out_file)

f = open(in_file, 'r' )
fo = open(out_file, 'w')

reader = csv.reader( f )

header_row = []
entries = []

# debugging
# if model_name == 'app_airport.Airport':
#     import pdb ; pdb.set_trace( )

for row in reader:
    if not header_row:
        header_row = row
        continue
        
    pk = row[0]
    model = model_name
    fields = {}
    for i in range(len(row)-1):
        active_field = row[i+1]

        # convert numeric strings into actual numbers by converting to either int or float
        if active_field.isdigit():
            try:
                new_number = int(active_field)
            except ValueError:
                new_number = float(active_field)
            fields[header_row[i+1]] = new_number
        else:
            fields[header_row[i+1]] = active_field.strip()
        
    row_dict = {}
    row_dict["pk"] = int(pk)
    row_dict["model"] = model_name
    
    row_dict["fields"] = fields
    entries.append(row_dict)

fo.write("%s" % simplejson.dumps(entries, indent=4))

f.close()
fo.close()

More like this

  1. Template tag - list punctuation for a list of items by shapiromatron 2 months, 2 weeks ago
  2. JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 2 months, 3 weeks ago
  3. Serializer factory with Django Rest Framework by julio 9 months, 2 weeks ago
  4. Image compression before saving the new model / work with JPG, PNG by Schleidens 10 months, 1 week ago
  5. Help text hyperlinks by sa2812 11 months ago

Comments

Zeynel (on November 11, 2009):

Hi,

Thank you so much for this snippet. I am just learning Django and Python. I've been trying to use this snippet but i am getting an error message:

File "<stdin>", line 1 csv2json.py sw.csv wkw1.Lawyer SyntaxError: invalid syntax

(the invalid sytax caret points to sw.csv)

I created a csv file sw.csv and I put it in the same directory (sw1) as the snippet csv2json.py.

I started the python shell by typing "python" on the command prompt, and I entered csv2json.py sw.csv wkw1.Lawyer, but I got the error message.

What am I doing wrong? Thanks for the help.

#

Please login first before commenting.