UnicodeWriter and UnicodeDictWriter - write unicode strings out to Excel compatible CSV files

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
import csv, StringIO

class UnicodeWriter(object):
    """
    Like UnicodeDictWriter, but takes lists rather than dictionaries.
    
    Usage example:
    
    fp = open('my-file.csv', 'wb')
    writer = UnicodeWriter(fp)
    writer.writerows([
        [u'Bob', 22, 7],
        [u'Sue', 28, 6],
        [u'Ben', 31, 8],
        # \xc3\x80 is LATIN CAPITAL LETTER A WITH MACRON
        ['\xc4\x80dam'.decode('utf8'), 11, 4],
    ])
    fp.close()
    """
    def __init__(self, f, dialect=csv.excel_tab, encoding="utf-16", **kwds):
        # Redirect output to a queue
        self.queue = StringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoding = encoding
    
    def writerow(self, row):
        # Modified from original: now using unicode(s) to deal with e.g. ints
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = data.encode(self.encoding)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)
    
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class UnicodeDictWriter(UnicodeWriter):
    """
    A CSV writer that produces Excel-compatibly CSV files from unicode data.
    Uses UTF-16 and tabs as delimeters - it turns out this is the only way to
    get unicode data in to Excel using CSV.
    
    Usage example:
    
    fp = open('my-file.csv', 'wb')
    writer = UnicodeDictWriter(fp, ['name', 'age', 'shoesize'])
    writer.writerows([
        {'name': u'Bob', 'age': 22, 'shoesize': 7},
        {'name': u'Sue', 'age': 28, 'shoesize': 6},
        {'name': u'Ben', 'age': 31, 'shoesize': 8},
        # \xc3\x80 is LATIN CAPITAL LETTER A WITH MACRON
        {'name': '\xc4\x80dam'.decode('utf8'), 'age': 11, 'shoesize': 4},
    ])
    fp.close()
    
    Initially derived from http://docs.python.org/lib/csv-examples.html
    """
    
    def __init__(self, f, fields, dialect=csv.excel_tab,
            encoding="utf-16", **kwds):
        super(UnicodeDictWriter, self).__init__(f, dialect, encoding, **kwds)
        self.fields = fields
    
    def writerow(self, drow):
        row = [drow.get(field, '') for field in self.fields]
        super(UnicodeDictWriter, self).writerow(row)

More like this

  1. Incrementally Return CSV by davidblewett 5 years, 8 months ago
  2. ExcelResponse by Tarken 4 years, 7 months ago
  3. Generic admin action export selected rows to excel by jordic 1 year, 7 months ago
  4. Excel Export with xlwt by sspross 2 years, 7 months ago
  5. Quick script to convert json data to csv by stephenemslie 3 years ago

Comments

stephendwolff (on September 17, 2008):

This all seemed to work very nicely for a mac when opening the CSV files with Excel (2004). But... on a PC, opening the CSV with either Office 2003 or 2000, some lovely square characters appear (ie control code). I had a look for mention of Unicode in the Office conversion dialogs (for csv to excel), but no mention. just a heap of different character sets from around the world.

#

msanders (on January 16, 2009):

I have the same issue as stephendwolff. The first line of the CSV file is fine and then all subsequent lines are prefixed with a U+FEFF character (ZERO WIDTH NO-BREAK SPACE). This is with Python 2.5.2.

#

simon (on April 10, 2009):

I had the same problems with Office on Windows - I eventually gave up and produced a .xls file using http://sourceforge.net/projects/pyexcelerator

#

jsoderba (on June 16, 2010):

Writing the BOM by hand and stripping it from each row seems to work:

def __init__(self, f, dialect=csv.excel_tab, encoding="utf-16", **kwds):
    # Redirect output to a queue
    self.queue = StringIO.StringIO()
    self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
    self.stream = f

    # Force BOM
    if encoding=="utf-16":
        import codecs
        f.write(codecs.BOM_UTF16)

    self.encoding = encoding

def writerow(self, row):
    # Modified from original: now using unicode(s) to deal with e.g. ints
    self.writer.writerow([unicode(s).encode("utf-8") for s in row])
    # Fetch UTF-8 output from the queue ...
    data = self.queue.getvalue()
    data = data.decode("utf-8")
    # ... and reencode it into the target encoding
    data = data.encode(self.encoding)

    # strip BOM
    if self.encoding == "utf-16":
        data = data[2:]

    # write to the target stream
    self.stream.write(data)
    # empty queue
    self.queue.truncate(0)

#

jsoderba (on June 16, 2010):

Note that my solution would still produce an excess BOM if the file was reopened in a new UnicodeWriter object, so you might want to check if there's a BOM at the start of the file.

Something like (untested):

# read first two bytes in file
fpos = f.tell()
f.seek(0)
fstartbytes = f.read(2)
f.seek(fpos)

# Write BOM if needed
if fstartbytes != codecs.BOM_UTF16:
    f.write(codecs.BOM_UTF16)

#

msanders (on July 13, 2010):

jsoderba's solution works for me with Windows Office 2007

#

kalvin_jones (on February 17, 2012):

by speciying the encoding as "utf-16-le" rather then "utf-16", and manually writing the BOM [0xff, 0xfe] before to the response before calling writerows(), this works for me on 2007 with sp3, no garbage characters.

the problem with the orig version is that the encode call to utf-16 prepends the BOM to each and every row, whhen it should only appear at the start of the file, hence the garbage characters.

#

siblek31 (on April 13, 2013):

Perjanjian antara Prancis dan Inggris ditempatkan Brittany di bawah kekuasaan Inggris. Pada 1402 Perancis dan Inggris bergaul sedikit lebih baik, setelah beberapa dekade tips cepat hamil sengketa tahun 1518 perjanjian soal ulangan sd ditandatangani memberikan kepemilikan Inggris dari Brittany, sandera diserahkan oleh Perancis sebagai jaminan bahwa perjanjian itu akan merasa terhormat.

The Lords of Sains di Brittany adalah pihak banyak pertemuan dan diskusi, dan terlibat dalam menyetujui persyaratan. Sementara perjanjian dibuat Brittany masih diperdebatkan selama bertahun-tahun kemudian dan akhirnya jatuh kembali ke tangan Prancis.

19 September 1402 - Surat Raja Perancis, menunjuk Lancelot de Longvilliers, Sire d'Engoudsent, Chamberlain dan Jehan de Sains, Kanselir Amiens, sekretarisnya, untuk mengobati dengan utusan dari Henry IV: Corbeil (ditujukan kepada Henry IV utusan)

4 Februari 1521 - Powers oleh Francis, teknisi komputer Raja Perancis, John de Sains, Tuhan Marigny, kursus bahasa inggris murah untuk mengatur pertukaran empat dari delapan sandera diserahkan kepada Henry VIII. sesuai dengan cara mendapatkan uang dari internet perjanjian 8 Oktober 1518, empat yang telah sudah dirilis. Romorantin.

#

(Forgotten your password?)