
I am using the CSV reader to read a TSV in Python. The code is:

```python
import csv

with open('sample.csv', newline='') as fh:
    f = csv.reader(fh, delimiter='\t')
    for chunk in f:
        print(chunk)
```

One row from the tab-separated file looks like this (the full file, header included, is hosted [here](https://github.com/erzaliator/RadomDump/blob/main/sample.csv)):

```
doc unit1_toks unit2_toks unit1_txt1 unit2_txt2 s1_toks s2_toks unit1_sent unit2_sent dir
GUM_bio_galois 156-160 161-170 " We zouden dan voorstellen dat de auteur al zijn werk zou moeten publiceren 107-182 107-182 Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument is not sufficient . " [ 16 ] Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument would then suggest that the author should publish the opinion . " [ 16 ] 1>2
```

I am getting the following output (the CSV reader does not split on some of the tabs):

```
['GUM_bio_galois',
'156-160',
'161-170',
' We zouden dan voorstellen\tdat de auteur al zijn werk zou moeten publiceren\t107-182\t107-182\tPoisson declared Galois \' work  incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]',
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]',
'1>2']
```

I want it to look like this:

```
['GUM_bio_galois',
'156-160',
'161-170',
'" We zouden dan voorstellen',
'dat de auteur al zijn werk zou moeten publiceren',
'107-182',
'107-182',
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument is not sufficient . " [ 16 ]',
'Poisson declared Galois \' work " incomprehensible " , declaring that " [ Galois \' ] argument would then suggest that the author should publish the opinion . " [ 16 ]',
'1>2']
```

How can I get the CSV reader to handle incomplete quotes and retain them in my output?

martineau
Someone
  • Can you post the actual `repr()` of the header and one line so that we don't need to reconstruct it ourselves? – buran Nov 18 '21 at 10:10
  • Note that the use of quotes suggests there are fields that contain the delimiter, which is why they had to be quoted. Of course, there is always the possibility that the CSV file was not constructed properly in the first place. – buran Nov 18 '21 at 10:29
  • Try using `open('sample.csv', encoding='utf8')`. – martineau Nov 18 '21 at 10:44
  • @martineau using `open('sample.csv', encoding='utf8')` has no effect. – Someone Nov 18 '21 at 12:27
  • @buran I have provided a link to the csv (header included) hosted on github. Also adding in the comment again: [here](https://github.com/erzaliator/RadomDump/blob/main/sample.csv) – Someone Nov 18 '21 at 12:48

1 Answer

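Tell the reader not to treat `"` as a quote character by passing `quoting=csv.QUOTE_NONE`; every tab then separates fields and the quote characters are kept in the values:

```python
import csv

with open('sample.csv', newline='') as f:
    rdr = csv.reader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
    header = next(rdr)  # the first row holds the column names
    for line in rdr:
        print(line)
```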
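Why this works, as a sketch on a made-up two-line sample (the `SAMPLE` text and the `read_rows` helper are hypothetical, not taken from the question's file): under the default `csv.QUOTE_MINIMAL`, a field that *starts* with `"` is parsed as a quoted field, so the tabs after it are kept as data until the next `"`. With the default `strict=False`, the characters after that closing quote are simply appended, which is how the merged field in the question's output arises, minus both quote characters. With `csv.QUOTE_NONE`, `"` is an ordinary character.

```python
import csv
import io

# Hypothetical two-line sample shaped like the problem row: the second field
# opens a double quote that is only "closed" two tabs later, mid-field.
SAMPLE = (
    "doc\tunit1_txt1\tunit2_txt2\tunit1_sent\tdir\n"
    "GUM_x\t\" We zouden\tdat de auteur\tGalois ' work \" incomprehensible\t1>2\n"
)


def read_rows(text, **fmt):
    """Parse tab-separated text, skip the header, return the data rows."""
    rdr = csv.reader(io.StringIO(text, newline=""), delimiter="\t", **fmt)
    next(rdr)  # skip the header row
    return list(rdr)


# Default QUOTE_MINIMAL: the leading '"' opens a quoted field, the next two
# tabs are kept inside it, and both '"' characters are dropped (3 fields).
print(read_rows(SAMPLE))

# QUOTE_NONE: '"' is an ordinary character, so every tab splits (5 fields).
print(read_rows(SAMPLE, quoting=csv.QUOTE_NONE))
```

Note the double space in `work  incomprehensible` in the default result: it is the space before the second `"` plus the space after it, the same artifact visible in the question's output.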

Or, using `csv.DictReader` (the header row becomes the keys of each row dict):

```python
import csv

with open('sample.csv', newline='') as f:
    rdr = csv.DictReader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
    for line in rdr:
        print(line)
```
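One caveat: because `QUOTE_NONE` makes every tab a separator, a field that *was* legitimately quoted to hide a tab (the possibility raised in the comments on the question) would now be split, shifting the later columns. A minimal sketch of one way to catch that, assuming a row with the wrong number of fields should be treated as an error; `read_checked` is a hypothetical helper. It relies on `DictReader` putting surplus fields under its `restkey` (default `None`) and filling missing ones with `restval` (default `None`).

```python
import csv
import io


def read_checked(text):
    """Yield dict rows from tab-separated text read with QUOTE_NONE.

    Raises ValueError for a row whose field count differs from the header,
    e.g. because a field really did contain a quoted tab.
    """
    rdr = csv.DictReader(io.StringIO(text, newline=""), delimiter="\t",
                         quoting=csv.QUOTE_NONE)
    for row in rdr:
        # Extra fields land under the key None; missing fields get value None.
        if None in row or None in row.values():
            raise ValueError(f"line {rdr.line_num}: field count differs from header")
        yield row


if __name__ == "__main__":
    sample = 'doc\ttext\tdir\nGUM_x\t" We zouden\t1>2\n'
    for row in read_checked(sample):
        print(row)
```

For real files, the same check works with `csv.DictReader(open(path, newline=''), ...)` in place of the `io.StringIO` wrapper.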
buran
  • I would like to write the same to another file. I am again running into an issue with the quotes. I tried `writer = csv.writer(open('target.csv', 'w'), delimiter='\t', quoting=csv.QUOTE_NONE)` and then `writer.writerow(line)` but I get this error: `_csv.Error: need to escape, but no escapechar set` – Someone Nov 18 '21 at 13:53
  • @Someone: The error is because you need to define `escapechar=` to "a one-character string used by the writer to escape the *delimiter* if *quoting* is set to `QUOTE_NONE`", to something. Note that after doing so, you'll also need to define it when reading the file later. See the [documentation](https://docs.python.org/3/library/csv.html#csv.Dialect.escapechar). – martineau Nov 18 '21 at 15:50
  • @martineau I don't want to add any escapechars (need to keep the file in original state). I have found this way to work: `writer = csv.writer(open('target.csv', 'w'), delimiter='\t', quoting=csv.QUOTE_NONE, quotechar='')` – Someone Nov 19 '21 at 07:53
  • @Someone: I was just quoting you the documentation which explains the `_csv.Error` you were getting. BTW, in Python 3, CSV files should always be opened with a `newline=""` option; again, read the fine docs. – martineau Nov 19 '21 at 08:24
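Putting the comments together, a minimal sketch of copying the file without adding escape characters. It assumes UTF-8 input with LF line endings; `copy_tsv` and the file names are hypothetical. Instead of `quotechar=''`, it passes `quotechar=None`: the Python documentation allows `None` when `quoting` is `QUOTE_NONE` (so `"` no longer needs escaping), while an empty `quotechar` is rejected from Python 3.11 on. Both files are opened with `newline=""` as recommended above.

```python
import csv


def copy_tsv(src_path, dst_path):
    """Copy a TSV row by row, leaving stray double quotes untouched.

    Assumes UTF-8 text with LF line endings.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # Read with QUOTE_NONE so '"' is data, not a quoting character.
        rdr = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        # quotechar=None: '"' is not special, so no escapechar is required.
        # lineterminator="\n" replaces the writer's "\r\n" default.
        wtr = csv.writer(dst, delimiter="\t", quoting=csv.QUOTE_NONE,
                         quotechar=None, lineterminator="\n")
        for row in rdr:
            wtr.writerow(row)
```

Usage would be `copy_tsv('sample.csv', 'target.csv')`. Since the reader already split every field on tabs and lines, the writer never meets a character it has to escape; if the source uses CRLF line endings, dropping the `lineterminator` argument keeps them.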