
Input:

A    B    C
D    E    F

This file is NOT exclusively tab-delimited: some entries are space-delimited to look like they were tab-delimited (which is annoying). I tried reading the file with the csv module using the canonical tab-delimited option, hoping it wouldn't mind a few spaces (needless to say, my output came out botched with this code):

import csv

with open('file.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)
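To see why the output comes out botched, here is a small reproduction with the two sample lines held in a list of strings (a stand-in for the file, assumed for illustration). The tab-delimited line splits into three fields, but the space-delimited line contains no tab at all, so it comes back as one single field.

```python
import csv

# In-memory stand-in for file.txt: one tab-delimited line, one space-delimited line
lines = ['A\tB\tC\n', 'D    E    F\n']

rows = list(csv.reader(lines, delimiter='\t'))
print(rows)
# [['A', 'B', 'C'], ['D    E    F']]
```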

I then tried replacing the csv.reader(f, delimiter='\t') call with csv.reader('\t'.join(f.split())), to take advantage of "Remove whitespace in Python using string.whitespace", but got this error: AttributeError: 'file' object has no attribute 'split'.
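The error arises because split() is a string method and f is a file object. A sketch of what was probably intended normalizes each line separately and feeds the results to csv.reader, which accepts any iterable of strings, not only a file. Here io.StringIO stands in for the real file (an assumption for the example).

```python
import csv
import io

# Stand-in for open('file.txt'); replace with the real file object
f = io.StringIO('A\tB\tC\nD    E    F\n')

# Collapse every run of whitespace on each line into a single tab,
# then let csv.reader split on that tab
normalized = ('\t'.join(line.split()) for line in f)
rows = list(csv.reader(normalized, delimiter='\t'))
print(rows)
# [['A', 'B', 'C'], ['D', 'E', 'F']]
```

At that point csv.reader is doing very little, which is why the answers below drop it in favour of plain split().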

I also looked at "Can I import a CSV file and automatically infer the delimiter?", but there the OP imported files that were either semicolon-delimited or comma-delimited, not a file with a random mixture of both kinds of delimiters.

I was wondering whether the csv module can read files that mix several delimiters, or whether I should try a different approach (e.g., not use the csv module).

I am hoping there is a way to read in a file with a mixture of delimiters and automatically turn it into a tab-delimited file.

warship
  • Are your space delimited lines always delimited with the same number of spaces? – Andy Aug 22 '14 at 01:06
  • I would say it's probably best to normalize your file and then process it, rather than handling edge cases all day. – monkut Aug 22 '14 at 01:08
  • I agree, but how would I normalize/process it if the file is hundreds of lines long? Perhaps there is a better alternative to the `csv` module? – warship Aug 22 '14 at 01:10
  • Any fields that have quotes around them: `"A Dont break B"`? – dawg Aug 22 '14 at 01:14
  • @dawg: No quotes in the file fields. – warship Aug 22 '14 at 01:15
  • Does every single tab or space denote a new field, or are there some fields with spaces within the field itself? – Gerrat Aug 22 '14 at 01:17
  • Every single tab or space denotes a new field (thankfully). – warship Aug 22 '14 at 01:19
  • ...and that got the answers coming! :) – Gerrat Aug 22 '14 at 01:21
  • @Gerrat: Could you imagine the nightmare of dealing with fields that had spaces within the field? :) Would that even be possible (don't think `split()` would get the job done)? – warship Aug 22 '14 at 01:54
  • Yes I can - which is why I asked. The answer to your question was either going to be trivial (which it was), or really ugly and error-prone. – Gerrat Aug 22 '14 at 01:58

3 Answers


Just use .split():

text = '''\
A\tB\tC
D    E    F
'''

data = []
for line in text.splitlines():
    data.append(line.split())

print(data)
# [['A', 'B', 'C'], ['D', 'E', 'F']]

Or, more succinctly:

>>> [line.split() for line in text.splitlines()]
[['A', 'B', 'C'], ['D', 'E', 'F']]

For a file, something like:

with open(fn, 'r') as fin:
    data=[line.split() for line in fin]

It works because str.split() with no argument splits on runs of whitespace: any number of spaces and tabs, in any mix, counts as a single separator, and leading and trailing whitespace (including the newline) is discarded:

>>> '1\t\t\t2     3\t  \t  \t4'.split()
['1', '2', '3', '4']
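The question also asks how to turn the mixed file into a properly tab-delimited one. A minimal sketch pairs the same split() idea with csv.writer, assuming in-memory text stands in for both the input and the output file (with real files on Python 3, open the output with newline=''). Blank lines are skipped, since split() would otherwise turn them into empty rows.

```python
import csv
import io

src = io.StringIO('A\tB\tC\n\nD    E    F\n')  # mixed tabs, spaces, a blank line
dst = io.StringIO()                            # would be the output .tsv file

writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
for line in src:
    fields = line.split()
    if fields:                                 # ignore blank lines
        writer.writerow(fields)

print(repr(dst.getvalue()))
# 'A\tB\tC\nD\tE\tF\n'
```

This is the "normalize first, then process" route suggested in the comments; the resulting file can then be read with the question's original csv.reader(f, delimiter='\t') code.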
dawg
  • +1 and accepted answer for showing how to accomplish with `.split()` what you wish `csv` module could easily do. – warship Aug 22 '14 at 01:47
  • This approach will fail if the values have whitespace in them (e.g. strings set off with quote characters). – abeboparebop Jan 08 '16 at 10:48
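If values can contain whitespace protected by quotes, as the comment above points out, one option from the standard library is shlex.split. It splits on unquoted runs of spaces and tabs and keeps quoted values together. This is a swapped-in technique, not part of the answer above, and it assumes shell-style quoting rules (including backslash escapes), which may not match every data file.

```python
import shlex

lines = ['A\t"B with spaces"    C\n', 'D    E    F\n']

# shlex treats spaces, tabs and newlines as separators outside quotes
rows = [shlex.split(line) for line in lines]
print(rows)
# [['A', 'B with spaces', 'C'], ['D', 'E', 'F']]
```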

Why not just roll your own splitter rather than the CSV module?

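delimiters = [',', ' ', '\t']

# placeholder that must never occur in the data itself
unique = '[**This is a unique delimiter**]'

with open(fileName) as f:
    for l in f:
        l = l.rstrip('\r\n')
        for d in delimiters:
            l = unique.join(l.split(d))
        # consecutive delimiters leave empty strings behind; drop them
        row = [field for field in l.split(unique) if field]
        print(row)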
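The same splitting (every comma, space or tab acts as a separator) can also be done with a single regular expression instead of a placeholder string. This is a sketch, assuming that a run of consecutive delimiters should count as one separator; split_mixed is a hypothetical helper name.

```python
import re

# Any run of commas and/or whitespace (spaces, tabs) is one separator
SEPARATORS = re.compile(r'[,\s]+')

def split_mixed(line):
    # strip() first so leading/trailing whitespace doesn't yield empty fields
    return SEPARATORS.split(line.strip())

print(split_mixed('A\tB\tC\n'))     # ['A', 'B', 'C']
print(split_mixed('D    E    F\n')) # ['D', 'E', 'F']
print(split_mixed('G, H,I'))        # ['G', 'H', 'I']
```

One caveat: strip() only removes whitespace, so a line that starts or ends with a comma still produces an empty first or last field.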
ssm

.split() is an easy, clean solution when consecutive, arbitrarily mixed tabs and blanks should be treated as one delimiter. However, it does not work when a value contains a blank (i.e., a value enclosed in quote marks).

First, replace each tab in the text file with a single blank ' '. This reduces the problem to "a run of any number of blanks is one delimiter".
There is a good example of replacing a pattern throughout a file: https://www.safaribooksonline.com/library/view/python-cookbook/0596001673/ch04s04.html
Note 1: Do NOT replace tabs with '' (the empty string), because some delimiters may consist ONLY of tabs.
Note 2: This approach does NOT work if a value enclosed in quote marks contains a tab character (\t).

Then use Python's csv module with delimiter=' ' (a single blank) and skipinitialspace=True, so that the extra blanks after each delimiter are ignored.
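A minimal sketch of these two steps, done line by line in memory rather than by rewriting the file on disk (an assumption made for brevity; io.StringIO stands in for the real file). As Note 2 warns, the tab replacement would also change tabs inside quoted values.

```python
import csv
import io

# Stand-in for the real file (an assumption for the example)
f = io.StringIO('A\tB\tC\nD    E    F\nx "hello world"  y\n')

# Step 1: turn every tab into a single blank; strip() also drops the newline
# and any trailing blanks, which could otherwise yield an empty last field
blank_lines = (line.strip().replace('\t', ' ') for line in f)

# Step 2: blank-delimited csv; skipinitialspace ignores the extra blanks
reader = csv.reader(blank_lines, delimiter=' ', skipinitialspace=True)
rows = list(reader)
print(rows)
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['x', 'hello world', 'y']]
```

Unlike plain split(), this keeps the quoted value "hello world" together as one field.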

Gawi - Kai