1

How can I convert the first line of a text file into a list in Python? I want to escape NaNs while converting it into the list.

import csv
with open ('data.txt', 'r') as f:
    first_row = [column[0] for column in csv.reader(f,delimiter='\t')]
    print (first_row)
lisa
  • Give some more context for this - what kind of input data? what do you mean by "escape NaNs"? – Jeff Tratner Jun 12 '13 at 02:28
  • 1
    @lisa, you should revert your edit and ask a new question. Now none of the below answers have any context. – dansalmo Jun 12 '13 at 04:23
  • seems lisa created a new sock puppet account http://stackoverflow.com/questions/17057641/creating-lists-from-text-file-using-pandas-in-python. – dansalmo Jun 12 '13 at 04:29

4 Answers

4

Make it easier on yourself, use pandas:

import pandas
df  = pandas.read_csv("data.txt")

If you need to explicitly tell pandas that a particular value is NaN, just pass it to the reader:

df = pandas.read_csv("data.txt", na_values=["NAN"])

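Or, if you want to skip lines that have issues:

df = pandas.read_csv("data.txt", on_bad_lines="skip")  # formerly error_bad_lines=False, removed in pandas 2.0

To get row 1:

row1 = df.iloc[0]  # formerly df.irow(0)

To get column 1:

col1 = df.iloc[:, 0]  # formerly df.icol(0)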
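
As a rough sketch of how these pieces fit together, the snippet below reads a small in-memory sample instead of data.txt. The sample data, the `NAN` marker, and the whitespace separator are assumptions made for illustration; the asker's real file may differ.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data.txt (whitespace-delimited, with a header).
text = "a b c\n1.0 NAN 3.0\n4.0 5.0 6.0\n"

# sep=r"\s+" splits on runs of whitespace; na_values marks "NAN" as missing.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", na_values=["NAN"])

# First data row as a plain list, with missing values dropped.
first_row = df.iloc[0].dropna().tolist()

# First column as a plain list.
first_col = df.iloc[:, 0].tolist()

print(first_row)
print(first_col)
```

Note that the header line is consumed as column names here, so `df.iloc[0]` is the first data row, not the first line of the file.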
Jeff Tratner
  • 2
    Pandas is perfect for this, and you beat me to it, +1, but you don't need to specify the delimiter, pandas `read_csv` "sniffs" what the delimiter of the file is! – Ryan Saxe Jun 12 '13 at 02:38
  • @RyanSaxe okay, updated to reflect that (plus I misspelled "\t" to boot :P) – Jeff Tratner Jun 12 '13 at 02:40
  • @Jeff Tratner thank you i accepted your answer. then how can i extract only row one or column one? – lisa Jun 12 '13 at 02:46
  • Isn't usage of pandas heavyweight for this ? If one can do this easily using standard libraries from Python, doesn't usage of Pandas add an additional dependency ? – sateesh Jun 12 '13 at 02:53
  • I do agree that Pandas is a bit much for just this simple task, but you have to realize that the `error_bad_lines` option is not something easily available elsewhere. It would be best to just use the `try` and `except` answer already given if the format of the empty rows was given, but if it's not, pandas makes this much easier and has many functions for dealing with NaNs. – Ryan Saxe Jun 12 '13 at 02:56
  • @Jeff Tratner, I have just tried your suggestion on a text file shown in the question and am not getting the expected results. row1 includes the header info. col1 displays the entire table. – dansalmo Jun 12 '13 at 04:22
  • @dansalmo use the `values` attribute. And if you really are only looking for the first row or first column, then pandas might be too much. – Jeff Tratner Jun 12 '13 at 09:50
  • @sateesh maybe, but the upside is that you can handle pretty much anything without worrying about it and be able to slice and dice it all later very easily – Jeff Tratner Jun 12 '13 at 09:51
  • 1
    @Jeff Tratner, the problem was caused by the white space in the text file. Using this `df = pandas.read_csv("test.txt", sep=r"\s+")` fixed it. The original question was changed here and moved to here by OP under a different account for some reason. http://stackoverflow.com/questions/17057641/creating-lists-from-text-file-using-pandas-in-python – dansalmo Jun 12 '13 at 15:28
1

If you have a sure way of determining what constitutes an invalid value for a cell, you can use a string comparison and ignore those values.

If your purpose is to ignore values that Python cannot parse as floats, you can do something like the following:

cell = "abc"  # placeholder for a cell value read from the file
try:
    f = float(cell)
    # store f somewhere
except ValueError:
    # ignore cell, or maybe log this
    pass
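
Applied to a whole row, the same idea might look like the sketch below. The helper name `to_floats` and the sample cells are made up for illustration.

```python
def to_floats(cells):
    """Convert cells to floats, skipping any that float() rejects."""
    values = []
    for cell in cells:
        try:
            values.append(float(cell))
        except ValueError:
            # Not parseable as a number (e.g. '' or 'n/a'): skip it.
            continue
    return values


print(to_floats(["1.5", "n/a", "", "3"]))
```

Keep in mind that the string "NaN" parses successfully with float(), so this approach alone does not skip it.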
sateesh
1

csv.reader() returns an iterator that yields a list of columns per iteration (i.e. one list per line).

Simply put, this is sufficient to get you the first line of data.txt as a list:

import csv
with open('data.txt') as f:
    first_row = next(csv.reader(f, delimiter='\t'))  # next() pulls only the first row
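
For a version that can be run without a file on disk, here is a minimal sketch that uses an in-memory stream as a stand-in for data.txt. The helper name `read_first_row` and the sample contents are hypothetical.

```python
import csv
import io


def read_first_row(handle, delimiter="\t"):
    """Return the first row of a delimited text stream as a list of strings."""
    reader = csv.reader(handle, delimiter=delimiter)
    # next() pulls one row; the default avoids StopIteration on an empty file.
    return next(reader, [])


sample = io.StringIO("1.5\tNaN\t3\n4\t5\t6\n")  # stand-in for open('data.txt')
print(read_first_row(sample))
```

Only the first line is parsed; the rest of the stream is never read into memory.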

It appears you also want to convert the list elements to a floating-point type, which can be done using map(...) and float(...).

e.g.:

first_row = list(map(float, first_row))  # list() is needed on Python 3, where map() returns an iterator

If the list contains the text "NaN", float() converts this to the special value nan without much intervention.

e.g.:

>>> float("NaN")
nan
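
If the goal is to leave those nan values out of the list, one possible sketch is below. It assumes every cell parses as a float; the helper name `drop_nans` and the sample cells are invented for the example.

```python
import math


def drop_nans(cells):
    """Convert cells to floats and drop any that parse to nan."""
    values = [float(cell) for cell in cells]
    # nan != nan, so test with math.isnan() rather than ==.
    return [v for v in values if not math.isnan(v)]


print(drop_nans(["1.5", "NaN", "3"]))
```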
Community
antak
0

This worked for me (puts all cells on a row or line into a list):

import csv
with open('data.txt', 'r') as f:
    for row in csv.reader(f, delimiter='\t'):
        print(row)  # prints a list of entries for the current row
cforbish