4

I have a text file containing simulation data (60 columns, 100k rows):

a  b   c  
1  11 111
2  22 222
3  33 333
4  44 444

... where the first row contains the variable names, and the columns beneath hold the corresponding data (float type).

I need to use all these variables with their data in Python for further calculations. For example, when I insert:

print(b)

I need to receive the values from the second column.

I know how to import data:

data = np.genfromtxt("1.txt", unpack=True, skip_header=1)

Assign variables "manually":

a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

But I'm having trouble with getting variable names:

reader = csv.reader(open("1.txt", "rt"))
for row in reader: 
   list.append(row)
variables=(list[0])  

How can I change this code to get all variable names from the first row and assign them to the imported arrays?

Zero Piraeus
Michal
  • I'm not quite following that last sentence. Are you asking how to use the individual characters in the first row as the names of variables that then hold each column? – andyg0808 Aug 10 '13 at 00:27
  • Yes, I need to have a variable names taken from cells in first row. Later on, I need to multiply one column by another or by an equation and plot the results/save to file. – Michal Aug 10 '13 at 12:35

4 Answers

3

Instead of trying to assign names, you might think about using an associative array, which is known in Python as a dict, to store your variables and their values. The code could then look something like this (borrowing liberally from the csv docs):

import csv
with open('1.txt', 'rt') as f:
  reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

  lineData = list()

  cols = next(reader)
  print(cols)

  for col in cols:
    # Create a list in lineData for each column of data.
    lineData.append(list())


  for line in reader:
    for i in range(len(lineData)):
      # Copy the data from the line into the correct columns.
      lineData[i].append(line[i])

  data = dict()

  for i in range(len(cols)):
    # Create each key in the dict with the data in its column.
    data[cols[i]] = lineData[i]

print(data)

data then contains each of your variables, which can be accessed via data['varname'].

So, for example, you could do data['a'] to get the list ['1', '2', '3', '4'] given the input provided in your question.

I think trying to create names based on data in your document might be a rather awkward way to do this, compared to the dict-based method shown above. If you really want to do that, though, you might look into reflection in Python (a subject I don't really know anything about).
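A more compact variant of the same dict-based idea, as a minimal sketch: it assumes whitespace-separated columns with the names on the first line, converts every value to float, and uses a hypothetical helper name, read_columns. The in-memory sample stands in for the contents of 1.txt.

```python
import io


def read_columns(f):
    """Return {column name: list of floats} from a whitespace-separated table.

    Hypothetical helper; assumes the first line holds the column names.
    """
    # str.split() with no argument ignores repeated and trailing spaces.
    names = next(f).split()
    # Split each non-blank data line into its fields.
    rows = (line.split() for line in f if line.strip())
    # zip(*rows) transposes the rows into columns.
    columns = zip(*([float(x) for x in row] for row in rows))
    return {name: list(col) for name, col in zip(names, columns)}


# Sample data from the question, standing in for the real file.
sample = io.StringIO("a  b   c  \n1  11 111\n2  22 222\n3  33 333\n4  44 444\n")
data = read_columns(sample)
print(data["b"])
```

For a real file, the same helper could be called as read_columns(f) inside a with open("1.txt") as f: block.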

andyg0808
2

The answer is: you don't want to do that.

Dictionaries are designed for exactly this purpose: the data structure you actually want is going to be something like:

data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

... which you can then easily access using e.g. data["a"].

It's possible to do what you want, but the usual way is a hack which relies on the fact that Python uses (drumroll) a dict internally to store variables - and since your code won't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.

It's worth pointing out that this is deliberately made difficult in Python, because if your code doesn't know the names of your variables, they are by definition data rather than logic, and should be treated as such.
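To illustrate the point about the internal dict, here is a minimal sketch using the sample values from the question. At module level, globals() returns the dictionary that holds the module's variables, so updating it does create the names, but code that doesn't know those names in advance still has to reach them through dictionary access.

```python
data = {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44]}

# Module-level variables are stored in the dict returned by globals(),
# so this creates the names "a" and "b" (the hack described above).
globals().update(data)

# Code that does not know the names up front still does dict lookups:
for name in data:
    print(name, globals()[name])
```

Nothing is gained over using data[name] directly, which is the argument for keeping the dictionary.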

In case you aren't convinced yet, here's a good article on this subject:

Stupid Python Ideas: Why you don't want to dynamically create variables

Zero Piraeus
  • The "dictionary" method works nicely! Is this method suitable for big files (e.g. 100MB txt file with 50k rows)? What should I do to multiply data["a"]*data["b"]*function ? – Michal Aug 10 '13 at 15:02
  • To multiply elements from the lists in `data`, you can use e.g. `data["a"][0] * data["b"][0]`. Typically you'd be iterating over those lists rather than accessing an individual member, but that's really outside the scope of this question, and a little too involved to explain properly in a comment. – Zero Piraeus Aug 10 '13 at 15:07
  • @Michal Again, efficient ways to handle large volumes of data are outside the scope of this question - if you have a new question resulting from the answer to a previous one, you should search SO to see if it's already been answered, and if not, ask it separately. – Zero Piraeus Aug 10 '13 at 15:12
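
As a minimal sketch of the iteration mentioned in the comments above, two columns can be multiplied element by element with zip. The values are the sample data from the question, and the factor 3 is an arbitrary stand-in for the function the asker mentions.

```python
data = {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44]}
factor = 3  # arbitrary stand-in for the "function" in the comment

# zip pairs up the i-th elements of both columns.
result = [x * y * factor for x, y in zip(data["a"], data["b"])]
print(result)  # [33, 132, 297, 528]
```
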
0

Thanks to @andyg0808 and @Zero Piraeus, I have found another solution, which for me is the most appropriate: using the pandas data analysis library.

import pandas as pd

# sep=r"\s+" splits on any run of whitespace
data = pd.read_csv("1.txt", sep=r"\s+")

result = data["a"] * data["b"] * 3
print(result)

0     33
1    132
2    297
3    528

...where 0,1,2,3 are the row index.

Michal
0

Here is a simple way to convert a .txt file of variable names and data to NumPy arrays.

import numpy as np

D = np.genfromtxt('1.txt', dtype='str')     # load the data in as strings
D_data = np.asarray(D[1:, :], dtype=float)  # convert the data to floats
D_names = D[0, :]                           # save a list of the variable names

for i in range(len(D_names)):
    key = D_names[i]                        # define the key for this variable
    val = D_data[:, i]                      # set the value for this variable
    exec(key + '=val')                      # create a variable named after the column

I like this method because it is easy to follow and simple to maintain. We can compact this code as follows:

D = np.genfromtxt('1.txt',dtype='str')     # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1::,i],dtype=float) # set the value for this variable 
    exec(D[0,i] + '=val')                  # build the variable 

Both snippets do the same thing: they create NumPy arrays named a, b, and c holding the associated data.
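
A possible alternative that avoids exec, as a sketch: np.genfromtxt accepts names=True, which reads the column names from the first line and returns a structured array whose fields can be accessed by name. The in-memory sample below stands in for 1.txt.

```python
import io

import numpy as np

# Sample data from the question, standing in for the real file.
sample = io.StringIO("a  b   c\n1  11 111\n2  22 222\n3  33 333\n4  44 444\n")

# names=True takes the field names from the first line of the input.
data = np.genfromtxt(sample, names=True)

print(data.dtype.names)       # the column names read from the header
print(data["a"] * data["b"])  # element-wise product of two columns
```

Each field, such as data["b"], is a float array, so the columns can be combined arithmetically without creating variables dynamically.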

Austin Downey