
So I am trying to read in some data which looks like this (this is just the first line):

1 14.4132966509 (-1.2936631396696465, 0.0077236319580324952,   0.066687939649724415) (-13.170491147387787, 0.0051387952329040587, 0.0527163312916894)

I'm attempting to read it in with np.genfromtxt using:

skirt_data = np.genfromtxt('skirt_data.dat', names = ['halo', 'IRX', 'beta', 'intercept'], delimiter = ' ', dtype = None)

But it's returning this:

ValueError: size of tuple must match number of fields.

My question is: how exactly do I load the arrays that are within the data, so that I can pull out the first number in each array? Ultimately, I want to do something like this to look at a value from the beta column:

skirt_data['beta'][1]

Thanks ahead of time!

Dyell
  • Your data are not a square table; they have quite a bit of structure. I think you should write your own import: use plain Python `readlines` and take apart each line according to the structure only you understand. One could play with multiple separators and still use `numpy` importers, but it would not be very elegant. – roadrunner66 Apr 09 '16 at 00:33
  • Those `()` will give `genfromtxt` problems. It's designed for lines with just fields and delimiters, without quotes or other bracketing. But it will accept input from your own line reader (anything that feeds it lines). So you can filter out the `()`, replacing them with regular delimiters. – hpaulj Apr 09 '16 at 04:45
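hpaulj's suggestion above can be sketched roughly as follows. This is only an illustration, not code from the answer: the block writes out the one sample line from the question just so it is self-contained, and the names for the error columns (`beta_e1`, `beta_e2`, `int_e1`, `int_e2`) are made up here, since the question only names the first four fields:

```python
import numpy as np

# Sample line from the question, written out so this sketch is runnable.
sample = ('1 14.4132966509 (-1.2936631396696465, 0.0077236319580324952, '
          '0.066687939649724415) (-13.170491147387787, '
          '0.0051387952329040587, 0.0527163312916894)\n')
with open('skirt_data.dat', 'w') as f:
    f.write(sample)

# Replace the parentheses and commas with spaces so every number
# becomes a plain whitespace-delimited field genfromtxt can handle.
def clean_lines(path):
    with open(path) as f:
        for line in f:
            yield line.replace('(', ' ').replace(')', ' ').replace(',', ' ')

# Eight flat columns; the error-column names are invented for this sketch.
data = np.genfromtxt(
    clean_lines('skirt_data.dat'),
    names=['halo', 'IRX', 'beta', 'beta_e1', 'beta_e2',
           'intercept', 'int_e1', 'int_e2'],
    dtype=None,
)
print(data['beta'])
```

`genfromtxt` accepts any iterable of lines, so the cleaning generator slots in where the filename would normally go.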

1 Answer


If each line is the same, I would go with a custom parser.

You can split the line using `str.split(sep, maxsplit)`.

So something along the lines of

names = ['halo', 'IRX', 'beta', 'intercept']
output = {}
with open('skirt_data.dat') as sfd:
    for i, line in enumerate(sfd.readlines()):
        skirt_name = names[i]
        first_col, second_col, rest = line.split(' ', 2)
        output[skirt_name] = int(first_col)
print(output)
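A fuller version of the custom-parser idea might look like the sketch below. The helper name `parse_line` and the dict-of-lists layout are my own choices, not part of the original answer; the block writes out the sample line from the question so it is self-contained:

```python
# Sample line from the question, written out so this sketch is runnable.
sample = ('1 14.4132966509 (-1.2936631396696465, 0.0077236319580324952, '
          '0.066687939649724415) (-13.170491147387787, '
          '0.0051387952329040587, 0.0527163312916894)\n')
with open('skirt_data.dat', 'w') as f:
    f.write(sample)

def parse_line(line):
    # First two whitespace-separated fields, then the rest of the line.
    first, second, rest = line.split(' ', 2)
    # rest looks like "(a, b, c) (d, e, f)": strip the outer parens,
    # then split the two triples apart on ") (".
    left, right = rest.strip().lstrip('(').rstrip(')').split(') (')
    beta = tuple(float(x) for x in left.split(','))
    intercept = tuple(float(x) for x in right.split(','))
    return int(first), float(second), beta, intercept

skirt_data = {'halo': [], 'IRX': [], 'beta': [], 'intercept': []}
with open('skirt_data.dat') as sfd:
    for line in sfd:
        halo, irx, beta, intercept = parse_line(line)
        skirt_data['halo'].append(halo)
        skirt_data['IRX'].append(irx)
        skirt_data['beta'].append(beta)
        skirt_data['intercept'].append(intercept)

# First number of the first row's beta tuple:
print(skirt_data['beta'][0][0])
```

With this layout, `skirt_data['beta'][0]` is the whole first beta triple and `skirt_data['beta'][0][0]` is its first number, which matches the access pattern asked about in the question.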
Charles L.
  • Wow! I didn't even think about writing a custom parser. Thanks for the tip (obviously I'm new to Python). So I've sort of edited your suggestion to look like this: `with open('skirt_data.dat') as sfd: for i, line in enumerate(sfd.readlines()): skirt_name = names[i] first_col, second_col, third_col, fourth_col = line.split(' ') output[skirt_name] = int(third_col[1]) print output` But now it's giving me this error: `ValueError: too many values to unpack` on line 4. Is there a way to use iteritems() here? – Dyell Apr 09 '16 at 01:39
  • Sorry, I did not see the comment. The problem is there are more than 4 columns in a row, and Python doesn't know how to put >4 values into 4 variables. I wouldn't use iteritems because that is for dictionaries, and this is a list. You could save the split results into a list: `cols = line.split(' ')` and then check its length and only save it when it's good: `if len(cols) > 4: first_col = cols[0] ...` Or set the `maxsplit` argument to split (the 2nd arg) to `4`, and change the line to `first_col, second_col, third_col, fourth_col, rest = line.split(' ', 4)` so the extra cols go into `rest`. – Charles L. Apr 20 '16 at 17:32
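The `maxsplit` trick from the last comment can be seen in isolation on a toy line (the variable names follow the comment):

```python
# With maxsplit=4, only the first four spaces split the string;
# everything after them lands in the final variable.
line = 'a b c d e f g'
first_col, second_col, third_col, fourth_col, rest = line.split(' ', 4)
print(rest)
```

This is why the five-variable unpacking no longer raises `ValueError: too many values to unpack`, no matter how many extra columns the line has.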