
I have a text file that, amongst other data, contains data of the form

215
1 0.0 0.0 0.0
[...]
9 -0.4330127018930699 0.2499999999985268 1.0
10 -0.1366025403783193 -0.03660254037890862 1.0
11 -0.2499999999985268 -0.4330127018930699 1.0
12 0.03660254037890862 -0.1366025403783193 1.0
13 0.4330127018930699 -0.2499999999985268 1.0
14 0.1366025403783193 0.03660254037890862 1.0
15 0.2499999999985268 0.4330127018930699 1.0
[...]
215 1.0 1.0 1.0
[...]  # some more data, other format

i.e.,

  • an integer N specifying the number of rows of data to come,
  • N rows with an integer followed by three floats,
  • some more data, formatted differently.

I would like to convert these data into a numpy array. Since the file is most easily accessed through a generator over its lines, numpy.fromiter() seems like a natural fit. However, I can't get the data type right. This

import numpy
from itertools import islice

with open(filename) as f:
    line = next(islice(f, 1))  # Python 3 spelling of islice(f, 1).next()
    num_nodes = int(line)
    points = numpy.fromiter(
        islice(f, num_nodes),
        dtype=[('idx', int), ('vals', float, 3)],
        count=num_nodes
        )

does not work. Any hints?

Nico Schlömer
  • I'd suggest `loadtxt` or `genfromtxt`. With `dtype=None`, `genfromtxt` will deduce int vs. float for you. Or try an `i,f,f,f` dtype. Your dtype might also work. – hpaulj Oct 29 '15 at 05:55
  • I'm having a hard time with methods that require a file handle since the file contains lots of other data that is differently formatted. This is why I'm using a generator (`islice`). – Nico Schlömer Oct 29 '15 at 06:32
  • `genfromtxt` takes anything that can feed it one line at a time. For testing I often use a list of strings. A generator should work fine. – hpaulj Oct 29 '15 at 06:43
  • http://stackoverflow.com/a/14791245/901925 – hpaulj Oct 29 '15 at 07:06
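The comments make two points: genfromtxt accepts a plain list of strings, and dtype=None lets it guess a type for each column. A minimal sketch combining both (not from the original thread; the sample lines are copied from the question, and encoding=None is an added choice that sidesteps the legacy byte-string handling in older NumPy versions):

```python
import numpy as np

# A list of strings stands in for the file lines, as suggested in the comments.
lines = [
    "9 -0.4330127018930699 0.2499999999985268 1.0",
    "10 -0.1366025403783193 -0.03660254037890862 1.0",
    "11 -0.2499999999985268 -0.4330127018930699 1.0",
]

# dtype=None: genfromtxt picks a type per column (int for the first, float for
# the rest) and returns a structured array with default field names f0..f3.
auto = np.genfromtxt(lines, dtype=None, encoding=None)
```

Note that the three floats come back as separate fields (f1, f2, f3) rather than one 3-element 'vals' field; pass an explicit dtype if you want that grouping.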

1 Answer

This script:

import numpy as np

txt = b"""7
9 -0.4330127018930699 0.2499999999985268 1.0
10 -0.1366025403783193 -0.03660254037890862 1.0
11 -0.2499999999985268 -0.4330127018930699 1.0
12 0.03660254037890862 -0.1366025403783193 1.0
13 0.4330127018930699 -0.2499999999985268 1.0
14 0.1366025403783193 0.03660254037890862 1.0
15 0.2499999999985268 0.4330127018930699 1.0
[...]  # some more data, other format
"""
dt = np.dtype([('idx', int), ('vals', float, 3)])
#dt = np.dtype('i,f,f,f')  # flat alternative; note 'f' is float32, not float64
print(dt)

def gentxt(txt, dt):
    f = txt.splitlines()
    line = f[0]
    num_nodes = int(line)
    aslice = slice(1,num_nodes+1)
    # print(f[aslice])
    points = np.genfromtxt(
        f[aslice],
        dtype=dt)
    return points

M = gentxt(txt,dt)
print(repr(M))

produces

1304:~/mypy$ python3 stack33406545.py 
[('idx', '<i4'), ('vals', '<f8', (3,))]
array([(9, [-0.4330127018930699, 0.2499999999985268, 1.0]),
       (10, [-0.1366025403783193, -0.03660254037890862, 1.0]),
       (11, [-0.2499999999985268, -0.4330127018930699, 1.0]),
       (12, [0.03660254037890862, -0.1366025403783193, 1.0]),
       (13, [0.4330127018930699, -0.2499999999985268, 1.0]),
       (14, [0.1366025403783193, 0.03660254037890862, 1.0]),
       (15, [0.2499999999985268, 0.4330127018930699, 1.0])], 
      dtype=[('idx', '<i4'), ('vals', '<f8', (3,))])

I used simple slicing of a list of text lines. I tried to use islice as you do, but decided it wasn't worth my time to get it right. The central point is to use an iterable that produces the desired text lines; it doesn't matter whether that is a list, a range of file lines, or the output of a generator.
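For reference, here is a minimal sketch of the islice variant applied directly to an open file (not part of the original answer; read_points is a hypothetical helper, and StringIO stands in for the real file). It relies on genfromtxt accepting any iterable of lines, and on islice stopping after exactly num_nodes lines so the file is left positioned at the next section:

```python
from io import StringIO
from itertools import islice

import numpy as np

dt = np.dtype([('idx', int), ('vals', float, 3)])

def read_points(f, dt):
    """Hypothetical helper: read a count line, then that many data rows."""
    num_nodes = int(next(f))
    # islice hands genfromtxt exactly num_nodes lines and does not
    # consume the line that follows the block.
    return np.genfromtxt(islice(f, num_nodes), dtype=dt)

# StringIO stands in for the opened file (sample rows from the question).
f = StringIO(
    "3\n"
    "9 -0.4330127018930699 0.2499999999985268 1.0\n"
    "10 -0.1366025403783193 -0.03660254037890862 1.0\n"
    "11 -0.2499999999985268 -0.4330127018930699 1.0\n"
    "some more data, other format\n"
)
points = read_points(f, dt)
rest = f.readline()  # first line of the differently formatted data
```

The same function works with a file object from open(filename), since text files can be iterated line by line in the same way.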


fromiter is picky about what it accepts: it must produce a 1d array.

A list or iterable that returns individual strings (convertible to a simple dtype) works:

In [233]: np.fromiter(['1', '2', '3', '4'],dtype=int)
Out[233]: array([1, 2, 3, 4])

but a list of lists (2d) does not:

In [234]: np.fromiter([['1', '2'],['3', '4']],dtype=int)
....
ValueError: setting an array element with a sequence.

With a compound (structured) dtype, I have to give it tuples:

In [236]: np.fromiter([('1', '2'),('3', '4')],dtype=np.dtype('i,i'))
Out[236]: 
array([(1, 2), (3, 4)], dtype=[('f0', '<i4'), ('f1', '<i4')])

Strings, or tuples of strings, that contain several numbers don't work, e.g. ['1 2', '3 4'] or [('1 2',), ('3 4',)]. genfromtxt is much better at handling text with rows and columns (csv-like).
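If you would rather stay with fromiter, the lines have to be split before they reach it, so that each record arrives as a tuple matching the structured dtype. A minimal sketch under that assumption (parse_row is a hypothetical helper, not a NumPy function; the nested tuple fills the 3-element 'vals' field):

```python
from itertools import islice

import numpy as np

dt = np.dtype([('idx', int), ('vals', float, 3)])

def parse_row(line):
    """Hypothetical helper: '9 -0.43 0.25 1.0' -> (9, (-0.43, 0.25, 1.0))."""
    idx, x, y, z = line.split()
    return int(idx), (float(x), float(y), float(z))

# Any line iterator works here; a list iterator stands in for the file.
lines = iter([
    "2",
    "9 -0.4330127018930699 0.2499999999985268 1.0",
    "10 -0.1366025403783193 -0.03660254037890862 1.0",
    "some more data, other format",
])
num_nodes = int(next(lines))
# One tuple per record, produced lazily by a generator expression.
points = np.fromiter(
    (parse_row(line) for line in islice(lines, num_nodes)),
    dtype=dt,
    count=num_nodes,
)
```

This does the text parsing in Python, so genfromtxt is usually the simpler choice; the fromiter route mainly avoids building an intermediate list.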

hpaulj