The file has multiple lines of comma-separated values, 3 numbers per line, except the last, which has only 2.
In [182]: fname='../Downloads/pastie-10860707.txt'
In [183]: np.fromregex(fname,regexp=pattern,dtype=float)
...
np.fromregex(fname,regexp=pattern,dtype=float)
/usr/lib/python3/dist-packages/numpy/lib/npyio.py in fromregex(file, regexp, dtype)
1240 # Create the new array as a single data-type and then
1241 # re-interpret as a single-field structured array.
-> 1242 newdtype = np.dtype(dtype[dtype.names[0]])
1243 output = np.array(seq, dtype=newdtype)
1244 output.dtype = dtype
TypeError: 'NoneType' object is not subscriptable
Loaded with a simple 'br' read, the file looks like:
In [184]: txt
Out[184]: b'2.75386225e+00,1.80508078e+00,2.95729122e+00,\n-4.21413726e+00, -3.38139076e+00, -4.22751379e+00,\n ... 4.23010784e-01, -1.14839331e+00, -9.56098910e-01,\n -1.15019836e+00, 1.13845303e-06'
That missing number on the last line will give genfromtxt problems.
Your choice of pattern is wrong. It looks like a delimiter pattern, but the pattern in the fromregex docs produces groups:
regexp = r"(\d+)\s+(...)"
fromregex does:
seq = regexp.findall(file.read()) # read whole file and group it
output = np.array(seq, dtype=dtype) # make array from seq
If you want to use fromregex you need to come up with a pattern that produces a list of tuples that can be turned into an array directly.
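To illustrate the difference, here is a minimal sketch using only the re module. The sample bytes and both patterns are made up for illustration (delimiter_like is a stand-in for a separator-style pattern, not the exact one from the question):

```python
import re

# Small made-up sample in the same style as the file.
sample = b"2.75e+00,1.80e+00,\n-4.21e+00, -3.38e+00"

# A delimiter-style pattern has no groups, so findall returns the separators.
delimiter_like = re.compile(br"[,\s]")
separators = delimiter_like.findall(sample)
print(separators)

# A pattern with two groups returns one tuple per match, which is the
# shape that np.array(seq, dtype=...) can consume directly.
two_groups = re.compile(br"([-+\d.e]+),\s*([-+\d.e]+)")
pairs = two_groups.findall(sample)
print(pairs)
```

The first findall gives a flat list of commas and whitespace; the second gives a list of 2-tuples of number fields.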
================
Though, looking again at the error message, I see that the immediate problem is with the dtype. dtype=float is not a valid dtype spec for this function. It expects a compound (structured) dtype.
The error is produced by this action, where float is your dtype parameter:
In [189]: np.dtype(float).names[0]
...
TypeError: 'NoneType' object is not subscriptable
But it's trying to do this because the pattern has produced
In [194]: pattern.findall(txt)
Out[194]:
[b',',
b',',
b',',
b'\n',
b',',
b' ',
b' ',
....]
not the list of tuples that it expected.
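The dtype half of the problem can be seen in isolation. This is a small sketch, assuming only that numpy is installed; the field name 'f0' is an arbitrary choice:

```python
import numpy as np

plain = np.dtype(float)
structured = np.dtype([("f0", float)])

# A plain dtype has no field names, so indexing .names fails
# with the TypeError shown in the traceback.
print(plain.names)

# A structured dtype has a tuple of field names.
print(structured.names)

# This mirrors the line in the traceback: look up the dtype of the
# first field, which is only possible for a structured dtype.
inner = np.dtype(structured[structured.names[0]])
print(inner)
```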
==================
I can load the file with
In [213]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0,1])
Out[213]:
array([[ 2.75386225e+00, 1.80508078e+00],
[ -4.21413726e+00, -3.38139076e+00],
[ 7.46991792e-01, -1.08010066e+00],
...
[ 4.23010784e-01, -1.14839331e+00],
[ -1.15019836e+00, 1.13845303e-06]])
I'm using usecols to temporarily get around the problem with only 2 numbers on the last line.
If I remove the \n and split it on commas, I can parse the resulting text fields directly with np.array.
In [231]: txt1=txt.replace(b'\n',b'').split(b',')
In [232]: np.array(txt1,float)
Out[232]:
array([ 2.75386225e+00, 1.80508078e+00, 2.95729122e+00,
-4.21413726e+00, -3.38139076e+00, -4.22751379e+00,
...
4.23010784e-01, -1.14839331e+00, -9.56098910e-01,
-1.15019836e+00, 1.13845303e-06])
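The same idea can be sketched without numpy, on a shortened made-up sample rather than the real file. Grouping the flat list into threes afterwards makes the short last row visible:

```python
# Shortened sample in the same layout: trailing commas, short last line.
txt = (b"2.75386225e+00,1.80508078e+00,2.95729122e+00,\n"
       b"-4.21413726e+00, -3.38139076e+00, -4.22751379e+00,\n"
       b" -1.15019836e+00, 1.13845303e-06")

# Drop the newlines, split on commas, and skip any empty fields.
fields = txt.replace(b"\n", b"").split(b",")
values = [float(f.decode("ascii")) for f in fields if f.strip()]

# Group into rows of 3; the last row ends up with only 2 numbers.
rows = [values[i:i + 3] for i in range(0, len(values), 3)]
print(len(values), [len(r) for r in rows])
```

float() ignores the leading spaces in each field, so no extra stripping is needed before conversion.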
This pattern includes the decimal and scientific notation:
In [266]: pattern=re.compile(br"(\d+\.\d+e[\+\-]\d+)")
In [267]: np.fromregex(fname,regexp=pattern,dtype=np.dtype([('f0',float)]))['f0']
Out[267]:
array([ 2.75386225e+00, 1.80508078e+00, 2.95729122e+00,
4.21413726e+00, 3.38139076e+00, 4.22751379e+00,
...
4.23010784e-01, 1.14839331e+00, 9.56098910e-01,
1.15019836e+00, 1.13845303e-06])
For now I'm creating a structured array and extracting that field. There may be a way around that, but fromregex seems to favor the use of structured dtypes.
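Note that Out[267] has lost the minus signs that appear in Out[232], because the group in that pattern does not include a leading sign. A possible fix is to add an optional [+-]? at the start of the group. The sketch below is my own variation, not tested against the original file: it writes a shortened made-up sample to a temporary file and passes the pattern as a plain str (an assumption made so that it does not matter whether numpy reads the file as bytes or text).

```python
import os
import tempfile

import numpy as np

# Shortened made-up sample in the same layout as the file.
sample = (b"2.75386225e+00,1.80508078e+00,2.95729122e+00,\n"
          b"-4.21413726e+00, -3.38139076e+00, -4.22751379e+00,\n"
          b" -1.15019836e+00, 1.13845303e-06")

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as fh:
    fh.write(sample)
    path = fh.name

# Same pattern as above, plus an optional leading sign inside the group.
signed = r"([+-]?\d+\.\d+e[+-]\d+)"
dt = np.dtype([("f0", float)])

try:
    values = np.fromregex(path, signed, dt)["f0"]
finally:
    os.remove(path)

print(values)
```

With the sign inside the group, the negative entries should come back negative, matching what np.array gave for the split text.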