
Questions about this error have many answers (see Python Math - TypeError: 'NoneType' object is not subscriptable). My question is different, because I correctly expect np.genfromtxt(...) to return an array (i.e., np.genfromtxt(...) is not an in-place function).

I am trying to parse the following data and store it in a one-dimensional array:

http://pastie.org/10860707#2-3

To do so, I tried:

pattern = re.compile(b'[\s,]')
theta = np.fromregex("RegLogTheta", regexp = pattern, dtype = float)

This is the traceback (how should it be formatted?):

Traceback (most recent call last):
  File "/Users/ahanagrawal/Documents/Java/MachL/Chap3/ExamScoreVisual2.py", line 36, in <module>
    theta = np.fromregex("RegLogTheta", regexp = pattern, dtype = float)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1240, in fromregex
    newdtype = np.dtype(dtype[dtype.names[0]])
TypeError: 'NoneType' object is not subscriptable

If you would like to run this, please download the text file from: http://pastie.org/10860707#2-3 and run the code above.

Muno
  • Please post the full traceback. – kindall Jun 01 '16 at 20:38
  • You don't even use `np.genfromtxt` in your posted code. – user2357112 Jun 01 '16 at 20:39
  • Please don't post data on external websites. Copy it into your question. – MattDMo Jun 01 '16 at 20:46
  • Questions seeking debugging help (**"why isn't this code working?"**) must include the desired behavior, *a specific problem or error* and *the shortest code necessary* to reproduce it **in the question itself**. Questions without **a clear problem statement** are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). – MattDMo Jun 01 '16 at 20:47
  • Actually with this setup it is easier to download that data from the external website than it would be copy-n-paste it from question. – hpaulj Jun 01 '16 at 21:00
  • @kindall Should be good now – Muno Jun 01 '16 at 21:06
  • Your `fromregex` call makes no sense, and the exception you're getting is [documented](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.fromregex.html). It's what happens when your dtype isn't valid for a structured array, which is one of the several things that make no sense about your `fromregex` call. What were you expecting it to do? How did you expect that regex to extract any data from your file? – user2357112 Jun 01 '16 at 21:09
  • @user2357112 I believe `np.fromregex` does the following "Construct an array from a text file, using regular expression parsing." My textfile is not suitable for parsing with just one delimiter, so I thought that a regex would do the trick. I now see, however, that `np.fromregex` actually returns part of the file matching the regex I pass in! So I really need something that returns everything but whatever matches the regex. – Muno Jun 01 '16 at 21:15

1 Answer


The file has multiple lines of comma-separated values, 3 numbers per line, except the last line, which has only 2.

In [182]: fname='../Downloads/pastie-10860707.txt'

In [183]: np.fromregex(fname,regexp=pattern,dtype=float)
... 
np.fromregex(fname,regexp=pattern,dtype=float)

/usr/lib/python3/dist-packages/numpy/lib/npyio.py in fromregex(file, regexp, dtype)
   1240             # Create the new array as a single data-type and then
   1241             #   re-interpret as a single-field structured array.
-> 1242             newdtype = np.dtype(dtype[dtype.names[0]])
   1243             output = np.array(seq, dtype=newdtype)
   1244             output.dtype = dtype

TypeError: 'NoneType' object is not subscriptable

Read into txt in binary mode ('rb'), the file looks like:

In [184]: txt
Out[184]: b'2.75386225e+00,1.80508078e+00,2.95729122e+00,\n-4.21413726e+00,  -3.38139076e+00,  -4.22751379e+00,\n ...      4.23010784e-01,  -1.14839331e+00,  -9.56098910e-01,\n        -1.15019836e+00,   1.13845303e-06'

That missing number on the last line will give genfromtxt problems.

Your choice of pattern is wrong. It looks like a delimiter pattern, but the pattern in the fromregex docs uses groups to capture the data:

regexp = r"(\d+)\s+(...)"

Internally, fromregex does roughly this:

seq = regexp.findall(file.read())  # read whole file and group it
output = np.array(seq, dtype=dtype)  # make array from seq

If you want to use fromregex you need to come up with a pattern that produces a list of tuples that can be turned into an array directly.
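As a small illustration of a pattern that does produce tuples, here is a sketch modeled on the example in the fromregex docs. The sample text and the field names num and key are only for illustration; they have nothing to do with the file in the question.

```python
import io

import numpy as np

# Illustrative text: an integer followed by a three-character key on each line.
text = io.StringIO("1312 foo\n1534  bar\n444   qux")

# Two groups in the pattern, so each match becomes a 2-tuple.
regexp = r"(\d+)\s+(...)"

# One structured field per group.
dtype = [("num", np.int64), ("key", "S3")]

output = np.fromregex(text, regexp, dtype)
print(output["num"])
print(output["key"])
```

Each group maps to one field of the structured dtype, which is why a pattern with no groups at all has nothing sensible to put in the array.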

================

Though looking again at the error message, I see that the immediate problem is with the dtype. dtype=float is not a valid dtype spec for this function; it expects a structured (compound) dtype.

The error is produced by this action, where float is your dtype parameter:

In [189]: np.dtype(float).names[0]
 ...
TypeError: 'NoneType' object is not subscriptable

But it's trying to do this because the pattern has produced

In [194]: pattern.findall(txt)
Out[194]: 
[b',',
 b',',
 b',',
 b'\n',
 b',',
 b' ',
 b' ',
 ....]

not the list of tuples that it expected.
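To make the dtype part concrete, here is a minimal sketch comparing a plain dtype with a structured one. The field name f0 is an arbitrary choice for illustration.

```python
import numpy as np

plain = np.dtype(float)
structured = np.dtype([("f0", float)])

print(plain.names)       # None
print(structured.names)  # ('f0',)

# Indexing None is what raises the error seen in the traceback.
try:
    plain.names[0]
except TypeError as exc:
    message = str(exc)
    print(message)

# With a structured dtype the same lookup gives the dtype of the first field.
first_field = structured[structured.names[0]]
print(first_field)
```

Newer NumPy releases may check for a structured dtype up front and report a different message, so the exact error text can depend on the version.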

==================

I can load the file with

In [213]: np.genfromtxt(txt.splitlines(),delimiter=',',usecols=[0,1])
Out[213]: 
array([[  2.75386225e+00,   1.80508078e+00],
       [ -4.21413726e+00,  -3.38139076e+00],
       [  7.46991792e-01,  -1.08010066e+00],
        ...
       [  4.23010784e-01,  -1.14839331e+00],
       [ -1.15019836e+00,   1.13845303e-06]])

I'm using usecols to temporarily get around the problem with only 2 numbers on the last line.
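A reduced version of the same workaround, using made-up lines with the same shape as the file (trailing commas, and a short last line), might look like this:

```python
import numpy as np

# Made-up sample with the same layout as the file: three values per line
# with a trailing comma, except the last line, which has only two.
lines = [
    "1.0,2.0,3.0,",
    "-4.0,  -5.0,  -6.0,",
    "7.0,8.0",
]

# usecols restricts parsing to the columns that every line has.
arr = np.genfromtxt(lines, delimiter=",", usecols=[0, 1])
print(arr)
```

This keeps genfromtxt from complaining about the ragged last row, but it also throws away the third column, so it is only a stopgap.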

If I remove the \n and split it on commas, I can parse the resulting text fields directly with np.array.

In [231]: txt1=txt.replace(b'\n',b'').split(b',')

In [232]: np.array(txt1,float)
Out[232]: 
array([  2.75386225e+00,   1.80508078e+00,   2.95729122e+00,
        -4.21413726e+00,  -3.38139076e+00,  -4.22751379e+00,
          ...
         4.23010784e-01,  -1.14839331e+00,  -9.56098910e-01,
        -1.15019836e+00,   1.13845303e-06])
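The same flatten-and-split idea can also be sketched with re.split, which returns everything except what the delimiter pattern matches. This is a variation on the replace/split above, not what I ran; the sample text is made up, and the character class is the one from the question.

```python
import re

import numpy as np

# Made-up text in the same layout as the file.
text = (
    "1.5e+00,2.0e+00,3.25e+00,\n"
    "-4.0e+00,  -5.5e+00,  6.0e-01,\n"
    "   -1.0e+00,   2.5e-06"
)

# Split on runs of whitespace and commas, dropping any empty tokens.
tokens = [tok for tok in re.split(r"[\s,]+", text) if tok]

# Convert each token explicitly, then build a 1-D float array.
theta = np.array([float(tok) for tok in tokens])
print(theta)
```

Because the delimiters are matched as runs, the uneven spacing and the line breaks need no special handling.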

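This pattern matches decimal numbers in scientific notation. Note that it does not capture a leading minus sign, so negative values come out positive (compare the output with the one above):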

In [266]: pattern=re.compile(br"(\d+\.\d+e[\+\-]\d+)")

In [267]: np.fromregex(fname,regexp=pattern,dtype=np.dtype([('f0',float)]))['f0']
Out[267]: 
array([  2.75386225e+00,   1.80508078e+00,   2.95729122e+00,
         4.21413726e+00,   3.38139076e+00,   4.22751379e+00,
      ...
         4.23010784e-01,   1.14839331e+00,   9.56098910e-01,
         1.15019836e+00,   1.13845303e-06])

For now I'm creating a structured array and extracting that field. There may be a way around that. But fromregex seems to favor the use of structured dtypes.
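The pattern above has no place for a leading sign, so the negative values in the file lose their minus. A sketch of a sign-aware variant follows, assuming every number is written in the d.ddde±dd form seen in the file; the sample text and the variable names are made up for illustration.

```python
import io

import numpy as np

# Made-up text in the same layout as the file.
text = (
    "1.5e+00,2.0e+00,3.25e+00,\n"
    "-4.0e+00,  -5.5e+00,  6.0e-01,\n"
    "   -1.0e+00,   2.5e-06"
)

dtype = np.dtype([("f0", float)])

# Pattern without a sign, as above: the minus is left outside the group.
unsigned = r"(\d+\.\d+e[+-]\d+)"
# An optional leading sign inside the group keeps negative values negative.
signed = r"([+-]?\d+\.\d+e[+-]\d+)"

theta_unsigned = np.fromregex(io.StringIO(text), unsigned, dtype)["f0"]
theta_signed = np.fromregex(io.StringIO(text), signed, dtype)["f0"]

print(theta_unsigned)
print(theta_signed)
```

Numbers written without an exponent or without a decimal point would not match either pattern, so this is not a general float parser.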

hpaulj
  • Is this supposed to be an answer? – Bryan Oakley Jun 01 '16 at 21:10
  • It's too long to be comment! Plus you jumped in with your question before I finished editing. I'm still not done editing. – hpaulj Jun 01 '16 at 21:22
  • @hpaulj I think the following should work: `theta = np.fromregex("RegLogTheta", regexp = r"\s+,(\d+)\s+,", dtype = [(np.float128)])`. The only problem is that [(np.float128)] is not a recognized dtype, which I don't understand why. – Muno Jun 01 '16 at 21:34
  • @hpaulj Oh you edited after I commented, too. Let me check that edit first. – Muno Jun 01 '16 at 21:49
  • I generalized your pattern to handle scientific notation. – hpaulj Jun 02 '16 at 02:00