
I'm using oct2py to call corrcoef.m on several large (10MM+) dataframes, returning [R,P] matrices that I use to generate training sets for an ML algorithm. Yesterday this worked with no problem. This morning I ran the script from the top, producing an identical test set to pass to Octave through oct2py.

I get the following error:

Oct2PyError: Octave evaluation error:
error: isnan: not defined for cell
error: called from:
corrcoef at line 152, column 5
CorrCoefScript at line 1, column 7

First, there are no null/NaN values in the set. In fact, there aren't even any zeros. No column is constant, so every column has a nonzero standard deviation in the corrcoef calculation. The input is mathematically sound.

Second, when I load the test set into Octave through the GUI and execute the same .m on the same data, no errors are returned, and the [R,P] matrices are identical to the saved outputs from last night. I checked whether the matrix variable is being passed to Octave through oct2py correctly, and Octave appears to receive an identical matrix. However, oct2py can no longer execute ANY .m with a NaN check in its source code. The error above is returned for any Octave-packaged .m script that calls isnan at any point.

For s&g, I modified my .m to receive the matrix var and write it to a flat file like so:

csvwrite ('filename', data);

This also fails, with an fprintf error. If I run the same code on the same dataset inside the Octave GUI, it works fine.

I'm at a loss here. I updated conda, oct2py, and Octave, with the same results. Again, the exact same code with the exact same data behaved as expected less than 24 hours earlier.

I'm using the code below in Jupyter Notebook to test:

%env OCTAVE_EXECUTABLE = F:\Octave\Octave-5.1.0.0\mingw32\bin\octave-cli-5.1.0.exe
import oct2py
from oct2py import octave

octave.addpath('F:\\FinanceServer\\Python\\Secondary Docs\\autotesting\\atOctave_Scripts');
data = x
octave.push('data',data)
octave.eval('CorrCoefScript')
cmat = octave.pull('R')

Side note - I am only having this issue inside one specific .ipynb notebook. Through some stroke of luck, no other scripts using oct2py seem to be affected.

broseidon
  • You might be passing a list rather than a NumPy array. – Cris Luengo Mar 22 '19 at 17:59
  • (x) is generated by converting a df to a NumPy array with the .values attribute. How would I check whether this arrives as an array in Octave, given that oct2py works behind the scenes? – broseidon Mar 22 '19 at 18:07
  • Note - when I write (x) to a CSV from Python and load the CSV in the Octave GUI, it is a matrix with the same dimensions as the Python object (x) – broseidon Mar 22 '19 at 18:08
  • In the Octave script, type `class(x)` (or whatever the variable is called there). This should print `double` or `cell` or something like this to the terminal. I'm guessing it's a cell array, as calling the `isnan` function on it is what causes your error. Cell arrays are generated from Python sets, lists and tuples: http://blink1073.github.io/oct2py/source/conversions.html – Cris Luengo Mar 22 '19 at 18:59
  • BTW: not only `isnan` will generate this error, most numerical functions will. It just looks like your functions all first encounter an `isnan`, but this is coincidental. – Cris Luengo Mar 22 '19 at 19:01
  • @CrisLuengo I've now been able to implement the loop by iterating through a dictionary of dataframes. Following your comment above, I realized I was passing dataframes to Octave as list objects in the original post. – broseidon Mar 25 '19 at 15:03
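
The check suggested in the comments can also be done on the Python side before anything is pushed. The helper below is a minimal sketch, not part of oct2py: `likely_octave_class` is a hypothetical name, and its rules only mirror the conversion table linked above (lists, tuples and sets become cell arrays). Treating object-dtype arrays as cells is an assumption worth confirming with `class(data)` in Octave.

```python
import numpy as np


def likely_octave_class(obj):
    """Guess whether oct2py would hand `obj` to Octave as a cell or a numeric matrix.

    Hypothetical helper: the rules follow the oct2py conversions page
    (list/tuple/set -> cell). The object-dtype rule is an assumption.
    """
    if isinstance(obj, (list, tuple, set)):
        return "cell"
    if isinstance(obj, np.ndarray):
        if obj.dtype == object:
            # Mixed or nested contents; assumed to arrive as a cell array.
            return "cell"
        if np.issubdtype(obj.dtype, np.number):
            return "numeric"
    return "unknown"
```

Calling `likely_octave_class(x)` right before `octave.push('data', x)` would flag the case from the question, where a list wrapping the dataframes reaches Octave instead of a plain numeric array.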

1 Answer


Got it fixed, but the fix raises more questions than it answers. I was looping over a list of dataframes by type, so that on each iteration i, x was generated through x = dflst[i]. For reasons I don't understand, that stopped working overnight. However, after moving the loop body into a custom function and explicitly calling it on each dataframe, as in oct_func(type1df), I see the expected behavior and the desired outcome. I still cannot use a loop to pass the dataframes to oct_func(). So it's a band-aid solution that fits my purposes but frustratingly doesn't scale.

Edit: The loop works fine if iterating through a dict of dataframes instead of a list.
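
For reference, the dict-based loop could look roughly like the sketch below. The names `to_float_matrix` and `corrcoef_for_each` are hypothetical, and `session` is assumed to be an oct2py session offering the same `push`, `eval` and `pull` calls used in the question. Coercing each frame to a 2-D float array first means a stray list fails in Python rather than reaching Octave as a cell.

```python
import numpy as np


def to_float_matrix(frame):
    """Coerce a DataFrame-like or array-like object to a 2-D float64 ndarray."""
    # Raises ValueError/TypeError if the contents are not numeric or are ragged.
    arr = np.asarray(frame, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got {arr.ndim} dimension(s)")
    return arr


def corrcoef_for_each(session, frames):
    """Run CorrCoefScript once per entry in a dict of frames and collect R.

    `session` is assumed to behave like the oct2py `octave` object in the
    question; CorrCoefScript is the asker's own script.
    """
    results = {}
    for name, frame in frames.items():
        session.push('data', to_float_matrix(frame))
        session.eval('CorrCoefScript')
        results[name] = session.pull('R')
    return results
```

With the objects from the question this would be called as `corrcoef_for_each(octave, {'type1': type1df})`, where the dictionary key is only an illustrative label.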

broseidon