34

Strange error from numpy via matplotlib when trying to get a histogram of a tiny toy dataset. I'm just not sure how to interpret the error, which makes it hard to see what to do next.

Didn't find much related, though this nltk question and this gdsCAD question are superficially similar.

I intend the debugging info at bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.

        if n > 1:
            return diff(a[slice1]-a[slice2], n-1, axis=axis)
        else:
>           return a[slice1]-a[slice2]
E           TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

../py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py:1567: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) bt
[...]
py2.7.11-venv/lib/python2.7/site-packages/matplotlib/axes/_axes.py(5678)hist()
-> m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
  py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(606)histogram()
-> if (np.diff(bins) < 0).any():
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) p numpy.__version__
'1.11.0'
(Pdb) p matplotlib.__version__
'1.4.3'
(Pdb) a
a = [u'A' u'B' u'C' u'D' u'E']
n = 1
axis = -1
(Pdb) p slice1
(slice(1, None, None),)
(Pdb) p slice2
(slice(None, -1, None),)
(Pdb)
Community
  • 1
  • 1
Gregory Marton
  • 453
  • 1
  • 4
  • 7
  • Hi, I ran into the same problem in my script, because ```numpy``` module was not imported. Could this be the case? – Mike Oct 15 '20 at 07:11

6 Answers6

11

I got the same error, but in my case I am subtracting dict.key from dict.value. I have fixed this by subtracting dict.value for corresponding key from other dict.value.

cosine_sim = cosine_similarity(e_b-e_a, w-e_c)

here I got error because e_b, e_a and e_c are embedding vector for word a,b,c respectively. I didn't know that 'w' is string, when I sought out w is string then I fix this by following line:

cosine_sim = cosine_similarity(e_b-e_a, word_to_vec_map[w]-e_c)

Instead of subtracting dict.key, now I have subtracted corresponding value for key

6

Why is it applying diff to an array of strings.

I get an error at the same point, though with a different message

In [23]: a=np.array([u'A' u'B' u'C' u'D' u'E'])

In [24]: np.diff(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-9d5a62fc3ff0> in <module>()
----> 1 np.diff(a)

C:\Users\paul\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\lib\function_base.pyc in diff(a, n, axis)
   1112         return diff(a[slice1]-a[slice2], n-1, axis=axis)
   1113     else:
-> 1114         return a[slice1]-a[slice2]
   1115 
   1116 

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray' 

Is this a array the bins parameter? What does the docs say bins should be?

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Ah, interesting! I was trying to create a histogram with categorical bins, which means that what I want is not a histogram here at all, but a barplot. Thanks! – Gregory Marton Apr 15 '16 at 03:49
6

I had a similar issue where an integer in a row of a DataFrame I was iterating over was of type numpy.int64. I got the

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

error when trying to subtract a float from it.

The easiest fix for me was to convert the row using pd.to_numeric(row).

SherylHohman
  • 16,580
  • 17
  • 88
  • 94
colster
  • 359
  • 1
  • 3
  • 9
  • I was getting a similar error because I also subtracting a 0.0 from 0.00 . `pd.to_numeric(DataFrame)` It solved my issue – John Sick Jan 09 '20 at 11:33
3

I am fairly new to this myself, but I had a similar error and found that it is due to a type casting issue. I was trying to concatenate rather than take the difference but I think the principle is the same here. I provided a similar answer on another question so I hope that is OK.

In essence you need to use a different data type cast, in my case I needed str not float, I suspect yours is the same so my suggested solution is. I am sorry I cannot test it before suggesting but I am unclear from your example what you were doing.

return diff(str(a[slice1])-str(a[slice2]), n-1, axis=axis)

Please see my example code below for the fix to my code, the change occurs on the third to last line. The code is to produce a basic random forest model.

import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation

Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"

npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay =  npArray.shape

names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)

XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)

# Predictions results initialised 
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain)       # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)

with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
            fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
            for i in range(0,lenpredictions) :
                    fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
    else :
            print "ERROR - names, prediction and true value array size mismatch."

This leads to an error of;

Traceback (most recent call last):
  File "min_example.py", line 40, in <module>
    fpred.write(RFpreds[i]+",,"+yTest[i]+",\n")
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

The solution is to make each variable a str() type on the third to last line then write to file. No other changes to then code have been made from the above.

import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation

Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"

npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay =  npArray.shape

names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)

XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)

# Predictions results initialised 
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain)       # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)

with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
            fpred.write("Names/Label,, Prediction Random Forest,, True Value,\n")
            for i in range(0,lenpredictions) :
                    fpred.write(str(RFpreds[i])+",,"+str(yTest[i])+",\n")
    else :
            print "ERROR - names, prediction and true value array size mismatch."

These examples are from a larger code so I hope the examples are clear enough.

Community
  • 1
  • 1
James
  • 223
  • 1
  • 3
  • 9
0

I think @James is right. I got stuck by same error while working on Polyval(). And yeah solution is to use the same type of variabes. You can use typecast to cast all variables in the same type.

BELOW IS A EXAMPLE CODE

import numpy
P = numpy.array(input().split(), float)
x = float(input())
print(numpy.polyval(P,x))

here I used float as an output type. so even the user inputs the INT value (whole number). the final answer will be typecasted to float.

rohit290554
  • 79
  • 3
  • 10
0

I ran into the same issue, but in my case it was just a Python list instead of a Numpy array used. Using two Numpy arrays solved the issue for me.

ah bon
  • 9,293
  • 12
  • 65
  • 148
user3352632
  • 617
  • 6
  • 18