I'm trying to predict the category of a news article based on 2 features: author name and article headline.
I have transformed both columns separately using CountVectorizer and TfidfTransformer. What I have now is what I think of as a 3D array (i.e. an array of lists of arrays), where each row contains the [author_tfid, summary_tfid] of one data instance:
X_train = array([[array([0., 3., 0., ..., 0., 4., 0.]),
                  array([0., 0., 3., ..., 0., 0., 0.])],
                 [array([0., 0., 0., ..., 0., 0., 9.]),
                  array([1., 0., 0., ..., 0., 0., 0.])],
                 [array([2., 0., 0., ..., 0., 0., 0.]),
                  array([0., 0., 0., ..., 0., 5., 0.])],
However, when I call imblearn's RandomOverSampler.fit_resample(X, y) on this data, I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-210227188cde> in <module>()
----> 1 X_oversampled, y_oversampled = oversampler.fit_resample(X, y)
4 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
62 # for object dtype data, we only check for NaNs (GH-13254)
63 elif X.dtype == np.dtype('object') and not allow_nan:
---> 64 if _object_dtype_isnan(X).any():
65 raise ValueError("Input contains NaN")
66
AttributeError: 'bool' object has no attribute 'any'
I searched the forums and Google but couldn't find anyone with this problem. What is going wrong, and what is the correct way to oversample a 3D array like this?
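One direction that might work, as a rough sketch only: the traceback goes through the object-dtype branch of the validation code, which suggests the nested structure is being stored as an object array rather than a numeric one. Concatenating the two vectors of each row should give an ordinary 2D float array of shape (n_samples, n_features). The toy values, vector lengths, and the names `X_nested` and `X_2d` below are made up for illustration and are not my real data.

```python
import numpy as np

# Toy stand-in for the structure described above: each row holds two
# separate vectors (author, headline). All values here are made up.
rows = [
    (np.array([0., 3., 0.]), np.array([0., 0., 3., 1.])),
    (np.array([0., 0., 9.]), np.array([1., 0., 0., 0.])),
    (np.array([2., 0., 0.]), np.array([0., 0., 5., 0.])),
]

# Build the nested object array explicitly so NumPy does not try to
# broadcast the inner arrays into extra dimensions.
X_nested = np.empty((len(rows), 2), dtype=object)
for i, (author_vec, headline_vec) in enumerate(rows):
    X_nested[i, 0] = author_vec
    X_nested[i, 1] = headline_vec

# Concatenate the two vectors of every row, then stack the rows
# into one 2D float array: (n_samples, n_author + n_headline).
X_2d = np.vstack([np.concatenate(list(row)) for row in X_nested])

print(X_nested.dtype, X_nested.shape)
print(X_2d.dtype, X_2d.shape)
```

If that is right, `X_2d` together with the labels would be the kind of input `fit_resample` expects. When the tf-idf outputs are still sparse matrices, `scipy.sparse.hstack` on the two matrices may be a more memory-friendly way to get the same column-wise combination.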