
I have looked at many of the questions about the "Reindexing only valid with uniquely valued Index objects" error. I am running pandas 0.10.1.

import numpy as np
from pandas import DataFrame
df = DataFrame({'A': np.random.randn(5), 'B': np.random.randn(5),
                'C': np.random.randn(5),
                'D': ['a', 'b', 'c', 'd', 'e']})

#gives error
df.take([2,0,1,2,3], axis=1).drop(['C'],axis=1)

#works fine
df.take([2,0,1,2,1], axis=1).drop(['C'],axis=1)

The only difference I can see is that the former case includes the non-numeric column, which seems to affect the index somehow, but the command below returns an empty result:

df.take([2,0,1,2,3], axis=1).index.get_duplicates()

The Q&A "Reindexing error makes no sense" does not seem to apply, as my old index is unique.

My index appears unique as far as I can tell using the command df.take([2,0,1,2,3], axis=1).index.get_duplicates(), taken from this Q&A: "problems with reindexing dataframes: Reindexing only valid with uniquely valued Index objects".

"Reindexing only valid with uniquely valued Index objects" does not seem to apply either.

I think my pandas version is recent enough, so the bug described in "pandas Reindexing only valid with uniquely valued Index objects" should not be the problem.

Paul
  • You are taking on the *columns*, and they are clearly *not unique*, since by definition you are taking duplicates. What are you actually trying to do? – Jeff Feb 03 '14 at 17:46
  • You are correct. But notice that in both cases the columns are not unique after the take, yet the former case returns the error while the latter raises no error and returns the correct result. My actual use case is machine learning related: I have an MxN matrix representing M feature vectors in N-dimensional space, and I want to repeat the classification column every 10-15 columns so that I do not lose track of the classification when I am scrolling around looking at the feature vectors. That being said, I would appreciate a solution or explanation for the above problem. – Paul Feb 03 '14 at 17:59
  • Might be a bug: https://github.com/pydata/pandas/issues/6240. In general you want to be careful with duplicate columns. You should not create duplicate columns simply to look at things. – Jeff Feb 03 '14 at 19:44

1 Answer


Firstly, I believe you meant to test for duplicates using the following command:

df.take([2,0,1,2,3],axis=1).columns.get_duplicates()

because if you use index instead of columns, it will obviously return an empty array: take with axis=1 only rearranges columns, so the row index (the default 0 to 4) still has no repeated labels. The above command returns, as expected:

['C']
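To make the index-versus-columns distinction concrete, here is a minimal sketch of the same check. It assumes a pandas version where `Index.is_unique` and `Index.duplicated()` are available, and uses them in place of `get_duplicates()`, which may not exist on newer releases. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: three float columns and one string column.
df = pd.DataFrame({'A': np.random.randn(5),
                   'B': np.random.randn(5),
                   'C': np.random.randn(5),
                   'D': ['a', 'b', 'c', 'd', 'e']})

# Taking positions [2, 0, 1, 2, 3] along axis=1 selects columns C, A, B, C, D.
taken = df.take([2, 0, 1, 2, 3], axis=1)

# The row index is untouched by a column-wise take, so it stays unique.
index_is_unique = taken.index.is_unique

# The column labels now contain 'C' twice.
columns_are_unique = taken.columns.is_unique
duplicated_labels = taken.columns[taken.columns.duplicated()].unique().tolist()

print(index_is_unique, columns_are_unique, duplicated_labels)
```

Checking `.columns` rather than `.index` is what exposes the repeated label after a column-wise take.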

Secondly, I think you're right that the non-numeric column is throwing it off: even with the following frame, whose 'D' column holds strings built from random floats, there is still an error:

df = DataFrame({'A' : np.random.randn(5), 'B' : np.random.randn(5),'C' :np.random.randn(5), 'D':[str(x) for x in np.random.randn(5) ]})

It could be a bug. In the pandas core file 'index.py', lines 86 and 1228 set the expected engine type to, respectively:

_engine_type = _index.ObjectEngine


_engine_type = _index.Int64Engine

and neither of those seems to expect a string, if you look deeper into the documentation. That's the best I've got, good luck! Let me know if you solve this, as I'm interested too.
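In the meantime, two possible workarounds, offered as a sketch rather than a fix for the underlying behaviour. The first removes every column labelled 'C' with a boolean mask on the column labels instead of drop, on the assumption that selecting by mask avoids the reindexing step. The second repeats the column under distinct names, so the labels stay unique; the names `C_copy1` and `C_copy2` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randn(5),
                   'B': np.random.randn(5),
                   'C': np.random.randn(5),
                   'D': ['a', 'b', 'c', 'd', 'e']})

# Workaround 1: select columns with a boolean mask on the labels.
taken = df.take([2, 0, 1, 2, 3], axis=1)          # columns: C, A, B, C, D
without_c = taken.loc[:, taken.columns != 'C']    # keeps A, B, D

# Workaround 2: repeat 'C' under unique (hypothetical) names so that
# later label-based operations such as drop see unique column labels.
view = pd.concat(
    [
        df[['C']].rename(columns={'C': 'C_copy1'}),
        df[['A', 'B']],
        df[['C']].rename(columns={'C': 'C_copy2'}),
        df[['D']],
    ],
    axis=1,
)
trimmed = view.drop(['C_copy2'], axis=1)

print(list(without_c.columns))
print(list(trimmed.columns))
```

The second form also fits the stated use case of repeating a classification column for viewing, since each copy can be dropped or selected on its own.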

makansij
  • Yeah, I think you are onto something. It may be related to the bug report that @Jeff posted on GitHub: https://github.com/pydata/pandas/issues/6240 Thanks for taking a look at the source code and for the even better minimal working example. – Paul Feb 04 '14 at 12:54