I installed the latest version of pandas (0.9.0) in case this was a bug in an older release. EDIT: I forgot to mention that this is Python 2.7. I'm trying to read an Excel file, and that part seems OK. Originally, I was trying iteritems() for each row of the pandas DataFrame, because each id_company had to be verified against a MySQL database (code not included). That produced the same or a similar error message as converting the rows to tuples (code below). The error message follows the code.

Note that the code calls .reindex(), but it failed the same way without it. The reindex() was a Hail Mary.

As a workaround, I'll probably just import the data from my target SQL table and do a join. I'm concerned about that approach because of the size of the datasets.

import pandas as pd
def runNow():
    #identify sheet
    source = r'C:\Users\jlalonde\Desktop\startup_geno\startupgenome_w_id_xl_20121109.xlsx'
    xls_file = pd.ExcelFile(source)
    sd = xls_file.parse('Sheet1')
    source_u = sd.drop_duplicates(cols = 'id_company', take_last=False)
    source_r = source_u[['id_company','id_good','description', 'website','keyword', 'company_name','founded_month', 'founded_year', 'description']]
    source_i = source_r.reindex() #hail mary
    tup_r = [tuple(x) for x in source_i.values]

Here is the error:

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    sg_sql_2.runNow()
  File "sg_sql_2.py", line 31, in runNow
    tup_r = [tuple(x) for x in source_r.values]
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1443, in as_matrix
    return self._data.as_matrix(columns).T
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 723, in as_matrix
    mat = self._interleave(self.items)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 743, in _interleave
    indexer = items.get_indexer(block.items)
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 748, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

So, after hammering my head against the wall on this for the better part of the day, can anyone tell me if this is a bug or if I am missing something really obvious?

joseph_pindi
  • Possible dupe: http://stackoverflow.com/questions/13292944/resample-non-unique-time-indexes-in-python. Do you have an example that can be used to reproduce the error? – Garrett Nov 12 '12 at 22:54
  • I can post the Excel file, no problem. No, this is not a duplicate: I have a unique index, whereas in the linked question the index was duplicated... although the solution may be the same. – joseph_pindi Nov 13 '12 at 01:50
  • github issue tracking this error: https://github.com/pydata/pandas/issues/2236 – Garrett Nov 13 '12 at 04:16
  • Yep, and whoever answers first gets my eternal thanks! I'll post the solution on both forums, so there are no hanging, unanswered questions anywhere. I posted in both places because I need a solution quickly; I couldn't post, wait a week for no answer, then try another forum and wait another week for it to be resolved. My deadlines at work don't allow that kind of delay. So apologies for the carpet bombing, but it will be clean in the end, I promise. – joseph_pindi Nov 13 '12 at 14:04

1 Answer

Fixed underlying bug today on GitHub: https://github.com/pydata/pandas/issues/2236
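
For readers who cannot upgrade right away, here is a minimal sketch of one possible workaround. It assumes, and the thread does not confirm this, that the non-unique index in the traceback comes from 'description' appearing twice in the column list in the question, which gives the selected frame two columns with the same label. The sample data is hypothetical, and the calls use the current pandas spelling (subset= and keep=) rather than the cols= and take_last= arguments from 0.9.0.

```python
import pandas as pd

# Hypothetical stand-in for the parsed Excel sheet; only the column names
# are borrowed from the question.
sd = pd.DataFrame({
    "id_company": [1, 2, 2],
    "company_name": ["Acme", "Globex", "Globex"],
    "description": ["widgets", "energy", "energy"],
})

# Keep the first row for each id_company (current spelling of the
# drop_duplicates call in the question).
source_u = sd.drop_duplicates(subset="id_company", keep="first")

# Listing 'description' twice yields two columns with the same label.
wanted = ["id_company", "description", "company_name", "description"]
source_r = source_u[wanted]
print(source_r.columns.is_unique)  # False

# Drop repeated column labels, keeping the first occurrence of each.
source_d = source_r.loc[:, ~source_r.columns.duplicated()]

# Build one plain tuple per row, as the question's list comprehension did.
tup_r = [tuple(row) for row in source_d.itertuples(index=False, name=None)]
```

Checking columns.is_unique before converting to tuples is a cheap way to catch an accidentally repeated column name; simply removing the repeated name from the selection list would serve the same purpose.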

Wes McKinney