I used pandas to load data from a dataSource.csv file:

DF = pd.read_csv('dataSource.csv')

In pandas I can clean the data, like filling missing values with 0.

Next I use DF.to_csv('temp.csv', sep=',') to write the DataFrame to a temporary CSV file, and then open that file again with a Python file handle

     hd = open('temp.csv')
     for line in hd:
         line = line.split(',').....

to parse the data and associate more information from other data tables. This works. However, if I instead assign the DataFrame directly

      hd = DF

then the same loop fails with the error

       IndexError: list index out of range

Is there any way to skip writing and re-reading the CSV file, i.e. to iterate over the pandas DataFrame directly as if it were a file handle?

A ton of thanks!

TripleH
  • Why do you want to do that? Why can't you "associate more information from other tables" just by using the DataFrame object itself? What exactly are you doing with the temporary CSV file? – BrenBarn Jul 24 '16 at 21:53
  • One thing: when you iterate over a DataFrame object, you just iterate over the column names. They are strings, so when you call ```.split(',')``` you are splitting your column names. If you have a column name ```col='something'```, then splitting it gives ```'something'.split(',')```, which is just ['something']. You are probably trying to index that list and it is throwing an error. Anyway, with a little research you can figure out what you want to do in pandas, and if you can't find out exactly what you want, then S.O. almost certainly has an answer! – dermen Jul 24 '16 at 21:53
  • @dermen: He is iterating over a file object, not a DataFrame. – BrenBarn Jul 24 '16 at 21:54
  • Yeah, but he wants to iterate over a df. And he was wondering why he got an error when he did ```hd = DF``` (where DF is presumably a DataFrame object). – dermen Jul 24 '16 at 21:55
  • @BrenBarn, I think you are right. Doing it that way just iterates over the column names, not the numbers that follow. The iteration stops after going through the column names. – TripleH Jul 25 '16 at 01:56
  • Pandas is certainly convenient for cleaning data. But I am asking because, say, I have a user profile table which is not big and dirty, and another very big table: user-movie. In this case pandas no longer works well. – TripleH Jul 25 '16 at 01:59
  • @Hsiang: To do *what*? If you want to do something with the data, explain what it is you want to do and explain what problems you're having doing it. – BrenBarn Jul 25 '16 at 03:53
  • @BrenBarn: I meant using pandas to replace missing values with zero, or to get dummy variables. I find pandas extremely convenient! Anyway, I figured it out. Thanks – TripleH Jul 25 '16 at 16:31
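
The explanation given in the comments can be checked with a small sketch. The two-column frame below is a hypothetical stand-in for the question's DF, and the column names are made up for illustration:

```python
import pandas as pd

# Hypothetical frame standing in for the question's DF.
DF = pd.DataFrame({"user": ["a", "b"], "age": [30, 41]})

# Iterating a DataFrame yields its column labels, not its rows.
labels = [item for item in DF]

# Splitting a label on ',' gives a one-element list, so indexing
# anything past position 0 raises IndexError, as in the question.
parts = labels[0].split(",")
try:
    parts[1]
    raised = False
except IndexError:
    raised = True
```

So `hd = DF` followed by the file-style loop only ever sees the header names, which is consistent with the IndexError reported above.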

1 Answer

Suppose DF is a pandas DataFrame. Iterate over its rows as follows:

 for x in DF.values:
     x = tuple(x)

Then each x is a tuple of one row's values, in the format (x1, x2, x3, ...).
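
As a rough sketch of two other ways to avoid the temporary file, assuming a small hypothetical frame in place of the real data: `DataFrame.itertuples` yields one tuple per row, and an in-memory `io.StringIO` buffer can stand in for the file when line-by-line text is really needed. The column names and the choice of `index=False` are assumptions for this example, not part of the original question.

```python
import io

import pandas as pd

# Hypothetical cleaned frame; missing values filled with 0 as in the question.
DF = pd.DataFrame({"user": ["a", "b"], "age": [30.0, None]}).fillna(0)

# Option 1: iterate rows as plain tuples, without the index column.
rows = [row for row in DF.itertuples(index=False, name=None)]

# Option 2: write the CSV text into memory and read it back like a file.
buf = io.StringIO()
DF.to_csv(buf, index=False)
buf.seek(0)  # rewind so iteration starts at the header line
lines = [line.strip().split(",") for line in buf]
```

Option 1 keeps the original dtypes, while option 2 gives strings, exactly as the file-based loop did, so existing parsing code should need fewer changes with it.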

TripleH