21

I want to convert a Pandas DataFrame into a list of objects.

This is my class:

class Reading:

    def __init__(self):
        self.HourOfDay: int = 0
        self.Percentage: float = 0

I read up on .to_dict, so I tried

df.to_dict(into=Reading)

but it returned

TypeError: unsupported type

I don't want a list of tuples, or a list of dicts, but a list of Readings. Every question I've found so far seems to be about these two scenarios. But I want my own typed objects.

Thanks

zola25

3 Answers

24

Option 1: Make Reading inherit from collections.abc.MutableMapping (the old collections.MutableMapping alias was removed in Python 3.10) and implement the abstract methods of that base class. Seems like a lot of work.
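For a sense of how much work Option 1 involves, here is a minimal sketch of a mapping-style Reading. The `_fields` tuple and the constructor that accepts key/value pairs are assumptions made for illustration. pandas is expected to accept such a class through df.to_dict(orient='records', into=Reading), but that call is not exercised here.

```python
from collections.abc import MutableMapping


class Reading(MutableMapping):
    """Reading that also behaves like a mapping with two fixed keys."""

    _fields = ("HourOfDay", "Percentage")  # hypothetical helper attribute

    def __init__(self, pairs=()):
        self.HourOfDay = 0
        self.Percentage = 0.0
        # Accept an iterable of (key, value) pairs, as dict() does
        for key, value in dict(pairs).items():
            self[key] = value

    def __getitem__(self, key):
        if key not in self._fields:
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, value):
        if key not in self._fields:
            raise KeyError(key)
        setattr(self, key, value)

    def __delitem__(self, key):
        raise TypeError("Reading fields cannot be deleted")

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


reading = Reading([("HourOfDay", 5), ("Percentage", 0.25)])
print(reading.HourOfDay, reading["Percentage"], dict(reading))
# Assumed pandas usage (not run here): df.to_dict(orient="records", into=Reading)
```

Five methods plus a constructor for two fields is indeed a lot of ceremony compared with Option 2.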

Option 2: Call Reading() in a list comprehension:

>>> import pandas as pd
>>> 
>>> df = pd.DataFrame({
...     'HourOfDay': [5, 10],
...     'Percentage': [0.25, 0.40]
... })
>>> 
>>> class Reading(object):
...     def __init__(self, HourOfDay: int = 0, Percentage: float = 0):
...         self.HourOfDay = int(HourOfDay)
...         self.Percentage = Percentage
...     def __repr__(self):
...         return f'{self.__class__.__name__}> (hour {self.HourOfDay}, pct. {self.Percentage})'
... 
>>> 
>>> readings = [Reading(**kwargs) for kwargs in df.to_dict(orient='records')]
>>> 
>>> 
>>> readings
[Reading> (hour 5, pct. 0.25), Reading> (hour 10, pct. 0.4)]

From docs:

into: The collections.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
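As a variation on Option 2, a dataclass can generate the constructor and repr from the field declarations, so each field is listed only once. This is a minimal sketch that assumes the column names match the field names exactly. The `records` list is a hand-written stand-in for what df.to_dict(orient='records') returns for the example frame.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    HourOfDay: int = 0
    Percentage: float = 0.0


# Stand-in for df.to_dict(orient="records") on the example DataFrame
records = [
    {"HourOfDay": 5, "Percentage": 0.25},
    {"HourOfDay": 10, "Percentage": 0.40},
]

# Each dict is unpacked into keyword arguments of the generated __init__
readings = [Reading(**record) for record in records]
print(readings)
# [Reading(HourOfDay=5, Percentage=0.25), Reading(HourOfDay=10, Percentage=0.4)]
```

A record with a key that is not a declared field raises TypeError, so extra columns would need to be dropped first.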

Brad Solomon
  • Your answer matched my needs perfectly! Thank you very much! Just to explain: I'm trying to convert some dataframes to "object"-like formats in order to prepare them to be used as "data" for an OpenOffice template, using the [py3o.template](https://pypi.org/project/py3o.template/) library. By the way, is there a way to automate the initialization of the class "columns"? – silvio May 13 '21 at 14:54
  • 1
    This should be marked as the valid answer – linSESH Jun 29 '21 at 15:34
  • @linSESH I no longer use Python and was a beginner when I asked this question. Given how popular this question has become, if you can explain to me why this answer is better than the accepted one, I will happily accept this one instead – zola25 Apr 22 '23 at 01:23
  • @zola25 It proposes 2 solutions, and both are better than the accepted one IMO. The second one is the same but just more elegant. – linSESH Apr 23 '23 at 08:21
  • @linSESH thanks for the input, on reflection I think the most recent answer is the best – zola25 Apr 23 '23 at 14:51
19

Given a DataFrame with two columns, HourOfDay and Percentage, and a parameterized constructor for your class, you can build a list of objects like this:

class Reading:

    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p

# iterrows() yields (index, row) pairs; the index is not needed here
listOfReading = [Reading(row.HourOfDay, row.Percentage) for _, row in df.iterrows()]
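One caveat: iterrows() builds a Series for each row, so a frame that mixes integer and float columns hands HourOfDay back as a float. A possible variation, sketched here with a small example frame that is assumed rather than taken from the question, uses itertuples(), which yields one namedtuple per row and lets the columns be read by name:

```python
import pandas as pd


class Reading:
    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p


# Example data, assumed for illustration
df = pd.DataFrame({"HourOfDay": [5, 10], "Percentage": [0.25, 0.40]})

# index=False leaves the DataFrame index out of each namedtuple
readings = [
    Reading(row.HourOfDay, row.Percentage)
    for row in df.itertuples(index=False)
]
print([(r.HourOfDay, r.Percentage) for r in readings])
```

Attribute access on the namedtuple only works when the column names are valid Python identifiers, as they are here.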
NargesooTv
9

It would probably be better to initialise the class with arguments, as follows:

class Reading:
    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p

Then, to create a list of readings, you can use this function, which takes the DataFrame as an argument:

import pandas as pd

def reading_list(df: pd.DataFrame) -> list:
    return list(map(lambda x: Reading(h=x[0], p=x[1]), df.values.tolist()))

Execution is fast, even with a large dataset.
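Two things to be aware of with df.values: the lookup by x[0] and x[1] depends on column order, and when the columns have different numeric dtypes the combined array uses a common dtype, so integer hours come back as floats (5.0 rather than 5). A sketch of an alternative that zips the two columns by name follows. The function name `reading_list_by_name` and the example frame are hypothetical, and no claim is made here about relative speed.

```python
import pandas as pd


class Reading:
    def __init__(self, h, p):
        self.HourOfDay = h
        self.Percentage = p


def reading_list_by_name(df: pd.DataFrame) -> list:
    # Iterate the two columns in lockstep; each column keeps its own dtype
    return [Reading(h=h, p=p) for h, p in zip(df["HourOfDay"], df["Percentage"])]


# Example data, assumed for illustration
df = pd.DataFrame({"HourOfDay": [5, 10], "Percentage": [0.25, 0.40]})

readings = reading_list_by_name(df)
print([(r.HourOfDay, r.Percentage) for r in readings])

# For comparison: the mixed int/float frame is upcast to float by df.values
print(df.values.tolist()[0])
```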

Victor Guillaud
  • 2
    This is insanely fast! I just switched to this from `reading_objects = reading_df.progress_apply(lambda row: Reading(*row.to_list()), axis=1)` and got a 4-fold speedup! (progress_apply is apply with a tqdm progress bar, and I am still using the tqdm() function around df.values.tolist(), so it cannot be that.) – Ben Jul 05 '21 at 21:29
  • 2
    I'm amazed this isn't the accepted answer. Elegant and very fast. – DavidWalker Oct 20 '22 at 01:26