
I want to convert the DataFrame below to a dictionary in a specific format.

However, I am getting this error: TypeError: unhashable type: 'Series'

import polars as pl

# input (polars eager DataFrame):
polar_df = pl.DataFrame({
    "foo": ['a', 'b', 'c'],
    "bar": [[6.0, 7.0, 8.0], [9.0, 10.0, 11.0], [12.0, 13.0, 14.0]]
})

# expected output (dictionary):
# {'a': [6.0, 7.0, 8.0], 'b': [9.0, 10.0, 11.0], 'c': [12.0, 13.0, 14.0]}

# this attempt raises TypeError: unhashable type: 'Series'
dict_output = dict(zip(
    polar_df.select(pl.col('foo')),
    polar_df.select(pl.col('bar'))
))
protob
  • You're almost there, you just need to extract the actual values from the selected columns, e.g. with .to_series().to_list(). – Kozydot Apr 12 '23 at 09:54
  • The code below worked :) Thank you for the quick response: dict(zip(lf.select(pl.col('foo')).to_series().to_list(), lf.select(pl.col('bar')).to_series().to_list())) – Rakesh Chaudhary Apr 12 '23 at 10:19

2 Answers


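The "polars way" to do this is dict() + .iter_rows(). Because the frame has exactly two columns, each row comes back as a (foo, bar) tuple, which dict() consumes directly as a key/value pair.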

df = pl.DataFrame({
    "foo": ['a', 'b', 'c'],
    "bar": [[6.0, 7.0, 8.0],[9.0,10.0,11.0],[12.0,13.0,14.0]]
})
>>> dict(df.iter_rows())
{'a': [6.0, 7.0, 8.0], 'b': [9.0, 10.0, 11.0], 'c': [12.0, 13.0, 14.0]}
jqurious

I've turned jqurious's solution into a convenience function:

from typing import Any, Dict

import polars as pl

def df_to_dict(df: pl.DataFrame, key_col: str, value_col: str) -> Dict[Any, Any]:
    """
    Get a Python dict from two columns of a DataFrame.
    If the key column is not unique, the last row for each key is used.
    """
    return dict(df.select(key_col, value_col).iter_rows())

For even greater convenience, you can extend polars' API like this:

from typing import Dict, Any
@pl.api.register_dataframe_namespace("util")
class Export:
    def __init__(self, df: pl.DataFrame):
        self._df = df

    def to_dict(self,key_col: str,value_col: str) -> Dict[Any,Any]:
        """
        Get a Python dict from two columns of a DataFrame
        If the key column is not unique, the last row is used
        """
        return dict(self._df.select(key_col,value_col).iter_rows())

You can then use either version like this:

df = pl.DataFrame({
    "foo": ['a','b','c'],
    "bar": [1,2,3],
    "baz": [10,11,12]
})

df_to_dict(df,"foo","baz")
# {'a': 10, 'b': 11, 'c': 12}

df.util.to_dict("foo","baz")
# {'a': 10, 'b': 11, 'c': 12}
Cornelius Roemer