I have started using the clickhouse-driver Client in my data pipelines, and every insert fails with the same error:

TypeError("unhashable type: 'dict'")

I know this error means that a dict is not a hashable type, but maybe the arguments I pass below are wrong?

cl = Client(**pl.config.click_credents.get('client_local')) #connector

chunk = data_building(f_name)  # opens the file and preprocesses the data
cl.insert_dataframe("INSERT INTO {} (here column names) VALUES".format(tabname), chunk, settings={'use_numpy': True})

I expected this to insert the data into the database, but it does not. Any ideas?

I have already gone through my chunk, and none of the keys in it is a dict.

Traceback (most recent call last):
  File "/PATH/.ve/lib/python3.11/site-packages/pandas/core/arrays/categorical.py", line 441, in __init__
    codes, categories = factorize(values, sort=True)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "PATH/.ve/lib/python3.11/site-packages/pandas/core/algorithms.py", line 818, in factorize
    codes, uniques = factorize_array(
                     ^^^^^^^^^^^^^^^^
  File "PATH/.ve/lib/python3.11/site-packages/pandas/core/algorithms.py", line 574, in factorize_array
    uniques, codes = table.factorize(
                     ^^^^^^^^^^^^^^^^
  File "pandas/_libs/hashtable_class_helper.pxi", line 5943, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 5857, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'dict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "PATH/new_tools.py", line 98, in <module>
    chunk_to_click(take_1, 'TABNAME')
  File "PATH/new_tools.py", line 69, in chunk_to_click
    cl.insert_dataframe(
  File "PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/client.py", line 456, in insert_dataframe
    rv = self.send_data(sample_block, data, columnar=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/client.py", line 571, in send_data
    self.connection.send_data(block)
  File "PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/connection.py", line 587, in send_data
    self.block_out.write(block)
  File "/PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/streams/native.py", line 38, in write
    write_column(self.context, col_name, col_type, items,
  File "PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/columns/service.py", line 157, in write_column
    column.write_data(items, buf)
  File "PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/columns/base.py", line 87, in write_data
    self._write_data(items, buf)
  File "/PATH/.ve/lib/python3.11/site-packages/clickhouse_driver/columns/numpy/lowcardinalitycolumn.py", line 37, in _write_data
    c = pd.Categorical(items)
        ^^^^^^^^^^^^^^^^^^^^^
  File "PATH/.ve/lib/python3.11/site-packages/pandas/core/arrays/categorical.py", line 443, in __init__
    codes, categories = factorize(values, sort=False)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "PATH/.ve/lib/python3.11/site-packages/pandas/core/algorithms.py", line 818, in factorize
    codes, uniques = factorize_array(
                     ^^^^^^^^^^^^^^^^
  File "PATH/.ve/lib/python3.11/site-packages/pandas/core/algorithms.py", line 574, in factorize_array
    uniques, codes = table.factorize(
                     ^^^^^^^^^^^^^^^^
  File "pandas/_libs/hashtable_class_helper.pxi", line 5943, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 5857, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'dict'

Data types in DF:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7304 entries, 0 to 7303
Data columns (total 18 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   amplitude_id        7304 non-null   uint32        
 1   user_properties     7304 non-null   object        
 2   app                 7304 non-null   uint16        
 3   city                7087 non-null   object        
 4   country             7304 non-null   object        
 5   device_type         6100 non-null   object        
 6   device_id           7304 non-null   object        
 7   event_type          7304 non-null   object        
 8   event_properties    7304 non-null   object        
 9   event_time          7304 non-null   datetime64[ns]
 10  user_creation_time  0 non-null      datetime64[ns]
 11  ip_address          7304 non-null   object        
 12  platform            7304 non-null   object        
 13  session_id          7304 non-null   uint32        
 14  user_id             6808 non-null   float32       
 15  version_name_1      7304 non-null   float32       
 16  version_name_2      7304 non-null   float32       
 17  role_type           7304 non-null   object        
dtypes: datetime64[ns](2), float32(3), object(10), uint16(1), uint32(2)
memory usage: 841.8+ KB
  • What is the full stack trace of the TypeError? – Geoff Genz Feb 24 '23 at 15:35
  • @GeoffGenz I added the full stack trace, and it looks like it is a pandas exception. – maiseo4allPYwarrior Feb 25 '23 at 04:45
  • It looks like you are including a Map or other complex type in a dataframe, which clickhouse-driver is not able to handle. What are the datatypes in your dataframe? – Geoff Genz Feb 25 '23 at 17:23
  • @GeoffGenz Please have a look, I added the dtypes; there are no complex types there. – maiseo4allPYwarrior Feb 27 '23 at 06:12
  • Object can contain a complex type. In particular, what is event_properties? That sounds like a possible dictionary. – Geoff Genz Feb 27 '23 at 22:56
  • @GeoffGenz Yes, you are right: user_properties and event_properties are dictionaries. So maybe I should store them as strings... In ClickHouse I used this type -> event_properties LowCardinality(String) – maiseo4allPYwarrior Feb 28 '23 at 05:19
  • Converting the dictionaries in those pandas columns to strings before the insert might be the fastest way to get it working. You can also look at using a ClickHouse Map column or a JSON column, although you would probably have to switch to the clickhouse-connect driver to insert into a JSON column. – Geoff Genz Feb 28 '23 at 05:36
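
The suggestion in the last comment (turn the dictionaries into strings before inserting) could be sketched roughly as below. This is only an illustration under some assumptions: `dict_columns` and `stringify_dicts` are hypothetical helper names, not part of pandas or clickhouse-driver, the small `chunk` frame is a stand-in for the real data, and JSON is just one possible string encoding for a `LowCardinality(String)` column.

```python
import json

import pandas as pd


def dict_columns(df: pd.DataFrame) -> list[str]:
    """Return names of object columns that hold at least one dict value.

    Hypothetical helper: df.info() only reports 'object', so the values
    themselves have to be inspected.
    """
    return [
        col
        for col in df.columns
        if df[col].dtype == object
        and df[col].map(lambda v: isinstance(v, dict)).any()
    ]


def stringify_dicts(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with dict values serialised to JSON strings."""
    out = df.copy()
    for col in dict_columns(out):
        out[col] = out[col].map(
            lambda v: json.dumps(v, sort_keys=True) if isinstance(v, dict) else v
        )
    return out


# Tiny stand-in for the real chunk; column names follow the question.
chunk = pd.DataFrame(
    {
        "amplitude_id": [1, 2],
        "event_type": ["open", "close"],
        "event_properties": [{"screen": "home"}, {}],
    }
)

print(dict_columns(chunk))        # columns that would break pd.Categorical
clean = stringify_dicts(chunk)    # dicts replaced by hashable JSON strings
pd.Categorical(clean["event_properties"])  # no longer raises TypeError
```

The cleaned frame would then be passed to `cl.insert_dataframe(...)` in place of the original chunk. If the dictionaries contain values that `json.dumps` cannot serialise (datetimes, for example), it will raise; passing `default=str` to `json.dumps` is one way around that, at the cost of losing the original type.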
