0

I have a fairly large pandas DataFrame that I am attempting to INSERT into a MySQL table.

import pandas as pd 

# mysql connection
import pymysql
from sqlalchemy import create_engine
user = 'user'
pwd = 'xxxxx'
host =  'x.com'
port = 3306
database = 'dbname'

engine = create_engine("mysql+pymysql://{}:{}@{}/{}".format(user,pwd,host,database))


# column dtypes
[
  dtype('O'),
  dtype('O'),
  dtype('O'),
  dtype('int64'),
  dtype('float64'),
  dtype('float64'),
  dtype('O'),
  dtype('O')
]

Here are the steps I take for dict to str:

# Find columns containing dicts
col_dict = [i for i in df.columns if isinstance(df[i][0],dict)]
print(col_dict)
[] <-- empty list

I checked the datatype of columns itself, not the data and it's all <class 'str'>.

col_type = [type(i) for i in df.columns] print(col_dict)

# Dict to string
df[col_dict] = df[col_dict].astype(str)

# Insert into SQL
con = engine.connect()
df.to_sql(name="df_table", con=con, if_exists = 'replace')
con.close()

Traceback:

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
     21 
     22     if encoder in (escape_dict, escape_sequence):
---> 23         val = encoder(val, charset, mapping)
     24     else:
     25         val = encoder(val, mapping)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_dict(val, charset, mapping)
     30     n = {}
     31     for k, v in val.items():
---> 32         quoted = escape_item(v, charset, mapping)
     33         n[k] = quoted
     34     return n

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
     21 
     22     if encoder in (escape_dict, escape_sequence):
---> 23         val = encoder(val, charset, mapping)
     24     else:
     25         val = encoder(val, mapping)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_sequence(val, charset, mapping)
     40         quoted = escape_item(item, charset, mapping)
     41         n.append(quoted)
---> 42     return "(" + ",".join(n) + ")"
     43 
     44 

TypeError: sequence item 0: expected str instance, dict found
kms
  • 1,810
  • 1
  • 41
  • 92
  • 1
    Please post sample data (and not as image) for a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait May 18 '22 at 18:54
  • I just think its interesting that the docs don't show an example of converting the dicts to strings (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html). You must have an interesting data structure? Surely you've tried just the df.to_sql() without the conversion? – Shmack May 18 '22 at 19:02
  • @Shmack I have, same error. – kms May 18 '22 at 19:07
  • I'm not too familiar with pandas, but is there a chance that to_sql is trying to identify the column headers when writing to your db, but they return a dict instead of a string - meaning that `col_dict = [i for i in df.columns if isinstance(df[i][0],dict)]` is only returning the data from the df and not the headers as well? – Shmack May 18 '22 at 19:18
  • @Parfait agree that an MRE would be helpful, however, my DataFrame has 100+ cols and the data is private. There is probably something in the data that the `col_dict` is not identifying, so I am looking for other ideas / things to try. – kms May 18 '22 at 19:19
  • @Shmack I checked the datatype of columns itself, not the `data` and it's all ``. `col_type = [type(i) for i in df.columns] print(col_dict)` – kms May 18 '22 at 19:26
  • Okay, I can't believe I totally misunderstood the `col_dict` - that was blatantly obvious that its just the column names. Sounds like you are going to have to do preprocessing for your data to convert the objects to strings, but maybe I'm wrong. I am just going to be a fly on the wall for this one. – Shmack May 18 '22 at 19:31
  • 1
    Try converting the col_dict to the preferred [StringDtype](https://pandas.pydata.org/docs/user_guide/text.html) and not `object` type: `df[col_dict] = df[col_dict].astype("string")` (notice `"str"` is not used here). Your above conversion just converts to `object` dtype. – Parfait May 18 '22 at 19:31
  • @kms Parfait's answer was it? – Shmack May 18 '22 at 19:52
  • @Shmack not quite, I had to `df.reset_index()` and then run the code and it worked. – kms May 18 '22 at 20:23
  • Nice. Glad you were able to fix it. – Shmack May 18 '22 at 20:26

1 Answers1

1

As recommended by pandas authors, consider using the StringDtype (introduced in Pandas 1.0.0) to store text data by converting with astype("string") and not object type derived from astype("str") which the latter type can store a mix of anything and not just strings. Consequently, your column of dictionaries remained dict types and hence the incompatibility database error.

To demonstrate, a column of dicts render its cell values as dict types even though the Series is an object dtype:

list_of_dicts = [
    {"test1": [1,2,3], "test2": ['a', 'b', 'c']},
    {"example1": [7, 8, 9], "example2": ['x', 'y', 'z']}
]

df = pd.DataFrame({"col_dict": list_of_dicts})
df["col_dict"]
# 0       {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1    {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: object

type(df["col_dict"][0])
# dict
isinstance(df["col_dict"][0], dict)
# True

However, converting to StringDType, column values render as str types. Note: the abbreviated str is not the same as "string".

df = pd.DataFrame({"col_dict": pd.Series(list_of_dicts, dtype="string")})
df["col_dict"]
# 0       {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1    {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: string

type(df["col_dict"][0])
# str
isinstance(df["col_dict"][0], dict)
# False

Below are the multiple ways to cast, map, or parse column of dicts to StringDType:

df[col_dict] = df[col_dict].astype("string")
df = pd.DataFrame({"col_dict": pd.Series([list_of_dicts], dtype="string")})
df = pd.read_csv("input.csv", dtype={"col_dict": "string"})
Parfait
  • 104,375
  • 17
  • 94
  • 125