Looking at the latest pandas doc, the actual kwarg to be used is `chunksize`, not `chunk_size`. Please see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html. Since kedro only wraps your `save_args` and passes them to `pd.DataFrame.to_sql`, these need to match:
```python
def _save(self, data: pd.DataFrame) -> None:
    try:
        data.to_sql(**self._save_args)
    except ImportError as import_error:
        raise _get_missing_module_error(import_error) from import_error
    except NoSuchModuleError as exc:
        raise _get_sql_alchemy_missing_error() from exc
```
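For reference, kedro is simply forwarding those `save_args` to `to_sql`, so a quick sketch of the equivalent direct call may help (the connection string, table name, and chunk size below are placeholders, not values from your project):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and data; substitute your own.
engine = create_engine("sqlite:///example.db")
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# `chunksize` (not `chunk_size`) is the kwarg to_sql understands,
# so the same key must appear under save_args in your catalog entry.
df.to_sql(name="my_table", con=engine, if_exists="replace", index=False, chunksize=2)
```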
EDIT: Once you have this working in your pipeline, the docs show that `pandas.read_sql` with `chunksize` set will return type `Iterator[DataFrame]`. This means that in your node function, you should iterate over the input (and annotate accordingly, if appropriate), such as:
```python
from typing import Iterator

import pandas as pd

def my_node_func(input_dfs: Iterator[pd.DataFrame], *args):
    for df in input_dfs:
        ...
```
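Outside of kedro you can see the same iterator behaviour by calling `pd.read_sql` directly; a minimal sketch, where the query, connection, and chunk size are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection; point this at your own database.
engine = create_engine("sqlite:///example.db")

# With chunksize set, read_sql yields DataFrames lazily instead of
# loading the whole table into memory at once.
total_rows = 0
for chunk in pd.read_sql("SELECT * FROM my_table", con=engine, chunksize=2):
    total_rows += len(chunk)
print(total_rows)
```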
This works for the latest version of pandas. I have noticed, however, that pandas is aligning the API so that `read_csv` with `chunksize` set returns a `ContextManager` from `pandas>=1.2`, so I would expect this change to occur in `read_sql` as well.
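If that alignment does land, chunked reads would presumably look like the `read_csv` context-manager pattern below, which already works on `pandas>=1.2` (the file name and chunk size are placeholders):

```python
import pandas as pd

# On pandas>=1.2, the reader returned when chunksize is set can be used
# as a context manager, which closes the underlying file handle for you.
with pd.read_csv("example.csv", chunksize=1000) as reader:
    for chunk in reader:
        print(chunk.shape)
```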