
I am working with the Python pandas_dedupe package, specifically with pandas_dedupe.dedupe_dataframe.

I have trained dedupe_dataframe via the interactive prompts, but now I need to retrain it. How can I erase the training set and start from scratch?

I have tried deleting the dedupe_dataframe_learned_settings and dedupe_dataframe_training.json files, but then the Python script throws an error.

I work with PyCharm as my IDE.

Any hint would be much appreciated. Thanks!

Stefan

1 Answer


With pandas-dedupe v1.3.1, you simply need to do the following:

  1. delete dedupe_dataframe_learned_settings and dedupe_dataframe_training.json;
  2. run dedupe_dataframe with update_model=False (this is the default).

This is the standard procedure. If it does not work, please post the full error message you get.
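A minimal sketch of the deletion step, assuming both files are named after the default config_name "dedupe_dataframe" (i.e. <config_name>_learned_settings and <config_name>_training.json) and sit in the script's working directory. The helper name reset_dedupe_training is hypothetical, not part of pandas-dedupe.

```python
from pathlib import Path
from typing import List


def reset_dedupe_training(config_name: str = "dedupe_dataframe",
                          directory: str = ".") -> List[str]:
    """Hypothetical helper: delete saved settings/training so the next run starts fresh.

    Assumes pandas-dedupe stores its state as <config_name>_learned_settings
    and <config_name>_training.json inside `directory`.
    """
    removed = []
    for suffix in ("_learned_settings", "_training.json"):
        path = Path(directory) / f"{config_name}{suffix}"
        if path.exists():  # skip silently if a file is already gone
            path.unlink()
            removed.append(path.name)
    return removed  # names of the files that were actually deleted
```

After the files are gone, calling something like pandas_dedupe.dedupe_dataframe(df, ["name", "address"], update_model=False) (the column names here are only placeholders) should start the interactive labelling again. Because the files are resolved relative to the working directory, check that you deleted them where the script actually runs: in PyCharm that is the "Working directory" of the run configuration, which is not necessarily the folder containing the script.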

iEriii