-2

I have an assignment in which I need to clean a dataset and do feature engineering on it, but the data is very dirty: some values are shifted into the wrong columns, and others are NULL. How can I clean the data using only Python? I'm not allowed to change the dataset in any way other than through Python.

Nic
  • I think just googling "data cleaning python" will give you more answers than you'll get here. Stackoverflow is meant for other purposes. I hope you find what you need. – debsim Jun 22 '20 at 14:20
  • Can you add which feature engineering methods you are looking to implement? That way, people can be more specific with their answers. – pykam Jun 22 '20 at 14:43

2 Answers

1

I recommend using pandas and NumPy. I have used these packages to import data from CSV and Excel files and then transform existing columns using lambda functions. You can also drop columns, or drop rows by using conditions on their values to select them. Finally, you can export the result back to any of the original formats, such as Excel or CSV.
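As a rough illustration of that workflow, here is a minimal sketch. The column names (`name`, `age`, `city`, `notes`) and the inline sample data are made up for the example; your real file will differ, and you would pass its path to `pd.read_csv` instead of the in-memory buffer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw data standing in for the assignment's CSV file.
raw_csv = io.StringIO(
    "name,age,city,notes\n"
    " Alice ,34,Paris,ok\n"
    "Bob,,London,\n"
    "Carol,-1,Berlin,check\n"
)

# Import the data; empty fields are read as NaN by default.
df = pd.read_csv(raw_csv)

# Transform an existing column with a lambda function:
# strip stray whitespace and normalise capitalisation.
df["name"] = df["name"].apply(lambda s: s.strip().title())

# Treat impossible values (negative ages) as missing.
df["age"] = df["age"].where(df["age"] >= 0, np.nan)

# Drop a column that is not needed.
df = df.drop(columns=["notes"])

# Keep only the rows that satisfy a condition (here: age is known).
clean = df[df["age"].notna()].reset_index(drop=True)

# Export back to CSV; pass a file path to write to disk instead.
csv_text = clean.to_csv(index=False)
print(csv_text)
```

For Excel files, `pd.read_excel` and `DataFrame.to_excel` play the same roles as `pd.read_csv` and `DataFrame.to_csv`, although they need an extra engine such as openpyxl installed.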

Here is an article from Real Python about cleaning data with those packages. I hope this can help get you started.

https://realpython.com/python-data-cleaning-numpy-pandas/
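For the two specific problems mentioned in the question (shifted values and NULLs), something along these lines may be a starting point. It is only a sketch under strong assumptions: the columns (`id`, `name`, `price`, `extra`) are invented, and it assumes that a shifted row is one whose values moved exactly one column to the right, so that the overflow column `extra` is filled. Your data may be shifted differently, and you will need your own rule to detect that.

```python
import io

import pandas as pd

# Hypothetical data: the second row is shifted one column to the right,
# and the third row has a missing (NULL) price.
raw_csv = io.StringIO(
    "id,name,price,extra\n"
    "1,apple,0.5,\n"
    ",2,banana,0.25\n"
    "3,cherry,,\n"
)

# Read everything as text first so shifted values are not mis-typed.
df = pd.read_csv(raw_csv, dtype=str)

# Assumption: a row is shifted if the overflow column holds a value.
shifted = df["extra"].notna()

# Move the values of shifted rows one column to the left.
# .to_numpy() avoids pandas aligning the values by column name.
df.loc[shifted, ["id", "name", "price"]] = (
    df.loc[shifted, ["name", "price", "extra"]].to_numpy()
)
df = df.drop(columns=["extra"])

# Now convert to proper types; unparseable values become NaN.
df["id"] = pd.to_numeric(df["id"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# One possible way to handle NULLs: fill with the column median.
df["price"] = df["price"].fillna(df["price"].median())

print(df)
```

Whether to fill missing values (and with what) or to drop those rows with `dropna` depends on the assignment; the median is used here only as an example.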

0

In general, I would recommend using the pandas library (https://pandas.pydata.org/docs/index.html) for data cleaning in Python. However, your question is vague and includes few specifics, which makes it hard to give more advice than that.