How do I decide whether to keep or remove duplicate rows? I have duplicate records, but they refer to different people

Asked Mar 07 '21 at 14:48

Active Mar 07 '21 at 14:48

Viewed 65 times

I am trying to build an NLP model on a data set of accident records, where I need to predict the Accident Level. There are 13 duplicate rows in total. On looking into them, I found that they describe different people involved in the same accident. I am not sure whether I should drop them or keep them.

I am new here, so please bear with me.

Here is a snapshot of those duplicate rows for the date 01-04-2016 00:00

Preview Dataset
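For context, the two options being weighed could be sketched in pandas roughly as follows. This is only an illustration on toy data: the `Date` and `Description` column names, the row values, and the `person_in_event` helper column are assumptions made up for the example, not taken from the actual data set.

```python
import pandas as pd

# Toy data: column names and values are hypothetical, for illustration only.
df = pd.DataFrame({
    "Date": ["01-04-2016 00:00", "01-04-2016 00:00", "02-04-2016 00:00"],
    "Description": [
        "worker hit by falling pipe",
        "worker hit by falling pipe",
        "slipped on wet floor",
    ],
    "Accident Level": ["II", "II", "I"],
})

# keep=False flags every member of a duplicate group, not only the repeats,
# which makes it easier to inspect the groups side by side.
dup_mask = df.duplicated(keep=False)
print(df[dup_mask])

# Option 1: keep all rows, and number the repeats within each identical group
# so that each person remains a distinct record.
df_keep = df.copy()
df_keep["person_in_event"] = df_keep.groupby(list(df.columns)).cumcount()

# Option 2: collapse identical rows, leaving one row per distinct record.
df_dedup = df.drop_duplicates().reset_index(drop=True)
```

If the rows are kept (option 1), it is probably worth making sure identical rows do not end up on both sides of a train/test split, since that would let the model see test examples during training.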

Priyanshi Tyagi

Hello, you need to be clear about the output you want. To help, we would need a sample input and the expected dataframe. Right now you are asking for opinions, and I think SO is not the right place for that. Help: [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – anky Mar 07 '21 at 14:52
Hey. OK. Is there any place where we can get general ideas or opinions, as you say? – Priyanshi Tyagi Mar 07 '21 at 14:57
Unfortunately I am not sure, but I found [this](https://meta.stackexchange.com/questions/130524/which-stack-exchange-website-for-machine-learning-and-computational-algorithms) (*consider reading all the answers*), which might help you. Good luck! – anky Mar 07 '21 at 14:59

0 Answers