I am new to Python. I would like to find the duplicated rows in a data frame. To illustrate, I have the following data frame:

type(data)
pandas.core.frame.DataFrame

data.head()

   User  Hour  Min  Day  Month  Year  Latitude   Longitude
0     0     1   48   17     10  2010  39.75000 -105.000000
1     0     6    2   16     10  2010  39.90625 -105.062500
2     0     3   48   16     10  2010  39.90625 -105.062500
3     0    18   25   14     10  2010  39.75000 -105.000000

I would like to find the duplicated rows in this data frame and return the 'User' value for each of them.

Thanks a lot,

Mitch
  • There are no duplicated lines in this data. What constitutes a duplicate in this case? Did you check the docs? http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.duplicated.html – Woody Pride Dec 17 '15 at 21:05
  • Did you check other SO questions, such as this one, which provides a very clear answer: http://stackoverflow.com/questions/26244309/how-to-analyze-all-duplicate-entries-in-this-pandas-dataframe – Woody Pride Dec 17 '15 at 21:05
  • I tried Counter from collections to begin with, so that it could give me the **number** of duplicated rows. But Counter works only on a single-column data frame. @Chris – Mitch Dec 17 '15 at 21:05
  • Thanks a lot @WoodyPride, I had not seen this. – Mitch Dec 17 '15 at 21:25

1 Answer

Is this what you are looking for?

user = data[data.duplicated()]['User']
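
As noted in the comments, the four rows shown in the question contain no exact duplicates, so the line above would return an empty result on them. The sketch below is only an illustration: it adds one hypothetical repeated row (index 4, not part of the original data) and shows how the `keep` and `subset` arguments of `DataFrame.duplicated` change what gets flagged. The variable names `later_copies`, `all_copies`, `same_place` and `counts` are made up for this example.

```python
import pandas as pd

# Hypothetical sample data: the question's four rows plus one repeated row
# (index 4 copies index 1), added so that there is a duplicate to find.
data = pd.DataFrame(
    {
        "User": [0, 0, 0, 0, 0],
        "Hour": [1, 6, 3, 18, 6],
        "Min": [48, 2, 48, 25, 2],
        "Day": [17, 16, 16, 14, 16],
        "Month": [10, 10, 10, 10, 10],
        "Year": [2010] * 5,
        "Latitude": [39.75, 39.90625, 39.90625, 39.75, 39.90625],
        "Longitude": [-105.0, -105.0625, -105.0625, -105.0, -105.0625],
    }
)

# Default keep="first": only the second and later occurrences are flagged.
later_copies = data[data.duplicated()]["User"]

# keep=False flags every member of a duplicated group, including the first.
all_copies = data[data.duplicated(keep=False)]

# subset= restricts the comparison to the chosen columns (here, location only).
same_place = data[data.duplicated(subset=["Latitude", "Longitude"], keep=False)]

# Number of times each fully identical row occurs.
counts = data.groupby(list(data.columns)).size()
```

If "duplicate" is meant to cover only some of the columns (for example the same user at the same place, regardless of time), passing those column names to `subset` is probably closer to what is wanted than comparing whole rows.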
screenpaver