Why is .loc[] producing duplicate rows in my DataFrame?
I'm trying to select a few columns from m3, a DataFrame with 47 columns, to create a new DataFrame called output.
The problem: after selecting m3's columns with .loc[], output has far more duplicates than m3 started with. Where could these duplicates have come from? I haven't found anything online about .loc[] duplicating rows. The output DataFrame is created on the line that reads output = m3.loc[...], by the way.
The Code:
print("ARE THERE DUPLICATES in m3? ")
print(m3.duplicated().loc[lambda x: x==True])
output = m3.loc[:,["PLC_name", "line", "track", "notes", "final_source",
"s_name", "s_line", "s_track", "loc", "alt_loc", "suffix", "alt_match_name"]]
print("ARE THERE DUPLICATES in output? ")
print(output.duplicated().loc[lambda x: x==True].size, "duplicates")
The Terminal Output:
ARE THERE DUPLICATES in m3?
5241 True
5242 True
5243 True
5355 True
5356 True
5357 True
dtype: bool
ARE THERE DUPLICATES in output?
1838 duplicates
Of course, I could easily fix the problem by calling .drop_duplicates(keep="first"), but I'm more interested in learning why .loc[] displays this behavior.
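A minimal sketch that reproduces the same pattern with made-up data (the frame df and its columns are hypothetical stand-ins, not taken from m3). It relies on one documented fact: duplicated() compares only the columns present in the frame it is called on, so rows that differ only in a column that was not selected will be flagged as duplicates after the selection, even though no rows were added:

```python
import pandas as pd

# Hypothetical stand-in for m3: rows 0 and 1 differ only in "extra".
df = pd.DataFrame({
    "line": ["A", "A", "B"],
    "track": [1, 1, 2],
    "extra": ["x", "y", "z"],
})

# duplicated() on the full frame compares all three columns.
full_dupes = int(df.duplicated().sum())

# Select a subset of columns, as in output = m3.loc[:, [...]].
subset = df.loc[:, ["line", "track"]]

# duplicated() on the subset compares only "line" and "track".
subset_dupes = int(subset.duplicated().sum())

# The selection itself leaves the number of rows unchanged.
same_rows = len(subset) == len(df)

print(full_dupes, subset_dupes, same_rows)
```

If this is what is happening with m3, comparing len(output) with len(m3) would show the same row count, and m3.duplicated(subset=[...]) with the selected column names would report the same rows as output.duplicated().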