Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or as involved as multiple complex objects that compare as equal to each other.

Use this tag for questions about detecting and removing such unwanted duplicates.
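
For example, a minimal Python/pandas sketch of both cases (the values and column names here are made up purely for illustration):

    import pandas as pd

    # Deduplicating a simple list of strings: dict.fromkeys keeps only the
    # first occurrence of each value while preserving the original order.
    names = ["alice", "bob", "alice", "carol"]
    unique_names = list(dict.fromkeys(names))  # ['alice', 'bob', 'carol']

    # Deduplicating rows of a DataFrame: drop_duplicates keeps the first
    # occurrence of each identical row by default.
    df = pd.DataFrame({"id": [1, 1, 2], "value": ["a", "a", "b"]})
    deduped = df.drop_duplicates()                     # compare all columns
    deduped_by_id = df.drop_duplicates(subset=["id"])  # compare only "id"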

144 questions
0 votes, 1 answer

Compare two pandas DataFrames from CSV

I have 2 CSV files and I need to compare them using pandas. The values in these two files are the same, so I expect the resulting df to be empty, but it shows me that they are different. Do you think I missed something when I read the CSV files? Or another…
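
If the two frames hold the same values but compare as different, row order, the index, or the dtypes inferred by read_csv are common culprits. A hedged sketch (the file names are hypothetical) that normalises both frames before comparing:

    import pandas as pd

    # Hypothetical file names; the question's actual files are not shown.
    df1 = pd.read_csv("file_a.csv")
    df2 = pd.read_csv("file_b.csv")

    # Sort rows, reset the index, and unify dtypes so only the values matter.
    def normalise(df):
        return (df.sort_values(list(df.columns))
                  .reset_index(drop=True)
                  .astype(str))

    if normalise(df1).equals(normalise(df2)):
        print("The two CSV files contain the same data.")
    else:
        # .compare() (pandas >= 1.1) shows the differing cells when shapes match.
        print(normalise(df1).compare(normalise(df2)))
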
0 votes, 1 answer

Pandas drop_duplicates() not working after adding a row to a DataFrame read from a CSV file

My code is like below: indexing_file_path = 'indexing.csv' if not os.path.exists(indexing_file_path): df = pd.DataFrame([['1111', '20200101', '20200101'], ['1112', '20200101', '20200101'], ['1113',…
fish
0 votes, 1 answer

How to df.drop_duplicates() but store the values of one column as a list

At the moment I am working on some data and have a problem with some duplicates. Here is my problem in detail: I have the DF: Col1 Col2 Col3 'aa1' 'bb1' 'cc1' 'aa2' 'bb2' 'cc2' 'aa1' 'bb3' 'cc3' I can simply use…
Pet
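
One way to answer the question as stated, sketched against the sample frame from the excerpt (which column to collect into a list is an assumption):

    import pandas as pd

    df = pd.DataFrame({"Col1": ["aa1", "aa2", "aa1"],
                       "Col2": ["bb1", "bb2", "bb3"],
                       "Col3": ["cc1", "cc2", "cc3"]})

    # Instead of dropping the rows that repeat Col1, group by Col1 and collect
    # Col3 into a list while keeping the first value of Col2.
    result = (df.groupby("Col1", as_index=False)
                .agg({"Col2": "first", "Col3": list}))
    print(result)
    #   Col1 Col2        Col3
    # 0  aa1  bb1  [cc1, cc3]
    # 1  aa2  bb2       [cc2]
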
0 votes, 2 answers

Eliminate duplicates in MongoDB with a specific sort

I have a database composed of entries which correspond to work contracts. In the MongoDB database I have aggregated by specific worker, then the database - in a simplified version - looks something like this: { "_id" :…
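
A common pattern for this kind of deduplication is an aggregation that sorts first and then keeps the first document per key. A sketch in Python with pymongo; the field names (worker_id, start_date) and the collection are hypothetical stand-ins for the question's real schema:

    from pymongo import MongoClient

    coll = MongoClient()["hr"]["contracts"]  # hypothetical database/collection

    pipeline = [
        # Sort so that "$first" picks the contract you want to keep per worker.
        {"$sort": {"worker_id": 1, "start_date": -1}},
        # Collapse the duplicates by grouping on the duplicate key.
        {"$group": {"_id": "$worker_id", "doc": {"$first": "$$ROOT"}}},
        {"$replaceRoot": {"newRoot": "$doc"}},
    ]
    deduplicated = list(coll.aggregate(pipeline))
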
0 votes, 1 answer

How to drop duplicates after merging two dataframes?

I have two dataframes: A = ID compponent weight 12 Cap 0.4 12 Pump 183 12 label 0.05 14 cap 0.6 B = ID compponent_B weight_B 12 Cap_B 0.7 12 Pump_B 189 12 label 0.05 When I merge these two…
chero
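
A sketch of the usual approach, rebuilt from the sample data in the excerpt; which columns identify "the same" merged row is an assumption about the intent:

    import pandas as pd

    A = pd.DataFrame({"ID": [12, 12, 12, 14],
                      "compponent": ["Cap", "Pump", "label", "cap"],
                      "weight": [0.4, 183, 0.05, 0.6]})
    B = pd.DataFrame({"ID": [12, 12, 12],
                      "compponent_B": ["Cap_B", "Pump_B", "label"],
                      "weight_B": [0.7, 189, 0.05]})

    # Merging on ID alone pairs every A row with every B row that shares the
    # ID, so the result repeats rows; drop_duplicates removes only rows that
    # are identical across all columns.
    merged = A.merge(B, on="ID", how="left").drop_duplicates()

    # If the goal is one row per original A row, deduplicate on the columns
    # that identify an A row instead:
    one_per_a_row = merged.drop_duplicates(subset=["ID", "compponent"])
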
0 votes, 0 answers

Pandas drop_duplicates only possible after to_csv and read_csv

I have two DataFrames which I combine, and they definitely have duplicates as shown later: total_scrobbles = total_scrobbles.append(new_scrobbles) After that the drop_duplicates function doesn't do anything. Not a single row is…
thepic
0 votes, 1 answer

Python: Remove Duplicates From List of Dicts Based on DateTime Key

I want to reduce this list of dictionaries to take the most current record of the duplicates, where duplicates are determined by same project_name and same feature_group_name. How do I go about doing that? The way I'm doing it right now is as…
Riley Hun
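
A minimal sketch of one way to keep only the most recent record per (project_name, feature_group_name); the name of the datetime key ("created") is an assumption:

    from datetime import datetime

    records = [
        {"project_name": "p1", "feature_group_name": "fg1",
         "created": datetime(2021, 1, 1)},
        {"project_name": "p1", "feature_group_name": "fg1",
         "created": datetime(2021, 3, 1)},
        {"project_name": "p2", "feature_group_name": "fg1",
         "created": datetime(2021, 2, 1)},
    ]

    # Keep the record with the latest "created" value for each key pair.
    latest = {}
    for rec in records:
        key = (rec["project_name"], rec["feature_group_name"])
        if key not in latest or rec["created"] > latest[key]["created"]:
            latest[key] = rec

    deduplicated = list(latest.values())
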
0 votes, 1 answer

Python pandas drop_duplicates inserts unnecessary " characters which lead to a CSV loading error

In my project I load data from Twitter every other day and append it to a CSV file. This procedure leads to exact duplicates of tweets in my CSV file. That's why I want to remove these exact duplicates. However, when I run the following…
0 votes, 1 answer

python: drop_duplicates(subset='col_name', inplace=True), why can some of the rows not be dropped?

I'm going to drop duplicates by one of the columns, but some of the rows cannot be dropped. The weird thing is: if I read the 2 files directly instead of via my func1 and func2, then apply the drop function, everything is fine! Update 1: highly likely is the…
Sean.H
0 votes, 3 answers

Groupby to create a list

I am using JupyterLab to print some data in a spreadsheet in a specific way. I have two different files: 1) 2) For every original_id == id I want to group by country, list the brands, and sum and list the holding for each brand. The…
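
A sketch of the grouping itself, with a made-up frame; the exact column names in the question's files are not shown, so these are assumptions:

    import pandas as pd

    df = pd.DataFrame({"id": [1, 1, 1, 2],
                       "country": ["DE", "DE", "FR", "DE"],
                       "brand": ["A", "B", "A", "C"],
                       "holding": [10.0, 5.0, 2.5, 7.0]})

    # One row per (id, country): brands collected into a list, holdings both
    # listed and summed.
    grouped = (df.groupby(["id", "country"])
                 .agg(brands=("brand", list),
                      holdings=("holding", list),
                      holding_total=("holding", "sum"))
                 .reset_index())
    print(grouped)
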
0 votes, 3 answers

Remove repeated rows with inverted values

I have the following dataframe: print(df) col_1 col_2 A B B A A C I would like to remove the duplicated rows with inverted values, obtaining: print(df_final) col_1 col_2 A …
Alessandro Ceccarelli
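
One common approach, sketched against the sample frame from the excerpt: sort the values within each row so that (A, B) and (B, A) become the same pair, then drop the later occurrences:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"col_1": ["A", "B", "A"],
                       "col_2": ["B", "A", "C"]})

    # Row-wise sort makes inverted pairs identical; duplicated() then marks
    # the repeats.
    sorted_pairs = pd.DataFrame(np.sort(df[["col_1", "col_2"]].values, axis=1),
                                index=df.index)
    df_final = df[~sorted_pairs.duplicated()]
    print(df_final)
    #   col_1 col_2
    # 0     A     B
    # 2     A     C
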
0 votes, 2 answers

Pandas Drop Specified Duplicates After Concat

I'm trying to write a Python script that concatenates two CSV files and then drops the duplicate rows. Here is an example of the CSVs I'm concatenating: csv_1 type state city date estimate id lux tx dal 2019/08/15 .8273452 …
JMV12
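
A hedged sketch of the concat-then-dedupe step; the file names are hypothetical and the subset columns are an assumption about which fields identify a duplicate row:

    import pandas as pd

    csv_1 = pd.read_csv("csv_1.csv")   # hypothetical paths
    csv_2 = pd.read_csv("csv_2.csv")

    combined = pd.concat([csv_1, csv_2], ignore_index=True)

    # Rows that repeat the identifying columns are dropped, keeping the first
    # occurrence; columns left out of `subset` (e.g. estimate) are ignored
    # when deciding what counts as a duplicate.
    deduped = combined.drop_duplicates(
        subset=["type", "state", "city", "date", "id"], keep="first")
    deduped.to_csv("combined.csv", index=False)
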
0 votes, 1 answer

Pyspark dataframe not dropping all duplicates

I am stuck on what seems to be a simple problem, but I can't see what I'm doing wrong, or why the expected behavior of .dropDuplicates() is not working. A variable I use: print type(pk) print pk ('column1', 'column4') I have a…
nojohnny101
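
Without the original data it is hard to say what goes wrong here, but a minimal sketch of how dropDuplicates treats a subset of columns (the sample rows are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [(1, "a", "x", 10), (1, "b", "y", 10), (2, "c", "z", 20)],
        ["column1", "column2", "column3", "column4"],
    )

    pk = ("column1", "column4")

    # Only the listed columns are compared: rows that differ elsewhere but
    # share column1/column4 still count as duplicates, and an arbitrary one
    # of them is kept.
    deduped = df.dropDuplicates(list(pk))
    deduped.show()
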
0 votes, 2 answers

How to drop_duplicates using a different condition per group?

I have a DataFrame and I need to drop duplicates per group ('col1') based on the minimum value in another column, 'abs(col1 - col2)', but I need to change this condition for the last group by taking the max value in 'abs(col1 - col2)' that corresponds…
Sidhom
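
A sketch of one way to express "minimum per group, except the last group takes the maximum"; the sample data and the definition of "last group" (highest col1 value) are assumptions:

    import pandas as pd

    df = pd.DataFrame({"col1": [1, 1, 2, 2, 3, 3],
                       "col2": [0, 5, 1, 9, 2, 9]})
    df["diff"] = (df["col1"] - df["col2"]).abs()

    last_group = df["col1"].max()  # assumption: "last" = highest col1 value

    # Keep the row with the minimum diff in every group except the last one,
    # where the row with the maximum diff is kept instead.
    keep_idx = df.groupby("col1")["diff"].idxmin()
    keep_idx.loc[last_group] = df.loc[df["col1"] == last_group, "diff"].idxmax()

    result = df.loc[keep_idx.sort_values()].drop(columns="diff")
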
0 votes, 1 answer

drop_duplicates isn't working on my imported CSV file

Looking for some help on this one. I do not know why, but drop_duplicates is not working; I tried a loop with a lambda, and still nothing I do will remove the multiple duplicates in the output. # Import files for use in the program: import pandas as…
ahhdioguy