Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about removing unwanted duplicates.
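The two cases in the description above (identical strings, and complex objects treated as equal) can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper `drop_duplicates`, the `User` class and the sample data are all hypothetical names chosen for the example, and the helper assumes the compared values are hashable.

```python
from dataclasses import dataclass


def drop_duplicates(items, key=None):
    """Return items without later duplicates, keeping first occurrences in order.

    `key` (hypothetical parameter) maps an item to the value used for comparison,
    so that distinct objects can be treated as the same item.
    """
    seen = set()
    result = []
    for item in items:
        marker = item if key is None else key(item)
        if marker not in seen:  # first time this value shows up
            seen.add(marker)
            result.append(item)
    return result


@dataclass
class User:
    user_id: int
    name: str


# Case 1: identical strings in a list of strings.
names = ["ann", "bob", "ann", "cid", "bob"]
unique_names = drop_duplicates(names)

# Case 2: complex objects treated as the same when their user_id matches.
users = [User(1, "Ann"), User(2, "Bob"), User(1, "ANN")]
unique_users = drop_duplicates(users, key=lambda u: u.user_id)
```

Keeping the first occurrence is a design choice here; keeping the last would mean iterating in reverse and reversing the result.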

Applying PySpark dropDuplicates method messes up the sorting of the data frame

I'm not sure why this is the behaviour, but when I apply dropDuplicates to a sorted data frame, the sorting order is disrupted. See the following two tables in comparison. The following table is the output of sorted_df.show(), in which the sorting…

asked Nov 09 '21 at 19:23

xiexieni9527

1 answer

python pandas: duplicated rows using sort_values and drop_duplicates

I have this dataframe. In the column stage I have 4 values: I have duplicate rows in this dataframe, and I want to drop them. For example: I want to keep row #8015 and not have 2 rows with the same stage and the same tweet_id. For example: I…

python pandas dataframe nan drop-duplicates

asked Sep 11 '21 at 14:27

Asma
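For the sort_values / drop_duplicates question above, the usual pattern is to sort so that the preferred row comes first and then drop duplicates on the identifying columns. The sketch below assumes the goal is one row per (stage, tweet_id) pair; the `score` column and all sample values are hypothetical, since the excerpt does not show the real data or the ranking rule.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the same stage and tweet_id.
df = pd.DataFrame({
    "tweet_id": [100, 100, 100, 200],
    "stage": ["A", "A", "B", "A"],
    "score": [1, 3, 2, 5],  # assumed ranking column, not from the question
})

deduped = (
    df.sort_values("score", ascending=False)  # preferred row first
      .drop_duplicates(subset=["stage", "tweet_id"], keep="first")
      .sort_index()  # restore the original row order
)
```

With `keep="first"`, the row that survives for each pair is whichever one the preceding sort placed first.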

1 answer

AttributeError: module 'pandas' has no attribute 'drop_duplicates'

The error: "AttributeError: module 'pandas' has no attribute 'drop_duplicates'" This is a new error on a section of code that has been working fine, the code in question: def The_function(): file = os.getcwd() + "the_file.xls" xls =…

python pandas drop-duplicates

asked Sep 04 '21 at 14:30

11l
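The code in the excerpt above is cut off, so the actual cause is not visible. One thing that can be said with confidence is that pandas provides drop_duplicates as a method on DataFrame, Series and Index objects, not as a function on the pandas module itself, so a call of the form `pd.drop_duplicates(...)` produces exactly this AttributeError. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical frame with one fully duplicated row.
df = pd.DataFrame({"id": [1, 1, 2], "value": ["a", "a", "b"]})

# The module has no such attribute; pd.drop_duplicates(df) would raise
# "AttributeError: module 'pandas' has no attribute 'drop_duplicates'".
has_module_level = hasattr(pd, "drop_duplicates")

# The method lives on the DataFrame object instead.
deduped = df.drop_duplicates()
```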

0 answers

Concatenate two dataframes and then drop all duplicates- not working properly?

I want to create a dataframe which has entries from df dataframe which don't exist in any of the other dataframes (dfA, dfB, dfC, dfD). Basically, entries from dfA, dfB, dfC, dfD are also contained in df, i.e. df is the superset of them and n(df) =…

python pandas dataframe concatenation drop-duplicates

asked Sep 02 '21 at 10:11

iamakhilverma

1 answer

pandas DataFrame select specific data

I want to build a for loop to only select row 5, row 10 and row 14 in pandas. The actual file includes thousands of rows in a similar format. Please teach me a function that can go over the entire file. Many Thanks…

python pandas dataframe drop-duplicates

asked Aug 31 '21 at 16:29

Yumeng Xu

2 answers

Pandas one-to-one row merge, maintaining the structure on the left hand side?

A similar question to an unresolved SO question (Can one perform a left join in pandas that selects only the first match on the right?), but slightly more complex and with no obvious workaround. I am hoping that there may be some fresh functionality…

python pandas merge drop-duplicates

asked Jul 30 '21 at 13:12

piemashandgravy

1 answer

I am trying to remove duplicate consecutive elements and keep the last value in data frame using pandas

There are two columns in the data frame and I am trying to remove the consecutive elements from column "a" and their corresponding elements from column "b" while keeping only the last element. import pandas as…

python pandas dataframe drop-duplicates

asked May 08 '21 at 21:17

Kshtj
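For the consecutive-duplicates question above, one common approach (a sketch, not necessarily what the asker ended up using) is to compare each value in column "a" with the next one and keep a row only when the next value differs. The sample values are hypothetical; only the column names "a" and "b" come from the question.

```python
import pandas as pd

# Hypothetical data: "a" contains runs 1,1 then 2,2,2 then 1.
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2, 1],
    "b": [10, 11, 12, 13, 14, 15],
})

# A row is the last of its run when the following value in "a" is different.
# shift(-1) yields NaN for the final row, which never equals a real value,
# so the final row is always kept.
is_last_of_run = df["a"] != df["a"].shift(-1)
result = df[is_last_of_run]
```

Note that drop_duplicates(keep="last") would not do this, because it removes repeated values anywhere in the column rather than only within a consecutive run. If "a" itself can contain NaN, each NaN row is treated as its own run in this sketch.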

0 answers

Python-Deleting duplicate rows Pandas (Specifically)

Here is the data set I'm working on, which looks like this. Basically, I want to delete specific duplicate rows. I know the drop_duplicates command, but I need some help. Let me show you by sorting the data so that it'll give you a clear…

python pandas drop-duplicates

asked Mar 19 '21 at 08:40

kirti purohit

1 answer

Is there a quick way to subset columns in PANDAS?

I am trying to set up a PANDAS project that I can use to compare and return the differences in Excel and CSV files over time. Currently I load the Excel/CSV files into pandas and assign them a "Version" column. I assign them a "Version" column because…

pandas subset drop-duplicates

asked Jan 06 '21 at 17:04

TBergy

0 answers

KeyError: Float64Index when running drop_duplicates

I have a dataframe with the duplicates in the t column, however when I run the drop_duplicates function the following error is returned. Could someone please explain how to fix this? print(df.columns) Index(['t', 'ax', 'ay', 'az', 'gx', 'gy', 'gz',…

python keyerror drop-duplicates

asked Dec 23 '20 at 22:19

gee_whiz

1 answer

Drop duplicates and complete nan with oldest values and optimise running time

I'm working on a database with some columns, and I drop duplicates after sorting values by date (format Y-m-d). My df is like the following:

id  date        name     firstname
01  2020-04-01  max      smith
04  2020-08-04  georges  …

python pandas drop-duplicates

asked Dec 17 '20 at 11:00

mathilde

1 answer

Does drop_duplicate guarantee to keep the first row and drop rest of the rows after sorting the dataframe in spark?

I have a dataframe, read from an Avro file in Hadoop, with three columns (a, b, c), where one is a key column and, of the other two columns, one is of integer type and the other is of date type. I am ordering the frame by the integer column and date column…

python scala apache-spark apache-spark-sql drop-duplicates

asked Dec 10 '20 at 12:32

Vamen95

2 answers

Drop duplicates row in Spark SQL based on custom function on a column in Java

I am trying to remove duplicates from my Dataset in Spark SQL in Java. My dataset has three columns. Let's say the names of the columns are name, timestamp, and score. The name is the String representation of the employee name and timestamp is in long…

java apache-spark apache-spark-sql drop-duplicates

asked Jul 20 '20 at 18:54

Ajay Kr Choudhary

0 answers

Pandas drop_duplicates function not working properly

I'm working on building a bridge table between two tables with a many-to-many relationship. Table A contains employee IDs and job assignments (one employee can have more than one assignment). Table B has the same structure. The aim is to build a…

python python-3.x jupyter-notebook many-to-many drop-duplicates

asked Jun 03 '20 at 16:22

syd

1 answer

Making df columns unique to one column permanently outside of df loop function

Note: Correction - the code returns AttributeError: 'str' object has no attribute 'drop_duplicates' I am trying to loop through a number of dfs and reduce my 'user_id' column to only unique values using the df.drop_duplicates(subset…

python pandas global-variables unique drop-duplicates

asked May 29 '20 at 11:28

rpatt97