How do I extract variables that repeat from an Excel Column using Python?

Question

I'm a beginner at Python and I have a school project where I need to analyze an Excel document with information. It has approximately 7 columns and more than 1000 rows.

There's a column named "Materials" that starts at B13. It contains codes that we use to identify materials. A material code looks like this -> 3A8356. There are different material codes in the same column, and they repeat a lot. I want to identify them and make a list with each code only once, no repeats. Is there a way I can analyze the column and extract the codes that repeat, so I can make a new column with only one of each material code?

An example would be:

12 Materials    
13 3A8356
14 3A8376
15 3A8356
16 3A8356
17 3A8346
18 3A8346

and transform it to something like this:

1 Materials
2 3A8346
3 3A8356
4 3A8376

@GrantMcCloskey that's a bit broad to the scope of the question, isn't it pal? — , Aug 17 '18 at 20:44
@J.C.Rocamonde He did not post code so I assume he has no base to work off of. yeah? — bison, Aug 17 '18 at 20:45
If a list of the unique entries is what you're after, look into [`pd.Series.unique`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html) — ALollz, Aug 17 '18 at 20:48
It would be more productive if you tried to address the problem first, wrote some code, arrived at a specific question, and then came back — Omar Gonzalez, Aug 17 '18 at 20:55
Please, take a look at my answer; I hope it helps you with your issue. And in the future, try to write a bit more concise questions so that people get a clear picture of your problem, and don't have to guess. — , Aug 17 '18 at 21:03
You should at least include some code demonstrating what you have tried: SO is not the place to get people to do your homework. — Charlie Clark, Aug 18 '18 at 14:01

score 1 · Answer 1 · 2018-08-17T20:57:09.523

Yes.

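If df is your dataframe, you only have to do:

new_data_frame = df.drop_duplicates(subset=['Materials'])

This keeps the first occurrence of each code, because keep='first' is the default. Passing keep=False would instead drop every code that appears more than once, so repeated codes would vanish from the result entirely.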

To load the dataframe from an Excel file, just do:

import pandas as pd
df = pd.read_excel(path_to_file)

The subset argument lists the column headings that are compared when looking for duplicates; rows are treated as duplicates only if they match in all of those columns.

Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
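As a minimal sketch of how the keep argument changes the result, the snippet below builds a small dataframe from the sample codes in the question (no Excel file involved) and compares keep="first" with keep=False. The variable names one_of_each and never_repeated are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (sheet rows 13 to 18).
df = pd.DataFrame(
    {"Materials": ["3A8356", "3A8376", "3A8356", "3A8356", "3A8346", "3A8346"]}
)

# keep="first" (the default) keeps one row per distinct code.
one_of_each = df.drop_duplicates(subset=["Materials"], keep="first")

# keep=False drops every code that occurs more than once.
never_repeated = df.drop_duplicates(subset=["Materials"], keep=False)

print(one_of_each["Materials"].tolist())     # ['3A8356', '3A8376', '3A8346']
print(never_repeated["Materials"].tolist())  # ['3A8376']
```

For the "one of each code" list the question asks for, keep="first" (or simply omitting keep) is the variant that fits.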

According to the docs, drop_duplicates returns a new data frame with the duplicates dropped, so you can assign it to any variable you want. If you want to reset the index so that it runs 0, 1, 2, ... again, take a look at:

new_data_frame = new_data_frame.reset_index(drop=True)

Or simply

new_data_frame.reset_index(drop=True, inplace=True)
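Putting the steps together, here is a rough sketch of the whole flow. It assumes, based on the example in the question, that the "Materials" header sits in cell B12 with the data starting at B13; the helper names load_materials and unique_sorted and the file names in the usage note are hypothetical. The sort step is an addition to match the ordered output shown in the question.

```python
import pandas as pd


def load_materials(path):
    """Read only column B, treating sheet row 12 as the header row.

    Assumption: the header "Materials" is in B12 and the data starts at B13.
    """
    # header is 0-indexed, so header=11 means the 12th row of the sheet.
    return pd.read_excel(path, header=11, usecols="B")


def unique_sorted(df, column="Materials"):
    """Return one row per distinct value in `column`, sorted, with a fresh index."""
    return (
        df[[column]]
        .drop_duplicates()        # keep the first occurrence of each code
        .sort_values(column)      # order the codes as in the question's example
        .reset_index(drop=True)   # renumber the rows 0, 1, 2, ...
    )


# Demo on the sample data from the question, without touching any file.
sample = pd.DataFrame(
    {"Materials": ["3A8356", "3A8376", "3A8356", "3A8356", "3A8346", "3A8346"]}
)
result = unique_sorted(sample)
print(result["Materials"].tolist())  # ['3A8346', '3A8356', '3A8376']
```

With a real workbook, something like unique_sorted(load_materials("materials.xlsx")).to_excel("unique_materials.xlsx", index=False) would write the deduplicated column to a new file. Both file names are placeholders, and reading or writing .xlsx files requires the openpyxl package to be installed.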
