
I'm a beginner at Python and I have a school project where I need to analyze an Excel document. It has approximately 7 columns and more than 1000 rows.

There's a column named "Materials" whose data starts at B13. It contains codes that we use to identify materials; a material code looks like this: 3A8356. The column holds many different codes, and each one repeats a lot. I want to make a list that contains every code only once, with no repeats. Is there a way to analyze the column, find the repeated codes, and build a new column with just one of each material code?

An example would be:

12 Materials    
13 3A8356
14 3A8376
15 3A8356
16 3A8356
17 3A8346
18 3A8346

and transform it into something like this:

1 Materials
2 3A8346
3 3A8356
4 3A8376
  • @GrantMcCloskey that's a bit broad to the scope of the question, isn't it pal? –  Aug 17 '18 at 20:44
  • @J.C.Rocamonde He did not post code so I assume he has no base to work off of. yeah? – bison Aug 17 '18 at 20:45
  • If a list of the unique entries is what you're after, look into [`pd.Series.unique`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html) – ALollz Aug 17 '18 at 20:48
  • It would be more productive if you try to address the problem first, write some code, get a specific question, and then come back – Omar Gonzalez Aug 17 '18 at 20:55
  • Please, take a look at my answer; I hope it helps you with your issue. And in the future, try to write a bit more concise questions so that people get a clear picture of your problem, and don't have to guess. –  Aug 17 '18 at 21:03
  • You should at least include some code demonstrating what you have tried: SO is not the place to get people to do your homework. – Charlie Clark Aug 18 '18 at 14:01
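ALollz's suggestion of pd.Series.unique can be sketched on the sample data from the question. This is a minimal illustration that builds the column in memory instead of reading the Excel file; sorting is added only to match the expected output shown above.

```python
import pandas as pd

# Sample data copied from the question's example (Excel rows 13-18)
materials = pd.Series(
    ["3A8356", "3A8376", "3A8356", "3A8356", "3A8346", "3A8346"],
    name="Materials",
)

# Series.unique returns the distinct values in order of first appearance
unique_codes = materials.unique()

# Sort to match the ordering in the question's expected output
sorted_codes = sorted(unique_codes)
print(sorted_codes)  # ['3A8346', '3A8356', '3A8376']
```

Unlike drop_duplicates, this gives back only the codes, not whole rows, which is enough if a plain list is all you need.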

1 Answer


Yes.

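If df is your DataFrame, you only have to do df = df.drop_duplicates(subset=['Materials'], keep='first'), which keeps the first row for each code. (Don't use keep=False: it drops every row whose code repeats, so a code like 3A8356 would disappear entirely.) To get the codes in sorted order as in your example, follow it with df = df.sort_values('Materials').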
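A quick check on the question's sample data shows how the keep argument changes the result. This sketch builds the DataFrame in memory rather than loading the Excel file, and the sort and index reset are added only to reproduce the layout the question asks for.

```python
import pandas as pd

# Sample data from the question's example; the real sheet has more columns
df = pd.DataFrame(
    {"Materials": ["3A8356", "3A8376", "3A8356", "3A8356", "3A8346", "3A8346"]}
)

# keep="first" (the default) keeps one row per code
one_per_code = df.drop_duplicates(subset=["Materials"], keep="first")

# keep=False drops every row whose code appears more than once,
# so only codes that occur exactly once survive
only_singletons = df.drop_duplicates(subset=["Materials"], keep=False)

# Sort and renumber to get the layout shown in the question
result = one_per_code.sort_values("Materials").reset_index(drop=True)

print(result["Materials"].tolist())           # ['3A8346', '3A8356', '3A8376']
print(only_singletons["Materials"].tolist())  # ['3A8376']
```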

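To load the DataFrame from an Excel file (in your sheet the "Materials" heading sits on row 12, so tell pandas which row holds the headings):

import pandas as pd
# header is 0-indexed, so header=11 means Excel row 12; adjust if your headings move
df = pd.read_excel(path_to_file, header=11)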

The subset argument lists the column heading(s) to compare when deciding whether two rows are duplicates; all other columns are ignored.
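To see what subset changes, here is a sketch with a hypothetical second column, "Quantity", standing in for the other columns of the real sheet (the column name and its values are invented for illustration).

```python
import pandas as pd

# Hypothetical two-column sheet: "Quantity" is an invented column
# standing in for the other columns of the real file
df = pd.DataFrame(
    {
        "Materials": ["3A8356", "3A8356", "3A8376"],
        "Quantity": [10, 25, 10],
    }
)

# Without subset, a row is a duplicate only if ALL columns match,
# so both 3A8356 rows (different quantities) are kept
all_columns = df.drop_duplicates()

# With subset=["Materials"], only the code is compared
by_code = df.drop_duplicates(subset=["Materials"])

print(len(all_columns))  # 3
print(len(by_code))      # 2
```

With a sheet of about 7 columns, rows sharing a code almost certainly differ somewhere else, which is why subset=['Materials'] matters here.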

Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html


As the docs describe, drop_duplicates returns a new DataFrame with the duplicates dropped, so you can assign it to any variable you want. The remaining rows keep their original index labels, which leaves gaps; if you want to renumber the index from 0, take a look at:

new_data_frame = new_data_frame.reset_index(drop=True)

Or simply

new_data_frame.reset_index(drop=True, inplace=True)
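The effect of reset_index can be checked on the sample data. This minimal sketch prints the index before and after, assuming the same in-memory example data used in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"Materials": ["3A8356", "3A8376", "3A8356", "3A8356", "3A8346", "3A8346"]}
)

new_data_frame = df.drop_duplicates(subset=["Materials"])
# The surviving rows keep their original labels, leaving gaps
print(new_data_frame.index.tolist())  # [0, 1, 4]

# drop=True discards the old labels instead of adding them as a column
new_data_frame = new_data_frame.reset_index(drop=True)
print(new_data_frame.index.tolist())  # [0, 1, 2]
```

Without drop=True, the old labels would be kept as a new "index" column in the result.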