
The email Excel file (email.xlsx) is formatted like this:

name       email
A          A@gmail.com
B          B@gmailcom
C          c@gmail.com
A          A@gmail.com
B          B@gmail.com

The second file, outfile.csv, should contain this output:

name       email               count
A          A@gmail.com         2
B          B@gmailcom          2
C          c@gmail.com         1

This is my Python code. First, I read the Excel file:

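import csv
from collections import defaultdict
import pandas as pd

data_file = pd.read_excel('email.xlsx')
writer = csv.writer(open('outfile.csv', 'wb'))  # opened in binary mode
code = defaultdict(int)
for row in data_file:  # iterating a DataFrame yields its column labels
    code[row[0]] += 1
# now write the file
for row in code.items():
    writer.writerow(row)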

Error:

writer.writerow(row)
TypeError: a bytes-like object is required, not 'str'

I am getting this error. Could you please help me out?
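A likely cause: in Python 3, csv.writer expects a file opened in text mode, so open(..., 'wb') makes writerow() raise exactly this TypeError ('wb' was the Python 2 idiom). Separately, `for row in data_file` iterates over the DataFrame's column labels, not its rows, so `row[0]` is the first character of each column name. Below is a minimal standard-library sketch of the intended output. It assumes the count is per name and that the first email seen for each name is kept, which matches the sample output; the rows are hard-coded from the question rather than read from email.xlsx.

```python
import csv
from collections import Counter

# Sample rows copied from the question; in practice they would come
# from the Excel file (for example via pandas.read_excel).
rows = [
    ("A", "A@gmail.com"),
    ("B", "B@gmailcom"),
    ("C", "c@gmail.com"),
    ("A", "A@gmail.com"),
    ("B", "B@gmail.com"),
]

# How many times each name appears.
counts = Counter(name for name, _ in rows)

# Keep the first email seen for each name (dicts preserve insertion order).
first_email = {}
for name, email in rows:
    first_email.setdefault(name, email)

# Python 3: open CSV files in text mode ("w") with newline="",
# not binary mode ("wb"), otherwise writerow() raises TypeError.
with open("outfile.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "email", "count"])
    for name, email in first_email.items():
        writer.writerow([name, email, counts[name]])
```

Running it writes outfile.csv with a header row followed by A,A@gmail.com,2 then B,B@gmailcom,2 then C,c@gmail.com,1.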

Ashish
Vignesh

1 Answer


If you just want to count the duplicates, use pandas.Series.unique() on the column:

import pandas as pd
data = pd.read_excel('email.xlsx')
unique = data.column_name.unique()  # replace column_name with your column's name
duplicates = len(data) - len(unique)
print("number of duplicate rows is:", duplicates)

You just need to know the column name; you can list all of them with print(data.columns).
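The unique() approach gives a single total. If the goal is the per-name table from the question, a pandas groupby is one option (a swapped-in technique, not the one used in the answer above). This sketch assumes the count is per name and keeps the first email for each name; the DataFrame is built from the question's sample rows instead of pd.read_excel('email.xlsx').

```python
import pandas as pd

# Sample data from the question; in practice: data = pd.read_excel('email.xlsx')
data = pd.DataFrame({
    "name": ["A", "B", "C", "A", "B"],
    "email": ["A@gmail.com", "B@gmailcom", "c@gmail.com", "A@gmail.com", "B@gmail.com"],
})

# One row per name: keep the first email seen and count the rows in each group.
# sort=False keeps names in order of first appearance.
result = (
    data.groupby("name", sort=False)
        .agg(email=("email", "first"), count=("email", "size"))
        .reset_index()
)

# pandas writes the CSV itself, so no csv.writer or file mode is involved.
result.to_csv("outfile.csv", index=False)
print(result)
```

Named aggregation keeps the output columns in the order name, email, count, which matches the desired outfile.csv.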

Guinther Kovalski
Almost. You need to use the `DataFrame.duplicated()` method to count duplicated rows; `unique()` also includes values that occur only once in the `DataFrame`. – moo Jan 04 '20 at 14:38
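For comparison, a small sketch of the `duplicated()` approach from the comment, using the question's sample rows as an assumed stand-in for email.xlsx. `duplicated()` marks every occurrence after the first as True, so summing it counts the repeats. With no arguments it compares whole rows; `subset` restricts the check to chosen columns.

```python
import pandas as pd

# Sample data from the question (stand-in for pd.read_excel('email.xlsx')).
data = pd.DataFrame({
    "name": ["A", "B", "C", "A", "B"],
    "email": ["A@gmail.com", "B@gmailcom", "c@gmail.com", "A@gmail.com", "B@gmail.com"],
})

# Whole-row duplicates: both name and email must match an earlier row.
dup_rows = data.duplicated().sum()

# Duplicates judged on the name column only.
dup_names = data.duplicated(subset="name").sum()

print("duplicate rows:", dup_rows)
print("duplicate names:", dup_names)
```

Here only the second (A, A@gmail.com) row is a full-row duplicate, since the two B rows have different emails. Restricted to the name column, the count equals len(data) - len(data["name"].unique()), the same number the answer's arithmetic produces.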