Counting duplicate rows in Excel using Python: TypeError: a bytes-like object is required, not 'str'

Question

The input Excel file, email.xlsx, is formatted like this:

name       email
A          A@gmail.com
B          B@gmailcom
C          c@gmail.com
A          A@gmail.com
B          B@gmail.com

The second file, outfile.csv, should contain this output:

name       email               count
A          A@gmail.com         2
B          B@gmailcom          2
C          c@gmail.com         1

This is my Python code. First, I read the Excel file:

data_file=pd.read_excel('email.xlsx')
writer = csv.writer(open('outfiles.csv','wb'))
code = defaultdict(int)
for row in data_file:
    code[row[0]] += 1
# now write the file
for row in code.items():
   writer.writerow(row)

Error:

writer.writerow(row) TypeError: a bytes-like object is required, not 'str'

Could you please help me resolve this error?

Open the file `open('outfiles.csv','w')` without the `b` character (you don't want byte file mode) — Andrej Kesely, Jan 04 '20 at 10:50
Please refer to https://stackoverflow.com/questions/34283178/typeerror-a-bytes-like-object-is-required-not-str-in-python-and-csv . — Swetha Shanmugam, Jan 04 '20 at 11:28
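A minimal sketch of the fix the first comment describes, assuming Python 3: `csv.writer` writes `str`, so the output file is opened in text mode (`'w'`, with `newline=''`) rather than `'wb'`. The helper name `write_counts`, the sample rows, and the temporary output path are illustrative assumptions, not part of the question.

```python
import csv
import os
import tempfile
from collections import Counter


def write_counts(rows, path):
    """Count each (name, email) pair and write the counts as CSV."""
    counts = Counter(rows)
    # Text mode ('w'), not 'wb': in Python 3 csv.writer emits str, not bytes.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "email", "count"])
        for (name, email), n in counts.items():
            writer.writerow([name, email, n])
    return counts


# Hypothetical sample data standing in for the spreadsheet rows.
sample_rows = [
    ("A", "A@gmail.com"),
    ("B", "B@gmail.com"),
    ("C", "c@gmail.com"),
    ("A", "A@gmail.com"),
    ("B", "B@gmail.com"),
]

# Written to the temp directory here; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "outfiles.csv")
counts = write_counts(sample_rows, out_path)
```

Counting with `Counter` keeps the shape of the original `defaultdict(int)` approach while keying on the whole row instead of a single field.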

Guinther Kovalski · Accepted Answer · 2020-01-04T14:11:37.707

2

If you just want to count the duplicates, use `pandas.Series.unique()` on the column you care about (`unique()` is a method of a column, not of the whole DataFrame).

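import pandas as pd

data = pd.read_excel('email.xlsx')
# column_name is a placeholder for the column to check
unique = data.column_name.unique()
# every row beyond the first occurrence of a value counts as a duplicate
duplicates = len(data) - len(unique)
print("number of duplicate rows is:", duplicates)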

You just need to know the column name; you can list all columns with print(data.columns).

edited Jan 04 '20 at 14:11

answered Jan 04 '20 at 14:06


1

Almost. You need to use the `DataFrame.duplicated()` method to count duplicated rows. `unique()` will include rows where just one exists in the `DataFrame` – moo Jan 04 '20 at 14:38
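As a rough sketch of what this comment suggests, `DataFrame.duplicated()` flags each row that repeats an earlier one, and grouping on all columns gives the per-row counts the question asks for. The in-memory DataFrame below is a hypothetical stand-in for `pd.read_excel('email.xlsx')`, with the email spellings made consistent for illustration. Note also that iterating `for row in data_file` over a DataFrame yields column labels, not rows, so the loop in the question would not count rows even with the file mode fixed.

```python
import pandas as pd

# Hypothetical sample data standing in for pd.read_excel('email.xlsx').
data = pd.DataFrame({
    "name": ["A", "B", "C", "A", "B"],
    "email": ["A@gmail.com", "B@gmail.com", "c@gmail.com",
              "A@gmail.com", "B@gmail.com"],
})

# duplicated() is True for every repeat of an earlier row; the sum counts them.
n_duplicates = int(data.duplicated().sum())

# Per-row counts: group on all columns and take the size of each group.
counts = data.groupby(["name", "email"]).size().reset_index(name="count")

# index=False keeps the DataFrame index out of the CSV text.
# Pass a filename such as 'outfiles.csv' to write to disk instead.
csv_text = counts.to_csv(index=False)
print("number of duplicate rows is:", n_duplicates)
print(csv_text)
```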
