How to use fuzzyWuzzy with two csv's?

Question

I am trying to compare two csv's that contain job titles. One csv contains job titles from the U.S. Bureau of Labor Statistics and the other contains a manually generated list of job titles. There are roughly 2000 job titles in each list. I am very much a beginner so it's very likely I have some glaring fundamental issues with my approach. Apologies in advance.

I am able to get all predicted_job values, but for some reason they are only compared against the first bls_job value.


from fuzzywuzzy import fuzz

bls_job_list = open("bls_jobs.csv", "r")
predicted_job_list = open("predicted_jobs.csv", "r")

for bls_job in bls_job_list.readlines():
    for predicted_job in predicted_job_list.readlines():
        print(bls_job + "," + predicted_job + "," + str((fuzz.partial_ratio(bls_job, predicted_job))) + "\n")

bls_job_list.close()
predicted_job_list.close()

I want to be able to get fuzzyRatio values for all values in both lists compared to each other.

INPUT _bls_sample:_

admiral, ceo, chief executive officer, chief financial officer, chief operating officer, chief sustainability officer, commissioner of internal revenue, coo, county commissioner, government service, executive governor, mayor, school superintendent, university president,

_predicted_sample:_

abstractor, accessioner, account coordinator, account executive, account manager, account representative, account service representative, account specialist, accountant, accounting clerk, accounting manager, accounting supervisor, accounts manager,

Below is a sample of my current output:

BLS_job_1 ,analyst ,25

BLS_job_1 ,analysis manager ,25

BLS_job_1 ,ambulance driver ,33

BLS_job_1 ,alf worker ,27

Hi Alex, you can also use pandas to read the csv files and then compare the job titles. Thanks — JON, May 07 '19 at 05:30
@RussellB see sample below ``` bls_sample: admiral ceo chief executive officer chief financial officer chief operating officer chief sustainability officer commissioner of internal revenue coo county commissioner government service executive governor mayor school superintendent university president ``` ``` predicted_sample abstractor accessioner account coordinator account executive account manager account representative account service representative account specialist accountant accounting clerk accounting manager accounting supervisor accounts manager ``` — Alex, May 07 '19 at 13:57
Alex, Please read https://stackoverflow.com/help/how-to-ask and edit your question. Having a well formed question would help us to help you. — Bussller, May 08 '19 at 06:45

score 0 · Answer 1 · answered May 08 '19 at 07:39

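I believe the problem is the inner file object. A file opened with open() can only be read through once: the first call to predicted_job_list.readlines() consumes the whole file, so on every later pass of the outer loop it returns an empty list and the inner loop body never runs. That is why only the first bls_job gets compared. Reading both files into lists first avoids this, because a list can be iterated any number of times. Following is such an attempt: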

from fuzzywuzzy import fuzz

# Read each file fully into a list so the titles can be looped over repeatedly.
# The with blocks also close the files once they have been read.
with open("/russellb/data/py_devel/SO_answrs/input.csv", "r") as bls_job_list:
    bls_job_filtered = [line.replace('\r', '') for line in bls_job_list]
with open("/russellb/data/py_devel/SO_answrs/compare.csv", "r") as predicted_job_list:
    predicted_job_filtered = [line.replace('\r', '') for line in predicted_job_list]

# Compare every BLS title with every predicted title.
# Note: each title still carries its trailing newline at this point.
for bls_job in bls_job_filtered:
    for predicted_job in predicted_job_filtered:
        print(bls_job + "," + predicted_job + "," + str(fuzz.partial_ratio(bls_job, predicted_job)) + "\n")
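To see why the question's loop stops after the first BLS title, here is a small sketch. It makes some assumptions: io.StringIO objects stand in for the two csv files so it runs anywhere, and the function names pairs_with_nested_readlines and pairs_from_lists are made up for illustration. It only counts pairs and does not call fuzzywuzzy.

```python
import io


def pairs_with_nested_readlines(outer_text, inner_text):
    """Mimic the question's loop: readlines() is called on the inner file
    inside the outer loop."""
    outer = io.StringIO(outer_text)
    inner = io.StringIO(inner_text)
    pairs = []
    for a in outer.readlines():
        # The first call consumes the inner file; later calls return [].
        for b in inner.readlines():
            pairs.append((a.strip(), b.strip()))
    return pairs


def pairs_from_lists(outer_text, inner_text):
    """Read both inputs into lists first, then loop over the lists."""
    outer = outer_text.splitlines()
    inner = inner_text.splitlines()
    return [(a, b) for a in outer for b in inner]


bls = "admiral\nceo\nmayor\n"
predicted = "abstractor\naccountant\n"

broken = pairs_with_nested_readlines(bls, predicted)
fixed = pairs_from_lists(bls, predicted)

print(len(broken))  # 2: only "admiral" was paired with anything
print(len(fixed))   # 6: 3 BLS titles x 2 predicted titles
```

Calling predicted_job_list.seek(0) before each inner loop would also work, but reading the file into a list once avoids rereading it for every BLS title.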

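The nested loop over bls_job_filtered and predicted_job_filtered prints the following. Each record is split over three lines because every title still ends with the trailing comma and newline read from the file: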

admiral,
,abstractor,
,44

admiral,
,accessioner,
,50

admiral,
,account coordinator,
,50

admiral,
,account executive,
,35

admiral,
,account manager,
,50

admiral,
,account representative,
,47

admiral,
,account service representative,
,44

admiral,
,account specialist,
,56

admiral,
,accountant,
,33

admiral,
,accounting clerk,
,35

admiral,
,accounting manager,
,50

admiral,
,accounting supervisor,
,44

admiral,
,accounts manager,
,50

ceo,
,abstractor,
,60

ceo,
,accessioner,
,60

ceo,
,account coordinator,
,60

ceo,
,account executive,
,60

ceo,
,account manager,
,60

ceo,
,account representative,
,60

ceo,
,account service representative,
,60

ceo,
,account specialist,
,40

ceo,
,accountant,
,40

ceo,
,accounting clerk,
,60
...
...
...
school superintendent,
,accounting manager,
,41

school superintendent,
,accounting supervisor,
,48

school superintendent,
,accounts manager,
,42

university president,
,abstractor,
,36

university president,
,accessioner,
,48

university president,
,account coordinator,
,43

university president,
,account executive,
,26

university president,
,account manager,
,24

university president,
,account representative,
,57

university president,
,account service representative,
,59

university president,
,account specialist,
,35

university president,
,accountant,
,33

university president,
,accounting clerk,
,28

university president,
,accounting manager,
,25

university president,
,accounting supervisor,
,44

university president,
,accounts manager,
,22
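The split records above come from the trailing comma and newline left on each title. Below is a minimal sketch of a cleaned-up variant, assuming one title per line. The helper names read_titles and score_all_pairs are hypothetical, and io.StringIO stands in for the two csv files and for the output file. The difflib fallback is only there so the sketch runs without fuzzywuzzy installed; it is a different metric, so its scores will not match fuzz.partial_ratio.

```python
import csv
import io
from itertools import product

try:
    from fuzzywuzzy import fuzz
    score = fuzz.partial_ratio
except ImportError:
    # Stand-in so the sketch still runs; NOT the same metric as partial_ratio.
    from difflib import SequenceMatcher

    def score(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def read_titles(lines):
    """Return non-empty titles with surrounding whitespace and a
    trailing comma removed."""
    titles = []
    for line in lines:
        title = line.strip().rstrip(",").strip()
        if title:
            titles.append(title)
    return titles


def score_all_pairs(bls_titles, predicted_titles):
    """Score every BLS title against every predicted title."""
    return [(b, p, score(b, p)) for b, p in product(bls_titles, predicted_titles)]


# In-memory stand-ins for bls_jobs.csv and predicted_jobs.csv.
bls_file = io.StringIO("admiral,\r\nceo,\r\nmayor,\r\n")
predicted_file = io.StringIO("abstractor,\naccountant,\n")

rows = score_all_pairs(read_titles(bls_file), read_titles(predicted_file))

# csv.writer handles the separators, so each record stays on one line.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["bls_job", "predicted_job", "score"])
writer.writerows(rows)
print(out.getvalue())
```

With roughly 2000 titles in each list this produces about four million rows, so writing to a file with csv.writer is likely to be more practical than printing.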
