I am trying to compare two CSV files that contain job titles. One contains job titles from the U.S. Bureau of Labor Statistics and the other contains a manually generated list of job titles. There are roughly 2000 job titles in each list. I am very much a beginner, so it's very likely I have some glaring fundamental issues with my approach. Apologies in advance.
I am able to get all predicted_job values, but for some reason they are only compared against the first bls_job value.

    from fuzzywuzzy import fuzz

    bls_job_list = open("bls_jobs.csv", "r")
    predicted_job_list = open("predicted_jobs.csv", "r")

    for bls_job in bls_job_list.readlines():
        for predicted_job in predicted_job_list.readlines():
            print(bls_job + "," + predicted_job + "," + str((fuzz.partial_ratio(bls_job, predicted_job))) + "\n")

    bls_job_list.close()
    predicted_job_list.close()

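A likely cause of the symptom, shown as a minimal sketch: a file object can only be read through once, so calling `readlines()` on the same open file a second time returns an empty list. In the nested loop, that would make the inner loop run only during the first pass of the outer loop. The `io.StringIO` object below is a stand-in for an open file and is an assumption made purely for illustration.

```python
import io

# io.StringIO stands in for an open file (an assumption for illustration);
# file objects returned by open() behave the same way with readlines().
fake_file = io.StringIO("accountant\nanalyst\n")

first_read = fake_file.readlines()   # consumes everything up to end of file
second_read = fake_file.readlines()  # the position is already at the end

print(first_read)   # ['accountant\n', 'analyst\n']
print(second_read)  # []
```

Note also that each line keeps its trailing `"\n"`, which may explain the stray whitespace before the commas in the output.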
I want to get a fuzzy-match ratio for every job title in one list compared against every job title in the other.
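One possible way to get every pairing, sketched under some assumptions: each file is read into a list once, trailing newlines are stripped, and the two lists are then paired with `itertools.product`. The helper names `read_titles` and `compare_all` are hypothetical, and the `difflib` fallback is only there so the sketch runs where `fuzzywuzzy` is not installed; its scores will differ from `fuzz.partial_ratio`.

```python
from itertools import product

try:
    from fuzzywuzzy import fuzz
    score = fuzz.partial_ratio
except ImportError:
    # Fallback for illustration only (an assumption, not the original method):
    # difflib gives a different similarity measure than fuzz.partial_ratio.
    from difflib import SequenceMatcher

    def score(a, b):
        return round(100 * SequenceMatcher(None, a, b).ratio())


def read_titles(path):
    """Read one job title per line into a list, dropping blank lines."""
    with open(path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def compare_all(bls_jobs, predicted_jobs, scorer=score):
    """Return (bls_job, predicted_job, score) for every pair of titles."""
    return [(b, p, scorer(b, p)) for b, p in product(bls_jobs, predicted_jobs)]


# Small in-memory lists taken from the samples in this post.
bls_jobs = ["admiral", "ceo", "mayor"]
predicted_jobs = ["accountant", "account manager"]

for bls_job, predicted_job, value in compare_all(bls_jobs, predicted_jobs):
    print(f"{bls_job},{predicted_job},{value}")
```

With the real files, the lists would come from `read_titles("bls_jobs.csv")` and `read_titles("predicted_jobs.csv")`. With roughly 2000 titles per list this produces about four million pairs, so writing the rows to a file may be more practical than printing them.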
INPUT _bls_sample:_
admiral, ceo, chief executive officer, chief financial officer, chief operating officer, chief sustainability officer, commissioner of internal revenue, coo, county commissioner, government service, executive governor, mayor, school superintendent, university president,
_predicted_sample:_
abstractor, accessioner, account coordinator, account executive, account manager, account representative, account service representative, account specialist, accountant, accounting clerk, accounting manager, accounting supervisor, accounts manager,
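If the files really do hold the titles as one comma-separated line, as the samples above suggest, reading line by line would yield a single long string per file. A minimal sketch of splitting such a line into clean titles follows; `split_titles` is a hypothetical helper, and the single-line layout is an assumption based only on how the samples are displayed.

```python
def split_titles(raw):
    """Split a comma-separated string into titles, dropping empty entries."""
    return [title.strip() for title in raw.split(",") if title.strip()]


sample = "admiral, ceo, chief executive officer, "
print(split_titles(sample))  # ['admiral', 'ceo', 'chief executive officer']
```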
Below is a sample of my current output:
BLS_job_1 ,analyst ,25
BLS_job_1 ,analysis manager ,25
BLS_job_1 ,ambulance driver ,33
BLS_job_1 ,alf worker ,27