I have a dataset with a column containing Google Drive links to resumes. There are 5000 rows, so 5000 links. I am trying to extract information such as years of experience and salary from these resumes into two separate columns. So far I have looked at many of the examples posted here on SO.
For example, the code below can only read the data from one file. How do I replicate this across multiple rows?
Please help me with this, otherwise I will have to go through all of the resumes manually and fill in the data.
I am hoping there is a solution to this painful problem.
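import PyPDF2

# Read the text of the first page of a single resume.
# PdfFileReader/getPage/extractText are the PyPDF2 < 3.0 names.
with open('sample.pdf', 'rb') as pdf_file:
    read_pdf = PyPDF2.PdfFileReader(pdf_file)
    number_of_pages = read_pdf.getNumPages()
    page = read_pdf.getPage(0)
    page_content = page.extractText()
print(page_content)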
# To extract salary and experience using regular expressions
import re

prog = re.compile(r"\s*(Name|name|nick).*")
result = prog.match("Name: Bob Exampleson")
if result:
    print(result.group(0))
result = prog.match("University: MIT")
if result:
    print(result.group(0))
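To make the goal concrete, here is a minimal sketch of the kind of loop I am after, assuming each link can somehow be turned into plain text. The names `extract_fields`, `process_rows` and `load_text` are hypothetical, both regex patterns are guesses that have not been tried on real resumes, and downloading from Google Drive is left out entirely.

```python
import re

# Hypothetical patterns: real resumes vary a lot, so these need tuning.
EXPERIENCE_RE = re.compile(r"(\d+(?:\.\d+)?)\+?\s*(?:years?|yrs?)", re.IGNORECASE)
SALARY_RE = re.compile(r"salary\D{0,20}(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def extract_fields(text):
    """Return the first experience and salary mentions in text, or None for each."""
    exp = EXPERIENCE_RE.search(text)
    sal = SALARY_RE.search(text)
    return {
        "experience_years": float(exp.group(1)) if exp else None,
        "salary": float(sal.group(1).replace(",", "")) if sal else None,
    }


def process_rows(links, load_text):
    """Run extract_fields on every link.

    load_text is an assumed helper: load_text(link) must return the resume
    text, for example by downloading the PDF and reading it with PyPDF2.
    """
    results = []
    for link in links:
        try:
            fields = extract_fields(load_text(link))
        except Exception:
            # Keep going if one resume cannot be downloaded or parsed.
            fields = {"experience_years": None, "salary": None}
        results.append(fields)
    return results
```

If the data is in a pandas DataFrame, something like `pd.DataFrame(process_rows(df["resume_link"], load_text))` might give the two new columns, where `resume_link` is a made-up column name.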