At the moment my code extracts the text from a PDF and counts how often each word occurs. I've been trying for a while to sort the words by frequency but haven't managed it. I've looked at several similar answers but couldn't get any of them to work. Can someone point out what I need to do?
```python
import PyPDF2
import re

pdfFileObj = open('ch8.pdf', 'rb')  # Open the file
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)  # Read the file
frequency = {}  # Create dict
print "Number of Pages %s " % pdfReader.numPages  # Print number of pages
pageObj = pdfReader.getPage(0)  # Get the first page
# Lower-case the text before matching; otherwise [a-z] skips capitalised words
match_pattern = re.findall(r'\b[a-z]{3,15}\b', pageObj.extractText().lower())
for word in match_pattern:  # Start counting the frequency
    count = frequency.get(word, 0)
    frequency[word] = count + 1
frequency_list = frequency.keys()
for words in frequency_list:
    print words, frequency[words]
```
Thanks in advance.
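Here is a minimal sketch of one way to get the ordering, assuming the goal is to list words from most to least frequent. The `text` string is a hypothetical stand-in for `pageObj.extractText()`, so the example runs without a PDF. It sorts the dict's `(word, count)` pairs with `sorted()` and a key on the count. It also shows `collections.Counter.most_common()` as an alternative that counts and orders in one step. The prints use `%` formatting so the block behaves the same under Python 2 and Python 3.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for pageObj.extractText()
text = "The cat sat on the mat. The cat ran. A dog sat."

# Lower-case first so capitalised words still match [a-z]
words = re.findall(r'\b[a-z]{3,15}\b', text.lower())

# Build the dict the same way as the counting loop in the question
frequency = {}
for word in words:
    frequency[word] = frequency.get(word, 0) + 1

# Sort (word, count) pairs: highest count first, ties broken alphabetically
by_frequency = sorted(frequency.items(), key=lambda item: (-item[1], item[0]))
for word, count in by_frequency:
    print("%s %d" % (word, count))

# Alternative: Counter counts the words and most_common() orders them by count
counts = Counter(words)
for word, count in counts.most_common():
    print("%s %d" % (word, count))
```

The key `(-count, word)` is used instead of `reverse=True` so that words with the same count come out in a predictable alphabetical order. On Python 2 a plain dict has no defined order, so without a tie-breaker equal counts could appear in any order.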