Step1: Install pdftotext and put the .exe in your working directory.
Step2: Copy the code below into a .py file in that same directory.
Step3: Make sure the PDF files are in that directory as well.
Step4: Run the .py file.
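Before running anything, a quick check along these lines can confirm that the folder is set up as the steps describe. This is only a sketch: the helper name `check_setup` is hypothetical, and it assumes the executable is named `pdftotext` (or `pdftotext.exe` on Windows).

```python
import glob
import os
import shutil


def check_setup(folder):
    """Return (path to pdftotext or None, list of PDF paths) for `folder`."""
    # Look for the executable in `folder` rather than on the system PATH.
    # On Windows, which() also tries the extensions listed in PATHEXT (such as .exe).
    exe = shutil.which("pdftotext", path=folder)
    pdfs = sorted(glob.glob(os.path.join(folder, "*.pdf")))
    return exe, pdfs


if __name__ == "__main__":
    exe, pdfs = check_setup(os.getcwd())
    print("pdftotext:", exe or "not found in this folder")
    print("PDF files found:", len(pdfs))
```

If either line of output looks wrong, fix the folder contents before running the main script.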
Complete code that worked for me:
import os
import glob
import subprocess

files = []
# Remember to put pdftotext.exe in the folder with your PDF files
for filename in glob.glob(os.path.join(os.getcwd(), '*.pdf')):
    txt_name = filename[0:-4] + ".txt"
    files.append(txt_name)
    # Convert the PDF to a text file with the same base name
    subprocess.call([os.path.join(os.getcwd(), 'pdftotext'), filename, txt_name])

all_files = []
for txt_name in files:
    with open(txt_name, 'r') as f:
        text = f.read()
    # Keep only the part between these two markers in the extracted text
    text = text.split('carry one mark each')[1].split('WWW.UNITOPERATION.COM')[0]
    text_ls = text.splitlines()
    ques = []
    counter = 1
    for j in range(len(text_ls)):
        # A line starting with "1.", "2.", ... begins the next question
        if text_ls[j].startswith(str(counter) + '.'):
            # Appends a one-element list holding everything from this line to the end of the section
            ques.append(''.join(text_ls[j:]).split('\n'))
            counter += 1
    all_files.append(ques)
# all_files now holds one ques list per PDF file.
# To finish the task, take the elements out of all_files one by one
# and write them to a .txt file.
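The remaining step described in the comments above could be sketched roughly as follows. The helper names `split_questions` and `write_questions`, the `_questions.txt` suffix, and the blank-line separator are assumptions for illustration, not part of the original code. `split_questions` is an optional variant that ends each question where the next numbered line begins; `write_questions` accepts entries that are either strings or lists of strings, since the code above appends each entry as a one-element list.

```python
import os


def split_questions(lines):
    """Group lines into questions numbered 1., 2., 3., ... in order."""
    ques = []
    counter = 1
    for line in lines:
        if line.startswith(str(counter) + "."):
            ques.append(line)  # this line starts the next question
            counter += 1
        elif ques:
            ques[-1] += " " + line  # continuation of the current question
    return ques


def write_questions(txt_paths, all_files, suffix="_questions.txt"):
    """Write each file's questions next to its extracted .txt file.

    txt_paths: paths of the extracted .txt files (the `files` list).
    all_files: one list of questions per path, in the same order.
    Returns the list of paths that were written.
    """
    written = []
    for txt_path, ques in zip(txt_paths, all_files):
        out_path = os.path.splitext(txt_path)[0] + suffix
        with open(out_path, "w", encoding="utf-8") as out:
            for q in ques:
                # Entries may be strings or lists of strings; flatten lists.
                text = q if isinstance(q, str) else "".join(q)
                out.write(text.strip() + "\n\n")
        written.append(out_path)
    return written
```

With the variables from the script above, the call would be `write_questions(files, all_files)`. To get one entry per question instead, `ques` could be built with `split_questions(text_ls)`.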