I upload PDFs and convert each one to text. After the conversion I use regular expressions to pull some specific data out of the text. There are several kinds of PDFs, and each kind needs its own set of regular expressions. My problem is telling the PDFs apart in the if conditions shown below: the code only ever enters the first if branch. How can I route each PDF to the regular expressions I wrote for it? Or is there another way to do this? My main goal is to build a PDF extractor for some specific data.
def upload(request):
    if request.method == 'POST':
        form = PoForm(request.POST, request.FILES)
        if form.is_valid():
            form.save()
            file_name = form.cleaned_data['pdf'].name
            print(form.cleaned_data['pdf'].name)
            text = convert_pdf_to_txt(file_name)
            text = text.replace('\n', '')
            print(text)
            path = 'media/pos/pdfs/{}'.format(file_name)
            print(path)
            basename = os.path.basename(path)
            # basename is derived from file_name, so all three conditions
            # below are the same test and only the first branch ever runs.
            if file_name == basename:
                print(basename)
                print(file_name)
                regex_Quantity = r'Quantity:\s?([0-9]+)'
                regex_style_no = r'No:\s\s\s\s?([0-9]+)'
            elif file_name == basename:
                print("print2")
                print(basename)
                regex_Quantity = r'Total Units\s?([0-9\,]+)'
                regex_style_no = r'Number:\s?([0-9]+)'
            elif file_name == basename:
                print(basename)
                print("print3")
                regex_Quantity = r'PO\s?([0-9\.]+)'
                regex_style_no = r'Article-No.:\s?([0-9]+)'
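
One possible direction, sketched here under assumptions rather than as a tested solution: choose the patterns from the content of the extracted text instead of from the file name. The table name PDF_LAYOUTS, the helper extract_fields, and the marker strings are hypothetical. The markers are guesses taken from the patterns above and would have to be replaced with text that really is unique to each kind of PDF.

```python
import re

# Hypothetical layout table. Each entry pairs a marker string that should
# appear only in one kind of PDF with the patterns for that kind.
# The markers are assumptions based on the patterns in the question.
# Order matters: the first entry whose marker is found wins.
PDF_LAYOUTS = [
    {
        "marker": "Total Units",
        "quantity": r"Total Units\s?([0-9,]+)",
        "style_no": r"Number:\s?([0-9]+)",
    },
    {
        "marker": "Article-No.:",
        "quantity": r"PO\s?([0-9.]+)",
        "style_no": r"Article-No\.:\s?([0-9]+)",
    },
    {
        "marker": "Quantity:",
        "quantity": r"Quantity:\s?([0-9]+)",
        "style_no": r"No:\s\s\s\s?([0-9]+)",
    },
]


def extract_fields(text):
    """Return (quantity, style_no) as strings, or None for anything not found."""
    for layout in PDF_LAYOUTS:
        if layout["marker"] in text:
            quantity = re.search(layout["quantity"], text)
            style_no = re.search(layout["style_no"], text)
            return (
                quantity.group(1) if quantity else None,
                style_no.group(1) if style_no else None,
            )
    # No marker matched: the PDF is of an unknown kind.
    return None, None
```

Inside upload, the if/elif chain could then be replaced by a single call such as quantity, style_no = extract_fields(text). Supporting a new kind of PDF would mean adding one entry to the table rather than another branch.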