
I'm uploading multiple files to Flask using a form. I get the file objects in the Flask backend without a problem, but I want to read the PDF files to extract text from them, and I can't do that on the file objects I receive from the form. Another approach I tried was saving the files to local storage and reading them back, but when I did that using file.save(path, filename) it created an empty text file named filename.pdf.

import os

from flask import Flask, request

app = Flask(__name__)


@app.route('/')
def index():
    return '''
        <form method='POST' action='/saveData'>
        <input type='file' name='testReport'>
        <input type='submit'>
        </form>
    '''

@app.route('/saveData', methods=['POST'])
def saveData():
    if 'testReport' in request.files:
        testReport = request.files['testReport']
        # This isn't working: a text file with the same name, ending in .pdf, is saved
        testReport.save(os.path.join(app.config['UPLOAD_FOLDER'], testReport.filename))
        return f'<h1>File saved {testReport.filename}</h1>'
    else:
        return 'Not done'

How do we operate on PDF files after uploading them to Flask?

Shashank Prasad

2 Answers


How do we operate on PDF files after uploading them to Flask?

You should treat them just like normal PDF files; whether they were uploaded via a Flask application or obtained some other way is irrelevant here. As you want to read the PDF files to extract text from them, you should use a PDF text-extraction tool, for example pdfminer.six. As this is an external module, you need to install it first: pip install pdfminer.six
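A minimal sketch of doing that without saving the file first, assuming pdfminer.six is installed. pdfminer.six's extract_text (in pdfminer.high_level) accepts either a path or a file-like object, so the upload can be read into an in-memory buffer and passed straight in. The helper names upload_to_buffer and extract_pdf_text are hypothetical, and the %PDF- header check is only a quick heuristic, not full validation.

```python
import io


def upload_to_buffer(upload) -> io.BytesIO:
    """Read an uploaded file (e.g. request.files['testReport']) fully
    into memory and return a seekable in-memory copy."""
    data = upload.read()
    # Heuristic sanity check: PDF readers look for the %PDF- header near
    # the start of the file, so an empty or non-PDF payload is rejected
    # here instead of failing later inside pdfminer.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("upload is empty or does not look like a PDF")
    return io.BytesIO(data)


def extract_pdf_text(upload) -> str:
    """Extract the text of an uploaded PDF without writing it to disk."""
    # Imported lazily so the module can be loaded without pdfminer.six;
    # calling this function still requires `pip install pdfminer.six`.
    from pdfminer.high_level import extract_text
    return extract_text(upload_to_buffer(upload))
```

Inside the saveData view this would be called as text = extract_pdf_text(request.files['testReport']).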

Daweo
  • Is there a way to save the PDFs on my PC? – Shashank Prasad May 14 '21 at 12:10
  • @ShashankPrasad according to [Flask Uploading Files Guide](https://flask.palletsprojects.com/en/2.0.x/patterns/fileuploads/) `form` should have `enctype` set to `multipart/form-data` (`
    – Daweo May 14 '21 at 12:16
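A sketch of the question's form with that attribute added. With enctype="multipart/form-data" the browser sends the file contents; with the default encoding only the file name is submitted. The multiple and accept attributes are additions (assumptions) for the several-PDFs case the question describes, and UPLOAD_FORM is a hypothetical name for the string index() returns.

```python
# Corrected form markup for the question's index() view.
# enctype="multipart/form-data" is what makes the file bytes reach Flask.
# `multiple` lets the user select several files in one input, and
# `accept=".pdf"` only filters the browser's file picker (not a security check).
UPLOAD_FORM = '''
    <form method="POST" action="/saveData" enctype="multipart/form-data">
      <input type="file" name="testReport" accept=".pdf" multiple>
      <input type="submit">
    </form>
'''
```

With multiple, read every selected file on the server with request.files.getlist('testReport'); request.files['testReport'] returns only the first one.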

You can directly follow Flask's own approach as described [here].

This works with PDFs too; just don't forget to include the pdf extension in ALLOWED_EXTENSIONS.
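Flask's upload guide checks the file extension with an ALLOWED_EXTENSIONS set and an allowed_file() helper. A sketch modeled on that pattern, with the set narrowed to PDFs as an assumption:

```python
# Only PDFs are accepted here; add other extensions as needed.
ALLOWED_EXTENSIONS = {"pdf"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename has an extension listed in
    ALLOWED_EXTENSIONS (case-insensitive)."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
```

In the upload view, call allowed_file(testReport.filename) before saving or parsing the file. The check only looks at the name, so it does not prove the contents are a valid PDF.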