Every time the code runs, the site opens and I enter the input, but all that is returned is an error in the terminal: "raise EmptyFileError("Cannot read an empty file") PyPDF2.errors.EmptyFileError: Cannot read an empty file". I have tried several fixes, such as verifying that the file size is not 0, but the problem seems to be somewhere in my current code and I don't understand it. Any help is much appreciated.

I have tried verifying file sizes at the start of text extraction. I have also tried a different PDF reader, pdfplumber, which likewise reports that the PDF has no contents. Code:

import openai
import gradio as gr
import PyPDF2
import tempfile
import re

# Authenticate with OpenAI API
openai.api_key = "MY KEY GOES HERE"


# Define function to extract text from PDF file
def extract_text_from_pdf(file):
    with open(file, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        text = ""
        for i in range(len(reader.pages)):
            page = reader.pages[i]
            text += page.extract_text()
    return text

def extract_teacher_names(text):
    # Define a regular expression pattern to match names
    pattern = r'\b[A-Z][a-z]+ [A-Z][a-z]+\b'

    # Search for names in the text and return them as a list
    teacher_names = re.findall(pattern, text)
    return teacher_names

# Define function to handle the Gradio interface
def predict(input_file, subject, class_name):
    # Save the uploaded file to a temporary location
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as temp_pdf_file:
        temp_pdf_file.write(input_file.read())

    # Extract text from the saved PDF file
    text = extract_text_from_pdf(temp_pdf_file.name)

    # Extract teacher names from the text
    teacher_names = extract_teacher_names(text)

    teacher_names_str = ", ".join(teacher_names)
    prompt = f"Based on the grade distribution in the {class_name} class, which was taught by {teacher_names_str}, and based on the combination of the A's column given and B's column given in each class, who is the hardest teacher, the easiest teacher, and the most balanced teacher? Consider the teaching styles and reputations of each teacher in your analysis."
    
    # Use OpenAI API to analyze text and find information about the class
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        max_tokens=1024,
        n=1,
        stop=None,
        temperature=0.7,
        timeout=10
    )
    
    # Extract the generated text from the OpenAI API response
    output_text = response.choices[0].text.strip()
    return output_text

# Define the Gradio interface
input_file = gr.inputs.File(label="Upload PDF file")
input_subject = gr.inputs.Textbox(label="Enter subject code (e.g. CS101)")
input_class = gr.inputs.Textbox(label="Enter class name (e.g. Calculus)")
output_text = gr.outputs.Textbox(label="Results")
iface = gr.Interface(fn=predict, inputs=[input_file, input_subject, input_class], outputs=output_text)

# Run the interface
iface.launch()
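
One way to narrow this down is to count how many bytes `input_file.read()` actually returns before the temporary file is written. The sketch below is only illustrative: `save_upload_to_temp` is a hypothetical helper, and it assumes the upload is an ordinary binary file-like object. Whether Gradio passes such an object, and where its read position sits, is not confirmed here.

```python
import io
import os
import tempfile


def save_upload_to_temp(upload):
    """Copy a binary file-like object to a temporary .pdf file.

    Returns (path, bytes_written) so the caller can see whether
    anything was actually read from the upload.
    """
    data = upload.read()  # returns b"" if the read position is already at the end
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(data)
    return tmp.name, len(data)
```

If `bytes_written` comes back as 0, the empty temporary file would explain the `EmptyFileError`, and the next thing to check is what kind of object `input_file` really is (for example by printing `type(input_file)`).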

Liam
It seems like this error is independent of Gradio and OpenAI. Can you make an example which demonstrates this failure, while only using PyPDF2 to read the file? You can hardcode the name of the file, for example. I think that would be easier to debug, both for you and us. – Nick ODell Apr 18 '23 at 20:42
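
A reduced example along the lines suggested in this comment might look like the following sketch. The helper names `describe_file` and `count_pages` are hypothetical, and the path passed to them is whatever PDF is being tested. The first helper uses only the standard library to report the file size and leading bytes; the second opens the file with PyPDF2 alone, using the same `PdfReader` and `pages` calls as the question's code.

```python
import os


def describe_file(path):
    """Return (size_in_bytes, first_five_bytes) for the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(5)  # a PDF normally starts with b"%PDF-"
    return size, header


def count_pages(path):
    """Open `path` with PyPDF2 only and return the number of pages."""
    import PyPDF2  # imported here so describe_file works without PyPDF2

    with open(path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        return len(reader.pages)
```

Calling `describe_file` on a hardcoded path to the original PDF, and then on the temporary copy, shows whether the file is already empty before PyPDF2 touches it. If the original reports a non-zero size and `count_pages` succeeds on it, the problem is more likely in how the upload is copied than in PyPDF2 itself.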
