
I'm extracting text from a PDF and writing it into a .txt file so that I can afterwards clean it up and select the parts I want to keep. To do this I installed the PyPDF2 library. I managed to extract the text from the PDF and copy it into a .txt file. But when I print the lines inside the .txt file, the first line is always a "/" followed by the .txt file's name. The code is the following:

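import re

# Read the cleaned text file and print only the lines that contain non-whitespace characters.
with open('/Users/kenny/Documents/Atomtest1/Analizador_sintaxis/cleanpage.txt', 'r') as f:
    for h in f:
        h = h.strip()
        if re.search(r'\S+', h):  # raw string avoids the invalid-escape warning for \S
            print(h)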

This is the .txt file, cleanpage.txt:

hello

my name is alfred

And this is the output I receive when I run the code in a virtual environment that has PyPDF2 installed:

/trial.py
hello
my name is alfred

But if I run the program in a virtual environment that doesn't have PyPDF2 installed the output is the following:

hello
my name is alfred
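For comparison, the filtering loop by itself can only ever emit the non-blank lines of the file it reads. A minimal, self-contained sketch of the same logic, run against a temporary copy of the sample file (the helper name `nonblank_lines` and the temporary-file setup are illustrative assumptions, not part of the original script):

```python
import re
import tempfile
from pathlib import Path


def nonblank_lines(path):
    """Return the stripped, non-blank lines of a text file (same filter as the question)."""
    kept = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if re.search(r"\S+", line):  # keep lines with at least one non-whitespace char
                kept.append(line)
    return kept


# Recreate the sample cleanpage.txt in a temporary directory (assumption: UTF-8 text).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "cleanpage.txt"
    sample.write_text("hello\n\nmy name is alfred\n", encoding="utf-8")
    result = nonblank_lines(sample)

print(result)  # ['hello', 'my name is alfred']
```

Nothing in this loop refers to the script's own name, so a line like `/trial.py` would have to come from whatever launches the script rather than from the code shown.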

Does anyone know what is causing this difference in the output of the same program when it is run in different virtual environments? My best guess is that there is some overlap between keywords belonging to basic Python and PyPDF2 that I'm not aware of. Any responses are greatly appreciated.

Arthur
  • That first line isn't coming from this script. Do you have this problem with any other scripts? – Barmar Jan 31 '23 at 18:50
  • Welcome to Stack Overflow. "But when I print the lines inside the .txt file the first line is always a "" followed by the .txt file's name." What you show in the example doesn't seem to match that description. Also, how exactly do you run the program in each virtual environment? – Karl Knechtel Jan 31 '23 at 18:53
  • @Barmar Yes, I have this problem with other scripts if I run them in the same virtual environment. – Arthur Feb 03 '23 at 11:59
  • @KarlKnechtel thx for pointing it out I meant to type "/" but I seemed to have not pressed said key. I run them in Visual Studio Code by selecting the different Python interpreters. I have also tried activating the different virtual environments manually by selecting them via the integrated console in VS Code. If this isn't the proper way to select the venvs, do let me know. – Arthur Feb 03 '23 at 12:03
  • "thx for pointing it out I meant to type "/" but I seemed to have not pressed said key", no, the issue I am raising is not about the slash. The issue is that you say that you are reading a file named `cleanpage.txt`, and that "the first line [of the output] is always a "/" **followed by the .txt file's name**"; but in the output you show, it does not say `cleanpage.txt`, but instead `trial.py`. – Karl Knechtel Feb 03 '23 at 12:07
  • "I have also tried activating the different virtual environments manually by selecting them via the integrated console in VSCode" After activating the virtual environment, **how do you run the program**? By using a menu option etc. in the IDE? By manually typing in a command at the terminal? Something else? – Karl Knechtel Feb 03 '23 at 12:08
  • @KarlKnechtel I usually use a menu option that runs the program automatically. I just tried to replicate it again after restarting the computer (due to some problems with other applications), and I can no longer replicate the problem, so I'm now even less sure what was causing it. By the way, thanks for the help so far; I don't want to seem ungrateful for your attention and patience. – Arthur Feb 03 '23 at 12:59
  • @KarlKnechtel "output you show, it does not say cleanpage.txt, but instead trial.py" I should have specified that what I meant is that it prints out the name of the file that is being executed, in this case the name of the file is "trial.py" – Arthur Feb 05 '23 at 16:50
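Since the extra line only showed up under one interpreter, one way to confirm which environment VS Code actually launches is to print a few standard `sys` attributes at the top of the script. A minimal diagnostic sketch, assuming nothing beyond the standard library (the helper name `describe_environment` is hypothetical, and this only identifies the interpreter; it does not by itself explain the `/trial.py` line):

```python
import importlib.util
import sys


def describe_environment():
    """Collect which interpreter is running and whether PyPDF2 is importable there."""
    return {
        "executable": sys.executable,  # path of the Python binary actually in use
        "prefix": sys.prefix,  # root of the active environment
        "base_prefix": sys.base_prefix,  # differs from prefix when running inside a venv
        "in_venv": sys.prefix != sys.base_prefix,
        "has_PyPDF2": importlib.util.find_spec("PyPDF2") is not None,
        "argv0": sys.argv[0] if sys.argv else "",  # script path as passed by the launcher
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this once from the menu option and once from the activated terminal would show whether both paths really use the same interpreter.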

0 Answers