I am trying to open an online txt file using codecs.open. The code I have now is:

url = r'https://www.sec.gov/Archives/edgar/data/20/0000893220-96-000500.txt'
soup = BeautifulSoup(codecs.open(url, 'r',encoding='utf-8'), "lxml")

However, Python keeps raising an OSError:

OSError: [Errno 22] Invalid argument: 'https://www.sec.gov/Archives/edgar/data/20/0000893220-96-000500.txt'

I tried replacing "/" with "\", but it still does not work. Is there any way to solve this? Since I have thousands of links to open, I would rather not download the online text files to my local drive.

I will appreciate it very much if someone can help here.

Thanks!

dara wong

1 Answer

Is it something like this you're thinking of?

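`from urllib.request import urlopen

url = 'https://www.sec.gov/Archives/edgar/data/20/0000893220-96-000500.txt'
# urlopen fetches the URL over HTTP; codecs.open only accepts local file paths
with urlopen(url) as response:
    html = response.read().decode('utf-8')
# Optional: save a local copy; skip this step to keep the text in memory only
with open('yourfile.txt', 'w', encoding='utf-8') as file:
    file.write(html)`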
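
Since the question mentions thousands of links and no wish to save them locally, here is a minimal sketch of the same idea without the intermediate file. The OSError arises because codecs.open expects a local file path, not a URL, so the text has to be fetched over HTTP first. The names `fetch_text` and `fetch_many` and the `user_agent` placeholder value are made up for illustration, and the need for a User-Agent header is an assumption: some servers reject requests that lack one.

```python
from urllib.request import Request, urlopen


def fetch_text(url, encoding="utf-8", user_agent="example-script/0.1"):
    """Return the decoded body of `url` without writing anything to disk."""
    # Assumption: some servers reject requests that lack a User-Agent header;
    # the default value here is only a placeholder.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        # errors="replace" keeps a single bad byte from aborting a long batch.
        return response.read().decode(encoding, errors="replace")


def fetch_many(urls):
    """Yield (url, text) pairs lazily, one document at a time."""
    for url in urls:
        yield url, fetch_text(url)
```

The returned string can be handed straight to the parser from the question, for example BeautifulSoup(fetch_text(url), "lxml"), so nothing needs to be stored on the local drive.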
Sean D'Arcy