
While trying to scrape the YouTube homepage for the title of each video, I'm running this code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com'
html = requests.get(url)
soup = BeautifulSoup(html.content, "html.parser")
print(soup('a'))

and it's returning this error:

Traceback (most recent call last):
  File "C:\Users\kenda\OneDrive\Desktop\Projects\youtube.py", line 7, in <module>
    print(soup('a'))
  File "C:\Users\kenda\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f384' in position 45442: character maps to <undefined>
[Finished in 4.83s]

How do I fix this? And why does it happen specifically when I'm scraping YouTube?

Kendall Kelly

Looks like you are trying to print a Christmas tree (an emoji or something similar, I guess...) with bad encoding settings. Have you already had a look at this: https://stackoverflow.com/questions/15056850/python-except-for-unicodeerror ? – floatingpurr Dec 30 '18 at 17:39
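
To illustrate the point in the comment above: the traceback ends inside cp1252.py, which suggests the error is raised by print() while encoding text for the console, not by the HTTP request or the parser. The character '\U0001f384' (a Christmas tree emoji) has no cp1252 equivalent. A minimal sketch of one possible workaround follows; safe_text is a hypothetical helper name, not part of requests or BeautifulSoup, and the sample string stands in for the scraped page.

```python
import sys


def safe_text(text, encoding=None):
    """Escape characters that the target encoding cannot represent.

    Hypothetical helper: it falls back to the console encoding when
    no encoding is given.
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # backslashreplace turns unencodable characters into \UXXXXXXXX escapes
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


tree = "\U0001f384"  # the character named in the traceback

# Reproduce the failure without touching the network or the console encoding.
try:
    tree.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode it:", exc.reason)

# The escaped version contains only characters cp1252 can represent.
print(safe_text("Holiday videos " + tree, encoding="cp1252"))
```

In the original script this would be used as print(safe_text(str(soup('a')))). Other options that might fit better, depending on the setup, are setting the PYTHONIOENCODING environment variable to utf-8 before running the script, or calling sys.stdout.reconfigure(encoding="utf-8"), which is only available from Python 3.7 on (the traceback shows Python 3.6).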

1 Answer


I find urllib, which ships with the standard library, more comfortable to use than requests.

from urllib.request import urlopen

from bs4 import BeautifulSoup

The urlopen function fetches the page at the URL and returns a response object containing its HTML:

url = 'https://www.youtube.com'
html = urlopen(url)

BeautifulSoup will parse the HTML:

soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))

If you absolutely want to do it with requests, the solution is:

import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com'
resp = requests.get(url)
html = resp.text
soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))
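
Neither variant above changes how print() encodes its output, so on a console that uses cp1252 the same UnicodeEncodeError may still appear whenever the page contains an emoji. One way to sidestep the console entirely is to write the results to a UTF-8 file. This is a sketch under assumptions: write_utf8 and the file name links.txt are made up for illustration, and the sample list stands in for soup.find_all('a').

```python
import os
import tempfile


def write_utf8(lines, path):
    """Write each item as one line of UTF-8 text and return the count written."""
    count = 0
    # An explicit encoding avoids the platform default (cp1252 on many Windows setups).
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(str(line) + "\n")
            count += 1
    return count


# Stand-in data; in the scraper this would be soup.find_all('a').
sample_links = [
    '<a href="/watch?v=1">Holiday videos \U0001f384</a>',
    '<a href="/feed">Feed</a>',
]
out_path = os.path.join(tempfile.gettempdir(), "links.txt")
written = write_utf8(sample_links, out_path)
print(written, "lines written")
```

With the scraper, the call would be write_utf8(soup.find_all('a'), 'links.txt'); the file can then be opened in any editor that understands UTF-8.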
cincinnatus