Python: error while trying to scrape youtube

Question

While trying to scrape the YouTube homepage for the title of each video, I'm running this code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com'
html = requests.get(url)
soup = BeautifulSoup(html.content, "html.parser")
print(soup('a'))

and it's returning this error:

Traceback (most recent call last):
  File "C:\Users\kenda\OneDrive\Desktop\Projects\youtube.py", line 7, in <module>
    print(soup('a'))
  File "C:\Users\kenda\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f384' in position 45442: character maps to <undefined>
[Finished in 4.83s]

How do I fix this? And why does it happen specifically when I'm scraping YouTube?

Looks like you are trying to print a Christmas Tree (Emoji or something similar, I guess...) with bad encoding settings. Have you already had a look at this: https://stackoverflow.com/questions/15056850/python-except-for-unicodeerror ? — floatingpurr, Dec 30 '18 at 17:39
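
To illustrate the comment above: the traceback ends in cp1252.py, called from print(), so the failure seems to come from writing the text to a console that uses the cp1252 code page, not from the download. The character '\U0001f384' (a Christmas tree emoji) has no cp1252 equivalent. Below is a minimal sketch of one workaround, assuming the console really is cp1252 as the traceback suggests. The helper name to_console_safe is hypothetical and not part of any library.

```python
def to_console_safe(text, encoding="cp1252"):
    """Return `text` with characters that `encoding` cannot represent
    replaced by backslash escapes, so printing it will not raise
    UnicodeEncodeError on a console that uses that encoding.

    `encoding` defaults to cp1252 only because that is the codec named
    in the traceback; this is an assumption about the console.
    """
    # Encode with escapes for unencodable characters, then decode back
    # to str so the result can be passed to print() as usual.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "Holiday playlist \U0001f384"
print(to_console_safe(sample))
```

With the code from the question, this would be used as print(to_console_safe(str(soup('a')))). Another option that should have the same effect is setting the PYTHONIOENCODING environment variable to utf-8 before running the script, provided the console can display UTF-8.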

cincinnatus · Answer 1 · 2018-12-30T18:35:27.223

1

I find urllib better and more comfortable to use.

from urllib.request import urlopen

from bs4 import BeautifulSoup

The urlopen function fetches the URL and returns a response object whose body is the page's HTML:

url = 'https://www.youtube.com'
html = urlopen(url)

BeautifulSoup will parse the HTML:

soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))

If you absolutely want to do it with requests, the solution is:

import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com'
resp = requests.get(url)
html = resp.text
soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))

edited Dec 30 '18 at 18:35

answered Dec 30 '18 at 17:48


the requests module is much easier, i'd rather just figure out how to do this – Kendall Kelly Dec 30 '18 at 18:11
