I am writing a Python script that scrapes colors from a website and links them to the elements that use them (for example, I would like to associate every <p> element with its color).
My approach is to fetch the website's CSS, extract all the hex colors it contains, and link each color to its selector.
The problem is getting the CSS URL. I am using Beautiful Soup to parse the HTML, but when I first tried to request the CSS I got this error:
MissingSchema: Invalid URL '/_next/static/css/19cb64e37006115a.css': No scheme supplied. Perhaps you meant http:///_next/static/css/19cb64e37006115a.css?
So I added a snippet that prepends the http scheme when none is present.
Now I get a different error: InvalidURL: Invalid URL 'http:///_next/static/css/19cb64e37006115a.css': No host supplied.
So the CSS URL is still not being built correctly.
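To isolate the issue, here is a minimal sketch that reproduces it offline using only `urllib.parse`, with the href taken from the error message above. The variable names (`page_url`, `relative`, `broken`, `joined`) are just for illustration. The `urljoin` call at the end is my guess at what might resolve the href against the page URL; I have not verified it against the live site.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

page_url = 'https://www.endy.com/'
relative = '/_next/static/css/19cb64e37006115a.css'

# The href is a root-relative path: it has neither a scheme nor a host.
parts = urlsplit(relative)
print(repr(parts.scheme), repr(parts.netloc))

# What my snippet does: set the scheme but leave the (empty) host untouched.
broken = urlunsplit(('http',) + parts[1:])
print(broken)  # the same host-less URL that requests rejects

# Possible alternative (assumption): resolve the href against the page URL.
joined = urljoin(page_url, relative)
print(joined)
```

If `urljoin` is the right tool here, it should also leave hrefs that are already absolute unchanged, so the scheme check might not be needed at all.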
Here is the full code:
import re
import cssutils
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlsplit, urlunsplit
url = 'https://www.endy.com/'
# Retrieve the HTML code of the website
response = requests.get(url)
html = response.text
# Use BeautifulSoup to find the CSS file(s) and extract the URLs
soup = BeautifulSoup(html, 'html.parser')
css_urls = [link['href'] for link in soup.find_all('link', rel='stylesheet')]
# Create a dictionary to store the color and corresponding selectors
color_dict = {}
# Retrieve the CSS file(s) and parse the CSS rules
for css_url in css_urls:
    # Add default scheme if none is provided
    if not urlsplit(css_url).scheme:
        css_url = urlunsplit(('http',) + urlsplit(css_url)[1:])
    css_response = requests.get(css_url)
    css_text = css_response.text
    sheet = cssutils.parseString(css_text)
    # Extract CSS rules containing hexadecimal color codes
    for rule in sheet:
        if rule.type == rule.STYLE_RULE:
            css_text = rule.selectorText + ' {' + rule.style.cssText + '}'
            hex_colors = re.findall(r'#(?:[0-9a-fA-F]{3}){1,2}\b', css_text)
            if hex_colors:
                for color in hex_colors:
                    if color not in color_dict:
                        color_dict[color] = []
                    color_dict[color].append(rule.selectorText)
# Print the color and corresponding selectors
for color, selectors in color_dict.items():
    print(f"Color: {color}")
    print(f"Selectors: {', '.join(selectors)}")
    print("------------------------------")
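For reference, the color-to-selector grouping I am aiming for can be sketched without any network access or `cssutils`. This is a simplified illustration, not the code above: `colors_by_selector`, `RULE`, and `sample_css` are hypothetical names, and the `RULE` regex assumes a flat stylesheet with no nested at-rules such as `@media`. It searches only the declaration body, since searching the selector text as well could mistake an ID selector like `#fab` for a color.

```python
import re
from collections import defaultdict

# Same hex-color pattern as in the full script.
HEX_COLOR = re.compile(r'#(?:[0-9a-fA-F]{3}){1,2}\b')
# Naive "selector { declarations }" matcher (assumption: no nested blocks).
RULE = re.compile(r'([^{}]+)\{([^{}]*)\}')

def colors_by_selector(css_text):
    """Map each hex color (lowercased) to the selectors whose rules use it."""
    color_dict = defaultdict(list)
    for selector, body in RULE.findall(css_text):
        selector = selector.strip()
        for color in HEX_COLOR.findall(body):
            key = color.lower()
            # Avoid listing a selector twice when a rule repeats a color.
            if selector not in color_dict[key]:
                color_dict[key].append(selector)
    return dict(color_dict)

sample_css = "p { color: #FF0000; } h1 { color: #abc; background: #ff0000; }"
result = colors_by_selector(sample_css)
for color, selectors in result.items():
    print(f"Color: {color} -> Selectors: {', '.join(selectors)}")
```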
Any help will be greatly appreciated. Thank you!