I'm working on building a dataset made of dataLayer variable (object) information. I want to automate the classification of pages with machine learning.
Yes, there is.
If the variable is statically assigned in, e.g., a <script> block, then you can just parse the HTML with, e.g., Beautiful Soup, find that script block, and extract the value from it.
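For the static case, a minimal sketch could look like the following. To keep it dependency-free it swaps Beautiful Soup for the standard library's html.parser module. It assumes the page contains a literal assignment such as dataLayer = [...]; whose contents happen to be valid JSON (double-quoted keys, no trailing commas). ScriptCollector and extract_static_datalayer are hypothetical names, and pages that only call dataLayer.push(...) will not match the regex.

```python
import json
import re
from html.parser import HTMLParser

# Assumption: a literal "dataLayer = [ ... ];" assignment whose array is valid JSON.
DATALAYER_RE = re.compile(r"dataLayer\s*=\s*(\[.*?\])\s*;", re.DOTALL)


class ScriptCollector(HTMLParser):
    """Collects the raw text content of every <script> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are passed through unparsed; append in case they arrive in chunks.
        if self._in_script:
            self.scripts[-1] += data


def extract_static_datalayer(html: str):
    """Return the first statically assigned dataLayer array, or None if none is found."""
    parser = ScriptCollector()
    parser.feed(html)
    for script in parser.scripts:
        match = DATALAYER_RE.search(script)
        if match:
            return json.loads(match.group(1))
    return None


page_html = """
<script>var unrelated = 1;</script>
<script>window.dataLayer = [{"Page": "home", "PageType": "HomePage"}];</script>
"""
print(extract_static_datalayer(page_html))  # [{'Page': 'home', 'PageType': 'HomePage'}]
```

If the assigned object uses JavaScript-only syntax (single quotes, unquoted keys), json.loads will fail, which is one more reason the headless-browser route below is usually more robust.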
More likely, though, the data is generated dynamically after the page loads (or spread across separate script blocks), so you'd need a tool such as Playwright to drive a headless browser and then read the variable from there.
Playwright example
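```python
from playwright.sync_api import BrowserContext, sync_playwright


def get_datalayer(ctx: BrowserContext, url: str):
    page = ctx.new_page()
    page.goto(url)
    # Wait until there have been no network connections for at least 500 ms,
    # giving tag-manager scripts a chance to push to dataLayer.
    page.wait_for_load_state("networkidle")
    # Evaluate the expression inside the page and return it as a Python value.
    return page.evaluate("window.dataLayer")


with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    with browser.new_context() as bcon:
        data_layer = get_datalayer(bcon, "https://www.berceaumagique.com/")
        print(data_layer)
    browser.close()
```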
This prints out (truncated):
```python
[
    {
        "UtmSource": "",
        "EmailHash": "...",
        "NewCustomer": "0",
        "AcceptFunctionalCookie": "",
        "AcceptTargetingCookie": "",
        "IdUser": "",
        "Page": "home",
        "RealPage": "home",
        "urlElitrack": "...",
    },
    {"google_tag_params": {"ecomm_pagetype": "home"}},
    {"PageType": "HomePage"},
    {"EffinityPage": "home", "Session": "0", "NewCustomer": "0"},
    {"gtm.start": 1658135295957, "event": "gtm.js", "gtm.uniqueEventId": 1},
    {
        "event": "axeptio_update",
        "axeptio_authorized_vendors": [],
        "gtm.uniqueEventId": 19,
    },
    {"event": "gtm.dom", "gtm.uniqueEventId": 22},
    {"event": "gtm.js", "gtm.uniqueEventId": 23},
    {
        "event": "promotionsView",
        "ecommerce": {
            "promoView": {
                "promotions": [
                    {
                        "id": "slider-1",
                        "name": "rentree-scolaire",
                        "creative": "home slider",
                        "position": "1",
                    }
                ]
            }
        },
        "gtm.uniqueEventId": 24,
    },
    ...
]
```
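Since the goal in the question is one feature row per page for a classifier, a list like the one above could be folded into a single flat dict. This is an illustrative sketch under several assumptions: merge_datalayer and flatten are hypothetical helpers, later pushes simply overwrite earlier top-level keys (a shallow approximation, not Google Tag Manager's exact data-model merge), and the "event" key plus any "gtm."-prefixed keys are dropped as tag-manager bookkeeping.

```python
from typing import Any


def merge_datalayer(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold dataLayer pushes into one dict; later pushes overwrite earlier keys.

    Assumption: a shallow override is good enough. GTM's own merge is not reproduced.
    """
    merged: dict[str, Any] = {}
    for entry in entries:
        for key, value in entry.items():
            # Skip GTM bookkeeping such as "event", "gtm.start", "gtm.uniqueEventId".
            if key == "event" or key.startswith("gtm."):
                continue
            merged[key] = value
    return merged


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}.

    Lists (such as promotion arrays) are kept as-is.
    """
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# A few entries copied from the output above.
sample = [
    {"Page": "home", "RealPage": "home", "NewCustomer": "0"},
    {"google_tag_params": {"ecomm_pagetype": "home"}},
    {"PageType": "HomePage"},
    {"EffinityPage": "home", "Session": "0", "NewCustomer": "0"},
    {"gtm.start": 1658135295957, "event": "gtm.js", "gtm.uniqueEventId": 1},
]
features = flatten(merge_datalayer(sample))
print(features["PageType"])  # HomePage
print(features["google_tag_params.ecomm_pagetype"])  # home
```

Rows built this way for many URLs can then be loaded into whatever tabular structure your ML pipeline expects; which keys are actually useful as features is a separate question.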

AKX