
I'm building a dataset made of dataLayer variable (object) information. I want to automate the classification of pages with machine learning.

Andres

1 Answer


Yes, there is.

If the variable is assigned statically, e.g. in an inline <script> block, you can just parse the HTML with e.g. Beautiful Soup, find that script block and extract the value from its text.
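A minimal sketch of the static case, assuming the page contains a literal assignment such as `dataLayer = [...];` whose contents happen to be valid JSON (real JavaScript object literals often are not: single quotes, unquoted keys or trailing commas will make `json.loads` fail). To stay dependency-free it swaps in the standard library's `html.parser` for Beautiful Soup; with bs4, `soup.find_all("script")` would collect the same elements. `ScriptCollector`, `DATALAYER_RE` and `extract_static_datalayer` are hypothetical names, not part of any library.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append.
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical pattern: `dataLayer = [ ... ];`, optionally prefixed with `window.`.
# It does not match `window.dataLayer = window.dataLayer || [];` or `dataLayer.push(...)`.
DATALAYER_RE = re.compile(r"(?:window\.)?dataLayer\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_static_datalayer(html: str):
    """Return the first statically assigned dataLayer array, or None."""
    parser = ScriptCollector()
    parser.feed(html)
    for script in parser.scripts:
        match = DATALAYER_RE.search(script)
        if match:
            # Assumption: the array literal is valid JSON.
            return json.loads(match.group(1))
    return None
```

If the site only ever calls `dataLayer.push(...)` from inline scripts, or builds the values in external files, this approach finds nothing and a headless browser is the more reliable route.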

More likely, though, the data is dynamically generated after the page loads (or in separate script blocks), so you'd need e.g. Playwright to automate a headless browser, and then read the variable from there.

Playwright example

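from playwright.sync_api import sync_playwright, BrowserContext


def get_datalayer(ctx: BrowserContext, url: str):
    """Open `url` in a new tab and return the page's window.dataLayer."""
    page = ctx.new_page()
    try:
        page.goto(url)
        # Wait until network activity settles so tag-manager scripts
        # have had a chance to push their entries.
        page.wait_for_load_state("networkidle")
        # The JS array is serialized and returned as Python lists/dicts.
        return page.evaluate("window.dataLayer")
    finally:
        page.close()


with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    with browser.new_context() as bcon:
        data_layer = get_datalayer(bcon, "https://www.berceaumagique.com/")
        print(data_layer)
    browser.close()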

This prints out (reformatted for readability, with some values elided):

[
    {
        "UtmSource": "",
        "EmailHash": "...",
        "NewCustomer": "0",
        "AcceptFunctionalCookie": "",
        "AcceptTargetingCookie": "",
        "IdUser": "",
        "Page": "home",
        "RealPage": "home",
        "urlElitrack": "...",
    },
    {"google_tag_params": {"ecomm_pagetype": "home"}},
    {"PageType": "HomePage"},
    {"EffinityPage": "home", "Session": "0", "NewCustomer": "0"},
    {"gtm.start": 1658135295957, "event": "gtm.js", "gtm.uniqueEventId": 1},
    {
        "event": "axeptio_update",
        "axeptio_authorized_vendors": [],
        "gtm.uniqueEventId": 19,
    },
    {"event": "gtm.dom", "gtm.uniqueEventId": 22},
    {"event": "gtm.js", "gtm.uniqueEventId": 23},
    {
        "event": "promotionsView",
        "ecommerce": {
            "promoView": {
                "promotions": [
                    {
                        "id": "slider-1",
                        "name": "rentree-scolaire",
                        "creative": "home slider",
                        "position": "1",
                    }
                ]
            }
        },
        "gtm.uniqueEventId": 24,
    },
    ...
]
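For the classification goal in the question, each page's list still has to become a single dataset row. A minimal sketch, assuming merged top-level and nested keys make useful features: the hypothetical `flatten_datalayer` helper folds all entries into one flat dict with dotted keys, lets later entries overwrite earlier ones with the same key, and skips GTM's internal `gtm.*` keys. Which keys are actually informative for your pages is an open question this does not answer.

```python
from typing import Any, Optional


def flatten_datalayer(
    data_layer: Optional[list],
    skip_prefixes: tuple = ("gtm.",),
) -> dict[str, Any]:
    """Merge dataLayer entries into one flat dict with dotted keys.

    Later entries overwrite earlier ones with the same key. Lists and
    other non-dict values are stored as-is.
    """
    features: dict[str, Any] = {}

    def walk(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            if key.startswith(skip_prefixes):
                continue  # drop GTM bookkeeping such as gtm.uniqueEventId
            full_key = f"{prefix}{key}"
            if isinstance(value, dict):
                walk(value, f"{full_key}.")
            else:
                features[full_key] = value

    for entry in data_layer or []:
        if isinstance(entry, dict):
            walk(entry, "")
    return features
```

A list of such dicts (one per URL) could then be vectorized, e.g. with scikit-learn's `DictVectorizer`; non-scalar values such as the `promotions` list, and the `event` key (which only keeps the last event seen), may need to be dropped or encoded separately first.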
AKX