0

The DomCrawler component is powerful for parsing HTML content. Its documentation describes basic selections (like filter('body > p')) and more complex XPath expressions such as //span[contains(@id, "article-")].

Is it possible to fetch elements by regular expression? Is something like this available: filter('body')->filter('div.*-timeLabel-*')?

Danil Pyatnitsev

3 Answers

1

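Something like this? I modified one of the examples from the docs to apply an anonymous function.

$nodeValues = $crawler->filter('body div')->each(function (Crawler $node, $i) {
    // Return the class attribute when it matches the pattern, otherwise null
    $class = (string) $node->attr('class');
    return preg_match('/-timeLabel-/', $class) ? $class : null;
});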
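
If you want to keep the matching nodes themselves rather than their class strings, the same idea can be sketched with reduce(), which drops every node for which the closure returns false. This is only a minimal sketch: it assumes symfony/dom-crawler and symfony/css-selector are installed through Composer, and the sample markup and class names are made up for illustration.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup, not taken from the question.
$html = <<<'HTML'
<html><body>
  <div class="post-timeLabel-1">10:00</div>
  <div class="post-author">Alice</div>
  <div class="comment-timeLabel-2">10:05</div>
</body></html>
HTML;

$crawler = new Crawler($html);

// reduce() removes a node when the closure returns false, so return a strict boolean.
$matches = $crawler->filter('body div')->reduce(function (Crawler $node, $i) {
    return preg_match('/-timeLabel-/', (string) $node->attr('class')) === 1;
});

// Collect the text of the nodes that survived the regex test.
$times = $matches->each(function (Crawler $node, $i) {
    return $node->text();
});

print_r($times);
```

The result of reduce() is still a Crawler, so you can chain further filter() or filterXPath() calls on it.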
0

I'm not sure, but I think the answer is yes, because the crawler's filter method calls this method of CssSelectorConverter and, according to its documentation, you can pass a CSS expression as a parameter:

    /**
     * Translates a CSS expression to its XPath equivalent.
     *
     * Optionally, a prefix can be added to the resulting XPath
     * expression with the $prefix parameter.
     *
     * @param string $cssExpr The CSS expression
     * @param string $prefix  An optional prefix for the XPath expression
     *
     * @return string
     */
    public function toXPath($cssExpr, $prefix = 'descendant-or-self::')
    {
        return $this->translator->cssToXPath($cssExpr, $prefix);
    }
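
To see what the converter actually does with an expression, you can call it directly. The sketch below assumes symfony/css-selector is installed through Composer; it uses the CSS substring attribute selector [class*="..."], which is not a regular expression but covers the "contains this text" case from the question, and which the converter turns into an XPath contains() test.

```php
<?php
// Assumption: symfony/css-selector is installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\CssSelector\CssSelectorConverter;

$converter = new CssSelectorConverter();

// [class*="..."] is a CSS substring match on the attribute value, not a regex.
$xpath = $converter->toXPath('div[class*="-timeLabel-"]');

// Print the generated XPath expression to inspect it.
echo $xpath, PHP_EOL;
```

The same selector string can be passed to $crawler->filter() directly, since filter() goes through this conversion.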
0

In XPath 2.0, you can use matches():

$crawler->filterXPath("//div[matches(@id, '-timeLabel-')]");

If that is not available (PHP's DOM extension, which DomCrawler relies on, implements only XPath 1.0), your best bet is to combine other XPath functions. For example, this should do the trick for your case:

$crawler->filterXPath("//div[contains(@id, '-timeLabel-')]");
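
If you really need a full regular expression in XPath 1.0, one option is to expose preg_match() to the XPath engine with DOMXPath::registerPhpFunctions(). The sketch below uses plain DOMDocument and DOMXPath rather than the Crawler, because as far as I can tell filterXPath() does not offer a hook for registering PHP functions. It tests @class, following the selector in the question, and the sample markup and the pattern are assumptions made for illustration.

```php
<?php
// Uses only PHP's bundled DOM extension (ext-dom); no Composer packages needed.

// Hypothetical sample markup, not taken from the question.
$html = <<<'HTML'
<html><body>
  <div class="post-timeLabel-1">10:00</div>
  <div class="post-author">Alice</div>
  <div class="comment-timeLabel-22">10:05</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Make preg_match() callable from XPath under the php: prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// functionString passes @class as a string; keep the divs where the regex matches.
$query = '//div[php:functionString("preg_match", "/^\w+-timeLabel-\d+$/", @class) > 0]';

$times = [];
foreach ($xpath->query($query) as $div) {
    $times[] = trim($div->textContent);
}

print_r($times);
```

Only the functions you pass to registerPhpFunctions() become callable, which keeps the XPath expression from reaching arbitrary PHP code.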
Wissem