
I want to select an element (a span or div tag) using XPath with the Symfony DomCrawler component:

$element->filterXPath('//span[text() = "SOMEtext"]')->text();

It works fine if the string contains no special characters. It does not work if the string contains accented characters, as in: Prénom, expérience, à toi, etc.

$element->filterXPath('//span[text() = "Référence"]')->text(); gives me an error.

Is there a way to filter on non-English text?

I tried many ways of converting the text into a Unicode-escaped string, but every one of them fails:

Référence
Référence
R\u00E9f\u00E9rence
R\u{00E9}f\u{00E9}rence
R\00E9 f\00E9 rence
R%C3%A9f%C3%A9rence
RU+00E9fU+00E9rence
R0xE9f0xE9rence
aspirinemaga

1 Answer


You didn't specify which XPath implementation you're using, and because filterXPath is not part of PHP's standard DOM extension, the first thing I'd check is encoding. Is your PHP script saved in the same encoding that the object expects for its input?
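One way to run that encoding check is to look at the raw bytes of the search string. The sketch below assumes the mbstring extension is available; `isUtf8` is a hypothetical helper written for this illustration, not part of DomCrawler. Note also that PHP only interprets the `\u{...}` escape inside double-quoted strings (PHP 7.0+), so an escape placed inside a single-quoted XPath expression reaches the XPath engine as literal backslash text.

```php
<?php
// Hypothetical helper for illustration: true if $s is a valid UTF-8 byte sequence.
function isUtf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

// "Référence" built from escapes, so the result does not depend on how this file is saved.
$utf8   = "R\u{00E9}f\u{00E9}rence"; // é encoded as the two bytes C3 A9
$latin1 = "R\xE9f\xE9rence";         // é encoded as the single byte E9 (ISO-8859-1)

var_dump(isUtf8($utf8));   // the UTF-8 form passes the check
var_dump(isUtf8($latin1)); // the Latin-1 form is not valid UTF-8
echo bin2hex($utf8), PHP_EOL; // hex dump to compare against the bytes in the HTML source
```

Running `bin2hex` on both the search string and the matching text taken from the fetched HTML makes a mismatch visible immediately, which is usually quicker than guessing at escape syntaxes.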

The second thing I'd try is the standard XPath implementation that ships with DOMDocument (DOMXPath), although other implementations exist as well.

$oDom = (new DOMImplementation())->createDocument(null, '');
// import your DOM here
$XPath = new DOMXPath($oDom);
// item(0) is the first matching <span>, or null if nothing matches
$XPath->query('//span[text() = "Référence"]')->item(0);
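For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea using `DOMDocument::loadHTML` instead of an imported DOM. The markup is made up for illustration, and the `<meta>` charset declaration is an assumption about how the page announces its encoding; without some such declaration, libxml may decode the bytes as Latin-1 and the comparison will fail.

```php
<?php
// Search word built from escapes so it is UTF-8 regardless of this file's encoding.
$word = "R\u{00E9}f\u{00E9}rence";

// Made-up sample page; the meta tag tells libxml that the bytes are UTF-8.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body><span>' . $word . '</span><span>other</span></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// DOMXPath compares against UTF-8 strings, so the predicate uses the UTF-8 word directly.
$node = $xpath->query(sprintf('//span[text() = "%s"]', $word))->item(0);

echo $node !== null ? $node->textContent : 'no match', PHP_EOL;
```

If this matches but the real page does not, the difference is most likely in the page's bytes rather than in the XPath expression.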
Code4R7
  • Sorry, I forgot to mention that I'm using a Symfony DomCrawler component (via composer require). I will try out your code now – aspirinemaga May 14 '17 at 19:06
  • The problem was in the HTML source code! Some of the accented characters had been stripped and some had not. Instead of `Référence` - I got `Rférence`. I don't understand why. – aspirinemaga May 14 '17 at 20:43
  • Thank you, I was able to find out the core of my problem while trying to use your code. – aspirinemaga May 14 '17 at 20:44