I am using the XPath Extractor in JMeter to extract links from an HTML response. Some of the links include umlauts (I know that is bad style).

In the response to the HTTP request, the umlaut looks fine: `a href="/ich-moechte/für-das-alter-vorsorgen"`

But the XPath Extractor creates the following variable, in which the umlaut is percent-encoded: `subLink_12=/ich-moechte/f%FCr-das-alter-vorsorgen`

I compared the results with the Regular Expression Extractor, which delivers the correct result: `subLinkold_46=/ich-moechte/für-das-alter-vorsorgen`

I also tried storing the HTTP response in a variable and using that variable in the XPath Extractor, but this gave the same result.

1 Answer

I cannot reproduce your issue using the latest JMeter 5.5 and the XPath Extractor.

Could it be that your Java locale doesn't support national characters? Check the value of the `file.encoding` system property using the Debug Sampler, and if it is not suitable for umlauts, switch it to the correct one (or, even better, use UTF-8).

More information: Setting java locale settings
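As a rough sketch of the check described above outside the Debug Sampler, the following standalone Java snippet reports the encoding the JVM is using. The class name `EncodingCheck` and the method `describe` are made up for illustration. It assumes nothing about JMeter itself; the same two calls could presumably be used from a JSR223 element as well.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of JMeter: reports the JVM's encoding settings.
final class EncodingCheck {

    // Builds a one-line summary of the file.encoding property and the default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported value is not what you expect, the standard JVM option `-Dfile.encoding=UTF-8` is the usual way to override it at startup.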

Using XPath to get data from HTML might not be the best idea. Have you considered switching to the CSS Selector Extractor instead?

Dmitri T
  • I have now also used the dummy sampler for the test (thanks!). And there was exactly one difference in your script. I used the tidy option. When I turn it off, it shows the correct results!!! In other words, when I turn on the tidy option, the xpath extractor has a problem with umlauts. I will also check CSS Selector Extractor. – Frank Ode Jun 06 '23 at 13:13
  • With Tidy unfortunately you won't be able to amend the behavior without rebuilding JMeter from source as [output encoding is set to UTF-8](https://github.com/apache/jmeter/blob/rel/v5.5/src/core/src/main/java/org/apache/jmeter/util/XPathUtil.java#L254) while you need `windows-1252`. You can either [create an enhancement request in JMeter](https://github.com/apache/jmeter/issues) or switch to [JSR223 PostProcessor](https://jmeter.apache.org/usermanual/component_reference.html#JSR223_PostProcessor) with [Groovy](https://www.blazemeter.com/blog/apache-groovy) if you really need XPath/GPath – Dmitri T Jun 06 '23 at 15:11
  • In a JSR223 PostProcessor with Groovy I added the following workaround: `String newLink = vars.get("subLink_" + i).replaceAll("%E4","ä").replaceAll("%F6","ö").replaceAll("%FC","ü").replaceAll("%C4","Ä").replaceAll("%D6","Ö").replaceAll("%DC","Ü").replaceAll("%DF","ß")` Is that the option you meant? – Frank Ode Jun 07 '23 at 07:09
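
A possible generalisation of the `replaceAll` chain above, as a minimal sketch: instead of listing each umlaut, decode every percent escape in one pass with `java.net.URLDecoder`. This assumes the escapes are single-byte `windows-1252` values, as with `%FC` in the question. The class name `LinkDecoder` and the method `decodeLink` are hypothetical. Because `URLDecoder` also turns `+` into a space, the sketch escapes literal `+` characters first.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of JMeter.
final class LinkDecoder {

    // Assumption: escapes such as %FC are single-byte windows-1252 values.
    static final String ASSUMED_CHARSET = "windows-1252";

    // Decodes all percent escapes in the link; literal '+' characters are preserved.
    static String decodeLink(String link) {
        if (link == null) {
            return null;
        }
        try {
            // URLDecoder maps '+' to a space, so protect literal '+' before decoding.
            return URLDecoder.decode(link.replace("+", "%2B"), ASSUMED_CHARSET);
        } catch (UnsupportedEncodingException e) {
            // Only reached if the JVM does not provide the assumed charset.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeLink("/ich-moechte/f%FCr-das-alter-vorsorgen"));
    }
}
```

In a JSR223 PostProcessor the decoded value could then be written back with `vars.put`. Note that `URLDecoder.decode` throws `IllegalArgumentException` on malformed escapes, and that it decodes every escape, including ones such as `%2F` that may be intentional in a URL.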