
I have a big string (1,116,902 characters long) that I want to process with a fairly simple regex. I get a response from a SOAP server that is encoded in base64, so I extract the content between the appropriate XML tags and then decode it.

This works for a small request. But when I get a big response back, the callback function of the replace() method is never called. I tested the string on the regex101 website and it finds the match, so I wonder if there is a limitation in my JavaScript engine. I'm working on Wakanda Server V10, which uses WebKit as its JavaScript engine. I cannot provide the string because it contains some enterprise information.

Here is my regex: /xsd:base64Binary">((.|\n)*?)<\/responseData>/

I thought it might be a special character that is not matched by the ((.|\n)*?) group. But then why does regex101 find the result? (So maybe it is the JavaScript engine.)

Can anybody help me?

Thanks

Ganbin
  • `(.|\n)*` causes so much backtracking that the engine quickly runs into **catastrophic backtracking**. With `(.|\n)*?` a timeout issue may occur with large strings where the ending delimiter cannot be found or is too far from the starting delimiter (a similar issue). When parsing XML or HTML, use an appropriate parser instead of regex. – Wiktor Stribiżew Dec 16 '15 at 14:54
  • So what should I do? If I use only `.*` it will not match across line breaks. Do you have any suggestions for how to capture the base64 response inside my XML structure? – Ganbin Dec 16 '15 at 14:58
  • So it is XML. Use an XML parser, here is [an example of parsing a string into an XML tree and getting an attribute value](http://stackoverflow.com/questions/32497417/how-to-make-regex-in-node-js-return-the-first-matching-group/32499583#32499583). Just amend it a bit to get the element value. There are other ways to parse XML, too, see other comments here. – Wiktor Stribiżew Dec 16 '15 at 14:59
  • Another approach that will probably perform better than both a regex and an XML parser is to use simple `String.indexOf` and `String.substring`. – Wagner DosAnjos Dec 16 '15 at 15:01
  • For that matter, if it's XML you may be able to convert it into a jQuery object and use `.find()` – Blazemonger Dec 16 '15 at 15:01
  • If you want a quick and dirty solution, get the locations of all of the `xsd:base64Binary` tags and all of the `responseData` tags, then do the replacements manually with `splice()`. Not sure what kind of performance you will get. – N. Leavy Dec 16 '15 at 15:01
  • Thanks for your answers, I will try to do it with the index of my response. I will come back when I have done the test. As for the DOM parser, I'm on the server side, so if I want to do this I will have to adapt jQuery to the server side. I have never done this, but I'm sure it is possible. – Ganbin Dec 16 '15 at 15:05
  • Thanks for pointing me in the right direction. I'm so stupid, I had not seen that I can easily do a `substr` of my string by searching for the index of my delimiter. – Ganbin Dec 16 '15 at 15:13
  • If you can guarantee that there are no tags between your start and end delimiter, which sounds like it might be the case, you could just change your RE to `/xsd:base64Binary">([^<]*)<\/responseData>/` which shouldn't require any backtracking and might work for you – N. Leavy Dec 16 '15 at 15:18
  • @N.Leavy Wow, thanks. That works great and fast. Can you explain more about this `[^<]`? – Ganbin Dec 16 '15 at 15:44
  • `[^<]` simply means everything but the '<' character. Since there shouldn't be any tags between the open and closing tags of your section (at least that's what I understand) that will accept everything until you hit your closing tag. The important thing is that the RE engine can tell immediately whether something matches that or not, so no branching or backtracking is required. – N. Leavy Dec 16 '15 at 19:20
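
The `String.indexOf` / `String.substring` suggestion from the comments could look roughly like the sketch below. This is only an illustration: `extractBetween` is a hypothetical helper name, and `sampleXml` is a made-up stand-in for the real SOAP response, which was never posted.

```javascript
// Hypothetical helper (not from the original thread): returns the text
// between the first `startDelim` and the next `endDelim`, or null when
// either delimiter is missing.
function extractBetween(text, startDelim, endDelim) {
  var start = text.indexOf(startDelim);
  if (start === -1) {
    return null;
  }
  start += startDelim.length; // move past the opening delimiter
  var end = text.indexOf(endDelim, start);
  if (end === -1) {
    return null;
  }
  return text.substring(start, end);
}

// Made-up sample response; the real payload would be far larger.
var sampleXml =
  '<responseData xsi:type="xsd:base64Binary">SGVsbG8s\nIFdvcmxkIQ==</responseData>';

var payload = extractBetween(sampleXml, 'xsd:base64Binary">', '</responseData>');
```

Because this only scans for two fixed substrings, there is no regex state to backtrack over, so the size of the payload should not matter beyond a linear scan.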

1 Answer


If you can guarantee that there are no tags between your start and end delimiter, which sounds like it might be the case, you could just change your RE to

/xsd:base64Binary">([^<]*)<\/responseData>/ 

which shouldn't require any backtracking and might work for you.

[^<] simply means everything but the < character. Since there shouldn't be any tags between the open and closing tags of your section (at least that's what I understand) that will accept everything until you hit your closing tag. The important thing is that the RE engine can tell immediately whether something matches that or not, so no branching or backtracking is required.
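
As a rough sketch of how this pattern might be used, the snippet below runs it against a synthetic multi-line payload. The names `extractPayload`, `bigPayload`, and `xml` are assumptions made up for the illustration, and the payload is generated filler rather than real data. Note that a negated class such as `[^<]` also matches line breaks, so no `(.|\n)` alternation is needed.

```javascript
// The pattern from the answer: capture everything up to the next '<'.
var re = /xsd:base64Binary">([^<]*)<\/responseData>/;

// Hypothetical wrapper: returns the captured payload, or null if the
// pattern does not match.
function extractPayload(xml) {
  var match = re.exec(xml);
  return match ? match[1] : null;
}

// Synthetic filler: 20000 repetitions of a 9-character chunk that ends in
// a line break, standing in for a large multi-line base64 response.
var bigPayload = new Array(20001).join('QUJDRA==\n');
var xml =
  '<responseData xsi:type="xsd:base64Binary">' + bigPayload + '</responseData>';

var extracted = extractPayload(xml);
```

This relies on the same guarantee the answer states: the payload must not contain a `<` character, otherwise the capture stops early and the match fails.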

N. Leavy
  • Many thanks. I learned more about regex. It was not so complicated, but I didn't think of matching everything except a character that I'm sure will never appear. – Ganbin Dec 17 '15 at 15:53