
I am trying to remove extra whitespace and newline characters from my GraphQL query, but the data between double quotes in the filter argument must remain intact.

Here's how the query is received on our Fastly CDN:

# input
{"query":"query OpName    {\n   itemCollection         (filter: { text: "aa aa     aa", text2: "aa             aa"}){\n    group         {      slug\n\n\n\n            text text2  } }   }"}

# expected output
{"query":"query OpName { itemCollection (filter: { text: "aa aa     aa", text2: "aa             aa"}){ group { slug text text2 } } }"}

The objective is to

  • Remove extra whitespace from the query
  • Keep the whitespace between double quotes intact inside the GraphQL query (the filter argument's value is used to match records in our database)

We have tried the following:

  • \s+(?=(?:['|%22](?:\\['|%22]|[^'|%22])+['|%22]|[^'|%22])+$) (taken from the Fastly docs)
  • \s+(?=([^"]*"[^"]*")*[^"]*$)

But neither seems to work.

Rahul Vaid

1 Answer


I suggest doing it in 3 steps:

  • save the filter temporarily;
  • make the replacements (delete the unwanted spaces);
  • add back the filter.

The required regexes become significantly simpler (and I guess faster, as you do not need look-aheads or other tricks).
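
A minimal sketch of these three steps, written in Python only for illustration (the target here is VCL, so treat it as pseudocode for the idea). The placeholder string and the function name are made up for the example, and the filter pattern assumes the filter contains no nested curly braces:

```python
import re

# Hypothetical placeholder; assumed never to occur in a real query.
PLACEHOLDER = "\x00FILTER\x00"

# Simple pattern for the filter argument; assumes no nested curly braces.
FILTER_RE = re.compile(r"filter: \{[^}]*\}")


def compact_with_placeholder(query: str) -> str:
    saved = FILTER_RE.search(query)
    if saved is None:
        # No filter found: just collapse the whitespace.
        return re.sub(r"\s+", " ", query)
    # Step 1: save the filter and put a placeholder in its place.
    stashed = query[:saved.start()] + PLACEHOLDER + query[saved.end():]
    # Step 2: collapse every run of whitespace to a single space.
    collapsed = re.sub(r"\s+", " ", stashed)
    # Step 3: put the saved filter back, untouched.
    return collapsed.replace(PLACEHOLDER, saved.group(0))
```

The placeholder contains no whitespace, so step 2 cannot alter it.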


Alternative:

Split the string in 3 parts:

  • before filter;
  • filter;
  • after filter.

Make the replacements in the "before" and "after" strings. Join the parts at the end.


The regex for splitting might look like:

(.*?)(filter: {[^}]*})(.*)

Rebuild the final string with something like:

removeSpaces(group1) + group2 + removeSpaces(group3)

removeSpaces() simply replaces \s+ with a single space. If you want to keep newlines (\n), replace " +" (a run of spaces) with a single space instead.
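
For illustration, the split-and-rejoin idea could look like this in Python (not VCL; the function names are my own). It assumes the filter has no nested curly braces. Note the "dot matches newline" flag: without it, (.*?) and (.*) stop at the first line break and the split fails on multi-line queries:

```python
import re

# before / filter / after; DOTALL lets "." match newlines as well.
SPLIT_RE = re.compile(r"(.*?)(filter: \{[^}]*\})(.*)", re.DOTALL)


def remove_spaces(text: str) -> str:
    # Collapse each run of whitespace (including newlines) to one space.
    return re.sub(r"\s+", " ", text)


def compact_by_split(query: str) -> str:
    match = SPLIT_RE.fullmatch(query)
    if match is None:
        # No filter argument: clean the whole string.
        return remove_spaces(query)
    before, filter_part, after = match.groups()
    # Only the parts around the filter are cleaned; the filter is kept as-is.
    return remove_spaces(before) + filter_part + remove_spaces(after)
```

In PCRE the equivalent of re.DOTALL is the s modifier, for example a leading (?s) in the pattern.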

I am not very familiar with your programming language, so I cannot provide the exact code, but you should be able to get the idea.

virolino
  • The argument for filter is a JSON object, so it can have any combination of keys and values. My only concern is that the values of the JSON keys are not distorted when the whitespace is removed. – Rahul Vaid Oct 06 '20 at 13:26
  • It does not matter. You isolate `filter` with `filter: {[^}]*}`. As long as the JSON does not contain curly braces, you are safe. BTW, what is the source of those strings with many spaces? You might be able to remove the spaces before the string is built, and thus save time and effort, and increase speed. – virolino Oct 06 '20 at 13:29
  • The filter's JSON could be like this as well: `{text: "text value", group: {key: "value", arr: ["a", 1]}}`. Thus, the regex provided won't fully work. The regex is in PCRE. Language is Varnish - VCL – Rahul Vaid Oct 06 '20 at 14:55
  • Does the JSON have only one level of curly braces, or can they be nested to any number of levels? – virolino Oct 08 '20 at 05:21
  • I fear that you are actually in a situation that needs proper parsers for SQL and for JSON. – virolino Oct 08 '20 at 05:21
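
For the nested-brace case raised in the comments, one option that stops short of a full parser is to ignore the braces and match either a complete double-quoted string or a run of whitespace, rewriting only the whitespace matches. A hedged Python sketch follows. The names are made up, it assumes the query has already been decoded from the request body, and it assumes ordinary double-quoted strings with backslash escapes. Whether this can be expressed in VCL is not verified here:

```python
import re

# Either a whole double-quoted string (with backslash escapes) or a
# run of whitespace. The string alternative is tried first, so
# whitespace inside quotes is consumed as part of the string.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|\s+')


def compact_outside_quotes(query: str) -> str:
    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Quoted strings are returned unchanged; whitespace runs
        # outside quotes become a single space.
        return token if token.startswith('"') else " "

    return TOKEN_RE.sub(replace, query)
```

Unlike the split approach, this also collapses whitespace inside the filter that lies outside quoted values, which is harmless for matching because only the quoted values are compared against the database.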