0

I have a JSONL file loaded as a text string. It's a very big file, so converting it to standard JSON isn't practical:

{"id":"gid:\/\/shopify\/ProductVariant\/32620848382088","__parentId":"gid:\/\/shopify\/Product\/4632300847240"}
{"id":"gid:\/\/shopify\/Product\/4632300912776"}
{"namespace":"daily_deals","key":"status","value":"inactive","__parentId":"gid:\/\/shopify\/Product\/4632300912776"}
{"namespace":"daily_deals","key":"endtime","value":"1604966400000","__parentId":"gid:\/\/shopify\/Product\/4632300912776"}
{"id":"gid:\/\/shopify\/ProductVariant\/32620848447624","__parentId":"gid:\/\/shopify\/Product\/4632300912776"}
{"id":"gid:\/\/shopify\/Product\/4632301011080"}
{"namespace":"daily_deals","key":"status","value":"inactive","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"namespace":"daily_deals","key":"endtime","value":"1604966400000","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"id":"gid:\/\/shopify\/ProductVariant\/32620848808072","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"id":"gid:\/\/shopify\/ProductVariant\/39402297720968","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"id":"gid:\/\/shopify\/Product\/4673135444104"}

I want to solve the problem on the frontend, so I need to use JavaScript. How can I use a regex to select only the rows that contain both "gid://shopify/Product/4632301011080" and "namespace":"daily_deals"? I need the whole row, from { to }, if it contains that text.

Is regex the best solution, or is there some other technique? Please suggest. The JSONL text file averages 10 MB, so I think it won't affect browser memory much.

UPDATE: All the rows I want to search start with {"namespace": and I want to ignore the other ones for performance reasons.

Aleks Per

3 Answers

2

Try this. Assuming the "namespace":"daily_deals" part always comes before "gid://shopify/Product/4632301011080", this regex will work:

^{"namespace":"daily_deals".*"gid:\/\/shopify\/Product\/4632301011080".*

See live demo.
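To apply a pattern like this in JavaScript, `^` needs the `m` flag to match at the start of each line, and `g` to return every matching line. The following is a minimal sketch, not part of the original answer: `escapeRegExp`, `gidPattern`, and `findDealRows` are hypothetical helper names, and the optional backslash before each slash is an assumption that the downloaded text may still contain the JSON-escaped form `\/`.

```javascript
// Hypothetical helper: escapes regex metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a pattern source for a GID that accepts both
// "/" and the JSON-escaped "\/" (assumption: the raw text may contain either).
function gidPattern(gid) {
  return escapeRegExp(gid).replace(/\//g, "\\\\?/");
}

// Hypothetical helper: returns every line that starts with the given
// namespace entry and contains the quoted product GID later on the same line.
function findDealRows(jsonl, productGid, namespace = "daily_deals") {
  const pattern = new RegExp(
    '^\\{"namespace":"' + escapeRegExp(namespace) + '".*"' +
      gidPattern(productGid) + '".*$',
    "gm" // m: ^ and $ work per line; g: collect all matching lines
  );
  return jsonl.match(pattern) ?? [];
}
```

Calling `findDealRows(text, "gid://shopify/Product/4632301011080")` returns the matching rows as strings, or an empty array when nothing matches.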

SaSkY
2

You could use this regex:

/^{"namespace":"daily_deals".*?"gid:\/\/shopify\/Product\/4632301011080".*/gm

let content = `{"id":"gid:\/\/shopify\/ProductVariant\/32620848382088","__parentId":"gid:\/\/shopify\/Product\/4632300847240"}
{"id":"gid:\/\/shopify\/Product\/4632300912776"}
{"namespace":"daily_deals","key":"status","value":"inactive","__parentId":"gid:\/\/shopify\/Product\/4632300912776"}
{"namespace":"daily_deals","key":"endtime","value":"1604966400000","__parentId":"gid:\/\/shopify\/Product\/4632300912776"}
{"id":"gid:\/\/shopify\/ProductVariant\/32620848447624","__parentId":"gid:\/\/shopify\/Product\/4632300912776"}
{"id":"gid:\/\/shopify\/Product\/4632301011080"}
{"namespace":"daily_deals","key":"status","value":"inactive","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"namespace":"daily_deals","key":"endtime","value":"1604966400000","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"id":"gid:\/\/shopify\/ProductVariant\/32620848808072","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"id":"gid:\/\/shopify\/ProductVariant\/39402297720968","__parentId":"gid:\/\/shopify\/Product\/4632301011080"}
{"id":"gid:\/\/shopify\/Product\/4673135444104"}`;

// The m flag makes ^ match at the start of every line; g returns all matching lines.
let result = content.match(/^{"namespace":"daily_deals".*?"gid:\/\/shopify\/Product\/4632301011080".*/gm);

console.log(result);
trincot
  • For my full text I got this error: "Your expression took too long to finish and was terminated. Please increase the timeout and try again. This may be an indication of catastrophic backtracking. To find out more and what this is, please read the following article: Runaway Regular Expressions" – Aleks Per Nov 09 '22 at 22:11
  • I made an update with `^` prefix and `m` flag. – trincot Nov 09 '22 at 22:15
  • Please check https://regex101.com/r/R8fJLl/1 – Aleks Per Nov 09 '22 at 22:16
  • Yes, update with `^` prefix and `m` flag. Also make sure your text does not have *literal* backslashes. They should be present if you have a JavaScript string literal, but not in the regex tool. – trincot Nov 09 '22 at 22:19
  • What other solution is possible instead of regex? Is there a more elegant one? – Aleks Per Nov 09 '22 at 22:24
  • I also know that all the rows I want to search start with {"namespace": so we can just ignore the others – Aleks Per Nov 09 '22 at 22:30
  • Well then my answer is not efficient. It would have been useful if your question included those specifics. The more we know about the input, the better the solutions can be. – trincot Nov 09 '22 at 22:31
  • I just updated my question: I need to look only at rows that start with {"namespace": and I want to ignore the other ones for performance reasons – Aleks Per Nov 09 '22 at 22:33
  • And maybe you can add how you read your file? – trincot Nov 09 '22 at 22:33
  • I read/download it as a string all at once, not row by row – Aleks Per Nov 09 '22 at 22:34
  • Updated the regex. I don't see any other solution that would beat that in speed. The next thing would be to read the file line by line to cut down on memory use. – trincot Nov 09 '22 at 22:46
  • Thanks a lot. One more thing: I don't want to store the text in a variable, because my original text is 10 MB, so I want to store it in local storage and use it from there instead. Will that save memory? – Aleks Per Nov 09 '22 at 23:27
  • In theory that will save memory, although the memory will already have been allocated before you write it to local storage, and then (if your string variable is out of scope) it is up to the garbage collector to really free the memory -- you don't have control over that. Moreover, whenever you need that text again, you'll need to read it into a variable again, taking up the memory again. – trincot Nov 10 '22 at 06:16
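The comments above ask whether there is an alternative to regex. One possibility, sketched here without any benchmarking against the regex answers, is to walk the string line by line, skip every line that does not start with `{"namespace":`, and parse only the remaining ones with `JSON.parse`. The function name `findMetafields` is hypothetical, and the sketch assumes each line that starts with that prefix is a complete, valid JSON object.

```javascript
// Hypothetical helper: scans JSONL text line by line without a regex.
function findMetafields(jsonl, productGid, namespace = "daily_deals") {
  const prefix = '{"namespace":';
  const matches = [];
  let start = 0;
  while (start < jsonl.length) {
    let end = jsonl.indexOf("\n", start);
    if (end === -1) end = jsonl.length;
    // Cheap prefix check first, so other lines are never sliced or parsed.
    if (jsonl.startsWith(prefix, start)) {
      // JSON.parse turns the escaped "\/" into "/", so the GID can be
      // compared in its plain form.
      const row = JSON.parse(jsonl.slice(start, end));
      if (row.namespace === namespace && row.__parentId === productGid) {
        matches.push(row);
      }
    }
    start = end + 1;
  }
  return matches;
}
```

Unlike the regex versions, this returns parsed objects rather than raw lines, so fields such as `key` and `value` can be read directly.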
2
/gid:\/\/shopify\/Product\/\d{1,}/
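Inside a JavaScript regex literal the slashes of this pattern have to be escaped. Note that it matches any product GID rather than selecting whole rows. As a rough sketch of what it can be used for, the hypothetical helper `listProductGids` below collects the distinct numeric product IDs found in the text. The optional backslashes are an assumption that the raw text may still contain the JSON-escaped `\/`.

```javascript
// Hypothetical helper: collects the distinct numeric product IDs in the text.
function listProductGids(jsonl) {
  // \\? tolerates the JSON-escaped form "\/" as well as a plain "/".
  // "ProductVariant" GIDs are not matched, because "Product" must be
  // followed directly by the slash.
  const pattern = /gid:\\?\/\\?\/shopify\\?\/Product\\?\/(\d+)/g;
  const ids = new Set();
  for (const match of jsonl.matchAll(pattern)) {
    ids.add(match[1]);
  }
  return [...ids];
}
```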