
I have a huge file made up of lines like these:

This line is wrong, because the NAME after the first code, as in (20000000) NAME, does not recur in the other fragments of the line (example 1):

;100000;(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml, (26700000) Face rinse gel Avene 75 ml, (26000000) Face tonic Alcina Skin Manager AHA Effect 50 ml, (30000000) Moisturing face lotion Tony Moly The Chok Chok Green Tea 160 ml, (31000000) Cleansing micel water Jowae Micellar Cleansing Water 400 ml

This line is correct, because the names after all the codes are the same (example 2):

;100001;(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml, (20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml, (20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml, (20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml

The fragments on each line of the file are separated by codes like (888888888). Each code is followed by a name of 1-5 words, and on a correct line that name is the same in all fragments.

The goal is to find all lines in which every fragment has the same name.

For this I used the following regexp (where "Face wash" is the NAME):

^;([0-9]{5,12};(\([0-9]{6,12}\).Face wash.*){1,20})$

but it also finds lines where the name appears only in the first fragment.

I think the wrong part of the regexp is .*
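The suspicion about .* can be confirmed with a short Python sketch (Python is used here only for illustration; the question runs the pattern in an editor). It applies the question's pattern unchanged to shortened copies of both examples and checks how much of the line a single repetition of the fragment group covers:

```python
import re

# The pattern from the question, copied unchanged.
ORIGINAL = re.compile(r"^;([0-9]{5,12};(\([0-9]{6,12}\).Face wash.*){1,20})$")

# Shortened copies of the question's examples (first two fragments only).
example1 = (";100000;(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml, "
            "(26700000) Face rinse gel Avene 75 ml")
example2 = (";100001;(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml, "
            "(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml")

for name, line in (("example 1", example1), ("example 2", example2)):
    m = ORIGINAL.match(line)
    # Everything after the ";<id>;" prefix.
    tail = line.split(";", 2)[2]
    # Group 2 holds the last repetition of the fragment group. If it equals
    # the whole tail, only one repetition happened: the greedy .* swallowed
    # every later fragment, so only the first one was compared to "Face wash".
    covers_all = m is not None and m.group(2) == tail
    print(name, "matched:", m is not None, "| one repetition covers all:", covers_all)
```

Both lines match, and in both cases one repetition spans the rest of the line, so the {1,20} quantifier never gets to inspect the later fragments.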

How should I change .* so that the regexp finds lines whose fragments all have the same name (like example 2), but not lines whose fragments have different names (like example 1)?

P.S. The comma (,) is an unreliable delimiter. The only reliable delimiter is the number in brackets, e.g. (35465468), that is followed by the NAME.
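One way to compare the names without hard-coding "Face wash" is a backreference: capture the NAME after the first code, then require every later fragment to repeat exactly that text with \1. The Python sketch below is a hypothetical pattern, not taken from the question or the answer, and it assumes a NAME never contains "(" (so the name always ends before the next "(digits)" code) and that only an optional comma and whitespace sit between fragments:

```python
import re

# Hypothetical generic pattern:
#   ^;\d+;                   the ";<id>;" prefix
#   \(\d+\)\s*([^(]+?)       first code, then capture its NAME as group 1
#   (?:,?\s*\(\d+\)\s*\1)*   each later fragment: optional comma, a code, and
#                            exactly the same text as group 1 (backreference)
#   \s*$                     nothing else may follow
# Assumption: a NAME never contains "(", so it ends before the next code.
SAME_NAME = re.compile(r"^;\d+;\(\d+\)\s*([^(]+?)(?:,?\s*\(\d+\)\s*\1)*\s*$")


def has_same_names(line):
    """True if every fragment on the line repeats the first fragment's NAME."""
    return SAME_NAME.match(line) is not None


fragment = "(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml"
example1 = (";100000;" + fragment + ", (26700000) Face rinse gel Avene 75 ml, "
            "(26000000) Face tonic Alcina Skin Manager AHA Effect 50 ml, "
            "(30000000) Moisturing face lotion Tony Moly The Chok Chok Green Tea 160 ml, "
            "(31000000) Cleansing micel water Jowae Micellar Cleansing Water 400 ml")
example2 = ";100001;" + ", ".join([fragment] * 4)

print(has_same_names(example1))  # False
print(has_same_names(example2))  # True
```

The same idea may carry over to an editor search if its engine supports backreferences such as \1 (PCRE-style engines do), though this has not been tested in any particular editor here. Names are compared as exact text, so a single extra space or different capitalization counts as a different name.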

  • This question is hard to understand. I would recommend changing it. It would be better if you posted 5-10 lines of the file (as a block) and said you want lines (for example) 1, 3, 7, and 10 to be matched because... and explained the reasoning, and then explained why lines 2, 4, 5, 6, 8, and 9 should not match. That would get suggestions on whether a regex solution is possible or whether you need a programming language to loop through the file. – MDR Jul 12 '21 at 16:42
  • I don't think a regex and find/replace operation using just Kate is going to be possible. In any case, you say you want to match the words after the code numbers. Since the items on the 'correct' line in your example all have the code '20000000', why can you not use the code number to check that they are all the same? That would be better, because if you are using the words after the code for a match in a large file, how do you know whether you need to match one word on one line and several words on another after the code number? It sounds a little messy. – MDR Jul 12 '21 at 18:48
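If a regex alone turns out to be awkward, the loop-through-the-file route suggested in the comments could look roughly like this Python sketch. It relies on the question's P.S. (the "(digits)" code is the only reliable delimiter), splits each line on those codes, and keeps a line only when all extracted names are identical. The function names and the file name huge_file.txt are hypothetical:

```python
import re

# Per the question's P.S., the only reliable delimiter is a "(digits)" code.
CODE = re.compile(r"\(\d+\)")


def fragment_names(line):
    """Return the NAME that follows each (digits) code on a ';<id>;...' line."""
    # Drop the trailing newline and the leading ';<id>;' prefix.
    body = line.rstrip("\n").split(";", 2)[-1]
    # Splitting on the codes leaves the text before the first code at index 0.
    pieces = CODE.split(body)[1:]
    # Remove surrounding spaces and the optional ',' separator after each name.
    return [p.strip().rstrip(",").strip() for p in pieces]


def all_same_name(line):
    """True if the line has at least one fragment and all names are equal."""
    names = fragment_names(line)
    return len(names) > 0 and len(set(names)) == 1


def matching_lines(lines):
    """Yield only the lines whose fragments all carry the same NAME."""
    for line in lines:
        if all_same_name(line):
            yield line


# Hypothetical usage on a file named huge_file.txt, streaming one line at a time:
# with open("huge_file.txt", encoding="utf-8") as fh:
#     for line in matching_lines(fh):
#         print(line, end="")
```

Because the file object is iterated lazily, memory use stays flat even for a very large file. Unlike a regex, this version also tolerates names that contain "(" as long as it is not followed by digits and ")".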

1 Answer


Sorry for the unclear question. I found a solution:

^;[0-9]+;(\([0-9]+\)[0-9a-zA-Z,\/ ]*Face wash[0-9a-zA-Z,\/ ]*){1,24}$
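As a quick check (Python used only for illustration), the answer's pattern can be run against the two full example lines. It works because the character class [0-9a-zA-Z,\/ ] does not include "(", so one repetition of the group cannot run past the next "(digits)" code, and each fragment is tested for "Face wash" separately. Two caveats: it only checks that every fragment *contains* the fixed text Face wash, so a line like (1) Face wash A, (2) Face wash B also matches; and any character outside the class (a hyphen, a dot, an accented letter) makes a line fail. The {1,24} quantifier also caps a line at 24 fragments.

```python
import re

# The answer's pattern, copied unchanged; "Face wash" is the fixed NAME.
ANSWER = re.compile(r"^;[0-9]+;(\([0-9]+\)[0-9a-zA-Z,\/ ]*Face wash[0-9a-zA-Z,\/ ]*){1,24}$")


def answer_matches(line):
    # The class [0-9a-zA-Z,\/ ] has no "(", so each repetition of the group
    # is confined to one fragment and must contain "Face wash" on its own.
    return ANSWER.match(line) is not None


fragment = "(20000000) Face wash su Acai uogomis Ziaja Jagody Acai 200 ml"
example1 = (";100000;" + fragment + ", (26700000) Face rinse gel Avene 75 ml, "
            "(26000000) Face tonic Alcina Skin Manager AHA Effect 50 ml, "
            "(30000000) Moisturing face lotion Tony Moly The Chok Chok Green Tea 160 ml, "
            "(31000000) Cleansing micel water Jowae Micellar Cleansing Water 400 ml")
example2 = ";100001;" + ", ".join([fragment] * 4)

print(answer_matches(example1))  # False
print(answer_matches(example2))  # True
```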