I have an input of lines of text, some of which contain relevant geodata for objects. I want to identify the relevant lines by matching a given prefix, which marks the coordinates that follow as belonging to the desired geo-object.
It may look similar to the following:
/line with irrelevant prefix
/line with irrelevant prefix, potentially also containing coordinates
[relevant prefix][bunch of characters][1 to X coordinates in the form "lat: X.XXXXXX, lon:X.XXXXXX"][bunch of other characters][other potentially relevant information]
/line with irrelevant prefix
/line with irrelevant prefix
The data I want to extract is a sequence of coordinates (a LineString), from which I generate an object representing the LineString for further use in my C# code. Additional attributes (a name or an ID, for example) might be relevant too. However, I also want to reject lines that contain coordinates but do not start with my relevant prefix.
From what I understand, I can use named capturing groups in Regexes to get substrings as "variables" from the relevant lines, like this (please do not mind the imprecise format for the coordinate):
(lat=(?<lat>\d{0,2}[.,]\d+)), (lon=(?<lon>\d{0,2}[.,]\d+))
However, as far as I can see, I cannot have my expressions match an arbitrary number of coordinates in the line (since I do not know the length of the LineString object) and at the same time have the expression match the prefix pattern.
Is there a solution to have the expression match the prefix, have named capturing groups for the arbitrarily many pairs of coordinates, and also have additional named capturing groups for potentially relevant additional variables?
Suppose I have the following line:
PREFIXafkzh(lat=34.42344, lon=23.6346jsdfkh,lat=2.4234, lon=12.124)
I have tried the following regex:
(lat=(?<lat>\d{0,2}[.,]\d+)), (lon=(?<lon>\d{0,2}[.,]\d+))
This matches all coordinate pairs, but it does not check for the prefix.
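To make that behaviour concrete, here is a minimal sketch of how the un-prefixed pattern can be run against the sample line with `Regex.Matches`. The variable names (`line`, `pairPattern`, `pairs`) are my own, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample line from above.
string line = "PREFIXafkzh(lat=34.42344, lon=23.6346jsdfkh,lat=2.4234, lon=12.124)";

// The un-prefixed pattern, unchanged.
var pairPattern = new Regex(@"(lat=(?<lat>\d{0,2}[.,]\d+)), (lon=(?<lon>\d{0,2}[.,]\d+))");

// Regex.Matches yields one Match per coordinate pair; the named groups
// give access to the lat and lon substrings of each match.
var pairs = new List<(string Lat, string Lon)>();
foreach (Match m in pairPattern.Matches(line))
{
    pairs.Add((m.Groups["lat"].Value, m.Groups["lon"].Value));
}

foreach (var (lat, lon) in pairs)
{
    Console.WriteLine($"lat={lat}, lon={lon}");
}
// Prints:
// lat=34.42344, lon=23.6346
// lat=2.4234, lon=12.124
```

The same two pairs would be returned for a line without the prefix, which is exactly what I want to avoid.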
I then added the prefix and tried these two variants:
(PREFIX).*(lat=(?<lat>\d{0,2}[.,]\d+)), (lon=(?<lon>\d{0,2}[.,]\d+))
(PREFIX).*((lat=(?<lat>\d{0,2}[.,]\d+)), (lon=(?<lon>\d{0,2}[.,]\d+)))+
Both of these will only match the last coordinate in the line.
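The sketch below reproduces this with the second prefixed pattern and contrasts it with one direction that might be worth checking: moving the filler inside the repeated group, making it lazy, and reading `Group.Captures`, which (as far as I know, this is specific to the .NET regex engine) keeps every capture of a repeated group rather than only the last one. The lazy pattern and the variable names are my own assumptions, not a verified solution, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "PREFIXafkzh(lat=34.42344, lon=23.6346jsdfkh,lat=2.4234, lon=12.124)";

// Second prefixed attempt: the greedy .* runs to the end of the line and
// backtracks only far enough for a single pair to match.
Match greedy = Regex.Match(line,
    @"(PREFIX).*((lat=(?<lat>\d{0,2}[.,]\d+)), (lon=(?<lon>\d{0,2}[.,]\d+)))+");
Console.WriteLine(
    $"greedy: {greedy.Groups["lat"].Captures.Count} capture(s), lat={greedy.Groups["lat"].Value}");
// Prints: greedy: 1 capture(s), lat=2.4234

// Hypothetical variant: anchor the prefix, then repeat "lazy filler + pair".
// Each repetition adds one entry to the Captures collection of each group.
string lazyPattern =
    @"^PREFIX(?:.*?lat=(?<lat>\d{0,2}[.,]\d+), lon=(?<lon>\d{0,2}[.,]\d+))+";
Match lazy = Regex.Match(line, lazyPattern);

CaptureCollection lats = lazy.Groups["lat"].Captures;
CaptureCollection lons = lazy.Groups["lon"].Captures;
for (int i = 0; i < lats.Count; i++)
{
    Console.WriteLine($"lazy: lat={lats[i].Value}, lon={lons[i].Value}");
}
// Prints:
// lazy: lat=34.42344, lon=23.6346
// lazy: lat=2.4234, lon=12.124

// A line with coordinates but without the prefix is rejected by the anchor.
bool rejected = Regex.IsMatch("/other(lat=1.5, lon=2.5)", lazyPattern);
Console.WriteLine($"non-prefixed line matches: {rejected}");
// Prints: non-prefixed line matches: False
```

If this holds up, further named groups for a name or an ID could presumably be appended after the repeated group, but I have not tested that against my real data.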