I need to split a narrative field (free text) into multiple rows. Format is currently along the lines of:
Case_Reference | Narrative
```````````````|`````````````````````````````````````
XXXX/XX-123456 | [Endless_Text up to ~50k characters]
Within the narrative field as text, individual entries (when various agents have done something to the case) begin with the entry date followed by two spaces (i.e. 'dd/mm/yyyy '
), with the values of the dates changing with each entry within that same field.
In other words, after trawling for a better delimiter, the only one I can use is this format of string, so I need to identify multiple positions within the Narrative text where the format (would mask be a better word?) matches 'dd/mm/yyyy '
.
I can identify multiple occurrences of a consistent string no problem, but it's identifying it where I'm essentially looking for:
'%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %'
PATINDEX
of course returns the first occurrence/position of this, but so far as I'm aware, there's no way to "modify" this (i.e. a created function) to allow for picking up the rest of the occurrences/positions of this they way we can with CHARINDEX
(since PATINDEX
doesn't have a starting position parameter).
For clarity, I'm not looking for code to delimit this directly as I need to further manipulate each entry, so it's purely the positions of multiple occurrences of the string within the Narrative text I'm looking for.
Any help would be very much appreciated.
For clarity, there's no option to do this pre-import, so it needs to be done on this landed data.
Desired output would be
Case_Reference1 | 1st_Position_of_Delimiter_String
Case_Reference1 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 1st_Position_of_Delimiter_String
Case_Reference2 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 3rd_Position_of_Delimiter_String