I am trying to extract a substring from a whole field with OpenRefine. This is an extract of my dataset:
172. D3B: 23Y1-Up, 27Y1-Up (36 LK) 6-S/F Rollers, 4-D/F Rollers, 2-Carrier Rollers
179. D3C: 23Y2508-UP (37LK) 6-S/F, 4-D/F, 2-T/C
180. 27Y5050-UP (37LK) 6-S/F, 4-D/F, 2-T/C
181. 2XF622-UP (37LK) 6-S/F, 4-D/F, 2-T/C
182. 3RF0147-UP (36LK) 6-S/F, 4-D/F, 2-T/C
200. D4D:67A1-UP, 78A1-UP, 85A1-UP, 86A1-UP, 59J1-644, 58J1-UP, 49J1-473, 22C1-UP, 91A1-UP, 88A1-UP
I want to extract:
23Y1-Up, 27Y1-Up from record 172
23Y2508-UP from record 179
27Y5050-UP from record 180
and the whole 67A1-UP, 78A1-UP, 85A1-UP, 86A1-UP, 59J1-644, 58J1-UP, 49J1-473, 22C1-UP, 91A1-UP, 88A1-UP from record 200
So basically the rule would be to extract everything between the colon (:), if present, and the opening parenthesis, if present. Maybe restricting it to values that contain one or more occurrences of the string UP.
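To make the rule concrete, here is a rough Python sketch of the behaviour I am after. It is not GREL: the pattern and the function name extract_serials are only my own illustration, and it assumes every record starts with a number followed by a dot and that a colon, when present, comes before the first parenthesis.

```python
import re

# Hypothetical pattern, for illustration only (Python `re`, not GREL):
#   ^\d+\.\s*       leading record number, e.g. "172. "
#   (?:[^:(]*:)?    optional model prefix ending in a colon, e.g. "D3B:"
#   \s*([^(]*)      capture everything up to the first "(" or the end of the value
PATTERN = re.compile(r"^\d+\.\s*(?:[^:(]*:)?\s*([^(]*)")


def extract_serials(value):
    """Return the text between the optional colon and the optional '(' or None."""
    m = PATTERN.match(value)
    if m is None:
        return None
    serials = m.group(1).strip()
    # Optional restriction: keep only values that mention "UP" in any case,
    # since record 172 is written "Up" rather than "UP".
    return serials if "UP" in serials.upper() else None


if __name__ == "__main__":
    for row in [
        "172. D3B: 23Y1-Up, 27Y1-Up (36 LK) 6-S/F Rollers, 4-D/F Rollers, 2-Carrier Rollers",
        "180. 27Y5050-UP (37LK) 6-S/F, 4-D/F, 2-T/C",
    ]:
        print(extract_serials(row))
```

With the two sample rows above this should print 23Y1-Up, 27Y1-Up and 27Y5050-UP. What I am missing is how to express the same thing with value.match in OpenRefine.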
So I am adding a new column based on the existing column using value.match.
I tried to adapt some existing expressions to my case, but I am very far from succeeding despite multiple attempts.
I started with this regular expression: value.match(/\:?\s*(\w+\.?)+?.*/)[0]
which I thought would isolate any word AFTER the colon (and the space), but it only works with words BEFORE it...
Yesterday I successfully extracted the numbers before the LK, which are also relevant information for my dataset, but I can't work this one out.
Any help is much appreciated! Thanks