I'm attempting to scrape and clean Wikipedia data. One field contains pitch dimensions in many different formats, as shown below.
["112 x 76 yards (102.4m x 69.4m)", "104.5 x 70.3 m", "107m x 72m",
"109×73 yds / 100×67 m", "{{convert|105|x|68|m|yd|1}}", "100 metres by 70 metres"]
Extracting the dimensions is easy enough, but extracting the unit is rather difficult given how many variations of entries there are. What is the best way to approach this?
I have started with this regex:
"(\d+\.?\d*)"
This should extract all the numbers. I was then going to keep only the first two numeric matches and the first match of a unit ('m', 'metre', 'metres', 'y', 'yard', 'yds', 'yd', 'ft', ...), so that I can convert everything to metres later.
I am just unsure how to capture the first unit match.
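
For reference, one possible way to sketch this plan in Python is shown here. It is only a rough illustration under some assumptions: the names NUMBER_RE, UNIT_RE, UNIT_TO_METRES and parse_dimensions are made up for this example, the list of unit spellings is incomplete, and it assumes the first unit found in the string applies to the first two numbers.

```python
import re

# Matches integers and decimals such as "112" or "102.4".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

# Conversion factors to metres. The spellings listed here are an assumed,
# incomplete set and would need extending for real data.
UNIT_TO_METRES = {
    "m": 1.0, "metre": 1.0, "metres": 1.0, "meter": 1.0, "meters": 1.0,
    "y": 0.9144, "yd": 0.9144, "yds": 0.9144, "yard": 0.9144, "yards": 0.9144,
    "ft": 0.3048, "foot": 0.3048, "feet": 0.3048,
}

# Longest spellings are tried first. The lookarounds stop a unit from
# matching inside another word, e.g. the "y" in "by".
_units = sorted(UNIT_TO_METRES, key=len, reverse=True)
UNIT_RE = re.compile(
    r"(?<![a-z])(?:" + "|".join(_units) + r")(?![a-z])",
    re.IGNORECASE | re.ASCII,
)


def parse_dimensions(text):
    """Return (length, width) in metres, or None if parsing fails."""
    numbers = NUMBER_RE.findall(text)
    unit = UNIT_RE.search(text)  # search() stops at the first unit found
    if len(numbers) < 2 or unit is None:
        return None
    factor = UNIT_TO_METRES[unit.group(0).lower()]
    length, width = (float(n) * factor for n in numbers[:2])
    return round(length, 2), round(width, 2)


if __name__ == "__main__":
    samples = [
        "112 x 76 yards (102.4m x 69.4m)", "104.5 x 70.3 m", "107m x 72m",
        "109×73 yds / 100×67 m", "{{convert|105|x|68|m|yd|1}}",
        "100 metres by 70 metres",
    ]
    for s in samples:
        print(s, "->", parse_dimensions(s))
```

The key point is that re.search returns only the first match, so there is no need to collect every unit and discard the rest. Entries where the first unit does not belong to the first two numbers would still need special handling.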