I have lines of text containing multiple variables which correspond to a specific entry.
I have been trying to use regular expressions, such as the one below, with mixed success (the lines are fairly standardised but do contain typos and inconsistencies):
re.compile('matching factor').findall(input)
I was wondering what the best way to approach this case is: what data structures to use, and how to loop through multiple lines of text. Here is a sample of the text, with the data I would like to scrape highlighted:
CHINA: National Grain Trade Centre: in auction of state reserves, govt. sold 70,418 t wheat (equivalent to 3.5% of total volume offered) at an average price of CNY2,507/t ($378.19) and 4,359 t maize (4.7%), at an average price of CNY1,290/t ($194.39). Separately, sold 2,100 t of 2013 wheat imports (1.5%) at CNY2,617/t ($394.25). 23 Oct
I would like to create a data set containing variables such as:
VOLUME - COMMODITY - PERCENTAGE SOLD - PRICE - DATE
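To make the target structure concrete, here is a minimal sketch of one possible approach. It assumes every sale follows the pattern "<volume> t <commodity> (<percentage>%) ... CNY<price>/t" and that the date sits at the end of the line, as in the sample above. The names SALE, DATE and parse_line are hypothetical, chosen only for illustration, and the pattern would likely need loosening to cope with the typos and inconsistencies mentioned.

```python
import re

# Sample line from the question, split only for readability.
line = (
    "CHINA: National Grain Trade Centre: in auction of state reserves, "
    "govt. sold 70,418 t wheat (equivalent to 3.5% of total volume offered) "
    "at an average price of CNY2,507/t ($378.19) and 4,359 t maize (4.7%), "
    "at an average price of CNY1,290/t ($194.39). Separately, sold 2,100 t "
    "of 2013 wheat imports (1.5%) at CNY2,617/t ($394.25). 23 Oct"
)

# Assumed shape of one sale: volume, commodity, bracketed percentage, CNY price.
SALE = re.compile(
    r"(?P<volume>\d[\d,]*)\s*t\s+"                    # e.g. "70,418 t "
    r"(?:of\s+)?(?P<commodity>[^(]+?)\s*"             # e.g. "wheat", "2013 wheat imports"
    r"\([^)]*?(?P<percentage>\d+(?:\.\d+)?)%[^)]*\)"  # e.g. "(equivalent to 3.5% ...)"
    r"[^()]*?CNY(?P<price>\d[\d,]*(?:\.\d+)?)/t"      # e.g. "CNY2,507/t"
)

# Assumed date format: day and three-letter month at the very end of the line.
DATE = re.compile(r"(\d{1,2}\s+[A-Z][a-z]{2})\s*$")


def parse_line(text):
    """Return one dict per sale found in a single line of text."""
    date_match = DATE.search(text)
    date = date_match.group(1) if date_match else None
    rows = []
    for m in SALE.finditer(text):
        rows.append({
            "volume": int(m.group("volume").replace(",", "")),
            "commodity": m.group("commodity").strip(),
            "percentage": float(m.group("percentage")),
            "price": float(m.group("price").replace(",", "")),
            "date": date,
        })
    return rows


for row in parse_line(line):
    print(row)
```

A list of dicts keeps one record per sale and shares the date across all sales on the same line. For several lines, the results can be collected with records.extend(parse_line(text)) inside a loop and then written out with csv.DictWriter or loaded into a pandas.DataFrame. Lines that yield no rows are worth logging separately, since they probably contain one of the inconsistencies.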