I'm just learning R for data science, and I used these few lines to extract numbers from a text column (using data.table):
library(stringr)
library(data.table)
prods[, weights := str_extract(NombreProducto, "([0-9]+)[kgKG]+")]
prods[, weights := str_extract(weights, "[0-9]+")]
prods[, weights := as.numeric(weights)]
Here's an example of the 'NombreProducto' field I want to extract numbers/text from:
"Tostado 210g CU BIM 1182"
Is there a succinct one-liner for this? I tried:
prods[, weights := str_match(NombreProducto, "([0-9]+)[kgKG]+")[2]]
but it set every row of the 'weights' column to the same value, the first result from the data.table. This is from the Grupo Bimbo Kaggle competition, by the way.
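
For reference, here is a minimal sketch of what appears to be happening with the indexing, run on a toy table. Only the first product name comes from the example above; the other two rows are made up for illustration. str_match() returns a character matrix (full match in column 1, first capture group in column 2), so [2] picks one single element that := then recycles down the column, whereas [, 2] picks the whole capture-group column.

```r
library(stringr)
library(data.table)

# Toy data: only the first string is from the post; the other two are
# hypothetical rows added to show a "kg" match and a non-match.
prods <- data.table(NombreProducto = c(
  "Tostado 210g CU BIM 1182",
  "Pan Blanco 1kg BIM 2000",
  "Sin Peso BIM 3000"
))

# str_match() returns a character matrix with one row per input string:
# column 1 is the full match, column 2 is the first capture group.
m <- str_match(prods$NombreProducto, "([0-9]+)[kgKG]+")

# m[2] is a single element (matrices are indexed in column-major order),
# so assigning it with := recycles that one value into every row.
single_value <- m[2]

# m[, 2] is the capture-group column: one value per row, NA where nothing matched.
prods[, weights := as.numeric(str_match(NombreProducto, "([0-9]+)[kgKG]+")[, 2])]

print(prods)
```

On this toy table the weights column should come out as 210, 1 and NA. The sketch keeps the original pattern unchanged, so grams and kilograms still end up in the same column without any unit conversion.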