I've been reading the documentation for the OpenRefine Python client (https://github.com/OpenRefine/refine-client-py), but the link to "David Huynh's Refine tutorial" seems to be broken.
From my Python code, I would like to…
I am creating a new project in OpenRefine version 2.6-rc.2 and loading a CSV file with 3185 rows. The file is small (342 KB). Everything seems to go fine (no errors or malformed columns), except that I end up with 3155 records: 30 records disappeared…
I'd like to automatically number a column, similar to Excel, where I can type "1" in one cell and the cells below it are automatically numbered 2, 3, 4, 5, etc. I don't know why I'm having so much trouble figuring out this function in OpenRefine, but…
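Outside OpenRefine, an Excel-style fill series is just a running counter. The Python sketch below assumes the rows are already loaded as lists; the helper name `add_row_numbers` and the sample rows are hypothetical. Inside OpenRefine, GREL's `row.index` is zero-based, so an expression such as `row.index + 1` in a new column should give a similar 1, 2, 3 sequence.

```python
def add_row_numbers(rows, start=1):
    """Return new rows, each prefixed with a sequential number (hypothetical helper)."""
    # enumerate() supplies the counter, the same way Excel fills 1, 2, 3 downward.
    return [[n] + list(row) for n, row in enumerate(rows, start=start)]


rows = [["name a"], ["name b"], ["name c"]]  # made-up sample data
numbered = add_row_numbers(rows)
print(numbered)  # [[1, 'name a'], [2, 'name b'], [3, 'name c']]
```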
I have a large number of rows in a CSV file that look like this:
name a,1
name b,1
name c,1
name d,2
name e,2
I need to concatenate the names of rows that share the same number. The result should be:
name a|name b|name c
name d|name e
How can I do this in Google Refine or in…
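A minimal Python sketch of the grouping, assuming each line has exactly two comma-separated fields (name, number) and that names sharing a number should be joined in file order. It is plain Python rather than an OpenRefine operation, and `join_by_key` is a hypothetical helper name.

```python
import csv
import io


def join_by_key(lines, sep="|"):
    """Group names by their number column and join each group with `sep`."""
    groups = {}  # dicts keep insertion order (Python 3.7+), so output follows the file
    for name, key in csv.reader(lines):
        groups.setdefault(key, []).append(name)
    return {key: sep.join(names) for key, names in groups.items()}


data = "name a,1\nname b,1\nname c,1\nname d,2\nname e,2\n"
result = join_by_key(io.StringIO(data))
for joined in result.values():
    print(joined)
# name a|name b|name c
# name d|name e
```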
I have a large data set that I need to clean with OpenRefine.
I am not good with regex, and I can't think of a way to get what I want,
which is to extract a text string between quotes that includes many special characters such as " ' / \ # @ -
In…
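One way to cope with the special characters is to describe what may not appear inside the quotes instead of listing what may. The Python sketch below assumes the delimiters are straight double quotes and that any double quote inside the text is escaped with a backslash; both are assumptions about the data, and `extract_quoted` is a hypothetical name.

```python
import re

# A double-quoted span: any character other than " or \, or a backslash
# followed by any character (so \" does not end the match).
# Characters such as ' / # @ - need no special handling.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_quoted(text):
    """Return the contents of every double-quoted span in `text`."""
    return QUOTED.findall(text)


sample = 'id=7 note="O\'Brien / #tag @user - \\"ok\\"" end'
print(extract_quoted(sample))
```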
I am new to using OpenRefine, and I cannot figure out how to split a multi-valued cell on each character in the cell. For example, I cannot split a cell with the value "mod" into three rows: one with "m", one with "o", and one with "d".
When the data has a…
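Because a Python string is itself a sequence of characters, splitting "mod" into m, o and d needs no separator at all. A minimal sketch outside OpenRefine, assuming rows are dicts and that every other field should be copied onto each new row; `explode_characters` is a hypothetical name.

```python
def explode_characters(rows, column):
    """Yield one row per character of row[column], copying the other fields."""
    for row in rows:
        # Iterating over a str yields its characters; an empty cell yields no rows.
        for ch in row[column]:
            yield {**row, column: ch}


rows = [{"id": 1, "value": "mod"}]
for new_row in explode_characters(rows, "value"):
    print(new_row)
# {'id': 1, 'value': 'm'}
# {'id': 1, 'value': 'o'}
# {'id': 1, 'value': 'd'}
```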
Is there a way to tell uniques() to ignore case?
I have a GREL expression like this:
forEach(value.split(","),v,v.trim()).uniques().join(",")
This takes each comma-separated value in the cell, trims it, and returns the unique value(s) in that cell. It works…
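If losing the original capitalization is acceptable, lowercasing each item before uniques() (GREL has a toLowercase() function) is the smallest change to the expression above. Keeping the first spelling seen instead requires the dedupe to compare a case-folded key; the Python sketch below shows that logic. It mirrors the split, trim and join steps of the GREL expression but is not GREL, and `unique_ignore_case` is a hypothetical name.

```python
def unique_ignore_case(cell, sep=","):
    """Split on `sep`, trim, and drop case-insensitive duplicates, keeping the first spelling."""
    seen = set()
    kept = []
    for part in cell.split(sep):
        item = part.strip()
        key = item.casefold()  # casefold() is a more aggressive lower() meant for comparisons
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return sep.join(kept)


print(unique_ignore_case("Apple, apple,Banana, APPLE"))  # Apple,Banana
```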
The table Locations has the following items:
The problem is that some rows are "semi-repeated" (all the elements are equal except for the attribute attb, which is an integer). I want to delete all repeated rows and append all the…
I'm using OpenRefine to clean about 300 records and have some HTML text containing multiple paragraph tags with a specific class (class="essay-header") that wrap text I'd like to convert to h2 tags. What GREL would I need to use to…
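A minimal Python sketch of the substitution, assuming the headers appear exactly as <p class="essay-header">…</p> with no other attributes and no nested <p> tags. A regular expression is fragile for arbitrary HTML, so treat this as a sketch for consistently generated markup; `headers_to_h2` is a hypothetical name.

```python
import re

# Non-greedy (.*?) stops at the first closing </p>; DOTALL lets headers span lines.
ESSAY_HEADER = re.compile(r'<p class="essay-header">(.*?)</p>', re.DOTALL)


def headers_to_h2(html):
    """Rewrite each <p class="essay-header">text</p> as <h2>text</h2>."""
    return ESSAY_HEADER.sub(r"<h2>\1</h2>", html)


html = '<p class="essay-header">Intro</p><p>Body</p>'
print(headers_to_h2(html))  # <h2>Intro</h2><p>Body</p>
```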
I have a dataset like this, and I'm looking for a way to add a category based on the kind of product.
Can I search for Apple + Orange and assign them to a category named Fruits, and similarly search for Milk + Wine and assign them to another…
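The underlying idea is a lookup table from product to category. A Python sketch, assuming each cell holds a single product name; "Fruits" comes from the question, while the category for Milk + Wine is cut off in the excerpt, so `OtherCategory` is a hypothetical placeholder, as is the helper name `categorize`.

```python
# Product -> category lookup; keys are lowercase so matching ignores case.
CATEGORIES = {
    "apple": "Fruits",
    "orange": "Fruits",
    "milk": "OtherCategory",  # hypothetical name: the real one is cut off
    "wine": "OtherCategory",  # hypothetical name: the real one is cut off
}


def categorize(product, default=""):
    """Return the category for `product`, or `default` if it is not listed."""
    return CATEGORIES.get(product.strip().lower(), default)


for product in ["Apple", "Wine", "Bread"]:
    print(product, "->", categorize(product) or "(none)")
```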
I have a row of data, shown below. The problem is that it is exported as three rows in the CSV file; how do I export it as one row?
________________________________________________________________________________________
|id …
I have data in two columns like this:
Id Value
1 a
2 f
1 c
1 h
2 a
and I'd like to pair the values in the 'Value' column in all possible combinations within the same Id, such as:
(a,c)
(a,h)
(c,h)
(f,a)
Is there any R or Python or…
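Since the question mentions Python: group the values by Id, then take every unordered pair within each group with itertools.combinations. A minimal sketch using the sample data from the question; `pairs_by_id` is a hypothetical name.

```python
from itertools import combinations


def pairs_by_id(records):
    """Return every unordered pair of values that share the same Id."""
    groups = {}  # Id -> values, in the order they appear
    for id_, value in records:
        groups.setdefault(id_, []).append(value)
    return [pair for values in groups.values() for pair in combinations(values, 2)]


data = [(1, "a"), (2, "f"), (1, "c"), (1, "h"), (2, "a")]
print(pairs_by_id(data))  # [('a', 'c'), ('a', 'h'), ('c', 'h'), ('f', 'a')]
```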
Inside OpenRefine, I want to run the regex below on a website's source to find email addresses in mailto links. My trouble is that when I run value.match, I get this error:
Parsing error at offset 12: Bad regular expression (Unclosed character…
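The original pattern is not shown, so it cannot be corrected here. For comparison, a Python sketch that pulls addresses out of mailto links, assuming each address ends at a quote, a question mark, a > or whitespace; it is not the asker's regex, and `extract_mailto` is a hypothetical name.

```python
import re

# Capture everything after "mailto:" up to a quote, "?", ">" or whitespace.
MAILTO = re.compile(r"""mailto:([^"'?>\s]+)""")


def extract_mailto(html):
    """Return the addresses found in mailto: links."""
    return MAILTO.findall(html)


html = '<a href="mailto:jane@example.org">Jane</a> <a href="mailto:bob@example.com?subject=Hi">Bob</a>'
print(extract_mailto(html))  # ['jane@example.org', 'bob@example.com']
```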
I'm trying to use OpenRefine to analyze a data set of messy JSON strings (40k lines). However, because JSON objects are unordered, the keys in some of the objects came back in a different order when they were returned and recorded to a file.
Some objects are missing keys,…
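Because key order carries no meaning in a JSON object, one way to get consistent columns is to parse each line and read the fields by name. A Python sketch, assuming one flat JSON object per line; missing keys become None, and `normalize_json_lines` is a hypothetical name.

```python
import json


def normalize_json_lines(lines):
    """Parse one JSON object per line and return (columns, rows) with a fixed column order."""
    objects = [json.loads(line) for line in lines if line.strip()]
    # Union of all keys seen, sorted so the column order no longer depends on each line.
    columns = sorted({key for obj in objects for key in obj})
    # dict.get() returns None for keys an object is missing.
    rows = [[obj.get(col) for col in columns] for obj in objects]
    return columns, rows


lines = ['{"b": 2, "a": 1}', '{"a": 3}', ""]
columns, rows = normalize_json_lines(lines)
print(columns)  # ['a', 'b']
print(rows)     # [[1, 2], [3, None]]
```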
I am cleaning a data set with OpenRefine (formerly Google Refine) and then using it in Weka for cluster analysis. I am dealing with a nominal column that stores salary ranges.
I've specified the attribute as follows:
@ATTRIBUTE Income…