Questions tagged [openrefine]

OpenRefine is the current name of the data-cleaning tool formerly called Google Refine (and originally released as Freebase Gridworks).

400 questions
0
votes
1 answer

How to import a CSV and open a webpage for Google's Open Refine through Python code?

I've been reading through the documentation for the Python client of OpenRefine (https://github.com/OpenRefine/refine-client-py), but the link for "David Huynh's Refine tutorial" seems to be broken. Through my Python code, I would like to…
0
votes
1 answer

OpenRefine - Lost records

I am creating a new project in OpenRefine Version 2.6-rc.2 and loading a csv file with 3185 rows. The file is small (342 KB). Everything seems to go fine (no error or malformed columns) except that I end up with 3155 records: 30 records disappeared…
ivankeller
0
votes
2 answers

add numbers down a column in OpenRefine

I'd like to automatically number a column, similar to Excel, where I can type "1" in one cell and the cells below it automatically get numbered 2, 3, 4, 5, etc. I don't know why I'm having so much trouble figuring out this function in OpenRefine but…
Gail G
0
votes
2 answers

Concatenate the rows based on number (Google Refine, Excel/Google Spreadsheet)

I have a large number of rows in a CSV file, which look like:
name a,1
name b,1
name c,1
name d,2
name e,2
I need to concatenate the rows based on the number. The result should be:
name a|name b|name c
name d|name e
How can I do it in Google Refine or in…
armando85
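One way to get the result described in the question above, sketched in Python rather than in Refine itself. The function name `concat_by_number` and the in-memory `rows` list are illustrative assumptions; the question only gives the sample data and the expected output.

```python
from collections import defaultdict

def concat_by_number(rows, sep="|"):
    """Join the names that share a number, in first-seen order."""
    groups = defaultdict(list)
    for name, number in rows:
        groups[number].append(name)
    # Dicts keep insertion order, so groups come out in the order
    # their number first appeared in the input.
    return [sep.join(names) for names in groups.values()]

# Sample rows taken from the question.
rows = [("name a", "1"), ("name b", "1"), ("name c", "1"),
        ("name d", "2"), ("name e", "2")]
result = concat_by_number(rows)
print("\n".join(result))
```

For a real file, the `rows` list could be produced by reading the CSV with Python's `csv.reader`.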
0
votes
2 answers

Extract a text string with regex

I have a large set of data I need to clean with OpenRefine. I am quite bad with regex and I can't think of a way to get what I want, which is to extract a text string between quotes that includes lots of special characters like " ' / \ # @ - In…
Gauthier
0
votes
1 answer

OpenRefine split on character in multivalue cell

I am new to using OpenRefine, and I cannot figure out how to split a multivalue cell on each character in the cell. For example, I cannot split a cell with value "mod" into three rows: one with "m", one with "o", and one with "d". When the data has a…
Bill
0
votes
2 answers

OpenRefine text transform uniques() ignoring case

Is there a way to tell uniques() to ignore case? I have a GREL expression that runs like this: forEach(value.split(","),v,v.trim()).uniques().join(",") This takes each value in the cell, separated by commas, and then returns the unique values in that cell. Works…
Paul M
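The behaviour asked for above (split on commas, trim, de-duplicate without regard to case) can be sketched in Python as follows. This is not GREL and not an OpenRefine API; `unique_ignore_case` is a hypothetical helper, and keeping the first spelling encountered is an assumption the question does not settle.

```python
def unique_ignore_case(cell, sep=","):
    """Split a cell, trim each part, and drop case-insensitive duplicates."""
    seen = set()
    kept = []
    for part in cell.split(sep):
        item = part.strip()
        key = item.casefold()  # comparison key that ignores case
        if key not in seen:
            seen.add(key)
            kept.append(item)  # assumption: keep the first spelling seen
    return sep.join(kept)

print(unique_ignore_case("Apple, apple ,Pear,APPLE,pear"))
```

The same idea carries over to any language: compare on a case-folded key, but emit the original value.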
0
votes
1 answer

Select multiple repeated records OpenRefine

The table Locations has the following items: The problem is that there are some rows which are "semi-repeated" (all the elements are equal except for the attribute attb, which is an integer). I want to delete all repeated rows and append all the…
nacho c
0
votes
1 answer

OpenRefine GREL to change <p class="essay-header"> to <h2>

I'm using OpenRefine to clean about 300 records and have some HTML text with multiple paragraph tags of a specific class (class="essay-header") that wrap text I'd like to convert to h2 tags. What kind of GREL would I need to use to…
user3206
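As a rough illustration of the transformation described above, here is a regex-based sketch in Python (not GREL). It assumes the opening tag is written exactly as `<p class="essay-header">` with no other attributes and that the paragraphs are not nested; `headers_to_h2` is a hypothetical name.

```python
import re

# Assumption: the tag appears exactly in this form, with no extra attributes.
ESSAY_HEADER = re.compile(r'<p class="essay-header">(.*?)</p>', re.DOTALL)

def headers_to_h2(html):
    """Rewrite essay-header paragraphs as h2 elements, leaving other tags alone."""
    return ESSAY_HEADER.sub(r"<h2>\1</h2>", html)

sample = '<p class="essay-header">Intro</p><p>Body text</p>'
print(headers_to_h2(sample))
```

The non-greedy `(.*?)` stops at the first closing `</p>`, so ordinary paragraphs that follow are not swallowed. For irregular markup, an HTML parser would be more robust than a regex.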

0
votes
3 answers

Assign rows to category in OpenRefine

I have a dataset like this, and I'm looking for a way to add a category, based on what kind of product I have. Can I search for Apple + Orange and assign them to a category named Fruits, and similar with Milk + Wine and assign them to another…
Filip Blaauw
0
votes
1 answer

How to export the cell that contains new line character properly?

I have a row of data as below. The problem is that it is exported as three rows in the CSV file; how do I export it as one row? |id …
Alex Luya
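For background on the question above: in CSV, a cell that contains line breaks is normally written inside double quotes, and a CSV-aware reader then treats it as a single record even though it spans several physical lines. The Python sketch below shows that round trip. The sample data is invented for illustration (the question's own table is truncated), and `roundtrip` is a hypothetical helper.

```python
import csv
import io

def roundtrip(rows):
    """Write rows as CSV text, then parse that text back into rows."""
    buffer = io.StringIO(newline="")  # no newline translation, as csv expects
    csv.writer(buffer).writerows(rows)
    text = buffer.getvalue()
    buffer.seek(0)
    return text, list(csv.reader(buffer))

# Hypothetical data: the second cell of the data row spans three lines.
rows = [["id", "note"], ["1", "line one\nline two\nline three"]]
text, parsed = roundtrip(rows)
print(text)
print(len(parsed))
```

The exported text occupies more than two physical lines, but a CSV parser still reads it back as two records. Tools that split the file naively on line breaks will see extra rows.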
0
votes
3 answers

Couple the data in all possible combinations

I have data in two columns like this:
Id Value
1 a
2 f
1 c
1 h
2 a
and I'd like to couple the data of the 'Value' column in all possible combinations based on the same Id, such as (a,c) (a,h) (c,h) (f,a). Is there any R or Python or…
Andrea Angeli
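Since the question above mentions Python, here is a minimal sketch using `itertools.combinations`. The function name `pairs_by_id` and the list-of-tuples input are assumptions made for illustration; the sample data and the expected pairs come from the question.

```python
from collections import defaultdict
from itertools import combinations

def pairs_by_id(records):
    """Return every 2-element combination of values that share an Id."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    pairs = []
    for values in groups.values():
        # combinations() keeps input order and never pairs an item with itself
        pairs.extend(combinations(values, 2))
    return pairs

# Sample data taken from the question.
records = [(1, "a"), (2, "f"), (1, "c"), (1, "h"), (2, "a")]
pairs = pairs_by_id(records)
print(pairs)
```

Groups are processed in the order their Id first appears, so the pairs for Id 1 come before those for Id 2.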
0
votes
2 answers

OpenRefine Regex and GREL match error

Inside OpenRefine I want to run the regex below on a website's source to find email addresses in mailto links. My trouble is that when running value.match, I get this error: Parsing error at offset 12: Bad regular expression (Unclosed character…
0
votes
2 answers

Strict consumption of JSON, how to reorder key:values to specific JSON schema for Open Refine

I am trying to use Open Refine to analyze a data set of messy JSON strings (40k lines). However, because JSON objects are unordered, the keys of some objects were mixed up when they were returned and recorded to a file. Some objects are missing keys,…
aquaflamingo
0
votes
0 answers

Weka: replace null value in a nominal attribute with a string

I am cleaning a data set with OpenRefine and then trying to use it in Weka to do some cluster analysis. I am dealing with a nominal column that stores ranges of salaries. I've specified the attribute as below: @ATTRIBUTE Income…
user1189851