Questions tagged [openrefine]

OpenRefine is the current name of the data cleaning tool formerly called Google Refine (and originally released as Freebase Gridworks).

400 questions
2
votes
1 answer

Openrefine - Transpose rows into columns based on text

I've received a data dump from a library catalogue, which came out in .txt format. I've been able to get the data into a spreadsheet, but it is all in one column. I would like to transpose the rows into columns. The data is in this one column in the…
Philip G
  • 21
  • 5
2
votes
1 answer

Openrefine: split with regex gives strange result

I applied the GREL expression "value.split(/a/)" to some cells:
abcdef -> [ "", "bcdef" ]
bcdefa -> [ "bcdef" ]
badef -> [ "b", "def" ]
I can't understand why the first cell gives me a "" element in the resulting table. Is it a bug? Thanks!
Mathieu Saby
  • 125
  • 5
2
votes
1 answer

How to make a random sample in Openrefine?

Very often we need to extract random samples from a large dataset. What is the best way to do it in OpenRefine? This might be useful for practitioners used to doing it in R and Python. Thanks in advance for any advice!
Joni Hoppen
  • 658
  • 5
  • 23
2
votes
1 answer

Is it possible to make a summarized table in OpenRefine?

I have been wondering whether it is possible to create an aggregation and summary of values in OpenRefine in the same way as it is done in Python and R. Example: Table of medical appointments with 300k records Id-patient | Age | Id-appointment | value The…
Joni Hoppen
  • 658
  • 5
  • 23
2
votes
1 answer

How to fill blank fields in OpenRefine?

I found the blank rows, which is already great. Now I want to write "Not informed Value" into all blank values, but I don't know how. Any hints? Thanks in advance! I am having great fun working with this distributed community! Joni
Joni Hoppen
  • 658
  • 5
  • 23
2
votes
4 answers

How to execute OpenRefine JSON on CSV in Python?

I am trying to find a Python solution that can execute the following OpenRefine Python commands in JSON without the OpenRefine server running. My OpenRefine JSON contains mappings and custom Python commands on each field of any properly formatted CSV…
Léo Léopold Hertz 준영
  • 134,464
  • 179
  • 445
  • 697
2
votes
2 answers

OpenRefine: select value based on a variable in another column

I have a problem with OpenRefine. I am adding a new column based on a URL and from there calling an API to get some terms from a controlled vocabulary (AAT). I parse the results and I obtain multivalued cells such…
K3it4r0
  • 195
  • 12
2
votes
1 answer

How to use Reconciliation Service API from Google sheets

OpenRefine (formerly Google Refine) supports matching records to external identifiers via the Reconciliation Service API, for instance to find Wikidata identifiers for entities described in table rows (see Wikidata OpenRefine Service). Is it possible to…
Jakob
  • 3,570
  • 3
  • 36
  • 49
2
votes
1 answer

Incrementing a date in openrefine

I have a date in the format YYYY-MM-DDThh:mm:ss. Please provide a GREL expression that increments the date by 1 month from its present value for all cells in the column in OpenRefine. Thanks!
danimal
  • 23
  • 3
2
votes
1 answer

Openrefine: text facet by counting

I have a huge file primarily composed of book metadata (author, title, date, url). My problem is that I want to operate on author names (which are often repeated: an author can have hundreds of records) and I want to operate on the subset of these…
Lara M.
  • 855
  • 2
  • 10
  • 23
2
votes
1 answer

Access column name for a specific value in GREL/Open Refine (or R, Python)

I'm trying to access the value of a column name for a specific cell in Open Refine, so I can replace the value of the cell with the column name. I'm aware of the variable row.columnNames that returns ALL column names but is there a way to return…
Matt B
  • 45
  • 4
2
votes
1 answer

Openrefine: cross.cell for similar but not identical values

I have two datasets: one has names of countries, but dirty ones like Gaule Cisalpine (province romaine) Gaule belgique Gaule , Histoire Gaule etc. The second dataset has two columns with the names of countries (clean) and a code like Gaule |…
Lara M.
  • 855
  • 2
  • 10
  • 23
2
votes
1 answer

Create json in python for Openrefine

I'm scraping resources in Python and I want to make a JSON file to use in OpenRefine to clean data. Here's my code: import json import codecs A = xpath B = xpath C = xpath D = xpath with codecs.open('info2.json', 'a', 'utf-8-sig') as f: …
Lara M.
  • 855
  • 2
  • 10
  • 23
2
votes
1 answer

OpenRefine - add sequence number, reset for each record

I have some records containing multiple rows. I want to give each row within a record a unique ID based on the string in the first row, containing the original ID + _01 _02 _03 and so forth. Then I would like the counter to reset when the next…
nils
  • 23
  • 4
2
votes
3 answers

OpenRefine columnwise scripting

I spent some time Googling, but couldn't find anything useful. How can I select all the values of a single column in OpenRefine in a script? It seems that all the operations are row-wise. In particular, I want to find the highest and lowest values in a…
Boris Mocialov
  • 3,439
  • 2
  • 28
  • 55