I've received a data dump from a library catalogue, which came out in .txt format. I've been able to get the data into a spreadsheet, but it is all in one column. I would like to transpose the rows into columns.
The data is in this one column in the…
I applied the GREL expression "value.split(/a/)" to some cells:
abcdef -> [ "", "bcdef" ]
bcdefa -> [ "bcdef" ]
badef -> [ "b", "def" ]
I can't understand why the first cell gives me a "" element in the resulting array. Is it a bug?
Thanks!
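The three results above are consistent with Java's regex splitting, which keeps a leading empty string when the input starts with the separator but drops trailing empty strings. That GREL's split delegates to Java here is an assumption, not something stated in the question. A minimal Python sketch that reproduces the same three outputs (the helper name `split_like_java` is hypothetical):

```python
import re

def split_like_java(pattern, text):
    """Split on a regex, then drop trailing empty strings.

    Python's re.split keeps both leading and trailing empty strings;
    removing the trailing ones imitates Java's String.split, which is
    assumed (not confirmed) to be what GREL uses.
    """
    parts = re.split(pattern, text)
    while parts and parts[-1] == "":
        parts.pop()
    return parts

for cell in ["abcdef", "bcdefa", "badef"]:
    print(cell, "->", split_like_java("a", cell))
```

Under that assumption the leading "" is expected behaviour rather than a bug: there is nothing before the first "a" in abcdef, so the first piece is empty.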
Very often we need to extract random samples from a large dataset. What is the best way to do this in OpenRefine? This might be useful for practitioners who are used to doing it in R and Python.
Thanks in advance for any advice!
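For comparison, this is roughly what the Python side of that workflow looks like: a minimal sketch using only the standard library, where `sample_rows` and the toy `rows` list are hypothetical names made up for illustration. It is not an OpenRefine feature, just the behaviour one would want to reproduce.

```python
import random

def sample_rows(rows, k, seed=None):
    """Return k rows drawn without replacement; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    k = min(k, len(rows))  # never ask for more rows than exist
    return rng.sample(rows, k)

# Hypothetical dataset: 1000 rows, each a dict with an "id" field.
rows = [{"id": i} for i in range(1000)]
subset = sample_rows(rows, 10, seed=42)
print(len(subset))
```

Passing the same seed again returns the same sample, which helps when the cleaning steps need to be rerun on an identical subset.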
I have been wondering whether it is possible to create an aggregation and summary of values in OpenRefine in the same way as it is done in Python and R. Example:
Table of medical appointments with 300k records
Id-patient | Age | Id-appointment | value
The…
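Since the question is cut off, the exact aggregation wanted is unknown. As a reference point, here is a minimal Python sketch of a group-and-summarise step over the columns named above, assuming the goal is a per-patient count, sum and mean of `value`. The sample records and the helper `summarise` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records using the column names from the question.
appointments = [
    {"Id-patient": "p1", "Age": 34, "Id-appointment": "a1", "value": 120.0},
    {"Id-patient": "p1", "Age": 34, "Id-appointment": "a2", "value": 80.0},
    {"Id-patient": "p2", "Age": 51, "Id-appointment": "a3", "value": 200.0},
]

def summarise(rows, key, field):
    """Group rows by `key` and report count, sum and mean of `field` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {
        k: {"count": len(v), "sum": sum(v), "mean": sum(v) / len(v)}
        for k, v in groups.items()
    }

summary = summarise(appointments, "Id-patient", "value")
print(summary)
```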
I found the blank rows, which is already great. Now I want to write "Not informed Value" into all blank cells, but I don't know how. Any hints?
Thanks in advance! I am having great fun working with this distributed community!
Joni
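The logic being asked for is small enough to sketch in Python. This assumes that "blank" means either a missing value or a string containing only whitespace; `fill_blank` and the sample column are hypothetical.

```python
def fill_blank(value, placeholder="Not informed Value"):
    """Return the placeholder for None or whitespace-only cells, else the value unchanged."""
    if value is None or str(value).strip() == "":
        return placeholder
    return value

# Hypothetical column with three kinds of blank cell.
column = ["12", "", None, "  ", "7"]
filled = [fill_blank(v) for v in column]
print(filled)
```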
I am trying to find a Python solution that can execute the following OpenRefine Python commands stored in JSON without the OpenRefine server running.
My OpenRefine JSON contains mappings and custom Python commands on each field of any properly formatted CSV…
I have a problem with OpenRefine. I am adding a new column based on a URL and from there calling an API to get some terms from a controlled vocabulary (AAT).
I parse the results and obtain multivalued cells such…
OpenRefine (formerly Google Refine) supports matching records to external identifiers via the Reconciliation Service API, for instance to find Wikidata identifiers for entities described in table rows (see Wikidata OpenRefine Service). Is it possible to…
I have a date in the format YYYY-MM-DDThh:mm:ss
Please provide a GREL expression that increments the date by one month from its present value for every cell in the column in OpenRefine. Thanks!
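Whatever expression is used, "one month later" needs a rule for days that do not exist in the target month. A minimal Python sketch of the arithmetic, assuming the day should be clamped to the last day of the next month (so 31 January becomes 28 or 29 February); `add_one_month` is a hypothetical helper, not a GREL function.

```python
import calendar
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S"

def add_one_month(stamp):
    """Parse YYYY-MM-DDThh:mm:ss, move forward one calendar month, format it back."""
    d = datetime.strptime(stamp, FORMAT)
    year = d.year + d.month // 12   # December rolls over into the next year
    month = d.month % 12 + 1
    # Clamp the day to the length of the target month.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day).strftime(FORMAT)

print(add_one_month("2021-01-31T10:00:00"))
print(add_one_month("2020-12-15T08:30:00"))
```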
I have a huge file primarily composed of book metadata (author, title, date, url). My problem is that I want to operate on author names (which are often repeated: an author can have hundreds of records) and I want to operate on the subset of these…
I'm trying to access the column name for a specific cell in OpenRefine, so I can replace the value of the cell with the column name. I'm aware of the variable row.columnNames, which returns ALL column names, but is there a way to return…
I have two datasets:
one dataset has names of countries, but dirty ones like
Gaule Cisalpine (province romaine)
Gaule belgique
Gaule , Histoire
Gaule
etc.
the second dataset has two columns with the names of countries (clean) and a code like
Gaule |…
I'm scraping resources in Python and I want to write a JSON file that I can then load into OpenRefine to clean the data.
Here's my code:
import json
import codecs
A = xpath
B = xpath
C = xpath
D = xpath
with codecs.open('info2.json', 'a', 'utf-8-sig') as f:
…
I have some records containing multiple rows. I want to give each row within a record a unique ID based on the string in the first row, consisting of the original ID plus a suffix: _01, _02, _03 and so forth.
Then I would like the counter to reset when the next…
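To make the intended numbering concrete, here is a minimal Python sketch. It assumes the layout OpenRefine shows in records mode: the ID appears only in the first row of each record and the cell is blank in the rows that follow. `number_rows` and the sample IDs are hypothetical.

```python
def number_rows(id_cells):
    """Build ID_01, ID_02, ... per row, restarting the counter at each non-blank ID."""
    numbered = []
    current, counter = None, 0
    for cell in id_cells:
        if cell:  # a non-blank cell marks the first row of a new record
            current, counter = cell, 0
        if current is None:
            raise ValueError("the first row must carry an ID")
        counter += 1
        numbered.append(f"{current}_{counter:02d}")
    return numbered

# Hypothetical column: record A12 spans three rows, record B7 spans two.
print(number_rows(["A12", "", "", "B7", ""]))
```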
I spent some time Googling, but couldn't find anything useful.
How can I select all the values of a single column in OpenRefine in a script?
It seems that all the operations are row-wise.
In particular, I want to find the highest and lowest values in a…
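For reference, the column-wise operation in question looks like this outside OpenRefine. It is a minimal Python sketch that assumes the cells hold numbers stored as text and that non-numeric or blank cells should be skipped; `column_extremes` and the `price` column are hypothetical.

```python
def column_extremes(rows, column):
    """Return (lowest, highest) over the numeric cells of one column, or None if there are none."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # skip missing, blank or non-numeric cells
    if not values:
        return None
    return min(values), max(values)

# Hypothetical rows with one blank cell.
rows = [{"price": "3.5"}, {"price": ""}, {"price": "10"}, {"price": "-2"}]
print(column_extremes(rows, "price"))
```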