Questions tagged [google-refine]

OpenRefine (formerly Google Refine) is a free, open source, data cleaning tool.

It was originally called Freebase Gridworks and was developed by Metaweb before Metaweb's acquisition by Google. In 2012, Google withdrew its support and the code was moved to GitHub.

44 questions
0
votes
2 answers

Strict consumption of JSON, how to reorder key:values to specific JSON schema for Open Refine

I'm trying to use OpenRefine to analyze a data set of messy JSON strings (40k lines). However, because JSON objects are unordered, some of the lines of JSON objects were mixed up when returned and recorded to a file. Some objects are missing keys,…
aquaflamingo
  • 792
  • 6
  • 17
0
votes
1 answer

Lost all my files on Openrefine

I tried the beta version of OpenRefine and now I have lost all my previous files from version 2.5. Do you know where the files are located? I am on a Mac. Thanks!
0
votes
1 answer

Connecting to GoogleRefine using Java program

This question is similar to the posting 'Script-driven automation of Google refine with ruby python perl java or otherwise'. I have a lengthy JSON Script that I…
gg-14
  • 67
  • 5
0
votes
1 answer

How to merge columns that both have blank spaces in Google refine

I'm working with a database in Google Refine and I have 2 columns with the information "year". Both columns have values and blank cells, and where one has a value, the other is blank, so I want to merge both. I found this…
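For readers skimming this listing, the merge described above amounts to taking whichever of the two cells is not blank. A minimal Python sketch of that idea follows; the column names `year_a` and `year_b` and the helper `coalesce` are hypothetical and do not come from the question, and this is not necessarily the approach the asker found or the accepted answer.

```python
def coalesce(first, second):
    """Return the first value that is not blank, or None if both are blank."""
    for candidate in (first, second):
        # Treat None, empty strings and whitespace-only strings as blank.
        if candidate is not None and str(candidate).strip() != "":
            return candidate
    return None


# Hypothetical sample rows: each row has a value in at most one column.
rows = [
    {"year_a": "1999", "year_b": ""},
    {"year_a": "", "year_b": "2004"},
    {"year_a": None, "year_b": None},
]

merged = [coalesce(row["year_a"], row["year_b"]) for row in rows]
print(merged)
```

The same logic could presumably be ported to a Jython or GREL expression inside the tool, but the exact cell-access syntax there is not shown in this listing.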
0
votes
1 answer

Merge ALMOST identical data rows

I have a large amount of data (100,000+ UK and US postal addresses) that contains duplicate or ALMOST identical data rows (with 5 columns). In the near-identical rows, four out of the five columns have exact matches of data, for example: AAAA BBBB…
Hector
  • 4,016
  • 21
  • 112
  • 211
0
votes
1 answer

How to add formatting commas to a number in Google Refine

Due to what we're using the data for, it's important that long numbers (8+ digits) have commas every 3 digits for formatting and readability. The issue is I really don't know how to make an expression that does this. Would anyone with some more…
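As an illustration of the grouping the question asks about, here is a small Python sketch that inserts a comma every 3 digits. The function name `add_thousands_separators` is hypothetical, and the sketch assumes plain decimal input with an optional leading minus sign and an optional fractional part; it is not taken from any answer to the question.

```python
def add_thousands_separators(value):
    """Insert a comma every 3 digits in the whole-number part of value."""
    text = str(value).strip()
    sign = "-" if text.startswith("-") else ""
    digits = text.lstrip("-")
    # Leave any fractional part untouched.
    whole, dot, fraction = digits.partition(".")
    groups = []
    # Peel off groups of three digits from the right-hand side.
    while len(whole) > 3:
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    return sign + ",".join(groups) + dot + fraction


print(add_thousands_separators("12345678"))
```

For values that are already numbers rather than strings, Python's built-in `format(12345678, ",")` produces the same grouping.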
0
votes
2 answers

Extract postcode from Google Maps API JSON using Google Refine

I'm trying to use Google Refine to extract postcodes from Google Maps API JSON. I added a new column by fetching URLs: "http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address=" + escape(value, "url") Then the resulting JSON is as…
0
votes
1 answer

fill down by record in Google Refine

I have the following comma-delimited CSV file in Google refine: How do I fill down the values from column1 using Jython or GREL to become: I have tried: if value is None: return row["record"]["cells"]["column1"]["value"][0] else: return…
Edan
  • 593
  • 1
  • 8
  • 23
0
votes
1 answer

How to change values in facet to same in Google Refine?

I'm trying to clean this data: https://dl.dropbox.com/u/820037/local_council_election_data_w_occupation.gz It's all the candidates for a local council election in Finland. In the column "Ammatti" there is the occupation of a candidate as reported…
user1718189
  • 51
  • 1
  • 3
0
votes
2 answers

Google Refine: Regular expression not working

I need to match a regular expression for a text facet in Google Refine. I tried the expression and it didn't work. Then I tried a simple case of matching the string lenovo in www.lenovo.com using value.match(/lenovo/). In some of the rows my value…
darshan
  • 1,230
  • 1
  • 11
  • 17
0
votes
1 answer

Getting value by row number and column number

In a custom text facet I want to check the value of the previous row's cell. I tried rows[row.index - 1] with no result.
skfd
  • 2,528
  • 1
  • 19
  • 29
-1
votes
1 answer

How to save only specific JSON elements in a new OpenRefine column

{ "business_id": "SQ0j7bgSTazkVQlF5AnqyQ", "full_address": "214 E Main St\nCarnegie\nCarnegie, PA 15106", "hours": {}, "open": true, ** "categories": ["Chinese", "Restaurants"] ** , "city": "Carnegie", "review_count": 9, …
-1
votes
3 answers

Merge all the data in the second column for each unique value in the first column

I have two columns of data. Some of the data in the first column repeats (they represent questions). The data in the second column is unique (they represent multiple answers to the same question). I need to merge all the data in the second column…
John
  • 241
  • 1
  • 6
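The merge this question describes can be pictured as grouping the second column by the first and joining each group. Below is a minimal Python sketch under assumed conditions: the data is a list of (question, answer) pairs, the helper name `merge_by_key` and the "; " separator are hypothetical, and none of this is taken from the posted answers.

```python
from collections import OrderedDict


def merge_by_key(pairs, separator="; "):
    """Join all second-column values that share the same first-column value."""
    grouped = OrderedDict()  # keeps keys in order of first appearance
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return [(key, separator.join(values)) for key, values in grouped.items()]


# Hypothetical sample data: questions repeat, answers are unique.
pairs = [("Q1", "a"), ("Q2", "b"), ("Q1", "c")]
print(merge_by_key(pairs))
```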
-2
votes
1 answer

Clean unstructured place name to a structured format

I have around 300k unstructured records, as in the screen below. I'm trying to use Google Refine or OpenRefine to correct them. However, I'm unable to find a proper way to do this. I'm new to this tool. Anyone's help would be greatly appreciated. Also, this…
AskMe
  • 2,495
  • 8
  • 49
  • 102