Questions tagged [openrefine]

OpenRefine is the current name of the data-cleaning tool formerly called Google Refine (and originally released as Freebase Gridworks).

400 questions
0
votes
2 answers

Nested objects to relational format

I have JSON data on user profiles that I want to eventually analyze with SPSS. I have imported the data into Google Refine to run some data cleansing. My problem, however, is that the original JSON consists of nested objects, namely e.g. the…
0
votes
1 answer

Is it possible to add column based on keywords in an existing column?

Broadly speaking, here's what I'm trying to do: Parse a string in one cell of a spreadsheet, then add keywords to another cell in that row if certain keywords are found in the parsed cell. I'm using OpenRefine (technically Google Refine 2.5) to try…
Dani
  • 73
  • 2
  • 11
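
For the keyword question above, the general idea can be sketched in standalone Python 3. This is only an illustration, not the asker's setup or an OpenRefine expression: the function name `tag_keywords` and the `KEYWORD_MAP` contents are hypothetical, and matching is assumed to be a case-insensitive substring test.

```python
# Hypothetical mapping from a keyword found in the source cell
# to the tag that should be written into the new cell.
KEYWORD_MAP = {
    "invoice": "billing",
    "refund": "billing",
    "login": "account",
}


def tag_keywords(text, keyword_map):
    """Return a comma-separated string of tags whose keywords occur in text."""
    lowered = text.lower()
    # dict.fromkeys drops duplicate tags while keeping first-seen order.
    tags = list(dict.fromkeys(
        tag for keyword, tag in keyword_map.items() if keyword in lowered
    ))
    return ", ".join(tags)


if __name__ == "__main__":
    for cell in ["Please refund my invoice", "Login fails", "hello"]:
        print(repr(cell), "->", repr(tag_keywords(cell, KEYWORD_MAP)))
```

A cell that matches no keyword yields an empty string, so rows without hits stay blank in the new column.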
0
votes
1 answer

Open Refine Text Facet Cluster

In OpenRefine, when I upload the data, click on a text facet and then on clustering, it creates the clusters. For example, for "Aniket Ghodke" and "Ghodke Aniket" it will suggest merging them. But is there any way I can store these values? Like if I merge…
0
votes
1 answer

Can't import hyperlink to Open Refine

I imported an .xlsx file where one of the columns is filled with hyperlinks, but the links don't show in OpenRefine, just the value. Does this happen only on Linux, or is it the same on Windows? If not, is there any other way to import those…
0
votes
1 answer

Format a date like "20110822" in Google Open Refine (or Excel)?

I have a dataset that has two different date formats in the same column. Some are formatted like 2008-05-15T00:00:00Z and others are formatted like 20090804. Google Open Refine will recognize the first type as a date and will sort and allow me…
user1502186
  • 319
  • 3
  • 11
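
For the mixed date formats described above, one possible approach is to try each known format in turn and emit a single normalized form. The sketch below is plain Python 3 rather than a GREL expression; the function name `normalize_date` is hypothetical, and it assumes the column contains only the two formats quoted in the question.

```python
from datetime import datetime

# The two formats quoted in the question, tried in this order.
KNOWN_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y%m%d")


def normalize_date(value):
    """Parse value with the first matching format and return YYYY-MM-DD."""
    cleaned = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError("unrecognized date format: %r" % value)


if __name__ == "__main__":
    for raw in ["2008-05-15T00:00:00Z", "20090804"]:
        print(raw, "->", normalize_date(raw))
```

Raising on unknown input, rather than returning the value unchanged, makes any third format in the column visible instead of silently passing through.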
0
votes
1 answer

Open Refine: Reconciliation with Freebase data based on Organization Name

I've been following this tutorial: https://www.youtube.com/watch?v=5tsyz3ibYzk I followed all the steps, but I noticed that, for my dataset, Freebase doesn't suggest any kind of type, like it does for 'movies' in the example. I have a…
user3314418
  • 2,903
  • 9
  • 33
  • 55
0
votes
1 answer

Keep newest duplicate row depending on multiple Columns

I seem to have a workflow problem with Open Refine (Google Refine 2.5 [r2407]) to do sophisticated duplicate row cleaning. All I have found so far is how to delete duplicate rows based on a single column. My aim is to delete duplicate rows based on…
Dino
  • 352
  • 2
  • 8
0
votes
1 answer

Adding a new column from an existing column using Regular Expression

I am trying to extract the followers count from the data below: {'TruOptik': {'follow_request_sent': False, 'profile_use_background_image': True, 'default_profile_image': False, 'id': 1308292578, 'profile_background_image_url_https': , 'verified': False,…
0
votes
1 answer

Use cross function in OpenRefine with Jython

Can I use the cross function ( https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#crosscell-c-string-projectname-string-columnname ) with the Jython language in OpenRefine (Google Refine 2.5)?
jacquarg
  • 176
  • 1
  • 7
0
votes
1 answer

Add multiple consecutive whitespaces in a string replace operation

E.g., I would like to prepend three spaces to every occurrence of the string "foo": value.replace(/(foo)/, " " + "$1"), value.replace(/(foo)/, " $1"), and value.replace(/(foo)/, " " + " " + " $1") all return "foo" instead of "   foo".
XedMada
  • 151
  • 8
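
To illustrate the intended result of the replace question above, here is the same substitution in plain Python 3. It is not GREL and does not explain why the GREL expressions behave as described; the function name `prepend_spaces` and its parameters are hypothetical.

```python
import re


def prepend_spaces(value, target="foo", count=3):
    """Insert `count` spaces before every occurrence of `target` in value."""
    pattern = re.compile("(" + re.escape(target) + ")")
    # Build the padding explicitly so the number of spaces is unambiguous.
    return pattern.sub(lambda match: " " * count + match.group(1), value)


if __name__ == "__main__":
    # repr() makes the leading spaces visible in the printed output.
    print(repr(prepend_spaces("foo")))
    print(repr(prepend_spaces("a foo b")))
```

Printing with `repr` is worth keeping when checking results, since consecutive spaces are easy to lose when output is rendered as HTML.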
0
votes
1 answer

multiple filters in google openrefine

I have the following table in Google Refine: Host Plugin Output 3 1 - 1 KB2932677 1 (MS14-014) 1 (2 1 vulnerabilities) - 1 KB2837617 1 (MS14-001) 1 (3 1 vulnerabilities) - 1 KB2760415 1 (MS13-091) I want output as all…
0
votes
2 answers

Google Refine does not recognize match

Using Google Refine, I'm trying to add a column based on the current column. The current column contains url params,…
Tjorriemorrie
  • 16,818
  • 20
  • 89
  • 131
0
votes
1 answer

Domain Names to Webpage Titles in OpenRefine

I have a column in Excel of domain names (like stackoverflow.com) and would like to create a corresponding column with the title of the domains (like "Stack Overflow"). I uploaded the Excel file into OpenRefine. I believe the best way to do this…
Tomero
  • 183
  • 2
  • 10
0
votes
2 answers

OpenRefine - Cross-column clustering

It seems that cross-column clustering isn't supported yet in OpenRefine. Does anyone have any suggestions on how to cluster 'models' based on 'manufacturers', much like a 'city' would be based on a 'state' (many 'Springfield' could exist in the…
c-griffin
  • 2,938
  • 1
  • 18
  • 25
0
votes
2 answers

Open Refine / Google Refine - Remove blank cells in a column

The task is simple to understand. I have a table like this: And I would like to edit the column "L1_latitud" to collapse (or remove) all the blank cells: It looks like a simple task, but I can't find a way to deal with it.
Jesus
  • 655
  • 1
  • 7
  • 21