I have two datasets that contain titles and other information. Dataset A has only titles; dataset B has titles and URLs.
I need to copy the URLs from dataset B into dataset A. Some titles are identical in A and B, some do not match at all, and some are slightly different (and this is where the problem lies).

So I need to merge the datasets and, at the same time, cluster the titles that are similar. I know that I can reconcile against DBpedia, but what I need is to "reconcile" between the two datasets. Is that possible in some way?

Thank you.

Lara M.

1 Answer

You can use the reconcile-csv application (it is not a plugin for OpenRefine, but a standalone program that runs a local reconciliation API server).

Export dataset B as a CSV file with the column names in the first row, then start reconcile-csv, using the URL column as the ID column and the title (name) column as the search column:

java -Xmx2g -jar reconcile-csv-0.1.2.jar <CSV-File> <Search Column> <ID Column>

Then open dataset A in OpenRefine and add http://localhost:8000/reconcile as a reconciliation service. After reconciliation, cell.recon.match.id for each reconciled cell will contain the URL.
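
If you want to check the idea outside OpenRefine first, the fuzzy title-to-URL lookup can be roughly sketched in Python with the standard library's difflib. This is only an illustration and not how reconcile-csv works internally: the function name attach_urls, the 0.8 cutoff, and the sample titles and URLs are all assumptions made up for the example.

```python
import difflib


def attach_urls(titles_a, rows_b, cutoff=0.8):
    """Map each title from dataset A to the URL of its closest title in B.

    titles_a: list of title strings (dataset A).
    rows_b:   list of (title, url) pairs (dataset B).
    cutoff:   minimum similarity (0..1) to accept a match; 0.8 is an
              arbitrary assumption and should be tuned on real data.
    Returns a dict {title_a: url or None}.
    """
    def norm(s):
        # Compare case-insensitively and ignore repeated whitespace.
        return " ".join(s.lower().split())

    url_by_title = {norm(title): url for title, url in rows_b}
    candidates = list(url_by_title)

    result = {}
    for title in titles_a:
        # get_close_matches returns the best candidates at or above the cutoff.
        match = difflib.get_close_matches(norm(title), candidates, n=1, cutoff=cutoff)
        result[title] = url_by_title[match[0]] if match else None
    return result


# Hypothetical sample data, for illustration only.
dataset_a = ["The Great Gatsby", "Moby Dick", "Unknown Title"]
dataset_b = [
    ("The Great Gatsby", "http://example.org/gatsby"),
    ("Moby-Dick", "http://example.org/moby"),
    ("War and Peace", "http://example.org/war"),
]

matched = attach_urls(dataset_a, dataset_b)
for title, url in matched.items():
    print(title, "->", url)
```

Titles that get no URL are the ones with no sufficiently similar counterpart in dataset B; lowering the cutoff finds more matches but also more false ones, so the results are worth reviewing by hand.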

kolen
  • AFAIK reconcile-csv just lets you import the ID from dataset B. You will then need to use the cell.cross function to actually import the URL – magdmartin Jul 11 '15 at 16:08
  • @magdmartin I suggest using URL as id, if urls are unique. – kolen Jul 11 '15 at 18:10
  • Thank you very much for this answer! Sorry for replying so late. I tried reconcile-csv a little but, not being very experienced, I had some trouble understanding how it works. I'll try it now with your instructions, thanks! – Lara M. Jul 23 '15 at 15:51