I have a column in Excel of domain names (like stackoverflow.com) and would like to create a corresponding column with the title of the domains (like "Stack Overflow").

I uploaded the Excel file into OpenRefine. I believe the best way to do this would be to call the "Add column by fetching URLs on column" function. But I don't know what expression to use.

pnuts
Tomero

1 Answer

The way I do it is as follows:

(1) Have visitable URLs in the source column. I.e., http://stackoverflow.com instead of just the domain name.

(2) Apply "Add column by fetching URLs..." as you said. (If you're hitting pages on the same domain over and over, make sure you set a reasonable delay.)

(3) Using this first new column, create a second new column based on newCol1 by parsing the HTML that's returned:

value.parseHtml().select("title")[0].toString()

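Notes: (a) You need the toString(); otherwise you'll see blank values in the new column after you apply the function. Keep in mind that toString() returns the whole element, tags included; if you only want the text, htmlText() in place of toString() should do it.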

(b) You don't have to create a second new column; you could just apply a transform using the same formula as above.

(c) I've also tried using a split:

value.split("<title>")[1].split("</title>")[0]

I don't have my results handy at the moment, but I believe that also worked.
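To make the difference between the two approaches concrete, here is a minimal standalone Python sketch of the same logic. It is not GREL and does not run inside OpenRefine; the function names (to_url, title_by_split, title_by_parser) and the sample page are made up for illustration, and the fetch step is left out so the sketch works offline.

```python
from html.parser import HTMLParser


def to_url(domain):
    """Step 1: prefix a bare domain with a scheme so it can be fetched."""
    domain = domain.strip()
    if domain.startswith(("http://", "https://")):
        return domain
    return "http://" + domain


def title_by_split(html):
    """Mimic the split approach: take the text between <title> and </title>."""
    return html.split("<title>")[1].split("</title>")[0]


class _TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def title_by_parser(html):
    """Mimic the parse-and-select approach; returns None if no title is found."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.done else None


# Hypothetical sample standing in for one cell of the fetched HTML column.
SAMPLE = "<html><head><title>Stack Overflow</title></head><body></body></html>"

if __name__ == "__main__":
    print(to_url("stackoverflow.com"))
    print(title_by_split(SAMPLE))
    print(title_by_parser(SAMPLE))
```

One thing the sketch shows: the split version depends on the literal string <title> appearing in the page, so a tag with attributes (for example <title lang="en">) makes it fail, while the parser version still finds the title.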

ultrageek