
I have a column of names like:

  • Quaglia, Pietro Paolo
  • Bernard, of Clairvaux, Saint, or
  • .E., Calvin F.
  • Swingle, M Abate, Agostino, Assereto
  • Abati, Antonio
  • 10-NA)\u, Ferraro, Giuseppe, ed, Biblioteca comunale ariostea. Mss. (Esteri

I want to make a custom text facet in OpenRefine that marks the names with exactly one comma as "true" and all the others as "false", so that I can work on the latter (".E., Calvin F." is not a problem, I'll deal with that later).

I'm trying "Custom text facet" with this expression:

if(value.match(/([^,]+),([^,]+)/), "true", "false")

But the result is false for every row. What is wrong with the expression?

Tom Morris
Lara M.
    Your [regex seems correct](https://regex101.com/r/iG4hX6/1) - is there really the expected value in the `value` variable? Additional hint: I have added the newline character in the second subexpression, this does not matter if you only have one string at a time. – Jan Feb 17 '16 at 09:53

3 Answers


The expression you are using:

if(value.match(/([^,]+),([^,]+)/), "true", "false")

will always evaluate to false because the output of the 'match' function is either an array or null. When evaluated by 'if', neither an array nor null evaluates to true.

You can wrap the match function in 'isNonBlank' or similar to get a boolean true/false, which would then make the 'if' function work as you want. However, once you have a boolean result the 'if' becomes redundant: its only job is to turn the boolean true/false into the string "true" or "false", which makes no difference to the values shown in the custom text facet.

So:

isNonBlank(value.match(/([^,]+),([^,]+)/))

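should give you the desired result using match. This works because 'match' tests the pattern against the whole string, so a name with two or more commas returns null.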
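To make the array-or-null behaviour concrete, here is a rough Python sketch of the same logic. It is not GREL: it assumes 'match' behaves like a whole-string regex match that returns the capture groups or nothing, and `grel_like_match` and `has_one_comma` are hypothetical helper names, not OpenRefine functions.

```python
import re

# Same pattern as in the question: text, one comma, text.
PATTERN = re.compile(r"([^,]+),([^,]+)")

def grel_like_match(value):
    """Hypothetical stand-in for GREL's match():
    capture groups on a whole-string match, otherwise None."""
    m = PATTERN.fullmatch(value)
    return list(m.groups()) if m else None

def has_one_comma(value):
    """Turn the array-or-None result into a real boolean,
    which is the role isNonBlank plays in the expression above."""
    return grel_like_match(value) is not None

names = [
    "Quaglia, Pietro Paolo",
    "Bernard, of Clairvaux, Saint, or",
    "Abati, Antonio",
]
results = {name: has_one_comma(name) for name in names}

for name, flag in results.items():
    print(flag, name)
```

The point of the sketch is the second helper: the match result itself is never a boolean, so it has to be converted before it can drive a true/false facet.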

Owen Stephens

Instead of using 'match', you could use 'split' to split the string into an array, using the comma as the separator. The length of the resulting array is one more than the number of commas in the string (i.e. number of commas = length - 1).

So your custom text facet expression becomes:

value.split(",").length()==2

This will give you true/false.

If you want to break down the data based on the number of commas that appear, you could leave off the '==2' to get a facet which just gives you the length of the resulting array.
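The counting idea can be checked outside OpenRefine with a small Python sketch. This is only an illustration of the arithmetic, not GREL: `comma_parts` is a hypothetical helper, and edge cases such as leading or trailing commas may be handled differently by GREL's 'split'.

```python
from collections import Counter

def comma_parts(value):
    """Number of pieces after splitting on commas (number of commas + 1)."""
    return len(value.split(","))

names = [
    "Quaglia, Pietro Paolo",
    "Bernard, of Clairvaux, Saint, or",
    ".E., Calvin F.",
    "Swingle, M Abate, Agostino, Assereto",
    "Abati, Antonio",
]

# Equivalent of the true/false facet: exactly two pieces means one comma.
one_comma = {name: comma_parts(name) == 2 for name in names}

# Equivalent of the facet without '==2': group the rows by piece count.
by_parts = Counter(comma_parts(name) for name in names)

print(one_comma)
print(by_parts)
```

Grouping by piece count is handy when the rows with several commas need different fixes depending on how many extra fields they contain.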

Owen Stephens

I would use a lookahead assertion to check that exactly one "," occurs between the beginning and the end of the line.

^(?=[^\,]+,[^\,]+$).*

https://regex101.com/r/iG4hX6/2
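As a quick check of the pattern itself, here is a minimal Python sketch that runs it against some of the sample names. It only tests the regex with Python's `re` module; whether the pattern can be dropped unchanged into an OpenRefine facet is not confirmed here, and `has_exactly_one_comma` is a hypothetical helper name.

```python
import re

# The lookahead pattern from the answer, unchanged.
ONE_COMMA = re.compile(r"^(?=[^\,]+,[^\,]+$).*")

def has_exactly_one_comma(value):
    """True when the lookahead succeeds: text, a single comma, text, end of line."""
    return ONE_COMMA.search(value) is not None

samples = [
    "Abati, Antonio",
    ".E., Calvin F.",
    "Bernard, of Clairvaux, Saint, or",
    "NoCommaHere",
]
checked = {s: has_exactly_one_comma(s) for s in samples}

for s, flag in checked.items():
    print(flag, s)
```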

zolo