I'm trying to use the String Replacer node in KNIME with a regular expression to standardize a field in a database. I need to extract a number from a string, and then replace the entire string with that number. For example,

String: The Yearly order limit = 5,000

The result would be: 5,000

String: monthly order limit is 5

Result: 5

But I also need to know whether it is a monthly, quarterly, or yearly limit. Some of the variations I've tried are: .*yearly.*([0-9,]+) and variations thereof. Simply using ([0-9,]+) gets the number for me, but does not identify anything else. Including anything outside the parentheses matches the whole string, but referring to capture group 1 gives me a 0 every time, regardless of what the number is. Does anyone know how to do this? Thanks!

nekomatic
  • Welcome to Stack Overflow! Check how to [create a Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example so that you can get a much better response to your question. – n4m31ess_c0d3r Mar 15 '18 at 22:37

1 Answer

Try this (escaped to be used in Java string literals):

(?i)[\\w ]*(yearly|monthly|quarterly)[\\w ]*(?:is|=) *([\\d,]+)

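Now the period (yearly, monthly, or quarterly) is in group(1), and the number is in group(2). The (?i) flag makes the match case-insensitive, so "Yearly" and "yearly" are both accepted.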
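For the replacement itself, here is a minimal Scala sketch, assuming the goal is to swap the whole matched string for the captured number. The names limitPattern, numberOnly, and periodAndNumber are made up for illustration. Whether KNIME's String Replacer accepts the same $2 group reference is an assumption worth checking against its documentation.

```scala
import java.util.regex.Pattern

// Same pattern as above: group 1 = period, group 2 = number
val limitPattern = Pattern.compile("(?i)[\\w ]*(yearly|monthly|quarterly)[\\w ]*(?:is|=) *([\\d,]+)")

// Replace the matched text with just the number (group 2)
def numberOnly(s: String): String = limitPattern.matcher(s).replaceAll("$2")

// Return the period (lowercased for consistency) together with the number,
// or None if the whole string does not match
def periodAndNumber(s: String): Option[(String, String)] = {
  val m = limitPattern.matcher(s)
  if (m.matches()) Some((m.group(1).toLowerCase, m.group(2))) else None
}

println(numberOnly("The Yearly order limit = 5,000"))   // 5,000
println(periodAndNumber("monthly order limit is 5"))    // Some((monthly,5))
```

Strings that do not match the pattern are returned unchanged by numberOnly, so unexpected inputs stay visible instead of being silently blanked.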

Full Scala example (try it online with Scastie if you want; to convert it to Java, add System.out, semicolons, etc.):

import java.util.regex.Pattern

// Sample inputs taken from the question
val examples = List(
  "The Yearly order limit = 5,000",
  "monthly order limit is 5"
)

// (?i) = case-insensitive; group 1 captures the period, group 2 the number
val p = Pattern.compile("(?i)[\\w ]*(yearly|monthly|quarterly)[\\w ]*(?:is|=) *([\\d,]+)")

for (e <- examples) {
  val m = p.matcher(e)
  // matches() succeeds only if the entire string fits the pattern
  if (m.matches()) {
    println(e + " matched")
    println("group 0: " + m.group(0))
    println("group 1: " + m.group(1))
    println("group 2: " + m.group(2))
  } else {
    println(e + " didn't match")
  }
}

Output:

The Yearly order limit = 5,000 matched
group 0: The Yearly order limit = 5,000
group 1: Yearly
group 2: 5,000
monthly order limit is 5 matched
group 0: monthly order limit is 5
group 1: monthly
group 2: 5
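As for why the attempt in the question returned 0, a likely explanation is that .* is greedy. It consumes as much as it can, including most of the number, and leaves only the final digit for the capture group. The sketch below illustrates this with a hypothetical helper, firstGroup, and compares it against the reluctant form .*?, which stops as early as possible.

```scala
import java.util.regex.Pattern

// Returns group 1 if the whole input matches the regex, else None
def firstGroup(regex: String, input: String): Option[String] = {
  val m = Pattern.compile(regex).matcher(input)
  if (m.matches()) Some(m.group(1)) else None
}

val input = "The Yearly order limit = 5,000"

// Greedy .* swallows "5,00", so the group only gets the final digit
println(firstGroup("(?i).*yearly.*([0-9,]+)", input))   // Some(0)

// Reluctant .*? leaves the full number for the group
println(firstGroup("(?i).*yearly.*?([0-9,]+)", input))  // Some(5,000)
```

Without the (?i) flag the original pattern would not match this input at all, because "yearly" is compared case-sensitively against "Yearly".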
Andrey Tyukin