
I have a string literal like "10-16.7". I want to extract the first number from it; in this example, that is only the number 10.

I think this is the correct regex: (["'])(.*-*)-.* but when I try it in my query, the ["'] pattern is not recognized. I then found the section on escape sequences in strings, but when I try FILTER regex(?mystring, "(\")(.*-*)-.*") } it doesn't give an error, yet it doesn't match anything either. (My dataset contains values like "50-58", "9.4-13", "72-85", etc.)

linous

1 Answer


If you have a number followed by a dash followed by anything, I would use the following regex (assuming you don't need to worry about the numbers being valid, as this matches strings like "00323..23....3.-2", for example):

^([0-9.]+)-.*

I'm assuming the quotation marks are not actually part of the string. If they are, just add \": ^\"([0-9.]+)-.*

To extract the number, you need to BIND it to a variable, like so:

FILTER(REGEX(?mystring, "^[0-9.]+-"))
BIND(REPLACE(?mystring, "^([0-9.]+)-.*", "$1") AS ?number)

Here I get the number by replacing the string with the first capture group ($1), which matches the number, and bind the result to a variable called ?number.
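
For reference, here is a minimal sketch of how the two lines above could sit in a complete query. It assumes a SPARQL 1.1 endpoint and uses a VALUES block in place of your actual triple pattern. The sample strings are the ones from the question, plus a hypothetical "n/a" value that I added only to show the FILTER rejecting a non-matching string.

```sparql
SELECT ?mystring ?number
WHERE {
  # Sample data from the question; "n/a" is a made-up non-matching value.
  # In a real query, ?mystring would be bound by a triple pattern instead.
  VALUES ?mystring { "10-16.7" "50-58" "9.4-13" "72-85" "n/a" }

  # Keep only strings that start with digits and/or dots followed by a dash.
  FILTER(REGEX(?mystring, "^[0-9.]+-"))

  # Replace the whole string with capture group 1, i.e. the leading number.
  BIND(REPLACE(?mystring, "^([0-9.]+)-.*", "$1") AS ?number)
}
```

Note that ?number is still a plain string here. If you need a numeric value, you would have to cast it yourself (for example with xsd:decimal), and the cast will fail for strings the regex lets through that are not valid numbers.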

evsheino
  • Yes, if the pattern is actually digits, then dot, then hyphen, this is the way to go. If it's always the first two characters, though, just getting the substring would probably be better. – Joshua Taylor Apr 25 '16 at 15:34
  • Yes, you're right, except that the pattern here is "digits and/or dots, then hyphen", and not "digits, then dot, then hyphen". – evsheino Apr 25 '16 at 15:47
  • Actually, I do think using a regex is a bit safer (unless you're 100% sure you want the first n characters), as it's harder to validate you got it right with the substring. – evsheino Apr 25 '16 at 16:08
  • You're right, I wanted exactly this. The regex catches the first value before the dash and then stores it in ?number. – linous Apr 26 '16 at 16:00