
Documentation for Go's built-in regexp package: https://golang.org/pkg/regexp/
A regex tester for Go: https://regoio.herokuapp.com

I have a list of predefined words:

christmas, santa, tree  (the order is important: check for the words from left to right)

I am trying to check for one of the above words in different URL strings:

/api/container/:containerID/santa           (-> I want santa back)
/api/tree/:containerID/                     (-> I want tree back)
/api/tree/:containerID/christmas            (-> I want christmas back, not tree)

The regex I have tried is:

re := regexp.MustCompile(`^(christmas)|(santa)|(tree)$`)
fmt.Println("santa? ", string(re.Find([]byte(`/api/container/:containerID/santa`))))
// output OK: santa? santa
fmt.Println("tree? ", string(re.Find([]byte(`/api/tree/:containerID/`))))
// output FAIL/EMPTY: tree?
fmt.Println("christmas? ", string(re.Find([]byte(`/api/tree/:containerID/christmas`))))
// output FAIL/EMPTY: christmas?

I have also tried the following, but it returns the whole string, not the words I am looking for:

re := regexp.MustCompile(`^.*(christmas).*|.*(santa).*|.*(tree).*$`)
fmt.Println("santa? ", string(re.Find([]byte(`/api/container/:containerID/santa`))))
// output FAIL/WHOLE URL BACK: santa? /api/container/:containerID/santa
fmt.Println("tree? ", string(re.Find([]byte(`/api/tree/:containerID/`))))
// output FAIL/WHOLE URL BACK: tree? /api/tree/:containerID/
fmt.Println("christmas? ", string(re.Find([]byte(`/api/tree/:containerID/christmas`))))
// output FAIL/WHOLE URL BACK: christmas? /api/tree/:containerID/christmas

I do not know what is wrong with the last expression, because the regex engine should only remember the things inside the parentheses.
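The capture groups are recorded, but `Find` only returns the text of the whole match, and with `.*` on both sides of each group that is the entire URL. In Go's `regexp` package the groups are exposed through the `Submatch` methods, e.g. `FindStringSubmatch`, which returns the whole match at index 0 followed by one entry per group (an empty string for a group that did not take part). A minimal sketch that reuses the second regex above; `firstGroup` is a hypothetical helper name, and like the regex it matches substrings, so a segment such as `street` would also count as `tree`:

```go
package main

import (
	"fmt"
	"regexp"
)

// The second regex from the question: three alternatives, one capture group each.
var re = regexp.MustCompile(`^.*(christmas).*|.*(santa).*|.*(tree).*$`)

// firstGroup (hypothetical helper) returns the text of the first capture group
// that took part in the match, or "" if the regex did not match at all.
func firstGroup(s string) string {
	m := re.FindStringSubmatch(s) // m[0] is the whole match, m[1:] are the groups
	if m == nil {
		return ""
	}
	for _, g := range m[1:] {
		if g != "" {
			return g
		}
	}
	return ""
}

func main() {
	for _, u := range []string{
		`/api/container/:containerID/santa`,
		`/api/tree/:containerID/`,
		`/api/tree/:containerID/christmas`,
	} {
		fmt.Printf("%s -> %q\n", u, firstGroup(u))
	}
}
```

Because Go's `regexp` uses leftmost-first matching and all three alternatives can match from the start of the string, the earlier alternative wins, which is why `christmas` is preferred over `tree` for the third URL.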

Mr. B
  • Are you expecting these words to be entire URL segments, or can they be substring matches? I.e. do you want to match only `/tree/` or also `/street/`? – Jonathan Hall Dec 11 '20 at 11:02
  • Also, can you explain why order of the target words is important? You say it is, but your code doesn't seem to pay any attention to the ordering. – Jonathan Hall Dec 11 '20 at 11:03
  • I want them to be entire URL segments. – Mr. B Dec 11 '20 at 11:05
  • Good. That simplifies things. The easiest solution, then, is to stop using a regular expression (regular expressions are almost always the wrong tool), and instead just split your path into segments and loop through them to see if any match. – Jonathan Hall Dec 11 '20 at 11:06
  • Most of the URLs contain the same words, but not all. So we want to check first for the most specific word, which only some URLs contain, before we check for more and more general cases. I tried to set the order in the regex. – Mr. B Dec 11 '20 at 11:08
  • Regexps are not the right tool for everything. Splitting the URL path and processing it is trivial. – Volker Dec 11 '20 at 11:33

1 Answer

Don't use a regular expression for this task. It's over-complex, hard to reason about (as you now know first-hand), and slow. A much simpler approach is to loop over each path segment and look for a match:

package main

import (
    "fmt"
    "strings"
)

func main() {
    needles := []string{"christmas", "santa", "tree"}
    sampleURL := `/api/container/:containerID/santa`
    for _, part := range strings.Split(sampleURL, "/") { // check every path segment
        for _, needle := range needles {
            if part == needle {
                fmt.Printf("found %s\n", needle)
            }
        }
    }
}
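The loop above reports every matching segment in the order the segments appear in the URL, so `/api/tree/:containerID/christmas` would report both `tree` and `christmas`. To honor the priority order described in the question (check `christmas` first, then `santa`, then `tree`), one option is to swap the two loops and stop at the first hit. A sketch under that assumption; `findFirst` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// findFirst (hypothetical helper) returns the first needle, in the order given,
// that appears as a whole segment of path. The outer loop walks the needles, so
// an earlier needle always wins regardless of where it sits in the path.
func findFirst(path string, needles []string) (string, bool) {
	parts := strings.Split(path, "/")
	for _, needle := range needles {
		for _, part := range parts {
			if part == needle {
				return needle, true
			}
		}
	}
	return "", false
}

func main() {
	needles := []string{"christmas", "santa", "tree"} // most specific first
	for _, u := range []string{
		`/api/container/:containerID/santa`,
		`/api/tree/:containerID/`,
		`/api/tree/:containerID/christmas`,
	} {
		if w, ok := findFirst(u, needles); ok {
			fmt.Printf("%s -> %s\n", u, w)
		}
	}
}
```

Comparing whole segments also means `/street/` is not mistaken for `tree`.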

If you are searching for a lot of words, efficiency may be improved by using a map:

package main

import (
    "fmt"
    "strings"
)

func main() {
    needles := []string{"christmas", "santa", "tree", "reindeer", "bells", "choir", /* and possibly hundreds more */}
    // Build a set so each path segment needs only one map lookup.
    needleMap := make(map[string]struct{}, len(needles))
    for _, needle := range needles {
        needleMap[needle] = struct{}{}
    }

    sampleURL := `/api/container/:containerID/santa`

    for _, part := range strings.Split(sampleURL, "/") {
        if _, ok := needleMap[part]; ok {
            fmt.Printf("found %s\n", part) // report the matching segment
        }
    }
}
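The map version still reports matches in URL order. If the priority order from the question must also hold with a large word list, one option (an assumption, not something the question or comments specify) is to store each word's position in the list as the map value and keep the lowest position seen while scanning the segments once. `firstByPriority` and `rank` are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// firstByPriority (hypothetical helper) splits path once, looks each segment up
// in rank (word -> position in the priority list) and returns the matching word
// with the lowest position, i.e. the one listed first.
func firstByPriority(path string, rank map[string]int) (string, bool) {
	best, bestRank := "", -1
	for _, part := range strings.Split(path, "/") {
		if r, ok := rank[part]; ok && (bestRank == -1 || r < bestRank) {
			best, bestRank = part, r
		}
	}
	return best, bestRank != -1
}

func main() {
	// Most specific word first, as described in the question.
	needles := []string{"christmas", "santa", "tree", "reindeer", "bells", "choir"}
	rank := make(map[string]int, len(needles))
	for i, needle := range needles {
		if _, seen := rank[needle]; !seen { // keep the first position of duplicates
			rank[needle] = i
		}
	}
	fmt.Println(firstByPriority(`/api/tree/:containerID/christmas`, rank)) // prints: christmas true
}
```

Each segment costs one map lookup, so the work grows with the number of path segments rather than with the number of words.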
Jonathan Hall