
I am parsing an Apache log that I have customised to give me only two values, "time" and "memory" (a number of milliseconds and a number of bytes), both of which fit in an int64 or float64. I am using regexp in Go to parse the file, but when I match each line the result prints as "[]" (empty brackets) and the slice is not populated. My code is:

for _, line := range lines {
    var buffer bytes.Buffer

    buffer.WriteString(`\[0-9]+\s`)
    buffer.WriteString(`[0-9]+\s`)
    re1, err := regexp.Compile(buffer.String())

    if err != nil {
        log.Fatalf("regexp: %s", err)
    }
    result := re1.FindStringSubmatch(line)
    fmt.Println(result)
}

When I print result, I get empty brackets, and when I run the whole program, it panics with index out of range (which is understandable, because result is empty).

My data looks like this:

1040 3952
2849 6832
Jonathan Hall
azee
  • You have `\[0-9]+\s[0-9]+\s` regex, the first `[` should not be escaped. You are using `FindStringSubmatch` but your regex has no capturing groups. What is the expected result? – Wiktor Stribiżew Apr 03 '19 at 06:59
  • Don't use a regex for this. Just split the input on whitespace. Regex is slow, cumbersome, and hard to read for such a trivial example. – Jonathan Hall Apr 03 '19 at 07:46
  • If you need to only match lines with 2 numbers on them you may use ``regexp.Compile(`(?m)^([0-9]+)\s+([0-9]+)$`)``, see [this demo](https://play.golang.org/p/nE0j5dwlGpR) – Wiktor Stribiżew Apr 03 '19 at 07:49

1 Answer


Regexp is entirely the wrong tool for this job. It will be much easier to read, and much faster to operate, to just use strings.Split or strings.Fields:

for _, line := range lines {
    fields := strings.Fields(line)
    ms := fields[0]
    size := fields[1]
    fmt.Printf("time: %v, size: %v\n", ms, size)
}

If you want to convert these to numbers, you can easily do so with the strconv package, with the additional benefit that it will detect unexpected (non-numeric) input:

for _, line := range lines {
    fields := strings.Fields(line)
    ms, err := strconv.Atoi(fields[0])
    if err != nil {
        log.Fatalf("time field: %s", err)
    }
    size, err := strconv.Atoi(fields[1])
    if err != nil {
        log.Fatalf("size field: %s", err)
    }
    fmt.Printf("time: %v, size: %v\n", ms, size)
}
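
Since the question mentions int64, here is a rough, self-contained sketch of the same idea using strconv.ParseInt, with a guard on the number of fields so that a short or blank line does not cause an index out of range panic. The parseLine helper and its signature are my own invention for illustration, and the sample lines are the two from the question:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine is a hypothetical helper (not from the original post).
// It splits a "time memory" line on whitespace and converts both
// fields to int64, returning an error instead of panicking on bad input.
func parseLine(line string) (ms, size int64, err error) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	ms, err = strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("time field: %w", err)
	}
	size, err = strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("size field: %w", err)
	}
	return ms, size, nil
}

func main() {
	// Sample data taken from the question.
	lines := []string{"1040 3952", "2849 6832"}
	for _, line := range lines {
		ms, size, err := parseLine(line)
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		fmt.Printf("time: %d, size: %d\n", ms, size)
	}
}
```

Whether to skip a bad line or abort with log.Fatalf, as above, depends on how much you trust the log format.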


If you do insist on using a regular expression, at least compile it only once, outside of your for loop:

re, err := regexp.Compile( ... )
if err != nil {
    log.Fatalf("regexp: %s", err)
}
for _, line := range lines {
    result := re.FindStringSubmatch(line)
    fmt.Println(result)
}
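
For completeness, a minimal sketch of what the regexp route could look like, assuming the pattern suggested in the comments under the question (without the `(?m)` flag, since each line is matched on its own). FindStringSubmatch only returns useful submatches when the pattern has capturing groups, and it returns nil when nothing matches, so check for that before indexing. The matchLine helper is a name I made up for this example:

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRe is compiled once, at package level, rather than inside the loop.
// The pattern is the one from the comments on the question: two runs of
// digits separated by whitespace, each in its own capturing group.
var lineRe = regexp.MustCompile(`^([0-9]+)\s+([0-9]+)$`)

// matchLine is a hypothetical helper (not from the original post).
// It returns the two captured fields, or ok == false if the line
// does not match.
func matchLine(line string) (ms, size string, ok bool) {
	m := lineRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	// m[0] is the whole match; m[1] and m[2] are the capturing groups.
	return m[1], m[2], true
}

func main() {
	for _, line := range []string{"1040 3952", "2849 6832", "not a data line"} {
		ms, size, ok := matchLine(line)
		if !ok {
			fmt.Printf("no match: %q\n", line)
			continue
		}
		fmt.Printf("time: %s, size: %s\n", ms, size)
	}
}
```

Note that this pattern is anchored, so a line with trailing whitespace will not match unless you trim it first or relax the pattern.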
Jonathan Hall