
I'm trying to parse a file with lines that consist of a key, a space, a number and then a newline.

My code works, but it doesn't smell right to me. Is there a better way to use Scanner? Particularly, I don't like having the Scan() inside the for-loop without any protection on it.

func TestScanner(t *testing.T) {
    const input = `key1 62128128\n
key2 8337182720\n
key3 7834959872\n
key4 18001920\n
key5 593104896\n`
    scanner := bufio.NewScanner(strings.NewReader(input))
    scanner.Split(bufio.ScanWords)
    for scanner.Scan() {
        key := scanner.Text()
        scanner.Scan()
        value := scanner.Text();
        fmt.Printf("k: %v, v: %v\n", key, value)
    }
}
adapt-dev
    I wouldn't split on ScanWords. It seems more idiomatic to split on newlines, since that guarantees one key/value grouping per iteration, and then call `strings.Split()` within the iteration. By idiomatic I mean easier for others to read and to understand what the logic is doing. – eduncan911 Jun 25 '16 at 11:58
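
A minimal sketch of the line-based approach suggested in the comment above, assuming each line holds exactly one key and one integer. `parseLine` is a hypothetical helper, and `strings.Fields` is swapped in for `strings.Split` because it tolerates repeated spaces; neither choice comes from the original post.

```go
package main

import (
    "bufio"
    "fmt"
    "strconv"
    "strings"
)

// parseLine splits one "key value" line into its two parts.
// Hypothetical helper for illustration only.
func parseLine(line string) (string, int64, error) {
    fields := strings.Fields(line)
    if len(fields) != 2 {
        return "", 0, fmt.Errorf("want 2 fields, got %d in %q", len(fields), line)
    }
    // int64 is used so that large values parse on every platform.
    v, err := strconv.ParseInt(fields[1], 10, 64)
    if err != nil {
        return "", 0, err
    }
    return fields[0], v, nil
}

func main() {
    const input = "key1 62128128\nkey2 8337182720\n"
    // The default split function is bufio.ScanLines: one line per Scan().
    scanner := bufio.NewScanner(strings.NewReader(input))
    for scanner.Scan() {
        key, value, err := parseLine(scanner.Text())
        if err != nil {
            fmt.Println("skipping line:", err)
            continue
        }
        fmt.Printf("k: %v, v: %v\n", key, value)
    }
    if err := scanner.Err(); err != nil {
        fmt.Println("read error:", err)
    }
}
```

With one line per iteration there is only a single Scan() call in the loop, so a malformed line cannot shift keys and values out of step.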

2 Answers

3

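Do not write \n in the input: inside a raw (backtick) string literal, \n is a literal backslash followed by n, not a newline, and the literal already contains real line breaks. Also, always check for errors, including the result of the second Scan() call.
Working sample code: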

package main

import (
    "bufio"
    "fmt"
    "strings"
)

func main() {
    const input = `key1 62128128
key2 8337182720
key3 7834959872
key4 18001920
key5 593104896`
    scanner := bufio.NewScanner(strings.NewReader(input))
    scanner.Split(bufio.ScanWords)
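    for scanner.Scan() {
        key := scanner.Text()
        // Stop if a key has no value after it (end of input or a read error).
        if !scanner.Scan() {
            break
        }
        value := scanner.Text()
        fmt.Printf("k: %v, v: %v\n", key, value)
    }
    // Err() is nil at a normal end of input and non-nil on a real failure.
    if err := scanner.Err(); err != nil {
        fmt.Println("scan error:", err)
    }
}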

output:

k: key1, v: 62128128
k: key2, v: 8337182720
k: key3, v: 7834959872
k: key4, v: 18001920
k: key5, v: 593104896  
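
To see why the \n in the question's input matters, here is a small sketch that compares a raw string literal with an interpreted one. `words` is a hypothetical helper that collects every token `bufio.ScanWords` produces; it is not part of the original code.

```go
package main

import (
    "bufio"
    "fmt"
    "strings"
)

// words returns every token that bufio.ScanWords finds in s.
// Hypothetical helper for illustration only.
func words(s string) []string {
    var out []string
    sc := bufio.NewScanner(strings.NewReader(s))
    sc.Split(bufio.ScanWords)
    for sc.Scan() {
        out = append(out, sc.Text())
    }
    return out
}

func main() {
    raw := `key1 62128128\n`         // backtick literal: the backslash and the n are kept as two characters
    interpreted := "key1 62128128\n" // double-quoted literal: \n is a single newline character
    fmt.Printf("%q\n", words(raw))
    fmt.Printf("%q\n", words(interpreted))
}
```

In the raw version the value token still ends with a backslash and an n, so it would not parse as a number; in the interpreted version the newline is treated as whitespace and dropped.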

You can also use fmt.Fscan, which parses each value directly into a variable of the desired type:

package main

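import (
    "fmt"
    "io"
    "strings"
)

func main() {
    const input = `key1 62128128
key2 8337182720
key3 7834959872
key4 18001920
key5 593104896`
    rdr := strings.NewReader(input)
    for {
        // v is an int, which must be 64 bits wide to hold values like 8337182720.
        k, v := "", 0
        // Fscan treats newlines as spaces and reports in err why it stopped.
        if _, err := fmt.Fscan(rdr, &k, &v); err != nil {
            if err != io.EOF {
                fmt.Println(err) // malformed or incomplete pair
            }
            break
        }
        fmt.Printf("%T: %[1]v, %T: %[2]v\n", k, v)
    }
}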

output:

string: key1, int: 62128128
string: key2, int: 8337182720
string: key3, int: 7834959872
string: key4, int: 18001920
string: key5, int: 593104896
  • Thanks. Interestingly, Scanner is about 5x-10x faster than Fscan on my system, repeatedly doing 100k line files. – adapt-dev Jun 27 '16 at 21:20
  • @adapt-dev: if you are looking for faster input scanning see this: http://stackoverflow.com/questions/31333353/faster-input-scanning –  Jun 28 '16 at 05:47
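
A rough way to reproduce the comparison from the comment above is to time both approaches on the same synthetic input. This is only a sketch: `countWithScanner` and `countWithFscan` are hypothetical helpers, the 100,000-line input is generated rather than read from a file, and a single wall-clock measurement is far less reliable than a proper `go test -bench` benchmark.

```go
package main

import (
    "bufio"
    "fmt"
    "strings"
    "time"
)

// countWithScanner counts key/value pairs using bufio.Scanner with ScanWords.
func countWithScanner(input string) int {
    sc := bufio.NewScanner(strings.NewReader(input))
    sc.Split(bufio.ScanWords)
    n := 0
    for sc.Scan() {
        if !sc.Scan() {
            break // key without a value
        }
        n++
    }
    return n
}

// countWithFscan counts key/value pairs using fmt.Fscan.
func countWithFscan(input string) int {
    rdr := strings.NewReader(input)
    n := 0
    for {
        var k string
        var v int64
        if _, err := fmt.Fscan(rdr, &k, &v); err != nil {
            break // io.EOF or a malformed pair
        }
        n++
    }
    return n
}

func main() {
    // Build a synthetic input of 100,000 "key value" lines.
    var sb strings.Builder
    for i := 0; i < 100000; i++ {
        fmt.Fprintf(&sb, "key%d %d\n", i, i)
    }
    input := sb.String()

    start := time.Now()
    a := countWithScanner(input)
    scannerTime := time.Since(start)

    start = time.Now()
    b := countWithFscan(input)
    fscanTime := time.Since(start)

    fmt.Printf("Scanner: %d pairs in %v\n", a, scannerTime)
    fmt.Printf("Fscan:   %d pairs in %v\n", b, fscanTime)
}
```

The timings depend on the machine and the Go version, so treat any ratio you see as indicative only.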
1

Actually, it is perfectly safe to do that: when Scan() cannot produce another token it returns false and records any error, which you can retrieve with Err() (reaching the end of the input is not reported as an error).

So if you want to check whether Scan() failed, do it after the loop, as many examples show.

Your code should be:

func TestScanner(t *testing.T) {
    const input = `key1 62128128
key2 8337182720
key3 7834959872
key4 18001920
key5 593104896`
    scanner := bufio.NewScanner(strings.NewReader(input))
    scanner.Split(bufio.ScanWords)
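    for scanner.Scan() {
        key := scanner.Text()
        // At end of input this Scan() returns false and value is empty.
        scanner.Scan()
        value := scanner.Text()
        fmt.Printf("k: %v, v: %v\n", key, value)
    }

    // Err() returns nil at end of input, so this only reports real failures.
    if err := scanner.Err(); err != nil {
        fmt.Printf("Invalid input: %s\n", err)
    }
}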
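
To illustrate why the Err() check after the loop matters, here is a sketch that forces a scanner error by giving the scanner a deliberately tiny buffer. `scanPairs` and its `maxToken` parameter are hypothetical and exist only for this demonstration; `Scanner.Buffer` and `bufio.ErrTooLong` are part of the standard library.

```go
package main

import (
    "bufio"
    "fmt"
    "strings"
)

// scanPairs reads "key value" tokens and returns the complete pairs
// together with whatever scanner.Err() reports after the loop.
// maxToken caps the scanner's buffer so that long tokens fail.
func scanPairs(input string, maxToken int) ([][2]string, error) {
    sc := bufio.NewScanner(strings.NewReader(input))
    sc.Buffer(make([]byte, 0, maxToken), maxToken)
    sc.Split(bufio.ScanWords)

    var pairs [][2]string
    for sc.Scan() {
        key := sc.Text()
        if !sc.Scan() {
            break // no value: end of input or an error, Err() tells which
        }
        pairs = append(pairs, [2]string{key, sc.Text()})
    }
    return pairs, sc.Err()
}

func main() {
    // The second value is longer than the 16-byte buffer allows.
    input := "key1 62128128 key2 " + strings.Repeat("9", 64)
    pairs, err := scanPairs(input, 16)
    fmt.Println("pairs:", pairs)
    fmt.Println("err:", err)
}
```

Without the final Err() call, the loop would simply stop after the first pair and the oversized token would go unnoticed.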
Mario Santini