3

I trying to use this package in Github for string matching. My dictionary is 4 MB. When creating the Trie, I got fatal error: runtime: out of memory. I am using Ubuntu 14.04 with 8 GB of RAM and Golang version 1.4.2.

It seems the error come from the line 99 (now) here : m.trie = make([]node, max)

The program stops at this line.

This is the error:

fatal error: runtime: out of memory

runtime stack:
runtime.SysMap(0xc209cd0000, 0x3b1bc0000, 0x570a00, 0x5783f8)
    /usr/local/go/src/runtime/mem_linux.c:149 +0x98
runtime.MHeap_SysAlloc(0x57dae0, 0x3b1bc0000, 0x4296f2)
    /usr/local/go/src/runtime/malloc.c:284 +0x124
runtime.MHeap_Alloc(0x57dae0, 0x1d8dda, 0x10100000000, 0x8)
    /usr/local/go/src/runtime/mheap.c:240 +0x66

goroutine 1 [running]:
runtime.switchtoM()
    /usr/local/go/src/runtime/asm_amd64.s:198 fp=0xc208518a60 sp=0xc208518a58
runtime.mallocgc(0x3b1bb25f0, 0x4d7fc0, 0x0, 0xc20803c0d0)
    /usr/local/go/src/runtime/malloc.go:199 +0x9f3 fp=0xc208518b10 sp=0xc208518a60
runtime.newarray(0x4d7fc0, 0x3a164e, 0x1)
    /usr/local/go/src/runtime/malloc.go:365 +0xc1 fp=0xc208518b48 sp=0xc208518b10
runtime.makeslice(0x4a52a0, 0x3a164e, 0x3a164e, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/slice.go:32 +0x15c fp=0xc208518b90 sp=0xc208518b48
github.com/mf/ahocorasick.(*Matcher).buildTrie(0xc2083c7e60, 0xc209860000, 0x26afb, 0x2f555)
    /home/go/ahocorasick/ahocorasick.go:104 +0x28b fp=0xc208518d90 sp=0xc208518b90
github.com/mf/ahocorasick.NewStringMatcher(0xc208bd0000, 0x26afb, 0x2d600, 0x8)
    /home/go/ahocorasick/ahocorasick.go:222 +0x34b fp=0xc208518ec0 sp=0xc208518d90
main.main()
    /home/go/seme/substrings.go:66 +0x257 fp=0xc208518f98 sp=0xc208518ec0
runtime.main()
    /usr/local/go/src/runtime/proc.go:63 +0xf3 fp=0xc208518fe0 sp=0xc208518f98
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208518fe8 sp=0xc208518fe0
exit status 2

This is the content of the main function (taken from the same repo: test file)

var dictionary = InitDictionary()   
var bytes = []byte(""Partial invoice (€100,000, so roughly 40%) for the consignment C27655 we shipped on 15th August to London from the Make Believe Town depot. INV2345 is for the balance.. Customer contact (Sigourney) says they will pay this on the usual credit terms (30 days).")   

var precomputed = ahocorasick.NewStringMatcher(dictionary)// line 66 here
fmt.Println(precomputed.Match(bytes))
David
  • 311
  • 1
  • 4
  • 14
  • What do you have at line 66 in your program? – squiguy May 31 '16 at 17:24
  • 2
    BTW, in general, questions need to include enough information to be answerable *in the question itself*; a question that folks can only answer after following a github link is technically against the rules here, though since you've already got some good answers this isn't likely to be closed. – Charles Duffy May 31 '16 at 17:49
  • I had a similar issue, but at 256GB. Turns out it was a linux kernel setting https://stackoverflow.com/questions/41268091/increase-maximum-virtual-memory-size-above-256gb – Josh Hibschman Aug 25 '23 at 14:36

3 Answers3

12

Your structure is awfully inefficient in terms of memory, let's look at the internals. But before that, a quick reminder of the space required for some go types:

  • bool: 1 byte
  • int: 4 bytes
  • uintptr: 4 bytes
  • [N]type: N*sizeof(type)
  • []type: 12 + len(slice)*sizeof(type)

Now, let's have a look at your structure:

type node struct {
    root bool        // 1 byte
    b []byte         // 12 + len(slice)*1
    output bool      // 1 byte
    index int        // 4 bytes
    counter int      // 4 bytes
    child [256]*node // 256*4 = 1024 bytes
    fails [256]*node // 256*4 = 1024 bytes
    suffix *node     // 4 bytes
    fail *node       // 4 bytes
}

Ok, you should have a guess of what happens here: each node weighs more than 2KB, this is huge ! Finally, we'll look at the code that you use to initialize your trie:

func (m *Matcher) buildTrie(dictionary [][]byte) {
    max := 1
    for _, blice := range dictionary {
        max += len(blice)
    }
    m.trie = make([]node, max)

    // ...
}

You said your dictionary is 4 MB. If it is 4MB in total, then it means that at the end of the for loop, max = 4MB. It it holds 4 MB different words, then max = 4MB*avg(word_length).

We'll take the first scenario, the nicest one. You are initializing a slice of 4M of nodes, each of which uses 2KB. Yup, that makes a nice 8GB necessary.

You should review how you build your trie. From the wikipedia page related to the Aho-Corasick algorithm, each node contains one character, so there is at most 256 characters that go from the root, not 4MB.

Some material to make it right: https://web.archive.org/web/20160315124629/http://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf

fatal_error
  • 5,457
  • 2
  • 18
  • 18
T. Claverie
  • 11,380
  • 1
  • 17
  • 28
5

The node type has a memory size of 2084 bytes. I wrote a litte program to demonstrate the memory usage: https://play.golang.org/p/szm7AirsDB

As you can see, the three strings (11(+1) bytes in size) dictionary := []string{"fizz", "buzz", "123"} require 24 MB of memory.

If your dictionary has a length of 4 MB you would need about 4000 * 2084 = 8.1 GB of memory.

So you should try to decrease the size of your dictionary.

stonith
  • 166
  • 5
1

Set resource limit to unlimited worked for me

if ulimit -a return 0 run ulimit -c unlimited

Maybe set a real size limit to be more secure