
I have an application that loads a byte array of several gigabytes. I don't have control over the binary format. The program spends most of its time converting sections of the array into strings, doing string manipulation, and then releasing all of the strings. It occasionally runs out of memory when large numbers of clients trigger large numbers of object allocations at once.

Given that the byte array lives in memory for the entire life of the application, it seems like an ideal candidate for using the unsafe package to avoid memory allocation.

Just testing this out in the Go playground, it appears a "StringHeader" is needed to generate an actual string. But this means a "StringHeader" must still be allocated every time a string needs to be returned (i.e. the "x" variable in this example).

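package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    // Stand-in for the large byte array: the ASCII bytes 'A' through 'U'.
    t := []byte{
        65, 66, 67, 68, 69, 70,
        71, 72, 73, 74, 75, 76,
        77, 78, 79, 80, 81, 82,
        83, 84, 85,
    }
    // Fixed pool of string headers, meant to be reused rather than allocated per string.
    var x [10]reflect.StringHeader

    // Point header 0 at the 4 bytes starting at t[8].
    h := (*reflect.StringHeader)(unsafe.Pointer(&x[0]))
    h.Len = 4
    h.Data = uintptr(unsafe.Pointer(&t[8]))

    fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[0])))

    // Point header 1 at the 4 bytes starting at t[3].
    h = (*reflect.StringHeader)(unsafe.Pointer(&x[1]))
    h.Len = 4
    h.Data = uintptr(unsafe.Pointer(&t[3]))

    fmt.Printf("test %v\n", *(*string)(unsafe.Pointer(&x[1])))
}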

I could probably attach a fixed-length array of string headers to each client when it connects to the server (the array would be recycled when new clients connect).

This means that: 1. string data would no longer be copied around; 2. string headers would not be allocated or garbage collected; and 3. we would know the maximum number of clients per server, because each client has a fixed, hardcoded number of string headers available when pulling out strings.

Am I on track, or crazy? Let me know. Thanks.

Jay
  • We don't know much about your use case, but you may just do byte slice operations instead of string operations. Do you really need to convert to `string`? What is it you need in the form of a string that you can't do on `[]byte`? The `bytes` package mirrors `strings`, giving you all the utilities. – icza Jan 18 '22 at 07:06
  • I like how you are thinking. The strings are also going into third-party code that requires strings, and are then formatted as JSON. I'm not sure if I can make it work... maybe... – Jay Jan 18 '22 at 07:13
  • You can send a byte slice to the output just like you can send a `string`; the former might be even faster. Marshaling into JSON also generates a `[]byte` and not a `string`, so I still don't see any issue. – icza Jan 18 '22 at 07:14
  • JSON is only part of it. One of the things that without fail causes an OOM on a server is when Google crawls the PDF API: millions of tiny objects are generated in memory. Stopping all of those tiny strings going into the PDF library will help somewhat. (Although I haven't benchmarked that, I'm still exploring the unsafe package to see what is possible.) – Jay Jan 18 '22 at 07:18

1 Answer

Use the following function to convert a byte slice to a string without allocation:

func btos(p []byte) string {
    return *(*string)(unsafe.Pointer(&p))
}

The function takes advantage of the fact that the memory layout for a string header is a prefix of the memory layout for a slice header.

Do not modify the backing array of the slice after calling this function; doing so breaks the assumption that strings are immutable.
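
To make the warning concrete, here is a small sketch of what goes wrong when the backing array is modified afterwards. The `aliasDemo` function is a hypothetical name introduced only for this illustration, and the behavior shown is what the standard gc toolchain does in practice rather than something the language guarantees.

```go
package main

import (
	"fmt"
	"unsafe"
)

// btos converts a byte slice to a string without copying the data.
func btos(p []byte) string {
	return *(*string)(unsafe.Pointer(&p))
}

// aliasDemo (hypothetical) returns the contents seen through the string
// before and after the underlying bytes are changed.
func aliasDemo() (before, after string) {
	buf := []byte("hello")
	s := btos(buf)

	// Take an independent copy of the current contents for comparison.
	before = string(append([]byte(nil), buf...))

	// s shares memory with buf, so this write is visible through s.
	buf[0] = 'j'
	after = s
	return before, after
}

func main() {
	before, after := aliasDemo()
	fmt.Println(before, after)
}
```

Code that relies on strings never changing (map keys, cached values, comparisons) can misbehave once this happens, which is why the technique fits best when the byte array is effectively read-only.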

Use the function like this:

t := []byte{
    65, 66, 67, 68, 69, 70,
    71, 72, 73, 74, 75, 76,
    77, 78, 79, 80, 81, 82,
    83, 84, 85,
}
s := btos(t[8:12])
fmt.Printf("test %v\n", s) // prints test IJKL

s = btos(t[3:7])
fmt.Printf("test %v\n", s) // prints test DEFG
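
As a possible alternative on Go 1.20 or later, the `unsafe` package provides `unsafe.String` and `unsafe.SliceData`, which express the same conversion without relying on the header layouts. The sketch below assumes such a Go version; `btosNew` is a hypothetical name chosen here to distinguish it from the function above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// btosNew (hypothetical name) returns a string that shares memory with p.
// Requires Go 1.20+ for unsafe.String and unsafe.SliceData.
// The same rule applies: do not modify p's backing array afterwards.
func btosNew(p []byte) string {
	return unsafe.String(unsafe.SliceData(p), len(p))
}

func main() {
	t := []byte("ABCDEFGHIJKLMNOPQRSTU")
	fmt.Printf("test %v\n", btosNew(t[8:12])) // prints test IJKL
	fmt.Printf("test %v\n", btosNew(t[3:7]))  // prints test DEFG
}
```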
  • Wow, I just tried this with a section of our code, and it benchmarks _20x faster_ when we use your code instead of `string(data[x:x+l])`. I am now wondering if this alone might solve the memory issues. Thanks! – Jay Jan 18 '22 at 08:00
  • I am wondering why this is so much faster. Is the difference simply that it avoids memory copy operations? Hmm, I still have more to learn about Go memory management and performance. Thank you. – Jay Jan 18 '22 at 08:02
  • `btos(t[8:12])` simply returns a string header copied from the slice header. `string(t[8:12])` allocates memory for the string data, copies the data from the byte slice to the newly allocated memory, and returns the string header that references the memory. –  Jan 18 '22 at 16:36
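
One way to check the allocation difference described in the last comment is `testing.AllocsPerRun`, which can be called from an ordinary program. This is a rough sketch, not a benchmark: the `countAllocs` function, the `sink` variable, and the 64-byte slice length are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

// btos converts a byte slice to a string without copying the data.
func btos(p []byte) string {
	return *(*string)(unsafe.Pointer(&p))
}

// sink is package-level so the compiler cannot discard the results.
var sink string

// countAllocs (hypothetical) reports the average number of heap allocations
// per conversion for the copying and the non-copying approach.
func countAllocs(data []byte) (copying, aliasing float64) {
	copying = testing.AllocsPerRun(1000, func() {
		sink = string(data[8:72]) // allocates and copies 64 bytes
	})
	aliasing = testing.AllocsPerRun(1000, func() {
		sink = btos(data[8:72]) // only copies the header
	})
	return copying, aliasing
}

func main() {
	data := make([]byte, 1024) // stand-in for the large array
	copying, aliasing := countAllocs(data)
	fmt.Printf("string(...): %v allocs/op, btos(...): %v allocs/op\n", copying, aliasing)
}
```

For timing rather than allocation counts, a regular `go test -bench` benchmark on the real data would give more reliable numbers.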