
I am implementing a cache in Go. Let's say the cache could be implemented as a sync.Map with an integer key and a struct value:

type value struct {
    fileName     string
    functionName string
}

A huge number of records share the same fileName and functionName. To save memory I want to use a string pool. Go strings are immutable, and my idea looks like this:

var (
    cache      sync.Map
    stringPool sync.Map
)

type value struct {
    fileName     string
    functionName string
}

func addRecord(key int64, val value) {
    fileName, _ := stringPool.LoadOrStore(val.fileName, val.fileName)
    val.fileName = fileName.(string)
    functionName, _ := stringPool.LoadOrStore(val.functionName, val.functionName)
    val.functionName = functionName.(string)
    cache.Store(key, val)
}

My idea is to keep every unique string (fileName and functionName) in memory only once. Will it work?

Cache implementation must be concurrent safe. The number of records in the cache is about 10^8. The number of records in the string pool is about 10^6.

I have some logic that removes records from the cache. There is no problem with main cache size.

Could you please suggest how to manage string pool size?

I am thinking about storing a reference count for every record in the string pool. That would require additional synchronization, or probably global locks, to maintain. I would like the implementation to be as simple as possible; as you can see in my code snippet, I don't use any additional mutexes.

Or maybe I need to follow a completely different approach to minimize memory usage for my cache?

CAFxX
dhythhsba

1 Answer


What you are trying to do with stringPool is commonly known as string interning. There are libraries like github.com/josharian/intern that provide "good enough" solutions to that kind of problem, and that do not require you to manually maintain the stringPool map. Note that no solution (including yours, assuming you eventually remove some elements from stringPool) can reliably deduplicate 100% of strings without incurring impractical levels of CPU overhead.
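As for managing the pool's size, one simple alternative to per-string reference counting is a generational reset: periodically drop the whole pool. Strings already handed out stay alive for as long as cache entries reference them, and later lookups just re-intern them into the fresh pool. A minimal sketch (the `internPool` type and its method names are illustrative, not from any library):

```go
package main

import (
	"fmt"
	"sync"
)

// internPool deduplicates strings. Its size is bounded not by reference
// counting but by Reset, which discards the whole pool at once.
type internPool struct {
	mu sync.Mutex
	m  map[string]string
}

func newInternPool() *internPool {
	return &internPool{m: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing s if it is new.
func (p *internPool) Intern(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.m[s]; ok {
		return c
	}
	p.m[s] = s
	return s
}

// Reset discards the pool, e.g. on a timer or when len(p.m) crosses a
// threshold. Strings still referenced by cache entries remain valid;
// this trades a little re-interning work for a hard size bound.
func (p *internPool) Reset() {
	p.mu.Lock()
	p.m = make(map[string]string)
	p.mu.Unlock()
}

func main() {
	p := newInternPool()
	a := p.Intern("main.go")
	b := p.Intern("main.go")
	fmt.Println(a == b) // both calls return the same pooled string
	p.Reset()
}
```

With ~10^6 pooled strings, even a full reset every few minutes costs far less than maintaining accurate per-string counts on every cache insert and delete.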

As a side note, it's worth pointing out that sync.Map is not really designed for update-heavy workloads. Depending on the keys used, you may actually experience significant contention when calling cache.Store. Furthermore, since sync.Map relies on interface{} for both keys and values, it normally incurs many more allocations than a plain map. Make sure to benchmark with realistic workloads to ensure that you picked the right approach.
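One common alternative worth benchmarking against sync.Map is a sharded plain map: each shard has its own mutex, so writers to different shards never contend, and the typed map avoids the interface{} boxing allocations of sync.Map. A sketch (the `shardedCache` type and `numShards` constant are illustrative choices, not from any library):

```go
package main

import (
	"fmt"
	"sync"
)

const numShards = 64 // power of two; tune to the level of write concurrency

type value struct {
	fileName     string
	functionName string
}

type shard struct {
	mu sync.Mutex
	m  map[int64]value
}

// shardedCache spreads keys across independently locked plain maps,
// avoiding both global contention and interface{} boxing.
type shardedCache struct {
	shards [numShards]shard
}

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[int64]value)
	}
	return c
}

func (c *shardedCache) Store(key int64, v value) {
	s := &c.shards[uint64(key)%numShards]
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (c *shardedCache) Load(key int64) (value, bool) {
	s := &c.shards[uint64(key)%numShards]
	s.mu.Lock()
	v, ok := s.m[key]
	s.mu.Unlock()
	return v, ok
}

func main() {
	c := newShardedCache()
	c.Store(1, value{fileName: "main.go", functionName: "main"})
	v, ok := c.Load(1)
	fmt.Println(ok, v.fileName)
}
```

Whether this beats sync.Map for your ~10^8-entry workload depends on the read/write mix and key distribution, so measure both under realistic load.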

CAFxX