
I am working on a new type of database in Go. One of the things I would like to do is have a distributed disk so that I can distribute queries over multiple machines (think Raspberry Pi-style architectures). This means building my own structures on raw disk.

My challenge is that I can't find a Go package that will let me write N bytes from a pointer to a structure. All the IO packages limit access to []byte slices.

That's nice for protection, but if I have to buffer everything through a byte array via some form of encoding, it will slow down access to a specific object.

Has anyone got any idea how to do raw IO? Or am I going to have to use gobs as my unit of IO and suffer the encoding/decoding penalty?

3 Answers


Big warning first: don't do it. It is neither safe nor portable.

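For a given struct, you can determine the in-memory size of the actual struct (via reflect, or with unsafe.Sizeof, which is a compile-time constant), then use unsafe to reinterpret the struct's memory as a byte array. Slicing that array with [:] gives you a []byte the io packages will accept.

e.g.: (*[unsafe.Sizeof(mystruct)]byte)(unsafe.Pointer(&mystruct))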

This gives you something C-like, with absolutely no safety or portability guarantees.
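A minimal sketch of that cast, assuming a hypothetical Header struct that holds only fixed-width numeric fields (no strings, slices, maps, or pointers). The byte view aliases the struct, so no copy is made, but the byte order and any padding are whatever the target architecture happens to use:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Header is a hypothetical fixed-size record containing no pointers.
type Header struct {
	ID    uint64
	Flags uint32
	Count uint32
}

// headerSize is a compile-time constant, so it can be used as an array length.
const headerSize = unsafe.Sizeof(Header{})

// headerBytes reinterprets h's memory as a byte array. It does not copy:
// writes to h are visible through the returned array and vice versa.
func headerBytes(h *Header) *[headerSize]byte {
	return (*[headerSize]byte)(unsafe.Pointer(h))
}

func main() {
	h := Header{ID: 42, Flags: 1, Count: 7}
	b := headerBytes(&h)
	// b[:] is a []byte that could be passed to an io.Writer.
	fmt.Println(len(b), b[:])
}
```

On Go 1.17 and later, unsafe.Slice((*byte)(unsafe.Pointer(&h)), headerSize) produces the same aliasing []byte directly.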

I'll quote the Go spec:

A package using unsafe must be vetted manually for type safety and may not be portable.

You can find a lot more details in this Go and Memory layout post, including all the steps you need to unsafely treat structs as just bytes.

Overall, it's fascinating to examine how Go functions on a low level, but this is absolutely the wrong thing to do in your case. Any real data infrastructure will need storage logic way more complicated than just dumping in-memory structs to disk anyway.

Marc
  • Oh, I do agree. For reference, I used to write device drivers on VMS in days long ago, and I know that there is a lot more to it than dumping raw values. I was involved in writing a driver to make a Cray X-MP appear as a peripheral on the VAX. I miss my low-level stuff! I just wanted to get started on some simple test cases and hit a brick wall. I know it's all 'unsafe', but I am writing for a specific use case on a specific hardware configuration. Portability is not my issue. Thanks for the reply, and so fast. – Adrian Challinor Jan 06 '18 at 19:57
  • The portability is important, btw. The size of those structures will change depending on 32 or 64-bit architecture, and the ordering and padding of those structures is not guaranteed to be the same between Go versions or the target OS/architecture for the compilation. This really is a terribly bug-prone practice. There's a _reason_ Go doesn't let you muck with memory directly or do C things like pointer arithmetic. It causes super brittle and buggy code that's insanely difficult to debug. – Kaedys Jan 08 '18 at 21:02

In general, you cannot do raw IO of a Go struct (i.e. memdump). This is because many things in Go contain pointers, and the actual data is not contiguous in memory.

For example, a struct like this:

type Person struct {
    Name string
}

contains a string, which is itself a small header holding a pointer to the string's bytes plus a length. A raw memdump would only dump that header, not the bytes it points to.

The solution is serialization. This is never free, although some implementations do a pretty good job.

The closest to what you are describing is something like go-memdump, but I wouldn't recommend it for production.

Otherwise, I recommend looking at a performant serialization technique. (Go's gob encoding is not the best.)

chowey

...Or am I going to have to handle GOBs as my unit of IO and suffer the penalty for encoding/decoding?

Just use gobs. Premature optimization is the root of all evil.
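For reference, a minimal gob round trip looks like this. Unlike a raw memdump, gob follows the string's pointer and encodes the actual characters. The roundTrip helper is a hypothetical name for this sketch. When writing many values of the same type, reusing a single gob.Encoder on one stream helps, because gob transmits each type's description only once per stream:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Person mirrors the struct from the earlier answer.
type Person struct {
	Name string
}

// roundTrip encodes p with gob into a buffer and decodes it back.
func roundTrip(p Person) (Person, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		return Person{}, err
	}
	var out Person
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Person{Name: "Adrian"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Name) // Adrian
}
```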

Amnon