
I want to build a buffer in Go that supports multiple concurrent readers and one writer. Whatever is written to the buffer should be read by all readers. New readers are allowed to drop in at any time, which means already written data must be able to be played back for late readers.

The buffer should satisfy the following interface:

type MyBuffer interface {
    Write(p []byte) (n int, err error)
    NextReader() io.Reader
}

Do you have any suggestions for such an implementation, preferably using built-in types?
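
To make the intended semantics concrete, here is a rough usage sketch (newMyBuffer is a hypothetical constructor for whatever implementation is suggested, "io" and "fmt" are assumed to be imported, and I am assuming readers report io.EOF once they have caught up):

buf := newMyBuffer() // hypothetical constructor, not an existing function

buf.Write([]byte("hello "))
r1 := buf.NextReader() // early reader

buf.Write([]byte("world"))
r2 := buf.NextReader() // late reader: should still see "hello world"

b1, _ := io.ReadAll(r1) // assuming readers report io.EOF once caught up
b2, _ := io.ReadAll(r2)
fmt.Println(string(b1), string(b2)) // both print "hello world"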

  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying https://kafka.apache.org/intro – dm03514 Jun 01 '17 at 15:21
  • Nothing in the standard library will do this. You can use a custom structure built around channels, though, where each reader echoes what it reads back to the channel so other readers can read it. The problem is defining where the limit is. You want to be able to play back old data for late readers, but that means retaining all of that data forever (well, for the life of the program), since you never know when a new reader will join. That's a large memory-leak risk. – Kaedys Jun 01 '17 at 15:26
  • If retaining and replaying old data is less important, this provides a solid implementation of the single-broadcaster-multiple-receivers system: https://rogpeppe.wordpress.com/2009/12/01/concurrent-idioms-1-broadcasting-values-in-go-with-linked-channels/ – Kaedys Jun 01 '17 at 15:29
  • https://github.com/djherbis/bufit – wim Oct 08 '20 at 23:58

3 Answers


Depending on the nature of this writer and how you use it, keeping everything in memory (so that everything can be re-played for readers joining later) is very risky: it might demand a lot of memory, or cause your app to crash by running out of memory.

For a "low-traffic" logger, keeping everything in memory is probably fine, but for streaming audio or video, for example, it most likely is not.

If the reader implementations below have read all the data that was written to the buffer, their Read() method properly reports io.EOF. Care must be taken, as some constructs (such as bufio.Scanner) may not read more data once io.EOF is encountered (but this is not a flaw of our implementation).

If you want the readers of our buffer to wait until new data is written instead of returning io.EOF when no more data is currently available, you may wrap the returned readers in a "tail reader" as presented here: Go: "tail -f"-like generator.
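
For illustration, a minimal sketch of such a wrapper (my own approximation of the idea behind the linked answer, assuming "io" and "time" are imported): it retries the wrapped reader whenever it reports io.EOF, sleeping briefly between attempts:

// tailReader wraps a reader and blocks instead of returning io.EOF,
// polling until new data is available.
type tailReader struct {
    r io.Reader // the underlying reader returned by our buffer
}

func (t tailReader) Read(p []byte) (n int, err error) {
    for {
        n, err = t.r.Read(p)
        if n > 0 || err != io.EOF {
            return n, err // got data or a real error: pass it on
        }
        time.Sleep(100 * time.Millisecond) // no new data yet: wait and retry
    }
}

A real version would also need a way to stop waiting (e.g. once the writer is closed), otherwise Read() blocks forever after the last write.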

"Memory-safe" file implementation

Here is an extremely simple and elegant solution. It uses a file to write to, and files to read from. The synchronization is essentially provided by the operating system. This does not risk an out-of-memory error, as the data is stored solely on disk. Depending on the nature of your writer, this may or may not be sufficient.

I would rather use the following interface, because Close() is important in the case of files.

type MyBuf interface {
    io.WriteCloser
    NewReader() (io.ReadCloser, error)
}

And the implementation is extremely simple:

// mybuf is a file-backed buffer. It embeds *os.File, which provides
// the Write() and Close() methods.
type mybuf struct {
    *os.File
}

// NewReader opens the backing file for reading and returns it.
func (mb *mybuf) NewReader() (io.ReadCloser, error) {
    f, err := os.Open(mb.Name())
    if err != nil {
        return nil, err
    }
    return f, nil
}

// NewMyBuf creates the backing file and returns it wrapped as a MyBuf.
func NewMyBuf(name string) (MyBuf, error) {
    f, err := os.Create(name)
    if err != nil {
        return nil, err
    }
    return &mybuf{File: f}, nil
}

Our mybuf type embeds *os.File, so we get the Write() and Close() methods for "free".

The NewReader() method simply opens the existing backing file for reading (in read-only mode) and returns it, again taking advantage of the fact that *os.File implements io.ReadCloser.

Creating a new MyBuf value is implemented in the NewMyBuf() function, which may also return an error if creating the file fails.
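
For example, usage might look like this (my own sketch; the file name "mybuf.dat" is arbitrary, "io" and "fmt" are assumed to be imported, and error handling is mostly omitted):

buf, err := NewMyBuf("mybuf.dat")
if err != nil {
    panic(err)
}
buf.Write([]byte("first line\n"))

r1, _ := buf.NewReader() // early reader
buf.Write([]byte("second line\n"))
r2, _ := buf.NewReader() // late reader: the file already holds everything

data, _ := io.ReadAll(r2) // sees both lines
fmt.Println(string(data))

r1.Close()
r2.Close()
buf.Close()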

Notes:

Note that since mybuf embeds *os.File, a type assertion can be used to "reach" other exported methods of os.File even though they are not part of the MyBuf interface. I do not consider this a flaw, but if you want to disallow it, change the implementation of mybuf so that it does not embed os.File but holds it in a named field instead (then you have to add the Write() and Close() methods yourself, properly forwarding to the os.File field); a sketch of that variant follows.
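
A minimal sketch of that variant, with an assumed name mybuf2 (not part of the original answer):

// mybuf2 holds the file in a named field instead of embedding it,
// so only the methods written out here are accessible.
type mybuf2 struct {
    f *os.File
}

func (mb *mybuf2) Write(p []byte) (n int, err error) { return mb.f.Write(p) }

func (mb *mybuf2) Close() error { return mb.f.Close() }

func (mb *mybuf2) NewReader() (io.ReadCloser, error) {
    return os.Open(mb.f.Name())
}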

In-memory implementation

If the file implementation is not sufficient, here comes an in-memory implementation.

Since we're now in-memory only, we will use the following interface:

type MyBuf interface {
    io.Writer
    NewReader() io.Reader
}

The idea is to store all byte slices that are ever passed to our buffer. Readers serve the stored slices when their Read() is called; each reader keeps track of how many of the stored slices it has already served. Synchronization must be dealt with; we will use a simple sync.RWMutex.

Without further ado, here is the implementation:

type mybuf struct {
    data [][]byte // copies of all written slices, in order
    sync.RWMutex
}

func (mb *mybuf) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Cannot retain p, so we must copy it:
    p2 := make([]byte, len(p))
    copy(p2, p)
    mb.Lock()
    mb.data = append(mb.data, p2)
    mb.Unlock()
    return len(p), nil
}

type mybufReader struct {
    mb   *mybuf // buffer we read from
    i    int    // next slice index
    data []byte // current data slice to serve
}

func (mbr *mybufReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    // Do we have data to send?
    if len(mbr.data) == 0 {
        mb := mbr.mb
        mb.RLock()
        if mbr.i < len(mb.data) {
            mbr.data = mb.data[mbr.i]
            mbr.i++
        }
        mb.RUnlock()
    }
    if len(mbr.data) == 0 {
        return 0, io.EOF
    }

    n = copy(p, mbr.data)
    mbr.data = mbr.data[n:]
    return n, nil
}

func (mb *mybuf) NewReader() io.Reader {
    return &mybufReader{mb: mb}
}

func NewMyBuf() MyBuf {
    return &mybuf{}
}

Note that the general contract of Writer.Write() includes that an implementation must not retain the passed slice, so we have to make a copy of it before "storing" it.

Also note that the Read() of the readers attempts to hold the lock for a minimal amount of time. That is, it locks only if it needs a new data slice from the buffer, and it only does read-locking; if the reader still has a partially served data slice, Read() serves from it without locking or touching the buffer.
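
For example, a quick usage sketch (my addition, assuming "io" and "fmt" are imported; wrapping the readers in the "tail reader" mentioned earlier would make them wait instead of hitting io.EOF):

buf := NewMyBuf()
buf.Write([]byte("hello "))

r1 := buf.NewReader() // early reader
buf.Write([]byte("world"))
r2 := buf.NewReader() // late reader: still gets everything from the start

b1, _ := io.ReadAll(r1)
b2, _ := io.ReadAll(r2)
fmt.Println(string(b1), string(b2)) // both print "hello world"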

icza
  • Thank you very much. Elegant solution. Your suggestion with the `io.WriteCloser` is a good idea. Ideally the `Read()` method of readers would wait for more data until the writer has been closed, at which point they receive the `io.EOF`. Concerning the memory consumption, one could start the buffer in memory and dump the data to disk once a certain size is exceeded. I will use your suggestion to work my way towards that. Thank you once again. – Tympanix Jun 03 '17 at 01:11

I linked to the append-only commit log because it seems very similar to your requirements. I am pretty new to distributed systems and the commit log, so I may be butchering a couple of the concepts, but the Kafka introduction clearly explains everything with nice charts.

Go is also pretty new to me, so I'm sure there's a better way to do it.

Perhaps you could model your buffer as a slice. I think there are a couple of cases:

  • buffer has no readers, new data is written to the buffer, buffer length grows
  • buffer has one/many reader(s):

    • reader subscribes to buffer
    • buffer creates and returns a channel to that client
    • buffer maintains a list of client channels
    • write occurs -> loop through all client channels and publish to each of them (pub-sub)

This addresses a pub-sub, real-time consumer stream where messages are fanned out, but it does not address the backfill; a rough sketch of the fan-out part follows.
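
This is my own illustration of the fan-out idea, assuming "sync" is imported; it deliberately ignores slow consumers, unsubscribing, and shutdown:

// Broadcaster fans each written message out to all subscribed channels.
type Broadcaster struct {
    mu   sync.Mutex
    subs []chan []byte // one channel per subscribed reader
}

// Subscribe registers a new reader and returns its channel.
func (b *Broadcaster) Subscribe() <-chan []byte {
    ch := make(chan []byte, 16) // small buffer so the writer isn't blocked immediately
    b.mu.Lock()
    b.subs = append(b.subs, ch)
    b.mu.Unlock()
    return ch
}

// Write copies p and publishes the copy to every subscriber.
func (b *Broadcaster) Write(p []byte) (int, error) {
    msg := make([]byte, len(p))
    copy(msg, p) // don't retain the caller's slice
    b.mu.Lock()
    for _, ch := range b.subs {
        ch <- msg // blocks if a subscriber's channel is full
    }
    b.mu.Unlock()
    return len(p), nil
}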

Kafka enables a backfill, and its intro illustrates how it can be done :)

This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now".

This combination of features means that Kafka consumers are very cheap—they can come and go without much impact on the cluster or on other consumers. For example, you can use our command line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers.

dm03514

I had to do something similar as part of an experiment, so sharing:

// MultiReaderBuffer is an append-only, in-memory buffer; every reader
// keeps its own offset into the shared byte slice.
type MultiReaderBuffer struct {
    mu  sync.RWMutex
    buf []byte
}

func (b *MultiReaderBuffer) Write(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    b.mu.Lock()
    b.buf = append(b.buf, p...)
    b.mu.Unlock()
    return len(p), nil
}

func (b *MultiReaderBuffer) NewReader() io.Reader {
    return &mrbReader{mrb: b}
}

// mrbReader reads the shared buffer starting from its own offset.
type mrbReader struct {
    mrb *MultiReaderBuffer // the buffer we read from
    off int                // how many bytes this reader has consumed so far
}

func (r *mrbReader) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return 0, nil
    }
    r.mrb.mu.RLock()
    n = copy(p, r.mrb.buf[r.off:])
    r.mrb.mu.RUnlock()
    if n == 0 {
        return 0, io.EOF
    }
    r.off += n
    return n, nil
}
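
Usage might look like this, for instance (my example, not part of the original snippet; "io" and "fmt" are assumed to be imported, and readers get io.EOF once they have caught up):

buf := &MultiReaderBuffer{}
buf.Write([]byte("abc"))

r1 := buf.NewReader() // early reader
buf.Write([]byte("def"))
r2 := buf.NewReader() // late reader: still sees "abcdef"

b1, _ := io.ReadAll(r1)
b2, _ := io.ReadAll(r2)
fmt.Println(string(b1), string(b2)) // both print "abcdef"
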
Rodolfo Carvalho