
I need to write large amount of data to text file from a number of goroutines (say 30) concurrently. What I do is this:

workers.Add(core.Concurrency)
for i := 0; i < core.Concurrency; i++ {
    go func() {
        defer workers.Done()
        writer := bufio.NewWriter(f)
        defer writer.Flush()
        a.Worker(workChan, writer)
    }()
}

But this doesn't seem to work in some cases. Here f is the *os.File object. In some cases nothing is written to the file at all, and in others some data is written but subsequent writes never appear. The behaviour is very inconsistent, and there are no errors either.

Any ideas why this might be happening?

  • @Flimzy That makes sense, I figured it was related to concurrency. I found this - https://github.com/free/concurrent-writer. Gonna try it out – Ice3man Mar 16 '19 at 16:04
  • Because bufio.Writer flushes when the buffer is full, data is written in an unpredictable order to `f`. There are ways to fix this. Give some details on what the application is writing (is the app writing delimited records of some sort, how large are the records, ...) – Charlie Tumahai Mar 16 '19 at 16:05
  • @Flimzy I have multiple writers because the data that needs to be parsed is fairly large; it's actually a large number of DNS names being scraped from an online resource, so I wanted multiple writers in order to gain speed. I guess a single writer isn't that bad either. – Ice3man Mar 16 '19 at 16:08
  • @ThunderCat The app is writing newline-separated host names to the file, and yeah, the file is unprotected. I think a global mutex on the file would be sufficient. – Ice3man Mar 16 '19 at 16:09
  • The size of the data shouldn't matter. Channels are very lightweight--sending the data to a single goroutine should add practically no overhead (other than any possible I/O delay from waiting for the previous write to complete--but that's what you need) – Jonathan Hall Mar 16 '19 at 16:10
  • @Flimzy It actually makes sense; I'll go with a single writer. Thanks for your help. – Ice3man Mar 16 '19 at 16:11
  • You can write directly to the file with correct results, but you probably want to buffer given the small record size. Create one buffered writer and protect it with a mutex, or create a writing goroutine as suggested in the answer. – Charlie Tumahai Mar 16 '19 at 16:31

1 Answer


The problem is that you have multiple goroutines trying to write to a file at once, without any synchronization. This will lead to an unpredictable output order, and possibly lost writes. Using buffered I/O on top of that just helps to obscure the behavior.

The best solution is to kick off a single goroutine that writes to your output (with or without buffered I/O, depending on your needs), and have all of your workers send the data to be written to the writing goroutine over a channel.

Jonathan Hall