
I'm working on an exercise that is a small variant of the classic producer-consumer problem with two threads: one thread is the producer (P) and the other is the consumer (C).

I have to process a large file, but I can only read it in pieces of 128 bytes at a time. The file contains strings delimited by numbered markers, in a format like this:

[01] this is a string [02] since here it starts another string or sub-string if you wish[02] this is still the first string[01]
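
A sketch of how the marker format could be pinned down, assuming a marker is always `[`, exactly two decimal digits, and `]` (the question does not state this explicitly). The hypothetical `parse_marker` separates "not a marker" from "cannot tell yet", which is exactly the case that arises at the end of a 128-byte read:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result codes; a valid tag is returned as 0..99. */
enum { NOT_MARKER = -1, NEED_MORE = -2 };

/* Classify the bytes s[0..avail) as the start of a "[NN]" marker. */
int parse_marker(const char *s, size_t avail) {
    assert(s != NULL || avail == 0);
    for (size_t i = 0; i < 4; i++) {
        if (i == avail)
            return NEED_MORE;             /* the chunk ends mid-marker */
        int ok = (i == 0) ? s[i] == '['
               : (i == 3) ? s[i] == ']'
               : isdigit((unsigned char)s[i]) != 0;
        if (!ok)
            return NOT_MARKER;
    }
    return (s[1] - '0') * 10 + (s[2] - '0');
}
```

For example, `parse_marker("[02]", 4)` yields 2, while `parse_marker("[0", 2)` yields `NEED_MORE`: the decision has to wait for the next read.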

P reads from the file and sends the data to C. When C is done, P writes the processed data back to the file. The problem is that the data a single processing step needs may not lie entirely within one 128-byte array.

Suppose the C thread needs to remove the substring from [02] to the next [02], but the array is filled from the first [01] only up to the middle of that substring. I cannot check for the second [02], because the producer will only read it in a later call.
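
A minimal, deliberately broken consumer (the kind of failing scaffold the comments ask for) makes the problem concrete. `count_marker_per_chunk` is a hypothetical name, and 8-byte chunks stand in for 128 bytes, as suggested in the comments, to keep the input short:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Deliberately naive consumer: counts a 4-byte marker such as "[02]" by
   scanning each chunk on its own, keeping no state between reads.
   A marker that straddles a chunk boundary is never seen. */
size_t count_marker_per_chunk(const char *s, size_t n, size_t chunk,
                              const char *marker) {
    assert(chunk > 0 && strlen(marker) == 4);
    size_t count = 0;
    for (size_t start = 0; start < n; start += chunk) {
        size_t end = (n - start < chunk) ? n : start + chunk;
        for (size_t i = start; i + 4 <= end; i++)
            if (memcmp(s + i, marker, 4) == 0)
                count++;
    }
    return count;
}
```

On `"abcdef[02]gh[02]"`, reading the whole 16 bytes as one chunk finds two markers, but 8-byte chunks find only one, because the first `[02]` occupies bytes 6 to 9. With 128-byte reads the same happens whenever a marker crosses byte 128, 256, and so on.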

I need a way to link two or more consecutive 128-byte reads. (Synchronization is not a problem; I can of course use a critical section and an event to handle that.)

My idea is to use a boolean that tracks whether I am in the middle of something to remove. That is not a good approach, because [02] may contain another substring [03] that also needs to be removed. Also, the bracket `[` may land at `array[127]`, so I don't know whether the next read's `array[0..1]` will be `02`.
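
One common way to link successive reads is to make the consumer a small state machine whose state lives outside the buffer, instead of a single boolean. The sketch below rests on assumptions not stated in the question: markers are `[`, two digits, `]`; a marker with the same tag as the innermost open one closes it; and a removed region loses both of its markers and everything nested inside it. `Filter`, `filter_chunk`, `filter_finish` and `MAX_DEPTH` are hypothetical names. A stack of open tags replaces the boolean, so `[03]` inside `[02]` is handled, and a count of already-matched marker bytes covers a `[` at `array[127]`:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_DEPTH 16  /* assumed maximum nesting of [NN] regions */

/* Everything that has to survive from one 128-byte read to the next. */
typedef struct {
    int drop_tag;          /* regions opened by this tag are removed       */
    int matched;           /* bytes of a "[NN]" marker seen so far (0..3)  */
    char d1, d2;           /* the digits of that partial marker            */
    int stack[MAX_DEPTH];  /* currently open tags, innermost last          */
    int depth;
    int drop_from;         /* depth where the removed region opened, or -1 */
} Filter;

void filter_init(Filter *f, int drop_tag) {
    f->drop_tag = drop_tag;
    f->matched = 0;
    f->depth = 0;
    f->drop_from = -1;
}

/* Copy c to the output unless we are inside a removed region. */
static void emit(const Filter *f, char *out, size_t *len, char c) {
    if (f->drop_from < 0)
        out[(*len)++] = c;
}

/* Bytes held back as a possible marker turned out to be plain text. */
static void flush_partial(Filter *f, char *out, size_t *len) {
    if (f->matched >= 1) emit(f, out, len, '[');
    if (f->matched >= 2) emit(f, out, len, f->d1);
    if (f->matched >= 3) emit(f, out, len, f->d2);
    f->matched = 0;
}

static void emit_marker(const Filter *f, char *out, size_t *len) {
    emit(f, out, len, '[');
    emit(f, out, len, f->d1);
    emit(f, out, len, f->d2);
    emit(f, out, len, ']');
}

/* A complete [NN]: the same tag as the innermost open one closes it,
   any other tag opens a new, possibly nested, region. */
static void on_marker(Filter *f, char *out, size_t *len) {
    int tag = (f->d1 - '0') * 10 + (f->d2 - '0');
    if (f->depth > 0 && f->stack[f->depth - 1] == tag) {
        f->depth--;
        if (f->drop_from == f->depth) {  /* closes the removed region */
            f->drop_from = -1;
            return;                      /* drop the closing marker too */
        }
        emit_marker(f, out, len);
    } else {
        if (f->drop_from < 0 && tag == f->drop_tag)
            f->drop_from = f->depth;
        if (f->depth < MAX_DEPTH)        /* deeper nesting is not handled */
            f->stack[f->depth++] = tag;
        emit_marker(f, out, len);        /* no-op inside a removed region */
    }
}

/* Feed one byte through the marker recognizer. */
static void step(Filter *f, char c, char *out, size_t *len) {
    int digit = isdigit((unsigned char)c) != 0;
    if (f->matched == 1 && digit) { f->d1 = c; f->matched = 2; return; }
    if (f->matched == 2 && digit) { f->d2 = c; f->matched = 3; return; }
    if (f->matched == 3 && c == ']') { f->matched = 0; on_marker(f, out, len); return; }
    flush_partial(f, out, len);      /* not a marker (no-op if nothing held) */
    if (c == '[')
        f->matched = 1;              /* may be completed by the next read */
    else
        emit(f, out, len, c);
}

/* Process one chunk; out needs room for n + 3 bytes, because up to three
   bytes held back from the previous chunk may be released here. */
size_t filter_chunk(Filter *f, const char *in, size_t n, char *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        step(f, in[i], out, &len);
    assert(len <= n + 3);
    return len;
}

/* At end of file, a dangling "[", "[d" or "[dd" is ordinary text. */
size_t filter_finish(Filter *f, char *out) {
    size_t len = 0;
    flush_partial(f, out, &len);
    return len;
}
```

Because the state survives between calls, the result does not depend on where the read boundaries fall: feeding the example line in pieces of any size with tag 2 removed gives `[01] this is a string  this is still the first string[01]`. Since the output is never longer than the input consumed so far, P can also write it back without overtaking its own reads.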

How might I approach this?

alexth
  • It was hard to parse your question but I [edited it](http://stackoverflow.com/revisions/26503712/2). Still, your question would be better if it had sample inputs and outputs; you don't formalize it well enough to say if you are always looking for *two* digits between brackets, etc. Also providing a "broken" example of an implementation and explaining how it's broken shows effort. Advice: pick a smaller test than 128. That's kind of a large number; go with a lower length and it's easier to study the edge cases and tailor your examples. – HostileFork says dont trust SE Oct 22 '14 at 09:20
  • What do you mean exactly? I'm not sure how much difference there is between 128 and 16; the same conditions always have to be handled (like a '[' at the end of the array). My original idea (which does not depend on the buffer size) is to check every char in the buffer: if I find a '[', I save up to that point, then shift the array, read another piece, and fill the array starting from the '[' position. – alexth Oct 22 '14 at 09:46
  • Yes, but I mean if you choose a smaller size then you could more easily give us a bit of code you've written that is a scaffold representing your attempt that doesn't work...and you wouldn't need long inputs to show the failure. That scaffold shows you have put effort into the problem yourself. Please read up on ["Minimal, Complete, Verifiable Example"](http://stackoverflow.com/help/mcve) which is the best way to ask for help here. A self-running, failing program--that takes you to the precise point of failure--may be the smoking gun that lets you answer your own question! – HostileFork says dont trust SE Oct 22 '14 at 09:52
  • Ah, OK, I understand now. But I don't have non-working code; my code works. I would just like to compare different solutions and see whether there is already a well-known way to deal with this kind of "interconnected" buffers that are read one after another. – alexth Oct 22 '14 at 10:01
  • If your code works then there is no question. You need at least a graph showing a performance curve you get, and a reasoning for why it could be better! It seems you were presenting an edge case unhandled: *"it's not a good way, because [02] may contain another substring [03] which needs to be removed. Also the bracket [ may be at array[127] so I don't know if the next read array[0-1] == 02."* If we can't tell how this adds up to "not working" (on some input) and you say your code "works" then perhaps you see why I encourage refining the question to have a clear focus, with code! – HostileFork says dont trust SE Oct 22 '14 at 10:05
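
The approach alexth describes in the comments (stop at a `[`, move the tail to the front of the array, and fill the rest with the next read) can be sketched as follows. It assumes the same `[NN]` marker shape; `safe_prefix`, `carry_and_refill`, `scan_file` and the `process` callback are hypothetical names. This only guarantees that no single marker is split across two calls; a removed region spanning several reads still needs state such as a stack of open tags:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 128  /* read size from the question */

/* How many leading bytes of buf[0..len) can be processed now. A trailing
   "[", "[d" or "[dd" may be the start of a "[NN]" marker whose remaining
   bytes are still in the file, so it is held back. */
size_t safe_prefix(const char *buf, size_t len) {
    size_t lo = len >= 3 ? len - 3 : 0;
    for (size_t p = len; p-- > lo; ) {
        if (buf[p] == '[')
            return p;                      /* only digits follow it */
        if (!isdigit((unsigned char)buf[p]))
            break;
    }
    return len;
}

/* Move the unprocessed tail buf[keep..len) to the front of the array and
   top it up from fp. Returns the new length (short only at EOF or on error). */
size_t carry_and_refill(char *buf, size_t cap, size_t keep, size_t len, FILE *fp) {
    assert(keep <= len && len <= cap);
    size_t tail = len - keep;
    memmove(buf, buf + keep, tail);        /* regions may overlap */
    return tail + fread(buf + tail, 1, cap - tail, fp);
}

/* Read the whole file, handing process() only text with no split marker. */
void scan_file(FILE *fp, void (*process)(const char *p, size_t n, void *ctx), void *ctx) {
    char buf[CHUNK];
    size_t len = 0, keep = 0;
    for (;;) {
        len = carry_and_refill(buf, sizeof buf, keep, len, fp);
        int last = len < sizeof buf;       /* short read: EOF (or a read error) */
        keep = last ? len : safe_prefix(buf, len);
        if (keep > 0)
            process(buf, keep, ctx);
        if (last)
            break;
    }
}
```

With a 5-byte array and the input `ab[01]cd`, the first read yields `ab[01`; only `ab` is processed, and `[01` moves to the front, where the next read completes it to `[01]c`.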
