
I have an algorithm that requires making two passes over a file's data. Since this is a command-line tool, the input may be stdin or a pipe (like a |), which unfortunately (to the best of my knowledge) rules out mmap.

I require the information from the 1st pass in order to perform a write operation on the 2nd pass. This is because I need a sum of all the bytes on the first pass for a specific cipher on the second pass.

One way I have thought of to do this is to use the heap as a single contiguous region of memory and, once the end has been reached, to allocate additional space with sbrk (similar to what I believe the first implementation of bash did). Is there a simple way to do this?

Specifically, how can I avoid stdlib's set-up of the heap and do so myself?
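For illustration, the contiguous, growable region described above can also be sketched with `realloc` instead of `sbrk` (the Linux man page advises against calling `brk`/`sbrk` directly). This is only a sketch under assumptions: `slurp_and_sum` and `placeholder_cipher` are hypothetical names, and the XOR step is a stand-in, since the actual cipher is not specified here.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `in` into one contiguous heap buffer,
 * summing the bytes as they arrive (pass 1). Returns NULL on failure;
 * on success the caller owns the buffer and must free() it. */
static unsigned char *slurp_and_sum(FILE *in, size_t *len_out, uint64_t *sum_out)
{
    size_t cap = 4096, len = 0;
    uint64_t sum = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {
            /* Region is full: double it. realloc may move the block,
             * so only offsets (not old pointers) stay valid. */
            if (cap > SIZE_MAX / 2) {
                free(buf);
                return NULL;
            }
            unsigned char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        size_t want = cap - len;
        size_t got = fread(buf + len, 1, want, in);
        for (size_t i = 0; i < got; i++)
            sum += buf[len + i];
        len += got;
        if (got < want)     /* short count: end of input or read error */
            break;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    *sum_out = sum;
    return buf;
}

/* Stand-in for pass 2 (assumption: the real cipher is not given here).
 * XORs every byte, in place, with the low 8 bits of the pass-1 sum. */
static void placeholder_cipher(unsigned char *buf, size_t len, uint64_t sum)
{
    unsigned char key = (unsigned char)(sum & 0xFF);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

A caller would pass `stdin` to `slurp_and_sum`, run `placeholder_cipher` over the returned buffer, and write the result out with `fwrite`. Doubling the capacity keeps the number of reallocations logarithmic in the input size, at the cost of holding the whole input in memory.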

user129393192
  • You probably either need to read the whole input into malloced memory, or you need to copy it into a temporary file. OTOH, once you have copied the input into a file, you can use `mmap` on that file. – Jabberwocky Jun 12 '23 at 08:37
  • 2
    Read into a buffer, doing all your first-pass processing while doing that. Then use the buffer for your second pass. If you read from `stdin` (no matter if it's from a terminal or a pipe) that's the *only* way to do multiple passes over the input, because once the data is read you can't seek back. Whether the buffer is in memory or in a temporary file doesn't matter, but you need some kind of buffer to store the input. – Some programmer dude Jun 12 '23 at 08:40
  • 1
    Two-pass problems are rare. What's yours? Entire compilers can be written in one pass. – user207421 Jun 12 '23 at 08:46
  • 1
    @user129393192, "Do I simply have to allocate memory for the entire length of the data as I read, or is there an alternative?" --> Yes. Post details of your task to find useful alternatives. – chux - Reinstate Monica Jun 12 '23 at 10:36
  • Also please note that whenever "efficiency" is mentioned, it's more often than not a [red herring](https://en.wikipedia.org/wiki/Red_herring). Gather requirements, do analysis of them, and do a good design from the analysis. Then concentrate on writing a good, simple, maintainable program from the design. If there are "efficiency" requirements, they should already be in the analysis and the design, and the code should reflect that. – Some programmer dude Jun 12 '23 at 10:56
  • [Continued...] And if there are stated "efficiency" requirements, and the code doesn't live up to them, take an optimized release build to benchmark, measure and profile to find the top ***one*** bottleneck, and optimize that with plenty of documentation and comments. Repeat until it's Good Enough™ – Some programmer dude Jun 12 '23 at 10:58
  • I made some edits to make the question clearer and proposed a possible solution – user129393192 Jun 12 '23 at 19:25
  • _"Specifically, how can I avoid stdlib's set-up of the heap and do so myself?"_ Why would you want to do this?? – Jabberwocky Jun 12 '23 at 22:20
  • So I can just use the heap as a contiguous region of memory that I can load from and store to while reading from a stream, @Jabberwocky, and grow it at the end if necessary – user129393192 Jun 12 '23 at 22:42
  • Such algorithms are not often seen in the wild, is it your own? – n. m. could be an AI Jun 20 '23 at 03:48
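
The temporary-file route suggested in the comments above (copy the input to a file, then `mmap` that file) could look roughly like the following on a POSIX system. It is a sketch, not a definitive implementation: `spool_and_sum` and `map_spooled` are hypothetical helper names, and the first pass here only sums the bytes, as described in the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `in` into an anonymous temporary file,
 * summing the bytes on the way (pass 1). Returns NULL on failure. */
static FILE *spool_and_sum(FILE *in, size_t *len_out, uint64_t *sum_out)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return NULL;

    unsigned char chunk[4096];
    size_t len = 0, got;
    uint64_t sum = 0;

    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        for (size_t i = 0; i < got; i++)
            sum += chunk[i];
        if (fwrite(chunk, 1, got, tmp) != got) {
            fclose(tmp);
            return NULL;
        }
        len += got;
    }
    /* Flush so the data is visible through the file descriptor. */
    if (ferror(in) || fflush(tmp) != 0) {
        fclose(tmp);
        return NULL;
    }
    *len_out = len;
    *sum_out = sum;
    return tmp;
}

/* Map the spooled file read-only for pass 2. Returns NULL if the
 * mapping fails or if len is 0 (a zero-length mmap is an error).
 * The caller releases the mapping with munmap(ptr, len). */
static const unsigned char *map_spooled(FILE *tmp, size_t len)
{
    if (len == 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fileno(tmp), 0);
    return p == MAP_FAILED ? NULL : (const unsigned char *)p;
}
```

With this layout the second pass reads from the mapping and writes its output as it goes, so the whole input never has to sit in heap memory at once; the trade-off is the extra copy to disk.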
