3

I have a Perl script that must create a file, write a few hundred lines of content to it, then read back all the lines it has written and add lines wherever it finds a match based on a separate configuration file. Writing all the correct things the first time would be ideal, but I have a few different use cases that require different lines to be added in different scenarios.

My question is this: would it be better to write the initial file, then read through all the lines in this file and append to it at several different locations? Or should I write the initial file, then read the lines of the initial file, and write them out to a new file adding the new lines as I do so?

I have a rough understanding of how file management works from an operating systems class I took. As I understand it, repeatedly moving the file offset could be expensive, but I'm not sure whether that cost would outweigh the cost of creating a new file.

I'm aware that the cost difference is likely trivial for a small text file of only a few hundred lines; I'm mostly just curious about which is faster.

Other context that may or may not be relevant: the OS is Linux, and it is a multi-user system, although concurrent access to this file should be rare or nonexistent. The file is created by this script, read by a single user afterwards, and then all but discarded.

  • 2
    You can only append to the end of a file. Writing new data somewhere in the middle overwrites any existing data present at that location. – Shawn Aug 22 '23 at 15:08
  • @Shawn you're right, this was a bit of an oversight on my part. Appending could still be achieved with libraries like Tie::File in Perl, but this gets pretty complex and requires shifting all the data after the insertion point up, which I imagine is quite expensive. – Andrew Makin Aug 22 '23 at 15:20
  • 2
    Yup! I'd say your second approach is the only feasible one of the two. Or keep everything in memory in a list and only write it out to a file at the end. – Shawn Aug 22 '23 at 15:22
  • I'd keep it in memory and go to disk once, when ready. If the content were huge, or was needed on disk even before it's final, that'd be a different situation (but it isn't) – zdim Aug 22 '23 at 17:29
  • Re "*Appending could still be achieved with libraries like Tie::File*", Never use Tie::File. All it does is make things much slower for no gain. (I once tested a common use of Tie::File, and it took 30 times longer than doing the same without Tie::File. Oh, but it saves memory, right? No. You might as well load the entire file into memory. It doesn't even make the code simpler, or not much.) – ikegami Aug 22 '23 at 19:19

1 Answer

4

If it's only a few hundred lines, keep it in memory. That does away with the concurrency concerns too and your data structure won't hit the disk unless you're short of RAM and start swapping.
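
As a rough illustration of the in-memory approach, here is a minimal sketch. The rule list, the sample lines, the helper names `add_lines` and `write_lines`, and the output path `output.txt` are all hypothetical; in the real script the rules would be built from the separate configuration file, whose format isn't shown in the question.

```perl
use strict;
use warnings;

# Hypothetical rules: each entry pairs a pattern with the lines to add
# after any line that matches it. The real script would build this list
# from its configuration file.
my @rules = (
    [ qr/^\[section A\]/ => [ 'extra = 1' ] ],
    [ qr/^\[section B\]/ => [ 'extra = 2', 'extra = 3' ] ],
);

# Return a reference to a new list with the extra lines inserted
# directly after each matching line.
sub add_lines {
    my ($lines, $rules) = @_;
    my @out;
    for my $line (@$lines) {
        push @out, $line;
        for my $rule (@$rules) {
            my ($pattern, $extra) = @$rule;
            push @out, @$extra if $line =~ $pattern;
        }
    }
    return \@out;
}

# Write the finished list to disk in a single pass.
sub write_lines {
    my ($path, $lines) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    print {$fh} "$_\n" for @$lines;
    close $fh or die "Cannot close $path: $!";
}

# Sample content standing in for the few hundred generated lines.
my @initial = ('[section A]', 'name = foo', '[section B]', 'name = bar');
my $final   = add_lines(\@initial, \@rules);
write_lines('output.txt', $final);
```

Because the list is only written once, there is no seeking, no second file, and no window in which a reader could see a half-finished file on disk.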

Dave Hodgkinson