So I hope this question already hasn't been answered, but I can't seem to figure out the right search term.
First some background: I have text data files that are tabular and can easily climb into the 10s of GBs. The computer processing them is already heavily loaded from the hours long data collection(at up to 30-50MB/s) as it is doing device processing and control.Therefore, disk space and access are at a premium. We haven't moved from spinning disks to SSDs due to space constraints.
However, we are looking to do something with the just collected data that doesn't need every data point. We were hoping to decimate the data and collect every 1000th point. However, loading these files (Gigabytes each) puts a huge load on the disk which is unacceptable as it could interrupt the live collection system.
I was wondering if it was possible to use a low level method to access every nth byte (or some other method) in the file (like a database does) because the file is very well defined (Two 64 bit doubles in each row). I understand too low level access might not work because the hard drive might be fragmented, but what would the best approach/method be? I'd prefer a solution in python or ruby because that's what the processing will be done in, but in theory R, C, or Fortran could also work.
Finally, upgrading the computer or hardware isn't an option, setting up the system took hundreds of man-hours so only software changes can be performed. However, it would be a longer term project but if a text file isn't the best way to handle these files, I'm open to other solutions too.
EDIT: We generate (depending on usage) anywhere from 50000 lines(records)/sec to 5 million lines/sec databases aren't feasible at this rate regardless.