I have two large-ish CSV files. The first is just a list of records. The second is a list of modifications: its first column is the number of the record it modifies in the first file, and the remaining columns hold the new values. It doesn't replace the whole row; it just replaces the value in the column with the matching header.
For example:
File 1:
"First","Last","Lang"
"John","Doe","Ruby"
"Jane","Doe","Perl"
"Dane","Joe","Lisp"
File 2:
"Seq","Lang"
2,"Ruby"
The goal is to end up with one file that looks like this:
"First","Last","Lang"
"John","Doe","Ruby"
"Jane","Doe","Ruby"
"Dane","Joe","Lisp"
The data is, however, much more complicated than that and can even contain line breaks inside CSV fields. Thus, I can't rely on the line number and instead have to rely on the record count. (Unless, of course, I preprocess both files to replace newlines and carriage returns, which I suppose is possible but less interesting.)
The question is: how do I loop through both files and do the proper replacement without loading either file entirely into memory? I believe loading 100 MB+ files into memory is a bad idea, right?
Also, the records in the resulting file should be in the same order when it's done.
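To make the requirement concrete, here is a minimal Python sketch of the kind of streaming merge I have in mind. It is only an illustration under assumptions that are not stated above: File 2 is sorted by "Seq" in ascending order, "Seq" counts data records starting at 1 (the header is not counted), and every non-"Seq" column in a File 2 row is a replacement value. The function name `apply_patches` and the path arguments are made up for the example.

```python
import csv


def apply_patches(base_path, patch_path, out_path):
    """Stream base_path to out_path, applying column patches from patch_path.

    Assumes the patch file is sorted by its "Seq" column (ascending) and that
    Seq counts data records from 1, not physical lines.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(base_path, newline="") as base_f, \
         open(patch_path, newline="") as patch_f, \
         open(out_path, "w", newline="") as out_f:
        base = csv.reader(base_f)
        patches = csv.DictReader(patch_f)
        writer = csv.writer(out_f, quoting=csv.QUOTE_ALL)

        header = next(base)
        writer.writerow(header)
        # Map each header name to its column index in the base file.
        col = {name: i for i, name in enumerate(header)}

        patch = next(patches, None)
        # enumerate counts parsed records, so multi-line fields are harmless.
        for seq, row in enumerate(base, start=1):
            # Apply every patch row that targets this record.
            while patch is not None and int(patch["Seq"]) == seq:
                for name, value in patch.items():
                    if name != "Seq" and value is not None:
                        row[col[name]] = value
                patch = next(patches, None)
            writer.writerow(row)
```

Only one record from each file is held in memory at a time, and the output keeps the order of File 1. If File 2 is not sorted, this approach does not work as written; it would need to be sorted first or loaded into a lookup keyed by "Seq".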