I am working in Node.js. We open a read stream from a CSV, load the data in chunks, process each line of a chunk with a regex and compare it against another data file, then pass the results into a write stream that writes them to a new file.
The issue is that the second read stream (called beta), which reads the comparison file, sometimes takes longer to run than the first read stream (called alpha). When that happens, not all of the comparison data is ready and readable, and I end up with null values. I think the right approach is to hold alpha's execution until beta has fired readStream.on('end'), but I cannot figure out how: in all my attempts with promises, await, and while loops, the program either freezes completely or doesn't wait for beta at all and ends execution before beta is even done. The only solution I have found is to move beta directly into the main code and hardcode alpha inside beta's readStream.on('end') handler. However, since there are multiple permutations of how beta will run depending on the data we're using, that would mean repeating alpha inside every instance of beta, locked in switch or if statements, and I do not like that at all.
This has been a massive problem for a few days and I am at the end of my rope about it; nothing has worked.
To note, the secondary script that runs beta is pulled in with a require statement, and its variables and functions are handed back through module.exports. I have tried putting the entire beta stream into a function on module.exports and waiting for it to return true, or to flip a flag, and I have tried calling alpha from inside that function by passing alpha into the secondary script. In every case, the main script still never waits for beta to finish before it takes off and does its own thing.
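From what I have read, my guess at why the await attempts never waited is that await only suspends the enclosing async function; anything left at the top level of the main script keeps running regardless. A minimal sketch of what I mean (loadBeta here is just a stand-in that resolves after a delay):

// Stand-in for the beta read: a promise that resolves after one second
function loadBeta() {
    return new Promise((resolve) => setTimeout(resolve, 1000));
}

async function run() {
    await loadBeta();          // suspends only the inside of run()
    console.log("after beta"); // anything placed here does wait
}

run();
console.log("top level"); // prints first: nothing outside run() waits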
I'm sorry for the long text.
This is the beta filestream as it is currently implemented. I would prefer it be contained in postValid.js, since that is where the data from this filestream gets used and there are going to be multiple permutations of postValid based on the data.
const fs = require("fs");
const csv = require("csv-parser");
const postValid = require("./postValid.js");

postValid.country = country;

fs.createReadStream(postValid.country + " ADDRESS REF DATA.csv")
    .pipe(csv({ escape: null, headers: false, separator: '\t' }))
    // Indicate start of reading
    .on('resume', () => console.log("Reading complete postal code file..."))
    // Pass data to be buffered and chunked by the processing script
    .on('data', (data) => {
        // Each line of data gets processed as needed and stored here
    })
    .on('end', () => {
        postValid.complete = true;
        console.log("Done reading");
        ThisFunc();
    });
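What I think I want instead is roughly the following sketch, with the stream contained in postValid.js behind a function that resolves on 'end'. The function names are mine, and I am assuming the csv-parser module:

// postValid.js (sketch)
const fs = require("fs");
const csv = require("csv-parser");

module.exports.loadReferenceData = function (country) {
    return new Promise((resolve, reject) => {
        fs.createReadStream(country + " ADDRESS REF DATA.csv")
            .pipe(csv({ headers: false, separator: '\t' }))
            .on('data', (data) => {
                // Each line of reference data gets processed and stored here
            })
            .on('end', () => {
                console.log("Done reading");
                resolve();
            })
            .on('error', reject);
    });
};

// main script (sketch); country is set earlier in the main code
const postValid = require("./postValid.js");

async function main() {
    await postValid.loadReferenceData(country); // beta fully read here
    ThisFunc();                                 // alpha only starts now
}
main();

That way each permutation of postValid could export its own loader, and the main code would only ever await a single call instead of alpha being copied into every branch.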
This is the alpha filestream as it currently stands. It's in the file I want, but I have to lock it inside a function to make this work, and I would rather it be the first thing to run for possible future data sets.
function ThisFunc() {
    // Do a quick parse of the csv to get a row count
    fs.createReadStream(fileName)
        .pipe(csv())
        .on('resume', () => {
            console.log("Getting file length...");
            postValid.complete = false;
        })
        .on('data', () => initLen++)
        .on('end', () => {
            console.log("The csv is " + initLen + " lines long, beginning processing");
            // Chunk length is 1/100 of the file size, rounded up to cover the whole file
            chunkLen = Math.ceil(initLen / 100);
            // Read the CSV filestream
            fs.createReadStream(fileName)
                .pipe(csv())
                // Indicate start of reading
                .on('resume', () => {
                    console.log("Loading...");
                })
                // Pass data to be buffered and chunked by the processing script
                .on('data', (data) => {
                    // Lines of data get passed into the processing script here
                })
                // End of reading
                .on('end', () => {
                    // Read the final chunk of lines that don't fit and process them
                    console.log("100%");
                    // Print results
                    console.log("File Length: " + results.length);
                    console.log("Processed: " + finalAdd.length);
                    console.log("Found Code: " + postCodes);
                });
        });
}
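For completeness, here is how I picture alpha looking under the same pattern, with each pass wrapped in a promise so the row count finishes before processing starts. Again this is only a sketch; countRows and runAlpha are placeholder names, and the chunk processing itself is omitted as above:

const fs = require("fs");
const csv = require("csv-parser");

// First pass: stream the file once just to count rows
function countRows(fileName) {
    return new Promise((resolve, reject) => {
        let count = 0;
        fs.createReadStream(fileName)
            .pipe(csv())
            .on('data', () => count++)
            .on('end', () => resolve(count))
            .on('error', reject);
    });
}

async function runAlpha(fileName) {
    const initLen = await countRows(fileName);
    console.log("The csv is " + initLen + " lines long, beginning processing");
    const chunkLen = Math.ceil(initLen / 100);
    // Second pass: stream the file again and hand each line to the
    // processing script in chunks of chunkLen (omitted here)
}

This would also let the caller do await postValid.loadReferenceData(country) followed by await runAlpha(fileName), keeping alpha out of beta's handlers entirely.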
I removed a bit of the code to keep it legal, but this should contain all the important bits. The files can be anywhere from a few thousand lines to several million.