
I'm trying to update a huge text document by deleting lines that are received dynamically in an array. I cannot use readFileSync because the file is far too large, so I have to stream it. The problem I'm encountering is that the function deletes everything in the file instead of only deleting what's in the array. Perhaps I'm not understanding how to properly delete something from a stream. How can this be done?

largeFile_example.txt

test_domain_1
test_domain_2
test_domain_3
test_domain_4
test_domain_5
test_domain_6
test_domain_7
test_domain_8
test_domain_9
test_domain_10

stream.js

 const es = require('event-stream');
 const fs = require('fs');

//array of domains to delete
var domains = ['test_domain_2','test_domain_6','test_domain_8'];

//loop
domains.forEach(function(domain){

//domain to delete
var dom_to_delete = domain;

//stream
var s = fs
.createReadStream('largeFile_example.txt')
.pipe(es.split())
.pipe(
es
.mapSync(function(line) {

//check if found in text
if(line === dom_to_delete){

//delete
var newValue = dom_to_delete.replace(line, '');
fs.createWriteStream('largeFile_example.txt', newValue, 'utf-8');

}


})
.on('error', function(err) {
console.log('Error while reading file.', err);
})
.on('end', function() {

//...do something

}),
);


})
Grogu

1 Answer


You can use the readline interface with a read stream to process the file line by line. When a line matches any domain from the array, simply don't write it to the output.

You can iterate over the lines with for await...of inside an async function:

const fs = require('fs');
const readline = require('readline');

async function processLine() {
  const fileStream = fs.createReadStream('yourfile');

  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });

  // Note: crlfDelay: Infinity makes readline treat every CR LF
  // ('\r\n') in the file as a single line break.

  for await (const line of rl) {
    // Each line of the file (one domain) arrives here.
    // Skip the domains you want to delete and append the rest
    // line by line to a new file through a write stream opened
    // with { flags: 'a' }.
  }
}

processLine();

To delete the domains from the existing file, follow these steps:

  1. Read the file as a stream.
  2. Drop the text you don't want, either by skipping the matching lines or by replacing them with '' using a regex or the replace method.
  3. Write the updated content to a temp file or a new file.

You cannot read a line from a file and update that same line in place while streaming. At least, I am not aware of such a technique in Node.js (I would be happy to learn of one). That is why you need to write to a new file and, once it is complete, remove the old one.
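The steps above could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: removeLines is a hypothetical helper name, the temp file is assumed to be the original path plus '.tmp' in the same directory, a line is dropped only when it matches an array entry exactly, and the output always uses '\n' line endings. Error handling is kept to a minimum.

```javascript
const fs = require('fs');
const readline = require('readline');
const { once } = require('events');

// Hypothetical helper: copies every line of filePath that is NOT in
// linesToRemove into a temp file, then replaces the original with it.
async function removeLines(filePath, linesToRemove) {
  const toRemove = new Set(linesToRemove); // fast lookup for each line
  const tempPath = filePath + '.tmp';      // assumed temp file name

  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });
  const out = fs.createWriteStream(tempPath);

  for await (const line of rl) {
    if (toRemove.has(line)) continue; // skip the lines to delete

    // write() returns false when the internal buffer is full;
    // wait for 'drain' so a huge file does not pile up in memory.
    if (!out.write(line + '\n')) {
      await once(out, 'drain');
    }
  }

  // Flush the temp file, then swap it in place of the original.
  out.end();
  await once(out, 'finish');
  await fs.promises.rename(tempPath, filePath);
}
```

With the file from the question, this would be called as removeLines('largeFile_example.txt', ['test_domain_2', 'test_domain_6', 'test_domain_8']). Using a Set means the file is read once no matter how many domains are in the array, instead of once per domain as in the forEach loop of the question.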

It would help if you added more detail about your use case, as I am not sure why you need to do this. If the file were small, you could read it into memory and rewrite it in one go, but your case is different.

Apoorva Chikara
  • The task is to update the document by deleting the domains that are in the array. I am already able to read the lines. That is not the issue. – Grogu Feb 20 '22 at 21:25
  • When you say delete something from streams, it means removing some part of it. I'm pretty sure you won't find any event on the stream itself. If I understand it correctly you need to remove the domains from it and update the file? – Apoorva Chikara Feb 21 '22 at 04:48
  • Looking at your code, you are writing to the same file you are reading from. This won't work. I am updating the answer. – Apoorva Chikara Feb 21 '22 at 04:50