File 1: 1356775 lines
File 2: 9516 lines

File 2 contains numbers, one per line; any line in File 1 whose leading digits match one of those numbers should be deleted from File 1. Example:

File 1

34234323432 some useless stuff
23423432342 more useless stuff
98989898329 foo bar blah
65367389473 one two three

File 2

234234323
653673894

New File

34234323432 some useless stuff
98989898329 foo bar blah

My approach right now is to

  1. Read the entire contents of File 2 into an array
  2. Get the first line of File 1 and extract its first 8 digits
  3. Loop through the entire array from step 1 to see if the digits from step 2 match any entry
  4. If the numbers don't match, write the line from step 2 into a new file
  5. If they match, break out of the loop and don't write the line to the new file
  6. Continue until there are no more lines to read from File 1
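The steps above can be sketched roughly as follows (the filenames, the helper name, and the key length derived from File 2 are illustrative, not from the original post; note the example above uses 9-digit keys):

```ruby
# Sketch of the array-based approach described in the steps above.
def filter_with_array(file1, file2, outfile)
  # Step 1: read all of File 2 into an array of keys.
  keys = File.readlines(file2).map(&:strip).reject(&:empty?)
  key_len = keys.first.to_s.length   # 9 digits in the example above

  File.open(outfile, "w") do |out|
    File.foreach(file1) do |line|    # steps 2 and 6
      prefix = line[0, key_len]      # step 2: leading digits
      # Steps 3-5: Array#include? is a linear scan of the whole
      # array for every single line of File 1 -- this is the O(N*M)
      # cost that makes the run so slow.
      out.write(line) unless keys.include?(prefix)
    end
  end
end
```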

However, since the file is so big, this takes an enormous amount of time: for each line in File 1 we loop through the entire array (9,516 elements). Is there a simpler way to do this type of file manipulation without putting the records into a DB table?

Omnipresent

2 Answers


Read File 2 into a Hash with the number as key and `true` as value. Hashes are designed to be fast at lookups - much faster than arrays.
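A minimal sketch of this suggestion, under the same assumptions as above (illustrative filenames and helper name, key length taken from File 2's entries):

```ruby
# Build a Hash keyed by the File 2 numbers, then do one constant-time
# lookup per File 1 line instead of a linear array scan.
def filter_with_hash(file1, file2, outfile)
  keys = {}
  key_len = nil
  File.foreach(file2) do |line|
    k = line.strip
    next if k.empty?
    key_len ||= k.length
    keys[k] = true           # number as key, true as value
  end
  key_len ||= 0              # File 2 empty: keep every line

  File.open(outfile, "w") do |out|
    File.foreach(file1) do |line|
      # Hash#key? is an O(1) lookup, so the whole job is O(N)
      # in the number of File 1 lines.
      out.write(line) unless keys.key?(line[0, key_len])
    end
  end
end
```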

steenslag
    Searching an Array for each line results in `O(N*M)` performance for N lines and M triggers, whereas with a Hash it's pretty much `O(N)` time. – tadman Feb 09 '12 at 15:48
  • As Hashes are implemented as search trees, it's `O(M*log(N))` for them. Still much faster for big N. – jupp0r Feb 09 '12 at 16:21
  • awesome. Great information guys. Just made the changes, let's see when it finishes. I'll update the post with results. – Omnipresent Feb 09 '12 at 16:21

You could also read File 1 in chunks instead of line by line, avoiding a lot of blocking IO.
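One way to batch the reads in Ruby is to slice the line stream (a sketch; the slice size, helper name, and parameters are arbitrary, and `keys`/`key_len` are assumed to come from the Hash built above):

```ruby
# Process File 1 in batches of lines rather than one line at a time.
# `keys` is a Hash of File 2 numbers, `key_len` their digit length.
def filter_in_chunks(file1, keys, key_len, outfile, slice_size = 10_000)
  File.open(outfile, "w") do |out|
    # File.foreach without a block returns an Enumerator, so we can
    # pull the lines through in fixed-size slices.
    File.foreach(file1).each_slice(slice_size) do |chunk|
      kept = chunk.reject { |line| keys.key?(line[0, key_len]) }
      out.write(kept.join)   # one write per chunk, not per line
    end
  end
end
```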

jupp0r