I am trying to do the following with a sed script, but it is taking too much time. I suspect I am doing something wrong.
Scenario:
I have student records (more than 1 million) in students.txt.
In this file, the first 10 characters of each line are the student ID, the next 10 characters are the contact number, and so on.
students.txt
10000000019234567890XXX...
10000000029325788532YYY...
.
.
.
10010000008766443367ZZZZ...
I have another file (encrypted_contact_numbers.txt) which has all the phone numbers and their corresponding encrypted phone numbers, as below:
encrypted_contact_numbers.txt
Phone_Number, Encrypted_Phone_Number
9234567890, 1122334455
9325788532, 4466742178
.
.
.
8766443367, 2964267747
I want to replace every contact number (11th–20th position) in students.txt with the corresponding encrypted phone number from encrypted_contact_numbers.txt.
Expected Output:
10000000011122334455XXX...
10000000024466742178YYY...
.
.
.
10010000002964267747ZZZZ...
I am using the sed scripts below to do this. They work correctly, but far too slowly.
Approach 1:
while read -r pattern replacement; do
    sed -i "s/$pattern/$replacement/" students.txt
done < encrypted_contact_numbers.txt
Approach 2:
sed 's| *\([^ ]*\) *\([^ ]*\).*|s/\1/\2/g|' <encrypted_contact_numbers.txt |
sed -f- students.txt > outfile.txt
Is there any way to process this huge file quickly?
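One possible direction, sketched under assumptions: Approach 1 runs sed once per mapping line, and every run rewrites the whole of students.txt. Loading the mapping into an awk array once and making a single pass over the student file avoids that. The function name encrypt_fixed is hypothetical. The sketch assumes the mapping file is separated by a comma plus optional spaces as shown above, and that the phone number always sits in columns 11-20.

```shell
# Hypothetical helper: $1 = mapping file, $2 = students file.
# Assumes "phone, encrypted" lines and a phone number fixed at columns 11-20.
encrypt_fixed() {
    awk -F', *' '
        # First file: build the lookup table; non-numeric lines (the header) are skipped.
        NR == FNR { if ($1 ~ /^[0-9]+$/) map[$1] = $2; next }
        # Second file: swap columns 11-20 when the number is in the table.
        {
            line  = $0
            phone = substr(line, 11, 10)
            if (phone in map)
                line = substr(line, 1, 10) map[phone] substr(line, 21)
            print line
        }
    ' "$1" "$2"
}
```

It would be called as encrypt_fixed encrypted_contact_numbers.txt students.txt > outfile.txt. Lines whose number is not in the table pass through unchanged.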
Update: 9-Feb-2018
The solutions given in AWK and Perl work fine when the phone number is at the specified position (columns 11-20). When I try a global replacement, however, it takes too long to process. Is there a better way to achieve this?
students.txt : Updated version
10000000019234567890XXX...9234567890
10000000029325788532YYY...
.
.
.
10010000008766443367ZZZZ9234567890...
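For the global case, a rough sketch of one option: instead of generating one substitution per phone number, slide a 10-character window along each line and look each window up in the awk array, so the work per line depends on the line length rather than on the size of the mapping. The name encrypt_global is hypothetical. The sketch assumes every phone number is exactly 10 characters, that the student ID in columns 1-10 must be left untouched, and that the leftmost match wins. A digit run that merely happens to equal a phone number would also be replaced.

```shell
# Hypothetical helper: $1 = mapping file, $2 = students file.
# Assumes 10-character phone numbers; the ID in columns 1-10 is copied as-is.
encrypt_global() {
    awk -F', *' '
        # First file: build the lookup table; non-numeric lines (the header) are skipped.
        NR == FNR { if ($1 ~ /^[0-9]+$/) map[$1] = $2; next }
        {
            out = substr($0, 1, 10)          # student ID, unchanged
            i = 11; n = length($0)
            while (i <= n) {
                key = substr($0, i, 10)      # current 10-character window
                if (key in map) { out = out map[key]; i += 10 }
                else            { out = out substr($0, i, 1); i++ }
            }
            print out
        }
    ' "$1" "$2"
}
```

It would be called as encrypt_global encrypted_contact_numbers.txt students.txt > outfile.txt.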