I am trying to do pattern replacement with a sed script, but it is not working properly.
sample_content.txt
288Y2RZDBPX1000000001dhana
JP2F64EI1000000002d
EU9V3IXI1000000003dfg1000000001dfdfds
XATSSSSFOO4dhanaUXIBB7TF71000000004adf
10Q1W4ZEAV18LXNPSPGRTTIDHBN1000000005egw
patterns.txt
1000000001 9000000003
1000000002 2000000001
1000000003 3000000001
1000000004 4000000001
1000000005 5000000001
Expected output
288Y2RZDBPX9000000003dhana
JP2F64EI2000000001d
EU9V3IXI3000000001dfg9000000003dfdfds
XATSSSSFOO4dhanaUXIBB7TF74000000001adf
10Q1W4ZEAV18LXNPSPGRTTIDHBN5000000001egw
I am able to do this with a single sed replacement, like:
sed 's/1000000001/1000000003/g' sample_content.txt
Note:
- The matching pattern is not in a fixed position.
- A single line in sample_content.txt may contain multiple matching values to replace.
- sample_content.txt and patterns.txt each have more than 1 million records.
File attachment link: https://drive.google.com/open?id=1dVzivKMirEQU3yk9KfPM6iE7tTzVRdt_
Could anyone suggest how I can achieve this without hurting performance?
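For reference, the requirement above could be sketched as a single awk pass instead of one sed expression per pattern. This is only a minimal sketch under stated assumptions: every key in patterns.txt is exactly 10 characters long, the whole mapping fits in memory, and the names `replace_numbers` and `map` are illustrative, not part of any existing script. It tests every 10-character window of each line, so a number is found even when it directly follows another digit (as in `TF71000000004adf`).

```shell
# Hypothetical helper: replace_numbers PATTERNS_FILE CONTENT_FILE
replace_numbers() {
    awk '
    # First file: build the old -> new lookup table.
    NR == FNR { map[$1] = $2; next }
    # Second file: walk the line and test each 10-character window.
    {
        out = ""
        n = length($0)
        i = 1
        while (i <= n) {
            key = substr($0, i, 10)
            if (key in map) {
                # Known number: emit its replacement and skip past it.
                out = out map[key]
                i += 10
            } else {
                # No match here: copy one character and move on.
                out = out substr($0, i, 1)
                i++
            }
        }
        print out
    }
    ' "$1" "$2"
}
```

It would be called as `replace_numbers patterns.txt sample_content.txt > output.txt`. Each file is read once and each window costs one array lookup, so the running time should not grow with the number of patterns the way a million separate sed substitutions would.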
Updated on 11-Feb-2018
After analyzing the real file, I found that there is a grade value at positions 30 and 31, which tells us where the replacements need to be applied.
If the grade is AB, replace the 10-digit phone numbers at positions 41-50 and 101-110.
If the grade is BC, replace the 10-digit phone numbers at positions 11-20, 61-70 and 151-160.
If the grade is DE, replace the 10-digit phone numbers at positions 1-10, 71-80, 151-160 and 181-190.
In this way, I am seeing 50 unique grades across 2 million sample records.
{ grade = substr($0, 30, 2) }  # identify the grade (positions 30-31)
{
    if (grade == "AB") {
        print substr($0, 41, 10) ORS substr($0, 101, 10)
    } else if (grade == "BC") {
        print substr($0, 11, 10) ORS substr($0, 61, 10) ORS substr($0, 151, 10)
    }
    # ... and likewise for the remaining grades, about 50 conditions in total
}
May I know whether this approach is advisable, or is there any better approach?
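One alternative to writing out 50 if/else branches might be to keep the grade-to-position rules in a small lookup file and let awk drive the replacement from it. The sketch below is hedged accordingly: the grade table file (one line per grade, such as `AB 41 101`), the function name `replace_by_grade` and the array names are all hypothetical, the grade is assumed to sit at positions 30-31, none of the three input files may be empty, and old and new numbers are assumed to be 10 characters each so that later positions do not shift.

```shell
# Hypothetical helper: replace_by_grade PATTERNS_FILE GRADE_TABLE CONTENT_FILE
replace_by_grade() {
    awk '
    # Count input files as they start (assumes no file is empty).
    FNR == 1 { file++ }
    # File 1: old number -> new number.
    file == 1 { map[$1] = $2; next }
    # File 2: grade followed by the start positions to rewrite.
    file == 2 { starts[$1] = $0; next }
    # File 3: fixed-width records.
    {
        grade = substr($0, 30, 2)
        if (grade in starts) {
            n = split(starts[grade], p, " ")
            # p[1] is the grade itself, so positions begin at p[2].
            for (k = 2; k <= n; k++) {
                s = p[k] + 0
                key = substr($0, s, 10)
                if (key in map)
                    $0 = substr($0, 1, s - 1) map[key] substr($0, s + 10)
            }
        }
        print
    }
    ' "$1" "$2" "$3"
}
```

With this layout, adding or changing a grade means editing one line of the table file rather than the awk program, and each record needs only a handful of lookups instead of a scan of the whole line.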