replace strings in file1 with empty space if strings not found in file2

Question

This question (https://unix.stackexchange.com/questions/20322/replace-string-with-contents-of-a-file-using-sed) replaces a fixed string in file1 with the contents of file2.

I want to do this the other way around, with an inversion: remove from file1 every character that does not appear in file2.

If I have file1:

A:B
B:B
C:
D:
E:A

and file2:

D
E
:

then I want to be left with

:
:
:
D:
E:

If anyone has any pointers, that would be great. Bonus points if this can be done on a specific column of file1 while preserving the rest of file1.

For example, if I have three columns:

A:B    A:B    A:B
B:B    B:B    B:B
C:     C:     C:
D:     D:     D:
E:A    E:A    E:A

I would end up with (targeting column 2):

A:B    :      A:B
B:B    :      B:B
C:     :      C:
D:     D:     D:
E:A    E:     E:A

I don't understand. What is replaced with what? Is `A:B` replaced with `:`? Why? - KamilCuk, Aug 01 '19 at 10:59
@KamilCuk A and B are replaced with nothing, i.e. `sed 's/A|B//g'`, because those characters don't exist in file2 - brucezepplin, Aug 01 '19 at 13:19
`characters`? So you want to remove all characters that are in file2 from file1? Why didn't you specify that? Where is the "replace"-ing part? Newline characters are ignored? Oh, and you want to apply the removal only on one column? So you want to remove all characters in file2 from a specified column from file1? - KamilCuk, Aug 01 '19 at 13:38
You should have included regexp metachars in your example since that could trip up a potential solution, especially if you include `^` since it needs to be escaped differently from all other metachars to be treated literally. - Ed Morton, Aug 01 '19 at 13:39

score 2 · Answer 1 · answered Aug 01 '19 at 11:41

tr makes this trivial:

$ tr -cd "$(cat file2)" < file1
:
:
:
D:
E:
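
The line breaks survive because command substitution strips only the trailing newline of file2, so the newlines between D, E and : stay in the set that tr keeps. A minimal reproduction, assuming the question's sample files are recreated in a scratch directory (the directory and the variable name `out` are only for illustration):

```shell
# Recreate the question's sample files in a throwaway directory.
dir=$(mktemp -d)
printf 'A:B\nB:B\nC:\nD:\nE:A\n' > "$dir/file1"
printf 'D\nE\n:\n' > "$dir/file2"

# -c complements the set and -d deletes, so every character NOT listed
# in file2 is removed. The newlines inside file2 belong to the set,
# which is why the line structure of file1 is kept.
out=$(tr -cd "$(cat "$dir/file2")" < "$dir/file1")
printf '%s\n' "$out"

rm -rf "$dir"
```

If file2 held only a single line, the set would contain no newline and tr would delete the line breaks as well. tr also gives special meaning to sequences such as `a-z`, backslash escapes and `[:alpha:]`, so a file2 containing `-` or `\` may need extra care.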

Shawn

score 0 · Answer 2 · answered Aug 01 '19 at 13:37

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR == FNR {
    goodChars[$1]
    next
}
{
    goodStr = ""
    for (i=1; i<=length($2); i++) {
        char = substr($2,i,1)
        if (char in goodChars) {
            goodStr = goodStr char
        }
    }
    $2 = goodStr
    print
}

$ awk -f tst.awk file2 file1
A:B     :       A:B
B:B     :       B:B
C:      :       C:
D:      D:      D:
E:A     E:      E:A

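The above assumes your input file is tab-separated, as it appears to be. Otherwise just remove the BEGIN section; awk will then split on any run of whitespace and rejoin the fields of each modified line with single spaces.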
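
The same loop can be pointed at any column. The following is only a sketch, not part of the original answer: it assumes tab-separated input and introduces a hypothetical awk variable `col` (set with `-v`) to choose the field; the scratch directory and the shell variable `out` are likewise just for illustration.

```shell
# Tab-separated sample data in a throwaway directory.
dir=$(mktemp -d)
printf 'A:B\tA:B\tA:B\nD:\tD:\tD:\nE:A\tE:A\tE:A\n' > "$dir/file1"
printf 'D\nE\n:\n' > "$dir/file2"

# col (hypothetical name) selects the field to filter.
out=$(awk -v col=2 '
BEGIN { FS = OFS = "\t" }
NR == FNR { goodChars[$1]; next }   # first file: remember allowed characters
{
    goodStr = ""
    n = length($col)
    for (i = 1; i <= n; i++) {
        char = substr($col, i, 1)
        if (char in goodChars) goodStr = goodStr char
    }
    $col = goodStr                  # rebuilds the record using OFS
    print
}' "$dir/file2" "$dir/file1")
printf '%s\n' "$out"

rm -rf "$dir"
```

Passing `-v col=3` would filter the third column instead, leaving the first two untouched.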

score 0 · Answer 3 · answered Aug 02 '19 at 07:13

This might work for you (GNU sed):

sed -z 's/\n//g;s/.*/s#[^&]##g/' file2 | sed -f - file1

Convert file2 into a sed script and run it against file1. This concatenates the characters of file2 and places them in a negated character class inside a sed substitution command that runs globally, i.e. the command removes from file1 every character that does not occur in file2.
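
It can help to look at the generated script before applying it. A small sketch, assuming GNU sed and feeding the question's sample data on stdin rather than from files (the variable names `script` and `out` are only for illustration):

```shell
# Build the sed script from the contents of file2 (GNU sed, -z reads
# the whole input as one record so the newlines can be stripped).
script=$(printf 'D\nE\n:\n' | sed -z 's/\n//g;s/.*/s#[^&]##g/')
printf '%s\n' "$script"

# Apply the generated script to the contents of file1.
out=$(printf 'A:B\nB:B\nC:\nD:\nE:A\n' | sed "$script")
printf '%s\n' "$out"
```

With the sample data the generated script is `s#[^DE:]##g`. Because the characters are pasted into a bracket expression verbatim, a file2 containing `]`, `^`, `\` or the `#` delimiter would need escaping or reordering first.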

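To handle the second problem, add the newline to the negated character class, wrap the second column in newlines, keep a copy of the line in the hold space, apply the same generated script, and then use pattern matching to replace the original second column with the filtered one (file3 is assumed here to be the three-column input):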

sed -z 's/\n//g;s/.*/s#[^&\\n]##g/' file2 |
sed -Ee 's/\S+/\n&\n/2;h' -f - -e 'H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file3
