shell - Characters contained in both strings

Question

I want to compare two string variables and print the characters that appear in both. I'm not really sure how to do this. I was thinking of using comm or diff, but I'm not sure of the right parameters to print only the matching characters. Also, those tools take files as input, and these are strings. Can anyone help?

Input:

a=$(echo "abghrsy")
b=$(echo "cgmnorstuvz")

Output:

"grs"

No, they can be anywhere; I edited the question to reflect that. – Mike Weber, Apr 06 '13 at 04:30

score 2 · Answer 1 · answered Apr 06 '13 at 04:32


You don't need to do that much work to assign the $a and $b shell variables; you can just write:

a=abghrsy
b=cdgmrstuvz

Now, there is a classic computer science problem called the longest common subsequence¹ that is similar to yours.
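To make the difference concrete, here is a minimal sketch of the classic dynamic-programming LCS, written as an awk program called from the shell. The function name lcs is hypothetical, and the sketch assumes the strings contain no backslashes, because awk processes escape sequences in -v assignments. For the strings above it happens to return grs as well, but in general an LCS keeps characters in the order they appear in both strings, which a plain set of common characters does not.

```shell
# lcs STRING1 STRING2   (hypothetical helper name)
# Prints one longest common subsequence of the two strings.
# Assumes no backslashes: awk expands escape sequences in -v values.
lcs() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a); m = length(b)
    # L[i, j] = LCS length of the suffixes a[i..n] and b[j..m]
    for (i = n + 1; i >= 1; i--)
      for (j = m + 1; j >= 1; j--) {
        if (i > n || j > m)
          L[i, j] = 0
        else if (substr(a, i, 1) == substr(b, j, 1))
          L[i, j] = L[i + 1, j + 1] + 1
        else
          L[i, j] = (L[i + 1, j] >= L[i, j + 1]) ? L[i + 1, j] : L[i, j + 1]
      }
    # Walk the table from (1, 1) to recover one subsequence
    i = 1; j = 1; out = ""
    while (i <= n && j <= m) {
      if (substr(a, i, 1) == substr(b, j, 1)) { out = out substr(a, i, 1); i++; j++ }
      else if (L[i + 1, j] >= L[i, j + 1]) i++
      else j++
    }
    print out
  }'
}
```

For example, lcs abghrsy cgmnorstuvz prints grs, while for strings whose common characters appear in different orders the LCS is shorter than the set of common characters.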

However, if you just want the common characters, one way is to let Ruby do the work:

$ ruby -e "puts ('$a'.chars.to_a & '$b'.chars.to_a).join"

¹ Not to be confused with the different longest common substring problem.

DigitalRoss

The system I'm working on doesn't have Ruby, and I can't install it because I'm not the admin. – Mike Weber Apr 06 '13 at 04:43
OK, split each string into one character per line, sort the lines, run `comm -12` so that only the lines common to both are printed, and then join the output lines back together. Ugly, but it would work. – DigitalRoss Apr 06 '13 at 04:47
How do I use `comm` with strings? It normally wants files. I've tried `comm -3 $a $b`, `comm -3 '$a' '$b'` and `comm -3 "$a" "$b"`. – Mike Weber Apr 06 '13 at 06:10
Well, you would have to put them in files, yes. There are bash tricks that can help, but I would keep it simple. You can split a string into files of one character each with `split -b 1`. Or could you write a C program? – DigitalRoss Apr 06 '13 at 16:36
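The temporary-file approach discussed in these comments can be sketched for a plain POSIX shell such as dash, which lacks bash's <(...) process substitution. This is an illustrative sketch, not code from the thread: the function name common_chars_comm is hypothetical, and it assumes mktemp is available (GNU coreutils and the BSDs ship it, although POSIX does not require it). It uses comm -12, which suppresses the lines unique to either file and so keeps only the common ones.

```shell
# common_chars_comm STRING1 STRING2   (hypothetical helper name)
# Prints the characters found in both strings, sorted and de-duplicated.
# Uses temporary files instead of bash process substitution.
common_chars_comm() {
  f1=$(mktemp) || return 1
  f2=$(mktemp) || { rm -f "$f1"; return 1; }
  # One character per line, sorted and unique, as comm requires
  printf '%s\n' "$1" | fold -w1 | sort -u > "$f1"
  printf '%s\n' "$2" | fold -w1 | sort -u > "$f2"
  # -1 and -2 drop the lines unique to each file; join what is left
  comm -12 "$f1" "$f2" | tr -d '\n'
  echo
  rm -f "$f1" "$f2"
}
```

With a=abghrsy and b=cgmnorstuvz, common_chars_comm "$a" "$b" prints grs.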

vidit · Answer 2 · 2013-04-08T05:13:13.910

Using GNU coreutils and bash process substitution (inspired by @DigitalRoss):

a="abghrsy"
b="cgmnorstuvz"

echo "$(comm -12 <(echo "$a" | fold -w1 | sort | uniq) <(echo "$b" | fold -w1 | sort | uniq) | tr -d '\n')"

prints grs. I assumed you only want unique characters.

UPDATE: Modified for dash:

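#!/bin/dash

# One character per line, sorted, unique, then joined back into one string.
# printf '%s' stops % or \ in the arguments being read as format codes.
string1=$(printf '%s' "$1" | fold -w1 | sort | uniq | tr -d '\n')
sorted2=$(printf '%s' "$2" | fold -w1 | sort | uniq | tr -d '\n')

# Compare each character of string1 with each character of string2.
while [ "$string1" != "" ]; do
  c1=$(printf '%s\n' "$string1" | cut -c 1-1)
  string2=$sorted2    # restart the inner scan from the full second string
  while [ "$string2" != "" ]; do
    c2=$(printf '%s\n' "$string2" | cut -c 1-1)
    if [ "$c1" = "$c2" ]; then
      printf '%s' "$c1"    # print without a trailing newline
    fi
    string2=$(printf '%s\n' "$string2" | cut -c 2-)
  done
  string1=$(printf '%s\n' "$string1" | cut -c 2-)
done
echo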

Note: I am just a beginner. There might be a better way of doing this.
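As one possible simplification for dash, here is a sketch that stays entirely inside the shell: ${var#?} peels off one character at a time and case patterns test membership, so no fold, cut or sort processes are started. The name common_chars_sh is hypothetical. Unlike the script above, it prints the common characters in the order they appear in the second string rather than sorted; each character is printed once.

```shell
# common_chars_sh STRING1 STRING2   (hypothetical helper name)
# Pure POSIX sh: no external commands. Note that rest, out, tail and c
# are ordinary (global) shell variables.
common_chars_sh() {
  rest=$2 out=''
  while [ -n "$rest" ]; do
    tail=${rest#?}            # everything after the first character
    c=${rest%"$tail"}         # the first character itself
    rest=$tail
    case $1 in
      *"$c"*)                 # quoted, so c is matched literally
        case $out in
          *"$c"*) ;;          # already collected: skip duplicates
          *) out=$out$c ;;
        esac ;;
    esac
  done
  printf '%s\n' "$out"
}
```

Because the character is quoted inside the pattern, glob characters such as * or [ in the strings are compared literally.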

The system I have to do this on uses dash. Both of your solutions work well on my bash system, but they don't work on the system where this eventually needs to run. – Mike Weber, Apr 06 '13 at 06:16

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

Nice question +1.

You can use an awk trick to get this done.

a=abghrsy
b=cdgmrstuvz
comm -12 <(echo "$a" | awk -F"\0" '{for (i=1; i<=NF; i++) print $i}') <(echo "$b" | awk -F"\0" '{for (i=1; i<=NF; i++) print $i}') | tr -d '\n'

OUTPUT:

grs

Note the use of awk -F"\0", which splits the input string character by character into separate awk fields; this behavior depends on the awk implementation rather than on POSIX. The rest is a straightforward use of comm and tr.

PS: If your input strings are not sorted, pipe awk's output through sort (comm requires sorted input), or sort an array inside awk.
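If your awk does not split on that field separator, a more portable way to get one character per line is a loop over substr(), which only uses POSIX awk features. This is a sketch; the function name split_chars is hypothetical.

```shell
# split_chars   (hypothetical helper name)
# Reads lines on stdin and prints each character on its own line,
# using only POSIX awk length() and substr().
split_chars() {
  awk '{ for (i = 1; i <= length($0); i++) print substr($0, i, 1) }'
}
```

In bash it can stand in for the two awk commands in the comm pipeline above, with sort added as the PS suggests: comm -12 <(echo "$a" | split_chars | sort) <(echo "$b" | split_chars | sort) | tr -d '\n'.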

UPDATE: awk-only solution (without comm):

echo "$a;$b" | awk -F"\0" '{scnd=0; for (i=1; i<=NF; i++) {if ($i!=";") {if (!scnd) arr1[$i]=$i; else if ($i in arr1) arr2[$i]=$i} else scnd=1}} END { for (a in arr2) printf("%s", a)}'

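This assumes a semicolon doesn't appear in either string (use a different separator character if it does). Also note that awk's for (a in arr2) loop visits keys in an unspecified order, so the characters are not guaranteed to come out in any particular order.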
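A variant of the awk-only idea, sketched here under a hypothetical function name, passes the strings with -v instead of joining them with a separator, and uses substr() rather than a special field separator. It prints each common character once, in the order it appears in the second string, so the output order does not depend on for-in iteration. As with any -v assignment, awk processes backslash escapes in the values.

```shell
# common_chars_awk STRING1 STRING2   (hypothetical helper name)
# Records every character of STRING1, then prints the characters of
# STRING2 that were recorded, once each, in STRING2's order.
common_chars_awk() {
  awk -v s1="$1" -v s2="$2" 'BEGIN {
    for (i = 1; i <= length(s1); i++) in1[substr(s1, i, 1)] = 1
    out = ""
    for (i = 1; i <= length(s2); i++) {
      c = substr(s2, i, 1)
      if ((c in in1) && !(c in seen)) { seen[c] = 1; out = out c }
    }
    print out
  }'
}
```

Since no separator is involved, a semicolon in the input is handled like any other character.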

UPDATE 2: I think the simplest solution is grep -o

(thanks to the answer from @CodeGnome):

echo "$b" | grep -o "[$a]" | tr -d '\n'

score 1 · Accepted Answer · answered Apr 06 '13 at 06:13

Use Character Classes with GNU Grep

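This isn't a widely applicable solution, but it fits your particular use case quite well. (It breaks, for instance, if the first string contains characters such as ], ^ or -, which can have special meaning inside a bracket expression.) The idea is to use the first variable as a character class to match against the second string. For example: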

a='abghrsy'
b='cgmnorstuvz'
echo "$b" | grep --only-matching "[$a]" | xargs | tr --delete ' '

This produces grs as you expect. Note that the use of xargs and tr is simply to remove the newlines and spaces from the output; you can certainly handle this some other way if you prefer.
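A related approach that needs only POSIX tr (so it should also run under dash) is to delete every character of the second string that is not in the first: -c complements the set given as the argument, and -d deletes. This is a sketch under a hypothetical function name, not part of the original answer. Like the grep bracket expression, tr gives special meaning to some characters in its set argument (for example -, \ and [, and a leading - would be read as an option), and like the grep pipeline it keeps repeated characters from the second string.

```shell
# common_chars_tr STRING1 STRING2   (hypothetical helper name)
# Keeps only the characters of STRING2 that also occur in STRING1,
# in STRING2's order, including repeats.
common_chars_tr() {
  printf '%s' "$2" | tr -cd "$1"   # delete the complement of STRING1's set
  echo                             # finish the output line
}
```

For example, common_chars_tr abghrsy cgmnorstuvz prints grs.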

Set Intersection

What you're really looking for is a set intersection, though. While you can "wing it" in the shell, you'd be better off using a language like Ruby, Python, or Perl to do this.

A Ruby One-Liner

If you need to integrate with an existing shell script, a simple Ruby one-liner that uses Bash variables could be called like this inside your current script:

a='abghrsy'
b='cgmnorstuvz'
ruby -e "puts ('$a'.split(//) & '$b'.split(//)).join"

A Ruby Script

You could certainly make things more elegant by doing the whole thing in Ruby instead.

string1_chars = 'abghrsy'.split //
string2_chars = 'cgmnorstuvz'.split //
intersection  = string1_chars & string2_chars
puts intersection.join

This certainly seems more readable and robust to me, but your mileage may vary. At least now you have some options to choose from.

The first line of code worked fantastically. That's exactly what I was hoping for the whole time: a nice one-liner that gets the job done without muddling my code. – Mike Weber, Apr 06 '13 at 09:46