method1:
$ echo -n "The quick brown fox jumps over the lazy dog" | openssl sha1 | base64
MmZkNGUxYzY3YTJkMjhmY2VkODQ5ZWUxYmI3NmU3MzkxYjkzZWIxMgo=

method2:
$ echo -n "The quick brown fox jumps over the lazy dog" | openssl sha1 | xxd -r -p | base64
L9ThxnotKPzthJ7hu3bnORuT6xI=

method3:
$ echo -n "The quick brown fox jumps over the lazy dog" | openssl sha1 | xxd -b -p | base64
MzI2NjY0MzQ2NTMxNjMzNjM3NjEzMjY0MzIzODY2NjM2NTY0MzgzNDM5NjU2NTMxNjI2MjM3MzY2NTM3CjMzMzkzMTYyMzkzMzY1NjIzMTMyMGEK

I am trying to compute a SHA-1 checksum of the input string The quick brown fox jumps over the lazy dog and then base64-encode the result. Of the methods above, I think method2 gives the correct answer, but it needs an extra step: converting the hex back into binary with xxd -r in plain format (-p) before feeding it to base64. Why do I need this extra step?

I can't find anything saying that the base64 command-line tool expects its input to be binary. But assuming it does, when I explicitly convert the hash to binary and feed it to base64 via method3 (the xxd -b option), the result is different again.

This might be easier in a programming language, because there we have full control, but with a chain of command-line tools it is a bit confusing. Could someone help explain this?

RoundPi
  • Without `-d` option, `base64` encodes its input. – oguz ismail Apr 13 '20 at 14:39
  • That I know; I didn't use any option for `base64` because I just want to encode. When I mention options above, I mostly mean the ones for `xxd`. – RoundPi Apr 13 '20 at 14:44
  • *All* data is binary; text is just a stream of bytes representing an encoding (ASCII, UTF-8, etc) of text. `base64` and `xxd`, in some sense, do the same thing in different ways: provide an ASCII representation of arbitrary binary data. – chepner Apr 14 '20 at 13:46
  • @chepner Yup, that I know, but I was wondering how the base64 command-line tool would interpret the encoded string: would it assume it is hex in a string, binary in a string, or an ASCII string? – RoundPi Apr 15 '20 at 17:39

1 Answer


There are three different results here because you are passing in three different strings to base64.

Per your question on base64 expecting the input to be binary, @chepner is right here:

All data is binary; text is just a stream of bytes representing an encoding (ASCII, UTF-8, etc) of text.
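As a small illustration of that point, here is a sketch showing that base64 encodes exactly the bytes it receives and never interprets hex digits. The variable names are only for this example. The two characters "2f" and the single byte 0x2f give different results, and they happen to match the first characters of the method1 and method2 outputs in the question.

```shell
# The text "2f" is two bytes: 0x32 ("2") and 0x66 ("f").
text_b64=$(printf '2f' | base64)

# Octal 057 is the single byte 0x2f (the character "/").
byte_b64=$(printf '\057' | base64)

printf '%s\n%s\n' "$text_b64" "$byte_b64"
# MmY=
# Lw==
```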

Intermediary steps

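Let's store the message and its SHA-1 hex digest in variables for clarity.

$ msg='The quick brown fox jumps over the lazy dog'
$ sha_val="$(printf '%s' "$msg" | openssl sha1 | awk '{ print $NF }')"
$ printf '%s\n' "$sha_val"
2fd4e1c67a2d28fced849ee1bb76e7391b93eb12

A couple of things to note:

  • printf is used instead of echo -n because it behaves more consistently across shells, which matters when comparing bytes and hashes. Passing the data through '%s' keeps any % or backslash in it from being treated as formatting.
  • The output is piped to awk '{ print $NF }', which keeps only the last field, because some openssl versions prefix the digest with (stdin)= .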
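To make the first note concrete, this sketch counts the bytes that would reach openssl. It only uses wc and tr; the variable names n_printf and n_echo are made up for the example. One extra newline is enough to change the hash completely.

```shell
msg='The quick brown fox jumps over the lazy dog'

# printf '%s' writes exactly the 43 bytes of the message.
n_printf=$(printf '%s' "$msg" | wc -c | tr -d ' ')

# Plain echo appends a newline, so the hash input would be 44 bytes.
n_echo=$(echo "$msg" | wc -c | tr -d ' ')

echo "$n_printf $n_echo"
# 43 44
```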

Comparing the bytes

We can use xxd to compare the bytes that each method passes to base64. The -c 1000 option puts up to 1000 bytes on each output line, so inputs shorter than that are not wrapped. This is useful for input like method2's, which contains control characters that can't be printed.

method 1

This is the hex dump of the sha value as text, i.e. of the ASCII characters that make up the 40-character hex string. For example, the first 2 in the sha output is 32 in this result because hex 32 <=> dec 50 <=> ASCII/UTF-8 "2". If this is confusing, take a look at an ASCII table.

$ printf "$sha_val" | xxd -p -c 1000
32666434653163363761326432386663656438343965653162623736653733393162393365623132
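For comparison, here is a sketch that reproduces the method1 string from the question. It assumes, as the question's output suggests, that openssl printed only the hex digest followed by a newline, so base64 received 41 bytes of text. The digest is hard-coded so the snippet runs without openssl.

```shell
sha_val='2fd4e1c67a2d28fced849ee1bb76e7391b93eb12'

# 40 hex characters plus the trailing newline that openssl prints.
m1=$(printf '%s\n' "$sha_val" | base64)

echo "$m1"
# MmZkNGUxYzY3YTJkMjhmY2VkODQ5ZWUxYmI3NmU3MzkxYjkzZWIxMgo=
```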

method 2

This output is EXACTLY THE SAME as $sha_val, because xxd -r -p converts the hex text to raw bytes and xxd -p converts them back. Note that base64 does not require this conversion: it will encode either form, and the conversion only changes which bytes it receives.

$ printf "$sha_val" | xxd -r -p | xxd -p -c 1000
2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
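If the raw digest bytes are what you want to encode, one possible shortcut (not used in the question) is to ask openssl for binary output directly with openssl dgst -sha1 -binary, which skips the hex round trip. This is a sketch that assumes openssl and base64 are on the PATH; raw_len and m2 are names chosen for the example.

```shell
msg='The quick brown fox jumps over the lazy dog'

# -binary makes openssl emit the 20 raw digest bytes instead of 40 hex characters.
raw_len=$(printf '%s' "$msg" | openssl dgst -sha1 -binary | wc -c | tr -d ' ')
m2=$(printf '%s' "$msg" | openssl dgst -sha1 -binary | base64)

echo "$raw_len $m2"
# 20 L9ThxnotKPzthJ7hu3bnORuT6xI=
```

The result is the same string that method2 produces.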

method 3

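xxd's -p option overrides the -b option, so xxd -b -p <=> xxd -p. The input below includes the newline that openssl prints after the hash, which shows up as 3061 just before the end of the result; the final 0a is the newline that the first xxd adds to its own output.

$ printf '%s\n' "$sha_val" | xxd -p -c 1000 | xxd -p -c 1000
3332363636343334363533313633333633373631333236343332333836363633363536343338333433393635363533313632363233373336363533373333333933313632333933333635363233313332306130a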

As you can see, base64 generates three different strings because it receives three different strings.
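Another way to check what each method fed to base64 is to decode the results from the question again. This sketch assumes a base64 that accepts -d for decoding (GNU coreutils does; some systems spell it -D or --decode) and uses od instead of xxd to print the bytes as hex.

```shell
m1='MmZkNGUxYzY3YTJkMjhmY2VkODQ5ZWUxYmI3NmU3MzkxYjkzZWIxMgo='
m2='L9ThxnotKPzthJ7hu3bnORuT6xI='

# method1 decodes to 41 bytes: the hex text plus a newline.
m1_len=$(printf '%s' "$m1" | base64 -d | wc -c | tr -d ' ')

# method2 decodes to the 20 raw digest bytes.
m2_len=$(printf '%s' "$m2" | base64 -d | wc -c | tr -d ' ')
m2_hex=$(printf '%s' "$m2" | base64 -d | od -An -tx1 | tr -d ' \n')

echo "$m1_len $m2_len $m2_hex"
# 41 20 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
```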

Ross Jacobs
  • Thanks for taking the time to answer. One thing that is now clear is that the output from openssl is binary encoded as ASCII/UTF-8. But it's still unclear how the base64 tool interprets its input. You said it takes binary, so am I right to assume base64 takes an ASCII/UTF-8 string, decodes it into binary and works on that? If so, which of the methods gives the correct answer for getting a string's SHA1 in base64 format? – RoundPi Apr 15 '20 at 17:37
  • Base64 takes bytes as input. An encoding like ASCII/UTF-8 isn't relevant (encodings may say this is an "A" or "B", but the bytes representing them will stay the same). The first one is likely what you want. – Ross Jacobs Apr 15 '20 at 18:02
  • If that's the case it would be much easier, which is why I said this might be easier in a programming language: there we can simply pass in a byte array. How could you pass a byte array directly to the command-line tool here? Method1 passes the hex string form of the SHA and method2 passes the binary form of the SHA. Why do you think method1 is the correct answer, given that you said base64 takes binary as input? – RoundPi Apr 15 '20 at 19:21
  • "How could you pass a byte array directly here to the cmd tool ?" => How do you know we're not? We're writing bytes composed of 1s and 0s, ordered by endianness to the stdin file descriptor which passes this as input to `base64`. – Ross Jacobs Apr 15 '20 at 22:16
  • @Gob00st it’s not clear what “correct” even means here. I’ve tried to explain this to the best of my ability, but it sounds like you’ve been given vague requirements. – Ross Jacobs Apr 16 '20 at 20:36