
I am having trouble understanding this code and converting it from JavaScript to Python. The problem lies in the Buffer method used in JavaScript, which produces a different hash output than my Python code. The goal is to get the merkleRoot of the transactions ["a","b"].

The hashes of "a" and "b" individually are the same as with a Python SHA256 implementation. The difference apparently comes from Buffer.concat([hashA, hashB]), but I cannot figure out how to implement that in Python. In Python I get a merkleRoot of "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d", which is not correct. The correct merkleRoot is posted below.

JavaScript:

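const crypto = require("crypto");

// digest() without an encoding argument returns a Buffer of raw bytes
const sha256 = (tx) => crypto.createHash("sha256").update(tx).digest();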
const hashPair = (hashA, hashB, hashFunction = sha256) =>
  hashFunction(Buffer.concat([hashA, hashB]));

const a = sha256("a");
const b = sha256("b");
hashPair(a, b).toString("hex");
e5a01fee14e0ed5c48714f22180f25ad8365b53f9779f79dc4a3d7e93963f94a
     ├─ ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
     └─ 3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d

I have tried approaches such as base64 and various encodings, but with my limited knowledge of cryptography I can't figure out the right one. My approach in Python was:

  1. Get SHA256 of the string "a"
  2. Get SHA256 of the string "b"
  3. Get SHA256 of the concatenated hashes of "a"+"b": ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d
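
Taken literally, with the hashes handled as hexadecimal strings, the three steps above can be sketched as follows. This is only an illustration of the approach as described; the helper name `sha256_hex` is made up for this sketch. Note that step 3 hashes the 128 ASCII characters of the hex text rather than the 64 raw bytes they stand for.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash the UTF-8 bytes of `text` and return the digest as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


hash_a = sha256_hex("a")      # step 1
hash_b = sha256_hex("b")      # step 2
combined = hash_a + hash_b    # a 128-character hex string, not 64 raw bytes
root = sha256_hex(combined)   # step 3: hashes the hex *text*

print(len(combined))  # 128
print(root)
```

Run this way, the result is the 62af5c3c... value that the tree implementation reports as its Root Hash, not the e5a01fee... value produced by the JavaScript code.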

Here is the Python Implementation from: https://www.geeksforgeeks.org/introduction-to-merkle-tree/

Python:

# Python code for implementing Merkle Tree
from typing import List
import hashlib
class Node:
    def __init__(self, left, right, value: str, content, is_copied=False) -> None:
        self.left: Node = left
        self.right: Node = right
        self.value = value
        self.content = content
        self.is_copied = is_copied
         
    @staticmethod
    def hash(val: str) -> str:
        return hashlib.sha256(val.encode('utf-8')).hexdigest()
 
    def __str__(self):
        return (str(self.value))
 
    def copy(self):
        """
        class copy function
        """
        return Node(self.left, self.right, self.value, self.content, True)
       
class MerkleTree:
    def __init__(self, values: List[str]) -> None:
        self.__buildTree(values)
 
    def __buildTree(self, values: List[str]) -> None:
 
        leaves: List[Node] = [Node(None, None, Node.hash(e), e) for e in values]
        if len(leaves) % 2 == 1:
            leaves.append(leaves[-1].copy())  # duplicate last elem if odd number of elements
        self.root: Node = self.__buildTreeRec(leaves)
 
    def __buildTreeRec(self, nodes: List[Node]) -> Node:
        if len(nodes) % 2 == 1:
            nodes.append(nodes[-1].copy())  # duplicate last elem if odd number of elements
        half: int = len(nodes) // 2
 
        if len(nodes) == 2:
            return Node(nodes[0], nodes[1], Node.hash(nodes[0].value + nodes[1].value), nodes[0].content+"+"+nodes[1].content)
 
        left: Node = self.__buildTreeRec(nodes[:half])
        right: Node = self.__buildTreeRec(nodes[half:])
        value: str = Node.hash(left.value + right.value)
        content: str = f'{left.content}+{right.content}'
        return Node(left, right, value, content)
 
    def printTree(self) -> None:
        self.__printTreeRec(self.root)
         
    def __printTreeRec(self, node: Node) -> None:
        if node != None:
            if node.left != None:
                print("Left: "+str(node.left))
                print("Right: "+str(node.right))
            else:
                print("Input")
                 
            if node.is_copied:
                print('(Padding)')
            print("Value: "+str(node.value))
            print("Content: "+str(node.content))
            print("")
            self.__printTreeRec(node.left)
            self.__printTreeRec(node.right)
 
    def getRootHash(self) -> str:
        return self.root.value
  
def mixmerkletree() -> None:
    elems = ["a", "b"]
    # if there is an odd number of inputs, the last input is repeated
    print("Inputs: ")
    print(*elems, sep=" | ")
    print("")
    mtree = MerkleTree(elems)
    print("Root Hash: "+mtree.getRootHash()+"\n")
    mtree.printTree()
 
 
mixmerkletree()
 
#This code was contributed by Pranay Arora (TSEC-2023).

Python Output:

Inputs: 
a | b

Root Hash: 62af5c3cb8da3e4f25061e829ebeea5c7513c54949115b1acc225930a90154da

Left: ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
Right: 3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d
Value: 62af5c3cb8da3e4f25061e829ebeea5c7513c54949115b1acc225930a90154da
Content: a+b

So my main question is: how can I correctly reproduce JavaScript's Buffer method in Python, so that combining the hashes of "a" and "b" yields the same hash? The correct merkleRoot, as shown above, should be: e5a01fee14e0ed5c48714f22180f25ad8365b53f9779f79dc4a3d7e93963f94a

  • Show the Python code you used to calculate the Merkle root as properly formatted code in the question. – Michael Butscher Oct 18 '22 at 18:01
  • Thanks for the suggestion. I added the full Python MerkleTree implementation with the corresponding output. However, I thought that one could also just go to an online SHA256 encoder like: [link](https://emn178.github.io/online-tools/sha256.html) ... and re-create it manually, as there are only two inputs, "a" and "b". – Projects KK Oct 18 '22 at 18:11
  • The JavaScript "Buffer" objects translate to Python "bytes" (or sometimes "bytearray") objects. Ask the hash object for its "digest" instead of the hexadecimal representation "hexdigest". The digest is a "bytes" object you can concatenate with another with a simple plus sign. A "bytes" object can be fed into a hash function (as the code does already) and has a "hex" method to return a hexadecimal string representation for printing. – Michael Butscher Oct 18 '22 at 18:32
  • Thank you a lot. That was a great hint. I was able to solve it. – Projects KK Oct 19 '22 at 08:16
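
To illustrate the comment about "bytes" objects: when the hashes are already stored as hex strings (as in `Node.value` in the question), `bytes.fromhex` turns them back into raw bytes before concatenation. This is a minimal sketch using only the standard library; the variable names are illustrative and not taken from the original code.

```python
import hashlib

# Hex digests, as a tree node in the question would store them.
hex_a = hashlib.sha256(b"a").hexdigest()
hex_b = hashlib.sha256(b"b").hexdigest()

# Convert each hex string back to the 32 raw bytes it stands for, then
# concatenate: the Python counterpart of Buffer.concat([hashA, hashB]).
raw_pair = bytes.fromhex(hex_a) + bytes.fromhex(hex_b)

root = hashlib.sha256(raw_pair).hexdigest()

print(len(raw_pair))  # 64
print(root)
```

Hashing the 64 raw bytes instead of the 128-character hex text is what should bring the Python result in line with the JavaScript one.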

1 Answer

SOLVED, thanks to the great explanation of Michael Butscher above.

The JavaScript "Buffer" objects translate to Python "bytes" (or sometimes "bytearray") objects. Ask the hash object for its "digest" instead of the hexadecimal representation "hexdigest". The digest is a "bytes" object you can concatenate with another with a simple plus sign. A "bytes" object can be fed into a hash function (as the code does already) and has a "hex" method to return a hexadecimal string representation for printing. – Michael Butscher

Here is a simplified Python solution that gives the same merkleRoot as the JavaScript Buffer method:

import hashlib

def hashab(string):
    x = string.encode()
    return hashlib.sha256(x).digest()  

a = hashab("a")
b = hashab("b")

ab = a+b

print(ab)
print(hashlib.sha256(ab).hexdigest())

Output:

b'\xca\x97\x81\x12\xca\x1b\xbd\xca\xfa\xc21\xb3\x9a#\xdcM\xa7\x86\xef\xf8\x14|Nr\xb9\x80w\x85\xaf\xeeH\xbb>#\xe8\x16\x009YJ3\x89Oed\xe1\xb14\x8b\xbdz\x00\x88\xd4,J\xcbs\xee\xae\xd5\x9c\x00\x9d'

e5a01fee14e0ed5c48714f22180f25ad8365b53f9779f79dc4a3d7e93963f94a
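
The same idea could be extended to more than two transactions. The sketch below is one possible generalization, not the method of any particular library. The names `hash_pair` and `merkle_root` are hypothetical, nodes are paired level by level, and duplicating the last node on an odd-sized level is an assumption borrowed from the GeeksforGeeks code in the question. The JavaScript side may treat odd counts differently, so only the two-element case is confirmed by the output above.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def hash_pair(left: bytes, right: bytes) -> bytes:
    """Hash the byte-wise concatenation of two digests (cf. Buffer.concat)."""
    return sha256(left + right)


def merkle_root(transactions: List[str]) -> str:
    """Return the root hash as a hex string (hypothetical helper)."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    # Leaves are the raw digests of the UTF-8 encoded transactions.
    level = [sha256(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad an odd-sized level by repeating its last node.
            level.append(level[-1])
        level = [hash_pair(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()


print(merkle_root(["a", "b"]))
```

Keeping every intermediate value as `bytes` and converting to hex only at the very end avoids accidentally hashing hex text at an inner level.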