24

I've been trying to work out how SHA-256 works. One thing I've been doing for other algorithms is I've worked out a sort of step by step pseudocode function for the algorithm.

I've tried to do the same for SHA256 but thus far I'm having quite a bit of trouble.

I've tried to work out how the Wikipedia diagram works but besides the text part explaining the functions I'm not sure I've got it right.

Wikipedia diagram of SHA256 Caption for Wikipedia diagram of SHA256

Here's what I have so far:

  • Input is an array 8 items long where each item is 32 bits.
  • Output is an array 8 items long where each item is 32 bits.
  • Calculate all the function boxes and store those values. I'll refer to them by function name.
  • Store input, right shifted by 32 bits, into output.
    • At this point, in the out array, E is the wrong value and A is empty
  • Store the function boxes.
    • now we need to calculate out E and out A.
    • note: I've replaced the modulo commands with a bitwise AND 2^(32-1)

I can't figure out how the modulus adding lines up, but I think it is like this:

  • Store (Input H + Ch + ( (Wt+Kt) AND 2^31 ) ) AND 2^31 As mod1
  • Store (sum1 + mod1) AND 2^31 as mod2
  • Store (d + mod2) AND 2^31 into output E
    • now output E is correct and all we need is output A
  • Store (MA + mod2) AND 2^31 as mod3
  • Store (sum0 + mod3) AND 2^31 into output A
    • output now contains the correct hash of input.
    • Do we return now or does this need to be run repeatedly?

Did I get all of those addition modulos right?

what are Wt and Kt?

Would this get run once, and you're done or does it need to be run a certain number of times, with the output being re-used as input?

Here's the link by the way. http://en.wikipedia.org/wiki/SHA-2#Hash_function

Thanks alot, Brian

nmr
  • 16,625
  • 10
  • 53
  • 67
codelion
  • 319
  • 1
  • 2
  • 11
  • For starters, the output of SHA256 is 32 bytes. Think about it: 256/8=32 – Nick ODell Aug 13 '12 at 16:35
  • Ah ok... should have realized that. So each of the input/output boxes in the diagram would be... 32 bits? I think you meant bits. I'll edit my starting code to reflect this. – codelion Aug 13 '12 at 17:19

3 Answers3

12

Have a look at the official standard that describes the algorithm, the variables are described here: http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf

(Oh, now I see I'm almost a year late with my answer, ah, never mind...)

lathspell
  • 3,040
  • 1
  • 30
  • 49
  • 3
    The link above is to the archived version available for historical purposes only. http://csrc.nist.gov/publications/fips/fips180-4/fips180-4.pdf is current as of August 5, 2015. – user56983 Aug 06 '17 at 21:24
8

W_t is derived from the current block being processed while K_t is a fixed constant determined by the iteration number. The compression function is repeated 64 times for each block in SHA256. There is a specific constant K_t and a derived value W_t for each iteration 0 <= t <= 63.

I have provided my own implementation of SHA256 using Python 3.6. The tuple K contains the 64 constant values of K_t. The Sha256 function shows how the value of W_t is computed in the list W. The implementation focuses on code clarity and not high-performance.

W = 32          #Number of bits in word
M = 1 << W
FF = M - 1      #0xFFFFFFFF (for performing addition mod 2**32)

#Constants from SHA256 definition
K = (0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
     0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
     0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
     0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
     0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
     0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
     0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
     0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
     0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
     0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
     0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
     0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
     0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
     0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
     0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
     0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2)

#Initial values for compression function
I = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)

def RR(x, b):
    '''
    32-bit bitwise rotate right
    '''
    return ((x >> b) | (x << (W - b))) & FF

def Pad(W):
    '''
    Pads a message and converts to byte array
    '''
    mdi = len(W) % 64           
    L = (len(W) << 3).to_bytes(8, 'big')        #Binary of len(W) in bits
    npad = 55 - mdi if mdi < 56 else 119 - mdi  #Pad so 64 | len; add 1 block if needed
    return bytes(W, 'ascii') + b'\x80' + (b'\x00' * npad) + L   #64 | 1 + npad + 8 + len(W)

def Sha256CF(Wt, Kt, A, B, C, D, E, F, G, H):
    '''
    SHA256 Compression Function
    '''
    Ch = (E & F) ^ (~E & G)
    Ma = (A & B) ^ (A & C) ^ (B & C)        #Major
    S0 = RR(A, 2) ^ RR(A, 13) ^ RR(A, 22)   #Sigma_0
    S1 = RR(E, 6) ^ RR(E, 11) ^ RR(E, 25)   #Sigma_1
    T1 = H + S1 + Ch + Wt + Kt
    return (T1 + S0 + Ma) & FF, A, B, C, (D + T1) & FF, E, F, G

def Sha256(M):
    '''
    Performs SHA256 on an input string 
    M: The string to process
    return: A 32 byte array of the binary digest
    '''
    M = Pad(M)          #Pad message so that length is divisible by 64
    DG = list(I)        #Digest as 8 32-bit words (A-H)
    for j in range(0, len(M), 64):  #Iterate over message in chunks of 64
        S = M[j:j + 64]             #Current chunk
        W = [0] * 64
        W[0:16] = [int.from_bytes(S[i:i + 4], 'big') for i in range(0, 64, 4)]  
        for i in range(16, 64):
            s0 = RR(W[i - 15], 7) ^ RR(W[i - 15], 18) ^ (W[i - 15] >> 3)
            s1 = RR(W[i - 2], 17) ^ RR(W[i - 2], 19) ^ (W[i - 2] >> 10)
            W[i] = (W[i - 16] + s0 + W[i-7] + s1) & FF
        A, B, C, D, E, F, G, H = DG #State of the compression function
        for i in range(64):
            A, B, C, D, E, F, G, H = Sha256CF(W[i], K[i], A, B, C, D, E, F, G, H)
        DG = [(X + Y) & FF for X, Y in zip(DG, (A, B, C, D, E, F, G, H))]
    return b''.join(Di.to_bytes(4, 'big') for Di in DG)  #Convert to byte array

if __name__ == "__main__":
    bd = Sha256('Hello World')
    print(''.join('{:02x}'.format(i) for i in bd))
2
initial_hash_values=[
'6a09e667','bb67ae85','3c6ef372','a54ff53a',
'510e527f','9b05688c','1f83d9ab','5be0cd19'
]

sha_256_constants=[
'428a2f98','71374491','b5c0fbcf','e9b5dba5',
'3956c25b','59f111f1','923f82a4','ab1c5ed5',
'd807aa98','12835b01','243185be','550c7dc3',
'72be5d74','80deb1fe','9bdc06a7','c19bf174',
'e49b69c1','efbe4786','0fc19dc6','240ca1cc',
'2de92c6f','4a7484aa','5cb0a9dc','76f988da',
'983e5152','a831c66d','b00327c8','bf597fc7',
'c6e00bf3','d5a79147','06ca6351','14292967',
'27b70a85','2e1b2138','4d2c6dfc','53380d13',
'650a7354','766a0abb','81c2c92e','92722c85',
'a2bfe8a1','a81a664b','c24b8b70','c76c51a3',
'd192e819','d6990624','f40e3585','106aa070',
'19a4c116','1e376c08','2748774c','34b0bcb5',
'391c0cb3','4ed8aa4a','5b9cca4f','682e6ff3',
'748f82ee','78a5636f','84c87814','8cc70208',
'90befffa','a4506ceb','bef9a3f7','c67178f2'
]

def bin_return(dec):
    return(str(format(dec,'b')))

def bin_8bit(dec):
    return(str(format(dec,'08b')))

def bin_32bit(dec):
    return(str(format(dec,'032b')))

def bin_64bit(dec):
    return(str(format(dec,'064b')))

def hex_return(dec):
    return(str(format(dec,'x')))

def dec_return_bin(bin_string):
    return(int(bin_string,2))

def dec_return_hex(hex_string):
    return(int(hex_string,16))

def L_P(SET,n):
    to_return=[]
    j=0
    k=n
    while k<len(SET)+1:
        to_return.append(SET[j:k])
        j=k
        k+=n 
    return(to_return)

def s_l(bit_string):
    bit_list=[]
    for i in range(len(bit_string)):
        bit_list.append(bit_string[i])
    return(bit_list)

def l_s(bit_list):
    bit_string=''
    for i in range(len(bit_list)):
        bit_string+=bit_list[i]
    return(bit_string)

def rotate_right(bit_string,n):
    bit_list = s_l(bit_string)
    count=0
    while count <= n-1:
        list_main=list(bit_list)
        var_0=list_main.pop(-1)
        list_main=list([var_0]+list_main)
        bit_list=list(list_main)
        count+=1
    return(l_s(list_main))

def shift_right(bit_string,n):
    bit_list=s_l(bit_string)
    count=0
    while count <= n-1:
        bit_list.pop(-1)
        count+=1
    front_append=['0']*n
    return(l_s(front_append+bit_list))

def mod_32_addition(input_set):
    value=0
    for i in range(len(input_set)):
        value+=input_set[i]
    mod_32 = 4294967296
    return(value%mod_32)

def xor_2str(bit_string_1,bit_string_2):
    xor_list=[]
    for i in range(len(bit_string_1)):
        if bit_string_1[i]=='0' and bit_string_2[i]=='0':
            xor_list.append('0')
        if bit_string_1[i]=='1' and bit_string_2[i]=='1':
            xor_list.append('0')
        if bit_string_1[i]=='0' and bit_string_2[i]=='1':
            xor_list.append('1')
        if bit_string_1[i]=='1' and bit_string_2[i]=='0':
            xor_list.append('1')
    return(l_s(xor_list))

def and_2str(bit_string_1,bit_string_2):
    and_list=[]
    for i in range(len(bit_string_1)):
        if bit_string_1[i]=='1' and bit_string_2[i]=='1':
            and_list.append('1')
        else:
            and_list.append('0')

    return(l_s(and_list))

def or_2str(bit_string_1,bit_string_2):
    or_list=[]
    for i in range(len(bit_string_1)):
        if bit_string_1[i]=='0' and bit_string_2[i]=='0':
            or_list.append('0')
        else:
            or_list.append('1')
    return(l_s(or_list))

def not_str(bit_string):
    not_list=[]
    for i in range(len(bit_string)):
        if bit_string[i]=='0':
            not_list.append('1')
        else:
            not_list.append('0')
    return(l_s(not_list))

'''
SHA-256 Specific Functions:
'''

def Ch(x,y,z):
    return(xor_2str(and_2str(x,y),and_2str(not_str(x),z)))

def Maj(x,y,z):
    return(xor_2str(xor_2str(and_2str(x,y),and_2str(x,z)),and_2str(y,z)))

def e_0(x):
    return(xor_2str(xor_2str(rotate_right(x,2),rotate_right(x,13)),rotate_right(x,22)))

def e_1(x):
    return(xor_2str(xor_2str(rotate_right(x,6),rotate_right(x,11)),rotate_right(x,25)))

def s_0(x):
    return(xor_2str(xor_2str(rotate_right(x,7),rotate_right(x,18)),shift_right(x,3)))

def s_1(x):
    return(xor_2str(xor_2str(rotate_right(x,17),rotate_right(x,19)),shift_right(x,10)))

def message_pad(bit_list):
    pad_one = bit_list + '1'
    pad_len = len(pad_one)
    k=0
    while ((pad_len+k)-448)%512 != 0:
        k+=1
    back_append_0 = '0'*k
    back_append_1 = bin_64bit(len(bit_list))
    return(pad_one+back_append_0+back_append_1)

def message_bit_return(string_input):
    bit_list=[]
    for i in range(len(string_input)):
        bit_list.append(bin_8bit(ord(string_input[i])))
    return(l_s(bit_list))

def message_pre_pro(input_string):
    bit_main = message_bit_return(input_string)
    return(message_pad(bit_main))

def message_parsing(input_string):
    return(L_P(message_pre_pro(input_string),32))

def message_schedule(index,w_t):
    new_word = bin_32bit(mod_32_addition([int(s_1(w_t[index-2]),2),int(w_t[index-7],2),int(s_0(w_t[index-15]),2),int(w_t[index-16],2)]))
    return(new_word)

'''
This example of SHA_256 works for an input string >56 characters.
'''

def sha_256(input_string):
    w_t=message_parsing(input_string)
    a=bin_32bit(dec_return_hex(initial_hash_values[0]))
    b=bin_32bit(dec_return_hex(initial_hash_values[1]))
    c=bin_32bit(dec_return_hex(initial_hash_values[2]))
    d=bin_32bit(dec_return_hex(initial_hash_values[3]))
    e=bin_32bit(dec_return_hex(initial_hash_values[4]))
    f=bin_32bit(dec_return_hex(initial_hash_values[5]))
    g=bin_32bit(dec_return_hex(initial_hash_values[6]))
    h=bin_32bit(dec_return_hex(initial_hash_values[7]))
    for i in range(0,64):
        if i <= 15: 
            t_1=mod_32_addition([int(h,2),int(e_1(e),2),int(Ch(e,f,g),2),int(sha_256_constants[i],16),int(w_t[i],2)])
            t_2=mod_32_addition([int(e_0(a),2),int(Maj(a,b,c),2)])
            h=g
            g=f
            f=e
            e=mod_32_addition([int(d,2),t_1])
            d=c
            c=b
            b=a 
            a=mod_32_addition([t_1,t_2])
            a=bin_32bit(a)
            e=bin_32bit(e)
        if i > 15:
            w_t.append(message_schedule(i,w_t))
            t_1=mod_32_addition([int(h,2),int(e_1(e),2),int(Ch(e,f,g),2),int(sha_256_constants[i],16),int(w_t[i],2)])
            t_2=mod_32_addition([int(e_0(a),2),int(Maj(a,b,c),2)])
            h=g
            g=f
            f=e
            e=mod_32_addition([int(d,2),t_1])
            d=c
            c=b
            b=a 
            a=mod_32_addition([t_1,t_2])
            a=bin_32bit(a)
            e=bin_32bit(e)
    hash_0 = mod_32_addition([dec_return_hex(initial_hash_values[0]),int(a,2)])
    hash_1 = mod_32_addition([dec_return_hex(initial_hash_values[1]),int(b,2)])
    hash_2 = mod_32_addition([dec_return_hex(initial_hash_values[2]),int(c,2)])
    hash_3 = mod_32_addition([dec_return_hex(initial_hash_values[3]),int(d,2)])
    hash_4 = mod_32_addition([dec_return_hex(initial_hash_values[4]),int(e,2)])
    hash_5 = mod_32_addition([dec_return_hex(initial_hash_values[5]),int(f,2)])
    hash_6 = mod_32_addition([dec_return_hex(initial_hash_values[6]),int(g,2)])
    hash_7 = mod_32_addition([dec_return_hex(initial_hash_values[7]),int(h,2)])
    final_hash = (hex_return(hash_0),
                  hex_return(hash_1),
                  hex_return(hash_2),
                  hex_return(hash_3),
                  hex_return(hash_4),
                  hex_return(hash_5),
                  hex_return(hash_6),
                  hex_return(hash_7))
    return(final_hash)
Q-Club
  • 1
  • 1
  • 6
  • @VladislavsDovgalecs You should be far more concerned that people are finding [fixed point collisions](https://crypto.stackexchange.com/questions/48580/fixed-point-of-the-sha-256-compression-function) with SAT solvers. – Q-Club Jan 03 '18 at 04:06
  • @gkcn I've heard the constant selection reasoning too, but what you should try to find is what encoding they used, how many digits of precision were used, big-endian or little-edian, was the decimal point dropped from the encoding process, were 4-bit chunks used to convert base-10 to base-2, and finally how they decided to deal will rounding issues. – Q-Club Jan 03 '18 at 04:20
  • 1
    @nickandross You wrote a much cleaner implementation! You should write one for SHA-3 now! – Q-Club Jan 03 '18 at 04:23