
I have a text file containing Python bytecode; it is part of the output you'd get from running python -m dis file.py. My goal is to reconstruct the source code from this bytecode.

I've seen a similar question asked here, but the answers focus on tools that (as I understand it) only work if the bytecode file contains all the header information (Python bytecode version, timestamp, flags, etc.).

Bytecode (dis output):

##################################################
 15           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('loading application')
              4 CALL_FUNCTION            1
              6 POP_TOP

 17           8 LOAD_GLOBAL              1 (magic)
             10 LOAD_CONST               2 ('8934')
             12 LOAD_GLOBAL              2 (get_flag)
             14 CALL_FUNCTION            0
             16 CALL_FUNCTION            2
             18 STORE_FAST               0 (d)

 19          20 LOAD_GLOBAL              0 (print)
             22 LOAD_FAST                0 (d)
             24 CALL_FUNCTION            1
             26 POP_TOP
             28 LOAD_CONST               0 (None)
             30 RETURN_VALUE
None
##################################################
  4           0 LOAD_CONST               1 ('k\\PbYUHDAM[[VJlVAMVk[VWQE')
              2 RETURN_VALUE
None
##################################################
  7           0 LOAD_CONST               1 (b'')
              2 STORE_FAST               2 (out)

  9           4 LOAD_GLOBAL              0 (range)
              6 LOAD_GLOBAL              1 (len)
              8 LOAD_FAST                1 (f)
             10 CALL_FUNCTION            1
             12 CALL_FUNCTION            1
             14 GET_ITER
        >>   16 FOR_ITER                46 (to 64)
             18 STORE_FAST               3 (i)

 10          20 LOAD_FAST                2 (out)
             22 LOAD_GLOBAL              2 (bytes)
             24 LOAD_GLOBAL              3 (ord)
             26 LOAD_FAST                1 (f)
             28 LOAD_FAST                3 (i)
             30 BINARY_SUBSCR
             32 CALL_FUNCTION            1
             34 LOAD_GLOBAL              3 (ord)
             36 LOAD_FAST                0 (k)
             38 LOAD_FAST                3 (i)
             40 LOAD_GLOBAL              1 (len)
             42 LOAD_FAST                0 (k)
             44 CALL_FUNCTION            1
             46 BINARY_MODULO
             48 BINARY_SUBSCR
             50 CALL_FUNCTION            1
             52 BINARY_XOR
             54 BUILD_LIST               1
             56 CALL_FUNCTION            1
             58 INPLACE_ADD
             60 STORE_FAST               2 (out)
             62 JUMP_ABSOLUTE           16

 12     >>   64 LOAD_FAST                2 (out)
             66 RETURN_VALUE
None
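Since the decompilers need a real code object, the practical route is to work from the text itself. Turning each listing line into a record makes the blocks easier to walk through, by hand or with a script. The sketch below assumes the exact column layout shown above (pre-3.11 output with CALL_FUNCTION) and skips the None lines, which most likely come from printing the return value of dis.dis (it prints the listing and returns None); how the challenge file was actually generated is an assumption. Instr, LINE_RE and parse_listing are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    line: Optional[int]  # source line; only shown on the first instruction of a line
    offset: int          # byte offset inside the code object
    opname: str
    arg: Optional[int]   # raw numeric argument, if any
    argrepr: str         # text in parentheses: a name, a constant or "to 64"


# Assumed layout: [line] [>>] offset OPNAME [arg] [(argrepr)]
LINE_RE = re.compile(
    r"^\s*(?P<line>\d+)?\s+(?:>>)?\s*(?P<offset>\d+)\s+(?P<op>[A-Z][A-Z0-9_]*)"
    r"(?:\s+(?P<arg>\d+))?(?:\s+\((?P<repr>.*)\))?\s*$"
)


def parse_listing(text: str) -> List[List[Instr]]:
    """Split the listing into one list of Instr per '#'-separated block."""
    blocks: List[List[Instr]] = [[]]
    for raw in text.splitlines():
        if raw.startswith("#"):  # block separator used in the challenge file
            if blocks[-1]:
                blocks.append([])
            continue
        m = LINE_RE.match(raw)
        if m is None:
            continue  # blank lines and the "None" lines
        blocks[-1].append(Instr(
            line=int(m["line"]) if m["line"] else None,
            offset=int(m["offset"]),
            opname=m["op"],
            arg=int(m["arg"]) if m["arg"] else None,
            argrepr=m["repr"] or "",
        ))
    return [b for b in blocks if b]


SAMPLE = r"""
##################################################
  4           0 LOAD_CONST               1 ('k\\PbYUHDAM[[VJlVAMVk[VWQE')
              2 RETURN_VALUE
None
##################################################
        >>   16 FOR_ITER                46 (to 64)
             18 STORE_FAST               3 (i)
"""

for block in parse_listing(SAMPLE):
    for ins in block:
        print(ins)
```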

What I've tried
I've tried some of the tools suggested in similar questions such as uncompyle6, pycbc and pyc-xasm.

However, as far as I understand, these tools expect a .pyc file (or a disassembly) that includes the header information (Python bytecode version, timestamp, flags, etc.). My file doesn't have it, so the tools just give me errors. I mention this partly because I don't fully understand how to use these tools, so I may have missed something; if so, I'd appreciate help with that too.
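For context on why those tools fail: since Python 3.7 (PEP 552), a .pyc file starts with a 16-byte header (magic number, flags, and either a source timestamp and size or a hash), followed by a marshal-serialized code object. Decompilers such as uncompyle6 read that binary format, whereas a dis listing is only text and would first have to be reassembled into a code object. The sketch below builds a timestamp-based .pyc in memory for a toy module, only to show the layout; build_pyc is a hypothetical helper, and this is not a way to recover the challenge code.

```python
import importlib.util
import marshal
import struct
import time


def build_pyc(source: str, filename: str = "<toy>") -> bytes:
    """Hypothetical helper: return the bytes of a timestamp-based .pyc."""
    code = compile(source, filename, "exec")
    flags = 0  # 0 = validate against source mtime/size (PEP 552)
    mtime = int(time.time()) & 0xFFFFFFFF
    size = len(source.encode("utf-8")) & 0xFFFFFFFF
    header = (
        importlib.util.MAGIC_NUMBER        # 4 bytes, specific to this interpreter
        + struct.pack("<I", flags)         # 4 bytes
        + struct.pack("<II", mtime, size)  # 4 + 4 bytes
    )
    return header + marshal.dumps(code)


data = build_pyc('print("loading application")\n')
print(data[:16].hex())          # the header the decompilers look for
code = marshal.loads(data[16:])  # everything after the header is the code object
exec(code)
```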

My current solution
I'm currently trying to reconstruct the source by working out what each opcode does from the docs at https://docs.python.org/3/library/dis.html and writing the corresponding Python code. So far I've been able to reproduce the code up to the second return statement with the Python code below.

test.py

def bla():
    print("loading app")
    d = magic("8934", get_flag())
    print(d)

def magic():
    return "k\\PbYUHDAM[[VJlVAMVk[VWQE"

Output from python -m dis test.py:

Disassembly of <code object bla at 0x7fea8a11e240, file "test.py", line 5>:
  6           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('loading app')
              4 CALL_FUNCTION            1
              6 POP_TOP

  7           8 LOAD_GLOBAL              1 (magic)
             10 LOAD_CONST               2 ('8934')
             12 LOAD_GLOBAL              2 (get_flag)
             14 CALL_FUNCTION            0
             16 CALL_FUNCTION            2
             18 STORE_FAST               0 (d)

  8          20 LOAD_GLOBAL              0 (print)
             22 LOAD_FAST                0 (d)
             24 CALL_FUNCTION            1
             26 POP_TOP
             28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

Disassembly of <code object magic at 0x7fea8a11e2f0, file "test.py", line 11>:
 12           0 LOAD_CONST               1 ('k\\PbYUHDAM[[VJlVAMVk[VWQE')
              2 RETURN_VALUE

However, I'm having trouble reproducing Python code that matches the opcodes in the last block (blocks are separated by rows of '#'). I've matched some of the opcodes to the right Python statements, but the opcode arguments still don't match, and the Python code obviously doesn't make sense yet.

Function get_flag (my attempt at the last block):

def get_flag():
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out

dis output of function get_flag

Disassembly of <code object get_flag at 0x7fea8a11e3a0, file "test.py", line 15>:
 16           0 LOAD_CONST               1 (b'')
              2 STORE_FAST               0 (out)

 17           4 LOAD_GLOBAL              0 (range)
              6 LOAD_GLOBAL              1 (len)
              8 LOAD_GLOBAL              2 (f)
             10 CALL_FUNCTION            1
             12 CALL_FUNCTION            1
             14 GET_ITER
        >>   16 FOR_ITER                46 (to 64)
             18 STORE_FAST               1 (i)

 18          20 LOAD_FAST                0 (out)
             22 LOAD_GLOBAL              3 (bytes)
             24 LOAD_GLOBAL              4 (ord)
             26 LOAD_GLOBAL              2 (f)
             28 LOAD_FAST                1 (i)
             30 BINARY_SUBSCR
             32 CALL_FUNCTION            1
             34 LOAD_GLOBAL              4 (ord)
             36 LOAD_GLOBAL              5 (k)
             38 LOAD_FAST                1 (i)
             40 LOAD_GLOBAL              1 (len)
             42 LOAD_GLOBAL              5 (k)
             44 CALL_FUNCTION            1
             46 BINARY_MODULO
             48 BINARY_SUBSCR
             50 CALL_FUNCTION            1
             52 BINARY_XOR
             54 BUILD_LIST               1
             56 CALL_FUNCTION            1
             58 INPLACE_ADD
             60 STORE_FAST               0 (out)
             62 JUMP_ABSOLUTE           16

 19     >>   64 LOAD_CONST               2 ('')
             66 RETURN_VALUE
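Comparing this listing with the target, the loop body is the same except that f and k appear here as LOAD_GLOBAL and in the challenge as LOAD_FAST. The compiler emits LOAD_FAST only for names bound inside the function (parameters included), so in the original f and k are locals. Because they are read without ever being stored, they are most likely parameters, and their indices (0 for k, 1 for f) give the parameter order. The sketch below checks both points on whichever interpreter runs it. It matches on the LOAD_FAST prefix because newer Pythons add fused variants, and load_ops and candidate are hypothetical names.

```python
import dis


def load_ops(func, name):
    """Opnames of the instructions in `func` that load `name`."""
    ops = []
    for ins in dis.get_instructions(func):
        if not ins.opname.startswith("LOAD"):
            continue
        # Newer versions fuse two loads into one instruction whose argval is a tuple.
        names = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
        if name in names:
            ops.append(ins.opname)
    return ops


def attempt():
    # f and k are never bound here, so the compiler treats them as globals.
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out


def candidate(k, f):
    # Same body, but k and f are now parameters (locals 0 and 1).
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out


print(load_ops(attempt, "f"))          # ['LOAD_GLOBAL', 'LOAD_GLOBAL']
print(load_ops(candidate, "f"))        # LOAD_FAST (or a newer variant of it)
print(candidate.__code__.co_varnames)  # ('k', 'f', 'out', 'i'), indices 0..3
```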

Specifically, I need help understanding how an opcode's argument (for example, an argument count) changes the corresponding Python code, so that I can reverse the bytecode more reliably. I hope my question and goals are clear; any help is appreciated.
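One way to see how the arguments shape the source is to execute the instructions on a stack of source-code strings instead of values. Each load pushes the name shown in parentheses; CALL_FUNCTION n pops n arguments plus the callable beneath them, BUILD_LIST n pops n items, and each binary operator pops two entries and pushes the combined expression. The sketch below is illustrative only: rebuild is a hypothetical helper, it handles just the opcodes that appear in this listing, and it ignores operator precedence (it never adds parentheses).

```python
from typing import List, Optional, Tuple, Union

# Stack entries are source strings, or an (op, target, value) triple left by an
# in-place operator so that the following STORE can print "target op value".
Entry = Union[str, Tuple[str, str, str]]

BINARY = {"BINARY_ADD": "+", "BINARY_MODULO": "%", "BINARY_XOR": "^"}
INPLACE = {"INPLACE_ADD": "+="}


def rebuild(instrs: List[Tuple[str, Union[str, int, None]]]) -> List[str]:
    """Symbolically execute (opname, arg) pairs and return source statements."""
    stack: List[Entry] = []
    lines: List[str] = []
    for op, arg in instrs:
        if op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_CONST"):
            stack.append(str(arg))  # push the name or constant text
        elif op == "BINARY_SUBSCR":
            index, container = stack.pop(), stack.pop()
            stack.append(f"{container}[{index}]")
        elif op in BINARY:
            right, left = stack.pop(), stack.pop()
            stack.append(f"{left} {BINARY[op]} {right}")
        elif op in INPLACE:
            right, left = stack.pop(), stack.pop()
            stack.append((INPLACE[op], str(left), str(right)))
        elif op == "CALL_FUNCTION":
            # arg = number of positional arguments; the callable sits below them.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(f"{func}({', '.join(map(str, args))})")
        elif op == "BUILD_LIST":
            # arg = number of items popped into the new list.
            items = [stack.pop() for _ in range(arg)][::-1]
            stack.append(f"[{', '.join(map(str, items))}]")
        elif op == "STORE_FAST":
            value = stack.pop()
            if isinstance(value, tuple) and value[1] == arg:
                lines.append(f"{arg} {value[0]} {value[2]}")
            else:
                lines.append(f"{arg} = {value}")
        elif op == "POP_TOP":
            lines.append(str(stack.pop()))  # expression statement, e.g. a call
        elif op == "RETURN_VALUE":
            lines.append(f"return {stack.pop()}")
        else:
            raise NotImplementedError(op)
    return lines


# Line 10 of the third block, copied from the listing.
LINE_10 = [
    ("LOAD_FAST", "out"), ("LOAD_GLOBAL", "bytes"), ("LOAD_GLOBAL", "ord"),
    ("LOAD_FAST", "f"), ("LOAD_FAST", "i"), ("BINARY_SUBSCR", None),
    ("CALL_FUNCTION", 1), ("LOAD_GLOBAL", "ord"), ("LOAD_FAST", "k"),
    ("LOAD_FAST", "i"), ("LOAD_GLOBAL", "len"), ("LOAD_FAST", "k"),
    ("CALL_FUNCTION", 1), ("BINARY_MODULO", None), ("BINARY_SUBSCR", None),
    ("CALL_FUNCTION", 1), ("BINARY_XOR", None), ("BUILD_LIST", 1),
    ("CALL_FUNCTION", 1), ("INPLACE_ADD", None), ("STORE_FAST", "out"),
]
# Line 17 of the first block.
LINE_17 = [
    ("LOAD_GLOBAL", "magic"), ("LOAD_CONST", "'8934'"),
    ("LOAD_GLOBAL", "get_flag"), ("CALL_FUNCTION", 0),
    ("CALL_FUNCTION", 2), ("STORE_FAST", "d"),
]
print(rebuild(LINE_10))  # ['out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])']
print(rebuild(LINE_17))  # ["d = magic('8934', get_flag())"]
```

The second result shows the argument counts at work: CALL_FUNCTION 0 becomes get_flag() with no arguments, and CALL_FUNCTION 2 becomes a two-argument call to magic.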

  • So reproducing Python code is the goal of the challenge? Is the challenge online somewhere for us to see? – Kelly Bundy Dec 18 '22 at 02:37
  • Deleted my comment. I misread something. Sorry for the confusion. In any case, you can find the documentation for the Python3 bytecodes at https://docs.python.org/3.10/library/dis.html Again, is there a bytecode in particular that's confusing you? – Frank Yellin Dec 18 '22 at 03:37
  • @FrankYellin Just curious: how do you know it's Python 3.10, not 3.11? – Kelly Bundy Dec 18 '22 at 03:45
  • I had looked at the 3.11 documentation when I had earlier wrongly claimed you were using 2.x. 3.11 has gotten rid of CALL_FUNCTION and most of the BINARY_xxx opcodes. – Frank Yellin Dec 18 '22 at 06:21
  • Yes @KellyBundy the challenge is available [here](https://ctf.securityvalley.org/login), under "coding" category. Challenge name is Weird code. – Ricardo Uqueio Dec 18 '22 at 12:58
  • @FrankYellin my issue is in understanding how the argument count of the opcodes work. I kind of get that they refer to tos, but I'm having issues relating this information with python code. – Ricardo Uqueio Dec 18 '22 at 19:16
  • For the various LOAD and STORE opcodes, you can ignore the value, because the disassembly also gives you the actual name. For CALL, it tells you the number of arguments on the stack (with the function just below the arguments). BUILD_LIST tells you how many objects are on the stack. For FOR_ITER, it tells you how far to jump when the iterator is exhausted. – Frank Yellin Dec 18 '22 at 19:39
  • @FrankYellin thank you so much!!! Ignoring the LOAD and STORE opcodes argument values is just the information I needed, as I was stuck trying to understand the meaning behind them! With that in mind plus some more code tweaks, I was able to reverse the code. – Ricardo Uqueio Dec 19 '22 at 08:42
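The point about CALL can be checked directly on any interpreter: compile a few calls and read the argument of the call instruction. In the listing's Python version the instruction is CALL_FUNCTION; from 3.11 on it is CALL (3.11 also emits a PRECALL with the same count), so the sketch accepts either name. call_arity is a hypothetical helper.

```python
import dis


def call_arity(expr: str):
    """Argument counts of the call instructions in `expr`, in execution order."""
    code = compile(expr, "<expr>", "eval")
    return [ins.arg for ins in dis.get_instructions(code)
            if ins.opname in ("CALL_FUNCTION", "CALL")]


for expr in ["get_flag()", "print(d)", "magic('8934', get_flag())", "bytes([x ^ y])"]:
    # The inner call runs first, so nested calls list the inner count first.
    print(f"{expr:28} -> {call_arity(expr)}")
```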

1 Answer


Answer To Security Valley's "Weird Code" CTF

You just need to convert the bytecode back into a .py file. However, you have to do it manually, because uncompyle6 and similar tools do not work without a complete .pyc file.

Here is the reconstructed .py code:

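def get_flag():
    # Second block: returns a constant and is called with no arguments.
    return "k\\PbYUHDAM[[VJlVAMVk[VWQE"

def magic(k, f):
    # Third block: k and f are LOAD_FAST 0 and 1, i.e. the two parameters.
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out

def hello():
    # First block; its real name is not in the listing, so "hello" is a guess.
    print("loading application")
    d = magic('8934', get_flag())
    print(d)

hello()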
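To sanity-check the reconstruction: magic XORs each character with a repeating key, and XOR with the same key undoes itself, so feeding the decoded bytes back through magic with the same key should give the original string. A minimal sketch of that check (the round trip is an addition, not part of the challenge):

```python
def get_flag():
    return "k\\PbYUHDAM[[VJlVAMVk[VWQE"


def magic(k, f):
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out


key = "8934"
decoded = magic(key, get_flag())
# magic() calls ord() on each item, so turn the bytes back into a str first.
encoded_again = magic(key, decoded.decode("latin-1"))
print(decoded)
print(encoded_again == get_flag().encode("latin-1"))  # True
```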

I hope this helps :)