I have a text file containing Python bytecode, which is part of the output you'd get from running python -m dis file.py.
My goal is to reconstruct the source from the bytecode.
I've seen a similar question asked here, but the answers focus on tools that (from my understanding) would only solve the problem if my bytecode file had all the necessary info (Python bytecode version, timestamp, flags, etc.).
.pyc code:
##################################################
15 0 LOAD_GLOBAL 0 (print)
2 LOAD_CONST 1 ('loading application')
4 CALL_FUNCTION 1
6 POP_TOP
17 8 LOAD_GLOBAL 1 (magic)
10 LOAD_CONST 2 ('8934')
12 LOAD_GLOBAL 2 (get_flag)
14 CALL_FUNCTION 0
16 CALL_FUNCTION 2
18 STORE_FAST 0 (d)
19 20 LOAD_GLOBAL 0 (print)
22 LOAD_FAST 0 (d)
24 CALL_FUNCTION 1
26 POP_TOP
28 LOAD_CONST 0 (None)
30 RETURN_VALUE
None
##################################################
4 0 LOAD_CONST 1 ('k\\PbYUHDAM[[VJlVAMVk[VWQE')
2 RETURN_VALUE
None
##################################################
7 0 LOAD_CONST 1 (b'')
2 STORE_FAST 2 (out)
9 4 LOAD_GLOBAL 0 (range)
6 LOAD_GLOBAL 1 (len)
8 LOAD_FAST 1 (f)
10 CALL_FUNCTION 1
12 CALL_FUNCTION 1
14 GET_ITER
>> 16 FOR_ITER 46 (to 64)
18 STORE_FAST 3 (i)
10 20 LOAD_FAST 2 (out)
22 LOAD_GLOBAL 2 (bytes)
24 LOAD_GLOBAL 3 (ord)
26 LOAD_FAST 1 (f)
28 LOAD_FAST 3 (i)
30 BINARY_SUBSCR
32 CALL_FUNCTION 1
34 LOAD_GLOBAL 3 (ord)
36 LOAD_FAST 0 (k)
38 LOAD_FAST 3 (i)
40 LOAD_GLOBAL 1 (len)
42 LOAD_FAST 0 (k)
44 CALL_FUNCTION 1
46 BINARY_MODULO
48 BINARY_SUBSCR
50 CALL_FUNCTION 1
52 BINARY_XOR
54 BUILD_LIST 1
56 CALL_FUNCTION 1
58 INPLACE_ADD
60 STORE_FAST 2 (out)
62 JUMP_ABSOLUTE 16
12 >> 64 LOAD_FAST 2 (out)
66 RETURN_VALUE
None
What I've tried
I've tried some of the tools suggested in similar questions, such as uncompyle6, pycdc and pyc-xasm.
However, from my understanding these tools expect a .pyc file or a disassembled Python file with all the 'header information' (Python bytecode version, timestamp, flags, etc.), which my file does not have, so they just give me errors. I mention this because I don't fully understand how to use these tools, so I may have missed something that would solve my problem. If so, I would love some help there as well.
My current solution
I'm currently trying to reconstruct the source by working out what each opcode does, following the docs at https://docs.python.org/3/library/dis.html, and writing the corresponding Python code. So far I've been able to reproduce the code up until the second return statement with the Python code below.
test.py
def bla():
    print("loading app")
    d = magic("8934", get_flag())
    print(d)

def magic():
    return "k\\PbYUHDAM[[VJlVAMVk[VWQE"
Output from python -m dis test.py:
Disassembly of <code object bla at 0x7fea8a11e240, file "test.py", line 5>:
6 0 LOAD_GLOBAL 0 (print)
2 LOAD_CONST 1 ('loading app')
4 CALL_FUNCTION 1
6 POP_TOP
7 8 LOAD_GLOBAL 1 (magic)
10 LOAD_CONST 2 ('8934')
12 LOAD_GLOBAL 2 (get_flag)
14 CALL_FUNCTION 0
16 CALL_FUNCTION 2
18 STORE_FAST 0 (d)
8 20 LOAD_GLOBAL 0 (print)
22 LOAD_FAST 0 (d)
24 CALL_FUNCTION 1
26 POP_TOP
28 LOAD_CONST 0 (None)
30 RETURN_VALUE
Disassembly of <code object magic at 0x7fea8a11e2f0, file "test.py", line 11>:
12 0 LOAD_CONST 1 ('k\\PbYUHDAM[[VJlVAMVk[VWQE')
2 RETURN_VALUE
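To compare an attempt against the original listing without eyeballing offsets and line numbers, something like the following sketch could help. It only uses dis.get_instructions from the standard library; the helper name summarize and the function candidate are made up for illustration. Opcode names depend on the interpreter version, so the output only lines up with the listing above when run on a matching Python version.

```python
import dis


def summarize(func):
    """Return (opname, argrepr) pairs for func, dropping offsets and line numbers."""
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(func)]


# Hypothetical attempt at the first block of the listing.
def candidate():
    print("loading application")


# Print one instruction per line so it can be diffed against the target listing.
for opname, argrepr in summarize(candidate):
    print(f"{opname:<20} {argrepr}")
```

Diffing this against the text file (with its offsets stripped) should show exactly which instructions still differ.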
However, I'm having trouble reproducing Python code that matches the opcodes in the last block (blocks are separated by rows of '#'). I've matched some of the opcodes to the right Python constructs, but the argument counts are still incorrect, and the Python code obviously makes no sense... so far.
function get_flag:
def get_flag():
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out
dis output of function get_flag
Disassembly of <code object get_flag at 0x7fea8a11e3a0, file "test.py", line 15>:
16 0 LOAD_CONST 1 (b'')
2 STORE_FAST 0 (out)
17 4 LOAD_GLOBAL 0 (range)
6 LOAD_GLOBAL 1 (len)
8 LOAD_GLOBAL 2 (f)
10 CALL_FUNCTION 1
12 CALL_FUNCTION 1
14 GET_ITER
>> 16 FOR_ITER 46 (to 64)
18 STORE_FAST 1 (i)
18 20 LOAD_FAST 0 (out)
22 LOAD_GLOBAL 3 (bytes)
24 LOAD_GLOBAL 4 (ord)
26 LOAD_GLOBAL 2 (f)
28 LOAD_FAST 1 (i)
30 BINARY_SUBSCR
32 CALL_FUNCTION 1
34 LOAD_GLOBAL 4 (ord)
36 LOAD_GLOBAL 5 (k)
38 LOAD_FAST 1 (i)
40 LOAD_GLOBAL 1 (len)
42 LOAD_GLOBAL 5 (k)
44 CALL_FUNCTION 1
46 BINARY_MODULO
48 BINARY_SUBSCR
50 CALL_FUNCTION 1
52 BINARY_XOR
54 BUILD_LIST 1
56 CALL_FUNCTION 1
58 INPLACE_ADD
60 STORE_FAST 0 (out)
62 JUMP_ABSOLUTE 16
19 >> 64 LOAD_CONST 2 ('')
66 RETURN_VALUE
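One thing I can check from the dis docs: the number after LOAD_FAST / STORE_FAST is an index into the code object's co_varnames, and parameters occupy the first slots, ahead of ordinary locals. Below is a minimal sketch for inspecting those slots. The function name probe and its signature (k, f) are my assumption, guessed from the original listing using LOAD_FAST 0 (k) and LOAD_FAST 1 (f); they are not known to be the real names or signature.

```python
# Hypothetical function: k and f are taken as parameters (an assumption),
# instead of being looked up as globals as in my attempt above.
def probe(k, f):
    out = b""
    for i in range(len(f)):
        out += bytes([ord(f[i]) ^ ord(k[i % len(k)])])
    return out


code = probe.__code__
# Map each local variable name to its slot number in co_varnames.
slots = {name: index for index, name in enumerate(code.co_varnames)}

print(code.co_argcount)  # 2
print(slots)             # {'k': 0, 'f': 1, 'out': 2, 'i': 3}
```

With this guess, out lands in slot 2 and i in slot 3, which is what the original listing shows (STORE_FAST 2 (out), STORE_FAST 3 (i)), whereas my attempt puts out in slot 0 and loads f and k with LOAD_GLOBAL.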
Specifically, I need help understanding how bytecode argument count can alter the corresponding python code, so I can better reverse the bytecode. Hope my question and goals are clear. All help will be appreciated.