I am looking for a way to extract constants from C source files and reverse their byte order in one automated process, with no manual input. So far I have used pycparser to do most of the heavy lifting: my script prints every constant in a C file to the console, in this format: Constant: int, 0x243F6A88

Does anyone know of a straightforward way to automate this conversion in Python? I know how to reverse the byte order with join(), but I am struggling to find a way to do it with minimal manual input. Ideally, my script would print the constants (already done) and then use something like a regex to convert only the constants that start with 0x (many other numbers get printed that I don't want to convert). I hope this makes sense, thanks!

What I have so far:

import sys

from pycparser import c_ast, parse_file


class ConstantVisitor(c_ast.NodeVisitor):
    """Collect the value of every Constant node in the AST."""

    def __init__(self):
        self.values = []

    def visit_Constant(self, node):
        # Remember the literal text of the constant, then print the node.
        self.values.append(node.value)
        node.show(showcoord=True)


def show_tree(filename):
    # Note that cpp is used. Provide a path to your own cpp or
    # make sure one exists in PATH.
    ast = parse_file(filename, use_cpp=True,
                     cpp_args=['-E', r'-Iutils/fake_libc_include'])
    cv = ConstantVisitor()
    cv.visit(ast)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        filename = sys.argv[1]
    else:
        filename = 'xmrig-master/src/crypto/c_blake256.c'

    show_tree(filename)
GoMonkey
  • can you add example code to your question? – Sufiyan Ghori Nov 28 '18 at 03:38
  • It feels more stable to use a proper C parser instead of relying on regexes. Have a look at https://github.com/eliben/pycparser or something similar. – Selcuk Nov 28 '18 at 03:46
  • Hey Selcuk, I am using pycparser to print out the constants. However, I am looking for a way to reverse the byte order of those constants and I don't know if pycparser has that functionality. Thanks! – GoMonkey Nov 28 '18 at 04:26
  • If you only want to handle `0xDEADBEEF` style integer constants and they're always full length in the source (exactly eight hex digits), then using a regex to process an entire C file as text is pretty easy: `new_file.write(re.sub(r'\b0x([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})\b', r'0x\4\3\2\1', old_file.read()))`. Using a C parser library is going to be more robust, but also more complicated. To make it work, you'd need to make your `Visitor` print out *everything* it encounters. Hex constants would get byteswapped before printing, everything else would stay the same. – Blckknght Nov 28 '18 at 05:15
  • Hey Blckknght, thanks for the reply. I know how to make the visitor print out everything it encounters but do you know of how I can byteswap the constants before printing? Thanks – GoMonkey Nov 29 '18 at 18:26

1 Answer

You seem to have 3 steps in the task:

  1. Parse the code with pycparser - you have that
  2. Find all constants (just integer constants? how about floats?) and reverse their byte order
  3. Do something with the results

For (2) you can use something like the suggestions in this answer, but adjust it to the actual types you need.
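As a rough illustration of step (2) for integer constants only, here is a minimal sketch. The helper names `swap_hex_bytes` and `swap_if_hex` are made up for this example, and the 4-byte default width is an assumption about the constants being 32-bit; adjust it to the actual types in your code. The input is the literal text of a constant, such as the `node.value` string collected by the visitor in the question.

```python
def swap_hex_bytes(literal: str, width: int = 4) -> str:
    """Reverse the byte order of a C hex literal such as '0x243F6A88'.

    `width` is the assumed size of the constant in bytes (4 for 32 bits).
    Raises OverflowError if the value does not fit in `width` bytes.
    """
    # Split off an integer suffix such as U, L or UL so int() can parse the rest.
    digits = literal.rstrip("uUlL")
    suffix = literal[len(digits):]
    value = int(digits, 16)  # base 16 accepts the leading '0x'
    # Serialize as big-endian, then read the same bytes back as little-endian.
    swapped = int.from_bytes(value.to_bytes(width, "big"), "little")
    return f"0x{swapped:0{width * 2}X}{suffix}"


def swap_if_hex(text: str, width: int = 4) -> str:
    """Swap only values written with a 0x prefix; return anything else unchanged."""
    if text.lower().startswith("0x"):
        return swap_hex_bytes(text, width)
    return text


if __name__ == "__main__":
    for constant in ["0x243F6A88", "16", "0x1"]:
        print(constant, "->", swap_if_hex(constant))
```

Note that short literals are padded to the full width before swapping, so `0x1` is treated as `0x00000001`. Whether that is what you want depends on the declared type of each constant, which this sketch does not look at.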

For (3) it's not clear what you're trying to do; are you trying to write the constants back to the original C file? pycparser is not the best tool for that, then. You may want to use the Python bindings to Clang instead, because Clang tools are designed to modify existing code in place.
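If the goal is only to rewrite the source text, the regex idea from the comments can be sketched as below. This is a plain text substitution, not a parser-based approach: it assumes every constant of interest is written as exactly eight hex digits with no suffix, and it will also rewrite matches inside comments and string literals. The names `HEX32` and `byteswap_source` are invented for the example.

```python
import re

# 0x followed by exactly eight hex digits (an assumed 32-bit constant).
# The trailing \b means literals with a suffix such as UL are left untouched.
HEX32 = re.compile(r"\b0[xX]([0-9A-Fa-f]{8})\b")


def _swap(match):
    """Reverse the four two-digit byte groups of one matched constant."""
    digits = match.group(1)
    pairs = [digits[i:i + 2] for i in range(0, 8, 2)]
    return "0x" + "".join(reversed(pairs))


def byteswap_source(text):
    """Return `text` with every eight-digit hex constant byte-swapped."""
    return HEX32.sub(_swap, text)


if __name__ == "__main__":
    sample = "static const u32 k = 0x243F6A88; int n = 16;"
    print(byteswap_source(sample))
```

To apply it to a file, read the file's contents into a string, pass it through `byteswap_source`, and write the result to a new file rather than overwriting the original.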

Eli Bendersky