I am using the latest version of angr (9, 0, 'gitrolling'). I get the same behavior with angr version (9, 0, 4663).

Using gcc 9.3.0 I created an ELF binary for this simple C program:

float func3(float y) {
  float temp = 5.5; // expected angr to find this constant
  return y + temp;
}

int main(int argc, char *argv[]) {
  float ans;
  ans = func3(2.2); // expected angr to find this constant
}

I then used angr to extract the constants in my functions (namely 'func3' and 'main') as well as the number of arguments of each function. Unfortunately, the values I get back for the constants ("const" in the output below) and for "argc" make no sense. I get:

name main const [8, 32, 8, 32, 18446744073709551596, 18446744073709551584, 0, 4202504, 4202504,
    8, 4198767, 128, 4198697, 18446744073709551612, 0, 8, 8, 128] argc -1 

name func3 const [8, 18446744073709551596, 4202500, 4202500, 18446744073709551612,
     18446744073709551596, 0, 18446744073709551612, 8, 8, 128] argc -1 

My angr code:

#!/usr/bin/env python3

import angr
from angrutils import *

def get_attributes(cfg, addr):
    if addr in cfg.kb.functions:
        func = cfg.kb.functions.get_by_addr(addr)
        if func:
            name = func.demangled_name
            if name != 'main' and name != 'func3':
                return # only care about these 2 funcs
            const = func.code_constants
            argc = len(func.arguments) if func.arguments else -1
            print('  name %s const %s argc %s ' % (name, const, argc))
    return

proj = angr.Project('simple', main_opts={'backend': 'elf'}, load_options={'auto_load_libs':False})
main = proj.loader.main_object.get_symbol('main')

start_state = proj.factory.blank_state(addr=main.rebased_addr)
start_state.stack_push(0x0)
with hook0(proj):
    cfg = proj.analyses.CFGFast()  # using CFGEmulated() also does not change the answer!
    #cfg = proj.analyses.CFGEmulated(fail_fast=False, starts=[main.rebased_addr], context_sensitivity_level=1, enable_function_hints=False, keep_state=True, enable_advanced_backward_slicing=False, enable_symbolic_back_traversal=False,normalize=True)

d=dict()
for src, dst in cfg.kb.functions.callgraph.edges():
    if not d.get(src):             # only need to do this once.
        src_attr = get_attributes(cfg, src)
        d[src] = True              # mark completed
    if not d.get(dst):             # only need to do this once.
        dst_attr = get_attributes(cfg, dst)
        d[dst] = True              # mark completed

Where am I going wrong?

  • I strongly suspect that one of those very large numbers is the floating-point number you're looking for. Constants in object files do not have type information associated with them, so this may be the best angr can do. – zwol Feb 17 '21 at 19:58
  • I tried to see whether these are "double" values:

    ```
    #include <stdio.h>
    void foo(long a) { double x; x = *((double *) &a); printf("%ld %f\n", a, x); }
    int main() { long a = 18446744073709551596; long b = 18446744073709551584; long c = 18446744073709551612; foo(a); foo(b); foo(c); }
    ```

    but gcc gives me a warning (foo.c: In function 'main': foo.c:12:12: warning: integer constant is so large that it is unsigned, pointing at `long a = 18446744073709551596;`) and the program prints "-NaN". – Gregory Hines Feb 18 '21 at 16:04
  • It appears that these very large values are not floating-point numbers but stack pointers on my 64-bit machine: >>> hex(18446744073709551612) '0xfffffffffffffffc' – Gregory Hines Feb 24 '21 at 21:37
  • I should say these very large numbers *may* be stack pointers, or offsets from a stack pointer. They are small negative numbers in two's complement arithmetic. – Gregory Hines Feb 25 '21 at 15:21
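
The two's complement reading in the comments above can be checked with a few lines of Python. This is only a sketch: the helper name `as_signed64` is made up for illustration, and the three inputs are the large values copied from the angr output in the question.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's complement signed one."""
    value &= 0xFFFFFFFFFFFFFFFF          # keep only the low 64 bits
    if value & (1 << 63):                # sign bit set: the value is negative
        return value - (1 << 64)
    return value

# The large constants reported by angr for main and func3.
large_constants = [
    18446744073709551596,
    18446744073709551584,
    18446744073709551612,
]

for c in large_constants:
    print(hex(c), as_signed64(c))
```

Read this way the values are -20, -32 and -4, which looks more like small displacements (for example from a frame or stack pointer) than like floating-point data, though the output alone does not prove what they are offsets from.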

1 Answer

I have no experience with angr, but based on inspecting the assembly generated for your program, I have some hypotheses for what went wrong:

  1. func3 has no side effects and main does not use the value of ans, so with optimization enabled the compiler can eliminate the call to func3 entirely, e.g. on x86-64 I get this for main:

    main:
        movl $0, %eax
        ret
    

    So the constant 2.2 may well not be in the executable at all.

  2. Floating point constants usually have to be emitted into memory and loaded by reference, e.g. on x86-64 I get this assembly for func3:

    .section .text
    func3:
        addss   .LC0(%rip), %xmm0
        ret
    .section .rodata
    .LC0:
        .long 1085276160
    

    In a fully linked executable the cross-reference .LC0 becomes a relative offset:

    1125:       f3 0f 58 05 d7 0e 00 00  addss  0xed7(%rip),%xmm0
    112d:       c3                       retq   
    

    It is possible that angr does not recognize this offset as a constant to be extracted, or that it can only extract this offset and not the value in .rodata that it refers to. And even if it could pull out the value in .rodata, the only way it could know that the value should be interpreted as a single-precision float rather than an integer is by decoding the instruction that uses the value.
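
To make the "relative offset" point concrete, here is a small Python sketch that resolves the address referenced by the addss instruction in the disassembly above. It assumes the displacement is the last four bytes of the instruction (true for this encoding, which has no immediate operand); the helper name `rip_relative_target` is hypothetical and is not part of angr.

```python
import struct

# Address and raw bytes of the addss instruction from the disassembly above.
insn_addr = 0x1125
insn_bytes = bytes.fromhex("f30f5805d70e0000")

def rip_relative_target(addr: int, raw: bytes) -> int:
    """Resolve a RIP-relative operand, assuming the final 4 bytes are a disp32."""
    (disp,) = struct.unpack("<i", raw[-4:])   # signed little-endian displacement
    next_insn = addr + len(raw)               # RIP points at the next instruction
    return next_insn + disp

target = rip_relative_target(insn_addr, insn_bytes)
print(hex(target))
```

For these bytes the displacement is 0xed7 and the next instruction starts at 0x112d, so the referenced location is 0x2004. A tool that records only the displacement, or only the resolved address, would report an integer there rather than the float stored at that address.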

zwol
  • Appreciate the comments @zwol. I should have had main() print the value of "ans". That said, indeed 1085276160 = 0x40b00000 = (float) 5.5 per the IEEE floating-point representation. In theory angr could detect that this value is being loaded into an XMM register, or perhaps more directly that the "addss" instruction is "Add Scalar Single-Precision Floating-Point Values", and thus know that this is a floating-point constant. – Gregory Hines Mar 01 '21 at 17:57
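
The bit-pattern check in the last comment can be reproduced with Python's struct module. This is a sketch with made-up helper names (`u32_as_float`, `float_as_u32`); it only reinterprets a 32-bit integer as an IEEE 754 single-precision value and does not use angr at all.

```python
import struct

def u32_as_float(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_as_u32(value: float) -> int:
    """Inverse direction: the 32-bit pattern that encodes a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

# The .long emitted for .LC0 in the assembly listing above.
lc0 = 1085276160
print(hex(lc0), u32_as_float(lc0))
```

The same helper could be applied to four bytes read from the binary at an address that a constant-extraction pass reports, assuming that address really points at a float in .rodata; nothing in the integer itself says which interpretation is intended.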