I've been working on a project that analyzes the EVM assembly of Solidity smart contracts. I am currently stuck on finding where each contract function ends in the assembly. There is a brute-force approach: simulate the EVM and track the line at which execution finishes. But writing a complete EVM simulator is, I am afraid, well beyond my capabilities, so I am looking for a simpler solution if there is one.
So far I've managed to (almost) consistently find the beginnings of the functions (the corresponding JUMPDESTs) in the assembly, assuming that I have access to the contract's ABI. The idea is quite simple. At the top of the EVM assembly file there are multiple blocks that look like this:
PUSH4 0x8ac28d5a
GT
PUSH2 0x191
JUMPI
DUP1
and others that look like this:
PUSH4 0xfeaf968c
EQ
PUSH2 0xc82
JUMPI
PUSH2 0x2f4
JUMP
JUMPDEST
DUP1
Let's call them "header blocks" (if there is an official name, I am sorry for my illiteracy :) ). Each header block compares the 4-byte hash of the method signature that came in the calldata against a constant and decides whether to jump to the JUMPDEST that corresponds to the beginning of the desired function. But there is a catch. As you can see, there is a GT at the top of the first header block. Why would we compare hashes with less/greater? Because the header blocks do not perform a linear search over all the signatures. Instead, as far as I can deduce, they do some kind of logarithmic search (please correct me if I am wrong). And, as we can see in the second header block, in some cases they unconditionally jump somewhere else, seemingly in the middle of the search. In reality, at that point they simply have enough information to infer that no function in this assembly has the required signature hash. So we can deduce that those "else" JUMPs jump right to the fallback.
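To make the header-block idea concrete, here is a minimal Python sketch of the scan I have in mind. It assumes the assembly is a plain-text listing with one instruction per line, exactly like the snippets above. The function name find_dispatch_entries is made up for this post, and the pattern (PUSH4, EQ, PUSHn, JUMPI) is only what I have observed, not a guaranteed compiler layout.

```python
def find_dispatch_entries(asm_text):
    """Return {signature_hash: jump_target} for each PUSH4/EQ/PUSHn/JUMPI run.

    Assumption: one instruction per line, operand (if any) in hex after the opcode.
    """
    # Tokenize into (opcode, operand-or-None) pairs, skipping blank lines.
    instrs = []
    for line in asm_text.splitlines():
        parts = line.split()
        if parts:
            instrs.append((parts[0].upper(), parts[1] if len(parts) > 1 else None))

    entries = {}
    for i in range(len(instrs) - 3):
        (op0, sel), (op1, _), (op2, target), (op3, _) = instrs[i:i + 4]
        is_match = (
            op0 == "PUSH4" and sel is not None
            and op1 == "EQ"                      # GT blocks only steer the search
            and op2.startswith("PUSH") and target is not None
            and op3 == "JUMPI"
        )
        if is_match:
            entries[int(sel, 16)] = int(target, 16)
    return entries


if __name__ == "__main__":
    sample = """PUSH4 0x8ac28d5a
GT
PUSH2 0x191
JUMPI
DUP1
PUSH4 0xfeaf968c
EQ
PUSH2 0xc82
JUMPI
PUSH2 0x2f4
JUMP
JUMPDEST
DUP1"""
    for sel, target in find_dispatch_entries(sample).items():
        print(hex(sel), "->", hex(target))
```

Blocks with GT are deliberately skipped, since they only narrow the search and do not name a function entry. The hashes found this way still have to be matched against the ones derived from the ABI.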
So this is the context of what I have done so far. I am able to obtain the list of the beginnings of all the functions, including the fallback. Obtaining the list of the ends of the functions is what I am currently struggling with. My hypothesis so far was that I could split the whole assembly file at the JUMPDESTs where functions begin (plus the dispatch part with the header blocks), so that each part except the first would correspond to one Solidity function. Unfortunately, this is easily disproven by looking at the assembly of a basic contract with only a couple of functions. You can experiment yourself at godbolt.org (a little example here). There will be a number of auxiliary "functions" created by the Solidity compiler, so this approach is not viable. Are there any approaches to finding the endings of the functions without simulating the EVM?
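For reference, the naive split I tried looks roughly like the Python sketch below. It assumes the listing starts at byte offset 0 of the runtime bytecode and that each PUSHn occupies 1 + n bytes while every other opcode occupies 1 byte. The names with_offsets and split_at_entries are made up for this post. As explained above, the resulting segments also swallow the compiler-generated helper routines, so this is only an illustration of the hypothesis, not a solution.

```python
def with_offsets(asm_text):
    """Attach a byte offset to each instruction of a plain-text listing.

    Assumption: the first line of the listing sits at byte offset 0.
    """
    out, offset = [], 0
    for line in asm_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        op = parts[0].upper()
        out.append((offset, op, parts[1] if len(parts) > 1 else None))
        # PUSHn carries n bytes of immediate data; other opcodes are 1 byte.
        n = op[4:] if op.startswith("PUSH") else ""
        offset += 1 + (int(n) if n.isdigit() else 0)
    return out


def split_at_entries(instrs, entry_offsets):
    """Cut the listing at every known function-entry JUMPDEST (naive split)."""
    cuts = sorted(o for o, op, _ in instrs
                  if op == "JUMPDEST" and o in entry_offsets)
    segments = {}
    for idx, start in enumerate(cuts):
        end = cuts[idx + 1] if idx + 1 < len(cuts) else None
        segments[start] = [ins for ins in instrs
                           if ins[0] >= start and (end is None or ins[0] < end)]
    return segments


if __name__ == "__main__":
    listing = "PUSH1 0x05\nJUMP\nSTOP\nSTOP\nJUMPDEST\nPUSH1 0x00\nJUMPDEST\nSTOP"
    for start, seg in split_at_entries(with_offsets(listing), {5, 8}).items():
        print(hex(start), [op for _, op, _ in seg])
```

The entry offsets passed in would be the jump targets recovered from the header blocks.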