1

I'm using baksmali and dexlib2 library in Android Studio project. I'm trying to find a method from dex file. Inside method definition, there is a unique string in one of the method body statements. I want to find that method in the fastest possible way.
As example:

public boolean A01() {
    ..........
    return tvar.quals("unique_string") ? false : true;
}

I want to find this method using this "unique_string".

Currently I'm using DexBackedDexFile.getClasses() to get all classes from the dex file and decoding all of them into smali code. Then searching the string inside the generated smali code. I'm able to find my desired method this way, but decoding thousands of class is time consuming. So I'm thinking if there is a quick way to do that.

There is a way to quickly get a string reference from dex file using DexBackedDexFile.getStringReferences(), which returns a DexBackedStringReference object. I can find my desired string reference quickly this way, which contains a stringIndex. I'm wondering if there is a way to find the method using this index number. Thanks

Nobody
  • 13
  • 3
  • If it's for a serious purpose you could always try Unix `strings` – g00se Jul 10 '21 at 21:00
  • @Robert Yes this app has multiple dex, I'm already using this trick to detect which dex file to search. But still it takes 7-8 seconds to decode all classes. – Nobody Jul 10 '21 at 21:18
  • @Robert From what I figured out, that all strings are stored in a different section inside dex file, so that means if the method use this string, it needs to identify this string using a reference somehow. there should be a link between them. I guess. – Nobody Jul 10 '21 at 21:22
  • What exactly does "finding" a method mean? – chrylis -cautiouslyoptimistic- Jul 10 '21 at 21:27
  • @Robert searching isn't taking time, decoding the class into smali is what taking time. I'm not sure if decoding process already using multi-thread. – Nobody Jul 10 '21 at 21:28
  • @chrylis-cautiouslyoptimistic- means getting a method object `DexBackedMethod` which contains method name, it's class, other details etc.. – Nobody Jul 10 '21 at 21:30
  • @Robert Thanks. Using Multi-Thread, I manged to speed up the decoding process. It's now almost 3x faster. – Nobody Jul 11 '21 at 20:04
  • I converted my comments into an answer. – Robert Jul 19 '21 at 12:04

1 Answers1

1

If the app has multiple dex files, using the string table you know which dex file to load and thus you can reduce the number of methods to search in.

Unfortunately I don't think that there is a different way than just decoding searching in all methods for the string reference. Android does not need any back references from the string pool to the used method so it does not exist in dex format.

But you can speed-up the search process by distribute the method-decoding and searching for the string reference into multiple threads. As far as I know the Dex decoding library dexlib2 used in baksmali does not use threads at all.

Modern Android devices have 4-8 cores and decoding a DEX method is a process that can run in parallel without having to synchronize anything, therefore you should be able to nearly get a speed-up e.g. by 400% on a 4-core system.

Robert
  • 39,162
  • 17
  • 99
  • 152
  • I still want to know if it's possible what I asked in the original question. As I have mentioned, all strings are stored in a different section inside a dex file, so that means if a method use this string, it needs to identify this string using a reference somehow. I am hoping there is a way to point it directly. – Nobody Jul 27 '21 at 18:14
  • @Nobody On dex code level I don't see why strings should be stored that way. The dex commands `const-string` for loading a string just gets the string index, hence it is totally unrelated to where the string is stored within the file. https://source.android.com/devices/tech/dalvik/dalvik-bytecode Even if strings are sometimes located next to the method this may be optional and thus you don't know if the string you are searching is stored near or not. – Robert Jul 27 '21 at 18:22
  • You are right. I didn't know it was loading the string with it's index only. I looked into dexlib2 source code now, it seems they are retrieving the string using string index embedded inside method instruction.. Which gave me another idea. Instead of decoding thousands of classes, all I had to do is iterate through method instructions and if it's opcode is `CONST_STRING` type, then load the string and compare it. The result is instant now. Thank you for showing me the light. – Nobody Jul 27 '21 at 23:51
  • @nobody Don't forget that there are two opcodes for string loading. The newer const_string jumbo instruction can also occur. – Robert Jul 28 '21 at 02:33
  • Yeah, I was scratching through source code to understand if it's gonna be a problem, it seems dexlib2 is taking care of it internally. – Nobody Jul 28 '21 at 11:07