As you noted, the assembly instructions referencing the string typically look like this:
push offset aString
After assembling and linking, this is resolved to an actual address, say:
push 0x00ABCDEF
This gives you two options:
- Write data: modify the contents of aString (i.e. the memory pointed to by 0x00ABCDEF)
- Write code: modify references to aString
Write Data
When source code containing standard C string literals (immutable arrays of characters) is compiled, the strings are typically placed in a read-only data section, which is mapped at runtime to read-only pages alongside the program's other constant data. This data is generally packed contiguously to reduce the memory footprint of the program. This is the problem you are hitting when you try to write a larger string: you overwrite the next piece of data, and any references to that overwritten data now point into the middle of your larger string.
Writing a longer string by changing the data is non-trivial because, in order to preserve the original behaviour, you must shift all data after your string forward. You must then update all references to the shifted data (some of which may be calculated dynamically with pointer arithmetic). In effect, you are trying to redo the linker's relocation work without full (if any) symbolic information.
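To make the overwrite problem concrete, here is a minimal simulation in Python. The buffer layout, offsets and helper names are assumptions made up for illustration; a real read-only section contains far more than two strings.

```python
def read_cstr(buf, off):
    """Read a NUL-terminated string starting at offset off."""
    end = buf.index(0, off)
    return bytes(buf[off:end])

def write_cstr(buf, off, data):
    """Overwrite bytes in place, as a naive patch would (neighbours are not protected)."""
    buf[off:off + len(data) + 1] = data + b"\x00"

# Hypothetical packed read-only data: two string literals back to back.
rodata = bytearray(b"Hello\x00World\x00")
A_STRING_OFF = 0    # stands in for aString
NEIGHBOUR_OFF = 6   # the next literal, referenced elsewhere in the program

before = read_cstr(rodata, NEIGHBOUR_OFF)
write_cstr(rodata, A_STRING_OFF, b"Hello!!!")  # 3 bytes longer than the original
after = read_cstr(rodata, NEIGHBOUR_OFF)

print(before, after)  # b'World' b'!!'
```

The reference to the neighbouring literal is unchanged, but it now lands in the middle of the enlarged string.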
Write Code
The easy way out is to write your new string at some other location. This might be unused but already mapped memory in the process (commonly referred to as a 'code cave'), or it might be a string literal that is mapped in along with your injected DLL. Alternatively, you could allocate it dynamically at runtime after injection.
The next step is to find all references to aString and replace them with references to your new string.
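As a rough sketch of that patching step, the following Python works on a copy of the code bytes and rewrites every push of the old address. The new address 0x00C00000 and the sample code bytes are hypothetical. A plain byte scan like this can also match operand or data bytes that merely look like the pattern, and it ignores other instruction forms (mov, lea), so a disassembler is the safer tool in practice.

```python
import struct

PUSH_IMM32 = 0x68  # x86 opcode for `push imm32`

def patch_push_refs(code, old_addr, new_addr):
    """Replace every `push old_addr` in code with `push new_addr`; return patched offsets."""
    needle = bytes([PUSH_IMM32]) + struct.pack("<I", old_addr)
    patched = []
    pos = code.find(needle)
    while pos != -1:
        # Overwrite only the 4-byte little-endian immediate after the opcode.
        code[pos + 1:pos + 5] = struct.pack("<I", new_addr)
        patched.append(pos)
        pos = code.find(needle, pos + len(needle))
    return patched

# Hypothetical code bytes: push 0x00ABCDEF; nop; push 0x00ABCDEF; ret
code = bytearray(b"\x68\xEF\xCD\xAB\x00\x90\x68\xEF\xCD\xAB\x00\xC3")
offsets = patch_push_refs(code, 0x00ABCDEF, 0x00C00000)
print(offsets)  # [0, 6]
```

Because the instruction length does not change, nothing after the patched instructions needs to move.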
Bonus Method :)
Since you are delving into reverse engineering at this level, you have likely come across the concept of detours/interception/instrumentation. A similar approach can be applied here to intercept all references and redirect them at runtime. This will cause a heavier hit on performance than the 'Write Code' method outlined above, but will guarantee that all accesses are caught and redirected.
Set a hardware breakpoint on access to the string's data. When the breakpoint is triggered, some register will hold the address of the string. In assembly, this might look something like this:
mov esi, 0x00ABCDEF
...
If the first character is accessed, the code might do this:
mov al, byte ptr ds:[esi]
When your breakpoint is hit, you can set the thread context (SetThreadContext on Windows) to modify the value of esi to point to your new string.
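The redirection logic inside such a handler could look roughly like the sketch below. It is only a model: ctx is a plain dict standing in for the thread context (on Windows the values would be read with GetThreadContext and written back with SetThreadContext), and the function name, string length and new address are assumptions for illustration.

```python
def redirect_registers(ctx, old_base, old_len, new_base):
    """Rebase any register that points into the old string onto the new one."""
    changed = []
    for reg, value in ctx.items():
        if old_base <= value < old_base + old_len:
            # Keep the offset so a pointer to the third character stays on the third character.
            ctx[reg] = new_base + (value - old_base)
            changed.append(reg)
    return changed

# Hypothetical register snapshot taken when the breakpoint fires.
ctx = {"eax": 0x00000041, "esi": 0x00ABCDEF, "edi": 0x00ABCDF1}
changed = redirect_registers(ctx, old_base=0x00ABCDEF, old_len=6, new_base=0x00C00000)
print(changed, hex(ctx["esi"]), hex(ctx["edi"]))
```

Two caveats: a register may hold a value in that range by coincidence, and on x86 a data breakpoint is reported after the accessing instruction has completed, so the access that triggered it may already have used the old data.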