
I'm working on a basic file carver and I'm currently stuck on calculating the byte position of the file.

I've worked out that I need a piece of code to perform the following steps:

  1. Locate the $searchQuery in the variable
  2. Remove the rest of the string after the $searchQuery is found
  3. Count the number of fields that now exist within the variable
  4. Subtract 2 from this count to take into account the Hex Offset and the $searchQuery itself
  5. Then multiply the answer by two to get the correct byte count

An example of this would be:

  1. Locate "ffd8" within "00052a0: b4f1 559c ffd8 ffe0 0010 4a46 4946 0001"
  2. Variable is updated to "00052a0: b4f1 559c ffd8"
  3. $fieldCount is assigned the value of "4"
  4. fieldCount=$((fieldCount-2))
  5. byteCount=$((fieldCount*2))

I have a basic idea of how to do everything but count the number of fields in the variable. For example, how would I count how many fields there are in the variable until the $searchQuery is found? And similarly, how do I count the number of fields once I've removed the unnecessary part of the string?

After locating the $searchString with grep I have no idea how to proceed. My current code looks like this:

#!/bin/bash
#***************************************************************
#Name:          fileCarver.sh
#Purpose:       Extracts files hidden within other files
#Author:        
#Date Written:      12/01/2013
#Last Updated:      12/01/2013
#***************************************************************

clear

#Request user input
printf "Please enter the input file name: "
read inputFile
printf "Please enter the search string: "
read searchString

#Search for the required string
searchFunction()
{
    #Search for required string and remove unnecessary characters
    startHexOffset=$(xxd "$1" | grep "$2" | cut -d":" -f 1)
    #Convert the Hex Offset to Decimal
    startDecOffset=$(echo "ibase=16;${startHexOffset^^}" | bc)
}

searchFunction "$inputFile" "$searchString"


exit 0

Thanks for the help!

Revenant
  • Read about awk and its `FS` (field separator) and `NF` (number of fields) variables. You can eliminate all of the extra processes like `grep` and `cut`. Also, your question is a little unclear: what do you see being the output of this function? The doc block says "files hidden within other files", but your sample data doesn't seem to support that. (Sample output please.) Good luck! – shellter Jan 12 '13 at 23:34
  • Thank you, I will do! Essentially it will be able to extract a file that is obfuscated with unrelated data by obtaining the exact bytes that the hidden file starts at. For example, it will search for the JPG header "ffd8" and footer "ffd9" and basically just cut and paste all the data from both ends into another file, allowing the image to be viewed normally. – Revenant Jan 12 '13 at 23:55
  • I added the line `echo $fullOffset | awk -F " " "/$searchString/{print NF}"` but it only outputs the number of fields in the variable. The variable contains `00052a0: b4f1 559c ffd8 ffe0 0010 4a46 4946 0001 ..U.......JFIF..`. – Revenant Jan 13 '13 at 01:16
  • OK, yes. Now you can iterate through all elements of the current line with `{for (i=1;i<=NF;i++){if ($i ~ /$searchString/) printf("fldNum=%d=%s\n", i, $i)}}`. Note how `i` can be a counter, like in C, but can also be referenced for its value with `$i`. This is true for all variables that hold numbers in awk, including NF, hence `$NF` will print the last element on the line, and `$(NF-3)` will print the 3rd from last field on the line (for example). You could use `$(NF-n)` with n as an integer value. You can assign the value of i to `startPos` and `endPos` and then loop through that range for your file. Good luck. – shellter Jan 13 '13 at 03:10
  • If I'm understanding what you are trying to do correctly, parsing the output of xxd looking for a given pattern is a really inflexible method. What if the byte sequence you are looking for doesn't align itself to a 2-byte boundary (xxd groups bytes in pairs by default)? What if, when dumped by xxd, your pattern is split across lines? – Josh Cartwright Jan 13 '13 at 03:17
  • @JoshCartwright You're right... I can't think how to alter my current code to take this into account, although rici's suggestion below seems to handle it. @shellter I don't understand the piece of code you've given, and when I attempt to use it I get syntax errors, starting with "i=1". – Revenant Jan 13 '13 at 21:04

2 Answers


You might find this easier if you convert the file to hex in a simpler format. For example, you can use the command

hexdump -v -e '/1 "%02x "' $FILE

to print the file with every byte converted to exactly three characters: two hex digits and a space.

You could find all instances of ffd8 prefixed with their byte offset:

hexdump -v -e '/1 "%02x "' $FILE | grep -Fbo 'ff d8 '

(The byte offsets need to be divided by 3.)
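
The arithmetic can be checked by hand on a short sample. This is only a sketch: `hexStream` is a hypothetical, hand-written stand-in for the hexdump output, and the prefix length stands in for the 0-based offset that `grep -b` would report for the first match.

```bash
#!/bin/bash
# Hypothetical sample in the "two hex digits and a space" format.
hexStream="b4 f1 55 9c ff d8 ff e0 "

# Everything before the first "ff d8 ". Its length is the 0-based
# character offset of the match within the stream.
prefix=${hexStream%%ff d8 *}
charOffset=${#prefix}

# Each byte occupies three characters, so divide by 3 to get
# the 0-based byte offset.
byteOffset=$((charOffset / 3))

# tail -c+N numbers bytes from 1, so add 1.
tailStart=$((byteOffset + 1))

echo "$charOffset $byteOffset $tailStart"   # prints: 12 4 5
```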

So you could stream the entire file from the first instance of ffd8 using:

tail -c+$((
  $(hexdump -v -e '/1 "%02x "' $FILE | grep -Fbo 'ff d8 ' | head -n1 | cut -f1 -d:)
  / 3 + 1)) $FILE

(That assumes that whatever you use to display the file knows enough to stop when it hits the end of the image. But you could similarly find the last end marker.)

This depends on GNU grep; standard POSIX grep lacks the -b option. However, it can be done with awk:

tail -c+$(
    hexdump -v -e '/1 "%02x\n"' $FILE |
    awk '/d8/&&p=="ff"{print NR-1;exit}{p=$1}'
  ) $FILE

Explanation of options:

tail    -c+N    output the file starting at byte number N (first byte is number 1)

hexdump -v      do not compress repeated lines on output
        -e 'FORMAT'  use indicated format for output:
            /1       each format consumes 1 byte
            "%02x "  output two hex digits, including leading 0, using lower case,
                     followed by a space.

grep    -F      pattern is just plain characters, not a regular expression
        -b      print the (0-based) byte offset of the... 
        -o      ... match instead of the line containing the match

cut     -f1     output the first field of each line
        -d:     fields are separated by :
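
Finding the end marker as well, and copying only the bytes in between, might look roughly like the sketch below. It is not part of the commands above: `carveFirst` is a hypothetical helper name, it uses `od` instead of `hexdump` (an assumption, chosen because `od -An -v -tx1` is specified by POSIX), and it copies the range with `dd`. It stops at the first `ff d9` after the header, so an image whose data happens to contain that byte pair earlier would be cut short.

```bash
#!/bin/bash
# carveFirst INPUT OUTPUT  (hypothetical helper)
# Copy the bytes from the first "ff d8" up to and including the
# first "ff d9" that follows it.
carveFirst() {
    local input=$1 output=$2 range start end length

    # od prints the file as hex bytes, one per awk field. awk numbers
    # the bytes from 1 in n and keeps the previous byte in p, so a
    # marker split across od output lines is still found.
    range=$(od -An -v -tx1 "$input" | awk '
        {
            for (i = 1; i <= NF; i++) {
                n++
                if (!start && p == "ff" && $i == "d8")
                    start = n - 1
                else if (start && p == "ff" && $i == "d9") {
                    print start, n
                    exit
                }
                p = $i
            }
        }')

    # No header/footer pair found.
    [ -n "$range" ] || return 1

    read -r start end <<< "$range"
    length=$((end - start + 1))

    # dd skips the (start - 1) bytes before the header, then copies
    # exactly $length bytes.
    dd if="$input" of="$output" bs=1 skip="$((start - 1))" count="$length" 2>/dev/null
}
```

It would be called as, for example, `carveFirst suspect.bin image.jpg` (both file names are placeholders).
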
rici
  • Given Josh's input above, your method takes into account file headers/footers that are split over multiple lines, which makes it better than what I already have, but I'm having a little trouble understanding it. The method I've been using searches for the header, in this case "ffd8", takes the offset for that line and converts it to decimal. The number of bytes in between the start of the line and the header is then added onto that decimal number to make the starting position for the dd process. The same is then done for the file footer but with the bytes in the footer also included. – Revenant Jan 13 '13 at 21:05
  • With the starting and ending position now known, the difference between the two is calculated and the file is extracted using the dd command, skipping unwanted data (the starting position) and extracting a specific length of bytes (the difference between the starting and end position). Now, with your method, am I right in thinking that after it parses all data in the file so it's viewed as single bytes separated with a single space, it removes all data before the file header, with the ability to repeat this again for the footer, but removing all data after? – Revenant Jan 13 '13 at 21:14
  • Thank you! I got it working and understand pretty much all of it. Would you be able to explain the significance of the "/ 3 + 1" at the end though? If I change it to "/ 4" the line no longer works. – Revenant Jan 13 '13 at 22:30
  • @revenant: I agree that's a little messy, but the first part was really long. Think of it as `x / 3 + 1` (i.e. one more than x/3). The `+1` is because `tail` thinks of the first byte as the first byte, whereas `grep` reports it as the 0th byte. – rici Jan 13 '13 at 22:32
  • Okay, it turns out the server I have to run this from doesn't like the -b flag for grep, which in turn breaks the entire script... Is there any alternative to " grep -Fbo "ff d8" "? – Revenant Jan 14 '13 at 22:01
  • @Revenant: Put a slightly different expression using awk into the answer. Hope it helps. – rici Jan 14 '13 at 22:59
  • I only just noticed you updated the answer above... I ended up coming up with my own solution, which is probably overcomplicated but it works fine on the server that doesn't like grep's -b flag. Only downside is it won't work if the header/footer is split across multiple lines. – Revenant Jan 15 '13 at 00:15

Try:

echo "00052a0: b4f1 559c ffd8 ffe0 0010 4a46 4946 0001"| awk '
{
for (a=1;a<=NF; a++) {
    if ($a == "ffd8") {
        print substr($0,1,index($0,$a)+length($a)-1)
        break
        }
    }
}'

output: 00052a0: b4f1 559c ffd8
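
The same loop can also report where the match sits, which gives the byte position directly. The following is a sketch only: `offsetInLine` is a hypothetical helper name, and it assumes each field after the offset is a 2-byte group, as in default xxd output. Like any line-based approach, it will miss a pattern that straddles two groups or two lines.

```bash
#!/bin/bash
# offsetInLine LINE GROUP  (hypothetical helper)
# Print the absolute decimal byte offset of the first field of an
# xxd output line that equals GROUP.
offsetInLine() {
    local line=$1 group=$2 hexOffset fieldNum

    # awk prints the line's hex offset (field 1 without its colon)
    # and the index of the matching field.
    read -r hexOffset fieldNum < <(
        awk -v q="$group" '{
            for (i = 2; i <= NF; i++)
                if ($i == q) { sub(/:$/, "", $1); print $1, i; exit }
        }' <<< "$line"
    )

    # Nothing matched on this line.
    [ -n "$fieldNum" ] || return 1

    # (fieldNum - 2) groups precede the match, each 2 bytes wide.
    # 16#... makes bash read the offset as hexadecimal.
    echo $(( 16#$hexOffset + (fieldNum - 2) * 2 ))
}

offsetInLine "00052a0: b4f1 559c ffd8 ffe0 0010 4a46 4946 0001" ffd8   # prints 21156
```

Here 0x52a0 is 21152, and the two groups before `ffd8` add 4 bytes.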

Henry Barber