2

I have an application where I can upload files and add metadata to the file. This metadata information is stored in a database, but parts of the added information is encoded somehow (sadly I have no access to the source code).

The raw representation of the metadata in the Oracle database is as follows:

00000009010000000000000000512005B69801505B000000010000000700000040000000010000000A0100000006496D616765000000003C000000010000000A010000000A696D6167652F706E670000000027000000030000000501000000010000000500000001010000000B64653A3132332E706E6700000002A8000000030000000501000000030000000700000001010000000E737461636B6F766572666C6F770000000042000000010000000A010000001844433078303166363565396420307830303033336433640000000A2600000001000000020100033D3D0000003E000000010000000A0100000021346266653539343939343631356333323861613736313431636337346134353900

Whereas the raw sequence

737461636B6F766572666C6F77

corresponds to

stackoverflow

The query select UTL_RAW.CAST_TO_VARCHAR2(<raw_data>) from dual;

returns the string below:

CASTTOVARCHAR2

Here the values of the metadata are shown. But the names/identifier of the properties are unreadable. The corresponding name/identifier of stackoverflow should be test or a foreign key to a table that contains test. The other data contains additional information about the file (like the checksum, title or mime type)

Is it possible to retrieve the unreadable data (identifier) from the raw string?

true_gler
  • 544
  • 6
  • 23
  • you might wanna post this at Code Golf @ Stack Exchange. – Sudipta Mondal Aug 14 '18 at 10:11
  • they sent me back here :) – true_gler Aug 14 '18 at 11:57
  • s maps to 76 t maps to 77 and o to 6F, could be a simple mapping – Wernfried Domscheit Aug 14 '18 at 12:50
  • I think it depends on the way the unreadable data encodes your information. You can use UTL_RAW.SUBSTR to pull out part of the raw data and then try various cast functions on the substring but they will only work if the data is in one of Oracle's formats. Otherwise I guess you could pull out the substring and use DUMP() to convert it to hex and write some code to convert that to the format you want. – Bobby Durrett Aug 16 '18 at 18:47
  • There are a lot of zeros (nulls) and what could be control characters in there, so I don't think it's text data you can recover. They look like a mix of (maybe) byte counts for the following values and perhaps indicators or flags for what the next block of bytes represents; so perhaps some are enum values which would make more sense than labels. (Once section almost looks like an encoded epoch date/time, until I realised is wasn't on a byte boundary...). I don't think this is something you can interpret further without access to the application source code. – Alex Poole Aug 17 '18 at 16:59
  • I got it by myself. By analyzing loads of metadata and adding custom and self-constructed values, I was able to recognize the pattern they used to store the information. The reference to the identifier is hidden in (and at position of): "A01". Thanks for all the help and answers! – true_gler Aug 20 '18 at 12:47
  • Both "A01" and "101" are used to preface a 4 byte length followed by the Text, which is null terminated – donPablo Aug 21 '18 at 00:52

2 Answers2

1

RAW columns are not always containing a string, since the results it looks like that the content is binary data, more exactly a jpg file which has a string header in it but among binary information. Converting it to a varchar will generate invalid charcode that are represented as rectangular boxes. What you are doing here with varchar is the equivalent of opening a binary file, i.e a winword.doc or even a .jpeg by using Notepad.

To be able to get the content you need to treat it as image, not as varchar.

You can obtain the jpg file by using PLSQL as described here: http://www.dba-oracle.com/t_extract_jpg_image_photo_sql_file.htm

Eventually it is possible to get all the content without loss in a char datatype using the following:

select RAWTOHEX(<raw_data>) from dual;

This will return the whole content as character value containing its hexadecimal equivalent and should not present any invalid ANSI character which is rapresented with a rectangular box. Indeed you will not be able to read anymore "stackoverflow" or any other text since you will get only a sequence of HEX values. You will need then from your program to convert it to binary/image and treat it properly.

A. Lion
  • 673
  • 5
  • 12
1

Both "A01" and "101" are used to preface a 4 byte length followed by the Text, which is null terminated

00000009 010000000000000000512005B69801505B000000010000000700000040000000010000000A01
00000006 496D61676500                   Image

0000003C 000000010000000A01
0000000A 696D6167652F706E6700           image/png

00000027 00000003000000050100000001000000050000000101
0000000B 64653A3132332E706E6700         de:123.png

000002A8 00000003000000050100000003000000070000000101
0000000E 737461636B6F766572666C6F7700   stackoverflow

00000042 000000010000000A01
00000018 444330783031663635653964203078303030333364336400
          D C 0 x 0 1 f 6 5 e 9 d   0 x 0 0 0 3 3 d 3 d

00000A26 00000001000000020100033D3D0000003E000000010000000A01
00000021 346266653539343939343631356333323861613736313431636337346134353900
          4 b f e 5 9 4 9 9 4 6 1 5 c 3 2 8 a a 7 6 1 4 1 c c 7 4 a 4 5 9
donPablo
  • 1,937
  • 1
  • 13
  • 18