
I have some data in hexdump form. The left-hand values are decimal (DEC) and the right-hand values are the corresponding hexdump bytes.

16 = 10
51 = 33
164 = A4 01
388 = 84 03
570 = BA 04
657 = 91 05
1025 = 81 08
246172 = 9C 83 0F

How can I convert any such hexdump to DEC? In Perl, I tried to use `ord()`, but it doesn't work.

Update: I don't know what this is called. It looks like 7-bit data. I tried to build a formula in Excel that looks like this:

DEC = hex2dec(X) + (128^1 * hex2dec(Y-1)) + (128^2 * hex2dec(Z-1)) + ...
  • 164 = 0xa4, not 0xA401 and not 0x01A4 (in case you were making it little endian). You have issues with most of the rest of your conversions. – David Hoelzer Jan 06 '19 at 11:25
  • Possible duplicate of [How do I convert decimal to hexadecimal in Perl?](https://stackoverflow.com/questions/10481001/how-do-i-convert-decimal-to-hexadecimal-in-perl) – David Hoelzer Jan 06 '19 at 11:26
  • It's not a normal hex2dec conversion. I think it's computer code. – ต้อง เอกมัย Jan 06 '19 at 11:44
  • When the DEC value is higher than 128, it has 01 at the end. – ต้อง เอกมัย Jan 06 '19 at 11:46
  • Errr, no.... 1 byte, which is 8 bits or 2 nibbles, can represent any value from zero through 255. 128 would be 0x80.... There is no "1" involved. – David Hoelzer Jan 06 '19 at 12:24
  • I would guess that any byte that has the high bit set indicates that another byte is following. Mostly a variable-length encoding not unlike UTF-8. So I would assume that the number is the sum of `(byte & 0x7f) * 0x80**index` over all bytes, until you encounter a byte which doesn't have bit 8 set. Much like the Excel formula already is, except not `-1` but `-128`. Converting the Excel formula to Perl is left as an exercise for the reader. – Corion Jan 06 '19 at 12:30
  • Thank you very much. After googling your keywords, I think it's a variable-length 7-bit integer encoding. – ต้อง เอกมัย Jan 06 '19 at 14:19

1 Answer


What you have is a variable-length encoding. The length is encoded using a form of sentinel value: each byte of the encoded number except the last has its high bit set. The remaining 7 bits of each byte form the binary encoding of the number (the decoder below treats it as unsigned), with the least significant group first, i.e. in little-endian order.

0xxxxxxx                   ⇒                   0xxxxxxx
1xxxxxxx 0yyyyyyy          ⇒          00yyyyyy yxxxxxxx
1xxxxxxx 1yyyyyyy 0zzzzzzz ⇒ 000zzzzz zzyyyyyy yxxxxxxx
etc
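
As a worked illustration of the layout above, here is a minimal sketch that decodes a single encoded number by accumulating 7-bit groups from the least significant end. The name `decode_varint` is hypothetical (it is not part of any module), and the sketch assumes its argument is exactly one complete encoded number.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one complete encoded number given as a byte string.
sub decode_varint {
   my ($bytes) = @_;
   my $num   = 0;
   my $shift = 0;
   for my $byte (unpack 'C*', $bytes) {
      $num |= ( $byte & 0x7F ) << $shift;   # low 7 bits carry the payload
      $shift += 7;                          # each later byte is 128 times more significant
   }
   return $num;
}

# 9C 83 0F: 0x9C & 0x7F = 28, 0x83 & 0x7F = 3, 0x0F = 15
# 28 + 3*128 + 15*16384 = 246172
print decode_varint("\x9C\x83\x0F"), "\n";
```

The arithmetic in the comment matches the last row of the table in the question.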

The following can be used to decode a stream:

use strict;
use warnings;
use feature qw( say );

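# Removes the first encoded number from the front of the buffer passed in
# (modifying the caller's variable via $_[0]) and returns its decoded value.
# Returns an empty list if the buffer doesn't start with a complete number.
sub extract_first_num {
   # Any run of continuation bytes (high bit set) followed by one final byte.
   $_[0] =~ s/^([\x80-\xFF]*[\x00-\x7F])//
      or return;

   my $encoded_num = $1;
   my $num = 0;
   # Start with the most significant 7-bit group, which comes last in the stream.
   for (reverse unpack 'C*', $encoded_num) {
      $num = ( $num << 7 ) | ( $_ & 0x7F );
   }

   return $num;
}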

my $stream_buf = "\x10\x33\xA4\x01\x84\x03\xBA\x04\x91\x05\x81\x08\x9C\x83\x0F";
while ( my ($num) = extract_first_num($stream_buf) ) {
   say $num;
}

die("Bad data") if length($stream_buf);

Output:

16
51
164
388
570
657
1025
246172
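
For checking a decoder against known values, the inverse direction can be handy. The following is a minimal sketch of an encoder, assuming non-negative integers only; `encode_num` is a hypothetical name introduced for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, the inverse of the decoder: non-negative integer -> byte string.
sub encode_num {
   my ($num) = @_;
   my @bytes;
   while ( $num > 0x7F ) {
      push @bytes, ( $num & 0x7F ) | 0x80;   # low 7 bits, with the continuation bit set
      $num >>= 7;
   }
   push @bytes, $num;                        # the final byte has its high bit clear
   return pack 'C*', @bytes;
}

# Print a few values in the same "DEC = hex bytes" form used in the question.
for my $n (16, 164, 1025, 246172) {
   my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', encode_num($n);
   print "$n = $hex\n";
}
```

This always emits the shortest encoding, so it should reproduce the byte sequences listed in the question (for example `9C 83 0F` for 246172).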
ikegami
  • Differences from UTF-8: UTF-8 uses a length prefix rather than a sentinel value, it uses big-endian byte order, and it "wastes" bits to make seeking possible. – ikegami Jan 06 '19 at 15:04
  • Like UTF-8, this format has multiple ways of encoding the same number. For example, one can be encoded as `01`, `81 00`, `81 80 00`, `81 80 80 00`, etc. UTF-8 considers "overly long" encodings to be illegal. – ikegami Jan 06 '19 at 21:01
  • I am working on a seeking task too. Could you please give more examples? My data values are separated by some bytes and come as a series/stream, e.g. `08 XX XX 10 YY YY 20 ZZ ZZ ZZ 30`. Sometimes a byte within XX XX can have the value 10, and then the seek is invalid. – ต้อง เอกมัย Jan 07 '19 at 20:34
  • What I meant is that if you seeked into the middle of an encoded number, you could find the end of the number. That assumes you know you are seeking into an encoded number (e.g. if the file consists entirely of encoded numbers). If you want help with your seeking problem, ask a Question. Be sure to provide clearer and more comprehensive information. – ikegami Jan 08 '19 at 05:02
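
To illustrate the comment above about multiple encodings of the same number, here is a minimal sketch that decodes `01`, `81 00` and `81 80 00` and flags the longer forms. Both helper names are hypothetical. Whether overlong forms should be rejected is an assumption that depends on the format in use, and nothing in the question says either way.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one complete encoded number.
sub decode_num {
   my ($bytes) = @_;
   my $num = 0;
   $num = ( $num << 7 ) | ( $_ & 0x7F ) for reverse unpack 'C*', $bytes;
   return $num;
}

# Hypothetical helper: an encoding is overlong when it has more than one byte
# and ends in 0x00, because that final byte contributes no bits to the value.
sub is_overlong {
   my ($bytes) = @_;
   return length($bytes) > 1 && substr($bytes, -1) eq "\x00" ? 1 : 0;
}

for my $enc ("\x01", "\x81\x00", "\x81\x80\x00") {
   my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', $enc;
   printf "%-8s => %d%s\n", $hex, decode_num($enc), is_overlong($enc) ? ' (overlong)' : '';
}
```

All three inputs decode to 1; only the first is the shortest form.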