-1

I have a text file that contains 4 parts that I want to extract. I managed to get 3 of them but am struggling with the last one as it has a different marker. Can anyone please help me out. below is a chunk of the text file.

*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x00001AB8
  Huffman table length = 418
  ----
  Destination ID = 0
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (001 total): 00 
    Codes of length 03 bits (005 total): 01 02 03 04 05 
    Codes of length 04 bits (001 total): 06 
    Codes of length 05 bits (001 total): 07 
    Codes of length 06 bits (001 total): 08 
    Codes of length 07 bits (001 total): 09 
    Codes of length 08 bits (001 total): 0A 
    Codes of length 09 bits (001 total): 0B 
    Codes of length 10 bits (000 total): 
    Codes of length 11 bits (000 total): 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

  ----
  Destination ID = 1
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (003 total): 00 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (001 total): 04 
    Codes of length 05 bits (001 total): 05 
    Codes of length 06 bits (001 total): 06 
    Codes of length 07 bits (001 total): 07 
    Codes of length 08 bits (001 total): 08 
    Codes of length 09 bits (001 total): 09 
    Codes of length 10 bits (001 total): 0A 
    Codes of length 11 bits (001 total): 0B 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

  ----
  Destination ID = 0
  Class = 1 (AC Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (003 total): 00 04 11 
    Codes of length 05 bits (003 total): 05 12 21 
    Codes of length 06 bits (002 total): 31 41 
    Codes of length 07 bits (004 total): 06 13 51 61 
    Codes of length 08 bits (003 total): 07 22 71 
    Codes of length 09 bits (005 total): 14 32 81 91 A1 
    Codes of length 10 bits (005 total): 08 23 42 B1 C1 
    Codes of length 11 bits (004 total): 15 52 D1 F0 
    Codes of length 12 bits (004 total): 24 33 62 72 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (001 total): 82 
    Codes of length 16 bits (125 total): 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 
                                         37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 
                                         57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 
                                         77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 
                                         96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 
                                         B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA 
                                         D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 
                                         E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA 
    Total number of codes: 162

  ----
  Destination ID = 1
  Class = 1 (AC Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 00 01 
    Codes of length 03 bits (001 total): 02 
    Codes of length 04 bits (002 total): 03 11 
    Codes of length 05 bits (004 total): 04 05 21 31 
    Codes of length 06 bits (004 total): 06 12 41 51 
    Codes of length 07 bits (003 total): 07 61 71 
    Codes of length 08 bits (004 total): 13 22 32 81 
    Codes of length 09 bits (007 total): 08 14 42 91 A1 B1 C1 
    Codes of length 10 bits (005 total): 09 23 33 52 F0 
    Codes of length 11 bits (004 total): 15 62 72 D1 
    Codes of length 12 bits (004 total): 0A 16 24 34 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (001 total): E1 
    Codes of length 15 bits (002 total): 25 F1 
    Codes of length 16 bits (119 total): 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 
                                         44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 
                                         64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 
                                         83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 
                                         9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 
                                         B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 
                                         D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 
                                         F4 F5 F6 F7 F8 F9 FA 
    Total number of codes: 162

 
*** Marker: SOF0 (Baseline DCT) (xFFC0) ***
 

I need to extract the tables that starts with "destination ID". I have taken the first three (0/0, 0/1, 1/0) but cannot take 1/1. I tried many things such as iter() and basic read()/readLines() but they didn't work. The "read()" did work for the other three tables though.

NBG
  • 30
  • 7

1 Answers1

2

Assuming each Destination ID has the Total number of codes parameter, you could use the following (not the best approach, but works):

with open('ex.txt') as f:
    lines = f.readlines()


starts = [i for i in range(len(lines)) if lines[i].startswith('  Destination ID = ')]
ends= [i for i in range(len(lines)) if lines[i].startswith('    Total number of codes')]

This way you have the start and end indexes of the block of texts you are looking for. Hope this helps.

SavvY
  • 131
  • 1
  • 6
  • That looks like it’s gonna work, I’m outside rn, when I get back I’ll try it, Thank you! – NBG Jun 19 '22 at 13:09
  • 1
    You can then query saying: `lines[starts[0]:ends[0]+1]` and you'll get the blocks accordingly. Replace 0 with the block number you are looking for. – SavvY Jun 19 '22 at 13:10
  • Sure, I’ll comment here if it works. – NBG Jun 19 '22 at 13:11
  • One thing tho, I have two tables that have destination ID = 1 – NBG Jun 19 '22 at 13:15
  • In other words, I have to check two lines in order to validate my index. – NBG Jun 19 '22 at 13:15
  • @NBG It doesn't matter, this loop will go through once and look for all instances of ` Destination ID = ` fetching all indexes. If it works, don't forget to upvote it :) – SavvY Jun 19 '22 at 13:18
  • Oh, that’s even better – NBG Jun 19 '22 at 13:23
  • This answer worked perfectly, if anyone wants to implement this, just make sure to join the lines with a "\n" so you get the output as paraghraphs. – NBG Jun 20 '22 at 05:07