A colleague advised me to build my own MPEG transport stream (.ts) parser, which I want to do in Python (mainly to help me learn this stuff and challenge myself). My colleague mentioned that I should look into pickling the .ts file and then reading that pickle file into my main Python script.

Below are the first 40 lines of a hex dump of a pickle file that my colleague gave me as an example of a pickled .ts file, but I have no idea how to produce this myself.

If anyone can point me in the right direction that would be very much appreciated.

8004 95eb ab00 0000 0000 008c 1170 616e
6461 732e 636f 7265 2e66 7261 6d65 948c
0944 6174 6146 7261 6d65 9493 9429 8194
7d94 288c 055f 6461 7461 948c 1570 616e
6461 732e 636f 7265 2e69 6e74 6572 6e61
6c73 948c 0c42 6c6f 636b 4d61 6e61 6765
7294 9394 2981 9428 5d94 288c 1870 616e
6461 732e 636f 7265 2e69 6e64 6578 6573
2e62 6173 6594 8c0a 5f6e 6577 5f49 6e64
6578 9493 9468 0b8c 0549 6e64 6578 9493
947d 9428 8c04 6461 7461 948c 156e 756d
7079 2e63 6f72 652e 6d75 6c74 6961 7272
6179 948c 0c5f 7265 636f 6e73 7472 7563
7494 9394 8c05 6e75 6d70 7994 8c07 6e64
6172 7261 7994 9394 4b00 8594 4301 6294
8794 5294 284b 014b 0685 9468 158c 0564
7479 7065 9493 948c 024f 3894 4b00 4b01
8794 5294 284b 038c 017c 944e 4e4e 4aff
ffff ff4a ffff ffff 4b3f 7494 6289 5d94
284b 004b 654d e101 4dd6 074d ff1f 8c03
7375 6d94 6574 9462 8c04 6e61 6d65 944e
7586 9452 9468 0d8c 1b70 616e 6461 732e
636f 7265 2e69 6e64 6578 6573 2e6e 756d
6572 6963 948c 0a49 6e74 3634 496e 6465
7894 9394 7d94 2868 1168 1468 174b 0085
9468 1987 9452 9428 4b01 4d3b 1585 9468
1e8c 0269 3894 4b00 4b01 8794 5294 284b
038c 013c 944e 4e4e 4aff ffff ff4a ffff
ffff 4b00 7494 6289 42d8 a900 0001 0000
0000 0000 0002 0000 0000 0000 0003 0000
0000 0000 0004 0000 0000 0000 0005 0000
0000 0000 0006 0000 0000 0000 0007 0000
0000 0000 0008 0000 0000 0000 0009 0000
0000 0000 000a 0000 0000 0000 000b 0000
0000 0000 000c 0000 0000 0000 000d 0000
0000 0000 000e 0000 0000 0000 000f 0000
0000 0000 0010 0000 0000 0000 0011 0000
0000 0000 0012 0000 0000 0000 0013 0000
0000 0000 0014 0000 0000 0000 0015 0000
0000 0000 0016 0000 0000 0000 0017 0000

EDIT: There was actually no need to do this pickling, and it doesn't help in any way. If you are reading this, this step is not necessary. I have chosen not to delete the question because someone else may learn from my pain and mistakes. Pickle is meant for moving Python objects between applications or for saving an object's state so it can be loaded again later. All I did was dump a bytes variable holding the raw file contents into a pickle file, which did absolutely nothing to help me.
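For anyone landing here, a minimal sketch of the alternative: reading the stream directly, one packet at a time, with no pickle step. It assumes plain 188-byte packets that each start with the 0x47 sync byte. The helper name `iter_ts_packets` and the synthetic null packets are made up for illustration.

```python
import io

TS_PACKET_SIZE = 188  # standard MPEG-TS packet length in bytes
SYNC_BYTE = 0x47      # every TS packet starts with this byte


def iter_ts_packets(stream):
    """Yield successive 188-byte packets from a binary file-like object."""
    while True:
        packet = stream.read(TS_PACKET_SIZE)
        if len(packet) < TS_PACKET_SIZE:
            return  # end of stream (a trailing partial packet is dropped)
        if packet[0] != SYNC_BYTE:
            raise ValueError("lost sync: packet does not start with 0x47")
        yield packet


# Demo on an in-memory stream of two synthetic null packets (PID 0x1FFF).
null_packet = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes([0xFF]) * 184
demo = io.BytesIO(null_packet * 2)
packets = list(iter_ts_packets(demo))
```

With a real file, the same generator can be fed the object returned by `open("my_mpeg_ts_file.ts", "rb")`. Because it reads one packet at a time, the whole stream never has to sit in memory.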

Baba.S
  • 1
    There's more than one pickle file format — it depends on the version of Python and the protocol used to create the file — plus none of them are publicly documented. In theory you could read the module source code and figure it out I suppose. If you want to read the pickle file into a script, then just load it with the appropriate function from the module which will unpickle it. See [Saving an Object (Data persistence)](https://stackoverflow.com/questions/4529815/saving-an-object-data-persistence) – martineau Nov 17 '19 at 10:28
  • 1
    I suspect this is not what your colleague had in mind. I suspect they meant you to parse a ts file into some sort of struct per frame, then write that back to a valid ts file. – szatmary Nov 17 '19 at 13:31
  • @szatmary Thanks, yes, I see what you mean. I think the next steps are going to focus on reading the pickle file and then checking it against a number of requirements to see if the stream is compliant, e.g. checking that the first frame is IDR (which you've helped me with previously!) and things like that. – Baba.S Nov 17 '19 at 13:50

1 Answer

So I believe I have answered my own question. Thanks to @martineau who helped me understand that pickle has many different file formats.

Here's the code I used to pickle an MPEG-TS file:

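import pickle

# Read the whole transport stream into memory as a bytes object.
with open("my_mpeg_ts_file.ts", "rb") as file_object:
    data = file_object.read()

# Serialize the bytes object; this wraps the raw data in pickle opcodes.
with open("my_pickle.pickle", "wb") as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)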

This created a file called my_pickle.pickle! :) Here are the first 13 lines of a hex dump of this beautiful pickle file:

8004 42a8 f89a 0f47 4000 1000 00b0 0d00
01c1 0000 000a e065 8d2c a3ec ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ff47 4065 1000 02b0 2000 0ac1 0000

I used tsduck to confirm that I am looking at the correct data. Here's the first packet of my MPEG-TS file as shown by tsduck's tsdump utility:

* Packet 0
  ---- TS Header ----
  PID: 0 (0x0000), header size: 4, sync: 0x47
  Error: 0, unit start: 1, priority: 0
  Scrambling: 0, continuity counter: 0
  Adaptation field: no (0 bytes), payload: yes (184 bytes)
  ---- Full TS Packet Content ----
  47 40 00 10 00 00 B0 0D 00 01 C1 00 00 00 0A E0 65 8D 2C A3 EC FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
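The header fields that tsdump reports can be reproduced from the first four bytes of the packet (47 40 00 10). Below is a rough sketch of that decoding, following the standard MPEG-TS header bit layout. The function name `parse_ts_header` and the dictionary keys are my own choices, not anything taken from tsduck.

```python
def parse_ts_header(packet):
    """Decode the fixed 4-byte MPEG-TS packet header into a dict."""
    if len(packet) < 4 or packet[0] != 0x47:
        raise ValueError("not a TS packet: missing 0x47 sync byte")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": (b1 >> 7) & 1,        # "Error" in tsdump
        "unit_start": (b1 >> 6) & 1,             # payload unit start indicator
        "priority": (b1 >> 5) & 1,
        "pid": ((b1 & 0x1F) << 8) | b2,          # 13-bit packet identifier
        "scrambling": (b3 >> 6) & 3,
        "has_adaptation_field": bool(b3 & 0x20),
        "has_payload": bool(b3 & 0x10),
        "continuity_counter": b3 & 0x0F,
    }


# First four bytes of packet 0 from the dump above.
header = parse_ts_header(bytes.fromhex("47400010"))
```

For these four bytes the result agrees with the tsdump output: PID 0, unit start 1, no adaptation field, payload present, continuity counter 0.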

So I can see that there is data here from the transport stream. But as @szatmary points out, there seems to be extra information added to the pickle file.
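As far as I can tell, the extra bytes at the start of the hex dump are pickle opcodes: 80 04 is the PROTO opcode with protocol number 4, 42 is BINBYTES, and a8 f8 9a 0f is the little-endian length of the bytes object that follows. The TS data itself only begins at the first 47. A small sketch to illustrate, using a synthetic 188-byte packet as a stand-in for the real file (so the exact header layout here differs from the one in my dump):

```python
import pickle

# Synthetic stand-in for the file contents: one 188-byte packet.
ts_bytes = bytes.fromhex("47400010") + bytes([0xFF]) * 184

pickled = pickle.dumps(ts_bytes, protocol=4)

# The pickle starts with the PROTO opcode (0x80) and the protocol number,
# not with the 0x47 sync byte of the original data.
starts_with_proto = pickled[:2] == b"\x80\x04"

# The original bytes sit unchanged somewhere after the pickle header.
offset = pickled.find(ts_bytes)

# Unpickling recovers exactly what was written.
restored = pickle.loads(pickled)

# Length field copied from the hex dump above (4 bytes, little-endian).
dump_length = int.from_bytes(bytes.fromhex("a8f89a0f"), "little")
whole_packets = dump_length % 188 == 0
```

The length field from my dump works out to 261,814,440 bytes, a whole number of 188-byte packets, which fits with the pickle simply wrapping the untouched file contents.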

Baba.S
  • 1
    It’s not correct. The pickle header is extra garbage that was unnecessarily added to the file. tsduck just has some error handling, likely skipping all bytes in the file until it finds a sync byte. Basically, you are corrupting a file, then relying on another tool to detect and work around the corruption! – szatmary Nov 17 '19 at 18:31
  • How can I correct this? What was wrong in the way I pickled it? I understand that I used tsduck to compare, that was because I needed to know if the data I am looking at represents the data of the ts stream (this is something I've never done before so I wanted to check). If tsduck is skipping all the bytes until a sync byte, wouldn't that be the same approach I can use in my Python script? – Baba.S Nov 17 '19 at 18:44
  • 1
    Why are you pickling it in the first place? What problem does that solve? You're corrupting the file and hoping something can deal with it later. Why corrupt it in the first place? – szatmary Nov 17 '19 at 19:18
  • The way to correct this is simply not to do it at all. You misunderstood the suggestion from your colleague. You must first understand WHY you are attempting this, and how these steps help you achieve that goal. – szatmary Nov 17 '19 at 19:23
  • I am pickling it in the first instance because that was what he told me to do. He then said I could use a Python script to load the pickle file and analyse the data. But I understand your point about corrupting the file. Out of interest, if you were to check whether a ts file passes certain checks, e.g. 1) Is the first frame IDR? 2) Are the PAT and PMT consistent across the file? 3) Does the maximum number of consecutive B pictures exceed 3? etc., how would you go about doing that? Would your script read all that ts information into memory, and wouldn't that be slow? – Baba.S Nov 17 '19 at 20:16
  • 1
    New questions should get new posts. – szatmary Nov 17 '19 at 20:18
  • How can I message you privately once I have completed a bit of code regarding the parser you are guiding me with? – Baba.S Nov 18 '19 at 17:49