How Can I Parse a Pcapng File in C#?

Question

I'm new to Pcapng files. I've read the 40+ page whitepaper and I'm still scratching my head and sweating. I understand that the Pcapng file is:

Made up of a Section Header Block - This is the start of every Pcapng file.

Question 1: How large is this?

It appears that it's Block Type (4 bytes) + Block Total Length (4 bytes) + Byte-Order Magic (4 bytes) + Major and Minor Version (4 bytes total, 2 bytes each) + Section Length (4 bytes) + Options (variable) + Block Total Length (again, 4 bytes).

If I'm building a parser, how would I know how many bytes I need to skip to arrive at my first data frame block?

Question 2: Where is the data stored? By data I mean the entire frame that contains Ethernet, IP, and TCP Data, as shown in the picture below (Figure 1).

The documentation states that:

A section includes data delimited by two section header blocks.

When doing a manual inspection (yes, I went byte by byte over a file to see how many bytes lie in between two frames :'( ), I noticed there were 35 bytes in between each message (each message shown in Wireshark had 35 bytes in between). Are these bytes related to a pcapng block?

Once I understand how to get to the first TCP frame, and how many bytes I need to skip to get to the next, I can build my parser.

I'm willing to send Bitcoin/Monero to anyone who can help me understand how I can best parse these pcapng messages. Thanks!

Files consist of blocks. Blocks can be of different types. If you don't care about byte order, you can ignore section headers. Just look for Enhanced Packet Blocks, Simple Packet Blocks and (if dealing with old data) the obsolete Packet Blocks. — NetMage, Mar 12 '20 at 23:47
Checking the spec, there is a minimum of 32 bytes between packet data, unless there are options, and those are always multiples of 4 bytes, so I don't see how you got 35. Can you show a section of the pcapng file? (Preferably from the start.) — NetMage, Mar 12 '20 at 23:51
You could also just use this managed project for parsing [PcapngUtils](https://github.com/ryrychj/PcapngUtils). — NetMage, Mar 13 '20 at 00:07
@NetMage Can I add you on some other platform? How do you know there are a minimum of 32 bytes between packet data without options? I'll pay you to teach me. Indeed it is 36 bytes in between each. I have no idea how I'd know how many options would be in each section. — , Mar 13 '20 at 00:08
From the [pcapng file format](http://xml2rfc.tools.ietf.org/cgi-bin/xml2rfc.cgi?url=https://raw.githubusercontent.com/pcapng/pcapng/master/draft-tuexen-opsawg-pcapng.xml&modeAsFormat=html/ascii&type=ascii#section_epb) you can see in section 4.3 that an Enhanced Packet Block has a fixed 28 bytes before the packet data and a fixed 4 bytes after (the repeated Block Total Length), plus the Options. Section 3.5 says options have 4 fixed bytes and then an optional value that is padded to a multiple of 4 bytes. — NetMage, Mar 13 '20 at 00:12
Also, you determine the size of the options area by taking the Block Total Length, subtracting the beginning overhead (28), the duplicate length (4) and the Packet Data length (Captured Packet Length rounded up to multiple of 4). What's left is the Options block. But if all you care about is the Packet Data, you can just skip to the next block using the Block Total Length. — NetMage, Mar 13 '20 at 00:15
@NetMage Great point... I can skip Options using Block Total Length. — , Mar 13 '20 at 01:07

score 2 · Answer 1 · answered Mar 15 '20 at 08:42

If I'm building a parser, how would I know how many bytes I need to skip to arrive at my first data frame block?

That's not how you do it.

If you're building a parser, note that a parser must look at more than just the first data frame block.

First of all, it must look at the Section Header Block (SHB), to determine the byte order of the data in all the subsequent blocks by looking at the Byte-Order Magic field.

After that, you need to look at all subsequent blocks, looking for Interface Description Blocks and Enhanced Packet Blocks (EPBs), Simple Packet Blocks (SPBs), and possibly Packet Blocks (PBs) (those are obsolete, so no program should write them, but programs should be prepared to read them). Each EPB or PB has an interface ID that refers to an IDB, which must have appeared before the EPB or PB in question; an SPB implicitly refers to the first IDB, which, again, must have appeared before the SPB in question.

The format of the packet data in an EPB, SPB, or PB depends on the link-layer type specified by the IDB to which it refers, so you need to have read the IDB in question.

And, as the above indicates, there is no fixed number of bytes between the SHB and the first EPB, SPB, or PB, so there is no fixed number of bytes to skip to get to the first data frame block. For one thing, the number of bytes varies, and you can only determine it by reading all the blocks before the first EPB, SPB, or PB. For another, you can't simply skip those blocks: you have to read them to get enough information to interpret the packet data in the packet blocks that follow.
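The block walk described above can be sketched as follows. This is a minimal sketch in Python rather than C#, chosen only for brevity; the function names `iter_blocks` and `interface_link_types` are hypothetical, and the input is assumed to be a complete pcapng file already read into memory. It reads the Byte-Order Magic of each SHB, steps from block to block using the Block Total Length, and records the link-layer type of every IDB.

```python
import struct

SHB_TYPE = 0x0A0D0D0A  # Section Header Block (reads the same in either byte order)
IDB_TYPE = 0x00000001  # Interface Description Block
BYTE_ORDER_MAGIC = 0x1A2B3C4D


def iter_blocks(data):
    """Yield (block_type, body, endian) for each block in a pcapng byte string.

    body excludes the leading type and length (8 bytes) and the trailing
    copy of the length (4 bytes).
    """
    offset = 0
    endian = "<"  # placeholder until the first SHB sets the real byte order
    while offset + 12 <= len(data):
        if struct.unpack_from("<I", data, offset)[0] == SHB_TYPE:
            magic = struct.unpack_from("<I", data, offset + 8)[0]
            if magic == BYTE_ORDER_MAGIC:
                endian = "<"
            elif magic == 0x4D3C2B1A:  # the magic as seen with swapped bytes
                endian = ">"
            else:
                raise ValueError("bad byte-order magic")
        block_type, total_length = struct.unpack_from(endian + "II", data, offset)
        if total_length < 12 or total_length % 4 or offset + total_length > len(data):
            raise ValueError("corrupt or truncated block")
        yield block_type, data[offset + 8:offset + total_length - 4], endian
        offset += total_length


def interface_link_types(data):
    """Return the link-layer types of the IDBs in the last section, by interface ID."""
    link_types = []
    for block_type, body, endian in iter_blocks(data):
        if block_type == SHB_TYPE:
            link_types = []  # interface IDs start again at 0 in every section
        elif block_type == IDB_TYPE:
            # An IDB body starts with LinkType (2 bytes), Reserved (2), SnapLen (4).
            link_types.append(struct.unpack_from(endian + "H", body, 0)[0])
    return link_types
```

A link type of 1 is LINKTYPE_ETHERNET. Unknown block types need no special handling here, because the Block Total Length is enough to step over them.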

Where is the data stored? By data I mean the entire frame that contains Ethernet, IP, and TCP Data, as shown in the picture below (Figure 1).

It's stored in EPBs, SPBs, or PBs. See the descriptions of those three block types; frames are in the "Packet Data" fields of those blocks.
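As an illustration, here is a minimal Python sketch of pulling the Packet Data out of a single Enhanced Packet Block. It assumes the layout cited in the comments (28 fixed bytes before the packet data, data padded to a multiple of 4, then options and the repeated length); `parse_epb` and the dictionary keys are hypothetical names, and the byte order is assumed to have been taken from the SHB already.

```python
import struct

EPB_TYPE = 0x00000006


def parse_epb(block, endian="<"):
    """Split one complete Enhanced Packet Block into its fixed fields and data."""
    (block_type, total_length, interface_id, ts_high, ts_low,
     captured_len, original_len) = struct.unpack_from(endian + "7I", block, 0)
    if block_type != EPB_TYPE:
        raise ValueError("not an Enhanced Packet Block")
    padded_len = (captured_len + 3) & ~3  # data is padded to a 32-bit boundary
    options_len = total_length - 28 - padded_len - 4
    if options_len < 0 or total_length > len(block):
        raise ValueError("inconsistent block lengths")
    return {
        "interface_id": interface_id,
        "timestamp": (ts_high << 32) | ts_low,  # units depend on if_tsresol
        "captured_len": captured_len,
        "original_len": original_len,
        "packet_data": block[28:28 + captured_len],  # padding is not included
        "options_len": options_len,
    }
```

The `packet_data` bytes are the frame itself, for example an Ethernet frame when the referenced IDB has link type LINKTYPE_ETHERNET.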

So I'm at my Interface Description Block and the 64 bit number that contains both a Timestamp Resolution of 9 (10^-9, Nanoseconds?) and 6 (10^-6, Microseconds).

As Christopher Maynard indicated, the 9 isn't a timestamp resolution, it's an option type. Pcapng blocks have both fixed information at the beginning and options; an option begins with an option type and option value length, followed by the option data. An IDB if_tsresol option has

2 bytes of option type, with the value 9;
2 bytes of option value length, with the value 1;
1 byte of option value, with the value as specified in the description of that option.

A value of 6 means the time stamp resolution is 1/10^6 of a second, which means 1 microsecond.
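The option layout above can be checked against the bytes quoted in the comments (`09 00 01 00 06 00 00 00`, little-endian). The following is a minimal Python sketch, with hypothetical helper names `iter_options` and `ticks_per_second`, that walks an options area and turns the if_tsresol value into the number of timestamp units per second. It assumes the rule quoted in the comments: if the most significant bit is 0, the remaining bits are a negative power of 10; if it is 1, they are a negative power of 2.

```python
import struct

OPT_ENDOFOPT = 0
IF_TSRESOL = 9


def iter_options(raw, endian="<"):
    """Yield (option_code, option_value) pairs from a block's options area."""
    offset = 0
    while offset + 4 <= len(raw):
        code, length = struct.unpack_from(endian + "HH", raw, offset)
        if code == OPT_ENDOFOPT:
            break
        yield code, raw[offset + 4:offset + 4 + length]
        offset += 4 + ((length + 3) & ~3)  # values are padded to 4 bytes


def ticks_per_second(tsresol):
    """Convert an if_tsresol byte into timestamp units per second."""
    if tsresol & 0x80:  # most significant bit set: negative power of 2
        return 2 ** (tsresol & 0x7F)
    return 10 ** tsresol  # otherwise: negative power of 10


# The bytes from the comment thread, followed by an end-of-options marker.
raw = bytes.fromhex("0900010006000000" "00000000")
options = dict(iter_options(raw))
resolution = ticks_per_second(options[IF_TSRESOL][0])
```

Here the option code is 9, the length is 1, and the single value byte is 6, so `resolution` is one million units per second, that is, microseconds.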

score 1 · Accepted Answer · edited Oct 07 '21 at 07:27

I think @tee-zad-awk found an answer that helped over at https://ask.wireshark.org/question/15159/how-can-i-display-as-much-pcapng-information-as-possible/, but for the benefit of anyone else looking for an answer to this question, I've linked it here and have provided my answer below, just in case the link is ever broken someday.

It seems that, after reading the 40 page whitepaper on Pcapng ...

The current PCAP Next Generation (pcapng) Capture File Format draft document is 52 pages, so perhaps you're not looking at the most recent version? Other versions do exist, such as those at https://datatracker.ietf.org/doc/html/draft-tuexen-opswg-pcapng-00, https://pcapng.github.io/pcapng/ or https://www.tcpdump.org/pcap/pcap.html and probably others, but they're all obsolete.

If you're looking for a pcapng parser to help you decipher the file, then look no further than Wireshark itself. If you've loaded a pcapng file into Wireshark, you can use "View -> Reload as File Format/Capture" (Ctrl+Shift+F) to cause Wireshark to load and display the raw file contents itself rather than to load and display the packets from the file. This should cause you to be able to see the various pcapng blocks and be able to drill down into them. For example:

Frame 1: 184 bytes on wire (1472 bits), 184 bytes captured (1472 bits)
MIME file
PCAPNG File Format
    Block: Section Header Block 1
    Block: Interface Description Block 0
    Block: Enhanced Packet Block 1

You can also have a look at the Wireshark source code, such as the epan/dissectors/file-pcapng.c and wiretap/pcapng.c files.

By the way, if you're looking to support all extensions, the Wireshark [PcapNg wiki page](https://wiki.wireshark.org/Development/PcapNg) has a link to the Augmented PCAP Next Generation Dump File Format page that you might also want to take a look at. I don't know how many other extensions may have been implemented but not included in the main pcapng file format specification, but hopefully not many, as this could quickly become problematic with different projects possibly using the same block type for different blocks. That practice should be highly discouraged.

Hahha, that was my question that I asked on Wireshark after posting here. For doing the research and coming up with this answer, I'll give you the green checkmark. — , Mar 13 '20 at 16:15
I am a little confused by the timestamp though. I see a low and high component, but I'm not sure what to make of it and how to convert it to a UTC timestamp with nanosecond precision. — , Mar 13 '20 at 16:16
@TeeZadAwk Unfortunately, timestamps are 64-bit timestamps in units of 10^-6 seconds unless there is an Interface Description Block with an if_tsresol option, in which case you need to interpret the option. Which makes getting timestamps more complicated than it should be. It is all in the file format documents. — NetMage, Mar 13 '20 at 16:22
@NetMage Yep, I read that part, I meant I'm confused about interpreting it. So I'm at my Interface Description Block and the 64 bit number that contains both a Timestamp Resolution of 9 (10^-9, Nanoseconds?) and 6 (10^-6, Microseconds). https://i.imgur.com/G8KI47k.png How do I make sense of this? Which one is it? In hex this is `09 00 01 00 06 00 00 00` in Little endian. So the most significant bit is 0. This means `the remaining bits indicates the resolution of the timestamp as a negative power of 10 `. What do I make of this? — , Mar 13 '20 at 16:36
@NetMage, Copy/paste didn't pick up all the links properly, and I missed re-linking to the most recent pcapng capture file format specification. It should be fixed now. Thanks for pointing that out. — Christopher Maynard, Mar 13 '20 at 16:43
There's only 1 resolution specified, namely 6 (i.e., 10^-6 or microseconds). The 9 is just the option code for the time resolution option, which you can see in the Interface Description Block Options table below Figure 10 of the pcapng capture file format specification. More information over at https://ask.wireshark.org/question/15177/what-are-the-units-of-time-referring-to-in-an-enhanced-packet-block/ — Christopher Maynard, Mar 13 '20 at 19:45
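
For the high/low timestamp question raised in the comments above, a minimal Python sketch follows. It assumes the two 32-bit halves form one 64-bit count of units since 1970-01-01 00:00:00 UTC, and that the resolution defaults to microseconds when the IDB has no if_tsresol option; `epb_timestamp` is a hypothetical name. The sub-second part is returned separately as nanoseconds, using integer arithmetic so that no precision is lost.

```python
from datetime import datetime, timezone


def epb_timestamp(ts_high, ts_low, ticks_per_second=1_000_000):
    """Return (UTC datetime truncated to whole seconds, nanoseconds within it)."""
    ticks = (ts_high << 32) | ts_low  # join the upper and lower 32 bits
    seconds, remainder = divmod(ticks, ticks_per_second)
    nanoseconds = remainder * 1_000_000_000 // ticks_per_second
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanoseconds
```

With an if_tsresol value of 9, the same function would be called with `ticks_per_second=1_000_000_000`.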

score 0 · Answer 3 · edited Nov 13 '22 at 09:01

To work this out, it helps to read the specifications of the link-layer protocol of the network device and of the packets being sent. For example, we need to know the frame layout of an Ethernet device and the layout of a TCP/IP packet in order to understand the raw data. Having studied this, record some traffic in Wireshark and select a packet in the upper window. The middle window shows in clear text what Wireshark received. Clicking any line in the middle window highlights, in the lower window, the raw bytes that carry the information of that line. Conversely, clicking on the raw data highlights the corresponding clear text. The status line shows the same information. This is very helpful for understanding the data.

I needed to read the TCP/IPv4 packets of Ethernet traffic. Each packet block starts with the block type (0x00000006) and the length of the block. The device was Ethernet, so the link type was LINKTYPE_ETHERNET. The length of the captured frame, which I call the section length below, is the Captured Packet Length field in bytes 20-23 of the block. The other entries of the block header can be taken from here.

After the block header, that is, after 28 bytes, the Ethernet frame came with the following entries (see here for a description):

mac address destination, 6 bytes
mac address source, 6 bytes
type, 2 bytes: 0x0800 for IPv4, 0x0806 for ARP, 0x86DD for IPv6, 0x8100 for the presence of an IEEE 802.1Q tag.
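
The Ethernet header listed above can be decoded as in the following minimal Python sketch. `parse_ethernet` and `format_mac` are hypothetical names. The handling of the 802.1Q case assumes a single tag: 2 bytes of tag control information followed by the real type field.

```python
import struct


def format_mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame):
    """Decode destination, source and type; report where the payload starts."""
    destination, source, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    payload_offset = 14
    if ethertype == 0x8100:  # one IEEE 802.1Q tag: skip 2 bytes of tag control
        ethertype = struct.unpack_from("!H", frame, 16)[0]
        payload_offset = 18
    return {
        "destination": format_mac(destination),
        "source": format_mac(source),
        "ethertype": ethertype,
        "payload_offset": payload_offset,
    }
```

Network protocols use big-endian fields regardless of the byte order of the pcapng file, which is why the format string uses `!` here.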

For an IPv4 packet, that is, type = 0x0800, the following bytes are the IPv4 header (see here for a description):

IP version and header length, 1 byte
differentiated services field, 1 byte
total length, 2 bytes
identification, 2 bytes
flags (3 bits) and fragment offset (13 bits), 2 bytes together
time to live, 1 byte
protocol with 0x06 for TCP, 1 byte
header checksum, 2 bytes
source IP address, 4 bytes
destination IP address, 4 bytes
options, variable length (present only if the header length is more than 20 bytes)
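
The fixed part of the IPv4 header can be decoded as in the following minimal Python sketch, which follows the RFC 791 layout, where the flags and the fragment offset share one 16-bit field. `parse_ipv4` is a hypothetical name, and the input is assumed to start at the first byte of the IPv4 header.

```python
import ipaddress
import struct


def parse_ipv4(packet):
    """Decode the 20 fixed bytes of an IPv4 header."""
    (version_ihl, dscp_ecn, total_length, identification, flags_fragment,
     ttl, protocol, checksum, source, destination) = struct.unpack_from(
        "!BBHHHBBH4s4s", packet, 0)
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # stored in 32-bit words
        "total_length": total_length,  # may be 0 with TCP segmentation offload
        "identification": identification,
        "flags": flags_fragment >> 13,  # upper 3 bits
        "fragment_offset": flags_fragment & 0x1FFF,  # lower 13 bits
        "ttl": ttl,
        "protocol": protocol,  # 0x06 means TCP
        "source": str(ipaddress.IPv4Address(source)),
        "destination": str(ipaddress.IPv4Address(destination)),
    }
```

The TCP header starts `header_length` bytes after the start of the IPv4 header, which accounts for any IPv4 options.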

The total length is very important: the byte that follows the last byte of the IPv4 + TCP packet is located total length bytes after the IP version and header length entry. However, this entry can be tricky. I had an entry with a total length of 0 even though the IP header alone was 20 bytes long. In this case, Wireshark was helpful. It reported

[Total Length: 1547 bytes (reported as 0, presumed to be because of "TCP segmentation offload" (TSO))]

A detailed description of this phenomenon can be taken from here. In this case I could compute the payload length as the section length from above minus the length of the Ethernet header (14 bytes) minus the length of the IP header minus the length of the TCP header. However, padding problems could arise, although I did not run into them. A padding problem occurs when padding bytes are appended to the packet, for example to extend its length to a multiple of 4 bytes.

If the protocol of the IPv4 header is 0x06, the TCP packet follows. The details of a TCP packet can be taken from here. Of course, Wireshark also helps you interpret the TCP packet: just click on the lines in the middle window that belong to the TCP packet or click on the raw data.
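
The fallback described above can be written as a small calculation. This is a minimal Python sketch with a hypothetical function name and parameters: it uses the IPv4 total length when it is non-zero and otherwise falls back to the captured frame length (the value this answer calls the section length). The fallback is only exact when the whole frame was captured and carries no trailing padding.

```python
def tcp_payload_length(captured_len, ip_header_len, tcp_header_len,
                       ip_total_length, ethernet_header_len=14):
    """Return the number of TCP payload bytes in one captured frame."""
    if ip_total_length:
        # Normal case: the IPv4 total length covers IP header, TCP header, payload.
        return ip_total_length - ip_header_len - tcp_header_len
    # Total length reported as 0 (for example with TSO): use the frame length.
    return captured_len - ethernet_header_len - ip_header_len - tcp_header_len
```

For example, assuming a hypothetical captured frame of 1561 bytes with 20-byte IP and TCP headers and a total length reported as 0, the sketch yields 1561 - 14 - 20 - 20 = 1507 payload bytes.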

As outlined here, the interpretation of a pcapng file has many ifs and whens.
