I am running Pyshark on an Ubuntu 20.04 desktop. Forgive the very basic question; I do not have much background in networking.

I am able to capture packets on my Wi-Fi interface for, say, 10-20 minutes and inspect them in Python. While the packets are being captured, I am of course running several applications that use the internet, such as:

  • Thunderbird for email
  • my update and upgrade manager, which updates my system from the Ubuntu repositories
  • a web browser, including playing videos on YouTube
  • conference calls via Zoom, etc.

Can I assume all these packets are being captured in the .pcap file unless I set a filter? After I capture them, my goal is to find which application (e.g. Firefox or Thunderbird) each packet belongs to, and whether the packet corresponds to web traffic, video streaming, email, file transfer, etc. Is that possible? Basically, I want to give each application a score in each category (say video, text, file transfer) to judge how many packets and how much data it transfers over Wi-Fi. Or are there rules of thumb I can apply based on the port numbers in the packets?

So which attributes of each packet object do I look for to know the application, and the category?

Further, for each packet, I also want to know whether it is uplink (sent by my machine) or downlink (received by my machine).

Della

1 Answer

Such classification is quite complex. Something like it is used in NextGen firewalls, but the algorithms are proprietary. You will need to research, for every application, what makes its traffic different from that of other applications. This is not a simple task: there are millions of applications, and new ones are being made every day. Companies employ researchers and spend millions to do exactly this, which is why their methods are proprietary.

First determine if the packet is inbound or outbound and the transport protocol; remember that TCP ports are not UDP ports, even if they use the same number range, and not all transport protocols use port numbers. If your application is acting as a server, it may use one of the well-known ports for a transport protocol as the source port, but that is not guaranteed, and the well-known port would be a destination port if your application is acting as a client.
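As a rough sketch of that first step, the functions below label a packet as uplink or downlink by comparing its addresses with the capturing machine's own addresses, then look up the port in a small table keyed by transport protocol. The function names, the `WELL_KNOWN` table, and the example addresses are illustrative assumptions, not part of Pyshark. With Pyshark the inputs would typically come from `pkt.ip.src`, `pkt.ip.dst`, `pkt.transport_layer`, and `pkt[pkt.transport_layer].srcport` / `.dstport` (which are strings, hence the `int()` calls).

```python
from ipaddress import ip_address

# Illustrative subset of well-known ports, keyed by (transport, port),
# because TCP port 443 and UDP port 443 are different things.
WELL_KNOWN = {
    ("TCP", 25): "smtp",
    ("TCP", 80): "http",
    ("TCP", 143): "imap",
    ("TCP", 443): "https",
    ("TCP", 587): "smtp-submission",
    ("TCP", 993): "imaps",
    ("UDP", 53): "dns",
    ("UDP", 123): "ntp",
    ("UDP", 443): "quic",
}


def packet_direction(src, dst, local_addrs):
    """Classify a packet relative to the capturing host's own addresses."""
    local = {ip_address(a) for a in local_addrs}
    src_local = ip_address(src) in local
    dst_local = ip_address(dst) in local
    if src_local and not dst_local:
        return "uplink"
    if dst_local and not src_local:
        return "downlink"
    if src_local and dst_local:
        return "local"
    return "other"  # e.g. broadcast or traffic between other hosts


def guess_service(transport, src_port, dst_port, direction):
    """Port-based guess only; it is a hint, not proof of the application."""
    proto = transport.upper()
    # Assumption: the desktop is usually the client, so try the remote port
    # first, then fall back to the local port (the host acting as a server).
    if direction == "downlink":
        ordered = (src_port, dst_port)
    else:
        ordered = (dst_port, src_port)
    for port in ordered:
        name = WELL_KNOWN.get((proto, int(port)))
        if name:
            return name
    return "unknown"
```

For example, `guess_service("TCP", 50000, 443, "uplink")` gives `"https"`, which says the traffic is probably web-style TLS but not whether it came from Firefox, Thunderbird, or the update manager.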

That may determine some application types, e.g. web browser, but not all (other applications use web protocols, too). Determining the specific application is much harder and requires you to look inside the transport protocol payload for signatures. This is where you spend your time and money researching what to look for in each application's traffic. You may also find that some applications encrypt their traffic, e.g. with TLS, and you will need to either decrypt the traffic or use some other method to determine the application.
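To make "signatures" concrete, here is a minimal sketch that inspects only the first bytes of a transport payload and recognizes a plain HTTP message or a TLS record. It identifies the protocol, not the application; the function name `sniff_payload` and the returned labels are made up for this example, and extracting the raw payload bytes from a capture is left out.

```python
# Common HTTP/1.x request methods, each followed by a space.
HTTP_METHODS = (
    b"GET ", b"POST ", b"HEAD ", b"PUT ",
    b"DELETE ", b"OPTIONS ", b"PATCH ", b"CONNECT ",
)

# TLS record content types: change_cipher_spec, alert, handshake, application_data.
TLS_CONTENT_TYPES = (0x14, 0x15, 0x16, 0x17)


def sniff_payload(payload: bytes) -> str:
    """Guess the application-layer protocol from the start of a payload."""
    if payload.startswith(HTTP_METHODS):
        return "http-request"
    if payload.startswith(b"HTTP/1."):
        return "http-response"
    # TLS record header: type (1 byte), version (2 bytes, major is 0x03),
    # length (2 bytes). For a handshake record the next byte is the
    # handshake message type, where 0x01 means ClientHello.
    if len(payload) >= 6 and payload[0] == 0x16 and payload[1] == 0x03:
        if payload[5] == 0x01:
            return "tls-client-hello"
    if len(payload) >= 3 and payload[0] in TLS_CONTENT_TYPES and payload[1] == 0x03:
        return "tls-record"
    return "unknown"
```

A check like this only works on the first segment of a message, and once the TLS handshake is over everything is opaque application data, which is why encrypted traffic needs decryption or some other method.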

Some transport protocols, such as TCP, create connections, and you need to get into the connection stream early to see the segment payload where application-layer protocols, like HTTP, can give you more information about the specific browser or other application using the connection. Once you have that, you know the application for the entire connection. Other transport protocols, like UDP, are connectionless, although some applications create pseudo-connections using an application-layer protocol that runs over a connectionless transport protocol.
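One way to apply "classify once, then reuse for the whole connection" is to group packets by a direction-independent flow key and keep per-flow counters. The sketch below assumes each packet has already been reduced to a plain tuple `(transport, src, src_port, dst, dst_port, length)`; the tuple layout and the names `flow_key` and `summarize_flows` are assumptions for illustration. The same grouping also works for UDP pseudo-connections.

```python
from collections import defaultdict


def flow_key(transport, src, sport, dst, dport):
    """Same key for both directions of a conversation."""
    endpoints = sorted([(src, int(sport)), (dst, int(dport))])
    return (transport.upper(), endpoints[0], endpoints[1])


def summarize_flows(packets, local_addrs):
    """Count packets and uplink/downlink bytes per flow.

    packets: iterable of (transport, src, sport, dst, dport, length)
    local_addrs: addresses of the capturing host, as strings
    """
    local = set(local_addrs)
    stats = defaultdict(lambda: {"packets": 0, "up_bytes": 0, "down_bytes": 0})
    for transport, src, sport, dst, dport, length in packets:
        entry = stats[flow_key(transport, src, sport, dst, dport)]
        entry["packets"] += 1
        if src in local:
            entry["up_bytes"] += int(length)
        elif dst in local:
            entry["down_bytes"] += int(length)
    return dict(stats)
```

Any label obtained for a flow (from ports, payload signatures, or anything else) can then be stored next to these counters and used to total packets and bytes per category.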

This is not casual or simple programming, and Python really is not fast enough to keep up with a lot of traffic. You would need to use something that compiles to native code, and even then you may miss some things, or you would need to slow the traffic to be able to inspect all of it. That is why many NextGen firewalls rely on dedicated hardware.

Ron Maupin
  • Thanks a lot for the answer. Just to clarify, I don't need to inspect in real time. I have already captured the pcap file. Now I want to draw some plots showing inferential statistics about the packets and their breakdown among applications, categories, etc. – Della Sep 18 '22 at 00:21
  • Then you should just use Wireshark. It can break down a lot of protocols, and you can add to what it comes with. – Ron Maupin Sep 18 '22 at 00:25