
The following code gives unpredictable results with the following advice in use:

import pyshark
import pandas as pd
import asyncio

def ProcessPackets(packet):
    global packet_list
    packet_version = packet.layers[1].version
    layer_name = packet.layers[2].layer_name
    packet_list.append([packet_version, layer_name, packet.length, packet.sniff_time])

def Capture(timeOrPath):
    global packet_list
    packet_list=[]
    try:
        timeout=int(timeOrPath)
        capture = pyshark.LiveCapture()          
        capture.apply_on_packets(ProcessPackets, timeout=timeout)
    except asyncio.TimeoutError:
        pass
    except ValueError:
        capture = pyshark.FileCapture(timeOrPath)
        capture.load_packets()
        capture.apply_on_packets(ProcessPackets)
    data = pd.DataFrame(packet_list, columns=['vIP', 'protocol', 'length','timestamp']) 
    print(data['timestamp'].iloc[-1]-data['timestamp'].iloc[0])

def main(): 
    Capture(6)

if __name__ == '__main__':
    main()

Sometimes the calculated time exceeds the timeout given. (timestamp is packet.sniff_time)

YoNa
  • _[If a timeout is given, raises a Timeout error if not complete before the timeout (in seconds)](https://github.com/KimiNewt/pyshark/blob/master/src/pyshark/capture/capture.py)_ **could** be the reason? I can't figure out how to fix it, though – YoNa May 27 '21 at 22:54
  • FYI *capture.apply_on_packets(dosomething, timeout=timeout)* and *capture.sniff(timeout=timeout)* should not be used together, because they are doing the same thing. – Life is complex Jun 02 '21 at 12:55
  • @Lifeiscomplex I have actually changed it since posting, but that didn't help. I figured that the timeout given includes the processing time of `ProcessPackets()`, but I couldn't solve the problem. The time is unpredictable. The `incoming=data['timestamp'].iloc[-1]-data['timestamp'].iloc[0]` line gives the correct value when capturing from a file, so the issue must be with the timeout – YoNa Jun 02 '21 at 17:09
  • @Lifeiscomplex I suppose it raises the `asyncio.TimeoutError` at some point and that's when the capture stops. Other times it wouldn't go past `capture.apply_on_packets(App.ProcessPackets, timeout=timeout)`. Although, other times I'd get duration exceeding the given timeout... – YoNa Jun 02 '21 at 17:13
  • can you update the question with the current code that you're using? Also please provide more details on this: *incoming=data['timestamp'].iloc[-1]-data['timestamp'].iloc[0]*, because I don't fully understand the issue. Thanks. – Life is complex Jun 02 '21 at 17:44
  • @Lifeiscomplex I've updated. As of now, I added `start = datetime.datetime.now()` at the beginning of `Capture()` and `print(datetime.datetime.now()-start)` in `except asyncio.TimeoutError:`, and it prints the given timeout, or the timeout plus about 1 second. It is an utterly wrong approach, though. – YoNa Jun 02 '21 at 18:11
  • @Lifeiscomplex I think the problem is that there is not enough time to _process+capture_ packets within the timeout, due to the way `apply_on_packets` is implemented. I can't use `capture.sniff(timeout=timeout)`, as you [have pointed out](https://stackoverflow.com/questions/67234858/why-does-pyshark-continue-a-livecapture-with-a-timeout) – YoNa Jun 02 '21 at 18:24
  • I see the issue now with your code. You combined both my previous pyshark examples into your code. In the current state your code will have problems. I will write up the correct code as I see it. If it's wrong we can work on fixing it together. – Life is complex Jun 02 '21 at 19:40

1 Answer


UPDATED 06-03-2021


After doing some research into this capture latency issue, I have determined that the problem is likely linked to pyshark waiting for dumpcap to load. dumpcap is launched in LiveCapture mode by the following pyshark code:

    def _get_dumpcap_parameters(self):
        # Don't report packet counts.
        params = ["-q"]
        if self._get_tshark_version() < LooseVersion("2.5.0"):
            # Tshark versions older than 2.5 don't support pcapng. This flag forces dumpcap to output pcap.
            params += ["-P"]
        if self.bpf_filter:
            params += ["-f", self.bpf_filter]
        if self.monitor_mode:
            params += ["-I"]
        for interface in self.interfaces:
            params += ["-i", interface]
        # Write to STDOUT
        params += ["-w", "-"]
        return params

    async def _get_tshark_process(self, packet_count=None, stdin=None):
        read, write = os.pipe()

        dumpcap_params = [get_process_path(process_name="dumpcap", tshark_path=self.tshark_path)] + self._get_dumpcap_parameters()

        self._log.debug("Creating Dumpcap subprocess with parameters: %s" % " ".join(dumpcap_params))
        dumpcap_process = await asyncio.create_subprocess_exec(*dumpcap_params, stdout=write,
                                                               stderr=self._stderr_output())
        self._created_new_process(dumpcap_params, dumpcap_process, process_name="Dumpcap")

        tshark = await super(LiveCapture, self)._get_tshark_process(packet_count=packet_count, stdin=read)
        return tshark

The code above launches this on my system:

/usr/local/bin/dumpcap -q -i en0 -w -

and this:

/usr/local/bin/tshark -l -n -T pdml -r -

I have attempted to pass in some custom parameters to LiveCapture

capture = pyshark.LiveCapture(interface='en0', custom_parameters=["-q", "--no-promiscuous-mode", "-l"])

but there is still a delay of about half a second.

10.015577793121338
0 days 00:00:09.371264
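
The two figures above appear to be the elapsed wall-clock time and the span between the first and last packet timestamps. A minimal sketch of how such a comparison could be made, using only the standard library; `timed_call` and `sniff_span` are hypothetical helper names, not part of pyshark:

```python
import time
from datetime import datetime, timedelta


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def sniff_span(timestamps):
    """Time between the earliest and latest packet timestamps.

    Returns a zero timedelta when fewer than two timestamps are given.
    """
    if len(timestamps) < 2:
        return timedelta(0)
    return max(timestamps) - min(timestamps)


# Hypothetical usage with the capture function from the original post below:
#   packets, elapsed = timed_call(capture_packets, 10)
#   span = sniff_span([row[3] for row in packets])  # needs datetime values, not str
```

The wall-clock figure includes process start-up and packet processing, so it would be expected to be larger than the timestamp span.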

The dumpcap documentation describes an -a option, which sets an autostop condition such as a capture duration, but I cannot pass that parameter through pyshark without causing an error.

tshark also has an -a option, but passing it causes an error within pyshark as well:

capture = pyshark.LiveCapture(interface='en0', override_prefs={'': '-r'}, custom_parameters={'': '-a duration:20'})

There might be a way to modify the timeout parameters within the pyshark code base to allow the -a option. Doing this would require some testing, which I don't have the time to do at the moment.
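
One possible workaround, sketched here as an untested assumption rather than a confirmed fix, is to bypass pyshark for the live capture: run dumpcap directly with `-a duration:N`, write to a file, and read that file afterwards. The helper names `build_dumpcap_command` and `capture_to_file` are hypothetical, and the sketch assumes `dumpcap` is on the PATH and that the user has capture privileges.

```python
import subprocess


def build_dumpcap_command(interface, seconds, out_path, dumpcap="dumpcap"):
    """Build a dumpcap command line that stops itself after `seconds`."""
    return [
        dumpcap,
        "-q",                               # don't report packet counts
        "-i", interface,                    # capture interface, e.g. "en0"
        "-a", f"duration:{int(seconds)}",   # autostop after this many seconds
        "-w", out_path,                     # write packets to a capture file
    ]


def capture_to_file(interface, seconds, out_path):
    """Run dumpcap until its autostop condition fires; return the file path."""
    subprocess.run(build_dumpcap_command(interface, seconds, out_path), check=True)
    return out_path
```

The resulting file could then be opened with `pyshark.FileCapture(out_path)`, which the question reports as giving correct durations. Whether this removes the discrepancy seen in live mode has not been verified here.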

I opened an issue on this problem with the developers of pyshark.

ORIGINAL POST 06-02-2021


I reworked your code to write the extracted items to a pandas DataFrame. If this isn't what you wanted, please update your question with your exact requirements.

import pyshark
import asyncio
import pandas as pd

packet_list = []


def process_packets(packet):
    global packet_list
    try:
        packet_version = packet.layers[1].version
        layer_name = packet.layers[2].layer_name
        packet_list.append([packet_version, layer_name, packet.length, str(packet.sniff_time)])
    except AttributeError:
        pass


def capture_packets(timeout):
    capture = pyshark.LiveCapture(interface='en0')
    try:
        capture.apply_on_packets(process_packets, timeout=timeout)
    except asyncio.TimeoutError:
        # Expected: pyshark raises this when the timeout elapses.
        pass
    # Return after the try block; a return inside finally would hide other errors.
    return packet_list


def main():
    capture_packets(6)
    df = pd.DataFrame(packet_list, columns=['packet version', 'layer type', 'length', 'capture time'])
    print(df)
    # output
    #       packet version layer type length                capture time
    # 0                 4        udp     75  2021-06-02 16:22:36.463805
    # 1                 4        udp     67  2021-06-02 16:22:36.517076
    # 2                 4        udp   1388  2021-06-02 16:22:36.706240
    # 3                 4        udp   1392  2021-06-02 16:22:36.706245
    # 4                 4        udp   1392  2021-06-02 16:22:36.706246
    # truncated...


if __name__ == '__main__':
    main()
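
Because the code above stores `str(packet.sniff_time)`, the capture times are strings and cannot be subtracted directly. A minimal sketch of recovering the span from such strings with the standard library; `span_from_strings` is a hypothetical helper and assumes a non-empty list in capture order:

```python
from datetime import datetime


def span_from_strings(stamps):
    """Parse 'YYYY-MM-DD HH:MM:SS.ffffff' strings and return last minus first."""
    parsed = [datetime.fromisoformat(s) for s in stamps]
    return parsed[-1] - parsed[0]


# First and last timestamps from the truncated output shown above.
stamps = ["2021-06-02 16:22:36.463805", "2021-06-02 16:22:36.706246"]
print(span_from_strings(stamps))  # 0:00:00.242441
```

Storing `packet.sniff_time` itself instead of its string form avoids the parsing step.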

Life is complex
  • The problem is that the actual capture duration differs from the timeout given. I check this by adding this line in the main: `print(df['capture time'].iloc[-1]-df['capture time'].iloc[0])` I also changed `packet_list.append([packet_version, layer_name, packet.length, str(packet.sniff_time)])` to `packet_list.append([packet_version, layer_name, packet.length, packet.sniff_time])`. For the timeout of 10 the result was 0 days 00:00:07.364705 – YoNa Jun 02 '21 at 21:05
  • It isn't _that_ important for my project, actually, since it's part of my student curriculum, not actual work. I believe it's an issue with pyshark – YoNa Jun 02 '21 at 21:08
  • I 'fixed' it by adding `start = datetime.datetime.now()` at the beginning and `except asyncio.TimeoutError: print(datetime.datetime.now()-start)`, which gives approximately correct results. Therefore the timeout must be the processing time, not the capture duration – YoNa Jun 02 '21 at 21:11
  • I am thinking whether it would be better to use pcapy or another way of using libpcap with python directly avoiding pyshark/tshark/wireshark. I am not familiar with it though – YoNa Jun 02 '21 at 21:16
  • WOW!! There is a delay in packet processing, and it ranges from 0.50 to 0.90 of a second. This might be a bug, but I need to look into latency issues in both Python and pyshark. – Life is complex Jun 03 '21 at 00:57
  • I looked into this more, and there is a known delay in processing packets with pyshark/tshark. This delay can be up to a second, because the application is doing various things, including DNS lookups. From what I see, there is no way to reduce this delay. – Life is complex Jun 03 '21 at 02:43