I have a Python function I'm trying to speed up, which just takes a line of tshark output, e.g.:

'1\t0.000000000\tTCP\t100.0.1.190,111.0.0.2\t35291\t55321\t\t\t56\t20\t··········S·\t36\n'

and assigns the data to variables like so:

            arr = line.strip('\n').split("\t")

            sip = arr[3].split(',')[0]
            dip = arr[3].split(',')[1]

            s_flag = 1 if 'S' in arr[10] else '0'
            a_flag = 1 if 'A' in arr[10] else '0'
            f_flag = 1 if 'F' in arr[10] else '0'
            r_flag = 1 if 'R' in arr[10] else '0'
            p_flag = 1 if 'P' in arr[10] else '0'
            u_flag = 1 if 'U' in arr[10] else '0'
            e_flag = 1 if 'E' in arr[10] else '0'
            c_flag = 1 if 'C' in arr[10] else '0'

Is there a way to speed this up using Cython? I'm thinking of casting the result of line.strip('\n').split("\t") to a NumPy array, since I've heard it's faster than Python lists in Cython. How else can I speed this up? For example:

import numpy as np
cimport numpy as np

arr = np.array(line.strip('\n').split("\t"))

Will this work? Thank you in advance!

Frankfurters
  • `flags = {}`, `for char in char_flags: flags[char + "_flag"] = 1 if char.upper() in arr[10] else '0'` where `char_flags` is `"s", "a", ...`. Note that this stores the results in a `dict`. – Larry the Llama Dec 06 '21 at 09:39
  • Have you determined this to not be fast enough? Is `··········S·` always in the same order, with each missing character replaced by a `.`? – Mad Physicist Dec 06 '21 at 10:01
  • `c_flag = 1 if 'C' in arr[10] else '0'` seems odd. Either `1` (int) or `'0'` (str). Perhaps use the boolean value instead: `c_flag = ('C' in arr[10])`. Using numpy: `np.isin(list('SAFRPUEC'), list(arr[10]))` – Mad Physicist Dec 06 '21 at 10:03
  • @MadPhysicist yeah, my code took about 4 mins to process 930000 packets which were sent in a minute. Of course this number of packets can be any arbitrary number depending on the hardware, but I'd like to try and cut down the processing time as much as possible. Yes, the flags are always in the same order. – Frankfurters Dec 06 '21 at 11:17
  • Is there any point to use numpy instead of a Python list then? Would it still have any benefits? And if you're familiar, how do you cythonize a Python list using numpy, if that's how it works? I've merely heard of the concept. – Frankfurters Dec 06 '21 at 11:20
  • @Frankfurters. It sounds like you need to read a basic tutorial more than anything. I think you're doing something weird with the I/O for it to take that long. My not-so-powerful machine shows <35us for the code I wrote in my answer, while running other stuff in the background. That's ~30sec for 930000 like the one you showed. – Mad Physicist Dec 06 '21 at 12:19
  • Ah I'm sorry I didn't include it in the question, but this is just a snippet of my code, which does more of the same stuff. Either way, thank you for all the suggestions, I'll test them out today – Frankfurters Dec 07 '21 at 00:37

1 Answer

Since you're dealing with lists of strings, NumPy, and likely even Cython, won't help you much. The transformations you are looking for are so trivial that you can just clean up your Python code a bit and move on:

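import numpy as np

# The eight TCP flag letters to look for
FLAGS = np.array(list('SAFRPUEC'))

items = line.strip('\n').split("\t")
sip, dip = items[3].split(',')
# Map each flag letter to True/False depending on whether it appears in the flags field
flags = dict(zip(FLAGS, np.isin(FLAGS, list(items[10]))))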
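If you would rather avoid NumPy entirely for this step, the same cleanup can be sketched in plain Python with a dict comprehension. The names `FLAG_CHARS` and `parse_line` are made up for illustration, and the sketch assumes every line has the same tab-separated layout as the sample in the question (addresses in field 3, flags in field 10):

```python
FLAG_CHARS = "SAFRPUEC"  # hypothetical constant: the flag letters to look for


def parse_line(line):
    """Split one tab-separated tshark line into source IP, destination IP, and flags."""
    items = line.rstrip("\n").split("\t")
    # Field 3 holds "source,destination"; split it once and unpack
    sip, dip = items[3].split(",")
    flag_field = items[10]
    # One boolean per flag letter instead of eight separate variables
    flags = {char: char in flag_field for char in FLAG_CHARS}
    return sip, dip, flags


# Sample line copied from the question
sample = "1\t0.000000000\tTCP\t100.0.1.190,111.0.0.2\t35291\t55321\t\t\t56\t20\t··········S·\t36\n"
sip, dip, flags = parse_line(sample)
```

Here `flags["S"]` is `True` and every other entry is `False`. Whether this is actually faster than the NumPy version for your data is something to check with `timeit` on real input rather than assume.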
Mad Physicist