1

Wireshark is a powerful tool for network traffic analysis. But in my experience, it can only export the dissected data (that is, data annotated with which part is what, e.g. "data":123456 and so on) to a .pcap file. I would like to output the 'data' segment of every TCP packet in real time (or close to real time) to another application, such as my Python script, for further use (maybe via TCP forwarding or a pipe?).

I don't know exactly how to get this done. Is anyone willing to help me with this? Thank you~

PS: I did not include any screenshots because I have nothing to show, not even code...

shih alex

2 Answers

3

tl;dr: Pipe tshark output in any format (-T) into your Python program and parse it there.

I am currently working on a project called pdml2flow, which might be of help to you as well. The project relies on the pdml output (XML) from tshark, which is piped into pdml2flow:

$ tshark -i interface -Tpdml | pdml2flow +json
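
If you would rather read the pdml stream yourself, a minimal sketch along the following lines may be enough to start with. It is not how pdml2flow works internally; it only assumes the usual pdml layout of `<packet>` elements containing `<field>` elements with `name` and `show` attributes. The function name `iter_pdml_packets` is made up for this example.

```python
import xml.etree.ElementTree as ET


def iter_pdml_packets(stream):
    """Yield one dict per <packet>, mapping field name -> displayed value.

    `stream` is a binary file-like object carrying pdml (XML) text.
    If a field name occurs more than once in a packet, the last one wins.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "packet":
            continue
        fields = {}
        for field in elem.iter("field"):
            name = field.get("name")
            if name:  # skip fields that carry no name
                fields[name] = field.get("show")
        yield fields
        elem.clear()  # drop the parsed subtree so memory use stays flat
```

To wire it up, loop over `iter_pdml_packets(sys.stdin.buffer)` in a script fed by `tshark -i interface -Tpdml`. Note that `iterparse` reads its input in chunks, so on a quiet interface packets may show up in batches rather than one by one.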

I chose pdml because it was the most complete and stable format when I started. These days, many other output formats such as json or postscript are also available. From tshark(1):

-T ek|fields|json|jsonraw|pdml|ps|psml|tabs|text

Set the format of the output when viewing decoded packet data. The options are one of:

  • ek: Newline delimited JSON format for bulk import into Elasticsearch. It can be used with -j or -J including the JSON filter or with -x to include raw hex-encoded packet data. If -P is specified it will print the packet summary only, with both -P and -V it will print the packet summary and packet details. If neither -P or -V are used it will print the packet details only. Example of usage to import data into Elasticsearch:
$ tshark -T ek -j "http tcp ip" -P -V -x -r file.pcap > file.json
$ curl -H "Content-Type: application/x-ndjson" -XPOST http://elasticsearch:9200/_bulk --data-binary "@file.json"

Elastic requires a mapping file to be loaded as template for packets-* index in order to convert wireshark types to elastic types. This file can be auto-generated with the command tshark -G elastic-mapping. Since the mapping file can be huge, protocols can be selected by using the option --elastic-mapping-filter:

tshark -G elastic-mapping --elastic-mapping-filter ip,udp,dns
  • fields: The values of fields specified with the -e option, in a form specified by the -E option. For example,
tshark -T fields -E separator=, -E quote=d

would generate comma-separated values (CSV) output suitable for importing into your favorite spreadsheet program.

  • json: JSON file format. It can be used with -j or -J including the JSON filter or with -x option to include raw hex-encoded packet data. Example of usage:
$ tshark -T json -r file.pcap
$ tshark -T json -j "http tcp ip" -x -r file.pcap
  • jsonraw: JSON file format including only raw hex-encoded packet data. It can be used with -j including or -J the JSON filter option. Example of usage:
$ tshark -T jsonraw -r file.pcap
$ tshark -T jsonraw -j "http tcp ip" -x -r file.pcap
  • pdml: Packet Details Markup Language, an XML-based format for the details of a decoded packet. This information is equivalent to the packet details printed with the -V option. Using the --color option will add color attributes to pdml output. These attributes are nonstandard.

  • ps: PostScript for a human-readable one-line summary of each of the packets, or a multi-line view of the details of each of the packets, depending on whether the -V option was specified.

  • psml: Packet Summary Markup Language, an XML-based format for the summary information of a decoded packet. This information is equivalent to the information shown in the one-line summary printed by default. Using the --color option will add color attributes to pdml output. These attributes are nonstandard.

  • tabs: Similar to the default text report except the human-readable one-line summary of each packet will include an ASCII horizontal tab (0x09) character as a delimiter between each column.

  • text: Text of a human-readable one-line summary of each of the packets, or a multi-line view of the details of each of the packets, depending on whether the -V option was specified. This is the default.
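
Since the question asks for the 'data' segment of each TCP packet, the fields format is probably the lightest option. The sketch below assumes tshark is started as `tshark -l -i interface -T fields -e frame.number -e tcp.payload`, so every line holds a frame number, a tab (the default separator), and the payload as hex. The choice of these two fields and the helper name `parse_fields_line` are assumptions made for illustration.

```python
def parse_fields_line(line):
    """Split one tab-separated line into (frame_number, payload_bytes)."""
    number, _sep, payload_hex = line.rstrip("\n").partition("\t")
    # Some tshark versions print hex bytes separated by colons, others
    # print them as one run of hex digits; accept both forms.
    payload = bytes.fromhex(payload_hex.replace(":", ""))
    return int(number), payload
```

Packets without a TCP payload produce an empty second column, which this helper turns into `b""`.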

This means nothing stops you from writing your own parser for any of those output formats:

$ tshark -i interface -Tjson | python your_program.py
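
As a rough sketch of what `your_program.py` could contain: the example below swaps `-Tjson` for `-Tek`, because ek is newline-delimited and each packet therefore arrives as one complete JSON object per line. It assumes the ek stream interleaves Elasticsearch "index" action lines with the packet lines, and that packet lines carry a "layers" object. The name `iter_ek_packets` is hypothetical.

```python
import json


def iter_ek_packets(lines):
    """Yield the decoded dict for every packet line of `tshark -T ek` output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip the bulk-import "index" action lines; only packet lines
        # are assumed to contain a "layers" object.
        if "layers" in record:
            yield record
```

A script would iterate over `iter_ek_packets(sys.stdin)` and be started with `tshark -l -i interface -T ek | python your_program.py`. The `-l` option makes tshark flush its output after each packet, which matters when reading from a pipe in near real time.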

For convenience, pdml2flow already parses pdml into a nested Python dict and provides it to your code, which you implement as a plugin. In such a plugin you have full access to each frame and flow and are free to do whatever you wish.

pdml2flow implements all the building blocks to get you started quickly with processing frames in Python. I hope this helps, and I appreciate any feedback. Thank you.

Ente
  • Thank you, sir. But how can I receive a complete packet in Python? When I print the lines from sys.stdin, I only get a small part of the packet's JSON. – shih alex Apr 29 '19 at 08:04
  • sure, but it's quite simple. – shih alex Apr 29 '19 at 23:35
  • I launch my code with the command: tshark -i eth0 -Tjson | python alpha.py, and the .py file contains: for line in sys.stdin: print line. With this I only get the "[" on the first line, which is the first character of the JSON output, instead of a continuous JSON string. I don't know how to grab a complete JSON object. – shih alex Apr 29 '19 at 23:35
  • I would like a function that can return a complete JSON string or Python dict object as soon as tshark finishes decoding a single packet. I want this because if I just use pypcap to capture packets in Python, I only get each individual packet, but TCP segments have to be reassembled to get the full payload. I am stuck on this, so I would like to get the tshark decode results directly. Sorry if I have not explained this clearly, and thank you! – shih alex Apr 29 '19 at 23:49
  • Your `print line` example will just print each line of the output. I can't see anything wrong with it. Maybe there is just no traffic on eth0? But instead of printing line by line, you probably want to store that input somewhere and then parse it using [json.loads()](https://docs.python.org/3/library/json.html). This is pretty much exactly what [pdml2frame](https://github.com/Enteee/pdml2flow#pdml2frame) already does for you. Just run: `tshark -i any -Tpdml | pdml2frame +json`. In a `pdml2frame` plugin, the core provides every frame to you as a `dict`. – Ente Apr 30 '19 at 06:23
0

Consider using named pipes as a buffer for interprocess communication.
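
A minimal sketch of that idea on a POSIX system, assuming tshark writes line-oriented output into the pipe from another shell (for example `tshark -l -i interface -T ek > /tmp/packets.fifo`). The path and both helper names are hypothetical.

```python
import os


def make_fifo(path):
    """Create the named pipe if it does not exist yet (POSIX only)."""
    if not os.path.exists(path):
        os.mkfifo(path)


def read_fifo_lines(path):
    """Open a named pipe for reading and yield lines until the writer closes it."""
    # open() blocks until some process opens the FIFO for writing.
    with open(path, "r") as fifo:
        for line in fifo:
            yield line.rstrip("\n")
```

The reader and the writer can be started in either order, since each side waits for the other to open the pipe.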

neau