
I'm trying out Elasticsearch for the very first time.

I've downloaded Elasticsearch and Kibana and everything seems to run fine. I can visit http://localhost:5601 and view Kibana without errors.

I've made some traces with Wireshark/tshark and converted them into Elasticsearch's bulk format with:

tshark -r test_trace.pcap -T ek > test_trace.pcap.json

Now I'm trying to import that .json into Elasticsearch, but it seems to fail:

curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/foo/_bulk" --data-binary "@/Users/test-elastic/test_trace.pcap.json"

I'm getting no errors and no output, but Kibana shows index_not_found_exception, and running:

curl 'http://127.0.0.1:9200/foo/_search/?size=10&pretty=true'

Outputs:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index",
        "resource.type" : "index_or_alias",
        "resource.id" : "foo",
        "index_uuid" : "_na_",
        "index" : "foo"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index",
    "resource.type" : "index_or_alias",
    "resource.id" : "foo",
    "index_uuid" : "_na_",
    "index" : "foo"
  },
  "status" : 404
}

How can I import my data correctly and view it in Elasticsearch and Kibana?

The JSON file is 195MB, converted from a 10MB PCAP file. Each document in the bulk format is a pair of lines, an action line followed by the document itself; the first lines of the JSON file are:

{"index" : {"_index": "packets-2019-02-15", "_type": "pcap_file", "_score": null}}
{"timestamp" : "1549540104875", "layers" : {"frame": {"frame_frame_interface_id":...

UPDATE

After removing -s from the curl command I'm getting this output:

HTTP/1.1 413 Request Entity Too Large
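
Presumably the limit comes from Elasticsearch's http.max_content_length setting, which defaults to 100mb, so the 195MB body is rejected. Raising it in elasticsearch.yml would be one workaround (a sketch; splitting the file into smaller bulk requests seems safer):

http.max_content_length: 500mb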

Now I've tried to use split to split the file into multiple smaller files.

Testing the import again now gives me multiple errors like:

..."reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'ip_ip_addr'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@5d2f82db; line: 1, column: 1300...

UPDATE 2

I used the following command on my test_trace.pcap.json to get smaller files:

split -l 10000 -a 10 test_trace.pcap.json ./tmp/test_trace.pcap

Then I got lots of files and tested the import with the first file:

./tmp/test_trace.pcapaaaaaaaaaa

The protocols field in my .json is:

"frame_frame_protocols": "sll:ethertype:ip:sctp"

and there are indeed multiple ip_ip_addr fields, since each packet contains both a source and a destination IP address.
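
Counting the occurrences of ip_ip_addr in a single document line confirms the duplicate keys (a quick sketch, assuming the first document line is line 2 of the file):

sed -n '2p' test_trace.pcap.json | grep -o '"ip_ip_addr"' | wc -l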

1 Answer

Your JSON file already contains the index into which the data is supposed to be indexed, i.e. packets-2019-02-15, so your query should simply be:

curl 'http://127.0.0.1:9200/packets-2019-02-15/_search/?size=10&pretty=true'
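
To double-check which indices actually exist, you can also list them all with the cat indices API:

curl 'http://127.0.0.1:9200/_cat/indices?v'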

However, I doubt that you can send a 195MB file in one go; I suggest you split it and load it in chunks, for instance with the script sketched below.
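
Something along these lines should work (a sketch, untested against your data; the es_chunk_ prefix is arbitrary, -l must stay even so every action line keeps its document line in the same chunk, and the index can be omitted from the URL since every action line already names packets-2019-02-15):

split -l 10000 test_trace.pcap.json es_chunk_
for f in es_chunk_*; do
  curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/_bulk" --data-binary "@$f"
done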

– Val
  • Ok, running `curl 'http://127.0.0.1:9200/packets-2019-02-15/_search/?size=10&pretty=true'` gives me nothing. Won't an error be reported if the file is too big? – Alfred Balle Feb 18 '19 at 09:47
  • The file is probably too big and if you wait long enough you'd probably get a 5xx error. Did your curl bulk command finish executing or did you kill it? – Val Feb 18 '19 at 09:49
  • The `curl` finishes within a few seconds, no error and no need to kill it. – Alfred Balle Feb 18 '19 at 09:51
  • Hmmm, a few seconds for 195MB sounds weird... can you remove the -s switch? – Val Feb 18 '19 at 09:53
  • any more feedback here? – Val Feb 19 '19 at 08:12
  • How do I remove the -s switch ? – Alfred Balle Feb 19 '19 at 08:47
  • from your curl command where you bulk upload, simply remove the `-s` switch and replace it with `-v` – Val Feb 19 '19 at 08:47
  • Ah, now I'm getting `HTTP/1.1 413 Request Entity Too Large`. Perfect thank you. I'll update the post with information and output after using `split`. – Alfred Balle Feb 19 '19 at 08:50
  • May I ask how you did the split? The split operation is simply supposed to partition your big bulk file into smaller ones. Does each of the split files have a command line and a document line for each document? – Val Feb 19 '19 at 09:11
  • Used `split -l 10000 -a 10 test_trace.pcap.json /tmp/test_trace.pcap` – Alfred Balle Feb 19 '19 at 10:08
  • You need to show how you adapted my split script to your case, please update your question with it, so we have a full picture. – Val Feb 19 '19 at 10:11
  • Just testing if https://ask.wireshark.org/question/505/deduplication-in-tshark-t-ek/ might be my issue. – Alfred Balle Feb 19 '19 at 11:41
  • Well, indeed, if the generated JSON has multiple duplicate fields, it's going to be an issue as it's not valid JSON. – Val Feb 19 '19 at 12:04
  • Ok, now things work with updated Wireshark. Thank you. – Alfred Balle Feb 19 '19 at 13:19
  • Awesome, glad you figured it out! – Val Feb 19 '19 at 13:25