
Some sample lines from my access.log file:

202.134.9.131 - - [24/Jun/2020:05:03:28 +0000] "GET /static/img/p-logos/ruby-rails.png HTTP/1.1" 200 7289 "http://35.230.90.99/static/css/main.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/58.2.2878.53403"
202.134.9.131 - - [24/Jun/2020:05:03:28 +0000] "GET /static/img/p-logos/aws.png HTTP/1.1" 200 7230 "http://35.230.90.99/static/css/main.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/58.2.2878.53403"
202.134.9.131 - - [24/Jun/2020:05:03:28 +0000] "GET /static/img/p-logos/js.png HTTP/1.1" 200 7335 "http://35.230.90.99/static/css/main.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/58.2.2878.53403"
202.134.9.131 - - [24/Jun/2020:05:03:26 +0000] "GET /static/img/business-img.png HTTP/1.1" 200 853648 "http://35.230.90.99/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/58.2.2878.53403"

I can upload logs to Elasticsearch this way:

from elasticsearch import Elasticsearch

es = Elasticsearch()   # adjust host/port for your cluster
index = "access-logs"

with open("access.log", "r") as ral:
    for line in ral:
        try:
            log = {
                "user_IP": line.split(" ")[0],
                "request_date": line.split("[")[1].split("]")[0],
                "request_method": line.split('"')[1].split(" ")[0],
                "internal_url": line.split('"')[1].split(" ")[1],
                "HTTP_version": line.split('"')[1][-3:],
                "request_status": line.split('"')[2][1:4],
                "request_size": line.split('"')[2].split(" ")[2],
                "external_url": line.split('"')[3],
                "user_agent": line.split('"')[5]
            }
            res = es.index(index=index, body=log)
            print(res)
        except IndexError:
            # The line did not match the expected format; store it raw.
            log = {"log": line}
            res = es.index(index=index, body=log)
            print(res)

But I'm facing a few problems with this approach:

  1. It creates heavy traffic on the Elasticsearch server (one request per log line).
  2. It takes a long time and consumes a lot of resources for large log files.
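The per-request traffic from the loop above can be reduced with the `elasticsearch.helpers.bulk` helper from the Python client, which sends many documents per HTTP request. A sketch using the same field extraction as above (the index name `access-logs` is an assumption; the cluster calls are commented out since they need a running ES):

```python
# Sketch: batch the same per-line parsing into bulk requests to cut
# round-trips to Elasticsearch. Index name "access-logs" is an assumption.

def parse_line(line):
    """Split one combined-format access-log line into the same fields as above."""
    quoted = line.split('"')
    return {
        "user_IP": line.split(" ")[0],
        "request_date": line.split("[")[1].split("]")[0],
        "request_method": quoted[1].split(" ")[0],
        "internal_url": quoted[1].split(" ")[1],
        "HTTP_version": quoted[1][-3:],
        "request_status": quoted[2][1:4],
        "request_size": quoted[2].split(" ")[2],
        "external_url": quoted[3],
        "user_agent": quoted[5],
    }

def actions(lines, index):
    """Yield bulk actions; fall back to storing the raw line on parse errors."""
    for line in lines:
        try:
            doc = parse_line(line)
        except IndexError:
            doc = {"log": line}
        yield {"_index": index, "_source": doc}

# With a running cluster:
#
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch()
# with open("access.log") as f:
#     helpers.bulk(es, actions(f, "access-logs"), chunk_size=1000)
```

This keeps the parsing in Python but turns thousands of single-document requests into a handful of bulk requests.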

In Kibana, there are options for uploading CSV, LOG, and JSON files, and Elasticsearch parses the logs automatically. But here I'm parsing the logs with Python, which takes time and resources.

My question is:

  • Is there any way to upload the "access.log" file without parsing it?

I want to upload the logs as a file. I don't want to parse the logs and upload them as JSON. I can upload logs as a file from Kibana, but is there any way to do this with Python?
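One feature that matches this "let ES handle everything" goal is an ingest pipeline with a `grok` processor: Python still sends each line, but unparsed, and Elasticsearch extracts the fields server-side. A sketch below; the pipeline id `accesslog` and index name are assumptions, and the cluster calls are commented out because they need a running ES:

```python
# Sketch: server-side parsing via an Elasticsearch ingest pipeline.
# Pipeline id ("accesslog") and index name are assumptions.

pipeline = {
    "description": "Parse Apache combined-format access log lines",
    "processors": [
        {
            "grok": {
                # COMBINEDAPACHELOG is a built-in grok pattern that matches
                # lines like the access.log samples above.
                "field": "message",
                "patterns": ["%{COMBINEDAPACHELOG}"],
            }
        }
    ],
}

# With a running cluster you would register the pipeline once, then index
# the raw lines and let Elasticsearch do the parsing:
#
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.ingest.put_pipeline(id="accesslog", body=pipeline)
# with open("access.log") as f:
#     for line in f:
#         es.index(index="access-logs", pipeline="accesslog",
#                  body={"message": line.rstrip("\n")})
```

This still sends one document per line (so it can be combined with bulk indexing), but the parsing cost moves off the client.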

Shuvo
  • Do you want ES to do parsing of your logs before storing? – Gibbs Jun 27 '20 at 08:06
  • @Gibbs I just want to upload "access.log" as it is and let ES handle everything – Shuvo Jun 27 '20 at 08:08
  • @Gibbs Something like this: || files = {'file': open("access.log", 'rb')} || requests.post(url, files=files) – Shuvo Jun 27 '20 at 08:13
  • I would suggest you take a look at `logstash` for parsing while ingesting into Elasticsearch, and split your file into multiple files. You can add multiple input files to Logstash, which will make the job faster. – Gibbs Jun 27 '20 at 08:17
  • @Gibbs I wish I could use Logstash. But I'm working on a project where I get log files from users through HTTP requests. What I'm doing now is: 1. creating an index named after the file; 2. parsing and uploading those logs to ES; 3. deleting the files after uploading. Can I do this kind of dynamic thing with Logstash? Actually, I'm new to ELK and don't know much yet. – Shuvo Jun 27 '20 at 09:05
  • You can create the index, and Logstash will parse and load into it. You only need to handle receiving the file at your end point. – Gibbs Jun 27 '20 at 09:36

1 Answer


I would suggest the following.

Receive the log file at your end point, then use a tool/application to split it into multiple files for parallel processing.

Have a common Logstash conf with multiple input paths so all the files are processed in parallel, a custom pipeline to parse the data in the log lines, and a single output that inserts into Elasticsearch.

You might need the following:

  1. A Logstash conf
  2. A filter (e.g. grok)
  3. Especially ES ingest processors
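A minimal sketch of such a conf (paths, host, and index name are assumptions; the `file` input takes a glob, so the split files are all picked up):

```
input {
  file {
    path => "/var/log/split/access_*.log"   # assumed location of the split files
    start_position => "beginning"
  }
}
filter {
  grok {
    # Built-in pattern matching Apache combined-format lines
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "access-logs"
  }
}
```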
Gibbs