
I'm developing a web application in which I'll upload a log file; the file will be read and its lines classified by logger level (INFO, ERROR, WARNING, etc.). I need to index those logs into Elasticsearch using the Java High Level REST Client API.

Currently I'm creating one index per class name (the logs contain the class name) and storing that class's logs in that particular index. I feel this approach won't work well in some cases: if a log file contains logs from 100 different classes, I'll end up creating 100 indices just to store them.

Is there a more efficient way of indexing logs into Elasticsearch? How should I determine the indices in my case?

Sample log:

02-Jul-2021|10:03:10.040|INFO|[main]|org.apache.catalina.startup.VersionLoggerListener.log|Server built: Jun 11 2021 13:32:01 UTC

  • There's no point in creating class-specific indexes. The common approach is to store everything in the same index, and then you can filter your logs based on anything you need, be it class name, log level, timestamp, etc. That's how it's supposed to work in the general case (see the sketch after these comments) – Val Aug 04 '21 at 06:01
  • Will it be an efficient approach if I store 2 million logs in the same index? – Ajay Venkatesh Aug 04 '21 at 06:06
  • Much better approach than having 100 indices, for sure ;-) You need to read up a bit on how Elasticsearch works, because a wrong design up front can cost you down the road – Val Aug 04 '21 at 06:07
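
To illustrate the single-index approach suggested in the comments, here is a minimal sketch using the Java High Level REST Client: it parses the pipe-delimited sample line from the question and indexes it as one document in a shared index. The index name app-logs and the field names are assumptions for illustration, not anything prescribed by Elasticsearch.

import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class SingleIndexSketch {
    public static void main(String[] args) throws Exception {
        String line = "02-Jul-2021|10:03:10.040|INFO|[main]"
                + "|org.apache.catalina.startup.VersionLoggerListener.log"
                + "|Server built: Jun 11 2021 13:32:01 UTC";
        // Split the pipe-delimited line into its six fields.
        String[] f = line.split("\\|", 6);

        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // One index for all logs; class name and level are just fields,
            // so you can filter on them later instead of creating an index per class.
            IndexRequest request = new IndexRequest("app-logs")
                    .source(Map.of(
                            "date", f[0],
                            "time", f[1],
                            "level", f[2],
                            "thread", f[3],
                            "class", f[4],
                            "message", f[5]));
            client.index(request, RequestOptions.DEFAULT);
        }
    }
}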

1 Answer


Here is the practice we use in some of our Java-based stacks; it has many advantages thanks to Apache Kafka as the middle data pipeline and Logstash as the data-ingestion pipeline. First you need to remove the default log providers in your Spring Boot application's pom.xml, which are Logback (logback-classic); then add Log4j2 as the new log provider and add the Kafka appender. After adding the dependencies, you need an XML configuration file where you can add your Kafka appender configuration. By default, the configuration file must be located on your project's resource path and named "log4j2.xml".

You can find many other Log4j2 appenders, such as the Cassandra or Failover appenders, and add them beside your Kafka appender inside your configuration file. A working example is below.

<!--excluding logback -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-starter-logging</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>ch.qos.logback</groupId>
                    <artifactId>logback-classic</artifactId>
                </exclusion>
            </exclusions>
        </dependency>


<!--adding log4j2 and kafka appender-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-log4j2</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-log4j-appender</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

Kafka appender configuration

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="info" name="kafka-appender" packages="Citydi.ElasticDemo">

    <Appenders>

        <Kafka name="kafkaLogAppender" topic="Second-Topic">
            <JSONLayout />
            <Property name="bootstrap.servers">localhost:9092</Property>
            <MarkerFilter marker="Recorder" onMatch="DENY" onMismatch="ACCEPT"/>
        </Kafka>

        <Console name="stdout" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} stdout %highlight{%-5p} [%-7t] %F:%L - %m%n"/>
            <MarkerFilter marker="Recorder" onMatch="DENY" onMismatch="ACCEPT"/>
        </Console>

    </Appenders>
    <Loggers>
        <Root level="INFO">
            <AppenderRef ref="kafkaLogAppender"/>
            <AppenderRef ref="stdout"/>
        </Root>
        <Logger name="org.apache.kafka" level="warn" />
    </Loggers>

</Configuration>
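
With this configuration in place, any ordinary Log4j2 logging call is rendered by JSONLayout and published to the Kafka topic by kafkaLogAppender, in addition to the console output. A minimal sketch (the class name and messages here are invented for illustration):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class KafkaLoggingDemo {
    private static final Logger log = LogManager.getLogger(KafkaLoggingDemo.class);

    public static void main(String[] args) {
        // Each call goes to both appenders declared on the Root logger:
        // the console and the "Second-Topic" Kafka topic (as JSON).
        log.info("Application started");
        log.warn("Disk space running low");
        log.error("Failed to process uploaded log file");
    }
}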

Start the ZooKeeper server

./zookeeper-server-start.sh ../config/zookeeper.properties

Start the Kafka broker

./kafka-server-start.sh ../config/server.properties

Create a topic

./kafka-topics.sh --create --topic test-topic --zookeeper localhost:2181 --replication-factor 1 --partitions 4

Start a console consumer on the created topic

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning

Then point the Kafka appender at the created topic (the exact topic name is up to you, but it must match in log4j2.xml, the create-topic command, and the Logstash input), and create a Logstash pipeline such as the configuration below to ingest your logs into the desired Elasticsearch index.

input {
    kafka{
        group_id => "35834"
        topics => ["yourtopicname"]
        bootstrap_servers => "localhost:9092"
        codec => json
    }
}

filter {

}

output {
    file {
        path => "C:\somedirectory"
    }
    elasticsearch {
        hosts => ["localhost:9200"]
        document_type => "_doc"
        index => "yourindexname"
    }
    stdout {
        codec => rubydebug
    }
}
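
Once the logs are in a single index, filtering by class or level happens at query time rather than through per-class indices. Here is a minimal search sketch with the Java High Level REST Client; the field names level and loggerName assume Log4j2's JSONLayout output (with dynamic-mapping .keyword subfields), and the index name is a placeholder to match the pipeline above.

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class LogSearchSketch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Filter ERROR-level entries of a single class within the shared index.
            SearchRequest request = new SearchRequest("yourindexname")
                    .source(new SearchSourceBuilder()
                            .query(QueryBuilders.boolQuery()
                                    .filter(QueryBuilders.termQuery("level.keyword", "ERROR"))
                                    .filter(QueryBuilders.termQuery("loggerName.keyword",
                                            "org.apache.catalina.startup.VersionLoggerListener"))));
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            System.out.println(response.getHits().getTotalHits());
        }
    }
}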