edit: Achtung! Warning!
Please use this docker-compose setup instead of the one I linked. It does require docker and compose (and maybe machine), but it does more of the work for you, so you will have to struggle less.
CELK: https://github.com/codenamekt/celk-docker-compose/blob/master/logstash/logstash.conf
Also:
Start here to get a nice overview of a working system. They've done some of the work for you, so you only have to worry about your question, which pertains to configuration and deployment.
Even if you don't end up using Docker, this will still put you on the track to success, with the added benefit of showing you how the pieces fit together.
Get Vagrant first and build the Vagrant image with vagrant up.
If you don't know what Vagrant is, it's wonderful. It's a program that lets people share an entire set of virtual machines and provisioners, so that you only define a VM and its configuration rather than shipping the whole VM image, and it "just works." It feels magical, but it is really just solid systems work.
You will need to install Vagrant to use this. Just do it! Then you don't have to install docker yourself, because it will run inside a Vagrant VM.
You have a few choices of how you want to use it, but first, get Vagrant ready with:
vagrant up
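For context, a Vagrantfile is just a small Ruby file describing the VM. A minimal sketch looks like the following (the box name is only an example; the linked repo ships its own Vagrantfile, so you don't need to write one):

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"   # base image to boot
  config.vm.provision "docker"        # install docker inside the VM
end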
Decide which programs you need to run. Your options are:
- Full ELK suite (elasticsearch, logstash, kibana)
- Agent only (Logstash collector)
- Kibana only
There are other configurations available, but those are for testing only.
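If you go the Compose route, picking one of those options is just a matter of naming the services you want started. The service names below are assumptions; check the repo's docker-compose.yml for the real ones (Compose will also start any linked dependencies):

docker-compose up -d elasticsearch logstash kibana   # full suite
docker-compose up -d logstash                        # agent only
docker-compose up -d kibana                          # kibana only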
Showtime
Now it is time to configure Logstash, which is really the only part with complex behavior.
Logstash configuration files are plain text files that end in .conf, and they can optionally be bundled together as a tar or gzipped (.gz) archive.
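For example, to bundle several config files into one archive (file names are illustrative):

tar -czf myconfig.tar.gz *.conf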
You get the config files into the container in one of two ways:
- download them from the Internet, using the environment variable LOGSTASH_CONFIG_URL to point to the URL of your config; if the URL is wrong or the config cannot be fetched from it, the image falls back to a known default config; or
- read them from disk, sort of: since this is docker, you create a volume once (now) and mount that volume each time you run the container (a volume-mount sketch follows the URL example below).
Here is what it looks like when you run using a config from the Internet:
$ docker run -d \
    -e LOGSTASH_CONFIG_URL=https://secretlogstashesstash.com/myconfig.tar.gz \
    -p 9292:9292 \
    -p 9200:9200 \
    pblittle/docker-logstash
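And here is a sketch of the volume-mount alternative mentioned above. The container path is an assumption; check the image's README for the directory it actually reads configs from:

$ docker run -d \
    -v "$PWD/logstash.conf:/opt/logstash/conf.d/logstash.conf" \
    -p 9292:9292 \
    -p 9200:9200 \
    pblittle/docker-logstash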
The author of the docker image warns you:
The default logstash.conf only listens on stdin and file inputs. If
you wish to configure tcp and/or udp input, use your own logstash
configuration files and expose the ports yourself. See logstash
documentation for config syntax and more information.
Recall that this default config is the file you get when you don't supply a correct URL in the required environment variable LOGSTASH_CONFIG_URL.
This is the input section:
# As the author warned, this is all you get: stdin and syslog file inputs.
input {
  stdin {
    type => "stdin-type"
  }
  file {
    type => "syslog"
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
  }
  file {
    type => "logstash"
    path => [ "/var/log/logstash/logstash.log" ]
    start_position => "beginning"
  }
}
Beyond the default
Read more about logstash on the website.
Now, logstash has input plugins that feed data into the pipeline. The plugins vary exactly as you would expect; here are a few:
- s3 from Amazon (reads files from an S3 bucket)
- stdin from logstash (the default, reads the stdin buffer)
- http from logstash (your guess)
- ...etc...
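As a taste, the http input is about as small as an input gets (availability depends on your Logstash version; the port is arbitrary):

input {
  http {
    port => 8080   # POST events to this port and they enter the pipeline
  }
}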
Example: UDP sockets
UDP is a connectionless, low-overhead protocol that sits at L4 (transport). It gives you port-based multiplexing and a basic checksum, but no delivery guarantees, which is generally an acceptable trade-off for shipping log data.
You pick the port that you want; other options depend on what you are doing.
TCP works the same way.
udp {
  port => 9999
  codec => json
  buffer_size => 1452
}
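If you run this through the Docker image above, remember the author's warning: you have to publish the port yourself. The /udp mapping below is the only addition to the earlier command:

$ docker run -d \
    -e LOGSTASH_CONFIG_URL=https://secretlogstashesstash.com/myconfig.tar.gz \
    -p 9292:9292 \
    -p 9200:9200 \
    -p 9999:9999/udp \
    pblittle/docker-logstash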
Example 2: UDP sockets from collectd, with filter and output
This is stolen from https://github.com/codenamekt/celk-docker-compose/blob/master/logstash/logstash.conf
input {
  udp {
    port => 25826         # 25826 matches the port specified in collectd.conf
    buffer_size => 1452   # 1452 is the default buffer size for collectd
    codec => collectd { } # the collectd codec to invoke
    type => "collectd"
  }
}
output {
  elasticsearch {
    host => "elasticsearch"
    cluster => "logstash"
    protocol => "http"
  }
}
And the filtering is a great example:
That is, it is really long, and yes, it does stuff: it renames the collectd fields, converts the values to floats, and clones events before they reach Elasticsearch.
filter {
  # TEST implementation of parse for collectd
  if [type] == "collectd" {
    if [plugin] {
      mutate {
        rename => { "plugin" => "collectd_plugin" }
      }
    }
    if [plugin_instance] {
      mutate {
        rename => { "plugin_instance" => "collectd_plugin_instance" }
      }
    }
    if [type_instance] {
      mutate {
        rename => { "type_instance" => "collectd_type_instance" }
      }
    }
    if [value] {
      mutate {
        rename => { "value" => "collectd_value" }
      }
      mutate {
        convert => { "collectd_value" => "float" }
      }
    }
    if [collectd_plugin] == "interface" {
      mutate {
        add_field => {
          "collectd_value_instance" => "rx"
          "collectd_value" => "%{rx}"
        }
      }
      mutate {
        convert => {
          "tx" => "float"
          "collectd_value" => "float"
        }
      }
      # force clone for kibana3
      clone {
        clones => [ "tx" ]
      }
      ##### BUG EXISTS : AFTER clone 'if [type] == "foo"' NOT WORKING : ruby code is working #####
      ruby {
        code => "
          if event['type'] == 'tx'
            event['collectd_value_instance'] = 'tx'
            event['collectd_value'] = event['tx']
          end
        "
      }
      mutate {
        replace => { "_type" => "collectd" }
        replace => { "type" => "collectd" }
        remove_field => [ "rx", "tx" ]
      }
    }
    if [collectd_plugin] == "disk" {
      mutate {
        add_field => {
          "collectd_value_instance" => "read"
          "collectd_value" => "%{read}"
        }
      }
      mutate {
        convert => {
          "write" => "float"
          "collectd_value" => "float"
        }
      }
      # force clone for kibana3
      clone {
        clones => [ "write" ]
      }
      ##### BUG EXISTS : AFTER clone 'if [type] == "foo"' NOT WORKING : ruby code is working #####
      ruby {
        code => "
          if event['type'] == 'write'
            event['collectd_value_instance'] = 'write'
            event['collectd_value'] = event['write']
          end
        "
      }
      mutate {
        replace => { "_type" => "collectd" }
        replace => { "type" => "collectd" }
        remove_field => [ "read", "write" ]
      }
    }
    if [collectd_plugin] == "df" {
      mutate {
        add_field => {
          "collectd_value_instance" => "free"
          "collectd_value" => "%{free}"
        }
      }
      mutate {
        convert => {
          "used" => "float"
          "collectd_value" => "float"
        }
      }
      # force clone for kibana3
      clone {
        clones => [ "used" ]
      }
      ##### BUG EXISTS : AFTER clone 'if [type] == "foo"' NOT WORKING : ruby code is working #####
      ruby {
        code => "
          if event['type'] == 'used'
            event['collectd_value_instance'] = 'used'
            event['collectd_value'] = event['used']
          end
        "
      }
      mutate {
        replace => { "_type" => "collectd" }
        replace => { "type" => "collectd" }
        remove_field => [ "used", "free" ]
      }
    }
    if [collectd_plugin] == "load" {
      mutate {
        add_field => {
          "collectd_value_instance" => "shortterm"
          "collectd_value" => "%{shortterm}"
        }
      }
      mutate {
        convert => {
          "longterm" => "float"
          "midterm" => "float"
          "collectd_value" => "float"
        }
      }
      # force clone for kibana3
      clone {
        clones => [ "longterm", "midterm" ]
      }
      ##### BUG EXISTS : AFTER clone 'if [type] == "foo"' NOT WORKING : ruby code is working #####
      ruby {
        code => "
          if event['type'] != 'collectd'
            event['collectd_value_instance'] = event['type']
            event['collectd_value'] = event[event['type']]
          end
        "
      }
      mutate {
        replace => { "_type" => "collectd" }
        replace => { "type" => "collectd" }
        remove_field => [ "longterm", "midterm", "shortterm" ]
      }
    }
  }
}
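If you want to see what this filter actually emits before trusting it, a quick trick (plain Logstash, nothing image-specific) is to swap the elasticsearch output for stdout while you experiment:

output {
  stdout { codec => rubydebug }   # print each event as a Ruby-style hash
}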
edit 3: I probably shouldn't be doing your work for you, but that's ok.
collectd, like any good software, ENCAPSULATES the aspects that are ugly or difficult for users to deal with, and tries to make things easy for you: it looks like you are sending data (a tuple, in this case) instead of fooling with serialization.
Your example:
(date_time, current_cpu_load), for example ('2016-04-24 11:09:12', 12.3)
I'm not going to spend my time figuring out how you are forming that. If you are able to get that data using the CPU plugin, great. I'm going to copy and paste one I found online to make it easy for me.
That said, think about it... just a little bit, it won't hurt.
You see the CPU plugin is loaded below.
You also see that the collectd interface in the conf file is too coarse to specify individual fields.
So if you just do this, it will work, but you will get much, much more data than just CPU load.
That's where you can use a filter. But you can also do that in Kibana, I think. So I'd rather not waste time writing a filter you a) don't need and b) could easily write if you spent some time.
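If you do decide you want one, the skeleton is tiny. A sketch, assuming the field names emitted by the collectd codec (verify against your own events):

filter {
  if [type] == "collectd" and [plugin] != "cpu" {
    drop { }   # discard everything except CPU metrics
  }
}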
## In `collectd`:
# For each instance where collectd is running, we define a
# hostname proper to that instance. When metrics from
# multiple instances are aggregated, the hostname will tell
# us where they came from.
Hostname "**YOUR_HOSTNAME**"
# Fully qualified domain name, false for our little lab
FQDNLookup false
# Plugins we are going to use, with their configurations
# if needed
LoadPlugin cpu
LoadPlugin df
<Plugin df>
  Device "/dev/sda1"
  MountPoint "/"
  FSType "ext4"
  ReportReserved "true"
</Plugin>
LoadPlugin interface
<Plugin interface>
  Interface "eth0"
  IgnoreSelected false
</Plugin>
LoadPlugin network
<Plugin network>
  Server "**YOUR.HOST.IP.ADDR**" "**PORTNUMBER**"
</Plugin>
LoadPlugin memory
LoadPlugin syslog
<Plugin syslog>
  LogLevel info
</Plugin>
LoadPlugin swap
<Include "/etc/collectd/collectd.conf.d">
  Filter "*.conf"
</Include>
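Once collectd is restarted with this config, it's worth sanity-checking that metrics are actually leaving the box. The commands below assume systemd and tcpdump are available; substitute your own port:

$ sudo systemctl restart collectd
$ sudo tcpdump -ni any udp port 25826   # you should see a steady trickle of packets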
Your logstash config:
input {
  udp {
    port => **PORTNUMBER**   # matches the port specified in collectd.conf (25826 above)
    buffer_size => **1452**  # 1452 is the default buffer size for collectd
    codec => collectd { }    # the collectd codec to invoke
    type => collectd
  }
}
output {
  elasticsearch {
    cluster => **ELASTICSEARCH_CLUSTER_NAME**   # this matches our elasticsearch cluster.name
    protocol => http
  }
}
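Before wiring this into the container, you can have Logstash check the syntax for you (the invocation below is the 1.4-era form; newer versions use -t instead):

$ bin/logstash agent -f logstash.conf --configtest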
An update from Aaron, the author of the Logstash collectd codec:
In Logstash 1.3.x, we introduced the collectd input plugin. It was
awesome! We could process metrics in Logstash, store them in
Elasticsearch and view them with Kibana. The only downside was that
you could only get around 3100 events per second through the plugin.
With Logstash 1.4.0 we introduced a newly revamped UDP input plugin
which was multi-threaded and had a queue. I refactored the collectd
input plugin to be a codec (with some help from my co-workers and the
community) to take advantage of this huge performance increase. Now
with only 3 threads on my dual-core Macbook Air I can get over 45,000
events per second through the collectd codec!
So, I wanted to provide some quick examples you could use to change
your plugin configuration to use the codec instead.
The old way:

input { collectd {} }

The new way:

input {
  udp {
    port => 25826         # Must be specified. 25826 is the default for collectd
    buffer_size => 1452   # Should be specified. 1452 is the default for recent versions of collectd
    codec => collectd { } # This will invoke the default options for the codec
    type => "collectd"
  }
}

This new configuration will use 2 threads and a queue size of 2000 by default for the UDP input plugin. With this you should easily be able to break 30,000 events per second!
I have provided a gist with some other configuration examples. For
more information, please check out the Logstash documentation for the
collectd codec.
Happy Logstashing!