I often get SystemOut.log files from customers or friends and help them analyze a problem. It is often complicated to find the right server and application that produces the actual error, because most WebSphere applications (like IBM Connections or Sametime) are spread across different application servers and nodes. So you need to open multiple large files in your editor, scroll each one to the relevant timestamp and check the preceding lines for possible error messages.
Klaus Bild showed the functionality of ELK at several conferences and in his blog, so I thought about using ELK too. I started to build a virtual machine with the ELK stack (Elasticsearch, Logstash & Kibana) and imported my local logs plus logs I received by mail. This works well for analyzing your own environment, but it is not a good fit for dropping in a few SystemOut.logs from outside: it is hard to remove that data from an ELK instance again after the analysis.
Then I found a tutorial on installing ELK with Docker; there is even a great part 2 of that post which walks through setting up an ELK cluster. Just follow these blog posts and install Docker and Docker Compose; it deploys really fast. In my case I do not use Flocker, a local data container for the Elasticsearch data is enough.
Why do I use Docker instead of a virtual machine? It’s super easy to just drop the data container and begin with an empty database.
My setup
I created a folder on my Mac to hold everything the Docker containers need:
mkdir elk
cd elk
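The compose file shown below mounts two subfolders of this directory (conf.d for the Logstash configuration and logs for the log files), so create them right away:

mkdir conf.d logs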
Create a file docker-compose.yml with the following content:
elasticsearch:
  image: elasticsearch:latest
  command: elasticsearch -Des.network.host=0.0.0.0
  ports:
    - "9200:9200"
    - "9300:9300"
  volumes:
    - elasticsearch1:/usr/share/elasticsearch/data
logstash:
  image: logstash:latest
  command: logstash -f /etc/logstash/conf.d
  volumes:
    - ./conf.d:/etc/logstash/conf.d
    - ./logs:/opt/logs
  ports:
    - "5000:5000"
  links:
    - elasticsearch
kibana:
  image: kibana
  ports:
    - "5601:5601"
  links:
    - elasticsearch
  environment:
    - ELASTICSEARCH_URL=http://elasticsearch:9200
So you can see that elasticsearch uses a data container for /usr/share/elasticsearch/data, logstash reads its configuration from the local folder ./conf.d, and we additionally map the local folder logs to /opt/logs. This logs folder is where we copy the SystemOut.logs we want to analyze.
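So the elk folder will roughly end up with the layout sketched below; customerA and server1 are just placeholders for your own folder names, and the three .conf files are described in the next sections:

elk/
  docker-compose.yml
  conf.d/
    10-websphere-input.conf
    50-websphere-filter.conf
    90-elasticsearch-output.conf
  logs/
    customerA/
      server1/
        SystemOut.log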
elk/conf.d
I split my Logstash configuration into multiple files, but you can merge them into one file if you want.
elk/conf.d/10-websphere-input.conf
input {
  file {
    path => [ "/opt/logs/*/*/SystemOut.log" ]
    start_position => "beginning"
    type => "websphere"
    # important! by default logstash only reads files touched within the last 24 hours
    # 8640000 seconds = 100 days
    ignore_older => "8640000"
  }
}
You can see that I added /*/* in front of the log file name. These two wildcard folders are used to tag the log messages with company and server name, so I can search and filter based on these tags. So I assume that the first folder below /opt/logs is the company (or test/production) and the second folder is the server name.
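For example, to analyze the log of a (made-up) customer customerA and its server server1, copy it in before starting the stack:

mkdir -p logs/customerA/server1
cp ~/Downloads/SystemOut.log logs/customerA/server1/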
elk/conf.d/50-websphere-filter.conf
filter {
  multiline {
    # Nothing will pass this filter unless it is a new event ( new [2014-03-02 1.... )
    # multiline adds java error traces to the original log entry
    pattern => "^\["
    what => "previous"
    negate => true
  }
  if [path] =~ "SystemOut.log" {
    grok {
      # get company and server names from the folder names
      match => ["path", "%{GREEDYDATA}/%{GREEDYDATA:company}/%{GREEDYDATA:server_name}/SystemOut.log"]
    }
    grok {
      match => ["message", "\[%{DATA:timestamp} %{WORD:tz}\] %{BASE16NUM:was_threadID} %{WORD:was_shortname} %{SPACE}%{WORD:was_loglevel} %{SPACE}%{SPACE}%{SPACE}%{WORD:was_errorcode}: %{SPACE}%{GREEDYDATA:message}"]
      match => ["message", "\[%{DATA:timestamp} %{WORD:tz}\] %{BASE16NUM:was_threadID} %{WORD:was_shortname} %{SPACE}%{WORD:was_loglevel} %{SPACE}%{SPACE}%{SPACE} \[%{GREEDYDATA:was_sibbus}\] +%{WORD:was_errorcode}: %{SPACE}%{GREEDYDATA:message}"]
      match => ["message", "\[%{DATA:timestamp} %{WORD:tz}\] %{BASE16NUM:was_threadID} %{WORD:was_shortname} %{SPACE}%{WORD:was_loglevel} %{GREEDYDATA:message2} +%{WORD:was_errorcode}: %{SPACE}%{GREEDYDATA:message}"]
      match => ["message", "\[%{DATA:timestamp} %{WORD:tz}\] %{BASE16NUM:was_threadID} %{WORD:was_shortname} %{SPACE}%{WORD:was_loglevel} %{GREEDYDATA:message2}\) +%{WORD:was_errorcode} +%{SPACE}%{GREEDYDATA:message}"]
      match => ["message", "\[%{DATA:timestamp} %{WORD:tz}\] %{BASE16NUM:was_threadID} %{WORD:was_shortname} %{SPACE}%{WORD:was_loglevel} %{SPACE}%{GREEDYDATA:message}"]
      overwrite => [ "message" ]
    }
    date {
      match => ["timestamp", "dd/MM/YY HH:mm:ss:SSS", "M/d/YY HH:mm:ss:SSS", "MM/d/YY HH:mm:ss:SSS", "M/dd/YY HH:mm:ss:SSS", "MM/dd/YY H:mm:ss:SSS", "M/d/YY H:mm:ss:SSS", "MM/d/YY H:mm:ss:SSS",
               "M/dd/YY H:mm:ss:SSS"]
    }
  }
}
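To illustrate what the first grok pattern extracts, here is a WebSphere SystemOut.log line in the default "basic" format (the concrete values are made up for illustration):

[3/21/16 14:05:12:123 CET] 0000004c ThreadMonitor W   WSVR0605W: Thread "WebContainer : 5" has been active for 612345 milliseconds and may be hung.

From this line the filter would produce timestamp = 3/21/16 14:05:12:123, tz = CET, was_threadID = 0000004c, was_shortname = ThreadMonitor, was_loglevel = W and was_errorcode = WSVR0605W, while the message field is overwritten with the text after the message ID.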
I remove the timezone information from the timestamp, so if you need to know the local time of your system you should adjust this (I had problems with CEST and similar zone names, so I simply dropped it).
I tried to extract as much information as I could, but the filter is not complete yet; some messages do not give us the right error code. Still, I think it is a good starting point.
You can use the same filter for SystemOut.logs sent over the network or collected with other ELK tools, but then the company and server information is not tagged.
elk/conf.d/90-elasticsearch-output.conf
output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch {
      hosts => "elasticsearch:9200"
    }
  }
}
This just makes Logstash forward the log messages, with the additional tags and corrected timestamps, to Elasticsearch; events that could not be parsed by grok are dropped.
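One detail not visible in the snippet above: with no index option set, the Elasticsearch output plugin writes to daily indices named logstash-YYYY.MM.dd. If you prefer, say, one index per customer, you could set it explicitly; the index name below is only an illustration, not part of my setup:

elasticsearch {
  hosts => "elasticsearch:9200"
  # hypothetical example: one index per company folder and day
  index => "websphere-%{company}-%{+YYYY.MM.dd}"
}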
That’s pretty much all. Open a command window and change to the elk folder:
cd elk
docker-compose -f docker-compose.yml up -d
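Before opening Kibana you can check that Elasticsearch came up and (a little later) that Logstash is writing documents; both checks only assume that curl is installed:

curl http://localhost:9200
curl 'http://localhost:9200/_cat/indices?v'

The first call should return the Elasticsearch JSON banner, the second lists the indices created so far.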
Open http://localhost:5601 in your browser and you can start using Kibana with your logs.
When you want to reset the containers, just use:
docker-compose -f docker-compose.yml rm -f --all
docker volume rm elasticsearch1
And after this you can create the containers again.
Start using Kibana
Open http://localhost:5601 in your favorite browser and configure the index pattern:
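If you kept the default output settings from above, the pattern to enter here is the Logstash default logstash-* with @timestamp as the time-field name.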
I am looking at rather old log files here, so I get the following page on “Discover”:
After moving the time picker back a bit, we see the first log messages:
You can see that we can expand a single message entry and check its tags and fields; even the Java error stacks (multiline) are grouped together, and in addition to the error or log message we get all the fields we generated from the log file:
Example of multiline:
Example of filtering and changing the view (only showing special error codes):
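Such a filter can also be typed directly into the Kibana search bar as a Lucene query using the fields created by the grok filter; the company and error code below are just placeholders:

company:customerA AND was_loglevel:E AND was_errorcode:WSVR*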
So now we can start customizing the view and filtering by event type, error code and so on. Have fun analyzing with ELK.