Making Micro-services Visible through ELK - Part 3 of 3

Log Format

Many application logs are stored in a non-JSON format, and that is where we started. Eventually, we decided to go with JSON-formatted logs for a number of reasons.

The first reason is flexibility. With JSON-formatted log events, there is no tight coupling between different micro-services. Each micro-service is free to add relevant fields wherever they are needed to assist root-cause analysis (RCA) or to enhance business activity monitoring (BAM).

The second reason is to keep maintenance overhead to a minimum. Aside from the loose coupling between services, adding or removing fields does not require any update to the log parser. This gives each team the autonomy to decide what is appropriate for its use case, without a bottleneck potentially caused by the team that maintains the GROK parser.
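As a minimal sketch of this loose coupling, assume two hypothetical services, "orders" and "payments", each writing one JSON object per line with its own extra fields. The service names, field names, and the parse_event helper are illustrative assumptions, not taken from our actual services. A generic consumer can read both without any per-service pattern:

```python
import json

# Hypothetical log lines from two services; all field names are illustrative.
LINES = [
    '{"service": "orders", "level": "INFO", "message": "order placed", "order_id": "A-1"}',
    '{"service": "payments", "level": "ERROR", "message": "charge failed", "gateway": "x", "retry": 2}',
]

# Fields assumed (for this sketch) to be shared by every service.
COMMON_FIELDS = {"service", "level", "message"}


def parse_event(line):
    """Parse one JSON log line into (common fields, service-specific fields)."""
    event = json.loads(line)
    common = {key: event.get(key) for key in COMMON_FIELDS}
    extra = {key: value for key, value in event.items() if key not in COMMON_FIELDS}
    return common, extra


def parse_all(lines):
    """Parse every line with the same code path, whatever fields it carries."""
    return [parse_event(line) for line in lines]
```

If the "payments" team later adds or drops a field, parse_event stays unchanged; the new field simply appears in (or disappears from) the service-specific part of the event.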

The third reason is to keep the computation requirement on the Logstash server to a minimum, since JSON events can be decoded directly rather than matched against GROK patterns. This reduced requirement is good news for us, as we are going with the t2.nano instance type for our Logstash servers.

The fourth reason is to allow multi-line log events to be shipped and parsed easily. Normal log events are fine, as they usually span only a single line. The problem arises when we log events such as exceptions, where the stack trace can span multiple lines.
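To illustrate why JSON helps here, the following is a minimal sketch using Python's standard logging module. It is not the formatter our services use; the JsonFormatter class, the "stack_trace" field name, and the "demo-service" logger name are assumptions made for the example. The multi-line stack trace is stored as a single string value, and JSON encoding escapes its newlines, so the whole event stays on one physical line:

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        event = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # formatException returns the multi-line traceback as one string;
            # json.dumps escapes the newlines, keeping the event on one line.
            event["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(event)


def log_failure_as_json():
    """Log a sample exception and return the text that was emitted."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("demo-service")  # hypothetical service name
    logger.addHandler(handler)
    logger.propagate = False
    try:
        1 / 0
    except ZeroDivisionError:
        logger.exception("division failed")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

Because every event occupies exactly one line, a shipper can treat each line as a complete event without any multi-line stitching rules.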

Beyond Application Logs

With the full ELK stack running, we started to realize the versatility of the stack and its potential as a visualization tool. The possibilities are endless, and here are a few use cases we have come up with beyond application logs.

Figure 5a. CPU Utilization Snapshot of Different Micro-services

Figure 5b. Network In Snapshot of Different Micro-services

Figure 5c. Network Out Snapshot of Different Micro-services

First, we use it as our AWS EC2 resource utilization monitoring dashboard. As every micro-service runs across multiple EC2 instances, key metrics of these instances (e.g., CPU utilization and network I/O) can be grouped together. This dashboard gives us a high-level overview of each micro-service's resource utilization and of whether additional resources need to be provisioned.
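The grouping itself is done in Kibana, but the idea can be sketched in a few lines of Python. The sample records, instance IDs, and the average_cpu_by_service helper below are hypothetical and only illustrate rolling per-instance readings up to one figure per micro-service:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-instance CPU samples; the values are made up for illustration.
SAMPLES = [
    {"service": "orders", "instance_id": "i-aaa", "cpu_percent": 40.0},
    {"service": "orders", "instance_id": "i-bbb", "cpu_percent": 60.0},
    {"service": "payments", "instance_id": "i-ccc", "cpu_percent": 10.0},
]


def average_cpu_by_service(samples):
    """Group instance-level CPU readings by service and average each group."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["service"]].append(sample["cpu_percent"])
    return {service: mean(values) for service, values in grouped.items()}
```

With the sample data above, the "orders" service averages 50.0 percent across its two instances and "payments" sits at 10.0 percent.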

Figure 6. Actual AWS Usage vs RI Plan of T2 EC2 Instance Types

Second, we use it to monitor actual AWS usage against our reserved-instance (RI) plan. This dashboard shows us exactly how far our usage has grown and deviated from the RI plan, and whether additional reservations are required.
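The comparison behind such a dashboard is simple arithmetic. As a rough sketch, assuming usage and reservations are both expressed as instance counts per instance type (the ri_gap helper and all counts below are hypothetical, not our real figures):

```python
def ri_gap(running, reserved):
    """Return running minus reserved instance counts for each instance type.

    A positive value means usage beyond the reservation; a negative value
    means part of the reservation is unused.
    """
    instance_types = set(running) | set(reserved)
    return {
        instance_type: running.get(instance_type, 0) - reserved.get(instance_type, 0)
        for instance_type in sorted(instance_types)
    }


# Hypothetical counts for illustration only.
RUNNING = {"t2.nano": 12, "t2.micro": 5}
RESERVED = {"t2.nano": 10, "t2.micro": 8, "t2.small": 2}
GAP = ri_gap(RUNNING, RESERVED)
```

Here GAP shows two t2.nano instances running beyond the reservation, while three t2.micro and two t2.small reservations go unused.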

Third, we use it as our infrastructure visualization dashboard. Key statistics from RabbitMQ, Redis, and other components are collected and shipped to our ELK stack. Through the Kibana dashboard, we can see the latest health state of our infrastructure components at a glance.

As our team scales up, with the total number of micro-services in the hundreds, the ELK stack is one of many key initiatives we are undertaking to meet the increasingly challenging task of managing a micro-service architecture.