As systems grow in complexity, it becomes increasingly important to know the precise dynamics of each system and how these systems relate to your business. Achieving this insight is most commonly done through metrics. However, creating a rich set of metrics that integrate both system level information and business level information can be challenging. Today’s solutions to these problems are lacking when it comes to developer experience. L2met is unlike other solutions in that it is optimized for an excellent developer experience. Since l2met is painless to maintain and easy to use, you will create more metrics and consequently have more insight into the operations of your systems and your business.
You can have l2met up and running in less than a minute. Give it a try by following these setup instructions. Keep reading for more details.
L2met is a network daemon that receives log lines, extracts data from the logs using a convention, and outlets the aggregation of data to charting services like Graphite or Librato.
L2met was developed to address several of the visibility problems we faced at Heroku:
L2met extracts data from log lines and creates metrics as a result. This means developers can create metrics without installing an external library. As long as you can write to STDOUT you can create metrics. Here is an example of a structured log message:
measure#cache.get=4ms measure.db.get=50ms measure#cache.put=4ms
There are a couple things to note: key=value structure, multiple metrics on a single log line, and units associated with the values. L2met will parse this log line and create 3 metics: cache.get, db.get, and cache.put. The sum, count, mean, min, median, 95/99th percentile, max, and last values will be computed on the metrics.
Since there are no client libraries for the app developer to install, the host which runs the app must deliver the logs to l2met. If your app is running on Heroku, this is as simple as adding a log drain:
$ heroku drains:add https://your-l2met -a your-app
Once you are draining into l2met, you simply put structured logs to STDOUT.
If you manage your own infrastructure, you can use log-shuttle(1) to format the app logs into syslog packets and forward the logs to l2met.
$ export LOGS_URL=https://u:firstname.lastname@example.org/logs $ echo 'measure#hello=1 measure#world=1' | log-shuttle
In both cases, l2met will parse 2 metrics, hello=1 and world=1.
Most of Heroku’s internal services (dyno managers, API, etc.) log using l2met conventions. Those logs are sent to logplex, our distributed log routing cluster, which then drains each app’s stream to a fleet of l2met instances. L2met computes metrics from the incoming logs, and outlets those metrics to Librato. We use the data from Librato’s API and l2met’s (experimental) HTTP outlet to alert engineers if operating thresholds are exceeded.
The use of logplex and its drains make it super easy to setup metrics for both platform (apps that run on Heroku) and kernel services. New services can have a full fledged metric pipeline in a matter of seconds by typing one command (`heroku drains:add`), provided they are following the logging conventions.
L2met was built on the shoulders of giants. It is a fresh implementation that leverages modern technologies and solutions to a well known and rigorously studied and solved problem. The visibility problem has plagued many engineers for many years and consequently there have been many great solutions. Here is a brief timeline of the problem to which I am most familiar with:
* Solutions used within Heroku.
While l2met has no direct relationship with RRD or Graphite, it does recognize that backend storage and visualizations software is tough business. The goal of l2met was to leverage the storage and visualization platforms while providing the simplest input interface for application developers. This was achieved by the fact that application developers need only to put to STDOUT to collect metrics. This is after all something every programmer can and has done already. If you squint your eyes, l2met basically asks programmers to write a hello world program.
Statsd was a great inspiration. However, there were certain implementation details that made it unsuitable for our particular set of engineering goals. The reliance of UDP as a transport and the lack of proper UDP routing support on the Heroku platform meant that we were not going to be able to run Statsd on the Heroku platform.
Pulse was the next generation visibility solution for Heroku. It differed from Statsd in that it parsed data from a log stream delivered by syslog-ng. Users of Pulse would register functions in the code base that would analyze the incoming logs. A customer web interface was added to Pulse to report on the numbers in real time.
Wcld varied a bit from Pulse in that it didn’t require users to deploy code to acquire new metrics. It parsed and stored data in hstore columns inside a PostgreSQL database. You could then write programs that connected to the database and executed queries and the output of the queries would be sent to a Librato account.
Exprd was the realization that there could be one program that received and stored log data and also provided an interface to get the data out. Users of exprd would drain their log data into it and then create expressions that would be evaluated on the data over some interval that was defined in the expression. The data would then be outleted to Librato. After several production issues, we learned that a RDBMS was not the best solution for the type of data and workload.
After dealing with several availability issues and a general lack of multi tenancy, a new project was born. L2met was to be better than the rest in that it would not require long term state. If a database were to be used, it should be used in such a way that if the current database failed, a new, fresh database could be swapped in place with little effect. This implied that user expressions and configuration could not be permitted. Instead, l2met defined a set of log conventions. If users logged in a loosely defined way, they would be rewarded with metrics with the smallest possible footprint (writing to STDOUT) across all platforms.