Humio

Graphing Modem Data for Fun: Part 4

Posted: October 22, 2020

Welcome to part 4 of the series. In the previous posts I built a dashboard to show the status of my cable modem using different log analytics tools. In this post I will be using Humio to build the dashboard. Humio is a relative newcomer to the log analytics space. A quick scan of the Humio web page tells me that they are focused on real-time search, ingesting gobs of data, and keeping the TCO low. These are priorities that sound good on their own, and then they get to “index-free design”. The curious technologist in me wants to see how well it works. With my modem-data use case ready to go, let’s get to it!

Like many platforms, Humio lets you send data to their cloud or install it locally in your own environment. I like servers and infrastructure, so I am going to install it locally. I have recently discovered how useful Docker can be for development and prototyping. You know that joke about pilots, CrossFitters, and vegans? I’m kind of like that with Docker right now. Since Humio has a Docker image, that is what I am going to use. Spinning it up takes a few seconds, and we are ready to go.

When I pitched this series to my boss, I thought I would be able to knock out the articles in a couple of weeks. I started the trials, stood up the systems, and wrote the fetch and transmit scripts to send the data to all 4 systems at the same time. I was, perhaps, overly optimistic about how long it would take to complete the series while getting other work done too. To any of my past, present, or future project managers: I am sorry about all those estimates that were not rooted in reality. I sometimes forget that I need to sleep occasionally. Here we are 2 months later, and the 30-day trial period for Humio has expired. I appreciate that Humio keeps ingesting data and lets you search the last hour of data after the trial expires. Since I did not get this post written in the expected time frame, I’ll have to start over with a clean Humio instance. Dropping the Docker container and starting over was as easy as reading that sentence.

To get started, I need to create a repository for the data. In Humio, a repository is a collection of data, saved searches, parsers, users, and dashboards. This is an interesting grouping, but I like it. Once I have a repository, I need to create a parser to tell Humio how to handle the data I am sending it. Humio comes with several parsers out of the box for common data types. When I was building my scripts, I could not figure out how to get Humio to recognize the timestamp properly. I tried several approaches before I stumbled onto the idea of a custom parser. It works, so I am sticking with it until I find a better solution. My custom parser is a modification of the out-of-the-box JSON parser, adding a setting to specify the time zone for my data, which is always UTC.
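
For the curious, Humio parsers are written in the same query language you search with. Below is a minimal sketch of the kind of parser I ended up with; the time field name and the format string are illustrative stand-ins, so adjust them to match whatever your data actually carries:

```
// Do what the stock JSON parser does: expand the JSON payload into fields.
parseJson()
// Then handle the timestamp explicitly. My data is always UTC, so the
// timezone setting removes the guesswork. Field name and format are examples.
| parseTimestamp(field=time, format="yyyy-MM-dd HH:mm:ss", timezone="UTC")
```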

While I was trying to figure out how to get the data into Humio, I found out that when Humio fails to identify the timestamp or encounters another ingestion problem, it appends that error information to the event. This was unexpected, but very convenient during troubleshooting.
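
If you want to round up those problem events, the error details land in dedicated fields. If I am reading the behavior right, something like this surfaces them; the field names follow Humio’s ingest-error convention as I understand it, so verify against the docs for your version:

```
// Find events that were flagged with ingest/parse errors and
// summarize the reasons Humio recorded for them.
@error = true
| groupBy(@error_msg)
```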

With the parser created, I can generate an API token and assign my xfirouter-json parser to it. Once the token is copied into my transmit script, I can start ingesting data from the archive. The backfill completes without any issues. Since these scripts have been running virtually unattended for the last 2 months, this success is not surprising. However, I like celebrating minor victories, so I treat myself to pumpkin cookies for achieving this milestone and move a sticky note to the completed column.
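
For anyone wiring up their own transmit script: at its core it is just an HTTP POST to Humio’s ingest API with the token in the Authorization header. A rough sketch of an unstructured-ingest request is below; the endpoint path is the one I believe applies here, and the host tag and modem reading are made-up examples, so check the ingest docs for your version:

```
POST /api/v1/ingest/humio-unstructured
Authorization: Bearer <ingest-token>
Content-Type: application/json

[
  {
    "fields": { "host": "xfirouter" },
    "messages": [
      "{\"time\": \"2020-10-22 14:00:00\", \"channel\": 1, \"snr\": 38.9, \"power\": 2.5}"
    ]
  }
]
```

Because the xfirouter-json parser is attached to the token, each raw message gets run through it on arrival.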

Creating the Dashboard

As I said earlier, Humio has specific focus points for their product. Even with this relatively small sample data set of about 1 million records, I can feel a difference in the search speed. It is FAST. By default, searches run as live searches: the results update automatically as new data comes in. I can run a fixed time range search as well, but Humio defaults to live. This is unusual; many platforms allow live searches but do not recommend running large numbers of them because of the pressure they put on the system. Humio made different architectural choices and seems to prefer live, real-time search.

Humio has a rich query language with a familiar command-line, piped-function structure. The syntax takes a little time to get used to, but that is true of any purpose-built language. One thing I like about the language is that it includes common scripting structures. For example, I can pass the timechart function an array of functions to create different series for the chart; in this case, a line chart showing the min, max, and average lines over time.
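
To make that concrete, here is roughly what one of those queries looks like; the snr field is a stand-in for whatever field names your parser produces:

```
// Three series on one chart: hourly min, max, and average of the SNR field.
timeChart(span=1h, function=[min(snr), max(snr), avg(snr)])
```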

Now that I have a couple of commands under control, I can start building the dashboard. Almost immediately I am faced with a challenge: it was not obvious to me that the way to display single metric values is the Gauge visualization. I ate one cookie in frustration at not being able to figure it out immediately, and another to celebrate figuring it out.
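
The part that tripped me up is that the Gauge widget does not want anything fancy: it just wants a query that reduces to a single value. Something like this, with power standing in for a real field name, is enough:

```
// A single aggregate result is all the Gauge widget needs to render.
avg(power)
```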

With the metric panels sorted, the top row of the dashboard was simple to put together. When I finished this row of widgets, I set the goal of completing the next row before going back for more cookies. I would regret this decision when I realized that the next row was going to be impossible.

Humio offers the core visualizations you would expect: line charts, bar charts, pie charts, and tables are all available. Out-of-the-box Sankey diagram and map visualizations are nice to see as well. There does not appear to be a heat map option currently, which means the heat maps I am supposed to create in the second row are not going to be possible. Because there is a cookie reward on the line, I remind myself that this project is about visualizing data, not rendering a particular chart type. Maybe that is moving the goalposts, maybe it is a little bit of retconning my requirements, but no matter what, I am going to get this done.

Heat maps are not the only way to represent grouped values over time. In fact, heat maps are not usually time-based, which I talked about earlier in the series. I have heard rumors that before there were heat maps – back when folks walked uphill both ways to school in a blizzard – there were multi-series line charts. Armed with new inspiration, I build some line chart widgets. While you look at this proof-of-progress picture, I am going to get myself a cookie.
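
The queries behind these widgets lean on timeChart’s series argument, which splits one aggregate into a separate line per field value. A sketch, with channel and power as stand-in field names:

```
// One line per modem channel: series=channel produces a separate
// series for each distinct value of the channel field.
timeChart(series=channel, function=avg(power), span=1h)
```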

The rest of the dashboard came together in much the same way as the earlier panels. I cannot do everything exactly the way I want to, but I am able to render the required information. The dashboard loads almost instantly, even with 12 widgets loading at once. The search and widget-building process is easy and intuitive. I would love to have more control over the layout and style, but I can easily toggle to dark mode, and that is a big deal. In fact, dark mode is cause for a celebration cookie.

This use case does not highlight Humio’s unique strengths, but it does demonstrate some of the search and visualization functionality. I was able to build the dashboard, and I got the introduction to the tool that I was looking for, so I call it a success. As with previous posts in this series, I am using a tool I am not very familiar with to accomplish a goal. Sometimes the mix of tool and requirements is not perfectly aligned. Not every task or project can afford a selection and procurement cycle to find the perfect fit. I look forward to following Humio as they evolve; I think they have an interesting and useful platform to build on. Eventually, I will have another use case to explore and show off more of Humio’s core values and features.

As I wrap up this post, I am making a note to myself not to use cookies as milestone rewards for my next project. I will be adding a few miles to my walks to make up for all the cookies I consumed during this project. I will be back in a couple of weeks with the 4th iteration of this dashboard, using Splunk.

***

This blog was written by Greg Porterfield, Senior Security Consultant at Set Solutions.