It all started with a simple use case
The idea is to send server metrics to InfluxDB and visualize the data with Grafana. I do not know anything about InfluxDB, and I have not used Grafana in years. Armed with this overwhelming amount of experience and knowledge, when asked if it should be possible to send a bunch of metrics from a number of servers for an application, I respond, “Yeah, it should be possible, let me look into it”. I fire up a new browser window and start typing searches to figure out what InfluxDB is. I fire up some virtual machines, and I get to work building.
As use cases go, storing metric data is easy enough. Still, no matter how easy something might seem, I think it is wise to look at the products before definitively stating it can be done. Always hedge the answer until you have something to show. It is good to work through the trials and tribulations before presenter mode is activated.
When starting a discovery project like this, I think that finding a good use case is the hardest part. Luckily for me, this project comes with its own use case. With the hardest part out of the way the rest will be easy, right? Even with a use case, sample data that fits the use case is sometimes hard to come by. The customer application in question has 50 nodes, in 4 clusters. I have a lot of hardware and screens here in my office, but I am not going to be able to replicate that scale. It is also not terribly important that I do so. I could use the performance counters on my laptop, which is like the customer’s data, just smaller. Simple, logical, easy.
On the other hand, I could do something completely different. I could use data I do not really understand, to solve a problem which I do not have, and hope I can explain the process to someone else when I am done. Let’s do it that way!
As it happens, my colleague is working on an issue with his cable modem. I too have a cable modem, and I can get data from it, I think. My colleague sent me a link and a screenshot of a dashboard someone else created in Grafana for their modem. Cable modem data does not look anything like server metrics. However, it is data, it changes periodically, and the screenshot tells me what I am supposed to build. Sounds like a use case to me! I toss the initial requirements out the door and start again with a whole new idea. This is going great!
The inspiration for this use case comes from this site: https://www.going-flying.com/blog/arris-cable-modem-monitoring.html. Those numbers and lines probably mean something to people who do that kind of thing for a living. I do not know what transmit power or signal-to-noise ratio really mean for my internet connection. That is ok; maybe the people who know what those things are do not know how to capture the data and build the charts. Synergy, cooperation, exploration, this is how we do it. How many buzzwords can I fit in this blog? I don’t know, but stick with me and we will find out.
Anyway, I have a new goal. My goal is not remotely related to the original use case, but it is self-contained. Blindly ingesting data without an idea of what to do with it is not a great place to begin. I have done this before; I get bogged down fiddling with the settings, knobs, and levers. I randomly explore features of the product, not knowing which ones to try out. I end up mostly wandering around aimlessly.
I do not know what problem the customer wants to solve with their server metrics. I do not know if they plan to use interactive searches, or dashboards on a wall, or AI/ML to predict failures based on trends. However, I am pretty sure that I can get metrics off my modem and turn those metrics into charts like the screenshot.
In case anyone thinks that this post will transition to a story about not meeting the requirements, fear not. I was able to create the dashboard with my data. It’s pretty, I think. I changed the colors because it’s my dashboard and I can do that. I do not know if the original dashboard colors meant anything, but purple is cool, I like blue more than green, and trendlines are nifty, in my opinion.
Why do I show the end result before talking about how I get there? First, because I am proud of that silly dashboard and I want to share it. Second, and probably more relevant to the series, about halfway through the process of building the Grafana dashboard, a thought is going to occur to me: “Why explore one product when I can upcycle this use case and explore several products?”
With that one simple thought, a task that should have taken half a day turned into a project that will take quite a bit longer. I also got the idea for a blog series out of the deal, and here we are. Scope creep, it’s a killer.
Plan the work, change the plan, ignore the plan, make a new plan, work the plan.
When I first review the data for the cable modem dashboard blog site, I recognize a lot of the data as stuff I have seen in the configuration screens for my modem. It looks a little different, but it is close enough. A simple web scraper to capture the data should be sufficient. It is just a web page, after all. With the data in hand, I will make some API calls to send data into the products, build some dashboards, and have time for celebratory drinks after work.
Good plan. Not at all how it will work out, but a good plan, nonetheless. Getting hold of the data is an adventure that could get its own series. The simple web scraper ended up being a Selenium ChromeDriver script driving a real browser instead of a simple curl request, because reasons. The data needed to be parsed from the HTML tables, and then pivoted in order to be useful. And my data is not correlated the way the example modem’s is, so I will be fudging that bit a little. On the bright side, both my modem and the example modem use blue header bars for their tables. The similarities end there. So far, this is all relatively normal for a project like this, really.
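The parse-then-pivot step can be sketched without a browser in the mix. This is a minimal sketch, not the actual script: the HTML snippet, the column names, and the channel layout are invented stand-ins for whatever a modem’s status page really serves, and in the real version Selenium’s `driver.page_source` would supply the HTML instead of the canned string.

```python
from html.parser import HTMLParser

# Stand-in for driver.page_source from a Selenium ChromeDriver session.
# Real modem pages have more columns and more channels; this is illustrative.
SAMPLE = """
<table>
  <tr><th>Channel</th><th>Power</th><th>SNR</th></tr>
  <tr><td>1</td><td>2.5</td><td>38.9</td></tr>
  <tr><td>2</td><td>3.1</td><td>39.2</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collects every <tr> in the document as a list of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

def pivot(rows):
    """Pivot header + data rows into one record per channel,
    e.g. {'Channel': '1', 'Power': '2.5', 'SNR': '38.9'}."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

parser = TableParser()
parser.feed(SAMPLE)
records = pivot(parser.rows)
print(records[0]["SNR"])  # each record is now ready to ship as one metric point
```

Pivoting matters because the page presents one row per channel with the metrics as columns, while the ingestion APIs generally want one self-describing record (or point) per channel per scrape.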
With data collection handled, I can move on to getting the data into the various tools I am going to be looking at. The original use case for my customer was to use Grafana and InfluxDB, so I need to do that part. While building the Grafana dashboard, I fired up Splunk to do some sanity checking. I have worked with Splunk for a long time, so it helped me verify the results.
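Getting a record into InfluxDB mostly amounts to rendering it as line protocol and POSTing it to the write endpoint. A hedged sketch, assuming an InfluxDB 1.x server on localhost; the database name `modem`, the measurement name `downstream`, and the field names are all made up for illustration:

```python
import time
import urllib.request

# Assumed local InfluxDB 1.x write endpoint; 'modem' is a hypothetical database.
INFLUX_URL = "http://localhost:8086/write?db=modem"

def to_line_protocol(measurement, tags, fields, ts=None):
    """Render one point in InfluxDB line protocol:
    measurement,tag=v field=v,field=v timestamp(ns)"""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    if ts is None:
        ts = int(time.time() * 1e9)  # InfluxDB expects nanoseconds by default
    return f"{measurement},{tag_str} {field_str} {ts}"

def write_point(line):
    """POST one line-protocol point; only works with a server listening."""
    req = urllib.request.Request(INFLUX_URL, data=line.encode(), method="POST")
    return urllib.request.urlopen(req).status

# One scraped channel record, rendered with a fixed timestamp for clarity.
line = to_line_protocol("downstream", {"channel": "1"},
                        {"power": 2.5, "snr": 38.9},
                        ts=1700000000000000000)
print(line)
```

The other tools take different shapes on the wire (Splunk HEC and Elasticsearch both want JSON documents), but the pattern is the same: format the record, POST it to an ingest endpoint.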
Elasticsearch was added to the list because if I had already loaded the data into 2 systems, why not 3? I am currently training on Elasticsearch, so I could advance that goal at the same time. Finally, Humio is on the list because it has a different architecture, and it seems interesting. I drew the line at 4 systems at one time. Well, I did not really draw that line; my laptop fans are running like a jet engine and I’m out of memory, so four systems is the limit.
There are a lot of good log aggregation and data analytics tools out there. I will probably pull this use case out again if I need to learn one of them, but I can’t try this on all of them at once. Well, I probably could, but I would need a lot more hardware to work with.
Each post in this series will focus on ingesting the data and creating this dashboard with one of the tools. I’ll tell stories about interesting parts of the process from getting data loaded, to the final dashboard. It’s the same dashboard each time, but it looks different in each tool and I had to learn different things to get it to work. In the next post I will talk about getting the data loaded into InfluxDB and using Grafana to visualize it.