Splunk: big data visualization & intelligence

WHAT IS SPLUNK?

Splunk is effing sweet. I first saw them at a trade show (O’Reilly Strata Big Data NYC) and I liked them because they had great t-shirts. I used to make t-shirts, so I can appreciate a quality promo tee. After joining their mailing list and watching the company for a while, I liked what I saw.

As the Swiss Army knife of big data, Splunk brings a real-time visualization layer to business analytics and operational intelligence. Imagine if Google Analytics were a software package you ran yourself and could plug all of your business data into. That’s Splunk. It has a built-in API so your developers can feed the resulting data insights into other visualization tools and machine learning algorithms. A really great application of the API would be to export the resulting data into Microsoft Excel for some regressions / predictive analysis.

If you’re a forward-thinking organization, hopefully you have a data scientist or two. They are very likely your engineers, so the skill set needed to get Splunk running on your own private cloud may already be at your fingertips.

The coolest part? It’s not a hosted cloud solution. That means your organization won’t be forced into paying double and triple hosting costs to Splunk for hosting you may not want. You don’t even have to put Splunk on the cloud; you can host it in a data center or private cloud inside your building. Does your organization retire computers after just a few years? A great re-use of those commodity machines could be a Splunk cluster.

HOW DOES SPLUNK WORK?

Splunk ingests your log / data files, parses them, then indexes them. Once all of your data is indexed, you can run searches via the Splunk interface. Splunk has built-in MapReduce capabilities, kinda like Hadoop. MapReduce is a really simple yet powerful concept. It maps log records (events) to a value and then reduces (counts) the number of occurrences of that value. For example, counting the number of times a certain page was accessed from your raw server logs.
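To make the map and reduce steps concrete, here is a minimal sketch in plain Python. It is not Splunk's implementation and does not touch Splunk at all; the sample log lines, the Apache-style log format, and the function names (map_phase, reduce_phase) are all made up for illustration.

```python
from collections import Counter

def map_phase(lines):
    """Map step: turn each log record into a (page, 1) pair."""
    for line in lines:
        # Assumes an Apache-style access log, where the request sits
        # between the first pair of double quotes: "GET /page HTTP/1.1"
        parts = line.split('"')
        if len(parts) < 2:
            continue  # skip records that do not look like requests
        request = parts[1].split()
        if len(request) >= 2:
            yield request[1], 1

def reduce_phase(pairs):
    """Reduce step: add up the values emitted for each page."""
    counts = Counter()
    for page, value in pairs:
        counts[page] += value
    return dict(counts)

# Hypothetical sample data, not real traffic.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2013:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2013:10:00:05 +0000] "GET /pricing.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2013:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 512',
    'this line is not a request and gets skipped',
]

page_counts = reduce_phase(map_phase(SAMPLE_LOG))
print(page_counts)
```

On a real cluster the map step runs in parallel across machines and the reduce step merges the partial counts, but the idea is the same as this single-machine version.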

Once counted, Splunk returns a visualization of the data you asked for. The graph can be switched between different types (line, pie, bar, etc). The user can also build dashboards with repeatable information and visualizations. For some Google Analytics-like functionality, try the Splunk Web Intelligence App. The resulting data can be exported via the API to Excel for additional regressions and predictive modeling.
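As a rough idea of what "regressions and predictive modeling" on exported counts could look like, here is a small sketch of a straight-line trend fit (ordinary least squares), similar in spirit to an Excel trendline. It assumes the counts have already been exported somehow; the daily_hits numbers and the fit_line helper are invented for illustration and nothing here calls a Splunk API.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two data points to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical exported data: page hits per day for five days.
daily_hits = [100, 110, 120, 130, 140]

slope, intercept = fit_line(daily_hits)
# Extrapolate one day past the end of the data.
forecast = slope * len(daily_hits) + intercept
print(f"trend: {slope:+.1f} hits/day, next-day forecast: {forecast:.0f}")
```

A straight line is the simplest possible model; real traffic usually has weekly cycles and spikes that call for something richer.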

Splunk also has its Splunk + Hadoop Enterprise product in private beta. Very cool to see they are willing to adopt Hadoop, the data science industry standard, and package it up nicely for everyone. There is also a MySQL connector for Splunk. Splunk takes you far down the big data path, while not holding you captive in their environment. Super win.

HOW MUCH DOES SPLUNK COST?

Splunk costs far less than a dedicated big data employee (unless you’re indexing many, many gigabytes of data per day). Refocus your IT team on data science rather than building infrastructure that Splunk has already developed.

Splunk’s business model takes it back to the old-school software licensing model, and it works. They give you the full tool set for 60 days, and if you want to use it as a distributed system, you need to purchase Splunk Enterprise. A Splunk Enterprise license for up to 500 MB of indexed data per day costs $500/mo, and scales up with your organization’s indexing needs. Note the cost is based only on data indexed (ingested), not how much or how often you access and query the data. It also doesn’t matter if you have a cluster of 3 or 100 machines; the price doesn’t change.
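Since the license is metered on indexed volume rather than machine count, a quick back-of-the-envelope check can tell you which tier you would land in. The sketch below assumes the quota is measured per day and that 1 MB means 1024 * 1024 bytes; the host names, byte counts, and helper functions are hypothetical, and this is not how Splunk itself measures usage.

```python
MB = 1024 * 1024          # assumption: binary megabytes
LICENSE_QUOTA_MB = 500    # the 500 MB tier mentioned above

def indexed_mb(bytes_per_host):
    """Total volume sent for indexing, summed across every host."""
    return sum(bytes_per_host.values()) / MB

def within_quota(bytes_per_host, quota_mb=LICENSE_QUOTA_MB):
    """Only the total matters; the number of hosts does not."""
    return indexed_mb(bytes_per_host) <= quota_mb

# Hypothetical one-day ingest figures for a three-machine setup.
todays_ingest = {
    "web-01": 200 * MB,
    "web-02": 150 * MB,
    "db-01": 100 * MB,
}

total_mb = indexed_mb(todays_ingest)
ok = within_quota(todays_ingest)
print(f"indexed today: {total_mb:.0f} MB, within quota: {ok}")
```

Spreading the same 450 MB across 3 machines or 100 gives the same answer, which is the point of pricing on data indexed.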

Splunk can get your organization up and running with business & operational intelligence quickly and cost-effectively.


Brian Rossi is a web architect and entrepreneur. Brian loves scalable architectures, small business development, tech startups, big data, mobile payments, social web, branding, & user experience.
