The IoT challenge isn’t capacity planning; it’s database choice

No matter what it means to your business, the Internet of Things (IoT) in its broadest sense is affecting all businesses from the sports brand delivering fitness trackers to the manufacturer predicting pump failures. But if you want to build an IoT application today, you will struggle to find a good cookbook because there are many careful considerations to be made upfront.


How much data will I collect? Will my current tools perform? Will I run out of capacity, and if so, how can I predict my need for servers in the cloud? A good place to start is Gartner analyst Doug Laney’s famous Three V’s of Big Data: Volume, Velocity, Variety. You may not categorise your needs as big data as such, but it is likely that one of these Vs will become, or already is, a new challenge.

Volume is a commonly cited problem because it can be hard to predict how much data will grow with device deployment and subsequent analyses, and the cost of over-provisioning is unacceptable. Today you can opt for a so-called IoTaaS or IoT platform and outsource large parts of your service to a Google or a Cisco (Jasper), or you can stay DIY by starting with the servers you need today and harnessing cloud and software-defined infrastructure to scale. The key here is not visiting the fortune teller to predict what you need, but understanding how web-scale companies scale efficiently and reliably on commodity hardware by choosing their technology stack carefully from the outset.

Then there’s Variety. The bad news is that most relational databases, and most NoSQL databases for that matter, aren’t designed to query IoT data efficiently. The good news is that IoT data are, in essence, very simple: a stream of timestamped values. That is why the database you choose must be optimised to ingest and query such data as efficiently as possible. It is by getting this database foundation right that today’s pioneers have the Velocity to deliver the real-time analytics and deep insights enabling them to live up to the promise of IoT.
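To make "a stream of timestamped values" concrete, here is a minimal Python sketch of one common way to store and query such data: readings are grouped by device and by fixed time bucket, so a range query only touches the buckets it overlaps. The `Reading` fields, the 15-minute bucket width and the `TimeSeriesStore` class are illustrative assumptions, not the internals of any particular database.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Reading(NamedTuple):
    """One IoT data point (field names are hypothetical)."""
    device_id: str
    timestamp: datetime
    value: float


BUCKET = timedelta(minutes=15)  # assumed bucket width
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // BUCKET) * BUCKET


class TimeSeriesStore:
    """Toy in-memory store keyed by (device, time bucket)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def ingest(self, reading: Reading) -> None:
        key = (reading.device_id, bucket_start(reading.timestamp))
        self._buckets[key].append(reading)

    def query(self, device_id: str, start: datetime, end: datetime):
        """Return readings with start <= timestamp < end, in time order."""
        results = []
        cursor = bucket_start(start)
        while cursor < end:
            # Only buckets overlapping the requested range are visited.
            for r in self._buckets.get((device_id, cursor), []):
                if start <= r.timestamp < end:
                    results.append(r)
            cursor += BUCKET
        return sorted(results, key=lambda r: r.timestamp)
```

The design choice worth noticing is that both ingest and range queries work on small, predictable slices of the data, which is what lets a store of this shape be spread across many servers.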

In short, if you’re building for IoT you need a time series database that scales efficiently across many servers in the cloud.

Ticking time (series) bomb

Time series data are being generated everywhere: by devices in homes, on your wrist, increasingly in our vehicles, in systems linking patients to healthcare service providers and in industrial machinery, while the financial world has long been immersed in them. Time series data are fast becoming the coolest kid on the block.

Businesses that understand how to make full use of time series data can identify patterns and forecast trends, creating significant competitive advantage. For example, it has been estimated by McKinsey & Company that retailers who leverage the full power of their data could increase their operating margins by as much as 60%. At the moment less than 0.5% of all data are ever analysed and used, and a recent study by Oxford Economics found that only 8% of businesses are using more than 25% of their IoT data. Just imagine the potential.


Accepting that promise, the next question is how you can continue to capture it cost-effectively. What if you need to archive? What are the implications when you’re required by law to keep seven years of data with the granularity you need? Try doing this with Teradata, Netezza or Oracle. The short version of the story is that it’s just too expensive and cumbersome. You need to be considering commodity hardware combined with operationally simple open source software.

And it’s not only the expense.  Not to labour a point we’ve made many times, but these databases are just not designed to hold and manage unstructured, or specifically time series, data.  They were built for a different era.

Setting scalability aside, you should ask how your database will:

  1. Handle the cumulative volume of “readings” or data points over the period of data retention required by the business
  2. Provide adequate query performance for time series data
  3. Manage data as it expires over time
  4. Support the analysis needed for your use cases
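The first question is largely arithmetic, and a rough estimate is easy to sketch. The Python below is a back-of-the-envelope calculation under stated assumptions: the `Workload` fields, the 50 bytes per reading and the replication factor of 3 are illustrative numbers, not measurements from any real deployment.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Workload:
    """Assumed inputs for a capacity estimate (all values hypothetical)."""
    devices: int
    seconds_between_readings: float
    bytes_per_reading: int
    retention_days: int
    replicas: int = 3  # assumed number of copies kept for fault tolerance


def readings_per_day(w: Workload) -> float:
    """Data points produced by the whole fleet in one day."""
    return w.devices * SECONDS_PER_DAY / w.seconds_between_readings


def retained_readings(w: Workload) -> float:
    """Cumulative data points held once the retention window is full."""
    return readings_per_day(w) * w.retention_days


def retained_bytes(w: Workload) -> float:
    """Raw storage needed for the retained readings, including replicas."""
    return retained_readings(w) * w.bytes_per_reading * w.replicas


if __name__ == "__main__":
    fleet = Workload(devices=10_000, seconds_between_readings=60,
                     bytes_per_reading=50, retention_days=7 * 365)
    print(f"{retained_readings(fleet):,.0f} readings retained")
    print(f"{retained_bytes(fleet) / 1e12:.1f} TB before compression")
```

With these assumed numbers, 10,000 devices reporting once a minute accumulate about 36.8 billion readings over seven years, or roughly 5.5 TB of raw storage with three replicas. The point is less the figure itself than how quickly it changes when the device count or the reporting interval does.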

Growing ambition

The aforementioned IoTaaS providers are, of course, masters of scale. They have invested hugely in building and optimising their “stack” for seamless operation so they can focus on invoicing happy customers and delivering new features. But these technologies are typically free and open source. If a backroom startup can profit from them, why can’t established enterprises? Well, here’s a secret: those that are succeeding in early IoT products and internal services already do.

I am fortunate to live in the “new stack” world where our customers have hit the limits of what’s possible and needed solutions to grow. Take the Weather Company as an example. Their system collects several gigabytes of data per second from many sources including millions of mobile phones, sensors in weather observing stations, as well as thousands of daily flights that measure data on temperature, turbulence, and barometric pressure.

In October 2015, Bryson Koehler, executive vice president and CIO of The Weather Company, described the challenge: “At The Weather Company, we manage 20 terabytes of new data a day, including real-time forecasting data from over 130,000 sources. The sheer volume of time series data requires databases that can efficiently and reliably store and query time series data. Riak TS delivers on this need and allows us to perform the associated queries and transactions on time series data while maintaining high availability and scale.”

Another good example is Intellicore, which decided to build its Sports Data Platform on a time series database. The platform provides live value-added analysis (called second screen) to spectators and broadcast audiences during sporting events such as the recent Formula E electric car championship. Intellicore acquired the live telemetry data from the Formula E racing cars, which generated tens of thousands of events per second, and, thanks to Riak TS, redistributed that data, normalised and analysed in real time, to millions of live online viewers.

This is only possible because these IoT applications are highly distributed, like Riak TS itself. The Riak TS database is masterless and scales with just a few lines of code. As such, it provides fault-tolerant storage for the relentless stream of incoming data. If a server dies, the data are also stored elsewhere, and another node will remain responsive to queries.
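The idea of masterless, replicated storage can be illustrated with a toy hash ring. The sketch below is a generic illustration of the technique, not Riak TS’s actual implementation or API: the `Ring` class, the node names and the replication factor of 3 are all assumptions. Each key is stored on several nodes, so when one node fails the remaining replicas can still answer reads.

```python
import hashlib
from bisect import bisect_right


class Ring:
    """Toy masterless cluster: no coordinator, any replica can serve a key."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.alive = set(nodes)
        # Place each node on a hash ring, sorted by hash position.
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def preference_list(self, key: str):
        """The distinct nodes found walking clockwise from the key's position."""
        start = bisect_right(self._positions, self._hash(key))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]

    def fail(self, node: str) -> None:
        """Mark a node as down; its data still lives on the other replicas."""
        self.alive.discard(node)

    def readable_from(self, key: str):
        """Replicas that can still answer a read for this key."""
        return [n for n in self.preference_list(key) if n in self.alive]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
    key = "pump-1/2016-06-01T12:00"
    owners = ring.preference_list(key)
    ring.fail(owners[0])  # simulate a server dying
    print("still readable from:", ring.readable_from(key))
```

Because every node can compute the same preference list from the key alone, no master is needed to route requests, and adding a node only changes ownership for the keys adjacent to it on the ring.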

I appreciate that at this moment you might be saying “but that is not what it is like for most companies.” Yet as businesses wake up to the possibilities, more and more organisations are starting to leverage IoT data for insight and competitive advantage, and these sorts of use cases are quickly becoming mainstream.


Designing a distributed IoT application on a time series database that scales efficiently means the cloud is the limit. If you are planning to launch a distributed application, you can now develop and launch into production for free using open source Riak TS, launched in April this year.
