IoT – Scalability and Availability

by Uwe Meding

The Internet of Things (IoT) is a commercial reality. It is already changing the way that products are manufactured and sold, and the way we interact with them.

Every time a consumer interacts with your product, they will expect a swift, seamless experience. They don’t care if ten million other products are being operated concurrently; they only care that when they open their garage door via a smartphone, the response is instant, 24×7. Failure to deliver that experience, whether due to high platform latency or an inability to scale your platform to handle the load, is not only unacceptable to consumers, but also highly damaging to the reputation of both your product and your brand.

To handle the challenge of this aggregated demand, it is important that you select a platform that is built with scalability, performance and availability in mind.

Some IoT platforms on the market today are made to handle the hundreds and perhaps thousands of smart products typically involved in a pilot project, but are ill-equipped to deal with a full commercial roll-out, let alone the enormous growth that the market anticipates (over 50 billion connected devices by 2020).

High Speed, Low Latency

The cloud delivers scale, offering both elasticity in terms of infrastructure and significant cost savings when compared to on-premises computation. But the cloud itself is not enough. Because there are different architectural approaches to scaling in the cloud, it is vital to select a platform that responds to demand with the highest speed and lowest latency possible.

For example, most IoT platforms can connect a single consumer product, such as a washing machine that sends one message every second over a standard IoT messaging protocol such as HTTP, MQTT or CoAP. The real challenge lies in handling the considerable load increase that occurs when operating 1 million connected washing machines, without any noticeable latency or delay.
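To put a number on that jump, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only: the function name `fleet_load` and the 200-byte payload size are assumptions made for the example, not figures from any particular platform.

```python
# Rough load estimate for a fleet of connected devices.
# The payload size is an assumed value chosen for illustration.

def fleet_load(devices: int, msgs_per_device_per_sec: float, payload_bytes: int) -> dict:
    """Return the aggregate message rate and ingress bandwidth for a fleet."""
    msgs_per_sec = devices * msgs_per_device_per_sec
    bytes_per_sec = msgs_per_sec * payload_bytes
    return {
        "msgs_per_sec": msgs_per_sec,
        "msgs_per_day": msgs_per_sec * 86_400,            # seconds in a day
        "megabits_per_sec": bytes_per_sec * 8 / 1_000_000,
    }

pilot = fleet_load(devices=1_000, msgs_per_device_per_sec=1, payload_bytes=200)
rollout = fleet_load(devices=1_000_000, msgs_per_device_per_sec=1, payload_bytes=200)

for name, load in (("pilot", pilot), ("rollout", rollout)):
    print(f"{name}: {load['msgs_per_sec']:,} msg/s, "
          f"{load['msgs_per_day']:,} msg/day, "
          f"{load['megabits_per_sec']:,.1f} Mbit/s")
```

Under these assumptions, a pilot of a thousand machines produces a thousand messages per second, while the full roll-out produces a million per second, or more than 86 billion messages per day, before protocol overhead is counted.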

Latency is a problem for any product or service involved in the Internet of Things. The IoT is affecting more and more businesses and vastly changing the way we view, receive, access, and use data. In a 2017 business study, for instance, two-thirds of companies reported that they were using the IoT in their operations.

The problem is that the connected things themselves (sensors, gadgets, controls and so on) depend on the responsiveness of a system or network to work effectively. High latency means delayed responsiveness, and with delayed responsiveness the things cannot function at full capacity, or even at the level they need to.

Guaranteed Availability

With 24×7 consumer demand on your smart products, your IoT platform must also offer superb availability. To meet a service level agreement (SLA) of at least 99.95% availability, your chosen platform will need to offer auto-failover, auto-scaling and load balancing by default, as well as robust performance and security monitoring.
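It helps to translate an availability percentage into the downtime it actually permits. The following is a minimal sketch of that arithmetic; the function name `downtime_budget_minutes` and the 30-day and 365-day periods are assumptions chosen for illustration, since a real SLA defines its own measurement period.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Maximum downtime, in minutes, that an availability target permits over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

# Compare a few common targets over an assumed 30-day month and 365-day year.
for target in (99.5, 99.95, 99.999):
    per_month = downtime_budget_minutes(target, 30)
    per_year = downtime_budget_minutes(target, 365)
    print(f"{target}%: {per_month:.1f} min per 30 days, {per_year:.1f} min per year")
```

At 99.95%, the budget works out to about 21.6 minutes in a 30-day month, which leaves little room for manual recovery and is one reason automatic failover matters.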

The impact of scalability and availability on a consumer product IoT platform is so far-reaching that a post-launch, piecemeal approach to solving scale is a high-risk proposition. Be wary of promises to address core scale technology issues in ‘the near future’: the success of your smart products line should not depend on a vendor who sees you as a guinea pig.

What Kind of Availability Metrics Should You Expect?

The types of SLA metrics required will of course depend on the services being provided. Many items can be monitored as part of an SLA, but the scheme should be kept as simple as possible to avoid confusion and excessive cost on either side. In choosing metrics, examine your operation and decide what is most important. The more complex the monitoring (and associated remedy) scheme, the less likely it is to be effective, since no one will have time to properly analyze the data. When in doubt, opt for ease of collection of metric data; automated systems are best, since it is unlikely that costly manual collection of metrics will be reliable.

Depending on the service, the types of metric to monitor may include:

Service availability: the amount of time the service is available for use. This may be measured by time slot, with, for example, 99.5 percent availability required between the hours of 8 am and 6 pm, and more or less availability specified during other times. E-commerce operations typically have extremely aggressive SLAs at all times; 99.999 percent uptime is not an uncommon requirement for a site that generates millions of dollars an hour.

Defect rates: counts or percentages of errors in major deliverables. Production failures such as incomplete backups and restores, coding errors/rework, and missed deadlines may be included in this category.

Technical quality: in outsourced application development, measurement of technical quality by commercial analysis tools that examine factors such as program size and coding defects.

Security: application and network security breaches can be costly. Measuring controllable security measures such as anti-virus updates and patching is key in proving all reasonable preventive measures were taken, in the event of an incident.

Business results: increasingly, (industrial) IoT customers want to incorporate business process metrics into their SLAs. Using existing key performance indicators is typically the best approach as long as the vendor’s contribution to those KPIs can be calculated.
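Of the metrics above, service availability by time slot is the easiest to automate. The sketch below shows one way it could be computed from an outage log. The log format (start and end times in minutes since midnight, for a single day), the sample outages, and the helper names are all hypothetical, and outages are assumed not to overlap one another.

```python
def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the intersection of two intervals, or 0 if they are disjoint."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def slot_availability(outages, slot_start: int, slot_end: int) -> float:
    """Percentage of the slot during which the service was up.

    All times are minutes since midnight; outages must not overlap each other.
    """
    slot_len = slot_end - slot_start
    down = sum(overlap(start, end, slot_start, slot_end) for start, end in outages)
    return 100 * (slot_len - down) / slot_len

# Hypothetical outage log for one day: 07:50-08:02 and 14:00-14:01.
outages = [(7 * 60 + 50, 8 * 60 + 2), (14 * 60, 14 * 60 + 1)]

# Business-hours slot from the example above: 8 am to 6 pm, 99.5 percent target.
business = slot_availability(outages, 8 * 60, 18 * 60)
meets_target = business >= 99.5
print(f"business hours: {business:.2f}% available, meets target: {meets_target}")
```

Only the part of an outage that falls inside the slot counts against it, so the first outage here costs two minutes of business-hours availability rather than twelve.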