Time Series Insights: A great IoT building block for your solution
I’ve been working on a few real-world IoT projects with our customers. A few months back, I didn’t know anything about Time Series Insights (TSI), apart from the name.
So, I’ve decided to write a few blog posts sharing my key learnings and the experiences I had on some of these projects.
Of course, I had to start with an introduction to this Azure service. However, I’ll do it in a slightly different way – I’ll provide my opinion about its features and some interesting facts.
Let’s kick off with it!
Time Series Insights (TSI) is a full-fledged Azure service designed specifically for IoT scenarios. It includes storage (so it’s a database) and visualization (a ready-to-use dashboard), and it’s near real time. It’s an end-to-end solution that takes you from storage to analytics, offering query capabilities together with a powerful and flexible user interface.
Finally, you can integrate TSI data into your own applications, products or solutions by calling the REST query API.
Now, enough of these marketing sentences (true as they are)!
Enterprises and system integrators often join efforts to build customized solutions for similar scenarios (e.g. predictive maintenance). They typically use Apache Kafka / HDFS, InfluxDB, Redis or other storage and database technologies. That approach also requires data cleansing and standardization, on top of supporting time series data streams. Now the best part: TSI automatically infers the schema of your incoming data, which means it requires no upfront data preparation. It also supports querying data across assets and time frames, with a retention period of up to 400 days.
Sweet! This is a tool that is just perfect for IoT solutions. Now let’s dive into some details:
TSI defines the concept of an environment, which is simply a logical grouping of events read from event brokers. An event source is a connection to an event broker from which Time Series Insights reads and ingests events into the environment. Currently, it supports Event Hubs and IoT Hub, services that also have a real-time nature. In the future, we expect additional event sources to be able to feed TSI.
And remember that you can always combine, cross-reference and transform data using Azure Data Lake, HDInsight, Spark, Power BI or similar open source technologies.
Turning to data, TSI processes data in JSON format, which makes our life easier (think about Event Hubs and IoT Hub). The supported JSON shapes range from simple JSON to more complex ones, such as nested JSON containing multiple objects, arrays, or events with two JSON objects.
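To make the nested shape more concrete, here is a minimal Python sketch of how a message carrying an array of objects could be pictured as several flat events. It is only an illustration, not how TSI is implemented: the function name `flatten_event`, the sample payload and the dotted naming of nested properties are all assumptions made for this example.

```python
from itertools import product
from typing import Any, Dict, List


def flatten_event(payload: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested objects into dotted names; arrays of objects fan out into rows.

    Hypothetical helper for illustration only (not a TSI API).
    """
    rows: List[Dict[str, Any]] = [{}]
    for key, value in payload.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and prefix its properties with the parent name.
            parts = flatten_event(value, prefix=f"{name}.")
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Non-empty array of objects: every element contributes its own row(s).
            parts = [row for item in value for row in flatten_event(item, prefix=f"{name}.")]
        else:
            # Scalars (and anything else) are kept as a single property.
            parts = [{name: value}]
        # Combine what we have so far with the new parts (cross product).
        rows = [{**left, **right} for left, right in product(rows, parts)]
    return rows


message = {
    "location": "WestUs",
    "events": [
        {"id": "device1", "data": {"type": "pressure", "value": 108.09}},
        {"id": "device2", "data": {"type": "vibration", "value": 217.09}},
    ],
}

rows = flatten_event(message)
for row in rows:
    print(row)
```

In this sketch the shared `location` property is repeated on each of the two resulting rows, so every flat event can be filtered or grouped on its own.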
You can also configure reference data in your TSI environment. It is joined with your incoming data upon ingestion, augmenting the values according to your needs.
The data ingestion process makes sure your data is stored for a configurable retention period. There are no cores or storage to configure; TSI manages that itself.
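Conceptually, the reference data join is a lookup by key at ingestion time. The Python sketch below shows that idea only; the helper `join_reference_data`, the key name `deviceId` and the sample rows are hypothetical, and the rule that event values win over reference values on a name clash is an assumption of this example, not documented TSI behavior.

```python
from typing import Any, Dict, Iterable, List


def join_reference_data(
    events: Iterable[Dict[str, Any]],
    reference_rows: Iterable[Dict[str, Any]],
    key: str,
) -> List[Dict[str, Any]]:
    """Augment each event with the reference row that shares the same key value."""
    lookup = {row[key]: row for row in reference_rows}
    enriched = []
    for event in events:
        extra = lookup.get(event.get(key), {})
        # Assumption: values already present on the event take precedence.
        enriched.append({**extra, **event})
    return enriched


reference = [
    {"deviceId": "pump-1", "site": "Lisbon", "model": "PX-200"},
    {"deviceId": "pump-2", "site": "Porto", "model": "PX-300"},
]
incoming = [
    {"deviceId": "pump-1", "flowRate": 12.5},
    {"deviceId": "pump-9", "flowRate": 3.1},  # no matching reference row
]

enriched = join_reference_data(incoming, reference, key="deviceId")
for event in enriched:
    print(event)
```

Events without a matching reference row simply pass through unchanged in this sketch.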
Under the hood, TSI stores your data using Azure Kusto, a technology created in-house out of necessity! It has been used for quite some time to log every single event on Azure (billions of events a day), and it is now in public preview.
The relevant information is that ingestion, retention and capacity are all related and are important concepts to deeply understand. I’ll blog about this soon in more detail.
For now, take note of the limits that each SKU offers:
|SKU||Event count per month, per unit||Event size per month, per unit||Event count per minute, per unit||Event size per minute, per unit|
|S1||30 million||30 GB||700||700 KB|
|S2||300 million||300 GB||7,000||7,000 KB|
And remember, each environment can be scaled to up to 10 times the capacity of a single unit by adding more units. One other important fact is that, currently, you cannot convert an S1 environment into an S2 SKU.
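As a quick sizing aid, the per-minute limits from the table can be turned into a small calculation. This is a rough sketch under stated assumptions: the names `SkuLimits` and `units_needed` are made up for this post, only the per-minute limits are considered (the monthly limits apply as well), and it does not model what TSI does when a limit is exceeded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuLimits:
    events_per_minute: int  # per unit
    kb_per_minute: int      # per unit


# Per-minute, per-unit limits taken from the table above.
SKU_LIMITS = {
    "S1": SkuLimits(events_per_minute=700, kb_per_minute=700),
    "S2": SkuLimits(events_per_minute=7_000, kb_per_minute=7_000),
}
MAX_UNITS = 10  # an environment can be scaled to at most 10 units


def units_needed(sku: str, events_per_minute: float, kb_per_minute: float) -> int:
    """Return how many units of a SKU cover the given per-minute ingress."""
    limits = SKU_LIMITS[sku]
    units = max(
        1,
        math.ceil(events_per_minute / limits.events_per_minute),
        math.ceil(kb_per_minute / limits.kb_per_minute),
    )
    if units > MAX_UNITS:
        raise ValueError(f"{sku} cannot cover this load within {MAX_UNITS} units")
    return units


s1_units = units_needed("S1", events_per_minute=1_500, kb_per_minute=900)
s2_units = units_needed("S2", events_per_minute=1_500, kb_per_minute=900)
print(s1_units, s2_units)
```

Since an S1 environment cannot be converted into an S2, it is worth running this kind of estimate before creating the environment.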
When TSI went GA, the fantastic 400 days of retention were announced, and are today available to you.
As the name implies, one of TSI’s core goals is to give data scientists, process engineers and asset operators query capabilities on near real-time data, allowing them to focus on data analysis, decision making and KPI tracking. Thanks to an intuitive user interface (TSI Explorer), users can construct queries without having to know query semantics.
TSI Explorer presents data through a basic line series trend chart or through a heat map control (useful for spotting deviations). Understanding asset behavior and performing root cause analysis becomes a natural process, as you can drill down, zoom into the data, or specify a time segment for your analysis. In fact, data can be grouped, filtered and explored in any way, without having to think about indexing or wait for an index to be updated.
Generated visualizations can be persisted across sessions as user queries so that common analytics scenarios can be re-used over time.
Besides the out-of-the-box TSI Explorer, enterprises may want to create custom applications while leveraging the storage and query capabilities offered by TSI. For that, there is a REST API focused on querying data and aggregations, getting information about TSI environments, and checking the availability of data in different time segments. This is useful when you want to pull reporting data and show it on a client-specific dashboard.
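To give a feel for what a custom application would send, here is a hedged Python sketch that only builds an aggregation request and does not send it. Everything specific in it is an assumption written from memory of the query API and should be checked against the official reference before use: the `/aggregates` path, the `api-version` value, and the body fields (`searchSpan`, `aggregates`, `dateHistogram`, `measures`). The environment host name, the token and the `temperature` property are placeholders.

```python
import json
import urllib.request

API_VERSION = "2016-12-12"  # assumption: verify against the current API reference


def build_aggregates_request(
    environment_fqdn: str,
    access_token: str,
    start: str,
    end: str,
    property_name: str,
    bucket: str = "1h",
) -> urllib.request.Request:
    """Build (but do not send) a request averaging one property per time bucket."""
    body = {
        # Time window to search, as ISO 8601 timestamps.
        "searchSpan": {"from": {"dateTime": start}, "to": {"dateTime": end}},
        "aggregates": [
            {
                # Group events into fixed-size time buckets on the event timestamp.
                "dimension": {
                    "dateHistogram": {
                        "input": {"builtInProperty": "$ts"},
                        "breaks": {"size": bucket},
                    }
                },
                # Compute the average of the chosen numeric property per bucket.
                "measures": [
                    {"avg": {"input": {"property": property_name, "type": "Double"}}}
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"https://{environment_fqdn}/aggregates?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_aggregates_request(
    environment_fqdn="contoso.env.timeseries.azure.com",  # placeholder host
    access_token="<token>",                               # placeholder token
    start="2018-01-01T00:00:00Z",
    end="2018-01-02T00:00:00Z",
    property_name="temperature",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; obtaining a valid Azure Active Directory token is left out of this sketch.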
Apart from customer and partner products built on TSI’s REST APIs, there are a few IoT-related solutions leveraging these same APIs:
Azure IoT Connected Factory - Connect, monitor and control industrial devices for insights using OPC UA to drive operational productivity and profitability.
Microsoft IoT Central - A new software-as-a-service (SaaS) solution that reduces the complexity of solution management and cloud development with easy solution configuration.
In time series work, there are several concepts used by data scientists, academics and professionals that are worth understanding. For example, a time series is a series of data points listed in time order.
A Unit of observation corresponds to the unit described by the data that one analyzes. For example, in a study of the demand for money, the unit of observation might be chosen as the individual, with different observations (data points) for a given point in time differing as to which individual they refer to; or the unit of observation might be the country, with different observations differing only in regard to the country they refer to.
As already mentioned, a data point is a set of one or more measurements on a single member of a unit of observation at a point in time. For example, in a study of the determinants of money demand with the unit of observation being the individual, a data point might be the values of income, wealth, age and number of dependents for one individual. In TSI’s data model, a data point is a synonym for an event.
‘Tag’ is the term used by operational historians, with semantics similar to the notion of time series defined above. A typical tag is a series of timestamped measurements from a single instrument attached to a unit of observation: for example, a tag representing the flow rate of a pipe, a tag representing a valve state, etc. If an instrument emits multiple attributes (e.g. process value, setpoint, alarm state, upper limit), each of these attributes would normally produce its own tag. Note that a tag represents a single-variable time series, unlike the more generic case where each data point may carry multiple kinds of measurements (income, wealth, age, etc.).
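The relationship between multi-measurement data points and single-variable tags can be sketched in a few lines of Python. This is purely illustrative: the function `to_tags`, the property names `deviceId` and `timestamp`, and the `device.attribute` naming of tags are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Series = List[Tuple[str, Any]]  # (timestamp, value) pairs


def to_tags(
    events: Iterable[Dict[str, Any]],
    unit_key: str = "deviceId",
    time_key: str = "timestamp",
) -> Dict[str, Series]:
    """Split multi-measurement events into one single-variable series per tag."""
    tags: Dict[str, Series] = defaultdict(list)
    for event in events:
        for name, value in event.items():
            if name in (unit_key, time_key):
                continue  # identifiers are not measurements
            tags[f"{event[unit_key]}.{name}"].append((event[time_key], value))
    for series in tags.values():
        # ISO 8601 UTC timestamps sort correctly as plain strings.
        series.sort(key=lambda point: point[0])
    return dict(tags)


events = [
    {"deviceId": "pump-1", "timestamp": "2018-01-01T00:01:00Z", "flowRate": 12.9, "valveOpen": True},
    {"deviceId": "pump-1", "timestamp": "2018-01-01T00:00:00Z", "flowRate": 12.5, "valveOpen": False},
]

tags = to_tags(events)
for tag_name, series in tags.items():
    print(tag_name, series)
```

Each event here is one data point with two measurements, and the result is two tags (`pump-1.flowRate` and `pump-1.valveOpen`), each ordered by time.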
These concepts have been used for years in time series solutions. However, combining large amounts of data points (storage) with near real-time query capabilities is not that common. For Azure Time Series Insights, near real-time means that your data will be available for querying shortly after ingestion occurs. This happens in the shortest period possible, not exceeding 60 seconds.
One final comment about TSI management in your Azure subscription – TSI exposes an ARM REST API. This means you can take advantage of Azure Resource Manager (ARM) templates to define the infrastructure and configuration of a TSI resource.
As a big DevOps fan, I consider this mandatory in all my projects. It will allow you to easily complement your Continuous Integration and Continuous Delivery pipelines.
Check the sample in the ARM Quickstart Templates GitHub repo to get started.
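For a rough idea of what such a template contains, the Python sketch below generates a minimal template document for a single TSI environment. Treat it as a starting point under stated assumptions rather than a verified template: the resource type, the `apiVersion` value and the `dataRetentionTime` property are written from memory and should be compared with the official sample, and the helper name `tsi_environment_template` and the environment name are made up for this post.

```python
import json
from typing import Any, Dict


def tsi_environment_template(
    name: str,
    sku: str = "S1",
    capacity: int = 1,
    retention_days: int = 30,
) -> Dict[str, Any]:
    """Return an ARM template (as a dict) describing one TSI environment."""
    if not 1 <= capacity <= 10:
        raise ValueError("capacity must be between 1 and 10 units")
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                # Assumption: resource type and API version, verify before deploying.
                "type": "Microsoft.TimeSeriesInsights/environments",
                "apiVersion": "2017-11-15",
                "name": name,
                "location": "[resourceGroup().location]",
                "sku": {"name": sku, "capacity": capacity},
                # Retention expressed as an ISO 8601 duration, e.g. P30D.
                "properties": {"dataRetentionTime": f"P{retention_days}D"},
            }
        ],
    }


template = tsi_environment_template("my-tsi-env", sku="S1", capacity=2, retention_days=400)
print(json.dumps(template, indent=2))
```

Writing the output to a file gives you something a release pipeline can deploy alongside the rest of your infrastructure.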
And that’s it. I hope this comes in handy for your current challenges! In the next blog post, I’ll get more technical on several topics.
Please share your feedback and suggestions on TSI topics to cover.