
Time Series Insights: A real case capacity analysis

Introduction

I’ve been blogging about Time Series Insights (TSI) and I’ll continue to do so. Let me first run through a quick introduction to this Azure service.

Time Series Insights (TSI) is a fully-fledged Azure service, specially meant for IoT scenarios. It includes storage (so it’s a database), visualization (a ready-to-use dashboard), and it’s near real time. It’s an end-to-end solution that empowers you to analyze data from storage to analytics, offering query capabilities together with a powerful and flexible user interface.

Now, enough of these marketing sentences (true as they are)!

A few enterprises and system integrators join efforts to build customized solutions for similar scenarios (e.g. predictive maintenance). They typically use Redis, Hadoop HDFS, InfluxDB, Elasticsearch or other storage/database technologies (often specialized per industry). This also requires data cleansing and standardization, besides supporting time series data streams. Now the best part: TSI automatically infers the schema of your incoming data, which means it requires no upfront data preparation. It also supports querying data over assets and time frames, with up to a 400-day retention period.

Sweet! This is a tool that is just perfect for IoT solutions, specifically for tackling large-scale, historian-like scenarios.

Capacity

As with any IoT project, especially large-scale ones, planning is the keyword. With TSI, capacity planning should be done early and based on your expected data ingress rate.

This means that understanding data ingress and retention is very important. Let’s dive into it:

Capacity \ SKU | S1 | S2
Storage per unit (1) | 30 GB or 30 million events (2) | 300 GB or 300 million events (2)
Daily ingress per unit (1)(3) | 1 GB or 1 million events (2) | 10 GB or 10 million events (2)
Maximum retention (4)(5) | 13 months | 13 months
Maximum number of units (6) | 10 | 10

(1) Ingress and total storage are measured by the number of events or data size, whichever comes first.
(2) An event is a single unit of data with a timestamp. For billing purposes, events are counted in 1-KB blocks. For example, a 0.8-KB actual event is billed as one event, while a 2.6-KB event is billed as three events. The maximum size of an actual event is 32 KB.
(3) Ingress is measured per minute. S1 can ingress up to 720 events/minute/unit and S2 up to 7,200 events/minute/unit.
(4) Data is retained in Time Series Insights based on the selected retention days or the maximum limits.
(5) Retention is configurable in the Azure portal. The longest allowable retention period is a rolling year of 12 months + 1 month, defined as 400 days.
(6) An environment can be scaled up to 10 times by adding more units.
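To make the footnotes concrete, here is a minimal Python sketch (my own illustration, with made-up helper names, not an official formula) of the 1-KB billing blocks and the approximate per-unit daily ingress limits:

import math

# Footnote (2): events are billed in 1-KB blocks, capped at 32 KB per actual event.
def billed_events(event_size_kb: float) -> int:
    return math.ceil(min(event_size_kb, 32.0))

print(billed_events(0.8))   # 1 billed event
print(billed_events(2.6))   # 3 billed events

# Approximate per-unit daily ingress limits from the table above.
INGRESS_PER_UNIT_PER_DAY = {"S1": 1_000_000, "S2": 10_000_000}

def units_needed(sku: str, expected_events_per_day: int) -> int:
    return math.ceil(expected_events_per_day / INGRESS_PER_UNIT_PER_DAY[sku])

print(units_needed("S1", 5_000_000))  # 5 S1 units for ~5M events/day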

After configuring your event source (IoT Hub or Event Hub), TSI will always start ingesting from the oldest events in the event source (FIFO). TSI can ingest approximately 1 million events per day for every S1 unit provisioned, and 10 million events per day for every S2 unit provisioned.

If you plan to upload historical data to TSI, it’s useful to increase the number of units provisioned for a brief period of time to allow TSI to ingest this historical data. The supported event sources can store data for up to 7 days.

Remember! TSI will always start ingress from the oldest event in the event source.

When the amount of incoming data exceeds your environment’s configuration, you may experience latency or throttling in TSI.

Throttling

Let’s start with a simple example: if you have five million events in an event source when you connect it to a single-unit S1 TSI environment, TSI will read approximately one million events per day. At first glance, this might look as though TSI is experiencing 5 days of latency. In this scenario, the TSI environment is being throttled.

If you have old events in your event source, you can approach it in one of two ways:

  • Change your event source’s retention limits so that older events are removed from the event source before TSI ingests them;
  • Provision a larger environment size (in terms of number of units) to increase the throughput of old events.

Using the example above, if you increased that same S1 environment to five units for one day, the environment should catch up to now within a day. If your steady-state event production is 1M or fewer events per day, you can then reduce the environment’s capacity back down to one unit after it has caught up.
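The same back-of-the-envelope reasoning, as a small Python sketch (again just an illustration, assuming the approximate 1M events/day per S1 unit figure):

S1_EVENTS_PER_UNIT_PER_DAY = 1_000_000  # approximate ingress per S1 unit

def days_to_catch_up(backlog_events: int, steady_state_per_day: int, units: int) -> float:
    """Rough estimate of how long an S1 environment needs to drain a backlog."""
    capacity = units * S1_EVENTS_PER_UNIT_PER_DAY
    spare = capacity - steady_state_per_day
    if spare <= 0:
        return float("inf")  # the environment can never catch up at this size
    return backlog_events / spare

# 5M old events and ~0 new events/day: 1 unit takes ~5 days, 5 units about a day.
print(days_to_catch_up(5_000_000, 0, 1))  # ~5.0
print(days_to_catch_up(5_000_000, 0, 5))  # ~1.0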

The throttling limit is enforced based on the environment’s SKU type and capacity. All event sources in the environment share this capacity. If your IoT Hub or Event Hub event source is pushing data beyond the enforced limits, you will see throttling and a lag.

The best way to identify and monitor throttling is through the ingress metrics in both TSI and your event source. If you see a value for Ingress Received Message Time Lag or Ingress Received Message Count Lag, there’s a throttling problem. TSI’s environment metrics include:

TSI Metric | Description
Ingress Received Bytes | Count of raw bytes read from event sources. The raw count usually includes the property name and value.
Ingress Received Invalid Messages | Count of invalid messages read from all Azure Event Hubs or Azure IoT Hub event sources.
Ingress Received Messages | Count of messages read from all Event Hubs or IoT Hubs event sources.
Ingress Stored Bytes | Total size of events stored and available for query. Size is computed only on the property value.
Ingress Stored Events | Count of flattened events stored and available for query.
Ingress Received Message Time Lag | Difference between the time a message is enqueued in the event source and the time it is processed in ingress.
Ingress Received Message Count Lag | Difference between the sequence number of the last enqueued message in the event source partition and the sequence number of the message being processed in ingress.
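If you want to pull these metrics programmatically instead of eyeballing the portal, something along these lines should work against the Azure Monitor REST API. Treat it as a sketch: the resource ID, the token and especially the exact metric names are assumptions on my side, so confirm the names shown in the Metrics blade of your environment.

import requests

subscription_id = "<subscription-id>"     # hypothetical placeholder values
resource_group = "<resource-group>"
environment = "<tsi-environment-name>"
token = "<bearer token for https://management.azure.com/>"

resource_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.TimeSeriesInsights/environments/{environment}"
)

# Azure Monitor metrics endpoint for any ARM resource
resp = requests.get(
    f"https://management.azure.com{resource_id}/providers/Microsoft.Insights/metrics",
    params={
        "api-version": "2018-01-01",
        # assumed metric names; adjust to what the portal exposes for TSI
        "metricnames": "IngressReceivedMessages,IngressReceivedMessagesCountLag",
        "interval": "PT1H",
    },
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for metric in resp.json()["value"]:
    print(metric["name"]["value"])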

Real Case scenario

To put the learnings we just went through to use, nothing beats a real case scenario. You can use it as an example when assessing or planning your own TSI environment.

The business scenario involves an enterprise collecting huge amounts of data from vessels navigating around the world. They have the usual requirements around storage and near-real-time querying, and together with maritime operators they need to analyze the data to detect and prevent anomalies.

Once Time Series Insights was provisioned, we noticed data was missing or not matching up correctly, namely precise vessel positions. We started the capacity analysis immediately:

The analysis covered a total of 15 days (360 hours). The provisioned TSI environment was an S1 SKU with 1 capacity unit, the minimum possible.

After looking at the metrics available in Azure for the TSI resource, the best approach was to write them down:

TOTAL (15 Days = 360 hours)

  • Ingress Received Messages: 9.13M = 9,130,000
  • Ingress Stored Events: 31.71M = 31,710,000
  • Ingress Received Bytes: 11.8 GB = 11,800,000 KB
  • Ingress Stored Bytes: 5 GB = 5,000,000 KB

AVERAGE /hour

  • Ingress Received Messages: 9,130,000 / 360 = 25,361.1 messages
  • Ingress Received Bytes: 11,800,000 / 360 = 32,777.8 KB
  • Average message size ≈ 1.29 KB
  • Ingress Stored Events: 31,710,000 / 360 = 88,083.3 events
  • Ingress Stored Bytes: 5,000,000 / 360 = 13,888.9 KB
  • Average event size ≈ 0.16 KB

Starting with storage, there was no issue, as storage usage was well below the maximum storage capacity (for 1x S1): 30 GB (or 30 million events) per month.

Ingress Received Bytes and Ingress Stored Bytes (sum)

Shifting the analysis to ingestion throughput, consider the following images with detailed information about message and event volumes:

Message / Events ingest metrics (sum)

Looking at the TSI thresholds for 1 capacity unit of the S1 SKU:

  • 1 million events per day
  • or 41,666 events per hour
  • or 694 events per minute
  • or 11.5 events per second
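A quick sanity check in Python, turning the measured totals into hourly rates and comparing them with the S1 per-unit threshold (illustrative arithmetic only, based on the numbers above):

import math

HOURS = 360                       # 15 days of metrics
stored_events_total = 31_710_000  # Ingress Stored Events
S1_EVENTS_PER_UNIT_PER_HOUR = 1_000_000 / 24  # ~41,666 events/hour/unit

stored_per_hour = stored_events_total / HOURS  # ~88,083 events/hour
units_required = math.ceil(stored_per_hour / S1_EVENTS_PER_UNIT_PER_HOUR)

print(f"{stored_per_hour:,.1f} events/hour stored")
print(f"at least {units_required} S1 units needed just for the observed stored rate")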

Always check your event source’s data volume and ingestion throughput. In this real scenario, Event Hub was the single event source for TSI. For a complete analysis, we must also evaluate the Event Hub’s volume of data and its retention capacity.

Event Hub Metric | Description
Incoming Messages | The number of events or messages sent to Event Hubs over a specified period.
Outgoing Messages | The number of events or messages retrieved from Event Hubs over a specified period.
Incoming Bytes | The number of bytes sent to the Azure Event Hubs service over a specified period.
Outgoing Bytes | The number of bytes retrieved from the Azure Event Hubs service over a specified period.

This particular Event Hub was provisioned with a Standard SKU and 16 throughput units, with auto-inflate enabled and an upper limit of 20 units.

The main goal is always to measure the volume of data being sent into TSI. Here are the metrics collected, also from the Azure portal, in the Event Hub’s Metrics tab:

TOTAL (15 Days = 360 hours)

  • Incoming Messages: 26.5M = 26,500,000
  • Incoming Bytes: 60.6 GB = 60,600,000 KB
  • Outgoing Messages: 343.96M = 343,960,000
  • Outgoing Bytes: 750.6 GB = 750,600,000 KB

AVERAGE /hour

  • Incoming Messages: 26,500,000 / 360 = 73,611.1 messages/hour ≈ 20.4 messages/sec
  • Incoming Bytes: 60,600,000 / 360 = 168,333.3 KB/hour ≈ 46.8 KB/sec
  • Outgoing Messages: 343,960,000 / 360 = 955,444.4 messages/hour ≈ 265.4 messages/sec
  • Outgoing Bytes: 750,600,000 / 360 = 2,085,000 KB/hour ≈ 579.2 KB/sec

Message metrics - Incoming Messages

Message metrics - Outgoing Messages

Message Metrics - Outgoing Bytes

Request metrics

Looking at the Event Hub thresholds for a Standard SKU:

  • Max event size: 256 KB
  • Throughput units: 16x (1,000 events/sec or 1 MB/sec ingress and 2,000 events/sec or 2 MB/sec egress per unit) = 16,000 events/sec or 16 MB/sec ingress and 32,000 events/sec or 32 MB/sec egress
  • Maximum message retention: 7 days (1 day included)
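And the same kind of arithmetic for the Event Hub side (again, just an illustration over the numbers above), confirming the event source itself was nowhere near its limits while TSI was the bottleneck:

# Observed averages from the metrics above
incoming_msgs_per_sec = 20.4
outgoing_msgs_per_sec = 265.4      # consumers reading from the Event Hub
throughput_units = 16

# Per-TU limits for a Standard namespace, as listed above
ingress_events_per_sec = throughput_units * 1_000   # 16,000 events/sec
egress_events_per_sec = throughput_units * 2_000    # 32,000 events/sec

print(f"ingress load: {incoming_msgs_per_sec / ingress_events_per_sec:.2%} of capacity")
print(f"egress load:  {outgoing_msgs_per_sec / egress_events_per_sec:.2%} of capacity")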

Conclusions

The perceived latency in TSI’s data ingestion was due to the high data throughput and the existence of historical data in the event source when TSI was provisioned. This caused throttling and an actual ingestion throughput of ~2.5M distinct data points when ~10M were expected;

For the expected data ingress rate, the recommended TSI capacity would have been 2x capacity units of the S2 SKU. Moreover, because there was historical data to ingest, the best decision would have been to configure TSI with that 2x S2 capacity from the start, while the backlog was ingested;

Note: as of now, you cannot dynamically change the provisioned SKU in TSI. If you provisioned an S1 SKU and require an S2 SKU, a new TSI environment must be created and data ingestion will begin again (from the oldest event in your event source). Nevertheless, you can dynamically change a SKU’s capacity from 1 to 10 units (max). We expect this option to arrive in the future, as it would bring great flexibility to customers.

Exploring Time Series Insights queries with Postman

As with any other service in Microsoft Azure, there are RESTful APIs to interact with. Time Series Insights (TSI) is no exception.

One of the best ways to explore the Azure REST APIs is to use Postman. Postman has been maturing for the past few years and is nowadays widely used to build modern software for an API-first world. Beyond its sleek UI, it allows you to publish, monitor, document, test, and, above all, design and mock APIs.

One of the biggest blockers when using the Azure REST APIs is authentication, which requires a Bearer Token Authorization header. Read the documentation, as it is extremely relevant when working with any service in Microsoft Azure. However, it doesn’t explain how to get started quick-and-dirty!

Coming back to TSI, when working on IoT projects we follow the IoT reference architecture and implement specific scenarios according to the requirements. One of the usual requirements is providing an always accessible UI for analysis purposes, especially oriented to the operator’s daily needs. TSI offers, out of the box, a powerful explorer: a UI backed by a managed visualization service for IoT-scale time series data in the cloud.

Time Series Insights explorer

Yes, it’s the perfect tool for factory or power plant operators and advanced users that require granular, free-text query and point-and-click exploration.

Beyond this, TSI exposes REST Query APIs, enabling you to build applications that use time series data. And this fact makes TSI even more flexible for your needs!

Querying TSI will be one of the first things you’ll try before getting into more advanced topics – such as capacity, managing input data throttling, scaling the environment or even building applications with TSI data.

To use Postman with the Azure REST APIs, let’s start with authentication. By creating an Azure Active Directory Service Principal and using Postman to generate a Bearer Token, we’ll have things ready to start calling the TSI query APIs.

Azure Setup

Generally speaking, authorization in Azure is implemented with Service Principal and application objects and their relationships. Here we’ll configure a default Service Principal. For production scenarios/applications, you should constrain it to specific areas of your Azure resources. I’ll assume you have the Azure CLI installed and ready to run on your Windows/Mac/Linux box. If not, check it here.

First, authenticate on your Azure subscription:

az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXXXXXXXX to authenticate.

Just do as you’re told. Head to http://aka.ms/devicelogin and enter the code shown in your shell.

If you’re like me, you’ll have access to several Azure subscriptions, so make sure you explicitly pick one:

az account set --subscription "subscription name or id"

Next, we’ll create the default Service Principal:

az ad sp create-for-rbac -n "your service principal name"

I always copy the output to a temp location, because we’ll need it later.

For additional information on Azure CLI commands related to Service Principal, just take a look here.

Get the Bearer Token with Postman

I’ll assume you have Postman installed. If not, just check here. For every Azure REST API call, your client code must authenticate with valid credentials. The result is an access token, which represents proof of the authentication; it is sent to the Azure service in the HTTP Authorization header of any subsequent REST API requests.

Explanation:
In accordance with the OAuth2 Authorization Framework, the Azure platform makes available a platform- and language-neutral OAuth2 service endpoint (the OAuth2 /token endpoint) that you use to authenticate your client and acquire an access token. Depending on how you use it, you’ll be following one of the OAuth2 authorization grant flows.

Now close Postman (yes, make sure it is closed), and click this button:

Run in Postman

You’ll notice your browser pops up with the following screen:

Whether you use Postman in your browser or in the app, make your pick!

Once Postman opens, there will be a new collection called TSI Azure REST. It includes a set of calls that will help you tackle authorization for Azure REST API calls. Let’s take a closer look and get familiar with them:

The Get AAD Token request will POST to https://login.microsoftonline.com/oauth2/token with our Service Principal settings; in the “Tests” tab there is a script that sets a Postman global variable called azure_bearerToken with the access_token from the response.

The TSI ENVIRONMENTS request will GET https://api.timeseries.azure.com/environments?api-version=2016-12-12 with an Authorization header set to the Bearer Token collected by the Get AAD Token call.

Like TSI ENVIRONMENTS, the following requests also target the TSI REST API: TSI AVAILABILITY, TSI METADATA, TSI QUERY AGGREGATES and TSI QUERY EVENTS. Their purpose is to let you quickly query your data, adapting the queries to your own data. At the end of this post, I leave a brief explanation of each.
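If you prefer scripting these calls outside Postman, here is a minimal Python sketch of the same flow using the requests library. The tenant-scoped /oauth2/token endpoint and the resource URI https://api.timeseries.azure.com/ reflect my own setup, so treat them as assumptions and plug in the values from your Service Principal output:

import requests

TENANT_ID = "<tenant>"        # from the az ad sp create-for-rbac output
CLIENT_ID = "<appId>"
CLIENT_SECRET = "<password>"

# 1. Get an AAD token for the TSI resource (client credentials grant)
token_resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "resource": "https://api.timeseries.azure.com/",
    },
)
token_resp.raise_for_status()
bearer_token = token_resp.json()["access_token"]

# 2. List the TSI environments visible to this principal
env_resp = requests.get(
    "https://api.timeseries.azure.com/environments",
    params={"api-version": "2016-12-12"},
    headers={"Authorization": f"Bearer {bearer_token}"},
)
env_resp.raise_for_status()
print(env_resp.json())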

Because we will be executing several requests against the TSI REST API, let’s use Postman environments. By clicking Run in Postman, an environment was already created for you. The next step is to set your Service Principal settings in that environment so they can be used by the requests. Click the gear icon in the upper right-hand corner and select Manage Environments:

Click on the TSI Azure REST environment and you will see all the required settings:

Now, go back to the Service Principal output you saved earlier and set the values accordingly. Some of it may be confusing, but no worries: put appId into clientId and password into clientSecret. The rest, as the names imply.

Close all dialogs and, back in the main Postman screen, check that the TSI Azure REST environment is selected in the Environment dropdown in the upper right-hand corner.

OK, all configuration has been done, and we’re ready to execute the requests!

Let’s start by executing the Get AAD Token request. This will obtain your Bearer Token and store it in a Postman global variable. Open the Get AAD Token request and click the Send button.

The expected output should be similar to:

This request has a ‘Tests’ script set up, which is executed immediately after the request. The script stores the access_token property from the response in the Postman global variable named azure_bearerToken.

TSI Data Access Policies

To access data in TSI, it’s necessary to grant a data access policy to Azure Active Directory principals (users or apps). This will allow those principals to issue data queries, manipulate reference data in the environment, and share saved queries and perspectives.

In this case, we will grant data access to the Service Principal we created earlier. In the Azure portal, add a new Data Access Policy to TSI. When asked for the user, provide the appId (taken from the notes you saved when creating the Service Principal).

Querying TSI

Now that we have the access token and the correct TSI data access policies in place, we can call any TSI REST API endpoint. To start, we’ll use the TSI ENVIRONMENTS request.

Open the TSI ENVIRONMENTS request and click the Send button.

You will see the following output:

Again, this request has a ‘Tests’ script set up, which creates a new Postman global variable called tsi_environmentFqdn with the first environment FQDN returned.

That’s all there is to it. Now you can explore all the other TSI Azure REST APIs, and use this same method to generate the required Bearer Token Authorization header.

Feel free to provide feedback or ping me if you hit a blocker.

Appendix

Here, I’ll summarize the rest of the TSI queries included in the shared Postman collection:

TSI AVAILABILITY: this request queries TSI for the distribution of event counts over the event timestamp ($ts). This means you’ll be able to understand the time range of the data ingested into TSI, as well as the volume of events in a 1-hour distribution.

Also notice that you will be hitting your TSI environment FQDN through the tsi_environmentFqdn variable.

The ‘Tests’ tab will execute a script creating the global Postman variables named tsi_range_from and tsi_range_to. These variables will be used in subsequent queries.
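Outside Postman, the equivalent availability call is a single GET against your environment FQDN. A minimal Python sketch (the /availability path and api-version mirror what the collection uses; confirm them against the collection before relying on this):

import requests

environment_fqdn = "<tsi_environmentFqdn>"  # e.g. returned by the environments call
bearer_token = "<azure_bearerToken>"

resp = requests.get(
    f"https://{environment_fqdn}/availability",
    params={"api-version": "2016-12-12"},
    headers={"Authorization": f"Bearer {bearer_token}"},
)
resp.raise_for_status()
# The response describes the range of event timestamps ($ts) and their distribution
print(resp.json())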

TSI METADATA: a request to collect your TSI environment metadata for a given search span. The metadata is returned as a set of property references.

Notice how we’re using tsi_range_from and tsi_range_to in the request body, to define the search span. Feel free to play with your search ranges.

TSI QUERY AGGREGATES: this request groups events by a given property, optionally measuring the values of other properties.

NOTE: The query used in the request body is an example query taken from the documentation. You should change the query according to your data schema.

TSI QUERY EVENTS: this query will probably be the one you’ll use the most, especially while you’re still figuring out your query patterns. It returns a list of raw events matching the search span and predicate.

NOTE: The query used in the request body is an example query taken from the documentation. You should change the query according to your data schema.

Time Series Insights: A great IoT building block for your solution

I’ve been working on a few real-world IoT projects with our customers. A few months back, I didn’t know anything about Time Series Insights (TSI), apart from the name.

So, I’ve decided to write a few blog posts sharing my key learnings and the experiences I had in some of these projects.

Of course, I had to start with an introduction to this Azure service. However, I’ll do it in a slightly different way: I’ll give my opinion about its features and some interesting facts.

Let’s kick off with it!

Time Series Insights (TSI) is a fully-fledged Azure service, specially meant for IoT scenarios. It includes storage (so it’s a database), visualization (a ready-to-use dashboard), and it’s near real time. It’s an end-to-end solution that empowers you to analyze data from storage to analytics, offering query capabilities together with a powerful and flexible user interface.

The final fact is that you can integrate TSI data into your applications, products or solutions by hitting its REST query API.

Now, enough of these marketing sentences (true as they are)!

A few enterprises and system integrators join efforts to build customized solutions for similar scenarios (e.g. predictive maintenance). They typically use Apache Kafka / HDFS, InfluxDB, Redis or other storage/database technologies. This also requires data cleansing and standardization, besides supporting time series data streams. Now the best part: TSI automatically infers the schema of your incoming data, which means it requires no upfront data preparation. It also supports querying data over assets and time frames, with up to a 400-day retention period.

Sweet! This is a tool that is just perfect for IoT solutions. Now let’s dive into some details:

Sources, Ingestion, Data and Storage

TSI defines the concept of an Environment, which is no more than a logical grouping of events read from event brokers. An event source is a connection to an event broker from which Time Series Insights reads and ingests events into the environment. Currently, it supports Event Hub and IoT Hub, services that also have a real-time nature. Nevertheless, in the future we expect additional event sources to be able to feed TSI.

And remember that you can always combine, cross and transform data by leveraging Azure Data Lake, HDInsight, Spark, Power BI or similar open-source technologies.

Now turning to data, TSI is able to process data in JSON format, which makes our lives easier (think about Event Hub and IoT Hub). The supported JSON shapes vary from simple JSON to more complex shapes, like nested JSON containing multiple objects, arrays, or events carrying two JSON objects.
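As an illustration (a hypothetical vessel telemetry event of my own, not a schema TSI mandates), a nested payload like the one below is ingested without any upfront preparation, and TSI flattens the nested properties into individual columns during ingestion:

import json

# Hypothetical vessel telemetry event with a nested object and a timestamp
event = {
    "vesselId": "IMO-9074729",
    "timestamp": "2018-05-01T12:00:00Z",
    "position": {"latitude": 38.7169, "longitude": -9.1399},
    "speedKnots": 12.4,
}
print(json.dumps(event))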

You can also configure reference data in your TSI environment to join with your incoming data upon ingestion, augmenting the values according to your needs.

The data ingestion process will make sure your data is stored for a configurable retention period. There are no cores or storage to configure; it’s all managed by TSI itself.

The data stored by TSI actually sits on Azure Kusto, a technology created in-house out of necessity. It has been used for quite some time to log every single event on Azure (billions of events a day), and it is now in public preview.

The relevant point is that ingestion, retention and capacity are all related, and they are important concepts to understand deeply. I’ll blog about this soon in more detail.

For now, take note of the limits that each SKU offers:

SKU | Event count per month, per unit | Event size per month, per unit | Event count per minute, per unit | Size per minute, per unit
S1 | 30 million | 30 GB | 700 | 700 KB
S2 | 300 million | 300 GB | 7,000 | 7,000 KB

And remember, each environment can be scaled up to 10 times by adding more units. One other important fact is that, currently, you cannot scale an S1 into an S2 SKU.

When TSI went GA, the fantastic 400 days of retention were announced, and are today available to you.

Query, Data Visualization and APIs

As the name implies, one of TSI’s core goals is to provide data scientists, process engineers and asset operators with query capabilities over near real-time data, allowing them to focus on data analysis, decision making and KPI tracking. With an intuitive user interface (the TSI Explorer), users can construct queries without having to know query semantics.

The TSI Explorer presents data through a basic line-series trend view or through a heat map control (useful for spotting deviations). Understanding asset behavior and performing root cause analysis becomes a natural process, as you can drill down or zoom into the data, or specify a time segment for your analysis. In fact, data can be grouped, filtered and explored in any way, without having to think about indexing or waiting for an index to be updated.

Generated visualizations can be persisted across sessions as user queries so that common analytics scenarios can be re-used over time.

Besides the out-of-the-box TSI Explorer, enterprises may want to create custom applications while leveraging the storage and query capabilities offered by TSI. For that there is a REST API focused on querying data and aggregations, getting information about the TSI environments, and the availability of data in different time segments. This can be useful when we want to report on data and surface it on a client-specific dashboard.

Apart from customers’ and partners’ products built using TSI’s REST APIs, here are a few IoT-related solutions leveraging these same APIs:

  • Azure IoT Connected Factory - connect, monitor and control industrial devices for insights using OPC UA to drive operational productivity and profitability;

  • Microsoft IoT Central - A new software-as-a-service (SaaS) solution that reduces the complexity of solution management and cloud development with easy solution configuration;

Analytics, Time Series and Near Real Time

In time series, there are several concepts used by data scientists, academics and professionals that are interesting to understand. For example, a time series represents a series of data points listed in time order.

A Unit of observation corresponds to the unit described by the data that one analyzes. For example, in a study of the demand for money, the unit of observation might be chosen as the individual, with different observations (data points) for a given point in time differing as to which individual they refer to; or the unit of observation might be the country, with different observations differing only in regard to the country they refer to.

As already mentioned, a data point is a set of one or more measurements on a single member of a unit of observation at a point in time. For example, in a study of the determinants of money demand with the unit of observation being the individual, a data point might be the values of income, wealth, the age of the individual or the number of dependents. In TSI’s data model, a data point is a synonym for an event.

‘Tag’ is the term used by operational historians with the semantics similar to the notion of time series as defined above. A typical tag is a series of timestamped measurements from a single instrument attached to a unit of observation, for example, a tag representing the flow rate of a pipe, a tag representing the valve state, etc. If an instrument emits multiple attributes (e.g. process value, setpoint, alarm state, upper limit), each of these attributes would normally produce their own tags. Note that a tag represents a single-variable time series, unlike a more generic case where time series data points may carry multiple kinds of measurements (income, wealth, age, etc.) inside each data point.

These concepts have been used for years in time series solutions. However, combining large amounts of data points (storage) with near real-time query capabilities is not that common. For Azure Time Series Insights, near real time means that your data will be available for querying shortly after ingestion occurs. This happens in the shortest period possible, not exceeding 60 seconds.

Infrastructure as Code

One final comment about TSI management in your Azure subscription: TSI exposes an ARM REST API. This means you can take advantage of Azure Resource Manager (ARM) templates to define the infrastructure and configuration of a TSI resource.

As a big DevOps fan, I consider this mandatory in all my projects. It allows you to easily complement your Continuous Integration and Continuous Delivery pipelines.
Check the sample in the Azure Quickstart Templates GitHub repo to get started.

And that’s it, I hope this comes in handy for your current challenges! For the next blog post, I’ll get more technical on several topics.
Please provide feedback and suggestions on TSI topics to cover.

Also, have to thank Andrew Shannon (@AndrewBShannon) and TSI team (@MSFTAzureTSI) for all help and guidance!