Month: August 2017

Azure IoT Stream Analytics step by step

IoT Hub and Stream Analytics are two very useful services provided by the Azure cloud, and here I showcase how we can use them for an IoT use case.
High-level steps
1.  Create an IoT Hub in Azure
2.  Create devices
To create the “IoT Hub”, we first need to log in to the Azure portal at https://portal.azure.com with a valid ID. (Free trials usually expire within one month.)
Once you log in, you will see the screen below. You can ignore the blacked-out lines, as those resources were created for a different purpose.
Click “New” and enter “IoT Hub”.
Use the “Create” button to move to the next blade.
Select the “Pricing” tier as per your need; note that only one IoT hub can be created using “F1 Free”.


Clicking the “Create” button starts the deployment of the new IoT Hub, and we can see its status at the top right.


Once the deployment succeeds, we can see the newly created resource on the “All resources” page.
Create devices in IoT Hub

Click the newly created “IoT Hub” on the “All resources” page and then navigate to Device Explorer.

I created five devices for this trial, and they look as shown below.

 

Create Stream Analytics job

Create input in Stream Analytics job

You can find the “Input” link under Job topology of the Stream Analytics job.

Create output in Stream Analytics job
You can find the “Output” link under Job topology of the Stream Analytics job.

To filter and save two different types of critical errors, I created two outputs in this Stream Analytics job, as shown below.

Now we need to send simulated data to the IoT hub devices; a Stream Analytics query will then be able to process it.
To send the data I used a sample Java program. In the Java application we need to supply the IoT hub's full name and the device's primary key.
(Download Sample java code)
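
As a rough illustration of how those two values fit together: Azure IoT device clients are commonly configured with a device connection string of the form HostName=...;DeviceId=...;SharedAccessKey=.... Whether the linked sample takes a connection string or the separate values is an assumption here, and the class name, hub name and key below are placeholders, not values from this setup.

```java
// Minimal sketch: combines the hub's full host name, the device ID and the
// device's primary key into a device connection string.
// DeviceConnection is a hypothetical helper, not part of any Azure SDK.
final class DeviceConnection {
    private DeviceConnection() {}

    static String connectionString(String hubHostName, String deviceId, String primaryKey) {
        if (hubHostName.isEmpty() || deviceId.isEmpty() || primaryKey.isEmpty()) {
            throw new IllegalArgumentException("host name, device ID and key are all required");
        }
        return "HostName=" + hubHostName
                + ";DeviceId=" + deviceId
                + ";SharedAccessKey=" + primaryKey;
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the ones shown in the portal.
        System.out.println(connectionString(
                "my-hub.azure-devices.net", "manudevice4", "<device-primary-key>"));
    }
}
```

Keeping the key out of source control (for example, reading it from an environment variable) is usually a safer choice than hard-coding it as done in this sketch.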

Using the Java application, I am sending the values below along with the device IDs.
When configuring the devices/gateways, we need to assign the device IDs according to a plan; usually each ID is unique within the system.
{"temperature":84,"heartbate":159,"bladeload":1267,"error_code":1,"deviceId":"manudevice4"}
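
A minimal sketch of how such messages could be generated is given below. It only builds the JSON text and does not send anything to IoT Hub. The class name, the value ranges, the device names manudevice1 to manudevice5 and the reading of error code 0 as "no error" are all assumptions made for illustration; the field names (including the spelling heartbate) follow the sample message above.

```java
import java.util.Locale;
import java.util.Random;

// Sketch of a telemetry generator. It produces message bodies only;
// sending them to IoT Hub is out of scope here.
final class TelemetrySimulator {
    private TelemetrySimulator() {}

    // Builds one message in the same field order as the sample message.
    // Assumes deviceId needs no JSON escaping (plain letters and digits).
    static String toJson(String deviceId, int temperature, int heartbate,
                         int bladeload, int errorCode) {
        return String.format(Locale.ROOT,
                "{\"temperature\":%d,\"heartbate\":%d,\"bladeload\":%d,"
                        + "\"error_code\":%d,\"deviceId\":\"%s\"}",
                temperature, heartbate, bladeload, errorCode, deviceId);
    }

    // Value ranges are illustrative assumptions, not real machine limits.
    static String randomReading(String deviceId, Random rng) {
        int temperature = 60 + rng.nextInt(40);    // 60..99
        int heartbate = 100 + rng.nextInt(100);    // 100..199
        int bladeload = 1000 + rng.nextInt(500);   // 1000..1499
        int errorCode = rng.nextInt(3);            // 0 (assumed "no error"), 1 or 2
        return toJson(deviceId, temperature, heartbate, bladeload, errorCode);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 1; i <= 5; i++) {
            System.out.println(randomReading("manudevice" + i, rng));
        }
    }
}
```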

With the data simulator in place, we can set the query for Stream Analytics. I used the simple query below to split the records into two data sets based on the error code.

——————-
-- First statement: records with error_code = 1 go to the first output
SELECT
    deviceId AS deviceid, heartbate AS heartbeat, temperature AS temperature,
    bladeload AS bladeload, error_code AS error_code, System.Timestamp AS ts
INTO
    manuoutputerror1devices
FROM
    manuinput
WHERE
    error_code = 1

-- Second statement: records with error_code = 2 go to the second output
SELECT
    deviceId AS deviceid, heartbate AS heartbeat, temperature AS temperature,
    bladeload AS bladeload, error_code AS error_code, System.Timestamp AS ts
INTO
    manuoutputerror2devices
FROM
    manuinput
WHERE
    error_code = 2
——————-
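
To reason about what the two statements above produce, the same routing can be sketched in plain Java. This is only a local illustration of the filter logic, not how Stream Analytics executes the query; the class and method names are hypothetical and the Reading type is trimmed to the two fields the filter needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Local illustration of the two WHERE filters used in the query.
final class ErrorRouting {
    static final class Reading {
        final String deviceId;
        final int errorCode;

        Reading(String deviceId, int errorCode) {
            this.deviceId = deviceId;
            this.errorCode = errorCode;
        }
    }

    // Equivalent of "WHERE error_code = <code>" for a single output.
    static List<Reading> withErrorCode(List<Reading> input, int code) {
        return input.stream()
                .filter(r -> r.errorCode == code)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Reading> input = Arrays.asList(
                new Reading("manudevice1", 0),
                new Reading("manudevice2", 1),
                new Reading("manudevice3", 2),
                new Reading("manudevice4", 1));
        System.out.println("output 1: " + withErrorCode(input, 1).size() + " records");
        System.out.println("output 2: " + withErrorCode(input, 2).size() + " records");
    }
}
```

With this sample input, two records match the first filter and one matches the second; the record with error code 0 reaches neither output, which is also what the query does with records that match neither WHERE clause.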
Once the query is set up, we can start the Stream Analytics job and then run the Java program to send data continuously. We will then see two output files in the two output locations, based on the query filters used in the Stream Analytics job.
Instead of writing the data out as CSV, we can also send it to another layer for further action without any delay.



IoT solution two approaches

Architecture comparison of an IoT solution with and without Azure IoT hub and Stream Analytics.

 
Many studies predict a growing demand for IoT solutions, and as an IoT solution architect I can see the signs of it too. “IoT” is a generic term for “things” connected through the internet. “Things” are connected to communicate with each other for a specific purpose, and this covers a huge and wide area with opportunities from every perspective. Even though the term “IoT” is quite new, that does not mean humans did not use connected machines earlier.
 
In IoT projects we can have various sensors to generate data. Temperature sensors, pressure sensors, humidity sensors, light sensors, sound sensors, magnetic field sensors, water sensors, etc. are used in various circumstances. But in many cases, when we are working with advanced machines, we may not need to place sensors to generate the data, because many advanced machines already have these sensors attached and can communicate the data over a standard communication protocol such as TCP over Ethernet, or in some other way.
Let's take a simple use case to discuss the possibilities of Azure data streaming.
 
Business Requirement

 

I have 3 “ice” factories located in 3 different locations, and each of these factories has 2 different units to produce ice. Each factory can produce 10 tons of ice daily. Once production is over, there are cold storages to keep the ice for a day until it is delivered to the customer carriers. The temperature in the cold storage needs to be maintained effectively by monitoring it continuously, and action has to be taken in case of any temperature variation.
 
In the production unit the compressor pressure needs to be monitored, as any variation in pressure may lead to a serious accident. So one major requirement is to alert the operators and management when the pressure or temperature thresholds are violated, so that they can take manual action to avoid a critical situation.
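
The alerting requirement can be sketched as a simple threshold check. This is only an illustration under stated assumptions: the limit values, the class name and the message texts are hypothetical, and a single call checks one cold-storage temperature together with one compressor pressure for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the alert rule: flag a reading when a threshold is violated.
final class ThresholdMonitor {
    // Hypothetical limits, chosen only for illustration.
    static final double MAX_STORAGE_TEMP_C = -5.0;
    static final double MIN_PRESSURE_BAR = 2.0;
    static final double MAX_PRESSURE_BAR = 12.0;

    // Returns one alert message per violated threshold (empty list if none).
    static List<String> check(String source, double storageTempC, double pressureBar) {
        List<String> alerts = new ArrayList<>();
        if (storageTempC > MAX_STORAGE_TEMP_C) {
            alerts.add(source + ": cold storage temperature too high (" + storageTempC + " C)");
        }
        if (pressureBar < MIN_PRESSURE_BAR || pressureBar > MAX_PRESSURE_BAR) {
            alerts.add(source + ": compressor pressure out of range (" + pressureBar + " bar)");
        }
        return alerts;
    }

    public static void main(String[] args) {
        for (String alert : check("factory-1/unit-2", -1.5, 14.0)) {
            System.out.println(alert);
        }
    }
}
```

In a real deployment the returned messages would be handed to whatever notifies the operators and management; here they are only printed.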
 
A possible solution without Azure IoT hub
 
*Here I have covered only the basic components.
 
Picture 1
 
To collect the temperature and pressure from each production unit and its cold storage, we have placed sensors. These sensors cannot send data to an external system, nor can they buffer data for communication purposes. For these reasons we need a piece of hardware called a gateway. A gateway is hardware that can connect to different sensors, either wired or over Bluetooth. Apart from the sensor data, a gateway can have its own configured parameters. In this article we are not going to discuss gateway-sensor communication (EDGE) in detail; that will be covered in a separate article.
 
* Only basic components and features are covered.
 
Picture 2
 
Using a lightweight messaging protocol, we can connect the gateway to the cloud. Here we can use MQTT-based communication and collect data from these sensors every second. That means that each second the gateway of factory-1 can send 8 (4+4) data points to the cloud over MQTT. When sending the data to the private cloud, the gateway can include the factory name and other configured information as well.
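
To get a feel for the resulting data volume, the arithmetic can be worked through in a few lines. The sketch assumes that every factory sends the same 8 data points per second as factory-1 and that the gateways run around the clock; both are assumptions, and the class name is hypothetical.

```java
// Back-of-the-envelope data volume for the ice factory use case.
final class DataVolume {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    static long pointsPerDay(int factories, int pointsPerSecondPerFactory) {
        return (long) factories * pointsPerSecondPerFactory * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println("one factory:     " + pointsPerDay(1, 8) + " data points per day");
        System.out.println("three factories: " + pointsPerDay(3, 8) + " data points per day");
    }
}
```

Under those assumptions one factory produces 691,200 data points per day and the three factories together about 2.07 million, which is the load the receiving side has to be sized for.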
 
In the cloud we need to consume this data, and as a best practice we need a queue service for that. Apache Kafka can be used as the distributed queue service, and once the data is received by the MQTT receiver/broker it needs to be sent to the queue (Apache Kafka) immediately, without any checks.
 
The data processing layer needs to pick up the data from the other side of the queue. We need to use specific topics in the MQTT broker and Kafka for the input data, and we can use Spark stream processing jobs to process the real-time data. Multiple stages need to be defined in the real-time processing for initial filtering, cleaning and enriching. After the initial filtering, if any records match a “critical” action based on the rule-engine definition, those records need to be sent to the “Event-Manager” via a “priority” queue to initiate actions.
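
The stage flow described above can be sketched in plain Java as follows. This is a simplified stand-in and not Spark or Kafka code: in-memory queues take the place of the topics, the enrichment stage is omitted, and the class name, the Reading type and the example rule (pressure above a made-up limit of 12.0) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Simplified illustration: initial filtering, then routing of critical
// records to a priority queue and all others to the regular queue.
final class ProcessingPipeline {
    static final class Reading {
        final String factory;
        final String sensor;
        final double value;

        Reading(String factory, String sensor, double value) {
            this.factory = factory;
            this.sensor = sensor;
            this.value = value;
        }
    }

    // Initial filtering: drop records that cannot be processed.
    static boolean isValid(Reading r) {
        return r.factory != null && r.sensor != null && !Double.isNaN(r.value);
    }

    // The rule-engine check is passed in as a predicate.
    static void process(List<Reading> batch, Predicate<Reading> criticalRule,
                        Queue<Reading> priorityQueue, Queue<Reading> regularQueue) {
        for (Reading r : batch) {
            if (!isValid(r)) {
                continue;
            }
            if (criticalRule.test(r)) {
                priorityQueue.add(r);   // would go to the Event-Manager
            } else {
                regularQueue.add(r);    // continues to the later stages
            }
        }
    }

    public static void main(String[] args) {
        Predicate<Reading> rule = r -> "pressure".equals(r.sensor) && r.value > 12.0;
        Queue<Reading> priority = new ArrayDeque<>();
        Queue<Reading> regular = new ArrayDeque<>();
        process(Arrays.asList(
                new Reading("factory-1", "pressure", 14.2),
                new Reading("factory-1", "temperature", -8.0),
                new Reading(null, "pressure", 9.0)), rule, priority, regular);
        System.out.println("critical: " + priority.size() + ", regular: " + regular.size());
    }
}
```

Passing the rule in as a predicate keeps the routing code independent of the actual rule-engine definition, which is likely to change more often than the pipeline itself.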
 
A possible solution with Azure IoT hub and Stream Analytics job
 
In the case of Azure, we can use IoT Hub as the data receiver in the cloud, and with a Stream Analytics job we can filter records and also enrich the data to some extent. After setting up “devices”, “input” and “output”, we can filter records in the Stream Analytics job with simple SQL statements, which makes it easy to use. During the development phase the diagnostic logs are really helpful, and developers can manage them easily. A single job may contain multiple queries, and join criteria can also be used to select records from multiple devices. Because these actions happen at an early stage of data ingestion, we can cut down the number of records passed to the subsequent stages and improve performance as well.
 
Distributed queues should be incorporated depending on the data volume and processing load.
Any heavy data processing should be done in the Spark layer.
 
We can also incorporate a real-time analytical model with machine-learning algorithms for predictive maintenance requirements, since predictive maintenance is one of the main features of an IoT solution.
 
 
A step-by-step illustration covering the creation of the Azure IoT hub, devices in IoT Hub, inputs, sample code to simulate sensor data, Stream Analytics jobs, the queries used in the job to filter data, outputs, storage accounts and blobs will follow, with screenshots, in an upcoming post that I am working on.
 
Thank you for your time.
-Manu pradeep
www.manupradeep.com