Architecture comparison of an IoT solution with and without Azure IoT Hub and Stream Analytics
Many studies predict growing demand for IoT solutions, and as an IoT solution architect I can already see the signs. “IoT” is a generic term for “things” connected through the internet. These things are connected so that they can communicate with each other for a specific purpose, and the field is large and full of opportunities from every perspective. Although the term “IoT” is fairly new, that does not mean humans did not use connected machines before.
In IoT projects we can use various sensors to generate data. Temperature, pressure, humidity, light, sound, magnetic field and water sensors, among others, are used in different circumstances. In many cases, though, when we work with advanced machines we do not need to install sensors ourselves, because many of these machines already have the sensors built in and can communicate the data over a standard protocol such as TCP over Ethernet, or in some other way.
Let's take a simple use case to discuss the possibilities of Azure data streaming.
I have 3 “ice” factories in 3 different locations, and each factory has 2 units that produce ice. Each factory can produce 10 tons of ice per day. Once production is over, the ice is kept in cold storage for a day until it is delivered to the customers' carriers. The cold storage temperature needs to be monitored continuously, and action has to be taken in case of any temperature variation.
In the production unit, the compressor pressure needs to be monitored, because any variation in pressure may lead to a serious accident. So one major requirement is to alert the operators and management when the pressure or temperature thresholds are violated, so that they can take manual action to avoid a critical situation.
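The alerting requirement can be sketched in a few lines of Python. This is only an illustration: the threshold values, the `Reading` fields and the `check_reading` function are assumptions made for the example, and real limits depend on the equipment in each factory.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; real values depend on the equipment.
MAX_COLD_STORAGE_TEMP_C = -5.0
MAX_COMPRESSOR_PRESSURE_BAR = 12.0


@dataclass
class Reading:
    factory: str
    unit: str
    kind: str      # "temperature" or "pressure"
    value: float


def check_reading(reading):
    """Return an alert message if the reading violates its threshold, else None."""
    if reading.kind == "temperature" and reading.value > MAX_COLD_STORAGE_TEMP_C:
        return (f"ALERT {reading.factory}/{reading.unit}: "
                f"temperature {reading.value} C is above {MAX_COLD_STORAGE_TEMP_C} C")
    if reading.kind == "pressure" and reading.value > MAX_COMPRESSOR_PRESSURE_BAR:
        return (f"ALERT {reading.factory}/{reading.unit}: "
                f"pressure {reading.value} bar is above {MAX_COMPRESSOR_PRESSURE_BAR} bar")
    return None


normal = check_reading(Reading("factory-1", "unit-1", "temperature", -18.0))
alert = check_reading(Reading("factory-1", "unit-2", "pressure", 14.2))
```

Here `normal` is `None` because the cold storage is well below the assumed limit, while `alert` holds a message that could be forwarded to operators and management.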
A possible solution without Azure IoT Hub
*Here I have covered only the basic components.
To collect the temperature and pressure from each production unit and its cold storage, we have placed sensors. These sensors cannot send data to an external system, and they cannot buffer data for communication either. For these reasons we need a piece of hardware called a gateway. A gateway is a hardware device that can connect to different sensors over a wired connection or Bluetooth. Apart from the sensor data, a gateway can have its own configured parameters. This article does not cover gateway-to-sensor communication (the edge) in detail; that will be discussed in a separate article.
* Only basic components and features are covered.
Using a lightweight messaging protocol, we can connect the gateway to the cloud. Here we can use MQTT-based communication and collect data from the sensors every second. That means the gateway of factory-1 can send 8 (4+4) data points to the cloud every second over MQTT. When sending data to the private cloud, the gateway can also send the factory name and other configured information.
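As a rough sketch of what the gateway might send each second, the Python below builds one JSON message. The message layout, the topic name, the generic sensor names and the reading of “4+4” as 4 values from each of the 2 units are all assumptions for illustration. The actual publish call is left out, since it depends on the MQTT client library in use.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60


def build_payload(factory, readings, timestamp):
    """Serialize one second of gateway readings, plus the factory name, to JSON."""
    message = {"factory": factory, "timestamp": timestamp, "readings": readings}
    return json.dumps(message, sort_keys=True)


# Assumed layout: 2 units x 4 sensor values each = 8 data points per second.
readings = [
    {"unit": unit, "sensor": f"sensor-{index}", "value": 0.0}
    for unit in ("unit-1", "unit-2")
    for index in range(1, 5)
]

payload = build_payload("factory-1", readings, timestamp=1700000000)
topic = "factories/factory-1/telemetry"  # hypothetical MQTT topic name

# At one message per second, a single factory produces this many data points a day.
points_per_day = len(readings) * SECONDS_PER_DAY
```

With 8 data points per second, one factory generates 691,200 data points per day, which is why the cloud side needs a queue in front of the processing layer.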
In the cloud we need to consume this data, and as a best practice we need a queue service for that. Apache Kafka can be used as the distributed queue service; once the MQTT receiver/broker receives the data, it needs to be sent to the queue (Apache Kafka) immediately, without any checks.
The data processing layer needs to pick up the data from the other side of the queue. We need to use specific topics in the MQTT broker and in Kafka for input data, and we can use Spark stream processing jobs to process the real-time data. Multiple stages need to be defined in real-time processing for initial filtering, cleaning and enriching. After the initial filtering, if any records match a “critical” action defined in the rule engine, those records need to be sent to the “Event-Manager” via a “priority” queue to initiate actions.
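The stages described above can be sketched in plain Python to show the flow. This is not Spark code: the function names, the rule thresholds and the use of an in-memory heap as the “priority” queue are assumptions made for illustration only.

```python
import heapq

CRITICAL = 0  # lower number means higher priority in the heap

# Hypothetical rule-engine definition: upper limit per sensor type.
RULES = {"pressure": 12.0, "temperature": -5.0}


def is_valid(record):
    """Initial filtering: keep only records that carry a numeric value."""
    return isinstance(record.get("value"), (int, float))


def clean(record):
    """Cleaning: normalise the sensor name."""
    return {**record, "sensor": record["sensor"].strip().lower()}


def enrich(record, factory_info):
    """Enriching: attach the factory location from reference data."""
    return {**record, "location": factory_info.get(record["factory"], "unknown")}


def is_critical(record):
    """Rule check: does the value exceed the limit for its sensor type?"""
    limit = RULES.get(record["sensor"])
    return limit is not None and record["value"] > limit


def process(records, factory_info):
    """Run all stages; return processed records and the priority queue."""
    processed, priority_queue = [], []
    for sequence, record in enumerate(records):
        if not is_valid(record):
            continue
        record = enrich(clean(record), factory_info)
        if is_critical(record):
            # The sequence number breaks ties so that dicts are never compared.
            heapq.heappush(priority_queue, (CRITICAL, sequence, record))
        processed.append(record)
    return processed, priority_queue


records = [
    {"factory": "factory-1", "sensor": " Pressure ", "value": 13.5},
    {"factory": "factory-1", "sensor": "temperature", "value": None},
    {"factory": "factory-2", "sensor": "temperature", "value": -18.0},
]
processed, priority_queue = process(records, {"factory-1": "site-a"})
```

In this run the record without a value is dropped, the pressure record is pushed to the priority queue for the Event-Manager, and both remaining records continue to the later stages.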
A possible solution with Azure IoT Hub and a Stream Analytics job
In the case of Azure, we can use IoT Hub as the data receiver in the cloud, together with a Stream Analytics job. With it we can filter records and also enrich the data up to a point. After setting up “devices”, “input” and “output”, we can filter records in the Stream Analytics job with simple SQL statements, which makes it easy to use. During the development phase the diagnostic logs are really helpful, and developers can easily manage them. Multiple queries are allowed in a single job, and “join” criteria can be used to select records from multiple devices. As these actions happen at an early stage of data ingestion, we can cut down the number of records passed to the subsequent stages and improve performance as well.
Distributed queues need to be incorporated based on the data volume and processing load.
Any heavy data processing needs to be done in the Spark layer.
We will also be able to incorporate a real-time analytical model with a machine-learning algorithm for predictive maintenance requirements, because predictive maintenance is one of the main features of an IoT solution.
A step-by-step illustration of how to create an Azure IoT Hub, devices in IoT Hub, the input, sample code to simulate sensor data, Stream Analytics jobs, the queries used in a Stream Analytics job to filter records, the output, the storage account, blobs, etc. will be covered with screenshots in a coming post that I am working on.
Thank you for your time.