Category: IoT

Internet of things

Azure IoT Stream Analytics step by step

IoT Hub and Stream Analytics are two very useful services provided by the Azure cloud, and here I try to showcase how we can use them for an IoT use case.
Steps at a high level
1.  Create IoT Hub in Azure
2.  Create devices
To create the “IoT Hub”, we first need to log in to the Azure portal with a valid ID. (Free trials usually expire within one month.)
Once you log in, you can see the screen below. You can ignore the blacked-out lines, as those resources were created for a different purpose.
Click on “New” and enter “IoT Hub”.
Use the “Create” button to move to the next blade.
Select the “Pricing” tier as per your need, but note that only one IoT hub can be created using “F1 Free”.

On clicking the “Create” button, the deployment of the new IoT Hub starts, and we can see its status at the top right.

Once the deployment succeeds, we can see the newly created resource on the “All resources” page.
1.    Create device in IoT Hub

Click on the newly created “IoT Hub” on the “All resources” page and then navigate to Device Explorer.

I have created 5 devices for the trial, and they look as shown below.


 Create stream analytics job

1   Create input in stream analytics job

You can see the “Input” link under Job topology in the Stream Analytics job.

Create output in stream analytics job
You can see the “Output” link under Job topology in the Stream Analytics job.

To filter and save two different types of critical errors, I have created two outputs in this Stream Analytics job, and they look as shown below.

1.    Now we need to send simulated data to the IoT hub devices; using a Stream Analytics query, we will then be able to process it.
To send the data I have used a sample Java program. In the Java application we need to supply the IoT hub's full name and the device's primary key.
(Download Sample java code)

Here, using the Java application, I am sending the values below along with the device IDs.
When configuring the devices/gateways we need to set each device ID according to a plan, and usually it will be unique within the system.
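The original simulator is a Java program that is not reproduced here. As a rough illustration only, the sketch below builds the same kind of record in Python. The field names are taken from the Stream Analytics query used in this post; the device IDs and value ranges are assumptions made for the example, and nothing is actually sent to an IoT hub.

```python
import json
import random

# Hypothetical device IDs; the post creates five devices but does not name them.
DEVICE_IDS = ["device-1", "device-2", "device-3", "device-4", "device-5"]


def build_message(device_id, rng):
    """Build one simulated telemetry record as a JSON string."""
    record = {
        "deviceId": device_id,
        "heartbate": rng.randint(60, 100),            # spelled as in the query
        "temperature": round(rng.uniform(20.0, 90.0), 1),
        "bladeload": rng.randint(0, 100),
        "error_code": rng.choice([0, 1, 2]),          # 1 and 2 are the filtered codes
    }
    return json.dumps(record)


rng = random.Random(42)  # fixed seed so repeated runs give the same data
messages = [build_message(device_id, rng) for device_id in DEVICE_IDS]
for message in messages:
    print(message)
```

In a real simulator each JSON string would be handed to the device client for the matching device ID instead of being printed.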

Once the data simulator is ready, we can set the query for Stream Analytics. I have used the simple query below to filter the records, based on the error code, into two different data sets.

    -- The output aliases are not shown in this listing; the bracketed names are placeholders.
    SELECT deviceId AS deviceid, heartbate AS heartbeat, temperature AS temperature, bladeload AS bladeload, error_code AS error_code, System.Timestamp AS ts
    INTO [output-for-error-code-1]
    FROM manuinput WHERE error_code = 1

    SELECT deviceId AS deviceid, heartbate AS heartbeat, temperature AS temperature, bladeload AS bladeload, error_code AS error_code, System.Timestamp AS ts
    INTO [output-for-error-code-2]
    FROM manuinput WHERE error_code = 2
Once the query setup is over, we can start the Stream Analytics job and then run the Java program to send data continuously. We will then see two output files in the two output locations, based on the query filters we used in the Stream Analytics job.
Instead of writing the data out as CSV, we could also send it to another layer for further action without any delay.
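To make the effect of the two filters concrete, here is a small Python sketch that mimics the routing locally. It is not how Stream Analytics is implemented; it only shows which records would end up in which output, using made-up sample records.

```python
def route_by_error_code(records):
    """Split records into one list per critical error code (1 and 2)."""
    outputs = {1: [], 2: []}
    for record in records:
        code = record.get("error_code")
        if code in outputs:          # records with any other code are dropped
            outputs[code].append(record)
    return outputs


# Made-up sample records for illustration.
sample = [
    {"deviceid": "device-1", "error_code": 1},
    {"deviceid": "device-2", "error_code": 0},
    {"deviceid": "device-3", "error_code": 2},
    {"deviceid": "device-4", "error_code": 1},
]

routed = route_by_error_code(sample)
print(len(routed[1]), len(routed[2]))
```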

IoT solution two approaches

Architecture comparison of an IoT solution with and without Azure IoT hub and Stream Analytics.

Many studies predict growing demand for IoT solutions, and as an IoT solution architect I can also see the signs. “IoT” is a generic term for “things” connected through the internet. “Things” are connected so that they can communicate with each other for a certain purpose, and this covers a huge and wide area with opportunities from every perspective. Even though the term “IoT” is quite new, it does not mean that humans did not use connected machines earlier.
In IoT projects we can have various sensors to generate data. Temperature sensors, pressure sensors, humidity sensors, light sensors, sound sensors, magnetic field sensors, water sensors, etc. are used in various circumstances. But in many cases, when we are working with advanced machines, we may not need to place sensors to generate the data, because many advanced machines already have these sensors attached and are able to communicate the data over a standard communication protocol such as TCP over Ethernet, or in some other way.
Let's take a simple use case to discuss the possibilities of Azure data streaming.
Business Requirement


I have 3 “ice” factories located in 3 different locations, and each of these factories has 2 different units to produce ice. Each factory can produce 10 tons of ice a day. Once production is over, there are cold storages to keep the ice for a day until it is delivered to the customers' carriers. The temperature in the cold storage needs to be maintained by monitoring it continuously, and action has to be taken in case of any temperature variation.
In the production unit the compressor pressure needs to be monitored, as any variation in pressure may lead to a serious accident. So one major requirement is to alert the operators and management if the pressure or temperature thresholds are violated, so that they can take manual action to avoid any critical situation.
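The alerting requirement can be pictured as a simple threshold check. The sketch below is only an illustration: the metric names and the limit values are assumptions made for the example, not figures from the actual factories.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    low: float
    high: float


# Hypothetical limits, chosen only for illustration.
THRESHOLDS = {
    "cold_storage_temperature_c": Threshold(low=-25.0, high=-15.0),
    "compressor_pressure_bar": Threshold(low=1.0, high=12.0),
}


def check_reading(metric, value):
    """Return an alert message if the value violates its threshold, else None."""
    limits = THRESHOLDS[metric]
    if value < limits.low:
        return f"ALERT: {metric}={value} is below {limits.low}"
    if value > limits.high:
        return f"ALERT: {metric}={value} is above {limits.high}"
    return None


print(check_reading("cold_storage_temperature_c", -18.0))
print(check_reading("compressor_pressure_bar", 13.5))
```

A reading inside the allowed band returns None, so only violations would be forwarded to operators and management.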
A possible solution without Azure IoT hub
*Here I have covered only the basic components.
Picture 1
To collect the temperature and pressure from each production unit and its cold storage, we have placed sensors. These sensors are not capable of sending data to an external system, nor of buffering data for communication purposes. For these reasons we need to use a piece of hardware called a gateway. A gateway is hardware that can connect to different sensors, either wired or over Bluetooth. Apart from the sensor data, a gateway can have its own configured parameters. In this article we are not going to discuss gateway-sensor communication (edge) in detail; that will be discussed in a separate article.
* Only basic components and features are covered.
Picture 2
Using a lightweight messaging protocol we can connect the gateway to the cloud. Here we can use MQTT-based communication and collect data from these sensors every second. That means that every second the gateway of factory-1 can send 8 (4+4) data points to the cloud using the MQTT protocol. When sending the data to the private cloud, the gateway can send the factory name and other configured information as well.
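To get a feel for the volume this produces, the arithmetic can be worked through as below. The figure of 8 data points per second comes from the text; treating all three factories as sending at the same rate as factory-1 is an assumption made for the estimate.

```python
POINTS_PER_SECOND_PER_FACTORY = 8   # from the text: 4 + 4
FACTORIES = 3                       # assumed to send at the same rate
SECONDS_PER_DAY = 24 * 60 * 60

per_factory_per_day = POINTS_PER_SECOND_PER_FACTORY * SECONDS_PER_DAY
total_per_day = per_factory_per_day * FACTORIES

print(per_factory_per_day)  # 691200
print(total_per_day)        # 2073600
```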
In the cloud we need to consume this data, and as a best practice we would need a queue service for that. Apache Kafka can be used as the distributed queue service, and once the data is received by the MQTT receiver/broker it needs to be sent to the queue (Apache Kafka) immediately, without any checks.
The data processing layer needs to pick up the data from the other side of the queue. We need to use specific topics in the MQTT broker and Kafka for input data, and we can use Spark stream processing jobs to process the real-time data. Multiple stages need to be defined in real-time processing for initial filtering, cleaning and enriching. After the initial filtering, if any record matches a “critical” action based on the rule-engine definition, that record needs to be sent to the “Event-Manager” via a “priority” queue to initiate the action.
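The staged processing described above can be sketched in plain Python. This is not Spark or Kafka code; it is a minimal in-memory model of the filter, enrich and rule-check stages, with a standard-library priority queue standing in for the “priority” queue. The pressure limit and the record fields are assumptions made for the example.

```python
import itertools
import queue

CRITICAL, NORMAL = 0, 1        # a lower number is served first
PRESSURE_LIMIT_BAR = 12.0      # hypothetical rule-engine threshold


def process(raw_records):
    """Filter, enrich and prioritise records, returning a priority queue."""
    events = queue.PriorityQueue()
    counter = itertools.count()  # tie-breaker so record dicts are never compared
    for record in raw_records:
        # Stage 1: filter and clean, dropping incomplete records.
        if not all(key in record for key in ("factory", "metric", "value")):
            continue
        # Stage 2: enrich with extra information (here just a fixed tag).
        enriched = dict(record, source="gateway")
        # Stage 3: rule check decides the priority.
        is_critical = (
            enriched["metric"] == "pressure"
            and enriched["value"] > PRESSURE_LIMIT_BAR
        )
        priority = CRITICAL if is_critical else NORMAL
        events.put((priority, next(counter), enriched))
    return events


events = process([
    {"factory": "factory-1", "metric": "temperature", "value": -18.0},
    {"factory": "factory-2", "metric": "pressure"},                # incomplete
    {"factory": "factory-3", "metric": "pressure", "value": 14.2},
])
first = events.get()
print(first[2]["factory"])  # the critical record is served first
```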
A possible solution with Azure IoT hub and Stream Analytics job
In the case of Azure, we can use IoT Hub as the data receiver in the cloud and, using a Stream Analytics job, we will be able to filter records and also enrich the data up to a point. We can use simple SQL statements to filter records in a Stream Analytics job after setting up “devices”, “input” and “output”. The use of simple SQL makes it easy to use. During the development phase diagnostic logs are really helpful, and developers can easily manage the logs. Multiple queries are allowed in a single job, and “joining” criteria can also be used to select records from multiple devices. As these actions happen at an early stage of data ingestion, we can cut down the records passed to the subsequent stages and improve performance as well.
Distributed queues need to be incorporated based on the data volume and processing load.
Any heavy data processing needs to be done in the Spark layer.
We will also be able to incorporate a real-time analytical model with a machine-learning algorithm for predictive maintenance requirements, because predictive maintenance is one of the main features of an IoT solution.
The step-by-step illustration of creating the Azure IoT hub, devices in the IoT Hub, input, sample code to simulate sensor data, Stream Analytics jobs, the queries used in the Stream Analytics job to filter, output, storage account, blobs etc. will be covered with screenshots in the coming post, which I am working on.
Thank you for your time.
-Manu pradeep

Postgresql Disaster recovery plan

Disaster recovery plan – an overview.

A disaster recovery plan is the preparation to get the system up and running again after a complete crash of the hardware/software system. It may not be completely automated, and may need downtime and manual effort to restore the readiness of the system. But the effort and downtime should be as small as possible. This ensures business continuity and confidence in the software system.
In the case of software we have:
1. The application, called the front end, through which users interact with the system. This includes UI components, server components, middleware, supporting libraries, scripts etc.

2. The database, called the back end, in which all the application data is kept.
In a disaster we may lose all of this hardware and software, and here we discuss a plan to get the software system up and running as it was just before the crash.
For the application, we can keep the required files on a server located in a different geographical region; usually these files do not change frequently, except for software upgrades. As a best practice, we need to keep copies of all deployed code along with the development code, with the support of a version control tool.
For the database, the data keeps changing, possibly every microsecond or second depending on the usage/type of software. We have to recover the database to the nearest possible point just before the disaster.
A copy of a database at a particular time can be saved as a backup file. This file can be archived and restored on demand. But depending on the data/usage/hardware capacity, the backup generation time may vary from minutes to hours, and database backups may range from hundreds of MBs to several TBs.

Back-up of database.

A backup of a database is a snapshot of the database at a point in time. All the transactions committed up to that point will be available in the backup.
In PostgreSQL we can generate two types of backup, physical and logical.


Physical backup

A physical backup is a copy of the data files stored by PostgreSQL. Apart from the actual stored data, the PostgreSQL engine uses storage space to keep temporary processing data in binary format. This temporary data does not need to be backed up during the physical backup. During server set-up and maintenance it is very important to keep the required free space on the storage device/location for the smooth performance of the database system.

Logical backup

A logical backup is a file generated by PostgreSQL in the required format, which can be chosen during the backup generation process. A logical backup cannot be generated if the PostgreSQL service is not running. It is not recommended as a standalone backup in critical cases; for example, it only captures the state at the time it was taken and cannot be rolled forward with transaction logs.

Transaction log-backup

Apart from these two types of backup there is one more data extract, called a transaction log backup. This is a piece of data extracted from a database server that contains, in binary format, a set of transactions committed through normal database operation. A transaction log backup file is comparatively small. This file can be restored on another server to bring it up to date.
As a “DR” plan we should be ready with a copy of the database which is up to date.

Log Shipping


Log shipping is a well-known method of keeping a copy of a database as up to date as the actual database. From the perspective of log shipping, the actual database is called the primary database and the copy is called the secondary.
As discussed, the database may be updated at a high frequency, and it is not practical to generate a full database backup at the same frequency. But we can generate transaction log backups from the primary database and apply them to the secondary database to keep it up to date. As a transaction log backup is an extract of the transactions committed in the database in serialised order, transaction log backups must be restored in the same order in which they were generated.
File-based log shipping has been available since PostgreSQL 8.2. We can enable transaction log generation with a few configuration settings. These transaction logs can be shipped to the secondary, which can be configured to restore the transaction log files.
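The ordering requirement can be illustrated with a short Python sketch. It assumes the usual WAL segment naming of 24 hexadecimal characters and a single timeline, in which case sorting the names alphabetically gives the order in which the segments were generated. The file names below are invented for the example.

```python
import re

# A WAL segment file name is assumed to be 24 uppercase hexadecimal characters.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def restore_order(file_names):
    """Return only the WAL segment names, in the order they should be replayed."""
    segments = [name for name in file_names if WAL_NAME.match(name)]
    return sorted(segments)


# Invented directory listing: segments out of order plus a backup history file.
names = [
    "000000010000000000000003",
    "000000010000000000000001",
    "000000010000000000000002.00000020.backup",
    "000000010000000000000002",
]
ordered = restore_order(names)
print(ordered)
```

In practice PostgreSQL itself asks for the segments it needs through “restore_command”; the sketch only shows why a shipping job should never deliver them out of order.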

Initial setup

Log shipping can be configured between two PostgreSQL servers of the same version. As a best practice, use the same hardware configuration for the two servers, and install the same version of PostgreSQL server. Create a sample user database on one server, generate a base backup of the primary database and restore it on the secondary.

Continuous activities

1.    Generate the transaction log backup.
2.    Move it to the secondary server immediately.
3.    Set the permissions for the file, if required.
4.    Restore it to the secondary one by one in the serial order.
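The move and permission steps of this loop could be sketched as below. To keep the example runnable on one machine, both the archive location and the secondary's location are plain local directories created for the demo; a real job would copy over the network. The file mode is also an assumption, and the restore step itself is left to PostgreSQL.

```python
import os
import shutil
import tempfile


def ship_logs(archive_dir, standby_dir, mode=0o600):
    """Move log files to the standby location in name order and set permissions."""
    shipped = []
    for name in sorted(os.listdir(archive_dir)):   # serial order by file name
        source = os.path.join(archive_dir, name)
        if not os.path.isfile(source):
            continue
        target = os.path.join(standby_dir, name)
        shutil.move(source, target)
        os.chmod(target, mode)                     # hypothetical permission choice
        shipped.append(name)
    return shipped


# Demo with temporary directories and dummy files.
archive_dir = tempfile.mkdtemp()
standby_dir = tempfile.mkdtemp()
for name in ("000000010000000000000002", "000000010000000000000001"):
    with open(os.path.join(archive_dir, name), "wb") as handle:
        handle.write(b"demo")

shipped = ship_logs(archive_dir, standby_dir)
print(shipped)
```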

Steps for initial configuration of file based log shipping.

1.    Identify two individual PostgreSQL servers which can communicate with each other. Make sure that you have taken the required backups from each of these servers, as the data on these two servers may be lost or corrupted as part of this action.
2.    Identify the location on the server where you want to keep the log files, and ensure that enough space is available. Set the environment variable $PGARCHIVE to this path. This needs to be set on both the master and standby servers. The master should be able to write to this location and the standby should be able to read from it. Use another environment variable, $STANDBYNODE, to identify the standby server from the master.
3.    Change the parameters in the “postgresql.conf” file to generate the log files.
wal_level = 'archive'
archive_mode = on
archive_command = 'scp %p $STANDBYNODE:$PGARCHIVE/%f'
The first two configuration values enable log generation, and the third saves the transaction log files to the specified location. If there is any issue with saving the log files directly to the standby server, we can save them on the local server and ship them to the standby server using another job/tool. In such cases, we can use the command below as the “archive_command”.
archive_command = 'cp -i %p ../standalone/archive/%f'
Here the “standalone” folder and the “data” folder are at the same level in the directory hierarchy. Permissions for the log files should be set immediately after shipping them to the destination for restoration; otherwise the PostgreSQL engine won't be able to read them, and will log an error.
If you plan to save the log files on the same server and then ship them to the secondary, you should have a scheduled script ready which ships the log files at a fixed time interval. The interval can be finalised based on network traffic and data sensitivity.
The “scp” command can be used in the script to do this and, once shipped, the log files can be removed or archived according to the plan.
4.    Start backup
psql -c "select pg_start_backup('base backup for log shipping')"
5.    Copy the “data” files, excluding the “pg_xlog” folder.
The “rsync” or “tar” command can be used for this. If your secondary server requires a security key, it should be set up before using the “rsync” command. You may face issues with the “rsync” command due to restrictions in some environments.
As an alternative to the “rsync” command, we can save the transaction log files on the same server, “tar” them and send them to the secondary. On the secondary, replace the “data” files with the files copied from the primary server. Permissions should be set for the PostgreSQL user.
The additional folders created on the primary to hold the transaction log backups should be maintained on the secondary server as well.
6.    Stop backup
psql -c "select pg_stop_backup(), current_timestamp"
7.    Set the recovery.conf parameters on the standby server.
standby_mode = 'on'
restore_command = 'cp $PGARCHIVE/%f %p'
8.   Start the secondary server.
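The “%p” and “%f” placeholders used in “archive_command” and “restore_command” in the steps above are filled in by PostgreSQL: “%p” becomes the path of the file and “%f” only its file name. The small Python sketch below imitates that substitution to show what the resulting shell command looks like. It is an illustration, not PostgreSQL's own code, and it ignores other escapes such as “%%”; the segment path is invented.

```python
import os


def expand_archive_command(template, wal_path):
    """Substitute %p (path) and %f (file name only) into a command template."""
    file_name = os.path.basename(wal_path)
    # Neither replacement value contains '%', so the order is not significant.
    return template.replace("%p", wal_path).replace("%f", file_name)


command = expand_archive_command(
    "cp -i %p ../standalone/archive/%f",
    "pg_xlog/000000010000000000000001",   # invented segment path
)
print(command)
```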

Another approach

1.    Generate the transaction-log backup.
2.    Move it to the secured backup server.
3.    Distribute it to the multiple database servers.
4.    Restore it to the database.
5.    Remove the transaction logs from the database servers.
This approach of keeping multiple replicas of a database is applicable to very critical applications. The CPU/data-read load can be distributed across different servers, so that the primary database can be used more for normal user transactions.

Streaming Replication

What is streaming replication?

In log shipping, the master database's WAL files change based on the transactions, and the logs are then shipped to the secondary server serially for replay. Apart from this, PostgreSQL has another advanced feature, released in 9.0, where the logs can be sent directly to the secondary through the normal database communication channel. This method is more secure and minimizes the replication delay.
We can configure streaming replication along with the log-archival facility, based on our requirements. This gives additional flexibility if we are working with highly critical data.
Here we discuss the steps which enable archiving along with streaming replication.

Steps to configure.

1.    Identify your master and standby servers. Ensure that these two servers can connect through the PostgreSQL port.
2.    Create a user (repuser) for replication on the master server.
3.  Set proper permissions for “repuser”. The following entry in the pg_hba.conf file allows replication access for “repuser” from the address given in the entry (using encrypted password authentication); you may wish to consider more restrictive options.
host replication repuser <<ip – of standby>> md5
4.  Set the logging options in postgresql.conf, on both master and standby, so that we can collect more information about replication connection attempts and associated failures.
log_connections = on
5. Set the below mentioned parameters in postgresql.conf.
max_wal_senders = 1
wal_keep_segments = 50
hot_standby = on
wal_level = hot_standby
archive_command = 'cd .'  # We can use a script here which can move the log files to an archiving location.
archive_mode = on
wal_keep_segments = 10
For example, the following shell command moves the oldest file in “pg_xlog” to an archive folder:
ls ./pg_xlog/00000* -lt | tail -n 1 | awk '{print $NF}' | xargs -i mv {} ../archived/
6. Take the base backup of primary server.
psql -c "select pg_start_backup('base backup for streaming replication')"
7. Copy the “data” folder to the secondary server.
The “tar” command can be used for this.
E.g.: “tar” the “data” folder and move the resulting archive to the secondary database server.
8. Stop the secondary database server.
service postgresql-9.1 stop
9. Restore the “tar” file to the data folder of the secondary database.
This can be done using the “tar” command.
10. The base backup in primary database should be stopped using the below query.
psql -c "select pg_stop_backup(), current_timestamp"
11. Create the recovery.conf file in the “data” folder of the secondary server.
standby_mode = on
primary_conninfo = 'host=primarydbserver user=repuser  password=mypassword'
trigger_file = '/tmp/postgresql.trigger.5432'
Ensure the file permissions are correct.
12. Start the secondary server.
Check the log.
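When several standby servers are set up, the recovery.conf from step 11 can be generated rather than typed by hand. The Python sketch below is one possible way to do that; the function name is hypothetical, the host, user, password and trigger file are the sample values from the steps above, and the sketch assumes none of the values contains a single quote.

```python
def render_recovery_conf(host, user, password, trigger_file):
    """Return the text of a recovery.conf for a streaming-replication standby."""
    conninfo = f"host={host} user={user} password={password}"
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = '{conninfo}'",
        f"trigger_file = '{trigger_file}'",
    ]
    return "\n".join(lines) + "\n"


conf_text = render_recovery_conf(
    host="primarydbserver",
    user="repuser",
    password="mypassword",
    trigger_file="/tmp/postgresql.trigger.5432",
)
print(conf_text)
```

The returned text would then be written to the recovery.conf file in the “data” folder of the secondary, with permissions set for the PostgreSQL user.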

Arduino and Android – A hobby robot

1.   Overview

A machine which can move on its wheels in all directions and can pick up an object, lift it and place it back, based on instructions from an Android smartphone. The application developed and installed on the Android smartphone is able to communicate with the MotoRobo hardware through Bluetooth.
This idea evolved when we were discussing mobile software and hardware integration, which has various possibilities in daily life, and we decided to proceed to a working model.
There were a few challenges, especially in the mechanical and electronic circuit parts, but after many trials we were able to succeed.



2.   Architecture of the solution.


Control gestures are provided on a mobile, which sends commands to the MotoRobo.


2. High-level architecture of “MotoRobo”

This is a machine of our own design and development, controlled by gestures from a mobile. It is powered by a 12 V DC battery of 1.2 Ah.

3.   Components Used

Electronics Component used

Arduino Uno: The Arduino Uno is a microcontroller board based on the ATmega328.
 1.   L293D, motor driving IC.
 2.   Bluetooth receiver.
 3.   12 V DC, 1.2 Ah rechargeable batteries.
 4.   6-0-6, 1 Amp, 220 V transformer and two diodes.
 5.   Two 12 V DC motors with pulley setup.
 6.   Two servo motors and their holders.

Mechanical Component used.

1.    Apart from a “metal frame” and “wheels”, all the other required parts are handmade.
2.    We used tin-sheet, aluminum sheet, screws, and plastic as raw materials.

4.   Development Environments.

4.1  Android-development

We developed an Android application (.apk) which can pair with a Bluetooth modem and send “serial” messages as strings. This mobile app captures the mobile gestures, converts them to commands and sends them to the Bluetooth modem in the “MotoRobo”.

4.2  Arduino-program-development

We developed an “Arduino” program and flashed it to the “Arduino” board to read the serial text from a serial “pin” of the “Arduino”. The program reads the text and splits the strings into characters for processing. For easier communication we maintained a fixed-length string format. Once a string is received, the program processes its characters and sends programmed signals to the output pins. The Arduino board is not capable of driving a heavy electronic device such as a DC motor, which needs a high current flow, so we used a motor driving IC, the L293D.
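The actual string format used by MotoRobo is not documented here, so the Python sketch below uses an invented 8-character layout purely to illustrate how a fixed-length command can be split by position: one direction character, three digits for the left motor, three digits for the right motor, and one grip character. It models the parsing logic only and is not the firmware that runs on the board.

```python
COMMAND_LENGTH = 8  # hypothetical layout: D LLL RRR G


def parse_command(text):
    """Split a fixed-length command string into its fields by position."""
    if len(text) != COMMAND_LENGTH:
        raise ValueError(f"expected {COMMAND_LENGTH} characters, got {len(text)}")
    direction = text[0]          # F = forward, B = backward, S = stop
    left_speed = int(text[1:4])  # raises ValueError if not numeric
    right_speed = int(text[4:7])
    grip = text[7]               # O = open, C = closed
    if direction not in "FBS" or grip not in "OC":
        raise ValueError(f"unknown direction or grip in {text!r}")
    return {
        "direction": direction,
        "left_speed": left_speed,
        "right_speed": right_speed,
        "grip": grip,
    }


parsed = parse_command("F200150O")
print(parsed)
```

A fixed length means the receiver never has to search for separators, which keeps the parsing on a small microcontroller simple.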

The speeds of the DC motors are controlled with the help of the L293D IC, based on pulse width modulation (PWM).

Direction is controlled using the rear wheels.
     -> To turn left, slow down the left wheel by reducing the duty cycle for the left wheel.
     -> To turn right, slow down the right wheel by reducing the duty cycle for the right wheel.
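The steering rule above can be modelled in a few lines of Python. This is a model of the logic, not the Arduino program itself. It assumes duty cycles in the 0 to 255 range used by Arduino's analogWrite, and the 50% reduction for the inner wheel is an arbitrary value chosen for the example.

```python
def wheel_duty_cycles(base_duty, turn, reduction=0.5):
    """Return (left, right) PWM duty cycles for the requested turn."""
    if not 0 <= base_duty <= 255:
        raise ValueError("duty cycle must be between 0 and 255")
    slowed = int(base_duty * (1 - reduction))  # the inner wheel runs slower
    if turn == "left":
        return slowed, base_duty
    if turn == "right":
        return base_duty, slowed
    return base_duty, base_duty                # straight ahead


print(wheel_duty_cycles(200, "left"))      # (100, 200)
print(wheel_duty_cycles(200, "right"))     # (200, 100)
print(wheel_duty_cycles(200, "straight"))  # (200, 200)
```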

               4.3  Electronics development

The circuits were built on a common PCB, with soldering. The electrical system, which is controlled by the Arduino, energizes the motors.

4.4  Mechanical development

1.       Designed a “three-wheeler” with two driving DC motors at the back and a single “ball” wheel at the front end. We bought a general tin frame, wheels and the required electronic components, and arranged the wheels and battery in the frame.


2.       Designed a mechanical “arm” from tin and aluminum, which can pick up and lift 70 g to 100 g of weight. An elliptical plastic arrangement connected to a servo motor rotates to open the “arm” grip.
    Closed grip.
   Opened grip.

5.   MotoRobo Control Modes

a.    Mobile Gesture based Control

The user can control the MotoRobo using a paired Android phone. For direction control, the user can use the phone like a steering wheel. To bring the arm down, move the phone down. Grip open/close is done with the pinch-zoom gesture.

b.    Pre-programmed movements.

Apart from “live” control, this machine can work based on pre-recorded movements. The movements can be programmed using a simple “template” and then executed, so that the machine runs from the first step to the last as a series of movements.
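The template format itself is not shown in this post, so the sketch below assumes a very simple one: a list of (command, seconds) pairs that are played back in order, followed by a stop. The command names and the final stop command are invented for the illustration, and the sender is passed in as a function so the example can run without any hardware.

```python
import time


def run_program(steps, send, sleep=time.sleep):
    """Play back a list of (command, seconds) steps in order, then stop."""
    for command, seconds in steps:
        send(command)    # hand the command to whatever talks to the robot
        sleep(seconds)   # hold the movement for the given duration
    send("stop")         # hypothetical command to end the sequence safely


# Demo: collect the commands in a list instead of sending them over Bluetooth,
# and skip the real waiting time.
sent = []
program = [("forward", 2.0), ("left", 0.5), ("grip_open", 1.0)]
run_program(program, sent.append, sleep=lambda seconds: None)
print(sent)
```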

c.    Voice control.

As an additional feature, we have included “voice-command” control of the machine. This is achieved with the help of Google services.
