Deep Learning Performance Notes

I started experimenting with deep learning and immediately ran into training-speed issues. I am using license-plate-recognition code as an example. It takes ~100K iterations to converge. For production systems, the required number of iterations will be in the tens to hundreds of millions, so performance matters a lot. A few notes:

1. MacBook Air

Each iteration took 6 seconds on my 2015 notebook. Training would take about 7 days (6 × 100K / 86,400) to complete. Not good.

2. Ubuntu Linux Server

Performance is much better but still not good enough: each iteration took 3 seconds, so training time is still measured in days.

3. Ubuntu Linux Server + GTX 1060 GPU

Each iteration took only 0.3 seconds, which means I can run a new experiment every few hours while learning DL.
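The timings above translate into total training time as follows (a quick sketch of the arithmetic for 100K iterations):

```python
# Total training time for 100K iterations at each measured
# per-iteration time from the three setups above.
SECONDS_PER_DAY = 86400

def training_days(seconds_per_iteration, iterations=100000):
    """Wall-clock training time in days."""
    return seconds_per_iteration * iterations / SECONDS_PER_DAY

for label, secs in [("MacBook Air", 6.0),
                    ("Ubuntu server (CPU)", 3.0),
                    ("Ubuntu server + GTX 1060", 0.3)]:
    print("%-26s %.1f days" % (label, training_days(secs)))
```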

However, a word of caution: on the GPU I was hit by the exploding/vanishing gradient problem, which I had not encountered on the CPU-only systems. It appears that the TensorFlow/GPU combination handles floating-point calculations differently than the CPU-only path. One way to address exploding/vanishing gradients is to tweak the learning rate or try different initialization parameters. Reducing the learning rate worked for me.
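To see why a smaller learning rate tames an exploding update, here is a toy sketch (plain Python on f(x) = x², not the actual TensorFlow model): with too large a step, gradient descent overshoots the minimum and the iterates blow up; with a smaller step, it converges.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x^2 (gradient 2x) with a fixed learning rate lr."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # each update scales x by (1 - 2*lr)
    return x

print(abs(gradient_descent(1.1)))  # |1 - 2*1.1| > 1: iterates explode
print(abs(gradient_descent(0.1)))  # |1 - 2*0.1| < 1: converges toward 0
```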

I also looked at running training on GPU instances in the cloud; however, the cost seemed too high for now: ~$100-$200 per month for partial usage. I was able to upgrade my existing computer for $250 and gain better performance.

For DL to become ubiquitous, independent developers need access to more affordable computing resources. For now, a personal computer with a consumer-grade GPU appears to be the way to go for independent developers like me until cloud pricing comes down.

IoT: Connectivity & Time Series Storage

This is the third post in continuation of my post IoT: Thermography based operation monitoring, where I talked about creating an IoT system to monitor an operation using a camera and an array of IR sensors.

In this post, I am going to set up a communication and storage system for IoT data. I plan to use MQTT for communication, as it is one of the most widely used protocols for sensor data. More details regarding MQTT can be found at


MQTT is an extremely simple and lightweight publish/subscribe messaging protocol, designed for constrained devices and low-bandwidth, high-latency, or unreliable networks. An MQTT publisher connects to a broker and can send arbitrary data to a “topic”. An MQTT subscriber chooses which “topic(s)” to listen to and processes the incoming data streams. In this post, we will listen to a topic and write incoming data to our storage system.

We will use Mosquitto as the MQTT broker, which is available for download at

For storage, I am going to use InfluxDB. It is a well-maintained and scalable open-source solution from Influx Data. More details regarding InfluxDB can be found at

You can write a small program to subscribe to an MQTT topic and write incoming data to InfluxDB, or you can use Telegraf to achieve the same. Telegraf is another piece of software from Influx Data that can subscribe to an MQTT topic and write the data to InfluxDB.
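If you go the do-it-yourself route instead of Telegraf, a minimal subscriber could look like the sketch below. It assumes paho-mqtt is installed and a local InfluxDB 1.x instance with a `telegraf` database; since our sensor payload is already in InfluxDB line protocol, the bridge just forwards each message to InfluxDB's HTTP write endpoint.

```python
# Minimal MQTT -> InfluxDB bridge (a sketch, not a hardened implementation).
# Assumptions: paho-mqtt installed, InfluxDB 1.x on localhost with a
# 'telegraf' database, payload already in line protocol.
from urllib.request import Request, urlopen

INFLUX_WRITE_URL = "http://localhost:8086/write?db=telegraf"

def decode_payload(payload):
    """MQTT payloads arrive as bytes; InfluxDB wants one text line."""
    return payload.decode("utf-8").strip()

def write_line(line, url=INFLUX_WRITE_URL):
    """POST one line-protocol record to InfluxDB; 204 means success."""
    return urlopen(Request(url, data=line.encode("utf-8"))).status

def on_message(client, userdata, msg):
    # Forward the record (e.g. "ir p0=...,avg=... <ts>") unchanged.
    write_line(decode_payload(msg.payload))

def run(broker="localhost", topic="svt/ir"):
    import paho.mqtt.client as mqtt  # pip install paho-mqtt
    client = mqtt.Client()
    client.on_message = on_message
    client.connect(broker, 1883, 60)
    client.subscribe(topic)
    client.loop_forever()  # blocks until interrupted
```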

To wrap this post up, I’ll show a basic Grafana dashboard displaying the incoming time series to verify that the complete setup is working. Of course, I’ll have a much better visualization scheme in the next post. For Grafana details go to


  1. This is a very insecure setup, so do not deploy it in a production environment.
  2. I am going to set it up on an Ubuntu VM running on an AWS t2.micro instance. You can run it on your local Linux machine with minor modifications. Make sure the following ports are open and accessible: 1883 (MQTT), 3000 (Grafana), and 22 (SSH).


Please install the following on your Linux server:

Configure Telegraf with an MQTT consumer input listening on topic ‘svt/ir’ and InfluxDB as the output. Telegraf will connect to InfluxDB automatically in its out-of-the-box configuration.
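For reference, the relevant parts of telegraf.conf might look like this fragment (a sketch; the topic and database names are the ones used in this post, the rest are stock plugin settings):

```toml
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]
  topics = ["svt/ir"]
  data_format = "influx"   # the payload is already InfluxDB line protocol

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```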

On your Raspberry Pi, install the Paho MQTT client:

pip install paho-mqtt

Transmitting sensor data from Raspberry Pi

We are going to upload a picture taken by the camera at the start. In a real system, we might want to upload a continuous stream of image snapshots, but that requires very high network bandwidth. We will upload the image using SCP (copy over SSH).

We will also transmit the 64 IR pixel values as well as the average of all IR pixels as a background temperature. We are going to read the 64 IR pixel values from the socket /var/run/mlx9062x.sock, calculate the average, and publish to the MQTT broker in the following format:

 ir p0=<value0>,p1=<value1> .... p63=<value63>,avg=<average> <timestamp>
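The format above is InfluxDB’s line protocol: measurement `ir`, fields `p0`..`p63` plus `avg`, and a nanosecond timestamp. A small helper (a sketch, not part of the Pi script below) that builds such a line from 64 pixel values:

```python
import time

def build_ir_line(pixels, timestamp=None):
    """Build an InfluxDB line-protocol record:
    ir p0=<v0>,...,p63=<v63>,avg=<average> <timestamp-in-ns>"""
    if len(pixels) != 64:
        raise ValueError("expected 64 pixel values")
    if timestamp is None:
        timestamp = int(time.time())
    fields = ",".join("p%d=%d" % (i, int(v)) for i, v in enumerate(pixels))
    avg = int(sum(pixels) / len(pixels))
    return "ir %s,avg=%d %d000000000" % (fields, avg, timestamp)
```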

Python code and explanation:

1. Header setup and Python modules

#!/usr/bin/env python
import os, sys, time
import numpy as np
from time import sleep
import subprocess
import paho.mqtt.client as mqtt
IP   = r'<server-ip>'
USER = r'<user>'
KEY  = r'<user-credentials>'  # path to the SSH private key

2. Capture a picture using the Raspberry Pi camera and upload it to the server:

def getImage():
    fn = r'/home/pi/pics.jpg'
    proc = subprocess.Popen('raspistill -o %s -w 640 -h 480 -n -t 3' % (fn),
                        shell=True, stderr=subprocess.STDOUT)
    proc.wait()  # let the capture finish before uploading
    proc = subprocess.Popen('scp -i %s %s %s@%s:.' % (KEY, fn, USER, IP),
                        shell=True, stderr=subprocess.STDOUT)
    if proc.wait() == 0:
        print "Image uploaded successfully"

3. Set up a connection to the socket for the IR data and to the MQTT broker:

fifo = open('/var/run/mlx9062x.sock', 'r')
# The callback for when the client receives a CONNACK response from the server.
def on_connect(client, userdata, flags, rc):
    print("Connected with result code "+str(rc))
client = mqtt.Client()
client.on_connect = on_connect
client.connect(IP, 1883, 60)
client.loop_start()  # run the network loop in a background thread

4. Publish data to the server:

while True:
    ir_raw = fifo.read()
    ir_trimmed = ir_raw[0:128]  # 64 pixels, 2 bytes each
    ir = np.frombuffer(ir_trimmed, np.uint16)
    str1 = 'ir p0='+str(ir[0])
    for i in range(1, 64):
        str1 = str1 + ',p'+str(i)+'='+str(ir[i])
    avg = str(int(np.mean(ir)))
    now = int(time.time())
    str1 = str1 + ",avg=" + avg + " " + str(now) + "000000000"
    print str1
    client.publish('svt/ir', str1, qos=0, retain=False)
    sleep(1)  # publish roughly once per second

Go to http://<server-ip>:3000 and set up a Grafana dashboard to display the average-temperature data.

There are excellent tutorials available online on how to set up a basic dashboard. The InfluxDB database name is telegraf, the measurement name is ir, and the average temperature is stored in the avg field.
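With that schema, a typical Grafana panel query might look like the following (a sketch in InfluxQL; $timeFilter and $__interval are Grafana’s built-in template variables):

```sql
SELECT mean("avg") FROM "ir" WHERE $timeFilter GROUP BY time($__interval)
```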

If everything works, you should see the following in your Grafana dashboard:

Next week: How to display IR sensor data in a more meaningful manner?

IoT: Thermography Hardware

This post is a continuation of my previous post IoT: Thermography based operation monitoring, where I talked about creating an IoT system to monitor an operation using a camera and an array of IR sensors.

In this blog post, I am going to start building upon a very good project here and use it as a baseline for our thermography hardware.

Skills/Tools needed

  1. Soldering kit/capabilities
  2. Drill and drill bits to create space in box for sensors
  3. Silicone putty for electronics and/or two-sided foam tape
  4. Screwdriver, crimping tool, wire-cutter, etc.

For collecting thermography data, we are going to use a Raspberry Pi as the hardware platform. The IR sensor of choice is the 16×4-grid MLX90621 (ordered on Digi-Key). For overlay purposes, we are going to use a Raspberry Pi camera, which is easily available online. I ordered a Raspberry Pi case online for assembling the complete system. The total cost of the system is going to be ~$100.

For deployment, a Raspberry Pi Zero W + FLIR Lepton in an industrial housing is a good choice, with the total cost of the system in the range of $300 per unit.

I am going to assume that there are enough tutorials on the internet (and here) for you to assemble and put everything together.

Some tips

  1. You’ll need space in your Raspberry Pi case, so order a case on the bigger side. I had ordered a smaller case and had to cut the GPIO pins to fit everything together (looks nice, though).
  2. Use softer wires for connecting/soldering on the PCB: this allows easier cable management inside the case.
  3. Drill using a small bit first, then go successively bigger until the sensors fit.
  4. Use two-sided tape and silicone putty to affix the sensors in the case.
  5. Use a heat sink on the Raspberry Pi processor, as it is going to get hotter inside the case.

Finished hardware box looks like:

To make this hardware work, I am assuming that you have followed the original link posted at the beginning of this post. However, the code provided in the original link will not work. For the latest code, please go to

I have made the following updates:

  • The original code supports the MLX90620 only, not the MLX90621. Added MLX90621 support.
  • To compile the code for the MLX90620, please update the corresponding define in mlxd.c.
  • Enhanced debugging:
    • Please set the following debug flags in mlxd.c:
      • DEBUG: Prints intermediate debug values
      • DEBUG_TO: Prints information instead of sending it to /var/run/mlx9062x.sock

         #define VERSION "0.1.0"
         #define EXIT_FAILURE 1
         #define DEBUG 0
         #define DEBUG_TO 0
         #define MLX90620 0
    • Read the socket data through the following command for debugging purposes
        sudo python
  • A bunch of bug fixes
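For example, a hypothetical debugging snippet along those lines, decoding a frame from the socket the same way the publisher script does (the layout of 64 unsigned 16-bit pixels in the first 128 bytes is taken from that script):

```python
import numpy as np

def decode_ir_frame(raw):
    """Decode a raw frame from mlxd into 64 pixel values and their mean.
    Assumes 64 unsigned 16-bit pixels in the first 128 bytes."""
    pixels = np.frombuffer(raw[:128], np.uint16)
    return pixels, float(np.mean(pixels))

def dump_one_frame(path='/var/run/mlx9062x.sock'):
    """Run on the Pi (with mlxd running) to print one decoded frame."""
    with open(path, 'rb') as fifo:
        pixels, avg = decode_ir_frame(fifo.read())
        print('avg=%.1f pixels=%s' % (avg, pixels.tolist()))
```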

Happy hacking!

IoT: Thermography based operation monitoring

The easiest way to implement an IoT system is to monitor assets and operations in a non-intrusive manner. IR sensors, acoustic sensors, and ultrasonic sensors combined can provide complete, non-intrusive monitoring of moving parts in industrial operations. Thermography uses an array of IR sensors to generate a heat map of a system to find potential failures before they occur.

This is a series of posts on setting up an IoT system to monitor an operation using a camera and an array of IR sensors.

Creating an IoT system involves the following three steps:

  1. Create application-specific hardware and start collecting sensor data. For large deployments, this step involves sensors, an efficient computing platform, hardware qualification, and a device-management platform. However, my recommendation is to create a hardware prototype quickly to allow a complete analysis of the potential investment and the value generated.
  2. Set up a communication and storage system for IoT data. My preference is to use AWS IoT & AWS cloud services for this purpose. However, you can use Microsoft Azure, GE Predix, PTC ThingWorx, or any other platform of your choice. For prototyping, I’ll be using MQTT for communication and InfluxDB for time-series storage.
  3. Sensor processing, analytics, and an application dashboard: I’ll be using C for the embedded programming and Python for the sensor analytics. I have found that InfluxDB, Grafana, and Freeboard are more than enough for creating a working application dashboard.

I’ll update the links here as I make progress with the above three steps.

About TestGenie: Analytics based exam preparation

I was a co-founder of TestGenie, a web/Android app for the Indian test-prep market. TestGenie was a platform for MBA preparation and computer-based testing. It provided effective technology tools to compete in the current education system and customized content to each student’s individual requirements.

We worked on algorithms to personalize learning and improve results for students. We deployed the solution on the AWS cloud, and 40,000 students were using it at one time.

You can see a demo at