Deep Learning Performance Notes

I started experimenting with Deep Learning and immediately ran into training issues. I am using license plate recognition code as an example; it takes ~100K iterations to converge. For production systems, the required number of iterations is going to be in the tens to hundreds of millions, so performance matters a lot. A few notes:

1. MacBook Air

Each iteration took 6 seconds on my 2015 notebook, so completing the training would take ~7 days (6 * 100K / 86,400 seconds per day). Not good.

2. Ubuntu Linux Server

Performance was much better, but still not good enough. Each iteration took 3 seconds, so training would still take about 3.5 days.

3. Ubuntu Linux Server + GTX 1060 GPU

Each iteration took only 0.3 seconds (~8 hours for 100K iterations), which means I can run a fresh experiment every few hours while learning DL.
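
To double-check that the GPU is actually being used and to turn a measured per-iteration time into a training-time estimate (the same seconds-per-iteration * 100K / 86,400 arithmetic used above), here is a minimal sketch in the TensorFlow 1.x API. The large matmul is only a stand-in for one real training step, and the matrix sizes and iteration count are arbitrary.

```python
import time
import tensorflow as tf
from tensorflow.python.client import device_lib

# Confirm TensorFlow can see the GPU; the GTX 1060 should show up as
# something like '/device:GPU:0' next to the CPU device.
print([d.name for d in device_lib.list_local_devices()])

# Stand-in workload: a large matmul instead of a real training step.
a = tf.random_normal([2000, 2000])
b = tf.random_normal([2000, 2000])
product = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(product)          # warm-up run (graph setup, GPU initialization)
    n = 50
    start = time.time()
    for _ in range(n):
        sess.run(product)
    per_iter = (time.time() - start) / n

print("seconds per iteration: %.3f" % per_iter)
print("days for 100K iterations: %.2f" % (per_iter * 100000 / 86400.0))
```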

However, a word of caution: I was hit with the exploding/vanishing gradient problem. The TensorFlow/GPU combination clearly handles floating-point calculations differently than the CPU-only setup, since I did not encounter this issue on the CPU. Common ways to address exploding/vanishing gradients are tweaking the learning rate or trying different weight initialization parameters. Reducing the learning rate worked for me.
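
For reference, here is a minimal sketch (again TensorFlow 1.x style) of what the fix looks like in code: a reduced learning rate, with global-norm gradient clipping added as a second, common safeguard against exploding gradients. The tiny placeholder model and the specific values (1e-4 learning rate, clip norm of 5.0) are illustrative only, not what the license plate code actually uses.

```python
import tensorflow as tf

# Tiny placeholder model; the point here is the optimizer setup, not the model.
x = tf.placeholder(tf.float32, [None, 128])
y = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Reducing the learning rate is what worked for me; 1e-4 is just an example value.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)

# Optional extra safeguard: clip the global gradient norm before applying updates.
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped, variables))
```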

I also looked at running training on GPU instances in the cloud; however, the cost seemed too high for now: ~$100-$200 per month for partial usage. By comparison, I was able to upgrade my existing computer for a one-time $250 to get better performance.

For DL to become ubiquitous, independent developers need access to more affordable computing resources. For now, a personal computer with a consumer-grade GPU appears to be the way to go for independent developers like me, until cloud pricing comes down.
