Reading a Live Ticker Part II: Creating a Dataset

The task of reading numbers on a computer screen is quite close to the standard ML task of identifying handwritten digits with the MNIST dataset.

However, this dataset is not suited to our task. In MNIST, the digit “1” is almost always written as a single stroke, like a lowercase “L” (“l”). Many fonts draw a “1” differently, for example with a flag at the top or a serif at the base.

This leaves only one option: create our own dataset that shows digits the way they look on a computer screen.

I created the CPD dataset (Computer Printed Digits dataset) with Apache OpenOffice and Python. It consists of 17700 screenshots of the digits 0-9 in different fonts, on backgrounds in different shades of gray. The varied backgrounds help make any neural network trained on the data more robust and versatile in its application.
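To give a rough idea of how such a dataset is structured, here is a minimal sketch that only plans the images: one record for every combination of digit, font, and background shade. It is an illustration, not the script from the repository. The font names, gray levels, file naming scheme, and the function `build_manifest` are all assumptions made up for this example, and the rendering and screenshot steps are left out.

```python
from itertools import product

# Hypothetical values for illustration only; the fonts and gray levels
# actually used for the CPD dataset are not listed in this article.
DIGITS = range(10)
FONTS = ["Arial", "Courier New", "Times New Roman"]
GRAY_LEVELS = [64, 128, 192, 255]  # background shade, 0 = black, 255 = white


def build_manifest(digits=DIGITS, fonts=FONTS, gray_levels=GRAY_LEVELS):
    """Return one record per (digit, font, background) combination."""
    manifest = []
    for digit, font, gray in product(digits, fonts, gray_levels):
        manifest.append({
            "label": digit,          # the class a network should predict
            "font": font,
            "background": gray,
            # Assumed naming scheme: <digit>_<font>_<gray>.png
            "filename": f"{digit}_{font.replace(' ', '_')}_{gray:03d}.png",
        })
    return manifest


if __name__ == "__main__":
    records = build_manifest()
    print(len(records), "images planned")
    print(records[0])
```

The number of images is simply the product of the three list lengths, so adding fonts or gray levels grows the dataset quickly. Keeping the label in each record (and in the file name) makes it easy to pair every image with its digit later on.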

The details are published in my GitHub repository.

You can either download the dataset I created (17700 images) or follow the steps there to create your own.