Deep neural networks (DNNs) have been used in commercial image classification applications for some time now, with varying degrees of success. Many of these applications assign class labels and confidence scores (shirt 90%, TV 80%, book 50%, …) to the images they classify.
Most of the time, the performance of the DNN models I build is satisfactory. However, I’m really interested in understanding how a DNN can confuse unrelated objects. For example, one mistook running shoes for a water bottle in a dataset I was working on recently.
I used three DNN models: ResNet50, InceptionV3 and Xception, all pre-trained on the ImageNet dataset. ImageNet is a research project to develop a large image dataset with annotations, such as standard labels and descriptions. The dataset has been used in the annual ILSVRC image classification challenge. A few of the winners shared their pre-trained models with the research community, and I used them here.
The three models were used to classify images from ads posted on Avito. Here is how they fared. I’m not drawing conclusions from this simple example; the objective is to show the three most likely classes for each image, since most of the time we only see the top one.
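For readers who want to try a similar comparison, here is a minimal sketch of one way to get the top three classes from each model. It assumes the Keras versions of the pre-trained models (`tensorflow.keras.applications`), which is not necessarily the setup used for the results in this post. The helper names `format_top` and `classify` and the `INPUT_SIZES` table are made up for illustration.

```python
# Input resolution each architecture expects by default in Keras.
INPUT_SIZES = {
    "ResNet50": (224, 224),
    "InceptionV3": (299, 299),
    "Xception": (299, 299),
}


def format_top(decoded, k=3):
    """Format (class_id, label, score) tuples as 'label 90%, label 80%, ...'."""
    ranked = sorted(decoded, key=lambda item: item[2], reverse=True)[:k]
    return ", ".join(f"{label} {float(score):.0%}" for _, label, score in ranked)


def classify(image_path, k=3):
    """Return {model_name: formatted top-k classes} for a single image file."""
    # Imported lazily so format_top can be used without TensorFlow installed.
    import numpy as np
    from tensorflow.keras import applications as apps
    from tensorflow.keras.utils import load_img, img_to_array

    # Each submodule bundles the model class with its own preprocessing
    # and ImageNet label decoding.
    modules = {
        "ResNet50": apps.resnet50,
        "InceptionV3": apps.inception_v3,
        "Xception": apps.xception,
    }

    results = {}
    for name, module in modules.items():
        # Downloads the ImageNet weights on first use.
        model = getattr(module, name)(weights="imagenet")
        image = load_img(image_path, target_size=INPUT_SIZES[name])
        batch = np.expand_dims(img_to_array(image), axis=0)
        batch = module.preprocess_input(batch)
        decoded = module.decode_predictions(model.predict(batch), top=k)[0]
        results[name] = format_top(decoded, k)
    return results
```

Calling `classify("shoes.jpg")` (a hypothetical file name) would return one line of labels and scores per model, which makes it easy to see where the three networks agree and where one of them drifts toward an unrelated class.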