## Looking at the learned representation: open set recognition in the embedding space

If you are keeping up with data science and machine learning, you probably know that in recent years, deep neural networks have revolutionized artificial intelligence and computer vision.

They can learn to turn winter landscapes into summer landscapes, put zebra stripes on a horse, learn semantic relationships between words with basically no supervision, generate photorealistic images from sketches, and perform many more amazing feats.

The technology has advanced so far that basically everyone with a notebook can use and build neural network architectures capable of previously unattainable feats.

Many open source deep learning frameworks, such as TensorFlow and PyTorch, are available, bringing this amazing technology within arm’s reach.

However, besides improving performance, there is an equally significant area of deep learning research concerned with understanding how neural networks perceive the world and how a model can generalize beyond what is known.

A naive approach would be to threshold the Softmax probabilities: if the probability of the predicted class is below a given threshold, say 0.5, we reject the item as unknown.
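As a minimal sketch of this naive rule, assuming the network hands us a list of raw class scores (logits), the rejection logic could look like the following. The function names and the toy scores are made up for illustration, and plain Python is used only to keep the example self-contained; in practice you would work with your framework's tensors.

```python
import math

def softmax(logits):
    """Numerically stable Softmax over a list of raw class scores."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_reject(logits, threshold=0.5):
    """Return the predicted class index, or None if the top probability is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

# Hypothetical scores: one class clearly dominates, so the item is accepted.
confident = predict_or_reject([4.0, 0.5, 0.1])
# Hypothetical scores: nearly tied, so the item is rejected as unknown.
uncertain = predict_or_reject([1.0, 0.9, 1.1])
```

Note that the rule only looks at the largest probability, so any input that produces one dominant score is accepted, whether or not it belongs to a known class.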

The prediction probabilities provided by our network may be wrong, but looking at the distribution of the points, it is clear that there is probably a new class.

From the distribution of the activation vector values, we can tell whether a new data point is novel, that is, unlike anything the network has learned.
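One simple way to act on this idea, sketched below under strong assumptions, is to compare a new activation vector with the mean activation vector of each known class and to flag it when it is far from all of them. The fixed `max_distance` threshold, the helper names and the toy numbers are all hypothetical; a fixed cut-off is a deliberate simplification standing in for any statistical modelling of the distances.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_novel(activation, class_means, max_distance):
    """Flag an activation vector as novel if it is far from every class mean."""
    nearest = min(euclidean(activation, mu) for mu in class_means.values())
    return nearest > max_distance

# Toy activation vectors for two known classes (made-up numbers).
train_activations = {
    "cat": [[5.0, 0.2], [4.6, 0.1], [5.4, 0.3]],
    "dog": [[0.1, 5.1], [0.3, 4.7], [0.2, 5.2]],
}
class_means = {label: mean_vector(vs) for label, vs in train_activations.items()}

known_like = is_novel([4.9, 0.2], class_means, max_distance=1.0)  # close to "cat"
far_away = is_novel([2.5, 2.5], class_means, max_distance=1.0)    # far from both clusters
```

Here the first point sits inside the "cat" cluster and is kept, while the second lies between the two clusters and is flagged as novel.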

In their experiments, they used an ImageNet pretrained model and fed it real, fooling, and open set images (that is, images from a class unseen in training).

With this method, they were able to improve the open set detection accuracy compared to the naive method of thresholding Softmax probabilities.

If you could see all of the vectors corresponding to our training dataset in this high dimensional space, you would probably think that the subsets corresponding to each class are pretty wild.

However, with each layer, the network successively transforms the data into a more and more tangible representation, ultimately outputting a very simple one: a low dimensional simplex (the generalization of a triangle to multiple dimensions), with each vertex corresponding to a class.
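To see what living on a simplex means in practice, here is a small sketch for three classes, where the simplex is an ordinary triangle. The logits are made-up values; the point is only that every Softmax output has non-negative entries that sum to one, and that a confident prediction lands near a vertex.

```python
import math

def softmax(logits):
    """Numerically stable Softmax over a list of raw class scores."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

ambiguous = softmax([0.0, 0.0, 0.0])   # equal scores: the centre of the triangle
confident = softmax([12.0, 0.0, 0.0])  # one dominant score: very close to the vertex (1, 0, 0)

# Both points satisfy the two conditions that define the simplex.
for point in (ambiguous, confident):
    assert all(p >= 0.0 for p in point)
    assert abs(sum(point) - 1.0) < 1e-9
```

Since there is no vertex for "none of the above", every input, known or not, has to land somewhere on this triangle.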

They introduce a new loss, the ii-loss, that pushes clusters further from each other while squeezing each cluster together. The ii-loss can be attached to the embedding layer, giving rise to the following setups.
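To make the two competing terms concrete, here is a minimal sketch of a loss of this kind: the average squared distance of each embedding to its class mean (the spread inside clusters) minus the squared distance between the two closest class means (the separation between clusters). The exact form of both terms is my assumption about the formulation rather than a quote from the paper, the helper names are hypothetical, and at least two classes must be present in the batch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_means(embeddings, labels):
    """Mean embedding of each class."""
    groups = {}
    for z, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(z)
    return {y: [sum(col) / len(zs) for col in zip(*zs)] for y, zs in groups.items()}

def ii_loss(embeddings, labels):
    """Intra-class spread minus inter-class separation (lower is better)."""
    means = class_means(embeddings, labels)
    # Intra-class spread: average squared distance of each point to its class mean.
    intra = sum(squared_distance(z, means[y]) for z, y in zip(embeddings, labels)) / len(embeddings)
    # Inter-class separation: squared distance between the two closest class means.
    keys = sorted(means)
    inter = min(squared_distance(means[a], means[b])
                for i, a in enumerate(keys) for b in keys[i + 1:])
    return intra - inter

labels = [0, 0, 1, 1]
tight_and_far = ii_loss([[0.0, 0.0], [0.0, 0.2], [10.0, 0.0], [10.0, 0.2]], labels)
loose_and_close = ii_loss([[0.0, 0.0], [0.0, 2.0], [1.0, 0.0], [1.0, 2.0]], labels)
```

Compact, well separated clusters give a much lower value than loose, overlapping ones, which is exactly the geometry that leaves room for unknown examples between the clusters.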

We have seen in the very first example that uncertainty is not a good way to recognize open set examples: a network can classify an unknown example as a member of a known class with very high confidence.

The recent paper Reducing Network Agnostophobia introduces two new loss functions to disentangle unknown examples from the known ones: the Entropic Open Set loss and the Objectosphere loss.

The authors' first observation came from visualizing the feature representation of MNIST digits and Devanagari handwritten characters for a network trained exclusively on MNIST: the unknown examples tend to be somewhat clustered around the origin.

To use this to their advantage, they introduced a new loss called the Entropic Open Set loss, which drives the Softmax scores of unknown instances toward the uniform probability distribution.
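A per-sample sketch of such a loss, as I understand it, is shown below: known samples get the ordinary cross-entropy, while unknown samples get the average negative log-probability over all classes, which is smallest when the Softmax output is uniform. Marking unknown samples with `label=None` and the function names are conventions invented for this example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-Softmax over a list of raw class scores."""
    shift = max(logits)
    log_total = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return [z - log_total for z in logits]

def entropic_open_set_loss(logits, label):
    """label is a class index for known samples and None for unknown samples."""
    log_probs = log_softmax(logits)
    if label is None:
        # Unknown sample: average negative log-probability over all classes,
        # minimised when the Softmax output is uniform.
        return -sum(log_probs) / len(log_probs)
    # Known sample: ordinary cross-entropy.
    return -log_probs[label]

uniform_unknown = entropic_open_set_loss([0.0, 0.0, 0.0], None)  # already uniform
peaked_unknown = entropic_open_set_loss([5.0, 0.0, 0.0], None)   # confidently "known"
known_correct = entropic_open_set_loss([5.0, 0.0, 0.0], 0)       # known sample, right class
```

For three classes the unknown-sample loss bottoms out at log 3 for the uniform output, and it grows as the network becomes more confident about an unknown input.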

The Entropic Open Set loss doesn’t directly influence feature magnitude: unknown samples with close to optimal scores can still have features of large magnitude, as long as their activation scores are similar for each class.

As we can see, the second term forces samples in the known classes to have a large magnitude (top row), while forcing unknown samples to have a small magnitude (bottom row).
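The magnitude term on its own can be sketched as follows, assuming it is added to the Entropic Open Set loss with some weighting factor. Known samples are penalised only while their feature norm is below a target value, and unknown samples are penalised for any non-zero norm. The name `min_magnitude` for that target and the toy feature vectors are hypothetical.

```python
import math

def magnitude_penalty(features, is_known, min_magnitude):
    """Second (magnitude) term of an Objectosphere-style loss for one sample."""
    norm = math.sqrt(sum(f * f for f in features))
    if is_known:
        # Known sample: penalise only if the feature norm is below the target.
        return max(min_magnitude - norm, 0.0) ** 2
    # Unknown sample: penalise any non-zero feature norm.
    return norm ** 2

short_known = magnitude_penalty([3.0, 4.0], is_known=True, min_magnitude=10.0)   # norm 5, too small
long_known = magnitude_penalty([6.0, 8.0], is_known=True, min_magnitude=10.0)    # norm 10, no penalty
large_unknown = magnitude_penalty([3.0, 4.0], is_known=False, min_magnitude=10.0)
zero_unknown = magnitude_penalty([0.0, 0.0], is_known=False, min_magnitude=10.0)
```

The same feature vector is penalised in both cases here, but for opposite reasons: as a known sample it is too short, and as an unknown sample it is too long.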

In my personal opinion, the truly interesting questions about deep learning start when we are looking at how a neural network can generalize beyond the known.

## Everyday AI Experiment: Six Lessons Learned From 7 AI Experiments

Over the past three years, I tried out multiple AI-enabled applications to increase productivity and save time.

As the power behind a variety of technology applications, AI has become part of our daily lives through tools such as the navigation app you use in your car, the smart speaker in your kitchen, or Siri on your iPhone.

Regardless of the technical terms, when I refer to AI, I’m referring to the capability of machines to perform cognitive functions more typically associated with human abilities.

Businesses are increasing productivity by utilizing AI to automate repetitive tasks, generate new insights, and change how work gets done.

A couple of years ago, JP Morgan Chase built its own AI, called COIN (short for Contract Intelligence), to review and interpret commercial finance contract documents.

COIN was able to complete, in a matter of seconds, what previously required the work of a team of legal staff and 350,000 hours over the course of a year.

Not only did this automation yield significant time savings and free up legal staff to focus on more important work, but COIN enabled a decrease in the number of errors stemming from human interpretation.

Today, leveraging this technology is about narrowly applied AI: the ability of machines to handle tasks in a manner that replicates—or even surpasses—human intelligence, harnessing the ability to perform these cognitive tasks consistently and at high speed.

While examining my business processes and work tasks, any time I’d find a repeatable process that I didn’t want to do, I would ask myself, “Is there an AI app for that?”

This one question has led to dozens of experiments, for all sorts of tasks ranging from navigation, to transcription, to building web pages, to improving my focus and staying on top of my bookkeeping.

