How Do Repair Ground Wire Cut Short Of Box

Anchor Boxes — The cardinal to quality object detection

One of the hardest concepts to grasp when learning virtually Convolutional Neural Networks for object detection is the thought of anchor boxes. Information technology is also i of the virtually important parameters you can tune for improved performance on your dataset. In fact, if anchor boxes are non tuned correctly, your neural network will never even know that certain small, large or irregular objects be and will never have a chance to detect them. Luckily, in that location are some elementary steps you can take to make sure you do not fall into this trap.

What are anchor boxes?

When you employ a neural network similar YOLO or SDD to predict multiple objects in a picture, the network is actually making thousands of predictions and but showing the ones that it decided were an object. The multiple predictions are output with the following format:

Prediction 1: (Ten, Y, Height, Width), Class
….
Prediction ~lxxx,000: (X, Y, Height, Width), Class

Where the(X, Y, Height, Width) is called the "bounding box", or box surrounding the objects. This box and the object class are labelled manually past human annotators.

In an extremely simplified example, imagine that we have a model that has ii predictions and receives the following paradigm:

We need to tell our network if each of its predictions is correct or non in order for it to be able to learn. But what do nosotros tell the neural network information technology prediction should be? Should the predicted grade be:

Prediction 1: Pear
Prediction 2: Apple

Or should it be:

Prediction 1: Apple tree
Prediction 2: Pear

What if the network predicts:

Prediction 1: Apple
Prediction 2: Apple

We demand our network's two predictors to exist able to tell whether it is their job to predict the pear or the apple. To practice this at that place are a several tools. Predictors tin can specialize in certain size objects, objects with a certain aspect ratio (tall vs. wide), or objects in dissimilar parts of the paradigm. Most networks utilise all three criteria. In our example of the pear/apple image, we could accept Prediction 1 be for objects on the left and Prediction 2 for objects on the right side of the image. And then nosotros would have our answer for what the network should be predicting:

Prediction 1: Pear
Prediction 2: Apple tree

Anchor Boxes in Exercise

State of the art object detection systems currently do the following:

1. Create thousands of "ballast boxes" or "prior boxes" for each predictor that represent the ideal location, shape and size of the object it specializes in predicting.

2. For each anchor box, calculate which object'due south bounding box has the highest overlap divided past not-overlap. This is called Intersection Over Union or IOU.

three. If the highest IOU is greater than 50%, tell the anchor box that it should find the object that gave the highest IOU.

4. Otherwise if the IOU is greater than xl%, tell the neural network that the true detection is ambiguous and non to learn from that example.

five. If the highest IOU is less than 40%, and so the anchor box should predict that there is no object.

This works well in practice and the thousands of predictors do a very skillful job of deciding whether their type of object appears in an epitome. Taking a look at an open source implementation of RetinaNet, a state-of-the-fine art object detector, nosotros can visualize the anchor boxes. There are too many to visualize all at once, however here are just 1% of them:

Using the default ballast box configuration tin can create predictors that are besides specialized and objects that appear in the image may non achieve an IOU of l% with any of the ballast boxes. In this case, the neural network will never know these objects existed and will never learn to predict them. We can tweak our anchor boxes to be much smaller, such as this 1% sample:

In the RetinaNet configuration, the smallest anchor box size is 32x32. This means that many objects smaller than this volition get undetected. Here is an example from the WiderFace dataset (Yang, Shuo and Luo, Ping and Loy, Chen Modify and Tang, Xiaoou) where we match bounding boxes to their respective anchor boxes, but some fall through the cracks:

In this example, only four of the footing truth bounding boxes overlap with any of the anchor boxes. The neural network will never learn to predict the other faces. Nosotros can fix this by changing our default ballast box configurations. Reducing the smallest anchor box size, all of the faces line up with at least 1 of our anchor boxes and our neural network tin learn to detect them!

Improving Anchor Box Configuration

Every bit a general rule, you should inquire yourself the following questions virtually your dataset before diving into training your model:

What is the smallest size box I desire to be able to discover?
What is the largest size box I want to exist able to detect?
What are the shapes the box can take? For example, a car detector might have curt and wide ballast boxes as long as in that location is no chance of the car or the camera being turned on its side.

Y'all tin can get a rough gauge of these by actually calculating the nigh extreme sizes and aspect ratios in the dataset. YOLO v3, some other object detector, uses One thousand-means to estimate the platonic bounding boxes. Another option is to acquire the ballast box configuration.

One time you have idea through these questions yous tin can kickoff designing your anchor boxes. Be certain to exam them by encoding your ground truth bounding boxes and so decoding them every bit though they were predictions from your model. You should be able to recover the ground truth bounding boxes.

As well, remember that if the centre of the bounding box and ballast box differ, this volition reduce the IOU. Even if you accept small anchor boxes, you may miss some ground truth boxes if the stride between anchor boxes is wide. One way to amend this is to lower the IOU threshold from 50% to twoscore%.

A recent article by David Pacassi Torrico comparison current API implementations of face detection highlights the importance of correctly specifying ballast boxes. You lot can see that the algorithms practice well except for pocket-sized faces. Below are some pictures where an API failed to detect any faces at all, but many were detected with our new model: