How Google Cracked House Number Identification in Street View

Google Street View has become an essential part of the online mapping experience. It allows users to drop down to street level to see the local area in photographic detail.

But it’s a useful resource for Google as well. The company uses the images to read house numbers and match them to their geolocation, which pins down the physical position of each building in its database.

That’s particularly useful in places where street numbers are otherwise unavailable, or in places such as Japan and South Korea, where buildings are rarely numbered sequentially along a street but in other ways, such as the order in which they were constructed. That system makes many buildings impossibly hard to find, even for locals.

But the task of spotting and identifying these numbers is hugely time-consuming. Google’s Street View cameras have recorded hundreds of millions of panoramic images that together contain tens of millions of house numbers. The task of searching these images manually to spot and identify the numbers is not one anybody could approach with relish.

So, naturally, Google has solved the problem by automating it. And today, Ian Goodfellow and pals at the company reveal how they’ve done it. Their method turns out to rely on a neural network containing 11 levels of neurons, which they have trained to spot numbers in images.

To start with, Goodfellow and co place some limits on the task at hand to keep it as simple as possible. For example, they assume that the building number has already been spotted and the image cropped so that the number is at least one third the width of the resulting frame. They also assume that the number is no more than five digits long, a reasonable assumption in most parts of the world.
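The two limits above are simple enough to write down. What follows is a minimal sketch in Python, not the team's code: the function names, the pixel-width arguments and the filler value `PAD` are all hypothetical, chosen only to show how a number of at most five digits might be packed into a fixed-size target and how the one-third-width rule might be checked.

```python
MAX_DIGITS = 5  # the article's stated upper bound on number length
PAD = 10        # hypothetical filler class for unused digit slots (an assumption)


def crop_is_usable(number_width_px: int, frame_width_px: int) -> bool:
    """Check that the number spans at least one third of the cropped frame."""
    return 3 * number_width_px >= frame_width_px


def encode_house_number(number: str) -> tuple[int, list[int]]:
    """Turn a digit string into (length, five digit slots padded with PAD)."""
    if not (number.isascii() and number.isdigit()):
        raise ValueError("house number must be a non-empty string of digits 0-9")
    if len(number) > MAX_DIGITS:
        raise ValueError(f"house number is longer than {MAX_DIGITS} digits")
    digits = [int(ch) for ch in number]
    padding = [PAD] * (MAX_DIGITS - len(digits))
    return len(digits), digits + padding


print(encode_house_number("742"))  # (3, [7, 4, 2, 10, 10])
print(crop_is_usable(40, 120))     # True: 40 px is exactly one third of 120 px
```

Fixing the length in advance is what lets a single network with a fixed number of outputs handle "7" and "74210" alike.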

But the team does not divide the number into single digits, as many other groups have done. Their approach is to localise the entire number within the cropped image and to identify it in one go—all with a single neural network.
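The article does not say how the network's output is arranged, so the following is an illustration only. One plausible arrangement, assumed here, is that the net emits a probability for each possible length (0 to 5) plus a probability for each digit at each of five positions. The hypothetical `decode_house_number` function then picks the length and digits that are jointly most likely, reading the whole number out in one pass rather than classifying separately cropped digits.

```python
def decode_house_number(length_probs: list[float],
                        digit_probs: list[list[float]]) -> str:
    """Pick the most probable whole number from per-position outputs.

    length_probs: probabilities for lengths 0..5 (assumed layout).
    digit_probs: five lists of 10 probabilities, one list per digit position.
    """
    # Most likely digit at each position, independent of length.
    best_digits = [max(range(10), key=probs.__getitem__) for probs in digit_probs]

    best_score, best_length = -1.0, 0
    for length, p_length in enumerate(length_probs):
        # Joint score: P(length) times P(best digit) for each used position.
        score = p_length
        for position in range(length):
            score *= digit_probs[position][best_digits[position]]
        if score > best_score:
            best_score, best_length = score, length

    return "".join(str(d) for d in best_digits[:best_length])
```

Because the length is scored together with the digits, a confident digit in an unused slot cannot add a spurious extra digit to the result unless the length output agrees.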

They train this net using images drawn from the Street View House Numbers dataset, which contains some 200,000 numbers captured by Google’s Street View cameras and made publicly available. The training takes about six days to complete, they say.