Have you ever tried to guess where a picture was taken? We are not talking about shots of Tower Bridge or the Empire State Building. Pictures taken indoors, with no obvious location clues, are much harder to place. Over the years, humans have accumulated knowledge about the world and its regions, including languages, the direction of traffic, architectural styles and types of vegetation. All of this makes working out where a photo was taken a bit easier.
Picture analysis has taken a big step forward thanks to Tobias Weyand, a computer vision expert at Google, and several of his colleagues. With access to a huge amount of data about our world, they trained a neural network to work out where a picture was taken using only the information in the image itself. With this approach, for example, details in a photo of your pet could be used to determine where it was shot.
How did they teach a neural network to determine locations? The method is conceptually straightforward. The first step was to divide the world map into a grid of 26,000 squares of varying size. The size of each square depends on the number of images taken in that location: big cities, where people take more photos, get a finer grid, while remote regions, where few pictures are taken, get a coarser one. The grid also leaves out polar regions and oceans.
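To make the idea of a photo-driven grid concrete, here is a minimal sketch of one way to build it. It is not the team's actual partitioning scheme, which the article does not describe. The sketch assumes a plain quadtree over latitude and longitude: any square holding more than `max_photos` pictures is split into four, and squares with no pictures at all are dropped. The names `build_grid`, `contains`, `cell_index` and the parameter values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (latitude, longitude) in degrees
Cell = Tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)

WORLD: Cell = (-90.0, 90.0, -180.0, 180.0)


def contains(cell: Cell, p: Point) -> bool:
    """Half-open test, so a point on a shared edge belongs to one cell only."""
    lat_min, lat_max, lon_min, lon_max = cell
    return lat_min <= p[0] < lat_max and lon_min <= p[1] < lon_max


def build_grid(points: List[Point], cell: Cell = WORLD,
               max_photos: int = 100, max_depth: int = 12) -> List[Cell]:
    """Return leaf cells: dense areas are split further, empty areas dropped."""
    inside = [p for p in points if contains(cell, p)]
    if not inside:
        return []              # no photos here (e.g. open ocean): drop the cell
    if len(inside) <= max_photos or max_depth == 0:
        return [cell]          # few enough photos: keep as a single cell
    lat_min, lat_max, lon_min, lon_max = cell
    lat_mid = (lat_min + lat_max) / 2
    lon_mid = (lon_min + lon_max) / 2
    quadrants = [
        (lat_min, lat_mid, lon_min, lon_mid),
        (lat_min, lat_mid, lon_mid, lon_max),
        (lat_mid, lat_max, lon_min, lon_mid),
        (lat_mid, lat_max, lon_mid, lon_max),
    ]
    leaves: List[Cell] = []
    for quadrant in quadrants:
        leaves.extend(build_grid(inside, quadrant, max_photos, max_depth - 1))
    return leaves


def cell_index(cells: List[Cell], p: Point) -> int:
    """Index of the cell containing p, or -1 if p falls outside the grid."""
    for i, cell in enumerate(cells):
        if contains(cell, p):
            return i
    return -1


# Toy data: eight photos clustered in London, one in the Australian outback.
london = [(51.50 + 0.01 * i, -0.10 + 0.01 * i) for i in range(8)]
outback = [(-25.0, 135.0)]
cells = build_grid(london + outback, max_photos=2, max_depth=20)
```

With this toy data the London cluster ends up in several small squares, the lone outback photo gets one huge square, and a point in the middle of the Pacific maps to no square at all. The index returned by `cell_index` is the kind of label each training photo would carry.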
The next step was to build a database of about 126 million pictures from the Internet that came with geolocation information. Using that information, the team determined which grid square each picture was taken in. They then used 91 million of these images to train the neural network. Given a picture, the network outputs either a specific grid square or a list of likely candidates.
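The "one square or a list of candidates" output can be pictured as a ranking over grid squares. The sketch below assumes the network produces one raw score per square and that these scores are turned into probabilities with a softmax, which is the standard choice for classification but is not spelled out in the article. The function names, the `confident` cutoff and the value of `k` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_candidates(scores: List[float], k: int = 3) -> List[Tuple[int, float]]:
    """Return the k most probable (square index, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def predict(scores: List[float], confident: float = 0.5,
            k: int = 3) -> List[Tuple[int, float]]:
    """One square if the model is confident, otherwise a short candidate list."""
    ranked = top_candidates(scores, k)
    if ranked[0][1] >= confident:           # hypothetical confidence cutoff
        return ranked[:1]
    return ranked
```

A picture with one dominant score, such as `predict([5.0, 0.0, 0.0])`, yields a single square, while an ambiguous one, such as `predict([1.0, 1.0, 1.0])`, yields three equally likely candidates.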
The other 34 million pictures were used to validate the network, which Weyand and his colleagues called PlaNet. After validation, PlaNet underwent several tests to prove its effectiveness. The first test was simple: the team fed PlaNet over 2 million Flickr pictures to see whether it could determine their locations. The results were encouraging:
- PlaNet localized 3.6% of the images with street-level accuracy;
- 10.1% of the images were localized with city-level accuracy;
- It identified the country of origin for 28.4% of the photos;
- For the continent where the picture was taken, the number jumped to 48%.
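Percentages like these are typically computed by checking how far each guess lands from the true location. The sketch below assumes a guess counts as correct at a given level when its error is within a fixed radius. The radii used here (1 km for street, 25 km for city, 750 km for country, 2,500 km for continent) are illustrative assumptions, since the article does not state the cutoffs, and `accuracy_by_scale` is a hypothetical name.

```python
from typing import Dict, List

# Assumed radii in kilometres; the article does not give the actual cutoffs.
THRESHOLDS_KM: Dict[str, float] = {
    "street": 1.0,
    "city": 25.0,
    "country": 750.0,
    "continent": 2500.0,
}


def accuracy_by_scale(errors_km: List[float],
                      thresholds: Dict[str, float] = THRESHOLDS_KM
                      ) -> Dict[str, float]:
    """Fraction of guesses whose error is within each radius."""
    n = len(errors_km)
    return {
        name: sum(error <= limit for error in errors_km) / n
        for name, limit in thresholds.items()
    }


# Five made-up localization errors, in kilometres.
report = accuracy_by_scale([0.5, 10.0, 500.0, 2000.0, 9000.0])
```

For these five made-up errors, `report` gives 0.2 for street, 0.4 for city, 0.6 for country and 0.8 for continent.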
The second test was more entertaining: a competition between PlaNet and 10 well-traveled people. It took the form of an online game that asks the player to pinpoint the location of a random Google Street View photo on a map of the whole world. PlaNet beat the humans: it won 28 of the 50 rounds, with a median localization error of 703.2 miles, compared with 1,442 miles for the human players.
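Scoring such a contest comes down to measuring the distance between each guess and the true spot. The sketch below is one plausible way to do it, not the procedure used in the actual test: it assumes great-circle (haversine) distance on a spherical Earth, gives a round to whoever is closer, and summarizes each player by the median error in miles. The function names and the sample coordinates are made up for illustration.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]   # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0      # mean Earth radius, spherical approximation
KM_PER_MILE = 1.609344


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def compare(truth: List[Point], machine: List[Point],
            human: List[Point]) -> Tuple[int, float, float]:
    """Return (rounds won by the machine, machine median, human median).

    Both medians are localization errors in miles.
    """
    machine_err = [haversine_km(t, g) for t, g in zip(truth, machine)]
    human_err = [haversine_km(t, g) for t, g in zip(truth, human)]
    wins = sum(m < h for m, h in zip(machine_err, human_err))
    return (wins,
            median(machine_err) / KM_PER_MILE,
            median(human_err) / KM_PER_MILE)


# Two made-up rounds: the machine is closer in both.
truth = [(0.0, 0.0), (10.0, 10.0)]
machine = [(0.0, 1.0), (10.0, 10.0)]
human = [(0.0, 2.0), (20.0, 10.0)]
wins, machine_median, human_median = compare(truth, machine, human)
```

The median is a sensible summary here because a single wildly wrong guess, say the wrong continent, would otherwise dominate an average.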
After teaching PlaNet to handle street photos, Weyand and his colleagues applied it to pictures taken indoors, which contain far fewer location clues, or none at all. The task becomes easier when an image belongs to a specific album: PlaNet can look at the other pictures in the album to infer where the picture was taken.
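The article does not say how the album information is combined, so the sketch below uses a deliberately simple stand-in: average the per-photo probabilities over all pictures in the album and pick the most likely square. This averaging is an assumption made for illustration, not PlaNet's actual mechanism, and `album_prediction` is a hypothetical name.

```python
from typing import List, Tuple


def album_prediction(per_photo_probs: List[List[float]]
                     ) -> Tuple[int, List[float]]:
    """Average probabilities across an album and return the best square."""
    n_photos = len(per_photo_probs)
    n_cells = len(per_photo_probs[0])
    averaged = [
        sum(probs[i] for probs in per_photo_probs) / n_photos
        for i in range(n_cells)
    ]
    best = max(range(n_cells), key=lambda i: averaged[i])
    return best, averaged


# Made-up album over three squares: one ambiguous indoor shot,
# followed by two outdoor shots that clearly point to square 1.
album = [
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
]
best_square, averaged = album_prediction(album)
```

On its own, the indoor shot would be assigned to square 0, but the rest of the album pulls the combined guess to square 1.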
Unlike many comparable systems, PlaNet does not require huge amounts of memory: it needs only 377 MB. That means it could one day become part of smartphone software.