We present a hierarchical, multi-modal approach for placing Flickr videos on the map. Our approach makes use of external resources to identify toponyms in the metadata and of visual and textual features to identify similar content. First, the geographical boundaries extraction method identifies the country and its dimension. We use a database of more than 3.6 million Flickr images to group them together into geographical regions and to build a hierarchical model. A fusion of visual and textual methods is used to classify the videos' location into possible regions. Next, the visually nearest neighbour method finds correspondences with the training images within the preclassified regions. The video sequences are represented using low-level feature vectors from multiple key frames. Each test video is tagged with the geo-information of the visually most similar training item within the regions previously filtered by the pre-classification step. The results show that we are able to tag one third of our videos correctly within an error of 1 km.
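The region-restricted nearest neighbour step and the 1 km error criterion can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the function names, the dictionary layout of the training items, the squared Euclidean distance, the rule of taking the best match over all key frames, and the toy feature vectors and coordinates.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def place_video(keyframe_features, candidate_regions, training_items):
    """Return the (lat, lon) of the training item closest to any key frame,
    considering only items inside the pre-classified candidate regions."""
    best_dist, best_loc = float("inf"), None
    for item in training_items:
        if item["region"] not in candidate_regions:
            continue  # discarded by the pre-classification step
        for frame in keyframe_features:
            d = squared_distance(frame, item["feature"])
            if d < best_dist:
                best_dist, best_loc = d, item["location"]
    return best_loc

# Hypothetical toy data: two-dimensional features and hand-picked coordinates.
training = [
    {"region": "berlin", "feature": [0.10, 0.90], "location": (52.5200, 13.4050)},
    {"region": "berlin", "feature": [0.80, 0.20], "location": (52.5163, 13.3777)},
    {"region": "paris", "feature": [0.79, 0.21], "location": (48.8584, 2.2945)},
]
keyframes = [[0.78, 0.22], [0.50, 0.50]]

estimate = place_video(keyframes, {"berlin"}, training)
unfiltered = place_video(keyframes, {"berlin", "paris"}, training)
error_km = haversine_km(estimate, (52.5186, 13.3762))  # assumed ground truth
within_1km = error_km <= 1.0
```

In this toy setup the visually closest item overall lies in the "paris" region, so the unfiltered search returns a location far from the assumed ground truth, while restricting the search to the pre-classified region keeps the estimate within 1 km.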
These days the sharing of photographs and videos is very popular in social networks. Many social media websites such as Flickr, Facebook and YouTube allow users to manually label their uploaded videos with geo-information using an interface for dragging them onto the map. However, manual labelling of a large set of social media items is still tedious and error-prone. For this reason we present a hierarchical, multi-modal approach for estimating the GPS information. Our approach makes use of external resources like gazetteers to extract toponyms in the metadata and of visual and textual features to identify similar content. First, the national borders detection recognizes the country and its dimension to speed up the estimation and to eliminate geographical ambiguity. Next, we use a database of more than 3.2 million Flickr images to group them together into geographical regions and to build a hierarchical model. A fusion of visual and textual methods for different granularities is used to classify the videos' location into possible regions. Each test video is tagged with the geo-information of the most similar training image within the regions previously filtered by the probabilistic model. In comparison with existing GPS estimation and image retrieval approaches at the Placing Task 2011, we show the effectiveness and high accuracy of our approach relative to state-of-the-art solutions.
Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora
Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation. Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, volume 978-1-4503-1698-9/12/11, 06.11.2012 - 09.11.2012, pp. 8