Author Archives: Giorgos Giannopoulos

FAGI-gis: fusing geospatial RDF data

GeoKnow introduces the latest version of FAGI-gis, a framework for fusing Linked Data that focuses on the geospatial properties of the linked entities. FAGI-gis receives as input two datasets (through SPARQL endpoints) and a set of links that interlink entities between the datasets, and produces a new dataset in which each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties of two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties.
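The following minimal Python sketch makes per-property fusion concrete. It is not FAGI-gis code. It assumes three things: entities are flat property/value maps, schema matching has already produced (left, right) property pairs, and a fusion action is a function of the two values. The action names (`keep_left`, `keep_longer`, etc.) and the helpers `fuse_pair` and `fuse_datasets` are hypothetical. Geometry-aware actions, which operate on the shapes themselves rather than on their WKT strings, are left out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical simplification: an entity is a flat property -> value map.
Entity = Dict[str, str]

# Hypothetical fusion actions; these names are illustrative, not FAGI-gis identifiers.
FUSION_ACTIONS: Dict[str, Callable[[str, str], str]] = {
    "keep_left": lambda left, right: left,
    "keep_right": lambda left, right: right,
    "keep_longer": lambda left, right: left if len(left) >= len(right) else right,
    "concatenate": lambda left, right: f"{left} | {right}",
}


def fuse_pair(left: Entity, right: Entity,
              matched_properties: List[Tuple[str, str]],
              actions: Dict[str, str]) -> Entity:
    """Fuse one pair of linked entities into a single entity.

    matched_properties holds (left_property, right_property) pairs from schema
    matching; actions maps a left property to a fusion action name
    (defaulting to "keep_left").
    """
    fused: Entity = {}
    for left_prop, right_prop in matched_properties:
        in_left, in_right = left_prop in left, right_prop in right
        if in_left and in_right:
            action = FUSION_ACTIONS[actions.get(left_prop, "keep_left")]
            fused[left_prop] = action(left[left_prop], right[right_prop])
        elif in_left:
            fused[left_prop] = left[left_prop]  # only one side has a value
        elif in_right:
            fused[left_prop] = right[right_prop]
    return fused


def fuse_datasets(dataset_a: Dict[str, Entity], dataset_b: Dict[str, Entity],
                  links: List[Tuple[str, str]],
                  matched_properties: List[Tuple[str, str]],
                  actions: Dict[str, str]) -> Dict[str, Entity]:
    """Build the new dataset: one fused entity per link, keyed by the left URI."""
    return {uri_a: fuse_pair(dataset_a[uri_a], dataset_b[uri_b],
                             matched_properties, actions)
            for uri_a, uri_b in links}


# Usage with toy data (WKT geometries are treated as plain strings here).
a = {"ex:a1": {"name": "Acropolis", "geom": "POINT(23.7257 37.9715)"}}
b = {"ex:b1": {"label": "Acropolis of Athens", "wkt": "POINT(23.7262 37.9716)"}}
fused = fuse_datasets(a, b, [("ex:a1", "ex:b1")],
                      [("name", "label"), ("geom", "wkt")],
                      {"name": "keep_longer", "geom": "keep_left"})
print(fused)
```

Keying the fused entity by the left-hand URI and property names is an arbitrary choice made for this sketch. A real fusion tool would let the user decide which URIs and vocabulary the output dataset uses.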

The tool offers an interactive interface that visualizes the data at every step of the process. For spatial data in particular, it provides map previews and graphical manipulation of geometries. It also provides advanced fusion functionality, such as batch-mode fusion, clustering of links, link discovery/creation, property matching and property creation.

As the first step of the fusion workflow, the tool allows the user to select and filter the interlinked entities to be loaded for fusion (using the Classes they belong to or SPARQL queries). Then, at the schema matching step, a semi-automatic process facilitates the mapping of entity properties from one dataset to the other. Finally, the fusion panel allows map-based manipulation of geometries and the selection of fusion actions to produce a final entity, in which each pair of matched properties is fused according to the most suitable action.
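A simple suggestion routine can illustrate the semi-automatic schema matching step. It scores every pair of properties by the similarity of their local names and proposes the best match above a threshold for the user to confirm. This is a hypothetical sketch built on Python's `difflib.SequenceMatcher`. It does not describe how FAGI-gis actually ranks property mappings. `suggest_matches`, `local_name` and the 0.6 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def local_name(property_uri: str) -> str:
    """Lower-cased local name: the part after the last '#', '/' or ':'."""
    for sep in ("#", "/", ":"):
        if sep in property_uri:
            property_uri = property_uri.rsplit(sep, 1)[1]
    return property_uri.lower()


def suggest_matches(left_props: List[str], right_props: List[str],
                    threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Suggest (left, right, score) pairs for the user to accept or reject."""
    suggestions: List[Tuple[str, str, float]] = []
    if not right_props:
        return suggestions
    for lp in left_props:
        # Score every right-hand property against this left-hand one.
        scored = [(rp, SequenceMatcher(None, local_name(lp), local_name(rp)).ratio())
                  for rp in right_props]
        best_rp, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            suggestions.append((lp, best_rp, round(best_score, 2)))
    return suggestions


left = ["http://ex.org/a#name", "http://ex.org/a#phone"]
right = ["http://ex.org/b#hasName", "http://ex.org/b#telephone",
         "http://ex.org/b#geometry"]
for suggestion in suggest_matches(left, right):
    print(suggestion)
```

The suggestions are only proposals. In a semi-automatic workflow, the user accepts, rejects or edits each pair before fusion.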

The above process can be enriched with several advanced fusion facilities. The user can cluster the linked entities according to the way they are interlinked, so that different clusters of linked entities can be handled with different fusion actions. Moreover, the user can load unlinked entities and receive recommendations of candidate entities to interlink. Finally, by training on past fusion actions and on OpenStreetMap data, FAGI-gis can recommend suitable fusion actions and OSM Categories (Classes), respectively, for pairs of fused entities.
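Recommending candidate links for unlinked entities could, for example, combine spatial proximity with name similarity. The sketch below shows one hypothetical way to do this; it is not the method FAGI-gis uses. It first discards candidates farther away than a given radius (a simple form of spatial blocking), then ranks the rest by a weighted blend of proximity and name similarity. The entity shape, `recommend_links`, the 500 m radius and the 0.5 weight are all assumptions.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical entity shape: (name, longitude, latitude) in WGS84 degrees.
Place = Tuple[str, float, float]


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres, using a mean Earth radius of 6,371 km."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def recommend_links(entity: Place, candidates: Dict[str, Place],
                    max_distance_m: float = 500.0, name_weight: float = 0.5,
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate URIs within max_distance_m by proximity and name similarity."""
    name, lon, lat = entity
    ranked = []
    for uri, (c_name, c_lon, c_lat) in candidates.items():
        d = haversine_m(lon, lat, c_lon, c_lat)
        if d > max_distance_m:
            continue  # spatial blocking: ignore far-away entities
        proximity = 1.0 - d / max_distance_m  # 1.0 at the same spot, 0.0 at the radius
        similarity = SequenceMatcher(None, name.lower(), c_name.lower()).ratio()
        score = name_weight * similarity + (1 - name_weight) * proximity
        ranked.append((uri, round(score, 3)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


unlinked = ("Temple of Olympian Zeus", 23.7332, 37.9693)
others = {
    "ex:b1": ("Olympieion - Temple of Olympian Zeus", 23.7331, 37.9692),
    "ex:b2": ("Hadrian's Arch", 23.7320, 37.9705),
    "ex:b3": ("Acropolis", 23.7257, 37.9715),  # roughly 700 m away: filtered out
}
result = recommend_links(unlinked, others)
print(result)
```

The radius keeps the comparison cheap and avoids suggesting distant namesakes. The weight controls whether a nearby entity with a different name outranks a farther one with a matching name.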

FAGI-gis is provided as free software and its current version is available from GitHub. An introductory user guide is also available. More detailed information on FAGI-gis is provided in the following documents:


 

OSMRec – A tool for automatic annotation of spatial entities in OpenStreetMap

GeoKnow has recently introduced OSMRec, a JOSM plugin for the automatic annotation of spatial features (entities) in OpenStreetMap. OSMRec trains on existing OSM data and recommends OSM categories to users for annotating newly inserted spatial entities. This is important for two reasons. First, users may not be familiar with the OSM categories. Searching and browsing the OSM category hierarchy for categories that fit the entity they wish to insert can be a time-consuming and frustrating process, to the point that users neglect to add this information. Second, if an existing category that matches the new entity cannot be found quickly and easily, the user may instead resort to a term of his/her own, creating synonyms that later need to be identified and dealt with.

The category recommendation process takes into account the similarity of new spatial entities to existing entities (already annotated with categories) at several levels. These are spatial similarity (e.g. the number of nodes of the feature’s geometry), textual similarity (e.g. common important keywords in the names of the features) and semantic similarity (similarities between the categories that characterize already annotated entities). For each level (spatial, textual, semantic) we define and implement a series of training features that represent spatial entities in a multidimensional space. By training classification models on these features, we are able to correlate the values of the training features with the categories of the spatial entities and, consequently, recommend categories for new features. To this end, we apply multiclass SVM classification, using LIBLINEAR.
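The sketch below illustrates how the three similarity levels can become a training vector, one per entity. It combines three parts:

- spatial: a one-hot bin of the geometry's node count plus a closed-way flag;
- textual: keyword presence in the name;
- semantic: presence of categories already assigned to the entity.

These specific features, bins and vocabularies, along with `build_feature_vector` and `node_count_bin`, are hypothetical choices. OSMRec's actual feature set is described in its own documentation. Vectors of this kind would then be passed to a multiclass linear SVM such as LIBLINEAR.

```python
import re
from typing import Dict, List, Sequence, Tuple

Node = Tuple[float, float]  # (lon, lat) of one geometry node


def node_count_bin(count: int, bins: Sequence[int]) -> int:
    """Index of the first bin whose upper bound is >= count (last index = overflow)."""
    for i, upper in enumerate(bins):
        if count <= upper:
            return i
    return len(bins)


def build_feature_vector(nodes: Sequence[Node], name: str, tags: Dict[str, str],
                         keyword_vocab: List[str], category_vocab: List[str],
                         node_bins: Sequence[int] = (1, 2, 5, 10, 20, 50)) -> List[float]:
    # Spatial level: one-hot bin of the node count, plus a "closed way" flag.
    spatial = [0.0] * (len(node_bins) + 1)
    spatial[node_count_bin(len(nodes), node_bins)] = 1.0
    closed = 1.0 if len(nodes) > 2 and nodes[0] == nodes[-1] else 0.0

    # Textual level: presence of vocabulary keywords among the name's tokens.
    tokens = set(re.findall(r"\w+", name.lower()))
    textual = [1.0 if kw in tokens else 0.0 for kw in keyword_vocab]

    # Semantic level: presence of categories (key=value tags) already on the entity.
    present = {f"{k}={v}" for k, v in tags.items()}
    semantic = [1.0 if cat in present else 0.0 for cat in category_vocab]

    return spatial + [closed] + textual + semantic


vector = build_feature_vector(
    nodes=[(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)],  # closed square, 5 nodes
    name="Central Cafe Athens",
    tags={"building": "yes"},
    keyword_vocab=["cafe", "school", "church"],
    category_vocab=["amenity=cafe", "building=yes"],
)
print(vector)
```

Binning the node count, rather than using the raw number, lets a linear classifier capture non-monotonic patterns. For example, point-like features and large areas can map to quite different categories.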

The following figure shows a screen of OSMRec within JOSM. The user can select an entity or draw a new entity on the map and ask for recommendations by clicking the “Add Recommendation” button. The recommendation panel then opens, and the plugin automatically loads the appropriate recommendation model, which has previously been trained offline.

Recommendation panel

The recommendation panel provides a list of the top-10 recommended categories. The user can select one from this list and click “Add and continue”, which adds the selected category to the OSM tags. Whenever the user adds a new tag to the selected object, a new vector is computed for that OSM instance. The predictions are then recalculated and an updated list of recommendations is displayed, taking into account the previously selected categories/tags as extra training information. Further, OSMRec allows the user to combine several recommendation models, based on (a) a selected geographic area, (b) the user’s past editing history on OSM, or (c) a combination of (a) and (b). This way, OSMRec can provide personalized category recommendations that take into account the user’s editing history and/or the specific characteristics of a geographic area of OSM.
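The top-10 list and the model combination can be sketched as follows. The sketch assumes that each trained model is a one-vs-rest linear classifier (LIBLINEAR's default multiclass strategy), stored as per-category weight vectors and biases. Combining the area model and the user model by a weighted average of their decision values is a hypothetical choice, as are the names `recommend`, `decision_scores` and `area_weight`. OSMRec's own combination scheme may differ. Categories the entity already carries are excluded. Re-running `recommend` on a rebuilt vector after each new tag mirrors the recalculation of predictions described above.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format: category -> (weight vector, bias), one-vs-rest.
LinearModel = Dict[str, Tuple[Sequence[float], float]]


def decision_scores(model: LinearModel, x: Sequence[float]) -> Dict[str, float]:
    """Decision value w_c . x + b_c of each one-vs-rest classifier c."""
    return {cat: sum(w * xi for w, xi in zip(weights, x)) + bias
            for cat, (weights, bias) in model.items()}


def recommend(x: Sequence[float], area_model: LinearModel, user_model: LinearModel,
              area_weight: float = 0.5, top_k: int = 10,
              already_tagged: Sequence[str] = ()) -> List[str]:
    """Top-k categories by a weighted average of two models' decision values."""
    area = decision_scores(area_model, x)
    user = decision_scores(user_model, x)
    combined = {
        # A category unknown to one model contributes 0.0 from that model.
        cat: area_weight * area.get(cat, 0.0) + (1 - area_weight) * user.get(cat, 0.0)
        for cat in set(area) | set(user)
        if cat not in already_tagged
    }
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(combined, key=lambda cat: (-combined[cat], cat))[:top_k]


x = [1.0, 0.0]
area_model = {"amenity=cafe": ([2.0, 0.0], 0.0), "amenity=school": ([0.5, 0.0], 0.0)}
user_model = {"amenity=cafe": ([0.0, 0.0], 0.1), "shop=bakery": ([1.0, 0.0], 0.5)}
print(recommend(x, area_model, user_model, top_k=2))
```

Raw decision values from separately trained models are not necessarily on the same scale. A real combination might therefore calibrate or normalize them first; the plain average keeps the sketch short.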

The OSMRec plugin can be downloaded and installed in JOSM following the standard procedure. Detailed implementation information can be found in the following documents:

GeoKnow Athens Meeting

In the last days of July, the second meeting of the GeoKnow project took place in Athens. GeoKnow members had the opportunity to meet again after the kick-off meeting in Leipzig, discuss the work performed during the first 7 months of the project, and define the next steps. Apart from that, our fellow partners had the chance to stroll around some of the most historic and picturesque sites of Athens, such as the old and the new parliament buildings, the National University of Athens, the Temple of Olympian Zeus, the Monastiraki and Plaka quarters and the Acropolis, and to taste some of the most iconic Greek dishes!

The first part of the meeting focused on the work performed so far. Progress in each Work Package was presented, feeding discussions about (a) integrating the tools currently being developed, (b) utilizing these tools to manage/process use case datasets and (c) resolving research issues that had come up and enhancing the functionality and efficiency of the developed solutions. All partners agreed that the project has advanced significantly since December. The first tools for managing and processing geospatial RDF data have already been developed, and very informative reports on the state of the art in geospatial and RDF data management, benchmarking and system requirements have been published as well.


Several discussions, on both meeting days, revolved around the system architecture and, specifically, the GeoKnow Generator (GKG). Concrete decisions were made about the GKG backend, which will include a set of loosely integrated components for consuming, processing and exposing geospatial Linked Data, based on the Virtuoso RDF store. We also considered issues regarding user management, the GKG front end, workflow processing and the implementation of the gathered system and user requirements.

With respect to geospatial information management, GeoKnow has already provided solutions for transforming conventional geospatial data into RDF and exposing it (Sparqlify, LinkedGeoData, TripleGeo). Building on the work performed in Tasks 2.1 and 1.3, all partners agreed that there is potential for a timely geospatial RDF benchmark, able to test the efficiency and functionality of today’s RDF stores with geospatial support. Next steps were also discussed, with emphasis on further optimizing the geospatial query capabilities of the underlying RDF store (Virtuoso).

As far as semantic integration of geospatial data is concerned, tools developed within GeoKnow for enriching and interlinking geospatial RDF data (GeoLift, LIMES) were first presented to the consortium. These tools triggered further discussions about the fusion and aggregation solutions currently under design and development. Partners also discussed how these tools can be directly tested and utilized in processing commercial datasets from the use case partners. Finally, a large part of our discussions was dedicated to quality measures and quality assessment of geospatial data. Although these tasks are scheduled for later periods of the project, they are highly important for all the functionality being built in Work Package 3, since quality indicators of datasets can constitute valuable input for processing steps such as interlinking and fusion.

After the presentation of GeoKnow tools (implemented or under development) for the visualization and authoring of geospatial RDF data, such as Facete, creative ideas were exchanged about both detailed technical implementation solutions and the functionality desired by end users. Important aspects that were considered include the implementation and functionality of spatial authoring, the co-evolution of public and spatial Linked Data, and the potential for spatial-social networking. Again, the discussions considered the use case scenarios of the project, that is, how the offered functionality can serve commercial and industrial Linked Data management and visualization needs.

In conclusion, during the GeoKnow meeting in Athens all partners exchanged interesting ideas about ongoing and future work and set more concrete objectives for the coming months. We thank all GeoKnow members for attending and contributing to this constructive meeting!