GeoKnow introduces the latest version of FAGI-gis, a framework for fusing Linked Data, that focuses on the geospatial properties of the linked entities. FAGI-gis receives as input two datasets (through SPARQL endpoints) and a set of links that interlink entities between the datasets and produces a new dataset where each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties between two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties.
The tool supports an interactive interface, offering visualization of the data at every part of the process. Especially for spatial data, it provides map previewing and graphical manipulation of geometries. Further, it provides advanced fusion functionality through batch mode fusion, clustering of links, link discovery/creation, property matching, property creation, etc.
As the first step of the fusion workflow, the tool allows the user to select and filter the interlinked entities (using the Classes they belong to or SPARQL queries) to be loaded for further fusion. Then, at the schema matching step, a semi-automatic process facilitates the mapping of entity properties from one dataset to the other. Finally, the fusion panel allows the map-based manipulation of geometries, and the selection from a set of fusion actions in order to produce a final entity, where each pair of matched properties are fused according to the most suitable action.
The above process can be enriched with several advanced fusion facilities. The user is able to cluster the linked entities according to the way they are interlinked, so as to handle with different fusion actions, different clusters of linked entities. Moreover, the user can load unlinked entities and be recommended candidate entities to interlink. Finally, training on past fusion actions and on OpenStreetMap data, FAGI-gis is able to recommend suitable fusion actions and OSM Categories (Classes) respectively, for pairs of fused entities.
The FAGI-gis is provided as free software and its current version is available from GitHub. An introductory user guide is also available. More detailed information on FAGI-gis is provided int he following documents:
GeoKnow has recently introduced OSMRec, a JOSM plugin for automatic annotation of spatial features (entities) into OpenStreetMap. OSMRec trains on existing OSM data and is able to recommend to users OSM categories, in order to annotate newly inserted spatial entities. This is important for two reasons. First, users may not be familiar with the OSM categories; thus searching and browsing the OSM category hierarchy to find appropriate categories for the entity they wish to insert may often be a time consuming and frustrating process, to the point of users neglecting to add this information. Second, if an already existing category that matches the new entity cannot be found quickly and easily (although it exists), the user may resort instead to using his/her own term, resulting in synonyms that later need to be identified and dealt with.
The category recommendation process takes into account the similarity of the new spatial entities to already existing (and annotated with categories) ones in several levels: spatial similarity, e.g. the number of nodes of the feature’s geometry, textual similarity, e.g. common important keywords in the names of the features and semantic similarity (similarities on the categories that characterize already annotated entities). So, for each level (spatial, textual, semantic) we define and implement a series of training features that represent spatial entities into a multidimensional space. This way, by training the aforementioned models, we are able to correlate the values of the training features with the categories of the spatial entities, and consequently, recommend categories for new features. To this end, we apply multiclass SVM classification, using LIBLINEAR.
The following figure represents a screen of OSMRec within JOSM. The user can select an entity or draw a new entity on the map and ask for recommendations by clicking the “Add Recommendation” button. The recommendation panel opens and the plugin automatically loads the appropriate recommendation model that has previously been trained offline.
The recommendation panel provides a list with the top-10 recommended categories and the user can select from this list and click “Add and continue”. As a result the selected category is added to the OSM tags. By the time the user adds a new tag at the selected object, a new vector is computed for that OSM instance in order to recalculate the predictions and display an updated list of recommendations (taking into account the previously selected categories/tags, as extra training information). Further, OSMRec provides functionality for allowing the user to combine several recommendation models, based on (a) a selected geographic area, (b) user’s past editing history on OSM and (c) combination of (a) and (b). This way, personalized category recommendations can be provided that take into account the user’s editing history and/or the specific characteristics of a geographic area of OSM.
OSMRec plugin can be downloaded and installed in JOSM following the standard procedure. Detailed implementation information can be found in the following documents:
A typical tourist scenario is hard to picture without a map. Yet, such a scenario implies you are not familiar with your surroundings and, therefore, often not sure how to find the things that are of interest to you. Typical geospatial browsers will provide you with common exploration tools that will most often include a slippy map combined with keyword search, categorized points of interest (POIs) and a fixed set of filters. But, all of these imply either that you know what it is you’re looking for, or that the preset collection of POIs and criteria will be enough to satisfy your needs. In real life, however, those needs will often be affected by the given context, which is, in turn, dependent on multiple, dynamic factors, such as the place you’re visiting, your mood, interests, background etc. Imagine using your favorite geospatial browser to answer the following question:
“Where are the nearest buildings designed by Frank Lloyd Wright, typical of the Prairie School movement?”
GEM (Geospatial-semantic Exploration on the Move) is the very first geospatial exploration tool that offers a rich mobile experience and overcomes the abovementioned limitations of conventional solutions by exploiting all strengths of the Linked Open Data paradigm, such as built-in semantics in open, crowd-sourced knowledge found in publicly available sources, such as DBpedia, loaded and filtered on-demand, according to user’s needs, in order to prevent maps from overpopulating.
The Workbench also provides Single Sign On functionality, user and role management and data access control for the different users. The Workbench is comprised of a front-end and back-end implementations. The front-end provides GUIs for software components where a REST API is available (LimesService, GeoLiftService and TripleGeoService). Components that provide their own GUI, are integrated using containers (FAGI-gis, OntoWiki, Mappify, Sparqlify and Virtuoso SPARQL query interface). The front-end also provides GUIs for the administrative features like users and roles management, data source management and graphs management, as well as the Dashboard GUI. The Dashboard provides a visual feedback to the user with the registered jobs and the status of executions. The Workbench back-end provides REST interfaces for management of users, roles, graphs, datasources and batch jobs, for retrieving the system configuration, and for importing RDF data. All system information is stored in Virtuoso RDF store.
Manifold RDF data contain implicit references to geographic data. For example, music datasets such as Jamendo include references to locations of record labels, places where artists were born or have been, etc. The aim of the spatial mapping component, dubbed GeoLift, is to retrieve this information and make it explicit.
Geographical information can be mentioned in three different ways within Linked Data:
Through dereferencing: Several datasets contain links to datasets with explicit geographical information such as DBpedia or LinkedGeoData. For example, in a music dataset, one might find information such as http://example.org/Leipzig owl:sameAs http://dbpedia.org/resource/Leipzig. We call this type of reference explicit. We can now use the semantics of RDF to fetch geographical information from DBpedia and attach it to the resource in the other ontology as http://example.org/Leipzig and http://dbpedia.org/resource/Leipzig refer to the same realworld object.
Through linking: It is known that the Web of Data contains an insufficient number of links. The latest approximations suggest that the Linked Open Data Cloud alone consists of 31+ billion triples but only contains approximately 0.5 billion links (i.e., less than 2% of the triples are links between knowledge bases). The second intuition behind GeoLift is thus to use link discovery to map resources in an input knowledge base to resources in a knowledge that contains explicit geographical information. For example, given a resource http://example.org/Athen, GeoLift should aim to find a resource such as http://dbpedia.org/resource/Athen to map it with. Once having established the link between the two resources, GeoLift can then resolve to the approach defined above.
Through Natural Language Processing: In some cases, the geographic information is hidden in the objects of data type properties. For example, some datasets contain biographies, textual abstracts describing resources, comments from users, etc. The idea here is to use this information by extracting Named Entities and keywords using automated Information Extraction techniques. Semantic Web Frameworks such as FOX have the main advantage of providing URIs for the keywords and entities that they detect. These URIs can finally be linked with the resources to which the datatype properties were attached. Finally, the geographical information can be dereferenced and attached to the resources whose datatype properties were analyzed.
The idea behind GeoLift is to provide a generic architecture that contains means to exploit these three characteristics of Linked Data. In the following, we present the technical approach underlying GeoLift.
GeoLift was designed to be a modular tool which can be easily extended and re-purposed. In its first version, it provides two main types of artifacts:
Modules: These artifacts are in charge of generating geographical data based on RDF data. To this aim, they implement the three intuitions presented above. The input for such a module is an RDF dataset (in Java, a Jena Model ). The output is also an RDF dataset enriched with geographical information (in Java, an enriched Jena Model ).
Operators: The idea behind operators is to enable users to define a workflow for processing their input dataset. Thus, in case a user knows the type of enrichment that is to be carried out (using linking and then links for example), he can define the sequence of modules that must be used to process his dataset. Note that the format of the input and output of modules is identical. Thus, the user is empowered to create workflows of arbitrary complexity by simply connecting modules.
The corresponding architecture is shown below. The input layer allows reading RDF in different serializations. The enrichment modules are in the second layer and allow adding geographical information to RDF datasets by different means. The operators (which will be implemented in the future version of GeoLift) will combine the enrichment modules and allow defining a workflow for processing information. The output layer serializes the results in different format. The enrichment procedure will be monitored by implementing a controller, which will be added in the future version of GeoLift.