EDF2015 and Linked Data Europe: Big Geospatial Data Workshop

In 2015, the European Data Forum took place in Luxembourg on 16–17 November. The GeoKnow team had the pleasure of being present at the event with a booth showcasing GeoKnow results. The conference welcomed over 700 participants from industry, research, policy making, and community initiatives from all over Europe.

Our representatives at the EDF2015

The day after the conference we participated in the Linked Data Europe Workshop, which was organized by the IQmulus, GeoKnow, LEO and MELODIES teams. Jens Lehmann of the University of Leipzig and Jonas Schulz from Ontos AG demonstrated our GeoKnow workbench, talked about the tools in our Linked Data Stack, and gave insights into other projects within the scope of Linked Geo Data and Big Data. Overall, 10 projects were presented, and the workshop ended with an informative discussion about Linked Geo Data tools replacing or extending existing GIS solutions.

Thanks to everyone who organized the EDF and the workshop.

Linked Open Data Switzerland at SWBI2015

Daniel Hladky from Ontos presented GeoKnow in two talks at the SWBI2015 conference.

The first talk was the keynote on October 7, 2015, with the title “Linked Data Service (LINDAS): Status quo of the Linked Data life-cycle and lessons learned“. This keynote introduced the LOD2-based linked data life-cycle and the LINDAS platform. The LINDAS system is based on the GeoKnow Generator tool that was developed by Ontos during the GeoKnow project. At the end of the talk, an outlook was given on future developments such as an improved natural language processing system based on neural networks and the new visualisation dashboards for RDF data.

The second talk on October 8, 2015 was part of the Linked Data Switzerland workshop. The focus of this talk was to set the stage for Linked Open Data in Switzerland using the LINDAS platform. Furthermore, the participants discussed issues that still have to be solved, for example how to build a linked data economy that publishes as many datasets as possible, and how to motivate companies and individuals to start developing new applications based on the datasets.

FAGI-gis: fusing geospatial RDF data

GeoKnow introduces the latest version of FAGI-gis, a framework for fusing Linked Data that focuses on the geospatial properties of the linked entities. FAGI-gis receives as input two datasets (through SPARQL endpoints) and a set of links that interlink entities between the datasets, and produces a new dataset where each pair of linked entities is fused into a single entity. Fusion is performed for each pair of matched properties between two linked entities, according to a selected fusion action, and considers both spatial and non-spatial properties.

The tool provides an interactive interface, offering visualization of the data at every step of the process. Especially for spatial data, it provides map previewing and graphical manipulation of geometries. Further, it provides advanced fusion functionality through batch mode fusion, clustering of links, link discovery/creation, property matching, property creation, etc.

As the first step of the fusion workflow, the tool allows the user to select and filter the interlinked entities (using the classes they belong to or SPARQL queries) to be loaded for further fusion. Then, at the schema matching step, a semi-automatic process facilitates the mapping of entity properties from one dataset to the other. Finally, the fusion panel allows the map-based manipulation of geometries, and the selection from a set of fusion actions in order to produce a final entity, where each pair of matched properties is fused according to the most suitable action.
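The per-property fusion step can be illustrated with a small sketch. The function and action names below are illustrative stand-ins, not FAGI-gis's actual API; the tool's real set of fusion actions is richer and also covers geometries.

```python
# Hypothetical sketch of per-property fusion actions, in the spirit of the
# workflow described above. Action names here are illustrative examples.

def fuse_property(action, left, right):
    """Fuse one pair of matched property values according to a fusion action."""
    if action == "keep-left":
        return [left]
    if action == "keep-right":
        return [right]
    if action == "keep-both":
        return [left, right]
    if action == "concatenate":
        return [f"{left} {right}"]
    raise ValueError(f"unknown fusion action: {action}")

def fuse_entities(matched_properties, actions):
    """Fuse two linked entities property by property.

    matched_properties: dict mapping property name -> (left_value, right_value)
    actions: dict mapping property name -> fusion action name
    """
    fused = {}
    for prop, (left, right) in matched_properties.items():
        fused[prop] = fuse_property(actions.get(prop, "keep-both"), left, right)
    return fused
```

For example, `fuse_entities({"label": ("Leipzig", "Leipzig, DE")}, {"label": "keep-right"})` keeps only the second dataset's label for the fused entity.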

The above process can be enriched with several advanced fusion facilities. The user is able to cluster the linked entities according to the way they are interlinked, so as to handle different clusters of linked entities with different fusion actions. Moreover, the user can load unlinked entities and be recommended candidate entities to interlink. Finally, by training on past fusion actions and on OpenStreetMap data, FAGI-gis is able to recommend suitable fusion actions and OSM categories (classes), respectively, for pairs of fused entities.

FAGI-gis is provided as free software and its current version is available from GitHub. An introductory user guide is also available. More detailed information on FAGI-gis is provided in the following documents:


GeoKnow Public Datasets

In this blog post we want to present three public datasets that were improved or created in the GeoKnow project.

LinkedGeoData
Size: 177GB zipped Turtle file
URL: http://linkedgeodata.org/

LinkedGeoData is the RDF version of OpenStreetMap (OSM), covering geospatial information for the entire planet. As of September 2014, the zipped XML file from OSM amounted to 36GB of data, while the size of the zipped LGD files in Turtle format is 177GB. A detailed description of the dataset can be found in the D1.3.2 Continuous Report on Performance Evaluation. Technically, LinkedGeoData is a set of SQL files, database-to-RDF (RDB2RDF) mappings, and bash scripts. The actual RDF conversion is carried out by the SPARQL-to-SQL rewriter Sparqlify. You can view the Sparqlify mappings for LinkedGeoData here. The maintenance and improvement of the mappings required to transform OSM data to RDF has been carried out throughout the project. This dataset has been used in several use cases, and especially in all benchmarking tasks within GeoKnow.
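To give a flavour of the RDB2RDF idea behind LinkedGeoData, here is a minimal sketch that maps a relational OSM node row to RDF triples. The URIs are simplified stand-ins; the actual Sparqlify mappings are considerably more elaborate.

```python
# Illustrative sketch of mapping an OSM node (id, lat, lon, tags) to RDF
# triples, in the spirit of LinkedGeoData's RDB2RDF mappings. The namespace
# layout is simplified and not identical to the real Sparqlify output.

LGD_RESOURCE = "http://linkedgeodata.org/triplify/"
LGD_ONTOLOGY = "http://linkedgeodata.org/ontology/"
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"

def node_to_triples(node_id, lat, lon, tags):
    """Return a list of (subject, predicate, object) string tuples."""
    s = f"<{LGD_RESOURCE}node{node_id}>"
    triples = [
        (s, f"<{WGS84}lat>", f'"{lat}"'),
        (s, f"<{WGS84}long>", f'"{lon}"'),
    ]
    # Each OSM key/value tag becomes a property in the ontology namespace.
    for key, value in tags.items():
        triples.append((s, f"<{LGD_ONTOLOGY}{key}>", f'"{value}"'))
    return triples
```

In the real system this mapping is expressed declaratively in Sparqlify's mapping language and executed as SPARQL-to-SQL rewriting rather than row-by-row in application code.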

Wikimapia
URL: http://wikimapia.org/api/

Wikimapia is a crowdsourced, open-content, collaborative mapping initiative, where users can contribute mapping information. This dataset already existed before the project started; however, it was only accessible through Wikimapia’s API and provided in XML or JSON formats. Within GeoKnow, we downloaded several sets of geospatial entities from Wikimapia, including both spatial and non-spatial attributes for each entity, and transformed them into RDF data. The process we followed is described next. We considered a set of cities throughout the world (Athens, London, Leipzig, Berlin, New York) and downloaded the whole content provided by Wikimapia for the geospatial entities included in those geographical areas. These cities were preferred since they are the base cities of several partners in the project, while the remaining two cities were randomly selected, with the aim of reaching our target of more than 100,000 spatial entities from Wikimapia. Apart from geometries, Wikimapia provides a very rich set of metadata (non-spatial properties) for each entity (e.g. tags and categories describing the geospatial entities, topological relations with nearby entities, comments of the users, etc.). The aforementioned dumps were transformed into RDF triples in a straightforward way: (a) defining intermediate resources (functioning as blank nodes) where information was organized in more than one level, (b) flattening the information of deeper levels where possible in order to simplify the structure of the dataset, and (c) transforming tags into OWL classes. Specifically, we developed a parsing tool to communicate with the Wikimapia API and construct appropriate N-Triples from the dataset. The tool takes as input a bounding box in the form of WGS84 coordinates (min long, min lat, max long, max lat). We chose five initial bounding boxes, one for each of the cities mentioned above. Each bounding box was defined in such a way that it covered the whole area of the selected city.
Each bounding box was then further divided by the tool into a grid of smaller bounding boxes, in order to overcome the upper limit per area on the entities returned by the Wikimapia API. For each place returned, we transformed all properties into RDF triples. Every tag was assigned an OWL class and an appropriate label, corresponding to the textual description in the initial Wikimapia XML file. Each place became an instance of the classes provided by its tags. For the rest of the returned Wikimapia attributes, we created a custom property in a uniform way for each attribute of the returned Wikimapia XML file. The properties resulting from the Wikimapia XML attributes point to their literal values. For example, we construct properties about each place’s language id, Wikipedia link, URL link, title, description, edit info, location info, global administrative areas, available languages and geometry information. If these attributes follow a deeper tree structure, we assign the properties at intermediate custom nodes by concatenating the property with the place ID; these nodes function as blank nodes and connect the initial entity with a set of properties and the respective values. This process resulted in an initial geospatial RDF dataset containing, for each entity, the polygon geometry that represents it, along with a wealth of non-spatial properties of the entity. The dataset contains 102,019 geospatial entities and 4,629,223 triples.
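The grid-division step can be sketched as follows. The grid size is a tuning parameter; the value used in the actual crawler is not stated here, so the example below is purely illustrative.

```python
# Sketch of splitting a WGS84 bounding box into an n x n grid of smaller
# boxes, as done to stay under the Wikimapia API's per-area limit on the
# number of returned entities. The grid size n is an illustrative parameter.

def split_bbox(min_lon, min_lat, max_lon, max_lat, n):
    """Return a list of n*n (min_lon, min_lat, max_lon, max_lat) cells."""
    dlon = (max_lon - min_lon) / n
    dlat = (max_lat - min_lat) / n
    cells = []
    for i in range(n):
        for j in range(n):
            cells.append((
                min_lon + i * dlon, min_lat + j * dlat,
                min_lon + (i + 1) * dlon, min_lat + (j + 1) * dlat,
            ))
    return cells
```

Each resulting cell is then small enough to be requested individually; if a cell still exceeds the API limit, it can be split again recursively.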
Building on that, in order to create a synthetically interlinked pair of datasets, we split the Wikimapia RDF dataset, duplicating the geometries and dividing them into the two datasets in the following way. For each polygon geometry, we created a point geometry located at the centroid of the polygon and then shifted the point by a random (but bounded) factor. The polygon was left in the first dataset, while the point was transferred to the second dataset. The rest of the properties were distributed between the two datasets as follows: the first dataset consists of metadata containing the main information about the Wikimapia places and edit information about users, timestamps, deletion state and editors; the second dataset consists of metadata concerning basic info, location and language information. This way, the two sub-datasets essentially refer to the same Wikimapia entities, differing only in geometric and metadata information. Each of the two sub-datasets contains 102,019 geospatial entities; the first one contains 1,225,049 triples and the second one 4,633,603 triples.
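The geometry-splitting step (centroid plus bounded random shift) can be sketched like this. The centroid here is a simple vertex average and the shift bound is an invented value; the actual tool may compute the true area centroid and use a different bound.

```python
import random

# Sketch of the synthetic-split geometry step: derive a point from a
# polygon's centroid and shift it by a random but bounded offset.
# The 0.001-degree bound is an illustrative assumption, not the real value.

def centroid(points):
    """Vertex-average centroid of a polygon given as (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def shifted_point(points, max_shift=0.001, rng=random):
    """Centroid of the polygon, shifted by at most max_shift per axis."""
    cx, cy = centroid(points)
    return (cx + rng.uniform(-max_shift, max_shift),
            cy + rng.uniform(-max_shift, max_shift))
```

The bounded shift keeps the point geometry close enough to the polygon that the pair remains a plausible interlinking target, while preventing the two geometries from being trivially identical.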

Seven Greek INSPIRE-compliant data themes of Annex I
URL: http://geodata.gov.gr/sparql/

For the INSPIRE to RDF use case, we selected seven data themes from Annex I, which are described in the table below. Although all metadata in geodata.gov.gr is fully compatible with INSPIRE regulations, the data itself is not, because it has been integrated from several diverse sources, which have rarely followed the proper standards. Thus, due to data variety, provenance, and excessive volume, its transformation into INSPIRE-compliant datasets is a time-consuming and demanding task. The first step was the alignment of the data to INSPIRE Annex I. To this end, we utilised the Humboldt Alignment Editor, a powerful open-source tool with a graphical interface and a high-level language for expressing custom alignments. Such a transformation can be used to turn a non-harmonised data source into an INSPIRE-compliant dataset. It only requires a source schema (an .xsd for the local GML file) and a target one (an .xsd implementing an INSPIRE data schema). As soon as the schema mapping was defined, the source GML data was loaded, and the INSPIRE-aligned GML file was produced. The second step was the transformation into RDF. This process was quite straightforward, given a set of suitable XSL stylesheets. We developed all these transformations in XSLT 2.0, implementing one parametrised stylesheet per selected data theme. By default, all geometries were encoded in WKT serialisations according to GeoSPARQL. The produced RDF triples were finally loaded and made available in both the Virtuoso and Parliament RDF stores, at http://geodata.gov.gr/sparql, as a proof of concept.
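The WKT encoding step can be sketched as follows. The geometry URI is a made-up example, and the triple is simplified: in full GeoSPARQL modelling, a feature is linked to a separate geometry resource via geo:hasGeometry, and that geometry resource carries geo:asWKT.

```python
# Sketch of serialising a point geometry as a typed GeoSPARQL WKT literal,
# as in the final RDF transformation step. The geometry URI is illustrative.

WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"
AS_WKT = "http://www.opengis.net/ont/geosparql#asWKT"

def wkt_point_literal(lon, lat):
    """Encode a WGS84 point as a GeoSPARQL wktLiteral."""
    return f'"POINT ({lon} {lat})"^^<{WKT_LITERAL}>'

def geometry_triple(geometry_uri, lon, lat):
    """Build an N-Triples-style (s, p, o) tuple for a point geometry."""
    return (f"<{geometry_uri}>", f"<{AS_WKT}>", wkt_point_literal(lon, lat))
```

Typing the literal with geo:wktLiteral is what lets GeoSPARQL-aware stores such as Virtuoso and Parliament index the geometries and answer spatial queries over them.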

INSPIRE Data Theme | Greek dataset | Features | Triples
[GN] Geographical names | Settlements, towns, and localities in Greece | 13,259 | 304,957
[AU] Administrative units | All Greek municipalities after the most recent restructuring (”Kallikratis”) | 326 | 9,454
[AD] Addresses | Street addresses in Kalamaria municipality | 10,776 | 277,838
[CP] Cadastral parcels | Building blocks in Kalamaria (data from the official Greek Cadastre are not available through geodata.gov.gr) | 965 | 13,510
[TN] Transport networks | Urban road network in Kalamaria | 2,584 | 59,432
[HY] Hydrography | All rivers and water streams in Greece | 4,299 | 120,372
[PS] Protected sites | All areas of natural preservation in Greece according to the EU Natura 2000 network | 419 | 10,894

GeoKnow at Semantics 2015, Vienna

Several partners of GeoKnow were present this year at the Semantics conference 2015.
The day before the conference we organised a workshop about the work done during the last three years in GeoKnow.
In the conference, three papers with GeoKnow acknowledgement were presented:

  • Integrating custom index extensions into Virtuoso RDF store for E-Commerce applications, presented by Matthias Wauer,
  • An Optimization Approach for Load Balancing in Parallel Link Discovery presented by Mohamed Ahmed Sherif, and
  • Data Licensing on the Cloud – Empirical Insights and Implications for Linked Data, presented by Ivan Ermilov

And two posters in the posters sessions:

  • The GeoKnow Generator Workbench – An Integrated Tool Supporting the Linked Data Lifecycle for Enterprise Usage, and
  • RDF Editing on the Web

Moreover, the GeoKnow team demonstrated tools and the Workbench at the booth reserved for us. It was a nice experience and a good opportunity to share our work and to see other people’s projects.


The 2nd Geospatial Linked Data Workshop

This week the 2nd GeoLD workshop took place just before the Semantics 2015 conference in Vienna. Our invited speaker was Franz Knibbe from Geodan in the Netherlands. Franz is currently contributing to the Spatial Data on the Web Working Group, where people from the OGC and the W3C are trying to define the best ways to integrate geospatial data on the web of data. His talk was very inspiring; for instance, he described some of the spatial aspects that matter for both working groups, covering data that ranges from the galaxy down to microscopic skin structures. You can discover a little bit more of his talk on SlideShare.
The workshop continued with the presentation of three software tools for exploring geospatial data on the web. Facete is a faceted browser for geospatial data in RDF format that also allows editing the data. The second tool was ESTA-LD, which can be used for exploring statistical data represented using the Data Cube Vocabulary. And DEER, a data extraction and enrichment framework, allows creating pipelines for analysing unstructured data, finding interlinks with other datasets, and extracting knowledge from the linked datasets in order to enrich the data.
We also presented the GeoKnow Generator demo, which integrates the tools presented and offers enterprise-ready features in order to support the use of such tools in companies. The usability of GeoKnow tools was demonstrated with two more presentations. The Supply Chain Management use case showed how spatial data was integrated to improve information and decision making in the supply chain. Finally, the Tourism e-Commerce use case showed how the integration of geospatial data is used to improve recommendations for a motive-based user request.


2nd edition of GeoLD Workshop at Semantics Conference

We are preparing the second edition of the Geospatial Linked Data Workshop, which will be held before the Semantics conference on 15 September 2015 in Vienna.

For the GeoLD workshop we have invited an active member of the Spatial Data on the Web Working Group, who will be presenting the work carried out so far by this WG. The WG was created a year ago and brought together two major standards bodies, the Open Geospatial Consortium (OGC) and the W3C, with the objective of improving interoperability and integration of spatial data on the Web.

The rest of the presentations at the workshop cover useful tools for exploring geospatial data on the web and enriching data with geospatial features. These tools, together with complete use case scenarios, will demonstrate the importance of integrating geospatial data to solve business questions.

A detailed agenda is available on the workshop website, and you can register for free on the conference website.

Third GeoKnow Plenary Meeting in Leipzig

Unister and Brox hosted the 3rd GeoKnow Plenary Meeting in the beautiful city of Leipzig. On 30th June and 1st July, the project partners gathered to discuss the progress in the work packages and the status of the demonstrators. The project partners are developing tools to help Web users, companies and organisations find and exploit geospatial data. The primary use cases are tourism e-commerce and supply chain management, but the tools can be applied in many more scenarios.

Travel back in time to some warm summer weather…

DEER at ESWC 2015

GeoKnow was present at the Extended Semantic Web Conference (which took place in Portoroz, Slovenia) in many ways. In particular, we presented a novel approach for automating RDF dataset transformation and enrichment in the machine learning track of the main conference. The presentation was followed by a constructive discussion of results and insights, plus hints pertaining to further possible applications of the approach.

GeoKnow presented semantic search approach with geospatial context at KNOW@LOD workshop on ESWC

One of the key questions concerning the use of linked data is search. Search-driven applications are widespread, for example in the e-commerce industry or in business information systems. Hence, GeoKnow is also aiming at improving semantic search components, particularly for geospatial data. Within the GeoKnow consortium, the partner Unister, active as a B2C service provider, is focussing on this topic in the last year of the project.
At the 12th edition of the Extended Semantic Web Conference (ESWC 2015), one result of this research was presented at the well-organized 4th Workshop on Knowledge Discovery and Data Mining meets Linked Open Data (KNOW@LOD). The presented paper is titled “Computing Geo-Spatial Motives from Linked Data for Search-driven Applications”. In it, we consider geo-spatial motives within search queries (e.g., “winter holiday”, “cultural”, “sports activities”) that cannot be answered by a data instance itself, but require interpreting the available data to discover relevant regions (e.g., populated places having a sufficient number of cultural hotspots nearby).
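The intuition behind such motive matching can be illustrated with a small sketch: a place satisfies a motive if enough hotspots associated with that motive lie within some radius. The radius and threshold values below are invented parameters, not the ones from the paper.

```python
from math import radians, sin, cos, asin, sqrt

# Illustrative sketch of matching a geo-spatial motive: count the hotspots
# of a motive (e.g. museums for "cultural") near a candidate place.
# The radius and minimum count are made-up example parameters.

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def matches_motive(place, hotspots, radius_km=5.0, min_count=3):
    """True if at least min_count hotspots lie within radius_km of place.

    place: (lon, lat); hotspots: iterable of (lon, lat) pairs.
    """
    count = sum(1 for h in hotspots
                if haversine_km(place[0], place[1], h[0], h[1]) <= radius_km)
    return count >= min_count
```

In a linked data setting, the hotspot coordinates would come from SPARQL queries over datasets such as LinkedGeoData, and the motive-to-hotspot-category mapping would itself be derived from the data.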