Author Archives: Spiros Athanasiou

Virtual Machines of geospatial RDF stores

For our evaluation of current geospatial RDF stores (about which you can read more here), we have set up five (5) pre-built Virtual Machines. Each one runs Debian 6.x “squeeze” and contains a working installation of one of the evaluated geospatial RDF stores.

The VMs are available at: ftp://guest@geoknow-server.imis.athena-innovation.gr/ (username: guest, password: be-my-guest).

The images are normal Xen domU disk images, which can be used to create a functional domU guest. You also need to provide:

  • your own guest configuration file at /etc/xen/<the-vm>.cfg (or wherever you have chosen to configure your domU guests)
  • a swap image file
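
As a rough illustration of the first item, the snippet below renders a minimal guest configuration in the xm file format. It is only a sketch: the image paths, device names, bridge name, memory size and the choice of pygrub as bootloader are all assumptions, not details of the published images, so adapt them to your own Xen host.

```python
def render_domu_config(name, disk_image, swap_image,
                       memory_mb=2048, bridge="xenbr0",
                       bootloader="pygrub"):
    """Return the text of a minimal xm-style guest configuration.

    Every value here is an illustrative assumption; in particular the
    device names (xvda1/xvda2) and the bootloader may not match what
    the published images expect.
    """
    lines = [
        f"name = '{name}'",
        f"memory = {memory_mb}",
        f"bootloader = '{bootloader}'",  # assumption: kernel lives inside the image
        "disk = [",
        f"    'file:{disk_image},xvda2,w',",  # downloaded root filesystem image
        f"    'file:{swap_image},xvda1,w',",  # swap image you create yourself
        "]",
        f"vif = ['bridge={bridge}']",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical paths; the result would be saved as /etc/xen/<the-vm>.cfg.
config_text = render_domu_config(
    name="the-vm",
    disk_image="/srv/xen/the-vm/disk.img",
    swap_image="/srv/xen/the-vm/swap.img",
)
print(config_text)
```

Generating the file from a small function like this makes it easy to produce one configuration per VM image while changing only the name and paths.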

Log in to your newly created machine as user root with password root (and, of course, change the password right after your first login). After creating the machine, you should log in through the serial console (e.g. xm console <the-vm>) and:

  • configure your network interfaces to adapt to your local network
  • configure your hostname and your /etc/hosts file
  • restart your network interfaces

We have also uploaded to the same FTP location the dataset we used for our evaluation: OpenStreetMap data for Great Britain covering England, Scotland, and Wales (as of 05/03/2013) in ESRI shapefile format. Of the available OSM layers, only those concerning the road network (roads), points of interest (points), and natural parks and waterbodies (natural) were actually used. From each original layer, only the most important attributes were retained: shape, name, osm_id, type.
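
The attribute selection described above can be pictured with a short sketch. It assumes each feature has already been read into a plain dictionary; the sample record and its extra attributes (oneway, maxspeed) are invented for illustration and this is not the tooling we actually used.

```python
# Attributes kept from every layer, as listed in the text above.
RETAINED = ("shape", "name", "osm_id", "type")


def retain_attributes(record, keep=RETAINED):
    """Return a copy of one feature record with only the kept attributes."""
    return {key: record[key] for key in keep if key in record}


# Hypothetical road feature; all values are made up for this example.
road = {
    "shape": "LINESTRING (-0.1276 51.5072, -0.1269 51.5079)",
    "name": "Example Road",
    "osm_id": 1,
    "type": "residential",
    "oneway": 0,
    "maxspeed": 30,
}

slim = retain_attributes(road)
print(sorted(slim))
```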

You can find more information on how we used these VMs for our evaluation in the public deliverable D2.1.1 ‘Market and Research Overview’.

Please enjoy!

Geospatial RDF stores: where do we stand?

Our goal in GeoKnow is to bring geospatial data into the Linked Data Web. Among other things, our work will provide the means for efficiently managing and querying geospatial data in RDF stores.

But how can we measure our success? And what is the yardstick we will compare ourselves to?

While many popular RDF stores advertise geospatial capabilities, we found little or no quantitative information in the relevant literature regarding the following:

  • Conformance to GeoSPARQL. Given GeoSPARQL’s status as an OGC standard and the ongoing work of various research groups on integrating it into other Semantic Web technologies, there is a clear need to examine the GeoSPARQL compliance of RDF stores. In the strongly standards-driven GIS domain, conformance with open standards is a widely advertised and desirable property for open and proprietary software alike. End users select products based on features that have been verified through open tests anyone can repeat. For GeoSPARQL to reach the mainstream and be actively adopted by the Semantic Web and GIS communities alike, we need a similar open process for compliance testing of the various products and systems.
  • Performance under realistic geospatial workloads. Although there are several acknowledged benchmarks for evaluating RDF stores (e.g. BSBM), the only benchmark handling geospatial RDF is SSWB. SSWB, while it includes some important geospatial query types, is based on synthetic data and evaluates only one aspect of performance (query duration). There is a clear lack of a feature-complete benchmark based on realistic data and workloads that measures and highlights both the novel capabilities geospatial RDF stores provide and their support for typical everyday applications of geospatial data management.
  • Comparison with geospatial RDBMSs. Geospatial databases are used across the globe by a diverse community that produces, queries, and analyzes geospatial data. The performance, integration capabilities, and overall maturity of geospatial RDBMSs form the natural baseline of the GIS community regarding data management. Geospatial RDF stores will be compared along these lines. Of course, they also provide functionality that typical GIS and RDBMSs cannot support, which is something we can easily advertise to the GIS community. However, we have no answers yet on the side effects of applying a geospatial RDF store in a real-world setting. RDF stores might never reach the native performance of an RDBMS, but is that lack of speed balanced by increased interoperability? Similar questions arise for scaling, compatibility with existing tools, etc. Potential users need accurate information that enables them to make informed decisions about these trade-offs.

We feel that since incorporating geospatial capabilities into RDF stores is a subject of ongoing research and commercial efforts, it is important for the community to have access to a concrete methodology that describes the required resources and steps for objectively and realistically evaluating geospatial RDF stores.

With these goals, we introduced a benchmarking methodology for evaluating geospatial RDF stores, which will form the basis for a formal, feature-complete benchmark in the future.

Based on this methodology, Athena completed a first-cut evaluation of five RDF stores with geospatial support: Virtuoso Universal Server, Parliament, uSeekM, OWLIM-SE and Strabon. Further, we compared these RDF stores with two prominent geospatial RDBMSs: Oracle Spatial and PostGIS.

  • For our evaluation we used real geospatial data, and specifically, OpenStreetMap data for Great Britain covering England, Scotland, and Wales (as of 05/03/2013).
  • We transformed this dataset into RDF triples (~26M) using TripleGeo.
  • We tested a total of 21 SPARQL queries, covering several geospatial features: location queries, range queries, spatial joins, nearest neighbor queries and spatial aggregates. For each query we created a representation compatible with each RDF store.
  • In parallel, we tested the same set of queries on the geospatial RDBMSs.
  • Apart from query times, we also measured other quantities, such as number of loaded triples, loading time and index creation time.
  • We evaluated the conformance of each RDF store to the GeoSPARQL standard.
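
To give a feel for what a range query and its timing might look like, here is a minimal sketch. It is not one of the 21 queries from the evaluation: the bounding box is arbitrary, the query uses the standard GeoSPARQL vocabulary (which not every tested store supports), and fake_store is a hypothetical stand-in for whatever client sends the query to a real endpoint.

```python
import time

# Standard GeoSPARQL namespaces.
PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
)


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon (lon lat order) for a bounding box."""
    corners = [
        (min_lon, min_lat), (max_lon, min_lat),
        (max_lon, max_lat), (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def range_query(min_lon, min_lat, max_lon, max_lat):
    """Build a GeoSPARQL query for features lying within a bounding box."""
    wkt = bbox_to_wkt(min_lon, min_lat, max_lon, max_lat)
    return (
        PREFIXES
        + "SELECT ?feature WHERE {\n"
        + "  ?feature geo:hasGeometry ?geom .\n"
        + "  ?geom geo:asWKT ?wkt .\n"
        + f'  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))\n'
        + "}\n"
    )


def time_query(run_query, query, repeats=3):
    """Run the query `repeats` times and return elapsed seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        durations.append(time.perf_counter() - start)
    return durations


def fake_store(query):
    """Hypothetical stand-in for a real endpoint client; returns no rows."""
    return []


# Arbitrary box chosen for illustration only.
query = range_query(-0.5, 51.3, 0.3, 51.7)
durations = time_query(fake_store, query)
print(query)
print(len(durations))
```

Passing the store client in as a callable keeps the timing loop identical across stores, so only the query text and the client need to change per system.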

A short description of the test queries used is given in the table below:

Our evaluation surfaced several interesting issues (some already known and others newly found), as well as valuable intuition and directions towards improving the functionality of geospatially-enabled RDF Stores.

  • First of all, it became clear that considerable effort from industry and academia is needed to add baseline functionality to current RDF stores, regarding both the geospatial features and the geospatial standards they support. For example, some stores support only point geometries, while only a few of them support the GeoSPARQL standard.
  • Another important issue is the efficiency of current RDF stores, concerning loading, indexing and querying times, as compared to RDBMSs.
  • Finally, several enhancements regarding both functionality and efficiency are very interesting research issues; addressing them can improve the current performance of RDF stores and establish them as first-class research and industrial data management systems.

Detailed results, as well as a thorough survey of current geospatial and semantic standards can be found in our public deliverable D2.1.1 ‘Market and Research Overview’.

You can also find more information regarding our testing environment in this post. The complete set of pre-built Virtual Machines, queries, and data sets we used are also freely available for download.

GeoKnow @ Greek Open Data Day 2013

GeoKnow members training volunteers on LOD technologies

The Institute for the Management of Information Systems (IMIS) of “Athena” Research Center (“Athena” RC) and the Greek Free/Open Source Software Society (GFOSS) once again organized the Greek Open Data Day! Through this event, the open data community in Greece joined volunteers in 109 cities across the world in celebration of the global Open Data Day.

The Greek Open Data Day followed in the footsteps of the yearly open data days and hackathons co-organized by IMIS and GFOSS since 2010. This year, more than 160 members of the open data community were present, marking a milestone for open data in Greece. The participation, interest and efforts of all volunteers were an unprecedented success!

All the members of the IMIS – GeoKnow team were there! It was a great opportunity to present our ongoing and planned work, involve the open data community, raise awareness from the public sector, and network with private companies in Greece that actually use open geospatial data.

We had the opportunity to present the GeoKnow project to participants and elaborate on its technologies and goals. The interest from the public sector and, more importantly, from entrepreneurs and private companies was amazing! As OpenStreetMap data are integrated in a plethora of business processes and value-added products, and the Data Web is emerging as a promising activity for the ICT sector, the novel approach of GeoKnow in aggregating and fusing geospatial LOD is extremely timely. In addition, many participants asked us to include GeoKnow services in geodata.gov.gr, the open geospatial data portal that IMIS developed and maintains. Everyone wanted to explore geospatial LOD and was fascinated by the potential of rapid integration of open and closed linked data. Well, we promised and we will deliver!

In parallel, five workshops were organized in which more than 60 members of the open data community were trained by IMIS researchers in open data technologies. Again, the participation was remarkable, and we promised that further training events will take place in the near future. We trained volunteers in open data publishing, generating LOD from open data, and anonymization techniques, and showcased the complete arsenal of tools and technologies GeoKnow is based on.

The date for the next Greek Open Data Day is already set and GeoKnow will be there again!

The interest in the GeoKnow project was encouraging, especially from the private sector

We trained volunteers step by step on how to use the tools and technologies GeoKnow is based on.

MODAP Workshop on the Challenges of Big Data and Privacy

The MODAP Project, an EU-funded Coordination Action, organized a multi-disciplinary workshop in Athens on the issues of Big Open Data and Privacy. The aim of the workshop was to bring together experts from various fields with expertise related to Big Open Data. The workshop comprised a series of talks on emerging scientific trends and presented current state-of-the-art practices.

The GeoKnow project was represented in the workshop by Spiros Athanasiou, who presented the state of the art in geospatial open data technologies and highlighted select real-world examples where open data pose privacy risks. Further, Spiros was an invited member of the panel of experts that followed, where several issues were raised and discussed.

It was a great opportunity to establish networking ties with relevant EU research projects and research teams. Once again, all participants showed keen interest in the technologies and outcomes of the GeoKnow project. We explored several areas where GeoKnow can directly or indirectly contribute and laid the foundations for closer collaboration in the future.

Thanks to everyone who participated for the fruitful discussions we had!