Last Friday (2014-03-14) I (Orri Erling) gave a talk about GeoKnow at the EC Copernicus Big Data workshop. This was a trial run for more streamlined messaging. Aside from the practice of geekcraft, I have occupied myself with questions of communication these last weeks.
The clear take-home from London and Brussels alike is that these events have full days and 4 or more talks an hour. It is not quite TV commercial spots yet but it is going in this direction.
If you say something complex, little will get across unless the audience already knows what you will be saying.
I had a set of slides from Jens Lehmann, the GeoKnow project coordinator, for whom I was standing in. Now these are a fine rendition of the description of work. What is wrong with partners, work packages, objectives, etc? Nothing, except everybody has them.
I recall the old story about the journalist and the Zen master: The Zen master repeatedly advises the reporter to cut the story in half. We get the same from PR professionals, “If it is short, they have at least thought about what should go in there,” said one recently, talking of pitches and messages. The other advice was to use pictures. And to have a personal dimension to it.
Enter “Ms. Globe” and “Mr. Cube”. Frans Knibbe of Geodan gave the Linked Geospatial Data 2014 workshop’s most memorable talk, entitled “Linked Data and Geoinformatics – a love story” (pdf), about the excitement and the pitfalls of the burgeoning courtship of Ms. Globe (geoinformatics) and Mr. Cube (semantic technology). They get to talking; later, Ms. Globe thinks to herself… “Desiloization, explicit semantics, integrated metadata…” Mr. Cube, a young upstart now approaching a more experienced and sophisticated lady, dreams of finally making an entry into adult society, “critical mass, global scope, relevant applications…” There is a vibration in the air.
We ought to make a series of cartoons about the couple. There will be twists and turns in the story to come. Mr. Cube is not Ms. Globe’s first lover, though; there is also rich and worldly Mr. Table. How will Mr. Cube prove himself? The eternal question… Well, not by moping around, not by wise-cracking about semantics, no. By boldly setting out upon a journey to fetch the Golden Fleece from beyond the crashing rocks. “Column store, vectored execution, scale out, data clustering, adaptive schema…” he affirms, with growing confidence.
This is where the story stands right now. Virtuoso runs circles around PostGIS doing aggregations and lookups on geometries in a map-scrolling scenario (GeoKnow’s GeoBenchLab). Virtuoso SPARQL outperforms PostGIS SQL against planet-scale OpenStreetMap; Virtuoso SQL goes 5-10x faster still.
Mr. Cube is fast on the draw, but some corners can still be smoothed out.
Later in GeoKnow, there will be still more speed, but also near parity between SQL and SPARQL, by taking advantage of data regularity to guide physical storage. If it is big, it is bound to have repeating structure.
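To make the idea of repeating structure concrete, here is a minimal sketch in Python. It is not Virtuoso's implementation; the function name `group_by_shape` and the sample data are hypothetical, made up for illustration. It only shows how subjects that share the same set of properties fall naturally into table-like groups, which a storage layer could then lay out as rows and columns.

```python
from collections import defaultdict

def group_by_shape(triples):
    """Group subjects by the set of predicates they carry (hypothetical helper).

    Simplification: a repeated (subject, predicate) pair keeps only the
    last value, so multi-valued properties are not handled here.
    """
    props = defaultdict(dict)
    for s, p, o in triples:
        props[s][p] = o

    shapes = defaultdict(list)
    for s, po in props.items():
        shape = tuple(sorted(po))  # the "column list" of the implied table
        shapes[shape].append((s,) + tuple(po[p] for p in shape))
    return dict(shapes)

# Illustrative data only, loosely in the style of map nodes.
triples = [
    ("node1", "lat", 52.37), ("node1", "lon", 4.89), ("node1", "name", "Amsterdam"),
    ("node2", "lat", 50.85), ("node2", "lon", 4.35), ("node2", "name", "Brussels"),
    ("node3", "lat", 51.51), ("node3", "lon", -0.13),
]

tables = group_by_shape(triples)
for shape, rows in tables.items():
    print(shape, rows)
```

The two named nodes end up in one group with columns (lat, lon, name), and the unnamed node in another with columns (lat, lon). The larger the data set, the more such groups tend to dominate.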
The love story grows more real by the day. To be consummated still within GeoKnow.
Talking of databases has the great advantage that this has been a performance game from the start. There are few people who need convincing about the desirability of performance, as this also makes for lower cost and more flexibility on the application side.
But this is not all there is to it.
In Brussels, the audience came from E-science (Earth observation). In science, it is understood that qualitative aspects can be even more crucial. I told the story about an E-science-oriented workshop I attended in America years ago. The practitioners, from high energy physics to life sciences to climate, had invariably come across the need for self-description of data and for schema-last. This was essentially never provided by RDF, except for some life science cases. Rather, we had one-off schemes, ranging from key-value pairs to putting the table name in a column of the same table to preserve the origin across data export.
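As a rough illustration of what self-description and schema-last mean in practice, here is a toy triple store in Python. Everything in it (the `TripleStore` class, the identifiers `run42`, `energy_gev`, `runs_2013`) is hypothetical and is not taken from any of the projects mentioned. The point is only that new attributes, descriptions of those attributes, and provenance all go into the same structure without any schema change.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def describe(self, s):
        """Return everything known about a subject as a predicate -> object dict."""
        return {p: o for (subj, p, o) in self.triples if subj == s}

store = TripleStore()

# Initial data about a (made-up) measurement run.
store.add("run42", "energy_gev", 13000)
store.add("run42", "detector", "hypothetical-detector")

# Schema-last: a new attribute appears later, with no table to alter.
store.add("run42", "calibration", "v2")

# Self-description: the property itself is described in the same store.
store.add("energy_gev", "unit", "GeV")
store.add("energy_gev", "label", "beam energy")

# Provenance: the origin survives export without a one-off column convention.
store.add("run42", "source_table", "runs_2013")

print(store.describe("run42"))
print(store.describe("energy_gev"))
```

The one-off schemes mentioned above, such as key-value pairs or a table-name column, each cover one of these needs. A uniform statement model covers all three at once.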
Explicit semantics and integrated metadata are important, Ms. Globe knows, but she cannot sacrifice operational capacity for this. So it is more than a DBMS or even data model choice — there must be a solid tool chain for data integration and visualization. GeoKnow provides many tools in this space.
Some of these, such as the LIMES entity matching framework (pdf), are probably close to the best there is. For other parts, the SQL-based products with hundreds of person-years invested in user interaction are simply unbeatable.
In these cases, the world can continue to talk SQL. If the regular part of the data is in fact tables already, so much the better. You connect to Virtuoso via SQL, just like to PostGIS or Oracle Spatial, and talk SQL MM. The triples, in the sense of flexible annotation and integrated metadata, stay there; you just do not see them if you do not want them.
There are possibilities all right. In the coming months I will showcase some of the progress, starting with a detailed look at the OpenStreetMap experiments we have made in GeoKnow.
Linked Geospatial Data 2014 Workshop posts:
- Linked Geospatial Data 2014 Workshop, Part 1: Web Services or SPARQL Modeling?
- Linked Geospatial Data 2014 Workshop, Part 2: Is SPARQL Slow?
- Linked Geospatial Data 2014 Workshop, Part 3: The Stellar Reach of OKFN
- Linked Geospatial Data 2014 Workshop, Part 4: GeoKnow, London, Brussels, The Message (this post)