It has been a while since the last technical progress report was published. Luckily, this is not because nothing has happened in the meantime, but because we were busy building things. This blog post briefly summarizes the most important of them.
The most prominent changes naturally concern the user interface. The layout is now based on three interrelated columns: one for the map, one for search and filter results and one for individual entries. In addition, further information such as a feed of recent additions and statistics is available in a pop-up window. Less visible, but an enhancement nevertheless, is that navigating the map no longer requires full page loads and is therefore much smoother.
The templates that editors of the OER World Map use to input data have also been slightly reworked. To reduce the number of fields presented to users, the fields for the inverse direction of links that have a more or less natural direction have been hidden. To clarify the semantics of data elements, descriptions are available via tooltips. Probably the most significant simplification is that Markdown is now supported for fields that hold running text.
Finally, the first elements of an administrative interface have been implemented. These include the administration of roles for registered users, a UI for data migrations and a detailed log of all changes made to the database.
During phase II, the emphasis was on recruiting editors for the OER World Map by inviting them individually to collaborate. An important step towards growing a bigger community of OER World Map users is that anyone can now register a user account. Once registered, users can create a personal profile and thus represent themselves on the map. Also, only registered users can comment on entries. You are very welcome to participate in editing data beyond that; get in touch if you are interested!
From a technical point of view, we have switched to what can be described as a perimeter security model. User authentication and authorization are now handled by an Apache reverse proxy before a request even reaches the OER World Map web application. On the one hand, this separation of concerns improves performance. On the other hand, it would be hard to compete with Apache’s battle-proven security anyway.
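In a perimeter model the application no longer checks passwords; it simply trusts the identity the proxy passes along, for example to record who made a change. The following is a minimal sketch of that idea as a Python WSGI app, assuming the proxy forwards the authenticated user name in a request header. The header name `X-Remote-User` and the functions `current_user` and `app` are hypothetical and not taken from the OER World Map's actual Apache configuration or code.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical header name; the header actually forwarded by the
# OER World Map's Apache proxy may differ.
USER_HEADER = "HTTP_X_REMOTE_USER"  # WSGI form of the "X-Remote-User" header


def current_user(environ):
    """Return the user the proxy authenticated, or None for anonymous requests.

    No password checking happens here: the proxy has already authenticated
    and authorized the request before it reached the application.
    """
    return environ.get(USER_HEADER) or None


def app(environ, start_response):
    """Minimal WSGI app that reports who the proxy says the user is."""
    user = current_user(environ)
    body = f"author={user or 'anonymous'}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


if __name__ == "__main__":
    # Simulate a request that the proxy has already authenticated.
    environ = {}
    setup_testing_defaults(environ)
    environ[USER_HEADER] = "felix.ostrowski@XYZ.com"
    print(app(environ, lambda status, headers: None))
```

This only holds up if the application cannot be reached except through the proxy, and the proxy removes any such header sent by clients themselves; otherwise the identity could simply be forged.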
While invisible to most users, there have been very important improvements in the architecture of the system's back end. Using an Elasticsearch index as our main data sink allowed us to grow the system quickly during phase II, but some limitations of that approach became evident once more editorial activity was recorded.
On the one hand, data needs to be denormalized quite heavily to fully exploit the features of a document-oriented system such as Elasticsearch, especially when it comes to aggregations. On the other hand, the data in the OER Data Hub is highly interlinked. This combination makes write operations quite expensive, because a single update often modifies multiple JSON documents in the index. A write could only be considered successful once the data had trickled into every place it was supposed to appear, which led to unacceptable waiting times.
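To make the cost concrete, here is a minimal Python sketch with hypothetical data: two services link to the same organization, and each index document embeds a copy of the organization it links to. Renaming the organization therefore means rewriting three documents instead of one. The dictionaries are toy stand-ins, not Elasticsearch and not the OER World Map's actual index layout.

```python
# Hypothetical normalized data: two services link to the same organization.
entities = {
    "urn:uuid:org1": {"name": "Example University"},
    "urn:uuid:svc1": {"name": "Repository A", "provider": "urn:uuid:org1"},
    "urn:uuid:svc2": {"name": "Repository B", "provider": "urn:uuid:org1"},
}


def denormalize(uri):
    """Build an index document in which linked entities are embedded inline."""
    doc = dict(entities[uri])
    for key, value in list(doc.items()):
        if value in entities:  # the value is a link to another entity
            doc[key] = {"@id": value, **entities[value]}
    return doc


# Toy "index": one denormalized document per entity.
index = {uri: denormalize(uri) for uri in entities}


def documents_to_reindex(changed):
    """The changed entity's own document plus every document that embeds it."""
    return sorted(uri for uri, entity in entities.items()
                  if uri == changed or changed in entity.values())


def update(uri, **changes):
    """Apply a change and rewrite every affected index document."""
    entities[uri].update(changes)
    affected = documents_to_reindex(uri)
    for u in affected:
        index[u] = denormalize(u)
    return affected
```

Calling `update("urn:uuid:org1", name="Example Institute")` rewrites the organization's document and both service documents. If inverse links were embedded as well (the organization document listing its services), even a change to a single service would touch several documents.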
To complement the extremely fast reads that Elasticsearch provides with equally fast writes, a triple store, a database specialized in storing RDF graph data, was added to the technology stack. It is now our primary data store and the single source of truth in the system, and it asynchronously feeds the Elasticsearch index after write operations.
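The resulting write path can be sketched in Python as follows, assuming a plain set of triples as a stand-in for the triple store, a dictionary as a stand-in for the Elasticsearch index and a background thread for the asynchronous indexing. All names are hypothetical; the point is only that `write` returns as soon as the source of truth has been updated, while the index catches up later.

```python
import queue
import threading

# Hypothetical stand-ins: a set of (subject, predicate, object) triples plays
# the triple store, a dict of documents plays the Elasticsearch index.
triples = set()
index = {}
lock = threading.Lock()
pending = queue.Queue()  # subjects whose index documents are stale


def write(added, removed=()):
    """Update the source of truth and return without waiting for indexing."""
    with lock:
        triples.difference_update(removed)
        triples.update(added)
    for subject in {s for s, _, _ in [*added, *removed]}:
        pending.put(subject)


def index_worker():
    """Background worker: rebuild the document of each changed subject."""
    while True:
        subject = pending.get()
        with lock:  # snapshot the subject's statements consistently
            statements = [(p, o) for s, p, o in triples if s == subject]
        doc = {}
        for p, o in statements:
            doc.setdefault(p, []).append(o)
        index[subject] = doc
        pending.task_done()


threading.Thread(target=index_worker, daemon=True).start()

if __name__ == "__main__":
    write({("urn:uuid:123", "schema:name", "Felix Ostrowski")})
    pending.join()  # only for the demo: wait until the index has caught up
    print(index["urn:uuid:123"])
```

The trade-off of this design is eventual consistency: a read from the index immediately after a write may still return the previous version of a document.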
Another precondition for gradually opening the platform to a bigger circle of editors has been data versioning. To ensure data quality, it must be possible to retrace the evolution of the dataset. In other words, we need to understand completely who changed which parts of the data, and when. Being naturally familiar with the way source code is versioned, we adapted the structure of Git commits for the RDF data in our triple store:
    Author: felix.ostrowski@XYZ.com
    Date: 2016-07-01T15:55:51.012+02:00
    + <urn:uuid:123> <http://schema.org/name> "Felix Ostrowski" .
    - <urn:uuid:123> <http://schema.org/name> "Felix Ostrowsko" .

    Author: felix.ostrowski@XYZ.com
    Date: 2016-06-29T18:01:40.587+02:00
    + <urn:uuid:123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
    + <urn:uuid:123> <http://schema.org/name> "Felix Ostrowsko" .
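As in `git log`, the newest commit is listed first: the second entry creates the person with a misspelled name, and the first one corrects it. To illustrate the format, here is a minimal Python sketch that parses one such commit into its author, date, added and removed statements. It assumes exactly the layout of the example (an `Author:` and a `Date:` header followed by N-Triples statements prefixed with `+ ` or `- `); the `Commit` class and `parse_commit` function are hypothetical and not taken from the OER World Map code base.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Commit:
    author: str
    date: datetime
    added: list[str] = field(default_factory=list)    # statements prefixed with "+"
    removed: list[str] = field(default_factory=list)  # statements prefixed with "-"


def parse_commit(text):
    """Parse one commit in the Git-like format (headers, then +/- statements)."""
    headers, added, removed = {}, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        else:
            key, _, value = line.partition(": ")  # e.g. "Author: ..." or "Date: ..."
            headers[key] = value
    return Commit(
        author=headers["Author"],
        date=datetime.fromisoformat(headers["Date"]),
        added=added,
        removed=removed,
    )
```

Keeping each statement as a complete N-Triples line makes a commit readable for humans and trivially diffable, which is what makes retracing who changed what, and when, straightforward.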
A nice side effect of storing all our data as a series of commits like the ones above is that the back-up strategy is simply a matter of saving plain-text commit files. These files are all that is needed to completely recover our data set after a failure.
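Recovery then amounts to replaying the commits in chronological order, starting from an empty store. A minimal Python sketch, assuming the commit format shown above and a canonical serialization (so that a removed statement matches its added form exactly); the one-file-per-commit backup layout with chronologically sortable file names is a hypothetical assumption, not the OER World Map's documented layout.

```python
from pathlib import Path


def replay(commit_texts):
    """Rebuild the set of N-Triples statements by applying commits oldest first."""
    graph = set()
    for text in commit_texts:
        for line in text.splitlines():
            line = line.strip()
            if line.startswith("+ "):
                graph.add(line[2:])
            elif line.startswith("- "):
                graph.discard(line[2:])
    return graph


def restore_from_directory(backup_dir):
    """Hypothetical layout: one commit per file, file names sort chronologically."""
    files = sorted(p for p in Path(backup_dir).iterdir() if p.is_file())
    return replay(p.read_text(encoding="utf-8") for p in files)
```

Replaying the two example commits in the order they were made leaves the `rdf:type` statement and only the corrected name. Order matters: applying them newest first would re-add the misspelled name.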
On the data model side of things, a tag field has been added to all resources. Together with the corresponding filter, this allows editors to create arbitrary custom subsets of the data. For the Service type, a license field is now exposed, along with a controlled vocabulary of licenses and the corresponding filter. Finally, the funder property is now available for Projects.
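The tag and license filters can be illustrated as simple field matching over a list of resources, with the license field validated against a controlled vocabulary. This is a hypothetical in-memory sketch, assuming resources are plain dictionaries with a list-valued `tag` field; the license list and the function names are made up for illustration and are not the OER World Map's implementation.

```python
# Hypothetical controlled vocabulary; the actual list of licenses offered
# by the OER World Map is not reproduced here.
LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


def validate(resource):
    """Reject licenses outside the controlled vocabulary."""
    license_ = resource.get("license")
    if license_ is not None and license_ not in LICENSES:
        raise ValueError(f"unknown license: {license_}")
    return resource


def matches(resource, field, value):
    """True if the field equals the value, or contains it for list-valued fields."""
    current = resource.get(field)
    if isinstance(current, list):
        return value in current
    return current == value


def apply_filters(resources, **filters):
    """Keep resources that match every filter (logical AND)."""
    return [r for r in resources
            if all(matches(r, f, v) for f, v in filters.items())]
```

A controlled vocabulary is what makes the license filter useful: every value that passes validation is one of a fixed set of options, so a filter on it cannot miss entries that merely spell the same license differently.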