

How OER World Map determines search result order

OER World Map collects a lot of data. This is essential for making data centrally available, but as more is collected, the difficulty of finding a specific item increases, regardless of license or data content. Therefore, as data in OER World Map increases, it is very important to implement efficient and targeted search and ranking algorithms.

There are search algorithms whose complexity, efficiency and confidentiality are impressive; the world's major search engines are clear examples. Of course, a relatively small non-profit project such as the OER World Map cannot develop such a complex search algorithm from its own resources. Nor would this be desirable, because the platform is built on the principles of transparency and openness.

Why is transparency so important?

We can assume that users are able to comprehend how the search behaves and what caused the ranking of a given search result. As long as they also feel that no topics, authors, vendors, interests or similar parameters are preferred in determining the rankings, they can trust the result. However, once parts of the algorithm are hidden in the proverbial ‘black box’, there is at least a theoretical possibility that some searchable items might receive preferential treatment (or be discriminated against).

Like the entire code of the OER World Map, our ranking mechanism is implemented as open source. In this way, the OER World Map demonstrates that the same rules and conditions are applied to all resources (services, organizations, people etc.), and that no resource is treated differently from any other.

Of course, every search algorithm includes factors that give individual results a higher weight; otherwise there could be no ordered ranking at all. These factors simply do not depend on specific content, but on universal features such as morphological matching or the length of an entry. The most important constituents of the search ranking are described below (as of September 2016).

The code of the OER World Map

Search in the OER World Map is based on Elasticsearch, which also serves as the main container for data storage. Elasticsearch is an open source search engine based on Apache Lucene. It allows the search mechanisms to be configured via a JSON file, which is called index-config.json within the OER World Map. In this file you can define whether and how individual data fields should be searchable. Currently, Elasticsearch is configured as follows:

  • “name” and “alternateName” are both indexed in their original spelling and in variants, so that a search containing typos can still produce the intended hits.
  • All other fields are indexed in their standard format (as written in the database).
  • From the data model point of view, all resources can be associated with addresses and geo-coordinates.

Within the OER World Map, a search command to Elasticsearch is triggered by the method esQuery() in the Java class ElasticsearchRepository. The following parameters can be controlled by this method:

  • Field boost: the field boost determines which data fields get more weight in the search. Typically the “name” field is boosted strongly, while “alternateName” is boosted somewhat less. (The boost factors are specified below.)
  • Limitation to a partial result: to page through multiple pages of search results, it is useful to return only a slice of the results, for example only hits 1 to 10 or 11 to 20.
  • Sort order: in very special cases, it may make sense to display search results in ascending order, meaning that the results with the lowest score are listed first. The OER World Map and Elasticsearch allow both ascending and descending order. The default in the OER World Map is “descending”.
  • Geo-filtering: for completeness, it should be mentioned that search results can be omitted entirely from the results list due to geo-filtering. While the source code of this feature is already written, it is not yet activated. Once it is activated, a user will be able to limit the search to a specific geographical area (by displaying a particular map section), so that results from outside this area do not appear in the results list.
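
To make the first three parameters more concrete, here is a minimal sketch of how they could map onto an Elasticsearch request body. It is not the actual esQuery() implementation: the class name SearchRequestSketch, the method build() and the choice of a query_string query are assumptions made for illustration. Only the request keys (from, size, sort on _score, and the field^boost notation) are standard Elasticsearch query DSL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the project's esQuery() code.
final class SearchRequestSketch {

    /** Builds a request body for one results page (pages are numbered from 1). */
    static Map<String, Object> build(String queryText, List<String> boostedFields,
                                     int page, int pageSize, boolean descending) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }

        // Field boost: each entry uses Elasticsearch's "field^factor" notation.
        Map<String, Object> queryString = new LinkedHashMap<>();
        queryString.put("query", queryText);
        queryString.put("fields", boostedFields);

        // Sort order: by relevance score, descending unless requested otherwise.
        Map<String, Object> order = new LinkedHashMap<>();
        order.put("order", descending ? "desc" : "asc");
        Map<String, Object> sortByScore = new LinkedHashMap<>();
        sortByScore.put("_score", order);

        // Partial result: "from" is the zero-based offset of the first hit.
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("from", (page - 1) * pageSize);
        body.put("size", pageSize);
        body.put("query", Map.of("query_string", queryString));
        body.put("sort", List.of(sortByScore));
        return body;
    }

    public static void main(String[] args) {
        // Second page of ten hits ("11 to 20"), best matches first.
        System.out.println(build("open textbooks",
                List.of("name^9", "alternateName^6"), 2, 10, true));
    }
}
```

In this sketch, page 1 with a page size of 10 corresponds to hits 1 to 10 and page 2 to hits 11 to 20.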

The global preferences of the OER World Map for field boosting are located in the file search.conf. At present, boosting provides the following weighting of fields:

  • “name” by a factor of 9
  • “alternateName” by a factor of 6
  • “provider.name” by a factor of 5
  • “provider.alternateName” by a factor of 4
  • “agent.name” by a factor of 4
  • “agent.alternateName” by a factor of 3
  • “participant.name” by a factor of 2
  • “participant.alternateName” by a factor of 1
  • “memberOf.name” by a factor of 1
  • “memberOf.alternateName” by a factor of 1
  • “member.name” by a factor of 1
  • “member.alternateName” by a factor of 1
  • “article body” by a factor of 1
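
As an illustration of how such factors can be handed to Elasticsearch, the sketch below turns a map of boost factors into the field^factor strings that Elasticsearch accepts in the fields list of a query. The class FieldBoostSketch and its methods are hypothetical, and the sketch makes no claim about the actual format of search.conf; only a subset of the factors listed above is included for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the real search.conf format may differ.
final class FieldBoostSketch {

    /** A subset of the boost factors listed above, in ranking order. */
    static Map<String, Integer> exampleBoosts() {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("name", 9);
        boosts.put("alternateName", 6);
        boosts.put("provider.name", 5);
        boosts.put("provider.alternateName", 4);
        boosts.put("member.name", 1);
        return boosts;
    }

    /** Converts each (field, factor) pair into the "field^factor" notation. */
    static List<String> toFieldSpecs(Map<String, Integer> boosts) {
        List<String> specs = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : boosts.entrySet()) {
            specs.add(entry.getKey() + "^" + entry.getValue());
        }
        return specs;
    }

    public static void main(String[] args) {
        System.out.println(toFieldSpecs(exampleBoosts()));
    }
}
```

A factor of 1 leaves a field's contribution unchanged, so with these settings a match in “name” counts nine times as much as the same match in “member.name”, before Elasticsearch's other scoring factors are applied.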

Outlook

Due to the continuous development of the OER World Map, details (such as boosting factors) are going to evolve over time. New search fields might be added, or existing ones removed. It is envisaged that there will be an additional weighting based on “likes” (or some other voting system). The number of links to a resource is a desirable weighting parameter as well. In any case, the quality and reliability of the OER World Map will always be measured by how well it preserves a transparent and even-handed search. OER World Map users can always check and be certain that search results are determined fairly and reasonably.

The code of the OER World Map is hosted on GitHub. For more specific questions, the OER World Map team would refer you first to the source code, but is also very happy to answer questions!


The OER World Map Openness Indicator – Background & Introduction

Since most of our team members are connected to the library world in one way or another, one of the first things we wanted to do when we started phase II of the project was to define a clear collection policy for the OER World Map, stating which data to collect and which not. A clear scope, so we thought, would be especially important for a project like the World Map, since trying to collect too much often ends in collecting nothing properly.

We consider ourselves to be dedicated to openness, which means that we support open licenses, develop open source software and even do most of our project communication openly on GitHub. Our initial approach to defining a collection policy was therefore to restrict the OER World Map to entries that are related to ‘real OER’, which according to my understanding meant, in Creative Commons terminology, CC BY, CC BY-SA and also CC BY-NC licenses and equivalents (though another strong opinion in our team argued that NC was not ‘real OER’ according to the Open Definition).

After discussing this issue on several occasions, we finally concluded that this strict focus could not be maintained and that we had to loosen our collection policy. Some of the reasons for this were

  • that gratis services (= services which provide free, but not openly licensed materials) also offer some value in situations where reuse is not needed,
  • that the gratis services of today may become the open services of tomorrow,
  • that the focus on licensing does not really fit anything other than OER collections. For example, looking at licenses alone does not help very much in adequately evaluating a project that focusses on developing open practices,
  • that otherwise openness has to be decided on before a resource is added to the map, which might raise practical problems,
  • that it is rather paradoxical to build a service on open education with a very closed collection policy.

Although this decision was based on good reasons, we nevertheless felt that it challenged our initial goal of using the OER World Map as a tool to support ‘real openness’. Our solution to this dilemma was to develop an ‘Openness Indicator’, which would allow users to easily see how open a service is. By doing so, we believe that it is possible to be open and flexible as far as our collection scope is concerned, without losing focus on openness.

When we began thinking about what an ‘Openness Indicator’ could look like, our initial focus was to keep it as simple as possible so that it could easily be applied by OER World Map editors. We therefore came up with a very simple structure: three levels of openness, based only on the chosen license. The basic idea was to design the indicator like a set of traffic lights: green for very open services, yellow for fairly open services and red for hardly open services.

While we were still thinking in this direction, we found that a special challenge was that we wanted to decide on the openness of a whole collection and not on the openness of an individual resource. It is easy to look up the license of a single resource, but how is it possible to judge the openness of a whole collection? If a repository has a clear licensing policy, e.g. by stating that all included contents have to be licensed CC BY, this is quite easy. But in our experience this is rather the exception in the world of OER, where most repositories include heterogeneously licensed material.

Finally Adrian came up with the solution, which looked something like this:

  • Green: All resources are licensed under an open license (CC0, CC BY, CC BY-SA).
  • Yellow: Some or all resources are licensed under CC BY, CC BY-SA, CC BY-NC or CC BY-NC-SA.
  • Red: All resources are licensed under an ND license or have no license indication at all.

Using this approach we found it was possible to combine the question of different levels of openness with the question of internal license heterogeneity, while abiding by the vision of a simple three colour scheme.
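
To illustrate how such a scheme could be applied to a collection, here is a minimal sketch of one possible reading of the three rules. It is not code from the OER World Map: the class OpennessIndicatorSketch, the license strings and the tie-breaking choices are assumptions. In particular, the sketch checks green first (so a collection that is entirely CC BY counts as green, not yellow), treats an empty collection as red, and counts a single CC0 resource among otherwise closed material as yellow.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three-colour scheme; one possible interpretation only.
final class OpennessIndicatorSketch {

    enum Level { GREEN, YELLOW, RED }

    // Licenses counted as open for the green level.
    private static final Set<String> OPEN = Set.of("CC0", "CC BY", "CC BY-SA");
    // Licenses that qualify a collection for the yellow level.
    private static final Set<String> FAIRLY_OPEN =
            Set.of("CC BY", "CC BY-SA", "CC BY-NC", "CC BY-NC-SA");

    /**
     * Classifies a collection from the licenses of its resources.
     * One list entry per resource; null or "" means "no license indication".
     */
    static Level classify(List<String> licenses) {
        boolean allOpen = !licenses.isEmpty();
        boolean anyAtLeastFairlyOpen = false;
        for (String license : licenses) {
            String normalized = license == null ? "" : license.trim();
            if (!OPEN.contains(normalized)) {
                allOpen = false;
            }
            if (OPEN.contains(normalized) || FAIRLY_OPEN.contains(normalized)) {
                anyAtLeastFairlyOpen = true;
            }
        }
        if (allOpen) {
            return Level.GREEN;   // every resource carries an open license
        }
        if (anyAtLeastFairlyOpen) {
            return Level.YELLOW;  // at least some resources are reusable
        }
        return Level.RED;         // only ND-licensed or unlicensed resources
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("CC BY", "CC0")));
        System.out.println(classify(List.of("CC BY-NC", "CC BY-ND")));
        System.out.println(classify(List.of("CC BY-ND", "")));
    }
}
```

The need to spell out these tie-breaking choices hints at how much a simple three colour scheme leaves open to interpretation.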

But again things turned out to be more complex than that. Rob was the first to express concerns that a simple three colour scheme might appear too offensive to some, while at the same time oversimplifying the topic. These concerns ushered in the next stage in the development of the Openness Indicator. But the real breakthrough came when Pat Lockley from solvonauts joined the discussion. It was he who pointed us to the “HowOpenIsIt?” Open Access Spectrum (OAS) developed by the Public Library of Science (PLOS) in cooperation with the Scholarly Publishing and Academic Resources Coalition (SPARC) and the Open Access Scholarly Publishers Association (OASPA).

OAS, in its own words, “moves the conversation from ‘Is It Open […]?’ to ‘How Open is it?’ and illustrates a nuanced continuum of more versus less open to enable users to compare and contrast publications and policies across a grid of clearly defined components related to readership, reuse, copyright, author and automatic posting, and machine readability.” We found this approach so appealing that we instantly wanted to reuse it for the OER World Map. But we soon found out that, once again, things are not so easy, since the OAS, having been developed for Open Access journals, does not fit OER repositories, for several reasons:

In the ‘reader rights’ component there are several points which refer to the length of an embargo period. In the field of Open Access it is common for new issues to be available only to journal subscribers at first and to become open or free after an embargo period of several months. This does not seem to fit OER. At the same time this dimension does not ask about compulsory registration, which arguably is a restriction of the reader's access rights and can be found occasionally in some OER services.

The ‘reuse rights’ dimension introduces levels of openness very similar to those introduced above, but does not answer the question of how to handle license heterogeneity. This is probably because consistent license policies are much more common for OA journals than they are for OER repositories.

The ‘copyright’ dimension does not seem to fit OER without modification, if at all. This component mainly deals with the question of whether the copyright is held by the author or the publisher. Since commercial publishers are still the exception in the field of OER, this section will make no sense in most cases. Within the field of OER, especially within Higher Education, it could actually be more interesting to ask whether the copyright is held by the author or by the higher education institution which employs her. Though this analogy seems to be quite interesting, I'm not sure it really makes a difference for the openness of a repository. As long as the material is openly licensed, I would argue, it does not matter who holds the copyright.

The ‘Author posting rights’ and ‘Automatic posting’ dimensions also seem to be closely related to phenomena typical for (and restricted to) Open Access journals. While the former refers to the question of how preprints are handled, the latter refers to the question of whether resources are automatically posted to other repositories.

Last but not least, the machine readability dimension is quite interesting, and it certainly makes sense to apply it to OER as well. Nevertheless, the dimension does not refer to open formats of the resources, which are frequently considered quite important for the openness of a resource. Nor does it include the use of open source software, which might be an interesting aspect when talking about the openness of a service.

All in all, we concluded that we cannot use the OAS for OER repositories without major adaptation. We therefore started defining an indicator which reuses OAS dimensions as far as possible. An initial version can be found here. We will describe its structure and fields in one of our next blog posts. We believe that the Openness Indicator should be discussed by a wider audience and therefore look forward to receiving your comments and questions on this important topic!

(Photo: “Open Door” by Börkur Sigurbjörnsson, CC BY 2.0)