Distribution, Autonomy, Consistency, and Trust in a Read/Write LOD

Luis-Daniel Ibáñez, Andreas Harth

The LOD initiative has succeeded in making a tremendous amount of data available for querying through SPARQL endpoints. However, querying Linked Data (LD) is undermined by two critical issues. First, the availability of endpoints has been measured to be low, and many highly demanded linked data providers have been forced to restrict access to their endpoints through timeouts or result-set limits. Second, the quality of the data is low due to broken links, mismatched entities across datasets, wrong types, etc.

Individual data providers can take care of the quality of their own datasets, but errors or changes that break queries spanning multiple datasets are generally noticed by the external consumers who run those queries. Unfortunately, even if consumers know how to repair the data, they do not have update privileges on data that does not belong to them, i.e., LD is read-only. If LOD were writable, consumers could contribute to improving its general quality. Making LOD writable poses a number of challenges:

  1. Trust: As a producer, how to manage the update requests that arrive from consumers? How to select the trusted consumers?
  2. Autonomous participants: Unlike distributed databases and clusters, where one organization has full control of the writers, LD producers and consumers are autonomous; therefore, no coordination can be expected from them.
  3. Consistency: When data is being updated by several (autonomous) participants, how to guarantee its consistency?
  4. Web scale: The number of consumers and producers of LD is expected to continue growing. A Read/Write LD model and infrastructure need to scale adequately as the number of participants increases.

This part of the tutorial discusses a model for realizing a Read/Write LOD, inspired by Distributed Version Control Systems: consumers copy the data they are interested in from different providers, update it to repair errors if needed, and propose their modifications back to the original sources. The model builds upon the provenance of RDF triples to guarantee two different consistency criteria: Eventual Consistency and Fragment Consistency. As a positive side effect, the link created between consumers and providers can be further exploited by federated query engines to alleviate the load on providers by retrieving data copies from consumers, improving overall LD availability.
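
The copy, repair, and propose-back cycle described above can be illustrated with a toy sketch. This is not the model presented in the tutorial: the names `Store`, `Patch`, `copy_fragment`, and `apply_patch`, the `ex:` identifiers, and the simple allow-list used for trust are all assumptions made for illustration, and the sketch does not implement Eventual Consistency or Fragment Consistency. It only shows how recording who asserted each triple (a very coarse form of provenance) lets a provider accept a repair from a trusted consumer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Store:
    """Hypothetical in-memory dataset: each triple maps to the set of
    participants that asserted it (a coarse stand-in for provenance)."""
    name: str
    triples: Dict[Triple, Set[str]] = field(default_factory=dict)

    def insert(self, triple: Triple, author: str = "") -> None:
        self.triples.setdefault(triple, set()).add(author or self.name)

    def delete(self, triple: Triple) -> None:
        self.triples.pop(triple, None)

    def copy_fragment(self, predicate: str) -> Dict[Triple, Set[str]]:
        """Return a copy of all triples with the given predicate."""
        return {t: set(p) for t, p in self.triples.items() if t[1] == predicate}


@dataclass
class Patch:
    """Hypothetical change proposal sent from a consumer to a provider."""
    author: str
    deletes: List[Triple]
    inserts: List[Triple]


def apply_patch(store: Store, patch: Patch, trusted: Set[str]) -> bool:
    """Apply the patch only if its author is on the provider's allow-list
    (an assumed, deliberately simplistic trust policy)."""
    if patch.author not in trusted:
        return False
    for t in patch.deletes:
        store.delete(t)
    for t in patch.inserts:
        store.insert(t, author=patch.author)
    return True


# The provider publishes a wrong triple.
wrong = ("ex:Paris", "ex:capitalOf", "ex:Germany")
right = ("ex:Paris", "ex:capitalOf", "ex:France")
provider = Store("provider")
provider.insert(wrong)

# The consumer copies the fragment it is interested in and repairs it locally.
consumer = Store("consumer")
consumer.triples.update(provider.copy_fragment("ex:capitalOf"))
consumer.delete(wrong)
consumer.insert(right)

# The consumer proposes the repair back; the provider trusts this consumer.
accepted = apply_patch(provider, Patch("consumer", [wrong], [right]),
                       trusted={"consumer"})

# A patch from an unknown participant is rejected and changes nothing.
rejected = apply_patch(provider, Patch("stranger", [right], []),
                       trusted={"consumer"})
```

In this sketch the provider ends up holding the repaired triple annotated with the consumer that contributed it, while the proposal from the untrusted participant is ignored. A realistic system would need a richer trust policy and a way to reconcile concurrent updates from many autonomous participants, which is where the consistency criteria named above come in.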