Improving Quality and Scalability in Semantic Data Management
The full collection of the tutorial materials can be accessed at http://km.aifb.kit.edu/sites/qs-sdm
This tutorial offers a walk through several techniques in the field of Semantic Data Management. These techniques span coping with semantic data provenance and ontology learning; dealing with dynamics and evolution; improving the quality of semantic and linked data in decentralized settings, at scale, while allowing trust and autonomy and guaranteeing consistency; improving quality through multi-type and large-scale entity resolution; and, finally, enabling the feasible assertion of new knowledge through large-scale parallel reasoning. Datasets developed in the context of the EU IRSES project SemData will be used as running examples throughout the tutorial.
The goal of this tutorial is to present and discuss several complementary techniques, developed in the frame of the SemData project, which help address the challenges of improving the quality and scalability of Semantic Data Management, taking into account the distributed nature, dynamics, and evolution of the datasets. In particular, the objectives are:
- Refined temporal representation: to present developments regarding the temporal features required to cope with dataset dynamics and evolution.
- Ontology learning and knowledge extraction: to demonstrate the methodology for robustly extracting a consensual set of community requirements from a relevant corpus of professional documents; to refine the ontology and evaluate the quality of this refinement.
- Distribution, autonomy, consistency, and trust: to present an approach to implementing Read/Write Linked Open Data that copes with participant autonomy and trust at scale.
- Entity resolution: to discuss how traditional entity resolution workflows have to be revisited in order to cope with the new challenges stemming from the openness and heterogeneity of the Web, as well as data variety, complexity, and scale.
- Large-scale reasoning: to give an overview of the variety of existing platforms and techniques for parallel and scalable data processing, which enable large-scale reasoning based on rule and data partitioning over various logics.
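To give a flavor of the entity resolution objective above, the following is a minimal sketch of a traditional two-stage workflow: blocking (grouping records by a cheap key so only records in the same block are compared) followed by pairwise similarity matching. The toy records, the blocking key, and the similarity threshold are all illustrative assumptions, not taken from the tutorial materials.

```python
from itertools import combinations
from difflib import SequenceMatcher

# Toy records; real workflows would draw entities from heterogeneous Web sources.
records = [
    {"id": 1, "name": "Karlsruhe Institute of Technology"},
    {"id": 2, "name": "Karlsruhe Inst. of Technology"},
    {"id": 3, "name": "University of Crete"},
]

def block_key(record):
    """Blocking: a cheap key (first token, lowercased) that partitions the
    records so that only records sharing a key are compared pairwise."""
    return record["name"].split()[0].lower()

blocks = {}
for r in records:
    blocks.setdefault(block_key(r), []).append(r)

def is_match(a, b, threshold=0.8):
    """Matching: compare a candidate pair with a string similarity measure."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

matches = [
    (a["id"], b["id"])
    for block in blocks.values()
    for a, b in combinations(block, 2)
    if is_match(a, b)
]
print(matches)  # records 1 and 2 share a block and are similar enough to match
```

Blocking reduces the quadratic number of comparisons, which is exactly the step that must be rethought at Web scale, where no single key partitions heterogeneous, multi-type data cleanly.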
The tutorial comprises a short introduction followed by four topical parts, each focusing on a different but mutually complementary technique for Semantic Data Management. The parts are scheduled as follows:
- Introduction (10 min)
- Refining Temporal Representations using OntoElect (45 min)
- Distribution, Autonomy, Consistency, and Trust in a Read/Write LOD (45 min)
- Coffee Break (30 min)
- Multi-Type and Large-Scale Entity Resolution (45 min)
- Large-Scale Reasoning over RDF(S) and OWL-based Semantic Data (45 min)
- Roundtable Discussion (20 min)
The preparation of this tutorial has been funded by the EU Marie Curie IRSES project SemData.