Large-Scale Reasoning over RDF(S)- and OWL-based Semantic Data

Ilias Tachmazidis

Huge amounts of data are published by public and private organizations, and generated by sensor networks and social media. Apart from issues related to sheer size, this data is often dynamic and heterogeneous. In addition, data has come to be recognized as a resource and is increasingly exploited to generate added value. Reasoning can facilitate the inference of new and useful knowledge from such data. The major challenge in these settings is the feasibility of reasoning over very large volumes of data. Over the past few years, large-scale reasoning has been shown to be achievable through parallelization, distributing the computation among nodes and thus scaling reasoning up to 100 billion RDF triples. Two main approaches have been proposed, namely rule partitioning and data partitioning. In rule partitioning, the computation of each rule is assigned to a node in the cluster; the workload of each rule (and node) therefore depends on the structure and the size of the given rule set, which may prevent balanced work distribution and high scalability. In data partitioning, the data is divided into chunks, with each chunk assigned to a node, allowing a more balanced distribution of the computation among nodes.
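To make the data-partitioning idea concrete, the following is a minimal, hypothetical Python sketch (not taken from any specific system discussed in the tutorial): instance triples are split into chunks, one per simulated node, while the small schema (the rdfs:subClassOf relation) is replicated to every node, and each node independently applies the RDFS subclass rule, (x rdf:type C1), (C1 rdfs:subClassOf C2) => (x rdf:type C2), to its chunk. All names and the toy dataset are illustrative assumptions.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def partition(triples, n):
    """Split instance triples into n roughly equal chunks (one per node)."""
    return [triples[i::n] for i in range(n)]

def apply_subclass_rule(chunk, subclass_of):
    """One node's work: (x type C1), (C1 subClassOf C2) => (x type C2)."""
    inferred = set()
    for s, p, o in chunk:
        if p == TYPE:
            for c2 in subclass_of.get(o, ()):
                inferred.add((s, TYPE, c2))
    return inferred

# Schema is small and replicated to all nodes; its transitive closure is
# assumed to be precomputed, so Student -> Agent is already present.
subclass_of = {"Student": {"Person", "Agent"}, "Person": {"Agent"}}

instance_triples = [
    ("alice", TYPE, "Student"),
    ("bob", TYPE, "Person"),
]

results = set()
for chunk in partition(instance_triples, 2):  # 2 simulated nodes
    results |= apply_subclass_rule(chunk, subclass_of)

print(sorted(results))
```

In a real cluster each chunk would be processed on a separate machine; because the instance data is what grows large, this split keeps per-node workload balanced regardless of the structure of the rule set, which is exactly the advantage over rule partitioning noted above.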

In this section of the tutorial, an overview is presented of existing computing models that enable large-scale data processing (such as MapReduce, GPUs, OpenMP and MPI), and thus large-scale reasoning over huge data volumes. In this way, attendees will become familiar with a variety of existing platforms that allow parallel and scalable data processing. In addition, an overview of existing approaches for large-scale reasoning, based on rule and data partitioning, over various logics such as RDF/S, OWL, Datalog, Description Logics, Defeasible Logic and the Well-Founded Semantics will introduce the audience to semantic manipulation and reasoning, which enable the understanding and exploitation of given datasets and their interconnections. In essence, the discussed methods use rules to encode inference semantics, common-sense knowledge and practical conclusions in order to infer new and useful knowledge from the given data. Finally, potential future trends in this area will be discussed.
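As a rough illustration of how such rule-based inference maps onto a computing model like MapReduce, the sketch below simulates, in plain Python, one map/reduce pass for the RDFS subclass rule: the map phase keys each relevant triple on the join variable (the class), and the reduce phase joins instances with superclasses that share that key. This is a simplified, hypothetical single-rule example, not the full fixpoint computation a real system would perform.

```python
from collections import defaultdict

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def map_phase(triple):
    """Emit (key, value) pairs keyed on the join variable (the class)."""
    s, p, o = triple
    if p == TYPE:
        yield (o, ("instance", s))   # x rdf:type C   -> keyed on C
    elif p == SUBCLASS:
        yield (s, ("super", o))      # C subClassOf D -> keyed on C

def reduce_phase(key, values):
    """Join instances and superclasses grouped under the same class key."""
    instances = [v for tag, v in values if tag == "instance"]
    supers = [v for tag, v in values if tag == "super"]
    for x in instances:
        for d in supers:
            yield (x, TYPE, d)       # inferred: x rdf:type D

triples = [
    ("alice", TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
]

# Shuffle step: group mapped values by key, as a MapReduce framework would.
groups = defaultdict(list)
for t in triples:
    for k, v in map_phase(t):
        groups[k].append(v)

inferred = [t for k, vs in groups.items() for t in reduce_phase(k, vs)]
print(inferred)  # [('alice', 'rdf:type', 'Person')]
```

In an actual MapReduce deployment the framework distributes the map and reduce tasks across the cluster and repeats such passes until no new triples are derived; the point here is only how a rule's join condition becomes a grouping key.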