Knowledge Graphs (KGs) have been recognized across several industries as an efficient approach to data governance and semantic enrichment, and as a data integration technology that brings unstructured and structured data together. Typical use cases rely on automatically generated, unified views of heterogeneous and disconnected data sources. These ‘virtual graphs’ provide richer data sets to feed analytics platforms or to train AI algorithms. On top of such a semantic layer, advanced tools for knowledge discovery and data analytics can be built.
Knowledge Graphs are not ‘just another database’
Nevertheless, appropriate methodologies to build, maintain, and govern the knowledge graph itself, sustainably and at large scale, are not yet obvious to many potential users and stakeholders. Knowledge Graphs are not ‘just another database’; rather, they serve as a vehicle to rethink and rework the existing data governance model, while, at the same time, a governance model for managing the KG itself has to be developed.
In this brief article I will list several questions that help organizations develop the capabilities to initiate and successfully execute their knowledge graph projects. These include organizational aspects, for example how to establish appropriate roles that bridge the mental gap between departments focused on document- and knowledge-driven work and those following data-driven practices. This question also addresses the methodological challenge of aligning subject matter experts and their domain knowledge models with data engineers and their more ontology-based approaches to automating data processes.
How to mitigate the risk of generating a ‘not-invented-here syndrome’
Additionally, various stakeholders have to be well positioned as integral parts of enterprise knowledge graph initiatives to mitigate the risk of a ‘not-invented-here syndrome’. This includes roles like enterprise architects, data scientists and analysts, data warehouse specialists, knowledge managers, and of course all the business lines that will ultimately benefit from knowledge graphs.
Involving business and data stewards as early as possible is essential, since users will become an integral part of the continuous knowledge graph development process, nurturing the graph with change requests and suggestions for improvement.
Laying the foundation of a KG governance model
Eventually, some foundational decisions have to be made:
- Which parts of the graph will be governed strictly centrally, and which will rather be driven by collaborative and decentralized processes?
- How can diverse requirements be fulfilled, also with regard to differing ideas of what a good-quality knowledge graph actually is?
- Which parts of the KG can be generated automatically without harming the overall quality criteria, and which elements have to be curated by humans?
- Which existing data elements, e.g. structured datasets or already established taxonomies, can potentially be incorporated into the evolving KG? (A minimal ingestion sketch follows this list.)
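As a minimal sketch of the last point, assuming Python with rdflib, the snippet below merges an existing SKOS taxonomy into an evolving KG and lifts the rows of a structured dataset into it. The file names, the namespace, and the CSV columns are hypothetical placeholders.

```python
# A minimal sketch: incorporating an existing taxonomy and a structured dataset
# into an evolving KG. File names, namespace, and columns are hypothetical.
import csv

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import SKOS

EX = Namespace("https://example.org/kg/")  # hypothetical enterprise namespace

kg = Graph()
kg.bind("ex", EX)
kg.bind("skos", SKOS)

# An already existing SKOS taxonomy is RDF, so it merges into the graph directly.
kg.parse("corporate-taxonomy.ttl", format="turtle")

# Lift a structured dataset into the graph, linking each row to a taxonomy concept.
with open("products.csv", newline="") as f:
    for row in csv.DictReader(f):
        product = EX[f"product/{row['id']}"]
        kg.add((product, RDF.type, EX.Product))
        kg.add((product, SKOS.prefLabel, Literal(row["name"])))
        # Assumes each row already carries the identifier of a taxonomy concept.
        kg.add((product, EX.category, EX[f"concept/{row['category_id']}"]))

print(f"The evolving KG now holds {len(kg)} triples")
```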
Introducing quality metrics and KPIs at the right point in time
Introducing quality metrics and KPIs at the right point in a KG project is a key success factor. A KG project shouldn’t remain a stand-alone initiative but should be embedded in the company’s overall data governance framework as early as possible.
A closer look at the anatomy of a knowledge graph, its layers, and its structural elements helps us understand the methodology more systematically and lets us define more precisely which parameters will potentially affect the quality metrics of a corresponding KG governance framework.
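To make this concrete, here is a minimal sketch of two candidate quality KPIs, the share of untyped and of unlabeled resources, computed with SPARQL over an rdflib graph. The input file and the choice of metrics are illustrative assumptions, not a fixed standard.

```python
# A minimal sketch of two candidate KG quality KPIs, computed via SPARQL.
# The input file and the metric choices are illustrative assumptions.
from rdflib import Graph
from rdflib.namespace import RDFS

kg = Graph()
kg.parse("enterprise-kg.ttl", format="turtle")  # hypothetical KG export

def count(query: str) -> int:
    """Run a SPARQL COUNT query and return its single integer result."""
    return int(next(iter(kg.query(query, initNs={"rdfs": RDFS})))[0])

total = count(
    "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s ?p ?o . FILTER(isIRI(?s)) }"
)
untyped = count("""
    SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE {
      ?s ?p ?o . FILTER(isIRI(?s))
      FILTER NOT EXISTS { ?s a ?type }
    }
""")
unlabeled = count("""
    SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE {
      ?s ?p ?o . FILTER(isIRI(?s))
      FILTER NOT EXISTS { ?s rdfs:label ?label }
    }
""")

print(f"Untyped resources:   {untyped / total:.1%}")   # candidate KPI 1
print(f"Unlabeled resources: {unlabeled / total:.1%}") # candidate KPI 2
```

Tracked over time, such ratios can feed the KPI dashboard of the KG governance framework.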
Knowledge Graphs as an extension of the data governance already in place
Ultimately, Knowledge Graphs, as an agile approach to data management based on the linked data life cycle, imply the need for an extension of the existing data governance framework. Any graph project triggers changes on various layers of an organization and its information and data architecture:
- New roles, their interplay and their responsibilities have to be defined.
- Content and data authoring / curation processes will be extended and partially automated.
- The diversification of access points to data and knowledge has a direct impact on the existing data governance model.
- New ways to gain insights into enterprise data will be developed, e.g. the automated generation of links between data points that were not connected initially (see the sketch after this list).
- These new insights, in turn, trigger new questions related to GDPR compliance.
- Algorithms that automatically generate personalized views of content and data enhance customer experience.
- New ways to filter and contextualize data objects will be available ‘as a service’.
- New and diversified perspectives on data quality make the need to establish a Data Governance Board even more obvious.
- New ways to make use of data standards and harmonized metadata boost the value of existing data sources.
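As a minimal sketch of the automated link generation mentioned above, the snippet below proposes candidate owl:sameAs links between distinct resources that share the same normalized label and leaves them for a data steward to curate. The file names and the label-matching rule are simplifying assumptions; production-grade matching needs richer heuristics.

```python
# A minimal sketch of automated link generation: propose candidate links between
# resources sharing a normalized label. The matching rule is a deliberate
# simplification; file names are hypothetical.
from collections import defaultdict

from rdflib import Graph
from rdflib.namespace import OWL, RDFS, SKOS

kg = Graph()
kg.parse("enterprise-kg.ttl", format="turtle")  # hypothetical KG export

# Index resources by normalized label, regardless of their original source.
by_label = defaultdict(set)
for prop in (RDFS.label, SKOS.prefLabel):
    for s, o in kg.subject_objects(prop):
        by_label[str(o).strip().lower()].add(s)

# Emit a candidate link for every pair of distinct resources sharing a label.
candidates = Graph()
for resources in by_label.values():
    group = sorted(resources)
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            candidates.add((a, OWL.sameAs, b))  # to be reviewed by a data steward

candidates.serialize("link-candidates.ttl", format="turtle")
print(f"{len(candidates)} candidate links proposed for curation")
```

Keeping such machine-generated links in a separate graph until a steward approves them is one way to honor the human-curation decisions discussed earlier.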
Conclusion: KGs are the basis for the next wave of AI systems
At its core, a sustainable methodology to create and maintain enterprise knowledge graphs will be implemented, and this will change the use of AI technologies substantially: while machine learning components continuously learn from raw data, advanced AI systems will combine this with existing knowledge from the KG, producing new knowledge, answers, and explanations at the same time.