Being able to precisely meet the individual information needs of their customers, including with micro-content, will have to become more of a core competency for specialist publishers. In this interview, Andreas Blumauer explains the important role that knowledge models play in this, and why LLMs cannot do it alone. [Translation of an article originally published on the Heinold and Friends blog]
The opinion that “AI can do anything” is widespread, and not only in the publishing industry. Where are the limits for specialist publishers in particular?
Specialist publishers are currently confronted with several, sometimes contradictory trends. On the one hand, LLMs make it cheap to produce large numbers of content modules; on the other hand, this makes high-quality content all the more important as a strategic competitive advantage. In addition, there are new technical possibilities for linking, filtering, and displaying content from different sources and data silos in such a way that new, personalized information products emerge. The core competence of specialist publishers will focus even more on precisely meeting the individual information needs of their customers, including with micro-content. The age of hyper-personalization has dawned. LLMs alone cannot deliver this, but they are an important building block.
How do AI, taxonomies, and semantically derived knowledge models interact?
(1) Knowledge models help with the pre-filtering and personalization of relevant information; (2) language models (AI) can then convert this into content modules that can be read by humans and by software agents (bots). If language models were also responsible for (1), there would be scalability problems and significantly higher costs. (3) In addition, knowledge models can be used to enrich prompts and results with domain knowledge in a targeted and controlled manner, which helps, among other things, to prevent hallucinations.
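To make step (3) more concrete, here is a minimal sketch of prompt enrichment with a knowledge model. The toy taxonomy entries and the call_llm() stub are illustrative assumptions, not the API of any specific product:

```python
# Toy knowledge model: preferred labels mapped to definitions and related
# concepts. In practice this would come from a curated taxonomy (e.g. SKOS).
KNOWLEDGE_MODEL = {
    "eu taxonomy": {
        "definition": "EU classification system for sustainable economic activities.",
        "related": ["sustainable investing", "ESG reporting"],
    },
    "esg": {
        "definition": "Environmental, social, and governance criteria.",
        "related": ["eu taxonomy", "sustainability"],
    },
}

def enrich_prompt(question: str) -> str:
    """Attach definitions of recognized concepts as controlled context."""
    lowered = question.lower()
    context_lines = []
    for label, entry in KNOWLEDGE_MODEL.items():
        if label in lowered:
            related = ", ".join(entry["related"])
            context_lines.append(f"- {label}: {entry['definition']} (related: {related})")
    context = "\n".join(context_lines) or "- (no matching concepts)"
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical LLM client, stubbed here; a real system would call an API.
    return f"[LLM response to a {len(prompt)}-character prompt]"

if __name__ == "__main__":
    print(call_llm(enrich_prompt("What does the EU taxonomy require from investors?")))
```

The point of the design is that the controlled context comes from the knowledge model, not from the model's training data, so the answer stays traceable to authorized sources.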
How can specialist publishers in particular use this combination?
LLMs offer the technical means to increase the level of automation along the entire content lifecycle, but from a business perspective they can only be used sensibly if the associated risks remain manageable. Knowledge models help, for example, to supplement incomplete information or to filter it out. They also offer opportunities to support advisory or question-and-answer dialogues by enriching them with additional domain knowledge and context.
In order to make even better use of AI models such as an LLM, the concept of "retrieval-augmented generation" (RAG) was developed. In short: what does the concept do, and how much effort does it take for a publisher to use RAG?
LLMs are primarily trained on content that has not been authorized by the specialist publishers themselves. With RAG, every call to the language model is specifically supplemented with additional relevant context information. This can be knowledge from the publisher's own databases, or the latest news, research results, or statistics from third parties. RAG-based systems therefore generate answers and search results that are always grounded in the publisher's own, or at least controllable, content, while the LLM is "only" used for services such as creating summaries, and never as a knowledge base in itself. RAG is thus able to retrieve current and authorized information in order to serve users with better-quality answers and fewer hallucinations.
RAG architectures are essential for specialist publishers, as they are the only way to ensure adequate quality management. In addition, the overall costs of a knowledge portal can be kept relatively low through the smart use of knowledge models, as sketched below. See also: https://www.poolparty.biz/semantic-retrieval-augmented-generation
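As a rough illustration of the retrieve-then-generate pattern described above, here is a minimal sketch. The CORPUS, the naive keyword retriever, and the stubbed generate() function are all assumptions for demonstration; a production system would use vector search and a real LLM call:

```python
# Illustrative corpus standing in for a publisher's own databases.
CORPUS = [
    "RAG grounds language models in a publisher's own databases.",
    "Knowledge models keep the overall costs of a knowledge portal low.",
    "LLMs are used for services such as summarization, not as a knowledge base.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query (toy retriever)."""
    words = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(query: str, passages: list[str]) -> str:
    """Stand-in for the LLM call: the prompt pins the answer to retrieved text."""
    context = "\n".join(f"- {p}" for p in passages)
    prompt = f"Answer from the context only.\nContext:\n{context}\nQuestion: {query}"
    return f"[stubbed LLM answer based on a {len(prompt)}-character grounded prompt]"

query = "What role does the LLM play in RAG?"
print(generate(query, retrieve(query)))
```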
You operate the portal https://knowledge-hub.eco. In short, what is the goal of the portal, and how do AI and knowledge models interact there?
On the one hand, we want to use this portal to make knowledge about climate change and ESG (EU taxonomy, sustainable investing, etc.) more easily accessible; on the other hand, the system also serves to point to our B2B offerings in this area. In addition, this new portal enables us to test various innovative RAG architectures and to better measure and understand the added value they provide to users. Essentially, at knowledge-hub.eco we rely on our ESG knowledge model and use it to process thousands of websites and ESG reports with a Semantic RAG architecture.
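One way such a "Semantic RAG" step can be wired, shown in the hedged sketch below, is to pre-tag documents with taxonomy concepts and use a concept filter to narrow the candidate set before similarity search. The document list and field names are illustrative assumptions, not necessarily the PoolParty design:

```python
# Documents pre-tagged with concepts from an ESG knowledge model (toy data).
DOCS = [
    {"text": "Report on sustainable investing under the EU taxonomy.", "concepts": {"eu-taxonomy", "investing"}},
    {"text": "Climate change adaptation strategies for cities.", "concepts": {"climate-change"}},
    {"text": "ESG disclosure requirements for listed companies.", "concepts": {"esg", "reporting"}},
]

def semantic_prefilter(query_concepts: set[str]) -> list[dict]:
    """Keep only documents tagged with at least one query concept."""
    return [d for d in DOCS if d["concepts"] & query_concepts]

# A real system would now run vector search over the filtered subset and pass
# the top passages to the LLM; here we only show the semantic narrowing step.
print(semantic_prefilter({"esg", "eu-taxonomy"}))
```

Because the expensive similarity search and LLM calls then run only over the concept-filtered subset, this is also one way the knowledge model helps keep overall portal costs low.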
Your presentation at the CrossMediaForum is entitled “GPTs have revolutionized AI. Do we still need knowledge models and taxonomies, and why?” What will be the most important message?
GPTs and LLMs have brought AI into the mainstream. No one can ignore it anymore, and everyone now wants a share of this rapidly developing infrastructure. But we all see only the honey and overlook the hidden costs and risks. These, as well as all the possibilities that LLMs offer, need to be systematically investigated and better understood. Companies need to be more open than ever to experimentation and start integrating the full keyboard of AI, which comprises more than just the LLM octave, into their business models.
Andreas Blumauer is CEO, Semantic Web Company GmbH and speaker at the 26th CrossMediaForum on July 4, 2024 in Munich.