Many companies have their own classification systems to label and structure content. The most advanced ones even have their own Knowledge Graphs, a representation of the company’s knowledge that can be understood by both humans and machines. But how does one use this knowledge to process unstructured data such as text documents automatically? How can we recognize that a document mentions one or another resource from the Knowledge Graph?
Consider the following sentence: "BMW has designed a car that is going to drive Jaguar X1 out of the car market."
Simple examples like this demonstrate that string matching with linguistic extensions is not enough to tell whether a word refers to a resource from the Knowledge Graph: here "Jaguar" could denote the car manufacturer or the animal, and "drive ... out of the market" has nothing to do with driving a car (a naive matching sketch follows the list below). We need to disambiguate words, that is, to discover which concepts stand behind them. To solve this task we use a two-step approach:
- Word Sense Induction (WSI)
- Word Sense Disambiguation (WSD)
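To see why plain label matching falls short, here is a minimal sketch of naive string matching against Knowledge Graph labels; the labels, URIs and sentences are hypothetical illustrations, not an actual PoolParty setup:

```python
# Naive matching of tokens against Knowledge Graph labels (hypothetical data).
kg_labels = {
    "jaguar": "http://example.org/kg/JaguarCars",  # the car manufacturer
    "bmw": "http://example.org/kg/BMW",
}

sentences = [
    "BMW has designed a car that is going to drive Jaguar X1 out of the car market.",
    "The jaguar is the largest cat native to the Americas.",
]

for sentence in sentences:
    for token in sentence.lower().split():
        token = token.strip(".,")
        if token in kg_labels:
            # Both mentions of "jaguar" are linked to the car manufacturer,
            # even though the second sentence talks about the animal.
            print(f"{token!r} -> {kg_labels[token]}")
```

The matcher links both mentions to the car manufacturer; nothing in the surface string tells it that the second sentence is about an animal, which is exactly the gap that WSI and WSD address.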
The goal of the WSI step is to induce the senses of the target word for the given corpus. The outcome is a set of senses of the target word, called a sense inventory. These senses are then used in the WSD step. The most advanced WSD methods can mix the induced senses with senses taken from the Knowledge Graph; in other words, they incorporate external senses into the sense inventory.
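To illustrate how the two steps fit together, here is a minimal sketch assuming scikit-learn is available. Sense induction is approximated by k-means clustering of TF-IDF context vectors, which is only one possible instantiation of WSI, and the corpus snippets, Knowledge Graph sense descriptions and cluster count are illustrative assumptions rather than PoolParty's actual method:

```python
# Minimal WSI + WSD sketch for the target word "jaguar" (toy data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Contexts in which the target word occurs in the corpus.
contexts = [
    "the new jaguar model outsold every other luxury sedan this year",
    "jaguar unveiled an electric car at the motor show",
    "a jaguar was spotted in the rainforest near the river",
    "the jaguar hunts capybara and deer in the wild",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(contexts)

# WSI: induce senses by clustering the contexts of the target word.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
sense_inventory = {
    f"induced_sense_{i}": center for i, center in enumerate(kmeans.cluster_centers_)
}

# Mix in external senses from the Knowledge Graph, represented here by
# hypothetical textual descriptions vectorised the same way.
kg_senses = {
    "kg:JaguarCars": "british manufacturer of luxury cars and sedans",
    "kg:JaguarAnimal": "large wild cat living in the rainforest",
}
for uri, description in kg_senses.items():
    sense_inventory[uri] = vectorizer.transform([description]).toarray()[0]

# WSD: assign a new mention to the closest sense in the mixed inventory.
def disambiguate(context: str) -> str:
    vec = vectorizer.transform([context]).toarray()[0]

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom else 0.0

    return max(sense_inventory, key=lambda sense: cosine(sense_inventory[sense], vec))

print(disambiguate("a wild jaguar was spotted near the river at night"))
```

One reason to keep induced and external senses in the same inventory is that a mention can then either be linked to a Knowledge Graph resource or fall into a corpus-specific induced sense that the graph does not yet cover.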
For a deeper explanation of how machines can understand language, read the series "Label unstructured data using Enterprise Knowledge Graphs" by PoolParty's Director of Research on Medium.