Kensho Industry Use Case: Natural Language Processing for Document Management Systems

Kensho Communications
3 min read · Jun 28, 2023

Background

In today’s information-driven world, organizations face the daunting challenge of efficiently managing and extracting value from vast amounts of textual data. This is where Natural Language Processing (NLP) comes to the rescue. NLP, a branch of artificial intelligence, empowers document management systems to understand, interpret, and extract meaningful insights from unstructured text. By combining the power of linguistics, machine learning, and data processing techniques, NLP revolutionizes how organizations handle documents, enabling them to unlock the untapped potential of their textual data. We will explore how NLP empowers document management systems to streamline processes, enhance search capabilities, and extract valuable information from unstructured documents, ultimately driving productivity and informed decision-making.

Challenges

Two key challenges when working with thousands or millions of documents are quickly finding the few documents you need and getting an overview of what the corpus covers.

Finding specific documents

Keyword search is the classic solution, and it will get you most of the way there, but it will always be approximate. Keyword searches bring in documents that aren’t relevant to your interest, and they also miss relevant documents.

For example, if I’m searching for Apple, the technology company, I don’t want documents mentioning food shortages of apples. If I search for documents about viral diseases, I don’t want documents about computer viruses to appear.

If I’m researching the history of Meta Platforms, I want the search to include mentions of Facebook, since that was the company’s name until 2021. The same is true for S&P Global, which was called McGraw Hill Financial until 2016. If I’m interested in large language models, I might want my search to include frameworks like LangChain and models like BloombergGPT without necessarily knowing about them in advance.

Summary information

Humans are great at quickly understanding a single document, but manually reading thousands of documents about a specific topic is too time-consuming. A corpus of documents holds insights that emerge only over time, provided you can extract them.

You might want to analyze which companies a firm perceived as its competitors and how that changed over time. Or, you might want to track intersections of companies and topics over time, like regulation and regional banks.

If you have a new dump of data, how can you quickly tell how this new information affects your understanding of the world? Or, if you change your internal taxonomy, how can you regenerate your insights on the existing data?

Solution

Kensho NERD and Classify give you the building blocks to unlock these insights, efficiently structuring the entities and topics throughout your corpus.

Simple keyword searches can’t handle the nuance you need. Complex boolean searches trade simplicity for better results, but they must be manually updated to keep up with the world. Kensho NERD and Classify go beyond keyword search: NERD identifies entities like companies, automatically handling name changes, aliases, subsidiaries, etc. Classify is the other side of the coin, handling broad thematic topics like viral diseases and large language models.
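To make the difference concrete, here is a minimal sketch of keyword search versus search by entity. It is not the NERD API: the `ALIASES` table, the entity IDs, and the sample documents are all hypothetical, and the simple alias lookup is only a stand-in for real entity linking, which also uses context to tell, say, Apple the company from apples the fruit.

```python
# Hypothetical alias table mapping surface names to a canonical entity ID.
# A real entity linker would resolve these from a knowledge base and context.
ALIASES = {
    "meta platforms": "meta",
    "facebook": "meta",
    "s&p global": "spgi",
    "mcgraw hill financial": "spgi",
}

# Hypothetical sample corpus: document ID -> text.
DOCS = {
    "d1": "Facebook announced a new data center.",
    "d2": "Meta Platforms reported quarterly earnings.",
    "d3": "McGraw Hill Financial completed a rebrand.",
}


def keyword_search(docs, keyword):
    """Return IDs of documents whose text contains the literal keyword."""
    kw = keyword.lower()
    return sorted(doc_id for doc_id, text in docs.items() if kw in text.lower())


def entity_search(docs, name):
    """Return IDs of documents mentioning any known alias of the entity."""
    target = ALIASES[name.lower()]
    names = [alias for alias, entity_id in ALIASES.items() if entity_id == target]
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(alias in text.lower() for alias in names)
    )


print(keyword_search(DOCS, "Meta Platforms"))  # only the literal match
print(entity_search(DOCS, "Meta Platforms"))   # also finds the Facebook document
```

In this toy corpus the keyword search misses the document that only says "Facebook", while the entity search returns both, which is the behavior described above for name changes and aliases.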

You can then incorporate the structured entities and concepts identified by NERD and Classify into your system. For example, instead of just keyword searches, you can now search by entity or concept. You can also measure entity-entity, concept-concept, and entity-concept correlations in your text. This unlocks many use cases, like leveraging entity-entity co-mentions to discover competitors or leveraging entity-concept correlations to determine sentiment about specific companies.
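As a rough illustration of the co-mention idea, the sketch below counts entity-entity and entity-concept co-occurrences at the document level. The annotation shape (a set of entity IDs and a set of concept labels per document) and the sample values are assumptions made for the example, not the actual NERD or Classify output format.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations: each document reduced to the entities and
# concepts found in it. Real output would carry more detail (spans, scores).
ANNOTATED_DOCS = [
    {"entities": {"AAPL", "MSFT"}, "concepts": {"large language models"}},
    {"entities": {"AAPL", "GOOG"}, "concepts": {"regulation"}},
    {"entities": {"AAPL", "MSFT"}, "concepts": {"regulation"}},
]


def entity_co_mentions(docs):
    """Count how many documents mention each unordered pair of entities."""
    counts = Counter()
    for doc in docs:
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        for pair in combinations(sorted(doc["entities"]), 2):
            counts[pair] += 1
    return counts


def entity_concept_counts(docs):
    """Count how many documents mention each (entity, concept) combination."""
    counts = Counter()
    for doc in docs:
        for entity in doc["entities"]:
            for concept in doc["concepts"]:
                counts[(entity, concept)] += 1
    return counts


pairs = entity_co_mentions(ANNOTATED_DOCS)
topics = entity_concept_counts(ANNOTATED_DOCS)
print(pairs.most_common(1))
print(topics[("AAPL", "regulation")])
```

Raw counts like these are only a starting point. Entities that are frequently co-mentioned are candidates for competitors, but a production system would likely normalize for how often each entity appears overall and bucket counts by date to see change over time.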

NERD and Classify use state-of-the-art machine learning to handle context and nuance better. They help automate structuring your text: NERD links it to structured data from S&P, while Classify lets you customize the model to your needs.

Kensho leverages S&P Global’s vast knowledge bases throughout our products, giving them industry-leading performance in financial and business domains.

Sign up for a free trial to start using NERD and Classify today!
