Automatically extracting financial insights from news stories is an intricate process that involves scouring the internet for current news, identifying the key pieces of information in the articles, and making connections across the extracted information to drive financial decisions. Our system gathers relevant news stories through sophisticated web crawling, combined with a categorization algorithm. The news is analyzed along several dimensions to extract key insights in real-time. At the core of our engine is a scalable language technology stack to identify relevant entities, relations between them and events they are involved in. These entities, relations and events form the basis of our financial insights engine, which incorporates user context, current events and a background knowledge graph to generate timely reports, trends, and dashboards of the most crucial pieces of knowledge driving investment decisions.
To be effective, we must stay on top of all relevant breaking news. Our system continuously crawls key news sources for developing stories that are likely to impact financial decisions. Each story is gathered, its relevant text segments are extracted, and the text is initially tagged by a lightweight text categorization model, which enables our downstream processing to handle these texts appropriately. Our text categorization is driven by hand-engineered rules authored by a domain expert.
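The rule-based categorization step could be sketched as follows. This is a minimal illustration, not our production system: the category names and keyword patterns here are hypothetical stand-ins for the kind of rules a domain expert would author.

```python
import re

# Hypothetical hand-engineered rules: each category is keyed by keyword
# patterns of the kind a domain expert might write. Not the real rule set.
CATEGORY_RULES = {
    "earnings": [r"\bearnings\b", r"\bEPS\b", r"\bquarterly results\b"],
    "commodities": [r"\boil\b", r"\bgold\b", r"\bcrude\b"],
    "macro": [r"\bGDP\b", r"\binterest rates?\b", r"\binflation\b"],
}

def categorize(text: str) -> list:
    """Return every category whose rules match the text."""
    matched = []
    for category, patterns in CATEGORY_RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            matched.append(category)
    return matched
```

A story may match several categories at once; the tags simply route the text to the appropriate downstream components.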
Most financial decisions are driven by key events that affect market trends, so timely, automatic discovery of such events is crucial. For instance, a story about “a sudden drop in oil prices” can be important to some investors. Our NLP system uses scalable machine learning technologies to detect entities (such as China, oil), relations (such as CEO, supplier) and events (such as price drop, rate increase) within the text of news articles. Our language technology components perform a basic linguistic analysis of the text – tokenization, syntactic parsing, named entity detection, coreference resolution – and use this analysis as the building blocks of our machine learning models for entities, relations and events. We train these models with various regression, classification and structured prediction approaches over a rich, hand-engineered feature set. All of the information extracted by these models is stored in our knowledge base and used by the downstream informatics engines.
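To make the feature-engineering step concrete, the sketch below shows how hand-engineered features over a linguistically analyzed sentence might be assembled before being fed to a classifier. The `AnalyzedSentence` container, the lexicon, and the feature names are all illustrative assumptions; the token and entity annotations are assumed to come from the linguistic pipeline described above.

```python
from dataclasses import dataclass

@dataclass
class AnalyzedSentence:
    # Assumed outputs of the upstream linguistic analysis (tokenizer, NER).
    tokens: list
    entities: list  # (surface form, entity type) pairs from the NER step

# A hypothetical trigger lexicon for price/rate movement events.
DIRECTION_WORDS = {"drop", "fall", "decline", "increase", "rise", "surge"}

def extract_features(sent: AnalyzedSentence) -> dict:
    """Turn linguistic analysis into a sparse feature dict for a model."""
    feats = {}
    for tok in sent.tokens:
        if tok.lower() in DIRECTION_WORDS:
            feats[f"direction={tok.lower()}"] = 1
    for _surface, etype in sent.entities:
        feats[f"entity_type={etype}"] = 1
    feats["num_entities"] = len(sent.entities)
    return feats
```

For the example sentence “a sudden drop in oil prices”, such an extractor would fire a `direction=drop` trigger feature alongside the entity-type features, giving the event classifier the signals it needs.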
The information extracted by the language technology stack is used by various modules in the system. One such module derives an effective quantification from the knowledge base: the notion of change in sentiment. Each event in the knowledge base typically represents a positive or negative (direct) effect on certain entities. For instance, “increase in Chinese GDP” is a positive event for China, while “Starsoft reported lower than consensus 4Q earnings” is a negative event for Starsoft. Our sentiment module tracks such changes in sentiment through rule-based sentiment attribution and tracking.
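Rule-based sentiment attribution of this kind can be sketched as a polarity lookup over extracted events. The event-type names and their polarities below are illustrative assumptions, chosen to mirror the GDP and earnings examples above.

```python
# Hypothetical polarity rules: each event type carries a direct positive or
# negative effect on the entity it is attributed to.
EVENT_POLARITY = {
    "gdp_increase": +1,    # e.g. "increase in Chinese GDP" -> positive for China
    "earnings_miss": -1,   # e.g. lower-than-consensus earnings -> negative
    "price_drop": -1,
}

def attribute_sentiment(events):
    """events: list of (event_type, affected_entity) pairs from the
    knowledge base. Returns the cumulative sentiment change per entity."""
    sentiment = {}
    for event_type, entity in events:
        delta = EVENT_POLARITY.get(event_type, 0)  # unknown events are neutral
        sentiment[entity] = sentiment.get(entity, 0) + delta
    return sentiment
```

Accumulating these deltas over time yields the per-entity sentiment trajectories that the tracking module surfaces in reports and dashboards.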