How SingleStone helped a Fortune 500 financial institution enable real-time streaming data for machine learning.
Capital One Financial Corporation is a leading information-based technology company focused on helping its customers succeed by bringing ingenuity, simplicity, and humanity to banking. Capital One’s forward-thinking business strategy and innovation-first culture
Capital One stored a vast amount of data in relational database systems. While these systems can be cost-effective, they tend to create siloed environments that fail to offer real-time access to data. With the advent of machine learning algorithms, Capital One saw potential in leveraging this siloed data. Making the data accessible in real time would enable Capital One to apply machine learning algorithms and unlock previously hidden business insights.
Solving this business problem required three teams to collaborate across three distinct steps: identifying the problem, devising a strategy, and executing the vision. The producer group maintained a database of executive-level information that backed a production application. More than 20 tables in this database were turned into streams, which moved the data from the source system into Database Migration Service (DMS). The data engineering group moved the streamed data through a serverless architecture and across AWS accounts to another group. There, a second data engineering team used a similar serverless setup to copy the data into multiple streaming platforms, which enforced data governance and ultimately transferred the data to Capital One’s data lake. This resilient architecture was mirrored in two AWS regions so that the system would keep running even in the event of a total region failure.
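To make the serverless hop in this pipeline more concrete, here is a minimal Python sketch of what a Lambda function sitting between two Kinesis streams might look like. It is an illustration under stated assumptions, not the code Capital One or SingleStone deployed. The function names `decode_dms_records` and `forward_changes`, the downstream stream name, and the grouping by table are all hypothetical. The sketch assumes DMS writes JSON messages with `data` and `metadata` keys to the stream, and that the Kinesis client passed in (for example, a boto3 client created with credentials for the receiving account) exposes `put_records`.

```python
import base64
import json
from collections import defaultdict


def decode_dms_records(event):
    """Decode a Lambda Kinesis event into row changes grouped by table name."""
    changes_by_table = defaultdict(list)
    for record in event.get("Records", []):
        # Lambda delivers each Kinesis payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        metadata = message.get("metadata", {})
        # Assumption: control messages (such as table creation) are skipped
        # and only row-level changes are kept.
        if metadata.get("record-type") != "data":
            continue
        table = metadata.get("table-name", "unknown")
        changes_by_table[table].append(
            {"operation": metadata.get("operation"), "row": message.get("data", {})}
        )
    return dict(changes_by_table)


def forward_changes(changes_by_table, kinesis_client, stream_name):
    """Send decoded changes to a downstream stream, partitioned by table."""
    entries = [
        {"Data": json.dumps(change).encode("utf-8"), "PartitionKey": table}
        for table, changes in changes_by_table.items()
        for change in changes
    ]
    # PutRecords accepts at most 500 records per call, so send in batches.
    # A production version would also inspect the response for failed records.
    for start in range(0, len(entries), 500):
        kinesis_client.put_records(
            StreamName=stream_name, Records=entries[start:start + 500]
        )
    return len(entries)
```

In a deployed function, a handler would call `decode_dms_records(event)` and pass the result to `forward_changes` together with a client for the target account. Partitioning by table name is only one possible choice; it keeps changes to the same table in order within a shard, at the cost of uneven load when a few tables dominate the traffic.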
SingleStone worked extensively with Capital One to implement a serverless, real-time streaming data pipeline built on Amazon Web Services (AWS), including Database Migration Service (DMS), Relational Database Service (RDS), Simple Storage Service (S3), Lambda functions, and Kinesis Data Streams. This pipeline fed Capital One’s Streaming Data Platform, which enforces the company’s data governance policies, while also delivering the data to the Capital One data lake. These data streams are now accessible in real time, so the information they contain can be used to improve machine learning algorithms.