Data Warehouse on Kubernetes


Achieving Real-time Analytics for Insurance



Bhuvana Ramakrishnan: Hello, everyone. Thank you for joining us for today’s webinar. I’m your host, Bhuvana Ramakrishnan, Head of Events at Yellowbrick. Our topic today is about achieving real-time analytics for insurance, and we’ve got some amazing use cases to share. In this session, you will discover how to build a real-time, standards-based analytics platform that leverages existing tools, models, and investments in both public and private clouds. Let’s welcome our presenters, Nick Cox and Yusuf Gulamhussain. Nick Cox is Head of Product at Yellowbrick. Before joining Yellowbrick, Nick ran product management at Pure Storage. Yusuf is an expert in all things data and a subject matter expert in the insurance space, with over 20 years of key business insights and technology experience. As the CTO of Systech Solutions, Yusuf spearheads new technology and innovation and brings new products to market. So without further ado, over to you, Yusuf.

Yusuf Gulamhussain: Thank you. Hello everyone. Good morning. So we have a very interesting topic and let’s begin with, you know, what’s today’s reality. Right there. Okay. So now, you know, in this digital age, especially after COVID it’s imperative for companies to build what we call a modern digital business platform in order to stay relevant and stay ahead. And you’ve seen it quite often, companies going out of business because they’re not digital. We have proof in light of COVID. So what is a modern business platform. It’s a connected ecosystem powered by data and analytics at its core. So that’s where today’s reality is. And that’s what all the CIOs are looking for. So how does this translate to the insurance industry specifically? Let’s take a look. Now in the insurance space, the adoption of digital has been slow.

And I think you guys can agree with me on that. And the pandemic has truly accelerated that process, and what we anticipate is that in the post-pandemic world, there’s going to be a steep rise in transformative, innovative projects, primarily focused on companies being data-ready and data-driven. So all the key aspects of business, you know, be it the customer, as you see on your top left, be it your third-party data providers and integrators, be it your smart devices, or be it your own workforce that’s working on operational functions, they are connected in a data mesh, as you see in the diagram, via a data fabric, and that is going to lead to smart business strategy. And let’s get some of them in the picture, you know, based on the business function itself. So this is the promise of data analytics and digitization specifically for the insurance industry.

So what is the opportunity ahead for insurance specifically? So it’s all about, you know, applying smart technology to remove points of friction in key processes. That’s the key, right? And as you would agree, you guys being in the insurance space, document-centric processes, you know, a whole bunch of PDFs, even unstructured content, really dominate the insurance industry. And that continues to be a major friction point, but modern technology, through digitization, through automation and intelligence, can really help overcome those challenges and achieve some of the new and key business objectives within the insurance space. Number one being, how do we offer a personalized and seamless customer experience? How do we make our whole claims processing more intelligent, more efficient? How can we be proactive in risk assessment and fraud detection? And all of these technologies, such as intelligent document processing, you know, use of OCR, use of RPA, which is robotic process automation, IoT, machine learning, and all of that encompassing data analytics, will play a key role.

Now let’s dig deeper into some of these core business objectives, starting with customer experience. So as I was saying, you know, providing a seamless, lasting customer experience is the number one priority for all insurers, be it in life, be it in auto, health, and so on and so forth. And data and technology are going to play a wider role throughout the customer journey, right? And as you see it on the screen, right, every phase of the customer journey is there, starting from acquisition of the customers, to growing your customer base, and then targeting profitable segments of your customers to evangelize new products, and then ensuring loyalty and advocacy. And throughout that life cycle, as you see on the screen, technology can play a key role in realizing some tangible business benefits. And at the very bottom, what drives all of this is data, all kinds of data within and outside the enterprise, data generated by smart devices, all coming in in real time and making a difference in how decisions are made.

Let’s take a few examples. Now, in customer service, the use of natural language processing can be key. You know, there’s a whole bunch of information that customers share when they call and when they send out emails. The technology to use and apply here would convert all the speech into text, and then from the text decode sentiments, decode topics that are most interesting to the customer, allowing you to really provide a very personalized customer experience. You know exactly what the customer’s pain points are, and you target those pain points and address them with strategies, with programs, right? That’s one use case. The other one on the technology side could be, you know, looking futuristically at the auto insurance industry, the use of vehicle telematics. You know, I know it sounds very far-fetched, but, you know, once the vehicle starts telling you how the driver is driving the car, based on that, you could figure out, you know, ways to incentivize good behavior, even figure out different price points for your insurance policies, depending on what you have learned from behavior, from driving patterns and so on.
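As a rough illustration of the telematics idea described above, the following Python sketch turns per-trip driving summaries into a score and then into a premium adjustment. Everything in it is an assumption made for illustration: the `TripSummary` fields, the penalty weights, the score thresholds, and the discount and surcharge multipliers are hypothetical and do not come from the webinar or from any insurer’s actual pricing model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TripSummary:
    """Hypothetical per-trip summary derived from vehicle telematics."""
    miles: float
    hard_brakes: int        # count of hard-braking events
    speeding_seconds: int   # seconds spent above the speed limit


def driving_score(trips: List[TripSummary]) -> Optional[float]:
    """Return a 0-100 score, or None when there is no mileage to score."""
    total_miles = sum(t.miles for t in trips)
    if total_miles <= 0:
        return None
    # Normalize events to a per-100-miles rate so long and short
    # driving histories are comparable.
    per_100 = 100.0 / total_miles
    brakes = sum(t.hard_brakes for t in trips) * per_100
    speeding = sum(t.speeding_seconds for t in trips) * per_100
    # Illustrative weights only: 5 points per hard brake and
    # 0.1 points per speeding second, per 100 miles.
    penalty = 5.0 * brakes + 0.1 * speeding
    return max(0.0, 100.0 - penalty)


def adjusted_premium(base_premium: float, trips: List[TripSummary]) -> float:
    """Apply a hypothetical discount or surcharge based on the score."""
    score = driving_score(trips)
    if score is None:
        return base_premium      # no telematics data: leave price unchanged
    if score >= 90:
        multiplier = 0.85        # reward consistently safe driving
    elif score >= 70:
        multiplier = 1.0         # neutral band
    else:
        multiplier = 1.15        # surcharge for risky patterns
    return round(base_premium * multiplier, 2)


careful = [TripSummary(50, 1, 0), TripSummary(50, 0, 0)]
risky = [TripSummary(100, 10, 200)]
print(adjusted_premium(1000.0, careful), adjusted_premium(1000.0, risky))
```

In a real deployment the score would more likely come from a trained model fed by streaming device data; the fixed weights here only make the incentive mechanism concrete.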

So, the application of technology is endless within this whole customer experience and customer journey space. Now let’s look at another function that’s pivotal in the insurance industry. Now a big part of providing a lasting customer experience in my opinion, is just making your whole claims processing efficient and intelligent, right? Because that is the key. That is your biggest touchpoint with your customers when they file for a claim. And that whole experience will drive whether they stay with you or whether they leave, right? And it’s the single most important factor that ensures customer loyalty and on the other hand, if done right, it significantly improves operational efficiency, hence cost enhancement. You can be more competitive if you are able to kind of trim down the whole process, make it automated, make it intelligent. You kill two birds with a stone, right?

Again, within the whole claims process, let’s take a look at how technology can help. Now, you guys have thought about machine learning and computer vision, the method which allows us to look at images, look at videos, look at pictures, and figure out what’s going on. A good use case, again in claims processing, would be, you know, if you meet with an accident, pictures of the vehicle are quickly uploaded in real time and analyzed through a computer vision method. And then you could figure out what is the extent of the damage, what should be the potential claim amount, and so on and so forth, all automated. Right now, yeah, it could sound far-fetched, but it’s all happening in the real world. Another, more operational use of technology could be, how do you make the whole claims process automated using RPA?

Now, a lot of these companies are still asking customers to file a claim physically or through paper, or, you know, there’s a whole process of doing it. If you could make the entire experience, you know, really quick, easy, and user-friendly for your customers through the ability to file a claim on your mobile phone, launch all of your documents or the pictures. And before any human touches it, the whole process is done. And only the most complicated ones are the ones where probably the humans can take it on and address it. So basically you could reduce your workload by at least 60, 70% through automation and technology. And again, everything that drives, this is data underneath, right? And last but not the least, let’s talk about risk and fraud. Now, this is interesting, you know, underwriting and actuarial are two key functions within any insurance company.

And without a doubt, you know, these two functions were the early adopters of data and statistics, going back maybe 60, 70, even a hundred years, long before the digitally native social media companies turned data into goldmines, right? So in essence, when we say insurance is lagging in digitalization, on this front they have been pioneers, but they really did not turn data into an asset as the other new-age companies did. Right? But the key here is, how do you expand that horizon? How do you now add third-party data sources? You know, in an insurance claim, people look at weather data, they look at location data, they look at endemic data. All kinds of third-party data sources can be extremely valuable when you integrate them with your transactions, your interactions, in real time. That is the most important thing, right? How you make quick decisions on risk and fraud in real time will be a difference maker.

That’s the promise of, you know, data and analytics with a robust platform that allows you to kind of process that kind of scale, volume and latency that is required to take faster and accurate decisions. So just very quickly you saw, you know, in three major business functions, there’s a slew of technology that can play a key role in making your business strategy that much more smarter, that much more intelligent and that much more effective, right? So the opportunities are really endless, right? So all this is great. Now all this promise of, you know, doing it better, doing it faster with technology, how do you really make it real for you? Right. So let’s take a look at that. This is my favorite part. Everything starts with data and analytics. So data analytics best practices, and the whole bunch of it in terms of how you manage data, of how you hold data, how you transform data, how you analyze it, that coupled with modern technology.

And what you see on the left is the top 10 trends in technology, per se. All kinds of techniques, technology, and tools are available to apply to data and really bring this whole promise to life in terms of being digitally savvy, making smarter business decisions. And this is what the CIOs in insurance and otherwise are going to build: a robust enterprise intelligence capability that truly fuels, you know, smarter decisions, I would say, across the value chain. And that’s what the picture truly illustrates. From the core data analytics engine, you know, intelligent and smarter actions are kind of spawned across the entire ecosystem, with customer experience, with your partner experience, with your IoT smart devices, and just your own internal operational functions. And one key piece of that puzzle, and I would say the most important piece of that puzzle, is a scalable data platform at the center of it.

So now a modern digital business platform needs a modern data platform to run. And that’s the promise of Yellowbrick. So Yellowbrick will help fuel innovation and transformation, you know, within this whole modern digital business ecosystem, whether it is at an enterprise level or at a department or divisional level, be it integration with your third-party data providers or your smart devices at the edge. Yellowbrick caters to each of those specific workloads, you know, from a scale, from a size, from a performance standpoint. And the distributed data cloud architecture from Yellowbrick, you know, truly offers a seamless and high-performance experience across your on-prem environment, your cloud environment, and the edge, and all of it is managed through a user-friendly ecosystem. And you’ll all agree, you know, all of these workloads are different, but the beauty of the technology that we are presenting today is the ability to have that in place in its right form, I would say, at the edge or in the cloud or on-prem, based on what those data requirements are, based on what the regulatory requirements or the compliance requirements are, and yet have an extremely similar experience across all three and manage it through a single control panel.

Now, on that note, let me hand it over to Nick to dive deep into the Yellowbrick platform and talk more about the technology. On to you, Nick.

Nick Cox: Thank you, Yusuf. That’s a really great overview of the industry. So thanks to everyone who’s joined. Pleasure having you with us today. I wanted to kind of echo some of the statements that Yusuf just made there and start out with a fairly obvious statement, which is that Yellowbrick is a data warehouse vendor for the distributed cloud. And I’m going to talk a little bit about what our core value proposition is, and look a little bit at our architecture again, a little bit later in this presentation, but as a vendor, we supply a very wide range of enterprises, and many of those are household global names, and they cut across finance and healthcare, telecommunications, manufacturing, and insurance, and from our interactions across all of those verticals, it’s clear that many businesses are growing and they’re adapting to these new technologies.

That’s a pretty common, universal theme. But finance and specifically insurance are perhaps unique amongst the peer group there. You know, insurance is a really old profession. It almost surprised me how many companies can trace their pedigree or their roots back well over a hundred years. By definition, with that amount of history behind a company, there’s a great deal of legacy infrastructure. Insurance as an industry has also seen tremendous mergers and acquisitions over the years. And that only adds to the complexity where you’ve already got very highly involved IT operations and back-office data application pipelines. So we hear these issues coming across loud and clear every time we speak to someone in the insurance space. So there are some really strong themes around bringing together these disparate data repositories and breaking down the silos so that you can derive new value from that data, either through a new approach to analytics or simply by removing the fragmentation that’s been driven by these silos.

So you can get to a true customer 360 view. Likewise, a common goal in digital transformation that runs parallel to this removal of complexity is, can we reduce the cost? You know, as insurance providers, your business is providing insurance, it’s not operating data centers. So a common theme is, is the cloud the right move for some or all of my business? If I move to the cloud, am I exposed to new security issues? Do I have difficulties with data gravity, and so on? Having mentioned this column on the left here, these emerging challenges for the data warehouse in insurance, I think many of you will be familiar with many of the limitations we’ve enumerated here in this central column. I’m just going to pick on two of them. Firstly, over the last few years, you know, there’s been a lot of excitement in this industry around the cloud.

You know, the conventional cloud data warehouses have frankly revolutionized the user experience and the management experience as a whole and even to an extent agility, but one of the drawbacks is that they are very, very expensive to use at scale. So once you get past a certain limit, whether it’s an amount of data or a number of users, things get very, very pricey. And not only that, they also become unpredictable and, you know, migrations become incredibly complicated as a result. So there are very few companies that can afford to take this risk of essentially kind of making a move lock stock to the cloud, burning all the ships, doing an outbound migration to the cloud. That’s not realistic. It’s not feasible in many cases. With insurance, you’re going to have workloads that can’t migrate to the cloud anyway. You’ve got data becoming more distributed, not less distributed especially as you get increasing amounts of data produced outside of the data center.

So you have data that needs to stay on premises because of security or compliance, or because it’s simply too expensive to move or to centralize. And in these kinds of areas, the conventional cloud data warehouses do a fairly poor job of addressing those distributed data challenges. Another issue, if you’re not using the cloud, is that you’re likely using one of these legacy enterprise data warehouses. And so that’s going to include products like Teradata, Netezza, Oracle, and SQL Server. And these are all great solutions. They’ve been around for a very long time. But they’re all expensive to buy and they’re very expensive to scale. And they’re fairly inflexible. In every case, you essentially end up hiring a team of technical specialists whose day job becomes the care and feeding of these systems, you know, making sure they’re constantly viable and optimizing performance. And those reasons, again, are why many customers come to us wanting to re-platform their data warehouses.

So overall, what are customers really asking us for? So we hear there is tremendous demand for more from the overall platform. As customers, you’re looking for uniformity. Within insurance, and finance as well, which is very similar in many regards, we’re dealing with customers who have a large number of data silos. There’s one particular example that I’m thinking of that has over 20,000 different data silos within a single organization. When you’ve got that amount of data in that many different silos, you can’t find your data, you can’t share it. You probably don’t even know whether you have that data. And if you do find it, you don’t know whether you can use it, because perhaps the person who put it there has gone. You’re not sure of the provenance of that data. So you’re looking for an integrated data architecture.

Data warehouses have not been known as easy-to-consume experiences. And one of the things I’ll give kudos to some of the cloud vendors for is, I think we’re now at a point where the command line-driven interfaces to a data warehouse are frankly behind us. You know, we’re all trying to appeal to citizen data scientists, DBAs, developers, people who need to access things through a web interface to be able to fire queries, to be able to manage the infrastructure, to be able to program against us through APIs. That kind of shared, cohesive management experience is incredibly important to our customers. And lastly, as I touched upon, you want to do all of this without having to worry about care and feeding. You shouldn’t be spending your revenues on employing people to manage the infrastructure.

You need to have a data warehouse that can scale workloads without having to be tuned or directly administered, all of this while handling these real-time use cases that are incredibly prevalent within the industry. So Yellowbrick, as a company, thinks that our customers and the industry as a whole deserve better than many of the limits that we just discussed there. So we were founded with this core set of values, which we try to address in terms of how we deliver value in our roadmap and address our customers’ needs. So the first core value, I’d say, is really excellent economics, right? So we pride ourselves on essentially providing the best performance you’re going to get for every dollar spent on data warehousing. Frankly, from speaking to our customers, that’s the single most important factor in terms of long-term success with the data warehouse.

So we provide tremendous economics. The second is really a recognition of the fact that data is becoming more distributed. It’s becoming distributed across private data centers and public clouds, and eventually we’re starting to see workloads appearing at the network edge, as Yusuf previously mentioned. So we’re getting much more data, not less data, and data warehouses are really having to address that challenge. So we know that data is expensive to move, and moving it can add latency to your analytics workloads, and real-time analytics cannot tolerate high latency. So the way to address this is with a truly cloud-native architecture that allows you to deploy your data warehouse anywhere, on any infrastructure, and manage it as local or as a distributed cloud. And that’s something that we pride ourselves on delivering. So Yellowbrick itself is incredibly easy to manage.

We have no manual indexes or tuning, and no caches are needed for peak performance. It just works. And Yellowbrick allows you to manage all of those things that I mentioned, whether it’s instances, databases, or users. You can load data and you can query that data, all from a web browser now. The third value is really that batch analytics is always going to have its place, and it’s certainly not going to go anywhere, but there are many newer use cases that really require real-time analytics. So we need to have a very high-performance system that can support both batch and real-time data loading. And finally, basically, we expect that customers shouldn’t need to hire a consultant to understand the bill, right? You should have predictable pricing, and that pricing should be lower and more predictable month to month than what you’re currently getting.

So by having these predictable, performance-driven experiences on a common architecture that can bring the best performance out of any deployment type, it gives you broad compatibility and enables Yellowbrick users to address all of the core challenges that Yusuf has already outlined. So today Yellowbrick is supporting data warehouses in production that range from a few terabytes up to five petabytes and growing very rapidly. So there’s our ability to scale, widely supported interfaces, and custom tooling that supports bulk loading as well as very responsive, high-performance real-time ingestion. We support all of this whilst delivering excellent concurrency. And that makes us ideal for supporting a very broad range of use cases at the same time on the same instance. So we support use cases from real-time fraud analytics queries at the same time as perhaps a complex data mining exercise from a citizen data scientist, all on that same instance, without risking that real-time, high-performance fraud query set that I mentioned. Combining all of that into one product really is the transformative solution.

So let’s look a little bit at the architecture. Bhuvana, move to the next slide. Because one of the most common questions at this point is how on Earth is Yellowbrick going to do all of that? That sounds too good to be true. And this diagram is very similar to the one that Yusuf showed in terms of a reference architecture. And I’m not going to go into deep detail here. If you would like to learn about the approach that we’ve taken in exhaustive detail. I really would urge you to go to and look at the recent Yellowbrick summit keynote. And you’ll see our CEO, Neil Carson, giving an excellent overview of everything we’ve done in this space and how we deliver this differentiated performance. That said there are a couple of things I do want to highlight on this marketecture, I guess.

So the first is that there’s a data ingestion method for essentially any use case you might have. So here on the left, we’ve got everything from batch to change data capture to streaming at rates of millions of rows per second, all coming into the same underlying data warehouse. So that covers essentially any industry use case that you can think of, from a large batch financial close workload all the way through to real-time fraud detection. The second one, here in the middle, really is that we’ve developed what we’re calling an adaptive cut-through architecture, and that’s something that Neil covers very well in that summit keynote that I mentioned. Really, what we mean by that is trying to bypass the Linux kernel when necessary to allow us to get the best performance out of whatever infrastructure we’re running on. Whether it’s a physical server or a virtualized infrastructure, such as a virtual machine or a container, you are going to get the best performance from Yellowbrick because of the way we’ve approached the performance optimizations across the board.

And thirdly, and finally on this slide, I guess, is that the architecture is well on its way to becoming fully cloud-native, which means that we can run virtually anywhere. So we’re talking about private and public clouds and, increasingly, the edge as well, where you can have all of those Yellowbrick data warehouse instance types managed by a single unified Yellowbrick Manager control plane. What are we focusing on, and why are we different there? So, you know, we want to break that price-performance limit that customers see, and we want to do it for every single use case that’s out there. So for any amount of data, whether it’s a single terabyte through to multiple petabytes, whatever amount of data the use case involves, our customers are going to see real analytics acceleration.

And again, if you look at some of the materials on our website, you’ll see a range of different customer testimonials that talk about analytics acceleration that’s well over a hundred times, and these are all genuine, real use cases there. So I’m including Teradata and Redshift and Snowflake and Netezza. You name the platform, we’re going to have a customer there who has accelerated their analytics considerably, but they’ve also done it by paying a lot less money. And again, it does depend on who you’re comparing us to. So we have some customers who are paying maybe 20% of what they were paying. And this is for thousands of users. I’m not talking here about tens of users who are going to be supported by something like a Databricks platform or Snowflake, where you then need to spend money spinning up multiple virtual data warehouses to support extra users.

I’m talking about thousands of users on the same instance out of the box. So it’s a very high performance, high scale, very affordable system overall. So finally, kind of the meat of this right at the end here is the use cases. Clearly, I’m not a sales guy, so I’m much more happy and much more confident in relying on my customers and their experiences to do the talking. So here on this slide, again, I’m not going to read this. We’ve got four examples of customer success or customer success stories at Yellowbrick within the insurance space. In terms of the top five global insurance companies here, I think at the top left, this is basically one of the biggest insurance companies in the world, and they replaced Oracle with Yellowbrick. And we’re now their platform of record.

So we’re used for everything there, from HR analytics, and the company has got, I think, millions of employees, which, you know, that’s the sort of scale of company we’re talking about here. We’re supporting them doing their monthly claims ratio processing, which used to take days. I think it took two days to run on Oracle, and it now takes two hours on a single Yellowbrick instance. Over in Europe, we have a company that, like many insurance companies, runs end-of-month claims ratio reports, and those involve very, very large batch loading of data. And that’s something which historically took this company over a day, over 24 hours, to run on Netezza. And if they hit any errors or new data had to be added, it could extend their run to two or three days and beyond. And with Yellowbrick that processing now takes less than an hour.

And I’m including in that time both the data loading and the query execution. Within healthcare, back in the US, this is a top 10 Blue Cross Blue Shield company. So they were getting pretty minimal business value out of their Hadoop-based data lake. They had Apache Hive, which was too slow. It was too unreliable to support viable business use cases, but they had, you know, a community of well over a thousand users. So originally their plan was to use Teradata as a high-speed query layer on top of that. But they were testing Teradata and Yellowbrick, and they found that Yellowbrick would complete in seconds queries that Teradata was taking minutes to complete, if they would complete at all. And finally here, in terms of the auto insurance space, one of the largest auto insurers in the US is now doing their quarterly financial close processing about five times faster on Yellowbrick than they were with SQL Server.

Their query processing times for the actuaries involved have gone down from hours, in some cases, to seconds on Yellowbrick. And, you know, I mentioned earlier, there’s a very high rate of mergers and acquisitions in this industry as a whole. And this company has recently been going through some of that, and for the Yellowbrick platform, you know, they very confidently projected that it will be able to support an additional 40,000 actuaries over the next couple of years, given some recent merger and acquisition work that’s been going on there. So lastly, I just want to move on to this last slide here. Systech has been a fabulous partner of Yellowbrick, and we’ve worked together to help a pretty major insurer through everything that we’ve talked about here today. And, you know, in closing, I want to welcome Yusuf back into the conversation to give a little bit of color around how and why Yellowbrick was chosen as a vendor and how that process worked for you.

Yusuf: Yeah. Thanks, Nick. This is an interesting use case that we worked on very recently, and the plot was very similar to what Nick talked about. You know, they were unable to scale. They were on IBM Netezza as a data warehouse platform, and they wanted to move to the cloud, to something more modern, and on the decision, they had the same quandary, right? Where do we go? Do we do cloud? Do we do hybrid? Do we do, you know, on-prem, and all kinds of permutations and combinations? And the approach they took was pretty simple, you know: hey, let’s take a look at all the leading players in the marketplace. Let’s do a proof of concept. You know, let’s do a bake-off, if you will. So we started with that and, you know, this was, again, a joint venture with Yellowbrick. We kind of scoped out, you know, a portion of their environment for a POC, about a hundred terabytes of data, a hundred data pipelines to move over, and, and you look at the data, right.

We looked at Netezza, which was obvious. And then we started off with, you know, let’s take Snowflake, a big cloud vendor, and then pair it up with Yellowbrick and see how it performs. And, lo and behold, you know, we had like 3x, 4x performance. And for us, one key thing is it’s not just looking at performance in a silo, right? Look at the holistic picture. What is the cost? What does it take to get a certain level of performance, right? If you look at just one aspect of it, it will skew your image. So we looked at it holistically and went through that very systematically, you know, from a three-player race to then, okay, why don’t we bring Amazon Redshift into the mix and see how that works, right? And again, the price-performance was just great with Yellowbrick.
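To make the point about looking at performance and cost together more concrete, here is a minimal Python sketch of how a bake-off could be scored on price-performance rather than raw speed. The platform names, runtimes, and costs are placeholder values invented for illustration; they are not results from the proof of concept described here, and the metric (benchmark queries per hour per 1,000 USD of monthly spend) is just one assumed way to combine the two dimensions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PocResult:
    """One platform's result on the same benchmark query set (hypothetical)."""
    platform: str
    total_runtime_s: float    # wall-clock time to finish the query set
    monthly_cost_usd: float   # estimated monthly cost at this scale


def price_performance(result: PocResult, num_queries: int) -> float:
    """Queries per hour, per 1,000 USD of monthly spend (higher is better)."""
    queries_per_hour = num_queries * 3600.0 / result.total_runtime_s
    return queries_per_hour / (result.monthly_cost_usd / 1000.0)


def rank(results: List[PocResult], num_queries: int) -> List[str]:
    """Order platforms from best to worst price-performance."""
    ordered = sorted(
        results,
        key=lambda r: price_performance(r, num_queries),
        reverse=True,
    )
    return [r.platform for r in ordered]


# Placeholder numbers only: platform_c is the fastest but also the most
# expensive, so it does not win once cost is taken into account.
results = [
    PocResult("platform_a", total_runtime_s=3600, monthly_cost_usd=10_000),
    PocResult("platform_b", total_runtime_s=1200, monthly_cost_usd=20_000),
    PocResult("platform_c", total_runtime_s=900, monthly_cost_usd=50_000),
]
print(rank(results, num_queries=100))
```

Ranking on a combined metric like this is what keeps a single dimension, such as the fastest runtime, from skewing the comparison.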

And eventually, after going through a series of these very meticulously planned POC steps, if you will, Yellowbrick truly established value, and they picked Yellowbrick as their platform of choice for their data analytics platform, right? And this was just the beginning of the journey, because they started off with a very specific data platform, I would say a specific domain, if you will. And then that journey would continue, you know, over, I don’t know, months and years, if you will. But the good thing, as Nick alluded to, was that they do have the same problem. They do have data in silos, and it’s not always easy to kind of merge all of that into one big enterprise data warehouse, a monolith. That doesn’t seem to be realistic. With Yellowbrick, they’re able to now modernize their data platform where they are, without truly merging everything and making one big, huge enterprise data warehouse out of it.

And all of the flavors of Yellowbrick provide the same seamless experience through a unified control panel, and that makes life that much easier for doing all the kinds of connected analytics that I talked about. So it’s a great marriage in terms of how you truly realize all kinds of analytics, not just batch or descriptive or historical analysis, but real-time, inline analytics alongside your enterprise analytics and your compliance reporting. All of those flavors can now be met with a single platform underneath. And that’s the beauty. That’s why we are so jazzed about this partnership. You know, being in this data analytics space for over 27 years, this is just the right marriage, in my opinion.

