ACI Worldwide – a leader in real-time payments – delivers mission-critical real-time payments software solutions that enable corporations to process and manage digital payments, power omni-commerce payments, present and process bill payments, and manage fraud and risk.
The company’s fraud analytics platform, built on an old legacy data warehouse that contained more than 30 billion rows and 100 terabytes of data, couldn’t scale. As its business grew, and the number of transactions increased, the legacy warehouse couldn’t keep up.
At Big Data Analytics November 2022, Radu Medesan, Director of Software Engineering at ACI Worldwide, shared how his team moved its fraud analytics solution to a modern cloud data warehouse with Yellowbrick.
I’m Radu Medesan. I’m a Director of Software Engineering at ACI Worldwide, and I’m here to tell you a story; a story about how we ended up using Yellowbrick for our fraud analytics solution. Now, first let me say a few things about ACI Worldwide because probably most of you haven’t heard about us.
We are building software for electronic payments, software that makes the money move. Every time you use your card to purchase something in a store, or when you buy something online on a website, there is a very good chance that the transaction will pass, at some point in its life cycle, through software built by ACI. That’s because we have, as our customers, 19 of the top 20 banks worldwide. We have more than 80,000 merchants that rely on ACI software to process payments, and more than 1,500 banks, merchants, and other financial institutions rely on ACI software to protect their transactions from fraud.
Now, speaking of fraud, one of the products that my engineering team is building is the fraud analytics part. Basically, this is the software that allows our customers to see how many transactions they had, how many of those were successfully approved, which were declined, which were fraud, and which of the fraud strategies that they put in place are most effective in catching fraud. And for that, we rely on a lot of data. We have a data warehouse and we have large, very wide tables: more than 1,000 columns and more than 30 billion rows. Overall, the size of the data is more than 100 terabytes.
So speaking of that, one more thing that I want to tell you about ACI is that our favorite part of the year is the holiday season. Why is that? Because starting with Thanksgiving, through Black Friday and Christmas, everybody shops more, and more transactions are coming in.
We see an increase in volumes and we need to make sure that our systems are ready to take that increase in volume. And for that, starting November through December, our systems go into some sort of a change-free state. Basically, we are avoiding any non-critical changes. It’s all eyes on the glass. The focus is on monitoring, on stability, making sure that we are able to run smoothly through the holiday season.
But before that, we are having holiday preparedness exercises. We are doing performance testing, we are increasing capacity, we are making sure that we are ready to go into the holiday season.
And a few years back we realized that our fraud analytics platform that was built on an old legacy warehouse couldn’t scale. Our business is growing, we are getting more and more transactions, but our legacy warehouse couldn’t keep up with that. So we had to take a decision and we agreed to replace that with more modern and newer technologies.
And at that time, everybody loved Hadoop. So it was, “This is the good thing, we should try it.” So we started a project to move the data into Hadoop. We didn’t have that knowledge in-house, so we had to get external consultants to help. We also purchased premium support. We wanted this project to be really successful, and in some parts it went really well, but in other parts it wasn’t giving us the performance that we needed.
And as much as we tried, we saw queries taking up to 25 minutes and sometimes timing out. So in the end, we agreed Hadoop is a really great thing, but what we were trying to do is not the use case it was intended for. So we had to look for something else. And that something else was Yellowbrick.
We found out about Yellowbrick in May 2020 and we said, let’s try it. We ran some performance tests. It was surprisingly good. Then, in just three months, we got the contract signed and had it delivered into our data centers.
Basically, what’s Yellowbrick? It’s like a Postgres warehouse that comes with its own hardware. So it’s an appliance that we put in our data center, and we started loading data into it. Now, for loading the data, Yellowbrick has some tools, like ybload, which are very performant. That helped us a lot in loading large volumes of data. Once we had loaded the historical part, we said we are going to take a copy, a backup of what we’ve loaded in one of the data centers, and we are going to move it to the other data centers across the ocean. And that was in 2020. And we were thinking, are we going to move the backup over the wire through the internet, or would it be better to just put it on an encrypted hard drive and have somebody fly it over? It’s going to be faster.
And since that was in September, October of 2020, when Covid was in full swing, even if we sent somebody to take that encrypted disk over, they would go into a two-week lockdown. So it was going to take the same time. So we ended up with a dedicated MPLS line, over which we transferred the data and loaded it in the other data centers.
Now, the whole load wasn’t that smooth because we had to look at data cleansing. For example, Yellowbrick is based on Postgres. And when you take text data and put it into a text column, if that data has a zero byte in it, it’s going to give errors. So we had to do some cleanup, and we had to take care of the null values. And one of the most challenging problems was with regard to managing duplicates and managing updates.
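The zero-byte cleanup mentioned above can be illustrated with a minimal sketch. This is not the team’s actual cleansing code; the function names `strip_nul` and `clean_row` and the sample column names are hypothetical, and the sketch assumes each record is available as a Python dict before it is loaded.

```python
def strip_nul(value):
    """Remove NUL (0x00) characters from a string; leave other values alone."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


def clean_row(row):
    """Return a copy of a row (a dict of column -> value) with NULs stripped."""
    return {column: strip_nul(value) for column, value in row.items()}


# Hypothetical record: the text field carries an embedded zero byte.
raw = {"merchant_name": "ACME\x00 Store", "amount": 12.5, "note": None}
cleaned = clean_row(raw)
```

Non-string values, including `None`, pass through unchanged, so how missing values are represented stays a separate decision.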
Let me tell you a little bit about that. Basically, every time you make a purchase, there are transactions, and those transactions come to us through Kafka topics. We use Confluent for that. And we have Spark jobs reading from those Kafka topics, doing the ETL, and storing the result in Yellowbrick. Basically, for every new transaction there is a new row added in Yellowbrick.
Now, sometimes the pods that are running Spark may fail or get restarted for various reasons, for Kerberos authentication or whatever. And when they are restarted, they pick up the batch from where it was left off, and sometimes we even got duplicates. So we had to implement a mechanism that would take care of the duplicates because, in the end, if a customer wants to see the count of transactions, they should see one row for each transaction. We cannot have duplicates.
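As an illustration of the duplicate problem, here is a small sketch of keeping one row per transaction when a batch is replayed after a restart. It is only a sketch under assumptions: the `transaction_id` key and the `deduplicate` helper are hypothetical names, and it works on an in-memory list rather than on warehouse tables.

```python
def deduplicate(rows, key="transaction_id"):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        row_key = row[key]
        if row_key in seen:
            continue  # same transaction delivered again by a replayed batch
        seen.add(row_key)
        unique.append(row)
    return unique


# A replayed batch delivers transaction "t2" twice.
batch = [
    {"transaction_id": "t1", "amount": 10},
    {"transaction_id": "t2", "amount": 25},
    {"transaction_id": "t2", "amount": 25},
    {"transaction_id": "t3", "amount": 7},
]
unique_rows = deduplicate(batch)
```

Counting `unique_rows` gives one row per transaction, which is the property the customer-facing counts depend on.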
And another problem that we had was on the part where we apply updates. Basically, after we’ve added the transaction, we receive information about it. Was it a genuine transaction? Was it a fraud? Was it approved? Was it declined? So we had updates coming in from a different part, and some of them were arriving before the transaction itself came in. And we couldn’t apply an update to something that doesn’t exist.
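One common way to handle updates that arrive before the row they refer to is to park them until the row shows up. The sketch below shows that idea only; it is an assumption for illustration, not the approach ACI implemented, and the `TransactionStore` class and its field names are hypothetical.

```python
class TransactionStore:
    """In-memory stand-in for a table keyed by transaction ID."""

    def __init__(self):
        self.rows = {}     # transaction ID -> row (dict of column -> value)
        self.pending = {}  # transaction ID -> updates that arrived too early

    def insert(self, txn_id, row):
        """Insert a new transaction; return False if it already exists."""
        if txn_id in self.rows:
            return False  # duplicate insert, ignore it
        new_row = dict(row)
        # Apply any updates that were waiting for this transaction, in order.
        for fields in self.pending.pop(txn_id, []):
            new_row.update(fields)
        self.rows[txn_id] = new_row
        return True

    def update(self, txn_id, fields):
        """Apply an update now, or park it; return True if applied now."""
        if txn_id in self.rows:
            self.rows[txn_id].update(fields)
            return True
        self.pending.setdefault(txn_id, []).append(dict(fields))
        return False


store = TransactionStore()
applied = store.update("t1", {"is_fraud": True})  # arrives before the insert
store.insert("t1", {"amount": 10})
```

After the insert, the row for `t1` carries both the original amount and the early fraud flag, and nothing is left in `pending`.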
So, through SQL stored procedures on top of Yellowbrick, we’ve built an ingest management process that takes care of all this stuff and also helps us with performance optimizations. It was really good. We needed help from Yellowbrick; we got Yellowbrick support, which by the way is fantastic. And we were able to get that running for the last two months of 2020. We got the system running in parallel with the old one, and we cut over at the end of the year.
And here’s one other example that I’m going to tell you about using Yellowbrick in real life. In the first month after go-live, we had an issue. The processes that I was mentioning for managing duplicates and inserts were taking longer. Instead of completing in a couple of minutes, we saw them taking half an hour.
We got alerts on the monitoring dashboard. So we started to investigate and we found out that there were some rogue queries coming from the UI. That’s happening because we don’t necessarily do canned reports for our customers. We have some predefined queries, but we give our customers the flexibility of slicing and dicing the data as they need.
And one of our customers was able to generate a query that was consuming a lot of resources from our system. And it was in the afternoon US time, and all our developers were already asleep at that time. So rather than waking them up, we said, let’s put an emergency fix in there.
Our DBA worked with Yellowbrick, and in a matter of minutes they were able to put in a rule from the systems management console that would reject that query. And suddenly the system went back to normal. Everything looked good after that. And in the morning, our developers came back and were able to put additional guardrails in the UI to make sure that we don’t get hit by those types of queries anymore.
Now, the overall experience with Yellowbrick was really positive. The customers were saying, “Hey, this new platform is so much better than what we had in the past.” Our internal team that builds the fraud strategies for customers, basically they look at the data, and they were saying, “Now, with Yellowbrick, we can run those queries whenever we need. We don’t have to wait until the end of the day and run the reports off-hours so that we don’t impact our customers.”
And the DBAs were saying, “Hey, the systems management console from Yellowbrick is really good. I can see the performance of the queries. I can see if something is going wrong, and we can assign queries to different work lanes. Also, we can see the historical performance of those queries, which makes it really easy to keep the system in good shape.”
And our developers were also happy because Yellowbrick is ANSI SQL compliant. It required minimal changes to the SQL that we had, and it also supports VARCHAR up to 64K, without an impact on the overall performance. So overall, a lot of people were happy with that.
Now, for the future, we are looking to get more data into Yellowbrick. Getting as much data as possible in one place helps us. Now, the problem with that is that sometimes, when we are looking to get data from different places, we may run into data protection and data sovereignty restrictions. And because Yellowbrick is based on an appliance, if we needed to do business in a different part of the world that has those kinds of data protections, we would have an issue, because we would need a data center in that country to have the appliance running there.
But there is a solution for that, and we are looking at Yellowbrick’s new offering, Yellowbrick in the cloud. Basically, we’re starting a proof of concept with Yellowbrick to take what we have now on-premises, running on actual physical hardware, and run it in a virtualized, cloud-native environment in Microsoft Azure.
And that would allow us to have it in whatever Azure regions we want. And probably we will end up in that sort of data mesh that Adam [Mayer] from Qlik was mentioning before, where we don’t have data centralized in one place, but we have to combine it from various places due to data sovereignty restrictions.
So overall, it was a good experience for us with Yellowbrick and I hope that the story that I’ve shared with you was interesting. So thank you very much.