Denson Pokta's Cassandra-Powered Journey to Fueling GenAI at Intuit

Denson Pokta, Principal Engineer at Intuit

Denson Pokta is a Principal Engineer at Intuit Persistence Service (IPS). IPS provides persistence as a service for many products at Intuit. The IPS portfolio includes NoSQL, search, vector, relational, and time-series stores.

Transcript

Intuit is in the space of small business and personal finance. Our main products, which are pretty well known, are TurboTax, QuickBooks, Credit Karma, and Mailchimp.

My group is Intuit Persistence Service. What we do is provide persistence as a service, because maintaining a persistence layer is very difficult, and a lot of people don't even know what to monitor.

Our group provides a paved path within Intuit to make it easier for our service developers to develop their service and, in turn, accomplish the company goal of powering prosperity around the world.

Our platform exposes a REST API interface, so the client doesn't even interact with the database underneath. We take care of all that. Our service developers only focus on writing the business logic for their services. We make it really easy for them to develop their service, and they don't even have to worry about scaling.

We use Cassandra as the backend for our persistence service. But our clients don't care that we use Cassandra in the backend, because all they see is the REST services. We abstract the database away for them. We started with Cassandra a very long time ago, more than ten years ago.
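The abstraction described above can be illustrated with a minimal sketch. It is not Intuit's implementation: the class names (`StorageBackend`, `InMemoryBackend`, `PersistenceService`), the `/documents/<key>` path shape, and the status codes are all hypothetical, and an in-memory dictionary stands in for Cassandra so the example runs on its own. The point is only that callers speak REST-style requests and never see which store sits behind them.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class StorageBackend(ABC):
    """Hypothetical storage interface; the service layer depends only on this."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for the real database (Cassandra, in the talk)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._rows[key] = dict(value)  # copy so callers cannot mutate stored data

    def get(self, key: str) -> Optional[dict]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None


class PersistenceService:
    """Maps REST-style requests onto backend calls; clients never see the backend."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def handle(
        self, method: str, path: str, body: Optional[dict] = None
    ) -> Tuple[int, Optional[dict]]:
        # Assumed path shape: /documents/<key>
        parts = path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "documents":
            return 404, None
        key = parts[1]
        if method == "PUT":
            if body is None:
                return 400, None
            self._backend.put(key, body)
            return 204, None
        if method == "GET":
            doc = self._backend.get(key)
            return (200, doc) if doc is not None else (404, None)
        return 405, None


service = PersistenceService(InMemoryBackend())
service.handle("PUT", "/documents/invoice-1", {"total": 42})
status, doc = service.handle("GET", "/documents/invoice-1")
```

Because `PersistenceService` depends only on the `StorageBackend` interface, swapping the stand-in for a real database client would not change anything the caller sees.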

But as we have grown over the last ten years, we have become a platform within Intuit. We were initially founded within a single business unit, the TurboTax unit. As we have grown the platform with more and more features, we now support multiple business units and products within Intuit.

Because of the scalability of Cassandra, for example, our largest production cluster today has about 117 nodes per region, in two regions in AWS. This particular production cluster can support up to 220,000 TPS, and Cassandra provides us a platform to do that.

Data is everything. Without data, there's no GenAI. If you have what you call dirty data, data that doesn't have the proper schema, there's not much you can do with it. So there's a big effort now in Intuit to publish clean data to the lake.

Then it can be used by GenAI. As a whole, there's a drive and effort now to produce clean data to the lake, where it can be consumed by GenAI. Whatever API or model is used, it will use the data from our platform that we feed in for GenAI use cases.
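To make the idea of "dirty data" without a proper schema concrete, here is a small sketch of a schema check that could sit in front of a publish step. Everything in it is an assumption for illustration: the field names in `EXPECTED_SCHEMA`, the helper names `schema_violations` and `split_clean_dirty`, and the strict type rule are hypothetical and do not describe Intuit's pipeline.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: field name -> required Python type.
EXPECTED_SCHEMA: Dict[str, type] = {
    "customer_id": str,
    "amount_cents": int,
    "currency": str,
}


def schema_violations(
    record: Dict[str, Any], schema: Dict[str, type] = EXPECTED_SCHEMA
) -> List[str]:
    """Return a list of human-readable problems; an empty list means clean."""
    problems: List[str] = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif type(record[field]) is not expected:
            # Strict type check, so True is not silently accepted as an int.
            actual = type(record[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def split_clean_dirty(
    records: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Partition records into those that match the schema and those that do not."""
    clean: List[Dict[str, Any]] = []
    dirty: List[Dict[str, Any]] = []
    for record in records:
        (dirty if schema_violations(record) else clean).append(record)
    return clean, dirty


clean, dirty = split_clean_dirty([
    {"customer_id": "c1", "amount_cents": 1250, "currency": "USD"},
    {"customer_id": "c2", "amount_cents": "1250"},  # wrong type, missing currency
])
```

Only the `clean` list would move on to the lake in this sketch; the `dirty` records would be held back for repair together with the reasons reported by `schema_violations`.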