August 27, 2020

[Webcast] Break the Chains of RDBMS Monoliths by Migrating to Cassandra Microservices

Modern application demands for scale, uptime, and distributed low-latency access are pushing RDBMS technologies past the breaking point. Modern data platforms such as Apache Cassandra® offer hope, but their data models and architectures are vastly different. How do you bridge the gap?

In this recorded webcast (with full transcript below), I discuss how your DB2 or Oracle enterprise can take advantage of Cassandra today, including:

  • How does Cassandra compare with relational in terms of data models and architecture?   
  • How can an enterprise migrate from relational to NoSQL without any downtime? 
  • Can you just offload some of the more expensive RDBMS functions to Cassandra? 

 

Transcript

Introduction 

Brian Mortimore: 00:00            
Hello everyone. Today, we're talking about migrating RDBMS monoliths into a transformational data architecture. The first thing I'd like to ask is why we want to move from an RDBMS monolith to a transformational data architecture. If you're in this webinar, you probably have a good idea of why you want to do this already. But the real reason is that we want to achieve some transformational outcomes. There are things that we want to be able to do with our data that we aren't necessarily able to do today: operate at a large global scale, push our data out to the edge for lower latency, acquire a larger user base, and support digital security, IoT, and 5G. These kinds of things are putting pressure on RDBMS systems whose technology was designed 30 or 40 years ago, and we need a new solution today.

Brian Mortimore: 00:59           
The other reason is we want to be able to overcome our current constraints on data. For example, we need to have data that's relevant and accessible, and we need to be able to scale our data. More and more today, we need to have control over data sovereignty, over where our data physically lives, and we need more agile business operations. So we need to be able to use our data, perceive it in new ways, and use it for new lines of business. And really one of the most important ones is we need to have availability of our data. So we need to have always up, always there, always responsive data systems. Traditionally, that's sometimes difficult with RDBMS systems because we end up with downtime and maintenance windows. Sometimes it's problematic to keep them a hundred percent available. So to be able to do that, at DataStax, we have Cassandra.

How does Cassandra compare with relational in terms of data models and architecture?  

Brian Mortimore: 01:49            
Cassandra is our foundational technology for being able to do this kind of transformation. It does provide that hundred percent uptime. And so if you've used Cassandra, you know we've got some deployments now that are going on 10 years with zero downtime, which is really impressive. The other thing that Cassandra gives us is no vendor lock-in; it's a true cloud database, but it doesn't require a single cloud vendor. So it's really the cloud database that you can take with you, which is very important in today's day and age. The other thing it gives us is this kind of global scale. We can take the largest applications, the largest read and write volumes, the largest amount of total data stored, and we can handle it with Cassandra. So it gives us this great foundational capability that we're looking for in a transformation of our enterprise platform.

Brian Mortimore: 02:37           
The problem is Cassandra itself, as an open source product, is a little difficult to put into the enterprise. If you've done this in the past, you've probably realized that it turns into a little bit of a science project, and operationally it can be challenging to get Cassandra up to that level where you're actually getting those benefits that you need to achieve those business outcomes. So how do we handle that at DataStax? We've come to a point of view, kind of a perspective, on what we can do with Cassandra. Cassandra gives us that foundational level capability that we need for our transformed data architecture, but we need to add a little to it. We need to add to that foundation to make it trusted. We need to know that we can deploy Cassandra reliably. We need to know that we can count on it to be there, and we need to be able to move it, move the data around, be able to access it, and be able to actually achieve that hundred percent uptime that we want from the product operationally.

Brian Mortimore: 03:30            
We also need to be able to accelerate it and adopt it in our enterprises. As a niche product, it's difficult; as an adopted product with standardized procedures and processes, it's much easier. We also need to be able to use it strategically in our enterprise, which means it needs to talk well to other systems. So we need to be able to integrate it and use Cassandra's strengths along with the rest of our enterprise products, the rest of our enterprise operations. And that's what yields the kind of outcomes that we're really going for. So to be able to do that, this is kind of a model that we've come up with here. We have the foundational layer of Cassandra at the very bottom, and on top of that, we build operational reliability. So at DataStax we provide support for open source Cassandra. We also provide DSE, which adds a different layer on top of that, which gives us a kind of trusted platform that we can support and deploy easily.

Brian Mortimore: 04:25       
We also then have this accelerated layer. So Cassandra itself can be a little bit difficult to do on the development side. If you've built a project on it, you realize that. So we've added different products on top of it that make it a little bit easier. We have multi-model data. We also have some analytics, which gives us what I call a Swiss army knife of Spark on top of our Cassandra layer. We also have search, which is very important because it allows us to find data based on predicates that Cassandra doesn't normally allow. The other thing we've added is a graph engine. The graph engine allows us to relate objects, so it allows us to build a network or a data model of interrelated objects and be able to search and query and traverse across that entire model. And we're also offering extensible integration. So now we can actually plug and play and add additional technologies as we go.

Brian Mortimore: 05:16            
We're also now moving up into the strategic layer with our commitment at DataStax to Kubernetes and using the Kubernetes operator. And we're also already a cloud native platform, but we're providing cloud native automation and elasticity so that we can grow our clusters, move our data, and do what we need to do. And all of that is going to bring us the outcomes that we're looking for, our business results, such as being able to do AI at scale. We can now do microservices migrations, which is really what we're talking about today: being able to migrate off of our RDBMS system and have a full stack that can support us. Just trying to do that migration with the foundational technology alone is a little bit too difficult to do. So at DataStax, what we've done, if you look at the column on the right, is we have a commitment to everything from operating systems to the foundational layer of Cassandra. And now we're building on top of that partnerships as well as enterprise support and thought leadership. And we're adding additional tools and services to be able to make this process a whole lot easier for your enterprise.

Brian Mortimore: 06:20          
So what we have is DataStax has a commitment to be a partner to your enterprise. And what that means is we can carry you from that foundational Cassandra layer, which is important and gives us those benefits that we need, all the way up to the outcomes that we're looking for. And we do that, just like I said, by providing a trusted layer, then accelerating your ability to use it, and allowing you to integrate it strategically into your enterprise to drive the outcomes that you desire. So the importance here is that DataStax has looked at this full stack. We have tools and processes that address that full stack. And so we're ready to help and engage you as a partner as you move some RDBMS workloads to Cassandra. So why is it a challenge to migrate to Cassandra from an RDBMS world?

Brian Mortimore: 07:12         
And it's really because we have two different paradigms. The ERD-based design of the relational world has been in play for the better part of 40 years now. Things like DB2, SQL Server, Oracle, MySQL, relational ERD-based designs are probably what you're already comfortable and familiar with. The problem with that is when you build enterprise applications on this technology, you end up developing all of your code and doing all of your design work and doing all of your growth based on that ERD design. And what that becomes is a little bit of a constraint on your data, a constraint on your ability to grow. You have to work within that base architecture of how all of your different data tables and components are constrained and interrelated, and that also presents some scaling issues. So that whole data model becomes one unified design pattern that is coded against by the enterprise. And that is the monolith that we're talking about.

How can an enterprise migrate from relational to NoSQL without any downtime? 

Brian Mortimore: 08:09           
It becomes a single structure that's very difficult to piece apart and be able to migrate. At the same time, Cassandra and NoSQL are based on query-based design patterns. So we're determining first, what questions are we asking of our data? What answers do we want? And then we design our patterns based on that. So these two different design patterns are broadly incompatible, and that's what makes it a challenge to be able to migrate from these RDBMS systems to Cassandra and to microservices.

Brian Mortimore: 08:40        
So how do we do it? How can the enterprise get from an RDBMS-based design pattern to a transformational data architecture? There are several things that we've tried over the years, and that lots of enterprises have tried over the years. One is a simple lift and shift: let's rewrite our application, let's rewrite our program, and then migrate into it. That becomes a boil-the-ocean problem. And the reason is that your enterprise is not static, and that project takes an incredibly long time to execute. So by the time that project is done, your business will have moved on. It's probably not going to be relevant by the time you get to the phase of migration. It becomes a boil-the-ocean problem; it's too big to be able to execute.

Brian Mortimore: 09:18         
The other methodology we've seen tried is a table-by-table migration. That's saying, let's just treat Cassandra, let's just treat this NoSQL world, just like it's any other RDBMS monolith database. Let's take my table in Oracle or my table in DB2 and cast that exact same table into Cassandra. The problem is that data model difference isn't going to work very well. The NoSQL world is designed to perform very differently from the RDBMS world. The RDBMS world, for example, has rules such as: let's not duplicate the data, let's make sure data exists only in one place, whereas in NoSQL, duplicating that data is kind of a normal pattern for us. So that doesn't tend to work very well. You can do it, but the scaling becomes a challenge because the same constraints that exist in the RDBMS model will exist in the NoSQL model, in a platform that is not as well designed to perform with them.
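
To make the contrast concrete, here is a minimal sketch of query-first modeling using a hypothetical banking keyspace (the keyspace, table, and column names are illustrative, not from the webcast). Instead of one normalized table per entity, Cassandra gets one denormalized table per query, and the same value can appear in both:

```java
import com.datastax.oss.driver.api.core.CqlSession;

public class QueryFirstModel {
    public static void main(String[] args) {
        // Assumes a locally reachable Cassandra cluster configured via the driver's application.conf.
        try (CqlSession session = CqlSession.builder().build()) {
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS bank WITH replication = "
              + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

            // Query 1: "show all accounts for a customer" -- partitioned by customer.
            session.execute(
                "CREATE TABLE IF NOT EXISTS bank.accounts_by_customer ("
              + " customer_id uuid, account_id uuid, account_type text, balance decimal,"
              + " PRIMARY KEY ((customer_id), account_id))");

            // Query 2: "show the current balance of one account" -- partitioned by account.
            // The balance is deliberately duplicated from the table above: one table per query.
            session.execute(
                "CREATE TABLE IF NOT EXISTS bank.balance_by_account ("
              + " account_id uuid PRIMARY KEY, balance decimal, updated_at timestamp)");
        }
    }
}
```

Both tables are kept in sync by whatever writes the data (here, the CDC pipeline discussed below) rather than by joins at read time.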

Brian Mortimore: 10:16       
The other thing is that you can't just rewrite the application. It's expensive, too much downtime, too much risk. And again, the big issue here is we have data model differences. If we data model for an RDBMS, we're going to use a completely different set of rules from when we data model for Cassandra. So what we've figured out at DataStax is that there are a few key pieces we need to be able to handle, and being able to translate that data model is kind of the tall pole in the tent, the most difficult challenge of doing this kind of migration. So we've partnered with a company called Striim, spelled S-T-R-I-I-M, and what they've given us is a CDC tool that works in memory. It's very fast, and it allows us to do translations between my RDBMS schema and now my Cassandra schema, my NoSQL schema.

Brian Mortimore: 11:08        
That's incredibly powerful, because what that tool does is allow me to take any kind of a change that's happening, any kind of a write in my legacy RDBMS system, and now that write can be translated and carried out of that system using standardized technologies such as Kafka, and then, using our Kafka connector, delivered to the DataStax platform, to our new NoSQL platform. And based on that, we can now build our modern applications using GraphQL, using REST, using any kind of microservice technology that we really want to use. Very powerful. The problem there is it still doesn't give us the real time replica of our legacy data that we need. So there is some latency here: we're doing our data model transformation, but it's going to take a little time for that data to get where it needs to be.
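
In production the DataStax Kafka connector handles the topic-to-table mapping through configuration; purely as an illustration of the idea, here is a hand-rolled sketch of a consumer that drains a hypothetical CDC topic and writes each change into the balance_by_account table from the earlier sketch (the topic name, payload format, and broker address are all assumptions):

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.math.BigDecimal;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.UUID;

public class CdcTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: local Kafka
        props.put("group.id", "rdbms-cdc-to-cassandra");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             CqlSession session = CqlSession.builder().build()) {

            consumer.subscribe(List.of("rdbms.accounts.cdc"));   // hypothetical CDC topic
            PreparedStatement upsert = session.prepare(
                "INSERT INTO bank.balance_by_account (account_id, balance, updated_at) "
              + "VALUES (?, ?, toTimestamp(now()))");

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // A real pipeline would parse JSON/Avro and map it onto the Cassandra
                    // data model; a simple "accountId,balance" payload is assumed here.
                    String[] fields = record.value().split(",");
                    session.execute(upsert.bind(
                        UUID.fromString(fields[0]), new BigDecimal(fields[1])));
                }
            }
        }
    }
}
```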

Brian Mortimore: 12:01         
The other problem with it is it's kind of a straight path. We just do one straight translation, and that doesn't allow me to do AI. It doesn't allow me to do discoveries. It doesn't allow me to do some of the work that I want to do to find the impact of some of these writes and what I need to do with them business-wise. So for that, we have broadened out our pattern a little bit. What we have is kind of the same pattern here, with the same CDC tool going to Kafka, but now I'm intercepting it with Spark instead. What that does is allow me, when these writes come in from my CDC, from my change data capture, to do a little bit of determination to see what these writes mean. That gives me a place to plug in my AI. That gives me a place to plug in my data science and be able to do some of these things, which allows for a little bit more of a Lambda architecture.

Brian Mortimore: 12:52
So I can find out the relevance of this data as it's coming in. That's also very powerful, but I'm still not quite at that real time replica experience that I'd need to be able to transfer over completely from my RDBMS system. So what we have now is using DataStax and Striim for ETL. And what I have here is a diagram of using our CDC tool, breaking it out, using Kafka, and now I'm intercepting it in two ways, right? I have a Lambda architecture: I'm intercepting some of that data using Spark Streaming from a Kafka topic, and I'm intercepting some of that data directly using a consumer. What that does is allow me to persist that data in a very quick way, using the Kafka connector from DataStax, and at the same time I can add business intelligence to it using Spark Streaming, using AI, find some of the relevant meaning of that data as it's persisted, and open it up to more backend systems.
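
As a rough sketch of the Spark Streaming leg of that Lambda architecture, assuming Spark with the Kafka source and the Spark Cassandra Connector on the classpath; the topic and keyspace names carry over from the earlier hypothetical examples, the bank.scored_changes table is assumed to exist, and the "scoring" step is just a placeholder for whatever AI or business logic you would plug in:

```java
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.expr;
import static org.apache.spark.sql.functions.lit;

public class CdcLambdaScoring {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("cdc-scoring")
            .config("spark.cassandra.connection.host", "localhost")   // assumption
            .getOrCreate();

        // Read the CDC events that the in-memory CDC tool publishes to Kafka.
        Dataset<Row> changes = spark.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")      // assumption
            .option("subscribe", "rdbms.accounts.cdc")                // hypothetical topic
            .load()
            .selectExpr("CAST(value AS STRING) AS payload");

        // Placeholder "relevance" step: this is where AI / data science logic would
        // score or enrich each change before it is persisted.
        Dataset<Row> scored = changes
            .withColumn("change_id", expr("uuid()"))
            .withColumn("score", lit(0.0));

        scored.writeStream()
            .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, batchId) ->
                batch.write()
                     .format("org.apache.spark.sql.cassandra")
                     .option("keyspace", "bank")                      // hypothetical table
                     .option("table", "scored_changes")
                     .mode("append")
                     .save())
            .start()
            .awaitTermination();
    }
}
```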

Brian Mortimore: 13:51          
And there's one more step that we can go with this, which is really what makes it extremely powerful. By using that same CDC tool, I can now use that bottom blue arrow and completely bypass all the other systems, bypassing Kafka, which allows me to bypass latency. Since my CDC tool is in memory, I can now write and persist that transformed data directly into Cassandra immediately. And what that does is give me a near real time, low latency replica of my RDBMS data. Why is that so powerful? It's very powerful because having a separate replica is how we deal with the world in Cassandra. We have multiple replicas of the data. Being an eventually consistent model allows us to treat this transactional data from the RDBMS system as if it's another replica of my enterprise data system. So now, all of a sudden, my RDBMS system is following some of my NoSQL rules. And that allows me to do something, which is break off pieces, just little pieces of that RDBMS application, chunks of functionality, and migrate them over while maintaining consistency between my NoSQL world and my RDBMS world.
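
A minimal sketch of that direct path, assuming the in-memory CDC tool can call a Java hook for every captured change (the class, method, and table here are hypothetical); the returned duration is the replica lag you would watch operationally, which comes up again in the Q&A below:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

import java.math.BigDecimal;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

public class DirectReplicaWriter {
    private final CqlSession session = CqlSession.builder().build();   // assumes local driver config
    private final PreparedStatement upsert = session.prepare(
        "INSERT INTO bank.balance_by_account (account_id, balance, updated_at) VALUES (?, ?, ?)");

    /** Hypothetical hook the in-memory CDC tool calls for every captured RDBMS change. */
    public Duration onChange(UUID accountId, BigDecimal balance, Instant capturedAt) {
        // Persist the already-transformed row straight into Cassandra, no broker hop.
        session.execute(upsert.bind(accountId, balance, Instant.now()));
        // Replica lag: time between capture in the RDBMS and persistence in Cassandra.
        // This is the number to monitor for the "near real time" replica.
        return Duration.between(capturedAt, Instant.now());
    }
}
```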

Brian Mortimore: 15:06        
So what we've done at DataStax, using this methodology, is build a way of being able to divide and conquer. By divide and conquer, I mean we can't rewrite the entire application, we can't do the whole thing at once, but what we can do is take a small chunk, one squeaky wheel, one piece of your business that you need to transform, do it one piece at a time, measure it, and be successful with it. So what we do is identify a specific business function that you need to transform, and then we scope it incredibly carefully. We find that business function and we model it. We determine the objects, those business-related functions and objects that are related in there. And usually that comes down to a half dozen or so objects that are supported by about 50 or so RDBMS tables.

Brian Mortimore: 15:54           
We take that one defined piece and we find all of its input and output patterns, the read and write patterns that populate that data structure and how we read and use that data. And we use that to define an optimal business data model that we can use inside Cassandra. So we do a data model for this that is completely optimized for Cassandra. That's going to be very, very different from your RDBMS data model. And in the process, we end up building a CDC and ETL pipeline that does that transformation for us and maintains that consistency. That CDC and ETL pipeline can actually go in two different directions depending on the technologies that are used. And then we also provide a microservice framework to provide the CRUD, the create, read, update, delete type functions for those objects, so that they can be used immediately in your business functions.
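
As a sketch of what that generated CRUD layer might look like with Spring Data for Apache Cassandra (the entity and repository names are hypothetical, mapped onto the balance_by_account table used in the earlier examples):

```java
import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKey;
import org.springframework.data.cassandra.core.mapping.Table;
import org.springframework.data.cassandra.repository.CassandraRepository;

import java.math.BigDecimal;
import java.util.UUID;

// Business object backed by the Cassandra-optimized table, not by the original RDBMS schema.
@Table("balance_by_account")
class AccountBalance {
    @PrimaryKey("account_id")
    private UUID accountId;

    @Column("balance")
    private BigDecimal balance;

    public UUID getAccountId() { return accountId; }
    public void setAccountId(UUID accountId) { this.accountId = accountId; }
    public BigDecimal getBalance() { return balance; }
    public void setBalance(BigDecimal balance) { this.balance = balance; }
}

// Spring Data derives the basic create/read/update/delete operations; the business
// logic itself stays out of this layer, as described above.
interface AccountBalanceRepository extends CassandraRepository<AccountBalance, UUID> {
}
```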

Brian Mortimore: 16:47           
So what's our goal? We want to return transformation value very fast and predictably. And so how we do that is by using this proven and repeatable process. We have to crisply define our business functions that we're going to transform and their order. And we maintain a list of all the dependent tables and the specific functions that depend on those tables. And we keep that list within reason. So we keep the functions very specific so that we can maintain the scope. And that allows us to succeed in transforming that one business function at a time.

Brian Mortimore: 17:19           
We also then perform adequate testing of the data model, which is extremely important, to be able to scale it according to requirements. And then we use microservices with standardized interfaces. This allows us to avoid interrupting developers and causing disruption. We can allow developers to use a standard RESTful interface, or even better, GraphQL interfaces. So we then perform these live migrations by abstraction. Once we have this replica of the RDBMS data running in Cassandra, we can point our application towards it. So it becomes a much simpler task for developers to take an API or existing structures, point those to a RESTful or GraphQL interface, and use that instead. That of course requires that real time CDC pipeline and transformation to be in place.
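
For example, migration by abstraction might expose the new operational data layer behind a hypothetical Spring controller like the one below (reusing the AccountBalanceRepository sketched earlier); existing callers are then repointed at this interface instead of at the RDBMS:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.util.UUID;

@RestController
class BalanceController {
    private final AccountBalanceRepository repository;

    BalanceController(AccountBalanceRepository repository) {
        this.repository = repository;
    }

    // The read that used to hit the RDBMS is now served by the Cassandra replica,
    // which the CDC pipeline keeps in sync underneath.
    @GetMapping("/accounts/{accountId}/balance")
    AccountBalance getBalance(@PathVariable UUID accountId) {
        return repository.findById(accountId).orElseThrow();
    }
}
```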

Can you just offload some of the more expensive RDBMS functions to Cassandra?

Brian Mortimore: 18:13
Here's kind of a diagram of what this pattern looks like in a real business. In this case, we've taken a financial sector business, in which case you may have a particular business function. The business function I like to use in the financial sector is mobile banking for a bank. With everyone taking out their cell phones, and mobile banking applications now easy to log into, people check their balances very frequently, especially on paydays. So people are looking to see what their account balances are. That can be one of the most expensive transactions, one of the most expensive functions on an RDBMS system. If you are paying in MIPS, for example, if you're on DB2 on a mainframe and you're paying in MIPS, you probably have a very keen awareness of just how expensive that function would be.

Brian Mortimore: 19:01       
In this case, we could take that one particular function, isolate it, find its dependent tables, build that real time CDC pipeline between them, build that operational data layer, and put microservices on top of it. And now you have a way for the APIs on your service layer to talk to your new transformed data architecture. At DataStax, we've built and designed a service to do this with our field services team: our DataStax RDBMS to microservices service. What it does is take a business function from your enterprise RDBMS environment and allow you to cast it as a Cassandra-backed Spring microservice very easily.
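
To make the offload concrete, the hot "check my balance" read could be served from the Cassandra replica with a single-partition lookup like the sketch below (same hypothetical balance_by_account table as before), instead of consuming cycles on the mainframe:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

import java.math.BigDecimal;
import java.util.UUID;

public class BalanceCheck {
    // The frequent "check my balance" read becomes a single-partition lookup against
    // the Cassandra replica rather than an expensive query on the RDBMS.
    public static BigDecimal currentBalance(CqlSession session, UUID accountId) {
        Row row = session.execute(
            "SELECT balance FROM bank.balance_by_account WHERE account_id = ?", accountId).one();
        return row == null ? null : row.getBigDecimal("balance");
    }
}
```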

Brian Mortimore: 19:46        
So what that includes, really, is taking one business function, and with that, we will build an API object and identify exactly what that scope is. We'll assist you in kind of unwrapping that, finding out what objects we're really talking about, defining the properties of those objects that we need to manage and record, and then building a data model in Cassandra based on it that uses the data from those 50 or so RDBMS tables in a different way, one that we can actually scale and use in the NoSQL cloud world. We do all that data modeling and end up building a data model, and we give you that DDL, so it's ready to go. We then build and define that persistence layer for you. And we do the operational automation templates. So all that trusted layer of enabling that we were talking about, we help you with that. That involves all the automation you need, a test automation framework, and a plan to be able to scale that up as you grow.

Brian Mortimore: 20:42
And then we also build that CDC and query-based pipeline definition for you and do that ETL work. That's actually a little bit complicated, and we help you with that. There are two different pipelines we build. One is CDC: as data comes in and is changed, we go ahead and make that change so that we have a real time replica that's kept in sync. And we also have a query-based pipeline, which means we can take all the legacy data you have, do a query, do that transformation, and populate all that legacy data as part of a migration plan. So we'll help you with both of those. The other thing we end up with is we deliver to you a Spring Framework-based microservice architecture. That gives you the basic CRUD, not the business logic, but the basic CRUD for those objects, so that you can code against them and do the business functions that you need to do in a normalized, standardized way.
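
The query-based pipeline is essentially a one-time backfill of the legacy data. A minimal sketch, assuming a JDBC-reachable legacy database and the same hypothetical Cassandra table as before (the connection string, credentials, and legacy accounts table are all illustrative):

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.UUID;

public class LegacyBackfill {
    public static void main(String[] args) throws Exception {
        // Read the legacy rows once, reshape them into the Cassandra data model,
        // and let the CDC pipeline keep the replica in sync afterwards.
        try (Connection rdbms = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//legacy-host:1521/ORCL", "user", "password");   // assumption
             CqlSession cassandra = CqlSession.builder().build()) {

            PreparedStatement insert = cassandra.prepare(
                "INSERT INTO bank.balance_by_account (account_id, balance, updated_at) VALUES (?, ?, ?)");

            try (Statement stmt = rdbms.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT account_uuid, balance, last_updated FROM accounts")) {   // hypothetical table
                while (rs.next()) {
                    cassandra.execute(insert.bind(
                        UUID.fromString(rs.getString("account_uuid")),
                        rs.getBigDecimal("balance"),
                        rs.getTimestamp("last_updated").toInstant()));
                }
            }
        }
    }
}
```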

Brian Mortimore: 21:29         
So back to our pattern, our enterprise diagram here: those areas in pink are the areas that this service accelerates. You'll see there that we're accelerating that CDC pipeline, that connection, and the query-based pipeline to be able to populate it. We help you establish the DDL and the schema, as well as kind of the operational footprint, the number of nodes, number of data centers, et cetera, that you need in terms of automation. And so we end up giving you all of that as the operational data layer, or at least enabling that for you. And then we give you that microservices layer, that Spring-based microservice, which gives you the CRUD functions that your API can now point to.

Brian Mortimore: 22:11          
So these are traditionally the most difficult parts of doing this transformation. So we try to make it into a bite sized chunk. We do this for one business function, and then we deliver it, and hopefully we deliver it very quickly. So you can start to realize value very quickly from your transformation investment. The idea is then you can then take another bite. So we can start with the squeaky wheel, the most painful process you're dealing with, and then move on towards a bigger transformation bite, which basically means taking more chunks, even doing them concurrently if we need to. This is the methodology that we've refined from experience. I'm going on my fifth year now at DataStax doing this for customers. And really rather than saying that this is a slide of our kudos, this is a slide that shows you all the different organizations that have refined us. These are enterprises that have helped us define this process and that have helped us make it a better environment, and that we've learned from. So it's kind of a refined process from five years of experience here of doing these transformations, and so we're looking forward to being able to work with more customers like you.

So I'd like to say thank you very much, first of all, for taking the time during this busy period, and at-home period for many of us, and engaging with me. I'd like to answer any questions that you have at this point. Michael, do we have any questions that have come in?

Michael Ledesma: 23:37      
Yeah, thank you, Brian. It does look like we have some questions that came in. If you haven't had a chance to submit them, please do so now, and we'll try to get to as many as possible. We'll start with the first one here from Subra: if multiple OLTP applications are using the same reference data, how do you handle it during the migration?

Brian Mortimore: 23:57             
Yeah. How do we handle it during the migration? If they're using the same source data, since it's a live replica, it gives us some options. We're not getting rid of that source data that we're using in the transactional system; we're treating it as another replica. So that means that existing systems can be migrated over time to the new microservice. You don't have to do a cutover. That's what enables us to do a live migration. We have the data living in both places, live. And before you ever actually change your application and start moving the systems over to point to the new architecture, the data is there and ready to be tested and validated. So for those components of your business logic where you may have multiple different angles coming in and accessing that same RDBMS system, we can keep that system as it is and move those access points over little by little in the migration, pointing them to the new microservice architecture. Very good question. Thank you very much.

Michael Ledesma: 24:53             
Next one in here, it looks like, from Vlad. Do you need to create a Cassandra data model before starting CDC?

Brian Mortimore:  25:00          
You do need to create the Cassandra data model before you start CDC. The reason for that is because the CDC is a vector, it's taking data from one location and placing it in another location. So if I have that query pattern with the RDBMS system, and now I'm going to a destination data model, I have to know what that destination data model is to be able to persist that data.

Michael Ledesma: 25:28            
Great. Another one here: during my experience as a DBA, I've come to see leading databases that can satisfy most needs but tend to leave out one. For Cassandra, that is housekeeping via node repair. How does this impact high-throughput clusters that require sub-10 ms response times?

Brian Mortimore: 25:52            
You are onto something that's very, very near and dear to my heart. This is something we've been working on at DataStax for a while, and so we've developed NodeSync, and we're working with the open source community on making a lot of this part of open source Cassandra. I'm not sure how much of it is going to be in Cassandra 4.0, but there's a lot of work that's been done on making repair a whole lot more efficient and a whole lot better. NodeSync is kind of the version of that that we're working with now in the newer versions, and it's been kind of a godsend to us.

Michael Ledesma: 26:26             
Perfect. One here from Ed: Brian, what is the best modeling tool for creating Cassandra or NoSQL models?

Brian Mortimore: 26:33           
The best modeling tool? My favorite modeling tool, believe it or not, is Notepad. I just use text modeling, and that's unfortunate, because I haven't come across anything that I'm incredibly comfortable with as a real modeling tool. For Cassandra modeling, I model based on the query. So I really try to follow that pattern: I treat it as one query per model, and I try to match it. I try to do everything in Cassandra first, and if I absolutely need to, I use one of those layered-on technologies, such as graph or search, to extend my functionality if my data model becomes too constrained. There are some tools out there, but there's nothing that I personally use, except for just a text editor.

Michael Ledesma: 27:22            
Awesome. We'll answer a few more here. One here from Kay, I might've missed this. When you bypass Kafka and create the replica of the RDBMS data, do you use some other connector?

Brian Mortimore: 27:37             
We do. In that case, it's using the driver and going directly from the in-memory CDC tool straight to a write into Cassandra, so that we can control that latency. It gives us a place to be able to monitor our replica latency, which is kind of important. So if I have a few milliseconds between when that data shows up in my RDBMS system and when it's in my Cassandra system, that gives me something to monitor operationally, and I can keep really tight controls on that. But it's just using that driver directly from inside that in-memory CDC tool.

Michael Ledesma: 28:16            
Great. We'll do one more here. What is the max number of tables you can join to produce the transformational data?

Brian Mortimore: 28:27            
What's the max number of tables you can join? I guess I'm not really understanding the question, but I try to keep that scope kind of narrow, because the more tables I deal with, obviously, the more it gets out of control, and I end up with a boil-the-ocean problem. So maintaining the scope of what you're actually trying to transform is incredibly important, and that's where I try to use a limit of about 50. The realistic thing on a Cassandra system is that I don't like to have more than a hundred tables living on any single Cassandra system; it starts to behave a little bit crazier. So it's better to have taller tables rather than more tables as a design pattern. My tendency is to flatten tables and reduce the number of tables on the Cassandra side as much as I can by denormalizing, which is kind of the normal pattern. I try to start with no more than 50 tables on the RDBMS side. Occasionally you run into a business function where you have to break that rule just because you have dependencies that start to string out a little bit. Those projects tend to get a little bit longer and a little bit more expensive. So any time we can keep that scope focused, bite off a chunk, and then execute and deliver it, the better off we are.

Michael Ledesma: 29:45             
Great. Thank you, Brian. We do have a lot of questions and they are very good, so we're going to stay on for a few more minutes and try to get as many as we can answered. If you need to drop, we will be able to share this with some folks, but there are some great questions here, and we want to make sure that these do get answered. Brian, another one here: do you have consistency issues with multiple Spring-based API microservices connecting to the same database?

Brian Mortimore: 30:15             
Connecting to Cassandra, we generally do not. Cassandra is an eventual consistency system. A lot of times our methodology is, if I'm dealing with an ACID transaction load, I'll try to keep that ACID transaction load on the RDBMS system and lighten what we're using that RDBMS system for on other fronts. For example, we use a migration-by-abstraction methodology. So what we try to do is take, for example, the read load off, and we might leave that transactional load on the RDBMS system, and then eventually move that transactional load to Cassandra, where we'll use some more native methodologies to ensure consistency. But we follow the same kind of consistency rules. You wouldn't want two different microservices performing the exact same transaction on the exact same objects, so we would follow those kinds of rules and we would enforce those.

Michael Ledesma: 31:10            
Great, another one here, can we migrate from MongoDB to Cassandra using CDC?

Brian Mortimore: 31:18            
You can. Yes, that's possible.

Michael Ledesma: 31:23            
Okay. Awesome. Another one here. One of the challenges we are facing is that we have multiple platforms and data that is growing exponentially. How do you manage/plan for the growth/cost?

Brian Mortimore: 31:36            
That's a very difficult, broad question. What we try to do, first of all: if your growth is tied to a business objective, that is good. That means your growth is generating revenue and accomplishing your business objective. For that, what we want to do is make sure that we have a data model that can appropriately scale, and with that we won't have limits. So as long as we have the proper data model, I can give you a thousand-node cluster, and it's going to perform at sub-millisecond latency and be excellent, and you can grow that as big as you want. And as long as it's tied to a business objective, the cost won't matter, because all those nodes are going to be generating revenue for your business or accomplishing your business objectives.

Brian Mortimore: 32:19            
The problem is if we have unconstrained growth, which means growth that is sideways, growth that is somehow not tied to a business objective or not delivering business value. In those cases, we try to identify that and data model it away. We want to identify any kind of extraneous logging or anything like that that we have that's unnecessary. We also TTL off data that's irrelevant to us. So the idea is finding the data that really matters to you and is accomplishing business objectives, and having a transformational data architecture so that you can focus just on that data and expunge yourself of the liability data.
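
A small sketch of the "TTL off" idea, with a hypothetical audit table: each row is written with a time-to-live, so Cassandra expires the low-value data automatically instead of letting it accumulate as unconstrained growth.

```java
import com.datastax.oss.driver.api.core.CqlSession;

import java.util.UUID;

public class ExpiringWrites {
    // Hypothetical table: rows written here disappear automatically after 30 days,
    // so irrelevant data never has to be cleaned up by hand.
    public static void logRequest(CqlSession session, UUID requestId, String payload) {
        session.execute(
            "INSERT INTO audit.request_log (request_id, payload, logged_at) "
          + "VALUES (?, ?, toTimestamp(now())) USING TTL 2592000",   // 30 days, in seconds
            requestId, payload);
    }
}
```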

Michael Ledesma: 32:58            
Fantastic. Another one here from Rui: how is the CDC and query-based pipeline used to resolve data conflicts, assuming there is an edge case where the same data is being streamed in both pipelines?

Brian Mortimore: 33:11            
Really, really good question. What I normally would do is make sure that I can't have pipeline collisions like that. If I have two pipelines that are updating that same exact data, I wouldn't do that; I would data model my way around it. If I had a situation like that, then I would have to have some kind of error checking. We have what's called lightweight transactions in Cassandra, which allow us to use an IF condition, and that's a way to do some consistency operations. The only downside is that they're a little bit expensive, and the larger your cluster is, the more expensive they are. So we try to avoid those as much as possible, but that's tooling that we can use to avoid consistency conflicts between objects. But the best way is to data model your way around it, so that you don't have two different pipelines that are talking to the same data objects.
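
For reference, a lightweight transaction is just a conditional write. Here is a minimal sketch against the hypothetical balance_by_account table used earlier, where the update only applies if the row still holds the value we expect:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;

import java.math.BigDecimal;
import java.util.UUID;

public class GuardedUpdate {
    // Guard against two writers racing on the same row: the IF clause makes this a
    // lightweight transaction, and wasApplied() reports whether the condition held.
    public static boolean updateIfUnchanged(CqlSession session, UUID accountId,
                                            BigDecimal expected, BigDecimal next) {
        ResultSet rs = session.execute(
            "UPDATE bank.balance_by_account SET balance = ? WHERE account_id = ? IF balance = ?",
            next, accountId, expected);
        return rs.wasApplied();
    }
}
```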

Michael Ledesma: 33:58            
Okay. Time for two more. Can we run some sample applications to convert and utilize this data platform to learn all the things like DDDD and many more in a true experience way?

Brian Mortimore: 34:12            
I would encourage that. Yeah, I would encourage that. We're going to be doing some demonstration environments, and we're going to be taking some live applications and doing kind of a walkthrough that you can take apart using this methodology. That should be out pretty shortly, and hopefully that will give you something you can look at and then refactor using your own data, using your own methodology, and kind of come up with something. So I'm thinking that should be possible using the examples we have in the near future.

Conclusion

Michael Ledesma: 34:44            
Cool. Last one, Brian, for someone new to the NoSQL world, does DataStax offer any free intro classes or other training?

Brian Mortimore: 34:52           
Absolutely. Yeah. We have our DataStax Academy at Academy.datastax.com, and there's really good introduction material there on data modeling, on operational functions of Cassandra, on how to deploy it in your enterprise, and even a full certification path if you want to go that far. But just the introduction classes are really good. The videos are excellent, and they should give you those basic concepts of how to live in this eventually consistent, multiple-replica world. You should get a grip on that pretty quickly.

 

 
