Technology | September 23, 2020

[Webcast] Leave it to Astra: Database-as-a-Service on Google Cloud

DataStax on GCP delivers production-certified Apache Cassandra® and expert support to minimize risk and optimize costs so you can focus on innovation. Join me for an introduction to Astra, Database-as-a-Service, and learn how it can be deployed in Google Cloud.

In this recorded webcast (with full transcript below), I covered:

  • What is DataStax Astra
  • Astra in the Google Cloud Ecosystem
  • Astra Demo highlighting Storage Attached Indexes (SAI)
  • Astra on GCP: Architecture Teaser

We hope you find this session informative! Sign up for the free 5GB Astra tier and start building!

 

Transcript

Introduction

Matt Kennedy (00:01): Good morning and thank you for joining us for today's webinar, Introduction to Astra: learn about DataStax Astra on Google Cloud. This is Matt Kennedy and it's an exciting day here at DataStax. This morning we announced the general availability of our new storage attached indexing in Astra and DSE. So today I'm going to go through some examples of how to use that with Astra running in GCP and we'll talk a little bit about how we leverage GCP to keep the lights on in Astra.

Cassandra: The Best NoSQL Database of Choice

Matt Kennedy (00:33): So first let's talk about the core reason that we're having this webinar today which is Cassandra. Cassandra is an incredibly pervasive database and I really like this quote here on the slide, "If you use a website or a smartphone today, you're touching a Cassandra backend system." There are famous use cases at Apple and Netflix of thousands of Cassandra nodes running, but really Cassandra is deployed in many, many places that need global scale: the ability to really, really scale out a database without the concern about there being a limit or a ceiling that you're going to hit at some point. And the power of those low latency reads and writes to all of the data in those enormous databases is really something that has helped make the internet what it is today. It's one of the fundamental database tools that we rely on to run modern internet business.

DataStax Astra: Cassandra Made Easy in the Cloud

Matt Kennedy (01:41): That said, it is a little bit tricky to run yourself. And that is why we created Astra. So Astra is our Cassandra-as-a-service offering. The best way to understand Astra is to use it and I invite you all to sign up, it is super easy. You can go through the GCP console as I'm about to show you in a second, or you can come to astra.DataStax.com and sign up for an account. There is a free tier that anyone is free to use with very little restriction on that. It does have 10 gigabytes of space that you can use to prototype an application. There's plenty of ways to get in touch with us through the app. So I encourage everyone that is listening today to check that out. It really is a super simple system to get up and running with, with Cassandra. And I highly encourage you to check out some of the SAI demos that we'll be showing later.

Matt Kennedy (02:38): So a few details about Astra before I show it in action. As I said, Astra is at its core Cassandra-as-a-service, but it's more than that. This is a completely no ops database that has, on top of that Cassandra layer, really powerful APIs. So one of the cool things about Astra is, we provide GraphQL and REST API generator endpoints. So when you enter a data model on the Cassandra side, you get out, with no coding on your part, a REST API and a GraphQL API that you can use to immediately start prototyping applications. It's a really cool thing to see in action. I highly recommend you check that out again on the Astra free tier. It is a totally cloud native system so it is built from the ground up running on Kubernetes. We have open-sourced the Kubernetes operator that we use to run Astra and that has all sorts of lessons learned from our experience running thousands of hours of Cassandra clusters in Astra.

Matt Kennedy (03:44): This is also a system that has relatively good portability. It is a system that does not require any sort of lock-in to a particular API. If at the end of the day you want to switch to running open-source Cassandra on your own, your application code should be compatible to make that happen. And as I said, 10 gigabyte free tier that I encourage everyone to check out and to try out these demos with. So all that said, let's get into the Introduction to Astra demo. And then I will show off some of the indexing technology.

Astra on Google Cloud Demo

Matt Kennedy (04:22): To get started with Astra on the Google Cloud Platform, we're going to open up the Google Cloud Console. Everybody should be familiar with this. And we head over to the navigation menu in the upper left. Now to find DataStax Astra we scroll down to the bottom where the partner solutions are, and DataStax Astra is right here. This is our Cassandra-as-a-service offering. We're going to pop a quick pin in this so that it is easily accessible from our menu at the top here. So let's hit DataStax Astra. If this is your first time coming to Astra you're going to see a product description page, and the product description page tells all about the service and the pricing plans. And then you click through the terms of service and you wind up here, or on the Create Database page, which I will show you momentarily. Where you land depends a little bit on whether or not you are sharing a project with anybody else that may have already gone through the process of enabling Astra in that project.

Matt Kennedy (05:29): So the first thing I want to show you is how to create a free tier database. Now, everybody gets a free tier database as something that DataStax provides to the Cassandra community. We think that it's a really useful tool for anybody that has any reason to use Cassandra. But there is the limit that I get one. So I have one running here. And what I'm going to do is delete that first so that I can show you how to create one. So we have a precaution that we use where we require you to type the database name in as a sort of deliberate check that you mean to do what you're doing and you didn't just accidentally hit that delete database. So now we will terminate the database. And once that starts terminating, I am free to create another free tier database.

Matt Kennedy (06:23): So I'm going to do that with this Create Database button. And I'm going to give my database a name, we'll call it free Astra, and I'm going to make that a developer tier. The C10 tier is not free, it is a production tier database. I am going to go down and give this a keyspace name, KS1. I'm going to create a database username which is MKADM, which is just short for Matt K admin, and I am going to set a password. Make sure I know what password I'm typing in, and then I confirm that password. And now I can click create. You'll see here that it says capacity starts with one capacity unit. A capacity unit is how we measure the scale of the Cassandra database that is underlying the service. So we will create that.

Matt Kennedy (07:21): And once that starts creating what I'm going to do is go over to another database that already exists and show you how we can actually expand the number of capacity units in it. So you see I have this MKC10DB that is running here that's already at two capacity units. But what I'm going to do is go and edit that database. And when I edit that database, I am going to expand it. So I'm starting out at two, I'm going to go to three capacity units here. And that's going to take me up to 1500 gigabytes of usable data in the database. So I hit save, and that save starts the resizing operation for that database.
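
The figures quoted here (three capacity units giving 1500 gigabytes) imply roughly 500 GB of usable storage per capacity unit. A minimal sketch of that arithmetic, assuming the relationship is linear; the constant is inferred from the transcript and is not taken from a published Astra specification:

```python
# Assumption: usable storage scales linearly with capacity units (CUs).
# 500 GB per CU is inferred from "three capacity units ... 1500 gigabytes".
GB_PER_CAPACITY_UNIT = 500


def usable_gb(capacity_units: int) -> int:
    """Estimate usable storage in GB for a given number of capacity units."""
    if capacity_units < 1:
        raise ValueError("a database has at least one capacity unit")
    return capacity_units * GB_PER_CAPACITY_UNIT


# Estimated usable storage before and after the resize shown in the demo.
print(usable_gb(2), "->", usable_gb(3))
```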

Matt Kennedy (08:03): Now, everything stays online. It is an operation that is fully automated. But this is Cassandra so we are in the background doing lots of interesting operations that you really wouldn't want to do manually. And this is all taken care of for us in an automated fashion. Now, what's interesting is all of this in the background is running on GKE. And it's using our open-source Cassandra operator that we use to run all of our Astra databases. So we'll talk a little bit more about that later and in future webinars. But for now I just wanted to mention that in passing. At the moment what I'm going to do is show you how, instead of using this integrated console, I can also switch over to the DataStax console. So this is a native GCP experience that we're looking at here, but what's happening on the back end is we're calling web services, basically our REST endpoint, and that is going to be the same back end code that we use to run DataStax Astra in our own web platform.

Matt Kennedy (09:18): So let's go and click this and see what that looks like. So I'm told that I am going to be leaving Google, and I can go ahead and confirm that. And this is going to cause me to switch windows. So there's going to be a little bit of a blip here, and then I will be right back. So here we see the DataStax hosted version of the Astra console. So obviously not integrated with the GCP console. But the cool thing is that if you come in via single sign on you are going to get all of the billing integrated with GCP. So if you are using one of the commercial tiers of the database, all of that will automatically be integrated with your GCP billing.

Matt Kennedy (09:59): There's a couple extra things in the DataStax console that aren't in the GCP console. For example, these interaction widgets you can tell us what you like about Astra, what you hate about Astra, or what you wish Astra had in it. Any feature requests that you have can go here. You can even set up time on our calendars to chat and have an in depth discussion about a particular feature that you need. Or you can go over here to get help and open up our documentation index, create a support request. And then we also have an intercom widget down here which allows you to see the system status, search the documentation, or even start a direct chat with somebody on the team.

Matt Kennedy (10:41): So that is the quick tour of Astra. For now, I want to get on to other topics and in particular get into the meat of today's webinar which is SAI. So stay tuned. Before we do that, I want to talk a little bit about a key component of any database system, and that is storage. We would have a really tough time running Astra without a really, really good storage subsystem. And for this within GCP, we rely on their PD-SSD functionality, which is basically a persistent remotely mountable SSD backed volume storage system that has really, really good performance for the amount that it costs. We are also excited to try out the new balanced SSD offering.

Matt Kennedy (11:39): Currently the ones we use are the column on the right hand side for our Astra disks because we want the highest performance, but as we look at bringing out more offerings and a wider variety of SKUs, we're excited about this balanced SSD which could give us the ability to offer some additional lower cost tiers in addition to the free tier. So the storage subsystem in a database really can't be overemphasized in terms of its performance. One of the great things about a cloud storage system like PD-SSD is it also gives us a lot of conveniences that make it easier to manage an as-a-service platform that needs to be able to do things like flexibly move around infrastructure within the cluster, and reboot nodes and bring them up with the same storage. It's pretty critical to have this kind of persistent volume storage system for any kind of as-a-service just given the importance of reliability and keeping things up and running.

Matt Kennedy (12:55): So, just want to give credit where credit is due to the GCP platform there. PD-SSDs really do make things a lot easier for us to manage, and certainly help the performance when it comes to things like database indexes. And so without further elaboration there, I do want to dive right into the SAI demos and then we'll talk a little bit about SAI from an overview standpoint. So let's see what we can do with some of these SAI queries and indexes.

SAI Demo

Matt Kennedy (13:35): What we're looking at here is an Astra database running in GCP. This is our primary view page for the database and we have a lot of details here on how to use Astra with different kinds of tooling. What you will definitely want to check out, if you have not seen this already, is our prior webinar called 10 Clever Astra Tricks. And in that webinar, we cover how to use the REST API with things like Postman and cURL and how to use the GraphQL API to generate endpoints directly from your data model without having to do any coding. We talk about how to use the secure connect bundle with your code and dsbulk, which is our bulk loading and unloading tool, as well as NoSQLBench which is a great workload generator that works with Astra.

Matt Kennedy (14:24): Now, one of the other cool things that I do want to reiterate here because we have added this recently for GCP is we can support VPC Peering. So if you have an application VPC that you run in the Google Cloud, you can now peer that VPC to get a direct connection into the Astra VPC. So I'm not going to fill all of this out right now because we don't need it for the purposes of this demo, but I did want to point out that that's there. Great use of the security functionality built into GCP. For now, I am going to skip over most of what's on this page and go over to the CQL Console. So I'm going to log in and we are going to start doing SAI operations.

Matt Kennedy (15:14): So I am very excited about SAI. I think SAI is a game changer when it comes to Cassandra development. Everything that Astra did to make life easier for Cassandra operators and developers when it comes to having a no ops environment, SAI has an equivalent level of power for making the developer's life easier. All of the data modeling headaches that Cassandra has had are a thing of the past as far as I'm concerned. I think you'll be pretty astonished with the kinds of operations that we can complete with a Cassandra database with really performant indexing. And let's just get into it and show what that looks like.

Matt Kennedy (16:00): So I'm going to first switch over to my keyspace. And I'm going to paste in a Create Table command that I've got here. So if you are familiar with DataStax's documentation, you will no doubt recognize the cycling examples that we use in our docs team. We have some very enthusiastic cyclists. So I've created that table and I am going to now insert a bunch of sample data into that table.
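
The statements pasted at this point are not captured in the transcript. As a rough sketch, the table and some sample rows might look like the following, written as Python that sends CQL through a session object. The column names are assumptions based on the columns queried later in the demo (last name, age, country, registration date), the rows are invented for illustration, and `session` is assumed to be a connected DataStax Python driver session whose keyspace is already set.

```python
# Sketch only: column names and sample rows are assumptions, not the
# statements used in the webcast.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS cyclists_semi_pro (
    id int PRIMARY KEY,
    firstname text,
    lastname text,
    age int,
    country text,
    registration date
)
"""

# Invented rows for illustration (the demo loads 20 rows).
SAMPLE_ROWS = [
    "(1, 'Alex', 'Eppinger', 24, 'GBR', '2019-03-02')",
    "(2, 'Sam', 'Example', 21, 'USA', '2020-06-15')",
]


def create_and_load(session):
    """Create the table, then insert each sample row with a literal INSERT."""
    session.execute(CREATE_TABLE)
    for values in SAMPLE_ROWS:
        session.execute(
            "INSERT INTO cyclists_semi_pro "
            "(id, firstname, lastname, age, country, registration) "
            f"VALUES {values}"
        )
```

Because `id` is the only primary key column, a query on any other column is rejected at this stage unless it uses ALLOW FILTERING.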

Matt Kennedy (16:41): All right. So we've inserted our sample data. If we do select, we want to grab the table name here. If we do a select, we can see that we have 20 rows of data neatly organized here in our CQLSH console. So now I want to go over to actually create some indexes and start showing how we can query in a way we haven't been able to query before. So you'll note that for this table, the primary key is the ID. And that's great because it's a unique ID, it uniquely identifies the rows. But if we want to query from this table by last name, right now we can't do that. We could do it if we specified allow filtering, but really on a large table, we really don't want to do that.

Matt Kennedy (17:40): So what we're going to do instead is define an index on the last name column. So I'm going to paste that command in here. So here you can see I'm creating a custom index. I'm naming that index. I'm saying that it is on the table cyclists_semi_pro, column last name, and I'm using StorageAttachedIndex. So that is how we get access to an SAI index as opposed to another kind of index. And finally, we are adding the options here, false for case sensitive. So this is just a question of whether or not you want case sensitive matching or not. And we are setting normalize to true. I'll talk a little bit more about that later. So we create that index. And now if we go back to our last name query where we're looking for Eppinger, we can actually execute that query now.
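
A sketch of what the index definition and the last name lookup described above could look like. The index name `lastname_sai_idx` and the column name `lastname` are assumptions (the transcript does not show them); the `'StorageAttachedIndex'` class and the `case_sensitive` and `normalize` options are the ones named in the talk. `session` is again assumed to be a connected DataStax Python driver session.

```python
# Index and column names are assumptions; the index class and the two
# options are the ones described in the talk.
CREATE_LASTNAME_INDEX = """
CREATE CUSTOM INDEX IF NOT EXISTS lastname_sai_idx
ON cyclists_semi_pro (lastname)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true'}
"""


def create_lastname_index(session):
    """Define the SAI index on the last name column."""
    session.execute(CREATE_LASTNAME_INDEX)


def find_by_lastname(session, lastname):
    """Look up rows by last name; no ALLOW FILTERING once the index exists."""
    rows = session.execute(
        "SELECT * FROM cyclists_semi_pro WHERE lastname = %s", (lastname,)
    )
    return list(rows)
```

With the index in place, a call such as `find_by_lastname(session, "Eppinger")` corresponds to the query run in the demo.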

Matt Kennedy (18:34): So instead of having to remodel everything and potentially denormalize just to execute a query where we can search for people by last name, all we had to do was add an index. So let's take a look at some other examples. Let's say I want to examine cyclists by their age. So I can run a query that looks for age. I can't do that yet because I don't have the index on it and I don't want to do allow filtering. So I create that index on age. And now I am able to execute that query and I get back all of the cyclists over 23 years old. What's really cool is I can also do this for, I can say, I want age to be less than or equal to 23 and age greater than or equal to 20. And now I get a filtering with the bottom condition and the top condition.
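
The age example can be sketched the same way. The index name `age_sai_idx` is an assumption; the bounds mirror the "greater than or equal to 20 and less than or equal to 23" query from the demo when called with 20 and 23.

```python
# Index name is an assumption. A numeric SAI index needs no text options.
CREATE_AGE_INDEX = """
CREATE CUSTOM INDEX IF NOT EXISTS age_sai_idx
ON cyclists_semi_pro (age)
USING 'StorageAttachedIndex'
"""


def cyclists_between_ages(session, youngest, oldest):
    """Return rows with youngest <= age <= oldest, using the age index."""
    if youngest > oldest:
        raise ValueError("youngest must not exceed oldest")
    rows = session.execute(
        "SELECT * FROM cyclists_semi_pro WHERE age >= %s AND age <= %s",
        (youngest, oldest),
    )
    return list(rows)
```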

Matt Kennedy (19:49): So let's look at another example. So this is another text match. In this case, I want to query by country. Again, this is one we would potentially have to completely rewrite our data and denormalize for, but now I can create an index. And this one is also not case sensitive. But now I can run that query and get all the cyclists back where the country matches GBR. One final one we will show on a date column. So we will try to see how many have registration dates between those two limits. We can't do that. So we will create our index and run the query again. And we're off to the races.
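
For the country and registration date examples, one possible shape is to pair each index definition with the query it enables. The index names, column names and date bounds below are placeholders, since the actual limits are not stated in the transcript.

```python
# Each entry pairs an index definition with the query it makes possible.
# Index names, column names and the date bounds are placeholders.
INDEXED_QUERIES = [
    (
        "CREATE CUSTOM INDEX IF NOT EXISTS country_sai_idx "
        "ON cyclists_semi_pro (country) USING 'StorageAttachedIndex' "
        "WITH OPTIONS = {'case_sensitive': 'false'}",
        "SELECT * FROM cyclists_semi_pro WHERE country = 'GBR'",
    ),
    (
        "CREATE CUSTOM INDEX IF NOT EXISTS registration_sai_idx "
        "ON cyclists_semi_pro (registration) USING 'StorageAttachedIndex'",
        "SELECT * FROM cyclists_semi_pro "
        "WHERE registration > '2010-01-01' AND registration < '2015-12-31'",
    ),
]


def create_indexes(session):
    """Create every index in the list."""
    for create_index, _query in INDEXED_QUERIES:
        session.execute(create_index)


def run_queries(session):
    """Run every query in the list and return one list of rows per query."""
    return [list(session.execute(query)) for _create, query in INDEXED_QUERIES]
```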

Matt Kennedy (20:52): So if you have any experience developing with Cassandra, this is a complete game changer. We used to have to really create new tables every time we wanted to support a new query pattern. Materialized views did come along, but they're still going to rewrite all of the data. So here we're using a very space efficient index to be able to support different kinds of queries and it's just a question of adding that index to support the new query. I don't have to do all of the data management that I used to, to support adding a query to my application. So I'm pretty excited about SAI. In a second here, we will jump into some different examples. For the moment let me clear my screen here.

Matt Kennedy (21:50): Okay. So I've switched databases just so I can show you some new functionality that we've recently added to Astra. When we create an Astra database, we create a keyspace. And I tend to use keyspace1 or KS1 as that keyspace just because it's a good generic keyspace name that has been used in Cassandra documentation since forever. And so it's just kind of habit for me. But what I want to do right now is copy from the DataStax documentation on SAI a whole bunch of CQL statements that I want to run on the CQL Console. And rather than having to put those into a text editor and change the keyspace name in the statements, what I'm going to do is go over here and I'm going to add a keyspace.

Matt Kennedy (22:40): So my keyspace name is now going to be demo3 because that is the demo keyspace name in the code. So we'll just add that and we're going to wait for that to finish which takes a couple of seconds, not terribly long. You'll note that if I try to go to CQL Console while that's happening, I get a quick notice that a brief maintenance task is in progress. So this is the sort of thing that we occasionally have to do, it only takes a couple of seconds. But pretty soon we will be back in business. This is just preventing us from logging in and doing anything that would get in the way of that operation.

Matt Kennedy (23:23): So now I am here, I can log in. And I am going to be showing you how to use SAI operations against collection types in Cassandra. So we're going to create a table that has a map in it. I'm going to switch over to the new demo3 keyspace, and let's take a look at how I can use SAI against collection types. So here, we're going to create a new table called audit. We have an ID that's just an integer for the primary key. And our value is a text map with text keys and text values. So let's create that. And you see that created just fine with the keyspace name demo3. So I clearly now have two keyspaces, I can use keyspace1 and demo3 which makes life convenient.

Matt Kennedy (24:34): I'm going to insert just a small number of rows, just three rows into this database. And these are just primary keys of one, two, and three, and then some names in our map. So what do we do with that? So let's say I want to do an exact match on a map entry. I want to write a query like this: SELECT * FROM demo3.audit WHERE text_map['Irene'] = 'Cantona'. So I'm saying here that my map key is Irene and the value specifically has to be Cantona. And then I'm adding another condition, and I have a key of Mark where the value is Pastore. Now, I can't do this at all. It's not anything that I have an index created on. So let's create that index.
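
A sketch of the audit table and its three rows as described above. The column name `text_map` follows the spoken "text map"; only the names Irene/Cantona, Mark/Pastore and the key Giovani come from the talk, and every other key and value is invented. The Python driver is assumed to encode a `dict` parameter as a CQL map literal for simple statements.

```python
CREATE_AUDIT = """
CREATE TABLE IF NOT EXISTS demo3.audit (
    id int PRIMARY KEY,
    text_map map<text, text>
)
"""

# Only Irene/Cantona, Mark/Pastore and the key Giovani come from the talk;
# every other key and value is invented.
AUDIT_ROWS = {
    1: {"Irene": "Cantona", "Mark": "Pastore"},
    2: {"Giovani": "Example", "Other": "Sample"},
    3: {"Another": "Value"},
}

# The exact-match query from the talk; it is rejected until an index exists.
EXACT_MATCH_QUERY = (
    "SELECT * FROM demo3.audit "
    "WHERE text_map['Irene'] = 'Cantona' AND text_map['Mark'] = 'Pastore'"
)


def load_audit(session):
    """Create the audit table and insert the three sample rows."""
    session.execute(CREATE_AUDIT)
    for row_id, entries in AUDIT_ROWS.items():
        session.execute(
            "INSERT INTO demo3.audit (id, text_map) VALUES (%s, %s)",
            (row_id, entries),
        )
```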

Matt Kennedy (25:33): So I am going to create an index of the entry type. We'll talk about what that means. So create custom index audit_map_entries_idx on demo3.audit. And here we are specifying that this is on the entries of text_map. There is also a key type and a value type. And again, we are using SAI as our index type. So now that I did that, I can go back and I can run this query that would not execute before. And I get an exact match where my collection conditions are equal to what I put in the query.
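
The entries index and the exact-match query can be sketched as follows. The index name is the one spoken in the talk, written with underscores; the helper `rows_with_entries` is a hypothetical convenience that builds one `text_map[key] = value` condition per requested pair.

```python
CREATE_ENTRIES_INDEX = """
CREATE CUSTOM INDEX audit_map_entries_idx
ON demo3.audit (ENTRIES(text_map))
USING 'StorageAttachedIndex'
"""


def create_entries_index(session):
    """Index whole map entries, so key/value pairs can be matched exactly."""
    session.execute(CREATE_ENTRIES_INDEX)


def rows_with_entries(session, wanted):
    """Return rows whose map contains every key/value pair in `wanted`."""
    if not wanted:
        raise ValueError("at least one key/value pair is required")
    conditions = " AND ".join("text_map[%s] = %s" for _ in wanted)
    # Flatten {"k1": "v1", "k2": "v2"} into ["k1", "v1", "k2", "v2"].
    params = [item for pair in wanted.items() for item in pair]
    return list(
        session.execute(f"SELECT * FROM demo3.audit WHERE {conditions}", params)
    )
```

Calling `rows_with_entries(session, {"Irene": "Cantona", "Mark": "Pastore"})` corresponds to the two-condition query run in the demo.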

Matt Kennedy (26:16): Now, if I wanted to change this to query on, say, just keys or just values, I have to change the index type. So for example, if I want to run this next query, I can't do that. And the reason I can't do that is that I have not created the index on the keys, I have created the index for the whole entry. So what I'm going to have to do is create a new index, so let's see what that would look like. I want to create an index on the same column, but instead of the entries on the text map, I'm going to use the keys function on the text map.

Matt Kennedy (27:04): Now, unfortunately, I can't create more than one SAI index on the same column. And when you think about it, that's actually pretty sensible. I don't want to have to tell it which index to use when I go and query a table. So for the moment we have a limitation there, but that's okay. What I can do is drop that index. And we do that just by drop index and then the index name. And now I am free to rerun that create custom index command. And it'll run this time. And now I can run my CONTAINS KEY command. And you see we match the row where I have the key Giovani in the data. All right. So let's try one more, but this time on the values.

Matt Kennedy (28:02): So we know that we're going to need to drop this index if we want to create an index on the values. And now we're going to create this new index type. And I am going to select from that. And we should be good to go. So now I have matched a value in my map and I am all set to be able to create whatever indexes I need on my map collection types, and I have extended this awesome power of SAI to even more different kinds of data organization that I want to use for my application. We see maps used a lot as a way to deal with data of unknown structure. So if you are getting records into your system where you have some known fixed columns but there are potentially other columns added to a data stream that are ad hoc in nature, putting those into a map is a great way to deal with those. And now you can directly query those with SAI.
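
The drop-and-recreate sequence for switching between entries, keys and values indexes could be sketched like this. The index names for keys and values are assumed to follow the `audit_map_entries_idx` pattern from the talk; `switch_map_index` is a hypothetical helper.

```python
# Index names follow the audit_map_entries_idx pattern from the talk;
# the keys and values names are assumed.
MAP_INDEX_KINDS = {
    "entries": "ENTRIES(text_map)",
    "keys": "KEYS(text_map)",
    "values": "VALUES(text_map)",
}

# Queries enabled by the keys index and the values index respectively.
KEY_QUERY = "SELECT * FROM demo3.audit WHERE text_map CONTAINS KEY %s"
VALUE_QUERY = "SELECT * FROM demo3.audit WHERE text_map CONTAINS %s"


def switch_map_index(session, old_kind, new_kind):
    """Drop the SAI index of one kind and create another on the same column."""
    for kind in (old_kind, new_kind):
        if kind not in MAP_INDEX_KINDS:
            raise ValueError(f"unknown index kind: {kind}")
    session.execute(f"DROP INDEX IF EXISTS demo3.audit_map_{old_kind}_idx")
    session.execute(
        f"CREATE CUSTOM INDEX audit_map_{new_kind}_idx "
        f"ON demo3.audit ({MAP_INDEX_KINDS[new_kind]}) "
        "USING 'StorageAttachedIndex'"
    )
```

In the demo's order, this would be `switch_map_index(session, "entries", "keys")` before running `KEY_QUERY` with `("Giovani",)`, then `switch_map_index(session, "keys", "values")` before running `VALUE_QUERY`.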

Matt Kennedy (29:18): All right. So I've got one more set of SAI demos to show you. So let me reset real quick here. Okay. So we are back at our CQLSH prompt. And what we're going to talk about now is how we can use SAI to index columns that are part of a composite partition key or composite primary key rather. So let me go ahead and use my keyspace, use KS1. And now let's create a table. So we're going to create this table that is admittedly a little bit contrived just to show this example. So I'll explain why here in a second.

Matt Kennedy (29:58): So we've created this person ID named compositkey2 table. The columns are an ID and age and a name. And we structure those so that the ID and the age are the partition key. And we then have a name that is part of the primary key. So what you would have here is partitions that were loaded with... Anybody with the same ID and age would go into the same partition. In reality, that's never going to happen because IDs tend to be unique. So really what we're doing here is we are showing this as an example of how we can create an index on a column type that we wouldn't ordinarily be able to create an index in any flavor of Cassandra prior to SAI. So let's take a look at how we can use that functionality.

Matt Kennedy (30:55): First, we're going to insert some data. So here I'm just inserting four rows. And now what I'm going to do is create an index on the age column. So this will allow me to basically query for age ranges. So I'm going to say, select star from, and I'm going to copy this table name because it is long and prone to typos. All right. So let's just get all those rows first. Now let's add a where clause. And we want to say age greater than 30, let's say, and age less than 35. And we'll get Bryn and Jason back. So as you can see, this is similar to the range query that we did before, only this time it's not a standard column. It's a column that is part of the primary key. And notably we did not have to copy it out into a column value to be able to index it, we were able to index directly on the column as declared in the composite primary key.
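
A sketch of the composite key example. The table name spelling, the index name, the ages and the two names other than Bryn and Jason are assumptions; the key structure (ID and age as the partition key, name as a clustering column) and the range bounds are as described in the talk.

```python
TABLE = "person_id_name_compositekey2"  # assumed spelling of the table name

SETUP_STATEMENTS = [
    # id and age together form the partition key; name is a clustering column.
    f"CREATE TABLE IF NOT EXISTS {TABLE} ("
    "id int, age int, name text, PRIMARY KEY ((id, age), name))",
    # Invented rows: only the names Bryn and Jason come from the talk.
    f"INSERT INTO {TABLE} (id, age, name) VALUES (1, 32, 'Bryn')",
    f"INSERT INTO {TABLE} (id, age, name) VALUES (2, 34, 'Jason')",
    f"INSERT INTO {TABLE} (id, age, name) VALUES (3, 28, 'PersonC')",
    f"INSERT INTO {TABLE} (id, age, name) VALUES (4, 41, 'PersonD')",
    # SAI index directly on a partition key component (index name assumed).
    f"CREATE CUSTOM INDEX IF NOT EXISTS person_age_sai_idx ON {TABLE} (age) "
    "USING 'StorageAttachedIndex'",
]

# Range query on the indexed partition key column, bounds as in the talk.
AGE_RANGE_QUERY = f"SELECT * FROM {TABLE} WHERE age > 30 AND age < 35"


def run_demo(session):
    """Create the table, rows and index, then run the age range query."""
    for statement in SETUP_STATEMENTS:
        session.execute(statement)
    return list(session.execute(AGE_RANGE_QUERY))
```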

Matt Kennedy (32:16): So I hope those were illuminating demos about the capabilities of SAI. In case I didn't say so before, SAI stands for storage attached indexes. It is an indexing implementation that is inspired not only by the SASI implementation that was put forward in Cassandra a few years ago, but also by years of experience with our own DSE Search product. And everything that we've learned from that went into the technology that underlies the storage attached indexing. It's an awful lot to get into from a technical implementation standpoint, and we'll be talking more about that in the coming days and weeks. But for now I really want to highlight the power that it brings to the Cassandra developer.

Matt Kennedy (33:13): This is really something that in some ways rewrites the book on Cassandra data modeling. There are going to be situations in the future where there's still a reason to denormalize for some tables or perhaps to use materialized views for some tables. And those cases are really going to be around the strictest of read SLAs. But for the majority of Cassandra data modeling challenges, and especially getting up and running in the beginning when all you really want to do is get the right queries into the system on top of a single table that has all your data in it, or perhaps a small handful of tables, it really does change things for Cassandra developer adoption. We can now use Cassandra for use cases that require a little less planning than a Cassandra use case used to. Using Cassandra with the need to have multiple queries would really mean a huge logistical problem in denormalizing and identifying how we were going to do all those query patterns. And now it's just a question of creating additional indexes on those columns.

Matt Kennedy (34:42): So I'm very excited, in case you couldn't tell. I remember back to Cassandra 0.7 when the C2I implementation first came out. C2I is shorthand for Cassandra secondary indexes, in case that was sort of a head scratcher. So you see that abbreviation all the time referring to Cassandra secondary indexes. And at the time there was only Thrift, CQL was kind of a twinkle in the eye at that point. And I remember getting a 0.7 Cassandra cluster up and running. And I loaded all this data, and I created like 10 C2I indexes, and running one query just caused the cluster to go kablooey.

Matt Kennedy (35:27): So I quickly became disillusioned with C2I as many other users did. It turns out that over time, it did evolve to have an interesting edge case where if you were querying on a large partition and specifying the partition key as the first part of the query, the C2I could be an effective index into the rest of that partition without causing too many problems. But that's a really kind of narrow case for indexing in a database. So we're very excited to have storage attached indexes around to make life easier.

Architects Can Achieve More with Less

Matt Kennedy (36:06): This not only changes what life is like for developers, but architects can now reduce overall TCO. This makes Cassandra much easier to maintain. And it also really reduces the number of choices that you have to consider for a particular implementation. We can now provide the core indexing for everything in your Cassandra database. There are no bolt on parts required, no additional tooling for those key indexes that you need to support your query patterns.

Matt Kennedy (36:40): I'm going to leave this slide up for a second here so people can take some screenshots. I'm not going to completely go through this. But if you want to grab a quick screenshot to give yourself a cheat sheet on what SAI can do today, please do so. I will wait a second here while all folks do that. In general, this is our first pass at or first half of the implementation. We do have some additional features coming in the future. Things like geospatial support, which I'm very excited about. But for now this is your cheat sheet for what SAI can do.

Matt Kennedy (37:23): I want to talk a little bit about performance as well. So, the read performance is very much going to be driven by factors like how selective the index is being for a particular query and how much data you're getting back, that sort of thing. But we can speak in very specific terms about performance expectations of throughput and latency on the write path. So compared to the C2I mechanism, we have SAI showing a 43% better throughput for writes. And compared to DSE search, it's 86% better throughput on those writes. And then latency on the writes is 230% better for SAI compared to C2I, and 670% better compared to DSE search.

Matt Kennedy (38:21): So especially on the write path, these indexes are really, really efficient. They're very storage efficient. And having that additional efficiency on the write path, think of that as keeping operations shorter and therefore fewer operations in flight, and therefore basically, better memory management of inflight operations within the database itself. Which is a huge deal from the standpoint of overall predictability on the system.

Matt Kennedy (38:50): We would have some improvements in read latency, but as I said, again, that's going to be very, very dependent on the particulars of the query. So it's a little bit hard to measure and provide useful guidance on what the real improvement is. But the other aspect here is we also have generally better density support. So we can have more data in an index with SAI than we can with other mechanisms. There is some modest performance impact to writes and updates. But given the speed we're starting at with Cassandra writes, it's very, very reasonable and most use cases I would not imagine would be that terribly sensitive to it.

Matt Kennedy (39:33): So there are a lot of questions on SAI and I want to make sure that I have a chance to cover a bunch of them. And so I'm going to close out here with a little bit of a sneak peek about what's coming up in future webinars. So we run Astra on GKE and the GCP Cloud. We have a Cassandra operator that is open-source that we use to run Astra databases. Now, there is a little bit of lag between things that we develop for Astra going into the operator. But there's still an awful lot in that operator that was built from us already having experience with starting and creating and destroying thousands of clusters in Astra. And all of that is codified into lifecycle management and break fix routines in the operator that are definitely worth checking out.

Matt Kennedy (40:37): If you are looking for a quick start on GKE, have a Google for DataStax and Kong. We have an older blog post from a couple of months ago on getting up and running with the Kong API gateway on Astra, and we use GKE to run that Kong instance, and the blog post has all kinds of details on exactly what you would need to get up and running with GKE. So do check that out.

Audience Questions

Matt Kennedy (41:08): So that said, we're going to dive into some of the questions you've submitted. If you haven't had a chance to submit your questions yet, please feel free to do so now via the Q&A widget on the left side of your screen. I'm going to try to get through as many of them as I can. But let me take a look at which ones have come in now. I did try to answer some of them earlier, but as I was doing so, it looked like the Q&A widget caused my screen to freeze a little bit. So I stopped doing that, which means we have a whole bunch of questions backed up here. So I'm going to take these as I can.

Matt Kennedy (41:45): So which DSE version supports SAI? We have that available in 6.8.3 in DSE. If you have a 6.8 release, there is also an earlier version of SAI available in beta on earlier releases, but 6.8.3 is the GA release. We have a question that asks, is there any limitation on the number of SAIs we can create on a table? There is a guardrail on SAIs. And we have that set to, I believe, eight SAI indexes on a table at the moment. I'm very interested in hearing feedback on that as people start to get into using SAI and whether that is a sufficient guardrail. We have raised guardrails in the past when there was a need to, and that's speaking specifically towards Astra. So if you have guardrails enabled on your DSE system, they will be the same. But if you disable the guardrails on your DSE system, then you effectively have an [inaudible 00:42:58] SAI indexes.

Matt Kennedy (43:04): Let's see. Does adding many indexes slow down normal retrieves? So the answer there is no. If you are querying on the partition key or the primary key, you're not going to be affected on the read path there by any of the indexes. Let's see. Can Astra handle user-defined types? Yes, Astra can handle user-defined types. We do not yet have user-defined aggregates or user-defined functions in Astra, but we do support user-defined types.

Matt Kennedy (43:38): Looking at some of the other questions here, we had a question earlier, does the partner solution need an agreement with DataStax separately? And just to clarify, this one is not an SAI question. This is about the GCP Console. Does the partner solution need an agreement with DataStax separately, or as a GCP customer can we start using it directly and the charge goes to the standard billing account of the project?

Matt Kennedy (44:04): Yeah, so it's exactly as the latter half of that question stated. If you have a Google account and you have billing set up on your project, then you can find Astra directly under that partner solutions menu at the bottom of the left-hand nav menu. Click it and you will be taken to an offerings page, and that's where you have a chance to click through our terms of service. So you've got your Google Cloud terms of service, and then for that specific offering there's a separate terms of service. So that will be how you activate it for that specific project.

Matt Kennedy (44:41): Let's see. We also had a question, can you query using two indices that you have on a table? And the answer is yes. So if you had two columns indexed and you wanted to have a where clause that referenced both of those columns, then yes, you absolutely can do that with SAI.
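As an illustration of the answer above, here is a minimal sketch that builds the CQL for indexing two columns with SAI and then filtering on both in a single WHERE clause. The keyspace, table, column, and index names are hypothetical and do not come from the webcast. The script only prints the statements; running them would require a CQL shell or a driver session connected to an Astra or DSE 6.8.3+ database.

```python
# Hypothetical keyspace, table, and column names, used only for illustration.
KEYSPACE = "demo"
TABLE = "users"
INDEXED_COLUMNS = ["city", "status"]


def create_sai_statements(keyspace, table, columns):
    """Build one CREATE CUSTOM INDEX statement per column to be indexed."""
    return [
        f"CREATE CUSTOM INDEX IF NOT EXISTS {table}_{col}_sai "
        f"ON {keyspace}.{table} ({col}) USING 'StorageAttachedIndex';"
        for col in columns
    ]


def select_by_columns(keyspace, table, columns):
    """Build a SELECT whose WHERE clause references every indexed column."""
    where = " AND ".join(f"{col} = ?" for col in columns)
    return f"SELECT * FROM {keyspace}.{table} WHERE {where};"


ddl = create_sai_statements(KEYSPACE, TABLE, INDEXED_COLUMNS)
query = select_by_columns(KEYSPACE, TABLE, INDEXED_COLUMNS)

for statement in ddl:
    print(statement)
print(query)
```

The `?` placeholders are bind markers, so the same query text could be prepared once and executed with different values for each column.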

Matt Kennedy (45:03): There are a couple of questions about the low-level implementation details of SAI. We are going to be talking about how we go about open-sourcing SAI in the near future here. So I would definitely stay tuned to news on that. We'll be able to show everybody what that implementation looks like. It is a little bit beyond my ability to speak to, not being a database indexing engineer. So I hope you don't mind if I defer those questions to someone with more expertise than me to answer.

Matt Kennedy (45:44): When we create these SAI indexes, do we need to take disk space into consideration? Yes, to a degree. I mean, we are writing additional data, but this is a relatively efficient representation of that data. So you're going to be using less additional disk space for SAI than you would for, say, a materialized view, certainly, or a denormalized table. So you'd consider it a fairly space-efficient option here.

Matt Kennedy (46:18): Is SAI available on AWS and Azure? It is. So it's available on any Astra database. This one is a little bit of a repeat, but it's stated differently. So I think it's worth answering. Can you use multiple SAIs in a single query? And the answer is yes. So that's the same as, can I have an index on more than one column in my table and specify those both in the where clause? Yes, we can.

Matt Kennedy (46:42): We have a question on, what is the cost structure of Astra? So essentially Astra charges by units called capacity units. And we do that at basically per-minute billing resolution. So you'll see on the Astra page when you sign up that each of the offerings aside from the free tier has a price per hour. And that price per hour is the price per hour per capacity unit. And so if you were running two capacity units, you would have two times that price per hour. And there's no additional fee for any of the bandwidth charges or fees per operation or anything like that.
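The pricing model described above is simple arithmetic, sketched below. The hourly price is a made-up placeholder, not an actual Astra rate, and the sketch assumes the hourly rate is prorated by the minute and that nothing beyond capacity units is charged, as the answer suggests.

```python
def astra_compute_cost(price_per_cu_hour, capacity_units, minutes_running):
    """Return the charge for running `capacity_units` for `minutes_running`.

    Assumption: the listed hourly price applies per capacity unit and is
    prorated by the minute.
    """
    hours = minutes_running / 60
    return price_per_cu_hour * capacity_units * hours


# Placeholder price per capacity unit per hour; not a real Astra rate.
HYPOTHETICAL_PRICE = 1.00

one_cu = astra_compute_cost(HYPOTHETICAL_PRICE, 1, 60)   # one unit, one hour
two_cu = astra_compute_cost(HYPOTHETICAL_PRICE, 2, 60)   # two units, one hour
partial = astra_compute_cost(HYPOTHETICAL_PRICE, 2, 90)  # two units, 90 minutes

print(f"1 CU for 60 min: {one_cu:.2f}")
print(f"2 CU for 60 min: {two_cu:.2f}")
print(f"2 CU for 90 min: {partial:.2f}")
```

Doubling the capacity units doubles the charge for the same running time, which matches the "two times that price per hour" statement.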

Matt Kennedy (47:23): There is the ability to park databases. So what that allows you to do is just retain the storage. This is especially helpful in the case of, say, an early development project where you are using a database during the daytime and you want to save some money and turn it off at night. You can park that database and all the data stays there. It's also useful for testing scenarios where we have, let's say, an integration test that happens once a week but you don't want to necessarily have to reload all of the data every time. So you can use a parked database to do that. The parking is basically turning off the compute resource and keeping that storage around.

Matt Kennedy (48:11): Question about, is this integrated with Google Cloud? So as you saw on the earlier part of the demo, absolutely. We have a direct console integration with Google Cloud and the billing is integrated as well. So if you do create an Astra database through that GCP Console, the billing will go directly to your Google account.

Conclusion 

Matt Kennedy (48:33): And it looks like that's all the questions that we have time for. So with that, I want to thank you all for joining us today, and we would like to invite you to future DataStax events at DataStax.com/webinars, and be sure to sign up at DataStax.com for more information on Astra. Lastly, be sure to check out webinars two and three in this series; Ask the Cassandra Experts: Astra on GCP Webinar Series. Thank you very much for your time, everybody.

 
