Five Steps to an Awesome Data Model
This is an excerpt from the DataStax whitepaper "Data Modeling in Apache Cassandra™," which explains how to choose the right data model for your Apache Cassandra™ application in five easy steps. Click here to download the full whitepaper.
Step 1: Build the application workflow
When building applications using relational databases, developers often start with the data model, thinking about the data items that need to be stored and how they relate to one another. With Cassandra, just the opposite is recommended. The best practice is to start with the application workflow, an approach referred to as “query-first design.”
Before thinking about how data will be stored, designers need to know what types of queries the database will need to support. Figure 3 presents a simplified application workflow for KillrVideo.com.
Figure 3 – Simplified application workflow
The sequence of workflow steps matters because it helps us determine what data is available and required for each query. For example, before we can show basic information about a user (step 2 above), a userid is required. The user first needs to log in to the site (step 1), supplying an email address and password in exchange for the required userid. A userid might also be obtained another way: by finding a video (step 6 or 7), showing the comments for that video (step 9), and then looking up details about the user who commented. Similarly, before the application can display details about a video (step 8), it needs a videoid, obtained by selecting from a list of the latest videos (step 7) or by searching videos by tag (step 6).
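The dependency between steps can be made concrete with a small sketch. The Python below is an illustration only, not part of the whitepaper: it covers just the workflow steps named above, and the class name, function names, and the exact inputs and outputs assigned to each step are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowQuery:
    step: int            # step number from the workflow figure
    name: str
    needs: frozenset     # identifiers that must already be known
    yields: frozenset    # identifiers the result makes available


# Only the steps discussed in the text; inputs and outputs are assumed.
QUERIES = [
    WorkflowQuery(1, "log in", frozenset({"email", "password"}), frozenset({"userid"})),
    WorkflowQuery(2, "show basic user information", frozenset({"userid"}), frozenset()),
    WorkflowQuery(6, "search videos by tag", frozenset({"tag"}), frozenset({"videoid"})),
    WorkflowQuery(7, "list latest videos", frozenset(), frozenset({"videoid"})),
    WorkflowQuery(8, "show video details", frozenset({"videoid"}), frozenset()),
    WorkflowQuery(9, "show comments for a video", frozenset({"videoid"}), frozenset({"userid"})),
]


def runnable_steps(known):
    """Return the step numbers whose required identifiers are all known."""
    return [q.step for q in QUERIES if q.needs <= known]


def reachable_identifiers(known):
    """Repeatedly run every runnable query and collect what it yields."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for q in QUERIES:
            if q.needs <= known and not q.yields <= known:
                known |= q.yields
                changed = True
    return known
```

With nothing known, only the list of latest videos (step 7) can run. Following it with the comments query (step 9) makes a userid available, which in turn unlocks the user information query (step 2).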
Step 2: Model the queries required by the application
Even at the design stage, developers can think through the sequence of tasks required, mock up what each screen will look like, and decide what data will be required at each stage.
Figure 4 shows a simplified entity relationship diagram (ERD) for the KillrVideo application. The application needs to be able to keep track of entities such as users, videos, and comments. Users can perform activities such as adding videos, rating videos, and posting comments. Users can comment on multiple videos, and each video can have multiple user comments associated, but there is only one owner of each video.
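As a rough sketch of these entities and relationships, the Python below models users, videos, and comments as plain data classes. The field names (such as `owner_userid` and `text`) and the helper functions are hypothetical and chosen for illustration; they are not taken from the KillrVideo schema.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class User:
    userid: UUID
    email: str


@dataclass(frozen=True)
class Video:
    videoid: UUID
    owner_userid: UUID   # a single field: each video has exactly one owner
    name: str


@dataclass(frozen=True)
class Comment:
    videoid: UUID        # the video being commented on
    userid: UUID         # the user who posted the comment
    text: str


def comments_for_video(comments, videoid):
    """All comments posted on one video (a video can have many)."""
    return [c for c in comments if c.videoid == videoid]


def comments_by_user(comments, userid):
    """All comments posted by one user (a user can comment on many videos)."""
    return [c for c in comments if c.userid == userid]


# Example data (made up for the sketch)
alice = User(uuid4(), "alice@example.com")
bob = User(uuid4(), "bob@example.com")
video = Video(uuid4(), owner_userid=alice.userid, name="Intro to Cassandra")
comments = [
    Comment(video.videoid, bob.userid, "Nice video"),
    Comment(video.videoid, alice.userid, "Thanks"),
]
```

The comment entity carries both a videoid and a userid, which is what lets it express the many-to-many relationship between users and videos, while the single owner field on a video captures the one-owner rule.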
Figure 4 – Entity relationship diagram (ERD) for KillrVideo
It’s a good idea to iterate between the application workflow and ERD, updating both as new data items and relationships required by the application are identified. Once developers have a clear idea of the application workflow and the key data objects required, it’s possible to start identifying the queries that the application needs to support. A diagram showing key queries and how they relate to data domains is shown in Figure 5.
Figure 5 – Identify the queries required to support the application workflow
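One lightweight way to record the outcome of this step is a catalog of queries grouped by data domain. The sketch below is an assumption-laden illustration: it lists only queries implied by the workflow steps discussed in this excerpt, not the full set in the figure, and the domain names and `lookup_by` fields are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class AccessPattern(NamedTuple):
    domain: str        # data domain the query belongs to (assumed names)
    description: str
    lookup_by: tuple   # values the application must supply to run the query


ACCESS_PATTERNS = [
    AccessPattern("users", "find a user by email address (login)", ("email",)),
    AccessPattern("users", "find a user by id", ("userid",)),
    AccessPattern("videos", "find videos by tag", ("tag",)),
    AccessPattern("videos", "find the latest videos", ()),
    AccessPattern("videos", "find a video by id", ("videoid",)),
    AccessPattern("comments", "find comments for a video", ("videoid",)),
]


def by_domain(patterns):
    """Group access patterns by the data domain they belong to."""
    grouped = defaultdict(list)
    for pattern in patterns:
        grouped[pattern.domain].append(pattern)
    return dict(grouped)
```

Writing the queries down this way keeps the focus on what the application asks for and which values it has in hand when it asks, which is the information query-first design starts from.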
Thanks for reading this excerpt from the DataStax whitepaper "Data Modeling in Apache Cassandra™." Tune in next week when we release another excerpt, or click here to download the full whitepaper.