Company · March 17, 2022

Easily Manage Workflows at Scale with Temporal.io and Astra DB

This is the first in our three-part series on how to connect Astra DB, a managed cloud-native database, with open-source Temporal.io so you can more easily manage even your heaviest workloads with the high performance and reliability of Apache Cassandra®.
Melissa Herrera, Lead Solutions Engineer
Ranjan Melanta, Data Architect

Managing long-running workflows has never been easier with open-source Temporal.io as your microservices platform. Temporal’s quick-start installation makes getting started a simple plug-and-play exercise for anyone who wants to use it with their own workflows.

Currently, Temporal provides different configurations using different databases and dependencies, including Apache Cassandra. DataStax Astra DB, which is built on Cassandra, allows you to add a managed, cloud-native database backing your Temporal platform that will support your heavy workloads while providing high performance and reliability.

In this first post, we introduce you to Temporal and Astra DB and illustrate the advantages of using these two powerful technologies together to manage your workflows reliably and with less effort.

What is a workflow?

Workflows help developers build, run, and scale background jobs that have parallel or sequential steps. You can think of a workflow as a fully managed state tracker and task coordinator. Let’s say a task in your application takes more than a few hundred milliseconds to complete. How do you track the application state? How do you recover or retry if the task fails? This is where workflow orchestration can help you. These long-running, reliable, fault-tolerant tasks are sometimes referred to as reentrant processes (more on that later).
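To make the "state tracker and task coordinator" idea concrete, here is a minimal sketch in plain Python of what you would otherwise write by hand: a step runner that records every attempt and retries on failure. It does not use the Temporal SDK. The names `StepRecord`, `WorkflowState`, `run_step`, and `flaky_charge` are hypothetical and exist only for this illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepRecord:
    name: str
    attempts: int = 0
    status: str = "pending"  # becomes "completed" or "failed"


@dataclass
class WorkflowState:
    steps: List[StepRecord] = field(default_factory=list)


def run_step(state: WorkflowState, name: str, task: Callable[[], str],
             max_attempts: int = 3, backoff_seconds: float = 0.0) -> str:
    """Run one task, recording each attempt in the workflow state."""
    record = StepRecord(name=name)
    state.steps.append(record)
    for attempt in range(1, max_attempts + 1):
        record.attempts = attempt
        try:
            result = task()
        except Exception:
            if attempt == max_attempts:
                record.status = "failed"
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(backoff_seconds * attempt)
        else:
            record.status = "completed"
            return result
    raise ValueError("max_attempts must be at least 1")


# Hypothetical flaky task: fails twice, then succeeds.
calls = {"count": 0}


def flaky_charge() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("payment gateway timed out")
    return "charged"


state = WorkflowState()
result = run_step(state, "charge_card", flaky_charge)
print(result, state.steps[0])
```

This state lives only in memory, so it is lost if the process dies. A workflow engine closes that gap by persisting the state durably.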

Workflow orchestration is most often used to coordinate long-running provisioning, monitoring, and management operations. Think about a cloud service’s control plane, which is required to be operational more than 99% of the time. Workflows are also used for business transactions, business process applications, and managing software and cloud infrastructure.

What is Temporal?

Temporal is an open source, distributed and scalable workflow orchestration engine capable of running millions of workflows. Workflows can hold state and describe which activities (workflow tasks) should be carried out.

Figure 1. Illustration of a Temporal Cluster, which consists of four independently scalable services. (Source: Temporal.io)

The Temporal “system” consists of a Temporal Server (either the Temporal Cloud service or self-hosted) orchestrating work with a fleet of Temporal workers (operated by application developers), and Temporal Clients (embedded in applications) over gRPC.

These workflows are carried out by the Temporal Server, which consists of four independently scalable services:

  • Frontend gateway (rate limiting, routing, authorizing)
  • History subsystem to maintain data (mutable state, queues, and timers)
  • Matching subsystem to host task queues for dispatching
  • Worker service to handle the internal background workflows

Temporal Clients embedded within your app (route handlers or serverless functions) can start, cancel, signal, and query individual workflow executions. Activities are distributed using task queues and executed on worker nodes organized in a cluster.
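The dispatch half of that description (activities handed out through a task queue and picked up by workers) can be sketched with Python's standard `queue` and `threading` modules. This is an in-process analogy, not Temporal's implementation or SDK. The `send_email` activity, the `ACTIVITIES` registry, and the example addresses are assumptions made for illustration.

```python
import queue
import threading

task_queue = queue.Queue()       # stands in for a Temporal task queue
results = {}                     # task_id -> activity result
results_lock = threading.Lock()


def send_email(address: str) -> str:
    """Hypothetical activity; a real one would call an email service."""
    return f"sent to {address}"


# Registry of activities that a worker knows how to execute.
ACTIVITIES = {"send_email": send_email}


def worker() -> None:
    """Poll the queue and execute tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            return
        task_id, activity_name, argument = task
        outcome = ACTIVITIES[activity_name](argument)
        with results_lock:
            results[task_id] = outcome
        task_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for thread in workers:
    thread.start()

# The "client" side: enqueue work without knowing which worker will run it.
addresses = ["a@example.com", "b@example.com", "c@example.com"]
for task_id, address in enumerate(addresses):
    task_queue.put((task_id, "send_email", address))

# One sentinel per worker so that every thread shuts down cleanly.
for _ in workers:
    task_queue.put(None)

task_queue.join()
for thread in workers:
    thread.join()

print(results)
```

The property to notice is the decoupling: the code that enqueues a task never addresses a specific worker, so workers can be added or removed independently.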

A Temporal cluster is a Temporal Server paired with a persistence layer (i.e. the data access layer). The workflow data is stored in its respective backend depending on what you choose to configure Temporal with. Supported databases include PostgreSQL, MySQL, and Cassandra.

A Temporal Application is a set of workflow executions. 

A Temporal Workflow Execution is a Reentrant Process. A Reentrant Process is resumable, recoverable, and reactive.

  • Resumable: Ability of a process to continue execution after execution was suspended on an awaitable.
  • Recoverable: Ability of a process to continue execution after execution was suspended on a failure.
  • Reactive: Ability of a process to react to external events.
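One way to picture these three properties is a workflow function that is re-run from the top against an append-only history: steps that were already recorded are replayed instead of executed again. The sketch below is a toy model in plain Python, not Temporal's engine or SDK. `Context`, `Suspended`, `order_workflow`, and the two activities are hypothetical names.

```python
class Suspended(Exception):
    """Raised when the workflow must wait for an external event."""


class Context:
    def __init__(self, history, signals):
        self.history = history    # append-only list of (kind, name, value)
        self.signals = signals    # external signals that have arrived
        self.position = 0         # replay cursor into the history

    def _replay_or(self, kind, name, produce):
        if self.position < len(self.history):
            recorded_kind, recorded_name, value = self.history[self.position]
            if (recorded_kind, recorded_name) != (kind, name):
                raise RuntimeError("workflow code diverged from its history")
        else:
            value = produce()     # may raise; nothing is recorded in that case
            self.history.append((kind, name, value))
        self.position += 1
        return value

    def activity(self, name, fn, *args):
        return self._replay_or("activity", name, lambda: fn(*args))

    def wait_for_signal(self, name):
        def take():
            if name not in self.signals:
                raise Suspended(name)
            return self.signals.pop(name)
        return self._replay_or("signal", name, take)


executions = []  # tracks which activities really ran


def reserve_stock(order_id):
    executions.append("reserve_stock")
    return f"reservation-{order_id}"


def ship(order_id, reservation, approver):
    executions.append("ship")
    return f"shipped {order_id} ({reservation}, approved by {approver})"


def order_workflow(ctx, order_id):
    reservation = ctx.activity("reserve_stock", reserve_stock, order_id)
    approver = ctx.wait_for_signal("approval")            # reactive
    return ctx.activity("ship", ship, order_id, reservation, approver)


def run(history, signals, order_id):
    try:
        return order_workflow(Context(history, signals), order_id)
    except Suspended:
        return None  # progress so far is preserved in the history


history = []
first = run(history, {}, "o-1")                       # suspends awaiting approval
second = run(history, {"approval": "alice"}, "o-1")   # resumes from the history
print(first, second, executions)
```

On the second run, `reserve_stock` is not executed again because its result is read back from the history. Recovery after a failed activity works the same way: the exception leaves the history untouched, and a later run picks up from the last recorded step.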

One benefit of the Temporal system is that it abstracts away the complexity of a distributed system. Distributed systems scale computation across multiple machines and absorb changes in load. In theory, this makes for a reliable and highly performant application.

However, any failure that leaves a downstream part of the application waiting for a response can make things very complicated, especially at a large scale. With Temporal, application developers don’t have to worry about handling these failures because the engine handles them.

Temporal Persistence Layer

Temporal provides schemas for Cassandra, MySQL, and PostgreSQL, any of which can be used as the server’s database. The database stores the following types of data:

  • Tasks to be dispatched
  • The state of Workflow Executions, including a capture of their mutable state
  • Event History, which provides an append-only log of Workflow Execution History Events
  • Namespace metadata for each Namespace in the Cluster
  • Visibility data, which enables operations like “show all running Workflow Executions”
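To show how an append-only Event History and a visibility query might look in storage, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are assumptions made for this example and are not Temporal's actual schema, which ships with Temporal itself. SQLite stands in here for whichever backend you configure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical tables for illustration only.
    CREATE TABLE executions (
        workflow_id TEXT PRIMARY KEY,
        namespace   TEXT NOT NULL,
        status      TEXT NOT NULL
    );
    CREATE TABLE history_events (
        workflow_id TEXT    NOT NULL,
        event_id    INTEGER NOT NULL,
        event_type  TEXT    NOT NULL,
        PRIMARY KEY (workflow_id, event_id)
    );
""")


def append_event(workflow_id: str, event_type: str) -> int:
    """Append the next event; existing events are never updated or deleted."""
    (next_id,) = db.execute(
        "SELECT COALESCE(MAX(event_id), 0) + 1 FROM history_events "
        "WHERE workflow_id = ?",
        (workflow_id,),
    ).fetchone()
    db.execute(
        "INSERT INTO history_events VALUES (?, ?, ?)",
        (workflow_id, next_id, event_type),
    )
    return next_id


def start_workflow(workflow_id: str, namespace: str) -> None:
    db.execute(
        "INSERT INTO executions VALUES (?, ?, 'Running')",
        (workflow_id, namespace),
    )
    append_event(workflow_id, "WorkflowExecutionStarted")


def complete_workflow(workflow_id: str) -> None:
    db.execute(
        "UPDATE executions SET status = 'Completed' WHERE workflow_id = ?",
        (workflow_id,),
    )
    append_event(workflow_id, "WorkflowExecutionCompleted")


start_workflow("wf-1", "default")
start_workflow("wf-2", "default")
complete_workflow("wf-1")

# Visibility-style query: show all running Workflow Executions.
running = [row[0] for row in db.execute(
    "SELECT workflow_id FROM executions WHERE status = 'Running' "
    "ORDER BY workflow_id"
)]
wf1_events = db.execute(
    "SELECT event_id, event_type FROM history_events "
    "WHERE workflow_id = 'wf-1' ORDER BY event_id"
).fetchall()
print(running, wf1_events)
```

The mutable row in `executions` answers "what is the status now?", while `history_events` preserves how the execution got there.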

Figure 2. Parts of a Temporal Cluster stored in a database (Source: Temporal.io).

Why Cassandra and Temporal work so well together

Temporal is write-intensive, and Cassandra excels at handling write-heavy workloads. For this and other reasons, Cassandra is often employed to handle potentially unbounded volumes of machine-generated data. When Temporal is configured to use an Apache Cassandra database, your Temporal deployment can handle a massive amount of data and gain other benefits, such as:

  • Continuous availability
  • High-performance, low-latency writes and reads (assuming a proper data model)
  • Linear scalability
  • Global, multi-region replication
  • Improved consistency and integrity (via internal usage of Cassandra lightweight transactions, or LWTs)
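The last bullet is worth a short illustration. A lightweight transaction is a conditional write: in CQL it is expressed with an `IF` clause, such as `INSERT ... IF NOT EXISTS` or `UPDATE ... IF column = value`, and the write is applied only when the condition holds. The sketch below imitates those semantics with an in-memory Python class. It is an analogy under stated assumptions, not a Cassandra client and not a description of how Temporal structures its LWTs. `ConditionalStore` and the version column are hypothetical.

```python
import threading


class ConditionalStore:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self):
        self._rows = {}              # key -> (version, value)
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value) -> bool:
        # Analogous to: INSERT ... IF NOT EXISTS
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = (1, value)
            return True

    def update_if_version(self, key, expected_version, value) -> bool:
        # Analogous to: UPDATE ... SET ... WHERE ... IF version = ?
        with self._lock:
            current = self._rows.get(key)
            if current is None or current[0] != expected_version:
                return False         # condition failed; nothing is written
            self._rows[key] = (expected_version + 1, value)
            return True

    def read(self, key):
        with self._lock:
            return self._rows.get(key)


store = ConditionalStore()
created = store.insert_if_not_exists("wf-1", "Running")
duplicate = store.insert_if_not_exists("wf-1", "Running")

# Two writers both read version 1, then race to update the row.
first_writer = store.update_if_version("wf-1", 1, "Completed")
stale_writer = store.update_if_version("wf-1", 1, "Failed")

print(created, duplicate, first_writer, stale_writer, store.read("wf-1"))
```

The stale writer is rejected rather than silently overwriting the row, which is the kind of guarantee a workflow engine needs when several services may update the same execution state.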

When you combine Temporal with Astra DB, you get all the benefits of Cassandra with the simplicity of a managed cloud service (DBaaS) that offers, among many other things: 

  • Multi-cloud deployment 
  • Serverless pricing and autoscaling
  • Relational-style secondary indexes through Storage-Attached Indexing (SAI)
  • Global, multi-region active/active deployment
  • End-to-end security

Astra DB simplifies cloud-native Cassandra application development

Cassandra is the open-source NoSQL database behind some of the largest applications in the world, including Netflix and Instagram. Astra DB is built on Cassandra to simplify cloud-native Cassandra application development. Using Astra DB takes care of the operational burden often associated with using a powerhouse database like Cassandra, while reducing deployment time from weeks to minutes.

Astra DB creates the Cassandra database for you. 

Ready to dig in?

Astra DB makes getting the benefits of Cassandra and Temporal easy. In our next post in this series, we’ll show you how you can connect Temporal with Astra DB in just five easy steps:

  1. Astra DB prerequisites
  2. Temporal pre-setup
  3. Temporal schema migration to Astra DB
  4. Run Docker Compose 
  5. Test and validate

You don’t have to wait until our next post to get started, though. Sign up for your free Astra DB account now and start exploring how it can benefit your application development.

Follow the DataStax Tech Blog for more developer stories. Check out our YouTube channel for free tutorials and DataStax Developers on Twitter for the latest news in our developer community.
