Technology, March 25, 2013

How to Create a DataStax Enterprise Cluster on Amazon EC2 in Ten Minutes

Robin Schumacher

Let me show you how to set up a new DataStax Enterprise cluster – consisting of Apache Cassandra, Hadoop, and Solr – on Amazon’s cloud in about ten minutes using DataStax OpsCenter.

Step 1 – Install DataStax OpsCenter Enterprise on any existing EC2 machine

The first thing you’ll want to do is download and install DataStax OpsCenter Enterprise (which is part of DataStax Enterprise Edition) on an already running Amazon machine instance.

Important: Make sure OpsCenter’s interface parameter is set properly in its config file, as shown in the install instructions, and that the machine’s firewall is configured so OpsCenter can operate and talk to the other Amazon instances you’ll spin up. If you don’t do this, you likely won’t be able to connect to OpsCenter, and no database cluster will be able to connect to it either.
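As a quick sanity check on that interface setting, here is a minimal Python sketch that parses a config fragment and warns if the web server would only listen on the loopback address. The `[webserver]` section with `interface` and `port` keys, and the values shown, are assumptions based on the commonly documented opscenterd.conf layout; confirm them against the install instructions for your OpsCenter version. The helper names are hypothetical.

```python
import configparser

# Hypothetical sample of an opscenterd.conf fragment. The section and key
# names are assumptions; verify them against your version's install docs.
SAMPLE_CONF = """
[webserver]
port = 8888
interface = 0.0.0.0
"""


def read_webserver_settings(conf_text):
    """Return (interface, port) from the [webserver] section of conf_text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    interface = parser.get("webserver", "interface")
    port = parser.getint("webserver", "port")
    return interface, port


def is_loopback_only(interface):
    """True if the address only accepts connections from the same machine."""
    return interface in ("127.0.0.1", "localhost", "::1")


if __name__ == "__main__":
    interface, port = read_webserver_settings(SAMPLE_CONF)
    print(f"OpsCenter web server binds to {interface}:{port}")
    if is_loopback_only(interface):
        print("Warning: other EC2 instances will not be able to reach OpsCenter.")
```

To check a real installation, read the text of your own config file and pass it to `read_webserver_settings` instead of the sample string.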

Barring any odd issues, once your EC2 firewall is configured correctly, installing OpsCenter and getting it up and running should take about five minutes from start to finish.
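If you want to confirm that the firewall really does let traffic through before moving on, a small TCP probe like the sketch below can help. It only uses Python’s standard socket module. The port number 8888 is an assumption (a commonly used OpsCenter web port), and the host shown is a placeholder you would replace with your own instance’s address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values: replace with your OpsCenter instance's address and
    # the web port configured for your installation (8888 is an assumption).
    opscenter_host = "127.0.0.1"
    opscenter_port = 8888
    if port_is_open(opscenter_host, opscenter_port):
        print("OpsCenter web port is reachable.")
    else:
        print("Cannot reach OpsCenter; check the interface setting and firewall.")
```

Run it from a different EC2 instance than the one hosting OpsCenter, since a probe from the same machine says nothing about the firewall rules.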

Step 2 – Use OpsCenter’s Create Cluster option to create a new database cluster

Once you have OpsCenter installed and running in your web browser, choose the Create Cluster option that’s in the upper right portion of the screen. You’ll be presented with a dialog like this:

[Screenshot: Build DSE Cluster dialog]

In this example, I’m asking OpsCenter to create a new six-node cluster that contains Cassandra, Hadoop, and Solr, using version 3.0 of DataStax Enterprise. You’ll need to enter your DataStax login information (emailed to you when you register to download DataStax Enterprise) as well as your Amazon credentials.

Make sure you select the Amazon availability zone you’d like your new cluster to be in, along with its size. You can keep the other defaults as they are.

Step 3 – Click the Build Cluster Button

With everything entered as described above, click the Build Cluster button. OpsCenter will then create your new cluster and automatically carry out the following actions for you:

  • Spins up new Amazon instances for the number of nodes you’ve requested.
  • Downloads the version of DataStax software you’ve requested onto all the nodes.
  • Installs DataStax Enterprise.
  • Configures each node as Cassandra, Hadoop, or Solr.
  • Configures each machine’s security and firewall settings so it can communicate with the other nodes.
  • Correctly calculates and applies the data distribution parameters (e.g. token assignments) to all nodes.
  • Installs and configures OpsCenter agents on all nodes so they can be managed and monitored.
  • Registers the new cluster with OpsCenter for management.
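To give a feel for what the token assignment step in the list above involves, here is a minimal Python sketch of the classic evenly spaced token calculation. It assumes Cassandra’s RandomPartitioner (token range 0 to 2^127) and a single group of nodes; how OpsCenter actually computes tokens, particularly for a cluster that mixes Cassandra, Hadoop, and Solr nodes, is not described here and may well differ. The function name is hypothetical.

```python
def random_partitioner_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes.

    Assumes the RandomPartitioner token range of 0 to 2**127 and one
    group of nodes sharing a single ring.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Node i owns the token at i/node_count of the way around the ring.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Six nodes, matching the example cluster size used in this post.
    for index, token in enumerate(random_partitioner_tokens(6)):
        print(f"node {index}: initial_token = {token}")
```

Spacing the tokens evenly gives each node an equal share of the ring, which is the point of having the tool do this for you rather than working it out by hand for every node.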

OpsCenter keeps you informed of its progress along the way:

[Screenshots: DSE cluster build start; DSE cluster install start; DSE install connected, no IPs; DSE install finished with ring]

Keep in mind that the actual time it takes to perform the above functions can differ depending on a number of factors (e.g. new instance spin-up time, download speed, etc.), but in general, we’ve seen clusters of ten or more nodes commissioned and ready for action in Amazon’s West Coast availability zones in about five minutes.

Next Steps

To start building new DataStax Enterprise clusters on Amazon, download OpsCenter Enterprise today onto an existing instance you have running on Amazon. And don’t forget that you can also use OpsCenter to create database clusters in your own data centers as well as the cloud.

You can also use our DataStax AMI to create EC2 database clusters outside of OpsCenter if you’d like, or use other command-line methods for creating clusters using our other install packages.

For detailed install and configuration instructions, please see our online documentation.
