Technology | August 28, 2018

Work with DSE Graph in Your Web Applications Part (1/2) : CRUD Operations


Graph databases are highly effective when working with highly connected data and deriving value from relationships, as we detailed in this previous blog post. This article focuses on integrating graph databases with web applications to implement CRUD operations, pattern detection, and visualization in the user interface. Part 1 covers environment setup and CRUD operations; Part 2 will dig into the user interface. Let's get our hands dirty.

Getting DataStax Enterprise Running

Start DSE Docker image

There are multiple ways to install DataStax Enterprise: from a tarball, the installer, or the OpsCenter Lifecycle Manager. There are also very convenient Docker images available on Docker Hub that let us run quick tests without installing anything; this is the approach we chose here. Let's run the image datastax/dse-server (defaulting to the latest version) and provide the -g option to enable the Graph workload we will work with today. Notice a couple of additional options: -s enables the Search workload, and the environment variable DS_LICENSE is required to accept the license terms.

docker run -e "DS_LICENSE=accept" -it -d -p 9042:9042 --name dse datastax/dse-server -s -g

Start DataStax Studio Docker image

In this blog post we will illustrate samples with screenshots from DataStax Studio. This user interface will help us create the schema and import our first data without coding or using the command line, which again is pretty convenient. As with DataStax Enterprise, you can download and install the tool, or you can simply run the datastax/dse-studio Docker image. Note that we link this Studio container to the DSE container by naming the DSE container with --name and using the --link option on the Studio container.

docker run -e "DS_LICENSE=accept" -it -d -p 9091:9091 --link dse:dse datastax/dse-studio


Please note that if you don't want to lose your data or notebooks, I recommend defining external volumes on each container. You can use a docker-compose.yaml file as shared by Jeff Carpenter in his blog post on Medium.com, which I reproduce here:

version: '2'
services:
  # DataStax Enterprise
  dse:
    image: datastax/dse-server:6.0.2
    command: [ "-s", "-g" ]
    ports:
    - "9042:9042" # cassandra
    environment:
      DS_LICENSE: accept
    volumes:
    - "./data:/var/lib/cassandra"
    # Allow DSE to lock memory with mlock
    cap_add:
    - IPC_LOCK
    ulimits:
      memlock: -1

  # One instance of DataStax Studio
  studio:
    image: datastax/dse-studio:6.0.0
    ports:
    # The Web UI exposed to our host
    - "9091:9091"
    depends_on:
    - dse
    environment:
      DS_LICENSE: accept
    volumes:
    - "./notebooks:/var/lib/datastax-studio"

To start the related containers, run:

docker-compose up -d


Once the containers are started, you should be able to access Studio at http://localhost:9091. You can set up a connection pointing to dse, which is the DSE container's hostname. If you run into trouble, check the official documentation for step-by-step help creating a connection and your first notebook.

Create the Graph

As with most content you will find on DataStax Academy, we will leverage the KillrVideo reference application and its recommendation engine. Users can upload, rate, and tag videos. We use the graph schema detailed in the following picture:

To create this schema, first create a notebook, then create a Gremlin code block and execute the following:

// Create Graph
system.graph("killrvideo_video_recommendations")
           .replication("{'class' : 'SimpleStrategy', 'replication_factor': '1' }")
           .ifNotExists().create();

// Create property keys
schema.propertyKey("tag").Text().ifNotExists().create();
schema.propertyKey("tagged_date").Timestamp().ifNotExists().create();
schema.propertyKey("userId").Uuid().ifNotExists().create();
schema.propertyKey("email").Text().ifNotExists().create();
schema.propertyKey("added_date").Timestamp().ifNotExists().create();
schema.propertyKey("videoId").Uuid().ifNotExists().create();
schema.propertyKey("name").Text().ifNotExists().create();
schema.propertyKey("description").Text().ifNotExists().create();
schema.propertyKey("preview_image_location").Text().ifNotExists().create();
schema.propertyKey("rating").Int().ifNotExists().create();

// Create vertex labels
schema.vertexLabel("user").partitionKey('userId').properties("userId", "email", "added_date").ifNotExists().create();
schema.vertexLabel("video").partitionKey('videoId').properties("videoId", "name", "description", "added_date", "preview_image_location").ifNotExists().create();
schema.vertexLabel("tag").partitionKey('name').properties("name", "tagged_date").ifNotExists().create();

// Create edge labels
schema.edgeLabel("rated").multiple().properties("rating").connection("user","video").ifNotExists().create();
schema.edgeLabel("uploaded").single().properties("added_date").connection("user","video").ifNotExists().create();
schema.edgeLabel("taggedWith").single().connection("video","tag").ifNotExists().create();

// Help development and allow scans
schema.config().option('graph.allow_scan').set('true')
schema.config().option('graph.schema_mode').set('Development')
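
Before moving on, you can double-check that everything was created as expected by printing the schema back from the same notebook:

```gremlin
// Print the whole graph schema: property keys, vertex labels, edge labels
schema.describe()
```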
 

Import Data

Create a new Gremlin code block in the notebook and execute the following script to populate the graph with sample data. Here we create two users. The first, user1, will upload a video Video1 tagged with tag1Video and tag2Video.

// Create 2 Users
user1 = graph.addVertex(T.label, 'user', 'userId', 'd0de3100-fc20-4079-bef5-2fb3b8ff51f3',
                 'email', 'user1@gmail.com',
                 'added_date', java.time.Instant.now());
user2 = graph.addVertex(T.label, 'user', 'userId', 'b1ff6d5f-80da-4ada-a852-165ce07e90d5',
                 'email', 'user2@gmail.com',
                 'added_date', java.time.Instant.now());

// user1 upload video1 with 2 tags tag1Video and tag2Video
def insertTimeVideo1 = java.time.Instant.now();
video1 = graph.addVertex(T.label, 'video', 'videoId', '6c30089c-25d2-434d-b685-f1b6073d8e16',
                   'name', 'Video1',  'added_date', insertTimeVideo1);
tag1video1 = graph.addVertex(T.label, 'tag', 'name', 'tag1Video', 'tagged_date', insertTimeVideo1);
tag2video1 = graph.addVertex(T.label, 'tag', 'name', 'tag2Video', 'tagged_date', insertTimeVideo1);
video1.addEdge('taggedWith', tag1video1);
video1.addEdge('taggedWith', tag2video1);
user1.addEdge('uploaded', video1);

user2.addEdge('rated', video1, 'rating', 4);
'Success'

We use a static UUID here for readability, but you can also generate a random UUID with def myUserId = UUID.randomUUID(); and substitute it wherever needed. Now add a new Gremlin code block and enter the following:

g.V().has('video', 'videoId', '6c30089c-25d2-434d-b685-f1b6073d8e16').bothE();

Now spot the graph icon in the result. Click it to display the results as a graph and you should see something like this:

Congratulations, you are now set up. You can play a bit more with the Gremlin language and see how the graph evolves as you make changes. Try the autocompletion mechanism in Studio to get some additional ideas on extending your query.
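
For example, here are two traversals to try against the sample data inserted above (the UUIDs are the static ones used earlier):

```gremlin
// Names of the videos uploaded by user1
g.V().has('user', 'userId', 'd0de3100-fc20-4079-bef5-2fb3b8ff51f3')
 .out('uploaded').values('name')

// Videos that user2 rated 4 or higher
g.V().has('user', 'userId', 'b1ff6d5f-80da-4ada-a852-165ce07e90d5')
 .outE('rated').has('rating', gte(4)).inV().values('name')
```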

DataStax Studio is a very convenient environment to learn and browse graph data, but what about implementing a real web application? Enough with the shenanigans, let's get serious.

CRUD Operations in web applications

Configuring your application

First, you need to add dependencies to DataStax drivers in the pom.xml file of your Java application:

<dependency>
 <groupId>com.datastax.dse</groupId>
 <artifactId>dse-java-driver-graph</artifactId>
 <version>1.6.8</version>
</dependency>

You can now create a JUnit test class that checks you are able to connect to DSE Graph as expected:

public class StandAloneGraphTest {
    
  DseSession dseSession;
    
  @Before
  public void createSession() {
    DseCluster.Builder clusterConfig = DseCluster.builder();
    clusterConfig.withPort(9042);
    clusterConfig.addContactPoint("localhost");
    GraphOptions graphOption = new GraphOptions();
    graphOption.setReadTimeoutMillis(100000);
    graphOption.setGraphName("killrvideo_video_recommendations");
    clusterConfig.withGraphOptions(graphOption);
    dseSession = clusterConfig.build().connect();
  }

@Test
  public void listAvailableGraphs() {
    dseSession.executeGraph(new SimpleGraphStatement("system.graphs()")
                 .setSystemQuery())
                 .all().stream().map(GraphNode::asString)
                 .forEach(System.out::println);
  }
    
  @After
  public void closeSession() {
    // Even if Cassandra sessions are stateless, the driver keeps sockets open that need to be closed
    dseSession.getCluster().close();
  }
}

You can immediately see here how to use the Java driver: convert a Gremlin query into a GraphStatement and execute it with executeGraph(). From here, let's create methods to populate the graph as we did before, initializing the DseSession object as in the previous test.

public UUID createUser(String email) {
  UUID userUuid = UUID.randomUUID();
  dseSession.executeGraph(
    DseGraph.statementFromTraversal(
      DseGraph.traversal(dseSession).addV("user")
        .property("userId", userUuid.toString())
        .property("email", email)
        .property("added_date", new Date())));
  return userUuid;
}

public UUID userUploadVideo(UUID userId, String videoName, String... tagNames) {
  UUID videoUuid = UUID.randomUUID();

// Batch operations
  TraversalBatch batch = DseGraph.batch();
  
  // Create Vertex Video
  batch.add(addV("video").property("videoId", videoUuid.toString())
    .property("name", videoName)
    .property("added_date", new Date()));
  
  // Create Edge 'uploaded' from User to Video
  batch.add(addE("uploaded")
    .from(DseGraph.traversal(dseSession).V().has("user", "userId", userId.toString()))
    .to(DseGraph.traversal(dseSession).V().has("video", "videoId", videoUuid.toString())));
  
  // Create Vertices Tag and Edges from Video to Tags
  for (String videoTag : tagNames) {
    batch.add(addV("tag").property("name", videoTag).property("tagged_date", new Date()));
    batch.add(addE("taggedWith")
      .from(DseGraph.traversal(dseSession).V().has("video", "videoId", videoUuid.toString()))
      .to(DseGraph.traversal(dseSession).V().has("tag", "name", videoTag)));
   }

  // Execute statements
  dseSession.executeGraph(batch.asGraphStatement());
  return videoUuid;
}

The same method can also be written as a single idempotent traversal, using the fold()/coalesce() upsert pattern instead of a batch:

GraphTraversal<Vertex, Vertex> traversal = DseGraph.traversal(dseSession)
  .V().has("video", "videoId", videoUuid.toString()).fold()
  .coalesce(__.unfold(),
            __.addV("video").property("videoId", videoUuid.toString())
              .property("name", videoName)
              .property("added_date", new Date()))
  .sideEffect(__.as("^video").coalesce(
            __.in("uploaded").hasLabel("user").has("userId", userId.toString()),
            __.V().has("user", "userId", userId.toString()).addE("uploaded").to("^video").inV()));
for (String videoTag : tagNames) {
  traversal.sideEffect(__.as("^video").coalesce(
            __.out("taggedWith").hasLabel("tag").has("name", videoTag),
            __.coalesce(__.V().has("tag", "name", videoTag),
                        __.addV("tag").property("name", videoTag)).addE("taggedWith")
              .from("^video").inV()));
}

Execute the query g.V() in Studio again and see your graph updated.

Written this way, all creation operations are upserts: re-running the traversal will not create duplicate vertices or edges.
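
You can experiment with the idiom directly in a Studio notebook; the tag name tag3Video below is just an illustrative value, and running the traversal twice still yields a single vertex:

```gremlin
// Reuse the 'tag3Video' vertex if it exists, create it otherwise
g.V().has('tag', 'name', 'tag3Video').fold()
 .coalesce(unfold(),
           addV('tag').property('name', 'tag3Video')
                      .property('tagged_date', java.time.Instant.now()))
```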

We will now use the driver to implement several additional operations to work with the User plain old java object :

public class User {
    private UUID uuid;
    private String email;
    private Date addedDate;
 
   // Constructors...
   // Getters and Setters...
}

We want to find a user by id (if it exists):

public Optional<User> findUserById(UUID uuid) {
  GraphResultSet gras = dseSession.executeGraph(
    DseGraph.statementFromTraversal(
      DseGraph.traversal(dseSession).V().hasLabel("user").has("userId", uuid.toString())));
  if (!gras.isExhausted()) {
    GraphNode record = gras.one();
    com.datastax.driver.dse.graph.Vertex userVertex = record.asVertex();
    String userEmail = userVertex.getProperty("email").getValue().asString();
    Date userDate    = Date.from(userVertex.getProperty("added_date").getValue().as(Instant.class));
    return Optional.of(new User(uuid, userEmail, userDate));
  }
  return Optional.empty();
}

Next, delete a user by id (if it exists). It is important to note that the edges connecting this vertex to others will also be dropped. More information is available in the documentation.

public boolean deleteUserById(UUID uuid) {
  if (findUserById(uuid).isPresent()) {
    dseSession.executeGraph(
      DseGraph.statementFromTraversal(
        DseGraph.traversal(dseSession).V().hasLabel("user").has("userId", uuid.toString()).drop()));
    return true;
  }
  return false;
}
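
A quick way to see the cascade in action is to count the edges around Video1 in Studio before and after deleting one of the users; the count decreases because the 'rated' or 'uploaded' edge is dropped along with its user vertex:

```gremlin
// Count all edges attached to Video1 (sample UUID from the import step)
g.V().has('video', 'videoId', '6c30089c-25d2-434d-b685-f1b6073d8e16').bothE().count()
```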

List the names of the videos uploaded by a single user:

public Set<String> findListOfVideoUploadedByUser(UUID uuid) {
  GraphResultSet res = dseSession.executeGraph(
    DseGraph.statementFromTraversal(
      DseGraph.traversal(dseSession)
        .V().hasLabel("user").has("userId", uuid.toString())
        .out("uploaded").values("name")));
  if (!res.isExhausted()) {
    // all() fetches everything; be sure you don't need pagination here
    return res.all().stream().map(GraphNode::asString).collect(Collectors.toSet());
  }
  return new HashSet<>();
}

Please note that, as with Cassandra, selecting all records is bad practice: it requires a full scan, which is not what we want with big graphs and real-time queries.
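
When you do need to browse data interactively, bound the traversal explicitly; limit() is the simplest guard:

```gremlin
// Fetch at most 20 video names instead of walking the whole graph
g.V().hasLabel('video').limit(20).values('name')
```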

Conclusion and Takeaways

Getting started with DSE Graph is super easy using Docker. DataStax provides not only the database runtime but also DataStax Studio, a powerful notebook-based user interface for working with data in Gremlin and CQL.

Working with Graph in Java applications is easy: simply import the driver and start executing Gremlin statements exactly the same way as Cassandra queries. The driver also provides a fluent API, with IDE autocompletion, to help build complex queries.

That's it for Part 1. In Part 2 we will build on the work done here to visualize the very same graph in your own web application user interface. You can download the source code presented here from GitHub.
