Technology - July 17, 2019

Using the Object Mapper with Scala

DataStax's Java driver for Apache Cassandra and DSE has a very useful feature called the Object Mapper that greatly simplifies writing code that accesses the database. It allows you to map Cassandra tables directly to your business objects, so you don't need to write numerous calls like getInt, getString, etc. to extract data from returned rows and explicitly convert that data into your classes.

The approach is simple (the full code for the Java examples is here):

  • you annotate your POJO class(es) with annotations provided by the driver (you need to add an extra dependency, as this functionality isn't included in the core module);
  • obtain an instance of the Mapper class from MappingManager; this collects the annotations from your POJO and generates prepared queries for select/insert/delete operations;
  • access your data by calling get on the Mapper instance, use save to insert/update data, or use delete to perform a delete operation.

Let's say that our table looks like the following (and we insert some test data into it):

create table test.scala_test (  
   id int primary key,  
   t text,  
   tm timestamp);
insert into test.scala_test(id,t,tm) values (1,'t1','2018-11-07T00:00:00Z');

Then the Java implementation may look like the following:

// POJO definition...
@Table(name = "scala_test", keyspace = "test")
public class TableObjJava {
    @PartitionKey
    int id = 0;
    String t = "";
    Date tm = new Date();

    // getters & setters are omitted
}

// somewhere in the code
// ...
MappingManager manager = new MappingManager(session);
Mapper<TableObjJava> mapper = manager.mapper(TableObjJava.class);

TableObjJava obj = mapper.get(1);
System.out.println("Obj(1)=" + obj);

There is also support for executing "custom" queries, for cases when you need to retrieve a set of objects, or when the Mapper approach isn't flexible enough. This is done by:

  • declaring a Java interface and annotating it with @Accessor;
  • declaring function(s) inside that interface, annotated with @Query together with the custom query that you want to execute;
  • obtaining an instance of the accessor from MappingManager and calling the declared function(s).

Let's look at the following table with a more complex structure: its partition key consists of 2 columns, and it has 2 additional clustering columns:

create table test.scala_test_complex (  
    p1 int,  
    p2 int,  
    c1 int,  
    c2 int,  
    t text,  
    tm timestamp,  
    primary key ((p1,p2), c1, c2));
insert into test.scala_test_complex(p1, p2, c1, c2, t,tm)  
    values (0,1,0,1,'t1','2018-11-07T00:00:00Z') ;
insert into test.scala_test_complex(p1, p2, c1, c2, t,tm)  
    values (0,1,1,1,'t1','2018-11-08T10:00:00Z') ;

We declare the POJO and the accessor (note that we don't put annotations on the fields in this case):

// POJO definition
@Table(name = "scala_test_complex", keyspace = "test")
public class TableObjectClustered {
    int p1 = 0;
    int p2 = 0;
    int c1 = 0;
    int c2 = 0;
    String t = "";
    Date tm = new Date();

    // the mapper needs a no-argument constructor to instantiate the class
    public TableObjectClustered() {
    }
    // getters/setters/...
}

// Accessor definition
@Accessor
public interface TableObjAccessor {
    @Query("SELECT * from test.scala_test_complex where p1 = :p1 and p2 = :p2")
    Result<TableObjectClustered> getByPartKey(@Param int p1, @Param int p2);

    @Query("DELETE from test.scala_test_complex where p1 = :p1 and p2 = :p2")
    void deleteByPartKey(@Param int p1, @Param int p2);
}

And we can retrieve or delete data by partition using the following code:

MappingManager manager = new MappingManager(session);
TableObjAccessor accessor = manager.createAccessor(TableObjAccessor.class);
Result<TableObjectClustered> objs = accessor.getByPartKey(0, 1);
for (TableObjectClustered obj : objs) {
    System.out.println("Obj=" + obj);
}
accessor.deleteByPartKey(0, 0);

Version 3.x of the DataStax C* Java driver (and 1.x of the DSE Java driver) processes annotations at runtime. A new version of the DataStax C* Java driver, 4.1, is also available (with the corresponding DSE Java driver 2.1). It includes a completely new implementation of the Object Mapper, which will be the topic of a separate blog post.

In this post I won't put much emphasis on a detailed description of the full functionality; just read the official documentation, it's really great. Here we'll concentrate on explaining how the Object Mapper can be used together with Scala.

Scala, as a JVM-based language, also supports annotations, but there are some differences. Let's start with an example that maps an instance of a Scala class to the test.scala_test table shown above. We can map this table to a Scala class in different ways. For example, we can use a class with "mutable" fields declared as var, so that we can update them from an auxiliary constructor. In this case the code looks more like Java:

@Table(name = "scala_test")
class TableObj {
  @PartitionKey
  var id: Integer = 0
  var t: String = ""
  var tm: java.util.Date = new java.util.Date()

  def this(idval: Integer, tval: String, tmval: java.util.Date) = {
    this()
    this.id = idval
    this.t = tval
    this.tm = tmval
  }

  override def toString: String =
    "{id=" + id + ", t='" + t + "', tm='" + tm + "'}"
}

Or we can declare a class with immutable fields. In this case the class definition looks more like the case classes that we'll describe later:

@Table(name = "scala_test")
class TableObjectImmutable(@PartitionKey id: Integer, t: String, tm: java.util.Date) {
    override def toString: String = {    
        "{id=" + id + ", t='" + t + "', tm='" + tm + "'}"  
    }
}

In both cases, we specify the @Table annotation the same way as in Java. Note that we didn't specify the keyspace parameter of that annotation, so we need to provide the keyspace name elsewhere: either when establishing the session, or when creating the mapper.

Also note that we're using Java data types in the class declaration. We need to do this because by default the Java driver has codecs (classes that translate Cassandra data types into Java classes) only for Java types. If we use Scala types directly, we'll immediately get an error about the absence of a matching codec. DataStax's GitHub repository contains an implementation of codecs for Scala types, but they aren't officially supported. There is therefore no released artifact that we can use directly, although we can include the source files in our project and use them.

And we can use that class in much the same way as in the Java code:

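import com.datastax.driver.core.Cluster
import com.datastax.driver.mapping.MappingManager

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect()
val manager = new MappingManager(session)

// TableObj's @Table has no keyspace, so we pass it to the mapper explicitly
val mapperClass = manager.mapper(classOf[TableObj], "test")
val objClass = mapperClass.get(Integer.valueOf(1))
println("Obj(1)='" + objClass + "'")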

And, as expected, this prints the following:

Obj(1)='{id=1, t='t1', tm='Wed Nov 07 01:00:00 CET 2018'}'

We can save a new instance of our class to the database, check that it's there, and then use the instance to delete the data from the database:

import java.time.Instant

mapperClass.save(new TableObj(2, "t2", java.util.Date.from(Instant.now())))
val objClass2 = mapperClass.get(Integer.valueOf(2))
println("Obj(2)='" + objClass2 + "'")

mapperClass.delete(objClass2)

These examples show only a small part of the available annotations; we'll see more of them below.

Besides "normal" classes, Scala also has a special kind of class, case classes. These are often much easier to use for holding data: the compiler generates many things automatically (toString, equals, hashCode, copy, etc.), you don't need to write new to create them, and so on.
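The following small sketch makes "generates many things automatically" concrete. It uses only the standard library, and the Row case class is hypothetical, not part of the example project. It exercises the generated apply (no new), toString, structural equality, copy, and the extractor used by pattern matching:

```scala
// Hypothetical case class, used only to show what the compiler generates.
case class Row(id: Int, t: String)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    val r = Row(1, "t1")          // no `new`: the companion's generated apply is used
    println(r)                    // generated toString: Row(1,t1)
    println(r == Row(1, "t1"))    // generated structural equals: true
    println(r.copy(t = "t2"))     // generated copy: Row(1,t2)
    val Row(id, text) = r         // generated unapply enables pattern matching
    println(s"$id $text")         // 1 t1
  }
}
```

The generated toString format (class name followed by the field values) is the same format that appears in the accessor output at the end of this post.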

But if we try to use the Object Mapper with them the same way as with "normal" classes:

@Table(name = "scala_test")
case class TableObjectCaseClass(@PartitionKey id: Integer, t: String, tm: java.util.Date) {
  def this() = this(0, "", new java.util.Date())
}

we'll get a cryptic error when trying to access the data:

Exception in thread "main" java.lang.IllegalArgumentException: Invalid number of PRIMARY KEY columns provided, 0 expected but got 1
at com.datastax.driver.mapping.Mapper.getQueryAsync(Mapper.java:447)
at com.datastax.driver.mapping.Mapper.getQueryAsync(Mapper.java:440)
at com.datastax.driver.mapping.Mapper.getAsync(Mapper.java:521)

From the error message we can see that the driver expects the table to have 0 primary key columns, but that can't be true! So it looks like our annotation wasn't taken into account, and that is indeed the case. By default, Scala attaches an annotation written on a case class parameter to the constructor parameter, not to the underlying field. To apply the annotation to the field, we need to declare it slightly differently, with Scala's special @field meta-annotation (from the scala.annotation.meta package):

import scala.annotation.meta.field

@Table(name = "scala_test")
case class TableObjectCaseClass(@(PartitionKey @field) id: Integer, t: String, tm: java.util.Date) {
  def this() = this(0, "", new java.util.Date())
}
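Where an annotation actually ends up can be checked with plain Java reflection. The sketch below keeps things runnable without the driver on the classpath by using the JDK's runtime-retained java.lang.Deprecated as a stand-in for @PartitionKey. That stand-in is purely an assumption for illustration, and all class names are hypothetical. It assumes Scala 2.12 or 2.13:

```scala
import scala.annotation.meta.field

// java.lang.Deprecated stands in for @PartitionKey: it is retained at runtime
// and allowed on both fields and parameters, so no driver jar is needed.
case class Plain(@Deprecated id: Integer)                 // default target: ctor parameter
case class Targeted(@(Deprecated @field) id: Integer)     // redirected to the field
class MutableLike { @Deprecated var id: Integer = 0 }     // like TableObj's var: field by default

object AnnotationTargets {
  // True if the backing field `id` carries @Deprecated.
  def onField(cls: Class[_]): Boolean =
    cls.getDeclaredField("id").isAnnotationPresent(classOf[Deprecated])

  // True if the first parameter of the (single) public constructor carries @Deprecated.
  def onCtorParam(cls: Class[_]): Boolean =
    cls.getConstructors.head.getParameterAnnotations.head
      .exists(_.annotationType == classOf[Deprecated])

  def main(args: Array[String]): Unit = {
    println(s"Plain:       field=${onField(classOf[Plain])}, param=${onCtorParam(classOf[Plain])}")
    println(s"Targeted:    field=${onField(classOf[Targeted])}, param=${onCtorParam(classOf[Targeted])}")
    println(s"MutableLike: field=${onField(classOf[MutableLike])}")
  }
}
```

Plain reports the annotation only on the constructor parameter, and Targeted reports it only on the field. MutableLike shows why the var-based TableObj class worked without @field: annotations on fields declared in the class body land on the field by default.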

Note that we still need to provide a no-argument constructor for the case class. Otherwise the Java driver can't create an instance of the class and throws the following error. Specifying default values for the members won't help, as the class still has only a three-argument constructor:

Caused by: java.lang.NoSuchMethodException: TableObjectCaseClass.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at com.datastax.driver.mapping.ReflectionUtils.newInstance(ReflectionUtils.java:17)

The mapper's ability to match columns and fields by name is nice, but sometimes we need a different name for a field. This is easily done with the @Column annotation:

@Table(name = "scala_test")
case class TableObjectCaseClassRenamed(@(PartitionKey @field) id: Integer,                                      
                                       @(Column @field)(name = "t") text: String, tm: java.util.Date) {  
   def this() {    
      this(0, "", new java.util.Date())  
   }
}

Sometimes a table has a complex primary key, like the scala_test_complex table shown above. In this case, the Object Mapper requires that we explicitly specify the order of the fields in the primary key, separately for the partition key and for the clustering columns. This is done with the value attribute of the annotation (we can omit (value = 0), as it's the default value):

@Table(name = "scala_test_complex", keyspace = "test")
case class TableObjectCaseClassClustered(@(PartitionKey @field)(value = 0) p1: Integer,                          
                                         @(PartitionKey @field)(value = 1) p2: Integer,                         
                                         @(ClusteringColumn @field)(value = 0) c1: java.lang.Integer,                                        
                                         @(ClusteringColumn @field)(value = 1) c2: java.lang.Integer,                                        
                                         t: String,                                        
                                         tm: java.util.Date) {  
   def this() {     
      this(0, 0, 0, 0, "", new java.util.Date())  
   }
}

In complex cases, users often organize related information into structures called user defined types (UDTs). This often makes it easier to model the structure of tables. For example, we have the following table with a column of a user defined type:

create type test.scala_udt(  
   id int,  
   t text);
create table test.scala_test_udt(
   id int primary key,
   udt frozen<scala_udt>
);
insert into test.scala_test_udt (id, udt) values (1, {id: 1, t: 't1'});

The Object Mapper supports mapping user defined types to classes and case classes via the @UDT annotation, which requires the name of the type as a parameter. Similarly to tables, a field can have a different name than in the type's definition; we only need to use the @Field annotation instead of @Column:

@UDT(name = "scala_udt")
case class UdtCaseClass(id: Integer, @(Field @field)(name = "t") text: String) {
  def this() = this(0, "")
}

@Table(name = "scala_test_udt")
case class TableObjectCaseClassWithUDT(@(PartitionKey @field) id: Integer,
                                      udt: UdtCaseClass) {
  def this() {
    this(0, UdtCaseClass(0, ""))
  }
}

   

That's all about using Object Mapper with Scala's classes & case classes.

Now let's move to a slightly different topic: the use of accessors in Scala. As mentioned above, accessors are very handy when the built-in functionality of the Mapper class is not enough. In many cases this happens when we need to work with collections of objects. Examples include retrieving a whole partition (or a range inside a partition) as a sequence of objects, or deleting by partition key only.

Creating accessors in Scala is very similar to Java, but instead of an interface we add the @Accessor annotation to a trait. For example, the following accessor declares 2 functions. The first retrieves all rows from a partition and returns them as a sequence of objects. The second removes data by partition key only, which is more efficient than the deletion of individual rows performed by the Mapper:

@Accessor
trait ObjectAccessor {
  @Query("SELECT * from scala_test_complex where p1 = :p1 and p2 = :p2")
  def getByPartKey(@Param p1: Integer, @Param p2: Integer): Result[TableObjectCaseClassClustered]

  @Query("DELETE from scala_test_complex where p1 = :p1 and p2 = :p2")
  def deleteByPartKey(@Param p1: Integer, @Param p2: Integer): Unit
}

And using it is really easy:

import com.datastax.driver.core.Cluster
import com.datastax.driver.mapping.MappingManager
import scala.collection.JavaConverters._

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("test")
val manager = new MappingManager(session)
val accessor = manager.createAccessor(classOf[ObjectAccessor])

val rs = accessor.getByPartKey(0, 1)
for (r <- rs.iterator().asScala) {
  println("r=" + r)
}

accessor.deleteByPartKey(0, 0)

This produces:

r=TableObjectCaseClassClustered(0,1,0,1,t1,Wed Nov 07 01:00:00 CET 2018)
r=TableObjectCaseClassClustered(0,1,1,1,t1,Thu Nov 08 11:00:00 CET 2018)
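The loop in the accessor example is a plain Java-to-Scala collection conversion. Result implements java.lang.Iterable, as the Java for-each loop earlier in the post relies on. The stdlib-only sketch below uses a java.util.ArrayList as a hypothetical stand-in for a Result and shows two options. The first walks the Java iterator lazily; the second materializes the rows into an immutable Scala List, which is handy if you need more than one pass:

```scala
import scala.collection.JavaConverters._ // on Scala 2.13+ prefer: import scala.jdk.CollectionConverters._

object IterateJava {
  // Works for any java.lang.Iterable, such as the driver's Result.
  def toScalaList[T](rows: java.lang.Iterable[T]): List[T] = rows.asScala.toList

  def main(args: Array[String]): Unit = {
    // A java.util.ArrayList stands in for Result[TableObjectCaseClassClustered].
    val rows = new java.util.ArrayList[String]()
    rows.add("row1")
    rows.add("row2")

    // Lazy: wrap the Java iterator and walk it one element at a time.
    for (r <- rows.iterator().asScala) println("r=" + r)

    // Eager: copy everything into an immutable Scala List.
    println(toScalaList(rows))
  }
}
```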

That's all. As you can see from this article, it's possible to use Object Mapper's functionality with Scala code without much overhead. The full source code for the example is available on GitHub.
