DataStax Enterprise 4.0

Release notes

DataStax Enterprise 4.0.4 release notes

DataStax Enterprise 4.0.4 includes updated components, enhancements, changes, patches, and resolved issues. The following resolved issue might break existing code:

Possible code breaker

Hive will now correctly return NULL in places it was previously returning an empty string.

If your application inserts null string-type values into a table, adjust the code to expect NULL instead of an empty string.
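
A minimal sketch of that adjustment, assuming the application receives each Hive string column as a Python value in which NULL arrives as None. The helper names are hypothetical and are not part of Hive or DataStax Enterprise:

```python
from typing import Optional


def is_missing(value: Optional[str]) -> bool:
    """Treat both NULL (None) and the legacy empty string as missing."""
    return value is None or value == ""


def string_or_default(value: Optional[str], default: str = "") -> str:
    """Return the column value, substituting a default when Hive returns NULL."""
    return default if value is None else value
```

Code that previously tested for an empty string to detect missing data could call is_missing(value) instead, so that it behaves the same before and after the upgrade.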

Updated Components

  • From Apache Cassandra 2.0.7 to Apache Cassandra 2.0.9.61
  • From Apache Hadoop 1.0.4.10 to Apache Hadoop 1.0.4.13
  • From Cassandra Java Driver 2.0.1 to a patched Cassandra Java Driver 2.0.4 (DSP-3848)

Enhancements, changes, and patches

  • Patches
  • General enhancements
    • The procedure for encrypting data has changed in this release. DataStax recommends migrating data encrypted in earlier releases to DataStax Enterprise 4.0.4.
    • This release includes an example of using Cassandra triggers in the demos directory.
    • The _HOST macro now forces the host name part of the Kerberos service principal to lowercase. The Kerberos service principal is generated from the host name, which can cause problems connecting with cqlsh if the host name is not lowercase.
  • Hive enhancements
  • Pig changes
    • CqlPagingRecordReader has been removed from Cassandra. The CqlStorage handler is no longer compatible with compact tables that have clustering columns. Users who access these tables need to migrate them to the CqlNativeStorage format, which uses CqlRecordReader.
    • The CqlStorage handler is deprecated and slated for removal in a future release. Use the CqlNativeStorage handler and cql:// for new Pig applications.
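
The effect of the _HOST change listed under General enhancements can be illustrated with a short sketch. This is not the DataStax Enterprise implementation; the function name, the principal template, and the host name are assumptions made for illustration:

```python
def expand_host_macro(principal_template: str, hostname: str) -> str:
    """Substitute _HOST with the lowercased host name.

    Only the host name part is lowercased; the service name and realm
    in the template are left untouched.
    """
    return principal_template.replace("_HOST", hostname.lower())


# Hypothetical template and host name, for illustration only.
principal = expand_host_macro("dse/_HOST@EXAMPLE.COM", "Node1.Example.COM")
```

With these inputs, a mixed-case host name yields the principal dse/node1.example.com@EXAMPLE.COM, which is the form the keytab entries would need to match.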

Resolved issues

DataStax Enterprise 4.0.4 fixes the following issues:

  • The DSE startup script overwrote the JAVA_LIBRARY_PATH environment variable instead of appending it. (DSP-2914)
  • Solr blob data types used in composite keys threw an invalidating error. (3033)
  • The heap was not dumped on an out-of-memory condition. (DSP-3308)
  • Tomcat blocked shutdown when an out of memory error occurred on Solr nodes. (DSP-3328)
  • No value was returned when the fl request parameter specified a dynamic field in Thrift and the 'fl' parameter had only dynamic fields. (DSP-3332)
  • Alteration of MapReduce output returned multiple results (MAPREDUCE-1597), which caused large gzip files with a custom input format to be read multiple times. (DSP-3384)
  • Solr HTTP requests ignored permissions set on keyspaces and tables. (DSP-3388)
  • Solr query POST parameters were not logged in the DSE audit log. (DSP-3440)
  • Audit logging threw a null pointer exception if a prepared statement used a null value. (DSP-3447)
  • Hive returned an empty string in places where it should have returned NULL. See Possible code breaker. (DSP-3534)
  • Creation of a Solr CQL core using the CLUSTERING ORDER directive and a TimeUUID threw an IllegalStateException. (DSP-3542)
  • Soft commit time interval reverted to a previous setting if changed during back pressure. (DSP-3584)
  • Reloading during back pressure restored the old, pre-reload value of the soft commit interval. (DSP-3584)
  • Removal of a node from the ring by the disk full alert feature.

    DataStax Enterprise 3.x added a feature to auto-decommission a node when the disk is close to being full. This feature has been removed and there are no plans to do any sort of proactive cluster changes in the future. Because of the adverse effects from filling up data disks, customers are always advised to monitor disk space and add capacity when necessary. (DSP-3601)

  • An out-of-memory error when using the Cassandra File System (CFS).

    Memory consumption has been reduced significantly, by a factor of approximately 500 for some use cases. (DSP-3615)

  • Lack of support for writing to blob columns from Spark.

    This release supports reading columns of all types; however, you need to convert collections of blobs to byte arrays before serializing. (DSP-3620)

  • The TermVectorComponent, not fixed in DSP-3147, caused unnecessary Cassandra access in CassandraRowReader due to the StoredFieldVisitor returning an empty collection of fields to load. (DSP-3625)

  • Failed repair, addition, and removal of nodes from a cluster using transparent data encryption.

    When attempting to repair a node, the node did not decrypt the data for reading. (DSP-3636)

  • Improper loading of SSTables with encrypted tables using sstableloader when internal authentication, SSL, or Kerberos authentication was used. (DSP-3641)
  • DataStax Enterprise failed to find the org.tartarus.snowball.ext.DanishStemmer class, which prevented clustering in Solr. (DSP-3645)
  • A race condition between shutdown of client and server channels caused a harmless Netty exception to appear. (DSP-3651)
  • Hive could not read from a varint column if Cassandra was configured to use the random partitioner. (DSP-3652)
  • A null pointer exception occurred when using QueryBuilder.batch from the Java driver with auditing turned on. (DSP-3673)
  • A non-functioning HSHA RPC server. (DSP-3675)
  • Previously, NULL values were returned as empty data types in Hive. Now, the exact information from the native protocol is returned, except for collections, which are returned as empty collections when the value is NULL. (DSP-3840)

Issues

  • After upgrading DataStax Enterprise from 4.0.0 or 4.0.1 to 4.0.2 or 4.0.3 on RHEL5/CentOS5, the Snappy JAR file will be missing. (DSP-2558) To restore the Snappy JAR, either run the switch-snappy script or re-install DataStax Enterprise:
    • Run the switch-snappy script:
      1. Navigate to the directory containing the switch-snappy script.
        $ cd /usr/share/dse  ## Package installations 
        $ cd install_location/bin  ## tarball installations
      2. Execute the script.
        $ switch-snappy  1.0.4
    • To re-install DataStax Enterprise:
      1. Uninstall the old installation.
      2. Re-install DataStax Enterprise 4.0.4.

        By uninstalling the old installation and re-installing the new one instead of performing an in-place upgrade, DataStax Enterprise uses the configuration files and data files of the new installation.

  • DSE Search/Solr cannot index a document that indexes only one field, which is also the unique key in the schema and the primary key in the corresponding Cassandra table. DSE Search/Solr deletes any existing data with that primary key and does not return any results for such a query. (DSP-3362)

DataStax Enterprise 4.0.3 release notes

DataStax Enterprise 4.0.3 updates components and includes improvements, patches, and bug fixes.

Components

  • Apache Cassandra 2.0.7 (Updated)
  • Apache Hadoop 1.0.4.10 (Updated)
  • Apache Hive 0.12.0.2
  • Apache Pig 0.10.1
  • Apache Solr 4.6.0.1.3
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.14.3
  • Apache Mahout 0.8
  • Apache Tomcat 6.0.39 (Updated)
  • Apache Thrift 0.7.0
  • Apache Commons
  • JBCrypt 0.3m
  • SLF4J 1.7.2
  • Guava 15.0
  • JournalIO 1.4.2
  • Netty 4.0.13.Final
  • Faster XML 3.1.3
  • HdrHistogram 1.0.9
  • Snappy 1.0.5
  • Cassandra Java Driver 2.0.1
Apache Cassandra documentation covers release notes for Cassandra 2.0.7. NEWS.txt contains late-breaking information about upgrading from previous versions of Cassandra. A NEWS.txt or a NEWS.txt archive is installed in the following locations:
  • Tarball: install_location/resources/cassandra
  • Package: /usr/share/doc/dse-libcassandra*

NEWS.txt is also posted on the Apache Cassandra project web site.

Enhancements and changes
  • Internal authentication support for hadoop, hive, pig, and sqoop commands.
  • Support for Solr stored copy fields having different source and destination data types.
  • Enhanced dse and dse-env.sh scripts extend the HADOOP_CLASSPATH of the user and include the path to Mahout.
  • Updated Tomcat version 6.0.39 avoids potential garbage collection issues.
  • A custom version of the Solr Update Request Processor is available in DataStax Enterprise 4.0.3, in addition to the classic version.
Resolved issues
  • Solr
    • Fixed the issue that caused an error when source and destination copy fields have different validators. (DSP-1910)
    • Fixed the issue that caused the DynamicSnitch to break the distributed search when load information changed during shard selection. (DSP-3322)
    • Fixed an issue that caused the merging of Solr index segments to fail when using a custom sorting MergePolicy. (DSP-3230)
    • Using a composite unique key in a Solr schema with a Thrift-compatible table is no longer allowed. Attempting to load a schema under these conditions results in a message that the operation is not supported. (DSP-3232)
    • Fixed the issue causing copy fields to be applied twice when a Solr HTTP request inserted data into CQL 3 tables. (DSP-3240)
    • Fixed the issue causing the same row to be indexed several times when you use triggers for copy fields. (DSP-3241)
  • Hadoop
    • Fixed an issue causing an Analytics node to throw an exception endlessly. (DSP-3130)
  • Other
    • The dsetool rebuild_indexes command now returns an error code when it fails. (DSP-3178)
    • Using the CQL native protocol, DataStax Enterprise now throws an appropriate exception when Kerberos authentication issues occur. (DSP-3293)
    • Fixed the issue breaking CFMetaData. (DSP-3276/CASSANDRA-7074)
    • Backported a fix to Apache Cassandra to resolve the problem causing a huge performance regression in tombstone-heavy workloads. (CASSANDRA-6949)
Issues
  • The Solr commitWithin parameter is not supported. (DSP-3021)

DataStax Enterprise 4.0.2 release notes

DataStax Enterprise 4.0.2 includes updated components, enhancements, changes, resolved issues, and issues.

Components

  • Apache Cassandra 2.0.6
  • Apache Hadoop 1.0.4.9
  • Apache Hive 0.12.0.2
  • Apache Pig 0.10.1
  • Apache Solr 4.6.0.1.3
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.14.2
  • Apache Mahout 0.8
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons
  • JBCrypt 0.3m
  • SLF4J 1.7.2
  • Guava 15.0
  • JournalIO 1.4.2
  • Netty 4.0.13.Final
  • Faster XML 3.1.3
  • HdrHistogram 1.0.9
  • Snappy 1.0.5
  • Cassandra Java Driver 2.0.1
Enhancements and changes
  • Other
    • In DataStax Enterprise 4.0.2, the dsetool utility supports only JMX (Java Management Extensions) password authentication. If JMX passwords are enabled, users need to supply the passwords to use the dsetool utility. In earlier releases, both Cassandra internal authentication and JMX provided dsetool password authentication.
    • A new dsetool command, dsetool status, which is the same as dsetool ring, has been added. Both commands list the nodes in the ring, including their node type.
    • If vnodes are enabled on an Analytics or Solr node, an error is logged in the system log and an error appears in the output of the dsetool ring command.
    • If the cluster uses a Random or Murmur3 partitioner, the dsetool ring command warns you if the nodes are imbalanced. The dsetool utility compares the nodes that have the most and the least load. If the ratio is greater than 1.1, a warning message appears.
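
The imbalance rule described above can be sketched as follows. This is an illustration of the stated ratio test only, not the dsetool source; the function name and the handling of an idle node are assumptions:

```python
from typing import Sequence


def is_imbalanced(loads: Sequence[float], threshold: float = 1.1) -> bool:
    """Return True when the most loaded node exceeds the least loaded
    node by more than the given ratio (1.1 per the release note)."""
    if not loads:
        raise ValueError("at least one node load is required")
    heaviest, lightest = max(loads), min(loads)
    if lightest == 0:
        # Assumption: a node with no load next to a loaded node counts
        # as imbalanced; the release note does not cover this case.
        return heaviest > 0
    return heaviest / lightest > threshold
```

For example, per-node loads of 100 GB and 120 GB give a ratio of 1.2 and would trigger the warning, while 100 GB and 105 GB would not.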

Resolved issues

  • Fixed an issue causing Solr to always use an external IP address, even though the EC2MultiRegionSnitch and GossipingPropertyFileSnitch route the traffic on the Cassandra side to internal IP addresses for nodes in the same data center and to external IP addresses for nodes in different data centers. This problem resulted in high EC2 bills from Amazon charges for external traffic and failure in the Rackspace environment, which prevents routing traffic to the external IP address within the same data center. (DSP-2710)
  • Fixed the problem with the getFileBlockLocations not returning the correct set of block descriptions. (DSP-2922)
  • Fixed an issue causing Solr queries to fail during a short period of time when adding or removing nodes. (DSP-2975)
  • Fixed an issue that caused a null pointer exception to be thrown on server shutdown. (DSP-2974)
  • Fixed issues that prevented Solr from working properly with lightweight transactions. (DSP-3028 and DSP-3032)
  • Fixed the problem with using the per-segment query results cache for sorted queries, such as queries that do not return a score. Set useFilterForSortedQuery to true to use the query result cache when executing the same query multiple times. (DSP-3084)
  • Fixed the problem, which affected the Cassandra commit log, of writing to copy fields when those fields were not stored and no stored copy fields were present. (DSP-3099)
  • A row of bad data is no longer inserted through CQL into fields using a copy directive. (DSP-3107)
  • Fixed the issue preventing batch updates from being indexed if the updates were followed by a delete and were made on the same partition key. (DSP-3187)

Issues

  • After upgrading DataStax Enterprise from 4.0.0 or 4.0.1 to 4.0.2 or 4.0.3 on RHEL5/CentOS5, the Snappy JAR file will be missing. To restore it, either:
    • Run the switch-snappy script:
      $ cd /usr/share/dse ## Package installations
      $ cd install_location/bin  ## tarball installations
      
      $ switch-snappy 1.0.4
    • Uninstall the old installation and then do a fresh installation. A regular uninstall maintains the configuration files and data files.
  • Cassandra static columns, introduced in Cassandra 2.0.6, cannot be included in the Solr schema (and hence indexed) for performance reasons because changing the value of a single static column would require re-indexing all documents sharing the same partition key. (DSP-3143)

DataStax Enterprise 4.0.1 release notes

This release incorporates two patches:

  • A patch to fix the problem with lightweight transactions (compare and set, or CAS) that caused Cassandra to treat an existing row as non-existent if a TTL marker was applied to a column and then the TTL expired. (CASSANDRA-6623)
  • A patch to fix the 2.0 HSHA server problem that introduced corrupted data. (CASSANDRA-6285)

Issues

A bug was found that affects upgrades from DSE 4.0.0 on Solr nodes. Upgrades from versions prior to 4.0.0 directly to 4.0.1 are not affected. Please refer to the Upgrade Guide for a detailed workaround.

DataStax Enterprise 4.0 release notes

DataStax Enterprise 4.0 includes updated components, enhancements, changes, resolved issues, and issues.

Components

  • Apache Cassandra 2.0.5
  • Apache Hadoop 1.0.4.9
  • Apache Hive 0.12.0.1
  • Apache Pig 0.10.1
  • Apache Solr 4.6.0.1
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.2.14.1
  • Apache Mahout 0.8
  • Apache Tomcat 6.0.32
  • Apache Thrift 0.7.0
  • Apache Commons

Apache documentation covers release notes for Cassandra 2.0.5. Cassandra 2.0.5 supports CQL 3.

Enhancements and changes

DataStax Enterprise 4.0 includes the following enhancements and changes:

  • The latest version of the Java SE Runtime Environment (JRE) 7 is required for installing and running DataStax Enterprise 4.0.

  • Support for in-memory tables to accommodate applications, such as ad bidding, that require rapid response time

  • Integration of Cassandra 2.0.5, which includes the following features and changes:

    • Improvements to CQL
    • Experimental triggers, configured in CQL, not supported in production deployments
    • Performance enhancements
    • Column aliases
    • Prepared statement and bind variable support
    • Support for accessing legacy CQL tables through cqlsh using the -2 option has been removed
  • Virtual nodes off by default (differs from Cassandra 2.x)

    DataStax Enterprise 4.0 turns off virtual nodes (vnodes) by default. DataStax does not recommend turning on vnodes for Hadoop or Solr nodes, but you can use vnodes for any Cassandra-only cluster, or a Cassandra-only data center in a mixed Hadoop/Solr/Cassandra deployment. To enable vnodes, see Using virtual nodes.

  • Required configuration of initial_token option (differs from Cassandra 2.x)

    In DataStax Enterprise 4.0, the initial_token option in the default cassandra.yaml of the DataStax Enterprise-integrated component needs to be set. Because DataStax Enterprise does not use virtual nodes (vnodes) by default, you need to set the initial_token option.

  • DSE Search/Solr enhancements and changes:

    • Optional TCP-based Solr communications that provide lower latency, improved throughput, and reduced resource consumption
    • Support for Solr custom field types
    • New mbeans for obtaining commit and query latency information
    • Higher maximum calculated heap size limits, 14GB and 10GB, for DSE Search/Solr nodes and analytics/Hadoop nodes, respectively
    • Non-string/non-numeric types, such as dates and booleans, can now be used as unique key in the Solr schema.xml.
    • Support for Solr 4.3 SolrJ, assuming security is not needed
    • Solr indexes on CQL 3 tables, including those created using compact storage directive, must now include the following type mapping version in solrconfig.xml:
       <dseTypeMappingVersion>2</dseTypeMappingVersion>
  • Hadoop and Hive enhancements:

  • Changes to command-line options for securing sstableloader data

  • Lazy loading of compression rules, and in the event of a failure to load, a retry occurs

  • Configurable disk health checking

  • Disabled SSTable notifications to improve performance of memtable flushing.
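
Because the initial_token option must be set when vnodes are off, evenly spaced tokens are commonly computed before configuring each node. The sketch below assumes the Murmur3Partitioner token range of -2^63 to 2^63-1 and the RandomPartitioner range of 0 to 2^127-1. The function names are hypothetical, and the results should be checked against the partitioner configured in cassandra.yaml:

```python
from typing import List


def murmur3_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for Murmur3Partitioner."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count
    return [i * step - 2**63 for i in range(node_count)]


def random_partitioner_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for RandomPartitioner."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    # Print one token per node for a hypothetical four-node data center.
    for node, token in enumerate(murmur3_tokens(4)):
        print(f"node {node}: initial_token: {token}")
```

Each node in a single data center would then be assigned one value from the list as its initial_token.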

Issues resolved

DataStax Enterprise 4.0 fixes the following issues:

  • Fixed an issue causing data to remain after the keyspace and table were dropped and recreated. (CASSANDRA-6635)
  • Fixed the internal authentication issue when using cassandra-stress. (DSP-2911)
  • Fixed the problem causing an exception if the PIG_OUTPUTPARTITIONER, which is a CassandraStorage user-defined function (UDF), is not configured. (DSP-2075)

Issues

  • GLIBCXX_3.4.9 not found. This error may appear in older Linux distributions when installing DSE from the binary tarball. The workaround is to replace snappy-java-1.0.5.jar with snappy-java-1.0.4.1.jar. (DSP-2189)
  • DataStax Enterprise does not recognize changes to the default log directory used by its Hadoop component unless you add the HADOOP_LOG_DIR environment variable to the dse-env.sh file, as described in the Hadoop section of this document.

  • When the amount of data written to a table exceeds the limit specified by the size_limit_in_mb property, the following error message is logged:

    SEVERE: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)

    To avoid this condition, manage available memory carefully. (DSP-2990)

  • In this release, the flush_largest_memtables_at setting is 0.75, which is typically too small and causes excessive flushing of the memtable to disk. The workaround is to change the setting to 0.80 in the cassandra.yaml file. (DSP-2989)

  • The 2.0 HSHA server introduced corrupted data. This occurs when running the HSHA server in Cassandra 2.0.x releases earlier than 2.0.6. The hsha (half-synchronous, half-asynchronous) Thrift server was rewritten on top of Disruptor for Cassandra 2.0. This server can handle more simultaneous connections than the default sync server. Unfortunately, the rewrite introduced a bug that can cause incorrect data to be sent from the coordinator to replicas. (CASSANDRA-6285)

    Workaround: Use the native protocol or the default sync server. Cassandra 2.0.6 includes the fix and is expected to be released 3/10/2014. Alternatively, you can get the pre-release build from http://people.apache.org/~slebresne/.

    DataStax Enterprise 4.0.1 will include the fix.
