DataStax Enterprise 4.5

Release notes

DataStax Enterprise 4.5.3 release notes

DataStax Enterprise 4.5.3 includes component changes, enhancements and other changes, and resolved issues.

Component changes

DataStax Enterprise changes a few of the components introduced in the previous release. Changes are limited to these components:

  • Apache Cassandra 2.0.11.82
  • Apache Solr 4.6.0.2.8
  • Shark 0.9.1.4
  • Cassandra Java Driver 2.0.6
  • Spark connector 1.0.3

Enhancements and changes

  • The default Kerberos authenticator now supports Digest authentication.
  • The spark.cleaner.ttl setting is configurable in the spark-env file, and set to 12 hours by default.
  • DSE Search now logs a warning when users do not map dynamic fields correctly.
  • Upgraded the Spark connector from 1.0.0 to 1.0.3, which includes the following enhancements:
    • 1.0.3
      • Fixed handling of Cassandra rpc_address set to 0.0.0.0 (#332)
    • 1.0.2
      • Fixed batch counter columns updates (#234, #316)
      • Exposed both rpc addresses and local addresses of Cassandra nodes in partition preferred locations (#325)
      • Cleaned up the SBT assembly task and added build documentation (backport of #315)
    • 1.0.1
      • Added logging of the error message when an asynchronous task fails in AsyncExecutor. (#265)
      • Fixed connection problems with fetching token ranges from hosts with rpc_address different from listen_address. Log host address(es) and ports on connection failures. Close the Thrift transport if the connection fails for some reason after opening the transport, for example, authentication failure.
      • Upgraded the Cassandra driver to 2.0.6.

Resolved issues

  • Fixed an issue causing Solr Join queries to return incorrect results. (DSP-3825)
  • Active commit log segments are now archived. (DSP-3873, CASSANDRA-6904)
  • Problems using S3 as an external file system have been resolved. (DSP-4082)
  • Fixed a problem causing an out-of-memory condition when running nodetool repair. (DSP-4104, CASSANDRA-7983)
  • Fixed an issue that prevented Hive from working with Kerberos if the user is not initialized on the node. (DSP-4127)
  • Fixed a null pointer exception during node bootstrap that occurred when back pressure kicked in before the Solr core was active. (DSP-4131)
  • Resolved the CQL TRUNCATE command problem related to memory-only tables. (DSP-4135)
  • Fixed an issue causing Shark to silently drop inserts. (DSP-4148)
  • The DataStax Enterprise integration of Spark respects the SPARK_DRIVER_HOST environment variable on start up. (DSP-4166)
  • The GossipingPropertyFileSnitch and Ec2MultiRegionSnitch now use private IP addresses for communication during node repair between nodes in different data centers. (DSP-4192, CASSANDRA-8084)
  • Fixed a bug causing core creation to fail if the Cassandra timeuuid type is used inside a list, set, or map collection. (DSP-4288)
  • Querying partitions directly now works. (DSP-4293)
  • Fixed an issue causing a null pointer exception when running the nodetool cleanup command on non-Solr workload nodes that hold Solr data. (DSP-4310)

DataStax Enterprise 4.5.2 release notes

DataStax Enterprise 4.5.2 includes component changes, enhancements and other changes, resolved issues, patches, and issues. The following issue resolved in this release might break existing code:

Possible code breaker

Hive now correctly returns NULL in places it was previously returning an empty string.

If your application inserts null string-type values into a table, adjust the code to expect NULL instead of an empty string.
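
As a minimal sketch of that adjustment, the Python helper below treats both NULL (None in Python) and the empty string as a missing value, so the same application code tolerates results from before and after this change. The function name and the sample rows are hypothetical and are not part of DataStax Enterprise or Hive.

```python
from typing import Optional


def normalize_string(value: Optional[str]) -> Optional[str]:
    """Map both NULL (None) and the empty string to None.

    Hypothetical helper: earlier releases returned "" where Hive now
    returns NULL, so code that compared against "" should also accept None.
    """
    if value is None or value == "":
        return None
    return value


# Hypothetical result rows: (id, name) pairs as an application might see them.
rows = [(1, "alice"), (2, None), (3, "")]

# Collect the ids whose name is missing under either convention.
missing_ids = [row_id for row_id, name in rows if normalize_string(name) is None]
print(missing_ids)
```

Normalizing at the point where results are read keeps the rest of the application independent of which convention the server uses.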

Components

  • Apache Cassandra 2.0.10.71
  • Apache Hadoop 1.0.4.13
  • Apache Hive 0.12.0.5
  • Apache Pig 0.10.1
  • Apache Solr 4.6.0.2.6
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.4.14.2
  • Apache Mahout 0.8
  • Apache Tomcat 6.0.39
  • Apache Thrift 0.7.0
  • Apache Commons
  • Spark 0.9.1
  • Shark 0.9.1.3
  • JBCrypt 0.3m
  • SLF4J 1.7.2
  • Guava 15.0
  • JournalIO 1.4.2
  • Netty 4.0.13.Final
  • Faster XML 3.1.3
  • HdrHistogram 1.2.1.1
  • Snappy 1.0.5
  • Cassandra Java Driver 2.0.4.1

Enhancements and changes

  • Miscellaneous enhancements
    • The procedure for encrypting data has changed in this release. DataStax recommends migrating data encrypted in earlier releases to DataStax Enterprise 4.5.2.
    • This release supports spatial analytics through the integration of some components of GIS Tools for Hadoop. Also included is a custom tool for importing data in Enclosed JSON format from ArcGIS to a Cassandra table.
    • The demos directory now includes an example of using Cassandra triggers.
    • You can now query the system.local and system.peers tables to get the type of workload a node is running.
    • A new merge metrics mbean provides information about Solr/Lucene segment merging.
  • Shark and Spark enhancements and changes
    • DataStax Enterprise Spark integration now uses the Spark Cassandra Connector. The package name has changed to com.datastax.spark.connector. You can use the configuration options defined in the project to configure DataStax Enterprise Spark.
    • During the upgrade to DataStax Enterprise 4.5.2, do not run Spark jobs until the entire cluster is upgraded to the new version of DataStax Enterprise.
    • As a workaround to https://spark-project.atlassian.net/browse/SHARK-217, this release updates the local Kryo serialization JAR from 2.21 to 2.24. This fixes select statements that have null fields.
    • When starting Spark, you can specify a CQL schema for the Cassandra context in a file.
  • Sqoop enhancements
    • This release supports a native Cassandra implementation of the Sqoop metastore.
    • New cql-export options select certain columns for export, limit the page size of the export, and conditionally filter the data you select for export using the --cassandra-where-clause <clause> option.
    • In the previous release, you could not configure the node consistency level using the CQL import command. The --cassandra-consistency-level property can now be used for imports as well as exports.
  • Pig changes
    • CqlPagingRecordReader has been removed from Cassandra. The CqlStorage handler is no longer compatible with compact tables with clustering columns. Users who access these tables need to migrate the tables to the CqlNativeStorage format, which uses CqlRecordReader.
    • The CqlStorage handler is deprecated and slated for removal at some point in the future. Use the CqlNativeStorage handler and cql:// for new Pig applications.

Resolved issues

  • CREATE TRIGGER and DROP TRIGGER now show up in the audit log. (DSP-2442)
  • This release of DSE Search/Solr compensates for the lack of scheduled hard commits to update the opened reader with the latest segments version. (DSP-3031)
  • Solr blob data types used in composite keys threw an invalidating error. (DSP-3033)
  • Incorrect authentication/authorization settings in cassandra.yaml left Cassandra in a state that prevented startup while keeping port 7199 closed. (DSP-3498)
  • Hive now correctly returns NULL in places where it previously returned an empty string. See Possible code breaker. (DSP-3534)
  • The TermVectorComponent, not fixed in DSP-3147, caused unnecessary Cassandra access in CassandraRowReader due to the StoredFieldVisitor returning an empty collection of fields to load. (DSP-3625)
  • The problem that prevented repair, addition, and removal of nodes from a cluster using transparent data encryption has been resolved. Previously, a node being repaired could not decrypt the data for reading. (DSP-3636)
  • The sstableloader utility works properly with encrypted tables when internal authentication, SSL, or Kerberos authentication is used. (DSP-3636)
  • Fixed improper loading of SSTables with encrypted tables by sstableloader when internal authentication, SSL, or Kerberos authentication is used. (DSP-3641)
  • The null pointer exception no longer occurs when using QueryBuilder.batch from the Java driver and turning on auditing. (DSP-3673)
  • Incorrect names for options in the import.options file installed by DataStax Enterprise for the Sqoop demo have been fixed. (DSP-3708)
  • When installing in text mode, the installer failed to remove old dse JAR files. (DSP-3728)
  • DSE Search now uses the Tomcat HTTP NIO connector by default, unless otherwise specified in the server.xml. (DSP-3746)
  • Fixed an issue causing the Spark REPL to shut down after a failure. (DSP-3795)
  • Fixed a memory leak in CFS compaction. (DSP-3799)
  • Fixed a Sqoop issue causing a null pointer exception to occur if not all of the CQL columns were specified in the cql-export command. (DSP-3803)
  • Fixed an issue causing Solr Join queries to return incorrect results. (DSP-3825)
  • Previously, NULL values were returned as empty data types in Hive. Now, the exact information from the native protocol is returned, except for collections, which are returned as empty collections when the value is NULL. (DSP-3840)
  • Fixed an issue causing certain DSE tools to warn about an incorrect configuration loader. (DSP-3842)
  • Fixed an issue, which occurred in specific situations if an analytic node is down, that prevented Hive from accessing CFS. (DSP-3846)
  • Fixed the issue causing Spark Master HA to fail to start sometimes on a single node. (DSP-3855)
  • Fixed the build.xml file that ships with the Spark demo. (DSP-3886)

Patches

  • Better error message when adding a collection with the same name as a previously dropped one (CASSANDRA-6276)
  • Pig support for hadoop CqlInputFormat (CASSANDRA-6454)
  • Add inter_dc_stream_throughput_outbound_megabits_per_sec (CASSANDRA-6596)
  • Fix potential AssertionError with 2ndary indexes (CASSANDRA-6612)
  • Add option to disable STCS in L0 (CASSANDRA-6621)
  • Hadoop--Add CqlOutputFormat (CASSANDRA-6927)
  • Always merge ranges owned by a single node (CASSANDRA-6930)
  • cqlsh--Wait up to 10 sec for a tracing session (CASSANDRA-7222)
  • Give CRR a default input_cql Statement (CASSANDRA-7226)
  • Fix IncompatibleClassChangeError from hadoop2 (CASSANDRA-7229)
  • Hadoop--allow ACFRW to limit nodes to local DC (CASSANDRA-7252)
  • Workaround JVM NPE on JMX bind failure (CASSANDRA-7254)
  • Fix race in FileCacheService RemovalListener (CASSANDRA-7278)
  • Fix inconsistent use of consistencyForCommit that allowed LOCAL_QUORUM operations to incorrectly become full QUORUM (CASSANDRA-7345)
  • Make sure high level sstables get compacted (CASSANDRA-7414)
  • Properly handle unrecognized opcodes and flags (CASSANDRA-7440)
  • Fix AssertionError when using empty clustering columns and static columns (CASSANDRA-7455)
  • Hadoop--close CqlRecordWriter clients when finished (CASSANDRA-7459)
  • Switch liveRatio-related log messages to DEBUG (CASSANDRA-7467)
  • Set correct stream ID on responses when non-Exception Throwables are thrown while handling native protocol messages (CASSANDRA-7470)
  • Remove duplicates from StorageService.getJoiningNodes (CASSANDRA-7478)
  • Fix error when doing reversed queries with static columns (CASSANDRA-7490)
  • Properly reject operations on list index with conditions (CASSANDRA-7499)
  • Throw InvalidRequestException when queries contain relations on entire collection columns (CASSANDRA-7506)
  • Don't depend on cassandra config for nodetool ring (CASSANDRA-7508)
  • Fix truncate to always flush (CASSANDRA-7511)
  • Warn when SSL certificates have expired (CASSANDRA-7528)
  • Fix range merging when DES scores are zero (CASSANDRA-7535)
  • Windows--force range-based repair to non-sequential mode (CASSANDRA-7541)
  • Fix row size miscalculation in LazilyCompactedRow (CASSANDRA-7543)
  • Backport CASSANDRA-3569/CASSANDRA-6747 (CASSANDRA-7560)
  • Remove CqlPagingRecordReader/CqlPagingInputFormat (CASSANDRA-7570)
  • Fix ReversedType aka DateType mapping to native protocol (CASSANDRA-7576)
  • cqlsh--enable CTRL-R history search with libedit (CASSANDRA-7577)
  • Fix sstableloader unable to connect encrypted node (CASSANDRA-7585)
  • Add stop method to EmbeddedCassandraService (CASSANDRA-7595)
  • Remove shuffle and taketoken (CASSANDRA-7601)
  • cqlsh--Add tab-completion for CREATE/DROP USER IF [NOT] EXISTS (CASSANDRA-7611)
  • Update java driver for hadoop (CASSANDRA-7618)
  • Fix NPE when listing saved caches dir (CASSANDRA-7632)
  • Add 'nodetool sethintedhandoffthrottlekb' (CASSANDRA-7635)
  • cqlsh--cqlsh should automatically disable tracing when selecting from system_traces (CASSANDRA-7641)
  • Track max/min timestamps for range tombstones (CASSANDRA-7647)
  • Add cassandra.auto_bootstrap system property (CASSANDRA-7650)
  • SimpleSeedProvider no longer caches seeds forever (CASSANDRA-7663)
  • Throw EOFException if we run out of chunks in compressed datafile (CASSANDRA-7664)
  • Set gc_grace_seconds to seven days for system schema tables (CASSANDRA-7668)
  • Support connecting to ipv6 jmx with nodetool (CASSANDRA-7669)
  • Avoid logging CompactionInterrupted at ERROR (CASSANDRA-7694)
  • Fix potential AssertionError in RangeTombstoneList (CASSANDRA-7700)
  • cqlsh--Fix failing cqlsh formatting tests (CASSANDRA-7703)
  • Validate arguments of blobAs functions (CASSANDRA-7707)
  • Minor leak in sstable2json (CASSANDRA-7709)
  • Fix validation when adding static columns (CASSANDRA-7730)
  • Thrift--fix range deletion of supercolumns (CASSANDRA-7733)
  • Fix dropping collection when it's the last regular column (CASSANDRA-7744)
  • Fix race in background compaction check (CASSANDRA-7745)
  • Do not flush on truncate if durable_writes is false (CASSANDRA-7750)
  • Fix MS expiring map timeout for Paxos messages (CASSANDRA-7752)
  • Configure system.paxos with LeveledCompactionStrategy (CASSANDRA-7753)
  • Fix NPE in FileCacheService.sizeInBytes (CASSANDRA-7756)
  • Clone token map outside of hot gossip loops (CASSANDRA-7758)
  • Hadoop--fix cluster initialisation for a split fetching (CASSANDRA-7774)
  • Fix PRSI handling of CQL3 row markers for row cleanup (CASSANDRA-7787)
  • Improve PasswordAuthenticator default super user setup (CASSANDRA-7788)
  • Make StreamReceiveTask thread safe and gc friendly (CASSANDRA-7795)
  • Stop inheriting liveRatio and liveRatioComputedAt from previous memtables (CASSANDRA-7796)
  • Validate empty cell names from counter updates (CASSANDRA-7798)
  • Fix ALTER clustering column type from DateType to TimestampType when using DESC clustering order (CASSANDRA-7797)

Issues

  • In this release, compaction of files stored on the cfs-archive layer should be disabled, but instead these files are compacted automatically. (DSP-4081)

DataStax Enterprise 4.5.1 release notes

DataStax Enterprise 4.5.1 updates three components:
  • Apache Solr 4.6.0.2.4 to 4.6.0.2.5
  • Shark 0.9.1.1 to 0.9.1.2
  • Cassandra Java Driver 2.0.2 to 2.0.2.1

Enhancements and changes

  • Parameters of Solr queries sent by HTTP POST are now properly stored in the audit logs.
  • This release adds the update metrics mbean, which can be useful to guide tuning of all factors affecting indexing performance, such as back pressure, indexing threads, RAM buffer size and merge factor.
  • The Weather Sensor demo for running analytical queries with Hadoop and Spark is now easier to run. You no longer need to change path variables in tarball installations.
  • The Weather Sensor demo readme file has been improved.
  • The Shark component, updated to 0.9.1.2, works with internal authentication.
  • The showSchema method, which has been added to Spark, provides information about all user keyspaces, a particular keyspace, or a table.
  • Improved default memory settings for Spark.
  • When running the Portfolio Manager demo, messages about keyspace sensitivity no longer appear.

Resolved issues

  • Resolved the issue causing DataStax Enterprise to hang during shutdown, waiting for gossip to start. (DSP-3518)
  • Fixed the out-of-memory error on huge clusters caused by Cassandra File System (CFS) memory consumption, which has been reduced significantly, approximately 500 times for some use cases. (DSP-3615)
  • Fixed an issue when enabling clustering in Solr that caused DataStax Enterprise to complain about the org.tartarus.snowball.ext.DanishStemmer class not being found. (DSP-3645)
  • Fixed the race condition between shutdown of client and server channels, which caused a harmless Netty exception to appear when you shut down. (DSP-3651)
  • Fixed the problem preventing the HSHA rpc server from functioning. (DSP-3675)
  • RPM and Deb package installations now properly find the shark-env.sh file. (DSP-3696)

DataStax Enterprise 4.5 release notes

DataStax Enterprise 4.5 updates components and includes major enhancements, improvements, patches, and bug fixes.

Components

  • Apache Cassandra 2.0.8.39
  • Apache Hadoop 1.0.4.13
  • Apache Hive 0.12.0.3
  • Apache Pig 0.10.1
  • Apache Solr 4.6.0.2.4
  • Apache log4j 1.2.16
  • Apache Sqoop 1.4.4.14.1
  • Apache Mahout 0.8
  • Apache Tomcat 6.0.39
  • Apache Thrift 0.7.0
  • Apache Commons
  • Spark 0.9.1
  • Shark 0.9.1.1
  • JBCrypt 0.3m
  • SLF4J 1.7.2
  • Guava 15.0
  • JournalIO 1.4.2
  • Netty 4.0.13.Final
  • Faster XML 3.1.3
  • HdrHistogram 1.0.9
  • Snappy 1.0.5
  • Cassandra Java Driver 2.0.2
Apache Cassandra documentation covers release notes for Cassandra 2.0.8. NEWS.txt contains late-breaking information about upgrading from previous versions of Cassandra. A NEWS.txt or a NEWS.txt archive is installed in the following locations:
  • Tarball: install_location/resources/cassandra
  • Installer-Services installations: /usr/share/dse/resources/cassandra
  • Package: /usr/share/doc/dse-libcassandra*

NEWS.txt is also posted on the Apache Cassandra project web site.

Enhancements and changes

DataStax Enterprise 4.5 includes the following enhancements and changes:
  • Spark/Shark
    • Support for Apache Spark for running analytical queries independent of Hadoop
    • Support for Apache Shark, a SQL-like, Hive-compatible language built on top of Spark
  • External Hadoop systems
    • A bring your own Hadoop (BYOH) model that integrates Hadoop data warehouse implementations by Cloudera and Hortonworks
    • Support for Kerberos-secured BYOH integration using the Cloudera Manager
  • DSE Hadoop/Hive/Pig
    • Support for the native protocol in Hive including the addition of 19 new Hive TBLPROPERTIES to support the native protocol
    • Auto-creation of Hive databases and external tables for each CQL keyspace and table
    • A new cql3.partition.key property that maps Hive tables to CQL compound primary keys and composite partition keys
    • Support for HiveServer2
    • Integration of the HiveServer2 Beeline command shell
    • Support for expiring data in columns by setting TTL (time to live) on Hive tables.
    • Support for expiring data by setting the TTL on Pig data using the cql:// URL, which includes a prepared statement shown in step 10 of the library demo.
  • Sqoop
  • Solr
    • For performance, you can configure DSE Search/Solr to parallelize row reads.
    • The default shard transport type has been changed from http to netty. If you upgrade to DataStax Enterprise 4.5, perform the upgrade procedure using the shard transport type of your old installation, and after the upgrade, change the shard transport type to netty. Start the cluster using a rolling restart.
    • For performance reasons, this release of DataStax Enterprise no longer uses Lucene compressed stored fields. Subsequent releases will also not use these fields. (DSP-3484)
    • When the from field of a Solr query-time join has docValues=true, the faster doc values-based join system is used. Upgrading to DataStax Enterprise 4.5 requires reindexing if query-time join is used.
    • DataStax Enterprise 4.5 and later moves the DSE per-segment filter cache off-heap by using native memory, hence reducing on-heap memory consumption and garbage collection overhead.
    • The new off-heap filter cache is enabled by default, but can be disabled by passing the following JVM system property at startup time: -Dsolr.offheap.enable=false.
    • Query metric times are now logged by DSE Search.
    • DSE mbean names have been improved to decrease the chance that names will clash. The old naming format was com.datastax.bdp:type=name. The new format is com.datastax.bdp:type=workload,name=name. For example, com.datastax.bdp:type=search,name=SolrContainerPlugin.
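
The MBean renaming described above can be sketched as a small Python helper, which may be useful when updating monitoring scripts that hard-code MBean names. The helper functions and their parameter names are illustrative assumptions; only the two name formats and the SolrContainerPlugin example come from these notes.

```python
DOMAIN = "com.datastax.bdp"


def old_mbean_name(name: str) -> str:
    """Build a name in the old format: com.datastax.bdp:type=name."""
    return f"{DOMAIN}:type={name}"


def new_mbean_name(workload: str, name: str) -> str:
    """Build a name in the new format: com.datastax.bdp:type=workload,name=name."""
    return f"{DOMAIN}:type={workload},name={name}"


# The example from the release notes: the search workload's SolrContainerPlugin.
print(old_mbean_name("SolrContainerPlugin"))
print(new_mbean_name("search", "SolrContainerPlugin"))
```

Because the workload now occupies the type key, two MBeans with the same name in different workloads no longer map to the same object name.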

Resolved issues

  • Solr
    • Soft commit adjustments during back pressure are now correctly executed. (DSP-3584)
    • The problem that caused the old pre-reload value of the soft commit to be reloaded during back pressure has been resolved. (DSP-3584)
    • Resolved a problem causing Tomcat to block shutdown when an out of memory error occurs. (DSP-3328)
  • Other
    • Resolved a problem caused by the disk full alert feature that removed a node from the ring. Because Cassandra will no longer automatically decommission a node when the disk is almost full, you need to monitor disk space and add capacity when necessary. (DSP-3601)
    • The DSE init.d script sets -XX:HeapDumpPath when using jsvc fallback and also when using the default dse_daemon script. The latter was not being set, which prevented the heap from being dumped on an out-of-memory condition. (DSP-3308)
    • Resolved a problem caused by calling Gossiper.instance.getEndpointStateForEndpoint but not checking for a null return, which led to null pointer exceptions like the following (DSP-3616):
      ERROR 11:02:42,841 Exception in thread Thread[Thread-1,5,main]
      java.lang.NullPointerException
      at com.datastax.bdp.gms.DseState.doGetCurrentState(DseState.java:269)
      at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:167)
      at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:470)
      at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:380)
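
The actual fix lives in the DataStax Enterprise Java code and is not shown in these notes. The Python sketch below only illustrates the general pattern behind it: check the result of a lookup for null before dereferencing it. All names, the sample data, and the "UNKNOWN" default are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-in for gossip state, keyed by endpoint address.
EndpointStates = Dict[str, Dict[str, str]]


def get_endpoint_state(states: EndpointStates, endpoint: str) -> Optional[Dict[str, str]]:
    """Return the state for an endpoint, or None if the endpoint is unknown."""
    return states.get(endpoint)


def current_status(states: EndpointStates, endpoint: str, default: str = "UNKNOWN") -> str:
    """Read the status of an endpoint without failing on a missing state."""
    state = get_endpoint_state(states, endpoint)
    if state is None:
        # Without this guard, the next line would raise on an unknown endpoint.
        return default
    return state.get("status", default)


states = {"10.0.0.1": {"status": "ACTIVE"}}
print(current_status(states, "10.0.0.1"))
print(current_status(states, "10.0.0.2"))
```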

Issues

  • DataStax supports a data center that contains one or more nodes running in dual Spark/DSE Hadoop mode. DataStax does not support running some nodes in DSE Hadoop mode and some in Spark mode in the same data center. Dual Spark/DSE Hadoop mode means you started the node using the -k and -t options on tarball installations, or set the startup options HADOOP_ENABLED=1 and SPARK_ENABLED=1 on packaged installations. (DSP-3561)
  • Due to a DSE_CLASSPATH problem, if you are installing DataStax Enterprise to use the bring your own Hadoop (BYOH) model, you need to install and configure DataStax Enterprise on all nodes, including nodes in the Hadoop cluster, as described in the installation procedure. (DSP-3654)
  • Due to a race condition between shutdown of client and server channels, a harmless Netty exception appears when you shut down. The exception looks something like this:
    WARN 20:18:15,397 Failed to submit an exceptionCaught() event.
    java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:703) . . .

    Ignore this exception. (DSP-3651)

  • In this release, the HSHA rpc server is not functioning. (DSP-3675)

  • Writing to Blob columns from Spark is not supported in this release. Reading columns of all types is supported; however, you need to convert collections of blobs to byte arrays before serializing. (DSP-3620)
  • After upgrading DataStax Enterprise from 4.0.0 or 4.0.1 to 4.5.x on RHEL5/CentOS5, the Snappy JAR file will be missing. To get it back, either:
    • Run the switch-snappy script:
      $ cd /usr/share/dse ## Package installations
      $ cd install_location/bin  ## tarball installations
      
      $ switch-snappy 1.0.4
    • Uninstall the old installation and then do a fresh installation. A regular uninstall preserves the configuration files and data files.
  • Cassandra static columns, introduced in Cassandra 2.0.6, cannot be included in the Solr schema (and hence indexed) for performance reasons because changing the value of a single static column would require re-indexing all documents sharing the same partition key. (DSP-3143)

DataStax Enterprise 4.5 Installer release notes

Components

  • Apache Cassandra 2.0.8
  • OpsCenter 4.1.4
  • DevCenter 1.1.x