Announcing GA of Apache Spark 1.6 in Hortonworks Data Platform 2.4
Author: Vinay Shukla : March 4, 2016
As Apache Spark continues to gain popularity, the rapid march of new Spark releases continues. With HDP 2.4, we are announcing the general availability of Spark 1.6, which is the latest Spark version from the community.
With Spark proving to be an incredibly useful data access engine on top of Hadoop, data scientists and business analysts need an easy-to-use tool to harness its power. Apache Zeppelin addresses this need with a compelling user interface for data exploration and visualization.
Late last year, Hortonworks introduced Apache Zeppelin as a technical preview with HDP 2.3.
With the GA of HDP 2.4, we proudly announce that we have revised and updated the Apache Zeppelin technical preview. Please go here to get the updated version.
This updated Apache Zeppelin technical preview now includes notebook import and export, plus a key building block for LDAP authentication that we worked on within the community.
We will continue to drive Zeppelin toward enterprise readiness by hardening its security capabilities and improving it for multi-tenant environments. Recently, at a packed meetup in NYC, I talked about running Zeppelin in a multi-tenant environment. Our goal is to make Apache Zeppelin generally available in the first half of 2016.
With the GA of Apache Spark 1.6, we also introduce the general availability of Dynamic Resource Allocation. With this feature, Spark jobs request executors from YARN as their workload grows and release them when they sit idle, so they use cluster resources much more efficiently. The feature relies on HDP's ability to store shuffle data outside the executors, and HDP now makes it easier to activate.
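As a rough illustration of what activating the feature involves, the sketch below assembles the standard Spark properties for dynamic allocation and renders them as spark-submit flags. The property names are Spark's own; the helper functions, the executor bounds, and the job file name my_job.py are hypothetical and chosen only for this example. It also assumes the external shuffle service is already set up on the YARN NodeManagers, which HDP may handle differently depending on how the cluster is managed.

```python
# A minimal sketch, not an HDP-specific procedure: builds the Spark
# properties commonly used to turn on dynamic resource allocation.
# The helper names and default bounds below are illustrative assumptions.


def dynamic_allocation_conf(min_executors=1, max_executors=10):
    """Return Spark properties that enable dynamic allocation."""
    if min_executors < 0 or min_executors > max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        # Let Spark grow and shrink the executor count with the workload.
        "spark.dynamicAllocation.enabled": "true",
        # Shuffle files must outlive removed executors, so the external
        # shuffle service has to be enabled as well.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Render a property dict as a list of spark-submit --conf flags."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    command = ["spark-submit", "--master", "yarn"]
    command += to_submit_args(dynamic_allocation_conf(2, 20))
    command += ["my_job.py"]
    print(" ".join(command))
```

The same properties can be placed in spark-defaults.conf instead of being passed on the command line, which applies them to every job submitted from that client.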
At the recent Spark Summit East 2016, Shaun Connolly outlined Hortonworks' strategy and investments in Apache Spark in his keynote. At the same event, Spark 2.0 was previewed. The innovations being developed for this upcoming release focus on continued performance improvements with Project Tungsten, on Structured Streaming, and on plans to merge the Dataset and DataFrame APIs. With Apache Spark 2.0, expected in May 2016, Dynamic Resource Allocation will also be implemented for Spark Streaming jobs.
Over the past year, we have had tremendous customer success with Spark here at Hortonworks, and we continue to deepen our commitment to and involvement in the Spark community. We want to thank our customers and partners for helping us on this journey. We have just started, with much more planned ahead, so stay tuned.