Cloudera Accelerating Ease of Use and Enterprise Adoption of Spark With Hadoop

December 2, 2015

PALO ALTO, Calif., Dec. 2 -- Cloudera, provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, announced today that it has further matured Apache Spark integration within Apache Hadoop environments, with critical achievements around usability and interoperability throughout the past year. To further expand the enterprise capabilities of this powerful data processing engine, Cloudera has added support for Spark SQL and MLlib into Cloudera Enterprise 5.5 and CDH 5.5, which the company launched recently.

Owing to its ease of development and flexible data processing, Spark has soared in popularity within the open source community and across customer use cases. It is the most active project in the Apache Software Foundation (ASF), with more than 800 developers from more than 200 companies. Cloudera’s team of Spark committers has been actively driving the enterprise capabilities of Spark and integrating Spark with Hadoop to meet customer needs and further production adoption.

“The embrace of Spark by the developer community and Cloudera’s efforts in the past year to drive its mainstream adoption have been nothing short of remarkable,” said Doug Cutting, chief architect at Cloudera. “With the most customers running Spark with Hadoop, we have already made impressive strides in furthering the enterprise capabilities of Spark for Hadoop deployments across industries and use cases. With the addition of Spark SQL and MLlib to Cloudera’s platform, and a clear roadmap with the One Platform Initiative, Spark adoption will continue to soar for batch, streaming, and machine learning use cases.”

Cloudera and Spark: A Year in Review for Production Adoption

Over the past year, Cloudera has made significant strides in maturing Spark to address a wider range of data processing use cases, including end-to-end Internet of Things (IoT) applications, simpler batch processing, and native machine learning.

As more customers aimed to take advantage of Internet of Things and real-time streaming data, they needed an enterprise-grade stream processing engine to support their applications. To address this, Cloudera led development on Spark Streaming resiliency, ensuring zero data loss and bringing it up to production standards. This critical improvement, paired with the integration of Apache Kafka within the platform, has allowed Cloudera customers to build complete IoT applications within a unified platform and has had a significant impact on Spark Streaming adoption overall.

To enable simpler, more powerful batch processing, and help solidify Spark’s place as the standard execution engine in Hadoop, Cloudera also released the beta of Apache Hive-on-Spark this year. Because Hive is the tool of choice for ETL development, its integration with the Spark processing engine marks a significant milestone in supporting next-generation data integration workloads and the adoption of Spark as the successor to MapReduce.

Cloudera’s One Platform Initiative, announced in September, continues the acceleration of Apache Spark development for the enterprise and within the Hadoop ecosystem. Cloudera is making significant gains in enhancing Spark’s security, scale, management, and streaming capabilities, and will continue to focus heavily on this development in the coming year.

With the recent Cloudera 5.5 release, Cloudera has added Spark MLlib, which brings Spark’s ease of use and performance gains to machine learning applications within Hadoop, and Spark SQL, which extends the capabilities of Spark for developers and data scientists by allowing SQL to be seamlessly embedded within Spark applications. This release also includes improvements made to Spark SQL’s query engine as part of Project Tungsten, providing significant gains in efficiency and speed. In addition, integrations built with Hive and its metastore ensure full interoperability of data schemas with Spark SQL within the Hadoop platform, so that users have a seamless experience with the right tool for the job, whether it be ETL development with Hive, application development with Spark SQL, or interactive business intelligence with Impala.

Driving Broad Customer Adoption

With the most experience supporting Spark as part of Hadoop, Cloudera has more customers running Spark on Hadoop than all other vendors combined and powers some of the largest multi-tenant Spark clusters today, including deployments over 800 nodes.

With over 170 customers running Spark across a vast range of industries, including finance, healthcare, retail, and insurance, Cloudera has helped customers embrace a wide range of next-generation use cases, including:

Cox Automotive: Leading provider of products and services for automotive dealers and car buyers, moved from hourly analytics to real-time insights into ad campaigns using Spark Streaming
PRGX: World's leading provider of accounts payable recovery audit services, stated Spark’s flexible, performant data processing has been a “saving grace” and resulted in a 9-10x performance improvement compared to legacy systems
Online Retailer: Leveraged Spark to reduce data processing time by 30% and to take advantage of real-time trends for greater engagement
Allstate: One of the nation’s largest insurance providers, uses Cloudera and Apache Spark to combine more than 80 years of data for highly refined pricing models
RelayHealth: Healthcare technology solution provider and subsidiary of McKesson, builds predictive models for when payments to healthcare providers will be received, improving their cash flow. The company processes healthcare payment interactions between 200,000 physicians, 2,000 hospitals, and 1,900 health plan subscribers
Barclays: Multinational banking and financial services company, builds an insights engine that securely analyzes previously disparate transaction data and delivers relevant insights to Barclays customers in an easily digestible manner

In addition, Cloudera’s Accelerator Program for Spark has driven dozens of robust Spark applications and integrations with the leading third-party tools, further expanding the capabilities of Spark to customers. Key partners include Datameer, Informatica, Oracle, Paxata, Pentaho, Platfora, StreamSets, Syncsort, Talend and Trifacta.

About Cloudera

Cloudera delivers the modern data management and analytics platform built on Apache Hadoop and the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Our customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training and professional services. Learn more at http://cloudera.com.

---

Source: Cloudera
