Lightning Engine for Apache Spark

Accelerate performance of Apache Spark jobs by 3.6x*

Get faster query performance with Lightning Engine, a new Spark processing engine with vectorized execution, built-in intelligent caching, and optimized storage I/O. Lightning Engine is now in Preview.

*The queries are derived from the TPC-DS standard and TPC-H standard and as such are not comparable to published TPC-DS standard and TPC-H standard results, as these runs do not comply with all requirements of the TPC-DS standard and TPC-H standard specification.

Apache Spark is a trademark of The Apache Software Foundation.

Features

Boosted Spark performance

Lightning Engine uses a new Apache Spark processing engine with vectorized execution, built-in intelligent caching, and optimized storage I/O to deliver significantly faster query performance. Lightning Engine is fully compatible with open source Spark applications.

Industry-leading price-performance

Lightning Engine delivers superior performance and cost efficiency, so you can process more data for less. It provides more than 3.6x* the performance of open source Apache Spark, along with deep integrations across Google Cloud services like BigQuery and Vertex AI. Managed optimization reduces manual performance tuning.
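As a rough illustration of what the 3.6x figure would mean for run time, the arithmetic can be sketched in Python. The function name and the 60-minute baseline are hypothetical, and the sketch assumes the speedup applies uniformly, which a benchmark-derived figure will not guarantee for every workload.

```python
def estimated_runtime_minutes(baseline_minutes: float, speedup: float = 3.6) -> float:
    """Run time after a uniform speedup: the baseline divided by the speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


# Hypothetical job that takes 60 minutes on open source Apache Spark.
baseline = 60.0
accelerated = estimated_runtime_minutes(baseline)
saved_fraction = 1 - accelerated / baseline

print(f"{accelerated:.1f} min, {saved_fraction:.0%} less run time")
```

Under these assumptions the job would finish in about 16.7 minutes, roughly 72% less run time. Actual gains depend on the workload.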

Interoperability with open lakehouse

Lightning Engine is deeply integrated with Apache Iceberg and Google Cloud BigLake, providing a unified data analytics and AI platform. It features optimized data connectors for Cloud Storage and BigQuery, significantly improving data access latency and throughput.

Flexible deployment

Lightning Engine is currently in Preview and will be available in the premium tier of Google Cloud Serverless for Apache Spark as well as in managed clusters in Dataproc. Both services already feature GPU support for accelerated machine learning workloads and best-in-class job monitoring tools for operational efficiency. Serverless Spark supports production jobs at scale through flexible Spark configurations and handling of large record sizes, and it also lets you achieve close to 100% resource utilization.

How It Works

Lightning Engine significantly boosts Spark's performance on Google Cloud by optimizing data access, implementing intelligent caching, and leveraging a vectorized C++ execution engine, enabling substantially faster query times and reduced resource consumption across various benchmarks.
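To make the idea of vectorized execution more concrete, the following Python sketch contrasts row-at-a-time evaluation with batch-at-a-time evaluation over columns. It illustrates the data layout only and is not Lightning Engine's implementation. All names, the sample data, and the batch size are hypothetical. A native engine would run the per-batch steps as tight, SIMD-friendly loops rather than interpreted Python.

```python
from array import array

# Row-oriented layout: one record per element.
rows = [{"price": float(i % 50), "qty": i % 7} for i in range(10_000)]

# Column-oriented layout: one contiguous typed array per column.
price = array("d", (r["price"] for r in rows))
qty = array("q", (r["qty"] for r in rows))


def revenue_row_at_a_time(records):
    """Evaluate SUM(price * qty) WHERE qty > 2 one record at a time."""
    total = 0.0
    for rec in records:
        if rec["qty"] > 2:
            total += rec["price"] * rec["qty"]
    return total


def revenue_batched(price_col, qty_col, batch_size=1024):
    """Evaluate the same query one column batch at a time."""
    total = 0.0
    for start in range(0, len(qty_col), batch_size):
        p = price_col[start:start + batch_size]
        q = qty_col[start:start + batch_size]
        # Filter step: collect the positions in this batch that pass the predicate.
        selected = [i for i, v in enumerate(q) if v > 2]
        # Projection and aggregation touch only the selected positions.
        total += sum(p[i] * q[i] for i in selected)
    return total


# Both strategies compute the same answer; only the access pattern differs.
assert revenue_row_at_a_time(rows) == revenue_batched(price, qty)
```

The batched version reads each column as a contiguous block and applies one operation to the whole batch before moving to the next, which is the access pattern that columnar, vectorized engines are built around.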

Common Uses

Large-scale ETL/ELT workloads

Boost performance of workloads with complex SQL transformations

Large-scale ETL/ELT workloads with complex SQL queries are often CPU-bound and involve heavy data shuffles and computations. Lightning Engine's columnar processing and vectorized execution can help dramatically reduce the processing time for these complex SQL operations, leading to faster data pipelines, lower cost from shorter runtimes, and more frequent data updates.
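For reference, a workload of this kind might look like the following PySpark sketch: a join, an aggregation, and a window function over Parquet inputs. It uses only standard open source Spark APIs and nothing specific to Lightning Engine. The function names, view names, column names, and paths are all hypothetical.

```python
def daily_revenue_sql(orders_view: str, customers_view: str) -> str:
    """Return a join + aggregate + window query over two (hypothetical) views."""
    return f"""
        WITH daily AS (
            SELECT c.region, o.order_date, SUM(o.amount) AS revenue
            FROM {orders_view} AS o
            JOIN {customers_view} AS c ON o.customer_id = c.customer_id
            GROUP BY c.region, o.order_date
        )
        SELECT region, order_date, revenue,
               RANK() OVER (PARTITION BY order_date ORDER BY revenue DESC) AS region_rank
        FROM daily
    """


def run_pipeline(spark, orders_path: str, customers_path: str, output_path: str) -> None:
    """Read Parquet inputs, run the SQL transformation, and write the result.

    `spark` is an existing pyspark.sql.SparkSession supplied by the caller.
    """
    spark.read.parquet(orders_path).createOrReplaceTempView("orders")
    spark.read.parquet(customers_path).createOrReplaceTempView("customers")
    result = spark.sql(daily_revenue_sql("orders", "customers"))
    result.write.mode("overwrite").parquet(output_path)
```

With PySpark available, the pipeline could be started by creating a session with `SparkSession.builder.appName("daily-revenue").getOrCreate()` and passing it to `run_pipeline` along with input and output paths of your own. The join and `GROUP BY` trigger shuffles, and the aggregation and ranking are the CPU-heavy steps the paragraph above refers to.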

AI/ML workloads

Accelerate AI/ML workloads and ETL workloads with native GPU support

Run Spark ML training and batch inference workloads without additional setup or configuration. The Spark image comes pre-packaged with NVIDIA drivers and popular ML libraries. Built-in support for Spark RAPIDS benefits from all the performance improvements of Lightning Engine, ships with optimal default configurations that help jobs make better use of GPUs, and offers fast autoscaling of nodes.

Pricing

How Lightning Engine pricing works

Lightning Engine for Apache Spark is in Preview and pricing will be coming soon.

Services and usage	Description	Price (USD)
Data Compute Unit (DCU)	The DCU rate details will be coming soon	Coming soon

Lightning Engine (Preview) pricing is coming soon.

Pricing calculator

Estimate your monthly costs, including region-specific pricing and fees.

Custom quote

Connect with our sales team to get a custom quote for your organization.
