Lightning Engine for Apache Spark

Accelerate performance of Apache Spark jobs by 3.6x*

Get faster query performance with Lightning Engine, a new Spark processing engine with vectorized execution, built-in intelligent caching, and optimized storage I/O. Lightning Engine is now in Preview.

*The queries are derived from the TPC-DS and TPC-H standards and as such are not comparable to published TPC-DS and TPC-H results, as these runs do not comply with all requirements of the TPC-DS and TPC-H specifications.

Apache Spark is a trademark of The Apache Software Foundation.

Features

Boosted Spark performance

Lightning Engine is a new Apache Spark processing engine that combines vectorized execution, built-in intelligent caching, and optimized storage I/O to deliver significantly faster query performance. It is fully compatible with open source Spark applications.

Industry-leading price-performance

Lightning Engine delivers superior performance and cost efficiency, so users can process more data for less. It provides more than 3.6x* the performance of open source Apache Spark, along with deep integrations across Google Cloud services such as BigQuery and Vertex AI. Managed optimization reduces the need for manual performance tuning.

Interoperability with open lakehouse

Lightning Engine is deeply integrated with Apache Iceberg and Google Cloud BigLake, providing a unified data analytics and AI platform. It features optimized data connectors for Cloud Storage and BigQuery that significantly reduce data access latency and improve throughput.

Flexible deployment

Lightning Engine is currently in Preview and will be available in the premium tier of Google Cloud Serverless for Apache Spark and in managed Dataproc clusters. Both services already offer GPU support for accelerated machine learning workloads and best-in-class job monitoring tools for operational efficiency. Serverless Spark also supports production jobs at scale through flexible Spark configurations and handling of large record sizes, while achieving close to 100% resource utilization.

How It Works

Lightning Engine boosts Spark performance on Google Cloud by optimizing data access, implementing intelligent caching, and using a vectorized C++ execution engine. The result is substantially shorter query times and lower resource consumption across various benchmarks.
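As a rough illustration of what "vectorized execution" over columnar data means, here is a conceptual Python sketch that contrasts row-at-a-time evaluation with batch-at-a-time evaluation over column arrays. It is a sketch under stated assumptions, not Lightning Engine's implementation (which is a C++ engine); the sample data, function names, and batch size are hypothetical, and pure Python gains no speed from this layout. It only shows the shape of the technique: the values of one column are stored together, and the operator runs once per batch instead of once per row.

```python
from typing import Dict, List

# Hypothetical sample data in a row-oriented layout: one dict per record.
rows: List[Dict[str, float]] = [
    {"price": 10.0, "qty": 3.0},
    {"price": 4.5, "qty": 2.0},
    {"price": 7.0, "qty": 5.0},
]


def revenue_row_at_a_time(records: List[Dict[str, float]]) -> float:
    """Evaluate price * qty once per row, like a classic row-at-a-time iterator."""
    total = 0.0
    for record in records:
        total += record["price"] * record["qty"]
    return total


def to_columns(records: List[Dict[str, float]]) -> Dict[str, List[float]]:
    """Pivot rows into one contiguous list per column (a columnar layout)."""
    return {key: [record[key] for record in records] for key in records[0]}


def revenue_vectorized(columns: Dict[str, List[float]], batch_size: int = 2) -> float:
    """Apply the multiply to a whole batch of column values at a time, then reduce.

    batch_size is a hypothetical parameter chosen for illustration only.
    """
    prices, qtys = columns["price"], columns["qty"]
    total = 0.0
    for start in range(0, len(prices), batch_size):
        batch_prices = prices[start:start + batch_size]
        batch_qtys = qtys[start:start + batch_size]
        # One "kernel" invocation per batch instead of one per row.
        products = [p * q for p, q in zip(batch_prices, batch_qtys)]
        total += sum(products)
    return total


if __name__ == "__main__":
    print(revenue_row_at_a_time(rows))
    print(revenue_vectorized(to_columns(rows)))
```

Both functions compute the same result; only the data layout and the unit of work differ. In a native engine, looping over contiguous column values in batches is what typically enables SIMD instructions and cache-friendly access; this Python version mirrors only the control flow.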

Common Uses

Large-scale ETL/ELT workloads

Boost performance of workloads with complex SQL transformations

Large-scale ETL/ELT workloads with complex SQL queries are often CPU-bound and involve heavy data shuffles and computation. Lightning Engine's columnar processing and vectorized execution can dramatically reduce the processing time for these complex SQL operations, which means faster data pipelines, lower costs from shorter runtimes, and more frequent data updates.

AI/ML workloads

Accelerate AI/ML workloads and ETL workloads with native GPU support

Run Spark ML training and batch inference workloads without additional setup or configuration. The Spark image comes pre-packaged with NVIDIA drivers and popular ML libraries. Built-in support for Spark RAPIDS benefits from all the performance improvements of Lightning Engine, ships with optimal default configurations that help jobs make better use of GPUs, and provides fast node autoscaling.

Pricing

How Lightning Engine pricing works

Lightning Engine for Apache Spark is in Preview, and pricing is coming soon.

Service and usage: Data Compute Unit (DCU)
Description: The DCU rate details are coming soon.
Price (USD): Coming soon

