
LF AI & Data Foundation Hosts Vortex Project to Power High Performance Data Access for AI and Analytics 

SAN FRANCISCO, Aug. 6, 2025 -- The LF AI & Data Foundation, the premier organization supporting open source innovation in artificial intelligence and data under the Linux Foundation, today announced the launch of the Vortex Project: an open, extensible columnar format that bridges the gap between cloud storage and heterogeneous compute, handling data seamlessly across memory, disk (file format), and network (IPC format) while maintaining compression throughout.

Contributed by SpiralDB as a new Incubation-stage project, Vortex joins LF AI & Data with contributions and support from Microsoft, Snowflake, Palantir, NVIDIA, and other industry leaders, signaling broad industry alignment around the need for next-generation storage infrastructure.

Vortex is purpose-built as the foundational storage format for modern data systems backed by object storage and is based on the latest compression research. Recent public validation includes the Technical University of Munich's (TUM) database group calling Vortex the "cutting edge," and Microsoft demonstrating 30% runtime reductions when running traditional Spark workloads with Vortex in Apache Iceberg. Unlike Apache Parquet and other formats that were built only for structured analytics performed on CPUs, Vortex is optimized to also support multimodal data, wide schemas, GPU-based training workloads, and high performance reads from cloud object stores such as S3 and GCS.

"Storage and compute have always been fungible, but data processing is no longer only about moving data from a disk into the CPU. Modern GPUs can consume terabits per second, but legacy storage formats are a huge bottleneck – they effectively require CPUs to sit in the middle, decompressing data before passing it on. We created Vortex to support this next generation of workloads, while dramatically improving performance for traditional data systems at the same time," said Will Manning, co-founder and CEO at SpiralDB. "By contributing Vortex to LF AI & Data, we're excited to foster a broader community. What excites me most is that Vortex gives the entire community a platform to innovate on storage – researchers can contribute new compression techniques, companies can optimize it for their workloads, and everyone can benefit from shared advances."

Designed for speed, simplicity, and composability, Vortex provides:

  • State-of-the-art performance across every key metric: 100x faster random access reads, 10-20x faster scans, and 5x faster writes compared to Apache Parquet, while maintaining similar compression ratios.
  • An extensible architecture designed to facilitate research and rapidly incorporate new compression techniques, ensuring Vortex remains state-of-the-art as the field evolves.
  • First-class, native integrations with many other key open source data tools across the Composable Data Stack, including Apache Arrow, Apache DataFusion, DuckDB, Apache Spark, and (soon) Apache Iceberg.
  • The first storage format designed for direct GPU decompression, eliminating CPU bottlenecks by loading training data straight from object storage into GPU memory.
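As a rough illustration of why a format that keeps data compressed can still offer fast random access, the sketch below run-length encodes a column and looks up a single row by binary search over the run boundaries, without decompressing the column. This is a toy example in plain Python, not Vortex's actual encoding or API; the function names `rle_encode` and `rle_get` are hypothetical and exist only for this illustration.

```python
from bisect import bisect_right
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence.

    Returns (run_values, run_ends), where run_ends[i] is the exclusive
    end index of run i in the original sequence.
    """
    run_values, run_ends = [], []
    pos = 0
    for value, group in groupby(values):
        pos += sum(1 for _ in group)  # length of this run
        run_values.append(value)
        run_ends.append(pos)
    return run_values, run_ends


def rle_get(run_values, run_ends, index):
    """Fetch one logical row from the encoded column.

    Binary search over run_ends finds the run containing `index`,
    so the lookup costs O(log runs) and never expands the column.
    """
    length = run_ends[-1] if run_ends else 0
    if not 0 <= index < length:
        raise IndexError(index)
    return run_values[bisect_right(run_ends, index)]


# Hypothetical column: 9 logical rows stored as 3 runs.
column = ["a"] * 4 + ["b"] * 2 + ["c"] * 3
run_values, run_ends = rle_encode(column)
```

Here `rle_get(run_values, run_ends, 5)` touches only the three stored run boundaries to answer a point lookup. Real columnar formats layer several such lightweight encodings, and the choice of encodings in this sketch is an assumption made for illustration only.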

"Vortex tackles one of the most overlooked performance problems in AI infrastructure: how slow and cumbersome it is to access training data from the cloud," said Mark Collier, general manager of AI & Infrastructure at the Linux Foundation. "This project represents a huge step forward for scalable, AI-native data pipelines – and we're thrilled to welcome it into the LF AI & Data community."

Vortex has been initiated with contributions from leading researchers and engineers across academia and industry, and welcomes broad participation from the global open source community.

To learn more or get involved, visit https://vortex.dev.

About the Linux Foundation

The Linux Foundation is the world's leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world's infrastructure, including Linux, Kubernetes, Node.js, ONAP, OpenChain, OpenSSF, OpenStack, PyTorch, RISC-V, SPDX, Zephyr, and more. The Linux Foundation is focused on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.


Source: Linux Foundation