<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://charlesreid1.com/w/index.php?action=history&amp;feed=atom&amp;title=DataFusion</id>
	<title>DataFusion - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://charlesreid1.com/w/index.php?action=history&amp;feed=atom&amp;title=DataFusion"/>
	<link rel="alternate" type="text/html" href="https://charlesreid1.com/w/index.php?title=DataFusion&amp;action=history"/>
	<updated>2026-06-20T03:49:44Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.12</generator>
	<entry>
		<id>https://charlesreid1.com/w/index.php?title=DataFusion&amp;diff=30197&amp;oldid=prev</id>
		<title>Unknown user: Created page with &quot;=About DataFusion=  Apache DataFusion serves as a powerful and flexible query engine that developers use as a foundation to build a wide variety of data-centric systems. Inste...&quot;</title>
		<link rel="alternate" type="text/html" href="https://charlesreid1.com/w/index.php?title=DataFusion&amp;diff=30197&amp;oldid=prev"/>
		<updated>2025-05-26T17:56:24Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;=About DataFusion=  Apache DataFusion serves as a powerful and flexible query engine that developers use as a foundation to build a wide variety of data-centric systems. Inste...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=About DataFusion=&lt;br /&gt;
&lt;br /&gt;
Apache DataFusion serves as a powerful and flexible query engine that developers use as a foundation to build a wide variety of data-centric systems. Instead of building a query processing and optimization layer from scratch, projects leverage DataFusion&amp;#039;s capabilities.&lt;br /&gt;
&lt;br /&gt;
The sections below list examples of systems that have been, or can be, built with Apache DataFusion.&lt;br /&gt;
&lt;br /&gt;
The common thread across these examples is that DataFusion provides the &amp;#039;&amp;#039;&amp;#039;core query processing capabilities&amp;#039;&amp;#039;&amp;#039; (SQL parsing, logical and physical planning, optimization, and execution against data formats such as Parquet, CSV, JSON, and Avro), allowing developers to &amp;#039;&amp;#039;&amp;#039;focus on the unique features and domain-specific logic&amp;#039;&amp;#039;&amp;#039; of their applications. Its Rust foundation offers high performance and memory safety, while its Apache Arrow integration provides efficient in-memory data handling.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Types of Systems and Examples ==&lt;br /&gt;
&lt;br /&gt;
=== Specialized Analytical Databases ===&lt;br /&gt;
DataFusion&amp;#039;s extensibility makes it suitable for building database systems tailored to specific analytical needs, particularly time-series workloads.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;InfluxDB 3.0&amp;#039;&amp;#039;&amp;#039;: A widely used time-series database that uses DataFusion as its query engine.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;GreptimeDB, HoraeDB (formerly CeresDB), CnosDB&amp;#039;&amp;#039;&amp;#039;: Open-source time-series databases built using DataFusion.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Seafowl&amp;#039;&amp;#039;&amp;#039;: A CDN-friendly analytical database.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;ParadeDB&amp;#039;&amp;#039;&amp;#039;: PostgreSQL for search and analytics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Distributed SQL Query Engines &amp;amp; Big Data Systems ===&lt;br /&gt;
DataFusion can be used to build systems that distribute query processing across multiple nodes, similar to Apache Spark.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Ballista&amp;#039;&amp;#039;&amp;#039;: A distributed SQL query engine built on Apache Arrow and DataFusion, designed to compete with systems like Spark.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Query Language Engines &amp;amp; Accelerators ===&lt;br /&gt;
DataFusion can power new query languages or accelerate existing ones.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Comet (by Apple, now Apache DataFusion Comet)&amp;#039;&amp;#039;&amp;#039;: An accelerator for Apache Spark that replaces Spark&amp;#039;s query execution with DataFusion for improved performance.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;VegaFusion&amp;#039;&amp;#039;&amp;#039;: Provides server-side acceleration for the Vega visualization grammar.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;PRQL-query&amp;#039;&amp;#039;&amp;#039;: An engine for the PRQL (Pipelined Relational Query Language).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SQL Support for Existing Libraries &amp;amp; Frameworks ===&lt;br /&gt;
DataFusion can add SQL querying capabilities to existing data tools and libraries.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Dask SQL&amp;#039;&amp;#039;&amp;#039;: Integrates SQL query capabilities into the Dask parallel computing library in Python.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Streaming Data Platforms ===&lt;br /&gt;
DataFusion&amp;#039;s architecture is also suitable for building systems that process continuous streams of data.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Synnada&amp;#039;&amp;#039;&amp;#039;: A streaming-first framework for data products.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Arroyo&amp;#039;&amp;#039;&amp;#039;: A distributed stream processing engine written in Rust.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Kamu&amp;#039;&amp;#039;&amp;#039;: A planet-scale streaming data pipeline.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Data Integration &amp;amp; ETL Tools ===&lt;br /&gt;
DataFusion&amp;#039;s ability to read various formats and execute SQL makes it a good fit for Extract, Transform, Load (ETL) pipelines.&lt;br /&gt;
* No specific product is named here, but DataFusion&amp;#039;s core capabilities are well suited to building custom ETL solutions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Data Exploration &amp;amp; Utility Tools ===&lt;br /&gt;
Simple tools for quick data inspection and manipulation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;&amp;lt;code&amp;gt;qv&amp;lt;/code&amp;gt;&amp;#039;&amp;#039;&amp;#039;: A command-line tool for quickly viewing and transcoding data in formats like Parquet, CSV, Avro, and JSON.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Observability Platforms ===&lt;br /&gt;
Systems for collecting, storing, and querying telemetry data like logs and metrics.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;OpenObserve (formerly ZincObserve), Parseable&amp;#039;&amp;#039;&amp;#039;: Cloud-native observability platforms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Semantic Layer Platforms ===&lt;br /&gt;
Tools that provide a unified business view of data.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Cube Store&amp;#039;&amp;#039;&amp;#039;: Part of Cube&amp;#039;s universal semantic layer platform; it uses DataFusion.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Machine Learning &amp;amp; AI Infrastructure ===&lt;br /&gt;
Platforms that support ML workflows, often involving large-scale data processing and querying.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;LanceDB&amp;#039;&amp;#039;&amp;#039;: A vector database for AI/ML that uses DataFusion to support SQL queries over multimodal data.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Spice.ai&amp;#039;&amp;#039;&amp;#039;: Develops building blocks for data-driven AI applications, using DataFusion for SQL interfaces.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Replacements &amp;amp; Enhancements for Existing Systems ===&lt;br /&gt;
DataFusion can be used to enhance or replace components of existing data systems for better performance or new features.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Blaze (blaze-rs)&amp;#039;&amp;#039;&amp;#039;: A project aimed at providing a faster Spark runtime replacement using DataFusion.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Research Platforms ===&lt;br /&gt;
DataFusion&amp;#039;s modularity makes it a good base for experimenting with new database technologies.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Flock&amp;#039;&amp;#039;&amp;#039;: A research platform for new database systems.&lt;/div&gt;</summary>
		<author><name>Unknown user</name></author>
	</entry>
</feed>