
Read the latest insights from the RepoRank editorial team.
Explore big data frameworks for distributed processing, large-scale analytics, batch computation, streaming, and data-intensive engineering workloads. Compare the frameworks developers use to process massive datasets reliably across modern data platforms.


Big data tools are built to handle datasets, workflows, and infrastructure demands that go beyond traditional local or lightweight analytics environments. Open source repositories in this space help teams work with distributed processing, large-scale storage, stream-based systems, and the architecture needed for modern data-intensive applications.
The open source big data ecosystem includes distributed compute engines, streaming platforms, data storage systems, analytics frameworks, large-scale processing utilities, and broader repositories built for data-heavy infrastructure and operations. RepoRank helps surface the repositories that are earning real attention and momentum.
This page helps you discover the big data tools that engineers, platform teams, and analytics organizations are actively using, evaluating, and watching. RepoRank focuses on real GitHub growth signals, so the repositories surfaced here are active, relevant, and gaining adoption across large-scale data and analytics workflows.
Whether you are evaluating distributed processing tools, building scalable data infrastructure, or tracking the open source projects shaping modern big data workflows, use this page to find trending repositories, compare tools, and stay current with the projects driving large-scale analytics and infrastructure forward.
What is a big data framework?
A big data framework is a software system designed to process and analyze very large datasets across distributed machines, typically supporting batch jobs, streaming workloads, fault tolerance, and parallel execution.
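The divide-and-combine execution model these frameworks generalize can be sketched in plain Python. This toy word count (a hypothetical illustration, not code from any specific framework) splits the input into partitions, counts each in parallel, and merges the partial results, the same map-and-reduce pattern distributed engines run across a cluster of nodes:

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter
from functools import reduce

def count_words(partition):
    """Map step: compute partial word counts for one partition."""
    return Counter(partition.split())

def word_count(partitions):
    """Run the map step in parallel, then merge the partial results.

    A real framework would ship each partition to a different node;
    here a local thread pool stands in for the cluster.
    """
    with ThreadPoolExecutor() as pool:
        partials = pool.map(count_words, partitions)
    return reduce(lambda a, b: a + b, partials, Counter())

counts = word_count(["big data big", "data tools data"])
print(counts["data"])  # 3
```

The value of a real framework is everything this sketch omits: moving partitions between machines, retrying failed workers, and scaling past the memory of any single node.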
How do big data frameworks differ from regular data tools?
Regular data tools often work well on a single machine or smaller datasets, while big data frameworks are built to distribute workloads across many nodes and handle higher scale, more complex execution patterns, and infrastructure-level concerns.
Why do teams use big data frameworks?
Teams use them to process large datasets more efficiently, support scalable ETL and analytics workloads, manage streaming data, and reduce the operational risk of running high-volume jobs.
Are big data frameworks only for large companies?
No. They are most valuable at larger scales, but even smaller teams may use them when workloads involve distributed computation, event streams, or data volumes that are too large for simpler approaches.
How should you choose a big data framework?
Look at batch versus streaming support, cluster requirements, fault tolerance, performance, programming model, integration with your storage systems, observability, and how much operational overhead the framework introduces.
What is the difference between batch and stream processing?
Batch processing handles data in large scheduled groups, while stream processing handles data continuously, in or near real time, as events arrive.
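The contrast can be shown with a small, framework-free Python sketch (function names are illustrative): the batch function needs the whole dataset up front and produces one answer after the fact, while the streaming version emits an updated result as each event arrives.

```python
def batch_average(values):
    """Batch: the whole dataset is available before computation starts."""
    return sum(values) / len(values)

def streaming_average(events):
    """Stream: emit an updated running average as each event arrives."""
    total, count = 0.0, 0
    for value in events:
        total += value
        count += 1
        yield total / count

readings = [10, 20, 30]
print(batch_average(readings))            # 20.0  (one result, after all data)
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0]  (one per event)
```

Streaming engines build on this per-event model, adding windowing, state management, and delivery guarantees that a bare generator does not provide.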
Do big data frameworks replace data warehouses?
Not usually. They often complement warehouses by processing, transforming, or moving data before it reaches analytical storage, or by supporting workloads that are not well suited to warehouse execution alone.
Can big data frameworks handle real-time data?
Yes. Many frameworks support streaming or near-real-time processing, which makes them useful for analytics, monitoring, anomaly detection, and event-driven data products.
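As a minimal sketch of the anomaly-detection use case (the function name, window size, and threshold are made up for this example), the snippet below flags a streamed value when it deviates sharply from a moving average of recent events; a production pipeline would run this kind of logic inside a streaming engine with a more robust statistic:

```python
from collections import deque

def detect_anomalies(events, window=3, factor=2.0):
    """Flag values that deviate sharply from the recent moving average."""
    recent = deque(maxlen=window)   # bounded buffer of the last `window` events
    anomalies = []
    for value in events:
        if len(recent) == window:
            mean = sum(recent) / window
            # Crude relative check; real pipelines use sturdier statistics.
            if abs(value - mean) > factor * mean:
                anomalies.append(value)
        recent.append(value)
    return anomalies

print(detect_anomalies([10, 11, 10, 95, 10]))  # [95]
```

Because state is limited to a fixed-size window, this pattern runs in constant memory per key, which is what makes it practical to evaluate continuously over unbounded event streams.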