
Read the latest insights from the RepoRank editorial team.
Explore data pipeline tools for ingestion, transformation, movement, orchestration, scheduling, and reliability across modern data platforms. Compare the tools teams use to keep data flowing cleanly between systems at production scale.


Data engineering is the backbone of modern analytics and data-driven software, making it possible to collect, transform, move, and serve data reliably across systems. Open source repositories play a major role in this ecosystem by providing practical tooling for orchestration, pipelines, warehousing, streaming, and platform design.
The open source data engineering landscape includes ETL and ELT tools, workflow orchestration systems, transformation frameworks, stream processing projects, warehouse utilities, and infrastructure-focused repositories built for scalable data operations. RepoRank helps surface the repositories that are earning real attention and momentum.
This page helps you discover the data engineering tools that developers, analytics teams, and platform engineers are actively using, evaluating, and watching. RepoRank focuses on real GitHub growth signals, so the repositories surfaced here are active, relevant, and gaining adoption across data platform and infrastructure workflows. Whether you are building reliable pipelines, evaluating orchestration frameworks, or simply tracking the open source projects shaping modern data infrastructure, use this page to compare tools and stay current with what is gaining traction.
What are data pipeline tools?
Data pipeline tools ingest, move, transform, schedule, and monitor data as it flows between systems such as databases, warehouses, applications, and analytics platforms.
How do data pipeline tools differ from transformation tools?
Transformation tools focus mainly on changing or modeling data, while data pipeline tools cover the broader workflow: ingestion, scheduling, orchestration, retries, dependencies, and delivery between systems.
Why do teams use them?
They keep data moving reliably, reduce manual workflow management, improve observability, and support the growing complexity of modern data platforms.
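As a rough illustration of those stages, here is a minimal sketch of an ingest → transform → load run with a simple retry wrapper. The source rows, table name, and retry policy are invented for the example; real pipeline tools layer scheduling, dependency management, and observability on top of this core loop.

```python
import sqlite3
import time

def with_retries(fn, attempts=3, delay=0.1):
    """Retry a flaky pipeline step a few times before failing the run."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)

def ingest():
    # Pull raw rows from a source system (stubbed as literals here).
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]

def transform(rows):
    # Normalize types so the destination receives clean data.
    return [(r["user"], int(r["amount"])) for r in rows]

def load(rows, conn):
    # Deliver the transformed rows to the destination (SQLite stands in).
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user TEXT, amount INT)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
rows = with_retries(ingest)
load(transform(rows), conn)
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)  # → 15
```

Dedicated tools replace the hand-written retry wrapper and function chaining above with declarative configuration, which is what keeps pipelines maintainable as the number of sources grows.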
How should you evaluate a data pipeline tool?
Look at source and destination support, orchestration features, reliability, observability, scaling behavior, developer experience, deployment model, and how well the tool fits your architecture.
What is the difference between ETL and ELT?
ETL transforms data before loading it into the destination, while ELT loads raw data first and performs transformations later, often inside the warehouse or processing environment.
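A minimal sketch of that distinction, using SQLite in place of a warehouse (the weather data, table names, and conversion are invented for illustration): the ETL path cleans rows in Python before loading, while the ELT path loads raw text and transforms it afterwards with SQL inside the database.

```python
import sqlite3

raw = [{"city": " NYC ", "temp_f": "68"}]

def to_celsius(f):
    return round((f - 32) * 5 / 9, 1)

# ETL: transform in the pipeline, then load only clean, typed rows.
etl_conn = sqlite3.connect(":memory:")
etl_conn.execute("CREATE TABLE weather (city TEXT, temp_c REAL)")
etl_rows = [(r["city"].strip(), to_celsius(int(r["temp_f"]))) for r in raw]
etl_conn.executemany("INSERT INTO weather VALUES (?, ?)", etl_rows)

# ELT: load raw data first, transform later inside the "warehouse" via SQL.
elt_conn = sqlite3.connect(":memory:")
elt_conn.execute("CREATE TABLE raw_weather (city TEXT, temp_f TEXT)")
elt_conn.executemany("INSERT INTO raw_weather VALUES (?, ?)",
                     [(r["city"], r["temp_f"]) for r in raw])
elt_conn.execute("""
    CREATE TABLE weather AS
    SELECT TRIM(city) AS city,
           ROUND((CAST(temp_f AS REAL) - 32) * 5 / 9, 1) AS temp_c
    FROM raw_weather
""")

print(etl_conn.execute("SELECT * FROM weather").fetchone())  # ('NYC', 20.0)
print(elt_conn.execute("SELECT * FROM weather").fetchone())  # ('NYC', 20.0)
```

Both paths end with the same clean table; the practical difference is where the transformation compute runs and whether the raw data is retained for later reprocessing.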
Do you always need a dedicated data pipeline tool?
Not always at first, but as sources multiply and reliability requirements increase, dedicated tooling often becomes necessary to avoid brittle scripts and hard-to-maintain workflows.
Can data pipeline tools handle real-time data?
Yes. Some focus on scheduled batch workflows, while others support streaming or low-latency event pipelines for near-real-time movement and processing.
Where do orchestration tools fit in?
Orchestration tools coordinate task order, dependencies, scheduling, retries, and monitoring, making them a critical layer for managing complex multi-step data workflows.
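The dependency-ordering part of that job can be sketched with Python's standard-library graphlib. The task names below are hypothetical, and real orchestrators add scheduling, retries, and monitoring on top of this core topological-sort idea.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_tables": {"extract_orders", "extract_users"},
    "publish_report": {"join_tables"},
}

def run_task(name):
    # A real orchestrator would launch the task, retry on failure,
    # and record its status; here we just log it.
    print(f"running {name}")

order = []
for task in TopologicalSorter(dag).static_order():
    run_task(task)
    order.append(task)

# Dependencies are respected: the join runs only after both extracts,
# and the report publishes last.
assert order.index("join_tables") > order.index("extract_orders")
assert order[-1] == "publish_report"
```

Declaring dependencies as a graph rather than a fixed script is what lets orchestrators parallelize independent tasks and resume a run from the failed step.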