RepoRank

Data Engineering Repositories & Open Source Data Infrastructure Projects

Explore the most popular data engineering repositories, pipeline tools, and open source data infrastructure projects. From ETL workflows and orchestration systems to warehousing utilities, streaming platforms, and data platform tooling, discover which data engineering projects are gaining traction on GitHub.

Explore Open Source Data Engineering

Data engineering is the backbone of modern analytics and data-driven software, making it possible to collect, transform, move, and serve data reliably across systems. Open source repositories play a major role in this ecosystem by providing practical tooling for orchestration, pipelines, warehousing, streaming, and platform design.

The open source data engineering landscape includes ETL and ELT tools, workflow orchestration systems, transformation frameworks, stream processing projects, warehouse utilities, and infrastructure-focused repositories built for scalable data operations. RepoRank helps surface the repositories that are earning real attention and momentum.

What You Will Find Here

  • ETL, ELT, and data pipeline repositories
  • Workflow orchestration and transformation tooling
  • Streaming, warehousing, and data platform projects
  • Emerging data engineering repositories gaining traction

This page helps you discover the data engineering tools developers, analytics teams, and platform engineers are actively using, evaluating, and watching.

Why RepoRank Is Different

RepoRank focuses on GitHub growth signals such as star growth and repository activity, helping you identify data engineering repositories that are active, relevant, and gaining adoption across data platform and infrastructure workflows.

  • Live GitHub star growth and activity tracking
  • A mix of established data infrastructure tools and rising projects
  • A discovery layer built for practical data platform work

Built for Data Engineers, Platform Teams, and Analytics Organizations

Whether you are building reliable pipelines, evaluating orchestration frameworks, or tracking open source repositories shaping modern data infrastructure, this page helps you stay close to the projects gaining traction across data engineering.

  • Data engineers building pipelines and transformation workflows
  • Platform teams evaluating warehouse and orchestration tooling
  • Organizations tracking fast-moving open source data projects

Use this page to discover trending data engineering repositories, compare tools, and stay current with the open source projects shaping modern data infrastructure.

Data Engineering FAQ

What are data engineering repositories?

Data engineering repositories are open source codebases related to moving, transforming, orchestrating, storing, and serving data across modern systems and analytics workflows.

What types of data engineering projects are included here?

This page includes ETL and ELT tools, orchestration systems, transformation frameworks, streaming projects, warehouse utilities, and broader open source repositories for data infrastructure.

How does RepoRank rank data engineering repositories?

RepoRank uses real GitHub growth signals such as star growth, activity, and project momentum to surface data engineering projects that are gaining traction.
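
As a rough illustration of how signals like these could be combined, here is a minimal Python sketch. It is not RepoRank's actual formula: the RepoSnapshot fields, the growth_score function, and the activity_weight value are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class RepoSnapshot:
    """Hypothetical record of a repository at two points in time."""
    name: str
    stars_before: int       # star count at the start of the window
    stars_now: int          # star count at the end of the window
    commits_last_30d: int   # simple stand-in for "activity"


def growth_score(repo: RepoSnapshot, activity_weight: float = 0.1) -> float:
    """Combine relative star growth with a small activity bonus (assumed weighting)."""
    gained = repo.stars_now - repo.stars_before
    # Relative growth, so small repositories that grow quickly are not
    # drowned out by large ones; max(..., 1) avoids division by zero.
    relative_growth = gained / max(repo.stars_before, 1)
    # Activity bonus: average commits per day over the window, scaled down.
    return relative_growth + activity_weight * repo.commits_last_30d / 30


def rank(repos: list[RepoSnapshot]) -> list[RepoSnapshot]:
    """Return repositories ordered from highest to lowest growth score."""
    return sorted(repos, key=growth_score, reverse=True)
```

With this scoring, a repository that goes from 100 to 150 stars would rank above one that goes from 1,000 to 1,100, because relative growth is weighted more heavily than raw star count.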

Are these data engineering repositories open source?

Yes, all featured repositories are open source projects sourced directly from GitHub.

Why should I track trending data engineering repositories?

Tracking trending data engineering repositories helps you discover new data platform workflows, compare infrastructure patterns, and evaluate the tools data teams are actively adopting.

Are data engineering tools only for large enterprises?

No. Data engineering tools are also useful for startups, product teams, and growing organizations that need reliable pipelines, better analytics foundations, or scalable data workflows.

What is the difference between data engineering and data science tools?

Data engineering tools focus on data movement, orchestration, transformation, infrastructure, and reliability, while data science tools are generally more focused on analysis, experimentation, modeling, and insight generation.

How do I choose the right data engineering repository?

Start with your data stack, scale needs, and workflow. Consider maintainability, ecosystem support, orchestration fit, operational complexity, documentation, and how well the repository aligns with your team and infrastructure.
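
One way to make that comparison concrete is a simple weighted checklist. The sketch below is only an illustration: the criterion names mirror the considerations above, while the 1 to 5 rating scale, the weights, and the weighted_score function are assumptions to adapt to your own team.

```python
# Criteria taken from the considerations above; "operational_simplicity"
# is the inverse of operational complexity so that higher is always better.
CRITERIA = (
    "maintainability",
    "ecosystem_support",
    "orchestration_fit",
    "operational_simplicity",
    "documentation",
)


def weighted_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion ratings (assumed scale: 1 to 5)."""
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


# Hypothetical ratings for one candidate repository.
candidate = {
    "maintainability": 5,
    "ecosystem_support": 3,
    "orchestration_fit": 4,
    "operational_simplicity": 2,
    "documentation": 4,
}
# Equal weights as a starting point; raise a weight to reflect team priorities.
equal_weights = {c: 1.0 for c in CRITERIA}
score = weighted_score(candidate, equal_weights)
```

Scoring each shortlisted repository with the same weights gives a consistent basis for comparison, though the ratings themselves remain a judgment call.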