Data Pipelines

Do you want to build an ETL pipeline?

Analysts and data scientists use SQL queries to pull data from the data storage underbelly of an enterprise. They mold, reshape, and analyze that data so it can offer revenue-generating business insights to the company. But analytics is only as good as the material it works with: if the underlying data is missing, compromised, incomplete, or wrong, so too will be the analysis and any inferences derived from it.

The 5 Best Data Pipeline Tools for 2022

In 2021, data analysts have access to more data than at any other time in history. Experts believe the amount of data generated in 2020 totaled 44 zettabytes, and humans will create around 463 exabytes every day by 2025. That's an unimaginable volume of data! All this data, however, is worthless unless you can process it, analyze it, and find insights hidden within it. Data pipelines help you do that.

The Ultimate Guide to Building a Data Pipeline

Data is the new oil. Almost every industry is becoming more data-driven, and this trend will only accelerate in the coming years. With so many organizations now relying on data for decision-making, they need to be able to access and analyze their information easily, which is exactly what data pipelines enable. This article will get you started building your own data pipeline.

ETL Pipeline vs. Data Pipeline: What's the Difference?

ETL pipelines and data pipelines are two concepts growing increasingly important as businesses keep adding applications to their tech stacks. More and more data is moving between systems, and this is where data and ETL pipelines play a crucial role. Take a social media comment, for example: it might be picked up by your social listening tool and registered in a sentiment analysis app.
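The social-media-comment example above can be sketched as a tiny ETL pipeline. All names here are hypothetical, and the keyword-based sentiment rule is a toy stand-in for a real sentiment model or API:

```python
# Minimal ETL sketch: extract raw comments, transform them with a toy
# sentiment rule, and load them into an in-memory "warehouse" list.

POSITIVE_WORDS = {"love", "great", "awesome"}

def extract():
    # Stand-in for pulling raw events from a social listening tool's API.
    return [{"user": "a", "text": "I love this product"},
            {"user": "b", "text": "this broke on day one"}]

def transform(events):
    # Toy keyword rule; a real pipeline would call a sentiment model here.
    for e in events:
        words = set(e["text"].lower().split())
        e["sentiment"] = "positive" if words & POSITIVE_WORDS else "negative"
    return events

def load(events, warehouse):
    # Stand-in for writing to a warehouse table.
    warehouse.extend(events)

warehouse = []
load(transform(extract()), warehouse)
print([e["sentiment"] for e in warehouse])  # ['positive', 'negative']
```

The distinction the article draws is that an ETL pipeline always ends in a load step like this one, while a data pipeline is the broader category of any system that moves data between applications.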

Optimizing your BigQuery incremental data ingestion pipelines

When you build a data warehouse, the important question is how to ingest data from the source system into the data warehouse. If the table is small, you can fully reload it on a regular basis; if the table is large, however, a common technique is to perform incremental table updates. This post demonstrates how you can enhance incremental pipeline performance when you ingest data into BigQuery.
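The core of an incremental update is merging only rows changed since the last ingestion watermark, keyed by primary key. In BigQuery this is typically done with a MERGE statement; the sketch below (all field names invented) simulates the same logic with plain dictionaries to show the idea:

```python
# Sketch of incremental ingestion: upsert only rows newer than the
# watermark instead of reloading the whole table.

def incremental_merge(target, batch, watermark):
    """Upsert rows from `batch` whose updated_at is past the watermark."""
    new_watermark = watermark
    for row in batch:
        if row["updated_at"] > watermark:
            target[row["id"]] = row  # insert or update by primary key
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

target = {1: {"id": 1, "value": "old", "updated_at": 10}}
batch = [
    {"id": 1, "value": "new", "updated_at": 15},    # changed row: updated
    {"id": 2, "value": "fresh", "updated_at": 12},  # new row: inserted
    {"id": 3, "value": "stale", "updated_at": 5},   # already ingested: skipped
]
wm = incremental_merge(target, batch, watermark=10)
print(wm, sorted(target))  # 15 [1, 2]
```

The returned watermark is persisted between runs, so each incremental load scans only the source rows produced since the previous run rather than the full table.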

Migrating Data Pipelines from Enterprise Schedulers to Airflow

At Airflow Summit 2021, Unravel’s co-founder and CTO, Shivnath Babu, and Hari Nyer, Senior Software Engineer, delivered a talk titled Lessons Learned while Migrating Data Pipelines from Enterprise Schedulers to Airflow. This story, along with the slides and videos included in it, comes from that presentation.

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to quickly deploy, monitor, and manage the life cycle of their Spark jobs. In addition, we allowed users to automate their jobs on a time-based schedule.
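A time-based schedule in Airflow is expressed as a DAG definition like the one below. This is a hedged configuration sketch only: the DAG id, task name, and cron schedule are hypothetical, the BashOperator echo stands in for submitting a real Spark job through the CDE jobs API, and running it requires an Apache Airflow 2.x installation.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical nightly pipeline, triggered purely on a time-based schedule.
with DAG(
    dag_id="nightly_spark_transform",   # invented name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",      # run daily at 02:00
    catchup=False,
) as dag:
    # Placeholder for submitting a Spark transformation job.
    submit_spark_job = BashOperator(
        task_id="submit_spark_job",
        bash_command="echo 'submit Spark transform job'",
    )
```

The scheduler picks up this file, evaluates the cron expression, and triggers a DAG run each night, replacing the fixed time-based triggers of a legacy enterprise scheduler.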