blog.det.life
162.159.153.4

Submitted URL: http://blog.det.life/
Effective URL: https://blog.det.life/?gi=19c554cf5094
Submission: On August 16 via manual from MA — Scanned from DE





DATA ENGINEER THINGS


INSIGHTS AND IDEAS ON DATA AND ENGINEERING.


ETL, Data Architecture, Optimization, Interview Guide, Career Growth, AI in Data Engineering, About, Contribute
How does Notion handle 200 billion data entities?
From PostgreSQL → Data Lake
Vu Trinh
Aug 6
Trending Now
No, Data Engineers Don’t NEED dbt.
But It Sure Does Solve a Lot of Problems
Leo Godin
Jul 19
A Practitioner’s Guide to Developing Data Engineering Solutions with Databricks
Development Approaches, Environments, CI/CD and Testing with Databricks
Eduard Popa
Jul 25
Apache Kafka — Important Designs
Filesystem, Zero-copy, and Batching
Vu Trinh
Jul 13
Diving Deep into LinkedIn’s Data Infrastructure: My 6-Hour Learning & Key
Takeaways
Things I distill after reading the paper: Data Infrastructure at LinkedIn
Vu Trinh
Aug 3
Netflix Maestro and Apache Airflow — Competitors or Companions in Workflow
Orchestration?
How Netflix Maestro and Apache Airflow complement each other. Delve into their
features, strengths, and use cases.
Volker Janz
Jul 29
Getting Started with APIs for Data Engineers
So what are APIs and what do they do?
Aminat Lawal
Jul 26
Latest stories
Creating Business Value with Databricks: The Role of Solution Architects
Bridging the gap between stakeholders and data teams to bring valuable data
solutions into production
Eduard Popa
Aug 15
How Did LinkedIn Handle 7 Trillion Messages Daily With Apache Kafka?
Was adding more machines enough?
Vu Trinh
Aug 14
Timeless Skills for Navigating the Evolving World of Data Engineering
What technologies and programming languages should you learn to become a data
engineer?
Ben Rogojan
Aug 12
Perhaps the ultimate Orchestration Tool was in front of us all along
Hopefully you’ve been using this all along
Hugo Lu
Aug 11
I spent 4 hours learning Apache Iceberg. Here’s what I found.
The table format’s overview and architecture
Vu Trinh
Aug 10
Understanding Flight Cancellations and Rescheduling in Airlines Using Databricks
and PySpark
Using Databricks and PySpark for Enhanced Flight Operations in the Airline
Industry.
Brahmareddy, The Data Engineer.
Aug 9
Big-O Essentials for Data Engineers
Essential Concepts to Enhance Your Coding Efficiency
Santosh Joshi
Aug 8
This is What I will do to Become a Data Engineer in 2025
Discover how to become a data engineer with this comprehensive guide, whether
you’re a beginner or an intermediate software…
Syed Kadar Ansari Syed Ahamed
Aug 5
Adding a custom source to PyAirbyte using the no-code builder
Learn how to build your customized data sources with PyAirbyte
Felix Gutierrez
Aug 4
Make Your Own Data Diff CLI from Scratch using DBT, Snowflake and Python: Part 1
Background
Matthew Macias
Aug 1
Eliminate Data Errors: Four SQL Techniques to Enhance Data Quality
Introduction
Rajanikant Vellaturi
Jul 30
Batch to Streaming ETL with Redpanda Connect
This post covers a demo of how to convert a batch delivered complex CSV file
into a realtime stream in AVRO format. It is split into…
Mark Olliver
Jul 29
Migrating Your Existing ELT Data Pipeline to PyAirbyte
Leverage the power of data integration with PyAirbyte
Felix Gutierrez
Jul 27
Apache Kafka — Consumer
The clients who read
Vu Trinh
Jul 27
Perfect Data Pipeline: How to Build Them Nearly Flawless
Great for data engineers aiming to optimize data workflows and decision-making
processes in their projects.
Rui Carvalho
Jul 26
A Data Quality Starter Toolkit: Building Trustworthy Data with YData, Soda, and
pandas
A hands-on walkthrough of some key data quality tooling
Eva Revear
Jul 21
Stream Processing Systems: RisingWave vs ksqlDB
Understand the differences between ksqlDB and RisingWave, two powerful streaming
systems to decide the right solution for your use case.
RisingWave Labs
Jul 21
CAP theorem — What Every Data Engineer Should Know
A Data Engineer’s Guide to Balancing Consistency, Availability, and Partition
Tolerance
Santosh Joshi
Jul 20
Apache Kafka — Producer
The clients who write
Vu Trinh
Jul 20
Exploring Advanced Open Data Formats: Apache Hudi, Apache Iceberg, and Delta
Lake
Discover the in-depth details of Apache Hudi, Apache Iceberg, and Delta Lake
Syed Kadar Ansari Syed Ahamed
Jul 18
DBT isn’t dynamic: Part 2
A full data pipeline explanation
Cai Parry-Jones
Jul 13
How I built a Scalable, Robust, and Cost-Effective Data Platform for a Fintech
Company
In today’s data-driven business world, having a unified data platform is crucial
for effectively storing and processing data based on…
Sainath
Jul 12
Execute Azure Data Factory REST APIs with Python
Understand the step-by-step process to execute Azure Data Factory (ADF) REST
APIs with Python. The guide includes multiple working…
Rahul Madhani
Jul 12
Building a Local Data Lake from scratch with MinIO, Iceberg, Spark, StarRocks,
Mage, and Docker
Hello again, fellow technology enthusiasts! I am a software/data engineer who
transitioned from data science. The learning curve in this…
George Zefkilis
Jul 12
Data Modeling with Snowflake: A concise critical review
Should you skip, borrow, or buy?
Chad Isenberg
Jul 12
Data Engineer Things
Things learned in our data engineering journey and ideas on data and
engineering.
Followers
7.3K