blog.det.life
162.159.152.4  Public Scan

Submitted URL: https://blog.det.life/?source=read_next_recirc-----a6c9da65c335----0---------------------7693cce9_c520_4fad_bf2c_73344...
Effective URL: https://blog.det.life/?gi=a9f2c974eb06&source=read_next_recirc-----a6c9da65c335----0---------------------7693cce9_c520...
Submission: On June 21 via api from US — Scanned from DE

Form analysis 0 forms found in the DOM

Text Content




DATA ENGINEER THINGS


INSIGHTS AND IDEAS ON DATA AND ENGINEERING.


Dataframe API Roundup 2024

A survey of current tooling
Chad Isenberg
Jun 16
Trending Now
How Twitter processes 4 billion events in real-time daily

From Lambda to Kappa
Vu Trinh
May 25
Roadmap to Learn Data Engineering: How I Would Start Again

A completely free curriculum I wished I had
Wei Chun
Jun 7
Minds and Machines — AI for Mental Health Support, Fine-Tuning LLMs with LoRA in
Practice

Explore the potential of Large Language Models (LLMs) changing the future of
mental healthcare and learn how to fine-tune LLMs by example
Volker Janz
May 19
The Hadoop Distributed File System

Everything you need to know about the HDFS
Vu Trinh
May 24
Real-Time Data Processing: Spark Streaming vs. Flink

Choosing the right tool for handling big data in real-time
Steffi Christopher
May 28
How to Create First Data Engineering Project? An Incremental Project Roadmap

Build data engineering projects with this incremental approach for guaranteed
success. Break out of tutorial hell and stop procrastinating.
Saikat Dutta
Aug 20, 2023
Latest stories
Building Secure Data Platforms: A Guide for Team’s Structure & Access Strategies

A step-by-step guide to design and build the right structure and data access
strategies to handle various data access patterns across…
Hussein Jundi
Jun 15
Writing Clean and Maintainable Code in Python

Key principles and practices
Ana Escobar
Jun 15
The Architecture of Apache Druid

When Hadoop can solve every problem
Vu Trinh
Jun 15
Automating SSL/TLS Certificate Management in Data Engineering

Securing your Data Engineering projects using Let’s Encrypt, Certbot, Docker,
and Docker Compose
George Matheou
Jun 13
Azure Databricks in the Enterprise Context: Networking

A Comprehensive Overview of Network Security and Compliance with Databricks
Eduard Popa
Jun 7
What Consistency Really Means in Data Systems?

Consistency varies significantly across databases, distributed systems, and
streaming systems.
RisingWave Labs
Jun 7
How to build a Data Pipeline with AWS Glue and Terraform

A step-by-step guide to an ETL project that explores Australian property price
Bella Jiang
Jun 6
A Brief History of Data Management — From Relational Databases to Data
Lakehouses

How we evolved to modern data management approaches and what we should know as
data engineers
Ihor Lukianov
Jun 2
Everything you need to know about MapReduce

All the key insights from the paper MapReduce: Simplified Data Processing on
Large Clusters from Google
Vu Trinh
Jun 1
Bloom Filter in Short

Set.contains() at scale with some False Positives
Susmit
May 30
Test Driven Development for Data Engineering (Part 1)

How to write unit tests for data engineering
Yaakov Bressler
May 28
Granular Look at Left, Semi, and Anti Joins in PySpark

In data operations, understanding the inner workings of the various types of
joins can optimize query performance and accuracy. Spark…
Nicholas Piesco
May 20
Customer segmentation using Spark ML and Scikit learn in Spark— part 3

Introduction:
Suhaib Arshad
May 16
Understanding Snowflake Table Locks

A hands-on look at table locks.
Jonathan Duran
May 16
EDA and Data Transformation using PySpark — part 1

GitHub repository
Suhaib Arshad
May 16
Automate Dbt Date Logic with Python — Part 2

Simplifying Our Models and Tests From Part 1 Using Meta Config
Leo Godin
May 14
The Inheritance Schema Design Pattern for MongoDB Data Modelling

In the world of NoSQL databases, particularly MongoDB, designing an efficient
data model is crucial for optimal application performance…
Karen Zhang
May 12
How I build an ETL pipeline with AWS Glue, Lambda, and Terraform

A Step-by-Step Guide
Lorena Gongang
May 12
Enhance your data quality tests with the dataform-assertions package

dbt is no longer the only choice for testing data pipelines
Fumiaki Kobayashi
May 12
My Data Pipeline Orchestrators Journey

Originally Posted at: www.junaideffendi.com
Junaid Effendi
May 5
I spent 5 hours understanding more about the Delta Lake table format

All insights from the paper: Delta Lake: High-Performance ACID Table Storage
over Cloud Object Stores
Vu Trinh
May 4
What is something we have but don’t own and is never working when you need it.

Testing is difficult but pains could be eased with unified tooling. Here we
explore the pros and cons of testing with new tools to help us
Peter Flook
May 2
Installing (and Switching between) Different Versions of Python

How to install and switch between different Python versions.
Yaakov Bressler
May 1
I completed a Senior Data Engineer Code Challenge for fun, and this is how it
went. PART II

Question: Using MySQL’s public employee sample database, create a DAG to move
data from the employees table to BigQuery.
Jennifer Ebe
Apr 27
How We Integrate 1000++ Hive Tables into Data Warehouse Without ETL Seamlessly

Migrating our data warehouse to Greenplum enables us to access data from Hive in
real time, eliminate storage issues, and much more!
Bernard Adhitya
Apr 26
Data Engineer Things
Things learned in our data engineering journey and ideas on data and
engineering.
Followers: 3.9K