blog.det.life
Submitted URL: https://blog.det.life/?source=read_next_recirc-----a6c9da65c335----0---------------------7693cce9_c520_4fad_bf2c_73344...
Effective URL: https://blog.det.life/?gi=a9f2c974eb06&source=read_next_recirc-----a6c9da65c335----0---------------------7693cce9_c520...
Submission: On June 21 via api from US — Scanned from DE
Data Engineer Things
Insights and ideas on data and engineering.

Dataframe API Roundup 2024
A survey of current tooling
Chad Isenberg, Jun 16

Trending Now

How Twitter processes 4 billion events in real-time daily
From Lambda to Kappa
Vu Trinh, May 25

Roadmap to Learn Data Engineering: How I Would Start Again
A completely free curriculum I wished I had
Wei Chun, Jun 7

Minds and Machines — AI for Mental Health Support, Fine-Tuning LLMs with LoRA in Practice
Explore the potential of Large Language Models (LLMs) changing the future of mental healthcare and learn how to fine-tune LLMs by example
Volker Janz, May 19

The Hadoop Distributed File System
Everything you need to know about the HDFS
Vu Trinh, May 24

Real-Time Data Processing: Spark Streaming vs. Flink
Choosing the right tool for handling big data in real-time
Steffi Christopher, May 28

How to Create First Data Engineering Project? An Incremental Project Roadmap
Build Data Engineering projects in this incremental approach for guaranteed success. Break tutorial hell and stop procrastinating.
Saikat Dutta, Aug 20, 2023

Latest stories

Building Secure Data Platforms: A Guide for Team’s Structure & Access Strategies
A step-by-step guide to design and build the right structure and data access strategies to handle various data access patterns across…
Hussein Jundi, Jun 15

Writing Clean and Maintainable Code in Python
Key principles and practices
Ana Escobar, Jun 15

The Architecture of Apache Druid
When Hadoop can solve every problem
Vu Trinh, Jun 15

Automating SSL/TLS Certificate Management in Data Engineering
Securing your Data Engineering projects using Let’s Encrypt, Certbot, Docker, and Docker Compose
George Matheou, Jun 13

Azure Databricks in the Enterprise Context: Networking
A Comprehensive Overview of Network Security and Compliance with Databricks
Eduard Popa, Jun 7

What Consistency Really Means in Data Systems?
Consistency varies significantly across databases, distributed systems, and streaming systems.
RisingWave Labs, Jun 7

How to build a Data Pipeline with AWS Glue and Terraform
A step-by-step guide to an ETL project that explores Australian property price
Bella Jiang, Jun 6

A Brief History of Data Management — From Relational Databases to Data Lakehouses
How we evolved to modern data management approaches and what should we know as Data Engineers
Ihor Lukianov, Jun 2

Everything you need to know about MapReduce
All the key insights from the paper MapReduce: Simplified Data Processing on Large Clusters from Google
Vu Trinh, Jun 1

Bloom Filter in Short
Set.contains() at scale with some False Positives
Susmit, May 30

Test Driven Development for Data Engineering (Part 1)
How to write unit tests for data engineering
Yaakov Bressler, May 28

Granular Look at Left, Semi, and Anti Joins in PySpark
In data operations, understanding the inner-working of the various types of joins can optimize query performance and accuracy. Spark…
Nicholas Piesco, May 20

Customer segmentation using Spark ML and Scikit learn in Spark— part 3
Introduction:
Suhaib Arshad, May 16

Understanding Snowflake Table Locks
A hands-on look at table locks.
Jonathan Duran, May 16

EDA and Data Transformation using PySpark — part 1
GitHub repository
Suhaib Arshad, May 16

Automate Dbt Date Logic with Python — Part 2
Simplifying Our Models and Tests From Part 1 Using Meta Config
Leo Godin, May 14

The Inheritance Schema Design Pattern for MongoDB Data Modelling
In the world of NoSQL databases, particularly MongoDB, designing an efficient data model is crucial for optimal application performance…
Karen Zhang, May 12

How I build an ETL pipeline with AWS Glue, Lambda, and Terraform
A Step-by-Step Guide
Lorena Gongang, May 12

Enhance your data quality tests with the dataform-assertions package
dbt is no longer the only choice for testing data pipelines
Fumiaki Kobayashi, May 12

My Data Pipeline Orchestrators Journey
Originally Posted at: www.junaideffendi.com
Junaid Effendi, May 5

I spent 5 hours understanding more about the Delta Lake table format
All insights from the paper: Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
Vu Trinh, May 4

What is something we have but don’t own and is never working when you need it.
Testing is difficult but pains could be eased with unified tooling. Here we explore the pros and cons of testing with new tools to help us
Peter Flook, May 2

Installing (and Switching between) Different Versions of Python
How to install and switch between different python versions.
Yaakov Bressler, May 1

I completed a Senior Data Engineer Code Challenge for fun, and this is how it went. PART II
Question: Using MySQL’s public employee sample database, create a DAG to move data from the employee’s table to BigQuery.
Jennifer Ebe, Apr 27

How We Integrate 1000++ Hive Tables into Data Warehouse Without ETL Seamlessly
Migrating our data warehouse to Greenplum enables us to access data from Hive in real-time, eliminate storage issue, and much more!
Bernard Adhitya, Apr 26

About Data Engineer Things
Things learned in our data engineering journey and ideas on data and engineering.
Followers: 3.9K