Blog

Apache Kafka - Achieving Exactly-Once Semantics (EOS)

November 2025 by Dakshin Kafka, Distributed System, Event-driven

In distributed systems, message delivery guarantees are one of the most critical design considerations. Every pipeline must answer a deceptively simple question: how many times will this message be processed?

Traditionally, systems have offered two guarantees:

  • At-most-once delivery: Messages are processed zero or one time. This avoids duplicates but risks data loss if failures occur.
  • At-least-once delivery: Messages are processed one or more times. This ensures reliability but introduces duplicates that downstream systems must handle.
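To make the difference concrete, here is a minimal simulation rather than real Kafka client code. It assumes the outcome depends on whether a consumer records its position before or after processing a message. The `consume` function, its parameters, and the single simulated crash are all hypothetical and exist only for illustration.

```python
def consume(messages, commit_first, crash_at):
    """Simulate a consumer that crashes once at index `crash_at`,
    then restarts from the last committed offset.

    commit_first=True  -> offset is committed before processing (at-most-once)
    commit_first=False -> offset is committed after processing (at-least-once)
    """
    processed = []      # side effects seen by the downstream system
    committed = 0       # offset the consumer resumes from after a restart
    crashed = False     # the simulated crash happens only once
    offset = 0

    while offset < len(messages):
        msg = messages[offset]
        crash_now = (offset == crash_at and not crashed)

        if commit_first:
            committed = offset + 1          # commit, then process
            if crash_now:
                crashed = True
                offset = committed          # restart: this message is skipped
                continue
            processed.append(msg)
        else:
            processed.append(msg)           # process, then commit
            if crash_now:
                crashed = True
                offset = committed          # restart: this message is replayed
                continue
            committed = offset + 1

        offset += 1

    return processed


events = ["a", "b", "c"]
print(consume(events, commit_first=True, crash_at=1))   # ['a', 'c']
print(consume(events, commit_first=False, crash_at=1))  # ['a', 'b', 'b', 'c']
```

In this sketch, committing first loses `b`, while committing last delivers `b` twice. Those are the two failure modes described in the list above.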

In domains like payments, fraud detection, and compliance auditing, duplicates or lost events can have severe consequences.

Continue Reading

PostgreSQL Under Load - Part 2: Vacuum, Analyze, and Index

October 2025 by Dakshin Postgres, PgSQL, Table Partition, Performance Fix

In Part 1, we tackled partitioning — the first step to keeping Postgres performant under heavy write loads. Partitioning solved routing, not cleanup.

Postgres uses MVCC (Multi-Version Concurrency Control) to handle concurrent reads and writes. It’s brilliant, but it comes with a caveat: every update leaves behind dead tuples, every delete lingers until vacuumed, and every index quietly bloats over time.

This post dives into the cleanup side of scaling:

Continue Reading

PostgreSQL Under Load - Part 1: Partition Like a Pro

September 2025 by Dakshin Postgres, PgSQL, Table Partition, Performance Fix

This post is split into two parts for brevity.

  1. Part 1: Partition like a Pro
  2. Part 2: Vacuum, Analyze, and Index

Let’s start by setting the scene.

You’ve got a backend service humming in production - maybe it’s ingesting user events, transaction logs, or IoT metrics. Everything’s smooth until one day, your dashboards light up:

  • Insert latency spikes from 10ms to 500ms.
  • CPU usage climbs, even though your queries haven’t changed.
  • Autovacuum falls behind, and your disk usage balloons overnight.
  • A simple SELECT query takes 3 seconds. Three. Whole. Seconds.

You check the table: 400 million rows. No analytics, just trying to write fast and read fresh. But Postgres is groaning under the weight of your workload. It’s a write-heavy, recent-data problem, and Postgres can absolutely handle it, but only if you treat it right.

Continue Reading

Apache Kafka - The Architecture to Handle Billions of Events Per Day

August 2025 by Dakshin Kafka, Distributed System, Event-driven

The modern digital world operates on an unprecedented volume of data. Every click, transaction, sensor reading, and user interaction is an event, and the systems designed to process them must handle this stream in real time, reliably, and at massive scale.

The Scale Challenge

  • LinkedIn, Kafka’s creator, processes over 7 trillion messages per day.
  • Uber handles trillions of messages and multiple petabytes of data daily, from calculating ETAs to driver-rider matching.
  • FinTech companies handle over a billion events every day for transaction processing and real-time monitoring.

In this deep dive, we will move past the basic definitions of Kafka and explore the specific architectural patterns and configuration tuning strategies that allow Kafka to manage billions of events per day.

The Three Pillars of Kafka Scalability

Kafka’s remarkable scaling ability is rooted in a design philosophy that treats data not as records in a database, but as an immutable, distributed commit log. This simple, yet powerful, concept allows it to sustain immense throughput.

Continue Reading

Inside the JVM - Profiling & Diagnosing Memory Leaks in Java

August 2025 by Dakshin JVM, Java, OOM, Memory Leak

A few months back, while working on a backend service that used Drools for rule evaluation, I noticed something odd - JVM heap usage kept climbing, GC metrics were off, and eventually, the service crashed with an OutOfMemoryError. Profiling the JVM with VisualVM and Java Flight Recorder and analyzing a heap dump revealed a memory leak tied to rule engine objects.

The root cause? A Drools session resource that wasn’t closed properly. That one oversight led to retained references, a bloated heap, and a production outage.

Continue Reading

Let's Connect

for a cup of coffee, challenges, or conversations that spark something new

dakshin.g [at] outlook [dot] com
www.dakshin.cc