Vincent Nyanga

Two-Phase Commit: The Protocol Behind Every All-or-Nothing Guarantee

Vincent Nyanga — Fri, 12 Jun 2026 08:00:47 GMT

Introduction

You are standing at an ATM abroad. You request $200. Behind the scenes, two banks must coordinate: your home bank needs to debit your account, and the local bank needs to authorise the cash dispense. If your account is debited but the machine jams, you lose $200. If the machine dispenses cash but the debit fails, the local bank eats the loss. Neither outcome is acceptable. Both sides must succeed together, or neither proceeds.

That is the problem Two-Phase Commit (2PC) solves. When a transaction spans multiple systems, 2PC ensures that they all commit or roll back together—all or nothing.

Described in Jim Gray’s 1978 paper “Notes on Data Base Operating Systems”, 2PC is one of the oldest protocols in distributed computing. It powers ATM networks, airline booking systems, distributed databases, and enterprise platforms. Whenever money moves, inventory is reserved, or seats are booked, 2PC (or a protocol built on its ideas) is probably involved. In this post, we will examine how the protocol operates, the trade-offs it makes, the real-world systems that depend on it, and the modern patterns that complement it.

Let’s get started!

How Two-Phase Commit Works

The protocol involves a coordinator (the node managing the transaction) and multiple participants (the nodes holding the data). It unfolds in two phases.

Phase 1: Prepare (The Vote)

1. The coordinator sends a PREPARE message to every participant.

2. Each participant executes the transaction locally, acquires locks, writes undo/redo information to its write-ahead log, and ensures it can commit if asked.

3. Each participant responds with YES (I can commit) or NO (I cannot).

4. This is the critical moment: once a participant votes YES, it has made a binding promise. It cannot unilaterally back out.

Phase 2: Commit or Abort (The Decision)

1. If every participant voted YES, the coordinator writes a COMMIT record to its log and sends COMMIT to all participants.

2. If any participant voted NO, or failed to vote before a timeout, the coordinator sends ABORT to everyone.

3. Each participant executes the decision, releases its locks, and acknowledges back to the coordinator.

In the normal case, this takes two network round trips. The first carries N PREPARE messages and N votes. The second carries N COMMIT/ABORT messages and N acknowledgements. That is 4N messages for N participants, 3N of them before every participant knows the outcome. In a same-datacenter deployment, this adds roughly 2-5ms of overhead per transaction.
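The two phases can be sketched as a single-process simulation. This is a minimal sketch that assumes in-memory participants and no crashes or timeouts. `Participant`, `prepare()`, `finish()` and `two_phase_commit()` are hypothetical names, and the Python lists stand in for durable, force-written logs. The sketch counts four messages per participant: PREPARE, vote, decision and acknowledgement.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Participant:
    """In-memory stand-in for a resource manager (hypothetical, no real I/O)."""
    name: str
    can_commit: bool = True
    state: str = "INIT"
    log: List[str] = field(default_factory=list)  # stands in for a write-ahead log

    def prepare(self) -> str:
        # Phase 1: execute locally, take locks, force a log record, then vote.
        if not self.can_commit:
            self.state = "ABORTED"
            self.log.append("ABORT")
            return "NO"
        # Once PREPARED is logged, the participant may no longer abort on its own.
        self.state = "PREPARED"
        self.log.append("PREPARED")
        return "YES"

    def finish(self, decision: str) -> str:
        # Phase 2: apply the coordinator's decision, release locks, acknowledge.
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"
        self.log.append(decision)
        return "ACK"


def two_phase_commit(participants: List[Participant],
                     coordinator_log: List[str]) -> Tuple[str, int]:
    messages = 0
    votes = []
    for p in participants:               # Phase 1: PREPARE out, vote back
        votes.append(p.prepare())
        messages += 2
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"
    coordinator_log.append(decision)     # stands in for a forced write of the decision
    for p in participants:               # Phase 2: decision out, ACK back
        p.finish(decision)
        messages += 2
    return decision, messages
```

The blocking window discussed next sits between `coordinator_log.append(decision)` and the Phase 2 loop. A participant left in the PREPARED state that never receives `finish()` can neither commit nor abort on its own.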

The Trade-Off: Blocking

Every engineering decision has a cost. For 2PC, the cost is the potential for blocking.

Consider this scenario: all participants vote YES in Phase 1. The coordinator receives every vote and decides to commit. But before it can broadcast that decision, it crashes.

Now the participants are stuck. They voted YES, so they surrendered their right to abort. But they never received the commit message, so they cannot commit either. Their locks remain held. Their resources are frozen. They must wait indefinitely for the coordinator to recover.

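This is not a theoretical concern. Bernstein, Hadzilacos, and Goodman showed two results about atomic commitment protocols. First, when communication failures or total site failures are possible, every such protocol can leave some participants blocked. Second, no such protocol can guarantee that a failed participant recovers independently. Blocking is baked into the math.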

The dangerous window is typically very short (on the order of milliseconds in a healthy system). But when a coordinator failure coincides with that window, the blocked participants keep their locks, and the resulting cascade of lock contention can stall significant portions of a database.

Why Not Three-Phase Commit?

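Three-Phase Commit (3PC), proposed by Dale Skeen in 1981, adds an intermediate “pre-commit” phase to eliminate blocking. In theory, it works. In practice, 3PC assumes a fail-stop model with bounded message delays and no network partitions. Real networks have partitions. When one splits the participants, each side can run the termination protocol and reach a different decision, so some nodes commit while others abort. That is exactly the atomicity violation a commit protocol exists to prevent.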

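3PC also requires three round trips instead of two, adding latency to every transaction, even in the normal case. The result: few, if any, mainstream production systems use 3PC. The industry moved in a different direction entirely. It kept 2PC and made the coordinator and participants fault-tolerant by replicating them with consensus protocols such as Paxos or Raft.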

2PC in the Wild

2PC is far more pervasive than most engineers realise. You interact with it every day.

Financial Systems

When you withdraw cash from an ATM that belongs to a different bank, 2PC coordinates the transaction. Your bank must debit your account, and the ATM’s bank must authorise the dispensing of cash. Both sides must agree, or neither proceeds. Interbank transfers, payment processing networks, and stock trading settlement systems all rely on the same all-or-nothing guarantee. In finance, “partial success” is not an option.

Travel and Booking Systems

Book a vacation package with a flight, hotel, and rental car. The booking system requires all three reservations to succeed. If the flight is confirmed but the hotel is sold out, you don’t want to be stuck with a flight to a city with nowhere to stay. Travel aggregators use 2PC (or 2PC-like protocols) to coordinate across multiple reservation systems.

Enterprise Systems

Enterprise Resource Planning (ERP) systems frequently coordinate transactions across inventory, billing, and shipping databases. When an order is placed, inventory must be decremented, an invoice must be created, and a shipment must be scheduled. XA transactions (the standardised 2PC interface) have been the backbone of Java enterprise systems for decades, coordinating database writes with JMS message queue publishes atomically.

Distributed Databases

Inside modern distributed databases, 2PC is the standard for cross-shard transactions. The database hides the complexity from you, but the protocol is running under the hood.

Google Spanner: 2PC at Global Scale

Spanner is the canonical example. It uses 2PC on top of Paxos groups. When a transaction spans multiple shards, the Paxos group leaders coordinate via 2PC. Each Paxos group independently replicates data, so if a coordinator fails, Paxos elects a new leader and the 2PC can continue.

The breakthrough is TrueTime, a clock system based on GPS receivers and atomic clocks that provides globally bounded clock uncertainty (typically under 7ms). Spanner uses TrueTime to assign commit timestamps that guarantee external consistency, then waits (“commit-wait”) until the uncertainty window passes before declaring the transaction committed.
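Commit-wait can be sketched with a simulated clock. The Spanner paper describes a TrueTime API whose `now()` returns an interval [earliest, latest] and whose `after(t)` is true once t has definitely passed. The class below is a toy stand-in built on the local clock plus an assumed, fixed uncertainty `epsilon_s`. It is not Google’s implementation, and the names are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy TrueTime: local clock +/- an assumed, fixed uncertainty bound."""

    def __init__(self, epsilon_s: float):
        self.epsilon = epsilon_s

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(earliest=t - self.epsilon, latest=t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only once t has definitely passed, whatever the real clock error.
        return self.now().earliest > t


def commit_with_wait(tt: SimulatedTrueTime) -> float:
    # Assign the commit timestamp at the top of the current uncertainty window...
    s = tt.now().latest
    # ...and do not acknowledge until s is definitely in the past, so any
    # transaction that starts after the acknowledgement gets a larger timestamp.
    while not tt.after(s):
        time.sleep(tt.epsilon / 10)
    return s
```

In this model the wait lasts about 2 × epsilon, which is why keeping clock uncertainty small matters so much for Spanner’s commit latency.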

CockroachDB: Parallel Commits

CockroachDB, inspired by Spanner but without specialised hardware, faced a problem: traditional 2PC required two sequential rounds of consensus writes, doubling latency for cross-range transactions.

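Their solution, Parallel Commits, introduces a STAGED transaction status. The transaction record is written as STAGED, together with the list of in-flight writes, in parallel with the final batch of writes. Once the STAGED record and all of those writes have succeeded, the transaction is implicitly committed and cannot fail, so the client receives success immediately. The record is flipped to COMMITTED asynchronously afterwards. This cuts the final-batch commit latency in half: one round of consensus writes instead of two.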
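The rule that makes this safe is the implicit-commit check. A transaction whose record is STAGED is committed exactly when every write listed in that record has succeeded. Below is a toy model of that check only. `TxnRecord`, `resolve_status()` and the `write_succeeded` callback are hypothetical names. CockroachDB’s real status resolution also involves timestamps, heartbeats and a recovery procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TxnRecord:
    status: str                                        # "PENDING", "STAGED", "COMMITTED" or "ABORTED"
    in_flight_writes: List[str] = field(default_factory=list)


def resolve_status(record: TxnRecord, write_succeeded: Callable[[str], bool]) -> str:
    # Explicit outcomes need no further checks.
    if record.status != "STAGED":
        return record.status
    # STAGED and every listed write durable: implicitly committed, even though
    # the record has not been flipped to COMMITTED yet.
    if all(write_succeeded(key) for key in record.in_flight_writes):
        return "COMMITTED"
    # A listed write is missing, so the client cannot have been told "success";
    # a recovery step has to settle the outcome (e.g. block the write and abort).
    return "UNRESOLVED"
```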

Dropbox Edgestore: Modified 2PC at 10M Requests/Second

Dropbox’s metadata store handles over 10 million requests per second using a modified 2PC. Their key optimisation: only 5-10% of transactions actually cross shard boundaries (thanks to careful data colocation). For those that do, an external durable transaction record serves as the source of truth, and a strongly consistent cache absorbs 95%+ of reads.

When 2PC Shines

2PC is the right tool when:

Strong consistency is non-negotiable: financial transactions, inventory systems where overselling is catastrophic, and regulatory compliance
Participants are co-located: same data centre, sub-millisecond network latency, where the 2-5ms overhead is negligible
The participant count is small: 2-5 participants is the sweet spot
You control all participants: a single database system, a tightly coupled set of internal services, or systems that all support XA

2PC becomes more expensive when transactions are long-running (locks held for extended periods), participants are geographically distributed (100ms+ round-trip times), or the number of participants grows large. In those cases, the alternatives below can help.

Complementary Patterns

When 2PC is not the right fit, these patterns fill the gaps.

Saga Pattern

The most popular approach for coordinating across independent microservices. Each service executes a local transaction and publishes an event. If a step fails, compensating transactions undo previous steps. Sagas come in two flavours: choreography (event-driven, with no central coordinator) and orchestration (where a saga orchestrator directs each step). The trade-off is that sagas provide eventual consistency, not strong consistency. Other transactions can see intermediate states.
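An orchestrated saga boils down to a loop. It runs local transactions in order and, on failure, runs the compensations of the steps that already succeeded, in reverse order. Here is a minimal in-memory sketch. `run_saga()` and the `(name, action, compensation)` tuples are hypothetical. A real orchestrator would also persist its progress, retry steps, and make compensations idempotent.

```python
from typing import Callable, List, Tuple

# (step name, local transaction, compensating transaction)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    completed: List[Step] = []
    trace: List[str] = []
    for name, action, compensate in steps:
        try:
            action()                                  # local transaction in one service
            trace.append(f"did {name}")
            completed.append((name, action, compensate))
        except Exception:
            trace.append(f"failed {name}")
            # Undo what already happened, newest first.
            for done_name, _, undo in reversed(completed):
                undo()
                trace.append(f"undid {done_name}")
            return False, trace
    return True, trace
```

Between "did hotel" and "undid hotel", other transactions can observe the hotel booking. That is the intermediate-state visibility the paragraph above warns about.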

Transactional Outbox

Solves the specific “dual write” problem (updating a database and publishing a message atomically). The service writes both the business data and an outbox message to its local database in one transaction. A separate process reads the outbox and publishes to the message broker. No distributed transaction needed.

Deterministic Databases (Calvin)

Daniel Abadi’s research takes a different approach: eliminate 2PC by making execution deterministic. All nodes agree on the order of transactions before executing them. Because execution is deterministic, all replicas arrive at the same state without coordination. The catch: you need to know the full read/write set of every transaction in advance, which does not fit all workloads.
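The core idea can be shown in miniature: replicas that agree on an input order and execute deterministically end up in the same state. This illustrates deterministic execution only, not Calvin itself. The account names, `transfer()` and `apply_log()` are hypothetical, and Calvin adds a sequencing layer, a deterministic lock manager and partitioned execution on top.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


def transfer(src: str, dst: str, amount: int) -> Txn:
    def txn(state: State) -> None:
        # The abort rule depends only on the state itself, so every replica
        # makes the same decision without asking anyone.
        if state[src] >= amount:
            state[src] -= amount
            state[dst] += amount
    return txn


def apply_log(initial: State, log: List[Txn]) -> State:
    # Each replica executes the agreed-upon sequence, in order, one at a time.
    state = dict(initial)
    for txn in log:
        txn(state)
    return state
```

Note that `transfer()` must declare up front which keys it touches (`src` and `dst`). That is the "know the read/write set in advance" requirement in small form.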

Conclusion

Two-Phase Commit is 48 years old and still foundational. The protocol’s math has not changed: it guarantees atomicity at the cost of potential blocking. What has changed is how we layer other techniques on top of it.

Google wraps 2PC with Paxos and TrueTime to make it work at a global scale. CockroachDB halves its latency with Parallel Commits. ATM networks and booking systems rely on it every second of every day. And when the trade-offs don’t fit, sagas and outbox patterns step in as complements, not replacements.

The next time you withdraw cash from a foreign ATM, book a multi-leg trip, or watch a distributed database commit across shards, remember: 2PC is doing the heavy lifting. Forty-eight years in, the protocol is as relevant as ever.

Coding agents have robbed me of my emotional connection to software I produce

Vincent Nyanga — Fri, 05 Jun 2026 07:07:53 GMT

It’s 4 pm on a Wednesday, and the massive feature I started working on 2 days ago is all but done: 85% test coverage, clean, easy-to-follow architecture (doesn’t look like slop at all 😂), and all known edge cases covered. This should be cause for celebration. I’m moving way faster than I used to, still delivering at a high level, but I’m not feeling it; there’s no true sense of accomplishment. Is this normal?

I’ve been designing and building high-quality software systems for over a decade now, and each time I completed a project, especially one in which my technical boundaries were stretched, I’d have this euphoria, a dopamine hit second to none. I’d quickly rush to my ‘brag doc’, a living document where I jot down the proudest moments of my software engineering career. Gone are those days, at least for the most part. Ever since I started using coding agents (they are great 80-90% of the time, by the way), I don’t feel the same way about the software I produce.

This is my workflow when I’m using coding agents:

Draft a high-level spec, a 25,000-foot view of the feature, including diagrams in some instances.
Iterate over the spec with the agent till I’m satisfied it understands what I need to build. This normally takes hours, but some specs have taken more than a day.
Ask the agent to draft bite-sized chunks that it will build. If I’m using GitHub, I then create epics and issues from those chunks.
Ask the agent to work on one chunk at a time. I review its work as it’s busy working, as well as the PR when it’s done.
After a few hours for small tasks or a few days for larger features, the work is complete.

As you can see, I don’t throw the work at the agent to go figure it out and come back with a result. I’m actively involved, watching it with such immense attention that as soon as it starts to stray from the path, I jump right in to steer it back onto the right path.

Despite all this involvement, I don’t feel as emotionally attached to the end product, for some reason. The spec is mine, the high-level plan is mine; everything except the actual lines of code. This is baffling. Why don’t I feel as connected to the features I build using agents? Is it because I didn’t personally type the code? Is it because I didn’t ‘sweat’ as much while working on the feature? Why don’t I have the same dopamine hit?

I remember back in the day when I’d spend days on end researching the best way to solve a particular problem. I have a whole directory on my computer dedicated to proof-of-concept projects and spikes to verify or debunk certain assumptions. I’d go to page 10 of the Google search results, read countless forums, GitHub issues, and StackOverflow (my wife recently asked me why I haven’t mentioned StackOverflow in a long time), all in a quest to find the best solution. And man, when I finally found the solution! The feeling was second to none.

These days, all of this is wrapped in a tool call for an AI agent. What used to take me days is now done in a couple of seconds, minutes at most. That’s progress, right? Well, I’m not too sure, to be honest.

Am I moving quicker? Yes.

Am I still producing high-quality (in some cases, higher-quality) software? Yes.

Am I learning and growing as a practitioner? Absolutely not.

That may be why I’m struggling to feel connected to the work I produce with coding agents. When the work is done and the feature is built, I personally haven’t gained much from the experience. I feel stagnant, stuck.

Something’s got to change. Coding agents, by the looks of things, are here to stay, and they’re going to continuously get better (again, looking at the trends). The onus is now on me to find different ways to get the dopamine hit I used to get pre-AI. I need to find a way to feel connected to my work.

I will find a way, like I always do.

Global Error Handling with Problem Details (RFC 9457) in ASP.NET Core

Vincent Nyanga — Fri, 29 May 2026 08:00:39 GMT

Every API returns errors. The question is whether those errors are useful to the consumer or just a generic “something went wrong” message that forces them to guess what happened.

If your API returns `{ "error": "An error occurred" }` or, worse, a raw exception stack trace, your consumers are working harder than they should. There is a standard for this, and ASP.NET Core supports it out of the box.

In this post, I will show you how to implement consistent, machine-readable error responses using Problem Details (RFC 9457) and the `IExceptionHandler` interface.

What Is Problem Details?

Problem Details is an RFC standard (RFC 9457, which obsoletes RFC 7807) that defines a consistent format for HTTP API error responses. The content type is `application/problem+json`, and a response can include these standard members (all of them optional):

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.5",
  "title": "Not Found",
  "status": 404,
  "detail": "Article with ID '3fa85f64' was not found.",
  "instance": "/api/articles/3fa85f64"
}

type: A URI reference that identifies the problem type
title: A short, human-readable summary
status: The HTTP status code
detail: A human-readable explanation specific to this occurrence
instance: A URI reference identifying this specific occurrence

The format is extensible. You can add custom properties for additional context, like validation errors or trace IDs.

Setting Up in ASP.NET Core

The setup requires just three calls in your Program.cs:

builder.Services.AddProblemDetails();

builder.Services.AddExceptionHandler<GlobalExceptionHandler>();

var app = builder.Build();

app.UseExceptionHandler();

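AddProblemDetails() registers the Problem Details service so the framework's built-in middleware can emit Problem Details responses. AddExceptionHandler<T>() registers your custom exception handler. UseExceptionHandler() adds the exception-handling middleware. Call it early in the pipeline so it catches exceptions thrown by everything registered after it.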

Writing an Exception Handler

The IExceptionHandler interface has a single method: TryHandleAsync. It receives the HttpContext, the Exception, and a CancellationToken, and returns a ValueTask<bool>: true if it handled the exception, or false to pass it to the next handler.

using FluentValidation; // assumption: ValidationException is FluentValidation's
using Microsoft.AspNetCore.Diagnostics;
using Microsoft.AspNetCore.Mvc;

public class GlobalExceptionHandler : IExceptionHandler
{
  private readonly ILogger<GlobalExceptionHandler> _logger;

  public GlobalExceptionHandler(ILogger<GlobalExceptionHandler> logger)
  {
    _logger = logger;
  }

  public async ValueTask<bool> TryHandleAsync(
    HttpContext httpContext,
    Exception exception,
    CancellationToken cancellationToken)
  {
    _logger.LogError(exception, "An unhandled exception occurred");

    // Map exception types to status codes; NotFoundException is application-defined.
    var (statusCode, title) = exception switch
    {
      NotFoundException => (StatusCodes.Status404NotFound, "Not Found"),
      ValidationException => (StatusCodes.Status400BadRequest, "Validation Error"),
      UnauthorizedAccessException => (StatusCodes.Status403Forbidden, "Forbidden"),
      _ => (StatusCodes.Status500InternalServerError, "Internal Server Error")
    };

    var problemDetails = new ProblemDetails
    {
      Status = statusCode,
      Title = title,
      Detail = exception.Message, // consider a generic message for 500s to avoid leaking internals
      Instance = httpContext.Request.Path
    };

    httpContext.Response.StatusCode = statusCode;
    await httpContext.Response.WriteAsJsonAsync(problemDetails, cancellationToken);
    return true; // handled: no further handlers run
  }
}

Handler Chaining

One of the best features of IExceptionHandler is that you can register multiple handlers. They are called in the order they were registered, and the first handler that returns true wins.

This lets you write focused handlers for specific exception types:

using FluentValidation; // assumption: Errors with PropertyName/ErrorMessage come from FluentValidation
using Microsoft.AspNetCore.Diagnostics;
using Microsoft.AspNetCore.Mvc;

// Handles validation exceptions specifically
public class ValidationExceptionHandler : IExceptionHandler
{
  public async ValueTask<bool> TryHandleAsync(
    HttpContext httpContext,
    Exception exception,
    CancellationToken cancellationToken)
  {
    if (exception is not ValidationException validationException)
      return false; // not ours: let the next registered handler try

    // Group messages by property: { "Title": ["...", "..."] }
    var problemDetails = new ValidationProblemDetails(
        validationException.Errors
          .GroupBy(e => e.PropertyName)
          .ToDictionary(g => g.Key, g => g.Select(e => e.ErrorMessage).ToArray()))
    {
      Status = StatusCodes.Status400BadRequest,
      Title = "Validation Error",
      Instance = httpContext.Request.Path
    };

    httpContext.Response.StatusCode = StatusCodes.Status400BadRequest;
    await httpContext.Response.WriteAsJsonAsync(problemDetails, cancellationToken);
    return true;
  }
}

builder.Services.AddExceptionHandler<ValidationExceptionHandler>();
// ...register any other focused handlers here, in priority order...
builder.Services.AddExceptionHandler<GlobalExceptionHandler>(); // fallback

The validation handler checks if the exception is a ValidationException. If not, it returns false, and the next handler gets a chance. The global handler at the end catches everything else.

The Evolution of Error Handling in ASP.NET Core

It is worth understanding how we got here, because you will see all of these approaches in existing codebases:

1. Try-Catch in Every Controller (Don’t Do This)

[HttpGet("{id}")]
public async Task<IActionResult> GetArticle(Guid id)
{
  try
  {
    var article = await _repository.GetByIdAsync(id);
    return Ok(article);
  }
  catch (NotFoundException)
  {
    return NotFound();
  }
  catch (Exception ex)
  {
    _logger.LogError(ex, "Error getting article");
    return StatusCode(500);
  }
}

This duplicates error handling across every action method and produces inconsistent error responses.

2. Exception Handling Middleware (Manual)

A custom middleware that wraps the pipeline in a try-catch. Better than per-controller handling, but you end up writing the plumbing yourself.

3. UseExceptionHandler with Lambda

app.UseExceptionHandler(errorApp =>
{
  errorApp.Run(async context =>
  {
    context.Response.StatusCode = 500;
    await context.Response.WriteAsJsonAsync(
      new ProblemDetails { Title = "Error" });
  });
});

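Functional, but limited. The exception is only reachable if you pull IExceptionHandlerFeature out of the request features yourself. Differentiated responses therefore mean hand-written branching inside one lambda that is hard to test in isolation.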

4. IExceptionHandler (.NET 8+, Recommended)

The current best practice. Structured, chainable, testable, and fully integrated with the Problem Details framework.

.NET 9+: StatusCodeSelector

.NET 9 introduced StatusCodeSelector, which simplifies common exception-to-status-code mappings:

builder.Services.AddProblemDetails();
app.UseExceptionHandler(new ExceptionHandlerOptions
{
    StatusCodeSelector = ex => ex switch
    {
        ArgumentException => StatusCodes.Status400BadRequest,
        UnauthorizedAccessException => StatusCodes.Status401Unauthorized,
        _ => StatusCodes.Status500InternalServerError
    }
});

For straightforward mappings where you do not need custom logic, this significantly reduces boilerplate.

Adding Custom Extensions

Problem Details is extensible. Add a trace ID, error code, or any other context your consumers need:

var problemDetails = new ProblemDetails
{
  Status = statusCode,
  Title = title,
  Detail = exception.Message
};

// Activity comes from System.Diagnostics
problemDetails.Extensions["traceId"] =
  Activity.Current?.Id ?? httpContext.TraceIdentifier;
problemDetails.Extensions["errorCode"] = "ARTICLE_NOT_FOUND";

This produces:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.5",
  "title": "Not Found",
  "status": 404,
  "detail": "Article with ID '3fa85f64' was not found.",
  "traceId": "00-abc123-def456-01",
  "errorCode": "ARTICLE_NOT_FOUND"
}

Conclusion

Consistent error responses are not optional for a professional API. Problem Details provides a standard format your consumers can rely on, and IExceptionHandler offers a clean, testable way to produce those responses.

The setup is minimal: register Problem Details, write your exception handlers, and activate the middleware. From that point on, every error your API returns follows the same structure: the correct status code, a useful message, and any additional context the consumer needs.

Stop inventing error formats. Use the standard. Your API consumers will thank you.

Database Sharding

Vincent Nyanga — Fri, 15 May 2026 08:02:02 GMT

Introduction

Imagine you run a library. At first, one building holds all the books, and a single catalogue helps visitors find what they need. But the library grows. The shelves overflow, the aisles are packed, and the catalogue desk has a line out the door. You have two choices: build a bigger building (vertical scaling) or open multiple branches across the city and distribute the collection (horizontal scaling).

Database sharding is the second option. It’s the practice of splitting your data across multiple database instances, called shards, so that no single server has to bear the entire load. In this post, we’ll explore how sharding works, the strategies you can use, how to pick the right shard key, and the real-world pain of cross-shard queries. By the end, you’ll have a solid understanding of when sharding makes sense and how companies like Instagram, Discord, and Slack have implemented it at scale.

Let’s get started!

What Is Database Sharding?

Sharding is a form of horizontal partitioning where rows of a database table are distributed across multiple independent database instances. Each shard holds a subset of the data, and together they represent the complete dataset.

Unlike read replicas (which duplicate the entire dataset for read scaling), sharding splits the data itself. This means both reads 𝘢𝘯𝘥 writes scale horizontally, something replicas alone cannot achieve.

Going back to our library analogy: read replicas are like printing extra copies of the catalogue so more people can look things up at the same time. Sharding is like distributing the actual books across multiple branches, so no single building runs out of shelf space.

Sharding Strategies

There are three primary strategies for deciding which shard holds which data. Each comes with trade-offs.

Range-Based Sharding

Data is partitioned based on continuous ranges of the shard key. For example, user IDs 1 to 1,000,000 go to Shard A, 1,000,001 to 2,000,000 go to Shard B, and so on.

Advantages:

Keeps adjacent data together, making range scans efficient
Simple to implement and easy to reason about

Disadvantages:

Creates hotspots when new data clusters at one end of the range (the shard holding the newest data gets hammered with writes)
Leads to unbalanced shards over time

Best for: Time-series data where range queries are the dominant access pattern.
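A range-based router is essentially a sorted list of boundaries plus a binary search. The boundaries and shard names below are hypothetical and mirror the user ID example above.

```python
import bisect
from typing import List

# Hypothetical layout: inclusive upper bound of each shard's user_id range.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
RANGE_SHARDS = ["shard-a", "shard-b", "shard-c"]


def range_shard_for(user_id: int) -> str:
    # Binary search for the first range whose upper bound is >= user_id.
    i = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if i == len(RANGE_SHARDS):
        raise KeyError(f"user_id {user_id} is past the last range; add a shard")
    return RANGE_SHARDS[i]


def shards_for_range(lo: int, hi: int) -> List[str]:
    # A range scan touches only the contiguous shards overlapping [lo, hi].
    first = bisect.bisect_left(RANGE_UPPER_BOUNDS, lo)
    last = bisect.bisect_left(RANGE_UPPER_BOUNDS, hi)
    return RANGE_SHARDS[first:last + 1]
```

A monotonically increasing user_id always lands in the last range, which is the write hotspot described above.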

Hash-Based Sharding

A hash function is applied to the shard key, and the result determines which shard stores the record:

shard_number = hash(user_id) % number_of_shards

Advantages:

Distributes data and traffic most evenly across shards
Dramatically reduces hotspot risk

Disadvantages:

Range queries become expensive; fetching user IDs 2M to 3M may scatter across hundreds of shards, forcing scatter-gather operations
Adding or removing shards changes the modulo value, meaning most keys need rehashing

This is the most common general-purpose strategy. To mitigate the rehashing problem, many systems use consistent hashing, where data and servers are placed on a virtual ring. Adding a node only requires redistributing roughly 1/N of the data, rather than rehashing everything.

Best for: High-write workloads with primarily point lookups (get by ID).
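A consistent-hash ring with virtual nodes can be sketched with a sorted list and binary search. This is one common way to build it, under stated assumptions: MD5 (from `hashlib`) as the ring hash, 100 virtual nodes per server, and hypothetical node names. Production systems vary in all three. The sketch avoids Python’s built-in `hash()` because string hashing is randomised per process, which would route the same key differently on different app servers.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def stable_hash(key: str) -> int:
    # First 8 bytes of MD5: stable across processes and machines.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []   # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Several points per node smooth out the share of the ring each node owns.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash (wrapping).
        i = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


def modulo_shard(key: str, n: int) -> int:
    return stable_hash(key) % n
```

Adding a fifth node to four moves only the keys whose nearest point now belongs to the new node, roughly a fifth of them, and all of those keys move to the new node. Switching `hash(key) % 4` to `% 5` moves about 80% of keys.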

Directory-Based (Lookup) Sharding

A centralised lookup table maps shard keys to specific shards. Every query first consults this directory to find the target shard.

Advantages:

Maximum flexibility: move users between shards without changing application logic
Isolate high-traffic tenants onto dedicated shards

Disadvantages:

The directory itself can become a bottleneck or a single point of failure
Cache misses on the directory introduce extra network hops

Best for: Multi-tenant systems where tenants vary wildly in size.
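At its core, a directory is a key-to-shard map consulted before every query. The sketch below keeps it in memory with a naive round-robin placement for new tenants. `ShardDirectory`, its methods and the shard names are hypothetical. In production the map would live in a replicated, heavily cached store, so it does not become the single point of failure mentioned above.

```python
from itertools import cycle
from typing import Dict, Iterable


class ShardDirectory:
    def __init__(self, shards: Iterable[str]):
        self._mapping: Dict[str, str] = {}   # tenant_id -> shard (the "directory")
        self._next = cycle(list(shards))     # naive placement policy for new tenants

    def shard_for(self, tenant_id: str) -> str:
        # Every query consults the directory first; unknown tenants get placed.
        if tenant_id not in self._mapping:
            self._mapping[tenant_id] = next(self._next)
        return self._mapping[tenant_id]

    def move(self, tenant_id: str, shard: str) -> None:
        # After copying the tenant's data, moving it is just a directory update;
        # code that calls shard_for() does not change.
        self._mapping[tenant_id] = shard
```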

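Here’s a quick comparison:

| Strategy | Data distribution | Range queries | Main drawback | Best for |
|---|---|---|---|---|
| Range-based | Can become unbalanced | Efficient | Write hotspots on the newest range | Time-series data |
| Hash-based | Most even | Scatter-gather across shards | Adding shards rehashes most keys (mitigated by consistent hashing) | High-write point lookups |
| Directory-based | Set per key by the directory | Depends on placement | Directory bottleneck or single point of failure | Multi-tenant systems with uneven tenant sizes |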

Choosing the Right Shard Key

If sharding is the commitment, the shard key is the marriage vow. Get it wrong, and you’ll feel the pain for years. Here’s what makes a good shard key:

High cardinality: The key must have many unique values. A user_id with millions of values is far better than a continent with only 7. Low cardinality literally caps how many effective shards you can have.
Even distribution: The ideal key distributes data roughly equally across all shards.
Alignment with query patterns: Choose a key that appears in your most common WHERE clauses. If most queries filter by customer_id, that should be your shard key. When queries lack a shard key, they become scatter-gather queries that are broadcast to every shard.
Immutability: The shard key should rarely (or never) change. Updating it means physically moving a row between shards.

Common Mistakes

1. Monotonically increasing keys (timestamps, auto-increment IDs) with range-based sharding: all new inserts hit the last shard forever, creating a permanent write hotspot.

2. Low-cardinality fields like status or country: you can never have more effective shards than you have distinct values.

3. Misalignment with queries: Sharding by region when queries primarily filter by customer_id forces every customer query to scan all shards.

The Cross-Shard Query Problem

This is where the real pain lives. Sharding breaks the relational model. If a user’s account lives on Shard 1 and their transaction record lives on Shard 2, you can’t use a single database transaction or a simple JOIN.

Think of it like our library branches: if a researcher needs books from three different branches, they can’t just walk to one shelf. They need to visit (or call) each branch, collect the results, and piece together what they need.
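When a query lacks the shard key, the router has to fan it out to every shard and merge the partial answers. This is the scatter-gather pattern. Below is a minimal sketch. It assumes in-memory lists stand in for shards and that each shard returns rows already sorted by the ordering column. The shard contents and function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (created_at, order_id)

# Hypothetical shards, each already sorted by created_at.
ORDER_SHARDS: Dict[str, List[Row]] = {
    "shard-1": [(1, "o-1"), (4, "o-4")],
    "shard-2": [(2, "o-2"), (5, "o-5")],
    "shard-3": [(3, "o-3")],
}


def query_shard(shard: str, since: int) -> List[Row]:
    # Stand-in for "SELECT ... WHERE created_at >= ? ORDER BY created_at" on one shard.
    return [row for row in ORDER_SHARDS[shard] if row[0] >= since]


def scatter_gather(since: int, limit: int) -> List[Row]:
    # Scatter: send the same query to every shard concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda shard: query_shard(shard, since), ORDER_SHARDS))
    # Gather: k-way merge of the sorted partial results, then the global LIMIT.
    return list(islice(heapq.merge(*partials), limit))
```

Every shard does work even if only one holds matching rows, and the slowest shard sets the latency of the whole query.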

Solutions

1. Data Colocation (Prevention is Better Than Cure)

Design your schema so related data lands on the same shard. For example, if you shard by user_id, make sure a user’s orders, preferences, and activity logs all use user_id as the shard key. This is the most important technique. Avoid cross-shard queries by design.

Pinterest does exactly this: when inserting a Pin, they prefer to place it on the same shard as its parent board.

2. Reference Tables (Broadcast Tables)

Replicate small, slowly-changing tables (countries, categories, config) across all shards. Joins against reference data become local operations.

3. Eventual Consistency / CQRS

Accept that some queries will be eventually consistent. Materialise cross-shard views asynchronously, separating read models from write models.

4. Modified Two-Phase Commit

Dropbox’s Edgestore uses a modified 2PC with an external durable transaction record for cross-shard transactions. The leader writes a durable transaction record before the cross-shard operation, and concurrent requests only need to check this record to determine the transaction state. This approach handles 10 million requests per second across thousands of MySQL nodes, with only 5-10% of transactions being cross-shard.

Real-World Examples

Instagram: Embedded Shard IDs

Instagram chose sharded PostgreSQL over NoSQL solutions. Their 64-bit ID scheme encodes the shard directly into the primary key:

| 41 bits: timestamp | 13 bits: shard ID | 10 bits: sequence |

By reading the ID, the application knows exactly which shard to query. No lookup table needed. This supports 8,192 logical shards and 1,024 IDs per millisecond per shard.
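Packing and unpacking an ID in that layout takes only a few bit shifts. The sketch below follows the 41/13/10 split described above. The custom epoch (2011-01-01 UTC) and the function names are assumptions for illustration, not Instagram’s actual values or code.

```python
from typing import Tuple

# Assumption: an illustrative custom epoch (2011-01-01T00:00:00Z in ms), not Instagram's.
CUSTOM_EPOCH_MS = 1_293_840_000_000
SHARD_BITS, SEQ_BITS = 13, 10


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id must fit in 13 bits (0..8191)")
    seq = sequence % (1 << SEQ_BITS)          # wrap a per-shard counter into 10 bits
    millis = now_ms - CUSTOM_EPOCH_MS         # must fit in 41 bits
    return (millis << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def parse_id(id_: int) -> Tuple[int, int, int]:
    # Reverse the shifts: (timestamp_ms, shard_id, sequence).
    millis = id_ >> (SHARD_BITS + SEQ_BITS)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    seq = id_ & ((1 << SEQ_BITS) - 1)
    return millis + CUSTOM_EPOCH_MS, shard_id, seq
```

Because the timestamp occupies the high bits, IDs also sort roughly by creation time. 41 bits of milliseconds last about 69 years from the chosen epoch.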

Discord: From Cassandra to ScyllaDB

Discord stored trillions of messages across 177 Cassandra nodes, but hot partitions and JVM garbage collector pauses caused latency spikes. They migrated to ScyllaDB (a Cassandra-compatible database written in C++), reducing their cluster to 72 nodes while dropping p99 read latency from 40-125ms to just 15ms.

Slack: Vitess Migration

Slack spent three years migrating 99% of their MySQL traffic to Vitess, an open-source sharding middleware. At peak, they handle 2.3 million queries per second (2M reads, 300K writes). The migration eliminated database hotspots and enabled new features like Slack Connect and international data residency.

When Should You Actually Shard?

Sharding should be a last resort. Before you commit, exhaust these options first:

1. Vertical scaling (more RAM, faster CPU, SSD)

2. Read replicas for read-heavy workloads

3. Query optimisation and better indexing

4. Caching layers (Redis, Memcached)

5. Connection pooling

6. Table partitioning (single-node)

Shard when:

Your data physically doesn’t fit on one machine
Write throughput exceeds what one server can handle
You need geographic data residency for compliance (GDPR)
You’ve tried everything above, and it’s still not enough

Conclusion

Database sharding is a powerful scaling tool, but it’s not free. It introduces operational complexity: schema changes must be coordinated across all shards, backups become more involved, monitoring must cover every shard, and failure modes multiply.

The companies that do it well (Instagram, Discord, Slack, Pinterest) all share a common trait: they designed their shard key and data model 𝘣𝘦𝘧𝘰𝘳𝘦 sharding, not after. They colocated related data, planned for resharding from day one, and built routing layers to keep application code clean.

If you’re considering sharding, start with the shard key. Get that right, and most of the other problems become manageable. Get it wrong, and you’ll have a distributed system that’s slower and harder to operate than the single database you started with.

That’s it for this post. If you want to explore further, I recommend reading Instagram’s engineering blog on their ID generation scheme and Slack’s write-up on their Vitess migration. Both are excellent examples of sharding done well.

Caching Strategies Explained

Vincent Nyanga — Fri, 01 May 2026 08:01:28 GMT

Think of caching like a kitchen prep station. A good chef doesn’t fetch every ingredient from the pantry for every order. They prep the most-used ingredients and keep them within arm’s reach. But prep too much, and the food goes stale. Prep the wrong things, and you’re still running back and forth to the pantry anyway.

Caching in software works the same way. The right strategy keeps your system fast and your database healthy. The wrong one gives you stale data, wasted memory, or a database that gets crushed the moment a cache key expires.

In this post, we’ll walk through the six major caching strategies, when each one shines, and where each one falls apart. We’ll also cover the common pitfalls that catch teams off guard and how .NET’s caching stack has evolved to address them.

Let’s get started!

Cache-Aside (Lazy Loading)

This is the strategy most developers learn first, and for good reason. It’s simple, and it works.

How it works:

1. Application receives a read request

2. Checks the cache for the data

3. On a hit, returns data from cache

4. On a miss, queries the database, writes the result to cache, then returns it

For writes, the application writes directly to the database and invalidates the cache entry. The next read will repopulate it.
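As a rough illustration of that read and write path, here is a minimal cache-aside sketch. It assumes two in-memory dicts stand in for the cache and the database. The class name and the `db_reads` counter are hypothetical; a real system would use a Redis client and a database driver.

```python
import time
from typing import Any, Optional


class CacheAside:
    """Cache-aside over two in-memory dicts standing in for a cache and a database."""

    def __init__(self, db: dict, ttl_seconds: float = 600.0) -> None:
        self.db = db                      # hypothetical stand-in for the database
        self.cache: dict = {}             # key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.db_reads = 0                 # counts trips to the database

    def get(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]               # hit: served from the cache
        # Miss: the application queries the database and fills the cache itself.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def update(self, key: str, value: Any) -> None:
        # Write to the database, then invalidate; the next read repopulates.
        self.db[key] = value
        self.cache.pop(key, None)
```

Reading the same key twice costs one database read. After `update()`, the next `get()` misses and fetches the new value.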

Pros:

Simple to implement and reason about
Resilient to cache failures (falls back to the database)
Only requested data is cached (no wasted memory)

Cons:

First request for any key always hits the database (cold start)
Risk of stale data if invalidation is missed
Every service accessing the data must implement the pattern correctly

Best for: General-purpose read-heavy workloads. E-commerce product catalogues, user profiles, and content management systems.

Cache-aside is the safe default. If you’re unsure which strategy to use, start here.

Read-Through

Read-through looks similar to cache-aside, but there’s an important difference: the cache itself is responsible for fetching data on a miss, not the application.

How it works:

1. Application requests data from the cache

2. On a hit, the cache returns the data

3. On a miss, the cache fetches from the database, stores the result, and returns it

The application never talks to the database directly for reads. It only talks to the cache.
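The difference from cache-aside is about who owns the loading logic. The sketch below shows this under one assumption: a plain `loader` callable stands in for whatever query a real cache provider would be configured to run on a miss. All names here are illustrative.

```python
from typing import Any, Callable, Hashable


class ReadThroughCache:
    """The cache owns the loader; application code never queries the database."""

    def __init__(self, loader: Callable[[Hashable], Any]) -> None:
        self._loader = loader             # hypothetical stand-in for the DB query
        self._store: dict = {}

    def get(self, key: Hashable) -> Any:
        if key not in self._store:
            # Miss: the cache, not the caller, fetches and stores the value.
            self._store[key] = self._loader(key)
        return self._store[key]


calls = []


def load_from_db(key):
    calls.append(key)                     # record each database round trip
    return {"id": key, "name": f"Product {key}"}


cache = ReadThroughCache(load_from_db)
cache.get(42)
cache.get(42)                             # served from the cache; no second load
```

The application only ever calls `cache.get()`. Swapping the data source means changing the loader, not every call site.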

Pros:

Cleaner application code (no cache-miss logic scattered everywhere)
Enforces separation of concerns

Cons:

The cache provider must know how to query your database (tighter coupling)
More complex setup and configuration

Best for: Read-heavy workloads with predictable access patterns. News feeds, product listings, reference data.

Read-through is often paired with write-through or write-behind for a complete caching solution.

Write-Through

Write-through is the strategy you reach for when consistency matters more than write speed.

How it works:

1. Application writes data to the cache

2. The cache synchronously writes the same data to the database

3. Both writes must succeed before the caller gets an acknowledgement

Because every write goes through the cache first, reads are always up to date. There’s no stale data window.
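A minimal write-through sketch follows, assuming an in-memory `FakeDatabase` (hypothetical) as the system of record. One design choice here is an assumption rather than something stated above: the cache writes the database first and only updates its own store once that succeeds. A failed database write therefore leaves the cache untouched and surfaces the error to the caller.

```python
class FakeDatabase:
    """Hypothetical in-memory stand-in for the system of record."""

    def __init__(self, fail_writes: bool = False) -> None:
        self.rows: dict = {}
        self.fail_writes = fail_writes

    def write(self, key, value) -> None:
        if self.fail_writes:
            raise IOError("database unavailable")
        self.rows[key] = value


class WriteThroughCache:
    def __init__(self, db: FakeDatabase) -> None:
        self.db = db
        self.store: dict = {}

    def put(self, key, value) -> None:
        # Synchronous database write first; if it raises, the cache is not updated
        # and the caller gets no acknowledgement.
        self.db.write(key, value)
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)
```

Both writes happen inside `put()`, which is where the extra write latency comes from.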

Pros:

Strong consistency between cache and database
Reads are always fast (cache is always warm for recently written data)
Simple mental model for data freshness

Cons:

Higher write latency (you’re waiting for two writes on every operation)
Write-heavy workloads take a significant performance hit
Data that’s written but rarely read still occupies cache memory

Best for: Financial systems, inventory management, and user sessions. Anywhere the cost of serving stale data exceeds the cost of slower writes.

Write-Behind (Write-Back)

Write-behind flips the consistency tradeoff. It prioritises write speed and accepts eventual consistency.

How it works:

1. Application writes data to the cache

2. Cache immediately acknowledges the write

3. In the background, the cache batches and flushes writes to the database asynchronously

The application never waits for the database write to complete. This gives you the lowest write latency of any strategy.
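Here is a small write-behind sketch. It assumes `flush_batch` is a hypothetical stand-in for a bulk insert. The background flusher is simulated: in a real cache a timer or worker thread would call `flush()`, whereas here it is called explicitly so the behaviour is easy to follow.

```python
from typing import Callable


class WriteBehindCache:
    """Acknowledge writes immediately; persist them later in batches."""

    def __init__(self, flush_batch: Callable[[dict], None]) -> None:
        self.flush_batch = flush_batch    # hypothetical bulk-insert function
        self.store: dict = {}
        self.dirty: dict = {}             # unpersisted writes: lost if the node crashes

    def put(self, key, value) -> None:
        self.store[key] = value
        self.dirty[key] = value           # repeated writes to one key coalesce
        # Returns at once; the database is not touched here.

    def flush(self) -> None:
        # In a real system a background timer or worker calls this periodically.
        if self.dirty:
            batch, self.dirty = self.dirty, {}
            self.flush_batch(batch)       # one batched write instead of many
```

The `dirty` dict is exactly the data at risk. If the process dies before `flush()` runs, those writes never reach the database.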

Pros:

Extremely low write latency
Batching reduces database load (10 individual writes become 1 batch insert)
Excellent for write-heavy workloads

Cons:

If the cache node crashes before flushing, that data is lost
Eventual consistency between cache and database
Debugging is harder (database state lags behind cache state)

Best for: Analytics event ingestion, social media activity feeds (likes, views, impressions), IoT sensor data, logging systems.

I need to emphasise: only use write-behind when you can tolerate some data loss, or when you have cache replication in place as a safety net.

Write-Around

Write-around is the quiet one. It doesn’t get talked about much, but it solves a specific and common problem: cache pollution.

How it works:

1. Application writes data directly to the database, bypassing the cache entirely

2. The cache is not updated on writes

3. Reads follow cache-aside or read-through; data enters the cache only when it’s actually requested
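The steps above can be sketched in a few lines, again with dicts standing in for the cache and database. One assumption goes beyond the description: the write also drops any existing cached copy of the key. Without that, a key that was already cached would keep serving the old value.

```python
class WriteAroundStore:
    """Writes bypass the cache; reads populate it on demand (cache-aside)."""

    def __init__(self) -> None:
        self.db: dict = {}                # hypothetical stand-in for the database
        self.cache: dict = {}

    def write(self, key, value) -> None:
        self.db[key] = value
        self.cache.pop(key, None)         # invalidate (assumption); never populate

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)          # first read after a write hits the database
        if value is not None:
            self.cache[key] = value
        return value
```

Writing a thousand log lines leaves the cache empty. Only the lines someone actually reads take up cache memory.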

Pros:

Prevents the cache from filling up with data that’s written once and never read
Cache memory is reserved for frequently accessed data
Simple write path

Cons:

First read after a write always hits the database
Not suitable if writes are immediately followed by reads

Best for: Log ingestion, audit trails, batch data imports, and real-time chat message storage. Any workload where you write far more data than you read.

Refresh-Ahead

Refresh-ahead is the proactive strategy. Instead of waiting for a cache entry to expire and then suffering a miss, it refreshes entries before they expire.

How it works:

1. Application reads from cache as normal

2. When an entry is within a configurable window before expiration (say, 80% through its TTL), the cache returns the current value immediately and triggers an asynchronous background refresh

3. The background job fetches fresh data and updates the cache before the TTL expires
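A possible refresh-ahead sketch follows, using a thread pool for the asynchronous refresh. The assumptions are:

- `loader` stands in for the database query.
- The injectable `clock` exists only to make the timing testable.
- The 0.8 default mirrors the "80% through its TTL" example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class RefreshAheadCache:
    def __init__(self, loader: Callable[[str], Any], ttl: float,
                 refresh_fraction: float = 0.8,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader                   # hypothetical database query
        self.ttl = ttl
        self.refresh_after = ttl * refresh_fraction
        self.clock = clock
        self.entries: dict = {}                # key -> (value, loaded_at)
        self.refreshing: set = set()           # keys with a refresh in flight
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=2)

    def _load(self, key: str) -> Any:
        value = self.loader(key)
        with self.lock:
            self.entries[key] = (value, self.clock())
            self.refreshing.discard(key)
        return value

    def get(self, key: str) -> Any:
        with self.lock:
            entry = self.entries.get(key)
        if entry is None or self.clock() - entry[1] >= self.ttl:
            return self._load(key)             # cold or expired: synchronous miss
        value, loaded_at = entry
        if self.clock() - loaded_at >= self.refresh_after:
            with self.lock:
                start = key not in self.refreshing
                self.refreshing.add(key)
            if start:
                self.pool.submit(self._load, key)  # background refresh, at most one per key
        return value                           # current value returned immediately
```

The `refreshing` set keeps a hot key from triggering a burst of duplicate background refreshes.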

Pros:

Eliminates latency spikes for popular keys
Predictable, consistent response times
Users always hit a warm cache for hot data

Cons:

Wasted refreshes for entries that nobody accesses again
Increased backend load from proactive fetches
Requires predictable access patterns to be cost-effective

Best for: Dashboards, leaderboards, stock tickers, and news homepage data. Anywhere you have high-traffic keys that must always be fast.

Combining Strategies

In practice, most production systems don’t use a single strategy. They combine them:

Cache-Aside + Write-Around: the most common pairing. Reads are cached on demand. Writes bypass the cache. Simple and effective for most CRUD applications.
Read-Through + Write-Through: full cache mediation. The application never touches the database directly. Strong consistency with clean code.
Read-Through + Write-Behind: high-throughput systems that need both fast reads and fast writes, and can tolerate eventual consistency.
Cache-Aside + Refresh-Ahead: for the critical hot paths. Most data uses regular cache-aside. The top 1% of keys get a proactive refresh.

The right combination depends on your consistency requirements, read/write ratio, and tolerance for complexity.

The Pitfalls Nobody Warns You About

Cache Stampede (Thundering Herd)

A popular cache key expires. Hundreds of concurrent requests see the miss, and all hit the database simultaneously.

This is exactly what caused one of Facebook’s biggest outages in 2010. A configuration change invalidated a frequently accessed cache entry. The resulting stampede overwhelmed their database, and error-handling logic made it worse by deleting more cache keys, creating a self-reinforcing cascade that lasted 2.5 hours.

Solutions: distributed locking (only one request fetches, others wait), probabilistic early expiration (each request has a small random chance of refreshing before the TTL), or .NET’s HybridCache, which has built-in stampede protection.
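The "only one request fetches, others wait" idea can be sketched in-process with a per-key lock. This is an assumption-laden simplification: a per-process `threading.Lock` only protects callers on one server. Coordinating several servers would need a distributed lock, which this sketch does not implement.

```python
import threading
from typing import Any, Callable


class SingleFlightCache:
    """One caller per key runs the loader; concurrent callers wait for its result."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self.loader = loader                  # hypothetical database query
        self.values: dict = {}
        self.locks: dict = {}                 # key -> threading.Lock
        self.guard = threading.Lock()         # protects the `locks` dict itself

    def get(self, key: str) -> Any:
        if key in self.values:
            return self.values[key]
        with self.guard:
            key_lock = self.locks.setdefault(key, threading.Lock())
        with key_lock:
            # Re-check: another thread may have filled the value while we waited.
            if key not in self.values:
                self.values[key] = self.loader(key)
            return self.values[key]
```

With 100 threads requesting the same missing key, the loader runs once and the other 99 callers reuse its result.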

Cache Avalanche

Many keys expire at the same time. This typically happens when you set the same TTL on everything at startup.

Solution: Add random jitter to your TTLs. Instead of 10 minutes for every key, use 10 minutes + a random 0-60 seconds.
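The jitter rule translates almost directly into code. The function below is a hypothetical helper. You would pass its result as the TTL to whichever cache client you use.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: int = 600, max_jitter_seconds: int = 60,
                 rng: Optional[random.Random] = None) -> int:
    """Base TTL plus a uniform random 0..max_jitter_seconds offset (inclusive),
    so keys written together do not all expire together."""
    if rng is None:
        rng = random.Random()
    return base_seconds + rng.randint(0, max_jitter_seconds)
```

Each key now expires somewhere between 600 and 660 seconds, which spreads the reload traffic across a minute instead of a single instant.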

Cache Penetration

Requests for keys that don’t exist in the cache or the database. Every request passes through to the database. Often caused by bots or bugs.

Solution: Cache null results with a short TTL. For heavy traffic, use a Bloom filter to quickly reject keys that are known not to exist.
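Both defences fit in one small sketch. It uses a hand-rolled Bloom filter (size and hash count are illustrative) plus a sentinel cached for "not in the database". TTLs are omitted for brevity; in practice the negative entry should expire quickly. Production code would typically use a tuned library or a cache-side Bloom filter module instead.

```python
import hashlib


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


MISSING = object()  # sentinel meaning "known not to exist" (negative caching)


def lookup(key, bloom, cache, db):
    if not bloom.might_contain(key):
        return None                       # definitely absent: no database query
    cached = cache.get(key)
    if cached is MISSING:
        return None                       # cached null result
    if cached is not None:
        return cached
    value = db.get(key)                   # dicts stand in for cache and database
    cache[key] = value if value is not None else MISSING
    return value
```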

.NET’s Caching Stack in 2026

The .NET caching story has matured significantly:

IMemoryCache: in-process, single-server. Fast, but lost on restart and not shared across instances.
IDistributedCache: abstraction over Redis, SQL Server, and Cosmos DB. Shared across instances but requires serialisation.
HybridCache (.NET 9+): the new recommended default. Combines L1 (in-process) + L2 (distributed) with stampede protection and tag-based invalidation out of the box.

HybridCache deserves special attention. It eliminates the hand-rolled cache-aside boilerplate that’s in every .NET codebase:

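// Before: manual cache-aside over IDistributedCache
// (it stores strings/bytes, so you serialise yourself; uses System.Text.Json)

var key = $"product-{id}";
var json = await cache.GetStringAsync(key);
var product = json is null ? null : JsonSerializer.Deserialize<Product>(json);

if (product is null)
{
    product = await db.Products.FindAsync(id);

    await cache.SetStringAsync(key, JsonSerializer.Serialize(product),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });
}

// After: HybridCache

var product = await cache.GetOrCreateAsync(
    $"product-{id}",
    // Pass the key as an array so the token isn't treated as a second key value
    async token => await db.Products.FindAsync(new object[] { id }, token),
    tags: ["products", $"category-{categoryId}"]
);

// Bulk invalidation by tag
await cache.RemoveByTagAsync($"category-{categoryId}");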

If 100 requests for the same missing key arrive at the same application instance simultaneously, HybridCache runs the factory method once and returns the same result to all 100 requests. No stampede. No duplicate database calls.

Conclusion

Caching is not a single tool. It’s a toolkit. Cache-aside is the safe starting point, but knowing when to reach for write-through, write-behind, or refresh-ahead is what separates systems that scale from systems that buckle under load.

Like everything in software engineering, one needs to weigh the pros and cons. Strong consistency costs write latency. Low write latency risks data loss. Proactive refresh costs backend resources. There’s no free lunch.

My advice: start with cache-aside. Add complexity only when you have evidence that it’s needed. Always set TTLs as a safety net. And if you’re on .NET 9+, make HybridCache your default; it handles the hardest problems (stampede protection, L1+L2, tag invalidation) so you don’t have to.

Full-Text Search with PostgreSQL

Vincent Nyanga — Fri, 17 Apr 2026 08:00:53 GMT

When someone says “we need search”, the conversation almost always jumps to Elasticsearch or Algolia. It’s a reflex at this point. But here’s the thing: PostgreSQL has had a full-text search engine baked into its core since version 8.3, released in 2008. That’s 18 years of battle-tested search sitting right inside the database you’re probably already running.

In this post, we’ll explore how PostgreSQL full-text search works, when it’s the right choice, and how to set it up with practical code examples. By the end, you’ll have everything you need to build a solid search feature without spinning up a single extra service.

How PostgreSQL Full-Text Search Works

At its core, Postgres full-text search revolves around two data types: tsvector and tsquery.

A tsvector is a sorted list of distinct lexemes (think of them as normalised words). When you convert text into a tsvector, Postgres applies language-specific processing: it strips stop words like “the” and “is”, and it stems words to their root form. “Running” and “runs” both become “run”. Stemming is suffix-based, so irregular forms such as “ran” are left as they are.

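SELECT to_tsvector('english', 'The quick brown foxes were jumping');

-- Result: 'brown':3 'fox':4 'jump':6 'quick':2

Notice how “foxes” became “fox” and “jumping” became “jump”, while the stop words “The” and “were” were dropped. The numbers are each lexeme’s position in the original text. Dropped stop words still occupy a position, which is why “jump” sits at position 6.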

A tsquery represents your search query. It supports Boolean operators like `&` (AND), `|` (OR), `!` (NOT), and `<->` (FOLLOWED BY) for phrase matching.

SELECT to_tsquery('english', 'quick & fox');

-- Result: 'quick' & 'fox'

To check whether a document matches a query, you use the `@@` operator:

SELECT to_tsvector('english', 'The quick brown fox')
       @@ to_tsquery('english', 'quick & fox');

-- Result: true

Simple enough. But this alone would be slow on a real table. That’s where indexes come in.

Setting Up Search: The Three-Step Pattern

In my experience, you need exactly three things to get production-ready full-text search in Postgres: a tsvector column, a GIN index, and a ranking query. Let’s walk through each.

Step 1: Add a Generated tsvector Column

Since PostgreSQL 12, you can use a generated column that automatically computes the tsvector whenever the row changes. This is the cleanest approach because there are no triggers to maintain.

CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body TEXT NOT NULL,
    author VARCHAR(100),
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(author, '')), 'C')
    ) STORED
);

The `setweight()` function assigns importance levels. Weight ‘A’ is the highest, ‘D’ the lowest. Here, title matches will rank higher than body matches, which rank higher than author matches. This is exactly what users expect from search results.

Step 2: Create a GIN Index

A GIN (Generalised Inverted Index) works like the index at the back of a textbook. It maps each word to a list of rows containing that word. Without this index, Postgres would have to scan every row and test its stored tsvector against the query.

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

GIN indexes are about 3x faster for lookups than the alternative GiST index. The trade-off is that they’re slower to build and larger on disk, but for read-heavy search workloads (which is most search workloads), GIN is the right choice.

Step 3: Query with Ranking

SELECT id,
       title,
       ts_rank(search_vector, query) AS rank
FROM articles, websearch_to_tsquery('english', 'database performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

`websearch_to_tsquery` is a gem that was introduced in PostgreSQL 11. It accepts Google-like search syntax: quoted phrases for exact match, `-` for exclusion, and implicit AND between words. It handles messy user input gracefully, which makes it perfect for user-facing search boxes.

Want to show highlighted snippets? Use `ts_headline`:

SELECT id,
       title,
       ts_headline('english',
                   body,
                   websearch_to_tsquery('english', 'database performance'),
                   'StartSel=<b>, StopSel=</b>, MaxFragments=3') AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'database performance');

Handling Typos with pg_trgm

One legitimate gap in Postgres FTS is its lack of typo tolerance. If a user searches for “postgressql” (note the typo), the full-text search won’t find “postgresql”. This is where the pg_trgm extension comes in.

Trigram matching breaks words into three-character sequences and compares them. It doesn’t care about stemming or language; it just measures how similar two strings look.
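To make the idea concrete, here is a Python approximation of how pg_trgm extracts and compares trigrams. It rests on these assumptions:

- Text is lowercased and split on non-alphanumeric characters (this sketch only handles ASCII).
- Each word is padded with two leading spaces and one trailing space.
- Similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings.

It is a sketch for intuition, not a byte-for-byte reimplementation of the extension.

```python
import re


def trigrams(text: str) -> set:
    """Lowercase, split into words, pad each word, and collect 3-character windows."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams / distinct trigrams across both strings (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Worked example: “postgresql” yields 11 trigrams and the typo “postgressql” yields 12. They share 10, so the similarity is 10 / (11 + 12 - 10) = 10/13 ≈ 0.77. That is comfortably above pg_trgm’s default `%` threshold of 0.3.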

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Create a trigram index on the title
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);

-- Fuzzy search
SELECT title, similarity(title, 'postgressql') AS sim
FROM articles
WHERE title % 'postgressql'
ORDER BY sim DESC;

The real power comes from combining both approaches:

SELECT id,
       title,
       ts_rank(search_vector, websearch_to_tsquery('english', 'postgres')) * 2.0
         + similarity(title, 'postgres') AS combined_score
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgres')
   OR title % 'postgres'
ORDER BY combined_score DESC
LIMIT 20;

This gives you relevance-ranked results from full-text search, with a fallback to fuzzy matching for typos. It covers a surprising amount of ground.

Multi-Table Search with Materialised Views

Things get more interesting when you need to search across multiple tables. Imagine an e-commerce app where you want to search products by name, description, category, and tags. Joining four tables on every search query would be painfully slow.

The solution is a materialised view that precomputes the search vector:

CREATE MATERIALIZED VIEW product_search AS
SELECT p.id,
       p.name,
       p.description,
       c.name AS category_name,
       string_agg(t.name, ' ') AS tag_names,
       setweight(to_tsvector('english', p.name), 'A') ||
       setweight(to_tsvector('english', coalesce(p.description, '')), 'B') ||
       setweight(to_tsvector('english', coalesce(c.name, '')), 'C') ||
       setweight(to_tsvector('english', coalesce(string_agg(t.name, ' '), '')), 'D')
         AS search_vector
FROM products p
JOIN categories c ON p.category_id = c.id
LEFT JOIN product_tags pt ON p.id = pt.product_id
LEFT JOIN tags t ON pt.tag_id = t.id
GROUP BY p.id, p.name, p.description, c.name;

CREATE UNIQUE INDEX ON product_search (id);

CREATE INDEX ON product_search USING GIN (search_vector);

Now searches hit a single, pre-indexed table. Refresh it periodically with `REFRESH MATERIALIZED VIEW CONCURRENTLY`. The `CONCURRENTLY` keyword keeps the view readable during the refresh, so there’s no downtime. It also requires a unique index on the view, which is what the unique index on `id` provides.

The trade-off is that results can be stale by up to one refresh interval, usually a matter of minutes. For most applications, however, that’s acceptable.

When Postgres FTS Is Enough (and When It Isn’t)

Here’s my rule of thumb: if search is a feature of your app, use Postgres. If search is your app, consider a dedicated engine.

Postgres FTS shines when:

Your dataset is under a few million rows
You want zero additional infrastructure
ACID compliance matters (no stale search results from sync lag)
Your team is small and can’t afford to maintain a separate search cluster

You might outgrow it when:

Search relevancy is the core product experience (e-commerce, content discovery)
You need built-in autocomplete suggestions, faceted navigation, or synonym handling
Your dataset exceeds tens of millions of rows with sub-50ms latency requirements

It’s also worth mentioning the extension ecosystem. Projects like pg_textsearch (from Timescale) bring BM25 ranking (the same algorithm Elasticsearch uses) directly into Postgres, with 4x faster top-k queries and 41% smaller indexes. ParadeDB’s pg_search is another option, built on a Rust-based search engine, delivering 20x faster ranking than native tsvector.

The line between “need a dedicated engine” and “Postgres is fine” keeps moving in Postgres’s favour.

Conclusion

PostgreSQL full-text search is one of the most underutilised features in the database world. A generated tsvector column, a GIN index, and websearch_to_tsquery give you stemming, ranking, phrase search, and multi-language support with no extra infrastructure. Add pg_trgm for typo tolerance and materialised views for multi-table search, and you’ve covered what most applications need.

I have created a simple project that showcases how all this works. You can check it out on GitHub.

Vector Search vs Keyword Search: Choose the Right Tool, Not the Trendy One

Vincent Nyanga — Fri, 03 Apr 2026 08:00:32 GMT

Imagine you’re looking for a book in two different libraries. The first library has a perfect card catalogue; every word in every book is indexed. You walk in, say “ERROR 1045”, and the librarian hands you the exact page. Fast, precise, surgical.

The second library has a librarian with deep reading comprehension. You walk in and say, “I can’t connect to my database,” and she understands what you mean, walks you to the right shelf, and pulls three books that might help, even if none of them uses the phrase “can’t connect.”

Both librarians are valuable. The mistake is assuming the second one makes the first one obsolete.

That’s the vector search vs keyword search debate in a nutshell. And in this post, I’ll break down exactly when each approach wins, where hybrid search bridges the gap, and why choosing vector search purely because it sounds more “AI” is how you build an expensive solution to the wrong problem.

Let’s get started.

What is Keyword Search?

Keyword search (also called lexical search) finds documents by matching the exact tokens in your query against an inverted index of your corpus. The dominant algorithm is BM25 (Best Match 25), a ranking function that weighs matches by term frequency and how rare the term is across the document collection.
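To show what “weighs matches by term frequency and how rare the term is” means in practice, here is a compact Okapi BM25 scorer. It assumes the following, and is an illustration rather than how Elasticsearch or any other engine implements it internally:

- Tokenisation is plain whitespace splitting, with no stemming.
- `k1 = 1.2` and `b = 0.75` are the commonly used defaults.
- IDF uses the smoothed form `log(1 + (N - n + 0.5) / (n + 0.5))`.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Return one BM25 score per document for the given query."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(t) for t in tokenised) / n_docs
    doc_freq = Counter(term for t in tokenised for term in set(t))

    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))  # rarer terms weigh more
            freq = tf[term]
            # k1 saturates repeated terms; b normalises for document length.
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores
```

For the query `error 1045`, a document containing both tokens outranks one containing only `error`, because `1045` is the rarer and therefore heavier term.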

BM25 is 30 years old. It powers Elasticsearch, OpenSearch, and Solr. It needs no model, no GPU, and no embedding API. It’s deterministic: the same query always returns the same results. And it’s fast, with sub-millisecond retrieval across millions of documents.

It is, to put it plainly, still extremely good.

What is Vector Search?

Vector search (semantic search) converts your documents and queries into high-dimensional numerical vectors using an embedding model, something like OpenAI’s `text-embedding-3-small`, Cohere Embed, or open-source models like E5 or BGE. Once embedded, documents and queries live in the same vector space, and similarity is measured by cosine similarity or dot product.

The key insight: semantically similar content ends up close together in vector space, even if the words are completely different. “I can’t log in” and “authentication failure” end up near each other. BM25 would miss that connection entirely.
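Cosine similarity itself is simple arithmetic: the dot product of two vectors divided by the product of their lengths. The vectors below are toy 3-dimensional values made up purely for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions. When vectors are normalised to length 1, the dot product gives the same number.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """(a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", not produced by any real model.
cant_log_in = [0.9, 0.1, 0.2]
auth_failure = [0.8, 0.2, 0.3]
shipping_delay = [0.1, 0.9, 0.1]
```

Here “can’t log in” scores about 0.98 against “authentication failure” but only about 0.24 against “shipping delay”, even though the two related phrases share no words.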

Vector search enables retrieval based on meaning, not just tokens. That’s genuinely powerful. But it comes with a cost that teams consistently underestimate.

When Keyword Search Wins

This is the part that gets skipped in most “AI search” blog posts.

Exact Matches That Must Be Exact

If a user searches for `ERROR 1045`, they want documents containing exactly that. A vector search might surface related database authentication errors that are conceptually relevant but are not the one the user typed. The same applies to:

- Product SKUs: “iPhone 15 Pro Max 256GB” must return that exact model, not the nearest semantic neighbour

- Order IDs, serial numbers, account numbers

- Medical codes (ICD-10), legal citations, regulatory references

- API function names, config flags, library import paths

In these cases, BM25’s exact precision, together with its roughly 30x performance advantage, makes it the obvious choice.

Cost and Operational Reality

Vector search has hidden costs that only reveal themselves in production:

Embedding latency: Small embedding models run at ~16ms. Large models (7B+ parameters) sit at 187-221ms, over 10x slower. At user-facing latency budgets, this matters significantly.
Infrastructure overhead: You need an embedding model (hosted or self-managed), a vector database, and a sync pipeline to keep it up to date. BM25 runs on Elasticsearch you’re already operating.
Embedding model lock-in: If you switch embedding models, you must re-embed your entire document corpus. New query vectors won’t align with old document vectors. That’s a silent, expensive migration with no warning signs until results degrade.
Index degradation at scale: HNSW (the algorithm powering most vector databases) loses recall as the index grows. Recall can drop by 10%+ as the database grows from 50k to 200k vectors. Your infrastructure dashboards look fine. Your search quality is quietly getting worse.

When Vector Search Wins

Vector search earns its complexity in specific scenarios.

The Words Don’t Match

The most common scenario: a user describes what they need without knowing the exact terminology used in your documents. “My app is getting slow under load” should surface articles about performance optimisation, connection pooling, and caching strategies, even if none of them uses the phrase “getting slow.”

This is where keyword search fails completely, and vector search excels.

Multilingual Search

Multilingual embedding models are trained on corpora spanning many languages, so a query in English can retrieve semantically similar documents written in Spanish, French, or German. BM25 requires explicit multilingual tokenisation pipelines to even attempt this.

Recommendation and Similarity

“More like this article” queries, duplicate detection, and recommendation engines are a natural fit for vector similarity. Find the documents closest to a given embedding; there’s no keyword equivalent.

The Real Answer: Hybrid Search

Here’s the thing neither camp wants to admit: the best production search systems use both.

**Hybrid search** combines BM25 keyword results and vector search results, then merges them using **Reciprocal Rank Fusion (RRF)**. Instead of trying to normalise incompatible scores, RRF uses rank position. The formula:

score(d) = Σ 1 / (k + rank_i(d)), summed over every result list i that contains d

Where `rank_i(d)` is the document’s position (starting at 1) in result list i, and `k` (typically 60) prevents top-ranked documents from dominating too aggressively.

A document appearing at position 1 in the BM25 list and position 2 in the vector list scores very high. A document appearing at position 1 in only one list scores lower. The result: exact-match precision from BM25, semantic recall from vector search, merged into a single ranked list that consistently outperforms either approach alone.
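The fusion step is only a few lines. The sketch below implements plain, unweighted RRF with `k = 60` over lists of document IDs. The IDs and result lists are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list, k: int = 60) -> list:
    """Each document scores sum(1 / (k + rank)) over the lists it appears in
    (ranks start at 1); returns (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers.
bm25_results = ["doc-a", "doc-b", "doc-c"]
vector_results = ["doc-d", "doc-a", "doc-e"]
fused = reciprocal_rank_fusion([bm25_results, vector_results])
```

Worked through: `doc-a` scores 1/61 + 1/62 ≈ 0.0325, while `doc-d`, first in only one list, scores 1/61 ≈ 0.0164. `doc-a` therefore tops the fused list. No score normalisation was needed, only rank positions.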

Here’s what this looks like in practice with LangChain:

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs the rank_bm25 package
from langchain_community.vectorstores import Chroma

# `docs` (a list of Document objects) and `embedding_function` (an Embeddings
# instance for your chosen model) are assumed to be defined elsewhere.

# BM25 retriever over your document corpus
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 5

# Vector retriever with your embedding model
vectorstore = Chroma.from_documents(docs, embedding_function)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Hybrid retriever: merges both result lists with weighted RRF (weights: 50/50)
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],
)

results = ensemble_retriever.invoke("authentication timeout error")

This retriever does the right thing for both query types: a search for “ERROR 1045” gets exact BM25 precision; a search for “why does my login keep failing” gets semantic vector recall. You don’t have to choose.

You can tune the weights based on your domain. Code search might run 70% BM25 / 30% vector. A customer support chatbot might flip to 30% BM25 / 70% vector. The EnsembleRetriever makes this trivially adjustable.

Native hybrid search is now supported in: Azure AI Search, Elasticsearch 8+, OpenSearch 2.19, Weaviate, Chroma, pgvector + pg_search (ParadeDB), SingleStore, and MariaDB.

The Decision Framework

Like everything in software engineering, one needs to weigh the pros and cons. Here’s a practical guide:

Error codes, SKUs, order IDs: Use keyword only (BM25)
Code search, API documentation: Use keyword-dominant hybrid (70/30)
Customer support FAQ: Use balanced hybrid (50/50)
Conversational / intent-driven search: Use vector-dominant hybrid (30/70)
“More like this” recommendations: Use vector only
Multilingual search: Use vector only

I need to emphasise this: the question is never “which is better?” The question is “what does my query distribution actually look like, and what does my current infrastructure already support?”

Conclusion

Vector search is a genuinely powerful tool. Embedding-based retrieval unlocks search experiences that keyword matching simply cannot deliver. I’m not arguing against it.

But I’ve seen teams rip out working Elasticsearch deployments to stand up a vector database, a managed embedding API, a sync pipeline, and a new infrastructure dependency, for a search use case that was doing just fine with BM25.

The hype around vector search is real. The problems it solves are real. So are the costs, the operational complexity, and the cases where a 30-year-old ranking function still does the job better.

Don’t choose based on what sounds more modern. Choose based on what your users are actually searching for. And if you’re unsure, hybrid search gives you both, with a tunable dial between precision and recall.

Start there. Optimise from data, not from trends.

CQRS: The Pattern That Sounds Simple Until You Ship It to Production

Vincent Nyanga — Fri, 20 Mar 2026 08:01:02 GMT

“For most systems, CQRS adds risky complexity.” Those were the words of Martin Fowler, and I fully agree with him. Yet teams keep implementing it, often for all the wrong reasons. Here’s the uncomfortable reality: CQRS is a high-complexity pattern that separates read and write operations into distinct models. When applied unnecessarily, it becomes an expensive architectural mistake you’ll pay for with every sprint.

But when applied correctly, to the right problems, at the right organisational maturity level? CQRS enables capabilities that traditional architectures simply can’t match. The trick is knowing the difference before you commit.

Pattern Fundamentals: What You’re Actually Building

CQRS divides systems into Commands (operations that change state) and Queries (read-only operations that return data). This separation enables independent optimisation: write models focus on transactional integrity, while read models optimise for query performance.

Logical CQRS maintains separation only at the code level, using the same database for reads and writes. This delivers architectural clarity without operational complexity, which is ideal for teams that want clear boundaries without eventual-consistency headaches.

Physical CQRS uses separate databases, enabling independent scaling and storage optimisation. Your write side might use PostgreSQL for transactional integrity, while your read side uses MongoDB or Elasticsearch. The benefit is powerful. The cost? Eventual consistency, synchronisation complexity, and operational overhead, all of which teams routinely underestimate.

CQRS + Event Sourcing: Double or Nothing

Event Sourcing stores state as a chronological sequence of events rather than a current snapshot. Combined with CQRS, commands generate events stored in event stores (EventStoreDB, Kafka, etc.), which projections consume to build optimised read models.

This combination provides capabilities that sound transformative: complete audit trails, time-travel debugging by replaying events, and multiple read model projections from the same event stream.
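At its core, Event Sourcing means current state is a fold over an append-only event log. The sketch below shows replay, time travel via a partial replay, and a second projection built from the same stream. It assumes a hypothetical account domain with `Deposited`/`Withdrawn` events and keeps everything in memory. There is no event store, versioning, or snapshotting here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event types: "Deposited" or "Withdrawn"
    amount: int


def apply(balance: int, event: Event) -> int:
    """Fold one event into the current state."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event type: {event.kind}")


def replay(events: list, upto: Optional[int] = None) -> int:
    """Rebuild state from the log; `upto` replays only the first N events."""
    balance = 0
    for event in events[:upto]:
        balance = apply(balance, event)
    return balance


log = [Event("Deposited", 100), Event("Withdrawn", 30), Event("Deposited", 50)]
current_balance = replay(log)              # full replay: 100 - 30 + 50
balance_after_two = replay(log, upto=2)    # "time travel" to after the 2nd event
withdrawals = sum(1 for e in log if e.kind == "Withdrawn")  # a second projection
```

Every read model here is derived and can be thrown away and rebuilt. The log is the only source of truth, which is also why changing event shapes later is so costly.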

The reality check arrives in production. You’re now managing sophisticated projection systems, event schema versioning, and operational complexity requiring expertise most teams don’t possess. Event Sourcing isn’t “CQRS plus some extras”. It’s a fundamentally different architectural commitment with substantially higher implementation risk.

The Decision Framework: When CQRS Makes Sense (And When It Doesn’t)

Implement CQRS when:

- Your domain exhibits genuine complexity, benefiting from Domain-Driven Design bounded contexts

- Read and write patterns are dramatically imbalanced, requiring independent scaling

- Different optimisation approaches for commands versus queries provide measurable value

- You’re building read-heavy applications with large analytical reports benefiting from pre-aggregated data

Avoid CQRS when:

- Your domain is simple, and CRUD interfaces suffice

- Teams lack distributed systems experience

- You’re building an MVP where speed to market trumps architectural sophistication

- Your application requires real-time consistency

As one architect who lived through a failed CQRS migration put it: “We spent six months implementing CQRS for a glorified CRUD app. It didn’t make us faster. It made us slower, indefinitely.”

Critical Pitfalls and How to Avoid Them

Over-Engineering Simple CRUD: The most common failure mode involves applying CQRS to systems that fit traditional data models perfectly. You end up spending development velocity on synchronisation issues and operational overhead that outweigh any benefit. If your application is primarily forms-based rather than database-driven, CQRS is the wrong choice.

Ignoring Eventual Consistency: Applications that show stale data without loading indicators appear broken to users. One e-commerce team discovered customers abandoning carts because the UI showed outdated inventory. Solution: version-based synchronisation, where commands return a version number and queries block until the projections reach that version.

Poor Boundary Design: Unclear ownership between command and query models leads to duplicate logic and coupling. Treat events as first-class APIs requiring versioning discipline and backward compatibility.

Legacy Integration: CQRS architectures struggle with unmovable legacy components. If your architecture depends on deep legacy integration, CQRS complexity may outweigh separation benefits.

Production Challenges: The Hard Parts Nobody Mentions

Eventual Consistency Creates Real UX Problems

Read-store updates lag behind event generation by milliseconds, stretching to seconds under high load. A user who is redirected to a dashboard straight after issuing a command may see nothing at all. The application appears broken even though it’s working exactly as designed.

The solution requires sophisticated implementation. Return version numbers with command responses. Implement wait handles in query handlers that block until projections reach the requested version. Design UIs with loading states, optimistic updates, or clear indicators that data is propagating.
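Here is a minimal Python sketch of that version handshake, assuming a single in-process read model. `ReadModel`, `handle_place_order`, and `query_order` are hypothetical names; a real system would persist the version alongside the projection and expose the wait through its query API with a timeout.

```python
import threading


class ReadModel:
    """Hypothetical projection that records the last event version it applied."""

    def __init__(self):
        self._cond = threading.Condition()
        self.version = 0
        self.orders = {}

    def apply(self, version, order_id, status):
        # Called by the projector as events arrive; wakes any waiting queries.
        with self._cond:
            self.orders[order_id] = status
            self.version = version
            self._cond.notify_all()

    def get_at_least(self, version, order_id, timeout):
        """Return the order once the projection reaches `version`, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self.version >= version, timeout=timeout):
                return None
            return self.orders.get(order_id)


def handle_place_order(event_log, order_id):
    # Command side: append the event and hand its version back to the caller.
    event_log.append(("OrderPlaced", order_id))
    return len(event_log)


def query_order(read_model, order_id, min_version, timeout=2.0):
    # Query side: None tells the UI to show a "still processing" state.
    return read_model.get_at_least(min_version, order_id, timeout)
```

The client keeps the version from the command response and passes it as `min_version` on its next query, so it reads its own write instead of a stale projection.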

Some domains can’t tolerate this. Financial transactions, inventory decrements, and real-time booking systems require immediate consistency. CQRS’s eventual consistency model creates unacceptable risk in these scenarios.

Projection Failures Require Sophisticated Recovery

Projections will fail due to network partitions, schema mismatches, resource exhaustion, and other factors. Distributed systems have failure modes that single-database applications never encounter. You need to monitor the event store’s disk usage, replication lag, projection latency, and projection failures. You also need health checks that verify projection completeness, and the ability to replay events. The most straightforward recovery approach is to truncate the read model and reapply all events.
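A minimal sketch of truncate-and-replay, assuming the event store can be read as an ordered list of dictionaries and the read model is a plain dict. The event types and field names are illustrative, not from any particular framework.

```python
def project(read_model, event):
    """Apply one event to a hypothetical order-status read model."""
    kind, order_id = event["type"], event["order_id"]
    if kind == "OrderPlaced":
        read_model[order_id] = "placed"
    elif kind == "OrderShipped":
        read_model[order_id] = "shipped"
    elif kind == "OrderCancelled":
        read_model.pop(order_id, None)


def rebuild(read_model, event_store):
    """Truncate the read model, then replay every event in order.

    Returns the position reached, which a health check can compare with
    the event store's head to confirm the projection is complete.
    """
    read_model.clear()
    position = 0
    for position, event in enumerate(event_store, start=1):
        project(read_model, event)
    return position
```

In production you would typically rebuild into a fresh table and swap it in, so queries never see a half-built model.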

One team running CQRS reported spending 40% more time on monitoring infrastructure compared to their previous monolithic architecture.

Event Schema Evolution Is a First-Class Problem

Events are immutable. Once written, they live in your event store forever. When business requirements change, you can’t just alter a database schema. You’re managing a versioned event catalogue that projections must interpret correctly across all historical versions.

Strategies that work: Event upcasting converts older versions to the current format during deserialisation. Additive changes maintain backward compatibility. Version stamps on each event let old and new formats coexist when breaking changes are unavoidable. Either way, event handlers must support every event version your store contains.
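Upcasting can be sketched in a few lines. This assumes events are stored as JSON with an optional `version` field; the `CustomerRegistered` schema change (splitting `name` into two fields) is hypothetical.

```python
import json

CURRENT_VERSION = 2


def upcast_v1_to_v2(event):
    # Hypothetical breaking change: v1 stored "name", v2 splits it in two.
    first, _, last = event["name"].partition(" ")
    return {
        "type": event["type"],
        "version": 2,
        "first_name": first,
        "last_name": last,
    }


# One upcaster per version step; chaining them handles arbitrarily old events.
UPCASTERS = {1: upcast_v1_to_v2}


def load_event(raw: str) -> dict:
    """Deserialise a stored event and upcast it to CURRENT_VERSION."""
    event = json.loads(raw)
    event.setdefault("version", 1)  # events written before versioning existed
    while event["version"] < CURRENT_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event
```

The stored events are never rewritten; only the in-memory shape changes, so every projection handler deals with a single, current format.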

This versioning discipline becomes a permanent tax on development velocity.

Distributed Tracing Becomes Non-Negotiable

Debugging request flows across CQRS boundaries without distributed tracing is archaeological work. Correlation IDs tracking operations from initial commands through read model updates are essential. Teams report debugging time increases by 35% in CQRS architectures compared to modular monoliths. The only way to manage this is world-class observability infrastructure from day one.
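Correlation ID propagation can be sketched with the standard library alone. This assumes the ID arrives on the command (for example, copied from an HTTP header) and is copied into each event; the logger name and event fields are hypothetical.

```python
import contextvars
import logging
import uuid

# The correlation ID of the operation being handled in the current context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamp every log record with the active correlation ID."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


log = logging.getLogger("orders")
log.setLevel(logging.INFO)
log.addFilter(CorrelationFilter())


def handle_command(command, event_bus):
    # Reuse the caller's ID if present, otherwise mint a new one.
    cid = command.get("correlation_id") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        log.info("handling %s", command["type"])
        # The ID travels inside the event so the read side can restore it.
        event_bus.append({"type": command["type"] + "Accepted", "correlation_id": cid})
        return cid
    finally:
        correlation_id.reset(token)


def run_projection(event):
    token = correlation_id.set(event["correlation_id"])
    try:
        log.info("projecting %s", event["type"])
    finally:
        correlation_id.reset(token)
```

With a handler formatter such as `%(correlation_id)s %(message)s`, one ID can be followed from the command handler through the event to the projection update.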

The Bottom Line

CQRS is a powerful pattern that solves specific problems at specific scales with specific team capabilities. It’s not a default architectural choice. It’s a high-complexity optimisation that makes sense when organisational maturity, domain complexity, and scaling requirements justify the investment.

The gap between CQRS success and failure isn’t understanding the pattern. It’s honestly assessing whether your context justifies its complexity. If it doesn’t, then you probably shouldn’t use it.

If you’re considering CQRS, the most critical question isn’t “how do we implement this?” It’s “Are we sure we need this?” Answer honestly, and you’ll save yourself months of complexity providing zero business value.

Further Reading

- Microsoft Learn - CQRS Pattern

- Martin Fowler - CQRS

- Microsoft CQRS Journey

- Confluent - Event Sourcing, CQRS, Stream Processing and Apache Kafka

- Event-Driven.io - CQRS Facts and Myths Explained

- TechTarget - 3 Common CQRS Pattern Problems

Why Your Brain Makes You Procrastinate (And What Actually Works to Stop It)

Vincent Nyanga — Fri, 06 Mar 2026 08:01:00 GMT

You know the feeling. You sit down to tackle an important task, and suddenly you’re reorganising your desk, checking Slack for the third time in five minutes, or reading articles about productivity instead of actually being productive.

If you’re working remotely, this problem has gotten worse. Recent research indicates that 88% of remote workers procrastinate at least once per week, resulting in a loss of up to 20% of their productivity. That’s an entire day each week disappearing into digital distractions and delayed tasks.

Here’s what I’ve learned from recent neuroscience research: procrastination isn’t a time management problem. It’s an emotion regulation problem. And once you understand what’s actually happening in your brain, you can use evidence-based strategies to overcome it.

Let us examine the science behind procrastination and the practical techniques that actually work.

What’s Actually Happening in Your Brain

Groundbreaking research published in Nature Communications in 2024 identified the core neurological mechanism behind procrastination: temporal discounting. Here’s how it works.

When you look at a task with a distant deadline or delayed reward, your brain’s valuation system significantly discounts its value. At the same time, your brain perceives the effort required later as less aversive than doing it now. This creates a cognitive trap: doing something later appears much less effortful but not much less rewarding.

The result? Your limbic system’s desire for immediate gratification consistently overrides your prefrontal cortex’s executive planning functions. It’s not laziness. It’s your brain’s default wiring.

Around 46% of the variation in procrastination between people has been attributed to genetic factors. This isn’t a character flaw.

Researchers Dr Timothy Pychyl from Carleton University and Dr Fuschia Sirois from Durham University have spent years studying procrastination. Their 2024 research confirms what many of us have experienced: we procrastinate to escape negative feelings temporarily. When a task makes us anxious, frustrated, or bored, we postpone it to feel better in the moment.

The problem is what happens next. We feel guilty about procrastination, which elicits additional negative emotions, which in turn lead to further procrastination. This self-blame cycle causes more damage than the delay itself. Their longitudinal research shows strong correlations between chronic procrastination and severe health outcomes, including coronary heart disease and hypertension.

Why Remote Work Makes It Worse

If you’ve noticed yourself procrastinating more since shifting to remote work, you’re not imagining it. The digital transformation created a perfect storm for procrastination among knowledge workers.

Four key factors drive higher procrastination rates in remote environments:

Blurred boundaries. When your home is your office, work never really ends. But it also never really starts. The clear transition from “home mode” to “work mode” is absent, making it harder to engage in complex tasks.

Task ambiguity. Virtual communication is less clear than in-person conversations. When you’re not sure exactly what needs to be done or why it matters, your brain labels the task as “unstructured” or “ambiguous.” These are two of the seven core procrastination triggers.

Constant escape routes. Your phone, social media, YouTube, and personal tasks are always one click away. Research shows that continuous connectivity substantially increases exposure to interruptions. Every notification offers an easy escape from whatever uncomfortable task you’re facing.

Reduced accountability. When you’re working alone, no colleague is dropping by your desk to check progress. Research from Frontiers in Psychology in 2025 found that basic psychological needs for autonomy, competence, and relatedness negatively predict procrastination. Remote environments often fail to meet these needs, particularly the need for relatedness and connection.

Add decision fatigue to the mix, and you’ve got a recipe for chronic delay. Every choice you make throughout the day depletes the cognitive resources you need for self-control. By the afternoon, your ability to push through uncomfortable tasks is significantly diminished.

What Actually Works: Evidence-Based Strategies

A comprehensive meta-analysis of 24 intervention studies involving 1,173 participants reveals what actually reduces procrastination. The answer isn’t motivation or willpower. It’s structured approaches that work with your brain, not against it.

Here’s what the research supports.

Preventive Strategies

Implementation intentions. This is the most powerful technique in the research. Instead of vague goals like “I’ll work on the report tomorrow,” you create specific if-then plans: “At 9 AM tomorrow at my desk, I will write the report introduction.”

This simple shift increases success rates by 300%. Why? Because it removes the decision point. When 9 AM arrives, you don’t debate whether to start. You’ve already decided.

The Pomodoro Technique. Work for 25 minutes, then take a 5-minute break. This aligns with how the brain processes cognitive load and supports regular reward cycles. Knowing a break is coming in 25 minutes makes it easier to start complex tasks.

Task decomposition. Break projects into initial steps that take under two minutes. Instead of “write a proposal,” your first task is “create a document and write a title.” This leverages the Zeigarnik Effect: unfinished tasks keep nagging at your attention, which pulls you toward completing what you’ve started. Once you’ve begun, momentum typically carries you forward.

Temptation bundling. Pair aversive tasks with enjoyable activities. Behavioural economist Katherine Milkman developed this approach. Only listen to your favourite podcast while doing expense reports. Only get your premium coffee while working on that difficult presentation. This makes the hard work more appealing.

In-the-Moment Tactics

When you’re facing a task right now and feeling the resistance, these techniques help:

The 5-Minute Miracle. Commit to working for just five minutes. That’s it. This bypasses your emotional resistance because five minutes doesn’t feel threatening. What usually happens? Once you start, you keep going. But even if you don’t, you’ve made progress.

The Swiss Cheese approach. Make random “holes” in large tasks through brief, low-expectation work sessions. Spend 10 minutes just collecting links for your research. Spend 5 minutes outlining section headers. You’re building progress without the pressure of completing anything.

Trigger reversal. Research identifies seven procrastination triggers: boredom, frustration, difficulty, unstructuredness, ambiguity, personal meaninglessness, and a lack of intrinsic rewards. Identify which triggers are activated for your specific task, then deliberately reverse them.

If a task is tedious, can you make it a game or competition with yourself? If it’s ambiguous, can you get clarity from someone before starting? If it feels meaningless, can you connect it to a larger goal that matters to you?

Environmental modification. Eliminate digital distractions before you start, not while you’re working. Close unnecessary browser tabs. Put your phone in another room. Use website blockers if needed. Make it harder to escape to easy dopamine hits.

The Self-Compassion Factor

Here’s something that surprised researchers: self-compassion reduces procrastination more effectively than self-criticism.

When you procrastinate and then beat yourself up about it, you create more negative emotions. Those emotions drive more procrastination. It’s a vicious cycle.

Studies show that people who practise self-forgiveness for past delays are significantly more likely to complete future tasks. When you slip up, acknowledge it without judgment, identify what got in the way, and recommit to your plan.

Mindfulness-based interventions also show strong results. These improve executive function through body relaxation, breathing practice, and awareness exercises. You don’t need to become a meditation expert. Even brief mindfulness breaks help restore the self-control required to engage with complex tasks.

What You Can Apply Right Now

Even if you’re not struggling with chronic procrastination, here’s what transfers to any professional situation:

Start with implementation intentions. Take your next vital task and create a specific if-then plan. Write down: “When [specific time and location], I will [specific first action].” This single technique offers the highest return on investment.

Design for immediate action, not motivation. Stop waiting to feel motivated. Motivation follows action, not the other way around. Use the 5-Minute Miracle to get started, and let momentum carry you forward.

Identify your triggers. The next time you find yourself procrastinating, ask which of the seven triggers are activated: boring, frustrating, difficult, unstructured, ambiguous, meaningless, or unrewarding. Once you know the trigger, you can address it directly.

Practise self-compassion. When you procrastinate, notice it without self-judgment. Understand that your brain is trying to regulate uncomfortable emotions. Acknowledge what happened, identify what you’ll do differently, and move forward.

Modify your environment first. Don’t rely on willpower to resist digital distractions. Remove them before you start working. Make the right choice the easy choice.

Moving Forward

Procrastination isn’t about laziness or poor character. It’s your brain’s attempt to regulate uncomfortable emotions through avoidance. Understanding this changes everything.

You can’t eliminate procrastination. However, you can use evidence-based strategies to work with your brain’s wiring rather than fighting against it. Implementation intentions, task decomposition, the 5-Minute Miracle, trigger reversal, and self-compassion all have strong research support.

The key is to start small. Pick one technique from this article and try it this week. See what works for your specific situation and constraints.

Thanks for taking the time to read. If you have questions or if you’ve found other strategies that work for procrastination, I’d love to hear about them. Don’t hesitate to leave comments below.

Beyond Patternitis: Why Great Engineers Embrace "The Boring"

Vincent Nyanga — Fri, 20 Feb 2026 08:00:38 GMT

In my recent LinkedIn post, I touched on the Pattern-Process Paradox: the growing gap between solving business problems and the ritualistic application of design patterns. While patterns were intended to mitigate complexity, their dogmatic over-application often leads to the very unmaintainability they were meant to prevent.

Today, let’s go deeper into the research.

The Psychology of the “Golden Hammer”

To understand why we over-engineer, we must examine how we learn. According to the Dreyfus model of skill acquisition, novices and “advanced beginners” rely heavily on rigid, context-free rules. For them, a design pattern is a survival mechanism—a “black-box” solution used because they lack the experience to evaluate a problem from first principles.

The danger zone is the “Competent” stage. Here, a developer has learned the how of a pattern but not the when. This is the breeding ground for Cargo Cult Programming, in which program structures are treated as rituals rather than functional necessities.

The Resume-Driven Development (RDD) Trap

We must also acknowledge the market incentives. Research indicates that 82% of software professionals believe that using emerging technologies makes them more attractive to prospective employers.

This creates a self-sustaining cycle:

60% of hiring managers admit that technology trends influence the roles they advertise.
Developers respond by introducing “buzzword” architectures, such as 50 microservices for a simple CRUD application, into projects to gain “marketable” experience.

The result? A “resume-driven legacy” of over-engineered systems that are difficult to maintain once the hype for that specific framework fades.

The Architect’s Remedy: Strategic Programming

If the goal of software design is managing complexity, how do we shift back to utility? John Ousterhout’s A Philosophy of Software Design offers the best framework: Strategic vs. Tactical Programming.

Prioritise Deep Modules: A “deep” module hides significant complexity behind a simple interface. Contrast this with “shallow” modules: classes or methods whose interfaces are nearly as complex as their implementations, which increase cognitive load by forcing developers to track logic across dozens of fragmented files.
Focus on Cohesion: A deep module thrives on high functional cohesion. Don’t fragment your logic to satisfy a “Clean Code” rule about method length; keep related logic together to reduce obscurity.
The 20% Investment: Tactical programming, which gets the next feature working as quickly as possible, steadily accumulates complexity; Ousterhout calls the developers who take it to the extreme “tactical tornadoes”. Strategic programming requires an ongoing investment of 10-20% of your time in design improvements to keep the system maintainable.

Final Thought

Success in software architecture isn’t about how many patterns you can fit into a pull request. It’s about competence over ritual. True experts use heuristics and intuition to recognise the “vibe” of a failing system and choose the simplest, most effective tool for the job.

Sometimes, the most “senior” thing you can do is choose the boring solution.

The Complete Guide to Asynchronous Request-Reply Patterns

Vincent Nyanga — Fri, 06 Feb 2026 08:01:00 GMT

Your API just returned a 504 Gateway Timeout because generating that report took 45 seconds.

Your users are frustrated. Your connection pool is exhausted. Your system is brittle.

The Asynchronous Request-Reply (ARR) pattern solves this: acknowledge requests immediately, process in the background, notify when complete.

Here are your five implementation options and when to use each.

1. Polling: Start Here

How it works: The server responds with 202 Accepted and a status URL, typically in the Location header. The client polls that URL until the task completes, at which point the status endpoint answers with a 303 See Other redirect to the result.

Use the Retry-After header. Let the server control the polling interval so clients don’t have to guess.
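The flow above can be simulated without any HTTP library. In this sketch `ReportServer`, `FakeClock`, and the `/status/` and `/reports/` paths are hypothetical, and Retry-After is assumed to use its delta-seconds form (it may also be an HTTP date).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


class ReportServer:
    """Hypothetical server where each report takes `work_seconds` to build."""

    def __init__(self, clock, work_seconds):
        self.clock, self.work_seconds = clock, work_seconds
        self.finish_at = {}

    def submit(self):
        job_id = uuid.uuid4().hex
        self.finish_at[job_id] = self.clock.now() + self.work_seconds
        # 202: accepted but not done; Location points at the status resource.
        return Response(202, {"Location": f"/status/{job_id}", "Retry-After": "2"})

    def get_status(self, path):
        job_id = path.rsplit("/", 1)[-1]
        if self.clock.now() < self.finish_at[job_id]:
            return Response(200, {"Retry-After": "2"})  # still running
        return Response(303, {"Location": f"/reports/{job_id}"})  # done: see result


def wait_for_report(server, clock, max_polls=20):
    """Submit a job, then poll its status URL until it redirects to the result."""
    resp = server.submit()
    status_url = resp.headers["Location"]
    for polls in range(1, max_polls + 1):
        # Sleep for whatever the server asked for.
        clock.sleep(int(resp.headers.get("Retry-After", "1")))
        resp = server.get_status(status_url)
        if resp.status == 303:
            return resp.headers["Location"], polls
    raise TimeoutError("report not ready after max_polls attempts")
```

With a 5-second job and `Retry-After: 2`, the client polls at t=2, 4, and 6 and receives the redirect on the third poll.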

Best for: Browser clients, corporate firewalls, tasks under 60 seconds.

Avoid when: Tasks take hours, real-time updates are required, or you have thousands of concurrent pollers.

Polling sequence diagram

2. Webhooks: Push When Ready

How it works: The client provides a callback URL. Server processes in the background and posts the result to the callback when done.

Security is mandatory. Verify requests using HMAC signatures or JWT tokens. Never trust incoming webhook data unquestioningly.

Implement retry logic with exponential backoff and dead-letter queues on the sending side. Make your handlers idempotent: the same webhook can be delivered more than once.
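A minimal sketch of both sides, assuming an HMAC-SHA256 signature over the raw body and hypothetical `delivery_id` and `job_id` fields in the payload. Header names and secret handling vary by provider, and many providers also sign a timestamp to block replays, which is worth adding.

```python
import hashlib
import hmac
import json


def sign(body: bytes, secret: bytes) -> str:
    """HMAC-SHA256 over the raw request body, hex-encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class WebhookReceiver:
    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen = set()     # delivery IDs already processed
        self.completed = []   # side effects that must happen only once

    def handle(self, body: bytes, signature: str) -> int:
        # Verify before parsing; compare_digest runs in constant time.
        if not hmac.compare_digest(sign(body, self.secret), signature):
            return 401
        event = json.loads(body)
        if event["delivery_id"] in self.seen:
            return 200  # duplicate: acknowledge so the sender stops retrying
        self.seen.add(event["delivery_id"])
        self.completed.append(event["job_id"])
        return 200


def retry_delays(attempts=5, base=1.0, cap=60.0):
    """Sender side: exponential backoff (1s, 2s, 4s, ...) capped at `cap`.
    After the last attempt, the delivery would go to a dead-letter queue."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Real senders usually add random jitter to these delays so that many failed deliveries don’t retry in lockstep.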

Best for: Server-to-server communication, event-driven architectures.

Avoid when: Browser clients, consumers behind firewalls that block inbound requests, or when debugging complexity is a concern.

Webhooks sequence diagram

3. Server-Sent Events: The Underrated Option

How it works: The client opens a persistent HTTP connection. Server pushes events through this stream when tasks complete.

Automatic reconnection is built in. On reconnect, the browser sends the Last-Event-ID header so the server can resume the stream where it left off.
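A minimal sketch of the server side of resumption, assuming event IDs are 1-based positions in a per-user history. `EventLog`, `format_sse`, and the `task-complete` event name are hypothetical; a real endpoint would write these frames to a response with `Content-Type: text/event-stream`.

```python
import json


def format_sse(event_id: int, payload: dict, event: str = "task-complete") -> str:
    # Each field is "name: value"; a blank line terminates the event.
    # json.dumps without indent keeps the data on a single line.
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(payload)}\n\n"


class EventLog:
    """Hypothetical per-user event history kept so reconnecting clients can resume."""

    def __init__(self):
        self.events = []

    def append(self, payload: dict) -> int:
        self.events.append(payload)
        return len(self.events)  # IDs are 1-based positions

    def stream_from(self, last_event_id=None):
        """Yield frames after Last-Event-ID (a string header value, or None on first connect)."""
        start = int(last_event_id) if last_event_id else 0
        for event_id, payload in enumerate(self.events[start:], start=start + 1):
            yield format_sse(event_id, payload)
```

Because each frame carries an `id:` field, the browser remembers the last one it saw and sends it back on reconnect, so the client misses nothing that happened while it was offline.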

Text-only format. JSON works great. Binary data needs base64 or a separate HTTP fetch.

Best for: Real-time browser updates and one-way server-to-client communication.

Avoid when: bidirectional communication, binary streaming, or support for IE/legacy browsers are required.

Example: OpenAI’s ChatGPT streaming responses.

Server-sent events sequence diagram

4. WebSockets: For True Bidirectionality

How it works: Persistent, full-duplex connection. Both the client and the server can send messages at any time.

Operational complexity is real. You need heartbeats (ping/pong), sticky sessions at the load balancer, and stateful connection management.
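Heartbeats matter because a connection can die without either side receiving a close frame, for example when a laptop lid closes or a NAT entry expires. This sketch shows only the bookkeeping; `HeartbeatMonitor` is a hypothetical helper, and the actual ping frames would come from your WebSocket library.

```python
class HeartbeatMonitor:
    """Track the last pong per connection and reap the ones that went quiet.

    A real server would send a ping frame every few seconds and call
    pong() when the reply arrives; `now` is passed in to keep this testable.
    """

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_pong = {}  # connection ID -> time of last pong

    def connect(self, conn_id, now):
        self.last_pong[conn_id] = now

    def pong(self, conn_id, now):
        self.last_pong[conn_id] = now

    def reap(self, now):
        """Forget and return connections with no pong for more than `timeout` seconds."""
        dead = [c for c, t in self.last_pong.items() if now - t > self.timeout]
        for c in dead:
            del self.last_pong[c]  # caller closes the socket and frees its state
        return dead
```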

Best for: Chat, collaborative editing, gaming—anything requiring frequent bidirectional updates and sub-100ms latency.

Avoid when: One-way updates (use SSE), infrequent communication (use polling), or simple request-reply.

WebSockets sequence diagram

5. Message Brokers: The Enterprise Backbone

How it works: The client publishes a request to the broker with a correlation_id and a reply_to address. The server consumes it, processes it, and publishes a reply carrying the same correlation_id to the reply_to queue. The client matches responses to requests using the correlation ID.

Idempotency is mandatory. At-least-once delivery means duplicate messages will happen, so your handlers must process them safely.
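The correlation and idempotency rules can be sketched with in-memory queues standing in for a real broker. `Broker`, `Client`, `serve`, and the `requests` queue name are hypothetical; with RabbitMQ or SQS the same fields would travel as message properties or attributes.

```python
import queue
import uuid


class Broker:
    """Toy stand-in for a real broker: just named in-memory queues."""

    def __init__(self):
        self.queues = {}

    def get_queue(self, name):
        if name not in self.queues:
            self.queues[name] = queue.Queue()
        return self.queues[name]

    def publish(self, name, message):
        self.get_queue(name).put(message)


class Client:
    def __init__(self, broker, reply_to):
        self.broker, self.reply_to = broker, reply_to
        self.pending = set()  # correlation IDs still awaiting a reply

    def send(self, payload):
        cid = str(uuid.uuid4())
        self.pending.add(cid)
        self.broker.publish("requests", {"correlation_id": cid, "reply_to": self.reply_to, "payload": payload})
        return cid

    def collect(self):
        """Drain the reply queue, matching replies to requests by correlation ID."""
        results, replies = {}, self.broker.get_queue(self.reply_to)
        while not replies.empty():
            reply = replies.get()
            cid = reply["correlation_id"]
            if cid in self.pending:  # unknown or duplicate replies are ignored
                self.pending.discard(cid)
                results[cid] = reply["result"]
        return results


def serve(broker, processed):
    """Consume requests idempotently: `processed` maps correlation ID -> result."""
    requests = broker.get_queue("requests")
    while not requests.empty():
        msg = requests.get()
        cid = msg["correlation_id"]
        if cid not in processed:  # redelivered messages skip the real work
            processed[cid] = msg["payload"].upper()
        # Re-send the stored reply, in case the first one was lost.
        broker.publish(msg["reply_to"], {"correlation_id": cid, "result": processed[cid]})
```

Storing the result rather than just a "seen" flag lets a redelivered request get the same answer again without repeating the work.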

Monitor dead-letter queues. Failed messages after max retries go to DLQs—they’re your canary for system issues.

Broker choice:

RabbitMQ: Low latency, complex routing (< 50K msgs/sec)
Kafka: High throughput, event streaming (millions of msgs/sec)
AWS SQS/SNS: Managed, serverless, pay-per-use

Best for: Microservices, guaranteed delivery, high throughput, complex routing.

Avoid when: Browser clients, simple APIs, sub-10ms latency requirements, or a team without DevOps expertise.

Message brokers sequence diagram

Quick Selection Guide

Polling: Browser clients, most straightforward implementation, tasks < 60 seconds, moderate scale

Webhooks: Server-to-server, event-driven, large-scale, need push notifications

SSE: Browser real-time updates, one-way communication, simpler than WebSockets

WebSockets: Bidirectional, < 100ms latency, chat/collaboration, very large scale

Message Brokers: Microservices, millions of messages/second, guaranteed delivery, complex routing

Start Simple, Scale Smart

Begin with polling. It’s universally compatible, easy to debug, and solves 80% of async cases.

Add complexity (WebSockets, message brokers) only when requirements demand it.

The best architecture solves your actual problems without introducing unnecessary complexity.

Ship fast, die slow

Vincent Nyanga — Fri, 23 Jan 2026 08:00:50 GMT

You’re shipping fast and making money. Everything works. The codebase is a mess, but who cares? Customers are paying. Features are landing. You’re winning.

Then you’re not.

The Slow Death

It starts small. A feature that should take a day takes a week. A bug fix that breaks something else. Then something else. Your best engineer, the one who actually understands how things work, quits. They’re tired of fighting the codebase every day.

New hires take months to contribute anything meaningful. They’re not slow. They’re just drowning in complexity nobody bothered to manage.

You’re not slow because the market changed or because your team isn’t good enough. You’re slow because every change is now a negotiation with the mess you left behind.

The Trap

The tech debt curve

Here’s what makes tech debt dangerous: it doesn’t announce itself.

The code keeps working. The product keeps making money. There’s no alarm, no dashboard turning red, just a gradual erosion of your ability to move.

“Move fast and break things” quietly becomes “move slow and break everything.”

By the time you feel the pain, really feel it, you’re looking at a six-month rewrite. Your competitors shipped three features while you were untangling spaghetti. The velocity you thought you were protecting by cutting corners? Gone.

The trap isn’t that messy code stops working. It’s that it keeps working just long enough for you to build your entire business on top of it.

The Balance

I’m not arguing for clean code as an end in itself. Elegant abstractions don’t pay the bills. Shipping does.

But there’s a difference between “clean enough to change” and “clean enough to frame.” The goal isn’t beautiful code. The goal is code you can touch without fear.

Every shortcut is a bet that you won’t need to change that code later. Sometimes that bet pays off. Most of the time you lose; you just don’t know it yet.

The Bottom Line

The code that makes you money today will need to change tomorrow. New feature. New integration. New regulation. Pivot.

If you can’t change it, you can’t compete. Simple as that.

So ship fast, but ship code you can live with, because you will be living with it.

The interest on tech debt compounds silently. And the bill always comes due.

Hosting a BFF on AWS: A Simple Guide

Vincent Nyanga — Fri, 09 Jan 2026 08:00:59 GMT

In my previous article, I covered the Backend For Frontend (BFF) pattern and why SPAs shouldn’t handle OAuth tokens directly. This week: how to actually deploy it on AWS.

The Goal

One domain. Static frontend and BFF backend. Internal APIs completely hidden from the internet.

The Architecture

Architecture diagram for hosting BFF on AWS

How It Works

CloudFront is your single entry point. It uses “behaviors” to route traffic:

/* → S3 (your static frontend)
api/* → ALB (your backend)

S3 hosts your built frontend assets. CloudFront caches them at edge locations globally.

ALB receives /api/* requests and forwards them to Fargate. It lives inside your VPC.

Fargate runs your BFF. This is where OAuth happens, sessions are managed, and requests are proxied to your internal APIs with access tokens attached.

Redis stores sessions. The BFF is stateless — session data lives here so you can scale horizontally.

Internal APIs are your actual backend services. They have no public endpoints. Only the BFF can reach them.

The Request Flow

User loads app.example.com → CloudFront serves static assets from S3
App calls app.example.com/api/orders → CloudFront routes to ALB
ALB forwards to Fargate (BFF)
BFF looks up session in Redis, gets access token
BFF calls internal API with token attached
Response flows back to browser

The browser only ever sees a session cookie. Tokens stay server-side.

Why CloudFront Behaviors?

You might think you need an ALB in front of everything to do the routing. You don’t.

CloudFront handles it natively, and you get:

Free data transfer from S3 to CloudFront
Edge caching for static assets
DDoS protection included
Lower cost than ALB-first architecture

Key Configuration Details

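CloudFront behavior paths: CloudFront treats api/* and /api/* the same, because the leading slash in a path pattern is optional. Pick one form and use it consistently.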

Restrict ALB access: Add a custom header in CloudFront (e.g., X-Origin-Verify: secret-value). Configure ALB to reject requests without it. This prevents bypassing CloudFront.
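On the ALB side, the restriction is a listener rule that forwards only when the header matches, plus a default action that rejects everything else. This sketch builds the keyword arguments for boto3’s `elbv2` client calls `create_rule` and `modify_listener`; the ARNs, rule priority, and secret are placeholders, and CloudFront adds the matching header through the origin’s custom header settings.

```python
def origin_verify_rule(listener_arn, target_group_arn, secret, header="X-Origin-Verify"):
    """Keyword arguments for elbv2 create_rule: forward only if the header matches."""
    return {
        "ListenerArn": listener_arn,
        "Priority": 1,
        "Conditions": [
            {
                "Field": "http-header",
                "HttpHeaderConfig": {"HttpHeaderName": header, "Values": [secret]},
            }
        ],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


def deny_by_default(listener_arn):
    """Keyword arguments for elbv2 modify_listener: everything else gets a 403."""
    return {
        "ListenerArn": listener_arn,
        "DefaultActions": [
            {
                "Type": "fixed-response",
                "FixedResponseConfig": {
                    "StatusCode": "403",
                    "ContentType": "text/plain",
                    "MessageBody": "Forbidden",
                },
            }
        ],
    }


# Usage (requires boto3 and AWS credentials; ARNs are placeholders):
# elbv2 = boto3.client("elbv2")
# elbv2.modify_listener(**deny_by_default(LISTENER_ARN))
# elbv2.create_rule(**origin_verify_rule(LISTENER_ARN, TARGET_GROUP_ARN, SECRET))
```

Because `Values` accepts a list, you can rotate the secret by adding the new value, switching CloudFront over, and then removing the old one.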

S3 Origin Access Control (OAC): Configure CloudFront with OAC so users can only access static assets through CloudFront, not directly via the S3 URL. This ensures caching is always used and the origin is secured.

Redis for sessions: Don’t store sessions in Fargate memory. When requests hit different tasks, sessions disappear. Always use external session storage.
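A minimal session store sketch, assuming a client with redis-py style `get`, `setex`, and `delete` methods. The key prefix, TTL, and `FakeRedis` test double are hypothetical; in Fargate you would pass a real `redis.Redis` client pointed at your Redis endpoint.

```python
import json
import secrets


class SessionStore:
    """Server-side sessions kept in Redis so any Fargate task can serve any request."""

    def __init__(self, client, ttl_seconds=3600, prefix="bff:session:"):
        self.client, self.ttl, self.prefix = client, ttl_seconds, prefix

    def create(self, tokens: dict) -> str:
        session_id = secrets.token_urlsafe(32)  # opaque, unguessable cookie value
        self.client.setex(self.prefix + session_id, self.ttl, json.dumps(tokens))
        return session_id

    def load(self, session_id: str):
        raw = self.client.get(self.prefix + session_id)  # bytes or None
        return json.loads(raw) if raw is not None else None

    def destroy(self, session_id: str):
        self.client.delete(self.prefix + session_id)


class FakeRedis:
    """In-memory stand-in for local testing; ignores expiry."""

    def __init__(self):
        self.data = {}

    def setex(self, name, time, value):
        self.data[name] = value.encode()

    def get(self, name):
        return self.data.get(name)

    def delete(self, *names):
        for name in names:
            self.data.pop(name, None)
```

Only the random session ID goes to the browser; the tokens live under the prefixed key in Redis and expire with the TTL.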

Handling SPA Client-Side Routing

If you’re using React Router, Vue Router, or similar, you’ll hit a common problem: user refreshes on /dashboard/settings and gets a 404.

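Why? S3 looks for an object at /dashboard/settings. It doesn’t exist, so S3 returns an error before your SPA can handle the route. With OAC and a bucket policy that grants only s3:GetObject, that error is 403 rather than 404, because S3 won’t reveal whether a key exists to a caller without s3:ListBucket permission.

The fix: Configure CloudFront to catch these errors from S3 and return /index.html instead. Your SPA loads, reads the URL, and routes correctly.

In CloudFront, go to Error Pages and create a custom error response (add a second one with the same settings for 403 if you use OAC):

HTTP Error Code: 404
Response Page Path: /index.html
HTTP Response Code: 200

Custom error responses apply to the whole distribution, including the api/* behavior, so a genuine 404 from your BFF also comes back as index.html with a 200. If your frontend relies on API error codes, move the SPA fallback into a CloudFront Function attached to the S3 behavior only.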

Cookie Settings

This setup enables the strictest possible cookie configuration because everything is on the same domain.

Set-Cookie: session=abc123; HttpOnly; Secure; SameSite=Strict; Path=/api

What each flag does:

HttpOnly — JavaScript can’t read the cookie. XSS can’t steal it.
Secure — Only sent over HTTPS.
SameSite=Strict — Only sent on same-site requests. Strongest CSRF protection.
Path=/api — Cookie only sent to BFF endpoints, not with static asset requests.

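Why same domain matters: If your frontend and BFF are on different sites, meaning different registrable domains (e.g., app.example.com and api.example.net), you’re forced to use SameSite=None, which is weaker and increasingly blocked by browsers. Subdomains of one registrable domain, such as app.example.com and api.example.com, still count as same-site, but they are cross-origin, so you take on CORS configuration and credentialed cross-origin requests.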

With CloudFront serving both app.example.com/* and app.example.com/api/*, you’re same-origin. SameSite=Strict just works.
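For reference, the cookie above can be produced with Python’s standard library. This is a sketch of the BFF side only; `session_cookie` is a hypothetical helper, and the attribute order in the output may differ from the line above, which browsers don’t care about.

```python
from http.cookies import SimpleCookie


def session_cookie(session_id: str) -> str:
    """Build the BFF's Set-Cookie header with the flags listed above."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    attrs = cookie["session"]
    attrs["httponly"] = True        # not readable from JavaScript
    attrs["secure"] = True          # HTTPS only
    attrs["samesite"] = "Strict"    # supported by http.cookies since Python 3.8
    attrs["path"] = "/api"          # only sent to BFF routes
    return cookie.output()          # "Set-Cookie: session=...; ..."
```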

That’s It

This architecture handles most production workloads. It’s secure by default — your internal APIs have no public exposure, tokens never reach the browser, and CloudFront gives you edge caching and DDoS protection for free.

Start here. Add complexity only when you need it.

Why your SPA shouldn't handle OAuth tokens

Vincent Nyanga — Fri, 26 Dec 2025 07:01:18 GMT


Most OAuth tutorials for SPAs show you how to get an access token and store it in localStorage. The app works. You ship it.

The Problem with Browser-Based OAuth Clients

When your SPA handles OAuth directly, it acts as a “public client” and has no secure way to store credentials. The tokens end up in one of the browser's storage areas: localStorage or sessionStorage. Wherever they land, any JavaScript running on your page can access them.

This includes:

Malicious scripts from XSS vulnerabilities
Compromised third-party libraries
Injected code from browser extensions

The IETF draft on browser-based applications is explicit: browser-based public clients are “not recommended for business applications, sensitive applications, and applications that handle personal data.” A large share of real-world applications fall into this category.

The Attack That Changes Everything

You might think: “I’ll just use short-lived access tokens and refresh token rotation. Even if tokens get stolen, the damage is limited.”

That’s true for simple theft. But there’s a more sophisticated attack that bypasses all of these defences.

An attacker with XSS on your page doesn’t need to steal your tokens. They can get their own.

Here’s how:

Inject a hidden iframe
Initiate a silent OAuth flow using the user’s existing session
Extract the authorisation code from the iframe
Exchange it for a fresh set of tokens

The attacker now has their own access token and refresh token, utterly independent of yours. Short token lifetimes don’t help. Refresh token rotation doesn’t help. PKCE doesn’t help. DPoP doesn’t help.

Why? Because the attacker is running a legitimate OAuth flow. They’re just doing it from your origin, with your user’s session.

The Backend for Frontend Pattern

The BFF pattern takes a fundamentally different approach. Instead of your SPA acting as the OAuth client, a backend component handles all OAuth responsibilities.

The BFF has three jobs:

Act as a confidential OAuth client (with real credentials)
Store tokens server-side, tied to a session
Proxy all API calls, attaching the access token before forwarding

Your SPA never sees a token. It only receives an HttpOnly session cookie.
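The proxy job can be sketched as a small handler factory. `make_proxy`, `UpstreamRequest`, and the internal URL are hypothetical; the `send` function stands in for whatever HTTP client the BFF uses, and token refresh plus header copying are left out for brevity. `str.removeprefix` needs Python 3.9 or later.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpstreamRequest:
    method: str
    url: str
    headers: dict


def make_proxy(sessions: dict, send: Callable[[UpstreamRequest], tuple], api_base: str):
    """Build a hypothetical BFF request handler.

    sessions: session ID -> {"access_token": ...}, held server-side only
    send:     performs the upstream HTTP call and returns (status, body)
    api_base: base URL of the internal API the browser never reaches directly
    """

    def handle(method: str, path: str, cookies: dict) -> tuple:
        session = sessions.get(cookies.get("session", ""))
        if session is None:
            return 401, "login required"
        # Drop the /api prefix the browser used and attach the token here,
        # so the token never appears in any response to the browser.
        upstream = UpstreamRequest(
            method,
            api_base + path.removeprefix("/api"),
            {"Authorization": f"Bearer {session['access_token']}"},
        )
        return send(upstream)

    return handle
```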

Architecture diagram showing the BFF pattern

Why This Stops the Attack

The silent flow attack fails because the BFF is a confidential client. Even if the attacker obtains an authorisation code, they can’t exchange it because they don’t have the client secret.
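To see why a stolen code is useless on its own, here is a sketch of the token request a confidential client sends, following RFC 6749’s authorization_code grant with HTTP Basic client authentication. The client ID, secret, and redirect URI are placeholders; the point is that the `Authorization` header can only be built with a secret that lives in the BFF’s environment, never in the browser.

```python
import base64
from urllib.parse import quote_plus, urlencode


def token_request(code, redirect_uri, client_id, client_secret, code_verifier=None):
    """Headers and form body for the authorization_code grant (RFC 6749, section 4.1.3),
    authenticating the client with HTTP Basic (section 2.3.1)."""
    # Section 2.3.1: form-urlencode the ID and secret before Base64-encoding them.
    creds = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    headers = {
        "Authorization": "Basic " + base64.b64encode(creds.encode()).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    form = {"grant_type": "authorization_code", "code": code, "redirect_uri": redirect_uri}
    if code_verifier:
        form["code_verifier"] = code_verifier  # PKCE still applies to confidential clients
    return headers, urlencode(form)
```

An attacker who extracts a code from the iframe can build the form body, but without the secret the token endpoint rejects the request.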

With a public client, all four attack scenarios apply: stealing existing tokens, stealing tokens continuously, running a silent flow for new tokens, and proxying requests through the user’s browser.

With a BFF, only the last one remains. And that’s not an OAuth vulnerability; it’s inherent to all web applications. The attacker can make requests while the user’s browser is open, but they can’t exfiltrate credentials for later use.

When to Use the BFF Pattern

If you’re building business applications, sensitive applications, or anything that handles personal data, use a backend to handle OAuth.

In practice, this means:

Financial services
Healthcare applications
Enterprise software
Any app with user data you’d rather not see in a breach notification

The BFF adds infrastructure complexity. You need a backend component, session storage, and a proxy layer. But this complexity provides security guarantees that browser-based OAuth simply cannot.

The Middle Ground

The decision of whether to use a BFF is not binary. A pure SPA with browser-based OAuth is at one end of the spectrum, while a BFF is at the other. In between the two, there are other options you can employ. Here is one of them:

Token-Mediating Backend

This architecture acts as a “middle ground” between a full BFF and a browser-only client. It is lighter-weight than a BFF because it does not require proxying every API request through your server, but it is less secure because the access token is exposed to the browser.

• How it works: You still use a backend component to handle the OAuth exchange (exchanging the authorisation code for tokens), acting as a confidential client. However, instead of keeping the access token hidden, the backend passes it to the browser application. The browser then uses this token to call resource servers directly.

• Security Properties:

◦ Refresh Tokens: The backend keeps the refresh token and does not expose it to the browser, protecting it from theft via XSS. When the access token expires, the browser requests a new one from the backend.

◦ Access Tokens: The access token is exposed to the browser, making it vulnerable to theft if malicious scripts compromise the application.

◦ Hijacking: Because the access token is exposed, an attacker could steal it to call APIs directly, unlike a pure BFF, where the attacker can only hijack the client session.

• Recommendation: This pattern is recommended only if the use cases or system requirements prevent the use of a proxying BFF.
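The split of responsibilities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation. The `refresh` callable is a hypothetical stand-in for the token-endpoint request the backend makes with its client credentials. The sketch also assumes the identity provider rotates refresh tokens on each use, and the class and method names are made up for the example.

```python
from typing import Callable, Dict, Tuple


class TokenMediatingBackend:
    """Sketch: the backend keeps refresh tokens; the browser only ever receives access tokens."""

    def __init__(self, refresh: Callable[[str], Tuple[str, str]]) -> None:
        # Hypothetical token-endpoint call made with the client's credentials:
        # refresh(refresh_token) -> (new_access_token, new_refresh_token)
        self._refresh = refresh
        self._refresh_tokens: Dict[str, str] = {}  # session ID -> refresh token

    def complete_login(self, session_id: str, access_token: str, refresh_token: str) -> str:
        """Keep the refresh token server-side and hand only the access token to the browser."""
        self._refresh_tokens[session_id] = refresh_token
        return access_token

    def renew_access_token(self, session_id: str) -> str:
        """Called by the browser when its access token expires."""
        refresh_token = self._refresh_tokens.get(session_id)
        if refresh_token is None:
            raise PermissionError("unknown session")
        access_token, new_refresh_token = self._refresh(refresh_token)
        self._refresh_tokens[session_id] = new_refresh_token  # store the rotated refresh token
        return access_token
```

Whatever `complete_login` and `renew_access_token` return ends up in browser memory. That is exactly the exposure the security properties above describe.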

Analogy

Think of the BFF as a bank teller who keeps the vault key (token) behind the counter; you ask them to perform transactions, and they do it for you. The Token-Mediating Backend is like a manager who gets the key from the vault but hands it to you to open the safety deposit box yourself; you have more direct access, but if someone steals the key from you, they can open the box too. The Browser-Based Client is like having the key mailed directly to your house; it's convenient, but anyone who breaks into your mailbox (browser) gets the key immediately.

The architecture behind a reliable AI-powered system

Vincent Nyanga — Fri, 12 Dec 2025 03:15:54 GMT

For the past couple of months, I've been working on AI-powered systems and chatbots at my workplace. With minimal knowledge of architecting and building such systems, I had to rely on online tutorials and books to find the best way to build them. For the most part, these didn't help much, so I resorted to bulldozing through, figuring things out as I went.

Below is the architecture I’ve landed on. It’s definitely not the only way, and I’m convinced it’ll improve over time. It’s been battle-tested, and so far, it covers the gaps that I found in most tutorials.

The full picture

The query flows through several stages before a response reaches the user. Each stage exists because I learned the hard way what happens when you skip it.

Let’s walk through each one.

1. Cache

The first thing a query hits is the cache. Why? Because everything downstream is expensive. RAG lookups, LLM calls, guardrail checks — all of it costs time and money.

This isn’t just key-value caching. For AI systems, you need semantic caching — matching queries by meaning, not exact string matches. “What’s your refund policy?” and “How do I get my money back?” should hit the same cache entry.

Implementation notes:

Embed incoming queries and compare against cached query embeddings
Set a similarity threshold: too low, and you return wrong answers; too high, and you rarely hit the cache
Cache responses with TTL based on how often the underlying data changes. One thing I’m thinking of doing is deciding intelligently what to cache and what not to.
Invalidate aggressively when source data updates
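Here is a minimal sketch of the semantic cache described above. It assumes you supply an `embed` function (hypothetical here; in practice, a call to an embedding model) and uses cosine similarity with a configurable threshold and a per-entry TTL. It does a linear scan over cached entries for clarity. A production system would more likely use a vector index.

```python
import math
import time
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Matches queries by embedding similarity rather than exact text."""

    def __init__(
        self,
        embed: Callable[[str], Vector],
        threshold: float = 0.9,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._embed = embed            # hypothetical embedding function supplied by the caller
        self._threshold = threshold    # minimum cosine similarity that counts as a hit
        self._ttl = ttl_seconds        # tune to how often the underlying data changes
        self._clock = clock
        self._entries: List[Tuple[Vector, str, float]] = []  # (embedding, response, expires_at)

    def get(self, query: str) -> Optional[str]:
        now = self._clock()
        self._entries = [e for e in self._entries if e[2] > now]  # evict expired entries
        query_vec = self._embed(query)
        best: Optional[str] = None
        best_score = self._threshold
        for vec, response, _ in self._entries:
            score = cosine_similarity(query_vec, vec)
            if score >= best_score:  # keep the most similar entry above the threshold
                best, best_score = response, score
        return best

    def put(self, query: str, response: str) -> None:
        self._entries.append((self._embed(query), response, self._clock() + self._ttl))

    def invalidate_all(self) -> None:
        """Call when the source data changes (aggressive invalidation)."""
        self._entries.clear()
```

With a real embedding model, "What's your refund policy?" and "How do I get my money back?" should land close enough together to share an entry. The threshold is the knob the list above warns about.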

2. Intent classification

Before you do any expensive context retrieval, classify what the user actually wants. This serves two purposes:

Routing: Different intents need different handling. A simple FAQ lookup doesn’t need your most powerful model. A complex analysis might need multiple tool calls. Classify first, then route appropriately.

Safety: This is your first line of defence against prohibited actions. If someone’s trying to extract training data, bypass restrictions, or do something harmful, catch it here before you’ve spent compute on context retrieval.

I run a very small model at this stage — fast enough not to add meaningful latency, but accurate enough to catch the obvious cases. The heavy-duty safety checks come later in the guardrails.

What intent classification catches:

Query type (question, clarification, feedback)
Domain routing (which knowledge base or tool set applies)
Risk signals (prompt injection attempts, out-of-scope requests)
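The sketch below shows the shape of this stage: classify first, then either reject early or route. A handful of keyword rules stands in for the small classification model mentioned above. That substitution is purely for illustration. The keyword lists, domain names and the `Intent` fields are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword rules standing in for the small classification model.
RISK_MARKERS = ("ignore previous instructions", "system prompt", "training data")
DOMAIN_KEYWORDS = {"refund": "billing", "invoice": "billing", "password": "accounts"}
FEEDBACK_PREFIXES = ("thanks", "that answer", "that was")
CLARIFICATION_PREFIXES = ("what do you mean", "can you clarify")


@dataclass(frozen=True)
class Intent:
    query_type: str  # "question", "clarification" or "feedback"
    domain: str      # which knowledge base or tool set applies
    risky: bool      # prompt-injection or out-of-scope signal


def classify(query: str) -> Intent:
    q = query.lower().strip()
    risky = any(marker in q for marker in RISK_MARKERS)
    if q.startswith(FEEDBACK_PREFIXES):
        query_type = "feedback"
    elif q.startswith(CLARIFICATION_PREFIXES):
        query_type = "clarification"
    else:
        query_type = "question"
    domain = next((d for kw, d in DOMAIN_KEYWORDS.items() if kw in q), "general")
    return Intent(query_type, domain, risky)


def route(query: str) -> str:
    intent = classify(query)
    if intent.risky:
        # Fail early: nothing is spent on retrieval or model calls for this query.
        return "rejected"
    return f"{intent.domain}:{intent.query_type}"
```

For example, `route("How do I get a refund?")` yields `"billing:question"`, while anything containing a risk marker is rejected before context retrieval starts.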

3. Context engineering

This is where your RAG pipeline, memory systems, and query rewriting live. The goal: construct the context that gives the model the best chance of generating a helpful response. The user’s intent from the previous step informs what information needs to be added to the context.

Query rewriting: User queries are often ambiguous or incomplete. “What about the deadline?” means nothing without context. Rewrite queries to be self-contained using conversation history.

RAG retrieval: Pull relevant documents, and generate and run database queries. Retrieve only what's actually relevant, based on the user's intent.

Memory: This includes both short-term and long-term memory. For multi-turn conversations or returning users, inject relevant history. What did they ask before? What preferences have they expressed? This is especially important for personalisation. In a natural language-to-SQL (nl2sql) system I built, I use data from long-term memory for few-shot prompting. Based on users' feedback from previous interactions, I add a few examples of good and bad queries, along with the reasons why they are good or bad.
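A rough sketch of how these pieces might be assembled into one context string, using the nl2sql few-shot idea as the example. It assumes query rewriting has already happened upstream (typically another model call, not shown) and that documents and examples have already been retrieved. `Example`, `build_context` and the prompt layout are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """A past interaction pulled from long-term memory, labelled via user feedback."""
    question: str
    sql: str
    verdict: str  # "good" or "bad"
    reason: str


def build_context(rewritten_query: str, documents: List[str],
                  examples: List[Example], max_examples: int = 3) -> str:
    """Assemble the prompt context: retrieved documents, few-shot examples, then the query."""
    parts = ["Relevant documents:"]
    parts += [f"- {doc}" for doc in documents]
    if examples:
        parts.append("Examples from previous interactions:")
        for ex in examples[:max_examples]:  # cap examples to keep the prompt small
            parts.append(
                f"Q: {ex.question}\nSQL: {ex.sql}\n"
                f"This is a {ex.verdict} query because {ex.reason}."
            )
    # The query is assumed to be already rewritten to be self-contained.
    parts.append(f"Question: {rewritten_query}")
    return "\n".join(parts)
```

The rewritten query goes last, so the model reads the supporting material before the question it has to answer. That ordering is a choice for this sketch rather than a rule.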

The key insight: Context engineering is where most of your system's “intelligence” comes from. A mediocre model with great context beats a great model with poor context.

4. Input guardrails

Now that the query has context, run it through input guardrails before it goes to the model. One of the primary concerns here is privacy.

If your context includes personally identifiable information (PII) or any other sensitive data, you need to redact it before sending it to the LLM. This is non-negotiable for most production systems.

How I handle PII:

Named entity recognition to identify PII
Replace with deterministic placeholders ([CUSTOMER_], [EMAIL_], etc.)
Store a mapping in the conversation scope so you can restore the original values before sending a response to the user.
The mapping never leaves your system — only the redacted text goes to the model.
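Here is a minimal sketch of the redact-and-restore round trip described in the list above. One simplification is deliberate: a simple email regex plus a list of known customer names stands in for named entity recognition. The numbered placeholder format (`[EMAIL_1]`, `[CUSTOMER_1]`) is an assumption for illustration. The same value always maps to the same placeholder within a conversation.

```python
import re
from typing import Dict, Iterable

# Simple email pattern standing in for a named entity recognition model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class PiiRedactor:
    """Per-conversation redactor: same value -> same placeholder, with a reversible mapping."""

    def __init__(self) -> None:
        self._to_placeholder: Dict[str, str] = {}
        self._to_original: Dict[str, str] = {}
        self._counters: Dict[str, int] = {}

    def _placeholder(self, kind: str, value: str) -> str:
        if value not in self._to_placeholder:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            placeholder = f"[{kind}_{self._counters[kind]}]"
            self._to_placeholder[value] = placeholder
            self._to_original[placeholder] = value
        return self._to_placeholder[value]

    def redact(self, text: str, known_names: Iterable[str] = ()) -> str:
        """Replace emails and known customer names; only the result is sent to the model."""
        text = EMAIL_RE.sub(lambda m: self._placeholder("EMAIL", m.group(0)), text)
        for name in known_names:
            if name in text:
                text = text.replace(name, self._placeholder("CUSTOMER", name))
        return text

    def restore(self, text: str) -> str:
        """Swap placeholders in the model's response back to the original values."""
        for placeholder, original in self._to_original.items():
            text = text.replace(placeholder, original)
        return text
```

The mapping lives in the `PiiRedactor` instance, so keeping one instance per conversation keeps the originals inside your system. The same object can then serve the PII restoration step at the end of the pipeline.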

Input guardrails also catch anything the intent classifier missed. If a carefully crafted prompt injection made it past classification, this is your second chance to see it.

5. Model agnostic router

Not every query needs your most expensive model. The router decides where to send the request based on:

Complexity: Simple lookups go to faster, cheaper models. Complex reasoning goes to more capable ones.
Intent: Code generation might route to a model fine-tuned for code, and creative writing to one tuned for that.
Cost constraints: If you have token budgets, the router enforces them.

I call this “model agnostic” because the rest of the system doesn’t care which model handles the request. The router abstracts that decision away.
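As a rough illustration of the three routing criteria, the sketch below picks the cheapest model that suits the intent and complexity and fits the token budget. The model catalogue, prices and the 1 to 3 complexity scale are hypothetical. The complexity score is assumed to come from an earlier stage, for example the intent classifier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float
    max_complexity: int  # highest complexity level this model handles well (1 = simple lookup)


# Hypothetical catalogue; names, prices and complexity levels are illustrative only.
GENERAL_MODELS: List[ModelOption] = [
    ModelOption("small-fast", 0.1, 1),
    ModelOption("mid-general", 1.0, 2),
    ModelOption("large-reasoning", 10.0, 3),
]
SPECIALISTS: Dict[str, ModelOption] = {"code": ModelOption("code-tuned", 2.0, 3)}


def choose_model(complexity: int, intent: str, est_tokens: int, budget: float) -> ModelOption:
    """Pick the cheapest model that suits the intent and complexity and fits the budget."""
    if intent in SPECIALISTS:
        candidates = [SPECIALISTS[intent]]
    else:
        candidates = [m for m in GENERAL_MODELS if m.max_complexity >= complexity]
    affordable = [m for m in candidates if m.cost_per_1k_tokens * est_tokens / 1000 <= budget]
    if not affordable:
        raise ValueError("no suitable model fits the token budget")
    return min(affordable, key=lambda m: m.cost_per_1k_tokens)
```

Callers only receive a `ModelOption`; the rest of the pipeline never needs to know which model was chosen.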

Practical tip: Start with a single model and add routing later. Premature optimisation here adds complexity without proven benefit. Use a router when you have data showing that different queries need different handling.

6. Output guardrails

The model has generated a response. Before it reaches the user, verify it.

Factual grounding: If the response makes claims, can you trace them back to the retrieved context? Flag or filter responses that hallucinate beyond what the context supports.

Safety checks: Run the response through content classifiers. Does it contain anything harmful, inappropriate, or policy-violating? Catch it here.

Format validation: If you expected structured output, validate it. Malformed responses should trigger a retry or a fallback, not reach the user.

Consistency checks: Does the response contradict earlier statements in the conversation? Does it make promises your system can’t keep?

A very small model can be used here to minimise latency.
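Two of these checks, format validation and a crude form of grounding, can be combined with retry-then-fallback as in the sketch below. It assumes the model was asked to return JSON shaped like `{"answer": ..., "sources": [...]}`, which is a hypothetical contract. The grounding check here only verifies that every cited source was actually retrieved. Real grounding checks would compare the claims themselves against the context.

```python
import json
from typing import Callable, List, Optional

# Hypothetical fallback returned when no attempt passes validation.
FALLBACK = {"answer": "Sorry, I couldn't produce a reliable answer. Please try rephrasing.", "sources": []}


def parse_structured(raw: str) -> Optional[dict]:
    """Format validation: return the parsed object only if it has the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        return None
    sources = data.get("sources")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        return None
    return data


def is_grounded(data: dict, retrieved_ids: List[str]) -> bool:
    """Crude grounding check: cite at least one source, and only sources we retrieved."""
    return bool(data["sources"]) and set(data["sources"]) <= set(retrieved_ids)


def generate_checked(generate: Callable[[], str], retrieved_ids: List[str], max_attempts: int = 2) -> dict:
    """Call the model, retry on malformed or ungrounded output, then fall back."""
    for _ in range(max_attempts):
        data = parse_structured(generate())
        if data is not None and is_grounded(data, retrieved_ids):
            return data
    return dict(FALLBACK)
```

`generate` stands in for whatever calls the routed model. Each retry costs another inference, so keeping `max_attempts` small is part of the latency budget.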

7. Response formatting

Final stage: prepare the response that will be sent to the user.

PII restoration: Remember those placeholders? Replace them with the original values from your mapping. The user sees real names and data; the model only ever saw redacted versions.

Then cache the response (if appropriate) and return to the user.

The Feedback Loop: Learn and Improve

One piece I haven’t mentioned: the feedback loop.

User feedback — explicit (thumbs up/down, ratings) and implicit (follow-up questions, task completion) — flows back into your system. This informs:

Cache invalidation: If users consistently dislike a cached response, invalidate it.
Memory updates: Store what worked and what didn't for future context.
Retrieval tuning: If certain documents consistently lead to inadequate responses, adjust their ranking.
Model routing: If one model consistently performs better for certain query types, update routing rules.

This is how your system gets smarter over time without retraining models.
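As one small example of closing the loop, the sketch below tallies explicit thumbs up/down per cached response. It flags a response for invalidation once enough votes are in and too many are negative. The minimum vote count and dislike ratio are hypothetical defaults. Wiring the flag to your cache's invalidation is left to the caller.

```python
from collections import defaultdict
from typing import DefaultDict, List


class FeedbackTracker:
    """Tracks thumbs up/down per cached response and flags poor ones for invalidation."""

    def __init__(self, min_votes: int = 5, max_dislike_ratio: float = 0.5) -> None:
        self.min_votes = min_votes                  # don't act on too little evidence
        self.max_dislike_ratio = max_dislike_ratio  # hypothetical threshold
        # cache key -> [likes, dislikes]
        self._votes: DefaultDict[str, List[int]] = defaultdict(lambda: [0, 0])

    def record(self, cache_key: str, liked: bool) -> bool:
        """Record one vote; return True if the cached response should be invalidated."""
        votes = self._votes[cache_key]
        votes[0 if liked else 1] += 1
        likes, dislikes = votes
        total = likes + dislikes
        return total >= self.min_votes and dislikes / total > self.max_dislike_ratio
```

The same tallies could feed the other three uses in the list, such as retrieval ranking or routing rules, but those are beyond this sketch.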

Principles Behind the Architecture

A few principles that shaped these decisions:

Fail early, fail cheap: Catch problems as early as possible in the pipeline. Intent classification and cache checks are cheap. Model inference is expensive. Don’t spend the expensive compute on queries you’re going to reject anyway.

Defence in depth: Don’t rely on any single safety mechanism. Intent classification, input guardrails, and output guardrails all catch different things. Overlap is fine — missing something isn’t.

Separate concerns: Each stage has one job. Context engineering doesn’t know about caching. Guardrails don’t know about routing. This makes the system testable and maintainable.

Make it observable: Every stage should emit metrics such as cache hit rates, guardrail trigger rates, model latencies, and feedback signals. You can’t improve what you can’t measure.

What I’d Do Differently

If I were starting over:

Add semantic caching earlier. I underestimated how much duplicate work we were doing.
Invest more in intent classification. A good classifier up front saves so much complexity downstream.
Build the feedback loop from day one. Feedback makes the system self-improve over time.

Wrapping Up

This architecture isn’t perfect, and it’s certainly not the only way to build production AI-powered systems. But it handles the problems I kept running into: sensitive data, expensive inference, unreliable outputs, and the need to improve over time.

The pattern applies whether you’re building a customer support bot, a document analysis tool, or an internal knowledge assistant — the specific implementations change; the stages don’t.

If you’re building something similar, I’d love to hear what’s working for you. What am I missing? What would you do differently?

Utilising Bloom filters in high-performance system design

Vincent Nyanga — Fri, 05 Dec 2025 04:23:51 GMT

Bloom filters have emerged as an elegant and robust solution for data-efficient querying and storage in modern system design. As a space-efficient probabilistic data structure, a Bloom filter is used to test whether an element is a member of a set. While they allow for a low, tunable rate of false positives, they guarantee no false negatives, meaning a query returns either “possibly in the set” or “definitely not in the set”. This trade-off makes them indispensable in scenarios where speed and memory optimisation are critical.

How Bloom Filters Work

A Bloom filter functions using a simple, yet ingenious, structure consisting of a fixed-size bit array (initialised to zeros) and several independent hash functions.

Insertion: To add an element, it is processed by k hash functions. Each function maps the element to an index in the array (different functions may map to the same index), and the bits at these positions are set to 1.
Membership query: To check for membership, the element is hashed again using the same k functions. If all of the corresponding bits are set to 1, the element may be present. If any bit is 0, the element is definitely not in the set.
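A minimal Python sketch of the insert and query operations described above. Rather than k truly independent hash functions, it derives the k indexes from one SHA-256 digest using double hashing (index i = h1 + i·h2 mod m). This is a common substitute and an implementation choice for this sketch, not something the description above prescribes.

```python
import hashlib
from typing import List


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m = m                            # number of bits in the array
        self.k = k                            # number of hash functions
        self.bits = bytearray((m + 7) // 8)   # all bits start at 0

    def _indexes(self, item: str) -> List[int]:
        # Double hashing: derive k indexes from two 64-bit values taken from one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)  # set the bit

    def might_contain(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not in the set".
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))
```

Note that the filter never stores the items themselves, only bits. That is where the memory savings come from.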

The memory usage of a Bloom filter remains relatively low because it stores only the hashed representation of the items, not the items themselves. For instance, a filter targeting a 1% false positive probability requires less than 10 bits per element, regardless of the size or number of elements being stored.

Key Advantages

Bloom filters provide several advantages crucial for scaling modern applications:

Speed and Efficiency: Both lookups and insertions are fast, with time complexity O(k), where k is the number of hash functions. This cost is independent of the total number of items stored in the set.
Memory Efficiency: Bloom filters excel at representing large sets with a minimal memory footprint, a critical factor for managing infrastructure and running costs, especially where DRAM is involved.
Scalability: Due to their low memory overhead and constant query time, Bloom filters can efficiently handle massive datasets.
Privacy Preservation: They can be used in scenarios like financial fraud detection, allowing organisations to exchange lists (e.g., stolen credit card numbers) to check for matches without revealing the underlying sensitive data.

Core Tradeoffs and Limitations

False positive rate: The probability of false positives increases as more elements are added, until all bits are set to 1, at which point every query returns a positive. However, by carefully choosing the bit array size (m) and the number of hash functions (k), this probability can be controlled. The false positive rate of a Bloom filter can be approximated using the following formula:
$$p \approx \left(1 - e^{-kn/m}\right)^{k}$$
Where:
- p is the false positive rate
- n is the number of elements in the filter
- m is the size of the bit array (in bits)
- k is the number of hash functions
Inability to delete elements: Removing an element would require resetting corresponding bits, which could inadvertently affect other elements that share those bits, potentially introducing forbidden false negatives. This makes standard Bloom filters unsuitable for highly dynamic datasets that require frequent removals, though variants such as Counting Bloom Filters address this complexity.
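Here is a minimal sketch of the Counting Bloom Filter variant mentioned above. Each bit is replaced with a counter, so removal decrements counters instead of clearing bits that other elements may share. Plain Python ints are used for clarity; real implementations use small fixed-width counters, often 4 bits. The SHA-256 double-hashing scheme is an implementation choice for this sketch.

```python
import hashlib
from typing import List


class CountingBloomFilter:
    """Bloom filter variant with a counter per slot instead of a single bit."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counts: List[int] = [0] * m

    def _indexes(self, item: str) -> List[int]:
        # Double hashing from one SHA-256 digest (an implementation choice for this sketch)
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counts[idx] += 1

    def might_contain(self, item: str) -> bool:
        return all(self.counts[idx] > 0 for idx in self._indexes(item))

    def remove(self, item: str) -> None:
        """Decrement the item's counters.

        Only remove items you actually added: removing a never-added item
        (for example, a false positive) can still introduce false negatives.
        """
        if not self.might_contain(item):
            raise KeyError(item)
        for idx in self._indexes(item):
            self.counts[idx] -= 1
```

The price of deletion is memory: several bits per slot instead of one.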

System Design Applications

Bloom filters are widely implemented in systems where offloading expensive checks is paramount:

Databases and Key-Value Stores: Log-Structured Merge trees (LSM-trees), used in key-value stores like Cassandra, make use of Bloom filters. Each sorted run of data has its own filter. During a point lookup, consulting a run's Bloom filter first lets the system skip reading that run from secondary storage (avoiding the I/O) if the key is definitely not present.
Caching and CDNs: Bloom filters prevent “one-hit-wonders” (data requested only once) from being written to disk, reducing disk I/O and saving valuable cache space. They are also used in web servers to check whether an item is in the cache quickly.
Security and Filtering: Google Chrome previously used a local Bloom filter of malicious URLs. Only if the filter returned a positive result (a probable hit) would a full, costly check against a server be performed, significantly reducing the workload on the centralised malicious URL API.
User Management: Provides a fast, efficient initial check to see whether a desired username has already been used, reducing the number of queries to the central database.

Optimal Sizing Guide

For a desired false positive probability ϵ and n inserted elements, the memory utilisation is minimised when:

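Optimal number of hash functions:
$$k = \frac{m}{n}\ln 2 = -\log_2 \epsilon$$
Required bits per element:
$$\frac{m}{n} = -\frac{\log_2 \epsilon}{\ln 2} \approx 1.44\,\log_2\frac{1}{\epsilon}$$
For example, a 1% error rate (ϵ=0.01) typically requires 7 hash functions (−log₂ 0.01 ≈ 6.64, rounded to 7) and 9.585 bits per item.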
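The sizing formulas can be checked with a short calculation. This sketch computes m and k for a target ϵ. It then plugs them back into the standard false-positive approximation p ≈ (1 − e^(−kn/m))^k. The function names are illustrative.

```python
import math
from typing import Tuple


def optimal_parameters(n: int, epsilon: float) -> Tuple[int, int]:
    """Return (m, k): bit-array size and hash count for n items at false positive rate epsilon."""
    bits_per_element = -math.log2(epsilon) / math.log(2)  # m/n = -log2(eps) / ln 2
    m = math.ceil(n * bits_per_element)
    k = max(1, round(-math.log2(epsilon)))                 # k = -log2(eps), rounded to an integer
    return m, k


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation p = (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    n = 1_000_000
    m, k = optimal_parameters(n, 0.01)
    print(f"m = {m} bits ({m / n:.3f} bits per element), k = {k}")
    print(f"expected false positive rate = {false_positive_rate(n, m, k):.4f}")
```

For a million elements at 1%, this gives roughly 9.6 million bits (about 1.2 MB) and k = 7. Because k is rounded from 6.64 to 7, the resulting rate is very close to, but not exactly, 1%.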

Best Practices, Patterns & Principles vs Context In Software Development

Vincent Nyanga — Sun, 13 Aug 2023 19:12:27 GMT

“You cannot do that! It violates X principle…” I have heard this statement countless times in my career. At first, when I was still young and clueless, I would feel dirty, as if I had committed a mortal sin from which the software gods looked away in disgust. As I progressed in my career, I began to question such statements. Did I violate a principle? Am I not using the best practice? Does this best practice apply to what I am doing? Sometimes the answer to those questions is yes, but in most cases I have found that it is not. In this very short post, I am going to talk about the importance of context in software engineering.

The Buzz Words Plague

The software engineering world never runs out of buzzwords. Every day there is something cool: a better way to do something, a best practice. I have watched in awe as my fellow professionals (and sometimes I myself) flock to the new shiny thing, shunning yesterday's best practices for the latest ones. All of a sudden, what was once a best practice has morphed into an anti-pattern.

Again, in some cases that is true. With constant improvements in technology, some of the things we held in high regard are no longer relevant. However, in most cases, we fall into the trap of going wherever the wind is blowing, blindly applying solutions where they don't apply.

Principles, Patterns And Best Practices

What are software development principles? Software development principles are a set of guidelines that help software engineers write quality, maintainable software. They come about naturally as we encounter similar problems or as we repeatedly make the same mistakes. They provide templates or guides to solve recurring problems. I will be using principles and patterns interchangeably in this post, though I think they are slightly different.

Take the Don’t Repeat Yourself (DRY) principle for instance. It is a result of people getting caught out when all of a sudden they are required to change the same logic that’s scattered all over their codebase. It’s definitely a good guideline. But is it the law? Certainly not!

Best practices, on the other hand, are things that I find mostly misused, or dare I say abused, in the software engineering industry. What is a best practice anyway? To me, a best practice is something that solves a particular problem better than the other options (that the person has managed to come up with). I try to avoid the word best because chances are there is a better way of solving that problem. Are best practices bad? Not if they are applied correctly to the problems that suit them. This is where context comes in.

The Importance Of Context

As they say, the best answer in software engineering is ‘it depends’. There is an opportunity cost to every decision we make. Everything has a tradeoff. The choice of which best practice or pattern to use should always be made within the context of the problem space. Not all problems are the same. They may appear similar at face value, which usually leads to incorrectly applying a solution that doesn't efficiently or effectively solve the problem. The pattern (or best practice) that worked on your previous project won't necessarily apply to the next.

Should you repeat yourself (violating the DRY principle)? Probably not, but if, in your context, you need to do so, please go ahead. I have witnessed two pieces of logic that appear similar initially, diverge as the project grows and requirements change. Context matters.

I have countless partial projects on GitHub, most of them trying to solve the same problem using whatever the buzzword was at the moment: Clean Architecture, DDD, microservices, you name it. Most of the time I stopped midway out of pure laziness. However, in some cases I hit a brick wall when I realised that I was over-engineering the solution while trying to follow the best practice. Clearly, that best practice or pattern didn't fit my problem space very well.

Conclusion

Am I saying patterns, principles and good practices are a bad thing? No. They are very useful and most of the time help us avoid banging our heads against the wall while trying to solve certain problems. However, they should not be applied blindly without taking the context into consideration. Context is king!

I would like to hear what your opinion is on this topic. Please feel free to leave a comment below. Thanks so much for taking your time to read.

I stopped using GitHub Copilot after four months. Here's why

Vincent Nyanga — Sun, 06 Aug 2023 05:38:27 GMT

GitHub Copilot is an AI tool that suggests code while you write it. It was trained on thousands upon thousands of lines of code from GitHub's public repositories. Earlier this year I signed up for GitHub Copilot and used it for four months. In this article I will talk about my experience and why I eventually stopped using it.

My setup

I installed Copilot in JetBrains Rider and Visual Studio Code. I write mostly C# code, so my feedback is based on C#. If you're using other languages like Python (I heard it's very good with Python), you may have a different experience.

What I enjoyed

When I started using Copilot, I was amazed at how well it performed. I was especially impressed when it came to writing unit tests. All I needed to do was write the first test to give it an idea of how I structure my tests. After that, it would generate tests for the other scenarios in the system under test without straying too far off the context.

While writing logic, I'd prompt Copilot to give me suggestions by adding comments to my code. It would provide a couple of suggestions on how to solve the problem. That was impressive!

On a few occasions, I gave Copilot a code block and asked it to explain what the code was doing. It performed very well there too.

Why I stopped using it

As time went by, my excitement started to fade. I started to realise that I wasn't quite enjoying working with Copilot notwithstanding the nice features it has. Here's why I eventually decided to stop using it:

Reduced productivity

I know Copilot is supposed to increase developer productivity, but that wasn't entirely the case for me. While it really helped me with writing tests, I found myself spending more time mentally debugging its suggestions to make sure they made sense before I could accept them.

The subtle bugs

The reason I eventually started to scrutinise Copilot's suggestions was that there were instances when I blindly accepted suggestions that looked impressive yet contained subtle bugs. If I didn't have experience with C#, I would have just shipped the code without realising it contained bugs. If you want to use Copilot, I think it's best if you have some experience with the programming language. Otherwise, you might introduce bugs into your codebase.

It gets in your face sometimes

I don't know how to explain this. I found myself having to switch off Copilot at times because it was getting annoying 😂. I'd be in the zone writing some code and it would add suggestions that are way off. I found it extremely distracting, especially when I really wanted to focus.

Conclusion

In this article I spoke about my experience with GitHub Copilot and why I eventually stopped using it. Don't get me wrong, GitHub Copilot is a great tool and it can be very useful. However, it just didn't click with me. I'd suggest you give it a go if you haven't already, and see how it fares. Thanks for reading.

Using Azure Event Grid In .NET

Vincent Nyanga — Sat, 22 Jul 2023 07:03:21 GMT

Azure Event Grid is a remarkable solution for developers working with event-based architectures. It plays a pivotal role in routing events from any source to any destination, for any application. The service can handle events from Azure services as well as custom events published directly to it. These events can then be filtered and sent to various recipients, such as built-in handlers or custom webhooks. In this article, we will delve deeper into Azure Event Grid and its .NET client library.

Azure Event Grid Concepts

Azure Event Grid's functionality can be understood through these concepts:

Event: Describes what happened.
Event source: Specifies where the event took place.
Topic: The endpoint where publishers send events.
Event subscription: The endpoint or built-in mechanism to route events, sometimes to more than one handler. Subscriptions also help handlers to intelligently filter incoming events.
Event handlers: The application or service responding to the event.

Event Schemas

Event Grid supports two schemas for encoding events: the Event Grid schema and the CloudEvents v1.0 schema. When you create a topic or domain, you need to specify the schema that will be used when publishing events.

Event Grid schema

This is the default schema, used when you don't specify one. This is what the Event Grid schema looks like:
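As an illustration, here is a single event in the Event Grid schema, written as a Python dictionary that serialises to the equivalent JSON. All values are hypothetical. When you publish to a custom topic, `topic` and `metadataVersion` are normally filled in by Event Grid rather than by the publisher, so they are omitted here.

```python
import json

# A single event in the Event Grid schema; all values are hypothetical.
event = {
    "id": "6a2c4f1e-0b7d-4c1e-9f3a-000000000001",
    "subject": "orders/1234",
    "eventType": "Contoso.Orders.OrderPlaced",
    "eventTime": "2023-07-22T07:03:21Z",
    "data": {"orderId": "1234", "total": 42.50},
    "dataVersion": "1.0",
    # "topic" and "metadataVersion" are normally filled in by Event Grid.
}

# Events in this schema are published as a JSON array.
print(json.dumps([event], indent=2))
```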

CloudEvents schema

Another option is to use the CloudEvents v1.0 schema. CloudEvents is a Cloud Native Computing Foundation project that produces a specification for describing event data in a common way. Here is what the schema looks like:
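For comparison, here is a single event in the CloudEvents v1.0 JSON format, again written as a Python dictionary with hypothetical values. The comments mark which attributes the CloudEvents specification requires (`specversion`, `type`, `source`, `id`) and which are optional.

```python
import json

# A single event in the CloudEvents v1.0 JSON format; all values are hypothetical.
cloud_event = {
    "specversion": "1.0",                          # required
    "type": "Contoso.Orders.OrderPlaced",          # required (Event Grid schema: eventType)
    "source": "/contoso/orders",                   # required
    "id": "6a2c4f1e-0b7d-4c1e-9f3a-000000000002",  # required
    "time": "2023-07-22T07:03:21Z",                # optional (Event Grid schema: eventTime)
    "subject": "orders/1234",                      # optional
    "datacontenttype": "application/json",         # optional
    "data": {"orderId": "1234", "total": 42.50},   # optional
}

REQUIRED_ATTRIBUTES = {"specversion", "type", "source", "id"}
missing = REQUIRED_ATTRIBUTES - set(cloud_event)
print("missing required attributes:", missing or "none")
print(json.dumps(cloud_event, indent=2))
```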

Advantages of Azure Event Grid

The Azure Event Grid comes with tangible benefits:

It integrates natively with Azure services, making it quick to connect event sources to event handlers.
It supports both built-in and custom events.
It provides intelligent routing with filters and uses a standard event schema.
It is highly reliable, retrying event delivery for up to 24 hours.
It can support millions of events per second.
It greatly enhances serverless, ops automation, and integration work.

Using Azure Event Grid In .NET

There is a client library available to .NET developers. The library provides the following functionality:

Publish events to the Event Grid service using the Event Grid Event, Cloud Event, or custom schemas
Consume events that have been delivered to event handlers
Generate SAS tokens to authenticate the client publishing events to Azure Event Grid topics

To use it in your application, install it from NuGet:

dotnet add package Azure.Messaging.EventGrid

Publishing Messages

The library provides the `EventGridPublisherClient` class that allows you to publish events to a topic or domain. First you need to create a new instance of the client:

// The constructor takes the topic endpoint as a Uri, plus the access key
var client = new EventGridPublisherClient(new Uri(""), new AzureKeyCredential(""));

The example above uses an access key to authenticate the client. If you are going to host your application in Azure, I highly recommend that you authenticate using a managed identity. The `EventGridPublisherClient` also accepts a set of configuration options through `EventGridPublisherClientOptions`. For example, you can specify a custom serializer that will be used to serialize the event data to JSON.

Once you have created your client, you can start publishing events. Regardless of which schema your topic or domain is configured to use, `EventGridPublisherClient` is used to publish events to it. Use the `SendEvent` or `SendEventAsync` method to publish a single event, or `SendEvents`/`SendEventsAsync` to publish multiple events.

Receiving Events

Several Azure services can act as event handlers, including Azure Functions and Logic Apps. You can also handle events with your own custom webhooks.

Note: when using webhooks to handle your events, Event Grid requires you to prove ownership of your webhook endpoint before it starts delivering events to that endpoint.

Once events are delivered to the event handler, parse the JSON payload into a list of events.

Conclusion

In this post I showed you how you can use Azure Event Grid in your .NET applications. I hope you have learned something. If you have a question, comment or suggestion, please feel free to leave it below. Thanks so much for taking your time to read.

Using Azure Service Bus In .NET

Vincent Nyanga — Fri, 14 Jul 2023 22:00:00 GMT

Azure Service Bus, as one of the most powerful and flexible messaging services, has become a cornerstone in the creation of highly reliable and scalable applications. This article aims to provide an in-depth understanding of Azure Service Bus, its implementation using .NET and C#, and how it can be leveraged to facilitate asynchronous messaging patterns between applications.

What Is Azure Service Bus

Azure Service Bus is a fully managed enterprise message broker with message queues and publish-subscribe topics. Service Bus is used to decouple applications and services from each other, providing the following benefits:

Load-balancing work across competing workers
Safely routing and transferring data and control across service and application boundaries
Coordinating transactional work that requires a high degree of reliability

Queues and topics live within a namespace, and data is transferred between applications and services using messages. These messages, decorated with metadata, can carry any kind of information, including structured data encoded in common formats like JSON, XML, Apache Avro, or plain text.

Azure Service Bus supports various messaging scenarios:

Messaging: Facilitates the transfer of business data like sales orders, purchase orders, etc.
Decoupling Applications: Helps enhance application reliability and scalability.
Load Balancing: Allows multiple consumers to read from a queue simultaneously.
Topics and Subscriptions: Enables one-to-many relationships between publishers and subscribers.
Transactions: Supports several operations within the scope of an atomic transaction.
Message Sessions: Provides a mechanism for grouping related messages.

Azure Service Bus Concepts

When working with Azure Service Bus, there are several concepts to be aware of:

Queues: Messages are sent to and received from queues, which store the messages until the receiving application is ready to process them. Queues are useful in point-to-point communication scenarios. Each message in a queue is received and processed by only one consumer.
Topics: Messages can also be sent and received via topics, which are useful in publish/subscribe scenarios. Unlike queues, topics can have multiple subscriptions, each of which can have multiple consumers. Each subscription receives a copy of every message sent to the topic.
Subscriptions: Subscriptions allow consumers to receive a copy of each message sent to a topic. They can have filters and actions to select and modify messages.
Namespaces: A namespace is a container for all messaging components. It can have multiple queues and topics and often serves as an application container. If you want to use Service Bus, you must first create a namespace.

Using Azure Service Bus In .NET

There is a client library for Azure Service Bus that can be used to send and receive messages. The library is available as a NuGet package, and it can be installed using the following command:

dotnet add package Azure.Messaging.ServiceBus

Creating A Service Bus Client

Once you have installed the package, you can create a Service Bus client using the following code:

const string connectionString = "";
var client = new ServiceBusClient(connectionString);

If you are going to host your service on Azure, I highly recommend using Managed Identity to authenticate with Azure Service Bus.

Sending Messages

In order to send messages, you need to use an instance of the ServiceBusSender class. You can create an instance of this class using the CreateSender method of the ServiceBusClient class:

// Pass the name of the queue or topic you want to send to.
var sender = client.CreateSender("");

The ServiceBusSender class provides several methods for sending messages, such as sending a batch of messages or scheduling messages for delayed delivery. For more information, check out the documentation. The following code sends a single message:

var message = new ServiceBusMessage("Hello World!");
await sender.SendMessageAsync(message);

Note: The ServiceBusSender is safe to cache and use for the lifetime of an application or until the ServiceBusClient that it was created by is disposed. Caching the sender is recommended when the application is publishing messages regularly or semi-regularly.
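As a sketch of the batch and delayed sends mentioned above, the code below uses placeholder connection and queue names. CreateMessageBatchAsync returns a batch whose TryAddMessage returns false once a message would exceed the maximum size the entity allows, and ScheduleMessageAsync returns a sequence number that you can later pass to CancelScheduledMessageAsync.

```csharp
using System;
using Azure.Messaging.ServiceBus;

// Hypothetical placeholders: replace with your connection string and queue name.
await using var client = new ServiceBusClient("<your-connection-string>");
await using ServiceBusSender sender = client.CreateSender("<queue-name>");

// Send several messages in one call; the batch enforces the maximum batch size.
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();
foreach (var text in new[] { "first", "second", "third" })
{
    if (!batch.TryAddMessage(new ServiceBusMessage(text)))
    {
        throw new InvalidOperationException($"Message '{text}' does not fit in the batch.");
    }
}
await sender.SendMessagesAsync(batch);

// Schedule a message to be enqueued five minutes from now.
long sequenceNumber = await sender.ScheduleMessageAsync(
    new ServiceBusMessage("Reminder"), DateTimeOffset.UtcNow.AddMinutes(5));

// If you change your mind, a scheduled message can be cancelled before it is enqueued.
await sender.CancelScheduledMessageAsync(sequenceNumber);
```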

Receiving Messages

To receive messages, you need an instance of the ServiceBusReceiver class, which you create with the CreateReceiver method of the ServiceBusClient class:

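// Pass the name of the queue to receive from (use CreateReceiver(topicName, subscriptionName) for a subscription).
var receiver = client.CreateReceiver("");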

To receive a single message, use the ReceiveMessageAsync method of the ServiceBusReceiver class. It returns null if no message arrives before the wait time elapses:

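var message = await receiver.ReceiveMessageAsync();

// ReceiveMessageAsync returns null when no message arrived within the wait time.
if (message != null)
{
    var body = message.Body.ToString();
    Console.WriteLine(body);
}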
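If you want more than one message per call, ReceiveMessagesAsync accepts a maximum count and an optional wait time, and it may return fewer messages (or none) than requested. The sketch below uses placeholder names and completes each message, since receivers default to the PeekLock mode in which every message must be settled.

```csharp
using System;
using System.Collections.Generic;
using Azure.Messaging.ServiceBus;

// Hypothetical placeholders: replace with your connection string and queue name.
await using var client = new ServiceBusClient("<your-connection-string>");
await using ServiceBusReceiver receiver = client.CreateReceiver("<queue-name>");

// Receive up to 10 messages, waiting at most 5 seconds for messages to arrive.
IReadOnlyList<ServiceBusReceivedMessage> messages =
    await receiver.ReceiveMessagesAsync(maxMessages: 10, maxWaitTime: TimeSpan.FromSeconds(5));

foreach (ServiceBusReceivedMessage message in messages)
{
    Console.WriteLine($"{message.SequenceNumber}: {message.Body}");
    // Completing removes the message from the queue.
    await receiver.CompleteMessageAsync(message);
}
```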

Handling Messages

Once you are done processing a message, you settle it using one of the following options (these apply in the default PeekLock receive mode):

Complete: This will remove the message from the queue or subscription. You call the CompleteMessageAsync method of the ServiceBusReceiver class to complete a message.
Abandon: This will abandon the message and make it available to be received again. You call the AbandonMessageAsync method of the ServiceBusReceiver class to abandon a message.
Defer: Deferring a message prevents it from being received again using the standard receive methods. You call the DeferMessageAsync method to defer a message, and you retrieve it later by passing its sequence number to the ReceiveDeferredMessageAsync method.
Dead-letter: Dead-lettering a message moves it to the dead-letter sub-queue of the original queue or subscription, preventing the message from being received again from the main entity. You call the DeadLetterMessageAsync method to dead-letter a message, and you need a receiver scoped to the dead-letter queue to receive messages from it.
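The sketch below walks through the defer and dead-letter options, using placeholder names. Because a deferred message can only be retrieved by its sequence number, the code records message.SequenceNumber before deferring. The dead-letter reason and description strings are illustrative values chosen by the application, not values Service Bus requires.

```csharp
using System;
using Azure.Messaging.ServiceBus;

// Hypothetical placeholders: replace with your connection string and queue name.
const string queueName = "<queue-name>";
await using var client = new ServiceBusClient("<your-connection-string>");
await using ServiceBusReceiver receiver = client.CreateReceiver(queueName);

ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5));
if (message != null)
{
    // Remember the sequence number: it is the only way to fetch the deferred message again.
    long sequenceNumber = message.SequenceNumber;
    await receiver.DeferMessageAsync(message);

    ServiceBusReceivedMessage deferred = await receiver.ReceiveDeferredMessageAsync(sequenceNumber);

    // Move the message to the dead-letter sub-queue, recording why.
    await receiver.DeadLetterMessageAsync(deferred,
        deadLetterReason: "ValidationFailed",
        deadLetterErrorDescription: "Illustrative description supplied by the application");
}

// A receiver scoped to the dead-letter sub-queue reads dead-lettered messages.
await using ServiceBusReceiver dlqReceiver = client.CreateReceiver(queueName,
    new ServiceBusReceiverOptions { SubQueue = SubQueue.DeadLetter });
ServiceBusReceivedMessage deadLettered = await dlqReceiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5));
Console.WriteLine(deadLettered?.DeadLetterReason);
```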

Service Bus Processor

The ServiceBusProcessor class wraps a set of ServiceBusReceiver instances and lets you process received messages (ServiceBusReceivedMessage) with an event-based model. You create it by calling CreateProcessor(String, ServiceBusProcessorOptions) on the ServiceBusClient. Here is an example of how to use it:

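// Pass the name of the queue to process.
await using var processor = client.CreateProcessor("");
processor.ProcessMessageAsync += MessageHandler;
processor.ProcessErrorAsync += ErrorHandler;
await processor.StartProcessingAsync();

// The processor runs in the background, so keep the application alive until you want to stop.
Console.WriteLine("Press any key to stop processing...");
Console.ReadKey();
await processor.StopProcessingAsync();

static async Task MessageHandler(ProcessMessageEventArgs args)
{
    // Read the body, then complete the message so it is removed from the queue.
    var body = args.Message.Body.ToString();
    Console.WriteLine($"Received: {body}");
    await args.CompleteMessageAsync(args.Message);
}

static Task ErrorHandler(ProcessErrorEventArgs args)
{
    // Called for exceptions thrown by MessageHandler and for errors inside the processor.
    Console.WriteLine(args.Exception.ToString());
    return Task.CompletedTask;
}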

When using the ServiceBusProcessor, you don't need to fetch messages from the queue manually. The processor fetches messages automatically and raises the ProcessMessageAsync event for each one. The ProcessErrorAsync event is raised when your message handler throws an exception or when the processor itself encounters an error, such as a failure while receiving messages.
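You can tune the processor through ServiceBusProcessorOptions. In the sketch below, the placeholder names, the concurrency level of four, the one-minute run time, and the empty-body rule are all illustrative assumptions. It turns off automatic completion so the handler settles every message itself, and it logs where each error came from.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

// Hypothetical placeholders: replace with your connection string and queue name.
await using var client = new ServiceBusClient("<your-connection-string>");

var options = new ServiceBusProcessorOptions
{
    // Settle every message explicitly in the handler instead of auto-completing it.
    AutoCompleteMessages = false,
    // Allow up to four messages to be processed concurrently (the default is 1).
    MaxConcurrentCalls = 4
};

await using ServiceBusProcessor processor = client.CreateProcessor("<queue-name>", options);

processor.ProcessMessageAsync += async args =>
{
    string body = args.Message.Body.ToString();
    if (string.IsNullOrWhiteSpace(body))
    {
        // Illustrative rule: messages with an empty body are dead-lettered.
        await args.DeadLetterMessageAsync(args.Message, deadLetterReason: "EmptyBody");
        return;
    }

    Console.WriteLine(body);
    await args.CompleteMessageAsync(args.Message);
};

processor.ProcessErrorAsync += args =>
{
    // ErrorSource reports where the failure occurred, e.g. while receiving or in the handler.
    Console.WriteLine($"{args.ErrorSource}: {args.Exception.Message}");
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();
// Let the processor run for an arbitrary period before shutting it down.
await Task.Delay(TimeSpan.FromMinutes(1));
await processor.StopProcessingAsync();
```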

Conclusion

In this post, we looked at how to use Azure Service Bus in .NET: creating a Service Bus client, sending and receiving messages, settling them, and processing them with the Service Bus processor. I hope you found this post useful. If you have any questions or comments, please leave a comment below.