Microservices Architecture

Introduction

Microservices Architecture is an architectural style that structures an application as a collection of small, independent, and distributed services. Each service:

  1. Can be deployed independently.
  2. Runs in its own process.
  3. Communicates through lightweight mechanisms (such as HTTP/REST or Message Queue).
  4. Is built around business capabilities (based on the Bounded Context concept).

Comparison: Monolith vs. Microservices

| Feature | Monolithic Architecture | Microservices Architecture |
| --- | --- | --- |
| Structure | One large code unit: everything in one repo and one executable. | A collection of small, separate services. |
| Deployment | Even a small change requires rebuilding and redeploying the entire application. | Each service can be deployed separately (independent deployment). |
| Scalability | Can only be scaled vertically (adding CPU/RAM to the server) or by duplicating the entire application. | Can be scaled horizontally, service by service, according to traffic needs. |
| Technology | Bound to one tech stack (e.g., all Go). | Enables polyglot programming and persistence: each service can choose the language and database that suit it best. |
| Failure | Failure in one module often brings the entire application down. | Failure isolation: one failing service does not bring down the others (see Circuit Breaker). |

Main Challenges of Microservices

| Challenge | Solutions We've Discussed |
| --- | --- |
| Complex Communication | API Gateway (single entry point), Service Contract (schema guarantee). |
| Distributed Data | Each service has its own DB; addressed with asynchronous communication and the Saga pattern for distributed transactions. |
| Network Latency | Network calls between services are slower than local function calls in a monolith; mitigated with efficient protocols (gRPC) and asynchronous communication. |
| Debugging & Monitoring | Difficult to track a request across 20 services; addressed with Distributed Tracing (covered below). |
| Configuration Management | Maintaining secrets and configuration for 20 services; addressed with centralized Config Management (covered below). |

Key Patterns Supporting Microservices

  1. Orchestration: Using Kubernetes or Docker Swarm to manage deployment and scaling of each service automatically.
  2. Service Mesh: A dedicated infrastructure layer (e.g., Istio, Linkerd) to manage service-to-service communication, including Circuit Breaker and Retries transparently, removing these tasks from application code.
  3. Event Sourcing: Recording every state change as an immutable event in an append-only log (e.g., with Kafka), allowing other services to react to those events.

Microservices is the architecture of choice today for companies that need rapid deployment, high scalability, and independent teams that can work on different services without blocking each other (autonomy).

Service-To-Service Communication

1. Synchronous Pattern (Direct Communication)

This pattern involves direct, real-time calls from one service to another. The sending service must wait for a response before continuing its task.

Mechanism

  • Primary Protocols: HTTP/HTTPS (often using REST or GraphQL) and gRPC.
  • How It Works: Service A uses a client library to call Service B's endpoint (URI). Communication occurs over the network, and Service A is blocked until it receives a response or the call times out (see the sketch after the examples below).

When to Use?

  • Critical & Real-time Requests: When the sending service strongly needs the result from the receiving service to complete its transaction.
    • Example: Auth Service verifies token from User Service.
    • Example: Payment Service calls Fraud Service to check risk before proceeding with the transaction.
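
As an illustration, here is a minimal Go sketch of the Payment Service calling a hypothetical Fraud Service synchronously; the service name, endpoint, and response shape are assumptions for the example, not a real API.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// FraudCheckResult models a hypothetical response from the Fraud Service.
type FraudCheckResult struct {
	Risky bool `json:"risky"`
}

// checkFraud is a blocking call: the caller waits for the response (or the
// timeout) before continuing, the defining property of the synchronous pattern.
func checkFraud(ctx context.Context, orderID string) (*FraudCheckResult, error) {
	// Bound the wait so a slow Fraud Service cannot block us forever.
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()

	url := fmt.Sprintf("http://fraud-service/v1/checks/%s", orderID) // hypothetical endpoint
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}

	resp, err := http.DefaultClient.Do(req) // blocks until response or timeout
	if err != nil {
		return nil, err // on timeout, the caller must decide how to degrade
	}
	defer resp.Body.Close()

	var result FraudCheckResult
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	return &result, nil
}

func main() {
	res, err := checkFraud(context.Background(), "order-123")
	if err != nil {
		fmt.Println("fraud check failed:", err)
		return
	}
	fmt.Println("risky:", res.Risky)
}
```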

Advantages vs Disadvantages

| Advantages | Disadvantages |
| --- | --- |
| Simple and direct: easy to implement and debug. | Tight coupling: if Service B dies, Service A can fail too (cascading failure). |
| Real-time response: results are available immediately (low latency). | Scalability challenge: frequently called services become bottlenecks. |

2. Asynchronous Pattern (Indirect Communication)

This pattern uses an intermediary (Broker) and is typically driven by Event-Driven Architecture (EDA). The sending service does not need to wait for a response.

Mechanism

  • Primary Protocols: Messaging/Event Streaming (AMQP, Kafka Protocol).
  • How It Works: Service A (Producer) sends a message/event to a Message Broker (Kafka, RabbitMQ). The broker stores the message. Service B (Consumer) retrieves the message from the broker when it is ready. Service A and Service B never communicate directly (see the sketch below).

When to Use?

  • Long-Running Task Integration: For tasks that take a long time or are not urgent.
    • Example: After Order Service creates an order, it sends an ORDER_CREATED event to the Broker. Other services (Inventory, Email, Data Analytics) consume this event.
    • Example: Worker Pool retrieves tasks from Queue.
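
To make the decoupling concrete, here is a self-contained Go sketch that fans an ORDER_CREATED event out to two consumers. One buffered channel per subscriber stands in for the broker's topics; a real system would use Kafka or RabbitMQ client libraries instead.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a minimal stand-in for a broker message such as ORDER_CREATED.
type Event struct {
	Type    string
	OrderID string
}

func main() {
	// One channel per subscriber mimics a topic with independent consumer
	// groups: every subscriber receives its own copy of the event.
	subscribers := map[string]chan Event{
		"inventory": make(chan Event, 10),
		"email":     make(chan Event, 10),
	}

	var wg sync.WaitGroup
	for name, ch := range subscribers {
		wg.Add(1)
		go func(name string, ch chan Event) {
			defer wg.Done()
			for ev := range ch {
				fmt.Printf("[%s] reacting to %s for %s\n", name, ev.Type, ev.OrderID)
			}
		}(name, ch)
	}

	// The producer publishes and immediately moves on; it never waits for
	// (or even knows about) the consumers.
	ev := Event{Type: "ORDER_CREATED", OrderID: "order-123"}
	for _, ch := range subscribers {
		ch <- ev
	}
	fmt.Println("[order] event published, continuing other work")

	for _, ch := range subscribers {
		close(ch)
	}
	wg.Wait()
}
```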

Advantages vs Disadvantages

| Advantages | Disadvantages |
| --- | --- |
| Loose coupling: services are independent; no cascading failure if one service dies. | Eventual consistency: data is not synchronized instantly. |
| Resilience: messages remain in the broker even if the consumer dies, ensuring they are eventually processed (durability). | Complex debugging: difficult to track message flow across services. |

Event-Driven Architecture

Event-Driven Architecture (EDA) is an architectural pattern where services in a system communicate by emitting "Events" (occurrences), rather than calling each other directly (Direct Request).

1. Analogy: "Phone Call" vs. "WhatsApp Group"

  • REST API (Request-Response) = Phone Call
    • Service A calls Service B: "Hey, please update the stock. I'll wait on the phone until you're done."
    • Problem: If Service B is busy or dies, Service A gets stuck (waiting/timeout). They are Coupled (tightly bound).
  • Event-Driven Architecture = WhatsApp Group Chat
    • Service A sends a message to the Group: "Guys, there's a NEW ORDER (Event)!" Then Service A immediately continues with other work.
    • Service B (Warehouse) reads that message -> Deduct stock.
    • Service C (Email) reads that message -> Send invoice.
    • Advantage: Service A doesn't care who reads the message, when it is read, or whether Service B is down. What matters is that the event has been announced. They are Decoupled (not bound).

2. Visual Diagram

```mermaid
graph LR
subgraph PRODUCER
User((User)) -->|Checkout| OrderAPI[🛒 Order Service]
OrderAPI -->|1. Create Order| DB1[(Order DB)]
end

subgraph EVENT_BUS
OrderAPI -- 2. Publish Event:<br/>'OrderCreated' --> Broker{{Event Bus / Broker<br/>RabbitMQ / Kafka}}
end

subgraph CONSUMERS
Broker -- 3. Push Event --> Inv[Inventory Service]
Broker -- 3. Push Event --> Notif[Notification Service]
Broker -- 3. Push Event --> Analytics[Analytics Service]
end

subgraph ACTIONS
Inv -->|Update Stock| DB2[(Inventory DB)]
Notif -->|Send Email| User
Analytics -->|Update Dashboard| DB3[(Data Warehouse)]
end
```

3. Main Components of EDA

  1. Event Producer (Publisher):
    • Component that detects occurrences.
    • Example: When a user clicks "Pay", Order Service becomes a Producer that publishes the ORDER_PAID event.
  2. Event Router / Broker (Intermediary):
    • Infrastructure "pipeline" or "channel" where events flow.
    • Popular tools: RabbitMQ, Apache Kafka, AWS SNS/SQS, Google Pub/Sub.
  3. Event Consumer (Receiver):
    • Component that "listens" (subscribes) to specific events and reacts.
    • Example: Inventory Service listens to the ORDER_PAID event to reduce stock.

4. Why Event-Driven Architecture?

  1. Decoupling (Separation of Dependencies):
    • If Notification Service errors/dies, Order Service does NOT error. Users can still shop. The email is just delayed (in the broker queue), and will be sent when the service comes back online.
  2. Scalability:
    • If shopping traffic spikes, we can scale out only the Order Service without having to scale the Analytics Service at the same moment.
  3. Extensibility (Ease of Development):
    • Tomorrow the boss asks for a new feature: "Every time there's an order, send data to the Marketing team."
    • We just create a new service that listens to the ORDER_CREATED event. We do NOT need to edit the Order Service code at all. Safe from regression bugs.

5. Challenges

  1. Complexity: Tracking flow becomes difficult. "Which event caused this stock reduction?" Debugging is harder than in a typical monolith system.
  2. Eventual Consistency: Data is not synchronized instantly. The user has paid, but the warehouse stock might only decrease 2 seconds later. Applications must be designed to tolerate this delay.

API Gateway

Think of API Gateway as the main receptionist or valet parking attendant at a large hotel that has many services (restaurant, spa, gym, rooms).

  • Without Gateway: Every client must know the specific address of each service (gym address, restaurant address, etc.). If addresses change, all clients must be updated.
  • With Gateway: Clients only know one address (the hotel address). They give requests to the Gateway. The Gateway knows exactly where the requested service is and forwards the request.

Functions

API Gateway does more than just routing (forwarding requests). In a Senior Fullstack context, API Gateway takes over many operational tasks that should not be handled by backend services (such as Payment Service or Auth Service).

| Function | Description | Why It Matters |
| --- | --- | --- |
| Routing & Composition | Forwards requests to the appropriate backend service; can combine responses from multiple services into one response for the client. | Avoids clients having to call 5 different endpoints. |
| Authentication & Authorization | Validates the JWT (Bearer token) on every request; invalid tokens are rejected before reaching the backend service. | Security layer 1: backend services don't need to validate tokens themselves. |
| Rate Limiting | Limits the number of requests a client (IP/user) can make per unit time (e.g., 100 requests per minute). | Denial of Service (DoS) protection: shields backend services from traffic spikes. |
| Load Balancing | Distributes incoming requests across multiple backend service instances to avoid overload. | High availability: ensures no single instance is overwhelmed. |
| Logging & Monitoring | Records all incoming and outgoing requests for audit and performance monitoring. | Observability: a centralized view of all system traffic. |
| Protocol Translation | Accepts client requests (REST/HTTP) and translates them to the service protocol (e.g., gRPC). | Flexibility: backend teams can freely choose the best protocol. |
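
As a rough illustration of these responsibilities, here is a Go sketch of a tiny gateway built on the standard library's reverse proxy plus golang.org/x/time/rate. The backend addresses are hypothetical, the auth check only looks for a bearer token (a real gateway would verify the JWT signature and claims), and the rate limit is global rather than per client.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"

	"golang.org/x/time/rate"
)

func main() {
	// Hypothetical internal addresses; in production these would come from
	// service discovery or configuration, not hard-coded strings.
	orders, _ := url.Parse("http://orders.internal:8080")
	payments, _ := url.Parse("http://payments.internal:8080")

	// Routing: one public prefix per backend service.
	mux := http.NewServeMux()
	mux.Handle("/v1/orders/", httputil.NewSingleHostReverseProxy(orders))
	mux.Handle("/v1/payments/", httputil.NewSingleHostReverseProxy(payments))

	// Rate limiting: ~100 requests/minute with a small burst. A real gateway
	// would keep one limiter per client (IP or user), not one global limiter.
	limiter := rate.NewLimiter(rate.Limit(100.0/60.0), 10)

	gateway := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		// Authentication: reject unauthenticated requests before they ever
		// reach a backend service.
		if !strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		mux.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8000", gateway))
}
```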

Relationship with Other Components

| Component | Relationship with API Gateway |
| --- | --- |
| Client (FE/Mobile) | The API Gateway is the only endpoint they call. |
| Load Balancer (Nginx/ALB) | The Gateway usually sits behind the Load Balancer (or can even function as one). |
| Microservices | The Gateway is a client of the microservices and knows their internal addresses. |
| Authentication Flow | The Gateway is where token validation occurs in the authentication flow. |

API Gateway in Architecture

```mermaid
graph LR
subgraph Client_Side
A[Web Browser]
B[Mobile App]
C[Third Party Client]
end

subgraph Infrastructure
GW[API Gateway - Nginx / Kong / AWS API GW]
Middleware[Middleware Layer]
GW -->|Authentication & Rate Limiting| Middleware
end

subgraph Backend_Services
S1[Order Service]
S2[Payment Service]
S3[Auth Service]
S4[Inventory Service]
end

A -->|Request /v1/orders| GW
B -->|Request /v1/payments| GW
C -->|Request /v1/users| GW

Middleware -->|Route /orders| S1
Middleware -->|Route /payments| S2
Middleware -->|Route /auth| S3
Middleware -->|Route /inventory| S4
```

API Gateway vs. Load Balancer

  • Load Balancer (LB): Works at OSI Layer 4/7. Its job is purely traffic distribution based on algorithms (Round Robin, Least Connections); an LB has no notion of business logic such as tokens.
  • API Gateway: Works at Layer 7 with application awareness. A Gateway can verify JWT tokens, modify request structure, and make routing decisions based on business logic (e.g., if user_id belongs to an admin, route to version 2 of the service).

In modern systems, you often use both: ALB (Load Balancer) in front of API Gateway (for basic distribution) or use integrated solutions like Kong, Tyk, or AWS API Gateway.

Distributed Tracing

Distributed Tracing is a solution for observability (the ability to observe a system) problems in complex systems.

Imagine you send a POST request /v1/payments to your API Gateway. This request might involve 7 different services:

  1. API Gateway (Auth check)
  2. Payment Service (Record request)
  3. User Service (Check user balance)
  4. Fraud Service (Risk analysis)
  5. Database (Start transaction)
  6. Queue (Send event)
  7. Worker Service (Process event)

If this request fails or is slow, how do you know where the problem is?

  • Without Tracing: You only know the API Gateway took 5 seconds to respond.
  • With Tracing: You know 4 seconds were spent in Fraud Service due to a timeout connection to a third-party service.

Core Concepts

Distributed Tracing works with two main components:

  1. Trace: The complete journey of a request, from start to finish.
  2. Span: A single unit of work within a Trace. A Span has a unique ID, start time, and end time. Each call to a service, each query to a database, or each important function call is one Span.

How It Works

Every time a service calls another service (whether via synchronous REST or asynchronous messaging), it must pass along special headers. This practice is called Context Propagation.

  1. Service A receives a request and creates a new Trace ID and Span ID.
  2. When Service A calls Service B, it includes the same Trace ID in the HTTP header (e.g., traceparent header).
  3. Service B receives the header, knows it's part of an existing Trace, and creates a new Span ID under the same Trace ID.
  4. All these Spans are then sent to a trace collector (such as Jaeger or Zipkin) for visualization.
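
Here is a minimal sketch of those steps using only the Go standard library (real services would use the OpenTelemetry SDK, described next, rather than hand-rolling this). The downstream URL is hypothetical; the traceparent value follows the W3C Trace Context layout 00-&lt;trace-id&gt;-&lt;span-id&gt;-&lt;flags&gt;.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// newID returns n random bytes as hex (W3C trace IDs are 16 bytes, span IDs 8).
func newID(n int) string {
	b := make([]byte, n)
	rand.Read(b)
	return hex.EncodeToString(b)
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Steps 1 and 3: reuse the incoming trace ID if present, else start a new trace.
	traceID := ""
	if parts := strings.Split(r.Header.Get("traceparent"), "-"); len(parts) == 4 {
		traceID = parts[1]
	}
	if traceID == "" {
		traceID = newID(16)
	}
	spanID := newID(8) // a fresh span for the work done in this service

	// Step 2: propagate the same trace ID downstream; our span becomes the parent.
	req, _ := http.NewRequest(http.MethodGet, "http://fraud-service/check", nil) // hypothetical
	req.Header.Set("traceparent", fmt.Sprintf("00-%s-%s-01", traceID, spanID))
	// http.DefaultClient.Do(req) would send the call; omitted in this sketch.

	// Step 4: a real setup exports the span (IDs plus timings) to a collector
	// such as Jaeger or an OTel Collector; here we just log it.
	log.Printf("trace=%s span=%s handled %s", traceID, spanID, r.URL.Path)
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/v1/payments", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```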

OpenTelemetry (OTel)

OpenTelemetry (OTel) is an open-source project that has become an industry standard (CNCF Project) for generating, collecting, and exporting telemetry data (Metrics, Logging, and Tracing) universally and vendor-agnostically.

Distributed Tracing in Architecture Diagram

```mermaid
graph LR
Client -->|1. Request| GW[API Gateway]
GW -- 2. Create Trace ID & Span A --> PaymentS[💳 Payment Service]
PaymentS -- 3. Propagate Context & Span B --> FraudS[🕵️ Fraud Service]
FraudS -- 4. Propagate Context & Span C --> DB[(Database)]

PaymentS -- 5. Export Span B --> OTelC[OTel Collector]
FraudS -- 6. Export Span C --> OTelC
DB -- 7. Export Span D --> OTelC

OTelC -- 8. Store/Visualize --> Backend[Jaeger/Grafana]

subgraph Instrumentation
GW
PaymentS
FraudS
DB
end
```

The Interview Angle

  1. Identifying Latency Bottlenecks: Quickly find which service or function is slowest (like Fraud Service taking 4 seconds in the earlier example).
  2. Debugging Asynchronous Flows: Allows you to see the entire workflow (Trace) even if it involves RabbitMQ/Kafka, because OTel also supports context propagation through queue messages.
  3. Improving Signal-to-Noise Ratio: You can search for all logs related to one request (Log Correlation) using Trace ID, instead of searching through thousands of irrelevant log lines.

Configuration Management

Config Management is the practice and tools used to manage, store, and distribute application configuration data (settings) centrally, rather than storing it directly in code or local files.

Problems Without Config Management

In traditional Monolith environments, configuration such as database connection strings, API keys, and port numbers are often stored in .env or application.properties files within the code repository.

This causes problems in distributed environments:

  1. Security: Storing secrets (secret keys) in code (even .env) risks leakage, especially if committed to Git.
  2. Scalability: If you have 20 microservices, and the database password changes, you must update, rebuild, and deploy all 20 services one by one.
  3. Environments: Difficult to keep configuration distinct between Dev, Staging, and Production without error-prone manual processes.

Centralized Config Management Concept:

| Component | Description | Popular Tools |
| --- | --- | --- |
| Configuration Server | A separate service that stores, versions, and serves configuration (e.g., from Git or a database). | HashiCorp Consul, Spring Cloud Config, AWS AppConfig. |
| Secret Store | A dedicated solution for storing highly sensitive information (such as API keys and passwords). | HashiCorp Vault, AWS Secrets Manager, K8s Secrets. |
| Client Agent | A library or daemon in each microservice that fetches configuration from the server. | Built-in client libraries, sidecar containers (Kubernetes). |

The Process Flow

```mermaid
graph TD
User(Developer) --> Git[Update Config in Git]
Git --> CS[Config Server - Consul / Vault]

subgraph Microservices_Cluster
A[Service A]
B[Service B]
A -->|Startup: Fetch Config| CS
B -->|Startup: Fetch Config| CS
end

CS -->|Push Update Hot Reload| A
CS -->|Push Update Hot Reload| B
```

Flow Explanation:

  1. When Service A and Service B are booting (startup), they don't have database configuration. They only know the Config Server address.
  2. They contact the Config Server (CS) and say: "I'm Service A, in Production environment."
  3. Config Server returns the correct configuration (e.g., DB_URL_PROD).
  4. Hot Reload: If the DB URL is changed in CS, CS can notify Service B to pull the new configuration without needing to restart or redeploy.
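
A sketch of steps 1–3 in Go: the service boots knowing only the config server's address and fetches its settings from there. The endpoint path and JSON shape are invented for the example; Consul, Vault, and Spring Cloud Config each expose their own real APIs, and hot reload would add a watch or poll loop on top.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

// Config mirrors what the config server might return for one service + environment.
type Config struct {
	DBURL     string `json:"db_url"`
	TimeoutMS int    `json:"timeout_ms"`
}

// fetchConfig asks the config server for this service's settings.
func fetchConfig(service, env string) (*Config, error) {
	addr := os.Getenv("CONFIG_SERVER_ADDR") // the only address baked into the service
	url := fmt.Sprintf("%s/config/%s/%s", addr, service, env) // hypothetical path
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var cfg Config
	if err := json.NewDecoder(resp.Body).Decode(&cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := fetchConfig("service-a", "production")
	if err != nil {
		log.Fatalf("cannot start without config: %v", err)
	}
	log.Printf("connecting to %s (timeout %dms)", cfg.DBURL, cfg.TimeoutMS)
	// A hot-reload variant would also poll or subscribe for changes here and
	// swap the config atomically without restarting the process.
}
```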

The Senior Angle

  1. Decoupling Configuration from Code: You can change the timeout in Service B without touching a single line of code in Service B's repository.
  2. Zero Downtime Updates: With hot reloading, you can change feature flags or endpoint URLs without causing downtime.
  3. Auditing & Versioning: Because configuration is often stored in Git or Vault, every change is controlled and has version history. This is important for compliance.
  4. Security (Secrets Management): Separating secrets (sensitive keys) into a protected Secret Store (like Vault) ensures these keys are never exposed to regular operators or ordinary deployment pipelines.

Health Check (Liveness Probe)

Health Check, often called a Liveness Probe, aims to answer one question: Is this service still alive and running well?

Main Purpose: Self-Healing

  • Mindset: If an instance (server/container) fails, kill and restart it!
  • Mechanism: Periodic calls (e.g., every 10 seconds) to an internal endpoint like /health or /liveness.
  • Failure Conditions: This call will fail if:
    • The application experiences deadlock (stuck).
    • The application runs out of memory (OOM - Out of Memory).
    • The main application thread is blocked.

System Action

If Health Check fails (e.g., 3 times in a row), the orchestrator (such as Kubernetes or Docker Swarm) will consider that instance completely failed and will automatically restart or replace that instance.

Readiness Check (Readiness Probe)

Readiness Check, often called a Readiness Probe, aims to answer one question: Is this service ready to receive traffic from outside?

Main Purpose: Traffic Management

  • Mindset: The service can be alive, but don't send traffic to it until it's truly ready!
  • Mechanism: Periodic calls to an internal endpoint like /ready or /readiness.
  • Failure Conditions: This call will fail if:
    • The service was just restarted and is still loading configuration or cache.
    • The service is alive, but connection to database or Config Server has not been established successfully.
    • The service is in graceful shutdown and wants to drain traffic.

System Action

If Readiness Check fails, the Load Balancer (or Kubernetes Service) will remove that instance from the list of targets receiving traffic.

  • That instance remains alive (not restarted).
  • After Readiness Check succeeds again, that instance is added back to the traffic pool.
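
A minimal Go sketch of the two probes: liveness succeeds as long as the process can serve HTTP at all, while readiness only succeeds after startup work (e.g., the DB connection) has finished. The endpoint names match those used above; what each probe actually checks is up to the service.

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	var ready atomic.Bool // flipped to true once dependencies are up

	// Liveness: if this handler responds at all, the process is alive.
	// Deadlock or OOM would stop it from answering, triggering a restart.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: report ready only once slow dependencies are reachable.
	// A real check would also ping the DB / config server on each call.
	http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "warming up", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	go func() {
		// Stand-in for startup work such as opening the DB connection;
		// traffic is only admitted after it completes.
		time.Sleep(30 * time.Second)
		ready.Store(true)
	}()

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```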

Differences

| Feature | Health Check (Liveness) | Readiness Check (Readiness) |
| --- | --- | --- |
| Question | Should I be restarted? | Am I allowed to receive traffic? |
| Purpose | Maintain internal health (self-healing). | Maintain service quality (traffic management). |
| When the Check Fails | Service deadlock/crash. | Service just started, or DB connection lost. |
| System Action | Restart the container/pod. | Remove the instance from the Load Balancer/Service. |

Example Case

  1. You deploy a new Payment Service.
  2. Service is alive (Liveness Check OK), but needs 30 seconds to initialize connection to PostgreSQL.
  3. During these 30 seconds, Readiness Check will Fail.
  4. Load Balancer will not send traffic to the new Service.
  5. After 30 seconds, DB connection succeeds, Readiness Check OK.
  6. Load Balancer starts sending traffic to the new Service.

This guarantees Zero-Downtime Deployment.

Service Contract

Service Contract is a formal agreement between two services regarding how they will communicate and exchange data. It defines the structure, format, and protocol that both parties must follow.

  • Contract Contents: This contract typically includes:
    • Protocol: REST (JSON), gRPC (Protocol Buffers), or Asynchronous (Kafka Schema).
    • Endpoint: Available URI/Method (POST /v1/payments).
    • Payload (Schema): Structure of data sent and received, including data types and required/optional fields (e.g., .proto file for gRPC or JSON schema).
  • Importance: The contract ensures that even though Service A and Service B are developed by different teams using different languages, they can interact successfully. Changes to Service B that violate the contract will break Service A.
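
As an illustration, a REST (JSON) contract for POST /v1/payments might be pinned down as Go types like these; the fields are invented for the example. In practice the authoritative contract would live in an OpenAPI spec, a .proto file, or a registered Kafka schema rather than in one consumer's code.

```go
package payments

// CreatePaymentRequest is the agreed request body for POST /v1/payments.
// The field names, types, and required/optional markers ARE the contract:
// renaming amount or changing its type would break every consumer.
type CreatePaymentRequest struct {
	OrderID  string `json:"order_id"`       // required
	Amount   int64  `json:"amount"`         // required, minor units (cents)
	Currency string `json:"currency"`       // required, e.g. "USD"
	Note     string `json:"note,omitempty"` // optional
}

// CreatePaymentResponse is the agreed response body.
type CreatePaymentResponse struct {
	PaymentID string `json:"payment_id"`
	Status    string `json:"status"` // e.g. "PENDING", "APPROVED"
}
```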

Bounded Context

Bounded Context is a core concept in Domain-Driven Design (DDD). It defines the logical boundary where a term, model, or business concept has a single, consistent meaning.

  • Mindset: In the real world, the word "Customer" has different meanings in each department.
    • In Sales/Marketing, "Customer" is a Lead or Prospect.
    • In Shipping/Logistics, "Customer" is a Recipient with shipping address.
    • In Accounting, "Customer" is a Payer with tax information.
  • In Microservices: Each Bounded Context should be the basis for a separate Microservice.
    • Customer Service: Has Customer model (name, email, status).
    • Shipping Service: Has Recipient model (address, coordinates).
    • Both services have different internal data models, even though they refer to the same person. This prevents ambiguity and spaghetti code.
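
In code, the boundary shows up as two deliberately different models for the same person. A sketch (the fields are illustrative, and each type would live in its own service's repository):

```go
package boundedcontext

// The Customer Service's model: identity and account state.
type Customer struct {
	ID     string
	Name   string
	Email  string
	Status string // e.g. "ACTIVE", "SUSPENDED"
}

// The Shipping Service's model: the same person, reduced to delivery data.
// No email or account status; those concepts have no meaning in this context.
type Recipient struct {
	ID      string // may reference the customer, but the model is its own
	Address string
	Lat     float64
	Lng     float64
}
```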

Observability

Observability is the ability to understand the internal state of a system purely by observing the data it emits. It is the evolution of Monitoring.

  • Monitoring vs. Observability:
    • Monitoring tells you WHAT is wrong (e.g., CPU Usage 90%).
    • Observability helps you find out WHY it's wrong (e.g., because there are 3 slow query requests from a specific IP).
  • Three Pillars of Observability: To achieve Observability, the system must produce three types of data (often collected using OpenTelemetry we discussed earlier):
    • Metrics: Numerical data collected over time (CPU, RAM, Latency).
    • Logging: Textual records of discrete events (error, warning, info).
    • Tracing: Complete path of a request across services (includes log and metric correlation).

Circuit Breaker Pattern

Circuit Breaker Pattern is a design pattern that improves system resilience by preventing services from wasting time and resources repeatedly trying to connect to services that are down or slow.

  • Analogy: Like a circuit breaker in your house. If there's a short circuit, the circuit cuts off the electrical flow to protect your equipment.
  • How It Works:
    1. Closed: The normal state; all requests pass through.
    2. Open: After X consecutive failures (e.g., 5 timeouts), the circuit "opens". Subsequent requests are not sent to the failing service but immediately get a fast error response.
    3. Half-Open: After a recovery timeout, the circuit lets one test request through. If it succeeds, the circuit returns to Closed; if it fails, the circuit stays Open.
  • Benefit: Prevents Cascading Failure where one down service brings down all services that call it.
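
A minimal sketch of that three-state machine in Go. Production systems would more likely use an existing library (e.g., sony/gobreaker) or a service mesh, as mentioned earlier; the threshold and cooldown here are illustrative.

```go
package breaker

import (
	"errors"
	"sync"
	"time"
)

type state int

const (
	closed   state = iota // normal: requests pass through
	open                  // failing fast: requests are rejected immediately
	halfOpen              // probing: one trial request is allowed
)

// ErrOpen is returned instead of calling the failing service.
var ErrOpen = errors.New("circuit open: failing fast")

// Breaker is a minimal circuit breaker around a single downstream call.
type Breaker struct {
	mu        sync.Mutex
	state     state
	failures  int           // consecutive failures seen so far
	Threshold int           // failures needed to open the circuit
	Cooldown  time.Duration // how long to stay open before probing
	openedAt  time.Time
}

// Call runs fn through the breaker.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.state == open {
		if time.Since(b.openedAt) < b.Cooldown {
			b.mu.Unlock()
			return ErrOpen // don't waste time on a known-bad service
		}
		b.state = halfOpen // cooldown elapsed: allow one trial request
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		// Open on reaching the threshold, or immediately if the trial failed.
		if b.failures >= b.Threshold || b.state == halfOpen {
			b.state = open
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	b.state = closed // a success closes the circuit again
	return nil
}
```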

Idempotent Worker

Idempotency is a property where performing the same operation multiple times gives the same result as if the operation was only performed once.

  • Idempotent Worker: In an Asynchronous system (with Message Queue), a Worker is called idempotent if it can process the same message more than once without causing unwanted duplicate side effects.
  • Problem Solved: In distributed systems, network failures or queue timeouts often mean messages are delivered at least once, so the same message may be received more than once (duplication).
    • Example: Worker processes PAYMENT_APPROVED twice.
  • Implementation: Worker must use a unique Transaction ID (or Idempotency Key):
    1. Worker receives a message.
    2. It checks in the database (or Redis) whether the Idempotency Key has been processed before.
    3. If YES, Worker ignores it (return success).
    4. If NO, Worker processes it and records the key.
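
A compact Go sketch of that check, with an in-memory map standing in for the database or Redis. One production subtlety the sketch glosses over: the key should be recorded atomically with the side effect (e.g., in the same DB transaction), otherwise a crash between the two steps can still lose or duplicate work.

```go
package worker

import "sync"

// Store records which idempotency keys have already been processed.
// An in-memory map stands in for the database or Redis used in production.
type Store struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewStore() *Store { return &Store{seen: make(map[string]bool)} }

// MarkIfNew returns true if key was not seen before (and records it).
func (s *Store) MarkIfNew(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[key] {
		return false
	}
	s.seen[key] = true
	return true
}

// Message is a minimal queue message carrying an idempotency key.
type Message struct {
	IdempotencyKey string // e.g. the payment's transaction ID
	Payload        []byte
}

// Handle processes a message safely even if it is delivered twice:
// a duplicate delivery is acknowledged without re-running side effects.
func Handle(store *Store, msg Message, process func([]byte) error) error {
	if !store.MarkIfNew(msg.IdempotencyKey) {
		return nil // already processed: return success, do nothing
	}
	return process(msg.Payload)
}
```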

This is very important in financial and payment domains to prevent double billing or duplicate order processing.