
System Design Interview

Sample Case

High-performance live-commerce service

Overview

At Souzoh, a project has been launched to build a live-commerce service. The service is expected to attract a large number of users in a short period of time, as influencers and celebrities will sell their premium products on it.

Context

We want to build a service whose advantage is the buyer UX, so high performance is required even when a large number of users purchase products in a short period of time. Minimizing latency and keeping the experience comfortable for customers is a key metric of the system.

Goal

Meet the following requirements:

  • 1,000 transactions per second at peak time
  • On the read API that returns the remaining stock, 99th percentile latency is less than 100ms
  • On the purchase API, 99th percentile latency is less than 300ms

Notice

  • No need to cover streaming technology in this topic
  • Think about the design in your area of expertise, such as Server side, Client side, Infrastructure, etc.
    • You don't have to cover everything

Answer

What to prepare in server side

Microservices Architecture:

  1. Service Orchestration:

    • Tool: Kubernetes
    • Description: Kubernetes is a container orchestration platform that helps manage and scale containerized applications. It allows you to deploy, scale, and manage microservices independently.
  2. Communication between Microservices:

    • Tool: gRPC
    • Description: gRPC is a high-performance, open-source RPC (Remote Procedure Call) framework developed by Google. It can be used for low-latency communication between microservices (a minimal sketch follows this list).
  3. Service Discovery:

    • Tool: Consul or etcd
    • Description: Consul and etcd are service discovery tools that help microservices locate and communicate with each other.
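
To make the inter-service communication concrete, below is a minimal gRPC server sketch in TypeScript using @grpc/grpc-js and @grpc/proto-loader. The stock.proto file, the stock package name, and the StockService/GetStock definitions are assumptions made for this illustration, not part of the original design.

```typescript
import * as grpc from '@grpc/grpc-js';
import * as protoLoader from '@grpc/proto-loader';

// Load the (assumed) stock.proto, which defines package "stock" with
// service StockService { rpc GetStock(StockRequest) returns (StockReply); }
const packageDefinition = protoLoader.loadSync('stock.proto', { keepCase: true });
const proto = grpc.loadPackageDefinition(packageDefinition) as any;

const server = new grpc.Server();
server.addService(proto.stock.StockService.service, {
  // Handler invoked by other microservices asking for remaining stock.
  GetStock: (call: any, callback: any) => {
    // In a real service this would consult the cache or database.
    callback(null, { productId: call.request.productId, remaining: 42 });
  },
});

server.bindAsync('0.0.0.0:50051', grpc.ServerCredentials.createInsecure(), (err) => {
  if (err) throw err;
  // The server starts accepting requests once binding succeeds.
});
```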

Caching:

  1. In-Memory Caching:
    • Tool: Redis
    • Description: Redis is an in-memory data store that can be used for caching frequently accessed data, such as product information and user sessions.
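
As an illustration, here is a minimal cache-aside sketch in TypeScript with the ioredis client; the getStockFromDb helper and the stock:{productId} key scheme are assumptions. A short TTL keeps the displayed stock close to real time while absorbing most of the read traffic.

```typescript
import Redis from 'ioredis';

const redis = new Redis(); // connects to localhost:6379 by default

// Hypothetical database lookup used only when the cache misses.
async function getStockFromDb(productId: string): Promise<number> {
  // ... query the primary database ...
  return 0;
}

// Cache-aside read: serve remaining stock from Redis when possible,
// fall back to the database and repopulate the cache with a short TTL.
export async function getRemainingStock(productId: string): Promise<number> {
  const key = `stock:${productId}`;
  const cached = await redis.get(key);
  if (cached !== null) return Number(cached);

  const stock = await getStockFromDb(productId);
  await redis.set(key, String(stock), 'EX', 1); // 1s TTL keeps stock counts fresh
  return stock;
}
```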

Load Balancing:

  1. Load Balancer:
    • Tool: Nginx or HAProxy
    • Description: Nginx and HAProxy are popular open-source load balancers that can distribute incoming traffic across multiple servers and microservices.

Database Sharding:

  1. Sharding Database:

    • Tool: Percona XtraDB Cluster
    • Description: Percona XtraDB Cluster is a high availability and high scalability solution for MySQL clustering. It supports synchronous multi-master replication.
  2. Horizontal Scaling Database:

    • Tool: Amazon Aurora or CockroachDB
    • Description: Amazon Aurora is a MySQL and PostgreSQL-compatible relational database that supports horizontal scaling. CockroachDB is a distributed SQL database that also supports horizontal scaling.
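
To sketch how sharding could look at the application level, the following TypeScript snippet routes each product to a shard by hashing its ID. The shard count and connection strings are placeholders; a managed option such as Aurora or CockroachDB would handle this distribution internally.

```typescript
import { createHash } from 'crypto';

// Placeholder connection strings for the (assumed) four shards.
const SHARD_DSNS = [
  'mysql://shard0.internal/commerce',
  'mysql://shard1.internal/commerce',
  'mysql://shard2.internal/commerce',
  'mysql://shard3.internal/commerce',
];

// Deterministically map a product ID to a shard so that reads and writes
// for the same product always land on the same database node.
export function shardFor(productId: string): string {
  const digest = createHash('md5').update(productId).digest();
  const index = digest.readUInt32BE(0) % SHARD_DSNS.length;
  return SHARD_DSNS[index];
}
```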

Asynchronous Processing:

  1. Message Queue:

    • Tool: RabbitMQ or Apache Kafka
    • Description: RabbitMQ and Apache Kafka are popular message queue systems that allow you to decouple components by enabling asynchronous communication. Use them for offloading non-time-sensitive tasks (see the sketch after this list).
  2. Background Job Processing:

    • Tool: Celery (with Redis as a broker)
    • Description: Celery is a distributed task queue that can be used for processing background tasks asynchronously.
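
As a sketch of offloading non-time-sensitive work (item 1 above), here is a minimal RabbitMQ producer in TypeScript using the amqplib client; the queue name and message shape are assumptions, and in a Python stack a Celery task would play the same role.

```typescript
import amqp from 'amqplib';

// Enqueue a non-time-sensitive task (e.g. sending an order confirmation email)
// so the purchase API can respond without waiting for it.
export async function enqueueConfirmationEmail(orderId: string): Promise<void> {
  // In production the connection and channel would be reused, not opened per call.
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();

  const queue = 'order-emails'; // assumed queue name
  await channel.assertQueue(queue, { durable: true });
  channel.sendToQueue(queue, Buffer.from(JSON.stringify({ orderId })), {
    persistent: true, // survive broker restarts
  });

  await channel.close();
  await connection.close();
}
```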

What to prepare in infrastructure side

1. Auto-Scaling and Load Balancing:

  1. Auto-Scaling:

    • Tool: Kubernetes Horizontal Pod Autoscaler (HPA) or Amazon EC2 Auto Scaling
    • Description: Implement auto-scaling to automatically adjust the number of server instances based on traffic. Kubernetes HPA works for containerized applications, while EC2 Auto Scaling is suitable for traditional virtual machines.
  2. Load Balancer:

    • Tool: AWS Elastic Load Balancer (ELB) or Nginx Ingress Controller
    • Description: Use a load balancer to distribute incoming traffic across multiple instances or microservices. ELB is a managed service on AWS, and Nginx Ingress Controller can be used for Kubernetes environments.

2. Content Delivery Network (CDN) for Static Assets:

  1. CDN Service:
    • Tool: Amazon CloudFront, Akamai, or Cloudflare
    • Description: Utilize a CDN to cache and deliver static assets (images, stylesheets) closer to users. This reduces latency and offloads traffic from the origin server.

3. Content Compression:

  1. Compression Middleware:
    • Tool: Nginx or Apache with gzip compression
    • Description: Compress content before sending it to clients to reduce bandwidth usage and improve loading times. Configure your web server to enable gzip compression for text-based assets.
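
If part of the stack runs on Node.js, the same idea can be sketched at the application level with Express's compression middleware (a hedged alternative to enabling gzip in Nginx or Apache; the route shown is a placeholder).

```typescript
import express from 'express';
import compression from 'compression';

const app = express();

// Gzip-compress text-based responses (JSON, HTML, CSS) above a small threshold.
app.use(compression({ threshold: 1024 }));

app.get('/api/products/:id/stock', (req, res) => {
  // Compressed automatically when the client sends Accept-Encoding: gzip.
  res.json({ productId: req.params.id, remaining: 42 });
});

app.listen(3000);
```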

4. Distributed Databases:

  1. Distributed MySQL:

    • Tool: Percona XtraDB Cluster or MySQL Cluster
    • Description: Deploy a distributed MySQL database that supports horizontal scaling. Percona XtraDB Cluster is known for high availability and scalability.
  2. Global Distribution:

    • Tool: Amazon Aurora Global Databases or CockroachDB
    • Description: If your user base is distributed globally, consider using a database solution that allows for global distribution, reducing latency for users in different regions.

5. Geographical Distribution:

  1. Multi-Region Deployment:

    • Tool: Amazon Route 53 or Cloudflare Load Balancer
    • Description: Deploy your application in multiple geographic regions to reduce latency for users. DNS-based global load balancing can route users to the nearest server.
  2. Global Load Balancer:

    • Tool: Cloudflare Load Balancer or Google Cloud Load Balancing
    • Description: Use a global load balancer to distribute traffic to the nearest server based on the user's location.

6. Monitoring and Optimization:

  1. Monitoring and Logging:

    • Tool: Prometheus for monitoring, Grafana for visualization, and ELK Stack (Elasticsearch, Logstash, and Kibana) for logging
    • Description: Implement comprehensive monitoring and logging solutions to track system performance, identify bottlenecks, and troubleshoot issues promptly (a metrics sketch follows this list).
  2. Continuous Optimization:

    • Tool: Google PageSpeed Insights or Lighthouse
    • Description: Regularly review and optimize code, database queries, and system configurations for better performance. These tools provide insight into web performance and highlight optimization opportunities.
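
To connect monitoring to the latency targets in the Goal section, here is a minimal sketch using prom-client for Node.js: request durations are recorded in a histogram so that p99 latency can be queried in Prometheus and graphed in Grafana. The metric name, labels, and buckets are assumptions.

```typescript
import express from 'express';
import client from 'prom-client';

const app = express();

// Histogram of request durations; the p99 targets (100ms read, 300ms purchase)
// can be derived from these buckets in Prometheus/Grafana.
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['route', 'method', 'status'],
  buckets: [0.025, 0.05, 0.1, 0.2, 0.3, 0.5, 1],
});

app.use((req, res, next) => {
  const end = httpDuration.startTimer({ route: req.path, method: req.method });
  res.on('finish', () => end({ status: String(res.statusCode) }));
  next();
});

// Endpoint scraped by Prometheus.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(3000);
```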

What to prepare in client side

1. Front-End Framework:

  1. Framework:
    • Example: React, Angular, or Vue.js
    • Description: Choose a modern front-end framework to efficiently manage the UI components, state, and interactions. This allows for better organization of code and promotes reusability.

2. Optimized User Interface:

  1. Responsive Design:

    • Practice: Implement responsive design principles using CSS media queries.
    • Description: Ensure that the user interface is responsive and adapts to various screen sizes and devices, providing a consistent experience.
  2. Lazy Loading:

    • Practice: Implement lazy loading for images and components.
    • Description: Load assets and components only when needed to reduce the initial page load time.
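
A minimal lazy-loading sketch in TypeScript (TSX) with React.lazy and Suspense; the ProductReviews component and its module path are assumptions for illustration.

```tsx
import React, { Suspense, lazy } from 'react';

// The reviews panel sits below the fold, so its code (and data fetching)
// is deferred until the component actually renders.
const ProductReviews = lazy(() => import('./ProductReviews')); // assumed module

export function ProductPage({ productId }: { productId: string }) {
  return (
    <div>
      <h1>Product {productId}</h1>
      {/* The fallback keeps the page interactive while the chunk downloads. */}
      <Suspense fallback={<p>Loading reviews…</p>}>
        <ProductReviews productId={productId} />
      </Suspense>
    </div>
  );
}
```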

3. API Communication:

  1. Efficient API Requests:

    • Practice: Use efficient data fetching techniques, such as GraphQL or RESTful APIs.
    • Description: Optimize API requests to minimize data transfer and ensure quick response times.
  2. Pagination and Infinite Scroll:

    • Practice: Implement pagination or infinite scroll for product listings.
    • Description: Break down large sets of data into manageable chunks to improve the loading performance.
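
A small TypeScript sketch of a paginated product fetch; the /api/products endpoint and its page/limit query parameters are assumptions.

```typescript
interface Product {
  id: string;
  name: string;
  price: number;
}

// Fetch one page of products at a time so the client never downloads
// (or renders) the full catalogue in a single request.
export async function fetchProductPage(page: number, limit = 20): Promise<Product[]> {
  const response = await fetch(`/api/products?page=${page}&limit=${limit}`);
  if (!response.ok) {
    throw new Error(`Failed to load products: ${response.status}`);
  }
  return response.json();
}
```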

4. State Management:

  1. State Management Library:
    • Example: Redux (for React), Vuex (for Vue.js), or NgRx (for Angular)
    • Description: Use a state management library to manage the application state efficiently, especially when dealing with complex data flows.

5. Performance Optimization:

  1. Code Splitting:

    • Practice: Implement code splitting for large applications.
    • Description: Split the application code into smaller chunks to only load what is necessary for the current view.
  2. Bundle Size Analysis:

    • Tool: Webpack Bundle Analyzer or Source Map Explorer
    • Description: Regularly analyze the size of your JavaScript bundles to identify and address any large dependencies.

6. Client-Side Caching:

  1. Browser Caching:
    • Practice: Leverage browser caching for static assets.
    • Description: Set appropriate cache headers to allow browsers to store and reuse assets, reducing the need for repeated downloads.
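
On the serving side, this usually means long-lived, immutable cache headers for fingerprinted assets. A hedged Express sketch, where the /static prefix, dist directory, and max-age value are assumptions:

```typescript
import express from 'express';

const app = express();

// Fingerprinted build artifacts (e.g. app.3f2a1c.js) never change, so the
// browser may cache them for a year and skip revalidation entirely.
app.use(
  '/static',
  express.static('dist', { maxAge: '365d', immutable: true })
);

app.listen(3000);
```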

7. Error Handling and User Feedback:

  1. Error Handling:

    • Practice: Implement proper error handling for API requests and critical operations.
    • Description: Provide clear error messages and gracefully handle failures to ensure a good user experience.
  2. Loading Indicators:

    • Practice: Use loading indicators during data fetching or processing.
    • Description: Inform users about ongoing processes to manage expectations and improve perceived performance.

8. Implement Client Monitoring:

  1. Real-Time Error Detection:

    • Tool: Sentry
    • Description: Sentry provides real-time error tracking, allowing you to detect and identify errors as they occur in the client-side code. This enables proactive issue resolution and minimizes the impact on users.
  2. Error Insights and Analytics:

    • Description: Sentry collects detailed information about errors, including the stack trace, user context, and environment details. This data provides insights into the root causes of issues, helping developers understand and prioritize bug fixes.
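
A minimal initialization sketch with the @sentry/browser SDK; the DSN and sample rate are placeholders.

```typescript
import * as Sentry from '@sentry/browser';

// Initialize once at application start-up, before the app renders.
Sentry.init({
  dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0', // placeholder DSN
  tracesSampleRate: 0.1, // sample 10% of transactions for performance data
});

// Handled errors can still be reported explicitly for visibility.
export function reportClientError(error: unknown): void {
  Sentry.captureException(error);
}
```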

How To Achieve The Goals

1. Achieving 1,000 Transactions Per Second:

Server-Side:

  1. Scaling:

    • Implement horizontal scaling for microservices to handle the increased load.
    • Use container orchestration tools like Kubernetes for efficient scaling.
  2. Optimized Code:

    • Optimize the codebase to reduce processing time per transaction.
    • Use efficient algorithms and data structures to handle transactions.
  3. Connection Pooling:

    • Use connection pooling for database connections to efficiently manage database access.
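
A hedged connection-pooling sketch with node-postgres (pg); the pool settings and query are placeholders, and a MySQL-based stack such as Percona XtraDB Cluster would use an equivalent client-side pool.

```typescript
import { Pool } from 'pg';

// A bounded pool reuses connections across requests instead of paying the
// TCP and authentication handshake cost on every transaction.
const pool = new Pool({
  host: 'db.internal',      // placeholder host
  database: 'commerce',
  max: 20,                  // cap concurrent connections per instance
  idleTimeoutMillis: 30_000,
});

export async function readStock(productId: string): Promise<number> {
  const result = await pool.query(
    'SELECT stock FROM products WHERE id = $1',
    [productId]
  );
  return result.rows[0]?.stock ?? 0;
}
```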

Infrastructure:

  1. Load Testing:

    • Conduct thorough load testing to identify the maximum capacity of the system.
    • Simulate traffic spikes to ensure the infrastructure can handle peak loads.
  2. CDN for Static Assets:

    • Offload static assets (images, stylesheets) to a Content Delivery Network (CDN) to reduce server load and improve response times.

2. Read API - 99th Percentile Latency < 100ms:

Server-Side:

  1. Caching:

    • Implement a caching mechanism for product details and remaining stocks.
    • Use an in-memory cache like Redis to store frequently accessed data.
  2. Database Indexing:

    • Optimize database queries by ensuring proper indexing on columns used in read operations.
    • Consider denormalization for frequently read data.

Infrastructure:

  1. Edge Caching:

    • Use edge caching (CDN) for read API responses to reduce latency for geographically distributed users.
  2. Content Compression:

    • Compress API responses to reduce data transfer time.

3. Purchase API - 99th Percentile Latency < 300ms:

Server-Side:

  1. Asynchronous Processing:

    • Offload non-critical tasks, such as order confirmation emails, to asynchronous processes.
    • Ensure that the purchase API remains responsive for critical operations.
  2. Optimized Database Transactions:

    • Optimize database transactions related to purchases to minimize the time taken for critical operations.
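
One way to keep the purchase transaction short is a single conditional UPDATE that decrements stock only while it is still available, avoiding a read-then-write race under peak load. A hedged sketch with node-postgres; the table and column names are assumptions.

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // configured via PG* environment variables

// Attempt to reserve one unit of stock in a single, short transaction.
// The conditional UPDATE avoids a read-then-write race under high concurrency.
export async function purchaseOne(productId: string, userId: string): Promise<boolean> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');

    const updated = await client.query(
      'UPDATE products SET stock = stock - 1 WHERE id = $1 AND stock > 0',
      [productId]
    );
    if (updated.rowCount === 0) {
      await client.query('ROLLBACK'); // sold out
      return false;
    }

    await client.query(
      'INSERT INTO orders (product_id, user_id) VALUES ($1, $2)',
      [productId, userId]
    );
    await client.query('COMMIT');
    return true;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```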

Infrastructure:

  1. Distributed Databases:

    • Use distributed databases to handle the increased load during peak times for the purchase API.
    • Ensure that databases are geographically distributed to reduce latency.
  2. Auto-Scaling:

    • Set up auto-scaling specifically for the services handling purchase transactions.
    • Monitor and adjust resources dynamically to meet demand.

Monitoring and Continuous Improvement:

  1. Real-Time Monitoring:

    • Implement real-time monitoring for both APIs to identify and resolve issues promptly.
  2. Logging and Tracing:

    • Use robust logging and tracing mechanisms to track the performance of each transaction and identify bottlenecks.
  3. Continuous Optimization:

    • Regularly analyze performance metrics and optimize code, queries, and infrastructure configurations.