Skip to main content

Pagination with Elasticsearch vs SQL

Introduction

Pagination is a core feature of search and listing experiences. Traditional SQL pagination (LIMIT/OFFSET) is simple but degrades at high offsets and can return inconsistent results as data changes. Elasticsearch offers alternative pagination models (search_after, scroll, point-in-time) that improve latency and consistency for large, sorted result sets. This guide compares approaches and outlines when to choose Elasticsearch over direct SQL.

When to Consider Elasticsearch for Pagination

  • Search-heavy read patterns with flexible filtering and relevance scoring.
  • Very large datasets where deep pagination is common (page > 100, offset in the tens of thousands).
  • Need for full-text search, stemming, fuzzy matching, or custom scoring.
  • Requirements to keep pagination stable while the underlying data is mutating.
  • Multi-field sorting (e.g., relevance desc, created_at desc) with fast response times.

High-Level Architecture

graph TD
Client[Web/Mobile/App] --> API[API Gateway]
API --> SearchSvc[Search Service]
SearchSvc --> ES[(Elasticsearch Cluster)]
API --> DB[(SQL DB)]
  • Search path (Elasticsearch): Used for search-driven listing with stable cursor-based pagination.
  • SQL path: Used for transactional writes and small, strongly consistent lists (e.g., user-owned items).
  • Data pipeline (not shown) feeds SQL changes into Elasticsearch via CDC or change streams.

SQL Pagination Approaches

1) Offset/Limit (Simple, but Slow at High Offsets)

SELECT id, title, created_at
FROM products
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;
  • Pros: Easiest to implement.
  • Cons: Late pages require scanning and discarding many rows; latency grows with offset; unstable if rows are inserted/deleted mid-scan.
-- Use the last seen created_at/id as a cursor
SELECT id, title, created_at
FROM products
WHERE created_at < $last_seen_ts
ORDER BY created_at DESC
LIMIT 20;
  • Pros: O(page_size) instead of O(offset); stable ordering; good for append-only timelines.
  • Cons: Requires deterministic sort key (often compound: created_at, id); harder to jump to arbitrary pages.

Elasticsearch Pagination Approaches

1) from/size (Offset)

{
"from": 10000,
"size": 20,
"sort": [{ "created_at": "desc" }, { "id": "desc" }]
}
  • Pros: Familiar, works for shallow pages.
  • Cons: Expensive for deep pages; index.max_result_window (default 10k) limits depth; re-sorting large windows hurts latency and memory.

2) search_after (Cursor-Based, Preferred)

{
"size": 20,
"sort": [{ "created_at": "desc" }, { "id": "desc" }],
"search_after": ["2025-01-01T00:00:00Z", "987654321"]
}
  • Use the sort values of the last hit as the cursor.
  • Pros: Stable, constant-time pagination; no deep window re-sorting; works well with point-in-time (PIT) to keep a consistent view.
  • Cons: Cannot jump to arbitrary page numbers; clients must hold cursor tokens.

3) Point-in-Time (PIT) + search_after (Stable Snapshot)

POST /my-index/_pit?keep_alive=1m
{ }

POST /_search
{
"pit": { "id": "<pit-id>", "keep_alive": "1m" },
"size": 20,
"sort": [{ "created_at": "desc" }, { "id": "desc" }],
"search_after": ["2025-01-01T00:00:00Z", "987654321"]
}
  • Pros: Stable snapshot across pages even while index mutates; ideal for user-facing pagination.
  • Cons: PITs consume cluster resources; need sensible keep_alive; clean up unused PITs.

4) Scroll API (Batch/Data Export)

  • Designed for large batch exports and reindexing, not user-facing pagination.
  • Holds resources on the cluster; avoid for interactive flows.

Comparison: Elasticsearch vs SQL for Pagination

DimensionElasticsearch (search_after/PIT)SQL (Keyset)
Deep pagination latencyStable; no OFFSET scanStable; no OFFSET scan
Arbitrary page jumpsNot supportedNot supported (keyset)
Text search & relevanceStrong (analyzers, scoring)Weak without extensions
Multi-field sort at scaleEfficient doc valuesDepends on indexes; can be costly
ConsistencySnapshot isolation with PIT; eventual between ES and sourceStrong within DB; exact source of truth
Operational complexityNeeds ingestion pipeline, index tuningSimpler ops; fewer moving parts
Result stability under writesPIT keeps view stable; without PIT may driftStable within transaction; may drift across pages without locks

Design Recommendations

  • Use Elasticsearch + search_after + PIT for search or browse experiences where users navigate many pages and expect stable ordering.
  • Use SQL keyset pagination for transactional lists, admin tables, or when exact source-of-truth consistency is required and deep pagination is shallow (< few hundred rows).
  • Avoid offset-based pagination for large offsets in both systems; keep for small pages only.
  • Define a deterministic composite sort key (e.g., created_at desc, id desc) to prevent tie-related duplication or skipping.
  • Set guardrails: cap maximum pages, provide load more rather than numeric page jumps, and expire cursors.

Operational Considerations (Elasticsearch)

  • Index mappings: Store sortable fields as keyword/numeric/date with doc_values enabled.
  • Hot-warm-cold tiers: Keep recently active data in hot nodes for faster pagination.
  • Circuit breakers: Monitor heap usage; large size values increase memory.
  • Refresh vs consistency: Default refresh (1s) is usually enough; PIT stabilizes reads.
  • Ingestion lag: Monitor CDC/ingest latency; stale data impacts user experience.

Example End-to-End Flow (Search Service)

  1. API receives request with optional cursor token.
  2. If no cursor, create PIT (keep_alive ~1m) and run initial search with size and sort.
  3. Return results plus:
    • next_cursor: last hit's sort values and PIT id.
    • pit_keep_alive: recommended keep_alive for the client to reuse.
  4. On next page, client sends next_cursor; service reuses PIT and issues search_after.
  5. Close PIT early when user ends session or after timeout.

Checklist

  • Use deterministic composite sort keys for pagination.
  • Prefer search_after over from/size for deep pages.
  • Pair search_after with PIT for stable views.
  • Keep size reasonable (e.g., <= 100) to avoid heavy heap usage.
  • Cap max pagination depth; offer "load more" UX instead of page numbers.
  • Monitor ingest lag and PIT counts; expire unused PITs.
  • Use SQL keyset pagination for transactional, strongly consistent lists.
  • Avoid Scroll API for interactive pagination.