Pagination with Elasticsearch vs SQL
Introduction
Pagination is a core feature of search and listing experiences. Traditional SQL pagination (LIMIT/OFFSET) is simple but degrades at high offsets and can return inconsistent results as data changes. Elasticsearch offers alternative pagination models (search_after, scroll, point-in-time) that improve latency and consistency for large, sorted result sets. This guide compares approaches and outlines when to choose Elasticsearch over direct SQL.
When to Consider Elasticsearch for Pagination
- Search-heavy read patterns with flexible filtering and relevance scoring.
- Very large datasets where deep pagination is common (page > 100, offset in the tens of thousands).
- Need for full-text search, stemming, fuzzy matching, or custom scoring.
- Requirements to keep pagination stable while the underlying data is mutating.
- Multi-field sorting (e.g., relevance desc, created_at desc) with fast response times.
High-Level Architecture
graph TD
Client[Web/Mobile/App] --> API[API Gateway]
API --> SearchSvc[Search Service]
SearchSvc --> ES[(Elasticsearch Cluster)]
API --> DB[(SQL DB)]
- Search path (Elasticsearch): Used for search-driven listing with stable cursor-based pagination.
- SQL path: Used for transactional writes and small, strongly consistent lists (e.g., user-owned items).
- Data pipeline (not shown) feeds SQL changes into Elasticsearch via CDC or change streams.
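The indexing side of that pipeline can stay small. Below is a minimal sketch in Python, assuming a CDC consumer (Debezium, triggers, or change streams) hands over one row-level change event at a time; the `products` index name, the event shape, and the unsecured local cluster URL are illustrative assumptions, not part of the architecture above.

```python
# Minimal sketch of the SQL -> Elasticsearch ingestion step. The event shape,
# index name, and cluster URL are illustrative assumptions.
import requests

ES_URL = "http://localhost:9200"  # assumed local, unsecured cluster


def apply_change(event: dict) -> None:
    """Mirror a single row-level change into the search index."""
    doc_id = event["id"]
    if event["op"] == "delete":
        # Deleting a document that never reached the index returns 404; a real
        # consumer would log and continue.
        requests.delete(f"{ES_URL}/products/_doc/{doc_id}")
    else:
        # Insert or update: index the full row as the document.
        requests.put(
            f"{ES_URL}/products/_doc/{doc_id}",
            json={"title": event["title"], "created_at": event["created_at"]},
        ).raise_for_status()
```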
SQL Pagination Approaches
1) Offset/Limit (Simple, but Slow at High Offsets)
SELECT id, title, created_at
FROM products
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;
- Pros: Easiest to implement.
- Cons: Late pages require scanning and discarding many rows; latency grows with offset; unstable if rows are inserted/deleted mid-scan.
2) Keyset (Seek) Pagination (Recommended in SQL)
-- Use the last seen created_at/id as a cursor
SELECT id, title, created_at
FROM products
WHERE created_at < $last_seen_ts
ORDER BY created_at DESC
LIMIT 20;
- Pros: O(page_size) instead of O(offset); stable ordering; good for append-only timelines.
- Cons: Requires deterministic sort key (often compound: created_at, id); harder to jump to arbitrary pages.
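To make the tie-breaking concrete, here is a minimal keyset-pagination sketch in Python, assuming PostgreSQL and psycopg2; the table and columns follow the example above, and the `(created_at, id)` row-value comparison handles rows that share a timestamp. Connection details are placeholders.

```python
# Keyset pagination with a composite (created_at, id) cursor -- a sketch
# assuming PostgreSQL and psycopg2; table and columns follow the example above.
import psycopg2

conn = psycopg2.connect("dbname=shop")  # connection string is a placeholder


def fetch_page(conn, last_seen=None, page_size=20):
    """last_seen is (created_at, id) taken from the previous page's final row."""
    with conn.cursor() as cur:
        if last_seen is None:  # first page: no cursor yet
            cur.execute(
                """SELECT id, title, created_at FROM products
                   ORDER BY created_at DESC, id DESC LIMIT %s""",
                (page_size,),
            )
        else:
            cur.execute(
                """SELECT id, title, created_at FROM products
                   WHERE (created_at, id) < (%s, %s)
                   ORDER BY created_at DESC, id DESC LIMIT %s""",
                (*last_seen, page_size),
            )
        rows = cur.fetchall()
    # The next cursor is the sort key of the last row, or None when exhausted.
    next_cursor = (rows[-1][2], rows[-1][0]) if rows else None
    return rows, next_cursor
```

A composite index on `(created_at DESC, id DESC)` lets this run as an index range scan regardless of how deep the user has paged.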
Elasticsearch Pagination Approaches
1) from/size (Offset)
{
"from": 10000,
"size": 20,
"sort": [{ "created_at": "desc" }, { "id": "desc" }]
}
- Pros: Familiar, works for shallow pages.
- Cons: Expensive for deep pages; `index.max_result_window` (default 10,000) limits depth; re-sorting large windows hurts latency and memory.
2) search_after (Cursor-Based, Preferred)
{
"size": 20,
"sort": [{ "created_at": "desc" }, { "id": "desc" }],
"search_after": ["2025-01-01T00:00:00Z", "987654321"]
}
- Use the sort values of the last hit as the cursor.
- Pros: Stable ordering with per-page cost independent of depth; no deep window re-sorting; pairs well with point-in-time (PIT) to keep a consistent view.
- Cons: Cannot jump to arbitrary page numbers; clients must hold cursor tokens.
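As a concrete illustration of the cursor hand-off, the sketch below pages through an assumed `products` index over the REST API using the `requests` library; the cluster URL and the `match_all` query are placeholders.

```python
# Paging with search_after -- a sketch against an assumed local cluster and
# "products" index. The cursor for each page is simply the sort array of the
# previous page's last hit.
import requests

ES_URL = "http://localhost:9200"


def search_page(cursor=None, page_size=20):
    body = {
        "size": page_size,
        "sort": [{"created_at": "desc"}, {"id": "desc"}],
        "query": {"match_all": {}},
    }
    if cursor is not None:  # omit search_after on the first page
        body["search_after"] = cursor
    resp = requests.post(f"{ES_URL}/products/_search", json=body)
    resp.raise_for_status()
    hits = resp.json()["hits"]["hits"]
    next_cursor = hits[-1]["sort"] if hits else None  # sort values of last hit
    return hits, next_cursor


# Usage: walk forward page by page until the index is exhausted.
hits, cursor = search_page()
while cursor:
    hits, cursor = search_page(cursor)
```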
3) Point-in-Time (PIT) + search_after (Stable Snapshot)
POST /my-index/_pit?keep_alive=1m
{ }
POST /_search
{
"pit": { "id": "<pit-id>", "keep_alive": "1m" },
"size": 20,
"sort": [{ "created_at": "desc" }, { "id": "desc" }],
"search_after": ["2025-01-01T00:00:00Z", "987654321"]
}
- Pros: Stable snapshot across pages even while index mutates; ideal for user-facing pagination.
- Cons: PITs consume cluster resources; choose a sensible `keep_alive` and close unused PITs.
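The same calls translate directly into client code. The sketch below, again assuming a local cluster and a `products` index, opens a PIT, runs one page with it, and closes it when done, which is the cleanup step the Cons bullet refers to.

```python
# PIT lifecycle around a paginated search -- a sketch using the REST API,
# assuming the same local cluster and "products" index as above.
import requests

ES_URL = "http://localhost:9200"

# Open a point in time; its id pins a consistent view of the index.
pit_id = requests.post(
    f"{ES_URL}/products/_pit", params={"keep_alive": "1m"}
).json()["id"]

try:
    body = {
        "size": 20,
        "sort": [{"created_at": "desc"}, {"id": "desc"}],
        "pit": {"id": pit_id, "keep_alive": "1m"},  # note: no index in the URL
    }
    resp = requests.post(f"{ES_URL}/_search", json=body)
    resp.raise_for_status()
    hits = resp.json()["hits"]["hits"]
finally:
    # Release cluster resources as soon as the pagination session ends.
    requests.delete(f"{ES_URL}/_pit", json={"id": pit_id})
```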
4) Scroll API (Batch/Data Export)
- Designed for large batch exports and reindexing, not user-facing pagination.
- Holds resources on the cluster; avoid for interactive flows.
Comparison: Elasticsearch vs SQL for Pagination
| Dimension | Elasticsearch (search_after/PIT) | SQL (Keyset) |
|---|---|---|
| Deep pagination latency | Stable; no OFFSET scan | Stable; no OFFSET scan |
| Arbitrary page jumps | Not supported | Not supported (keyset) |
| Text search & relevance | Strong (analyzers, scoring) | Weak without extensions |
| Multi-field sort at scale | Efficient doc values | Depends on indexes; can be costly |
| Consistency | Snapshot isolation with PIT; eventual between ES and source | Strong within DB; exact source of truth |
| Operational complexity | Needs ingestion pipeline, index tuning | Simpler ops; fewer moving parts |
| Result stability under writes | PIT keeps view stable; without PIT may drift | Stable within transaction; may drift across pages without locks |
Design Recommendations
- Use Elasticsearch + search_after + PIT for search or browse experiences where users navigate many pages and expect stable ordering.
- Use SQL keyset pagination for transactional lists, admin tables, or wherever exact source-of-truth consistency is required and pagination depth stays shallow (a few hundred rows at most).
- Avoid offset-based pagination at large offsets in both systems; reserve it for shallow pages.
- Define a deterministic composite sort key (e.g., `created_at desc, id desc`) to prevent tie-related duplication or skipping.
- Set guardrails: cap maximum pages, provide "load more" rather than numeric page jumps, and expire cursors.
Operational Considerations (Elasticsearch)
- Index mappings: Store sortable fields as keyword/numeric/date with `doc_values` enabled (see the mapping sketch after this list).
- Hot-warm-cold tiers: Keep recently active data on hot nodes for faster pagination.
- Circuit breakers: Monitor heap usage; large `size` values increase memory pressure.
- Refresh vs. consistency: The default refresh interval (1s) is usually enough; PIT stabilizes reads across pages.
- Ingestion lag: Monitor CDC/ingest latency; stale data impacts user experience.
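For the mapping point above, here is a sketch of an index definition with sort-friendly field types; the field names mirror the examples in this guide and the cluster URL is a placeholder.

```python
# Illustrative mapping for the fields this guide sorts on: created_at as a
# date and id as a keyword, both with doc_values (the default) so sorting
# reads column-oriented data instead of heap-resident fielddata.
import requests

ES_URL = "http://localhost:9200"

mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "created_at": {"type": "date"},   # primary sort key
            "id": {"type": "keyword"},        # tie-breaker sort key
        }
    }
}
requests.put(f"{ES_URL}/products", json=mapping).raise_for_status()
```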
Example End-to-End Flow (Search Service)
- API receives request with optional cursor token.
- If no cursor, create a PIT (keep_alive ~1m) and run the initial search with `size` and `sort`.
- Return results plus:
  - `next_cursor`: the last hit's sort values and the PIT id.
  - `pit_keep_alive`: the recommended keep_alive for the client to reuse.
- On the next page, the client sends `next_cursor`; the service reuses the PIT and issues `search_after`.
- Close the PIT early when the user ends the session, or let it expire after the timeout.
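A compact sketch of that flow follows, with the cursor token encoded as base64 JSON carrying the PIT id and the last hit's sort values; the token format, index name, and cluster URL are illustrative assumptions, not a prescribed wire format.

```python
# Cursor-token handling for the flow above -- a sketch. The opaque token is
# base64-encoded JSON carrying the PIT id and the last hit's sort values.
import base64
import json
import requests

ES_URL = "http://localhost:9200"


def encode_cursor(pit_id, sort_values):
    return base64.urlsafe_b64encode(
        json.dumps({"pit": pit_id, "after": sort_values}).encode()
    ).decode()


def decode_cursor(token):
    return json.loads(base64.urlsafe_b64decode(token))


def next_page(cursor_token=None, page_size=20):
    if cursor_token is None:
        # First page: open a fresh PIT for this pagination session.
        pit_id = requests.post(
            f"{ES_URL}/products/_pit", params={"keep_alive": "1m"}
        ).json()["id"]
        after = None
    else:
        state = decode_cursor(cursor_token)
        pit_id, after = state["pit"], state["after"]

    body = {
        "size": page_size,
        "sort": [{"created_at": "desc"}, {"id": "desc"}],
        "pit": {"id": pit_id, "keep_alive": "1m"},
    }
    if after is not None:
        body["search_after"] = after
    hits = requests.post(f"{ES_URL}/_search", json=body).json()["hits"]["hits"]

    if not hits:
        # No more results: close the PIT and signal the end of the listing.
        requests.delete(f"{ES_URL}/_pit", json={"id": pit_id})
        return [], None
    return hits, encode_cursor(pit_id, hits[-1]["sort"])
```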
Checklist
- Use deterministic composite sort keys for pagination.
- Prefer `search_after` over `from`/`size` for deep pages.
- Pair `search_after` with PIT for stable views.
- Keep `size` reasonable (e.g., <= 100) to avoid heavy heap usage.
- Cap max pagination depth; offer "load more" UX instead of page numbers.
- Monitor ingest lag and PIT counts; expire unused PITs.
- Use SQL keyset pagination for transactional, strongly consistent lists.
- Avoid Scroll API for interactive pagination.