Notion Reveals Architecture Behind Offline Mode: CRDT-Powered Sync and Reference Tracking
Key Takeaways
- Notion evolved its SQLite cache into a dedicated persistent storage layer with stronger guarantees about data availability and consistency for offline pages
- The company uses CRDT-based data models for conflict resolution, migrating pages dynamically to this model when marked for offline use
- A sophisticated forest-of-trees data structure tracks multiple reasons each page is kept offline (explicit toggle, auto-download, inheritance from parent), preventing accidental removal of offline access
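To make the CRDT idea concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins register. It is not taken from Notion's post, which does not say which CRDT types the product uses; the class name `LWWRegister` and the fields `timestamp` and `replica_id` are assumptions for illustration. The property it demonstrates is the one that matters for offline sync: replicas can merge in any order and still converge on the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register (illustrative, not Notion's actual model)."""
    value: str
    timestamp: int    # logical clock of the write (assumed)
    replica_id: str   # tie-breaker when timestamps are equal (assumed)

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the write with the larger (timestamp, replica_id) pair.
        # Assumes each write has a unique pair, so the order is total.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self


# Two replicas edited the same field while disconnected from each other.
offline_edit = LWWRegister("Title edited offline", timestamp=7, replica_id="laptop")
server_edit = LWWRegister("Title edited on web", timestamp=5, replica_id="server")

# Merging in either order gives the same result.
merged_a = offline_edit.merge(server_edit)
merged_b = server_edit.merge(offline_edit)
```

Real document CRDTs for rich text and nested blocks are considerably more involved than this; the sketch only shows why merge order does not matter.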
Summary
Notion published a technical blog post detailing how it engineered Offline Mode, a highly requested feature that lets users create, edit, and view pages without an internet connection. The post explains how Notion evolved its SQLite cache into a persistent storage layer that can guarantee a page is fully available offline, which required solving problems in reference tracking, background syncing, and conflict resolution with CRDT (Conflict-free Replicated Data Type) data models. A central challenge was tracking why each page is available offline: it may be explicitly toggled, automatically downloaded because of recent activity, or inherited from a parent page. Notion models this as a forest of offline page trees rather than a simple set, so a page stays available as long as at least one independent reason applies and is removed only when every reason is gone.
- Offline inheritance allows users to mark a database as offline and automatically download up to 50 related database pages, maintaining state consistency as the workspace changes
- The architecture favors a predictable user experience: a page is shown offline only once it is fully downloaded, never in a partially loaded state
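The multi-reason tracking described above can be sketched as follows. This is a simplified illustration, not Notion's implementation: the class `OfflineTracker`, its method names, and the reason labels (`explicit_toggle`, `auto_download`, `inherited`) are hypothetical. Each page holds a set of reasons, children inherit a reason from an offline parent, and a page stops being offline only when its last reason is removed.

```python
from collections import defaultdict


class OfflineTracker:
    """Tracks why each page is kept offline (hypothetical sketch)."""

    def __init__(self):
        # page_id -> set of (kind, source_page_id) reasons
        self.reasons = defaultdict(set)
        # parent page_id -> child page_ids (edges of the forest)
        self.children = defaultdict(set)

    def is_offline(self, page_id):
        return bool(self.reasons.get(page_id))

    def link_child(self, parent_id, child_id):
        self.children[parent_id].add(child_id)
        if self.is_offline(parent_id):
            self.add_reason(child_id, "inherited", parent_id)

    def add_reason(self, page_id, kind, source=None):
        was_offline = self.is_offline(page_id)
        self.reasons[page_id].add((kind, source or page_id))
        if not was_offline:
            # The page just became offline, so its children inherit that.
            for child in self.children[page_id]:
                self.add_reason(child, "inherited", page_id)

    def remove_reason(self, page_id, kind, source=None):
        was_offline = self.is_offline(page_id)
        self.reasons[page_id].discard((kind, source or page_id))
        if was_offline and not self.is_offline(page_id):
            # Last reason gone: children lose the reason inherited from here.
            for child in self.children[page_id]:
                self.remove_reason(child, "inherited", page_id)


tracker = OfflineTracker()
tracker.link_child("database", "row-1")
tracker.add_reason("database", "explicit_toggle")  # row-1 inherits
tracker.add_reason("row-1", "auto_download")       # second, independent reason

tracker.remove_reason("database", "explicit_toggle")
still_offline = tracker.is_offline("row-1")        # auto_download remains

tracker.remove_reason("row-1", "auto_download")
finally_removed = not tracker.is_offline("row-1")
```

A plain set of offline page IDs could not express this: turning off the database toggle would have to either drop `row-1` (losing the auto-download) or keep it with no record of why.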
Editorial Opinion
Notion's offline implementation is a thoughtful engineering solution to a problem that goes beyond simple caching. The multi-reason tracking system and CRDT models show how modern applications must balance ambitious features with predictable conflict resolution. This level of architectural sophistication is increasingly expected in productivity tools, especially as remote work and hybrid connectivity patterns become the norm. The decision to prioritize consistency over partial data availability is a principled stance that respects user expectations.



