-
Notifications
You must be signed in to change notification settings - Fork 6.5k
Tailing Iterator
Since version 2.7, RocksDB supports a special type of iterator (named tailing iterator) optimized for a use case in which new data is read as soon as it's added to the database. Its key features are:
- Tailing iterator doesn't create a snapshot when it's created. Therefore, it can also be used to read newly added data (whereas ordinary iterators won't see any record added after the iterator was created).
- It's optimized for doing sequential reads -- it might avoid doing potentially expensive seeks on SST files and immutable memtables in many cases.
To enable it, set ReadOptions::tailing
to true
when creating a new iterator. Note that tailing iterator currently supports only moving in the forward direction (in other words, Prev()
and SeekToLast()
are not supported).
Not all new data is guaranteed to be available to a tailing iterator. Seek() or SeekToFirst() on a tailing iterator can be thought of as creating an implicit snapshot -- anything written after it may, but is not guaranteed to be seen.
A tailing iterator provides a merged view of two internal iterators:
- a mutable iterator, used to access current memtable contents only
- an immutable iterator, used to read data from SST files and immutable memtables
Both of these internal iterators are created by specifying kMaxSequenceNumber
, effectively disabling filtering based on internal sequence numbers and enabling access to records inserted after the creation of these iterators.
In addition, each tailing iterator keeps track of the database's state changes (such as memtable flushes and compactions) and invalidates its internal iterators when it happens. This enables it to be always up-to-date.
Since SST files and immutable memtables don't change, a tailing iterator can often get away by performing a seek operation only on the mutable iterator. For this purpose, it maintains the interval
(prev_key, current_key]
currently covered by the immutable iterator (in other words, there are no records with key k
such that prev_key < k < current_key
neither in SST files nor immutable memtables).
Therefore, when Seek(target)
is called and target
is within that interval, immutable iterator is already at the correct position and it is not necessary to move it.
Currently, the TailingIterator is not yet supported if there are range tombstones in RocksDB.
To read range tombstones from the active memtable using a forward iterator, RocksDB needs to fragment range tombstones to allow efficient scanning. Since the tailing iterator allows reading updates as soon as they are added to the memtable, it needs to check if a new range tombstone is added to the memtable during each iterator operation and redo the fragmentation. This has not yet been implemented for the forward iterator, and it currently returns Status::NotSupported
if a range tombstone is found in the memtable or L0 files at the time of creation or found in the SST file when reaching a lower level.
Contents
- RocksDB Wiki
- Overview
- RocksDB FAQ
- Terminology
- Requirements
- Contributors' Guide
- Release Methodology
- RocksDB Users and Use Cases
- RocksDB Public Communication and Information Channels
-
Basic Operations
- Iterator
- Prefix seek
- SeekForPrev
- Tailing Iterator
- Compaction Filter
- Multi Column Family Iterator
- Read-Modify-Write (Merge) Operator
- Column Families
- Creating and Ingesting SST files
- Single Delete
- Low Priority Write
- Time to Live (TTL) Support
- Transactions
- Snapshot
- DeleteRange
- Atomic flush
- Read-only and Secondary instances
- Approximate Size
- User-defined Timestamp
- Wide Columns
- BlobDB
- Online Verification
- Options
- MemTable
- Journal
- Cache
- Write Buffer Manager
- Compaction
- SST File Formats
- IO
- Compression
- Full File Checksum and Checksum Handoff
- Background Error Handling
- Huge Page TLB Support
- Tiered Storage (Experimental)
- Logging and Monitoring
- Known Issues
- Troubleshooting Guide
- Tests
- Tools / Utilities
-
Implementation Details
- Delete Stale Files
- Partitioned Index/Filters
- WritePrepared-Transactions
- WriteUnprepared-Transactions
- How we keep track of live SST files
- How we index SST
- Merge Operator Implementation
- RocksDB Repairer
- Write Batch With Index
- Two Phase Commit
- Iterator's Implementation
- Simulation Cache
- [To Be Deprecated] Persistent Read Cache
- DeleteRange Implementation
- unordered_write
- Extending RocksDB
- RocksJava
- Lua
- Performance
- Projects Being Developed
- Misc