Allow compacting and maintaining a monotonic ordering #3538

Open
@tolleybot

Problem Statement

After running OPTIMIZE (compact) on a Delta table, the file/row layout is not guaranteed to be globally sorted by (objectId, dateTime).
As a result, clients must perform an extra sort on read, which adds latency, complexity, and resource pressure.
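
To make the problem concrete, here is a minimal sketch in plain Python (no Delta or DataFusion involved). The rows, the tuple layout, and the helper `is_globally_sorted` are illustrative assumptions: two inputs that are each sorted by (objectId, dateTime) are concatenated, which is roughly what an unsorted compaction does, and the result is no longer globally sorted.

```python
from itertools import chain

def is_globally_sorted(rows, key):
    """Return True if rows are in non-decreasing order under `key`."""
    keys = [key(r) for r in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Rows are (objectId, dateTime) tuples; ISO-8601 strings compare correctly as text.
file_a = [("obj1", "2024-01-01T00:00:00"), ("obj3", "2024-01-05T00:00:00")]
file_b = [("obj2", "2024-01-02T00:00:00"), ("obj2", "2024-01-09T00:00:00")]

sort_key = lambda r: (r[0], r[1])

# Concatenation keeps each input's internal order but not a global order.
compacted = list(chain(file_a, file_b))

# This is the extra sort that readers currently have to do themselves.
client_side = sorted(compacted, key=sort_key)
```

Each input passes `is_globally_sorted` on its own, `compacted` does not, and `client_side` does again. That last sort is the work this proposal wants to move into compaction.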

Background & Motivation

  • Delta Lake’s current compaction merges files but does not enforce a global sort order.
  • Existing Z-order / clustering features are orthogonal: they co-locate related values to improve data skipping, but they do not produce the deterministic, monotonic row order we need.
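
A small illustration of why Z-ordering does not give a monotonic order: the sketch below uses a textbook Morton (bit-interleaving) key over two small integers standing in for objectId and dateTime. The function `morton_key` and the sample points are assumptions for illustration only, and this is not necessarily how delta-rs computes its Z-order.

```python
def morton_key(x: int, y: int, bits: int = 8) -> int:
    """Interleave the low `bits` bits of x and y; x takes the higher bit of each pair."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z

# (x, y) stand in for (objectId, dateTime).
points = [(0, 3), (1, 0), (1, 1), (0, 1)]

# Order required by this proposal: first column, then second column.
lexicographic = sorted(points)

# Order produced by sorting on the interleaved key.
z_ordered = sorted(points, key=lambda p: morton_key(*p))
```

Here `lexicographic` is `[(0, 1), (0, 3), (1, 0), (1, 1)]` while `z_ordered` is `[(0, 1), (1, 0), (1, 1), (0, 3)]`, so rows for the same first column are no longer contiguous under the Z-order key.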

Proposed Feature

  1. Add a global sort phase to the compaction pipeline:

    • After bin-packing files, feed all pages/records through a DataFusion df.sort(["objectId","dateTime"]) before writing.
    • Preserve the existing unsorted path when sorting is disabled for backward compatibility.
  2. Expose new options on the OptimizeBuilder (Rust) and DeltaTable.optimize (Python):

    • sort_enabled: bool (default = true)
    • sort_columns: Vec<String> / List[str] (default = ["objectId","dateTime"])
    • Builder methods:
      • Rust: .with_sort_columns(&[...]), .disable_sort()
      • Python: dt.optimize.compact(sort_enabled=False, sort_columns=["foo"])
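
The proposed flow can be sketched in plain Python, with lists of dicts standing in for files. Everything here is hypothetical: `CompactOptions`, `bin_pack`, `compact`, and `target_rows` are illustrative names, not existing delta-rs APIs, and the real implementation would go through DataFusion rather than an in-memory `list.sort`. The sketch takes one possible reading of "global sort": when sorting is enabled, all selected rows are sorted together and then cut into output files.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]

@dataclass
class CompactOptions:
    # Hypothetical mirror of the proposed options and their proposed defaults.
    sort_enabled: bool = True
    sort_columns: List[str] = field(default_factory=lambda: ["objectId", "dateTime"])

def bin_pack(files: List[List[Row]], target_rows: int) -> List[List[List[Row]]]:
    """Greedy sequential packing: start a new bin when the next file would overflow."""
    bins, current, current_rows = [], [], 0
    for f in files:
        if current and current_rows + len(f) > target_rows:
            bins.append(current)
            current, current_rows = [], 0
        current.append(f)
        current_rows += len(f)
    if current:
        bins.append(current)
    return bins

def compact(files: List[List[Row]], options: CompactOptions, target_rows: int = 4) -> List[List[Row]]:
    bins = bin_pack(files, target_rows)
    if not options.sort_enabled:
        # Existing behavior: one output file per bin, rows in input order.
        return [[row for f in b for row in f] for b in bins]
    # Sorted path: sort every selected row, then cut into output files.
    rows = [row for b in bins for f in b for row in f]
    rows.sort(key=lambda r: tuple(r[c] for c in options.sort_columns))
    return [rows[i:i + target_rows] for i in range(0, len(rows), target_rows)]

files = [
    [{"objectId": "b", "dateTime": 2}, {"objectId": "a", "dateTime": 5}],
    [{"objectId": "a", "dateTime": 1}],
    [{"objectId": "c", "dateTime": 0}, {"objectId": "b", "dateTime": 1}],
]

sorted_out = compact(files, CompactOptions())
unsorted_out = compact(files, CompactOptions(sort_enabled=False))
by_time = compact(files, CompactOptions(sort_columns=["dateTime"]))
```

With the default options, reading the output files in order yields rows sorted by (objectId, dateTime); with `sort_enabled=False` the rows keep their input order within each bin.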

Requirements & Constraints

  • Strict Ordering: Global sort by (objectId, dateTime) across all partitions and pages.
  • Performance: Sorting should not unduly impact compaction throughput; use DataFusion’s spillable memory pools.
  • Configurable: Users can disable sorting or choose a different sort key.
  • Backward Compatible: Unsorted compaction remains available; default behavior may change only when opted in.
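
The performance requirement relies on sorting more data than fits in memory. As a rough illustration of the spill-and-merge idea (not DataFusion's actual memory pool or sort operator), the sketch below sorts fixed-size runs, spills each run to a temporary file, and then does a k-way merge. The names `external_sort` and `max_rows_in_memory` and the JSON-lines spill format are assumptions for illustration.

```python
import heapq
import json
import tempfile
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (objectId, dateTime) stand-in

def _spill(run: List[Row]):
    """Write one sorted run to a temporary file, one JSON row per line."""
    f = tempfile.TemporaryFile(mode="w+")
    for row in run:
        f.write(json.dumps(row) + "\n")
    f.seek(0)
    return f

def _read_run(f) -> Iterator[Row]:
    """Stream rows back from a spilled run, closing the file when exhausted."""
    for line in f:
        yield tuple(json.loads(line))
    f.close()

def external_sort(rows: Iterable[Row], max_rows_in_memory: int) -> Iterator[Row]:
    runs = []
    buffer: List[Row] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_rows_in_memory:
            buffer.sort()
            runs.append(_spill(buffer))  # memory budget reached: spill this run
            buffer = []
    buffer.sort()
    # k-way merge of the in-memory remainder and all spilled runs.
    yield from heapq.merge(buffer, *(_read_run(f) for f in runs))

data = [("b", 2), ("a", 5), ("c", 0), ("a", 1), ("b", 1)]
result = list(external_sort(data, max_rows_in_memory=2))
```

Only `max_rows_in_memory` rows are buffered at once while runs are being built, and the merge holds one row per run, so peak memory is bounded regardless of how many rows are compacted.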

Labels

enhancement: New feature or request

      Allow compacting and maintaining a monotonic ordering · Issue #3538 · delta-io/delta-rs