Allow compacting and maintaining a monotonic ordering #3538

## Problem Statement

After running `OPTIMIZE` (compact) on a Delta table, the file/row layout is not guaranteed to be globally sorted by `(objectId, dateTime)`.
As a result, clients must perform an extra sort on read, which adds latency, complexity, and resource pressure.
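
To make the read-side cost concrete, here is a minimal sketch in plain Python. The `files` list is made-up data standing in for compacted output, and `is_globally_sorted` and `read_sorted` are hypothetical helpers, not part of any Delta Lake API.

```python
from itertools import chain

# Hypothetical compacted files holding (objectId, dateTime) rows.
# Each file is sorted internally, yet the files overlap in key range.
files = [
    [(1, "2024-01-01T00:00"), (3, "2024-01-02T00:00")],
    [(2, "2024-01-01T00:00"), (3, "2024-01-01T00:00")],
]

def is_globally_sorted(files):
    """True if reading the files back-to-back yields non-decreasing keys."""
    rows = list(chain.from_iterable(files))
    return all(a <= b for a, b in zip(rows, rows[1:]))

def read_sorted(files):
    """The extra client-side step: materialize every row, then sort."""
    return sorted(chain.from_iterable(files))
```

With this data, `is_globally_sorted(files)` is false even though each file is sorted on its own, so every reader has to pay for `read_sorted`.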

## Background & Motivation

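- Delta Lake’s current compaction merges small files into larger ones but does not enforce a global sort order across the output.
- Existing Z-order / clustering features are orthogonal: they co-locate related rows to improve data skipping but do not produce a total order on the key. We need deterministic, monotonic ordering.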
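
A small illustration of why Z-ordering does not give the ordering requested here. This uses a textbook Morton code over small integers as a stand-in; it is not Delta's actual Z-order implementation, and the integer keys are placeholders for `(objectId, dateTime)`.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Morton (Z-order) code: interleave the bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)  # x takes the odd bit positions
        code |= ((y >> i) & 1) << (2 * i)      # y takes the even bit positions
    return code

# Hypothetical integer keys standing in for (objectId, dateTime).
rows = [(0, 3), (1, 0), (1, 1), (2, 0)]

z_order = sorted(rows, key=lambda r: interleave_bits(*r))
lexicographic = sorted(rows)

print(z_order)        # [(1, 0), (1, 1), (0, 3), (2, 0)]
print(lexicographic)  # [(0, 3), (1, 0), (1, 1), (2, 0)]
```

In the Z-ordered result, the row for object `0` lands after rows for object `1`, so the first key is not monotonic. That is fine for data skipping but not for readers that rely on sorted output.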

## Proposed Feature

1. Add a **global sort** phase to the compaction pipeline:
   - After bin-packing files, feed all pages/records through a DataFusion sort, conceptually `df.sort(["objectId","dateTime"])`, before writing.
   - Preserve the existing unsorted path when sorting is disabled for backward compatibility.

2. Expose new options on the `OptimizeBuilder` (Rust) and `DeltaTable.optimize` (Python):
   - `sort_enabled: bool` (default = true)
   - `sort_columns: Vec<String>` / `List[str]` (default = `["objectId","dateTime"]`)
   - Builder methods:
     - Rust: `.with_sort_columns(&[...])`, `.disable_sort()`
     - Python: `dt.optimize.compact(sort_enabled=False, sort_columns=["foo"])`
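
The two proposed pieces can be sketched together in plain Python. Everything here is illustrative: `CompactOptions`, `compact`, and `target_rows` are hypothetical names that only mirror the proposed `sort_enabled` / `sort_columns` options, rows are plain dicts, and all input files are treated as a single bin rather than being bin-packed by size.

```python
from dataclasses import dataclass, field
from operator import itemgetter

@dataclass
class CompactOptions:
    # Hypothetical mirror of the proposed options; not an existing delta-rs type.
    sort_enabled: bool = True
    sort_columns: list = field(default_factory=lambda: ["objectId", "dateTime"])
    target_rows: int = 4  # stand-in for a target output file size

def compact(files, opts=None):
    """Merge small files into larger ones, optionally sorting all rows first."""
    opts = opts or CompactOptions()
    # Simplification: every input file goes into one bin.
    rows = [row for f in files for row in f]
    if opts.sort_enabled and opts.sort_columns:
        # Global sort phase: order all rows by the configured key.
        rows.sort(key=itemgetter(*opts.sort_columns))
    # Split into output files; when sorted, consecutive files
    # cover non-decreasing key ranges.
    step = opts.target_rows
    return [rows[i:i + step] for i in range(0, len(rows), step)]

small_files = [
    [{"objectId": 2, "dateTime": "2024-01-02"}],
    [{"objectId": 1, "dateTime": "2024-01-03"},
     {"objectId": 1, "dateTime": "2024-01-01"}],
]

sorted_out = compact(small_files)
unsorted_out = compact(small_files, CompactOptions(sort_enabled=False))
by_time = compact(small_files, CompactOptions(sort_columns=["dateTime"]))
```

Keeping the unsorted branch as a plain pass-through is what preserves the existing behavior when sorting is disabled.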

## Requirements & Constraints

- **Strict Ordering**: Global sort by `(objectId, dateTime)` across all partitions and pages.
- **Performance**: Sorting should not unduly impact compaction throughput; use DataFusion’s spillable memory pools.
- **Configurable**: Users can disable sorting or choose a different sort key.
- **Backward Compatible**: Unsorted compaction remains available; default behavior may change only when opted in.
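
The performance requirement relies on sorting more data than fits in memory. As a rough illustration of that general idea (sorted runs spilled to disk, then merged), here is a stdlib-only sketch. It is not DataFusion's implementation; `external_sort`, `spill_sorted_run`, and `max_rows_in_memory` are hypothetical names, and rows are assumed to be JSON-serializable dicts.

```python
import heapq
import json
import tempfile

def spill_sorted_run(rows, key):
    """Sort one in-memory chunk and spill it to a temp file as JSON lines."""
    run = tempfile.TemporaryFile(mode="w+")
    for row in sorted(rows, key=key):
        run.write(json.dumps(row) + "\n")
    run.seek(0)
    return run

def external_sort(rows, key, max_rows_in_memory=2):
    """Yield rows in key order; run building buffers at most max_rows_in_memory rows."""
    runs, chunk = [], []
    for row in rows:
        chunk.append(row)
        if len(chunk) >= max_rows_in_memory:
            runs.append(spill_sorted_run(chunk, key))
            chunk = []
    if chunk:
        runs.append(spill_sorted_run(chunk, key))
    # Stream each run back lazily and merge; the merge holds one row per run.
    readers = [(json.loads(line) for line in run) for run in runs]
    try:
        yield from heapq.merge(*readers, key=key)
    finally:
        for run in runs:
            run.close()

def sort_key(row):
    return (row["objectId"], row["dateTime"])

rows = [
    {"objectId": 3, "dateTime": "2024-01-01"},
    {"objectId": 1, "dateTime": "2024-01-02"},
    {"objectId": 2, "dateTime": "2024-01-01"},
    {"objectId": 1, "dateTime": "2024-01-01"},
    {"objectId": 2, "dateTime": "2024-01-03"},
]
ordered = list(external_sort(rows, sort_key))
```

The same `sort_key` can double as a check for the strict-ordering requirement: the output satisfies it when `sort_key` is non-decreasing from one row to the next across every output file.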


