[spark] Add merge-into.skip-file-pruning option #8065

Open
wzx140 wants to merge 1 commit into apache:master from wzx140:codex/merge-into-skip-file-pruning

wzx140 commented Jun 1, 2026

Purpose

Add the merge-into.skip-file-pruning option for MergeInto partial-column updates on data-evolution tables. When enabled, the option skips the file-level pruning step. This is useful when most files in the target partition are expected to be updated, because the cost of collecting touched file IDs then outweighs the benefit of pruning untouched files.
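
To make the trade-off concrete, here is a toy Python model of the two paths. It is only a sketch and not the code in this PR: the `DataFile` class, its `first_row_id`/`row_count` fields, and the `files_to_rewrite` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    file_id: int
    first_row_id: int  # hypothetical: first row ID stored in this file
    row_count: int


def files_to_rewrite(files, touched_row_ids, skip_file_pruning):
    """Return the files a partial-column update would have to process."""
    if skip_file_pruning:
        # No extra pass to collect touched file IDs: every file is a candidate.
        return list(files)
    # Pruning path: map each touched row ID back to its file and keep only
    # the files that contain at least one touched row.
    touched = {
        f.file_id
        for f in files
        for rid in touched_row_ids
        if f.first_row_id <= rid < f.first_row_id + f.row_count
    }
    return [f for f in files if f.file_id in touched]


files = [DataFile(0, 0, 100), DataFile(1, 100, 100), DataFile(2, 200, 100)]

# Sparse update: pruning narrows the work to a single file.
sparse = files_to_rewrite(files, {5, 17}, skip_file_pruning=False)

# Dense update: pruning keeps every file, so the collection pass buys nothing.
dense = files_to_rewrite(files, {5, 150, 250}, skip_file_pruning=False)
skipped = files_to_rewrite(files, {5, 150, 250}, skip_file_pruning=True)
```

In the dense case both paths select the same files, which is the situation where skipping the pruning step avoids work without changing which files are processed.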

Even when file pruning is skipped, Spark MERGE INTO still pushes down target-table partition filters from the MERGE ON condition, so unrelated partitions are not scanned.
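
The idea behind the partition filter pushdown can be sketched in plain Python. This is an illustrative model only: the real logic would operate on Spark's expression trees, and the `Conjunct` class and `pushable_partition_filters` helper below are hypothetical names, not part of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    text: str              # SQL text of one AND-ed term of the ON condition
    references: frozenset  # qualified columns the term refers to


def pushable_partition_filters(conjuncts, target_alias, partition_columns):
    """Keep only the terms that reference target partition columns alone."""
    allowed = {f"{target_alias}.{c}" for c in partition_columns}
    return [
        c.text
        for c in conjuncts
        if c.references and c.references <= allowed
    ]


# ON t.id = s.id AND t.dt = '2026-06-01' AND s.flag = 1
on_condition = [
    Conjunct("t.id = s.id", frozenset({"t.id", "s.id"})),
    Conjunct("t.dt = '2026-06-01'", frozenset({"t.dt"})),
    Conjunct("s.flag = 1", frozenset({"s.flag"})),
]

pushed = pushable_partition_filters(on_condition, "t", ["dt"])
```

Only the term on the target partition column `dt` qualifies here. The join key and the source-only predicate stay in the join condition.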

Tests

  • Added RowTrackingTest cases for enabling/disabling merge-into.skip-file-pruning, including result correctness and file-pruning join behavior.
  • Added RowTrackingTest coverage for target partition filter pushdown from the MERGE ON condition when skip file pruning is enabled.
