Antalya 26.3: support external paths in Iceberg tables #1859

Open
zvonand wants to merge 2 commits into antalya-26.3 from feat/antalya-26.3/90740

@zvonand (Member) commented Jun 1, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support Iceberg tables that have data files outside the table location or on a different object storage. Cherry-picked from ClickHouse#90740 (by @zvonand).

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Port ClickHouse#90740 to antalya-26.3.

Iceberg tables may now reference files (data files, manifests, manifest
lists) located outside the table location, including on a different
object storage backend. Metadata paths are treated as absolute URIs and
resolved at read/delete time via new object-storage helpers
(`SchemeAuthorityKey`, `resolveObjectStorageForPath`, `SecondaryStorages`),
with the cluster-function protocol bumped to
`DBMS_CLUSTER_PROCESSING_PROTOCOL_VERSION_WITH_ICEBERG_ABSOLUTE_PATH`.
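
The lookup described above can be pictured with a minimal sketch. It is not the code from this PR: `ParsedUri`, `parseAbsoluteUri`, `StorageRegistry` and `resolveStorage` are hypothetical names that only mirror the roles of `SchemeAuthorityKey`, `SecondaryStorages` and `resolveObjectStorageForPath`, and a storage is represented by a plain string instead of a real object-storage handle.

```cpp
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for the PR's SchemeAuthorityKey: the three parts of
// an absolute object-storage URI such as s3://bucket/path/to/file.parquet.
struct ParsedUri
{
    std::string scheme;     // e.g. "s3"
    std::string authority;  // e.g. "bucket"
    std::string key;        // e.g. "path/to/file.parquet"
};

// Split "scheme://authority/key". Returns nullopt for anything that is not
// in that form (for example a relative path).
std::optional<ParsedUri> parseAbsoluteUri(std::string_view uri)
{
    const auto scheme_end = uri.find("://");
    if (scheme_end == std::string_view::npos || scheme_end == 0)
        return std::nullopt;

    const std::string_view rest = uri.substr(scheme_end + 3);
    const auto slash = rest.find('/');
    if (slash == std::string_view::npos || slash == 0)
        return std::nullopt;

    return ParsedUri{
        std::string(uri.substr(0, scheme_end)),
        std::string(rest.substr(0, slash)),
        std::string(rest.substr(slash + 1))};
}

// Hypothetical registry: one storage (here just a name) per
// (scheme, authority) pair, playing the role of SecondaryStorages.
using StorageRegistry = std::map<std::pair<std::string, std::string>, std::string>;

// Return the storage name and object key for a metadata path, or nullopt
// when the path is not absolute or no storage is registered for it.
std::optional<std::pair<std::string, std::string>>
resolveStorage(const StorageRegistry & registry, std::string_view path)
{
    const auto parsed = parseAbsoluteUri(path);
    if (!parsed)
        return std::nullopt;

    const auto it = registry.find({parsed->scheme, parsed->authority});
    if (it == registry.end())
        return std::nullopt;

    return std::make_pair(it->second, parsed->key);
}
```

In this sketch an unknown scheme/authority pair simply yields `std::nullopt`; whether the real helpers create a secondary storage on demand in that case is not shown here.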

Adds the `s3_propagate_credentials_to_other_storages` setting to optionally
copy base S3 credentials when creating secondary storages.
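
As a rough illustration of what "optionally copy base S3 credentials" could mean, here is a small sketch. The types `S3Credentials` and `S3StorageConfig` and the function `makeSecondaryConfig` are hypothetical and do not come from the PR; the `propagate_credentials` flag stands in for the value of `s3_propagate_credentials_to_other_storages`. What the secondary storage uses when the setting is off is an assumption here (no explicit credentials).

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified credential and config types for illustration only.
struct S3Credentials
{
    std::string access_key_id;
    std::string secret_access_key;
};

struct S3StorageConfig
{
    std::string bucket;
    std::optional<S3Credentials> credentials;  // nullopt: no explicit credentials
};

// Build the config for a secondary storage that points at another bucket.
// The base credentials are copied only when propagation is enabled.
S3StorageConfig makeSecondaryConfig(
    const S3StorageConfig & base,
    const std::string & bucket,
    bool propagate_credentials)
{
    S3StorageConfig secondary;
    secondary.bucket = bucket;
    if (propagate_credentials)
        secondary.credentials = base.credentials;
    return secondary;
}
```

Keeping propagation behind an explicit flag means credentials for the base bucket are never sent to a foreign bucket unless the user asks for it.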

Notes on porting to this branch:
- Skipped the `ExpireSnapshotsExecute`, `RemoveOrphanFilesExecute` and
  `SnapshotFilesTraversal` files: this functionality does not exist in
  `antalya-26.3`. The `executeCommand` branch using them was dropped and
  the existing `expireSnapshots` implementation is kept.
- Dropped the `S3UriStyle uri_style` `S3::URI` parameter (from an unrelated
  upstream change not in this branch); only `enable_url_encoding` is added.
- Dropped the upstream-only `_path` virtual column `storage_id` field,
  which is not present in `VirtualsForFileLikeStorage` here.
- Folded the metadata-path preference into the existing `getFileIdentifier`
  helper in the stable task distributor rather than the upstream inline
  call sites.
- Updated `Mutations.cpp` (`expireSnapshots`) callers for the new
  `getManifestList` / `getManifestFileEntriesHandle` signatures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zvonand added the port-antalya (PRs to be ported to all new Antalya releases) and forwardport (This is a frontport of code that existed in previous Antalya versions) labels on Jun 1, 2026
github-actions Bot commented Jun 1, 2026

Workflow [PR], commit [85dc463]

Fixes `04034_iceberg_spark_style_location` (S3_ERROR 404 reading
`warehouse/db/spark_table/metadata/snap-*.avro`).

When an Iceberg table's metadata `location` differs from where the files
actually live (e.g. a Spark-relocated table whose `location` is
`s3a://spark-bucket/warehouse/db/spark_table` while the objects are in the
configured base storage), the manifest-list / manifest / data paths in the
metadata are spelled with that foreign prefix.

`tryResolveObjectStorageForPath` matched such a path against `table_location`
and returned the raw URI key on the base storage, so reads hit a
non-existent key and failed with a 404. The raw key is only valid for paths
whose bucket matches the base storage (handled by the earlier base-bucket
branch). For a path that matches `table_location` but not the base bucket,
only `IcebergPathResolver::resolve` can map it (strip `table_location`,
prepend `table_root`), so defer to it by returning `std::nullopt`.
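
The resolution order described in this commit message can be sketched as follows. This is a simplified model, not the patched code: `TableLayout`, `splitUri`, `tryRawKeyOnBaseStorage`, `resolveViaTableLocation` and `resolveKey` are hypothetical names, the bucket `test-bucket` and root `tables/spark_table` used in the usage note are made up, and paths that belong to a genuinely different storage are left out.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical description of where a table lives.
struct TableLayout
{
    std::string base_bucket;     // bucket of the configured base storage
    std::string table_location;  // `location` from the Iceberg metadata
    std::string table_root;      // key prefix of the table on the base storage
};

struct BucketAndKey
{
    std::string bucket;
    std::string key;
};

// Split "scheme://bucket/key"; nullopt if the path is not in that form.
std::optional<BucketAndKey> splitUri(std::string_view uri)
{
    const auto scheme_end = uri.find("://");
    if (scheme_end == std::string_view::npos)
        return std::nullopt;
    const std::string_view rest = uri.substr(scheme_end + 3);
    const auto slash = rest.find('/');
    if (slash == std::string_view::npos)
        return std::nullopt;
    return BucketAndKey{std::string(rest.substr(0, slash)), std::string(rest.substr(slash + 1))};
}

// The raw URI key is valid only when the bucket is the base bucket.
// Otherwise return nullopt so the caller falls back to the path resolver.
std::optional<std::string> tryRawKeyOnBaseStorage(const TableLayout & layout, std::string_view path)
{
    const auto parts = splitUri(path);
    if (parts && parts->bucket == layout.base_bucket)
        return parts->key;
    return std::nullopt;
}

// Fallback in the spirit of IcebergPathResolver::resolve:
// strip table_location, prepend table_root.
std::string resolveViaTableLocation(const TableLayout & layout, std::string_view path)
{
    const std::string_view location = layout.table_location;
    if (path.substr(0, location.size()) != location)
        return std::string(path);  // not under table_location: leave unchanged

    std::string_view relative = path.substr(location.size());
    while (!relative.empty() && relative.front() == '/')
        relative.remove_prefix(1);
    return layout.table_root + "/" + std::string(relative);
}

std::string resolveKey(const TableLayout & layout, std::string_view path)
{
    if (auto key = tryRawKeyOnBaseStorage(layout, path))
        return *key;
    return resolveViaTableLocation(layout, path);
}
```

With `table_location` set to `s3a://spark-bucket/warehouse/db/spark_table`, a base bucket named `test-bucket` and a table root of `tables/spark_table`, the path `s3a://spark-bucket/warehouse/db/spark_table/metadata/snap-1.avro` maps to `tables/spark_table/metadata/snap-1.avro` instead of the non-existent raw key `warehouse/db/spark_table/metadata/snap-1.avro`. The prefix check is a plain string comparison, so a production version would also need to guard against sibling locations that share a prefix.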

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>