This document describes Apache Paimon's detailed security threat model for maintainers and automated security triage.
It complements the shorter public-facing security model in
docs/docs/project/security.md (published at the project website) by making
Paimon's trust assumptions, security boundaries, and recurring non-security
bug classes more explicit.
Apache Paimon is a streaming data lake platform that is often deployed as a library and integration layer inside larger systems (Flink, Spark, Hive, and other query engines) that provide their own authentication, authorization, and credential management. Because of that deployment model, many bug classes that look security-relevant in the abstract are not actually security vulnerabilities in Paimon itself.
This model is intended to answer:
- what Paimon generally treats as a security vulnerability
- what Paimon generally treats as correctness, hardening, or deployment work
- which boundaries are primarily owned by Paimon versus the surrounding catalog, engine, or service
- which issue classes scanners should downgrade by default
This model is scoped to the Apache Paimon project itself:
- the table format implementation (paimon-core)
- client libraries (paimon-api, paimon-common)
- the REST Catalog client and protocol (paimon-api, paimon-core)
- engine integrations (Flink, Spark, Hive connectors)
- the Python client (pypaimon)
It is not a general threat model for every deployment that embeds Paimon.
In particular, it does not attempt to define the complete security model for:
- query engines or applications that embed Paimon
- storage-level authorization enforced outside Paimon
- REST Catalog server implementations (Paimon defines the client and protocol, not the server)
Paimon should:
- avoid exposing secrets or delegated credentials to principals that were not already trusted with them
- avoid creating new unauthorized capabilities in Paimon-owned components or integrations
- avoid violating trust boundaries that Paimon itself owns, such as leaking auth, signer, or credential-bearing state across catalog or session boundaries in the same process
- avoid leaking delegated storage tokens (data tokens) across table or principal boundaries
Paimon does not aim to be the primary enforcement point for:
- user-to-user authorization inside a query engine
- storage-level authorization (e.g., object store IAM policies)
- service-side authorization performed by a REST Catalog server
- row-level or column-level access control (Paimon relays server-provided row filters and column masking rules, but enforcement is the responsibility of the server and the engine integration)
The operator deploys and configures the catalog, REST Catalog server, engine, and storage integration around Paimon. This role is trusted to choose endpoints, warehouses, and storage integrations, configure credentials, and decide which users may create, read, or modify tables.
The catalog control plane is responsible for resolving tables and supplying metadata, locations, configuration, and delegated credentials to Paimon. This role may be implemented by:
- a REST Catalog server
- a Hive Metastore
- a JDBC-backed catalog
- a filesystem-based catalog
Regardless of implementation, it should not expose secrets to unintended principals or leak credential-bearing state across unintended boundaries.
Paimon assumes a trusted catalog or metastore, which is outside its primary security boundary.
In REST deployments, part of the catalog control plane is implemented by a server that returns metadata, configuration, delegated storage credentials (data tokens), and query-level authorization (row filters and column masking) to the client. This server is generally treated as a trusted control-plane component.
The REST Catalog server is responsible for:
- authenticating clients
- authorizing catalog operations (create/drop/alter databases, tables, views, functions)
- issuing scoped, time-limited data tokens for storage access
- providing row-level filters and column masking rules via the auth table query API
- returning server-side configuration to merge with client configuration
In REST deployments, the client-side catalog (RESTCatalog, RESTApi)
consumes server-provided metadata, configuration, and credentials. Where the
client and server are meaningfully distinct, client-side bugs in token
handling, caching, or reuse may still be security-relevant. This is especially
true when the Paimon-owned client implementation leaks credential-bearing
state across catalog, session, or principal boundaries it is expected to
preserve.
The REST Catalog client is responsible for:
- sending authenticated requests using a configured AuthProvider
- refreshing tokens before expiration (with a configurable safe time margin)
- caching FileIO instances keyed by data token (via RESTTokenFileIO) and evicting them when tokens expire
- not mixing data tokens or auth state across different catalog instances or tables in the same process
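The refresh rule in the list above reduces to a single comparison against a safety margin. The following is a minimal Python sketch, not Paimon's Java implementation. The `Token` class, `should_refresh`, and `SAFE_MARGIN_MILLIS` are hypothetical names, and the one-hour default is borrowed from the data-token default described in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default: one hour, mirroring the data-token safe margin.
SAFE_MARGIN_MILLIS = 60 * 60 * 1000


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a delegated token: credentials plus expiry."""
    credentials: tuple  # e.g. (("access-key-id", "..."),)
    expire_at_millis: int


def should_refresh(token: Optional[Token], now_millis: int,
                   safe_margin_millis: int = SAFE_MARGIN_MILLIS) -> bool:
    """Refresh when there is no token yet or it expires within the margin."""
    if token is None:
        return True
    return token.expire_at_millis - now_millis < safe_margin_millis


# Usage: a token expiring in 90 minutes is still fresh; one expiring in
# 30 minutes is already inside the one-hour margin.
now = 1_000_000
fresh = Token(credentials=(), expire_at_millis=now + 90 * 60 * 1000)
stale = Token(credentials=(), expire_at_millis=now + 30 * 60 * 1000)
print(should_refresh(fresh, now), should_refresh(stale, now))  # False True
```

Refreshing early, rather than at the moment of expiry, keeps long-running reads and writes from failing partway through with an expired credential.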
Query engines (Flink, Spark, Hive, Trino, StarRocks, etc.) and applications may expose only a subset of Paimon capabilities to users. They are responsible for their own user-facing authorization boundaries unless Paimon explicitly documents otherwise.
Table writers and maintainers, and the engines acting on their behalf, may already have legitimate power to write or replace table metadata, write or delete data files, manage snapshots, create or delete branches and tags, and invoke destructive maintenance operations (compaction, expiration, rollback). If a report only shows a new way to achieve an effect that such a principal can already cause legitimately, it is usually not a security issue in Paimon.
The following are generally treated as trusted operator or deployment inputs:
- catalog properties (including uri, warehouse, token.provider)
- REST Catalog server endpoint configuration
- warehouse and storage roots
- authentication credentials
- Kerberos keytab paths and principal names (security.kerberos.login.keytab, security.kerberos.login.principal)
- metastore wiring (Hive Metastore URI, JDBC connection strings)
- custom HTTP headers (header.*)
If a report depends on the attacker controlling those values directly, it is usually not a vulnerability in Paimon itself.
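This hedged Python sketch shows how header.* options could become request headers, which makes the trust assumption concrete. The helper `extract_prefixed` and the stripping of the `header.` prefix are illustrative assumptions, not a description of Paimon's Java code. All option values are placeholders.

```python
def extract_prefixed(options: dict, prefix: str) -> dict:
    """Return options that start with prefix, with the prefix stripped."""
    return {key[len(prefix):]: value
            for key, value in options.items()
            if key.startswith(prefix)}


# Hypothetical operator-supplied catalog options; all values are placeholders.
catalog_options = {
    "uri": "https://catalog.example.com",
    "warehouse": "example_warehouse",
    "token.provider": "example-provider",
    "header.X-Request-Source": "nightly-etl",
}

# In this sketch, header.* entries become outgoing HTTP headers verbatim.
# Nothing validates them, because the operator is trusted to set them.
http_headers = extract_prefixed(catalog_options, "header.")
print(http_headers)  # {'X-Request-Source': 'nightly-etl'}
```

An attacker who could write these options could already redirect the catalog URI or swap credentials. That is why such reports fall under deployment trust rather than Paimon itself.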
Paimon often accepts metadata locations, table properties, database properties, schema definitions, and related control-plane information from a catalog or metastore. By default, Paimon treats those sources as trusted.
This means a malicious catalog supplying incorrect or malicious metadata is usually not a Paimon vulnerability by itself.
In REST deployments, Paimon accepts the following from the REST Catalog server:
- Server configuration: merged into client options via the /v1/config endpoint, including the catalog prefix and additional headers
- Data tokens: time-limited storage credentials returned by the /v1/{prefix}/databases/{database}/tables/{table}/token endpoint, used by RESTTokenFileIO to access the underlying object store
- Auth table query responses: row-level filters and column masking rules returned by the /v1/{prefix}/databases/{database}/tables/{table}/auth endpoint
By default, these are treated as trusted control-plane inputs unless Paimon explicitly documents a stronger guarantee.
This means a malicious REST Catalog server sending dangerous configuration or overly broad data tokens is usually not a Paimon vulnerability by itself. It also means that many client-side token-selection bugs are correctness or specification issues rather than security boundary failures.
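A typical configuration merge shows why server configuration counts as a trusted input. The sketch below assumes the /v1/config response carries server defaults, which client options can override, and server overrides, which win over client options. Treat that three-layer precedence and the `merge_config` name as assumptions to check against the REST Catalog specification. The option keys are placeholders.

```python
def merge_config(server_defaults: dict, client_options: dict,
                 server_overrides: dict) -> dict:
    """Later layers win: defaults < client options < server overrides."""
    merged = dict(server_defaults)
    merged.update(client_options)
    merged.update(server_overrides)
    return merged


merged = merge_config(
    server_defaults={"prefix": "default_prefix", "example.flag": "true"},
    client_options={"example.flag": "false", "warehouse": "example_warehouse"},
    server_overrides={"prefix": "tenant_a"},
)
print(merged)
```

Under this precedence, the server layer can replace any client option. A malicious server can therefore already steer the client, which makes this a control-plane trust question rather than a Paimon bug.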
The major exception is secret exposure. If Paimon surfaces credentials or secrets to a new audience that was not already trusted with them, that is security-relevant. In particular:
- Data tokens for one table leaking to operations on a different table
- Auth state from one catalog instance leaking into another
- Credentials appearing in logs, error messages, or serialized state
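The third case is often the easiest to introduce by accident. One common mitigation is to redact option maps before they reach a log line or exception message. This Python sketch is hypothetical: the fragment list, `is_sensitive`, and `redact` are illustrative names, not Paimon APIs. The Hadoop OSS key names are used only as sample keys with placeholder values.

```python
# Hypothetical key fragments that mark a value as secret. Matching is done
# on a normalized key, so "accessKeySecret" and "access-key-secret" match.
SENSITIVE_FRAGMENTS = ("secret", "password", "token", "accesskey", "credential")
MASK = "******"


def is_sensitive(key: str) -> bool:
    normalized = "".join(ch for ch in key.lower() if ch.isalnum())
    return any(fragment in normalized for fragment in SENSITIVE_FRAGMENTS)


def redact(options: dict) -> dict:
    """Return a copy of options that is safe to log or put in an error message."""
    return {key: (MASK if is_sensitive(key) else value)
            for key, value in options.items()}


options = {
    "warehouse": "example_warehouse",
    "fs.oss.accessKeyId": "placeholder-id",
    "fs.oss.accessKeySecret": "placeholder-secret",
}
print(redact(options))
```

The fragment list errs toward over-redaction; for example, it also masks token.provider. For log output, that is usually the safer trade-off.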
Object store permissions (e.g., OSS, S3, HDFS ACLs) are enforced by the storage provider and the credentials the surrounding deployment chooses to hand to Paimon. Paimon is not the root authority for bucket- or object-level authorization.
Reports that depend primarily on over-broad IAM policies or permissive storage ACLs are usually deployment-sensitive rather than product-security issues in Paimon.
Paimon integrations may surface data and operations through a query engine or application, but Paimon is not a complete user-authorization framework for those systems.
Paimon does provide a mechanism for the REST Catalog server to supply
row-level filters and column masking rules via authTableQuery, but
enforcement of those rules is a shared responsibility between the engine
integration and the catalog server. Paimon relays the rules; the engine
must apply them.
The following categories are generally security-relevant in Paimon when the report is credible and reproducible.
Examples include:
- catalog credentials exposed through a user-visible engine surface (e.g., query results, EXPLAIN output, table properties)
- one catalog's credentials or auth state leaking into another catalog or session within the same process
- data tokens for table A being used for (or exposed to) table B
- credentials or tokens logged at INFO or lower levels without redaction
- credentials surviving in serialized RESTTokenFileIO or RESTApi state beyond their intended scope
Security issues exist when Paimon itself is expected to separate catalogs, principals, or sessions and fails to do so.
Examples include:
- process-global auth provider or signer state crossing catalog instances (e.g., the FILE_IO_CACHE in RESTTokenFileIO returning a FileIO belonging to a different principal)
- a data token obtained for one table being reused for a different table's data access
- auth header state from one RESTApi instance leaking into another
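One way to rule out the cross-table case on the client side is to key every cached token by both the owning catalog instance and the table, with no fallback to "whatever token is cached". This minimal Python sketch uses hypothetical names (`TableKey`, `TableTokenStore`) and omits expiry and refresh for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TableKey:
    """Hypothetical cache key: the owning catalog instance plus the table."""
    catalog_id: str
    database: str
    table: str


class TableTokenStore:
    """Keeps one data token per (catalog instance, table) and never hands a
    token fetched for one key to a caller asking about another key."""

    def __init__(self, fetch: Callable[[TableKey], str]):
        self._fetch = fetch              # asks the server for a table-scoped token
        self._tokens: Dict[TableKey, str] = {}

    def token_for(self, key: TableKey) -> str:
        token = self._tokens.get(key)
        if token is None:                # no fallback to another table's token
            token = self._fetch(key)
            self._tokens[key] = token
        return token


fetched = []


def fake_fetch(key: TableKey) -> str:
    fetched.append(key)
    return f"token-for-{key.catalog_id}/{key.database}.{key.table}"


store = TableTokenStore(fake_fetch)
token_a = store.token_for(TableKey("catalog-1", "db", "orders"))
token_b = store.token_for(TableKey("catalog-1", "db", "customers"))
```

Including the catalog instance in the key also keeps two catalogs that point at the same table name from sharing credentials.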
If Paimon's client-side handling of authTableQuery responses (row filters
or column masking rules) allows a caller to bypass filters that the server
intended to enforce, that is security-relevant when the bypass occurs within
Paimon-owned code rather than in the engine integration.
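The Python sketch below shows the two steps a client applying relayed rules must not skip: drop rows the filter rejects, then mask columns in the rows that remain. It assumes the filter and masks have already been decoded into Python callables. The real authTableQuery response uses its own serialized predicate and masking formats, which this sketch does not model, and `apply_auth_rules` is a hypothetical name.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def apply_auth_rules(rows: List[Row],
                     row_filter: Callable[[Row], bool],
                     column_masks: Dict[str, Callable[[object], object]]) -> List[Row]:
    """Filter first (on unmasked values), then mask columns in surviving rows."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue                     # row rejected by the server's filter
        masked = dict(row)               # never mutate the caller's row
        for column, mask in column_masks.items():
            if column in masked:
                masked[column] = mask(masked[column])
        visible.append(masked)
    return visible


rows = [
    {"region": "eu", "email": "alice@example.com"},
    {"region": "us", "email": "bob@example.com"},
]
result = apply_auth_rules(
    rows,
    row_filter=lambda r: r["region"] == "eu",
    column_masks={"email": lambda _: "***"},
)
print(result)  # [{'region': 'eu', 'email': '***'}]
```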
These categories may still be real bugs worth fixing, but they are not usually security vulnerabilities in Paimon itself.
Examples:
- wrong byte offsets or stale decoded values in file formats
- incorrect merge-tree compaction producing wrong query results
- race conditions or logic bugs that do not create a new trust-boundary violation
- snapshot or schema version conflicts that produce incorrect metadata
Malformed-input crashes, raw runtime exceptions from invalid JSON or Avro data, and memory amplification from oversized manifests or schemas are usually treated as robustness or hardening work rather than security issues in Paimon itself.
Reports that require a malicious catalog, metastore, REST Catalog server, or other external service are usually outside Paimon's primary security boundary.
Examples:
- a REST Catalog server returning a data token with overly broad storage permissions
- a Hive Metastore returning a table location pointing to a sensitive path
- a REST Catalog server returning malicious row filters designed to extract data through side channels
If the actor already has a legitimate capability that can cause the same harm, the new path is usually not a security issue. This often applies to writers or maintainers who already control metadata layout, file layout, or destructive maintenance operations (snapshot expiration, orphan file cleanup, branch deletion).
Resource exhaustion caused by legitimate but expensive operations (e.g., large compaction, scanning many partitions, listing all snapshots) is usually treated as an operational concern rather than a security vulnerability.
Paimon's REST Catalog client supports pluggable authentication through the
AuthProvider interface.
Authentication providers are created via the AuthProviderFactory SPI, loaded
using Java's ServiceLoader mechanism based on the token.provider
configuration. Each catalog instance creates its own authentication provider,
which must not share mutable state with providers belonging to other catalog
instances in the same process.
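Python has no ServiceLoader, so in this hedged sketch a plain dictionary from identifier to factory stands in for SPI discovery. The provider class, the `example-bearer` identifier, and the `example.token` option are hypothetical. The sketch shows a fresh provider per catalog, with all mutable state held on the instance.

```python
from typing import Callable, Dict


class ExampleBearerAuthProvider:
    """Hypothetical provider; all mutable state lives on the instance."""

    def __init__(self, options: Dict[str, str]):
        # Build per-instance state from this catalog's options only.
        self._headers = {"Authorization": "Bearer " + options["example.token"]}

    def auth_headers(self) -> Dict[str, str]:
        return dict(self._headers)  # a copy, so callers cannot mutate our state


# Stand-in for ServiceLoader discovery: identifier -> factory callable.
PROVIDER_FACTORIES: Dict[str, Callable[[Dict[str, str]], object]] = {
    "example-bearer": ExampleBearerAuthProvider,
}


def create_auth_provider(options: Dict[str, str]):
    """Look up the factory named by token.provider and build a fresh provider."""
    identifier = options["token.provider"]
    if identifier not in PROVIDER_FACTORIES:
        raise ValueError("unknown token.provider: " + identifier)
    return PROVIDER_FACTORIES[identifier](options)


provider_1 = create_auth_provider({"token.provider": "example-bearer", "example.token": "t1"})
provider_2 = create_auth_provider({"token.provider": "example-bearer", "example.token": "t2"})
```

Only the factory registry is shared. A class-level header dictionary shared by every provider would be the Python equivalent of the process-global auth state this model warns about.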
When data-token.enabled is true, RESTTokenFileIO manages delegated
storage credentials:
- The client calls the table token endpoint to obtain a time-limited data token
- The token is cached and used to construct a FileIO instance for storage access
- Tokens are refreshed before expiration (1 hour safe time margin by default)
- FileIO instances are cached in a process-global cache (FILE_IO_CACHE) keyed by RESTToken, with a maximum size of 1000 entries and 10-hour expiry
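The FILE_IO_CACHE behavior in the list above can be modeled as a size-bounded, time-bounded cache whose key covers the whole token. In this Python sketch, an OrderedDict-based LRU stands in for whatever cache library the Java code uses. `TokenKeyedCache` and `token_key` are hypothetical names. This document does not state whether real expiry is measured from write or from last access, so this sketch uses insertion time.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class TokenKeyedCache:
    """Size-bounded, time-bounded cache keyed by the full token.

    Defaults mirror the documented limits (1000 entries, 10 hours);
    expiry is measured from insertion in this sketch.
    """

    def __init__(self, max_size: int = 1000, ttl_seconds: float = 10 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (inserted_at, value)

    def get(self, key: Hashable, loader: Callable[[Hashable], object]) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(key)        # mark as recently used
            return entry[1]
        value = loader(key)                       # e.g. build a FileIO for this token
        self._entries[key] = (now, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)     # evict least recently used
        return value


def token_key(credentials: dict, expire_at_millis: int):
    """Hashable key over token content and expiration."""
    return (frozenset(credentials.items()), expire_at_millis)


fake_now = [0.0]
cache = TokenKeyedCache(max_size=2, ttl_seconds=100, clock=lambda: fake_now[0])
key_a = token_key({"access-key-id": "A"}, 1_000)
key_b = token_key({"access-key-id": "B"}, 1_000)
io_a = cache.get(key_a, lambda k: object())
io_b = cache.get(key_b, lambda k: object())
```

Because the key includes both credential content and expiration, two principals can share a cached FileIO only if they hold an identical token.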
Security-relevant invariants:
- Data tokens must be scoped to specific tables by the server
- The FILE_IO_CACHE keys on the full RESTToken (token content + expiration), so different tokens produce different FileIO instances
- Token refresh creates a new RESTApi instance from the catalog context if the original instance is unavailable (e.g., after deserialization)
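The third invariant, and the earlier concern about credentials surviving in serialized state, can be illustrated by excluding the live client from serialized state and rebuilding it from the catalog context on first use. `TokenFileIOSketch`, `FakeRestApi`, and `CatalogContext` are hypothetical Python stand-ins. Python has no `transient` keyword, so `__getstate__` plays that role here.

```python
import copy


class CatalogContext:
    """Hypothetical serializable context: just the catalog options."""

    def __init__(self, options: dict):
        self.options = dict(options)


class FakeRestApi:
    """Hypothetical live client; stands in for RESTApi and its auth state."""

    created = 0

    def __init__(self, context: CatalogContext):
        FakeRestApi.created += 1
        self.context = context


class TokenFileIOSketch:
    def __init__(self, context: CatalogContext):
        self.context = context
        self._api = None  # live client, created lazily

    def api(self) -> FakeRestApi:
        # After deserialization _api is None, so a new client is built from
        # the catalog context instead of reusing a copied one.
        if self._api is None:
            self._api = FakeRestApi(self.context)
        return self._api

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_api"] = None  # like a Java transient field: not serialized
        return state


original = TokenFileIOSketch(CatalogContext({"uri": "https://catalog.example.com"}))
original_api = original.api()
# copy.deepcopy goes through the same __getstate__ hook that pickle uses.
restored = copy.deepcopy(original)
```

The catalog context still travels with the object, so anything it carries must be within the intended scope of the serialized state.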
Paimon supports Kerberos authentication for Hadoop-based deployments through
SecurityContext and SecurityConfiguration. Keytab paths and principals
are treated as trusted operator configuration.
A scanner targeting Paimon should treat a finding as higher-confidence only if it plausibly shows one of the following:
- exposure of a secret or delegated credential to a new audience
- creation of a new unauthorized capability in a Paimon-owned component
- violation of a Paimon-owned trust boundary (e.g., cross-catalog credential leak, cross-table data token reuse)
A finding should be downgraded or rejected by default if it instead depends primarily on:
- malformed-input robustness or denial-of-service behavior
- a malicious catalog, metastore, REST Catalog server, or external service
- a principal that already has equivalent power through legitimate write or maintenance capabilities
- operator misconfiguration (overly broad credentials, missing TLS, etc.)
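A scanner could encode the criteria above as a small decision function. This Python sketch is hypothetical: the field names, the `triage` function, and the extra "needs-review" bucket are illustrative choices. The bucket exists because the criteria turn on what a finding "depends primarily on", and that call needs human judgment when signals conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical yes/no answers a scanner or triager records per report."""
    exposes_secret_to_new_audience: bool = False
    creates_unauthorized_capability: bool = False
    violates_paimon_owned_boundary: bool = False
    only_malformed_input_or_dos: bool = False
    needs_malicious_external_service: bool = False
    actor_has_equivalent_power: bool = False
    needs_operator_misconfiguration: bool = False


def triage(finding: Finding) -> str:
    positive = (finding.exposes_secret_to_new_audience
                or finding.creates_unauthorized_capability
                or finding.violates_paimon_owned_boundary)
    downgrade = (finding.only_malformed_input_or_dos
                 or finding.needs_malicious_external_service
                 or finding.actor_has_equivalent_power
                 or finding.needs_operator_misconfiguration)
    if positive and not downgrade:
        return "higher-confidence"
    if positive:
        return "needs-review"  # mixed signals: a human decides what dominates
    return "downgrade"


print(triage(Finding(violates_paimon_owned_boundary=True)))  # higher-confidence
```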