[spark] Support ON_ERROR = CONTINUE / SKIP_FILE in COPY INTO #8062

Open
JunRuiLee wants to merge 4 commits into apache:master from JunRuiLee:copy-into-on-error-clean
@JunRuiLee JunRuiLee commented Jun 1, 2026

Motivation

This is part of #8005.

COPY INTO previously only supported ON_ERROR = ABORT_STATEMENT: any parse or
cast error aborted the entire command. In production data-loading pipelines a
single malformed row or file would then fail the whole batch, which is often
too strict. This adds two error-tolerant modes:

  • CONTINUE — skip bad rows and load the rest (row-level tolerance).
  • SKIP_FILE — skip any file that contains an error, all-or-nothing per file.

ABORT_STATEMENT remains the default, so existing behavior is unchanged.
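
The difference between the three modes can be sketched in plain Python. This is only a toy model of the semantics described above, not the code in this PR: `parse_row`, `copy_into`, and the dict-of-files input are hypothetical names made up for illustration, and a "bad row" is assumed to be one that fails an integer cast.

```python
from typing import Dict, List

MODES = {"ABORT_STATEMENT", "CONTINUE", "SKIP_FILE"}


def parse_row(raw: str) -> int:
    # Hypothetical stand-in for the real parse/cast step: a row is "bad"
    # when it cannot be cast to an integer.
    return int(raw)


def copy_into(files: Dict[str, List[str]], on_error: str = "ABORT_STATEMENT") -> List[int]:
    """Return the rows that would be loaded under the given ON_ERROR mode."""
    if on_error not in MODES:
        raise ValueError(f"unsupported ON_ERROR mode: {on_error}")
    loaded: List[int] = []
    for _name, raw_rows in files.items():
        good: List[int] = []
        file_has_error = False
        for raw in raw_rows:
            try:
                good.append(parse_row(raw))
            except ValueError:
                if on_error == "ABORT_STATEMENT":
                    raise  # any error fails the whole command; nothing is returned
                file_has_error = True  # tolerant modes note the error and keep scanning
        if on_error == "SKIP_FILE" and file_has_error:
            continue  # all-or-nothing per file: the good rows are dropped too
        loaded.extend(good)
    return loaded


files = {"a.csv": ["1", "2"], "b.csv": ["3", "x", "5"]}
print(copy_into(files, "CONTINUE"))   # [1, 2, 3, 5]
print(copy_into(files, "SKIP_FILE"))  # [1, 2]
```

With the default mode, the same input raises on the bad row in `b.csv` and loads nothing, which matches the pre-existing behavior.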

Changes

  • Grammar: ON_ERROR now accepts CONTINUE and SKIP_FILE in addition to
    ABORT_STATEMENT.
  • Result schema gains two columns:
    • errors_seen (BIGINT): number of error rows per file.
    • first_error (STRING): first error message, NULL when the file is clean.
  • The existing status column now also reports PARTIALLY_LOADED and LOAD_FAILED.
  • Error detection runs once per batch; both modes write in a single commit.
    Load history is recorded so error-tolerant runs stay idempotent under
    FORCE = FALSE.
  • Refactor: CopyIntoTableExec is split into focused helpers
    (CopyIntoHelper, CopyIntoCastValidator, CopyIntoDataFrameBuilder,
    CopyIntoErrorHandler, CopyIntoResultBuilder), shared across CSV/JSON/Parquet.
  • Docs updated in sql-write.md, including the CSV column-count-mismatch caveat
    under CONTINUE.

Supported for CSV, JSON, and Parquet.
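
As a rough illustration of the per-file result row, here is a small Python sketch. Only `errors_seen`, `first_error`, `status`, `PARTIALLY_LOADED`, and `LOAD_FAILED` come from this description. The `file` and `rows_loaded` fields, the `LOADED` status name, and the mapping from mode to status (CONTINUE with errors gives PARTIALLY_LOADED, SKIP_FILE with errors gives LOAD_FAILED) are assumptions made for the example and may not match the actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileResult:
    file: str                   # assumed column, for illustration only
    status: str
    rows_loaded: int            # assumed column, for illustration only
    errors_seen: int            # number of error rows in this file
    first_error: Optional[str]  # None (NULL) when the file is clean


def summarize(file: str, good_rows: int, errors: List[str], on_error: str) -> FileResult:
    """Build one result row from a file's good-row count and error messages."""
    errors_seen = len(errors)
    first_error = errors[0] if errors else None
    if errors_seen == 0:
        status, rows_loaded = "LOADED", good_rows  # "LOADED" is an assumed name
    elif on_error == "CONTINUE":
        # Assumed mapping: bad rows are skipped, the rest of the file is loaded.
        status, rows_loaded = "PARTIALLY_LOADED", good_rows
    elif on_error == "SKIP_FILE":
        # Assumed mapping: the whole file is skipped, so nothing is loaded.
        status, rows_loaded = "LOAD_FAILED", 0
    else:
        raise ValueError(f"errors are not tolerated under ON_ERROR = {on_error}")
    return FileResult(file, status, rows_loaded, errors_seen, first_error)


row = summarize("b.csv", good_rows=2, errors=["cannot cast 'x' to INT"], on_error="CONTINUE")
print(row.status, row.rows_loaded, row.errors_seen, row.first_error)
```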

@JunRuiLee JunRuiLee force-pushed the copy-into-on-error-clean branch from 673a860 to 2ca4f14 Compare June 1, 2026 11:46
@JunRuiLee JunRuiLee force-pushed the copy-into-on-error-clean branch from 2ca4f14 to 71f172a Compare June 1, 2026 13:40