Improve performance for tables with composite primary keys#1686
Improve performance for tables with composite primary keys#1686coding-chimp wants to merge 4 commits into
Conversation
| col2StartOp := string(minRangeComparisonSign) | ||
| fromClause := fmt.Sprintf("%s.%s force index (%s)", databaseName, originalTableName, uniqueKey) | ||
|
|
||
| if sameFirstColumnValue(rangeStartArgs, rangeEndArgs) { |
There was a problem hiding this comment.
sameFirstColumnValue reports whether the first column's range start and end bounds are equal. When true, the range fits within a single col1 partition, and a simpler query must replace the 3-part UNION.
It's not just a performance shortcut, it's a correctness requirement. Look at what the UNION produces when col1_start == col1_end:
- Part 1:
col1 = x AND col2 > start_col2- no upper bound oncol2, so it includes all rows, including rows aboveend_col2 - Part 2:
col1 > x AND col1 < x- returns nothing - Part 3:
col1 = x AND col2 <= end_col2- no lower bound, so it includes rows belowstart_col2
The UNION ALL result is essentially every row where col1 = x. The outer ORDER BY ... LIMIT 1 OFFSET chunkSize-1 then picks the Nth row, which is the wrong boundary.
The simplified query fixes this:
WHERE col1 = ? AND col2 > ? AND col2 <= ?
Both bounds are explicit, and because col1 is pinned with equality, MySQL can use the composite index directly as a seek on (col1, col2).
| return result, explodedArgs, nil | ||
| } | ||
|
|
||
| func buildUniqueKeyRangeEndTwoColumnViaTemptable( |
There was a problem hiding this comment.
The UNION optimization is arguably unnecessary for the ViaTemptable code path, since it'll only be called once at the end of the migration, when the query is unlikely to be slow. However, it provides consistency with ViaOffset and doesn't add much code.
There was a problem hiding this comment.
Pull request overview
This PR optimizes gh-ost chunk pagination/copy queries for tables that paginate on two-column composite unique keys by switching from a single composite range predicate to a 3-part UNION ALL pattern (with a single-range fast-path when the first-column boundary is the same). This targets the MySQL optimizer behavior that can cause extremely slow scans for composite key open ranges.
Changes:
- Add two-column optimized query generation in
BuildRangeInsertQuery,BuildUniqueKeyRangeEndPreparedQueryViaOffset, andBuildUniqueKeyRangeEndPreparedQueryViaTemptable. - Introduce helper functions (
sameFirstColumnValue,buildTwoColumnUnionParts,newTwoColumnRangeMeta) to generate the new query pattern and enum-safe ORDER BY clauses. - Expand unit tests to cover the new two-column paths (UNION vs single-range, locking clause variations, enum ordering).
Show a summary per file
| File | Description |
|---|---|
| go/sql/builder.go | Adds two-column fast-path query builders using a UNION-based pattern and supporting helpers. |
| go/sql/builder_test.go | Updates/extends tests to validate the new two-column query strings and argument expansion. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 3
| func sameFirstColumnValue(rangeStartArgs, rangeEndArgs []interface{}) bool { | ||
| return fmt.Sprintf("%v", rangeStartArgs[0]) == fmt.Sprintf("%v", rangeEndArgs[0]) | ||
| } |
| m := newTwoColumnRangeMeta(uniqueKeyColumns) | ||
| col2StartOp := string(startRangeComparisonSign) | ||
| selectClause := m.col1Name + ", " + m.col2Name | ||
| fromClause := databaseName + "." + tableName | ||
| partSuffix := fmt.Sprintf("order by %s limit %d", m.orderByAsc, chunkSize) | ||
|
|
||
| if sameFirstColumnValue(rangeStartArgs, rangeEndArgs) { |
| m := newTwoColumnRangeMeta(uniqueKeyColumns) | ||
| col2StartOp := string(startRangeComparisonSign) | ||
| selectClause := m.col1Name + ", " + m.col2Name | ||
| fromClause := databaseName + "." + tableName | ||
| partSuffix := fmt.Sprintf("order by %s limit %d", m.orderByAsc, chunkSize) | ||
|
|
||
| if sameFirstColumnValue(rangeStartArgs, rangeEndArgs) { |
A Pull Request should be associated with an Issue.
Related issue: #1669
Description
This PR introduces an optimized query pattern for tables with composite primary keys. As described in the linked issue, gh-ost's queries against these tables can currently be very slow. We've seen a single query take up to 16 minutes, resulting in slow migrations that might not even finish when gh-ost falls behind in binlog streaming. The new query pattern has a constant query time (~30ms in our testing) regardless of the table's data shape. We're only introducing it for two-column composite keys because the complexity would grow exponentially with more columns.
We're introducing the new query pattern in
BuildRangeInsertQuery,BuildUniqueKeyRangeEndPreparedQueryViaOffset,BuildUniqueKeyRangeEndPreparedQueryViaTemptable.I've tested the changes against real data on a test DB, and the migration took ~12h to complete. In comparison, I'm currently running the same migration again without these changes, and it has only reached 15.6% after 22h. The current ETA is 87h.
script/cibuildreturns with no formatting errors, build errors or unit test errors.