src: refactor EncodeValidUtf8 to reuse non-ASCII UTF-8 path#63587
src: refactor EncodeValidUtf8 to reuse non-ASCII UTF-8 path#63587araujogui wants to merge 3 commits into
Conversation
92897bc to
a2ccfb5
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR optimizes UTF-8 decoding in the encoding binding by avoiding redundant UTF-8 validation when input has already been validated.
Changes:
- Added
StringBytes::EncodeValidatedUTF8()API for encoding already-validated UTF-8. - Refactored the fast UTF-8→UTF-16 conversion into a shared helper (
EncodeKnownValidNonAsciiUTF8). - Updated
BindingData::DecodeUTF8()to use the new API after performing UTF-8 validation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/string_bytes.h | Declares a new EncodeValidatedUTF8() entry point for already-validated UTF-8. |
| src/string_bytes.cc | Implements EncodeValidatedUTF8() and factors out the fast conversion helper used by both code paths. |
| src/encoding_binding.cc | Uses EncodeValidatedUTF8() after explicit UTF-8 validation to avoid re-validation overhead. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #63587 +/- ##
==========================================
+ Coverage 90.32% 90.34% +0.02%
==========================================
Files 732 732
Lines 236435 236444 +9
Branches 44527 44524 -3
==========================================
+ Hits 213563 213621 +58
+ Misses 14589 14521 -68
- Partials 8283 8302 +19
🚀 New features to boost your workflow:
|
727b683 to
6a919f6
Compare
|
CC @nodejs/performance |
6a919f6 to
183258a
Compare
|
I also tried to fix this in #63231, but this pr more generally, good work! |
Uh, my bad, I haven't seen it. I'm okay with closing this PR in favour of yours. |
|
No problem at all, thanks for your effort. This PR seems like a better solution to me, @addaleax, Do you have any thoughts? |
Signed-off-by: Guilherme Araújo <arauujogui@gmail.com>
Signed-off-by: Guilherme Araújo <arauujogui@gmail.com>
b711577 to
e41315d
Compare
|
@mertcanaltin @addaleax I rebased the branch, it's just a small DRY refactoring now |
aabbe55 to
c02c51d
Compare
Upstream PR #63231 added
StringBytes::EncodeValidUtf8but duplicated theEncodeTwoByteStringblock already present inStringBytes::Encode.This PR refactors that by extracting a shared
EncodeValidNonAsciiUtf8helper, called from bothEncodeValidUtf8and the existingEncodeUTF-8 path.