From 939b03ea55cd1bfe81b41cdb49ff5d45a4030111 Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Sun, 31 May 2026 15:47:49 +0300
Subject: [PATCH] Use the :abbr: role for BMP (Basic Multilingual Plane)

---
 Doc/library/idle.rst | 2 +-
 Doc/library/pyexpat.rst | 3 ++-
 Doc/whatsnew/3.16.rst | 3 ++-
 Doc/whatsnew/3.3.rst | 3 ++-
 Doc/whatsnew/3.4.rst | 3 ++-
 Doc/whatsnew/3.8.rst | 3 ++-
 .../next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst | 2 +-
 7 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/Doc/library/idle.rst b/Doc/library/idle.rst
index c7c30e5300c2a4..fbca2a0703a8aa 100644
--- a/Doc/library/idle.rst
+++ b/Doc/library/idle.rst
@@ -864,7 +864,7 @@ A Windows console, for instance, keeps a user-settable 1 to 9999 lines,
 with 300 the default.

 A Tk Text widget, and hence IDLE's Shell, displays characters (codepoints) in
-the BMP (Basic Multilingual Plane) subset of Unicode. Which characters are
+the :abbr:`BMP (Basic Multilingual Plane)` subset of Unicode. Which characters are
 displayed with a proper glyph and which with a replacement box depends on the
 operating system and installed fonts. Tab characters cause the following text
 to begin after the next tab stop. (They occur every 8 'characters'). Newline
diff --git a/Doc/library/pyexpat.rst b/Doc/library/pyexpat.rst
index c88411ce0b7b91..b390cb918bb4df 100644
--- a/Doc/library/pyexpat.rst
+++ b/Doc/library/pyexpat.rst
@@ -76,7 +76,8 @@ The :mod:`!xml.parsers.expat` module contains two functions:
    For other encodings (including aliases like Latin1 and ASCII) it falls
    back to Python.
    It supports most of 8-bit encodings and many multi-byte encodings
-   like Shift_JIS, although only BMP characters (``U+0000-U+FFFF``)
+   like Shift_JIS, although only the :abbr:`BMP (Basic Multilingual Plane)`
+   characters (U+0000 through U+FFFF)
    are supported with non-native encodings (this restriction is also
    applied to aliases like UTF8).
    These restrictions only apply if *encoding* is not given.
diff --git a/Doc/whatsnew/3.16.rst b/Doc/whatsnew/3.16.rst
index 9a0a0d3d8831f5..669d9402a85124 100644
--- a/Doc/whatsnew/3.16.rst
+++ b/Doc/whatsnew/3.16.rst
@@ -115,7 +115,8 @@ xml
 * Add support for multiple multi-byte encodings in the :mod:`XML parser
   `: "cp932", "cp949", "cp950", "Big5","EUC-JP", "GB2312",
   "GBK", "johab", and "Shift_JIS".
-  Add partial support (only BMP characters) for multi-byte encodings
+  Add partial support (only the :abbr:`BMP (Basic Multilingual Plane)`
+  characters) for multi-byte encodings
   "Big5-HKSCS", "EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004",
   "Shift_JISX0213", "utf-8-sig" and non-standard aliases like "UTF8"
   (without hyphen).
diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst
index 1bb79bce2c3e97..79010cfec62957 100644
--- a/Doc/whatsnew/3.3.rst
+++ b/Doc/whatsnew/3.3.rst
@@ -262,7 +262,8 @@ The storage of Unicode strings now depends on the highest code point in the stri

 * pure ASCII and Latin1 strings (``U+0000-U+00FF``) use 1 byte per code point;

-* BMP strings (``U+0000-U+FFFF``) use 2 bytes per code point;
+* :abbr:`BMP (Basic Multilingual Plane)` strings (``U+0000-U+FFFF``) use
+  2 bytes per code point;

 * non-BMP strings (``U+10000-U+10FFFF``) use 4 bytes per code point.

diff --git a/Doc/whatsnew/3.4.rst b/Doc/whatsnew/3.4.rst
index a390211ddb5021..63cfcf7a40c996 100644
--- a/Doc/whatsnew/3.4.rst
+++ b/Doc/whatsnew/3.4.rst
@@ -418,7 +418,8 @@ Some smaller changes made to the core Python language are:
 * All the UTF-\* codecs (except UTF-7) now reject surrogates during both
   encoding and decoding unless the ``surrogatepass`` error handler is used,
   with the exception of the UTF-16 decoder (which accepts valid surrogate pairs)
-  and the UTF-16 encoder (which produces them while encoding non-BMP characters).
+  and the UTF-16 encoder (which produces them while encoding characters that
+  are not in the :abbr:`BMP (Basic Multilingual Plane)`).
   (Contributed by Victor Stinner, Kang-Hao (Kenny) Lu and Serhiy Storchaka in
   :issue:`12892`.)

diff --git a/Doc/whatsnew/3.8.rst b/Doc/whatsnew/3.8.rst
index 5078fc30ac111e..bb792f7c5e7706 100644
--- a/Doc/whatsnew/3.8.rst
+++ b/Doc/whatsnew/3.8.rst
@@ -868,7 +868,8 @@ window are shown and hidden in the Options menu.
 (Contributed by Tal Einat and Saimadhav Heblikar in :issue:`17535`.)

 OS native encoding is now used for converting between Python strings and Tcl
-objects. This allows IDLE to work with emoji and other non-BMP characters.
+objects. This allows IDLE to work with emoji and other characters that are not
+in the :abbr:`BMP (Basic Multilingual Plane)`.
 These characters can be displayed or copied and pasted to or from the
 clipboard. Converting strings from Tcl to Python and back now never fails.
 (Many people worked on this for eight years but the problem was finally
diff --git a/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst b/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst
index d0af77366378b8..ed8d2f52c0dc1b 100644
--- a/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst
+++ b/Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst
@@ -1,6 +1,6 @@
 Add support for multiple multi-byte encodings in the :mod:`XML parser
 `: "cp932", "cp949", "cp950", "Big5","EUC-JP", "GB2312",
-"GBK", "johab", and "Shift_JIS". Add partial support (only BMP characters)
+"GBK", "johab", and "Shift_JIS". Add partial support (only the BMP characters)
 for multi-byte encodings "Big5-HKSCS", "EUC_JIS-2004", "EUC_JISX0213",
 "Shift_JIS-2004", "Shift_JISX0213", "utf-8-sig" and non-standard aliases like
 "UTF8" (without hyphen). The parser now raises :exc:`ValueError` for