extract_url_auth does not percent-decode URL userinfo, breaking HTTP Basic auth for usernames containing '@' (e.g. SAML/SSO email identities) #351

@abhisheksaxena29

Summary

extract_url_auth in src/manage/urlutils.py returns the userinfo
fields from winhttp_urlsplit without percent-decoding them, and
_basic_auth_header then base64-encodes the raw string as-is.
RFC 7617 (HTTP Basic Authentication) builds the header from the
actual user-id and password, so the URL percent-encoding has to be
undone first. As a result, authentication fails against any private
runtime feed whose username must be percent-encoded in URLs, most
commonly because of an @ in email-format SAML/SSO usernames.

Steps to reproduce

  1. Stand up a private runtime feed (e.g. an internal Artifactory /
    Nexus / S3 generic mirror) protected by HTTP Basic auth, where
    the authenticating user's name contains @
    (e.g. user@example.com — typical for SAML-provisioned users).

  2. Configure pymanager with that feed:

    https://user%40example.com:password@feed.example.com/index.json

    (%40 is the RFC 3986 §3.2.1 percent-encoding of @, required
    because @ is the userinfo/host delimiter and is not in the
    userinfo character set.)

  3. Run pymanager install ... against this feed.
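
For step 2, the encoded URL can be generated rather than typed by hand. This is a minimal sketch using only the standard library; build_feed_url is a hypothetical helper for illustration and is not part of pymanager.

```python
from urllib.parse import quote

def build_feed_url(username, password, host, path):
    # safe="" makes quote() percent-encode "@", ":" and "/" as well,
    # which is what the userinfo part of a URL needs.
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"https://{user}:{pw}@{host}{path}"

url = build_feed_url("user@example.com", "password",
                     "feed.example.com", "/index.json")
print(url)
```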

Expected

pymanager sends:

Authorization: Basic <base64("user@example.com:password")>

…and authentication succeeds (this matches what pip and Poetry
do; see "Prior art" below).

Actual

pymanager sends:

Authorization: Basic <base64("user%40example.com:password")>

The server receives the literal string user%40example.com as the
username, which does not match the stored user, and returns 401.
The unencoded URL form (https://user@example.com:password@host/...)
also fails because WinHttpCrackUrl returns
ERROR_WINHTTP_INVALID_URL (12005) on unencoded @ in userinfo —
which the codebase already catches in sanitise_url. So users
have no working URL form today when the username contains @.
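
The difference between the two headers can be reproduced outside pymanager. The sketch below mirrors the _basic_auth_header logic quoted under "Root cause"; basic_token is a stand-in name used only for this illustration.

```python
from base64 import b64decode, b64encode
from urllib.parse import unquote

def basic_token(username, password):
    # Same construction as _basic_auth_header, minus the "Basic " prefix.
    pair = f"{username}:{password}".encode("utf-8")
    return b64encode(pair).decode("ascii")

raw_user = "user%40example.com"   # userinfo as it appears in the URL
password = "password"

actual = basic_token(raw_user, password)             # what is sent today
expected = basic_token(unquote(raw_user), password)  # what the server expects

print("Actual:   Basic", actual)
print("Expected: Basic", expected)
print("Server sees user:", b64decode(actual).decode("utf-8").split(":", 1)[0])
```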

Root cause

In src/manage/urlutils.py:

def extract_url_auth(url):
    if not url:
        return url
    p = winhttp_urlsplit(url)
    user, passw = p[U_USERNAME], p[U_PASSWORD]
    if user or passw:
        return user or "", passw or ""
    return None

On the native Windows path, winhttp_urlsplit wraps
WinHttpCrackUrl, which by default returns the URL_COMPONENTS
fields lpszUserName / lpszPassword without percent-decoding
(that only happens when the ICU_DECODE flag is passed). The
values are then passed verbatim to _basic_auth_header:

def _basic_auth_header(username, password):
    from base64 import b64encode
    pair = f"{username}:{password}".encode("utf-8")
    token = b64encode(pair)
    return "Basic " + token.decode("ascii")

— no decoding step at any point.
(The pure-Python fallback for winhttp_urlsplit is not exempt:
urllib.parse.urlsplit().username returns the userinfo as written
and does not percent-decode it, so a fallback built on it needs
the same decoding step as the native path.)
A hint that the maintainers already expect literal % characters
from native winhttp_urlsplit: sanitise_url in the same file has
explicit logic for passwords wrapped in %:

pw = p[U_PASSWORD]
if pw and not (pw.startswith("%") and pw.endswith("%")):
    p[U_PASSWORD] = None

Affected users

Anyone deploying pymanager with a private runtime feed where the
authenticating identity has @ in the username — i.e. virtually
every enterprise that uses SAML/SSO with email-format usernames
and mirrors python.org's index-windows.json through an internal
artifact repository. This is a very common pattern under
PEP 773's "alternative feed" model.

Suggested fix

Small change in extract_url_auth: percent-decode user and
password before returning:

def extract_url_auth(url):
    if not url:
        return url
    p = winhttp_urlsplit(url)
    user, passw = p[U_USERNAME], p[U_PASSWORD]
    if user or passw:
        import urllib.parse
        return (urllib.parse.unquote(user or ""),
                urllib.parse.unquote(passw or ""))
    return None

This matches what the rest of the Python ecosystem already does
client-side (see prior art).
Happy to send a PR if helpful.
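
A regression test for the fix could look roughly like the following. It is a sketch under assumptions: decode_auth is a hypothetical stand-in for the decoding step inside extract_url_auth, and it takes the raw user/password strings directly so that it runs without winhttp_urlsplit.

```python
from urllib.parse import unquote

def decode_auth(user, passw):
    """Percent-decode raw userinfo fields; None is treated as empty."""
    return unquote(user or ""), unquote(passw or "")

cases = [
    # (raw user, raw password) -> (decoded user, decoded password)
    (("user%40example.com", "password"), ("user@example.com", "password")),
    (("alice", "p%3Ass%25word"), ("alice", "p:ss%word")),
    (("alice", "a+b"), ("alice", "a+b")),   # "+" must stay a plus sign
    (("alice", None), ("alice", "")),
]

for raw, want in cases:
    assert decode_auth(*raw) == want, (raw, want)
print("all cases passed")
```

unquote is used rather than unquote_plus because userinfo is not form-encoded; unquote_plus would turn a literal + in a password into a space.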

Prior art (this is a well-established convention)

  • pip: pypa/pip#3236
    (2015, fixed in pip 10.0.0) and
    pypa/pip#6775 (2019,
    closing comment by @pradyunsg): "pip 19.2 now requires
    credentials to be URL quoted. Thus, characters like @ and %
    need to be URL quoted (like %40 and %25)."
  • pip docs (current):
    Authentication — Basic HTTP authentication
    explicitly documents %XX encoding for special characters in
    credentials and percent-decodes them client-side.
  • Poetry: python-poetry/poetry#686,
    fixed in #1402.
  • Stack Overflow (70 upvotes, 64k views):
    Escaping username characters in basic auth URLs
    — accepted answer cites RFC 3986 §3.2.1 for %40 as the correct
    form.
  • RFC 3986 §3.2.1 — userinfo grammar requires percent-encoding
    for characters outside unreserved / pct-encoded / sub-delims / ":".
  • RFC 7617 — HTTP Basic Authentication header carries decoded
    user-id and password.

Environment

  • pymanager: [latest]
  • Windows: [latest]
  • Feed server: [Artifactory]

Labels

bug: Something isn't working