Skip to content

asyncio.protocols._feed_data_to_buffered_proto: avoid O(N^2) bytes slicing #150621

@deadlovelll

Description

@deadlovelll

Bug report

Bug description:

asyncio.protocols._feed_data_to_buffered_proto feeds incoming data into a BufferedProtocol reusable buffer. The current multi-iteration path is O(N^2) because it slices data as bytes on every iteration:

# Lib/asyncio/protocols.py:200-216
def _feed_data_to_buffered_proto(proto, data):
    data_len = len(data)
    while data_len:
        buf = proto.get_buffer(data_len)
        buf_len = len(buf)
        if not buf_len:
            raise RuntimeError('get_buffer() returned an empty buffer')

        if buf_len >= data_len:
            buf[:data_len] = data
            proto.buffer_updated(data_len)
            return
        else:
            buf[:buf_len] = data[:buf_len]
            proto.buffer_updated(buf_len)
            data = data[buf_len:]
            data_len = len(data)

data = data[buf_len:] executes N / buf_len times, each time recopying the remaining tail.

Proposed fix

Track the offset into data manually instead of reslicing:

def _feed_data_to_buffered_proto(proto, data):
    data_len = len(data)
    start = 0
    while data_len:
        buf = proto.get_buffer(data_len)
        buf_len = len(buf)
        if not buf_len:
            raise RuntimeError('get_buffer() returned an empty buffer')

        if buf_len >= data_len:
            buf[:data_len] = data[start:start + data_len] if start else data
            proto.buffer_updated(data_len)
            return
        else:
            buf[:buf_len] = data[start:start + buf_len]
            proto.buffer_updated(buf_len)
            start += buf_len
            data_len -= buf_len

Benchmarks

Benchmarks made via pyperf

+-------------------------+---------+-------------------------+
| Benchmark               | old     | new                     |
+=========================+=========+=========================+
| data=   65536 buf= 4096 | 48.7 us | 16.8 us: 2.89x faster   |
+-------------------------+---------+-------------------------+
| data=  262144 buf= 4096 | 637 us  | 67.4 us: 9.45x faster   |
+-------------------------+---------+-------------------------+
| data= 1048576 buf= 4096 | 8.28 ms | 269 us: 30.77x faster   |
+-------------------------+---------+-------------------------+
| data= 4194304 buf= 4096 | 205 ms  | 1.08 ms: 189.36x faster |
+-------------------------+---------+-------------------------+
| data= 1048576 buf=65536 | 741 us  | 237 us: 3.13x faster    |
+-------------------------+---------+-------------------------+
| data=     100 buf= 4096 | 558 ns  | 571 ns: 1.02x slower    |
+-------------------------+---------+-------------------------+
| data=    1024 buf= 4096 | 636 ns  | 641 ns: 1.01x slower    |
+-------------------------+---------+-------------------------+
| data=    4096 buf= 4096 | 812 ns  | 823 ns: 1.01x slower    |
+-------------------------+---------+-------------------------+
| data=   32768 buf=65536 | 2.53 us | 2.55 us: 1.01x slower   |
+-------------------------+---------+-------------------------+
| Geometric mean          | (ref)   | 4.27x faster            |
+-------------------------+---------+-------------------------+

The scenarios when buf_len >= data_len are not regressed. The 1.01-1.02x differences can be treated as statistical noise

Full benchmark code I'll pin in the PR

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows, macOS

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    performancePerformance or resource usagestdlibStandard Library Python modules in the Lib/ directorytopic-asynciotype-featureA feature request or enhancement
    No fields configured for issues without a type.

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions