Bug report
Bug description:
The binary format defines total_samples as just u32 (cpython/Modules/_remote_debugging/binary_io.h, line 269 and lines 53 to 54 in 540b3d0):

    #define HDR_SIZE_SAMPLES 4
    #define HDR_OFF_THREADS (HDR_OFF_SAMPLES + HDR_SIZE_SAMPLES)

That's not much. At a sampling rate of just 100 kHz, with one counted sample per thread per tick:

| Threads | Overflow after |
| --- | --- |
| 1 | ~11.9 h |
| 4 | ~3.0 h |
| 10 | ~71 min |
| 64 | ~11 min |

That is especially limiting if we aim for continuous profiling of real production systems. But even on macOS, I'm already observing ~0.5-1 MHz on my mach_vm_remap branch.
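
The overflow times can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes the counter wraps after 2**32 samples and that every thread contributes one counted sample per sampling tick. `seconds_until_overflow` is a hypothetical helper name, not part of the profiler.

```python
U32_RANGE = 2**32  # number of distinct values in a 4-byte unsigned counter


def seconds_until_overflow(rate_hz: float, threads: int = 1) -> float:
    """Seconds until a u32 sample counter is exhausted.

    Assumption: each thread adds one counted sample per sampling tick.
    """
    return U32_RANGE / (rate_hz * threads)


# Print the estimate for the thread counts used in this report, at 100 kHz.
for threads in (1, 4, 10, 64):
    secs = seconds_until_overflow(100_000, threads)
    print(f"{threads:>2} threads: {secs / 3600:.1f} h ({secs / 60:.0f} min)")
```

Under the same assumptions, a single thread sampled at 1 MHz would exhaust the counter in roughly 72 minutes.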
To make matters worse, since #150349 hitting the limit raises OverflowError and leaves the binary file corrupted, with an all-zero header:
2026-06-11T02:03:53.357502000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap 0bdde7f?) % ls -l /tmp/overflow.bin
-rw-r--r--@ 1 root wheel 4894411257 Jun 11 01:45 /tmp/overflow.bin
2026-06-11T02:03:58.920689000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap 0bdde7f?) % head -c 128 /tmp/overflow.bin | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 28b5 2ffd 0058 5c1d 013a d500 1621 8025 (./..X\..:...!.%
00000050: 49d2 0133 3003 331a 3330 3d32 35a9 ccc0 I..30.3.30=25...
00000060: 3aa9 049c b5d6 b69d d296 3122 4488 0c6d :.........1"D..m
00000070: 0148 0150 014f 0946 32bd b931 318c b6de .H.P.O.F2..11...
as the header is written on finalize (cpython/Modules/_remote_debugging/binary_io_writer.c, line 1074 and lines 1196 to 1223 in 540b3d0):

    if (FSEEK64(writer->fp, 0, SEEK_SET) < 0) {
        PyErr_SetFromErrno(PyExc_IOError);
        return -1;
    }

    /* Convert file offsets and counts to fixed-width types for portable header format.
     * This ensures correct behavior on both little-endian and big-endian systems. */
    uint64_t string_table_offset_u64 = (uint64_t)string_table_offset;
    uint64_t frame_table_offset_u64 = (uint64_t)frame_table_offset;
    uint32_t thread_count_u32 = (uint32_t)writer->thread_count;
    uint32_t compression_type_u32 = (uint32_t)writer->compression_type;

    uint8_t header[FILE_HEADER_SIZE] = {0};
    uint32_t magic = BINARY_FORMAT_MAGIC;
    uint32_t version = BINARY_FORMAT_VERSION;
    memcpy(header + HDR_OFF_MAGIC, &magic, HDR_SIZE_MAGIC);
    memcpy(header + HDR_OFF_VERSION, &version, HDR_SIZE_VERSION);
    header[HDR_OFF_PY_MAJOR] = PY_MAJOR_VERSION;
    header[HDR_OFF_PY_MINOR] = PY_MINOR_VERSION;
    header[HDR_OFF_PY_MICRO] = PY_MICRO_VERSION;
    memcpy(header + HDR_OFF_START_TIME, &writer->start_time_us, HDR_SIZE_START_TIME);
    memcpy(header + HDR_OFF_INTERVAL, &writer->sample_interval_us, HDR_SIZE_INTERVAL);
    memcpy(header + HDR_OFF_SAMPLES, &writer->total_samples, HDR_SIZE_SAMPLES);
    memcpy(header + HDR_OFF_THREADS, &thread_count_u32, HDR_SIZE_THREADS);
    memcpy(header + HDR_OFF_STR_TABLE, &string_table_offset_u64, HDR_SIZE_STR_TABLE);
    memcpy(header + HDR_OFF_FRAME_TABLE, &frame_table_offset_u64, HDR_SIZE_FRAME_TABLE);
    memcpy(header + HDR_OFF_COMPRESSION, &compression_type_u32, HDR_SIZE_COMPRESSION);
    if (fwrite_checked_allow_threads(header, FILE_HEADER_SIZE, writer->fp) < 0) {

Setting aside that u32 is simply too small, we should fail gracefully here.
The other fields seem fine, but they need a double check. I think rotating files is only a stopgap, and we need chunking.
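
To illustrate what failing gracefully could look like, here is a toy Python model of the "reserve a zeroed header, patch it on finalize" pattern. It is a sketch under stated assumptions, not the actual writer: `SketchWriter`, `SAMPLES_OFFSET`, the little-endian packing and the one-byte payloads are all hypothetical, and the 64-byte header size is only inferred from the zeroed region in the hexdump above. The point is that a saturated counter stops accepting samples instead of raising, so finalize can still write a valid header for what was already recorded.

```python
import io
import struct

HEADER_SIZE = 64      # assumption: inferred from the 64 zero bytes in the hexdump
SAMPLES_OFFSET = 0    # hypothetical offset, not the real HDR_OFF_SAMPLES
U32_MAX = 2**32 - 1


class SketchWriter:
    """Toy writer: reserves a zeroed header, appends records, patches the header last."""

    def __init__(self, fp, max_samples=U32_MAX):
        self.fp = fp
        self.max_samples = max_samples
        self.total_samples = 0
        self.fp.write(bytes(HEADER_SIZE))  # placeholder header, filled in by finalize()

    def write_sample(self, payload: bytes) -> bool:
        """Append one record; return False once the counter is saturated."""
        if self.total_samples >= self.max_samples:
            return False  # refuse the sample, but keep the file finalizable
        self.fp.write(payload)
        self.total_samples += 1
        return True

    def finalize(self) -> None:
        """Patch the sample count into the header, then restore the file position."""
        end = self.fp.tell()
        self.fp.seek(SAMPLES_OFFSET)
        self.fp.write(struct.pack("<I", self.total_samples))
        self.fp.seek(end)


buf = io.BytesIO()
writer = SketchWriter(buf, max_samples=3)  # tiny limit to exercise the overflow path
accepted = [writer.write_sample(b"x") for _ in range(5)]
writer.finalize()
```

In this model the caller sees `False` from `write_sample` and can decide whether to stop, rotate to a new file, or start a new chunk, while the data written so far stays readable.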
Reproduction
(as a part of routine stress tests of maurycy#3)
[130] 2026-06-10T23:35:58.225891000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap 92ad857*?) % sudo ./python.exe -m profiling.sampling run --binary -r 1000khz -d 25000 -o /tmp/overflow.bin --realtime-stats busywork.py
Stats: 647,833.9Hz (1.5µs) Min: 615,379.9Hz Max: 705,712.7Hz N=4294631818 Cache: 100.0% (4294631818+0/1)Traceback (most recent call last):
File "<frozen runpy>", line 201, in _run_module_as_main
File "<frozen runpy>", line 87, in _run_code
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/__main__.py", line 65, in <module>
main()
~~~~^^
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/cli.py", line 977, in main
_main()
~~~~~^^
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/cli.py", line 1133, in _main
handler(args)
~~~~~~~^^^^^^
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/cli.py", line 1280, in _handle_run
collector = sample(
process.pid,
...<9 lines>...
blocking=args.blocking,
)
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/sample.py", line 504, in sample
profiler.sample(collector, duration_sec, async_aware=async_aware)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/sample.py", line 167, in sample
raise e from None
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/sample.py", line 155, in sample
collector.collect(stack_frames)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling/binary_collector.py", line 84, in collect
self._writer.write_sample(stack_frames, timestamp_us)
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: too many samples for binary format
[1] 2026-06-11T01:47:42.249903000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (vmremap cd5a675*?) %
busywork.py:

    def hot_a(n):
        return sum(i * i for i in range(n))

    def hot_b(n):
        return sum(i + i for i in range(n))

    def worker():
        while True:
            hot_a(6_000_000)
            hot_b(6_000_000)

    worker()

CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS