cppalliance · mvandeberg · Jun 24, 2026
diff --git a/doc/modules/ROOT/pages/4.coroutines/4c.executors.adoc b/doc/modules/ROOT/pages/4.coroutines/4c.executors.adoc
@@ -59,7 +59,8 @@ void schedule_work(executor_ref ex, continuation& c)
 int main()
 {
     thread_pool pool;
-    executor_ref ex = pool.get_executor();  // Type erasure
+    auto pool_ex = pool.get_executor();
+    executor_ref ex = pool_ex;  // Type erasure; pool_ex must outlive ex
 
     continuation c = /* ... */;
     schedule_work(ex, c);

diff --git a/doc/modules/ROOT/pages/4.coroutines/4d.io-awaitable.adoc b/doc/modules/ROOT/pages/4.coroutines/4d.io-awaitable.adoc
@@ -126,7 +126,7 @@ To create a custom IoAwaitable:
 struct my_awaitable
 {
     io_env const* env_ = nullptr;
-    std::coroutine_handle<> continuation_;
+    continuation cont_;
     result_type result_;
 
     bool await_ready() const noexcept
@@ -138,8 +138,10 @@ struct my_awaitable
     {
         // Store pointer to environment, never copy
         env_ = env;
-        continuation_ = h;
-
+        // Wrap the caller's handle in a continuation we own, so it stays
+        // at a stable address until the executor resumes it.
+        cont_.h = h;
+
         // Start async operation...
         start_operation();
 
@@ -155,16 +157,18 @@ struct my_awaitable
 private:
     void on_completion()
     {
-        // Resume on caller's executor
-        env_->executor.dispatch(continuation_);
+        // Resume the caller on its executor. post() takes the
+        // continuation by reference and queues it. Never resume inline
+        // from a completion callback, which may run on the wrong thread.
+        env_->executor.post(cont_);
     }
 };
 ----
 
 The key points:
 
 1. Store the `io_env` as a pointer (`io_env const*`), never a copy. Launch functions guarantee the `io_env` outlives the awaitable's operation.
-2. Use the executor to dispatch completion
+2. To resume the caller, wrap its handle in a `continuation` and pass that to the executor's `post` (or `dispatch`) — these take a `continuation&`, not a raw `coroutine_handle`. Store the `continuation` in the awaitable so it keeps a stable address until the executor dequeues and resumes it; the executor links continuations intrusively, so a temporary would dangle.
 3. Respect the stop token for cancellation
 
 === Stop Callbacks Must Post, Not Resume
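
Note on key point 2 in `4d.io-awaitable.adoc`: the requirement that the `continuation` live in the awaitable follows from intrusive linking. The sketch below is library-independent and only illustrates the idea. `node`, `intrusive_queue`, and `operation` are hypothetical stand-ins, not Capy types, and the real executor queue may be organized differently.

```cpp
#include <cassert>

// Hypothetical stand-in for a continuation: the queue link lives inside
// the object itself, so the queue never allocates or copies.
struct node
{
    int id = 0;
    node* next = nullptr;
};

// Hypothetical intrusive FIFO. It stores pointers to caller-owned nodes,
// which is why each node must stay at the same address while queued.
class intrusive_queue
{
    node* head_ = nullptr;
    node* tail_ = nullptr;

public:
    void push(node& n) noexcept
    {
        n.next = nullptr;
        if (tail_)
            tail_->next = &n;
        else
            head_ = &n;
        tail_ = &n;
    }

    // Returns the oldest queued node, or nullptr when the queue is empty.
    node* pop() noexcept
    {
        node* n = head_;
        if (n)
        {
            head_ = n->next;
            if (!head_)
                tail_ = nullptr;
            n->next = nullptr;
        }
        return n;
    }
};

// Mirrors the shape of my_awaitable: the node is a data member, so its
// address is fixed for as long as the operation object is alive.
struct operation
{
    node cont;

    void submit(intrusive_queue& q) noexcept { q.push(cont); }
};
```

Pushing a local temporary instead of a member would leave the queue holding a pointer to a dead object once the enclosing scope ends, which is the dangling case the text warns about.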

diff --git a/doc/modules/ROOT/pages/4.coroutines/4e.cancellation.adoc b/doc/modules/ROOT/pages/4.coroutines/4e.cancellation.adoc
@@ -366,24 +366,28 @@ NOTE: Capy's built-in I/O awaitables (via Corosio) already use the post-back pat
 
 === Timeout Pattern
 
-Combine a timer with stop token to implement timeouts:
+Capy ships a first-class `timeout()` combinator (`<boost/capy/timeout.hpp>`) that races an `io_result`-returning awaitable against a deadline. The first to complete wins and cancels the other; if the timer fires first, the result carries `cond::timeout`:
 
 [source,cpp]
 ----
-task<> with_timeout(task<> operation, std::chrono::seconds timeout)
+#include <boost/capy/timeout.hpp>
+
+using namespace std::chrono_literals;
+
+task<void> read_with_timeout(socket& sock, mutable_buffer buf)
 {
-    std::stop_source source;
-
-    // Timer that requests stop after timeout
-    auto timer = co_await start_timer(timeout, [&source] {
-        source.request_stop();
-    });
-
-    // Run operation with our stop token
-    co_await run_with_token(source.get_token(), std::move(operation));
+    auto [ec, n] = co_await capy::timeout(sock.read_some(buf), 50ms);
+    if (ec == cond::timeout)
+    {
+        // deadline elapsed before the read completed
+        co_return;
+    }
+    // ... use the n bytes read
 }
 ----
 
+The deadline itself is built on `delay()` (`<boost/capy/delay.hpp>`), an awaitable that suspends for a duration and resumes with `cond::canceled` if stop is requested on its stop token. Reach for `timeout()` rather than wiring a timer to a `std::stop_source` by hand.
+
 === User Cancellation
 
 Connect UI cancellation to stop tokens. Pass the token through `run_async` so it propagates automatically via the execution environment—the task accesses it with `co_await this_coro::stop_token` instead of receiving it as a function argument:

diff --git a/doc/modules/ROOT/pages/4.coroutines/4f.composition.adoc b/doc/modules/ROOT/pages/4.coroutines/4f.composition.adoc
@@ -95,7 +95,7 @@ I/O errors are reported through the `ec` field of the `io_result`. When any chil
 
 1. Stop is requested for sibling tasks
 2. All tasks complete (or respond to stop)
-3. The first `ec` is propagated in the outer `io_result`
+3. The first `ec` (in completion order, not input order) is propagated in the outer `io_result`
 
 [source,cpp]
 ----
@@ -175,6 +175,8 @@ task<> example()
 
 The result is a `variant` with `error_code` at index 0 (failure/no winner) and one alternative per input task at indices 1..N. Only tasks returning `!ec` can win; errors and exceptions do not count as winning. When a winner is found, stop is requested for all siblings. All tasks complete before `when_any` returns.
 
+When every task fails, `when_any` reports a failure, but *which* one is unspecified: the result either carries an `error_code` at index 0 or rethrows one of the children's exceptions. Unlike `when_all`, there is no priority between error codes and exceptions, and no guarantee about which task's failure surfaces (including no guarantee that it is the first or last to complete). Do not rely on receiving the failure from any particular task.
+
 === Errors Do Not Win (wait_for_one_success)
 
 A child that returns a non-zero `ec` (or throws) does *not* win, and it does *not* cancel its siblings. `when_any` keeps waiting until some child succeeds or until every child has finished. Only when *all* children fail does the result settle at index 0, holding an `error_code`.

diff --git a/doc/modules/ROOT/pages/4.coroutines/4g.allocators.adoc b/doc/modules/ROOT/pages/4.coroutines/4g.allocators.adoc
@@ -55,19 +55,16 @@ capy::safe_resume(h);   // saves and restores TLS around h.resume()
 
 `safe_resume` saves the current thread-local allocator, calls `h.resume()`, then restores the saved value. This makes TLS behave like a stack: nested resumes cannot spoil the outer value. All of Capy's built-in executors (`thread_pool`, strands, `blocking_context`) use `safe_resume` internally. Custom executor event loops must do the same -- see xref:8.examples/8n.custom-executor.adoc[Custom Executor] for an example.
 
-== The FrameAllocator Concept
+== Custom Allocator Requirements
 
-Custom allocators must satisfy the `FrameAllocator` concept, which is compatible with {cpp} allocator requirements:
+Custom allocators must meet the usual {cpp} allocator requirements, or be a `std::pmr::memory_resource*`. The library does not expose a separate public concept for them; as a rough guide, a value-type allocator works as a frame allocator when it provides:
 
 [source,cpp]
 ----
-template<typename A>
-concept FrameAllocator = requires {
-    typename A::value_type;
-} && requires(A& a, std::size_t n) {
-    { a.allocate(n) } -> std::same_as<typename A::value_type*>;
-    { a.deallocate(std::declval<typename A::value_type*>(), n) };
-};
+// Illustrative requirements — not a named public concept:
+typename A::value_type;
+a.allocate(n);       // -> A::value_type*
+a.deallocate(p, n);
 ----
 
 In practice, any standard allocator works.
@@ -108,6 +105,31 @@ Capy provides `recycling_memory_resource`, a memory resource optimized for corou
 
 This allocator is used by default for `thread_pool` and other execution contexts.
 
+NOTE: `recycling_memory_resource` honors only the default new alignment (`__STDCPP_DEFAULT_NEW_ALIGNMENT__`, typically `alignof(std::max_align_t)`). The alignment argument passed to `do_allocate`/`do_deallocate` is ignored, so over-aligned requests are not satisfied. This is sufficient for coroutine frames but means the resource is not a drop-in replacement where over-aligned allocations are required.
+
+== Frame Allocator Mixin
+
+Most users never need to allocate coroutine frames manually -- `task<T>` and the built-in awaitable types already participate in TLS frame allocation. When you write your own coroutine promise type and want it to use the same fast path, inherit from `frame_alloc_mixin`:
+
+[source,cpp]
+----
+struct my_coroutine
+{
+    struct promise_type : capy::frame_alloc_mixin
+    {
+        // get_return_object, initial_suspend, ...
+    };
+};
+----
+
+`frame_alloc_mixin` (in `<boost/capy/ex/frame_alloc_mixin.hpp>`) supplies `operator new` and `operator delete` that:
+
+* Read the thread-local frame allocator set by `run_async` (falling back to `std::pmr::get_default_resource()` when none is set).
+* Bypass virtual dispatch when that allocator is the default recycling memory resource.
+* Store the resolved allocator pointer at the tail of each frame, so deallocation uses the correct resource even if the thread-local allocator has since changed.
+
+This is the same strategy used internally by `io_awaitable_promise_base`. Use the mixin directly when your promise type does not need the full environment and continuation support that `io_awaitable_promise_base` provides. The allocation fast path uses thread-local storage and needs no synchronization; the global pool fallback is mutex-protected.
+
 == HALO Optimization
 
 *Heap Allocation eLision Optimization* (HALO) allows the compiler to allocate coroutine frames on the stack instead of the heap when:
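
Note on the "Frame Allocator Mixin" section in `4g.allocators.adoc`: the third bullet (storing the resolved allocator pointer at the tail of each frame) can be sketched with standard `std::pmr` facilities alone. This is an illustration under assumptions, not the mixin's actual code. `frame_allocate`, `frame_deallocate`, and `counting_resource` are hypothetical names, and the real layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory_resource>

// Allocate n bytes for a frame plus room for the resource pointer,
// then record the pointer immediately after the frame.
inline void* frame_allocate(std::size_t n, std::pmr::memory_resource* mr)
{
    void* p = mr->allocate(n + sizeof(mr));
    // memcpy avoids assuming the tail position is suitably aligned.
    std::memcpy(static_cast<unsigned char*>(p) + n, &mr, sizeof(mr));
    return p;
}

// Recover the resource from the tail and return the block to it,
// regardless of what the current thread-local allocator is.
inline void frame_deallocate(void* p, std::size_t n)
{
    std::pmr::memory_resource* mr;
    std::memcpy(&mr, static_cast<unsigned char*>(p) + n, sizeof(mr));
    mr->deallocate(p, n + sizeof(mr));
}

// Helper for experimenting: tracks outstanding bytes and forwards
// to the standard new/delete resource.
class counting_resource : public std::pmr::memory_resource
{
public:
    std::size_t outstanding = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override
    {
        outstanding += bytes;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override
    {
        outstanding -= bytes;
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }

    bool do_is_equal(std::pmr::memory_resource const& other) const noexcept override
    {
        return this == &other;
    }
};
```

Because the pointer travels with the block, deallocation needs only the frame address and size, which matches the arguments a sized `operator delete` receives.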

diff --git a/doc/modules/ROOT/pages/5.buffers/5a.overview.adoc b/doc/modules/ROOT/pages/5.buffers/5a.overview.adoc
@@ -95,7 +95,7 @@ This single signature accepts:
 * A single `const_buffer`
 * A `span<const_buffer>`
 * A `vector<const_buffer>`
-* A `string_view` (converts to single buffer)
+* A `string_view` wrapped with `make_buffer` (which yields a single `const_buffer`)
 * A custom composite type
 * *Any composition of the above—without allocation*
 

diff --git a/doc/modules/ROOT/pages/5.buffers/5b.types.adoc b/doc/modules/ROOT/pages/5.buffers/5b.types.adoc
@@ -7,13 +7,27 @@ This section introduces Capy's fundamental buffer types: `const_buffer` and `mut
 * Completed xref:5.buffers/5a.overview.adoc[Why Concepts, Not Spans]
 * Understanding of why concept-driven buffers enable composition
 
-== Why Not std::byte?
+== Buffers Are Handles
+
+A `const_buffer` or `mutable_buffer` is a *handle*: a non-owning `(pointer, size)` view of memory. Constructing one copies no bytes, and destroying one frees nothing.
+
+This splits lifetime responsibility cleanly:
+
+* *You own the bytes.* The memory a buffer refers to—a stack array, a `std::string`, a slab from your allocator—is yours to keep alive. It must remain valid for the entire duration of any operation you hand the buffer to, including across the suspension points of a `co_await`-ed I/O operation.
+* *The library owns the handles.* Capy creates and manages buffer handles and handle-sequences on your behalf—the buffers a dynamic buffer exposes through `prepare`/`data`, the sub-range a `buffer_slice` produces, the descriptors a type-erased stream passes to the OS. Each such handle is valid only for the window its API documents, typically until the next call that mutates the owner.
+
+The library never copies or takes ownership of your bytes through a buffer; it only moves handles. This split explains every buffer-lifetime rule in this chapter.
+
+== Why `void*`, Not `std::byte`?
 
 `std::byte` imposes a semantic opinion. It says "this is raw bytes"—but that is itself an opinion about the data's nature.
 
 POSIX uses `void*` for buffers. This expresses semantic neutrality: "I move memory without opining on what it contains." The OS doesn't care if the bytes represent text, integers, or compressed data—it moves them.
 
-But `std::span<void>` doesn't compile. {cpp} can't express a type-agnostic buffer abstraction using `span`.
+Two concrete forces favor `void*` specifically over `std::span<std::byte>`:
+
+* *Platform types already use it.* The OS structures Capy maps onto—`iovec`'s `iov_base`, `WSABUF`'s `buf`—are `void*`/`char*`. Erasing to `void*` makes conversion to those structures a layout match rather than a reinterpretation.
+* *Callers supply many element types.* User data arrives as `char[]`, `unsigned char[]`, `std::byte[]`, `std::string`, and more. A single neutral pointer erases all of them to one representation. `std::span<std::byte>` would force every caller to reinterpret their bytes first, and `std::span<void>` is ill-formed—{cpp} cannot express a type-agnostic buffer with `span`.
 
 Capy provides `const_buffer` and `mutable_buffer` as semantically neutral buffer types with known layout.
 
@@ -155,6 +169,8 @@ The returned buffer type depends on the element constness of the range:
 * Ranges of mutable elements → `mutable_buffer`
 * Ranges of const elements, `string_view`, string literals → `const_buffer`
 
+The buffer's size, in bytes, is `count * sizeof(element)`.
+
 == Layout Compatibility
 
 `const_buffer` and `mutable_buffer` have the same memory layout as OS buffer structures:
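
Note on the size rule added in `5b.types.adoc` (`count * sizeof(element)`): a small standalone sketch of that arithmetic. `byte_view` and `view_of` are hypothetical stand-ins used only for illustration; they are not `const_buffer` or `make_buffer`, whose overload set is richer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical non-owning (pointer, size) handle; size is in bytes.
struct byte_view
{
    void const* data;
    std::size_t size;
};

// The element type is erased to void const*, and the size becomes
// the element count multiplied by the element size.
template<class T, std::size_t N>
byte_view view_of(std::array<T, N> const& a) noexcept
{
    return { a.data(), N * sizeof(T) };
}

// For character data the element size is 1, so bytes == characters.
inline byte_view view_of(std::string_view s) noexcept
{
    return { s.data(), s.size() * sizeof(char) };
}
```

For example, a `std::array<std::uint32_t, 4>` yields a 16-byte view, while a three-character `string_view` yields a 3-byte view. In both cases the handle points at the caller's storage and copies nothing.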

diff --git a/doc/modules/ROOT/pages/5.buffers/5c.sequences.adoc b/doc/modules/ROOT/pages/5.buffers/5c.sequences.adoc
@@ -15,6 +15,8 @@ A *buffer sequence* is any type that can produce an iteration of buffers. Formal
 * A range of buffers (like `vector<const_buffer>`) is a multi-element sequence
 * Any bidirectional range with buffer-convertible values qualifies
 
+Treating a single buffer as a one-element sequence is a deliberate convenience, not an accident of the definition. It lets one concept-constrained signature serve both the common single-buffer call and scatter/gather composition, with no overload and no explicit wrap at the call site. Capy favors this convenience as a primary design goal and applies it consistently—`make_buffer`, for instance, accepts any contiguous range of bytes—so that buffer-passing reads the same whether you hand over one region or many.
+
 == The Concepts
 
 === ConstBufferSequence

diff --git a/doc/modules/ROOT/pages/5.buffers/5d.system-io.adoc b/doc/modules/ROOT/pages/5.buffers/5d.system-io.adoc
@@ -67,32 +67,34 @@ Internally, Capy:
 
 == Stack-Based Conversion
 
-For common cases (small numbers of buffers), conversion happens on the stack:
+Conversion always happens on the stack—the implementation never
+allocates. A fixed-size, on-frame window of buffer descriptors (16
+entries) is filled from the sequence and passed to the OS call. If the
+sequence has more buffers than fit in the window, the window is refilled
+and the OS call is repeated for the remaining buffers:
 
 [source,cpp]
 ----
 // Pseudocode of internal implementation
 template<ConstBufferSequence Buffers>
 auto platform_write(Buffers const& buffers)
 {
-    std::size_t count = buffer_length(buffers);
-
-    if (count <= 8)  // Small buffer optimization
-    {
-        iovec iovecs[8];
-        fill_iovecs(iovecs, buffers, count);
-        return writev(fd, iovecs, count);
-    }
-    else  // Heap fallback
+    iovec iovecs[16];  // fixed on-frame window, never heap-allocated
+
+    auto it = begin(buffers);
+    auto last = end(buffers);
+    while (it != last)
     {
-        std::vector<iovec> iovecs(count);
-        fill_iovecs(iovecs.data(), buffers, count);
-        return writev(fd, iovecs.data(), count);
+        std::size_t count = fill_iovecs(iovecs, it, last, 16);  // up to 16
+        auto result = writev(fd, iovecs, count);
+        // ... advance the window past the buffers just written
     }
 }
 ----
 
-Most real-world code uses fewer than 8 buffers, so heap allocation is rarely needed.
+The window size (16) is fixed and implementation-defined. Sequences with
+more buffers than the window are handled by refilling it across
+successive OS calls; there is no heap fallback.
 
 == Scatter/Gather Benefits
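
Note on the "Stack-Based Conversion" rewrite in `5d.system-io.adoc`: the refill loop in the pseudocode can be sketched as ordinary C++. The window size of 16 comes from the text. `desc`, `submit_in_windows`, and the `submit` callback are hypothetical, and the sketch assumes every call consumes its whole window, whereas a real `writev` may write only part of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor standing in for iovec / WSABUF.
struct desc
{
    void const* base;
    std::size_t len;
};

constexpr std::size_t window_size = 16; // value quoted in the text

// Copies up to window_size descriptors into a stack array, hands them
// to submit(desc const*, count), and repeats until the range is empty.
// Returns the number of submit calls made. No heap allocation occurs.
template<class Iter, class Submit>
std::size_t submit_in_windows(Iter first, Iter last, Submit submit)
{
    desc window[window_size]; // fixed on-frame window
    std::size_t calls = 0;
    while (first != last)
    {
        std::size_t count = 0;
        while (first != last && count < window_size)
            window[count++] = *first++;
        submit(static_cast<desc const*>(window), count);
        ++calls;
    }
    return calls;
}
```

With 40 descriptors this makes three calls, carrying 16, 16, and 8 entries. The cost of a long sequence is extra system calls rather than an allocation.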
 

diff --git a/doc/modules/ROOT/pages/6.streams/6b.streams.adoc b/doc/modules/ROOT/pages/6.streams/6b.streams.adoc
@@ -20,6 +20,8 @@ concept ReadStream =
     };
 ----
 
+The `requires` clause names a single representative buffer (`mutable_buffer_archetype`) because a {cpp} concept cannot say "works with every buffer sequence." The real contract is that `read_some` accepts *any* `MutableBufferSequence`—one buffer or a range; the archetype only samples that requirement.
+
 === read_some Semantics
 
 [source,cpp]
@@ -91,6 +93,8 @@ concept WriteStream =
     };
 ----
 
+As with `ReadStream`, the `const_buffer_archetype` is only a representative: the real contract is that `write_some` accepts *any* `ConstBufferSequence`, which a {cpp} concept cannot fully express.
+
 === write_some Semantics
 
 [source,cpp]

diff --git a/doc/modules/ROOT/pages/6.streams/6c.sources-sinks.adoc b/doc/modules/ROOT/pages/6.streams/6c.sources-sinks.adoc
@@ -35,6 +35,8 @@ Await-returns `(error_code, std::size_t)`:
 * On EOF: `ec == cond::eof`, and `n` is bytes read before EOF (partial read)
 * On error: `ec`, and `n` is bytes read before error
 
+If `buffer_empty(buffers)` is true, the operation completes immediately with `!ec` and `n` equal to 0.
+
 The key difference from `ReadStream`: a successful read fills the buffer completely.
 
 === Use Cases