PEP 475 EINTR retry paths don't check tstate->async_exc #144748

@samliddicott

Bug report

Bug description:

Summary

PyThreadState_SetAsyncExc() and PEP 475's EINTR retry are two existing mechanisms that don't work together. The eval loop checks tstate->async_exc at bytecode boundaries via _Py_HandlePending(). The EINTR retry paths are an equivalent checkpoint — a moment between syscall attempts where pending state can be safely examined — but they don't perform this check. No new API or mechanism is needed; the fix is extending an existing check to existing checkpoints.

As a result, PyThreadState_SetAsyncExc() cannot interrupt a thread blocked in a C-level call (time.sleep, socket.recv, select.select, Lock.acquire, etc.), even when a signal causes EINTR and gives the retry path an opportunity to notice the pending exception.

The suggested fix (below, after the reproducer) is a small, self-contained change: check tstate->async_exc as part of _Py_MakePendingCalls().

Checkpoint inconsistency

The EINTR retry paths already call Py_MakePendingCalls() or PyErr_CheckSignals() before retrying, but neither checks tstate->async_exc:

| Checkpoint | Pending-state call | Checks async_exc? |
|---|---|---|
| Eval loop (_Py_HandlePending) | _Py_HandlePending() | Yes |
| time.sleep EINTR retry | Py_MakePendingCalls() | No |
| socket.recv EINTR retry | PyErr_CheckSignals() | No |
| select.select EINTR retry | PyErr_CheckSignals() | No |
| Lock.acquire EINTR retry | (nothing) | No |

  • Py_MakePendingCalls() processes the pending-calls queue but does not check tstate->async_exc. That check lives only in _Py_HandlePending() in ceval.c.
  • PyErr_CheckSignals() runs Python-level signal handlers, but only on the main thread — it is a no-op on non-main threads.
  • Lock.acquire doesn't call either function, retrying sem_timedwait immediately after EINTR with no pending-state check at all.
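
The inconsistency can be illustrated with a toy model in pure Python. This is only a sketch of the control flow, not CPython's actual C code: ToyThreadState, retry_on_eintr, make_fake_sleep, and run_demo are hypothetical names, and the check_async_exc flag stands in for the check that the retry paths currently lack.

```python
class ToyThreadState:
    """Hypothetical stand-in for PyThreadState; models only async_exc."""

    def __init__(self):
        self.async_exc = None  # exception class requested by another thread


def retry_on_eintr(syscall, tstate, check_async_exc):
    """Call syscall() until it stops raising InterruptedError.

    check_async_exc=False models the retry paths as described above;
    check_async_exc=True models a retry path that also checks async_exc.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return syscall(), attempts
        except InterruptedError:
            # Checkpoint between syscall attempts.
            if check_async_exc and tstate.async_exc is not None:
                exc, tstate.async_exc = tstate.async_exc, None
                raise exc
            # Otherwise fall through and retry the syscall.


def make_fake_sleep(interruptions):
    """Return a fake syscall that raises EINTR a few times, then succeeds."""
    remaining = [interruptions]

    def fake_sleep():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise InterruptedError("EINTR")
        return "slept full duration"

    return fake_sleep


def run_demo(check_async_exc):
    tstate = ToyThreadState()
    tstate.async_exc = KeyboardInterrupt  # as if set via PyThreadState_SetAsyncExc
    try:
        result, attempts = retry_on_eintr(make_fake_sleep(1), tstate, check_async_exc)
        return f"{result} after {attempts} attempts"
    except KeyboardInterrupt:
        return "interrupted at EINTR checkpoint"


if __name__ == "__main__":
    print("without check:", run_demo(False))
    print("with check:   ", run_demo(True))
```

In this model the pending exception is already set when EINTR arrives. Without the check, the loop simply retries and the fake sleep runs to completion. With the check, the same checkpoint raises the exception.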

Reproducer

The reproducer runs two tests that use the same API (PyThreadState_SetAsyncExc) and the same signal (SIGUSR1). The only difference is whether the target thread is at a bytecode boundary or inside a C-level blocking call. This appears to be an omission in the PEP 475 implementation rather than a design or technical constraint.

import threading, ctypes, signal, time

signal.signal(signal.SIGUSR1, lambda *_: None)

def inject(t, delay=0.5):
    """Set async exception on thread t, then send SIGUSR1 to cause EINTR."""
    time.sleep(delay)
    ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(t.ident),
        ctypes.py_object(KeyboardInterrupt))
    signal.pthread_kill(t.ident, signal.SIGUSR1)

# --- Test 1: bytecode loop (eval loop checks async_exc) ---

def bytecode_worker():
    try:
        while True:
            pass
    except KeyboardInterrupt:
        pass

t1 = threading.Thread(target=bytecode_worker)
t1.start()
t0 = time.monotonic()
inject(t1)
t1.join()
e1 = time.monotonic() - t0
print(f"Test 1 (bytecode loop): interrupted in {e1:.1f}s — eval loop checks async_exc")

# --- Test 2: C-level time.sleep (EINTR retry does NOT check async_exc) ---

def sleep_worker():
    try:
        time.sleep(15)
    except KeyboardInterrupt:
        pass

t2 = threading.Thread(target=sleep_worker)
t2.start()
t0 = time.monotonic()
inject(t2)
t2.join()
e2 = time.monotonic() - t0
print(f"Test 2 (time.sleep):    interrupted in {e2:.1f}s — EINTR retry does NOT check async_exc")

print()
if e1 < 2 and e2 > 5:
    print(f"BUG CONFIRMED: same API, same exception, same mechanism.")
    print(f"  Bytecode checkpoint: {e1:.1f}s (works)")
    print(f"  EINTR checkpoint:    {e2:.1f}s (ignored — blocked full sleep duration)")
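elif e1 < 2 and e2 < 2:
    print("FIXED: both checkpoints now honor async_exc")
else:
    # Neither pattern matched (for example, a slow or heavily loaded machine).
    print(f"INCONCLUSIVE: bytecode loop {e1:.1f}s, time.sleep {e2:.1f}s")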

Output on CPython 3.12 (Linux):

Test 1 (bytecode loop): interrupted in 0.5s — eval loop checks async_exc
Test 2 (time.sleep):    interrupted in 15.0s — EINTR retry does NOT check async_exc

BUG CONFIRMED: same API, same exception, same mechanism.
  Bytecode checkpoint: 0.5s (works)
  EINTR checkpoint:    15.0s (ignored — blocked full sleep duration)

The async exception is set in both cases, but the EINTR retry path ignores it and retries the syscall, delaying delivery until time.sleep completes naturally.

Suggested fix

Option A (minimal, preferred) — Add the async_exc check to Py_MakePendingCalls():

static int
_Py_MakePendingCalls(PyThreadState *tstate)
{
    /* ... existing pending-calls processing ... */

    /* Check for async exceptions (PyThreadState_SetAsyncExc) */
    if (tstate->async_exc) {
        PyObject *exc = tstate->async_exc;
        tstate->async_exc = NULL;
        PyErr_SetNone(exc);
        Py_DECREF(exc);
        return -1;
    }

    /* ... existing signal handling (main thread only) ... */
}

This is a small, self-contained change. Any C code that already calls Py_MakePendingCalls() after EINTR (such as time.sleep) automatically gains async_exc support with no per-module changes.

Option B — Add per-module checks for code that doesn't call Py_MakePendingCalls():

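if (err == EINTR) {
    if (Py_MakePendingCalls() < 0)
        return -1;
    PyThreadState *tstate = _PyThreadState_GET();
    if (tstate->async_exc != NULL) {
        /* Raise the pending async exception so that -1 is not
           returned without an exception set. */
        PyObject *exc = tstate->async_exc;
        tstate->async_exc = NULL;
        PyErr_SetNone(exc);
        Py_DECREF(exc);
        return -1;
    }
}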

Code paths like Lock.acquire that currently have no EINTR checkpoint at all would need individual fixes regardless. Because they perform no pending-state check after EINTR, they silently swallow signal delivery, which is arguably a separate issue.

Impact

This affects any use of PyThreadState_SetAsyncExc() where the target thread may be blocked in a C-level call:

  • Task cancellation in thread pools (Celery workers, concurrent.futures thread executors)
  • Timeout enforcement across thread boundaries
  • Debuggers and profilers interrupting blocked threads
  • C extension modules using the public PyThreadState_SetAsyncExc API as documented
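
For comparison, here is a minimal sketch of how code in these categories can stay interruptible today by returning to the bytecode loop regularly. It is a workaround illustration, not part of the proposed fix. sliced_sleep, set_async_exc, and demo are hypothetical names, and the 0.05 s slice length is an arbitrary assumption.

```python
import ctypes
import threading
import time


def sliced_sleep(total, slice_len=0.05):
    """Sleep for `total` seconds in short slices.

    Each return from time.sleep() passes back through the bytecode loop,
    where a pending async exception can be raised.
    """
    deadline = time.monotonic() + total
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(min(slice_len, remaining))


def set_async_exc(thread, exc_type):
    """Request exc_type in `thread` via PyThreadState_SetAsyncExc (CPython only)."""
    return ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))


def demo(total=30.0):
    """Start a worker that sleeps in slices, then interrupt it."""
    outcome = {}

    def worker():
        try:
            sliced_sleep(total)
            outcome["result"] = "completed"
        except KeyboardInterrupt:
            outcome["result"] = "interrupted"

    t = threading.Thread(target=worker, daemon=True)
    start = time.monotonic()
    t.start()
    time.sleep(0.2)  # give the worker time to enter its sleep loop
    set_async_exc(t, KeyboardInterrupt)
    t.join(timeout=10)
    return outcome.get("result"), time.monotonic() - start


if __name__ == "__main__":
    result, elapsed = demo()
    print(f"{result} after {elapsed:.2f}s")
```

No signal is needed here because the worker never stays inside a single C-level call for longer than one slice. The cost is extra wakeups, and the approach only helps for calls the caller can split up, which is why it does not replace a check in the EINTR retry paths.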

Relationship to #137958

Issue #137958 requested a way to interrupt threads blocked in C calls and was closed as Not Planned. That issue framed the problem as a feature request for new thread-interruption capability.

This issue is narrower: an existing public API (PyThreadState_SetAsyncExc) and an existing checkpoint mechanism (EINTR retry) that don't work together due to an omission in the PEP 475 implementation. The eval loop already checks async_exc at its checkpoints; the EINTR retry paths should do the same for consistency.

Versions

Tested on CPython 3.12. The EINTR retry mechanism was introduced by PEP 475 (Python 3.5), so earlier versions back to 3.5 that share these retry paths are likely affected.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Labels: type-bug
