compile() raises UnicodeEncodeError for docstrings with lone surrogates #149303

@maurycy

Bug report

Bug description:

Found by fuzzing.

The simplest repro:

"""\ud800"""
x = 1
2026-05-03T01:46:20.460163000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 6b632ce?) % ./python.exe repro.py 
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
[1] 2026-05-03T01:46:21.697294000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 6b632ce?) % 

I believe this is the root cause:

cpython/Python/compile.c

Lines 1559 to 1576 in c650b51

// C implementation of inspect.cleandoc()
//
// Difference from inspect.cleandoc():
// - Do not remove leading and trailing blank lines to keep lineno.
PyObject *
_PyCompile_CleanDoc(PyObject *doc)
{
    doc = PyObject_CallMethod(doc, "expandtabs", NULL);
    if (doc == NULL) {
        return NULL;
    }

    Py_ssize_t doc_size;
    const char *doc_utf8 = PyUnicode_AsUTF8AndSize(doc, &doc_size);
    if (doc_utf8 == NULL) {
        Py_DECREF(doc);
        return NULL;
    }
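
The same error message can be produced from pure Python, which may help confirm the suspicion above. This is only a sketch: it assumes that `PyUnicode_AsUTF8AndSize` behaves like a strict UTF-8 encode of the string, and the helper name `utf8_or_error` is made up for illustration. It also shows that the pure-Python `inspect.cleandoc()` has no trouble with the same input, since it never leaves `str`.

```python
import inspect

# A lone (unpaired) high surrogate, as in the docstring of repro.py.
LONE_SURROGATE = "\ud800"


def utf8_or_error(text, errors="strict"):
    """Return the UTF-8 bytes of text, or the exception raised while encoding."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc


# Strict encoding is assumed here to mirror what the C helper does.
strict_result = utf8_or_error(LONE_SURROGATE)
print(type(strict_result).__name__, strict_result)

# The "surrogatepass" error handler lets the lone surrogate through.
print(utf8_or_error(LONE_SURROGATE, "surrogatepass"))

# The str-based steps (expandtabs, inspect.cleandoc) leave the text intact.
print(repr(inspect.cleandoc(LONE_SURROGATE.expandtabs())))
```

If the assumption holds, any docstring clean-up that goes through strict UTF-8 will reject input that the `str`-level code path accepts.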

It used to work:

2026-05-03T01:50:40.264145000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 6b632ce?) % uv run --python 3.11 ./repro.py
2026-05-03T01:50:46.764500000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 6b632ce?) % uv run --python 3.12 ./repro.py
2026-05-03T01:50:51.752907000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 6b632ce?) % uv run --python 3.13 ./repro.py
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
[1] 2026-05-03T01:50:54.414813000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main 6b632ce?) %   
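
For checking several interpreters without a separate file, the repro can also be fed straight to `compile()`. This is a rough sketch; `compile_outcome`, `REPRO_SOURCE` and `CONTROL_SOURCE` are names invented here, and the result is expected to differ between versions, so no output is shown.

```python
import sys


def compile_outcome(source):
    """Return None if source compiles, else the name of the exception type."""
    try:
        compile(source, "<repro>", "exec")
    except Exception as exc:
        return type(exc).__name__
    return None


# Same two lines as repro.py; the \ud800 escape is left for the compiler to decode.
REPRO_SOURCE = '"""\\ud800"""\nx = 1\n'
# Control case: an ordinary docstring that should compile everywhere.
CONTROL_SOURCE = '"""plain docstring"""\nx = 1\n'

version = ".".join(map(str, sys.version_info[:3]))
print(version, "repro:", compile_outcome(REPRO_SOURCE))
print(version, "control:", compile_outcome(CONTROL_SOURCE))
```

Running it under each interpreter (for example with `uv run --python 3.12`) should show where the behavior changes.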

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS

Labels

type-bug: An unexpected behavior, bug, or error
