
rt-smart kernel assert fail after exec failure before task startup #11356

Open
zhangyangysu wants to merge 1 commit into RT-Thread:master from zhangyangysu:master


@zhangyangysu commented May 3, 2026

When msh tries to execute a non-ELF path, lwp_execve() may allocate a PID before lwp_load() fails. The old error path only dropped the LWP reference, leaving the PID tree entry pointing to a freed LWP.

In an init-less boot flow, this can poison pid 1 after a failed command from msh. A later LWP launch may then treat the stale pid 1 entry as a valid parent LWP, resulting in invalid pgrp/session state and a job-control assertion during process exit.
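A minimal toy model of the failure described above may help reviewers. It is a sketch under stated assumptions, not RT-Thread code: the PID tree is replaced by a fixed array, and `toy_lwp`, `toy_pid_alloc()`, `toy_exec_old()` and `toy_find_parent()` are hypothetical names. Dropping the last reference only clears an `alive` flag, so the stale entry can be inspected without touching freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not RT-Thread code: a fixed-size PID table whose
 * slots point at LWP objects. "Freeing" an LWP only clears `alive`, so the
 * stale-pointer state can be inspected without undefined behaviour. */
#define TOY_MAX_PIDS 8

struct toy_lwp {
    int ref;   /* reference count */
    int alive; /* 0 once the last reference is dropped ("freed") */
};

static struct toy_lwp *toy_pid_table[TOY_MAX_PIDS];

/* Allocate the lowest free PID (starting at 1) for `lwp`; 0 if full. */
static int toy_pid_alloc(struct toy_lwp *lwp)
{
    for (int pid = 1; pid < TOY_MAX_PIDS; pid++) {
        if (toy_pid_table[pid] == NULL) {
            toy_pid_table[pid] = lwp;
            return pid;
        }
    }
    return 0;
}

static void toy_lwp_ref_dec(struct toy_lwp *lwp)
{
    assert(lwp->ref > 0);
    if (--lwp->ref == 0)
        lwp->alive = 0; /* stands in for freeing the LWP */
}

/* Old error path: PID allocated, load fails, only the reference is dropped. */
static int toy_exec_old(struct toy_lwp *lwp, int load_ok)
{
    int pid = toy_pid_alloc(lwp);
    if (pid == 0)
        return -1;
    if (!load_ok) {
        toy_lwp_ref_dec(lwp); /* the PID table entry is NOT removed */
        return -1;
    }
    return pid;
}

/* A later launch looking up its parent by PID. */
static struct toy_lwp *toy_find_parent(int pid)
{
    assert(pid > 0 && pid < TOY_MAX_PIDS);
    return toy_pid_table[pid];
}
```

In an init-less boot the first PID handed out is 1, so in this model a failed command leaves slot 1 pointing at a dead object, and the parent lookup returns it as if it were valid.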

Add lwp_pid_rollback() for exec/spawn failures before the process becomes runnable. Unlike lwp_pid_put(), it always releases the PID lock and does not enter the "no more pid allocation" state when the PID tree becomes empty.
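The difference between the two release paths can be sketched as follows. This is a hypothetical model based only on the behaviour stated above: `toy_pid_put()` mirrors what is described for lwp_pid_put() (the lock stays held and allocation is closed once the tree is empty), and `toy_pid_rollback()` mirrors the proposed lwp_pid_rollback(). All names and fields are assumptions; the real code uses an AVL tree and a kernel mutex rather than an array and a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the two release paths; not the real lwp_pid.c. */
#define TOY_MAX_PIDS 8

struct toy_pid_tree {
    void *slot[TOY_MAX_PIDS]; /* slot[pid] != NULL means pid is in use */
    int   used;               /* number of occupied slots */
    bool  locked;             /* stands in for the PID mutex */
    bool  alloc_closed;       /* "no more pid allocation" state */
};

static void toy_lock(struct toy_pid_tree *t)
{
    assert(!t->locked); /* a real mutex would block here instead */
    t->locked = true;
}

static void toy_unlock(struct toy_pid_tree *t)
{
    assert(t->locked);
    t->locked = false;
}

static void toy_remove(struct toy_pid_tree *t, int pid)
{
    assert(pid > 0 && pid < TOY_MAX_PIDS && t->slot[pid] != NULL);
    t->slot[pid] = NULL;
    t->used--;
}

/* Described lwp_pid_put() behaviour: when the tree becomes empty,
 * allocation is closed and the lock is intentionally left held. */
static void toy_pid_put(struct toy_pid_tree *t, int pid)
{
    toy_lock(t);
    toy_remove(t, pid);
    if (t->used == 0) {
        t->alloc_closed = true;
        return; /* lock stays held */
    }
    toy_unlock(t);
}

/* Proposed rollback behaviour: always unlock, never close allocation. */
static void toy_pid_rollback(struct toy_pid_tree *t, int pid)
{
    toy_lock(t);
    toy_remove(t, pid);
    toy_unlock(t);
}
```

Removing the only PID with `toy_pid_put()` leaves the tree locked and closed, while `toy_pid_rollback()` leaves it empty, unlocked and still open for the next allocation.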

Use the rollback helper in lwp_execve() failure paths after PID allocation.
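A sketch of the intended ordering in the exec failure path, assuming a goto-style cleanup chain; `toy_execve()` and `toy_pid_rollback()` are hypothetical stand-ins, not the actual lwp_execve() code. A failure after PID allocation unwinds through the PID rollback before the LWP reference is dropped, while a failure before allocation skips the rollback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the error-path ordering; toy_* names are
 * illustrative, not RT-Thread APIs. */
#define TOY_MAX_PIDS 8

struct toy_lwp {
    int ref; /* reference count */
};

static struct toy_lwp *toy_pid_table[TOY_MAX_PIDS];

/* Allocate the lowest free PID (starting at 1) for `lwp`; 0 if full. */
static int toy_pid_alloc(struct toy_lwp *lwp)
{
    for (int pid = 1; pid < TOY_MAX_PIDS; pid++) {
        if (toy_pid_table[pid] == NULL) {
            toy_pid_table[pid] = lwp;
            return pid;
        }
    }
    return 0;
}

/* Remove the PID entry of a process that never became runnable. */
static void toy_pid_rollback(int pid)
{
    assert(pid > 0 && pid < TOY_MAX_PIDS);
    toy_pid_table[pid] = NULL;
}

/* Returns the new PID on success, -1 on failure. */
static int toy_execve(struct toy_lwp *lwp, int load_ok)
{
    int pid = toy_pid_alloc(lwp);
    if (pid == 0)
        goto err_ref;      /* nothing to roll back yet */

    if (!load_ok)
        goto err_pid;      /* load failed after PID allocation */

    return pid;            /* the process may now become runnable */

err_pid:
    toy_pid_rollback(pid); /* remove the PID entry first */
err_ref:
    lwp->ref--;            /* then drop the LWP reference */
    return -1;
}
```

After the failed call, slot 1 is free again, so in this model the next successful exec receives a clean PID 1 instead of inheriting a stale entry.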

PR description:

[

Tested on qemu-virt64-aarch64 with RT-Smart enabled.
Reproduce:
sd.bin must not contain an init binary, so the init process is not started and the kernel
enters msh instead of ash. Run 'bin', which fails; then run 'hello', and the kernel
hits an assertion failure:
Log:
 \ | /
- RT -     Thread Smart Operating System
 / | \     5.3.0 build May 3 2026 09:19:10
 2006 - 2024 Copyright by RT-Thread team
[I/drivers.serial] Using /dev/ttyS0 as default console
mnt_init!
file system initialization done!
[I/rtdm.mnt] File system initialization done
hello rt-thread
[I/cpu.aa64] Call cpu 1 on success
msh />[I/cpu.aa64] Call cpu 2 on success
[I/cpu.aa64] Call cpu 3 on success

msh />bin
[E/load.elf] elf_file_read : read from offset: 0x0 error
[E/load.elf] elf_load_ehdr : elf_file_read failed, ret : -255
[E/load.elf] lwp_load : elf_file_load error, ret : -255
bin: command not found.
msh />hello
msh />hello world!
(session) assertion failed at function:lwp_jobctrl_on_exit, line number:53
please use: addr2line -e rtthread.elf -a -f
0xffff00000012c0d8 0xffff000000092f78 0xffff000000098028 0xffff000000097aa8 0xffff000000096198 0xffff00000009dfe0 0xffff000000086c3c 0xffff000000086be0

Why submit this PR

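The RT-Smart kernel hits an assertion failure in a specific case: after an exec from msh fails in an init-less boot, a later process exit trips the job-control assertion in lwp_jobctrl_on_exit.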

What is your solution

Roll back the allocated process ID and clean up the associated resources when exec fails.
lwp_pid_put() must not be used for this: once lwp_pid_root becomes AVL_EMPTY, pid_mtx stays
held by msh, so if the system later enters ash it cannot fork any new process and lwp_execve()
hangs in lwp_pid_lock_take().
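Why a release path that keeps the PID lock leads to a hang can be shown with an ordinary POSIX mutex standing in for pid_mtx. This is an illustrative sketch, not kernel code: `toy_put_keeping_lock()` models the tree-empty case of lwp_pid_put() as described above, `toy_rollback()` models the proposed helper, and `pthread_mutex_trylock()` is used so the demo reports `EBUSY` where a real lock-take would block.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for pid_mtx. */
static pthread_mutex_t toy_pid_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Release path that keeps the lock when the tree has just become empty. */
static void toy_put_keeping_lock(int tree_now_empty)
{
    int rc = pthread_mutex_lock(&toy_pid_mtx);
    assert(rc == 0);
    (void)rc;
    /* ... remove the PID entry ... */
    if (!tree_now_empty)
        pthread_mutex_unlock(&toy_pid_mtx);
}

/* Rollback path that always releases the lock. */
static void toy_rollback(void)
{
    int rc = pthread_mutex_lock(&toy_pid_mtx);
    assert(rc == 0);
    (void)rc;
    /* ... remove the PID entry ... */
    pthread_mutex_unlock(&toy_pid_mtx);
}

/* What a later exec would see when it tries to take the PID lock:
 * 0 if it could proceed, EBUSY if it would block. */
static int toy_next_exec_lock_state(void)
{
    int rc = pthread_mutex_trylock(&toy_pid_mtx);
    if (rc == 0)
        pthread_mutex_unlock(&toy_pid_mtx);
    return rc;
}
```

After `toy_put_keeping_lock(1)`, `toy_next_exec_lock_state()` reports `EBUSY` for a default (non-recursive) mutex, which corresponds to lwp_execve() waiting forever in lwp_pid_lock_take(); after `toy_rollback()` the lock is free.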

Provide the BSP and config used for verification

  • BSP:
    bsp/qemu-virt64-aarch64
  • .config:
    Enable RT-Smart in menuconfig
  • action:
    N/A (build verified locally on bsp/qemu-virt64-aarch64)
    ]

Intent for your PR

Choose one (mandatory):

  • This PR is mature and ready to be integrated into the repo

Code quality:

As part of this pull request, I've considered the following:

  • Already checked the difference between the PR and the old code
  • Style guide is adhered to, including spacing, naming and other styles
  • All redundant code is removed and cleaned up (no #if 0 code, no commented-out code)
  • All modifications are justified and do not affect other components or BSPs
  • I've commented appropriately where code is tricky
  • Code in this PR is of high quality
  • This PR complies with the RT-Thread code specification (checked with source formatting tools)
  • If this is a new BSP, a CI check has been added to .github/ALL_BSP_COMPILE.json; for details, see the BSP self-check link


Signed-off-by: zhangyang <gaoshanliukou@163.com>
@zhangyangysu requested a review from BernardXiong as a code owner May 3, 2026 02:21
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


yang.zhang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions

github-actions Bot commented May 3, 2026

👋 Thank you for your contribution to RT-Thread!

To ensure your code complies with RT-Thread's coding style, please run the code formatting workflow by following the steps below (if the formatting CI fails).


🛠 Steps

  1. Go to the Actions page
    Click to open workflow →

  2. Click Run workflow

    • Set files/directories to exclude (directories should end with "/")
    • Set the target branch to: master
    • Set the PR number to: 11356
  3. Wait for the workflow to complete
    The formatted code will be automatically pushed to your branch.

Once completed, commits will be pushed to the master branch automatically, and the related Pull Request will be updated.

If you have any questions, feel free to reach out. Thanks again for your contribution!

github-actions bot added the labels RT-Smart (RT-Thread Smart related PR or issues) and component: lwp (Component) on May 3, 2026
@github-actions

github-actions Bot commented May 3, 2026

📌 Code Review Assignment

🏷️ Tag: components

Reviewers: @Maihuanyi

Changed Files (Click to expand)
  • components/lwp/lwp.c
  • components/lwp/lwp_pid.c
  • components/lwp/lwp_pid.h

🏷️ Tag: components_lwp

Reviewers: @xu18838022837

Changed Files (Click to expand)
  • components/lwp/lwp.c
  • components/lwp/lwp_pid.c
  • components/lwp/lwp_pid.h

📊 Current Review Status (Last Updated: 2026-05-03 10:21 CST)


📝 Review Instructions

  1. Maintainers can refresh the review status by clicking here: 🔄 Refresh Status

  2. Comment LGTM/lgtm after confirming approval

  3. The PR must be approved by at least one maintainer before merging

ℹ️ Refreshing the CI status requires repository Write permission.

@BernardXiong (Member)

@zhangyangysu Thank you for your contribution. Please sign the CLA first, thanks.


Labels

component: lwp Component RT-Smart RT-Thread Smart related PR or issues

Projects

None yet


3 participants