rt-smart kernel assert fail after exec failure before task startup #11356
zhangyangysu wants to merge 1 commit into RT-Thread:master
When msh tries to execute a non-ELF path, lwp_execve() may allocate a PID before lwp_load() fails. The old error path only dropped the LWP reference, leaving the PID tree entry pointing to a freed LWP.

In an init-less boot flow, this can poison pid 1 after a failed command from msh. A later LWP launch may then treat the stale pid 1 entry as a valid parent LWP, resulting in invalid pgrp/session state and a job-control assertion during process exit.

Add lwp_pid_rollback() for exec/spawn failures before the process becomes runnable. Unlike lwp_pid_put(), it always releases the PID lock and does not enter the "no more pid allocation" state when the PID tree becomes empty.

Use the rollback helper in lwp_execve() failure paths after PID allocation.

Signed-off-by: zhangyang <gaoshanliukou@163.com>
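The stale-entry problem described above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `toy_lwp`, `toy_pid_table`, `toy_pid_get`, `toy_pid_rollback`, `toy_exec`, and `toy_pid_is_stale` are hypothetical names invented for illustration, not the RT-Thread implementation. The real code keeps PIDs in a tree and takes a PID lock, both of which are omitted here, and dropping the last LWP reference is modelled by clearing a `live` flag.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PID_MAX 8

/* Hypothetical stand-in for an LWP object; not the RT-Thread structure. */
struct toy_lwp {
    int pid;   /* 0 means "no pid assigned" */
    int live;  /* 0 models an object whose last reference was dropped */
};

/* Toy PID table: slot i holds the object registered under pid i, or NULL. */
struct toy_lwp *toy_pid_table[TOY_PID_MAX];

/* Bind lwp to the lowest free pid, starting at 1. Returns the pid or -1. */
int toy_pid_get(struct toy_lwp *lwp)
{
    for (int pid = 1; pid < TOY_PID_MAX; pid++) {
        if (toy_pid_table[pid] == NULL) {
            toy_pid_table[pid] = lwp;
            lwp->pid = pid;
            return pid;
        }
    }
    return -1;
}

/* Undo toy_pid_get() for an object that never became runnable. */
void toy_pid_rollback(struct toy_lwp *lwp)
{
    assert(lwp->pid > 0 && lwp->pid < TOY_PID_MAX);
    assert(toy_pid_table[lwp->pid] == lwp);
    toy_pid_table[lwp->pid] = NULL;
    lwp->pid = 0;
}

/*
 * Toy exec: allocate a pid first, then "load". is_elf == 0 stands in for
 * a loader failure. With use_rollback == 0 the error path only drops the
 * object (the old behaviour); otherwise it unregisters the pid first.
 * Returns the pid on success, -1 on failure.
 */
int toy_exec(struct toy_lwp *lwp, int is_elf, int use_rollback)
{
    lwp->live = 1;
    int pid = toy_pid_get(lwp);
    if (pid < 0) {
        lwp->live = 0;
        return -1;
    }
    if (!is_elf) {
        if (use_rollback)
            toy_pid_rollback(lwp);
        lwp->live = 0;  /* models dropping the last reference */
        return -1;
    }
    return pid;
}

/* Returns 1 if the slot for pid still points at a dead object. */
int toy_pid_is_stale(int pid)
{
    return toy_pid_table[pid] != NULL && !toy_pid_table[pid]->live;
}
```

In this model, a failed `toy_exec` with `use_rollback == 0` leaves slot 1 pointing at a dead object, which mirrors the poisoned pid 1 entry. With rollback enabled the slot is cleared, so the next launch can take pid 1 cleanly.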
Code Review Assignment
Tag: components, Reviewers: @Maihuanyi
Tag: components_lwp, Reviewers: @xu18838022837
Current Review Status (Last Updated: 2026-05-03 10:21 CST)
@zhangyangysu Thank you for your contribution. Please sign the CLA first, thanks.
PR description:
[
Tested on qemu-virt64-aarch64 with RT-Smart enabled.
Reproduce:
sd.bin must not contain an init binary, so no init process is started and the kernel
enters msh instead of ash. Run 'bin', which fails, then run 'hello'; the kernel
then hits an assertion failure:
Log:
\ | /
/ | \ 5.3.0 build May 3 2026 09:19:10
2006 - 2024 Copyright by RT-Thread team
[I/drivers.serial] Using /dev/ttyS0 as default console
mnt_init!
file system initialization done!
[I/rtdm.mnt] File system initialization done
hello rt-thread
[I/cpu.aa64] Call cpu 1 on success
msh />[I/cpu.aa64] Call cpu 2 on success
[I/cpu.aa64] Call cpu 3 on success
msh />bin
[E/load.elf] elf_file_read : read from offset: 0x0 error
[E/load.elf] elf_load_ehdr : elf_file_read failed, ret : -255
[E/load.elf] lwp_load : elf_file_load error, ret : -255
bin: command not found.
msh />hello
msh />hello world!
(session) assertion failed at function:lwp_jobctrl_on_exit, line number:53
please use: addr2line -e rtthread.elf -a -f
0xffff00000012c0d8 0xffff000000092f78 0xffff000000098028 0xffff000000097aa8 0xffff000000096198 0xffff00000009dfe0 0xffff000000086c3c 0xffff000000086be0
Why submit this PR
The RT-Smart kernel hits an assertion failure in lwp_jobctrl_on_exit when a failed exec from msh is followed by another LWP launch in an init-less boot flow.
What is your solution
Release the process ID and clean up the associated resources after an exec failure.
lwp_pid_put() must not be used for this: once lwp_pid_root becomes AVL_EMPTY, msh is left
holding pid_mtx. If ash is entered afterwards, it cannot fork any new process, and
lwp_execve() hangs in lwp_pid_lock_take().
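The hang described above can be sketched as a tiny state model. It assumes, as the description states, that the put path keeps the PID lock held once the last entry is removed, while the rollback path always releases it. All names here (`toy_pid_state`, `toy_lock_try`, `toy_alloc`, `toy_put`, `toy_rollback`) are hypothetical and do not reflect the actual RT-Thread code. A non-blocking lock attempt stands in for the blocking take, so a return value of -1 marks the point where a real caller would hang.

```c
#include <assert.h>

/* Toy model of the PID subsystem state; all names are hypothetical. */
struct toy_pid_state {
    int lock_held;  /* 1 while the toy PID lock is owned */
    int entries;    /* number of pids currently registered */
};

/* Non-blocking lock attempt: returns 1 on success, 0 if already held. */
int toy_lock_try(struct toy_pid_state *s)
{
    if (s->lock_held)
        return 0;
    s->lock_held = 1;
    return 1;
}

void toy_lock_release(struct toy_pid_state *s)
{
    assert(s->lock_held);
    s->lock_held = 0;
}

/* Register one pid. Returns -1 where a blocking lock take would hang. */
int toy_alloc(struct toy_pid_state *s)
{
    if (!toy_lock_try(s))
        return -1;
    s->entries++;
    toy_lock_release(s);
    return 0;
}

/*
 * Models the put path as described in this PR: when the last entry is
 * removed, the lock stays held ("no more pid allocation" state).
 */
int toy_put(struct toy_pid_state *s)
{
    if (!toy_lock_try(s))
        return -1;
    assert(s->entries > 0);
    s->entries--;
    if (s->entries == 0)
        return 0;  /* lock intentionally left held */
    toy_lock_release(s);
    return 0;
}

/* Models the rollback path: removes the entry and always unlocks. */
int toy_rollback(struct toy_pid_state *s)
{
    if (!toy_lock_try(s))
        return -1;
    assert(s->entries > 0);
    s->entries--;
    toy_lock_release(s);
    return 0;
}
```

In this model, `toy_alloc` followed by `toy_put` on an otherwise empty state leaves `lock_held` set, so the next `toy_alloc` fails. Replacing `toy_put` with `toy_rollback` keeps later allocations working.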
Provide the verified BSP and config
bsp/qemu-virt64-aarch64
Enable RT-Smart in menuconfig.
N/A (build verified locally on bsp/qemu-virt64-aarch64)
]
Intent for your PR
Choose one (Mandatory):
Code Quality:
As part of this pull request, I've considered the following:
All redundant code is removed and cleaned up (no #if 0 blocks and no commented-out code)