
WIP: rtapi_app no autostart #4206

Draft
hdiethelm wants to merge 4 commits into LinuxCNC:master from hdiethelm:rtapi_no_autostart


@hdiethelm

@hdiethelm hdiethelm commented Jun 28, 2026

Contributor

Current behavior:

  • rtapi_app starts on first loadrt command
  • rtapi_app exits on last rt component unload

This makes hal.get_realtime_type() hard to use when no RT component is loaded: #4205

This has other side effects, one example:

halcmd: debug 0 #rtapi_app starts, sets debug and immediately exits

Note: realtime scheduling unavailable (sched_setscheduler SCHED_FIFO: Operation not permitted).
  Process capabilities: cap_sys_nice=no cap_ipc_lock=no.
  Falling back to POSIX non-realtime.
  Fix: 'sudo make setcap' (preferred) or 'sudo make setuid' on rtapi_app.
  Override (testing only): set LINUXCNC_FORCE_REALTIME=1.
Note: Using POSIX non-realtime

halcmd: loadrt sum2 #rtapi_app starts again, not remembering the previous debug value

Note: realtime scheduling unavailable (sched_setscheduler SCHED_FIFO: Operation not permitted).
  Process capabilities: cap_sys_nice=no cap_ipc_lock=no.
  Falling back to POSIX non-realtime.
  Fix: 'sudo make setcap' (preferred) or 'sudo make setuid' on rtapi_app.
  Override (testing only): set LINUXCNC_FORCE_REALTIME=1.
Note: Using POSIX non-realtime
halcmd: loadrt sum2 #rtapi_app starts and stays running because a realtime component is loaded

Note: realtime scheduling unavailable (sched_setscheduler SCHED_FIFO: Operation not permitted).
  Process capabilities: cap_sys_nice=no cap_ipc_lock=no.
  Falling back to POSIX non-realtime.
  Fix: 'sudo make setcap' (preferred) or 'sudo make setuid' on rtapi_app.
  Override (testing only): set LINUXCNC_FORCE_REALTIME=1.
Note: Using POSIX non-realtime

halcmd: debug 0

This PR changes the behavior to:

  • rtapi_app starts on realtime start
  • rtapi_app exits on realtime stop

This makes the behavior identical to RTAI, where realtime start / realtime stop is required, and makes rtapi_app more deterministic.

Downside: until now, autostart let you run halcmd loadrt sum2 on uspace without first starting realtime. This no longer works, and it never worked with RTAI anyway.

halcmd loadrt sum2
error: No master found. Use realtime start to start one.
<commandline>:0: waitpid failed /home/hannes/linuxcnc-src/bin/rtapi_app sum2
<commandline>:0: /home/hannes/linuxcnc-src/bin/rtapi_app exited without becoming ready
<commandline>:0: insmod for sum2 failed, returned -1

However, I don't like having autostart without autostop; that is sure to create issues.

TBD

  • Will this create potential issues?
  • Check why I had to change the expected results for one test: The owner increased by two.
  • Remove the fork() inside halcmd, this is not needed any more

@BsAtHome

Contributor

I'm not sure I like it that the raster test needs to call realtime start. It seems to be performed on the wrong level.

@hdiethelm

Contributor Author

> I'm not sure I like it that the raster test needs to call realtime start. It seems to be performed on the wrong level.

Raster will most probably fail in RTAI the way it is implemented.
It uses: assert os.system('halcmd -f raster.hal') == 0, "raster.hal script failed" but it should probably use halrun.
Looks like a bad fix from my side, I have to look into it.

@hdiethelm

Contributor Author

Hmm, assert os.system('halrun raster.hal') == 0, "raster.hal script failed" of course doesn't just work. Halrun loads the script and immediately kills it.

Is there a way to keep it running until the test is finished? Looks like the raster test does some non-standard things, that's why it breaks when I use manual start.

@BsAtHome

Contributor

The real clue is that the test program builds a component that the raster.hal connects to and then starts the realtime from within. This is a legitimate construct as .hal files are just lines executed by halcmd. This construct is expected to work. Therefore, auto-start is a requirement, whereas auto-stop should not be.

(One very important thing is the line sets program 1000, which is not a value, but it initializes the HAL_PORT queue.)

@grandixximo

grandixximo commented Jun 28, 2026

Contributor

@BsAtHome makes the call for me: if a userspace component plus a .hal that brings RT up on demand is a legitimate construct, then autostart is a requirement and my "move it up to halrun / the harness" suggestion is wrong. Scratch that part.

That actually points at a smaller fix than this PR: keep autostart, drop only autostop. That preserves the raster construct with zero test edits, fixes the "debug value lost, master restarts on every loadrt" example from the PR description since the master now persists, and keeps realtime_type valid once anything RT is loaded. Mechanically it is mostly the master_process_socket_command change to !force_exit, while leaving the autostart path in main() alone. #4205 is unaffected either way: no-master-ever stays the honest UNINITIALIZED.
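To make the three policies in this thread easier to compare, here is a toy Python model. It is only an illustration: `ToyMaster`, its methods, and the default debug level of 1 are hypothetical and do not mirror the real rtapi_app code. It merely shows why per-process state such as the debug level is lost when the master stops itself after every command.

```python
class ToyMaster:
    """Toy stand-in for the rtapi_app master; NOT the real implementation."""

    def __init__(self, autostart, autostop):
        self.autostart = autostart
        self.autostop = autostop
        self.running = False
        self.debug = None        # per-process state, lost when the master exits
        self.components = set()
        self.starts = 0          # how many times a master process was started

    def start(self):
        self.running = True
        self.debug = 1           # hypothetical default level
        self.starts += 1

    def stop(self):
        self.running = False
        self.debug = None
        self.components.clear()

    def _ensure_running(self):
        if self.running:
            return
        if not self.autostart:
            raise RuntimeError("No master found. Use realtime start to start one.")
        self.start()

    def _maybe_autostop(self):
        # Old behavior: exit as soon as no RT component is loaded.
        if self.autostop and not self.components:
            self.stop()

    def set_debug(self, level):
        self._ensure_running()
        self.debug = level
        self._maybe_autostop()

    def loadrt(self, name):
        self._ensure_running()
        self.components.add(name)

    def unloadrt(self, name):
        self.components.discard(name)
        self._maybe_autostop()


# Old behavior: autostart plus autostop.
old = ToyMaster(autostart=True, autostop=True)
old.set_debug(0)       # master starts, sets debug, exits immediately
old.loadrt("sum2")     # a fresh master starts; the debug value is gone

# Narrow fix: keep autostart, drop autostop.
narrow = ToyMaster(autostart=True, autostop=False)
narrow.set_debug(0)
narrow.loadrt("sum2")  # same master, debug value preserved
narrow.unloadrt("sum2")  # master keeps running until an explicit stop()
```

In this model `old` ends up with two master starts and the default debug level, while `narrow` keeps a single master and the value that was set. A third instance with `autostart=False` raises on `loadrt()`, which corresponds to the strict behavior of the original PR.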

The one review point that still stands regardless of which way you go: the hal-show +2 owner shift is worth root-causing before rebaselining. It is the raw rtapi module id, and the two extra ids look like the master (App()) and hal_lib now landing ahead of the user comps. Worth confirming it is deterministic, and ideally having hal-show print the owner by name so the test stops being coupled to RT startup allocation order.

@hdiethelm

Contributor Author

> The real clue is that the test program builds a component that the raster.hal connects to and then starts the realtime from within. This is a legitimate construct as .hal files are just lines executed by halcmd. This construct is expected to work. Therefore, auto-start is a requirement, whereas auto-stop should not be.
>
> (One very important thing is the line sets program 1000, which is not a value, but it initializes the HAL_PORT queue.)

It is not legal as long as RTAI exists. As predicted, this test fails with RTAI. RTAI has no autostart:

Running test: ../tests/raster
RTAPI: ERROR: could not open shared memory (No such file or directory)
HAL: ERROR: could not initialize RTAPI
Traceback (most recent call last):
error: Test failed: Traceback (most recent call last):
  File "/home/hannes/linuxcnc-src/tests/raster/./test", line 172, in main
    c = hal.component("test")
hal.error: Invalid argument

  File "/home/hannes/linuxcnc-src/tests/raster/./test", line 259, in <module>
    exit(main())
         ~~~~^^
  File "/home/hannes/linuxcnc-src/tests/raster/./test", line 253, in main
    c.exit()
    ^
UnboundLocalError: cannot access local variable 'c' where it is not associated with a value
*** ../tests/raster: XFAIL: test run exited with 1
Runtest: 1 tests run, 0 successful, 1 failed + 0 expected, 0 skipped, 0 shmem errors
Failed: 
../tests/raster

With this PR, the test still fails, but for a different reason:

Running test: ../tests/raster
ERROR:  Can't remove RTAI modules, kill the following process(es) first
                     USER        PID ACCESS COMMAND
/dev/rtai_shm:       hannes    ....m python3
 32343error: Test failed: Traceback (most recent call last):
  File "/home/hannes/linuxcnc-src/tests/raster/./test", line 202, in main
    testInvalidOffset(prog, pin)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/hannes/linuxcnc-src/tests/raster/./test", line 80, in testInvalidOffset
    assert pin['fault_code'].value == FaultCodes.InvalidOffset.value
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

*** ../tests/raster: XFAIL: test run exited with 1
Runtest: 1 tests run, 0 successful, 1 failed + 0 expected, 0 skipped, 0 shmem errors
Failed: 
../tests/raster

@BsAtHome

Contributor

Regarding auto-start... I'm not worried about the LCNC code base. I'm worried about external installations.

@hdiethelm

Contributor Author

> Regarding auto-start... I'm not worried about the LCNC code base. I'm worried about external installations.

That's a point. Most people probably use uspace, not RTAI, so they never noticed that their setup will not work with RTAI because they use halcmd instead of halrun. Even the current tests are already broken... ;-)

I have to think about how to solve this issue in a way that does not break the whole concept behind.

Main issue: if you have autostart, chances are high that you forget to stop. However, this will probably surface quickly: if you don't stop, halcmd will complain about already loaded components and you will use halrun -U to exit.

That leaves only the case where you unload all RT components and expect rtapi_app to exit, which won't happen. I already added a warning for when you use start while rtapi_app is already running, so it should show up.

@grandixximo

Contributor

On the residual you flagged (unload all rt components and rtapi_app stays running): that is actually consistent, not a regression. On RTAI, unloading all components does not stop the RT base either, you realtime stop explicitly. So removing autostop while keeping autostart makes uspace match the RTAI model rather than diverge from it.

And the leak risk is bounded: halrun (halrun -U / realtime stop) and the linuxcnc script already stop on exit, so a lingering master only affects ad-hoc halcmd sessions, where the user already ends up using halrun -U as you noted. So I think keep autostart, drop autostop is the right narrow scope here.

@hdiethelm

hdiethelm commented Jun 28, 2026

Contributor Author

> On the residual you flagged (unload all rt components and rtapi_app stays running): that is actually consistent, not a regression. On RTAI, unloading all components does not stop the RT base either, you realtime stop explicitly. So removing autostop while keeping autostart makes uspace match the RTAI model rather than diverge from it.

RTAI does not auto-start either. Because of this, each LinuxCNC application, including halrun, runs realtime start at the beginning. However, one test was messed up: it does not run on RTAI, and with this PR it also stopped running on uspace until I added realtime start to it.

Since most people presumably use uspace, and some of them probably use halcmd without realtime start, being inconsistent here and perhaps emitting a deprecation warning can still be a good way to avoid breaking running setups.

> And the leak risk is bounded: halrun (halrun -U / realtime stop) and the linuxcnc script already stop on exit, so a lingering master only affects ad-hoc halcmd sessions, where the user already ends up using halrun -U as you noted. So I think keep auto-start, drop auto-stop is the right narrow scope here.

Agreed.

However, I don't like re-adding all the goto code and passing arguments to the master that are executed immediately, as was used for auto-start. So I might already split master / client fully in this PR and add auto-start-master functionality to the client. Let's see how this goes. Otherwise, I could just create a start_master() function if splitting gets too cumbersome.

This gives rtapi_app deterministic behaviour. No other changes are needed
except implementing start / stop in realtime, because all apps run:
realtime start at startup
realtime stop at exit
@hdiethelm hdiethelm force-pushed the rtapi_no_autostart branch from ee20601 to b8b76f2 Compare June 28, 2026 14:13
Use subprocess.Popen to start halrun and exit at the end.
@hdiethelm hdiethelm force-pushed the rtapi_no_autostart branch from b8b76f2 to 5fa7f0a Compare June 28, 2026 14:18
Comment thread tests/raster/test
# Use interactive mode to keep HAL running while it is needed;
# exit by writing "exit" at the end.
halrun = subprocess.Popen(["halrun", "-Is", "raster.hal"], stdin=subprocess.PIPE)
time.sleep(0.5) #Needs a short delay until halrun is up and running

Contributor Author


This is probably not nice. ToDo: Find another way to check if halrun is up.

@hdiethelm hdiethelm force-pushed the rtapi_no_autostart branch from 6351c6c to 36269fe Compare June 28, 2026 17:20
@hdiethelm hdiethelm force-pushed the rtapi_no_autostart branch from 36269fe to 08465cd Compare June 28, 2026 17:24
@hdiethelm

Contributor Author

So, auto-start is back, but in a different form than before, one that makes a later split into master / client easy.

halcmd loadrt sum2
WARNING: Deprecated: No master found. Use realtime start to start one.
  A master is started automaticaly.
Note: Using XENOMAI4 EVL realtime

@grandixximo Do you know how I can wait until halrun is up and running after halrun = subprocess.Popen(...) (#4206 (comment))? I would have to wait until stdout shows %% on the command line to know it is ready, but halrun.stdout.readline() just blocks forever, even if I add stdout=subprocess.PIPE, and I don't see a way to add a timeout.
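On the timeout part of the question alone, one generic way to bound a blocking readline() is to run it in a helper thread and wait on a queue. The sketch below is plain standard-library Python; `read_line_with_timeout` is a hypothetical helper name, and the child process is a stand-in Python one-liner rather than halrun. It only limits how long the caller waits; it does not make a process print a prompt it would otherwise not print.

```python
import queue
import subprocess
import sys
import threading


def read_line_with_timeout(stream, timeout):
    """Return one line from `stream`, or None if none arrives within `timeout` seconds."""
    result = queue.Queue(maxsize=1)
    reader = threading.Thread(
        target=lambda: result.put(stream.readline()), daemon=True
    )
    reader.start()
    try:
        return result.get(timeout=timeout)
    except queue.Empty:
        # The helper thread stays blocked in readline(); do not reuse the
        # stream for further reads after a timeout.
        return None


# Stand-in child: prints one line, then stays silent.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import time; print('ready', flush=True); time.sleep(5)"],
    stdout=subprocess.PIPE,
)
first = read_line_with_timeout(child.stdout, timeout=5.0)   # the 'ready' line
second = read_line_with_timeout(child.stdout, timeout=0.2)  # nothing arrives
child.kill()
child.wait()
```

The trade-off of this approach is the leftover reader thread after a timeout, which is acceptable for a short-lived test script but not for long-running code.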

@hdiethelm

Contributor Author

Version with master/client as a separate exe:
https://github.com/hdiethelm/linuxcnc-fork/tree/rtapi_no_autostart_master_client

It works, but it looks like more effort to finalize and could create more issues in the future.

There is still some duplicated code that has to be moved to uspace_rtapi_common.cc, and a library is also missing; it is just a POC.

So I would prefer to go forward with the single-exe approach and split it in a future PR.

@grandixximo

Contributor

I think the %% may never come over a pipe: the prompt looks gated on isatty(stdin) (halcmd_main.c:238), so with Popen stdin not a tty, halcmd prints no prompt and readline() would block forever. That might be why it hangs.

Maybe it is easier to sync through HAL than stdout here, since the test is already a HAL peer? Something like replacing the sleep(0.5) with a poll:

halrun = subprocess.Popen(["halrun", "-Is", "raster.hal"], stdin=subprocess.PIPE)
deadline = time.time() + 10
while time.time() < deadline:
    if hal.component_exists("raster") and hal.pin_has_writer("test.output"):
        break
    time.sleep(0.01)
else:
    raise RuntimeError("raster.hal did not come up within 10s")

pin_has_writer("test.output") should go true once raster.hal runs net output ... => test.output, and start is right after, so it would be deterministic and timeout-bounded without needing a pty. What do you think, would that work for your case?
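The same poll could be factored into a small reusable helper. This is a minimal sketch: `wait_until` is a hypothetical name, and the predicates in the demo are stand-ins so the snippet runs without HAL. It uses time.monotonic() rather than time.time() so the deadline is not affected by wall-clock adjustments.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.01):
    """Poll `predicate` until it returns true or `timeout` seconds elapse.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in predicates; in the raster test the predicate would be something like
#   lambda: hal.component_exists("raster") and hal.pin_has_writer("test.output")
started = time.monotonic()
ready = wait_until(lambda: time.monotonic() - started > 0.05, timeout=2.0)
timed_out = not wait_until(lambda: False, timeout=0.1)
```

Returning a boolean instead of raising leaves it to the test to decide whether a timeout is a failure or a skip.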

Two small things I noticed in the teardown: except TimeoutExpired looks like it should be subprocess.TimeoutExpired (otherwise a NameError), and input=b"exit" probably wants to be b"exit\n" so halcmd's fgets returns before EOF.
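A sketch of what such a teardown could look like with both points applied. `stop_interactive` is a hypothetical helper name, and the children here are stand-in Python one-liners rather than halrun, so the snippet runs anywhere.

```python
import subprocess
import sys


def stop_interactive(proc, farewell=b"exit\n", timeout=10.0):
    """Send `farewell` on stdin, wait for exit, and kill the process on timeout."""
    try:
        proc.communicate(input=farewell, timeout=timeout)
    except subprocess.TimeoutExpired:  # qualified name, so no NameError
        proc.kill()
        proc.communicate()  # reap the killed process
    return proc.returncode


# Stand-in for a well-behaved interactive process: exits 0 on an "exit" line.
polite = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; line = sys.stdin.readline(); "
     "sys.exit(0 if line.strip() == 'exit' else 3)"],
    stdin=subprocess.PIPE,
)
rc_polite = stop_interactive(polite)

# Stand-in for a process that ignores stdin and has to be killed.
stubborn = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    stdin=subprocess.PIPE,
)
rc_stubborn = stop_interactive(stubborn, timeout=0.2)
```

The first child exits cleanly with status 0; the second is killed after the timeout and reports a non-zero status.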

@BsAtHome

Contributor

A different solution:

Why not encapsulate the test in a test.hal file:

loadusr -w ./rastertest.py

and rename test into rastertest.py. You also need to remove the os.system('halrun -U') at the bottom because that would interfere with the run_tests' use of halrun.

Hal test files are run using halrun -f test.hal. That should start/stop RT appropriately and not require any hacked external process sequencing in the test.


3 participants