systemd can restart a service automatically when it crashes. You also want a delay between attempts and a cap on how many are made, so that a service that crashes instantly on startup doesn’t restart forever.

The unit

[Unit]
Description=My resilient worker
# Stop trying if it fails 5 times within 60s (a crash loop):
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
ExecStart=/usr/local/bin/worker
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Restart= values, in plain terms

on-failure — restart on a non-zero exit, a signal (crash), a timeout, or a watchdog ping miss. The usual choice.
always — restart even on a clean exit 0. Use for a daemon that should never stop.
on-abnormal — only on signal/timeout/watchdog, not on a non-zero exit code.
no (default) — never restart.

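RestartSec=5 waits 5 seconds before each restart attempt. The delay is fixed: it spaces the retries out but does not grow from one attempt to the next. Without it, systemd falls back to its 100 ms default, which makes a crash loop very tight.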
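If a delay that grows between attempts is wanted, newer systemd releases offer RestartSteps= and RestartMaxDelaySec=. The sketch below assumes systemd 254 or later; the helper name write_backoff_dropin, the file name backoff.conf, and the values 4 and 60 are illustrative choices, not something the unit above requires.

```shell
#!/bin/sh
# Sketch: write a drop-in that lets the restart delay grow from RestartSec=
# towards RestartMaxDelaySec= over RestartSteps= steps.
# Assumption: systemd 254 or newer (older versions do not know these settings).
# write_backoff_dropin is a hypothetical helper name.
write_backoff_dropin() {
    dir=$1    # drop-in directory, e.g. /etc/systemd/system/myapp.service.d
    mkdir -p "$dir" || return 1
    cat > "$dir/backoff.conf" <<'EOF'
[Service]
RestartSec=5
RestartSteps=4
RestartMaxDelaySec=60
EOF
}
```

Run as root, this would be called as write_backoff_dropin /etc/systemd/system/myapp.service.d, followed by systemctl daemon-reload. Longer delays can spread the restarts beyond a 60 second window, so StartLimitIntervalSec may need to be widened for the rate limit to still apply.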

The rate limit (StartLimit)

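StartLimitIntervalSec= and StartLimitBurst= belong in the [Unit] section, not [Service] (this is the layout in systemd 230 and later; older releases used StartLimitInterval= in [Service]). The example above allows at most 5 starts within 60 seconds. The next attempt is refused, automatic restarts stop, and the unit is left in a failed state instead of looping. Once the underlying problem is fixed, reset a unit that hit the limit with: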

sudo systemctl reset-failed myapp.service
sudo systemctl start myapp.service
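The limit only works if the window is long enough for the restarts to fit inside it. As a rough check, assuming the service dies immediately so that starts are RestartSec apart, the attempt after the last allowed one arrives about RestartSec * StartLimitBurst seconds after the first. The function name limit_can_trip is made up for this sketch, and the arithmetic ignores the time the service spends starting or running.

```shell
#!/bin/sh
# limit_can_trip RESTART_SEC BURST INTERVAL_SEC
# Approximation: succeeds if BURST restarts spaced RESTART_SEC apart
# fit inside the rate-limit window, so the next start would be refused.
limit_can_trip() {
    restart_sec=$1
    burst=$2
    interval=$3
    [ $((restart_sec * burst)) -lt "$interval" ]
}

# The unit above: 5 * 5 = 25 s, inside the 60 s window.
if limit_can_trip 5 5 60; then echo "limit can trip"; else echo "limit never trips"; fi
# Same RestartSec with systemd's default window of 10 s: 25 s does not fit.
if limit_can_trip 5 5 10; then echo "limit can trip"; else echo "limit never trips"; fi
```

The first call prints "limit can trip" and the second prints "limit never trips", which is why the example sets StartLimitIntervalSec=60 explicitly rather than relying on the default.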

Apply and verify

sudo systemctl daemon-reload
sudo systemctl restart myapp.service
journalctl -u myapp.service -f      # watch it die and come back
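To confirm that the rate limit, and not something else, stopped the restarts, the journal can be searched for systemd's refusal message. This is a small sketch: saw_start_limit is a hypothetical helper, and the matched phrase is an assumption about the log wording of your systemd version, so check your own journal if it does not match.

```shell
#!/bin/sh
# Reads journal text on stdin and succeeds if systemd reported that it
# refused a start because of the rate limit.
# Assumption: the log line contains the phrase matched below.
saw_start_limit() {
    grep -q 'Start request repeated too quickly'
}

# Usage on a host running systemd (shown as a comment, not executed here):
#   journalctl -u myapp.service -b --no-pager | saw_start_limit && echo "start limit hit"
```

Reading from stdin keeps the check usable on saved logs as well as on live journalctl output.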

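Tip: to escalate beyond a restart, add StartLimitAction=reboot (or poweroff) to the [Unit] section so a host whose critical service can’t recover reboots itself. To catch a process that hangs without exiting, set WatchdogSec= in [Service] and have the program call sd_notify(... WATCHDOG=1 ...) more often than that interval. If the pings stop, systemd treats it as a watchdog failure, which Restart=on-failure covers.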

Auto-restart a service when it crashes (with backoff and a rate limit)
