Use systemd Boot Assessment #28
base: master
What do we do on platforms without EFI vars? Just declare them unsupported by this mechanism?
This version uses EFI vars for simplicity, since it makes it easier to retrieve the current and default boot entries. This can work with any bootloader as long as we know this info and Automatic Boot Assessment is supported.
I replaced the EFI variables by reimplementing the Automatic Boot Assessment logic in health-checker. The default entry is calculated as the first entry, in descending order by name, that has not been disabled by the boot counting. It is quite bare-bones, but it works. One possible issue is entries with different kernel versions; in that case health-checker would require a more robust parser. To detect the current entry, I check the snapshot version of the currently mounted snapshot.
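The selection rule described in this comment can be sketched roughly as follows. This is only an illustration, not the code from the PR: the function name `pick_default_entry` and the `snapshot-N` entry names are hypothetical, and the sketch assumes the `+LEFT-DONE` counter suffix that systemd's Automatic Boot Assessment puts into entry file names.

```shell
#!/bin/sh
# Hypothetical helper: read boot entry file names on stdin and print the
# one that would be treated as the default entry.
pick_default_entry() {
    # Descending order by name; LC_ALL=C keeps the ordering byte-wise.
    LC_ALL=C sort -r | while IFS= read -r entry; do
        case "$entry" in
            # "+0" or "+0-DONE" means no tries left: skip this entry.
            *+0.conf|*+0-*.conf) continue ;;
        esac
        printf '%s\n' "$entry"
        break
    done
}

# Example: the newest entry has used up its tries, so the next one is chosen.
printf '%s\n' 'snapshot-10.conf' 'snapshot-12+0-3.conf' 'snapshot-11+2-1.conf' |
    pick_default_entry
```

A plain byte-wise sort is the simplest reading of "descending order based on the name"; as the comment notes, entries that differ in kernel version would need a more careful comparison.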
I thought a little bit more about the approach I have taken in this PR.
For example, if we are using My recommendation is to follow this path, or use |
Please adjust the README.md and manual page, especially with the now missing state file and the new kernel command-line options.
`grep` without `-q` pollutes stdout
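A minimal sketch of the difference, assuming a hypothetical helper `has_word` that is not part of the PR: with `-q`, `grep` reports the result only through its exit status and prints nothing, so the calling script's stdout stays clean.

```shell
#!/bin/sh
# has_word FILE WORD: succeed if WORD occurs as a whole word in FILE.
# -q suppresses the matching line, -w restricts the match to whole words.
has_word() {
    grep -qw -- "$2" "$1"
}

# Example: test the kernel command line without printing it.
if has_word /proc/cmdline "health-checker-reboot=disabled"; then
    echo "reboot disabled on the kernel command line"
fi
```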
README.md (outdated)
@@ -2,34 +2,47 @@

Check the state of a openSUSE MicroOS system after a reboot.

## Configure

All services that should be checked, need to be listed in the 'After' section.
- All services that should be checked, need to be listed in the 'After' section.
+ All services that should be checked need to be listed in the 'After' section.
f66e61f to 1c4cdfb
installed by system packages (and therefore coming through an RPM), the latter includes
plugins installed manually by the system admin. Every plugin is responsible to check
a special service or condition. For this, the plugin is called with the option
*check*. If this fails, the plugin will exit with the return value `1`, else
The check option is already mentioned some lines above, please merge this.
`0`. Have a look at the default plugins shipped in
`/usr/libexec/health-checker` for examples.

Its behavior depends if the system is using systemd-boot/grub2-bls (i.e.
Better: "is using systemd-boot, grub2-bls or any other bootloader following the Boot Loader Specification (BLS) or legacy..."
I wouldn't mention bootloader internal things like the /boot/efi/loader/entries, this can change in the future and a change is already under discussion.
Every new snapshot has a separate boot entry with a boot counter (according to
`/etc/kernel/tries`, which health-checker sets to 3 by default); when that
snapshot is booted for the first time, the bootloader (systemd-boot by
default on MicroOS, but grub2-bls is also supported) will decrease the amount
I would remove "(systemd-boot by default on MicroOS, but grub2-bls is also supported)": it is an internal implementation detail that is not relevant here and can change at any time, and then we forget to adjust the text here.
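As an illustration of the boot counter mentioned in the README text under review, the remaining tries can be read from an entry's file name. This sketch assumes the `+LEFT-DONE` suffix used by systemd's Automatic Boot Assessment; the function name `tries_left` and the entry names are hypothetical and do not come from the PR.

```shell
#!/bin/sh
# tries_left ENTRY: print the number of tries left for a boot entry file
# name, or nothing if the name carries no boot counter.
tries_left() {
    name=${1%.conf}                         # drop the .conf extension
    case "$name" in
        *+*)
            counter=${name##*+}             # text after the last "+", e.g. "2-1"
            printf '%s\n' "${counter%%-*}"  # keep only the LEFT part
            ;;
    esac
}

tries_left 'snapshot-11+2-1.conf'   # two tries left, one already used
tries_left 'snapshot-10+3.conf'     # fresh entry, nothing used yet
```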
number of timed configured in <filename>/etc/kernel/tries</filename>. If the
system still isn't working, then an emergency shell is started. If it is not
the first boot with the selected snapshot, then an emergency shell is
automatically started.
Does this mean we will never do an automated rollback with systemd-boot or grub2-bls?
# Do not reboot by default if the entry has been chosen manually or the reboot has
# been disabled in the kernel cmdline
# selected_entry contains the boot count, remove it before comparing it to the default entry
if ! grep -qw "health-checker-reboot=disabled" /proc/cmdline; then
Please document this option in the manual page.
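To make the behaviour that needs documenting concrete, here is a rough sketch of how the reboot decision could combine the kernel command-line option with the boot entry state. It is not the PR's implementation: the function `decide_action`, its arguments, and the printed action names are hypothetical, and the option spelling `disabled` simply follows the script line quoted above.

```shell
#!/bin/sh
# decide_action CURRENT DEFAULT TRIES_LEFT [CMDLINE_FILE]
# Print "reboot" or "emergency-shell" for a failed health check.
decide_action() {
    current=$1
    default=$2
    tries_left=$3
    cmdline=${4:-/proc/cmdline}   # overridable so the logic can be tested

    # Rebooting disabled on the kernel command line: never reboot.
    if grep -qw "health-checker-reboot=disabled" "$cmdline"; then
        echo emergency-shell
        return
    fi

    # Manually chosen (non-default) entry: reboot only when forced.
    if [ "$current" != "$default" ]; then
        if grep -qw "health-checker-reboot=force" "$cmdline"; then
            echo reboot
        else
            echo emergency-shell
        fi
        return
    fi

    # Default entry: reboot while the boot counter still has tries left.
    if [ "$tries_left" -gt 0 ]; then
        echo reboot
    else
        echo emergency-shell
    fi
}
```

Passing the command-line file as an optional argument is only a convenience for trying the logic outside a real boot, for example `decide_action entry-a entry-a 2 ./cmdline.txt`.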
Automatic Boot Assessment allows systemd-boot and systemd to mark boot entries as either good or bad, depending on whether they boot successfully.

This PR changes health-checker into a service that is part of the automatic boot assessment, by using the special target `boot-complete.target`. With systemd-boot and `/etc/kernel/tries` greater than 0, the current boot entry gets renamed to start the counting.

If the health-checker tests pass without errors, the boot entry is marked as good by systemd-bless-boot. If there is any error, health-checker decides whether to reboot or to start an emergency shell:

- If the current entry is the default one and still has some tries left (i.e. it has not been marked as `bad`), then reboot.
- If the current entry is the default one and there are no tries left, then start an emergency shell.

The default entry is picked among the ones that are known to work or have not been tested yet, so the emergency shell is only started when all entries have been tried (this could lead to many reboots). If the user chooses an entry instead of letting systemd-boot pick the default one, then health-checker will not reboot by default (a reboot can be forced with the argument below).

I have also added two kernel cmdline arguments to fix #8:

- `health-checker-reboot`:
  - `force`: always reboot when health-checker fails and the loaded boot entry is not the default one
  - `disable`: health-checker never reboots
- `health-checker=disabled`: skip all tests and mark health-checker as successful. This breaks systemd Automatic Boot Counting but helps with debugging or some edge cases.

Requirements

`/etc/kernel/tries` must contain a number greater than 0. Currently, I am shipping this file in the health-checker package.

Current blockers