Skip to content

Add nomad monitor export command #26178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from
Draft

Add nomad monitor export command #26178

wants to merge 19 commits into from

Conversation

tehut
Copy link
Contributor

@tehut tehut commented Jul 1, 2025

Description

The nomad monitor export command introduces the ability for nomad to export logs a given agent has written to journald or to the nomad log file. Journald logs can be requested for a specific period of time while we just return the agent's entire nomad log file. Introducing this RPC is a prerequisite for adding journald logs to the nomad support bundle.

Testing & Reproduction steps

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@tehut tehut mentioned this pull request Jul 1, 2025
6 tasks
@tehut tehut force-pushed the f-NMD-855/monitor_external branch from 3a0e003 to 4a608b0 Compare July 1, 2025 17:42
@tehut tehut changed the title F nmd 855/monitor external Add nomad monitor export command Jul 1, 2025
@tehut tehut force-pushed the f-NMD-855/monitor_external branch from b105697 to e430e42 Compare July 2, 2025 00:25
@tehut tehut force-pushed the f-NMD-855/monitor_external branch from 68fdc38 to 2ddf97c Compare July 2, 2025 03:07
@tehut tehut force-pushed the f-NMD-855/monitor_external branch from 2ddf97c to 2c0148a Compare July 2, 2025 03:17
@tehut tehut force-pushed the f-NMD-855/monitor_external branch from 2c0148a to 8863e01 Compare July 2, 2025 03:21
@tehut tehut force-pushed the f-NMD-855/monitor_external branch from 8863e01 to 71a05f0 Compare July 2, 2025 03:25
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're not quite done @tehut but I made a first pass over this.

Sets the specific server to monitor

-service-name <service-name>
Sets the systemd unit name to query journalctl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should point out which fields only work on agents running on Linux.

Sets the systemd unit name to query journalctl

-log-since <int>
Sets the log period for journald logs. Defaults to 72 and ignored if on-disk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Sets the log period for journald logs. Defaults to 72 and ignored if on-disk
Sets the log period for journald logs. Defaults to 72 and ignored if -on-disk=true

Also, do we have to restrict ourselves to integer number of hours here? Can we accept a humanize-formatted duration string like we do everywhere else?

Comment on lines 524 to 530
urlString := baseURL +
"on_disk=" + tc.onDisk +
"&service_name=" + tc.serviceName +
"&follow=" + tc.follow +
"&node_id=" + tc.nodeID +
"&server_id=" + tc.serverID +
"&mocked=" + "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: you can use https://pkg.go.dev/net/url#Values here.

must.Eq(t, err.(HTTPCodedError).Code(), tc.errCode)
return
} else {
must.Unreachable(t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Unreachable" is totally correct, but this will report the error message that we shouldn't be getting:

Suggested change
must.Unreachable(t)
must.NoError(t, err)

Comment on lines +254 to +243
} else if !aclObj.AllowAgentRead() {
handleStreamResultError(structs.ErrPermissionDenied, pointer.Of(int64(403)), encoder)
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of wonder if agent:read is too low because we're giving access to other service logs, but the agent ACL is actually pretty powerful already, so maybe that's ok? Might be worth reviewing with @dduzgun-security

Comment on lines 263 to 254
if args.MockMonitor != nil {
mon = args.MockMonitor
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we do decide to keep the mock monitor infrastructure in place, I'd really like if we could figure out a way to inject this at the time we create the monitor and not via request argument. Because this leaves the possibility open for a user to request we're using the mock. That's probably harmless but it's not intended to be user-accessible, so I'd rather just make it impossible. (Plus security audits tend to ding us on this kind of thing.)

Comment on lines 285 to 313
opts := monitor.MonitorExportOpts{
LogSince: args.LogSince,
ServiceName: args.ServiceName,
NomadLogPath: args.NomadLogPath,
OnDisk: args.OnDisk,
Follow: args.Follow,
}

logCh := mon.MonitorExport(opts)

initialOffset := int64(0)
var (
eofCancelCh chan error
eofCancel bool
)
eofCancel = !opts.Follow
// receive logs and build frames
streamReader := cstructs.NewStreamReader(logCh)
go func() {
defer framer.Destroy()

if err := streamReader.StreamFixed(ctx, initialOffset, "", 0, framer, eofCancelCh, eofCancel); err != nil {
select {
case errCh <- err:
case <-ctx.Done():
}
}
}()
var streamErr error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk seems to be the primary difference between this and the Agent.monitor RPC, which as you'll guess from my other comments suggests to me they could share a lot of code. It seems like we're repeating all the stream handling code in the RPC handlers when we could reuse that if we have 2 different monitor implementations with the same interface (i.e. the Start/End interface), instead of introducing a new method into the existing struct.

"path": "monitor"
"routes": [
{
"title": "monitor",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"title": "monitor",
"title": "Overview",

change to match how we label section index pages. And thanks for merging the docs directory changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/docs Documentation issues and enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants