Add nomad monitor export command #26178

Draft

tehut wants to merge 19 commits into main from f-NMD-855/monitor_external

+7,640 −134

Contributor

tehut commented Jul 1, 2025 (edited)

Description

The nomad monitor export command lets Nomad export the logs that a given agent has written to journald or to the Nomad log file. Journald logs can be requested for a specific period of time, whereas the agent's Nomad log file is always returned in full. Introducing this RPC is a prerequisite for adding journald logs to the Nomad support bundle.

Testing & Reproduction steps

Links

Contributor Checklist

Changelog Entry: If this PR changes user-facing behavior, please generate and add a changelog entry using the make cl command.
Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and ensure regressions will be caught.
Documentation: If the change impacts user-facing functionality such as the CLI, API, UI, and job configuration, please update the Nomad website documentation to reflect this. Refer to the website README for docs guidelines. Please also consider whether the change requires notes within the upgrade guide.

Reviewer Checklist

Backport Labels: Please add the correct backport labels as described by the internal backporting document.
Commit Type: Ensure the correct merge method is selected, which should be "squash and merge" in the majority of situations. The main exceptions are long-lived feature branches or merges where history should be preserved.
Enterprise PRs: If this is an enterprise-only PR, please add any required changelog entry within the public repository.

tehut mentioned this pull request

DRAFT- F nmd 141/monitor external #26065

Closed

6 tasks

tehut force-pushed the f-NMD-855/monitor_external branch from 3a0e003 to 4a608b0 Compare

July 1, 2025 17:42

tehut changed the title ~~F nmd 855/monitor external~~ Add nomad monitor export command

tehut force-pushed the f-NMD-855/monitor_external branch from b105697 to e430e42 Compare

July 2, 2025 00:25

tehut force-pushed the f-NMD-855/monitor_external branch from 68fdc38 to 2ddf97c Compare

July 2, 2025 03:07

tehut force-pushed the f-NMD-855/monitor_external branch from 2ddf97c to 2c0148a Compare

July 2, 2025 03:17

tehut force-pushed the f-NMD-855/monitor_external branch from 2c0148a to 8863e01 Compare

July 2, 2025 03:21

tehut force-pushed the f-NMD-855/monitor_external branch from 8863e01 to 71a05f0 Compare

July 2, 2025 03:25

tgross reviewed

Member

tgross left a comment

I know you're not quite done @tehut but I made a first pass over this.

command/agent_monitor_export.go Outdated (resolved)

command/agent_monitor_export.go Outdated

+                  Sets the specific server to monitor
+                -service-name <service-name>
+                  Sets the systemd unit name to query journalctl

Member

tgross Jul 2, 2025

We probably should point out which fields only work on agents running on Linux.

command/agent_monitor_export.go Outdated (resolved)

command/agent_monitor_export.go Outdated

+                  Sets the systemd unit name to query journalctl
+                -log-since <int>
+                  Sets the log period for journald logs. Defaults to 72 and ignored if on-disk

Member

tgross Jul 2, 2025

Suggested change

-                  Sets the log period for journald logs. Defaults to 72 and ignored if on-disk
+                  Sets the log period for journald logs. Defaults to 72 and ignored if -on-disk=true

Also, do we have to restrict ourselves to an integer number of hours here? Can we accept a humanize-formatted duration string like we do everywhere else?
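For illustration, here is a minimal sketch of what accepting a duration string could look like. It assumes the intended format is the Go duration syntax understood by time.ParseDuration (for example "72h" or "90m"); the helper name parseLogSince and the constant defaultLogSince are hypothetical and do not come from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// defaultLogSince mirrors the 72-hour default described in the help text.
// The constant name is hypothetical.
const defaultLogSince = 72 * time.Hour

// parseLogSince is a hypothetical helper that turns the raw -log-since
// flag value into a time.Duration, using the default when the flag is unset.
func parseLogSince(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultLogSince, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid -log-since value %q: %w", raw, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("-log-since must be positive, got %s", d)
	}
	return d, nil
}

func main() {
	for _, raw := range []string{"", "90m", "36h", "three days"} {
		d, err := parseLogSince(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", raw, d)
	}
}
```

A duration type also leaves the conversion to whatever unit journalctl needs in one place, rather than baking "hours" into the flag itself.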

command/agent_monitor_export.go Outdated (resolved)

command/agent/agent_endpoint_test.go Outdated

Comment on lines 524 to 530

+              				urlString := baseURL +
+              					"on_disk=" + tc.onDisk +
+              					"&service_name=" + tc.serviceName +
+              					"&follow=" + tc.follow +
+              					"&node_id=" + tc.nodeID +
+              					"&server_id=" + tc.serverID +
+              					"&mocked=" + "true"

Member

tgross Jul 2, 2025

Nitpick: you can use https://pkg.go.dev/net/url#Values here.
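A small sketch of that suggestion, using the query keys from the test snippet above. The helper name buildExportQuery and the base path are placeholders, not names from the PR; url.Values.Encode escapes each value and emits the keys in sorted order.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildExportQuery is a hypothetical helper that builds the query string
// for the test request with url.Values instead of string concatenation.
func buildExportQuery(onDisk, serviceName, follow, nodeID, serverID string) string {
	q := url.Values{}
	q.Set("on_disk", onDisk)
	q.Set("service_name", serviceName)
	q.Set("follow", follow)
	q.Set("node_id", nodeID)
	q.Set("server_id", serverID)
	q.Set("mocked", "true")
	// Encode percent-escapes values and sorts the output by key.
	return q.Encode()
}

func main() {
	// Placeholder base path; the real value of baseURL is not shown in the excerpt.
	baseURL := "/placeholder?"
	fmt.Println(baseURL + buildExportQuery("true", "my service", "false", "node-1", ""))
}
```

Besides being shorter, this avoids malformed URLs when a test case value contains spaces or an ampersand.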

command/agent/agent_endpoint_test.go Outdated

+              						must.Eq(t, err.(HTTPCodedError).Code(), tc.errCode)
+              						return
+              					} else {
+              						must.Unreachable(t)

Member

tgross Jul 2, 2025

"Unreachable" is totally correct, but this will report the error message that we shouldn't be getting:

Suggested change

-            						must.Unreachable(t)
+            						must.NoError(t, err)

client/agent_endpoint.go

Comment on lines +254 to +243

              	} else if !aclObj.AllowAgentRead() {
              		handleStreamResultError(structs.ErrPermissionDenied, pointer.Of(int64(403)), encoder)
              		return
              	}

Member

tgross Jul 2, 2025

I sort of wonder if agent:read is too low because we're giving access to other service logs, but the agent ACL is actually pretty powerful already, so maybe that's ok? Might be worth reviewing with @dduzgun-security

client/agent_endpoint.go Outdated

Comment on lines 263 to 254

              	if args.MockMonitor != nil {
              		mon = args.MockMonitor
              	}

Member

tgross Jul 2, 2025

Even if we do decide to keep the mock monitor infrastructure in place, I'd really like it if we could figure out a way to inject this at the time we create the monitor and not via a request argument, because this leaves open the possibility of a user requesting that we use the mock. That's probably harmless, but it's not intended to be user-accessible, so I'd rather just make it impossible. (Plus security audits tend to ding us on this kind of thing.)
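One possible shape for this, sketched with stand-in types only: the endpoint receives a monitor factory when it is constructed, and the request struct carries no monitor at all. Every name here (Monitor, MonitorFactory, Endpoint, ExportRequest, fixedMonitor) is hypothetical and not taken from the Nomad code base.

```go
package main

import "fmt"

// Monitor is a stand-in for the interface the endpoint streams from.
type Monitor interface {
	Start() <-chan []byte
	Stop()
}

// MonitorFactory builds a Monitor. Production code would pass a real
// factory; tests pass one that returns a mock.
type MonitorFactory func() Monitor

// fixedMonitor is a test double that replays a fixed set of lines.
type fixedMonitor struct{ lines []string }

func (m *fixedMonitor) Start() <-chan []byte {
	ch := make(chan []byte, len(m.lines))
	for _, l := range m.lines {
		ch <- []byte(l)
	}
	close(ch)
	return ch
}

func (m *fixedMonitor) Stop() {}

// ExportRequest deliberately has no monitor field, so a caller cannot
// ask the endpoint to use a mock.
type ExportRequest struct {
	ServiceName string
}

// Endpoint owns the factory, which is fixed at construction time.
type Endpoint struct {
	newMonitor MonitorFactory
}

func NewEndpoint(f MonitorFactory) *Endpoint {
	return &Endpoint{newMonitor: f}
}

// Export drains the monitor created by the injected factory.
func (e *Endpoint) Export(req ExportRequest) []string {
	mon := e.newMonitor()
	defer mon.Stop()
	var out []string
	for b := range mon.Start() {
		out = append(out, string(b))
	}
	return out
}

func main() {
	ep := NewEndpoint(func() Monitor {
		return &fixedMonitor{lines: []string{"line 1", "line 2"}}
	})
	fmt.Println(ep.Export(ExportRequest{ServiceName: "nomad"}))
}
```

With this layout the mock is reachable only from code that constructs the endpoint, which is the property the comment asks for.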

client/agent_endpoint.go Outdated

Comment on lines 285 to 313

              	opts := monitor.MonitorExportOpts{
              		LogSince:     args.LogSince,
              		ServiceName:  args.ServiceName,
              		NomadLogPath: args.NomadLogPath,
              		OnDisk:       args.OnDisk,
              		Follow:       args.Follow,
              	}
              	logCh := mon.MonitorExport(opts)
              	initialOffset := int64(0)
              	var (
              		eofCancelCh chan error
              		eofCancel   bool
              	)
              	eofCancel = !opts.Follow
              	// receive logs and build frames
              	streamReader := cstructs.NewStreamReader(logCh)
              	go func() {
              		defer framer.Destroy()
              		if err := streamReader.StreamFixed(ctx, initialOffset, "", 0, framer, eofCancelCh, eofCancel); err != nil {
              			select {
              			case errCh <- err:
              			case <-ctx.Done():
              			}
              		}
              	}()
              	var streamErr error

Member

tgross Jul 2, 2025

This chunk seems to be the primary difference between this and the Agent.monitor RPC, which as you'll guess from my other comments suggests to me they could share a lot of code. It seems like we're repeating all the stream handling code in the RPC handlers when we could reuse that if we have 2 different monitor implementations with the same interface (i.e. the Start/End interface), instead of introducing a new method into the existing struct.
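To make the idea concrete, here is a rough sketch in which two monitor implementations satisfy one interface and a single helper does the stream handling for both. All names (StartStopper, liveMonitor, exportMonitor, collect, collectAll) are hypothetical, and the channel-of-bytes shape is an assumption rather than the actual Nomad interface.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// StartStopper is a hypothetical name for the shared monitor interface.
type StartStopper interface {
	Start() <-chan []byte
	Stop()
}

// liveMonitor stands in for the existing streaming monitor.
type liveMonitor struct{ events []string }

func (m *liveMonitor) Start() <-chan []byte { return feed(m.events) }
func (m *liveMonitor) Stop()                {}

// exportMonitor stands in for a monitor that replays stored logs.
type exportMonitor struct{ stored string }

func (m *exportMonitor) Start() <-chan []byte {
	return feed(strings.SplitAfter(m.stored, "\n"))
}
func (m *exportMonitor) Stop() {}

// feed sends each chunk on a buffered channel and then closes it.
func feed(chunks []string) <-chan []byte {
	ch := make(chan []byte, len(chunks))
	for _, c := range chunks {
		ch <- []byte(c)
	}
	close(ch)
	return ch
}

// collect is the shared stream handling: it works for any StartStopper,
// so an RPC handler would not need a copy per monitor type.
func collect(ctx context.Context, m StartStopper) (string, error) {
	defer m.Stop()
	var sb strings.Builder
	ch := m.Start()
	for {
		select {
		case <-ctx.Done():
			return sb.String(), ctx.Err()
		case b, ok := <-ch:
			if !ok {
				return sb.String(), nil
			}
			sb.Write(b)
		}
	}
}

// collectAll runs collect with a background context.
func collectAll(m StartStopper) (string, error) {
	return collect(context.Background(), m)
}

func main() {
	monitors := []StartStopper{
		&liveMonitor{events: []string{"event 1\n", "event 2\n"}},
		&exportMonitor{stored: "line 1\nline 2\n"},
	}
	for _, m := range monitors {
		out, err := collectAll(m)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

The handler-specific part then shrinks to choosing which implementation to construct from the request options.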

tehut force-pushed the f-NMD-855/monitor_external branch from 5e7423f to 18ed5c1 Compare

July 3, 2025 01:23

tehut force-pushed the f-NMD-855/monitor_external branch from 18ed5c1 to ec7eb62 Compare

July 3, 2025 01:44

tehut force-pushed the f-NMD-855/monitor_external branch from 0459d61 to 594237f Compare

July 10, 2025 19:32

tehut force-pushed the f-NMD-855/monitor_external branch from 594237f to 9215971 Compare

July 10, 2025 19:39

tehut force-pushed the f-NMD-855/monitor_external branch from 9215971 to e546200 Compare

July 10, 2025 20:09


          update nav post-docs fix

ab483cb

tehut force-pushed the f-NMD-855/monitor_external branch from e546200 to ab483cb Compare

July 10, 2025 20:36


          revert client_agent_endpoint use of shared helper and clean up others

6b605d4


          fix website parsing

98ca6a0


          fix plaintext option that was breaking monitor tests

7836c93


          api helper and nav cleanup

0f2f851

aimeeu reviewed

website/data/commands-nav-data.json

-                  "path": "monitor"
+                  "routes": [
+                    {
+                      "title": "monitor",

Contributor

aimeeu Jul 11, 2025

Suggested change

-                      "title": "monitor",
+                      "title": "Overview",

Change this to match how we label section index pages. And thanks for merging the docs directory changes!


          command streamFrames helper test

259ee8d


          fixup for clearer channel name

cf2d9f0

tehut force-pushed the f-NMD-855/monitor_external branch from c00547f to cf2d9f0 Compare

July 14, 2025 01:37
