Skip to content

More support export logs #102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Sep 10, 2024

Conversation

philrhurst
Copy link
Collaborator

The pgo support export feature gathers Postgres logs from replicas in addition to the primary and pgbackrest logs where appropriate (in addition to those gathered in the container logs.

Philip Hurst added 8 commits August 14, 2024 17:04
- Pod is the RepoHost
- Pod is currently a replica
- Container is pgBackRest

added a function to return labels for the following;

- Replica instances
- RepoHost instance
- PG Conf files from db instances (anything in the DATADIR matching *.conf
- pgBackRest log files from the db instances
- pgBackRest log files from the repo host
…primary and replicas) and the pgBackRest logs (from the primary, replicas, and repo host).

- gatherAllPostgresLogs
- gatherDbBackrestLogs
- gatherRepoHostLogs

Because of these additional logs, we also need to restructure where the logs are stored within the Support Export output. We create a folder for each pod within the /logs directory; and we place the container logs and the file logs there when appropriate.
The gatherPostgresqlLogs function was replaced with gatherAllPostgresLogs. We shoudl remove the old function since it is no longer used.
@caitlinstrong
Copy link

I tested this in my local environment (CPK 5.6.0, PG 16) and confirmed:

  • Postgres log files are retrieved from the primary and replica db instances
  • Postgres conf files are retrieved from the primary and replica db instances
  • pgBackRest log files are retrieved from the primary and replica db instances
  • pgBackRest log files are retrieved from the repo host
  • the 'log' directory contains a folder for each pod in the cluster. Each pod folder has a 'container' folder for container logs, and a 'pgdata' folder for the Postgres logs/conf files and pgBackRest logs:
📦logs
 ┣ 📂hippo-backup-9bnh-4shxp
 ┃ ┗ 📂containers
 ┃ ┃ ┗ 📜pgbackrest.log
 ┣ 📂hippo-instance1-lqd7-0 (REPLICA)
 ┃ ┣ 📂containers
 ┃ ┃ ┣ 📜database.log
 ┃ ┃ ┣ 📜nss-wrapper-init.log
 ┃ ┃ ┣ 📜pgbackrest-config.log
 ┃ ┃ ┣ 📜pgbackrest.log
 ┃ ┃ ┣ 📜postgres-startup.log
 ┃ ┃ ┗ 📜replication-cert-copy.log
 ┃ ┗ 📂pgdata
 ┃ ┃ ┣ 📂pg16
 ┃ ┃ ┃ ┣ 📂log
 ┃ ┃ ┃ ┃ ┗ 📜postgresql-Tue.log
 ┃ ┃ ┃ ┣ 📜pg_hba.conf
 ┃ ┃ ┃ ┣ 📜pg_ident.conf
 ┃ ┃ ┃ ┣ 📜postgresql.auto.conf
 ┃ ┃ ┃ ┣ 📜postgresql.base.conf
 ┃ ┃ ┃ ┗ 📜postgresql.conf
 ┃ ┃ ┗ 📂pgbackrest
 ┃ ┃ ┃ ┗ 📂log
 ┃ ┃ ┃ ┃ ┗ 📜db-restore.log
 ┣ 📂hippo-instance1-zftn-0 (PRIMARY)
 ┃ ┣ 📂containers
 ┃ ┃ ┣ 📜database.log
 ┃ ┃ ┣ 📜nss-wrapper-init.log
 ┃ ┃ ┣ 📜pgbackrest-config.log
 ┃ ┃ ┣ 📜pgbackrest.log
 ┃ ┃ ┣ 📜postgres-startup.log
 ┃ ┃ ┗ 📜replication-cert-copy.log
 ┃ ┗ 📂pgdata
 ┃ ┃ ┣ 📂pg16
 ┃ ┃ ┃ ┣ 📂log
 ┃ ┃ ┃ ┃ ┗ 📜postgresql-Tue.log
 ┃ ┃ ┃ ┣ 📜pg_hba.conf
 ┃ ┃ ┃ ┣ 📜pg_ident.conf
 ┃ ┃ ┃ ┣ 📜postgresql.auto.conf
 ┃ ┃ ┃ ┣ 📜postgresql.base.conf
 ┃ ┃ ┃ ┗ 📜postgresql.conf
 ┃ ┃ ┗ 📂pgbackrest
 ┃ ┃ ┃ ┗ 📂log
 ┃ ┃ ┃ ┃ ┗ 📜db-stanza-create.log
 ┣ 📂hippo-repo-host-0
 ┃ ┣ 📂containers
 ┃ ┃ ┣ 📜nss-wrapper-init.log
 ┃ ┃ ┣ 📜pgbackrest-config.log
 ┃ ┃ ┣ 📜pgbackrest-log-dir.log
 ┃ ┃ ┗ 📜pgbackrest.log
 ┃ ┗ 📂pgbackrest
 ┃ ┃ ┗ 📂repo1
 ┃ ┃ ┃ ┗ 📂log
 ┃ ┃ ┃ ┃ ┣ 📜db-backup.log
 ┃ ┃ ┃ ┃ ┗ 📜db-expire.log
 ┗ 📜cli

@benjaminjb
Copy link
Collaborator

I'm curious, what do you get if you run this command while a backup job is running for the cluster?

if err != nil {
writeInfo(cmd, err.Error())
if strings.Contains(stderr, "No such file or directory") {
writeDebug(cmd, "Cannot find any Backrest log files. This is acceptable in some configurations.\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you (manually) tested out a scenario where there were no pgbackrest log files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but I found some places when a continue would help the CLI recover and still gather. I'll push those improvements in a bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this before, but I think this stderr check could go below in the stderr block (lines 1143-1146 currently)

@philrhurst
Copy link
Collaborator Author

@benjaminjb

I'm curious, what do you get if you run this command while a backup job is running for the cluster?

For this one - the CLI is pulling the Pod logs from the Pod created by the CronJob. Depending upon the backrest config, there can be a lot of detail there. If log-level-file is turned on, then the backrest repo Pod has logs for the backup (I don't think the CronJob will log anything to file). The CLI will pull whatever has been written to the file whenever the CLI runs that function.

Philip Hurst added 3 commits August 30, 2024 17:41
minor refactoring to improve edge cases without backrest logs. renaming functions for consistency.
Comment on lines 999 to 1012
if err != nil {
if strings.Contains(stderr, "No such file or directory") {
writeDebug(cmd, "Cannot find any Postgres log files. This is acceptable in some configurations.\n")
}
if apierrors.IsForbidden(err) {
writeInfo(cmd, err.Error())
return nil
}
continue
}
if stderr != "" {
writeDebug(cmd, stderr)
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
if strings.Contains(stderr, "No such file or directory") {
writeDebug(cmd, "Cannot find any Postgres log files. This is acceptable in some configurations.\n")
}
if apierrors.IsForbidden(err) {
writeInfo(cmd, err.Error())
return nil
}
continue
}
if stderr != "" {
writeDebug(cmd, stderr)
continue
}
if err != nil {
if apierrors.IsForbidden(err) {
writeInfo(cmd, err.Error())
return nil
}
continue
}
if stderr != "" {
if strings.Contains(stderr, "No such file or directory") {
writeDebug(cmd, "Cannot find any Postgres log files. This is acceptable in some configurations.\n")
} else {
writeDebug(cmd, stderr)
}
continue
}

What do you think about moving the stderr check into its own discrete section?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can simplify this further if I can assume that any error would populate err and stderr. Is that a safe assumption in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically reduce that section to the following.

My overall goal is just to emit a helpful message when no PG logs are found, but still continue or return nil` when appropriate.

		// Get Postgres Log Files
		stdout, stderr, err := Executor(exec).listPGLogFiles(numLogs)
		if err != nil {
			if strings.Contains(stderr, "No such file or directory") {
				writeDebug(cmd, "Cannot find any Postgres log files. This is acceptable in some configurations.\n")
			}
			if apierrors.IsForbidden(err) {
				writeInfo(cmd, err.Error())
				return nil
			}
			writeDebug(cmd, stderr)
			continue
		}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can simplify this further if I can assume that any error would populate err and stderr. Is that a safe assumption in this case?

I don't know if that's a safe assumption. The way I think about it, "stderr" is when the function returns an error, while "err" is the possibility that the function doesn't even run.

Some code for an executor:

exec, err := remotecommand.NewSPDYExecutor(config, "POST", request.URL())
if err == nil {
err = exec.Stream(remotecommand.StreamOptions{
Stdin: stdin,
Stdout: stdout,
Stderr: stderr,
})
}

For instance, if you run the command to find files, and there are no files, we get the stderr: "no files". But if you run the command to find files and Kubernetes has deleted your pod, we get an err telling us there's no pod with that name.

for _, pod := range dbPods.Items {
writeDebug(cmd, fmt.Sprintf("Pod Name is %s\n", pod.Name))

podExec, err := util.NewPodExecutor(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔
Could you declare this NewPodExecutor outside of the loop and then reuse it on every loop?

Comment on lines 1227 to 1229
if strings.Contains(stderr, "No such file or directory") {
writeDebug(cmd, "Cannot find any Repo Host log files. This is acceptable in some configurations.\n")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Refactor error handling for finding PG and backrest logs. I refactored the error handling to use an OR condition to capture the state where no logs are available. This is because err and/or stderr may report 'no results' depending upon the way we call `ls` (i.e.  with/without piping the results to `head`),
Copy link
Collaborator

@benjaminjb benjaminjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just one question about a line.

Comment on lines +1278 to +1288
writeDebug(cmd, "Error getting pgBackRest logs\n")

if err != nil {
writeDebug(cmd, fmt.Sprintf("%s\n", err.Error()))
}
if stderr != "" {
writeDebug(cmd, stderr)
}

if strings.Contains(stderr, "No such file or directory") {
writeDebug(cmd, "Cannot find any pgBackRest log files. This is acceptable in some configurations.\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ This function is really to make your life easier, so you know better than me, but these logs in gatherRepoHostLogs look the same as the logs in gatherDbBackrestLogs -- will that be confusing? It doesn't look that confusing to me, but wanted to check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're different logs and coming from different paths - they could be abstracted into a common function (along with the PG logs), but right now they are very different.

@philrhurst philrhurst merged commit 78e0830 into CrunchyData:main Sep 10, 2024
7 checks passed
@philrhurst philrhurst deleted the more-support-export-logs branch September 10, 2024 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants