Skip to content

Expand --nstimes option #421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 10, 2025
Merged

Expand --nstimes option #421

merged 5 commits into from
Jun 10, 2025

Conversation

tgreenx
Copy link
Contributor

@tgreenx tgreenx commented Feb 11, 2025

Purpose

This PR proposes to expand the --nstimes option in several ways. See the "Changes" section below.

Note: this option applies only to "sent" queries. Cached queries are not counted.

Context

N/A

Changes

  • Extend to all queried name servers: this means that non-queried name servers will not appear, with the exception of name servers of the child or parent zone
  • Add classification of name servers ("child", "parent", "other"). Note that name servers shared by the child and parent zone will appear in both categories.
  • Add "Count" column as the number of queries sent per name server
  • Add "Grand total" row for query times and query count
  • Refactoring

How to test this PR

Run any test with the --nstimes option. Check that the formatting is done correctly.

Example output:

$ zonemaster-cli --nstimes --raw --no-ipv6 --test basic02 afnic.fr

                     Name servers         Max        Min        Avg     Stddev     Median       Total       Count
=================================  ========== ========== ========== ========== ========== =========== ===========
Child zone ----------------------
          g.ext.nic.fr/194.0.36.1      110.98      33.27      72.88      22.62      69.07     1093.16          15
      g.ext.nic.fr/2001:678:4c::1        0.00       0.00       0.00       0.00       0.00        0.00           0
           ns1.nic.fr/192.134.4.1        4.78       4.78       4.78       0.00       4.78        4.78           1
  ns1.nic.fr/2001:67c:2218:2::4:1        0.00       0.00       0.00       0.00       0.00        0.00           0
            ns2.nic.fr/192.93.0.4        4.82       4.82       4.82       0.00       4.82        4.82           1
  ns2.nic.fr/2001:660:3005:1::1:2        0.00       0.00       0.00       0.00       0.00        0.00           0
          ns3.nic.fr/192.134.0.49        4.96       4.96       4.96       0.00       4.96        4.96           1
  ns3.nic.fr/2001:660:3006:1::1:1        0.00       0.00       0.00       0.00       0.00        0.00           0
Parent zone ---------------------
               d.nic.fr/194.0.9.1      113.08      60.96      87.02      26.06      87.02      174.05           2
           d.nic.fr/2001:678:c::1        0.00       0.00       0.00       0.00       0.00        0.00           0
      f.ext.nic.fr/194.146.106.46        0.00       0.00       0.00       0.00       0.00        0.00           0
f.ext.nic.fr/2001:67c:1010:11::53        0.00       0.00       0.00       0.00       0.00        0.00           0
          g.ext.nic.fr/194.0.36.1      110.98      33.27      72.88      22.62      69.07     1093.16          15
      g.ext.nic.fr/2001:678:4c::1        0.00       0.00       0.00       0.00       0.00        0.00           0
Other ---------------------------
    a.root-servers.net/198.41.0.4      113.63     113.63     113.63       0.00     113.63      113.63           1
  m.root-servers.net/202.12.27.33      110.43       8.39      43.07      21.62      39.96      602.93          14
=================================  ========== ========== ========== ========== ========== =========== ===========
                      Grand total                                                             1998.33          35

@tgreenx tgreenx added the V-Patch Versioning: The change gives an update of patch in version. label Feb 11, 2025
@tgreenx tgreenx added this to the v2025.1 milestone Feb 11, 2025
mattias-p
mattias-p previously approved these changes Feb 12, 2025
Copy link
Member

@mattias-p mattias-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes the --nstimes option go outside of timing territory and into the wider statistics territory, I guess. I think it begs the question if there are other per-nameserver stats that we would be interested in, and if so we should consider having an --nsstats option. But IMHO that's out of scope for this PR.

marc-vanderwal
marc-vanderwal previously approved these changes Feb 13, 2025
Copy link
Contributor

@marc-vanderwal marc-vanderwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for those statistics it makes perfect sense to have the sample size. Could we also have a count of the number of queries answered? (Or conversely, the number of “lost” packets?)

@tgreenx
Copy link
Contributor Author

tgreenx commented Feb 13, 2025

Could we also have a count of the number of queries answered? (Or conversely, the number of “lost” packets?)

We could, but not with the current implementation of Zonemaster::Engine::Nameserver.

@tgreenx tgreenx dismissed stale reviews from marc-vanderwal and mattias-p via 62ee25e February 25, 2025 17:59
@tgreenx tgreenx changed the title Expand --nstimes option to also output the total number of sent queries per name server Expand --nstimes option Feb 25, 2025
@tgreenx tgreenx marked this pull request as draft February 25, 2025 18:12
@tgreenx
Copy link
Contributor Author

tgreenx commented Feb 25, 2025

@mattias-p @marc-vanderwal Sorry, I've further extended this PR with some new changes, could you make a fresh new review ? Thanks.

@tgreenx tgreenx marked this pull request as ready for review February 25, 2025 18:45
@tgreenx tgreenx added V-Minor Versioning: The change gives an update of minor in version. and removed V-Patch Versioning: The change gives an update of patch in version. labels Feb 25, 2025
Comment on lines 743 to 746
$total_queries_count += scalar @{ $ns->times } unless $nss_already_processed{$ns};
$total_queries_times += ( 1000 * $ns->sum_time ) unless $nss_already_processed{$ns};

return $total_queries_count, $total_queries_times, %nss_already_processed;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to move this summation out from print_nstimes and into its own loop?

@matsduf
Copy link
Contributor

matsduf commented Mar 12, 2025

What are the use cases for using --nstimes?

Note: this option applies only to "sent" queries. Cached queries are not counted.

Is there any technical reason to exclude those? Otherwise it could be reasonable to see the queries to cache too. Will there be any difference if global or local cache is used? If so, it can be confusing when using global cache.

@mattias-p
Copy link
Member

Note: this option applies only to "sent" queries. Cached queries are not counted.

Is there any technical reason to exclude those? Otherwise it could be reasonable to see the queries to cache too.

AFAICT what is actually interesting is the number of distinct requests. That number should to be the same whether or not recorded data is used, and whether or not intermediate results within Engine are cached. I believe the number of queries that are actually sent, the number of queries that would be sent if we didn't cache, and the sum of those are of little interest.

@matsduf
Copy link
Contributor

matsduf commented Mar 12, 2025

AFAICT what is actually interesting is the number of distinct requests. That number should to be the same whether or not recorded data is used, and whether or not intermediate results within Engine are cached. I believe the number of queries that are actually sent, the number of queries that would be sent if we didn't cache, and the sum of those are of little interest.

@mattias-p, what use case do consider when writing this? Do you see other use cases of the --ns-times option?

@mattias-p
Copy link
Member

AFAICT what is actually interesting is the number of distinct requests. That number should to be the same whether or not recorded data is used, and whether or not intermediate results within Engine are cached. I believe the number of queries that are actually sent, the number of queries that would be sent if we didn't cache, and the sum of those are of little interest.

@mattias-p, what use case do consider when writing this? Do you see other use cases of the --ns-times option?

I'm not really thinking about use cases. I'm thinking about how to make meaningful measurements. A measurement is more meaningful when it is variable in a single dimension and isn't influenced by multiple different things. So I meant to suggest that we could define the count so that it isn't affected by whether caching options and implementation details.

@matsduf
Copy link
Contributor

matsduf commented Mar 14, 2025

I think it is reasonable to have some use cases in mind when defining what figures to extract and present. Maybe it could be useful to distinguish between queries that are responded from cache and those that require sending a message externally.

@tgreenx tgreenx requested a review from marc-vanderwal June 3, 2025 16:20
@tgreenx
Copy link
Contributor Author

tgreenx commented Jun 4, 2025

@matsduf @mattias-p @marc-vanderwal please re-review this PR.

Copy link
Contributor

@matsduf matsduf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine as function, but I think there is a lack of documentation. I am not sure that the best place is to add documentation into the scripts. Rather I think a markdown document in the document tree would be better, but a reference from here.

Please share your thoughts of documentation and I am ready to approve.

Copy link
Member

@mattias-p mattias-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I comments for a bunch for remarks and questions and whatnot. And here's another one that I didn't know where to put as a comment:

The section mapping is defined once for the JSON output and once for table output. And the grand totals are calculated as part of the table output logic. IMHO it would be better to set up the data to be shown just once and do it before we branch to the individual output formats.

}

printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ===========';
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the grand total be included in the JSON output also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that too, but decided against it for now. It could be done more easily once I do the refactoring you suggested.

Copy link
Contributor Author

@tgreenx tgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine as function, but I think there is a lack of documentation. I am not sure that the best place is to add documentation into the scripts. Rather I think a markdown document in the document tree would be better, but a reference from here.

Please share your thoughts of documentation and I am ready to approve.

So far I think all the options are described that way. I could attempt to extend the documentation of this option in a future PR, although I don't immediately see what is missing. I think the current option description and its output are rather self-explanatory, considering that I've added explicit headers to the output.

The section mapping is defined once for the JSON output and once for table output. And the grand totals are calculated as part of the table output logic. IMHO it would be better to set up the data to be shown just once and do it before we branch to the individual output formats.

Yes you're right, if you don't mind though I'd prefer to have it done in a future PR (I can open an issue to track it) so that this one can be merged in time for v2025.1. Is that fine with you?

}

printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ===========';
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that too, but decided against it for now. It could be done more easily once I do the refactoring you suggested.

@matsduf
Copy link
Contributor

matsduf commented Jun 5, 2025

This looks fine as function, but I think there is a lack of documentation. I am not sure that the best place is to add documentation into the scripts. Rather I think a markdown document in the document tree would be better, but a reference from here.
Please share your thoughts of documentation and I am ready to approve.

So far I think all the options are described that way. I could attempt to extend the documentation of this option in a future PR, although I don't immediately see what is missing. I think the current option description and its output are rather self-explanatory, considering that I've added explicit headers to the output.

Please look at it from a "foreign" eye, and then the output is far from self-explanatory.

Copy link
Contributor

@marc-vanderwal marc-vanderwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s fine, but some column headings in the statistics table aren’t translated. I took the liberty to suggest some little style changes as well.

Comment on lines +729 to +693
my $header = __( 'Name servers' );
my $max = max map { length( "$_" ) } ( ( @child_nss, @parent_nss, @all_responded_nss ), $header );
printf "\n%${max}s %s\n", $header, ' Max Min Avg Stddev Median Total Count';
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ===========';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table’s columns aren’t fully localized and that’s a problem.

Also, it's better to write printf "%*s", $width, $string instead of printf "%${width}s", $string.

Suggested change
my $header = __( 'Name servers' );
my $max = max map { length( "$_" ) } ( ( @child_nss, @parent_nss, @all_responded_nss ), $header );
printf "\n%${max}s %s\n", $header, ' Max Min Avg Stddev Median Total Count';
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ===========';
my $max = max map { length( "$_" ) } ( ( @child_nss, @parent_nss, @all_responded_nss ), __( 'Name servers' ) );
my @columns = (
{ label => __( 'Name servers' ), width => $max },
{ label => __( 'Max' ), width => 11 },
{ label => __( 'Min' ), width => 10 },
{ label => __( 'Avg' ), width => 10 },
{ label => __( 'Stddev' ), width => 10 },
{ label => __( 'Median' ), width => 10 },
{ label => __( 'Total' ), width => 11 },
{ label => __( 'Count' ), width => 11 },
);
# Table header
print "\n";
say join " ", map { sprintf "%*s", $_->{width}, $_->{label} } @columns;
say join " ", map { "=" x $_->{width} } @columns;

Copy link
Contributor

@matsduf matsduf Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The width of each the column is wide enough for the English header name, but in translation the header name might get wider. I guess the solution is to have adaptive column width for all comlumns.

Or does the second suggestion take care of that?

Comment on lines +742 to +710
printf "%${max}s ", $ns->string;
printf "%11.2f ", 1000 * $ns->max_time;
printf "%10.2f ", 1000 * $ns->min_time;
printf "%10.2f ", 1000 * $ns->average_time;
printf "%10.2f ", 1000 * $ns->stddev_time;
printf "%10.2f ", 1000 * $ns->median_time;
printf "%11.2f ", 1000 * $ns->sum_time;
printf "%11d\n", scalar @{ $ns->times };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to introducing the @columns variable above in my previous suggestion, we can have a single source of truth for the column widths.

And there’s another trick: by using %*2$s instead of %*s as a format string, the width field can be put second instead of first in the call to printf.

Suggested change
printf "%${max}s ", $ns->string;
printf "%11.2f ", 1000 * $ns->max_time;
printf "%10.2f ", 1000 * $ns->min_time;
printf "%10.2f ", 1000 * $ns->average_time;
printf "%10.2f ", 1000 * $ns->stddev_time;
printf "%10.2f ", 1000 * $ns->median_time;
printf "%11.2f ", 1000 * $ns->sum_time;
printf "%11d\n", scalar @{ $ns->times };
printf '%*2$s ', $ns->string, $columns[0]->{width};
printf '%*2$.2f ', 1000 * $ns->max_time, $columns[1]->{width};
printf '%*2$.2f ', 1000 * $ns->min_time, $columns[2]->{width};
printf '%*2$.2f ', 1000 * $ns->average_time, $columns[3]->{width};
printf '%*2$.2f ', 1000 * $ns->stddev_time, $columns[4]->{width};
printf '%*2$.2f ', 1000 * $ns->median_time, $columns[5]->{width};
printf '%*2$.2f ', 1000 * $ns->sum_time, $columns[6]->{width};
printf "%*2\$d\n", scalar @{ $ns->times }, $columns[7]->{width};


foreach my $section_order ( sort keys %section_mapping ) {
foreach my $section_header ( keys % { $section_mapping{$section_order} } ) {
printf "%s %s\n", $section_header, '-' x ( ( $max - length $section_header ) - 1 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using printf to do such a thing is discouraged:

Suggested change
printf "%s %s\n", $section_header, '-' x ( ( $max - length $section_header ) - 1 );
say $section_header, ' ', '-' x ( ( $max - length $section_header ) - 1 );

Comment on lines +774 to +736
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ===========';
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, using a single source of truth for the column widths:

Suggested change
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ===========';
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count;
# Totals line
say join " ", map { "=" x $_->{width} } @columns;
printf "%*s %67.2f %11s\n", $max, __( 'Grand total' ), $total_queries_times, $total_queries_count;

@@ -1,15 +1,14 @@
# Brief help module to define the exception we use for early exits.
package Zonemaster::Engine::Exception::NormalExit;
use 5.014002;
use v5.26;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we increase the lowest Perl version of Zonemaster from v5.16 to v5.26?

@matsduf
Copy link
Contributor

matsduf commented Jun 10, 2025

This can be merged, but three issues to be created:

matsduf
matsduf previously approved these changes Jun 10, 2025
Copy link
Contributor

@matsduf matsduf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict to be resolved.

@marc-vanderwal
Copy link
Contributor

This can be merged, but three issues to be created: […]

* Refactoring of the table printing code to better support translations and increase maintainability.

I created #439 to track that.

tgreenx added 5 commits June 10, 2025 15:58
- Extend to all queried name servers: this means that non-queried name servers objects will not appear, with the exception of name servers of the child or parent zone
- Add classification of name servers ("child", "parent", "other"). Note that name servers shared by the child and parent zone will appear in both categories.
- Add number of queries sent per name server
- Add grand total row for query times and query count
- Refactoring
- Bump Perl version to v5.26
- Use 'my sub { }' syntax
- Update unit test
@tgreenx
Copy link
Contributor Author

tgreenx commented Jun 10, 2025

[ ... ]
* Refactoring of section mapping of data for both outputs. Grand total is to be included in the JSON output.

I created #440 for this.

@tgreenx tgreenx merged commit e3c9168 into zonemaster:develop Jun 10, 2025
3 checks passed
@tgreenx tgreenx deleted the update-nstimes branch June 10, 2025 15:28
@marc-vanderwal
Copy link
Contributor

Successfully release tested this PR on Rocky Linux 9.

Using the command line in the test procedure, the output I obtained looks similar to the example, albeit with different numbers. The formatting of the table looks correct, though currently, none of the strings are translated.

@marc-vanderwal marc-vanderwal added the S-ReleaseTested Status: The PR has been successfully tested in release testing label Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-ReleaseTested Status: The PR has been successfully tested in release testing V-Minor Versioning: The change gives an update of minor in version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants