-
Notifications
You must be signed in to change notification settings - Fork 22
Expand --nstimes option #421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes the --nstimes
option go outside of timing territory and into the wider statistics territory, I guess. I think it begs the question if there are other per-nameserver stats that we would be interested in, and if so we should consider having an --nsstats
option. But IMHO that's out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for those statistics it makes perfect sense to have the sample size. Could we also have a count of the number of queries answered? (Or conversely, the number of “lost” packets?)
We could, but not with the current implementation of |
62ee25e
8807350
to
62ee25e
Compare
62ee25e
to
a3a46df
Compare
a3a46df
to
fd5075c
Compare
@mattias-p @marc-vanderwal Sorry, I've further extended this PR with some new changes, could you make a fresh new review ? Thanks. |
lib/Zonemaster/CLI.pm
Outdated
$total_queries_count += scalar @{ $ns->times } unless $nss_already_processed{$ns}; | ||
$total_queries_times += ( 1000 * $ns->sum_time ) unless $nss_already_processed{$ns}; | ||
|
||
return $total_queries_count, $total_queries_times, %nss_already_processed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to move this summation out from print_nstimes and into its own loop?
What are the use cases for using
Is there any technical reason to exclude those? Otherwise it could be reasonable to see the queries to cache too. Will there be any difference if global or local cache is used? If so, it can be confusing when using global cache. |
AFAICT what is actually interesting is the number of distinct requests. That number should to be the same whether or not recorded data is used, and whether or not intermediate results within Engine are cached. I believe the number of queries that are actually sent, the number of queries that would be sent if we didn't cache, and the sum of those are of little interest. |
@mattias-p, what use case do consider when writing this? Do you see other use cases of the |
I'm not really thinking about use cases. I'm thinking about how to make meaningful measurements. A measurement is more meaningful when it is variable in a single dimension and isn't influenced by multiple different things. So I meant to suggest that we could define the count so that it isn't affected by whether caching options and implementation details. |
I think it is reasonable to have some use cases in mind when defining what figures to extract and present. Maybe it could be useful to distinguish between queries that are responded from cache and those that require sending a message externally. |
@matsduf @mattias-p @marc-vanderwal please re-review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine as function, but I think there is a lack of documentation. I am not sure that the best place is to add documentation into the scripts. Rather I think a markdown document in the document tree would be better, but a reference from here.
Please share your thoughts of documentation and I am ready to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I comments for a bunch for remarks and questions and whatnot. And here's another one that I didn't know where to put as a comment:
The section mapping is defined once for the JSON output and once for table output. And the grand totals are calculated as part of the table output logic. IMHO it would be better to set up the data to be shown just once and do it before we branch to the individual output formats.
} | ||
|
||
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ==========='; | ||
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the grand total be included in the JSON output also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about that too, but decided against it for now. It could be done more easily once I do the refactoring you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine as function, but I think there is a lack of documentation. I am not sure that the best place is to add documentation into the scripts. Rather I think a markdown document in the document tree would be better, but a reference from here.
Please share your thoughts of documentation and I am ready to approve.
So far I think all the options are described that way. I could attempt to extend the documentation of this option in a future PR, although I don't immediately see what is missing. I think the current option description and its output are rather self-explanatory, considering that I've added explicit headers to the output.
The section mapping is defined once for the JSON output and once for table output. And the grand totals are calculated as part of the table output logic. IMHO it would be better to set up the data to be shown just once and do it before we branch to the individual output formats.
Yes you're right, if you don't mind though I'd prefer to have it done in a future PR (I can open an issue to track it) so that this one can be merged in time for v2025.1. Is that fine with you?
} | ||
|
||
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ==========='; | ||
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about that too, but decided against it for now. It could be done more easily once I do the refactoring you suggested.
Please look at it from a "foreign" eye, and then the output is far from self-explanatory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s fine, but some column headings in the statistics table aren’t translated. I took the liberty to suggest some little style changes as well.
my $header = __( 'Name servers' ); | ||
my $max = max map { length( "$_" ) } ( ( @child_nss, @parent_nss, @all_responded_nss ), $header ); | ||
printf "\n%${max}s %s\n", $header, ' Max Min Avg Stddev Median Total Count'; | ||
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ==========='; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table’s columns aren’t fully localized and that’s a problem.
Also, it's better to write printf "%*s", $width, $string
instead of printf "%${width}s", $string
.
my $header = __( 'Name servers' ); | |
my $max = max map { length( "$_" ) } ( ( @child_nss, @parent_nss, @all_responded_nss ), $header ); | |
printf "\n%${max}s %s\n", $header, ' Max Min Avg Stddev Median Total Count'; | |
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ==========='; | |
my $max = max map { length( "$_" ) } ( ( @child_nss, @parent_nss, @all_responded_nss ), __( 'Name servers' ) ); | |
my @columns = ( | |
{ label => __( 'Name servers' ), width => $max }, | |
{ label => __( 'Max' ), width => 11 }, | |
{ label => __( 'Min' ), width => 10 }, | |
{ label => __( 'Avg' ), width => 10 }, | |
{ label => __( 'Stddev' ), width => 10 }, | |
{ label => __( 'Median' ), width => 10 }, | |
{ label => __( 'Total' ), width => 11 }, | |
{ label => __( 'Count' ), width => 11 }, | |
); | |
# Table header | |
print "\n"; | |
say join " ", map { sprintf "%*s", $_->{width}, $_->{label} } @columns; | |
say join " ", map { "=" x $_->{width} } @columns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The width of each the column is wide enough for the English header name, but in translation the header name might get wider. I guess the solution is to have adaptive column width for all comlumns.
Or does the second suggestion take care of that?
printf "%${max}s ", $ns->string; | ||
printf "%11.2f ", 1000 * $ns->max_time; | ||
printf "%10.2f ", 1000 * $ns->min_time; | ||
printf "%10.2f ", 1000 * $ns->average_time; | ||
printf "%10.2f ", 1000 * $ns->stddev_time; | ||
printf "%10.2f ", 1000 * $ns->median_time; | ||
printf "%11.2f ", 1000 * $ns->sum_time; | ||
printf "%11d\n", scalar @{ $ns->times }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to introducing the @columns
variable above in my previous suggestion, we can have a single source of truth for the column widths.
And there’s another trick: by using %*2$s
instead of %*s
as a format string, the width field can be put second instead of first in the call to printf
.
printf "%${max}s ", $ns->string; | |
printf "%11.2f ", 1000 * $ns->max_time; | |
printf "%10.2f ", 1000 * $ns->min_time; | |
printf "%10.2f ", 1000 * $ns->average_time; | |
printf "%10.2f ", 1000 * $ns->stddev_time; | |
printf "%10.2f ", 1000 * $ns->median_time; | |
printf "%11.2f ", 1000 * $ns->sum_time; | |
printf "%11d\n", scalar @{ $ns->times }; | |
printf '%*2$s ', $ns->string, $columns[0]->{width}; | |
printf '%*2$.2f ', 1000 * $ns->max_time, $columns[1]->{width}; | |
printf '%*2$.2f ', 1000 * $ns->min_time, $columns[2]->{width}; | |
printf '%*2$.2f ', 1000 * $ns->average_time, $columns[3]->{width}; | |
printf '%*2$.2f ', 1000 * $ns->stddev_time, $columns[4]->{width}; | |
printf '%*2$.2f ', 1000 * $ns->median_time, $columns[5]->{width}; | |
printf '%*2$.2f ', 1000 * $ns->sum_time, $columns[6]->{width}; | |
printf "%*2\$d\n", scalar @{ $ns->times }, $columns[7]->{width}; |
|
||
foreach my $section_order ( sort keys %section_mapping ) { | ||
foreach my $section_header ( keys % { $section_mapping{$section_order} } ) { | ||
printf "%s %s\n", $section_header, '-' x ( ( $max - length $section_header ) - 1 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using printf
to do such a thing is discouraged:
printf "%s %s\n", $section_header, '-' x ( ( $max - length $section_header ) - 1 ); | |
say $section_header, ' ', '-' x ( ( $max - length $section_header ) - 1 ); |
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ==========='; | ||
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, using a single source of truth for the column widths:
printf "%${max}s %s\n", '=' x $max, ' ========== ========== ========== ========== ========== =========== ==========='; | |
printf "%${max}s %67.2f %11s\n", __( 'Grand total' ), $total_queries_times, $total_queries_count; | |
# Totals line | |
say join " ", map { "=" x $_->{width} } @columns; | |
printf "%*s %67.2f %11s\n", $max, __( 'Grand total' ), $total_queries_times, $total_queries_count; |
@@ -1,15 +1,14 @@ | |||
# Brief help module to define the exception we use for early exits. | |||
package Zonemaster::Engine::Exception::NormalExit; | |||
use 5.014002; | |||
use v5.26; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that we increase the lowest Perl version of Zonemaster from v5.16 to v5.26?
This can be merged, but three issues to be created:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conflict to be resolved.
I created #439 to track that. |
- Extend to all queried name servers: this means that non-queried name servers objects will not appear, with the exception of name servers of the child or parent zone - Add classification of name servers ("child", "parent", "other"). Note that name servers shared by the child and parent zone will appear in both categories. - Add number of queries sent per name server - Add grand total row for query times and query count - Refactoring
- Bump Perl version to v5.26 - Use 'my sub { }' syntax - Update unit test
I created #440 for this. |
Successfully release tested this PR on Rocky Linux 9. Using the command line in the test procedure, the output I obtained looks similar to the example, albeit with different numbers. The formatting of the table looks correct, though currently, none of the strings are translated. |
Purpose
This PR proposes to expand the
--nstimes
option in several ways. See the "Changes" section below.Note: this option applies only to "sent" queries. Cached queries are not counted.
Context
N/A
Changes
How to test this PR
Run any test with the
--nstimes
option. Check that the formatting is done correctly.Example output: