Add basic support for tagging text in the spec #1769

Open: wants to merge 2 commits into base: main

@Timmmm Timmmm commented Dec 11, 2024

This adds three things:

  1. Custom CSS to add a subtle dotted underline on hover to any element with an ID starting with manual__.
  2. An example tag manual__x0_is_zero, tagging the text that specifies that x0 is hardwired to 0.
  3. A way of extracting the text of the tags into JSON.

This just adds a single tag as an example, but the intention is that such tags would be added throughout the spec, allowing coverage, test plans, tests, documentation, etc. to all link to specific parts of the spec.
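
As a rough illustration of the third item, here is a minimal sketch of pulling tagged text out of AsciiDoc source and emitting JSON. It is not the PR's `tags.rb` (which is an Asciidoctor backend); it is a simplified regex stand-in that only handles the inline `[#manual__name]#text#` form and assumes the tagged text contains no `#`. The sample sentence and the `extract_tags` helper are hypothetical.

```ruby
require "json"

# Matches the AsciiDoc inline-anchor form [#manual__name]#tagged text#.
# Assumption of this sketch: the tagged text does not itself contain '#'.
INLINE_TAG = /\[#(manual__\w+)\]#([^#]+)#/

# Returns a hash of tag name => tagged text for every inline tag in +adoc+.
# Whitespace is collapsed so text wrapped across lines becomes one string.
def extract_tags(adoc)
  adoc.scan(INLINE_TAG).to_h do |name, text|
    [name, text.gsub(/\s+/, " ").strip]
  end
end

# Hypothetical sample input, not copied from the spec sources.
sample = "[#manual__x0_is_zero]#Register x0 is hardwired to 0.# Other text follows."
puts JSON.pretty_generate(extract_tags(sample))
```

A real backend sees the parsed document rather than raw text, so it can also handle block anchors and nested markup, which this regex cannot.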

See #1397

Timmmm commented Dec 11, 2024

It looks like this (when hovered):

image

@james-ball-qualcomm james-ball-qualcomm left a comment

Will this tags.rb work for all adoc anchor types:

  • This is before the parameter definition. [#param-foo-1]#Here's a parameter defined using inline anchors.# This is after the parameter definition.
  • [#param-foo-2]
    Here's a parameter defined in its own block using the # ID syntax.
  • [[param-foo-3]]
    Here's a parameter defined using an anchor.

@Timmmm Timmmm force-pushed the user/timh/links branch from 9beeadb to c02337e Compare July 10, 2025 09:20
@Timmmm Timmmm force-pushed the user/timh/links branch from c02337e to 0f6e869 Compare July 14, 2025 16:58
Timmmm commented Jul 14, 2025

This came up again in the Sail meeting. Can we get some progress towards merging this? Or at least a decision like "if this has tags for a significant part of the base chapters etc. then it can be merged"?

> Will this tags.rb work for all adoc anchor types:

Yeah I believe so, or we can make it work if it doesn't.

james-ball-qualcomm commented Jul 14, 2025

Last week without remembering this PR, I created a Ruby script to extract all normative rule anchors and their associated text from the ISA manuals into a YAML file. It does this by using a CSS selector to find anchors in the ISA manual HTML files. I see Tim has instead created a new adoc backend in docs-resources/converters/tags.rb to do something similar but into JSON. Let me take a look at this approach.

My Ruby script relies on my naming convention, which is documented in a UDB file at the moment. This is just a temporary home, so I've copied the documentation below. You'll see my naming convention is that any anchor that starts with "norm:" is used by the ISA manual normative rules that we are creating in the CSC (and will review with the TSC). I have another Ruby script that performs an intelligent compare between the normative rule anchors in a workspace and a reference. I'd integrate both of these Ruby scripts into the ISA manual Makefile, and then we could have GitHub Actions that complain if you try to remove or edit the text in a normative rule anchor.
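
The comparison step described above could look roughly like the following. This is a minimal sketch, not the actual script: it assumes both sides have already been extracted into tag => text hashes, and it treats "intelligent" as nothing more than ignoring whitespace changes. The anchor names and rule text in the example are made up.

```ruby
# Collapses runs of whitespace so that re-wrapping a paragraph is not
# reported as an edit. This normalization is an assumption of the sketch.
def normalize(text)
  text.gsub(/\s+/, " ").strip
end

# Compares two tag => text hashes and reports differences relative to
# the reference: tags removed, tags added, and tags whose text changed.
def compare_tags(reference, workspace)
  common = reference.keys & workspace.keys
  {
    removed: (reference.keys - workspace.keys).sort,
    added: (workspace.keys - reference.keys).sort,
    changed: common.reject { |tag| normalize(reference[tag]) == normalize(workspace[tag]) }.sort
  }
end

# Hypothetical data for illustration only.
reference = {
  "norm:base:rv32i:x0_is_zero" => "Register x0 is hardwired to 0.",
  "norm:inst:addi:imm_sign_extended" => "The immediate is sign-extended."
}
workspace = {
  "norm:base:rv32i:x0_is_zero" => "Register x0 is\nhardwired to 0.",
  "norm:inst:addi:imm_sign_extended" => "The immediate is zero-extended.",
  "norm:inst:andi:new_rule" => "A newly tagged sentence."
}

diff = compare_tags(reference, workspace)
diff.each { |kind, tags| puts "#{kind}: #{tags.join(', ')}" unless tags.empty? }
```

A CI job could fail when `removed` or `changed` is non-empty while still allowing `added`, which matches the stated goal of complaining only when existing normative text is removed or edited.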

Others potentially interested in this PR:

# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause-Clear

# frozen_string_literal: true

# Creates links into RISC-V documentation with the following formats for the destination link.
# 
#   Documentation Format                                                  Applies
#   ============  ======================================================= ================================================
#   ISA manuals   norm:base:<base-name>:<identifier>                      Single base ISA (rv32i/rv32e/rv64i)
#                 norm:bases:<base-name>[_<base-name>]+:<identifier>      List of bases separated by "_"
#                 norm:basegrp:<group-name>:<identifier>                  Named group of bases (e.g., rv32, all)
#                 norm:ext:<ext-name>:<identifier>                        Single extension
#                 norm:exts:<ext-name>[_<ext-name>]+:<identifier>         List of extensions separated by "_"
#                 norm:extgrp:<ext-name>:<identifier>                     Named group of extensions
#                 norm:enc:insttable:<inst-name>                          Instruction table cell for instruction encoding
#                 norm:inst:<inst-name>:<identifier>                      Single instruction
#                 norm:insts:<inst-name>[_<inst-name>]+:<identifier>      List of instructions separated by "_"
#                 norm:instgrp:<group-name>:<identifier>                  Named group of insts (e.g., branch, load, store, etc.)
#                 norm:csr:<csr-name>:<identifier>                        Single CSR
#                 norm:csr_field:<csr-name>:<field-name>:<identifier>     Single CSR field
#                 norm:param:<ext-name>:<param-name>:<identifier>
#                   where <identifier> is a string that describes the tagged text
#   UDB encoding  udb:enc:inst:<inst-name>
#   UDB doc       udb:doc:ext:<ext-name>
#                 udb:doc:inst:<inst-name>
#                 udb:doc:csr:<csr-name>
#                 udb:doc:csr_field:<csr-name>:<field-name>
#                 udb:doc:param:<ext-name>:<param-name>
#                 udb:doc:func:<func-name>  (Documentation of common/built-in IDL functions)
#                 udb:doc:covpt:<org>:<id>
#                   where <org> is:
#                      sep for UDB documentation that "separates" normative rules from test plans
#                      combo for UDB documentation that "combines" normative rules with test plans
#                      appendix for UDB documentation that has normative rules and test plans in appendices
#                   where <id> is the ID of the normative rule
#   IDL code      idl:code:inst:<inst-name>:<location>
#                 TODO for CSR and CSR Fields
#
# Use underscores to replace blanks in names between colons since RISC-V uses minus signs in the names.
#
# Adding anchors into AsciiDoc files
# ==================================
#  1) Anchor to part of a paragraph
#     Syntax:      [#<anchor-name>]# ... #
#     Example:     Here is an example of [#foo]#anchoring part# of a paragraph
#                  and can have [#bar]#multiple anchors# if needed.
#     Tagged text: "anchoring part" and "multiple anchors"
#     HTML:        <div class="paragraph">
#                  <p>Here is an example of <span id="foo">anchoring part</span> of a paragraph
#                  and can have <span id="bar">multiple anchors</span> if needed.</p>
#                  </div>
#     Example:    [#monkey]#Anchoring part of a paragraph#
#                 [#zebra]#and can have multiple anchors# if needed.
#                 and create a span for each one.
#     HTML:       <div class="paragraph">
#                 <p><span id="monkey">Anchoring part of a paragraph</span>
#                 <span id="zebra">and can have multiple anchors</span> if needed.
#                 and create a span for each one.</p>
#                 </div>
#     Limitations:
#       - Can't anchor text across multiple paragraphs.
#       - Must have text next to the 2nd hash symbol (i.e., can't have newline after [#<anchor-name>]#).
#       - Can't put inside admonitions such as [NOTE] (see #3 below for solution).
#
#  2) Anchor to entire paragraph
#     Syntax:     [[<anchor-name>]]
#     Example:    [[zort]]
#                 Here is an example of anchoring a whole paragraph.
#     Tagged text: Entire paragraph
#     HTML:       <div id="zort" class="paragraph">
#                 <p>Here is an example of anchoring a whole paragraph.</p>
#                 </div>
#
#  3) Anchor inside admonition (e.g. [NOTE])
#     - Must use [[<anchor-name>]] before each paragraph (with unique anchor names of course) being tagged
#     - Can't use [#<anchor-name>]# ... # since it just shows up in HTML as normal text
#     - Don't put [[<anchor-name>]] anchor before admonition to apply to entire admonition (one or more paragraphs)
#       since the HTML won't tag the text, just its location.
#
#  4) Anchor in table cell
#     - Must use [[<anchor-name>]] after "|" and before cell contents and will tag all text in the cell.
#     Example:    |===
#                 |name|number
#
#                 |Bob|[[BobNumber]]415-555-1212
#                 |Pat| [[PatNumber]]  408-555-1212
#                 |===
#     Tagged text: "415-555-1212" and "  408-555-1212"
#     HTML:       <tr>
#                 <td class="tableblock"><p class="tableblock">Bob</p></td>
#                 <td class="tableblock"><p class="tableblock"><a id="BobNumber"></a>415-555-1212</p></td>
#                 </tr>
#                 <tr>
#                 <td class="tableblock"><p class="tableblock">Pat</p></td>
#                 <td class="tableblock"><p class="tableblock"><a id="PatNumber"></a>  408-555-1212</p></td>
#                 </tr>
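
To make the ISA-manual part of the naming scheme above concrete, here is a minimal sketch of a parser for `norm:` anchors. It is an illustration under assumptions, not part of either script: the `parse_norm_anchor` helper and the example anchor names are hypothetical, and it assumes that names in the list kinds (`bases`, `exts`, `insts`) never contain an underscore themselves.

```ruby
# Kinds with exactly one name field before the identifier.
SINGLE_NAME_KINDS = %w[base bases basegrp ext exts extgrp inst insts instgrp csr].freeze
# Kinds with two name fields before the identifier.
TWO_NAME_KINDS = %w[csr_field param].freeze
# Kinds whose name field is a "_"-separated list.
LIST_KINDS = %w[bases exts insts].freeze

# Parses a "norm:" anchor into its kind, names, and identifier.
# Returns nil if the anchor does not follow the scheme.
def parse_norm_anchor(anchor)
  parts = anchor.split(":")
  return nil unless parts.first == "norm" && parts.length >= 3

  kind = parts[1]
  if kind == "enc"
    # norm:enc:insttable:<inst-name> has no trailing identifier.
    return nil unless parts.length == 4 && parts[2] == "insttable"

    return { kind: "enc", names: [parts[3]], identifier: nil }
  end

  name_count =
    if SINGLE_NAME_KINDS.include?(kind) then 1
    elsif TWO_NAME_KINDS.include?(kind) then 2
    end
  return nil unless name_count && parts.length == name_count + 3

  names = parts[2, name_count]
  names = names.first.split("_") if LIST_KINDS.include?(kind)
  { kind: kind, names: names, identifier: parts.last }
end

p parse_norm_anchor("norm:insts:andi_ori_xori:imm_is_sign_extended")
p parse_norm_anchor("norm:csr_field:mstatus:mie:reset_value")
```

A check like this could run over every extracted anchor so that a misspelled kind or a missing field is caught before the anchor is relied on by tests or documentation.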

james-ball-qualcomm commented Jul 14, 2025

At first blush, Tim's approach of using a new adoc backend looks better than my approach of parsing the HTML. I tried using the Ruby adoc API load_file() function but discovered that inline anchors aren't expanded until a backend runs, so I switched to the HTML approach.

Tim, does your adoc backend approach handle all the cases listed in my naming scheme? That would be paragraph anchors, inline anchors, anchors to table cells, and anchors to text inside admonitions? Also, I'll need the section number that the anchor is located in. I don't currently support that yet in my anchor extraction Ruby script but was about to add it. Tim, can you support that with your adoc backend?

Timmmm commented Jul 15, 2025

> Tim's approach of using a new adoc backend looks better than my approach of parsing the HTML.

Yeah I also started by parsing the docbook XML output, but decided it would be better to do it properly via an Asciidoctor backend. It's definitely better.

> Tim, does your adoc backend approach handle all the cases listed in my naming scheme?

Yes these all work, though note that for table cells this doesn't quite work:

> Must use [[<anchor-name>]] after "|" and before cell contents and will tag all text in the cell.

This actually tags an empty paragraph at the start of the cell for some reason. However using [#<anchor>]#...# works fine in a table cell.

I pushed a commit that demonstrates all of these things.

> I'll need the section number that the anchor is located in.

Yeah I was also thinking this would be really useful. My plan was to add a section hierarchy to the JSON listing all the tags in each section like this:

{
    "sections": [
       {
          "title": "Introduction",
          "html_anchor": "_introduction",
          "children": [
              {
                  "title": "RISC-V Hardware Platform Terminology",
                  "html_anchor": "_risc_v_hardware_platform_terminology",
                  "tags": ["manual__foo", "manual__bar", ...]
...
     ],
     "tags": {
          "manual__foo": "Foo is a nonsense word used for...
     }
}

This allows you to present the tags more nicely, and obviates some of the reasons for putting a ton of grouping metadata in the tags. I don't think people are going to want to use this if they have to write tags like [#norm__insts__andi_ori_xori__imm_is_sign_extended] or whatever. Eh maybe that's not awful. Anyway we should probably bike shed the naming later.

Does that "sections": schema seem reasonable?
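
For a sense of how a consumer might use such a schema, here is a minimal sketch that looks up which section a tag belongs to. It assumes the nested "sections"/"children"/"tags" layout proposed above; the `section_path` helper and the placeholder tag text are hypothetical, and the elided parts of the proposal are simply left out of the sample.

```ruby
require "json"

# Returns the list of section titles leading to the section that lists
# +tag+, or nil if no section lists it. Searches depth-first.
def section_path(sections, tag, trail = [])
  sections.each do |section|
    here = trail + [section["title"]]
    return here if section.fetch("tags", []).include?(tag)

    found = section_path(section.fetch("children", []), tag, here)
    return found if found
  end
  nil
end

# Sample document in the proposed shape; tag text is placeholder only.
doc = JSON.parse(<<~JSON)
  {
    "sections": [
      {
        "title": "Introduction",
        "html_anchor": "_introduction",
        "children": [
          {
            "title": "RISC-V Hardware Platform Terminology",
            "html_anchor": "_risc_v_hardware_platform_terminology",
            "tags": ["manual__foo", "manual__bar"]
          }
        ]
      }
    ],
    "tags": {
      "manual__foo": "Placeholder text for foo",
      "manual__bar": "Placeholder text for bar"
    }
  }
JSON

puts section_path(doc["sections"], "manual__foo").join(" > ")
```

Because the section hierarchy carries the grouping, the tag names themselves can stay short, which is the trade-off raised above.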

@james-ball-qualcomm

I've confirmed that inline anchors in table cells work fine. Then you get a span containing the cell contents instead of no contents which seems better. Don't know why I thought inline anchors didn't work well in table cells.
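
The difference can be seen by extracting text from the two HTML shapes. The following is a minimal sketch that uses regular expressions as a stand-in for a real HTML parser or CSS selector. The first table row is the `[[...]]` output from the naming-scheme documentation; the second row is an assumed rendering of the inline form `[#PatNumber]#408-555-1212#`, based on the span output shown for inline anchors. The `tagged_text` helper is hypothetical.

```ruby
# Inline anchor: <span id="...">text</span> wraps the tagged text.
SPAN_TAG = %r{<span id="([^"]+)">(.*?)</span>}m
# Block-style anchor in a cell: <a id="..."></a> marks a position only.
EMPTY_ANCHOR = %r{<a id="([^"]+)"></a>}

# Returns a hash of anchor id => tagged text found in +html+.
# Empty anchors map to "" because they enclose no text.
def tagged_text(html)
  tags = {}
  html.scan(EMPTY_ANCHOR).each { |match| tags[match.first] = "" }
  html.scan(SPAN_TAG).each { |id, text| tags[id] = text }
  tags
end

html = <<~HTML
  <td class="tableblock"><p class="tableblock"><a id="BobNumber"></a>415-555-1212</p></td>
  <td class="tableblock"><p class="tableblock"><span id="PatNumber">408-555-1212</span></p></td>
HTML

p tagged_text(html)
```

Only the span form gives an extractor something to capture, which is why the inline syntax is the safer choice for table cells.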

@james-ball-qualcomm

I think I'll set up a meeting to discuss this live.
