[feature request] add regex pattern filter #33

Open
@sify21

I have this PDF file: https://docs.ton.org/ton.pdf
I used the following recipe to create a ToC:

[[heading]]
# TON Blockchain
level = 1
greedy = true
font.name = "F102"
font.size = 17.21540069580078
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = false
# font.monospace = false
# font.bold = false
# bbox.left = 138.70851135253906
# bbox.top = 127.66803741455078
# bbox.right = 274.1837158203125
# bbox.bottom = 144.88343811035156
# bbox.tolerance = 1e-5
[[heading]]
# TON Blockchain as a Collection of 2-Blockchains
level = 2
greedy = true
font.name = "F108"
font.size = 14.346199989318848
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = false
# font.monospace = false
# font.bold = false
# bbox.left = 146.76255798339844
# bbox.top = 291.47509765625
# bbox.right = 486.075927734375
# bbox.bottom = 305.8212890625
# bbox.tolerance = 1e-5
[[heading]]
# 2.1.1. List of blockchain types.
level = 3
greedy = false
font.name = "F104"
font.size = 11.9552001953125
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = false
# font.monospace = false
# font.bold = false
# bbox.left = 110.85400390625
# bbox.top = 395.5226745605469
# bbox.right = 289.56573486328125
# bbox.bottom = 407.52569580078125
# bbox.tolerance = 1e-5

The problem is that level 3 then contains many wrong entries, for example:

"1 Brief Description of TON Components" 3
        "2 2.1.17 2.4.20" 3
        "3" 3
        "4.1.7" 3
        "4.1.10 3.1.6" 3
        "3.2 3.2.10 3.2.14 3.2.12" 3
        "4 4.3.14 4.3.17 3.2.12 4.1.6" 4
        "4.3.1" 4
        "5" 4
        "4.3.23" 4
        "2.9.13 4.1" 4
"2 TON Blockchain" 5
    "2.1 TON Blockchain as a Collection of 2-Blockchains" 5
        "2.1.17" 5
        "2.1.1. List of blockchain types." 5
        "2.8.8 2.9.7 2.9.8" 5
        "2.8.12 2.8.8" 6
        "2.1.17" 6
        "2.1.2. Innite Sharding Paradigm." 6
        "2.1.3. Messages. Instant Hypercube Routing. 2.4.2 2.4.20" 7
        "2.1.4. Quantity of masterchains, workchains and shardchains." 7

The correct ones all share the same pattern: "\d+\.\d+\.\d+\. (an opening quote followed by three dot-terminated numbers). Currently I delete the wrong level 3 lines in vim with this command:

:'<,'>g!/"\d\+\.\d\+\.\d\+\./d

But it would be better to have a built-in regex filter. The filter should be able to:

  • exclude an output that doesn't match a regex
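
Until something like this exists, the behaviour I have in mind can be approximated by post-processing the generated ToC. Below is a minimal sketch, not part of the tool: `filter_toc`, `LINE_RE` and `LEVEL3_RE` are names I made up, and it assumes every ToC line has the form `<indent>"<title>" <page>` with 4 spaces of indentation per level, as in the output quoted above.

```python
import re

# Assumption: a ToC line is `<indent>"<title>" <page>`, indented by
# 4 spaces per heading level (level 1 has no indentation).
LINE_RE = re.compile(r'^(?P<indent> *)"(?P<title>.*)" (?P<page>\d+)$')

# Same pattern as in the vim command: three dot-terminated numbers
# at the start of the title, e.g. "2.1.1. List of blockchain types."
LEVEL3_RE = re.compile(r'\d+\.\d+\.\d+\.')


def filter_toc(lines, level=3, pattern=LEVEL3_RE, indent_width=4):
    """Drop entries at `level` whose title does not start with `pattern`.

    Entries at other levels, and lines that do not look like ToC
    entries at all, are passed through unchanged.
    """
    kept = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            kept.append(line)  # unrecognised line: leave it alone
            continue
        line_level = len(m.group("indent")) // indent_width + 1
        if line_level == level and not pattern.match(m.group("title")):
            continue  # wrong entry at the filtered level: exclude it
        kept.append(line)
    return kept
```

Applied to the lines of the generated ToC, this keeps "2.1.1. List of blockchain types." and drops entries such as "2.1.17" or "2.8.8 2.9.7 2.9.8", while level 1 and level 2 entries are untouched. A built-in filter could presumably do the same check per heading rule in the recipe, but how that would be spelled in the recipe is up to the maintainers.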
