Replies: 5 comments
-
Where do you intend to use bstr? We already use crates to extract links and traverse HTML nodes. What else would we gain from bstr?
-
We read the entire file into memory here and we pass that object around:
lychee/lychee-lib/src/types/input.rs, line 51 in 6d56c6b
lychee/lychee-lib/src/types/input.rs, lines 279 to 281 in 6d56c6b
I was thinking of ways to speed that up.
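To make the trade-off concrete, here is a minimal std-only sketch of the general pattern. It does not reproduce the referenced lychee lines, and the function names are made up for illustration. It contrasts reading a file as a validated `String` with reading it as raw bytes, which is the kind of buffer a byte-string crate such as bstr works on.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the whole file as a `String`. The buffer is validated as UTF-8,
/// so this fails if the file contains an invalid byte sequence.
fn read_as_string(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}

/// Reads the whole file as raw bytes. No UTF-8 validation happens here.
fn read_as_bytes(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

/// Naive byte-level substring check (hypothetical helper, std only).
/// An empty needle is treated as "not found" to keep `windows` from panicking.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}
```

Both variants still load the entire file, so any saving would come only from skipping the validation pass; whether that is measurable for typical inputs is exactly what a benchmark would have to show.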
-
The consumer of InputContent is the lychee-lib::extract module. Most of the time it is passed directly to html5gum or pulldown_cmark. A performance improvement is possible, but I doubt it would be large. A bigger gain could be made if html5gum started to allow unsafe code.
-
We could test it and run a benchmark. It probably also depends on the platform.
https://github.com/BurntSushi/ripgrep/blob/master/crates/searcher/src/searcher/mmap.rs
There are some general caveats mentioned in this article, but none we should be worried about for lychee. The article mentions that mmap was around 30% faster than
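A rough sketch of what such a benchmark could look like, using only the standard library. The helper names are hypothetical. A memory-mapped strategy is not shown because it needs an external crate, but it would plug in as one more closure returning the mapped length.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the total elapsed time together
/// with the byte count reported by the last run.
fn time_reads<F>(iters: u32, mut f: F) -> io::Result<(Duration, usize)>
where
    F: FnMut() -> io::Result<usize>,
{
    let start = Instant::now();
    let mut len = 0;
    for _ in 0..iters {
        len = f()?;
    }
    Ok((start.elapsed(), len))
}

/// Strategy 1: read the file and validate it as UTF-8.
fn read_string_len(path: &Path) -> io::Result<usize> {
    Ok(fs::read_to_string(path)?.len())
}

/// Strategy 2: read the raw bytes without validation.
fn read_bytes_len(path: &Path) -> io::Result<usize> {
    Ok(fs::read(path)?.len())
}
```

Calling `time_reads(100, || read_string_len(path))` and `time_reads(100, || read_bytes_len(path))` on the same file gives two durations to compare. After the first iteration the file is usually served from the OS page cache, so this mostly measures copying and validation rather than disk I/O, and the numbers will differ between platforms.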
-
Converting this issue to a discussion, since it doesn't track any kind of planned work. The benchmark would still be valuable, but this is not fleshed out enough to be tackled as an actionable item.
-
After reading this comment, I was wondering what the downsides of trying bstr for reading inputs would be.
The way I see it, this would probably be the fastest way to read inputs if there were a way to stream the input to the extractor (which would be a bigger change).
Another alternative: memory maps.
This is just a thought for now. Would love to get people's opinions.
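For illustration, streaming could look roughly like the sketch below. `ChunkSink` and `CountingSink` are hypothetical stand-ins for an extractor that accepts input piece by piece; nothing here is lychee's actual API, and whether the real extractors can consume chunks is a separate question.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical interface for an extractor that consumes input in chunks.
trait ChunkSink {
    fn feed(&mut self, chunk: &[u8]);
}

/// Toy sink that only counts what it receives.
struct CountingSink {
    bytes: usize,
    chunks: usize,
}

impl ChunkSink for CountingSink {
    fn feed(&mut self, chunk: &[u8]) {
        self.bytes += chunk.len();
        self.chunks += 1;
    }
}

/// Reads `path` in pieces of at most `chunk_size` bytes and hands each
/// piece to `sink`, so the whole file is never held in memory at once.
fn stream_file<S: ChunkSink>(path: &Path, chunk_size: usize, sink: &mut S) -> io::Result<()> {
    let mut file = File::open(path)?;
    // Guard against a zero-sized buffer, which would look like end of file.
    let mut buf = vec![0u8; chunk_size.max(1)];
    loop {
        let n = match file.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        sink.feed(&buf[..n]);
    }
    Ok(())
}
```

A chunk boundary can fall in the middle of a tag, a URL, or a multi-byte UTF-8 sequence, so a real extractor would have to carry state between calls to `feed`. That is a large part of why this would be the bigger change.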