Open
Description
Code
I tried this code:
use std::convert::TryInto;

pub fn mul3(previous: &[u8], current: &mut [u8]) {
    let mut c_bpp = [0; 4];
    for (chunk, b_bpp) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        let new_chunk = [
            chunk[0].wrapping_add(c_bpp[0]),
            chunk[1].wrapping_add(c_bpp[1]),
            chunk[2].wrapping_add(c_bpp[2]),
            chunk[3].wrapping_add(c_bpp[3]),
        ];
        *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
        c_bpp = b_bpp.try_into().unwrap();
    }
}
I expected to see this happen: Function runs quickly thanks to auto-vectorization.
Instead, this happened: Function is 60% slower than before, because it is no longer vectorized.
Godbolt comparison link: https://godbolt.org/z/8EhWdYc13
Version it worked on
It most recently worked on: rustc 1.86.0 (which uses LLVM version 19.1.7)
Version with regression
rustc --version --verbose:
rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-unknown-linux-gnu
release: 1.87.0
LLVM version: 20.1.1
Other context
This is an attempted minimization of image-rs/image-png#598.
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Category: An issue highlighting optimization opportunities or PRs implementing such
Issue: Problems and improvements with respect to performance of generated code.
High priority
Relevant to the compiler team, which will review and decide on the PR/issue.
Performance or correctness regression from one stable version to another.
Activity
moxian commented on Jun 15, 2025
Can confirm this bisects to the LLVM update (#135763)
@rustbot label: -regression-untriaged +regression-from-stable-to-stable
ds84182 commented on Jun 15, 2025
Codegen isn't exactly the same, but it does autovectorize the loop if you use a while loop and pattern matching to zip the slices. https://godbolt.org/z/5P7x8rK53
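For readers who don't want to open the link, here is a minimal sketch of what a "while loop plus slice patterns" rewrite might look like. It is not the code from the Godbolt link: the name `mul3_while` is made up for illustration, and the use of `std::mem::take` to move the mutable slice out on each iteration is my own choice. It only shows the shape of the workaround. Whether it actually vectorizes on a given rustc/LLVM version needs to be checked in the generated assembly.

```rust
/// Hypothetical rewrite of `mul3` (the name `mul3_while` is an assumption,
/// not taken from the issue). It walks both slices four bytes at a time with
/// a `while let` loop and slice patterns instead of `chunks_exact(_mut)` + `zip`.
pub fn mul3_while(mut previous: &[u8], mut current: &mut [u8]) {
    let mut c_bpp = [0u8; 4];
    // `mem::take` moves the mutable slice out of `current` (leaving an empty
    // slice behind) so the tail `c_rest` can be assigned back to it below.
    while let ([c0, c1, c2, c3, c_rest @ ..], [p0, p1, p2, p3, p_rest @ ..]) =
        (std::mem::take(&mut current), previous)
    {
        // Same arithmetic as the original: add the previous row's prior chunk.
        *c0 = (*c0).wrapping_add(c_bpp[0]);
        *c1 = (*c1).wrapping_add(c_bpp[1]);
        *c2 = (*c2).wrapping_add(c_bpp[2]);
        *c3 = (*c3).wrapping_add(c_bpp[3]);
        c_bpp = [*p0, *p1, *p2, *p3];
        // Advance both slices past the four bytes just processed.
        current = c_rest;
        previous = p_rest;
    }
}
```

Like the `zip` version, the loop stops as soon as either slice has fewer than four bytes left, so any trailing bytes in `current` are left untouched.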
nikic commented on Jun 15, 2025
The pre-SLP IR seems to be about the same for both versions, but somehow I can't reproduce the difference when running just SLP: https://llvm.godbolt.org/z/jE5vxjqMa
nikic commented on Jun 15, 2025
Duh, I just forgot to specify the target triple. This shows the difference in behavior: https://llvm.godbolt.org/z/PP1Pja45v