Commit 1b0f75f

Update implementation details
1 parent 90559c0 commit 1b0f75f

1 file changed: +18 -16 lines changed

doc/ALGORITHMS.md

Lines changed: 18 additions & 16 deletions
@@ -56,11 +56,13 @@ access.

 # Implementation

-primesieve is written entirely in C++ and does not depend on
-external libraries. It's speed is mainly due to the segmentation of
-the sieve of Eratosthenes which prevents cache misses when crossing
-off multiples in the sieve array and the use of a bit array instead of
-a boolean sieve array. primesieve reuses and improves ideas from other
+primesieve is written in C++ and does not depend on external libraries.
+Some of its algorithms (such as e.g. pre-sieving) have been vectorized
+using SIMD instructions and we also use inline assembly in some places, e.g.
+for querying CPUID on x86 CPUs. The speed of primesieve is primarily due to the
+segmentation of the sieve of Eratosthenes which prevents cache misses when
+crossing off multiples in the sieve array and the use of a bit array instead
+of a boolean sieve array. primesieve reuses and improves ideas from other
 great sieve of Eratosthenes implementations, namely Achim
 Flammenkamp's [prime_sieve.c](https://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html),
 Tomás Oliveira e Silva's [A1 implementation](http://sweet.ua.pt/tos/software/prime_sieve.html#s)
@@ -71,21 +73,21 @@ efficiently uses the CPU's multi level cache hierarchy.

 ### Optimizations used in primesieve

-* Uses a bit array with 8 flags each 30 numbers for sieving
-* Pre-sieves multiples of small primes ≤ 163
-* Compresses the sieving primes in order to improve cache efficiency [[5]](#references)
-* Starts crossing off multiples at the square
-* Uses a modulo 210 wheel that skips multiples of 2, 3, 5 and 7
-* Uses specialized algorithms for small, medium and big sieving primes
-* Uses L1 cache for small sieving primes & L2 cache for medium and big sieving primes
-* Sorts medium sieving primes to reduce branch misprediction rate
-* Uses a custom memory pool (for medium & big sieving primes)
-* Multi-threaded using C++11 ```std::async```
+* Uses a bit array with 8 flags each 30 numbers for sieving.
+* Pre-sieves multiples of small primes ≤ 163 using SIMD instructions.
+* Compresses the sieving primes in order to improve cache efficiency [[5]](#references).
+* Starts crossing off multiples at the square.
+* Uses a modulo 210 wheel that skips multiples of 2, 3, 5 and 7.
+* Uses specialized algorithms for small, medium and big sieving primes.
+* Uses L1 cache for small sieving primes & L2 cache for medium and big sieving primes.
+* Sorts medium sieving primes to reduce branch misprediction rate.
+* Uses a custom memory pool (for medium & big sieving primes).
+* Multi-threaded using C++11 ```std::async```.

 ### Highly optimized inner loop

 primesieve's inner sieving loop has been optimized using
-[extreme loop unrolling](https://github.com/kimwalisch/primesieve/blob/master/src/EratSmall.cpp#L112),
+[extreme loop unrolling](https://github.com/kimwalisch/primesieve/blob/v12.7/src/EratSmall.cpp#L108),
 on average crossing off a multiple uses just 1.375 instructions on
 x64 CPUs. Below is the assembly GCC generates for primesieve's inner
 sieving loop, each andb instruction unsets a bit (crosses off a
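
Illustrative note on the diff above: two of the listed optimizations, "a bit array with 8 flags each 30 numbers" and "starts crossing off multiples at the square", can be sketched in a few lines of C++. This is a minimal sketch under stated assumptions and not primesieve's actual code. The names `sieve30`, `kResidues` and `kBitOf` are hypothetical, and the sketch leaves out segmentation, pre-sieving, the modulo 210 wheel, SIMD and multi-threading.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper tables (not taken from primesieve).
// Only 8 residues mod 30 are coprime to 2, 3 and 5, so one byte
// (8 flags) can represent an interval of 30 consecutive numbers.
constexpr int kResidues[8] = {1, 7, 11, 13, 17, 19, 23, 29};

// Maps n % 30 to its bit position in the byte, or -1 if n is
// divisible by 2, 3 or 5 and therefore has no flag at all.
constexpr int kBitOf[30] = {
    -1,  0, -1, -1, -1, -1, -1,  1, -1, -1,
    -1,  2, -1,  3, -1, -1, -1,  4, -1,  5,
    -1, -1, -1,  6, -1, -1, -1, -1, -1,  7};

// Returns all primes <= limit using the 8-flags-per-30-numbers layout.
std::vector<std::uint64_t> sieve30(std::uint64_t limit) {
    std::vector<std::uint64_t> primes;
    for (std::uint64_t p : {2u, 3u, 5u})
        if (p <= limit) primes.push_back(p);

    // Byte i holds the flags for the numbers 30*i + kResidues[bit].
    std::vector<std::uint8_t> sieve(limit / 30 + 1, 0xFF);
    sieve[0] &= static_cast<std::uint8_t>(~1u);  // 1 is not a prime

    for (std::uint64_t byte = 0; byte < sieve.size(); byte++) {
        for (int bit = 0; bit < 8; bit++) {
            if (!(sieve[byte] & (1u << bit))) continue;  // already crossed off
            std::uint64_t p = byte * 30 + kResidues[bit];
            if (p > limit) return primes;
            primes.push_back(p);

            // Start crossing off at p*p: smaller multiples of p have a
            // smaller prime factor and were crossed off earlier.
            // The guard p <= limit / p avoids overflow in p * p.
            if (p <= limit / p) {
                for (std::uint64_t m = p * p; m <= limit; m += 2 * p) {
                    int idx = kBitOf[m % 30];
                    if (idx >= 0)
                        sieve[m / 30] &= static_cast<std::uint8_t>(~(1u << idx));
                }
            }
        }
    }
    return primes;
}
```

Calling `sieve30(100)` returns the 25 primes up to 100. The inner loop here still visits multiples that have no flag (those divisible by 3 or 5) and pays for a division per multiple. The wheel and the unrolled loop referenced in the diff exist precisely to avoid that kind of overhead.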
