@@ -56,11 +56,13 @@ access.
# Implementation

- primesieve is written entirely in C++ and does not depend on
- external libraries. It's speed is mainly due to the segmentation of
- the sieve of Eratosthenes which prevents cache misses when crossing
- off multiples in the sieve array and the use of a bit array instead of
- a boolean sieve array. primesieve reuses and improves ideas from other
+ primesieve is written in C++ and does not depend on external libraries.
+ Some of its algorithms (such as pre-sieving) have been vectorized
+ using SIMD instructions, and we also use inline assembly in some places, e.g.
+ for querying CPUID on x86 CPUs. The speed of primesieve is primarily due to the
+ segmentation of the sieve of Eratosthenes, which prevents cache misses when
+ crossing off multiples in the sieve array, and the use of a bit array instead
+ of a boolean sieve array. primesieve reuses and improves ideas from other
great sieve of Eratosthenes implementations, namely Achim
Flammenkamp's [prime_sieve.c](https://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html),
Tomás Oliveira e Silva's [A1 implementation](http://sweet.ua.pt/tos/software/prime_sieve.html#s)
@@ -71,21 +73,21 @@ efficiently uses the CPU's multi level cache hierarchy.
### Optimizations used in primesieve

- * Uses a bit array with 8 flags each 30 numbers for sieving
- * Pre-sieves multiples of small primes ≤ 163
- * Compresses the sieving primes in order to improve cache efficiency [[5]](#references)
- * Starts crossing off multiples at the square
- * Uses a modulo 210 wheel that skips multiples of 2, 3, 5 and 7
- * Uses specialized algorithms for small, medium and big sieving primes
- * Uses L1 cache for small sieving primes & L2 cache for medium and big sieving primes
- * Sorts medium sieving primes to reduce branch misprediction rate
- * Uses a custom memory pool (for medium & big sieving primes)
- * Multi-threaded using C++11 ```std::async```
+ * Uses a bit array with 8 flags per 30 numbers for sieving.
+ * Pre-sieves multiples of small primes ≤ 163 using SIMD instructions.
+ * Compresses the sieving primes in order to improve cache efficiency [[5]](#references).
+ * Starts crossing off multiples at the square.
+ * Uses a modulo 210 wheel that skips multiples of 2, 3, 5 and 7.
+ * Uses specialized algorithms for small, medium and big sieving primes.
+ * Uses L1 cache for small sieving primes & L2 cache for medium and big sieving primes.
+ * Sorts medium sieving primes to reduce branch misprediction rate.
+ * Uses a custom memory pool (for medium & big sieving primes).
+ * Multi-threaded using C++11 ```std::async```.
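
Two of the ideas listed above, segmentation and starting to cross off multiples at the square, can be illustrated with a minimal sketch. This is not primesieve's code: the function name `count_primes_segmented` and the `segment_size` default are assumptions made for illustration, and the sketch uses a plain byte array rather than the compressed bit array, wheel, or pre-sieving described above. It is meant for moderate limits only and does not guard against 64-bit overflow.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of primesieve's API): counts the primes
// <= limit with a simple segmented sieve of Eratosthenes.
std::uint64_t count_primes_segmented(std::uint64_t limit,
                                     std::uint64_t segment_size = 32768)
{
    if (limit < 2)
        return 0;

    // Integer square root of limit.
    std::uint64_t sqrt_limit = 0;
    while ((sqrt_limit + 1) * (sqrt_limit + 1) <= limit)
        ++sqrt_limit;

    // Step 1: find the sieving primes <= sqrt(limit) with a plain sieve.
    std::vector<char> is_prime(sqrt_limit + 1, 1);
    std::vector<std::uint64_t> sieving_primes;
    for (std::uint64_t i = 2; i <= sqrt_limit; ++i) {
        if (!is_prime[i])
            continue;
        sieving_primes.push_back(i);
        for (std::uint64_t j = i * i; j <= sqrt_limit; j += i)
            is_prime[j] = 0;
    }

    // Step 2: sieve [2, limit] one segment at a time, so the working
    // array stays small enough to remain in the CPU cache.
    std::uint64_t count = 0;
    std::vector<char> segment(segment_size);
    for (std::uint64_t low = 2; low <= limit; low += segment_size) {
        std::uint64_t high = std::min(low + segment_size - 1, limit);
        std::fill(segment.begin(), segment.end(), 1);

        for (std::uint64_t p : sieving_primes) {
            if (p * p > high)
                break;
            // First multiple of p inside the segment, but never below p * p:
            // smaller multiples have a smaller prime factor and were
            // already crossed off by that prime.
            std::uint64_t first = (low + p - 1) / p * p;
            std::uint64_t start = std::max(p * p, first);
            for (std::uint64_t m = start; m <= high; m += p)
                segment[m - low] = 0;
        }

        // Every number in [low, high] that is still flagged is prime.
        for (std::uint64_t n = low; n <= high; ++n)
            count += segment[n - low];
    }
    return count;
}
```

Calling `count_primes_segmented(100)` counts the primes up to 100. The second parameter lets you experiment with segment sizes; a value close to the L1 data cache size is the usual starting point for this kind of sieve.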
### Highly optimized inner loop

primesieve's inner sieving loop has been optimized using
- [extreme loop unrolling](https://github.com/kimwalisch/primesieve/blob/master/src/EratSmall.cpp#L112),
+ [extreme loop unrolling](https://github.com/kimwalisch/primesieve/blob/v12.7/src/EratSmall.cpp#L108),
on average crossing off a multiple uses just 1.375 instructions on
x64 CPUs. Below is the assembly GCC generates for primesieve's inner
sieving loop, each andb instruction unsets a bit (crosses off a