Segmented Sieve: Difference between revisions
From charlesreid1
| Line 47: | Line 47: | ||
In other words, let's suppose we're on the step where we want to cross out factors of 7 in our sub-segment. We want to start at some multiple of 7 in the bit vector; then we walk 7 elements and cross out the number at that position; and we repeat that process until we get to the endof the sub-segment. | In other words, let's suppose we're on the step where we want to cross out factors of 7 in our sub-segment. We want to start at some multiple of 7 in the bit vector; then we walk 7 elements and cross out the number at that position; and we repeat that process until we get to the endof the sub-segment. | ||
But finding the start position takes care. If we start at the first number 1,000, we eliminate <math>1,000; 1,007; 1,014; \dots</math>, which are not multiples of 7. In fact, we need to start at 1,001, which is the first multiple of 7 in our sub-segment: <math>7 \times 143 = | But finding the start position takes care. If we start at the first number 1,000, we eliminate <math>1,000; 1,007; 1,014; \dots</math>, which are not multiples of 7. In fact, we need to start at 1,001, which is the first multiple of 7 in our sub-segment: <math>7 \times 143 = 1,001</math>. | ||
To generalize this, we can use a formula: | To generalize this, we can use a formula which forms the core of the segmented sieve algorithm: | ||
<pre> | |||
protected static int qInit(long left, int p) { | |||
// first term here is because modOfThis must be positive. | |||
long modOfThis = (left/2)*p - (left+1+p)/2; | |||
int result = (int)(modOfThis%((long)p)); | |||
return result; | |||
} | |||
</pre> | |||
<pre> | |||
protected static int qReset(int p, int q, int delta) { | |||
int result = (q-delta)%p; | |||
int j = 1; | |||
while(result < 0) { | |||
result += j*p; | |||
} | |||
return result; | |||
} | |||
</pre> | |||
=Code= | =Code= | ||
Revision as of 10:32, 18 July 2017
This page describes a segmented prime number sieve (i.e., a sieve that can look for prime numbers over an arbitrary interval $ [M,N] $). This is related to Project Euler/501. Also see Sieve of Eratosthenes.
Theory
Segmented Prime Sieve
The idea behind a segmented prime number sieve is to search for prime numbers over an arbitrary range $ (M,N) $. To properly implement the segmented sieve, we restrict $ M > \sqrt{N} $, although we generate primes less than $ \sqrt{N} $ in the process so these can be saved out if this is not the case.
The algorithm is straightforward, but for one component. What we do is, we split the interval $ [M,N] $ into sub-segments of size S, so that we search for prime numbers on the intervals $ [M, M+S-1], [M+S, M+2S-1], \dots, [N-S, N] $. We then generate a list of prime numbers between $ 2 $ and $ \sqrt{N} $ using a regular Sieve of Eratosthenes algorithm. Once we have that list, we can iterate through each sub-segment, and for each sub-segment we eliminate any numbers that happens to be a multiple of that prime, using a bit vector technique similar to the Sieve of Eratosthenes. For example, if we are running a segmented sieve for the interval $ [1000, 1100] $, we have:
$ \sqrt{1100} \approx 33.166 $
So we run a Sieve of Eratosthenes to find all primes up to the nearest floored integer, 33. Now, for this list of numbers, we want to eliminate any numbers between $ [1000,1100] $ that are multiples of 2, then any that are multiples of 3, then any that are multiples of 5, and so on.
Let's keep things simple and consider a single sub-segment of size 100 that covers the entire segment. Then after crossing out multiples of 2, 3, and 5, we have:
X 1000 1001 X 1002 1003 X 1004 X 1005 X 1006 1007 X 1008 1009 X 1010 X 1011 X 1012 1013 X 1014 X 1015 X 1016 ...
After running through this process with each prime in the range $ [2,\sqrt{N}] $ (optionally stopping when you reach $ \sqrt{M+S} $ or the upper boundary of the segment, if the segment is particularly large), the primes can be collected.
The Subtlety: Starting Points
The algorithm described above is great, but for one thing: how can we determine, quickly, where to begin crossing out factors?
In other words, let's suppose we're on the step where we want to cross out factors of 7 in our sub-segment. We want to start at some multiple of 7 in the bit vector; then we walk 7 elements and cross out the number at that position; and we repeat that process until we get to the endof the sub-segment.
But finding the start position takes care. If we start at the first number 1,000, we eliminate $ 1,000; 1,007; 1,014; \dots $, which are not multiples of 7. In fact, we need to start at 1,001, which is the first multiple of 7 in our sub-segment: $ 7 \times 143 = 1,001 $.
To generalize this, we can use a formula which forms the core of the segmented sieve algorithm:
protected static int qInit(long left, int p) {
// first term here is because modOfThis must be positive.
long modOfThis = (left/2)*p - (left+1+p)/2;
int result = (int)(modOfThis%((long)p));
return result;
}
protected static int qReset(int p, int q, int delta) {
int result = (q-delta)%p;
int j = 1;
while(result < 0) {
result += j*p;
}
return result;
}
Code
PrimeGenerator Class
The following class implements the segmented Sieve of Eratosthenes, together with the segmented Sieve of Eratosthenes, to create a PrimeGenerator object that continually generates new prime numbers:
import java.math.BigDecimal;
import java.util.List;
import java.util.Queue;
import java.util.LinkedList;
import java.util.ArrayList;
/** Prime number generator class
* implemented using Longs.
*
* Uses a segmented sieve of Eratosthenes
* algorithm.
*/
public class PrimeGeneratorLong {
/**
* Prime Generator class implementation.
*/
private Queue<Long> q; // The queue stores the prime numbers this generator has found
private long max;
private long lo, hi;
private int rootmax;
private int cycle;
public PrimeGeneratorLong(long max) {
this.max = max;
this.rootmax = (int)(Math.sqrt(this.max));
// Making rootmax even makes life easier.
if(rootmax%2!=0) {
rootmax++;
}
this.q = new LinkedList<Long>();
this.lo = 1;
this.hi = 1;
// Look ahead. This enables hasNext() check.
repopulateQueue();
}
public void reset() {
this.q = new LinkedList<Long>();
this.lo = 1;
this.hi = 1;
repopulateQueue();
}
public Long next() {
Long result = q.remove();
// Look ahead. This enables hasNext() check.
if(q.isEmpty()) {
repopulateQueue();
}
return result;
}
/** Repopulate queue with prime number starting at lo and going to hi. */
public void repopulateQueue() {
int ACCORDION = (int)(Math.round(Math.sqrt(rootmax)));
// Value of ACCORDION*sqrt(MAX) is the stride
// of our segmented Sieve of Eratosthenes algorithm.
if(lo==1) {
this.cycle = 1;
// First time through,
// use normal Sieve of Eratosthenes.
// Add everything up to sqrt(MAX).
// This ensures we don't call segmented Sieve of Eratosthenes
// when low < sqrt(hi), because we already begin
// with low above sqrt(MAX).
q.addAll( primeSieve(rootmax) );
// set new lo and hi.
// lo must be even.
// only need to do this once.
lo = rootmax;
hi = rootmax + ACCORDION*rootmax;
} else {
this.cycle++;
// Every other time through,
// use segmented Sieve of Eratosthenes.
q.addAll( primesRange(lo, hi) );
if(q.isEmpty()) {
// This is only a problem with small integers.
// But it's still a problem.
hi *= 2;
q.addAll( primesRange(lo, hi) );
}
lo = hi;
hi = hi + ACCORDION*rootmax;
}
}
/** Find all prime numbers between left and right as longs.
* */
public static List<Long> primesRange(long left, long right) {
if(left%2!=0 || right%2!=0) {
throw new IllegalArgumentException("Error calling primesRange(): left and right must be even.");
}
// Return this
ArrayList<Long> result = new ArrayList<Long>();
// Ensure that we get precisely the correct range of primes.
int delta;
delta = (int)(right-left);
delta /= 2;
delta += 1;
boolean[] sieve = new boolean[delta-1];
// could make these longs... if sqrt(right) < 10^18, which, obviously.
// could make these ints... if sqrt(right) < 10^9, which... also has to be.
// these compromises are needed to keep the class working.
//
// ...let future self deal with those and hack away at this
// when we actually get there. For now, just get this done.
//
List<Integer> Ps = primeSieveInt( (int)(Math.sqrt(right)) );
// Skip 2
Ps.remove(Ps.indexOf(2));
// Create Qs starting index vector
List<Integer> Qs = new ArrayList<Integer>();
for(int i=0; i<Ps.size(); i++ ) {
Qs.add( qInit(left, Ps.get(i)) );
}
// Loop over each cluster
while(left<right){
// Assume everybody prime
for(int k=0; k<(delta-1); k++) {
sieve[k] = true;
}
// Cross out multiple of each prime
for(int i=0; i<Ps.size(); i++) {
int p = Ps.get(i);
int q = Qs.get(i);
for(int j=q; j<(delta-1); j+=p) {
sieve[j] = false;
}
}
// Reset for next cluster
for(int k=0; k<Ps.size(); k++ ) {
Qs.set(k, qReset(Ps.get(k), Qs.get(k), delta) );
}
// Have knocked out non-primes, so gather the primes
long t = left + 1;
for(int i=0; i<delta-1; i++) {
// t ends at right
if(sieve[i]) {
result.add( new Long(t) );
}
t += 2;
}
left += 2*delta;
}
return result;
}
protected static int qInit(long left, int p) {
// modOfThis must be positive.
//long modOfThis = (((left-1)*(p-1))/2)-1;
long modOfThis = (left/2)*p - (left+1+p)/2;
int result = (int)(modOfThis%((long)p));
return result;
}
protected static int qReset(int p, int q, int delta) {
int result = (q-delta)%p;
int j = 1;
while(result < 0) {
result += j*p;
}
return result;
}
/** Find all prime numbers below n.
*
* Return them as Longs.
*/
public static List<Long> primeSieve(int n) {
// initially assume all integers are prime
boolean[] isPrime = new boolean[n+1];
for (int i = 2; i <= n; i++) {
isPrime[i] = true;
}
// mark non-primes <= n using Sieve of Eratosthenes
for (int factor = 2; factor*factor <= n; factor++) {
// if factor is prime, then mark multiples of factor as nonprime
// suffices to consider mutiples factor, factor+1, ..., n/factor
if (isPrime[factor]) {
for (int j = factor; factor*j <= n; j++) {
isPrime[factor*j] = false;
}
}
}
// Add primes to list
ArrayList<Long> primes = new ArrayList<>();
int nprimes = 0;
for(int k=2; k<=n; k++) {
if(isPrime[k]) {
primes.add( new Long(k) );
nprimes++;
}
}
return primes;
}
/** Find all prime numbers below n.
*
* Return then as Integers.
*/
public static List<Integer> primeSieveInt(int n) {
// initially assume all integers are prime
boolean[] isPrime = new boolean[n+1];
for (int i = 2; i <= n; i++) {
isPrime[i] = true;
}
// mark non-primes <= n using Sieve of Eratosthenes
for (int factor = 2; factor*factor <= n; factor++) {
// if factor is prime, then mark multiples of factor as nonprime
// suffices to consider mutiples factor, factor+1, ..., n/factor
if (isPrime[factor]) {
for (int j = factor; factor*j <= n; j++) {
isPrime[factor*j] = false;
}
}
}
// Add primes to list
ArrayList<Integer> primes = new ArrayList<>();
int nprimes = 0;
for(int k=2; k<=n; k++) {
if(isPrime[k]) {
primes.add( new Integer(k) );
nprimes++;
}
}
return primes;
}
/** The prime generator is agnostic to the maximum;
* we use hasNext to determine when to stop.
* Change this to true for an infinite prime generator.
* */
public boolean hasNext() {
// This is possible because we always ensure there is something in the queue
// before we finish operations on the queue.
if(q.peek().compareTo(max)>0){
return false;
} else {
return true;
}
}
}
Flags