Project Euler/501
From charlesreid1
Breakdown
This was an extremely interesting problem. When I first read through the problem, the solution came to me almost immediately - if we want to know how many divisors a number has, it relates to its prime factorization, and if we want a number with eight divisors, we want a number whose prime factors can be rearranged in exactly 4 distinct ways (leading to 8 pairings, or 8 factors).
This is confirmed by examining the examples the problem gives. Several integers that are below 100 and have exactly 8 factors were given. The prime factorizations of each number followed certain patterns. These patterns are:
$ p_1 p_2 p_3 $
and
$ p_A^3 p_B $
Why these patterns? It's because these are the two kinds of groupings of objects that lead to exactly four ways of partitioning the objects. The first is if we have 3 distinct objects, which can be grouped in 4 distinct ways:
A B C | A B | C A | B C A C | B
and if we have 3 identical items and one unique item, these can also be arranged in exactly four ways:
s s s T | s s s | T s s | s T s | s s T
The problem statement gives two cases where this happens, and leaves you stumped when your code gives you 179 numbers with exactly 8 divisors, instead of the information given in the problem, which is 180. I struggled over this for a while. I thought it might be a bug in my code, but typically these kinds of bugs are symmetric in some way, so I would be leaving out two or more numbers with 8 divisors.
The next thing I checked was the form of the numbers I was generating, which was either the product of 3 distinct primes:
$ p_1 p_2 p_3 $
and
$ p_A^3 p_B $
I was generating every combination of these numbers for every prime below the maximum, in this case 1000, and things seemed to be working out okay, except that I was missing one. When I ran it on the case of a 1000000 maximum, I was off from the information given in the problem by four. The challenge, of course, is that when I spot checked my list of 179 numbers, all of them had exactly 8 divisors - it wasn't that I was generating too many numbers. It was that some number not included on my list had 8 divisors, and I had to find it.
This was a cryptic problem. I tried investigating whether the primes I was using could, in fact, be composite numbers. So, if I substitute 4 for pA and 6 for pB, does that result in a number with 8 factors? The answer, of course, is no. These definitely need to be primes. What if pA and pB, or p1 and p2 and p3, are not distinct primes? This led to more numbers with fewer or more than 8 factors.
I tried pA = 2 and pB = 2, which led to 2^4 or 16, which did not have 8 factors. I tried p1 = 4 and p2 = 2 and p3 = 4, totaling 2^5 or 32, which also did not have 8 factors. Then I tried pA = 4 and pB = 2, for a total of 2^7 or 128, and to my surprise, 128 had 8 factors. Here was the missing prime! 128 was not in the list of primes taking the two forms given above. However, the procedure for finding 2^7 and the rule for it are a bit different. The rule relates to how we can use the prime numbers to construct new numbers, as we saw in Problem 500, the prior problem.
(To give it away: I was missing a prime of the form $ p_r^7 $. More details below.)
Generating composites from primes
Let's go back to that problem for a moment. There, we were trying to construct numbers with a specified number of divisors. To do that, we constructed a table with the prime numbers, with columns arranged in a particular way: n, n^2, n^4, n^8, and so on. Then, to construct new numbers, we found new combinations of numbers from each column. The reason we have only powers of 2 for these primes is, any number can be expressed as the sum of powers of 2 - i.e., can be represented in binary. if we had an odd power of n, say n^5, we could write in terms of powers of 2 like: (n)(n^4). Likewise for n^13, we could write it as (n)(n^4)(n^8).
That is, any integer power of n can be written as a binary number, and that binary number yields the order of selection of the primes from the table.
Generating composites with 8 divisors
Back to our problem at hand. We are looking for a third combination of prime numbers that will give us numbers with 8 divisors. If, instead of using primes in the above expressions, we instead use even powers of primes, we get a whole new class of numbers with 8 divisors, which do not show up until at least 128.
For example, a number with 8 divisors can be formed by letting the primes be distinct powers of the prime 2:
$ \begin{align} p_1 &=& 2 \\ p_2 &=& 2^2 \\ p_3 &=& 2^4 \end{align} $
That forms the candidate number:
$ c = 2^{4+2+1} = 128 $
which, indeed, has 8 divisors:
We were only off by 1 (our program predicted 179 numbers with 8 factors below 1,000, instead of the problem statement's 180) below 1,000 because the next such candidate number is:
$ c = 3^{4+2+1} = 2187 $
which has 8 divisors:
and here are the values of the seventh power of the first few primes:
2 128 3 2187 5 78125 7 823543 11 19487171 13 62748517 17 410338673 19 893871739
which all have 8 divisors:
All of which is to say, in the process of generating numbers with 8 divisors, we need to include numbers of the form $ p_r^7 $.
Procedure
Now, the procedure is as follows:
- Generate various combinations of three primes $ p_1, p_2, p_3 $ whose product is less than the maximum; those numbers are numbers with 8 divisors of the form $ p_1 p_2 p_3 $. The range for these numbers should be all prime numbers from 2 to max/6.
- Generate combinations of two primes $ p_A p_B $, and generate the numbers $ p_A^3 p_B $ and $ p_A p_B^3 $. Keep the products that are less than the maximum; those numbers each have 8 divisors. The range for these numbers should be all prime numbers from 0 to the square root of max.
- Generate primes $ p_r $ and corresponding numbers $ p_r^7 $. Keep results that are less than the maximum; those numbers each have 8 divisors. The range for these numbers should be from 0 to the seventh root of max (square root of square root, if you're lazy).
Once we do that, we're all set. Indeed, this procedure gets me correct answers for all numbers up to 1000 that have 8 divisors, and also gives me the correct answer for all numbers up to 1,000,000 that have 8 factors. It's also blazing fast, finding the correct number of composites up to 1 million with 8 divisors and returning it in less than half a second:
$ javac Problem501.java && time java Problem501 Finished with first part. On to second part... Finished with second part. On to third part... Number of integers with exactly 8 divisors: 224427 real 0m0.286s user 0m0.512s sys 0m0.052s
But when we raise the ceiling to 1 trillion, it chokes.
Not So Fast, Eratosthenes
So, what's the problem here?
The problem is that we are trying to count to a trillion - possibly several times.
Why are we trying to count to a trillion? We're trying to assemble numbers with 8 divisors, and one possible form of these numbers is:
$ p_1 p_2 p_3 $
where the p's are arbitrary prime numbers.
The best case scenario is that these three primes are all around the same size, the cube root of our max (the cube root of a trillion is only 1,000).
But worst case scenario is that we have two very small primes like 2 or 3, and the last prime is an integer near 1000000000000/6. That means that to FIND those primes, we're going to be running a number sieve and actually counting to almost a trillion.
To put a trillion into perspective, the typical computer can do a billion operations per second. Doing anything interesting takes around a hundred instructions, so that leaves us with around ten million operations per second. A simple back of the envelope tells you that you're going to be waiting
$ \dfrac{10^{12} \mbox{ ops}}{ 10^7 \mbox{ ops/s}} = 10^5 \mbox{ s} \sim 1 \mbox{ day} 4 \mbox{ hours} \dots $
Keep in mind, this is optimistic. A factor of 10 could ruin your week.
This means our approach to the problem with a maximum of a million is much simpler than our approach with a maximum of a trillion, because suddenly we're doing everything we were doing before, a million times. Even the smallest inefficiencies will snowball.
Solving this problem requires a couple of tools, beginning with a prime number sieve that can start at an arbitrary location, instead of starting at 2. The term for this type of prime number sieve is a segmented Sieve of Eratosthenes; it is a prime number sieve taking lower and upper bound arguments.
Because this is a combinatorics search problem, we also need to spend additional time analyzing the problem space and finding some elegant shortcuts to save ourselves time.
Segmented Sieve
See Segmented Sieve
A Long Hiatus
(We let this problem sit and stew for a few years without coming back to it)
Redux
On revisiting Project Euler Problem 501, here's the crux of the problem as we left it:
- We want to count the number of integers less than a trillion that have exactly 8 divisors.
- We know that such numbers can take several forms, involving prime factors raised to various powers. Specifically:
$ p_A p_B p_C $
$ p_1^3 p_2 $
$ p_r^7 $
These prime number factors can be as large as (in the first case) a trillion divided by 6, which really doesn't cut the problem size down by much. Here's an example of an integer that has exactly eight factors:
$ 999,999,999,762 = 2 \times 3 \times 166,666,666,627 $
Whose eight divisors are:
$ 1 \quad 2 \quad 3 \quad 6 \quad 166666666627 \quad 333333333254 \quad 499999999881 \quad 999999999762 $
Because these prime factors can be any prime numbers, the worst case scenario is that we have an integer very close to a trillion where the first two prime factors are 2 and 3, meaning we have to account for primes very close to a trillion.
And that means (using the approach above) we would have to GENERATE all primes below a trillion (or, a trillion divided by six, if you want to split hairs). That's totally out.
But here's the thing: we don't have to GENERATE the prime numbers, we just have to count them. So if we abandon the Sieve of Eratosthenes, which actually generates the prime numbers, we might be able to find a function that gives us less information (a count of prime numbers less than a trillion, instead of the prime numbers themselves) with the tradeoff that it goes much faster.
Is It Feasible?
Take a look at the MathWorld site on prime counting functions: http://mathworld.wolfram.com/PrimeCountingFunction.html
This states that Mathematica can count primes up to approximately
$ 8 \times 10^{13} $
Looking at the table given on that page, we can actually just read off the number of primes less than a trillion, which is
$ 37,607,912,018 \sim 3.7 \times 10^{10} $
So we're in safe territory - we won't have to outperform Mathematica to get the answer to this question.
Why you can't use a table value
The easiest solution would be to just read the value of $ \pi(x) $ for x = 1 trillion, then boom, voila, we just have to figure out how many ways to configure those numbers there are.
But that's not possible, because we need to evaluate the number of primes less than (some arbitrary numbers).
For example, the number of primes less than the seventh root of n.
Quick Sympy Test
$ python Python 3.6.3 |Anaconda, Inc.| (default, Oct 6 2017, 12:04:38) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import sympy >>> sympy.ntheory.generate.primepi(1e6) 78498 >>> sympy.ntheory.generate.primepi(1e7) 664579 >>> sympy.ntheory.generate.primepi(1e8) 5761455 >>> sympy.ntheory.generate.primepi(1e9) 50847534 >>> sympy.ntheory.generate.primepi(1e10) 455052511
These take mere seconds to evaluate. This one takes a little longer, but less than 1 minute, showing that this approach is absolutely feasible:
>>> sympy.ntheory.generate.primepi(1e11) 4118054813
Links
The wikipedia article on the prime counting function is very high quality and provides a good overview of the prime counting function: https://en.wikipedia.org/wiki/Prime-counting_function
This math stack exchange answer gives some nice references, including papers and implementations: https://math.stackexchange.com/a/893767
Prime counting codes:
Flags