I briefly wrote about this several years ago, but with my newfound powers of publishing LaTeX to the web, I feel that it can be made more accessible now.
Here’s the math problem, which sounds innocent enough:
Find the sum of all integers between 0 and (one million) such that both and the reverse of are divisible by 7. More generally, replace by and find an algorithm that works for any .
Just to work up to this a bit, if those extra conditions did not exist, the problem would be easy. Ignoring 0, there are exactly integers between 1 and , and they can be paired off into sets: all the way up to the last set . Each of these sets has the same sum: . Therefore the sum of all the numbers is just .
In addition, if the divisible-by-7 condition did exist, but not the condition on the reverse of , it would also be relatively easy. The numbers to add would be the multiples of 7: up to . We can divide each of these numbers by 7, add them, then multiply the product by 7 at the end. That addition can be done the same way as in the above paragraph and we end up with .
It is the condition on ‘s reverse that makes this hard. Qualifying numbers start out 0, 7, 70, 77, 161, 168, 252, 259…, and end with …999838, 999922, 999929, 999992, 999999. There doesn’t seem to be anything obvious about the digits themselves, nor anything very regular about the spacing between the numbers (which was the key to the intermediate problem with only one condition).
Let be the number of integers with and (mod 7) and (mod 7). Similarly, let be the sum of those integers. is the answer to the our question, by definition, but first let’s look at . It will turn out that we can use this to calculate .
Observe that can be represented as the sum of 10 instances of over intervals of size , one for each possible first digit (left padding with zeros to ensure N digits).
As an example, what Equation (1) is saying is that
This is because however many numbers there are between 0 and 100 with our divisibility properties, they are divided up among the ranges [0,9], [10,19], [20,29] up to [90,99]. If we count the numbers in those bins up separately and then add the totals for the bins, it will come at the same as if the numbers are counted up all at once.
We will characterize the numbers counted by each of the summands in Equation (2) in order to fold them together. The property we use to relate functions over different intervals comes from the digits of the numbers. Let be an -digit number, with first digit . Then,
That is, every number in an interval can be mapped to a number in with quantifiable changes to its (and its reverse’s) residue modulo 7, simply by removing the first digit. In particular, by (3) and (4) this reduction is
For example, each number in the interval [60, 69], can be mapped onto a one digit number by removing the tens digit, 6 (e.g, 62 becomes 2). If we were trying to calculate , we could use this relation:
Here’s why: The first 2 comes from the fact that when you subtract 60 from something with a remainder of 6 when divided by 7, the difference has a remainder of (mod 7). The second 2 in Equation (6) is a little harder to see: Removing an initial 6 from a number whose reverse has a remainder of 5 has the effect of removing the last digit from its reverse. Subtracting 6 from a number that had a remainder of 5 changes the remainder to 6, then dividing by 10 changes the remainder to (mod 7).
Notice anything? Three of the terms, , and , appear twice. That means that rather than examine each number from 0 to 99 as it looked like we might need to do to find , we need only examine the numbers from 0 to 9, 7 times each. We would then sum up the results of those 7 passes, taking care to count a certain three of them twice. That’s a 30% savings, but it doesn’t end there.
Suppose the original problem had been . We could divide it into 10 subproblems, each ranging from 0 to 100. Then, we could divide each of those up into 10 smaller problems from 0 to 10, then combining the ones that are copies. How many distinct problems would there be at this stage? They all have the form and and can only be integers in . That’s only 49 subsubproblems, at the most. Now we have saved more than half the work. For any larger-sized ranges we could think of, the number of subproblems stays limited to 49.
We could find by only looking at for 49 sets of and , then doing a lot of adding!
There are two things left to do: Figure out how to add these subproblems back together and how to get the sum of those numbers, which was the original question.
Just to make things a little more compact, let be equal to (mod 7) and to (mod 7), we have
Remember the explanation for how we derived Equation (1)? We can carry out a similar reduction on to get
This is somewhat more complicated because when we map -digit numbers to -digit numbers, we aren’t changing , which counts the numbers, but we are changing , their sum. We have to add back in contributions made by the removed digits.
We’ll maintain tables and at each level that hold the multiplicity of each of the -digit subproblems.
To continue our mission to compute , we begin by initializing the lowest order table as follows below. The colors will be explained in a moment.
Only has the value 1 because the only number with “0 digits” is 0, and it and its reverse have residue 0 modulo 7. We can apply Equation (1) to populate . Notice that each element in is the sum of 7 elements in , 3 of them counted twice. E.g.,
with (mod 7), and . When , this corresponds to the bluest cell (2,4) in . At , we’ve moved up and to the left to a slighter lighter blue cell. As ranges up to 9, we take a knight’s tour illustrated by the color fading from blue to green. The other cells visited are (1,2), (0,0), wrapping to (6,5), (5,3), (4,1), (3,6), and back to (2,4), (1,2) and (0,0). Notice that the 3 bluest cells are visited twice. In particular (0,0) is visited third and tenth, giving us two 1’s in the sum and .
Filling out the rest of ,
And continuing with ,
To illustrate the interpretation of this, take , in red. This means that are 4 numbers between 0 and 100 which have a remainder of 5 when divided by seven and when reversed, have a remainder of 0. Sure enough, there are four and they are 12, 19, 82 and 89. Continuing,
At this point, we can extract all the needed values of the -function to generate the tables, via Equation (7). The table is built in a similar fashion, except it draws from both and . Unlike , is entirely zeros. Within an asymptotically constant factor, the numbers in are the squares of those in . As a result, the tables must be omitted for space. However, ultimately, we do arrive at our answer .
In general, for a base the computation of each subproblem is . With forward and backward divisors and , there are subproblems on each of levels, thus this algorithm requires operations. Practically speaking, , a number with 1,999 digits, can be calculated in only 14 seconds on my laptop.
For completeness and for the curious, here are some extended results. The numbers in the range on the left that meet our divisibility conditions have their sum equal to the number on the right.
[0, 1] – 0
[0, 10] – 7
[0, 100] – 154
[0, 1000] – 10787
[0, 10000] – 1029567
[0, 100000] – 105714126
[0, 1000000] – 10363989636
[0, 10000000] – 1027216680497
[0, 100000000] – 102184086890270
[0, 1000000000] – 10205609904879424
[0, 10000000000] – 1020424310227628614
[0, 100000000000] – 102049428685193293045
[0, 1000000000000] – 10204479595989795520404
[0, 10000000000000] – 1020425244837350504933851
[0, 100000000000000] – 102041179796115463104759635
[0, 1000000000000000] – 10204085511291024760949778376