 SATOCONOR.COM

Anonymous ‘Notes On Last Digit Distribution of the Prime Numbers’ 6.4. (2007)

Communication to the editor

SATOCONOR.COM Journal of RANDOMICS

 

 

Notes on Last Digit Distribution of the Prime Numbers

Email correspondence with a professional mathematician who, for his own reasons, wishes to remain anonymous

Communication with Johan Gerard van der Galiën

For comments: johan.van.der.galien@satoconor.com

Version 1.0 November 29, 2007

 


----- Original Message -----

From: "Anonymous"

To: <johan.van.der.galien@satoconor.com>

Subject: A comment on "Last Digit Distribution of the Prime Numbers" (Are the Prime Numbers Randomly Distributed? Part 2), SATO 5.3. (2006)

 

Just as an experiment I used Mathematica to select 50000 random

numbers between 10^6 and 10^7-1, collected only those that were

prime, and did a Chi Squared calculation on the distribution of

the last digit.  I repeated this whole process 100 times, and

sorted the resulting values.  As an example, I got

 

0.113208, 0.200373, 0.227615, 0.278162, 0.306059,

0.354415, 0.357572, 0.357677, 0.449953, 0.504005,

0.512089, 0.767157, 0.770541, 0.779788, 0.941967,

1.00554, 1.08613, 1.10349, 1.11877, 1.13043,

1.18732, 1.24862, 1.249, 1.26835, 1.26924,

1.27138, 1.27453, 1.30117, 1.31329, 1.35723,

1.37658, 1.3942, 1.45258, 1.53429, 1.55221,

1.61485, 1.70891, 1.80764, 1.82649, 1.85961,

1.92429, 1.93912, 2.02925, 2.03448, 2.18291,

2.21013, 2.32982, 2.39387, 2.39969, 2.42981,

2.4572, 2.48232, 2.65632, 2.67554, 2.70417,

2.72791, 2.7325, 2.75107, 2.75417, 2.7898,

2.79956, 2.92413, 2.94878, 2.9511, 3.26214,

3.29618, 3.30631, 3.3789, 3.38745, 3.40358,

3.7597, 3.96487, 3.99733, 4.00215, 4.22697,

4.34551, 4.43908, 4.44759, 4.49261, 4.87123,

4.87198, 5.08945, 5.17533, 5.26189, 5.28086,

5.66526, 5.75077, 6.31547, 6.54233, 6.54706,

6.66218, 7.61201, 7.67855, 8.39802, 8.52527,

8.92465, 9.48768, 10.1931, 11.8016, 12.2043

 

And I find about 5 below the 5% level and 5 above the 95% level.

If I repeat this whole process I tend to get roughly the same

results, but not exactly each time, as I would expect.

Hopefully I haven't made a mistake in doing this.  If I haven't,

this seems to give results different from the author's tests.

 

I am puzzled why we get different results and wonder if there

is an explanation that could resolve this.

 

Thank you

 

 

----- Original Message -----

From: “Anonymous”

To: Johan van der Galiën

Subject: A comment on "Last Digit Distribution of the Prime Numbers" (Are the Prime Numbers Randomly Distributed? Part 2), SATO 5.3. (2006)

 

Johan van der Galiën wrote:

 

>Dear Sir,

 

>I think the difference arises because I used consecutive primes and you

>used primes picked at random! That this makes a difference is also clearly

>stated in the 'Last Digit Distribution of the Prime Numbers' article.

 

>The 167 "ideal" sample size is of course a very rough estimate based on

>only one (consecutive primes) measurement. From you I now know that 50000 on

>a 10^6 to 10^7-1 interval also works very well!

 

I seem to remember that there were other conditions that had to hold for

the Chi Squared test to be applied without giving misleading results, but it

has been a long time since I worked on that and I do not remember the details.

 

For my own reasons I wish to remain anonymous.  I do not want any credit for

anything that I do.

 

On that condition I will show you a few minutes' work on this:

--------------------------------------------------------------------------

Chi Squared calculation for a list of 4 items, with expected equal

numbers for all items in the list

 

chiSquared[x_]:=Module[{expected=Apply[Plus,x]/4},

     (x[[1]]-expected)^2/expected+(x[[2]]-expected)^2/expected+

     (x[[3]]-expected)^2/expected+(x[[4]]-expected)^2/expected

     ]
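
For readers without Mathematica, the same statistic can be sketched in Python. This is a translation offered as an illustration, not the correspondent's code; the name chi_squared is invented here, and the only change is that it accepts any number of categories instead of exactly 4. The two printed values use counts that appear later in this message.

```python
def chi_squared(counts):
    """Pearson chi-squared statistic against equal expected counts.

    `counts` is a list of observed counts, one per category; the expected
    count for every category is the mean of the observed counts.
    """
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


print(round(chi_squared([5, 6, 5, 5]), 6))      # 0.142857
print(round(chi_squared([35, 35, 40, 33]), 6))  # 0.748252
```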

 

Table of Chi Squared test on number of primes ending in 1,3,7,9 in 50000

random integers between n and 10n-1

 

f[n_]:=

   TableForm[(*Print in nice row format*)

     Partition[(*Divide up into groups of 5, to neatly see 5% and 95% rows*)

       Sort[(*Sort the Chi Squared values in increasing order*)

         Table[(*Build a table of 100 runs of the experiment*)

           N[chiSquared[(*Calculate the Chi Square result for one experiment*)

               Map[Length,(*Find the number in each trailing digit group*)

                 Split[(*Break into groups of 1's, 3's, 7's and 9's*)

                   Sort[(*Sort trailing digits into increasing order*)

                     Map[Mod[#,10]&,(*Extract trailing digit of each prime*)

                       Select[(*Pick out just the prime integers*)

                         Table[(*Build a list of 50000 random integers in range*)

                           Random[Integer,{n,10n-1}],{50000}

                           ],

                         PrimeQ

                         ]

                       ]

                     ]

                   ]

                 ]

               ]

             ],{100}

           ]

         ],5

       ]

     ]
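
The pipeline above can also be sketched in Python for readers who want to repeat the experiment without Mathematica. This is a rough translation under stated assumptions, not the correspondent's code: the names is_prime, one_run and experiment are invented here, simple trial division stands in for PrimeQ, and a Counter replaces the Sort/Split step so that a digit class with no primes would be counted as 0 rather than dropped from the list. It assumes n >= 10, so that every prime in range ends in 1, 3, 7 or 9.

```python
import random
from collections import Counter


def is_prime(m):
    """Trial division; a stand-in for PrimeQ, adequate for modest ranges."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def chi_squared(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def one_run(n, draws, rng):
    """Draw `draws` random integers in [n, 10n-1], keep the primes, and
    return the chi-squared statistic of their last digits 1, 3, 7, 9."""
    digits = Counter()
    for _ in range(draws):
        m = rng.randint(n, 10 * n - 1)  # both endpoints inclusive
        if is_prime(m):
            digits[m % 10] += 1
    return chi_squared([digits[d] for d in (1, 3, 7, 9)])


def experiment(n, draws=50000, runs=100, seed=None):
    """Repeat one_run `runs` times and return the sorted statistics."""
    rng = random.Random(seed)
    return sorted(one_run(n, draws, rng) for _ in range(runs))
```

Calling experiment(10**3) returns 100 sorted values that can be read in rows of 5, like the tables below. Pure Python with trial division is slow for the larger ranges, so this sketch is practical mainly for small n.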

 

Let us begin with 100 experiments of sampling 50000 random integers in

10..99, extracting the primes and calculating the 100 Chi Squared values

 

f[10]

 

{ {42.397, 51.352, 53.195, 54.268, 57.743},

   {58.466, 59.688, 60.394, 62.401, 63.658},

   {64.850, 64.877, 65.032, 65.123, 66.498},

   {66.616, 67.089, 67.181, 67.251, 67.641},

   {68.391, 69.777, 70.264, 70.585, 72.152},

   {72.235, 73.181, 74.645, 74.752, 75.027},

   {75.115, 75.306, 75.886, 76.193, 76.402},

   {76.582, 76.771, 76.910, 77.356, 77.990},

   {78.386, 78.639, 78.689, 79.260, 80.194},

   {80.392, 80.919, 81.071, 81.142, 81.856},

   {82.369, 83.081, 83.098, 83.998, 84.040},

   {84.530, 84.920, 85.401, 85.851, 86.223},

   {86.257, 86.555, 86.628, 87.429, 87.717},

   {88.316, 88.764, 89.496, 89.944, 90.349},

   {90.822, 90.838, 91.840, 92.709, 93.413},

   {94.036, 94.437, 94.681, 96.770, 97.307},

   {98.494, 100.139, 100.662, 103.817, 106.110},

   {106.228, 106.498, 106.714, 107.355, 107.479},

   {109.474, 111.990, 114.357, 114.853, 115.270},

   {119.929, 124.612, 130.716, 132.142, 136.796} }

 

The Chi Squared test is firmly convinced that these samples would not arise

if 1, 3, 7 and 9 appeared equally frequently

 

The large number of samples makes this test very sensitive.  Look at how

many of each trailing digit appear

 

Split[Sort[Map[Mod[#,10]&,Select[Range[10,99],PrimeQ]]]]

 

{{1,1,1,1,1},{3,3,3,3,3,3},{7,7,7,7,7},{9,9,9,9,9}}

 

Ah, so 3 appears 6 times while the others appear only 5 times

 

And so the Chi Squared test has correctly concluded that our sample is

unlikely to have come from digits of equal frequency

 

What is the Chi Square value if we used all the primes in this range as

a sample?

 

N[chiSquared[{5,6,5,5}]]

0.142857

 

A statistic this small would occur more than 1% of the time, so it is far

less surprising than the extreme values in my table; the difference comes

from the sample size
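
The "more than 1%" figure can be checked with the closed-form distribution function of a Chi Squared variable with 3 degrees of freedom (4 categories minus 1). The Python sketch below is an added illustration, not part of the original exchange, and the function name chi2_cdf_df3 is invented here. It gives a lower-tail probability of about 0.014 for the value 0.142857.

```python
import math


def chi2_cdf_df3(x):
    """P(X <= x) for a chi-squared variable with 3 degrees of freedom.

    Uses the closed form erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2),
    which is valid only for 3 degrees of freedom.
    """
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Probability of a statistic at least as small as the one for {5,6,5,5}
print(chi2_cdf_df3(0.142857))
```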

 

How about larger numbers?

 

f[10^2]

 

{ {16.251, 18.350, 19.150, 21.582, 22.723},

   {24.428, 26.840, 26.938, 27.040, 28.566},

   {32.105, 32.395, 32.748, 33.449, 34.539},

   {34.684, 35.310, 35.609, 35.628, 36.131},

   {36.235, 36.496, 36.749, 36.809, 37.067},

   {37.556, 37.862, 37.866, 37.932, 38.366},

   {38.908, 39.038, 39.365, 39.394, 39.538},

   {39.712, 39.755, 40.278, 40.989, 41.022},

   {41.304, 41.378, 42.290, 42.404, 42.611},

   {42.735, 43.004, 43.511, 43.686, 44.273},

   {44.516, 45.098, 45.159, 45.507, 45.585},

   {45.646, 46.459, 46.919, 46.964, 47.516},

   {47.546, 47.895, 48.163, 48.271, 48.296},

   {48.943, 49.580, 49.721, 50.049, 50.698},

   {51.557, 52.394, 52.409, 53.555, 54.283},

   {54.763, 54.935, 55.122, 55.182, 55.615},

   {55.696, 55.958, 56.604, 56.649, 58.084},

   {60.420, 60.823, 62.450, 62.586, 65.002},

   {65.022, 65.351, 67.807, 70.084, 73.685},

   {75.689, 76.203, 76.490, 85.047, 90.606} }

 

Again the Chi Squared test claims the samples were not likely drawn from

4 equal categories.  Why?  Just show the counts of trailing digits

 

Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[100,999],PrimeQ]]]]]

 

{35,35,40,33}

 

And those certainly are not equal frequencies of trailing digits

 

N[chiSquared[{35,35,40,33}]]

 

0.748252

 

But that lies neatly between the upper and lower Chi Square limits.

This again shows the difference in power when using many samples versus

one sample
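
One way to make this difference in power concrete is to compute the value the statistic should take on average. If N primes are drawn with replacement and the true last-digit proportions are p_i over k categories, the expected value is k(1 - sum p_i^2) + N k sum (p_i - 1/k)^2: the first term is close to k - 1, and the second grows in proportion to N. The Python sketch below is an added illustration under that assumption, not part of the original exchange; the name expected_chi_squared and its parameters are invented here, and N is replaced by its mean, draws times the fraction of integers in range that are prime.

```python
def expected_chi_squared(counts, draws, population):
    """Expected chi-squared value when `draws` integers are sampled with
    replacement from `population` integers whose primes have the given
    last-digit counts, tested against equal frequencies."""
    k = len(counts)
    total = sum(counts)
    p = [c / total for c in counts]
    mean_primes = draws * total / population   # expected primes per run
    noise = k * (1 - sum(q * q for q in p))     # close to k - 1
    bias = mean_primes * k * sum((q - 1 / k) ** 2 for q in p)
    return noise + bias


print(round(expected_chi_squared([5, 6, 5, 5], 50000, 90), 1))       # 82.4
print(round(expected_chi_squared([35, 35, 40, 33], 50000, 900), 1))  # 44.6
```

Both figures fall in the middle rows of the f[10] and f[10^2] tables above, which suggests that the large table values reflect the genuine small imbalance in the digit counts, amplified by the thousands of primes in each run.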

 

f[10^3]

 

{ {0.080, 0.198, 0.269, 0.274, 0.289},

   {0.337, 0.366, 0.375, 0.486, 0.580},

   {0.604, 0.851, 0.926, 0.929, 0.939},

   {1.002, 1.132, 1.132, 1.173, 1.318},

   {1.331, 1.336, 1.385, 1.479, 1.497},

   {1.511, 1.540, 1.609, 1.790, 1.812},

   {1.837, 1.838, 1.881, 1.971, 1.974},

   {1.992, 2.026, 2.087, 2.136, 2.187},

   {2.230, 2.268, 2.291, 2.383, 2.471},

   {2.487, 2.508, 2.524, 2.618, 2.720},

   {2.769, 2.771, 2.823, 2.857, 2.871},

   {2.874, 2.897, 2.902, 2.955, 3.040},

   {3.154, 3.188, 3.310, 3.383, 3.395},

   {3.397, 3.770, 3.777, 3.841, 3.914},

   {3.924, 3.982, 3.982, 4.191, 4.250},

   {4.416, 4.418, 4.475, 4.490, 4.674},

   {4.724, 4.923, 4.989, 5.155, 5.236},

   {5.471, 5.673, 5.720, 5.744, 6.066},

   {6.295, 6.361, 6.922, 7.058, 7.763},

   {7.806, 7.960, 8.415, 14.458, 16.201} }

 

This is very close to the expected 5% and 95% values for Chi Squared of

0.352 and 7.815.  What are the counts?

 

Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[1000,9999],PrimeQ]]]]]

 

{266,268,262,265}

 

N[chiSquared[{266,268,262,265}]]

 

0.070688

 

And the single sample shows a very different answer from large numbers

of samples
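
The 5% and 95% values of 0.352 and 7.815 quoted above are the quantiles of the Chi Squared distribution with 3 degrees of freedom. The Python sketch below is an added illustration, not part of the original exchange: it recovers them by bisection on the closed-form distribution function, and the names chi2_cdf_df3 and chi2_quantile_df3 are invented here.

```python
import math


def chi2_cdf_df3(x):
    """P(X <= x) for a chi-squared variable with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_quantile_df3(p, lo=0.0, hi=100.0):
    """Invert the distribution function by bisection on [lo, hi]."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf_df3(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_quantile_df3(0.05), 3))  # 0.352
print(round(chi2_quantile_df3(0.95), 3))  # 7.815
```

Against these thresholds, the f[10^3] table above has 6 values below 0.352 and 4 values above 7.815, close to the 5 and 5 expected from 100 runs.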

 

f[10^4]

{ {0.088, 0.165, 0.225, 0.279, 0.313},

   {0.347, 0.356, 0.374, 0.408, 0.499},

   {0.504, 0.537, 0.547, 0.582, 0.621},

   {0.725, 0.750, 0.754, 0.761, 0.818},

   {0.859, 1.025, 1.119, 1.131, 1.184},

   {1.216, 1.244, 1.326, 1.397, 1.448},

   {1.486, 1.542, 1.555, 1.668, 1.739},

   {1.794, 1.798, 1.868, 1.940, 1.968},

   {1.981, 2.056, 2.066, 2.132, 2.219},

   {2.296, 2.458, 2.522, 2.600, 2.625},

   {2.644, 2.699, 2.796, 2.880, 2.893},

   {2.919, 2.983, 2.992, 3.066, 3.092},

   {3.161, 3.219, 3.221, 3.255, 3.276},

   {3.401, 3.451, 3.523, 3.623, 3.688},

   {3.796, 3.812, 3.928, 4.174, 4.306},

   {4.595, 4.626, 4.694, 5.049, 5.060},

   {5.102, 5.290, 5.797, 5.820, 5.826},

   {5.983, 6.085, 6.248, 6.321, 6.528},

   {6.534, 6.750, 6.860, 6.942, 6.957},

   {7.191, 7.359, 7.662, 12.378, 13.506} }

 

Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[10000,99999],PrimeQ]]]]]

 

{2081,2092,2103,2087}

 

N[chiSquared[{2081,2092,2103,2087}]]

 

0.124716

 

f[10^5]

{ {0.019, 0.073, 0.185, 0.210, 0.214},

   {0.221, 0.310, 0.328, 0.358, 0.369},

   {0.391, 0.486, 0.525, 0.529, 0.562},

   {0.610, 0.660, 0.688, 0.691, 0.768},

   {0.778, 0.814, 0.910, 0.979, 1.007},

   {1.120, 1.289, 1.302, 1.350, 1.486},

   {1.498, 1.565, 1.720, 1.814, 1.817},

   {1.849, 1.875, 2.006, 2.006, 2.047},

   {2.137, 2.148, 2.207, 2.233, 2.234},

   {2.239, 2.397, 2.416, 2.443, 2.468},

   {2.474, 2.521, 2.616, 2.616, 2.636},

   {2.665, 2.685, 2.685, 2.691, 2.730},

   {2.907, 3.016, 3.081, 3.146, 3.207},

   {3.272, 3.274, 3.299, 3.307, 3.334},

   {3.374, 3.379, 3.443, 3.469, 3.517},

   {3.623, 3.927, 4.150, 4.154, 4.193},

   {4.475, 4.904, 4.940, 5.138, 5.188},

   {5.245, 5.255, 5.329, 5.442, 5.522},

   {5.990, 6.145, 6.466, 7.464, 7.732},

   {7.762, 9.606, 9.913, 10.952, 11.956} }

 

Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[100000,999999],PrimeQ]]]]]

 

{17230,17263,17210,17203}

 

N[chiSquared[{17230,17263,17210,17203}]]

 

0.125911

 

f[10^6]

{ {0.093, 0.138, 0.206, 0.246, 0.319},

   {0.346, 0.375, 0.451, 0.480, 0.502},

   {0.541, 0.626, 0.648, 0.807, 0.832},

   {0.837, 0.844, 0.899, 0.932, 0.937},

   {0.976, 1.028, 1.037, 1.069, 1.147},

   {1.183, 1.315, 1.358, 1.373, 1.381},

   {1.460, 1.496, 1.565, 1.576, 1.583},

   {1.639, 1.715, 1.769, 1.775, 1.822},

   {1.826, 1.892, 1.905, 1.922, 1.950},

   {1.989, 2.185, 2.202, 2.293, 2.572},

   {2.589, 2.623, 2.623, 2.659, 2.700},

   {2.930, 2.938, 3.094, 3.139, 3.190},

   {3.239, 3.397, 3.527, 3.567, 3.579},

   {3.609, 3.622, 3.662, 3.807, 3.906},

   {3.947, 4.015, 4.031, 4.061, 4.090},

   {4.195, 4.574, 4.677, 4.685, 5.098},

   {5.107, 5.218, 5.408, 5.536, 5.571},

   {5.616, 5.654, 5.905, 5.962, 6.078},

   {6.168, 6.171, 7.179, 7.225, 7.344},

   {8.118, 8.355, 8.836, 9.114, 9.348} }

 

Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[1000000,9999999],PrimeQ]]]]]

 

{146487,146565,146590,146439}

 

N[chiSquared[{146487,146565,146590,146439}]]

 

0.0994726

 

f[10^7]

{ {0.118, 0.125, 0.138, 0.224, 0.282},

   {0.297, 0.380, 0.382, 0.415, 0.471},

   {0.507, 0.546, 0.568, 0.585, 0.610},

   {0.681, 0.682, 0.685, 0.750, 0.750},

   {0.829, 0.831, 0.860, 0.860, 0.909},

   {0.911, 0.967, 0.991, 1.010, 1.039},

   {1.111, 1.143, 1.180, 1.303, 1.441},

   {1.486, 1.533, 1.573, 1.617, 1.657},

   {1.682, 1.763, 1.769, 1.786, 1.818},

   {1.832, 1.908, 1.928, 1.973, 1.981},

   {2.069, 2.113, 2.115, 2.170, 2.213},

   {2.282, 2.305, 2.318, 2.357, 2.520},

   {2.524, 2.608, 2.612, 2.722, 2.723},

   {2.816, 2.844, 2.952, 3.023, 3.034},

   {3.141, 3.196, 3.230, 3.391, 3.431},

   {3.456, 3.710, 3.728, 3.889, 4.105},

   {4.226, 4.242, 4.268, 4.349, 4.587},

   {4.630, 4.657, 4.715, 4.926, 5.255},

   {5.446, 5.637, 5.803, 6.147, 6.800},

   {7.489, 7.651, 8.191, 10.400, 10.503} }

 

f[10^8]

{ {0.040, 0.159, 0.204, 0.229, 0.243},

   {0.298, 0.365, 0.383, 0.400, 0.453},

   {0.528, 0.531, 0.531, 0.586, 0.605},

   {0.677, 0.686, 0.740, 0.764, 0.785},

   {0.841, 0.877, 0.907, 0.923, 0.926},

   {1.007, 1.036, 1.066, 1.073, 1.084},

   {1.117, 1.123, 1.145, 1.152, 1.161},

   {1.202, 1.260, 1.275, 1.410, 1.472},

   {1.531, 1.661, 1.672, 1.753, 1.758},

   {1.759, 1.761, 1.784, 1.837, 1.879},

   {2.073, 2.097, 2.188, 2.214, 2.258},

   {2.266, 2.328, 2.378, 2.513, 2.514},

   {2.807, 2.840, 2.853, 2.861, 2.910},

   {3.065, 3.133, 3.193, 3.483, 3.549},

   {3.552, 3.650, 3.692, 3.714, 3.744},

   {3.810, 3.958, 4.077, 4.320, 4.348},

   {4.505, 4.668, 4.868, 4.916, 5.159},

   {5.255, 5.458, 5.720, 5.798, 5.856},

   {5.964, 5.973, 6.218, 6.223, 7.058},

   {7.187, 8.450, 8.901, 14.907, 15.151} }

 

f[10^9]

{ {0.020, 0.129, 0.130, 0.448, 0.457},

   {0.475, 0.547, 0.582, 0.624, 0.661},

   {0.684, 0.719, 0.758, 0.780, 0.783},

   {0.874, 0.924, 1.005, 1.035, 1.074},

   {1.086, 1.284, 1.313, 1.581, 1.640},

   {1.641, 1.649, 1.650, 1.653, 1.785},

   {1.851, 1.872, 1.904, 1.972, 1.995},

   {2.000, 2.018, 2.024, 2.124, 2.129},

   {2.136, 2.142, 2.147, 2.154, 2.240},

   {2.241, 2.317, 2.370, 2.375, 2.424},

   {2.478, 2.523, 2.529, 2.646, 2.677},

   {2.684, 2.749, 2.914, 2.923, 3.071},

   {3.115, 3.133, 3.245, 3.257, 3.526},

   {3.543, 3.575, 3.647, 3.743, 3.748},

   {3.843, 3.864, 3.910, 4.100, 4.182},

   {4.256, 4.312, 4.364, 4.392, 4.728},

   {4.730, 4.764, 4.887, 4.991, 5.026},

   {5.686, 5.786, 6.269, 6.433, 6.671},

   {7.646, 7.748, 7.808, 7.823, 8.080},

   {9.934, 10.171, 10.799, 10.915, 12.310} }

 

Thus it seems that with small numbers of samples it is possible to get

values that are very different from those obtained with large numbers of

samples.  I gave my statistics books away to students long, long ago and

cannot remember the subject now.  But it seems there was some rule about

when this test could be applied.

 

> For example the Random Function of Mathematica seems to pass this test.

> 

> Kind regards,

 

I hope something in this might be of use to you.  Do with it what you

wish, as long as I get no credit.

 

Thank you

 

>Johan van der Galiën.

>SATOCONOR.COM Chief-Editor and Webmaster