
Twin Primes Segmented Sieve of Zakiya (SSoZ) Explained

In 2014 I released The Segmented Sieve of Zakiya (SSoZ) [1]. It described a general method to find primes using an efficient prime sieve based on Prime Generators (PG). I expanded upon it, and in 2018 I released The Use of Prime Generators to Implement Fast Twin Primes Sieve of Zakiya (SoZ), Applications to Number Theory, and Implications for the Riemann Hypotheses [2]. The algorithm has since been improved and is now also used to find Cousin Primes. This paper explains in detail the what, why, and how of the algorithm, shows its implementation in 6 programming languages, and gives performance data for these 6 languages run on 2 different CPU systems, with 8 and 16 threads.

Jabari Zakiya © Revised June 28, 2022
jzakiya@gmail.com

General Description

The programs count the number of Twin|Cousin Primes between two numbers within a 64-bit range, i.e. 0 – 18,446,744,073,709,551,615 (2**64 – 1), and also return the largest twin|cousin value within it. The algorithm has no mathematical limits, but [hard|soft]ware does, so it's coded to run on commonly available 64-bit multi-core systems containing a reasonable amount of memory (the more the better). Below is a description of the major functional components of the algorithm and software.

Inputs Formatting
One or two values are entered (order doesn't matter) specifying the numerical range. They're converted to odd values, and|or defaults, after conditional checks.

Pn Selection and Parameterization
The input numerical range is used to select the Pn generator used to perform the residues sieve. Once determined, its generator parameters are created.

Sieve Primes Generation
The sieving primes ≤ sqrt(end_num) for the range are generated, but only those with multiples within the numerical range are used for the Pn generator.

Residues Sieves
In parallel for each twin|cousin residues pair for Pn, the sieve primes are used to create the nextp array of start locations for marking their multiples in each segment the input numerical range is split into.

Outputs Collection and Display
The prime pairs count and largest value are collected for each residue pair thread, and their final values are displayed, along with timing data.

Math Fundamentals

Prime numbers do not exist randomly! When we break the number line into even sized groups of integers (the group numerical bandwidth and prime generator modulus value), the primes are evenly distributed along the residues in each group, i.e. the coprime values to the modulus (their greatest common divisor (gcd) with the modulus is 1). Thus a modulus, and its associated residues, form a Prime Generator (PG), a mathematical expression and framework for generating and identifying every prime that is not a prime factor of the modulus.

While a PG modulus can be any even number, the most efficient moduli are strictly prime primorials. These prime generators have the smallest ratios of (# of residues)/modulus and make the number space primes exist within the smallest possible for a given number of residues. As more primes are used to form the PG moduli they systematically squeeze the primes into smaller and smaller number spaces.

The S|SoZ algorithms are based on the structure and framework of Prime Generators, whose math and properties are formalized in Prime Generator Theory (PGT). For an extensive review read [1], [2], [3] and see the video (Simplest) Proof of the Twin Primes and Polignac's Conjectures, https://www.youtube.com/watch?v=HCUiPknHtfY&t=940s [4].

Below is a list of the major properties of Prime Generators that comprise the mathematical foundation for the S|SoZ algorithms and code.

Major Properties of Prime Generators

• a prime generator has notational form: Pn = modpn * k + {r0 … rn}
• the modulus for the prime generator with last prime value pn has primorial form: modpn = pn#
• the number of residues is even, with count: rescntpn = (pn – 1)# = pn-1#
• the residues occur as modular complement pairs (mcp) to its modulus: modpn = ri + rj
• the last two residues of a generator are constructed as: (modpn - 1) (modpn + 1)
• the residues, by definition, include all the coprime primes < modpn
• the first residue r0 is the next prime > pn
• the residues from r0 to r0^2 are consecutive primes
• each generator has a characteristic Prime Generator Sequence (PGS) of even size residue gaps
• the last 3 sequence gaps have form: (r0 - 1) 2 (r0 - 1)
• the gaps are distributed with a symmetric mirror image around a pivot gap size of 4
• the residue gaps sum from r0 to (r0 + modpn) equals the modulus: modpn = Σ ai·2i
• the coefficients ai are the frequency each gap of size 2i occurs in a PGS
• the sum of the coefficients ai equals the number of residues: rescntpn = Σ ai
• coefficients a1 = a2 are odd and equal, with form: a1 = a2 = (pn – 2)# = pn-2#
• the coefficients ai are even values for i > 2
• the number of coefficients ai in a sequence for Pn is of order pn-1

Residues have Canonical Form values (1...modpn-1), as 1 is always coprime to any modulus, but for coding|math efficiency their Functional Form values (r0…modpn+1) are used, with r0 defined above, and modpn+1 ≡ 1 mod modpn the permuted first congruent value for 1. Also, as the residues exist as modular complement pairs, the code determines their first half values and their 2nd half values come for FREE.

To find the residues for a Pn, a smaller generator's PGS (in the code, that of P3) is used to reduce the larger modulus number space to identify the residue candidates (rc) that need to be gcd checked.

Shown here is the primes candidates (pcs) table for P5 up to the 100th prime 541.
It shows the only possible pc values that can be primes for 30 integer groupings. Each of the k columns is a residue group (resgroup) of prime candidates. The nonprime composite pc values (colored in the original table) can be sieved out by the Sieve of Zakiya (SoZ), leaving only the prime values.

P5 = 30 * k + {7, 11, 13, 17, 19, 23, 29, 31}

 k     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 r0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
 r1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
 r2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
 r3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
 r4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
 r5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
 r6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
 r7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

Table 1.

Every PG represents a pcs table like this, which visually displays all its properties. To identify all the Twin Primes we merely observe the residue pair values that differ by 2, (11, 13), (17, 19), (29, 31), and for Cousins those that differ by 4, (7, 11), (13, 17), (19, 23). These residue gaps form the basis for the Twins|Cousins SSoZ implementations, and other k-tuples of interest.

To find larger constellations of prime pairs, et al, we merely identify the residue pairs of desired size. For Sexy Primes (p, p+6), we just use the pairs (7, 13), (11, 17), (13, 19), (17, 23), (23, 29), (31, 37). Using them, we easily see and count there are 47 Sexy Primes (with [5:11]) within the first 100 primes. Larger generators have more residues and larger gaps and enable identifying more k-tuples of a desired size.

In my video [4], I define the residue gaps as the gaps between consecutive residues, and thus I refer to prime gaps as consecutive prime (2, n) tuples, with n any even number.
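The residue-pair selection described above (pairs differing by 2, 4, or 6) can be sketched in a few lines. This is only an illustration, not the paper's code: the function names `gcd` and `residue_pairs` are hypothetical, and the scan over the functional-form range 2..=modulus+1 is an assumption made so that each residue class is visited exactly once.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Hypothetical helper: list the residue pairs (r, r + gap) of the generator
/// with the given modulus. r runs over 2..=modulus+1 so that modulus+1 stands
/// in for the canonical residue 1 (functional form).
fn residue_pairs(modulus: u64, gap: u64) -> Vec<(u64, u64)> {
    (2..=modulus + 1)
        .filter(|&r| gcd(r, modulus) == 1 && gcd(r + gap, modulus) == 1)
        .map(|r| (r, r + gap))
        .collect()
}
```

For P5, `residue_pairs(30, 2)` yields the three twin pairs and `residue_pairs(30, 4)` the three cousin pairs listed above, while `residue_pairs(30, 6)` yields the six sexy pairs, including the wrapped pair (31, 37).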
Thus in the video I state there are 25 Sexy Primes in the table above, i.e. 25 pairs of consecutive primes that differ by 6. However, in the academic math world, Sexy and Cousin primes are defined as any (2, 6) and (2, 4) tuple, thus [7:13] is a Sexy Prime even though 11 is between them. So [5:11] is defined as the first Sexy Prime and [3:7] the first Cousin, and [3:103] would be the first (2, 100) tuple, i.e. 2 primes that differ by 100.

However, if you want to know and understand the true distribution of primes, what you want to know is the distribution of the gaps between consecutive primes, which I'll define as prime gap kpg-tuples. So the actual first (2, 100) kpg-tuple is [396,733: 396,833], a very big difference. It's the kpg-tuples that tell you where the prime deserts are (long number stretches without primes), and that characterize the true average thinning (density) of primes as the integers grow larger.

And as shown and explained in [3] and [4], there are an infinity of consecutive prime gaps of any even size. Thus the PGS for the Pn's provide a deterministic floor (minimum) value of the number of kpg-tuples of any size, and their prime values, over any range of numbers, which we can (in theory) create an SSoZ residues sieve to identify and count.

Shown here are the PG parameters for the first 9 Pn generators P2 – P23, where modpn = pn# = Π pm. Here pn = pm is the prime value of the mth prime, thus: p2 = p1, p3 = p2, p5 = p3, p7 = p4, etc.

Pn's modulus value modpn:    (pn - 0)# = pn-0# = Π (pn - 0) = (2 - 0) * (3 - 0) * (5 - 0) … * (pm - 0)
Number of residues rescnt:   (pn - 1)# = pn-1# = Π (pn - 1) = (2 - 1) * (3 - 1) * (5 - 1) … * (pm - 1)
# of twins|cousins pairscnt: (pn - 2)# = pn-2# = Π (pn - 2) = (2 - 2) * (3 - 2) * (5 - 2) … * (pm – 2)

For P23 modulus:       modp23   = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 = 223092870
For P23 residues:      rescount = 1 * 2 * 4 * 6 * 10 * 12 * 16 * 18 * 22 = 36495360
For P23 twins|cousins: pairs    = 1 * 1 * 3 * 5 * 9 * 11 * 15 * 17 * 21 = 7952175

The primes number space % is:   (rescntpn/modpn) * 100 = (pn-1# / pn#) * 100
The pairscnt number space % is: (pairscntpn*2/modpn) * 100 = (pn-2# / pn#) * 200

Pn                          P2     P3     P5     P7    P11    P13     P17      P19        P23
modulus (modpg)              2      6     30    210   2310  30030  510510  9699690  223092870
residues count (rescnt)      1      2      8     48    480   5760   92160  1658880   36495360
twins|cousins pairscnt       0      1      3     15    135   1485   22275   378675    7952175
primes % number space    50.00  33.33  26.67  22.86  20.78  19.18   18.05    17.10      16.36
pairs % number space     50.00  33.33  20.00  14.29  11.69   9.89    8.73     7.81       7.13

Table 2.

As the Pn primorial primes pm increase, the number space containing primes and twins|cousins steadily decreases, and can be made an arbitrarily small value ε > 0 of the total number space as m→∞.

[Figure: Primes Number Space. Number space % for primes and pairs (y-axis, 0 to 50) versus number of Pn primorial primes (x-axis, 1 to 100).]

This graph shows the decreasing prime number space for Pn using the first 100 primes. Once past the knee of the curve, the differential change becomes smaller for each additional pm. For many common use cases we can effectively limit usable Pn generators to the first 10 primes or so. However, for prime searches over ranges of large numbers, using the largest generator possible for a system is desirable, to make the maximum searchable number space as small as possible.

Generating Sieve Primes

The SSoZ uses the necessary sieving primes ≤ sqrt(end_num) (i.e. only those with multiples within the inputs range) to sieve out their nonprime multiples. An efficiently coded P5 Sieve of Zakiya (SoZ) generates them at runtime (though other means can be used). Below is its algorithm.

SoZ Algorithm

To find all the primes ≤ N:
1. for Prime Generator P5, use its generator parameters
2. determine kmax, the number of residue groups (resgroups) up to N
3. create byte array prms[kmax] to represent the value|residue of each resgroup pc
4. perform outer sieve loop:
   • starting from the first resgroup, determine whether each pc bit location is prime
   • if a bit location is a prime, keep its residue value in prm_r; numerate its prime value
   • exit loop when prime > sqrt(N)
5. perform inner sieve loop with each residue ri:
   • create cross-product (prm_r * ri)
   • determine the resgroup kn it's in, and its residue rn
   • compute first prime multiple resgroup kpm for the prime with ri
   • mark in prms each primenth kpm resgroup bitn[rn] as nonprime until its end
6. repeat from 4 for next resgroup
7. when sieve ends, numerate|store from each prms resgroup the needed sieving primes ≤ N

P5's primes candidates (pcs) table up to 541 (the 100th prime) is shown below.

 k     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 rt0   7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
 rt1  11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
 rt2  13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
 rt3  17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
 rt4  19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
 rt5  23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
 rt6  29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
 rt7  31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

The function sozpg performs the P5 sieve exactly as shown.
An array prms of kmax bytes is created to represent each resgroup|column of 8 pc values|rows up to the resgroup that covers the input value. Each row represents a residue value|bit position|residue track. prms is initialized to '0' to make all bit positions be prime. The sieve computes for each prime ≤ sqrt(N) its first prime multiple resgroup kpm on each row, and starting from these, sets each primenth resgroup bit on each row to '1', to mark its multiples (colors), to eliminate the nonprimes. The process is explained in greater detail as follows.

Performing SoZ Sieve

To sieve the nonprimes from P5's pcs table up to 541 we use the primes ≤ isqrt(541) = 23. They are the first 6 primes|residues: 7, 11, 13, 17, 19, 23, whose first unique multiples are shown with 6 different colors. The value 541 resides in residue group k=17, so kmax=18 is the number of resgroups up to it.

Starting with the first prime in resgroup k=0, 7 multiplies each pc in the resgroup, whose multiples are in blue:
7 * [7, 11, 13, 17, 19, 23, 29, 31] = [49, 77, 91, 119, 133, 161, 203, 217].
Each 7th resgroup|col along each restrack|row from these start values holds one of 7's multiples.

Thus 7 * 7 = 49 in resgroup k=1, on rt4|r=19, is 7's first multiple. Every 7th resgroup starting there (k=1, 8, 15) < kmax on rt4 is a multiple of 7 and set to '1' to mark it as nonprime. We repeat for 7's other first multiples 77, 91, etc, on their rows.

We then use the next prime location in resgroup k=0 after 7, which is 11, and repeat the process with it.
11 * [7, 11, 13, 17, 19, 23, 29, 31] = [77, 121, 143, 187, 209, 253, 319, 341], whose first unique multiples are red.

Note, the first unique multiple for each prime is its square, which for 11 is 121. The first multiples with smaller primes, e.g. 11 * 7 = 77, are colored with those primes' colors (here 7|blue). Also note, each prime must multiply each member in its resgroup, whether prime or not, to map its starting first prime multiple onto each distinct row in some kpm resgroup.
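The row-by-row marking just walked through for 7 and 11 can be sketched as follows. This is a simplified illustration rather than the paper's sozpg: the names `restrack` and `marked_resgroups` are hypothetical, and it assumes the prime sits in resgroup k=0 (so the prime equals its residue) and returns the resgroup indices instead of setting bits in prms.

```rust
const MD: usize = 30; // P5 modulus
const RES: [usize; 8] = [7, 11, 13, 17, 19, 23, 29, 31]; // P5 residues, functional form

/// Hypothetical helper: row (restrack) index of a P5 residue value.
fn restrack(residue: usize) -> usize {
    RES.iter().position(|&r| r == residue).expect("not a P5 residue")
}

/// Hypothetical helper: for a prime in resgroup k=0, list for each restrack
/// the resgroups < kmax that hold multiples of that prime.
fn marked_resgroups(prime: usize, kmax: usize) -> Vec<Vec<usize>> {
    let mut rows: Vec<Vec<usize>> = vec![Vec::new(); RES.len()];
    for &ri in RES.iter() {
        // Subtract 2 before dividing so a product of form 30*k + 31 stays in resgroup k.
        let prod = prime * ri - 2;
        let rt = restrack(prod % MD + 2); // add 2 back to get the true residue
        let mut kpm = prod / MD;          // resgroup of the first multiple on this row
        while kpm < kmax {
            rows[rt].push(kpm);
            kpm += prime;                 // every prime-th resgroup is another multiple
        }
    }
    rows
}
```

With kmax = 18, `marked_resgroups(7, 18)[4]` gives the resgroups 1, 8, 15 on rt4 named in the text (the values 49, 259, 469).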
As shown, this process is very simple and fast, and we can perform the multiplications very efficiently. We can also perform the sieve and primes extraction process in parallel, making it even faster.

Extracting Sieve Primes

To extract the primes from prms in sequential order, we start at resgroup k=0 and iterate over each byte bit, then continue with each successive byte. A '0' bit position represents a prime value in each byte, and if '1' we skip to the next bit. The prime values are numerated as: prime = modpg * k + ri, with k the resgroup index, ri the residue for the bit position, and modpg = 30 for P5's modulus.

Alternatively we can reverse the order, and for each bit row, iterate over each resgroup byte and find the primes along them. This may provide certain software computational advantages, but the primes will no longer be extracted in sequential order (though if necessary they could be sorted afterwards). For the purposes of the SSoZ algorithm, it's not necessary that the primes be used in sequential order.

To optimize performance of the SSoZ, during the prime sieve extraction process, primes which don't have multiples within the inputs range are filtered out. This significantly increases SSoZ performance for small input ranges between large input numbers, by reducing the work the residues sieves do.

The algorithm described here is generic to all Pn generators, where only their parameters change for each. Implementations may vary based on hardware|software particulars, but the work performed is the same. Larger generators systematically reduce the primes number space, by having larger modulus sizes and more residues, but we generally want to pick the smallest Pn generator that optimizes the system resources for given input values and ranges.
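The sequential extraction described in this section can be sketched as below. It is a minimal illustration with a hypothetical function name (`extract_primes`), and it leaves out the range filtering that the paper's sozpg also performs.

```rust
const MD: u64 = 30; // P5 modulus
const RES: [u64; 8] = [7, 11, 13, 17, 19, 23, 29, 31]; // P5 residues, functional form

/// Hypothetical helper: numerate the primes held in a P5 prms byte array,
/// in sequential order. A '0' bit marks a prime, a '1' bit a nonprime.
fn extract_primes(prms: &[u8]) -> Vec<u64> {
    let mut primes = Vec::new();
    for (k, &resgroup) in prms.iter().enumerate() {
        for (i, &ri) in RES.iter().enumerate() {
            if resgroup & (1 << i) == 0 {
                primes.push(MD * k as u64 + ri); // prime = modpg * k + ri
            }
        }
    }
    primes
}
```

For example, the two-byte array `[0, 16]` has only bit 4 of resgroup 1 set (the nonprime 49), so the call returns every pc from 7 to 61 except 49.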
For the implementations provided, whose input ranges are constrained to 64 bits, using P5 to perform the SoZ to generate the sieve primes was the overall most efficient choice, as it's straightforward to code, and as we'll see, can also be done in parallel to increase its performance.

Efficient residue multiplications

To find the resgroup (column) for a pc value in the table we integer divide it by the PG modulus. To find its residue value, we find its integer remainder when dividing by the PG modulus. Thus each pc resgroup value has parameters: k = pc div modpg, with residue value: ri = pc mod modpg.

Multiplying two resgroup pcs e.g. (17 * 19) = 323 gives: k, ri = (17 * 19).divmod 30 -> k = 10, ri = 23. From P5's pc table, we see pc = 323 is in resgroup k=10 with residue 23 on restrack rt5.

Each prime can be parameterized by its residue r and resgroup k values, e.g.: prime = modk + r, where modk = modpg * k for each resgroup, and each resgroup pc_i has form: pc_i = modk + ri. Thus the multiplication (prime * pc_i) translates into the following parameterized form:

    prime * pc_i = (modk + r) * (modk + ri)
                 = modk * (modk + r + ri) + (r * ri)
                 = modpg * [k * (prime + ri)] + (r * ri)

The original multiplication has now been transformed to the form: product = modpg * kk + rr, where kk = k * (prime + ri) and rr = r * ri, which also has the general form: pc = modpg * k + r.

The (r * ri) term represents the base residues (k = 0) cross products (which can be pre-computed). We extract from it its resgroup value: kn = (r * ri) / modpg, and residue: rn = (r * ri) % modpg, which maps to a restrack bit value as rt_n = residues.index(rn). Thus for P5, r = 7 is at residues[0], so that its rt_i row value is: i = residues.index(7) = 0, whose bit mask is: bit_r = 2^i = (1 << i) in the code.

Thus, the product of two members in resgroup k maps to a higher resgroup: kp = kk + kn on rt_n, comprised of two components: kn (their cross-product resgroup), and kk (their k resgroup component).
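The decomposition kp = kk + kn can be checked numerically with a small sketch. The function name `product_resgroup` is hypothetical; the sketch subtracts 2 before each division, as the paper's sozpg does, so that pc values of the form modpg*k + 31 are assigned to resgroup k.

```rust
const MD: u64 = 30; // P5 modulus

/// Hypothetical helper: for a P5 prime candidate `prime` and the functional
/// residue `ri` of another member of the same resgroup, return (kp, rn):
/// the resgroup and true residue of their product.
fn product_resgroup(prime: u64, ri: u64) -> (u64, u64) {
    let k = (prime - 2) / MD;      // resgroup of prime
    let r = (prime - 2) % MD + 2;  // functional residue of prime (7..=31)
    let cross = r * ri - 2;        // base residues cross-product, less 2
    let kn = cross / MD;           // cross-product resgroup
    let rn = cross % MD + 2;       // true residue of the product
    (k * (prime + ri) + kn, rn)    // kp = kk + kn
}
```

For 97 * 109 (both in resgroup k=3, with 109 having residue 19) the call `product_resgroup(97, 19)` returns (352, 13), and indeed 30 * 352 + 13 = 10573 = 97 * 109.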
To describe this verbally, to find the product resgroup kp of any two resgroup members, numerate one member (for us a prime), call its residue r, add the other's residue ri to it, multiply their sum by the resgroup value k, then add it to their residues cross-product resgroup. For (97 * 109) with k = 3 this gives:

Ex: kp = (97 * 109) / 30 = 3 * (97 + 19) + (7 * 19) / 30 = 3 * (109 + 7) + (19 * 7) / 30 = 352

For each Pn the last resgroup pc value is: (modpg + 1) ≡ 1 mod modpg, so for P5, it's modpg*k + 31. To ensure pc / modpg = k always produces the correct k value, 2 is subtracted before the division. Thus the resultant residue value is 2 less than the correct one, so 2 is added back to get the true value.

In sozpg: kn, rn = (prm * ri - 2).divmod md; kn is the correct resgroup and (rn + 2) the correct residue. The code sometimes uses rn without the addition when doing memory addressing. (In the code, the posn array performs the mapping at address (r – 2) into restrack rtn indices 0 – 7.)

Ex: (7 * 43) / 30 = 301 / 30 = 10, but 301 is the last pc in resgroup 9, so (301 – 2) / 30 = 9 is the correct value. Also 301 % 30 = 1, but 299 % 30 = 29, and when 2 is added we get the correct residue 31 for pc 301.

sozpg

    def sozpg(val, res_0, start_num, end_num)
      # Compute the primes r0..sqrt(input_num) and store in 'primes' array.
      # Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
      md, rscnt = 30u64, 8                 # P5's modulus and residues count
      res  = [7,11,13,17,19,23,29,31]      # P5's residues
      bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

      kmax = (val - 2) // md + 1           # number of resgroups upto input value
      prms = Array(UInt8).new(kmax, 0)     # byte array of prime candidates, init '0'
      modk, r, k = 0, -1, 0                # initialize residue parameters

      loop do                              # for r0..sqrtN primes mark their multiples
        if (r += 1) == rscnt; r = 0; modk += md; k += 1 end # resgroup parameters
        next if prms[k] & (1 << r) != 0    # skip pc if not prime
        prm_r = res[r]                     # if prime save its residue value
        prime = modk + prm_r               # numerate the prime value
        break if prime > Math.isqrt(val)   # exit loop when it's > sqrtN
        res.each do |ri|                   # mark prime's multiples in prms
          kn,rn = (prm_r * ri - 2).divmod md  # cross-product resgroup|residue
          bit_r = bitn[rn]                 # bit mask for prod's residue
          kpm = k * (prime + ri) + kn      # resgroup for 1st prime mult
          while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end
        end
      end
      # prms now contains the nonprime positions for the prime candidates r0..N
      # extract only primes that are in inputs range into array 'primes'
      primes = [] of UInt64                # create empty dynamic array for primes
      prms.each_with_index do |resgroup, k|  # for each kth residue group
        res.each_with_index do |r_i, i|    # check for each ith residue in resgroup
          if resgroup & (1 << i) == 0      # if bit location a prime
            prime = md * k + r_i           # numerate its value, store if in range
            # check if prime has multiple in range, if so keep it, if not don't
            n, rem = start_num.divmod prime  # if rem 0 then start_num is multiple of prime
            primes << prime if (res_0 <= prime <= val) && (prime * n <= end_num - prime || rem == 0)
          end
        end
      end
      primes
    end

Inputs:
    val       – integer value for sqrt(end_num)
    res_0     – first residue for selected SSoZ Pn
    end_num   – inputs high value
    start_num – inputs low value
Output:
    primes    – array of sieving primes within inputs range

sozpg sieves the prime multiples ≤ val to create P5's pcs table held in byte array prms, as described. To extract only the necessary primes for the SSoZ it uses inputs: res_0, start_num, end_num.

res_0 is the first residue of the selected Pn for the SSoZ. For P5 it's 7, but when Pn is larger, e.g. P7, P11, P13 etc, their res_0 are greater, i.e. 11, 13, 17, etc, so only the primes ≥ res_0 are kept. The last byte prms[kmax-1] may also have bit positions for primes > val, which aren't needed and are discarded.

We thus perform two checks for each found prime, the first being: (res_0 <= prime <= val)
This filters out from P5's pcs table the primes outside the SSoZ inputs range for the selected Pn.

The second check filters out the primes without multiples within the SSoZ inputs range. For small input ranges, primes > the range size can be discarded if they don't have multiples within it.
This is done by the check: (prime * n <= end_num - prime || rem == 0)

All the primes ≤ sqrt(end_num) are used if their values are ≤ range = (end_num – start_num). But if range < sqrt(end_num) some sieving primes may be discarded, i.e. when (end_num – sqrt(end_num)) < start_num some primes may not have multiples within the range.

Example: end_num = 4,000,000; sqrt(end_num) = 2,000
    (end_num – sqrt(end_num)) < start_num
    (4,000,000 – 2,000) < start_num
    3,998,000 < start_num

If start_num ≤ 3,998,000, say 500,000, the input range is ≥ 1999, the largest prime less than 2000, and all the primes < sqrt(end_num) will have at least one multiple in the range, and must be used.

If start_num > 3,998,000, say 3,999,300, the primes < 700 (the input range) will have multiples in the range; 122 for P5. But some of the 178 primes between 700 < p < 2,000 will not, and can be discarded. The second test finds 103 are needed. So for P5 only 75% (225 of 300) of the primes < 2000 are used.

Described below is the process to determine if a prime p has at least one multiple in the inputs range.

                   |–––––– p ––––––|
                   |rem|           |
    1p…2p…3p…..np….|---+-----------+-----------------------|
                   np  start_num   np+p                    end_num

Here, n*p + rem = start_num, where n is the number of the prime's multiples ≤ start_num, i.e. np ≤ start_num. If rem = 0 then start_num is a multiple of p, otherwise 0 < rem < p. If p > start_num, n = 0.
Thus (n*p + p) = p*(n + 1) is the next multiple of p whose value is > start_num. If p*(n + 1) ≤ end_num, p is in range; if not, but rem = 0, then p*n = start_num, and p is in range.

To code, for every prime we do:      n = start_num // prime; rem = start_num % prime
In Crystal, et al, we can just do:   n, rem = start_num.divmod prime
Then we perform the above tests as:  prime * (n + 1) <= end_num || rem == 0
To avoid arithmetic overflow we do:  prime * n <= end_num - prime || rem == 0

Also, when performing: kn, rn = (prm_r * ri - 2).divmod md, rn's true value is reduced by 2, but we need to know its true residue bit position to mark the prime multiples for those bit positions.

Conceptually, given residue rn, its bit index is: posn[rn] = res.index(rn), for P5 a value from 0..7. Because the computed rn values are 2 less than the true residues, posn is addressed by (residue – 2), coded as: posn=[];(0..rscnt-1).each { |n| posn[res[n]-2] = n }

Then posn[7-2] = 0, posn[11-2] = 1, etc, and each rn bit value is: bit_r = 1 << posn[rn], which are OR'd into prms to mark the prime multiples as: prms[kpm] |= bit_r.

The shift values 2^i can also be looked up directly as bit mask values using array bitn[], e.g. now: bit_r = bitn[rn]

    posn = [0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,3,0, 4,0,0,0, 5,0,0,0,0,0, 6,0,  7]
    bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

In both cases byte arrays can be used to store the values, as they all can be represented by just 8 bits. This is an implementation detail to decide.

Because the processing of each row is independent from the others we can perform both the sieve and prime extraction processes in parallel. Below is Rust code using the Rayon crate to do this.

    // Imports assumed by this excerpt (rayon and integer-sqrt crates).
    use rayon::prelude::*;
    use std::sync::atomic::{AtomicU8, Ordering};
    use integer_sqrt::IntegerSquareRoot;

    fn atomic_slice(slice: &mut [u8]) -> &[AtomicU8] {
        unsafe { &*(slice as *mut [u8] as *const [AtomicU8]) }
    }

    fn sozpg(val: usize, res_0: usize, start_num : usize, end_num : usize) -> Vec<usize> {
      // Compute the primes r0..sqrt(input_num) and store in 'primes' array.
      // Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
      let (md, rscnt) = (30, 8);             // P5's modulus and residues count
      static RES: [usize; 8] = [7,11,13,17,19,23,29,31];
      static BITN: [u8; 30] = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128];

      let kmax = (val - 2) / md + 1;         // number of resgroups upto input value
      let mut prms = vec![0u8; kmax];        // byte array of prime candidates, init '0'
      let sqrt_n = val.integer_sqrt();       // compute integer sqrt of val
      let (mut modk, mut r, mut k) = (0, 0, 0);

      loop {                                 // for r0..sqrtN primes mark their multiples
        if r == rscnt { r = 0; modk += md; k += 1 }
        if (prms[k] & (1 << r)) != 0 { r += 1; continue } // skip pc if not prime
        let prm_r = RES[r];                  // if prime save its residue value
        let prime = modk + prm_r;            // numerate the prime value
        if prime > sqrt_n { break }          // exit loop when it's > sqrtN
        let prms_atomic = atomic_slice(&mut prms); // share mutable prms among threads
        RES.par_iter().for_each (|ri| {      // mark prime's multiples in prms in parallel
          let prod = prm_r * ri - 2;         // compute cross-product for prm_r|ri pair
          let bit_r = BITN[prod % md];       // bit mask for prod's residue
          let mut kpm = k * (prime + ri) + prod / md; // 1st resgroup for prime mult
          while kpm < kmax { prms_atomic[kpm].fetch_or(bit_r, Ordering::Relaxed); kpm += prime; };
        });
        r += 1;
      }
      // prms now contains the nonprime positions for the prime candidates r0..N
      // numerate the primes on each bit row in prms in parallel (won't be in sequential order)
      // return only the primes necessary to do SSoZ for given inputs in array 'primes'
      let primes = RES.par_iter().enumerate().flat_map_iter( |(i, ri)| {
        prms.iter().enumerate().filter_map(move |(k, resgroup)| {
          if resgroup & (1 << i) == 0 {
            let prime = md * k + ri;
            let (n, rem) = (start_num / prime, start_num % prime);
            if (prime >= res_0 && prime <= val) && (prime * n <= end_num - prime || rem == 0) { return Some(prime); }
          } None
      }) }).collect();
      primes
    }

Here the primes are extracted from each row in parallel using 8 threads, and thus are not kept in sequential order. Reversing the loops, as in the Crystal code, will extract them in order but will be slower as the number of resgroups increases. Since sequential order isn't necessary to do the SSoZ this is optimal.

For systems with more than 8 threads, using P7 with 48 residues may be faster, especially for large input values, if P7's smaller number space can be processed faster with those threads than using P5.

We can see the performance gain that's achieved in going from using all the sieving primes for end_num, to only using those with multiples within the inputs range, to then generating them in parallel in sozpg. The following examples using Rust show the three cases and the progressive performance increases.

This is the Rust output of the original unoptimized sozpg using these two 63-bit numbers as inputs. It shows (in nextp[2 x 129900044]) that 129,900,044 sieving primes were generated, which accounted for most of the setup time. The times shown are for the i7 6700HQ 4C|8T and AMD 5900HX 8C|16T cpus.

    $ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz157
    threads = 8 // 16
    using Prime Generator parameters for P5
    segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
    twinprime candidates = 675003; resgroups = 225001
    each of 3 threads has nextp[2 x 129900044] array
    setup time = 13.098702568 secs // 7.089318922 secs
    perform twinprimes ssoz sieve
    3 of 3 twinpairs done
    sieve time = 9.731177018 secs // 4.944145598 secs
    total time = 22.829885781 secs // 12.033471504 secs
    last segment = 28393 resgroups; segment slices = 4
    total twins = 4711; last twin = 7200011139999998808+/-1

These are the results from filtering out the unnecessary primes (no multiples in inputs range), using 49x fewer primes: 2,636,377. Though the setup time increases somewhat for 8 threads, there's a massive decrease in the sieve time, as each thread now does significantly less work (and uses less memory).

    $ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz158
    threads = 8 // 16
    using Prime Generator parameters for P5
    segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
    twinprime candidates = 675003; resgroups = 225001
    each of 3 threads has nextp[2 x 2636377] array
    setup time = 13.743127493 secs // 6.987116498 secs
    perform twinprimes ssoz sieve
    3 of 3 twinpairs done
    sieve time = 0.175270322 secs // 0.107544045 secs
    total time = 13.918427314 secs // 7.094673324 secs
    last segment = 28393 resgroups; segment slices = 4
    total twins = 4711; last twin = 7200011139999998808+/-1

Finally, when sozpg performs the prime generation and filtering process in parallel the setup times drop from 13.7|6.9 to 5.3|4.7 secs, with a total time drop from 22.8|12.0 to ~5.5|4.9 secs.

    $ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz159
    threads = 8 // 16
    using Prime Generator parameters for P5
    segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
    twinprime candidates = 675003; resgroups = 225001
    each of 3 threads has nextp[2 x 2636377] array
    setup time = 5.296482074 secs // 4.74022821 secs
    perform twinprimes ssoz sieve
    3 of 3 twinpairs done
    sieve time = 0.180924203 secs // 0.116552963 secs
    total time = 5.477426691 secs // 4.856791579 secs
    last segment = 28393 resgroups; segment slices = 4
    total twins = 4711; last twin = 7200011139999998808+/-1

Constructing nextp

nextp is a table of the resgroups for the first prime multiples for the sieving primes along each restrack. From P5's pcs table we can look at each row and create Table 3 of their first prime multiples resgroups.
    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
    rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
    rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

Table 3. List of resgroup values for the first prime multiples – prime * (modk + ri) – for the primes shown.

    rt res    7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73
    0   7     7   6   8   6   8  22  22   7  75  64  70  64 104 104  75 203 182 192
    1  11     5  11   7   7  18   5  18  11  65  83  67  67  65  96  83 185 215 187
    2  13     4   8  13  16   4   8  16  13  60  72  87  92  72  92  87 176 196 221
    3  17     2   2  12  17  14  14  12  17  50  50  84  95  86  84  95 158 158 216
    4  19     1  10   5   9  19  17  10  19  45  80  61  73  93  80  99 149 210 177
    5  23     6   4   4  10  10  23   6  23  72  58  58  76 107  72 107 198 172 172
    6  29     3   6   9   3   6   9  29  29  57  66  75  57  75 119 119 171 186 201
    7  31     2   3   2  12  11  12  27  31  52  55  52  82  82 115 123 162 167 162

Note on each row, when two primes have the same resgroup table value they were multiplied together (e.g. on rt0, primes 11 and 17 both show 6, because 11 * 17 = 187 is in resgroup 6). When a value occurs only once, it’s either for a prime square, or a (prime * nonprime) value. Also, for a prime in any resgroup k, its first prime multiple resgroup value on its own row is just:

    prime * (k + 1) + k

For P5’s pcs table this is equivalent to:

    k * (prime + 31) + ((prm_r * 31) - 2) / 30

(This is a property for every pc member in a resgroup for every Pn, for its first multiple on its row.)

To construct Table 3, each prime in P5’s pcs table multiplies each resgroup member, whose products are other table values. Their row|col cell locations are entries into nextp.
Thus starting with first prime 7:

    7 * [7, 11, 13, 17, 19, 23, 29, 31] = [49, 77, 91, 119, 133, 161, 203, 217]

We see in P5’s pcs table, 49 occurs in resgroup k=1 for residue value 19, which is residue track 4 (rt4). Similarly for the remaining multiples of 7, we see their placement in the table. Repeating this process for each prime, we compute their first multiples, then determine their resgroup value for each restrack.

These first prime multiple locations in Table 3 are used to start marking off successive prime multiples along each restrack|row. The SoZ computes each prime’s multiples on the fly once and doesn’t need to store them for later use. The SSoZ computes an initial nextp for the inputs range first segment, which is updated at the end of each segment slice to set the first prime multiples for the next segment(s).

For each sieve prime we compute its first multiple resgroup k for the restracks of interest, e.g. for twin pair residues. We then determine its resgroup k’ relative to kmin, where kmin is the resgroup for the start_num input value (kmin = 1 if one input given). Thus k’ ≥ 0 is the number of resgroups starting from kmin. In the picture below, k is a prime’s 1st multiple resgroup on a row, and k’ its projection relative to kmin.

If k ≥ kmin, then k’ = k - kmin. Thus if kmin = 3 and k = 7, k’ = 4 is its first resgroup inside the segment starting at kmin. If k = kmin then k’ = 0, i.e. that first prime multiple starts at the segment’s beginning.

                       |<------- p ------->|
      k                |<- rem ->|<-- k’ ->|
      |......|.........|.........|---------|--------------------
                                kmin

If k < kmin, we compute the prime’s multiple closest to kmin from above, i.e. where k’ = 0...prime-1 resgroups past kmin:

    k’ = (kmin - k) % prime          –> value of rem in picture
    k’ = prime - k’ if k’ > 0        –> translated k’ value ≥ kmin

Ex: for prime 7 on rt0, let k = 7, kmin = 21: then k’ = (21 - 7) % 7 = 0 to start from (kmin is at a multiple of 7).
Ex: for prime 7 on rt0, let k = 7, kmin = 25: then k’ = (25 - 7) % 7 = 4; k’ = 7 - 4 = 3 to start from.
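The translation of k into k’ can be checked with a small Rust sketch. This is only an illustration under stated assumptions: the function name resgroup_offset is hypothetical (it is not taken from the paper's sources), and it handles a single value rather than a whole nextp array.

```rust
/// Translate a prime's first-multiple resgroup `k` on a restrack into its
/// offset k' relative to `kmin` (hypothetical helper, for illustration only).
fn resgroup_offset(k: u64, kmin: u64, prime: u64) -> u64 {
    if k >= kmin {
        // first multiple is at or past kmin: just shift the origin
        k - kmin
    } else {
        // rem = distance from the last multiple at or before kmin up to kmin
        let rem = (kmin - k) % prime;
        // the next multiple lies prime - rem resgroups past kmin (0 if rem == 0)
        if rem > 0 { prime - rem } else { 0 }
    }
}
```

With the values used in the text, resgroup_offset(7, 3, 7) gives 4, resgroup_offset(7, 21, 7) gives 0, and resgroup_offset(7, 25, 7) gives 3.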
In software, we can reassign the variable k to use for k’, so the (Crystal, et al) code just becomes:

    k < kmin ? (k = (kmin - k) % prime; k = prime - k if k > 0) : k -= kmin

It should be noted that while the sieve primes have at least 1 multiple within the inputs range, some may not have multiples on each restrack, especially for small ranges, and for them k > kmax. If this happens for both residue pairs, those primes could be discarded from the primes lists for those residues sieves. For general purposes though, it won’t happen often enough for the performance gain to justify the extra code.

To keep the process|code simple, the k values for each sieve prime are generated and stored in nextp, without worrying if they’re > kmax. If a prime’s k is larger than a segment size it’s skipped for that segment (not used to mark prime multiples) and reduced|updated by kn to a smaller value for the next segment(s). When less than a segment size, it’s used in the residues sieve to mark prime multiples. Thus in twins_sieve, only primes with multiples in a segment for each restrack are used to mark prime multiples; the others are skipped.

A unique nextp array is created for each residues pair in each thread for the sieving primes. Thus for twin|cousin primes, nextp holds their first prime multiples resgroups values for each segment slice for both residue pairs restracks. Thus its memory increases with inputs values (more sieving primes) and larger generators (more residue pairs), though active memory use will be determined by the number of parallel threads holding onto memory. How different languages manage memory affects the size and throughput they can achieve for various inputs and ranges, for a system’s memory sizes and profile.

Creating nextp for SSoZ

In the SoZ, a prime’s residue r multiplies each Pn residue ri, and (r * ri) mod modpg maps to a unique restrack rt in some resgroup k, which is the starting point to mark off that prime’s multiples for that ri.
We now want to multiply r by the ri that makes (r * ri) be on a given restrack rt, for each sieving prime. Thus if for some ri, (r * ri) mod modpg = rt, to find the ri that maps each r to a specific rt we multiply both sides by the inverse of r:

    ri = (r^-1 * rt) mod modpg

Where for r^-1, r_inv = modinv(r, modpg) in the code, with r being the residue for a sieve prime. (A property of prime generators is that every residue has an inverse, either itself or another residue.)

Now kn = (r * ri - 2) / modpg, and k = (prime - 2) / modpg, so again:

    kpm = k * (prime + ri) + kn

If r_inv is a prime’s residue inverse, and rt the desired restrack:

    ri = (r_inv * rt - 2) mod modpg + 2

For each residues pair, nextp_init creates the nextp array of the sieve primes first resgroup multiples relative to kmin, for the rt values r_lo and r_hi, the lower|upper residues pair. With no loss of generality, it can be used to construct nextp for any architecture for any number of specified restracks.

nextp_init

    def nextp_init(rhi, kmin, modpg, primes, resinvrs)
      # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
      # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
      # store consecutively as lo_tp|hi_tp pairs for their restracks.
      nextp = Slice(UInt64).new(primes.size*2)        # 1st mults array for twinpair
      r_hi, r_lo = rhi, rhi - 2                       # upper|lower twinpair residue values
      primes.each_with_index do |prime, j|            # for each prime r0..sqrt(N)
        k = (prime - 2) // modpg                      # find the resgroup it's in
        r = (prime - 2) % modpg + 2                   # and its residue value
        r_inv = resinvrs[r].to_u64                    # and residue inverse
        rl = (r_inv * r_lo - 2) % modpg + 2           # compute r's ri for r_lo
        rh = (r_inv * r_hi - 2) % modpg + 2           # compute r's ri for r_hi
        kl = k * (prime + rl) + (r * rl - 2) // modpg # kl 1st mult resgroup
        kh = k * (prime + rh) + (r * rh - 2) // modpg # kh 1st mult resgroup
        kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin) # in range
        kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin) # in range
        nextp[j * 2] = kl.to_u64                      # prime's 1st mult lo_tp resgroup val
        nextp[j * 2 | 1] = kh.to_u64                  # prime's 1st mult hi_tp resgroup val
      end
      nextp
    end

    Inputs:
      rhi      – hi residue value for this twinpair
      kmin     – resgroup value for start_num
      modpg    – modulus value for chosen pg
      primes   – array of sieving primes
      resinvrs – array of residues modular inverses
    Output:
      nextp    – array of primes 1st mults for given residues

Twins|Cousins SSoZ

Let’s now construct the process to find twin primes ≤ N with a segmented sieve, using our P5 example. Twin primes are consecutive odd integers that are prime, the first two being [3:5], and [5:7]. Thus from our original P5 pcs table, we use just the consecutive pc residue tracks, whose residues table is below. A twin prime occurs when both twin pair pc values in a column are prime (not colored), e.g. [191:193].

Table 4. Twin Primes Residues Tracks Table for P5(541).

    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
    rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

We see from the table the twin pair residue tracks for [11:13] have 10 twin primes ≤ 541, [17:19] have 6, and [29:31] have 7. Thus, the total twin prime count ≤ 541 is 23 + [3:5] + [5:7] = 25, with the last being [521:523]. Twin primes are usually referenced to the mid (even) number between the upper and lower consecutive odd primes pair, so the last (largest) twin pair ≤ 541 for [521:523] is written as 522 ± 1.
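The per-track counts quoted above can be cross-checked by brute force. The Rust sketch below is not the SSoZ: it uses plain trial division, and the names is_prime and twins_on_tracks are hypothetical helpers introduced only for this check. It walks one twin pair's upper restrack in steps of modpg and tests both members of each pair.

```rust
/// Trial-division primality test (assumption: adequate for small check values only).
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Count the twin pairs [hi-2 : hi] on the restracks of upper residue `r_hi`
/// (r_hi >= 3) with hi <= limit; returns (count, hi value of the last pair found).
fn twins_on_tracks(r_hi: u64, modpg: u64, limit: u64) -> (u64, u64) {
    let (mut count, mut last_hi) = (0u64, 0u64);
    let mut hi = r_hi;
    while hi <= limit {
        if is_prime(hi - 2) && is_prime(hi) {
            count += 1;
            last_hi = hi;
        }
        hi += modpg; // same residue, next resgroup
    }
    (count, last_hi)
}
```

For P5 (modpg = 30) and limit 541, calling it with r_hi = 13, 19 and 31 reproduces the 10, 6 and 7 pairs read from Table 4, and the largest pair on the [11:13] tracks ends at 523.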
As shown before, the number of twin|cousin residue pairs is equal to:

    (pn - 2)# = Π (pi - 2), for the odd primes pi = 3 ... pn

Thus P5 has (3 - 2) * (5 - 2) = 3 residue pairs for each. Below are the three Cousin Prime pairs taken from P5’s pcs table.

Table 5. Cousin Primes Residues Tracks Table for P5(541).

    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533

The SSoZ algorithm is the same for both, with their coding only differing to account for low input value ranges, as the first cousin prime is defined as [3:7] and the first twins are [3:5], [5:7]. Up to 541, there are 25 twin and 27 cousin primes. Their ratio over increasingly larger input ranges remains close to unity (1), as their pairs count, and pair prime values, increase without end [3], [4].

Residues Sieve Description

The Segmented Sieve of Zakiya (SSoZ) is a memory efficient way to find the primes using a given Pn. For an input range defined by a start_num and end_num, it divides the range into segments, which are efficiently sized to fit into usable memory for processing. This allows the reuse of the same memory to process long number ranges that otherwise would require more memory than a system has to use.

A standard segment slice is ks resgroups, with the last one ks’ usually less. For a given Pn and range size set_sieve_parameters determines its optimal memory size, which is set to be a multiple of 64 (bits).

    kmin                                                             kmax
     |   ks   |   ks   |   ks   |   ks   |   ks   |   ks   |   ks   |  ks’  |
     |........|........|........|........|........|........|........|.......|
    start_num                                                      end_num
                                   Fig. 1

Here start|end_num are the lo|hi values that define a number range of interest.
They also define the absolute values for kmin and kmax for a given Pn generator, as these resgroups cover these input values. When only one input is given it becomes end_num, whose resgroup determines kmax, and start_num is set to 3 (low prime for first twin [3:5]), and kmin set to 1 (min number of resgroups). The residue sieve adjusts kmin|kmax for each residues pair when necessary, to ensure only their pc values within the inputs range are processed. For example, if start_num = 342 and end_num = 540, we see below the valid in-range pc values. Here kmin = 12 and kmax = 18. For twinpair [11:13], 341 < 342, so kmin for it is increased to 13. Then for [29:31], pc 541 > 540 is outside the range, so kmax for it is reduced to 17, and now all its resgroup values are in the range. For twinpair [17:19] no adjustment is needed (done). We can simplify this by just looking at the residue values for start|end_num and check if they’re within the residue pairs range. Thus for each residues pair, we check if r_lo is < (start_num - 2) % modpg + 2 (start_num’s residue) and if so increment kmin, then if r_hi > (end_num - 2) % modpg + 2 (end_num’s residue), and decrement kmax if so. In twins_sieve the adjusted kmin|kmax are first determined then used in nextp_init to create the sieve primes first k resgroups to start marking their multiples in the first seg. Table 6. Twin Primes Residues Tracks Table for range 342 – 540. 
    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
    rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

In twins_sieve segment array seg, its resgroups size ks is a multiple of 64-bit mem elements, where each bit represents a residues pair resgroup. Thus a resgroup k maps to bit (k mod 64) in mem elem seg[k / 64], where (k mod 64) masks k’s lower 6 bits: (k & 0x3F), and (k / 64) right shifts k by 6 bits. This is coded as: seg[(kn - 1) >> 6], bit value: 1 << ((kn - 1) & 63), (>>|<< are right|left bit-shift ops).

Ex: for ks = 131072 resgroups, seg size is 2048 64-bit mem elements; resgroup k = 89257 maps to seg[1394], bit 2^40, mem value = 1 << 40 = 1099511627776.

     |.................... ks .....................|
     ki                                         ki+kn
     |.....|.....|.....|.....|...~~~...|.....|.....|
     seg[0]                                seg[kn-1]
                        Fig. 2

ki is the absolute resgroup value to start each segment slice (in Fig. 1), initialized to kmin-1 (0 indexed arrays). kn is the resgroups size for each segment slice. It’s initialized to ks, but if the last segment slice ks’ < ks resgroups it’s set to its slice size.

To sieve for twin primes, etc, each instance of twins_sieve processes a unique twinpair for the entire inputs range split into ks resgroup size segments. It first determines the adjusted kmin|kmax values for the twinpair residues, then creates their initial nextp array of first resgroup sieve prime multiples k values. Using them, it iterates over the sieve primes, computes|updates their prime multiples k values, and sets them to ‘1’ in seg for each residues pair, until k ≥ kn, i.e. k is past the end of the current segment.
When k ≥ kn it updates it to: k = k - kn, which is the first k multiple value into the next segment, and stores it back into nextp for that prime to update it to use for the next segment(s).

This is the Crystal code to mark a prime’s resgroup multiples in seg to ‘1’. This is done for the lo|hi residues pair, and if either resgroup member is a prime’s multiple that resgroup isn’t a twinprime.

    k = nextp.to_unsafe[j * 2]              # starting from this resgroup in seg
    while k < kn                            # mark primenth resgroup bits prime mults
      seg[k >> s] |= 1_u64 << (k & bmask)
      k += prime end                        # set resgroup for prime's next multiple
    nextp.to_unsafe[j * 2] = k - kn         # save 1st resgroup in next eligible seg

When the residues sieve finishes seg contains the resgroup bit positions for the twin primes. Because seg is set to all ‘0’s to start each segment, we need to set to ‘1’ any unused hi bits in the last mem elem that ks’ is in, when it’s not a multiple of 64. Algorithmically this only needs to be done for the last segment. However, doing it after every segment is faster in software, as it eliminates the branching code to check for the last segment, and is more efficient to compile|run. Below is the Crystal code to perform this.

    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)

If kn = 89257 for the last segment, only the first 1395 64-bit seg mem elems are used, up to the 41st bit in the last elem, so we need to set to ‘1’ its bit values 2^41..2^63, because ((89257 - 1) & 63) = 40, for bit 2^40. Thus we invert 1 to be: 11111111..1110 and left-shift it 40 bits, which is ORed with the last mem elem. If kn is a multiple of 64, (kn - 1) & bmask = 63 shifts the bits to be all 0s, and thus when ORed doesn’t change seg’s last mem value. Thus left shifts of n = 0..62 bits mask all the upper bit values: 2^63 ... 2^(n+1).

Once all the nonprime bits are set we can count|numerate the primes.
We read each mem elem seg[0..(kn-1) >> s] and invert the bits, and use popcount to count the ‘1’s (as primes) for each seg[i] (the Rust code counts the ‘0’s directly), and sum their segment count in variable cnt.

If cnt > 0 we find the largest prime resgroup in the segment. We first update the total pairs count with sum += cnt. Then upk is set to the last resgroup value in the segment, and we loop backward checking for the first bit that’s prime (‘0’), after which upk holds the largest|last prime pair resgroup in the segment. Its absolute resgroup value in the inputs range is then: hi_tp = ki + upk. For each segment slice its value is updated to a larger value, and at the end it holds the largest absolute resgroup for this residues pair in the inputs range. The r_hi prime value is numerated and returned as: hi_tp * modpg + r_hi, along with the total prime pairs count in the range, in variable sum.

    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
    cnt = 0                                 # count the twinprimes in the segment
    seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount }
    if cnt > 0                              # if segment has twinprimes
      sum += cnt                            # add segment count to total range count
      upk = kn - 1                          # from end of seg count back to largest tp
      while seg[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
      hi_tp = ki + upk                      # set its full range resgroup value
    end

twins_sieve can be modified for different purposes. The code to find the largest prime pair can be removed if all you want is their count. I also originally had code to print out the r_hi primes in each segment as a validity check (only for small ranges). However, if you really wanted to see|record the twins, a better way may be to return ki|seg for each segment and externally store|process them later for any desired range of interest. (This, of course, would be very memory intensive.)

Twin Primes Example

Using our example to find the twin primes ≤ 541 with P5, let’s see how it processes the first twin pair residues [11:13] with kmax = 18.
twins_sieve can perform the sieve for each pair in a separate thread. set_sieve_parameters sets the segment size, but here I’ll set it to ks = 6. Thus, the seg array will represent 6 resgroups. Below is the twin pair table for [11:13] separated into 3 segment slices of 6 resgroups each. Underneath it is what each seg array will look like after processing for each slice. (seg conceptually is a bitarray, so each seg[i] is just 1 bit. I later show an implementation using a bitarray, which makes the code simpler|shorter, and faster, depending on a language’s implementation.)

Table 7.

    k       0   1   2   3   4   5 |   6   7   8   9  10  11 |  12  13  14  15  16  17
    rt11   11  41  71 101 131 161 | 191 221 251 281 311 341 | 371 401 431 461 491 521
    rt13   13  43  73 103 133 163 | 193 223 253 283 313 343 | 373 403 433 463 493 523
    k       0   1   2   3   4   5 |   0   1   2   3   4   5 |   0   1   2   3   4   5
    seg     0   0   0   0   1   1 |   0   1   1   0   0   1 |   1   1   0   0   1   0

nextp_init initializes nextp for the sieve primes [7, 11, 13, 17, 19, 23] for residues 11 and 13, taking the values shown in Table 3. For each lo|hi residue, their k values are stored as consecutive pairs in nextp, and seg is created and initialized to all primes (‘0’).

    Initial nextp[11:13]
    j         0   1   2   3   4   5
    primes    7  11  13  17  19  23
    2j        0   2   4   6   8  10
    2j+1      1   3   5   7   9  11
    rt_11     5  11   7   7  18   5
    rt_13     4   8  13  16   4   8

    k         0   1   2   3   4   5
    seg       0   0   0   0   0   0

For each prime j in primes, nextp[2j|2j+1] give the pair’s k values to start marking off the prime’s multiples (by incrementing k by the prime’s value). When k ≥ kn (here kn is always 6), it’s reduced by it: k = k - 6, and nextp is updated with the new k values for the next segment. Below shows the changes to nextp and seg in twins_sieve. (It’s coincidental here the index size for primes and nextp are the segment size.)
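The marking and nextp update steps for this small example can be replayed with a short Rust sketch. It is only an illustration, not the paper's implementation: it keeps one byte per resgroup instead of packed 64-bit words, runs in a single thread, skips the kmin|kmax adjustments, and the name toy_pair_sieve and its signature are hypothetical. It assumes nextp holds the lo|hi first-multiple resgroups interleaved per prime, as nextp_init stores them.

```rust
/// Replay the segmented residues sieve for one residues pair (toy version).
/// `nextp[2*j]` and `nextp[2*j + 1]` are prime j's first-multiple resgroups on
/// the lo and hi restracks; they are updated in place after every slice.
/// Returns one Vec per segment slice: 0 = pair still prime, 1 = marked nonprime.
fn toy_pair_sieve(primes: &[usize], nextp: &mut [usize], ks: usize, kmax: usize) -> Vec<Vec<u8>> {
    let mut slices = Vec::new();
    let mut ki = 0; // first absolute resgroup of the current slice
    while ki < kmax {
        let kn = ks.min(kmax - ki); // last slice may be shorter than ks
        let mut seg = vec![0u8; kn]; // start each slice as all primes
        for (j, &prime) in primes.iter().enumerate() {
            for track in 0..2 {
                let mut k = nextp[2 * j + track];
                while k < kn {
                    seg[k] = 1; // resgroup holds a multiple of this prime
                    k += prime;
                }
                nextp[2 * j + track] = k - kn; // first resgroup in a later slice
            }
        }
        slices.push(seg);
        ki += ks;
    }
    slices
}
```

Called with primes [7, 11, 13, 17, 19, 23], the initial [11:13] values interleaved as [5, 4, 11, 8, 7, 13, 7, 16, 18, 4, 5, 8], ks = 6 and kmax = 18, it returns the slices 000011, 011001 and 110010, which match the seg rows of Table 7.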
    Start for Segment 1 nextp[11:13]
    2j        0   2   4   6   8  10
    2j+1      1   3   5   7   9  11
    rt_11     5  11   7   7  18   5
    rt_13     4   8  13  16   4   8

    seg 1
    k         0   1   2   3   4   5
    seg       0   0   0   0   1   1

    Start for Segment 2 nextp[11:13]
    2j        0   2   4   6   8  10
    2j+1      1   3   5   7   9  11
    rt_11     6   5   1   1  12  22
    rt_13     5   2   7  10  17   2

    seg 2
    k         0   1   2   3   4   5
    seg       0   1   1   0   0   1

    Start for Segment 3 nextp[11:13]
    2j        0   2   4   6   8  10
    2j+1      1   3   5   7   9  11
    rt_11     0  10   8  12   6  16
    rt_13     6   7   1   4  11  19

    seg 3
    k         0   1   2   3   4   5
    seg       1   1   0   0   1   0

Below is the Crystal code to perform the residues sieve (here for twins) for a given residues pair.

twins_sieve

    def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
      # First create|init 'nextp' array of 1st prime mults for given twinpair,
      # stored consecutively in 'nextp', and init seg array for ks resgroups.
      # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
      # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
      # Return the last twinprime|sum for the range for this twinpair residues.
      s = 6                                        # shift value for 64 bits
      bmask = (1 << s) - 1                         # bitmask val for 64 bits
      sum, ki, kn = 0_u64, kmin-1, ks              # init these parameters
      hi_tp, k_max = 0_u64, kmax                   # max twinprime|resgroup
      seg = Slice(UInt64).new(((ks - 1) >> s) + 1) # seg array of ks resgroups
      ki += 1 if r_hi - 2 < (start_num - 2) % modpg + 2     # ensure lo tp in range
      k_max -= 1 if r_hi > (end_num - 2) % modpg + 2        # ensure hi tp in range
      nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs)  # init nextp array
      while ki < k_max                             # for ks size slices upto kmax
        kn = k_max - ki if ks > (k_max - ki)       # adjust kn size for last seg
        primes.each_with_index do |prime, j|       # for each prime r0..sqrt(N)
                                                   # for lower twinpair residue track
          k = nextp.to_unsafe[j * 2]               # starting from this resgroup in seg
          while k < kn                             # mark primenth resgroup bits prime mults
            seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
            k += prime end                         # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2] = k - kn          # save 1st resgroup in next eligible seg
                                                   # for upper twinpair residue track
          k = nextp.to_unsafe[j * 2 | 1]           # starting from this resgroup in seg
          while k < kn                             # mark primenth resgroup bits prime mults
            seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
            k += prime end                         # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2 | 1]= k - kn       # save 1st resgroup in next eligible seg
        end                                        # set as nonprime unused bits in last seg[n]
                                                   # so fast, do for every seg[i]
        seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
        cnt = 0                                    # count the twinprimes in the segment
        seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount } # invert to count '1's
        if cnt > 0                                 # if segment has twinprimes
          sum += cnt                               # add segment count to total range count
          upk = kn - 1                             # from end of seg, count back to largest tp
          while seg.to_unsafe[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
          hi_tp = ki + upk                         # set its full range resgroup value
        end
        ki += ks                                   # set 1st resgroup val of next seg slice
        seg.fill(0) if ki < k_max                  # set next seg to all primes if in range
      end                                          # when sieve done, numerate largest twin
                                                   # for ranges w/o twins set largest to 1
      hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
      {hi_tp.to_u64, sum.to_u64}                   # return largest twinprime|twins count
    end

    Inputs:
      ks        – resgroups segment size
      rhi       – hi residue value for this twinpair
      modpg     – modulus value for chosen pg
      kmin      – total number resgroups upto for start_num
      kmax      – total number resgroups upto for end_num
      primes    – array of sieving primes
      resinvrs  – array of modular inverses for residues
      end_num   – inputs high value
      start_num – inputs low value
    Outputs:
      sum       – count of twinpairs for input range
      hi_tp     – hi prime for largest twinprime in range

Starting with Crystal 1.4.0 (April 7, 2022) its bitarray implementation was highly optimized, making it faster than the 64-bit mem array for seg on the AMD 5900HX, while making the code substantially simpler to read|write and shorter. Below is the Crystal version using a bitarray for the seg array.

    def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
      # First create|init 'nextp' array of 1st prime mults for given twinpair,
      # stored consecutively in 'nextp', and init seg array for ks resgroups.
      # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
      # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
      # Return the last twinprime|sum for the range for this twinpair residues.
      sum, ki, kn = 0_u64, kmin-1, ks              # init these parameters
      hi_tp, k_max = 0_u64, kmax                   # max twinprime|resgroup
      seg = BitArray.new(ks)                       # seg array of ks resgroups
      ki += 1 if r_hi - 2 < (start_num - 2) % modpg + 2     # ensure lo tp in range
      k_max -= 1 if r_hi > (end_num - 2) % modpg + 2        # ensure hi tp in range
      nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs)  # init nextp array
      while ki < k_max                             # for ks size slices upto kmax
        kn = k_max - ki if ks > (k_max - ki)       # adjust kn size for last seg
        primes.each_with_index do |prime, j|       # for each prime r0..sqrt(N)
                                                   # for lower twinpair residue track
          k = nextp.to_unsafe[j * 2]               # starting from this resgroup in seg
          while k < kn                             # until end of seg
            seg.unsafe_put(k, true)                # mark primenth resgroup bits prime mults
            k += prime end                         # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2] = k - kn          # save 1st resgroup in next eligible seg
                                                   # for upper twinpair residue track
          k = nextp.to_unsafe[j * 2 | 1]           # starting from this resgroup in seg
          while k < kn                             # until end of seg
            seg.unsafe_put(k, true)                # mark primenth resgroup bits prime mults
            k += prime end                         # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2 | 1]= k - kn       # save 1st resgroup in next eligible seg
        end
        cnt = seg[...kn].count(false)              # count|store twinprimes in segment
        if cnt > 0                                 # if segment has twinprimes
          sum += cnt                               # add segment count to total range count
          upk = kn - 1                             # from end of seg, count back to largest tp
          while seg.unsafe_fetch(upk); upk -= 1 end
          hi_tp = ki + upk                         # set its full range resgroup value
        end
        ki += ks                                   # set 1st resgroup val of next seg slice
        seg.fill(false) if ki < k_max              # set next seg to all primes if in range
      end                                          # when sieve done, numerate largest twin
                                                   # for ranges w/o twins set largest to 1
      hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
      {hi_tp.to_u64, sum.to_u64}                   # return largest twinprime|twins count
    end

The code to find the largest twinprime in the range comes for FREE, and removing it has no detectable increase in speed, and for Crystal may even be a wee tad bit slower. This is how the end of the routine looks with it removed:

        sum += seg[...kn].count(false)             # count|store twinprimes in segment
        ki += ks                                   # set 1st resgroup val of next seg slice
        seg.fill(false) if ki < k_max              # set next seg to all primes if in range
      end
      sum.to_u64                                   # return twinprimes count in range
    end

In general, a bitarray’s performance depends on the language’s implementation (test to determine), but should make the code simpler|shorter to read|write, while the memory array model should be more ubiquitous, and implementable for languages without (native or external) bitarrays.

gcd

    def gcd(m, n)
      while m|1 != 1; t = m; m = n % m; n = t end
      m
    end

    Inputs:
      m – an odd pc value < pg modulus n
      n – even pg modulus value
    Output:
      gcd of inputs; (m, n) are coprime if 1

This is a customized gcd (greatest common divisor) function that uses residue properties to shorten the time of the Euclidean gcd algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm). Here m is an odd residue candidate < n, the even modulus value. Some of the language implementations just use the gcd function provided with them.

modinv

    def modinv(a0, m0)
      return 1 if m0 == 1
      a, m = a0, m0
      x0, inv = 0, 1
      while a > 1
        inv -= (a // m) * x0
        a, m = m, a % m
        x0, inv = inv, x0
      end
      inv += m0 if inv < 0
      inv.to_u64
    end

    def modinv1(r, m)
      r = inv = r.to_u64
      while (r * inv) % m != 1
        inv = (inv % m) * r
      end
      inv % m
    end

    Inputs:
      a0 – odd pc value < modulus m0
      m0 – even pg modulus value
    Output:
      inv – inverse of a0 mod m0, e.g. a0*inv ≡ 1 mod m0

The first function, modinv, is the standard modular inverse function (taken from Rosetta Code). The second, modinv1, uses the residue property that – ri * ri^n ≡ 1 mod modpg – for some n ≥ 1, i.e. the modular inverse of residue ri is itself raised to some power n.
This is faster for generators P3 and P5, with a small number of residues, but becomes comparatively slower for generators with more residues.

    For P5’s residues: [ 7, 11, 13, 17, 19, 23, 29, 31]
    Its inverses are:  [13, 11,  7, 23, 19, 17, 29,  1]
    Inverse power n:   [ 3,  1,  3,  3,  1,  3,  1,  1]

For a chosen Pn generator, gen_pg_parameters produces its parameters used to perform the SSoZ. It uses gcd to determine the residues and modinv to compute their inverses.

gen_pg_parameters

    def gen_pg_parameters(prime)
      # Create prime generator parameters for given Pn
      puts "using Prime Generator parameters for P#{prime}"
      primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
      modpg, res_0 = 1, 0                     # compute Pn's modulus and res_0 value
      primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm }
      restwins = [] of Int32                  # save upper twinpair residues here
      inverses = Array.new(modpg + 2, 0)      # save Pn's residues inverses here
      pc, inc, res = 5, 2, 0                  # use P3's PGS to generate pcs
      while pc < (modpg >> 1)                 # find PG's 1st half residues
        if gcd(pc, modpg) == 1                # if pc a residue
          mc = modpg - pc                     # create its modular complement
          inverses[pc] = modinv(pc, modpg)    # save pc and mc inverses
          inverses[mc] = modinv(mc, modpg)    # if in twinpair save both hi residues
          restwins << pc << mc + 2 if res + 2 == pc
          res = pc                            # save current found residue
        end
        pc += inc; inc ^= 0b110               # create next P3 seq pc: 5 7 11 13 17...
      end
      restwins.sort!; restwins <<(modpg + 1)  # last residue is last hi_tp
      inverses[modpg+1] = 1; inverses[modpg-1] = modpg - 1 # last 2 are self inverses
      {modpg, res_0, restwins.size, restwins, inverses}
    end

    Inputs:
      prime         – Pn prime value 5, 7… 17
    Outputs:
      modpg         – modulus for generator Pn; value = (prime)#
      res_0         – first residue of selected Pn (next prime > Pn prime)
      restwins.size – the number of pg twinpairs = (prime-2)#
      restwins      – ordered array of the hi pg twinpair (tp) values
      inverses      – array of the pg residue inverses, size = (prime-1)#

For a given prime number, it generates its primorial value for modpg, and keeps its r0 value in res_0. It then generates all the residues. It uses P3’s PGS to generate Pn’s first half rcs. It checks if they’re coprime to modpg to identify the residues. For each residue it creates its modular complement (mc) and stores both inverses at their address values. It then determines if the residue is part of a twin (cousin) pair, and if so, then so is its complement, and stores both hi pair values in restwins.

Upon generating all the residues, and storing their inverses and twin (cousin) pairs hi residues, the restwins array is sorted to put them in sequential order, then the hi residue for the last twin pair modpg±1 is included as the last one. (For cousin primes, we include the hi residue for the pivot pair (modpg/2 + 2) and then sort the array.) Finally, the inverses for the last two residues modpg±1 are added at their address locations, and the outputs are returned for use in set_sieve_parameters.

Given the input values, set_sieve_parameters determines which prime generator to use, generates its parameters, then determines the range parameters and segment size to use. Here I use a rudimentary tree algorithm to determine for my laptops the switch points for using different generators.
This can be made much more sophisticated and adaptable by also accounting for the number of system threads and the cache and ram memory sizes, to pick better segment size values and generators for a given inputs range.

set_sieve_parameters

def set_sieve_parameters(start_num, end_num)
  # Select at runtime best PG and segment size parameters for input values.
  # These are good estimates derived from PG data profiling. Can be improved.
  nrange = end_num - start_num
  bn, pg = 0, 3
  if end_num < 49
    bn = 1; pg = 3
  elsif nrange < 77_000_000
    bn = 16; pg = 5
  elsif nrange < 1_100_000_000
    bn = 32; pg = 7
  elsif nrange < 35_500_000_000
    bn = 64; pg = 11
  elsif nrange < 14_000_000_000_000
    pg = 13
    if    nrange > 7_000_000_000_000; bn = 384
    elsif nrange > 2_500_000_000_000; bn = 320
    elsif nrange > 250_000_000_000; bn = 196
    else bn = 128
    end
  else
    bn = 384; pg = 17
  end
  modpg, res_0, pairscnt, restwins, resinvrs = gen_pg_parameters(pg)
  kmin = (start_num-2) // modpg + 1   # number of resgroups to start_num
  kmax = (end_num - 2) // modpg + 1   # number of resgroups to end_num
  krange = kmax - kmin + 1            # number of resgroups in range, at least 1
  n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8)
  b = bn * 1024 * n                   # set seg size to optimize for selected PG
  ks = krange < b ? krange : b        # segments resgroups size
  puts "segment size = #{ks} resgroups; seg array is [1 x #{((ks-1) >> 6) + 1}] 64-bits"
  maxpairs = krange * pairscnt        # maximum number of twinprime pcs
  puts "twinprime candidates = #{maxpairs}; resgroups = #{krange}"
  {modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs}
end

Inputs:
  end_num – high input value (min of 3)
  start_num – low input value (min of 3)
Outputs:
  ks – number of residue groups set for segment size
  res_0 – first residue of selected Pn (next prime > Pn prime)
  modpg – modulus value for chosen pg
  kmin – number resgroups to start_num
  kmax – number resgroups to end_num
  krange – number of resgroups for inputs range (at least 1)
  pairscnt – number of twinpairs for selected pg
  resinvrs – modular inverses array for the residues
  restwins – hi residue values array for each twinpair

Finally, this is the Crystal version of the main routine twinprimes_ssoz. It accepts the input values, performs the residues sieve, times the different parts of the process, and generates the program outputs.

twinprimes_ssoz

def twinprimes_ssoz()
  end_num   = {ARGV[0].to_u64, 3u64}.max
  start_num = ARGV.size > 1 ? {ARGV[1].to_u64, 3u64}.max : 3u64
  start_num, end_num = end_num, start_num if start_num > end_num

  start_num |= 1                       # if start_num even increase by 1
  end_num = (end_num - 1) | 1          # if end_num even decrease by 1
  start_num = end_num = 7 if end_num - start_num < 2

  puts "threads = #{System.cpu_count}"
  ts = Time.monotonic                  # start timing sieve setup execution
  # select Pn, set sieving params for inputs
  modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs = set_sieve_parameters(start_num, end_num)
  # create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
  primes = end_num < 49 ? [5] : sozpg(Math.isqrt(end_num), res_0, start_num, end_num)
  puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"

  lo_range = restwins[0] - 3           # lo_range = lo_tp - 1
  twinscnt = 0_u64                     # determine count of 1st 4 twins if in range for used Pn
  twinscnt += [3, 5, 11, 17].select { |tp| start_num <= tp <= lo_range }.size unless end_num == 3

  te = (Time.monotonic - ts).total_seconds.round(6)
  puts "setup time = #{te} secs"       # display sieve setup time
  puts "perform twinprimes ssoz sieve"
  t1 = Time.monotonic                  # start timing ssoz sieve execution

  cnts = Array(UInt64).new(pairscnt, 0)      # number of twinprimes found per thread
  lastwins = Array(UInt64).new(pairscnt, 0)  # largest twinprime val for each thread
  done = Channel(Nil).new(pairscnt)
  threadscnt = Atomic.new(0)                 # count of finished threads
  restwins.each_with_index do |r_hi, i|      # sieve twinpair restracks
    spawn do
      lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
      done.send(nil)
    end
  end
  pairscnt.times { done.receive }      # wait for all threads to finish
  print "\r#{pairscnt} of #{pairscnt} twinpairs done"
  last_twin = lastwins.max             # find largest hi_tp twinprime in range
  twinscnt += cnts.sum                 # compute number of twinprimes in range
  last_twin = 5 if end_num == 5 && twinscnt == 1
  kn = krange % ks                     # set number of resgroups in last slice
  kn = ks if kn == 0                   # if multiple of seg size set to seg size
  t2 = (Time.monotonic - t1).total_seconds   # sieve execution time

  puts "\nsieve time = #{t2.round(6)} secs"         # ssoz sieve time
  puts "total time = #{(t2 + te).round(6)} secs"    # setup + sieve time
  puts "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}"
  puts "total twins = #{twinscnt}; last twin = #{last_twin - 1}+/-1"
end

twinprimes_ssoz

Program Output

Below is typical program output, shown here for Rust, for single and two input values (order doesn’t matter), run on an Intel i7-6700HQ Linux based laptop. The programs are run in a terminal with the command-line interface (cli) shown, and display the output shown.

$ echo 5000000000 | ./twinprimes_ssoz
threads = 8
using Prime Generator parameters for P11
segment size = 262144 resgroups; seg array is [1 x 4096] 64-bits
twinprime candidates = 292207905; resgroups = 2164503
each of 135 threads has nextp[2 x 6999] array
setup time = 0.000796737 secs
perform twinprimes ssoz sieve
135 of 135 twinpairs done
sieve time = 0.184892352 secs
total time = 0.185704753 secs
last segment = 67351 resgroups; segment slices = 9
total twins = 14618166; last twin = 4999999860+/-1

$ echo 100000000000 200000000000 | ./twinprimes_ssoz
threads = 8
using Prime Generator parameters for P13
segment size = 524288 resgroups; seg array is [1 x 8192] 64-bits
twinprime candidates = 4945055940; resgroups = 3330004
each of 1485 threads has nextp[2 x 37493] array
setup time = 0.003883411 secs
perform twinprimes ssoz sieve
1485 of 1485 twinpairs done
sieve time = 3.819838338 secs
total time = 3.823732178 secs
last segment = 184276 resgroups; segment slices = 7
total twins = 199708605; last twin = 199999999890+/-1

The program output is described as follows:

Line 0 is the cli input command. When 2 inputs are given their hi|lo order doesn’t matter.
Line 1 shows the number of available system threads.
Line 2 shows the Pn generator selected based on the inputs.
Line 3 shows the selected resgroup segment size ks, and number of 64-bit memory elements (ks / 64) for the segment array.
Line 4 shows the number of twinprime candidates for the number of resgroups spanning the inputs range. In the second example, (kmax – kmin + 1) = 3,330,004 resgroups x 1485 (number of P13 twinpairs) = 4,945,055,940 twinprime candidates.
Line 5 shows the number of twinpairs for the selected PG (here 1485 for P13) and the size of the nextp array, which shows the number of sieving primes used (6999 and 37493 for these examples).
Line 6 shows the time to select and generate Pn’s parameters and the sieve primes.
Line 7 announces when the residues sieve process starts.
Line 8 is a dynamic display showing in realtime how many twinpair threads are done, until finished.
Line 9 shows the runtime for the residues sieve.
Line 10 shows the combined setup and residues sieve times.
Line 11 shows how many resgroups were in the last segment slice and the number of segment slices.
Line 12 shows the number of twinprimes for the inputs range, and the value of the largest one.

Performance

The SSoZ performs optimally on multi-core systems with parallel operating threads. The more available threads the higher the possible performance. To show this, I provide data from two systems.

System 1: Intel i7-6700HQ, 2.6 – 3.5 GHz, 4C|8T, 16 MB, System76 Gazelle (2016) laptop.
System 2: AMD 5900HX, 3.3 – 4.6 GHz, 8C|16T, 16 MB, Lenovo Legion slim 7 (2022) laptop.

For a reference I used Primesieve 7.4 [5] – https://github.com/kimwalisch/primesieve – described as “a command-line program and C/C++ library for quickly generating prime numbers...using the segmented sieve of Eratosthenes with wheel factorization.” It’s a well maintained open source project of highly optimized C/C++ code libraries, which also takes inputs over the 64-bit range (but doesn’t produce results for cousin primes).

Below are sample outputs for the Rust version of twinprimes_ssoz and Primesieve performed on both systems.
(Where two values are shown on a line, the first is for System 1 and the value after // is for System 2.)

$ echo 378043979 1429172500581 | ./twinprimes_ssoz
threads = 8                                        // 16
using Prime Generator parameters for P13
segment size = 802816 resgroups; seg array is [1 x 12544]
twinprime candidates = 70654672440; resgroups = 47578904
each of 1485 threads has nextp[2 x 92610] array
setup time = 0.006171322 secs                      // 0.005839409 secs
perform twinprimes ssoz sieve
1485 of 1485 twinpairs done
sieve time = 55.836745969 secs                     // 18.062863872 secs
total time = 55.842928445 secs                     // 18.068715224 secs
last segment = 212760 resgroups; segment slices = 60
total twins = 2601278756; last twin = 1429172500572+/-1

$ echo 378043979 14291725005819 | ./twinprimes_ssoz
threads = 8                                        // 16
using Prime Generator parameters for P17
segment size = 1572864 resgroups; seg array is [1 x 24576]
twinprime candidates = 623572052400; resgroups = 27994256
each of 22275 threads has nextp[2 x 268695] array
setup time = 0.036543755 secs                      // 0.025222812 secs
perform twinprimes ssoz sieve
22275 of 22275 twinpairs done
sieve time = 675.667368646 secs                    // 235.003460103 secs
total time = 675.703922948 secs                    // 235.027696883 secs
last segment = 1255568 resgroups; segment slices = 18
total twins = 22078408103; last twin = 14291725004982+/-1

$ ./primesieve -c2 378043979 1429172500581
Sieve size = 128 KiB                               // 256 KiB
Threads = 8                                        // 16
100%
Seconds: 101.873                                   // 33.781
Twin primes: 2601278756

$ ./primesieve -c2 378043979 14291725005819
Sieve size = 128 KiB                               // 256 KiB
Threads = 8                                        // 16
100%
Seconds: 1218.502                                  // 471.776
Twin primes: 22078408103

I implemented both the twins|cousins ssoz in the 6 programming languages listed here. Again, these are reference implementations, and are not necessarily optimum for each language. The Rust versions are the most optimized, and generally the fastest, as they perform the SoZ algorithm in parallel. The code for each is < 300 ploc (programming lines of code), which highlights the simplicity of the algorithm.
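The resgroups and candidates figures in these outputs can be reproduced from the inputs alone. The following is a minimal Rust sketch, not taken from the released source: the function name range_params and its signature are assumptions made for illustration. It only mirrors the kmin, kmax and krange formulas shown in set_sieve_parameters, with the generator's modulus and twin pair count passed in by hand, and it omits the program's special case for ranges narrower than 2.

```rust
/// Resgroup range and twinprime candidates count for an input range,
/// using the kmin|kmax|krange formulas shown in set_sieve_parameters.
/// Hypothetical helper for illustration; returns (kmin, kmax, krange, candidates).
fn range_params(a: u64, b: u64, modpg: u64, pairscnt: u64) -> (u64, u64, u64, u64) {
    // inputs may come in either order; each has a minimum value of 3
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    let start_num = lo.max(3) | 1; // if even, increase by 1
    let end_num = (hi.max(3) - 1) | 1; // if even, decrease by 1
    let kmin = (start_num - 2) / modpg + 1; // number of resgroups to start_num
    let kmax = (end_num - 2) / modpg + 1; // number of resgroups to end_num
    let krange = kmax - kmin + 1; // number of resgroups in range
    (kmin, kmax, krange, krange * pairscnt)
}

fn main() {
    // (low input, high input, Pn, modulus, twin pairs) copied from the sample runs
    let runs = [
        (378_043_979u64, 1_429_172_500_581u64, 13, 30_030u64, 1_485u64),
        (378_043_979, 14_291_725_005_819, 17, 510_510, 22_275),
    ];
    for (lo, hi, pn, modpg, pairs) in runs {
        let (_, _, krange, cands) = range_params(lo, hi, modpg, pairs);
        println!("P{}: twinprime candidates = {}; resgroups = {}", pn, cands, krange);
    }
}
```

For the P13 run this gives 47,578,904 resgroups and 70,654,672,440 candidates, and for the P17 run 27,994,256 resgroups and 623,572,052,400 candidates, matching the program output.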
The tables below show benchmark results for the 6 language implementations, and Primesieve (Prmsv). They are the best times, in seconds, for both systems from multiple runs under different operating conditions. Their code was developed on System 1, and those binaries also run on System 2. Their source code was then compiled on System 2 to compare performance differences, and those were used for the benchmarks.

The 6 languages, and their development environments and versions are: C++, Nim 1.6.4 (gcc 11.3.0), D (ldc2 1.28.0, LLVM 12.0.1), Crystal 1.4.1 (LLVM 10.0.0), Rust 1.60, and Go 1.18. They most likely can be improved, and I hope others will create more versions, especially for other compiled languages.

Twin Prime Benchmark Comparisons – Intel i7 6700HQ

N        Rust    C++     D       Nim     Crystal  Go      Prmsv   Twins Count     Largest in Range
1x10^10  0.35    0.45    0.46    0.53    0.48     0.61    0.51    27,412,679      9,999,999,703|-2
5x10^10  1.67    2.14    2.19    2.27    2.40     2.76    2.81    118,903,682     49,999,999,591|-2
1x10^11  3.41    4.24    4.31    4.34    4.69     5.51    5.91    224,376,048     99,999,999,763|-2
5x10^11  18.15   21.42   21.37   21.69   23.81    28.11   32.76   986,222,314     499,999,999,063|-2
1x10^12  37.67   44.48   44.25   44.71   49.05    58.08   69.25   1,870,585,220   999,999,999,961|-2
5x10^12  219.67  253.62  256.30  253.69  279.49   319.84  395.16  8,312,493,003   4,999,999,999,879|-2
1x10^13  482.51  543.74  542.23  541.35  602.63   678.61  825.71  15,834,664,872  9,999,999,998,491|-2

Cousin Prime Benchmark Comparisons – Intel i7 6700HQ

N        Rust    C++     D       Nim     Crystal  Go      Cousins Count   Largest in Range
1x10^10  0.36    0.45    0.46    0.53    0.48     0.62    27,409,998      9,999,999,707|-4
5x10^10  1.69    2.11    2.18    2.26    2.41     2.81    118,908,265     49,999,999,961|-4
1x10^11  3.35    4.20    4.46    4.32    4.64     5.52    224,373,159     99,999,999,947|-4
5x10^11  18.08   21.34   21.35   21.76   23.36    28.21   986,220,867     499,999,999,901|-4
1x10^12  37.17   44.57   44.44   44.51   49.14    58.25   1,870,585,457   999,999,998,867|-4
5x10^12  220.05  250.63  251.86  252.18  278.76   320.15  8,312,532,286   4,999,999,999,877|-4
1x10^13  478.96  534.17  541.85  540.81  597.89   678.48  15,834,656,001  9,999,999,999,083|-4

Twin Prime Benchmark Comparisons – AMD Ryzen 9 5900HX

N        Rust    C++     D       Nim     Crystal  Go      Prmsv   Twins Count     Largest in Range
1x10^10  0.12    0.12    0.12    0.19    0.13     0.15    0.16    27,412,679      9,999,999,703|-2
5x10^10  0.54    0.49    0.58    0.59    0.66     0.67    0.92    118,903,682     49,999,999,591|-2
1x10^11  1.12    0.97    1.13    1.08    1.23     1.32    1.95    224,376,048     99,999,999,763|-2
5x10^11  5.85    4.88    5.75    5.22    6.22     6.92    11.17   986,222,314     499,999,999,063|-2
1x10^12  12.14   10.03   12.01   11.12   13.06    14.61   23.71   1,870,585,220   999,999,999,961|-2
5x10^12  68.04   65.41   69.24   73.54   74.29    81.23   132.99  8,312,493,003   4,999,999,999,879|-2
1x10^13  145.01  155.45  156.57  172.68  170.77   185.25  307.78  15,834,664,872  9,999,999,998,491|-2

Cousin Prime Benchmark Comparisons – AMD Ryzen 9 5900HX

N        Rust    C++     D       Nim     Crystal  Go      Cousins Count   Largest in Range
1x10^10  0.12    0.11    0.13    0.19    0.13     0.15    27,409,998      9,999,999,707|-4
5x10^10  0.55    0.49    0.57    0.59    0.63     0.66    118,908,265     49,999,999,961|-4
1x10^11  1.12    0.96    1.13    1.07    1.22     1.32    224,373,159     99,999,999,947|-4
5x10^11  5.87    4.89    5.78    5.25    6.18     6.92    986,220,867     499,999,999,901|-4
1x10^12  12.25   10.14   12.14   11.06   12.56    14.67   1,870,585,457   999,999,998,867|-4
5x10^12  67.69   68.51   68.74   74.68   74.86    80.29   8,312,532,286   4,999,999,999,877|-4
1x10^13  145.02  157.68  156.01  173.16  170.06   179.07  15,834,656,001  9,999,999,999,083|-4

Enhanced Configurations

The software provided is designed to work on readily available 64-bit systems, and to serve as reference implementations that demonstrate how Prime Generators can be used to efficiently identify and count primes. They can be enhanced to take advantage of more hardware resources when available.

Ideally we want to use as many system threads as possible. So for P5, which has 3 twin|cousin residue pairs, instead of using 3 threads over an input range it may be faster to divide the range into 2 equal parts and use 6 threads (3 for each half).
Even if a system has only 4 threads, this may be faster as the range increases, but should definitely be faster (for sufficiently large ranges) if a system has 6 or more threads. In fact, if a system has at least 16 threads, using P7 (15 residue pairs) as the default generator for small ranges may be more efficient than P5, as all its pairs can run in 1 parallel threads time (ptt).

Thus a more sophisticated algorithm can be devised for set_sieve_parameters to use the threads count, and also cache|memory sizes, to pick the best generator and segment size for given input ranges. For best performance this would require profiling the targeted hardware system(s), to account for the differences between cpus and systems capabilities and resources. However, I think the algorithm would still be fairly simple to code, to dynamically compute these parameters to achieve higher performance.

Eliminating Sieving Primes

As the value for end_num becomes larger, more|bigger sieve primes must be generated, and filtered out or kept. Generating them takes increasing time with increasing input values. This also affects the time to perform the residue sieve, by increasing the time (and memory) to create the nextp array, and use it. While it’s possible to use stored lists of primes to eliminate dynamically generating them, this doesn’t get around creating nextp with them, with the associated memory issues for it in each thread.

One simple way around this is to use a fast primality test algorithm to check each residue pair pc value in each resgroup in the threads. If one value isn’t prime the other doesn’t have to be checked. By using sufficiently large generators for a given input range, the number of resgroups over a range can be made arbitrarily small to reduce the number of primality tests to perform. For example for P47, modp47 = 614,889,782,588,491,410 is the largest primorial value that can fit into (unsigned) 64-bits.
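The P47 modulus, and the residue pairs counts of the generators used in the program, can be checked directly. Below is a small Rust sketch under stated assumptions: the function name pg_modulus_and_pairs and the fixed primes table are illustrative and not part of the released code. It multiplies out a generator's modulus p# with overflow checking and, alongside it, the number of twin|cousin residue pairs (p-2)#, the count the paper gives for restwins.size.

```rust
// First 16 primes; 53 is included only to demonstrate the 64-bit overflow.
const PRIMES: [u64; 16] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53];

/// For generator Pn return (modulus p#, residue pairs count (p-2)#),
/// or None when the primorial no longer fits in a u64.
/// Hypothetical helper for illustration only.
fn pg_modulus_and_pairs(pn: u64) -> Option<(u64, u64)> {
    let (mut modpg, mut pairs) = (1u64, 1u64);
    for &p in PRIMES.iter().take_while(|&&p| p <= pn) {
        modpg = modpg.checked_mul(p)?; // returns None on 64-bit overflow
        if p > 2 {
            pairs *= p - 2; // (p-2)# taken over the odd primes
        }
    }
    Some((modpg, pairs))
}

fn main() {
    for pn in [5u64, 7, 11, 13, 17, 47, 53] {
        match pg_modulus_and_pairs(pn) {
            Some((m, pairs)) => {
                // each pair holds 2 residues out of m values per resgroup
                let pct = 200.0 * pairs as f64 / m as f64;
                println!("P{}: modulus = {}, residue pairs = {} ({:.1}% of values)", pn, m, pairs, pct);
            }
            None => println!("P{}: modulus overflows 64 bits", pn),
        }
    }
}
```

P5, P7, P11, P13 and P17 give 3, 15, 135, 1485 and 22275 pairs, the thread counts seen in the program outputs, while P53 is reported as overflowing, which is why P47 is the largest generator whose modulus fits in an unsigned 64-bit value.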
P47’s 15,681,106,801,985,625 residue pairs use 5.1% of the number space to hold the twin|cousin primes > 47.

Eliminating the use of sieving primes greatly reduces the work of the algorithm. Realizable machines to perform this would use as many parallel compute engines as possible, but each would now be much simpler, eliminating sozpg and nextp_init. Now gen_pg_parameters just identifies the residue pair values (and no longer their inverses), needing only a (fast) gcd function. This could be done with massive arrays of graphic processing units (GPUs), or better, Simple Super Computers (SSCs).

To search for yet undiscovered million digit primes, a distributed network can be constructed, similar to that for the Great Internet Mersenne Prime Search (GIMPS) [7] and Twin Primes Search [8]. A benefit of creating this network is that, with all the available (free) compute power in the world, groups of residue pairs can be dedicated to machine clusters and run full time, and deterministically identify new twins|cousins (thus two primes for the price of one) forever, as there are an infinity of each [3], [4].

The Ultimate Primes Search Machine

Using just a few basic properties of Prime Generator Theory (PGT) we can construct a conceptually simpler and more efficient machine to find as many primes as physical reality and time will allow.

Because for any Pn, modpn = pm# (primorial of first m primes), r0 = p(m+1), and the residues from r0 to r0^2 are consecutive primes, we don’t have to do primality tests for them, but merely gcd tests to determine which values are coprime to modpn. Thus we can arbitrarily use any prime as r0 of a Pn whose modpn is the primorial of all the primes < r0, to directly find the consecutive primes in [r0, r0^2). After finding the new additional primes, we can then create a larger Pn modulus with them, and repeat the process, to continually find more primes.
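This idea can be tried at small scale. The Rust sketch below is only an illustration under stated assumptions: the function name next_primes is hypothetical, the whole primorial is kept in a single u64 (so it only works while pm# fits in 64 bits, i.e. for at most the first 15 primes), and every value in the range is tested one by one rather than being generated from a smaller generator's PGS.

```rust
/// Plain Euclidean gcd.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Given the first m primes (in order), return the consecutive primes in
/// [r0, r0^2) using only gcd tests against modpn = pm#.
/// Assumes pm# fits in a u64 (m <= 15); hypothetical helper for illustration.
fn next_primes(known: &[u64]) -> Vec<u64> {
    let modpn: u64 = known.iter().product(); // primorial of the known primes
    let pm = *known.last().unwrap();
    // r0 is the first value above pm that is coprime to modpn
    let r0 = (pm + 1..).find(|&n| gcd(n, modpn) == 1).unwrap();
    // every value in [r0, r0^2) coprime to modpn is prime
    (r0..r0 * r0).filter(|&n| gcd(n, modpn) == 1).collect()
}

fn main() {
    let mut known: Vec<u64> = vec![2];
    // two rounds only: after that the primorial of all known primes exceeds 64 bits
    for _ in 0..2 {
        let found = next_primes(&known);
        println!(
            "from {} known primes: {} new primes, largest {}",
            known.len(),
            found.len(),
            found.last().unwrap()
        );
        known.extend(found);
    }
}
```

Starting from the single prime 2, the first round yields 3, 5, 7 and the second yields the 26 primes from 11 to 113. Going further with this sketch would need a modulus wider than 64 bits, or a modulus split into smaller pieces.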
[Figure: Primes r0 to r0^2. Vertical axis: Number of Primes (0 to 30,000); horizontal axis: Number of Pn Primorial Primes (1 to 100).]

This graph shows the number of consecutive primes in the regions [r0, r0^2) for generator moduli made with the first 100 primes. Thus for the last data point for p100 = 541, from r0 = 547 to r0^2 = 299,209 there are 25,836 primes|residues, and we now know the first 25,936 primes, with 299,197 the largest prime.

Using this approach we no longer have to even identify the residue pairs, but just maintain and use the growing modulus values to perform the gcd operations with. The key here is to do the gcd operations on chunks of partial primorial values as we identify more primes, and not on one humongous pm# value. Thus as we identify new primes, we make partial primorial chunks with them. To check if a value is a residue we perform repeated gcd tests with all the partial primorial chunks. If the gcd with any partial chunk is not 1 (i.e. not coprime) then that rc value isn’t a residue and we can stop testing it. Only rc values that pass all the partial chunks tests (done in parallel) are residues to the full modpn value, and thus are new primes.

The main job for this machine would be to control the creation, distribution, and storage of the gcd operations, and their results, performed by a distributed network of compute engines. For each range [r0, r0^2) it would use the PGS for some smaller Pn (e.g. P3’s PGS in the code, to reduce the residues candidates search space to 1/3 of the range values) and distribute the rcs for testing. After creating a list of new consecutive primes, it can be processed to identify new primes or k-tuples of any type.

Source Code

The SSoZ is a good algorithm to assess hardware and software multi-threading capabilities. It’s very simple mathematically, needing only basic computational functions that most languages have, and that are easy to implement if they don’t.
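As one illustration of how little is needed, the partial primorial chunks test described for the primes search machine can be written with nothing more than a gcd function. This Rust sketch is an assumption-laden illustration, not released code: the names primorial_chunks and is_residue are hypothetical, each chunk is simply packed until the next prime would overflow a u64, and the chunk tests run sequentially here rather than in parallel.

```rust
/// Plain Euclidean gcd.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Pack the known primes into partial primorial chunks, each fitting in a u64.
/// Hypothetical helper: a new chunk is started whenever the product would overflow.
fn primorial_chunks(primes: &[u64]) -> Vec<u64> {
    let mut chunks = Vec::new();
    let mut cur: u64 = 1;
    for &p in primes {
        match cur.checked_mul(p) {
            Some(v) => cur = v,
            None => {
                chunks.push(cur);
                cur = p;
            }
        }
    }
    if cur > 1 {
        chunks.push(cur);
    }
    chunks
}

/// rc is a residue of the full modulus only if it is coprime to every chunk;
/// `all` stops at the first chunk that shares a factor with rc.
fn is_residue(rc: u64, chunks: &[u64]) -> bool {
    chunks.iter().all(|&c| gcd(rc, c) == 1)
}

fn main() {
    // the first 25 primes; their full primorial does not fit in 64 bits
    let known: [u64; 25] = [
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
        89, 97,
    ];
    let chunks = primorial_chunks(&known);
    let r0 = 101u64; // next prime after 97
    let found: Vec<u64> = (r0..r0 * r0).filter(|&rc| is_residue(rc, &chunks)).collect();
    println!(
        "{} chunks; {} primes found in [{}, {}), largest {}",
        chunks.len(),
        found.len(),
        r0,
        r0 * r0,
        found.last().unwrap()
    );
}
```

With the first 25 primes the modulus splits into two chunks, the first being 47# and the second the product of the primes 53 to 97. In a distributed setting each chunk test could go to a different compute engine, since the tests are independent of each other.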
The implementations I provide should be considered as references and not necessarily optimum for each language. They should be considered as starting points to improve upon, as they, most importantly, produce correct results that other implementations can check results against.

The code source files can be found here [6]: https://gist.github.com/jzakiya, and individually below.

twinprimes_ssoz
Crystal – https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4
Rust – https://gist.github.com/jzakiya/b96b0b70cf377dfd8feb3f35eb437225
Nim – https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
C++ – https://gist.github.com/jzakiya/fa76c664c9072ddb51599983be175a3f
Go – https://gist.github.com/jzakiya/fbc77b8fdd12b0581a0ff7c2476373d9
D – https://gist.github.com/jzakiya/ae93bfa03dbc8b25ccc7f97ff8ad0f61

cousinprimes_ssoz
Crystal – https://gist.github.com/jzakiya/0d6987ee00f3708d6cfd6daee9920bd7
Rust – https://gist.github.com/jzakiya/8879c0f4dfda543eaf92a3186de554d7
Nim – https://gist.github.com/jzakiya/e2fa7211b52a4aa34a4de932010eac69
C++ – https://gist.github.com/jzakiya/3799bd8604bdcba34df5c79aae6e55ac
Go – https://gist.github.com/jzakiya/0ea756a8f6fd09f56cd9374d0dcf4197
D – https://gist.github.com/jzakiya/147747d391b5b0432c7967dd17dae124

Conclusion

Prime Generators allow for the creation of efficient, simple, and resource sparse generic algorithms that can be performed with any Pn generator. Generators can dynamically be chosen to optimize speed and memory use for given number ranges, to best use the hardware and software resources available. The SSoZ algorithms are inherently implementable in parallel, and can be performed on any hardware or distributed system that provides multiple cores or compute engines. As shown, the more cores and threads that are available to use, the higher the inherent performance will be for a given number range.
While the code to generate Twin and Cousin primes was shown here, the basic math and principles explaining the process for them can be applied similarly to find other k-tuples, and other specific prime types, such as Mersenne Primes [2]. It is hoped this detailed explanation of how the SSoZ works and performs will encourage its use in applied applications, and its inclusion in software libraries, et al, that are used in the study of primes. 31 References [1] The Segmented Sieve of Zakiya (SSoZ) https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ [2] The Use of Prime Generators to Implement Fast Twin Prime Sieve of Zakiya (SoZ), Applications to Number Theory and Implications for the Riemann Hypotheses https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_Twin_Prim es_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_for_the_Riemann_Hyp otheses [3] On The Infinity of Twin Primes and other K-tuples https://www.academia.edu/41024027/On_The_Infinity_of_Twin_Primes_and_other_K_tuples [4] (Simplest) Proof of Twin Primes and Polignacs’ Conjectures (video): https://www.youtube.com/watch?v=HCUiPknHtfY&t=940s [5] Primesieve - https://github.com/kimwalisch/primesieve [6] Twins|Cousins SSoZ software language source files: https://gist.github.com/jzakiya [7] Grand Internet Mersenne Primes Search (GIMPS) – https://www.mersenne.org/ [8] Twins Primes Search – https://primes.utm.edu/bios/page.php?id=949 32 # This Crystal source file is a multiple threaded implementation to perform an # extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N. # Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1. # Output is the number of twin primes <= N, or in range N1 to N2; the last # twin prime value for the range; and the total time of execution. # This code was developed on a System76 laptop with an Intel I7 6700HQ cpu, # 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. 
Parameter tuning # probably needed to optimize for other hardware systems (ARM, PowerPC, etc). # # # # # Compile as: $ crystal build twinprimes_ssozgist.cr -Dpreview_mt --release To reduce binary size do: $ strip twinprimes_ssoz Thread workers default to 4, set to system max for optimum performance. Single val: $ CRYSTAL_WORKERS=8 ./twinprimes_ssoz val1 Range vals: $ CRYSTAL_WORKERS=8 ./twinprimes_ssoz val1 val2 # # # # # # Mathematical and technical basis for implementation are explained here: https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_ Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_ for_the_Riemann_Hypotheses https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_ https://www.academia.edu/19786419/PRIMES-UTILS_HANDBOOK # This source code, and its updates, can be found here: # https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4 # This code is provided free and subject to copyright and terms of the # GNU General Public License Version 3, GPLv3, or greater. # License copy/terms are here: http://www.gnu.org/licenses/ # Copyright (c) 2017-2022; Jabari Zakiya -- jzakiya at gmail dot com # Last update: 2022/06/28 require "bit_array" # Customized gcd for prime generators; n > m; m odd def gcd(m, n) while m|1 != 1; t = m; m = n % m; n = t end m end # Compute modular inverse a^-1 to base m, e.g. 
a*(a^-1) mod m = 1 def modinv(a0, m0) return 1 if m0 == 1 a, m = a0, m0 x0, inv = 0, 1 while a > 1 inv -= (a // m) * x0 a, m = m, a % m x0, inv = inv, x0 end inv += m0 if inv < 0 inv end def gen_pg_parameters(prime) # Create prime generator parameters for given Pn puts "using Prime Generator parameters for P#{prime}" primes = [2, 3, 5, 7, 11, 13, 17, 19, 23] modpg, res_0 = 1, 0 # compute Pn's modulus and res_0 value primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm } restwins = [] of Int32 inverses = Array.new(modpg + 2, 0) # save upper twinpair residues here # save Pn's residues inverses here 33 pc, inc, res = 5, 2, 0 # use P3's PGS to generate pcs while pc < (modpg >> 1) # find PG's 1st half residues if gcd(pc, modpg) == 1 # if pc a residue mc = modpg - pc # create its modular complement inverses[pc] = modinv(pc, modpg) # save pc and mc inverses inverses[mc] = modinv(mc, modpg) # if in twinpair save both hi residues restwins << pc << mc + 2 if res + 2 == pc res = pc # save current found residue end pc += inc; inc ^= 0b110 # create next P3 sequence pc: 5 7 11 13 17 19 ... end restwins.sort!; restwins << (modpg + 1) # last residue is last hi_tp inverses[modpg + 1] = 1; inverses[modpg - 1] = modpg - 1 # last 2 residues are self inverses {modpg, res_0, restwins.size, restwins, inverses} end def set_sieve_parameters(start_num, end_num) # Select at runtime best PG and segment size parameters for input values. # These are good estimates derived from PG data profiling. Can be improved. 
nrange = end_num - start_num bn, pg = 0, 3 if end_num < 49 bn = 1; pg = 3 elsif nrange < 77_000_000 bn = 16; pg = 5 elsif nrange < 1_100_000_000 bn = 32; pg = 7 elsif nrange < 35_500_000_000 bn = 64; pg = 11 elsif nrange < 14_000_000_000_000 pg = 13 if nrange > 7_000_000_000_000; bn = 384 elsif nrange > 2_500_000_000_000; bn = 320 elsif nrange > 250_000_000_000; bn = 196 else bn = 128 end else bn = 384; pg = 17 end modpg, res_0, pairscnt, restwins, resinvrs = gen_pg_parameters(pg) kmin = (start_num-2) // modpg + 1 # number of resgroups to start_num kmax = (end_num - 2) // modpg + 1 # number of resgroups to end_num krange = kmax - kmin + 1 # number of resgroups in range, at least 1 n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8) b = bn * 1024 * n # set seg size to optimize for selected PG ks = krange < b ? krange : b # segments resgroups size puts "segment size = #{ks} resgroups for seg bitarray" maxpairs = krange * pairscnt # maximum number of twinprime pcs puts "twinprime candidates = #{maxpairs}; resgroups = #{krange}" {modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs} end def sozpg(val, res_0, start_num, end_num) # Compute the primes r0..sqrt(input_num) and store in 'primes' array. # Any algorithm (fast|small) is usable. Here the SoZ for P5 is used. 
md, rscnt = 30u64, 8 # P5's modulus and residues count res = [7,11,13,17,19,23,29,31] # P5's residues bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128] kmax = (val - 2) // md + 1 prms = Array(UInt8).new(kmax, 0) modk, r, k = 0, -1, 0 # number of resgroups upto input value # byte array of prime candidates, init '0' # initialize residue parameters 34 loop do # for r0..sqrtN primes mark their multiples if (r += 1) == rscnt; r = 0; modk += md; k += 1 end # resgroup parameters next if prms[k] & (1 << r) != 0 # skip pc if not prime prm_r = res[r] # if prime save its residue value prime = modk + prm_r # numerate the prime value break if prime > Math.isqrt(val) # exit loop when it's > sqrtN res.each do |ri| # mark prime's multiples in prms kn,rn = (prm_r * ri - 2).divmod md # cross-product resgroup|residue bit_r = bitn[rn] # bit mask for prod's residue kpm = k * (prime + ri) + kn # resgroup for 1st prime mult while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end end end # prms now contains the nonprime positions for the prime candidates r0..N # extract only primes that are in inputs range into array 'primes' primes = [] of UInt64 # create empty dynamic array for primes prms.each_with_index do |resgroup, k| # for each kth residue group res.each_with_index do |r_i, i| # check for each ith residue in resgroup if resgroup & (1 << i) == 0 # if bit location a prime prime = md * k + r_i # numerate its value, store if in range # check if prime has multiple in range, if so keep it, if not don't n, rem = start_num.divmod prime # if rem 0 then start_num is multiple of prime primes << prime if (res_0 <= prime <= val) && (prime * n <= end_num - prime || rem == 0) end end end primes end def nextp_init(rhi, kmin, modpg, primes, resinvrs) # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'. # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and # store consecutively as lo_tp|hi_tp pairs for their restracks. 
nextp = Slice(UInt64).new(primes.size*2) # 1st mults array for twinpair r_hi, r_lo = rhi, rhi - 2 # upper|lower twinpair residue values primes.each_with_index do |prime, j| # for each prime r0..sqrt(N) k = (prime - 2) // modpg # find the resgroup it's in r = (prime - 2) % modpg + 2 # and its residue value r_inv = resinvrs[r].to_u64 # and residue inverse rl = (r_inv * r_lo - 2) % modpg + 2 # compute r's ri for r_lo rh = (r_inv * r_hi - 2) % modpg + 2 # compute r's ri for r_hi kl = k * (prime + rl) + (r * rl - 2) // modpg # kl 1st mult resgroup kh = k * (prime + rh) + (r * rh - 2) // modpg # kh 1st mult resgroup kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= nextp[j * 2] = kl.to_u64 # prime's 1st mult lo_tp resgroup val nextp[j * 2 | 1] = kh.to_u64 # prime's 1st mult hi_tp resgroup val end nextp end kmin) kmin) in range in range def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs) # Perform in thread the ssoz for given twinpair residues for kmax resgroups. # First create|init 'nextp' array of 1st prime mults for given twinpair, # stored consequtively in 'nextp', and init seg array for ks resgroups. # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime # for primes mults resgroups, and update 'nextp' restrack slices acccordingly. # Return the last twinprime|sum for the range for this twinpair residues. 
  sum, ki, kn = 0_u64, kmin-1, ks                   # init these parameters
  hi_tp, k_max = 0_u64, kmax                        # max twinprime|resgroup
  seg = BitArray.new(ks)                            # seg array of ks resgroups
  ki    += 1 if r_hi - 2 < (start_num - 2) % modpg + 2  # ensure lo tp in range
  k_max -= 1 if r_hi > (end_num - 2) % modpg + 2    # ensure hi tp in range
  nextp = nextp_init(r_hi, ki, modpg, primes, resinvrs) # init nextp array
  while ki < k_max                                  # for ks size slices upto kmax
    kn = k_max - ki if ks > (k_max - ki)            # adjust kn size for last seg
    primes.each_with_index do |prime, j|            # for each prime r0..sqrt(N)
                                                    # for lower twinpair residue track
      k = nextp.to_unsafe[j * 2]                    # starting from this resgroup in seg
      while k < kn                                  # until end of seg
        seg.unsafe_put(k, true)                     # mark primenth resgroup bits prime mults
        k += prime end                              # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2] = k - kn               # save 1st resgroup in next eligible seg
                                                    # for upper twinpair residue track
      k = nextp.to_unsafe[j * 2 | 1]                # starting from this resgroup in seg
      while k < kn                                  # until end of seg
        seg.unsafe_put(k, true)                     # mark primenth resgroup bits prime mults
        k += prime end                              # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2 | 1] = k - kn           # save 1st resgroup in next eligible seg
    end
    cnt = seg[...kn].count(false)                   # count|store twinprimes in segment
    if cnt > 0                                      # if segment has twinprimes
      sum += cnt                                    # add segment count to total range count
      upk = kn - 1                                  # from end of seg, count back to largest tp
      while seg.unsafe_fetch(upk); upk -= 1 end
      hi_tp = ki + upk                              # set its full range resgroup value
    end
    ki += ks                                        # set 1st resgroup val of next seg slice
    seg.fill(false) if ki < k_max                   # set next seg to all primes if in range
  end                                               # when sieve done, numerate largest twin
                                                    # for ranges w/o twins set largest to 1
  hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
  {hi_tp.to_u64, sum.to_u64}                        # return largest twinprime|twins count
end

def twinprimes_ssoz()
  end_num   = {ARGV[0].to_u64, 3u64}.max
  start_num = ARGV.size > 1 ? {ARGV[1].to_u64, 3u64}.max : 3u64
  start_num, end_num = end_num, start_num if start_num > end_num

  start_num |= 1                                    # if start_num even increase by 1
  end_num = (end_num - 1) | 1                       # if end_num even decrease by 1
  start_num = end_num = 7 if end_num - start_num < 2

  puts "threads = #{System.cpu_count}"
  ts = Time.monotonic                               # start timing sieve setup execution
                                                    # select Pn, set sieving params for inputs
  modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs = set_sieve_parameters(start_num, end_num)

  # create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
  primes = end_num < 49 ? [5] : sozpg(Math.isqrt(end_num), res_0, start_num, end_num)
  puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"

  lo_range = restwins[0] - 3                        # lo_range = lo_tp - 1
  twinscnt = 0_u64                                  # determine count of 1st 4 twins if in range for used Pn
  twinscnt += [3, 5, 11, 17].select { |tp| start_num <= tp <= lo_range }.size unless end_num == 3

  te = (Time.monotonic - ts).total_seconds.round(6)
  puts "setup time = #{te} secs"                    # display sieve setup time
  puts "perform twinprimes ssoz sieve"
  t1 = Time.monotonic                               # start timing ssoz sieve execution

  cnts = Array(UInt64).new(pairscnt, 0)             # number of twinprimes found per thread
  lastwins = Array(UInt64).new(pairscnt, 0)         # largest twinprime val for each thread
  done = Channel(Nil).new(pairscnt)
  threadscnt = Atomic.new(0)                        # count of finished threads

  restwins.each_with_index do |r_hi, i|             # sieve twinpair restracks
    spawn do
      lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
      done.send(nil)
  end end
  pairscnt.times { done.receive }                   # wait for all threads to finish
  print "\r#{pairscnt} of #{pairscnt} twinpairs done"

  last_twin = lastwins.max                          # find largest hi_tp twinprime in range
  twinscnt += cnts.sum                              # compute number of twinprimes in range
  last_twin = 5 if end_num == 5 && twinscnt == 1
  kn = krange % ks                                  # set number of resgroups in last slice
  kn = ks if kn == 0                                # if multiple of seg size set to seg size
  t2 = (Time.monotonic - t1).total_seconds          # sieve execution time

  puts "\nsieve time = #{t2.round(6)} secs"         # ssoz sieve time
  puts "total time = #{(t2 + te).round(6)} secs"    # setup + sieve time
  puts "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}"
  puts "total twins = #{twinscnt}; last twin = #{last_twin - 1}+/-1"
end

twinprimes_ssoz
Twin Primes Segmented Sieve of Zakiya (SSoZ) Explained
Jabari Zakiya © June 12, 2022
jzakiya@gmail.com

Introduction
In 2014 I released The Segmented Sieve of Zakiya (SSoZ) [1]. It described a general method to find primes using an efficient prime sieve based on Prime Generators (PG). I expanded upon it, and in 2018 I released The Use of Prime Generators to Implement Fast Twin Primes Sieve of Zakiya (SoZ), Applications to Number Theory, and Implications for the Riemann Hypotheses [2]. The algorithm has been improved and is now also used to find Cousin Primes. This paper explains in detail the what, why, and how of the algorithm and shows its implementation in 6 software languages, and performance data for these 6 languages run on 2 different cpu systems with 8 and 16 threads.

General Description
The programs count the number of Twin|Cousin Primes between two numbers within a 64-bit range, i.e. 0 – 18,446,744,073,709,551,615 (2**64 – 1), and also return the largest twin|cousin value within it. The algorithm has no mathematical limits, but [hard|soft]ware does, so it's coded to run on commonly available 64-bit multi-core systems containing a reasonable amount of memory (the more the better). Below is a diagram and description of the major functional components of the algorithm and software.

Inputs Formatting
One or two values are entered (order doesn’t matter) specifying the numerical range. They’re converted to odd values, and|or defaults, after conditional checks.

Pn Selection and Parameterization
The inputs numerical range is used to select the Pn generator used to perform the residues sieve. Once determined, its generator parameters are created.

Sieve Primes Generation
The sieving primes ≤ sqrt(end_num) for the range are generated, but only those with multiples within the numerical range are used for the Pn generator.
Residues Sieves
In parallel for each twin|cousin residues pair for Pn, the sieve primes are used to create the nextp array of start locations for marking their multiples for each segment size the input numerical range is split into.

Outputs Collection and Display
The prime pairs count and largest value are collected for each residue pair thread, and their final greatest values displayed, along with timing data.

Math Fundamentals
Prime numbers do not exist randomly! When we break the number line into even sized groups of integers (the group numerical bandwidth and prime generator modulus value), the primes are evenly distributed along the residues in each group, i.e. the coprime values to the modulus (their greatest common divisor (gcd) with the modulus is 1). Thus a modulus, and its associated residues, form a Prime Generator (PG), a mathematical expression and framework for generating and identifying every prime that is not a prime factor of the modulus.

While a PG modulus can be any even number, the most efficient moduli are strictly prime primorials. These prime generators have the smallest ratios of (# of residues)/modulus and make the number space the primes exist within the smallest possible for a given number of residues. As more primes are used to form the PG moduli they systematically squeeze the primes into smaller and smaller number spaces.

The S|SoZ algorithms are based on the structure and framework of Prime Generators, whose math and properties are formalized in Prime Generator Theory (PGT). For an extensive review read [1], [2], [3] and see the video – (Simplest) Proof of the Twin Primes and Polignac’s Conjectures. https://www.youtube.com/watch?v=HCUiPknHtfY&t=940s [4].

Below is a list of the major properties of Prime Generators that comprise the mathematical foundation for the S|SoZ algorithms and code.
Major Properties of Prime Generators
• a prime generator has form: Pn = modpn * k + {r0 … rn}
• the modulus for prime generator with last prime value pn has primorial form: modpn = pn#
• the number of residues are even, with counts: rescntpn = (pn – 1)# = pn-1#
• the residues occur as modular complement pairs to its modulus: modpn = ri + rj
• the last two residues of a generator are constructed as: (modpn - 1) (modpn + 1)
• the residues, by definition, will include all the coprime primes < modpn
• the first residue r0 is the next prime > pn
• the residues from r0 to r0^2 are consecutive primes
• each generator has a characteristic Prime Generator Sequence (PGS) of even residue gaps
• the last 3 sequence gaps have form: (r0 - 1) 2 (r0 - 1)
• the gaps are distributed with a symmetric mirror image around a pivot gap size of 4
• the residue gaps sum from r0 to (r0 + modpn) equals the modulus: modpn = Σ ai·2i
• the coefficients ai are the frequency of each gap of size 2i
• the sum of the coefficients ai equal the number of residues: rescntpn = Σ ai
• coefficients a1 = a2 are odd and equal with form: a1 = a2 = (pn – 2)# = pn-2#
• the coefficients ai are even for i > 2
• the number of nonzero coefficients ai in a sequence for Pn is of order pn-1

Residues have canonical form values (1...modpn-1), as 1 is always coprime to any modulus, but for coding|math efficiency their functional form values (r0…modpn+1) are used, with r0 defined above, and modpn + 1 ≡ 1 mod modpn is the permuted first congruent value for 1. Also, as the residues exist as modular complement pairs the code determines their first half values and their 2nd half values come for FREE.

To find the residues for a Pn, we can use the PGS of a smaller generator (in the code for P3), to reduce the number space of the residue candidates (rc) in larger moduli that need to be checked.

Shown here is the primes candidates (pcs) table for P5 up to the 100th prime 541.
It shows the only possible pc values that can be primes for 30 integer groupings. Each of the k columns is a residue group (resgroup) of prime candidates. The colored pc values are nonprime composites, and can be sieved out by the SoZ (Sieve of Zakiya), leaving only the prime values shown.

P5 = 30 * k + {7, 11, 13, 17, 19, 23, 29, 31}

k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
r0     7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
r1    11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
r2    13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
r3    17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
r4    19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
r5    23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
r6    29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
r7    31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541
Table 1.

Every PG represents a pcs table like this, which visually displays all its properties. To identify all the Twin Primes we merely observe the residue pair values that differ by 2, (11, 13), (17, 19), (29, 31), and for Cousins those that differ by 4, (7, 11), (13, 17), (19, 23). These residue gaps form the basis for the Twins|Cousins SSoZ implementations, and other k-tuples of interest.

To find larger constellations of prime pairs, et al, we merely identify the residue pairs of desired size. For Sexy Primes (p, p+6), we just use the pairs (7, 13), (11, 17), (13, 19), (17, 23), (23, 29), (31, 37). Using them, we easily see and count there are 47 Sexy Primes (with [5:11]) within the first 100 primes. Larger generators have more residues and larger gaps and enable identifying more desired size k-tuples.

In my video [4], I define the residue gaps as the gaps between consecutive residues, and thus I refer to prime gaps as consecutive prime (2, n) tuples, where n is an even integer.
Thus in the video I state there are 25 Sexy Primes in the table above, i.e. 25 pairs of consecutive primes that differ by 6. However in the academic math world Sexy and Cousin primes are defined as any (2, 6) and (2, 4) tuple, thus [7:13] is a Sexy Prime even though we see 11 is between them. So [5:11] is defined as the first Sexy Prime and [3:7] the first Cousin, and [3:103] would be the first (2, 100) tuple, i.e. 2 primes that differ by 100.

However, if you want to know and understand the true distribution of primes, what you want to know is the distribution of the gaps between consecutive primes, which I’ll define as prime gap kpg-tuples. So the actual first (2, 100) kpg-tuple is [396,733: 396,833], a very big difference. It’s the kpg-tuples that inform you where the prime deserts are (long number stretches without primes), and characterize the true average thinning (density) of primes as the integers grow larger.

And as shown and explained in [3] and [4], there are an infinity of consecutive prime gaps of any even size. Thus the PGS for the Pn’s provide a deterministic floor (minimum) value of the number of kpg-tuples of any size, and their prime values, over any range of numbers, which we can (in theory) create an SSoZ residues sieve to identify and count.

Shown here are the PG parameters for the first 9 Pn generators P2 – P23 where modpn = pn#. Here pn = p_m is the prime value of the mth prime, thus: 2 = p_1, 3 = p_2, 5 = p_3, 7 = p_4, etc.
Pn’s modulus value modpn:     (pn - 0)# = pn-0# = Π (pn - 0) = (2 - 0) * (3 - 0) * (5 - 0) … * (pm - 0)
Number of residues rescnt:    (pn - 1)# = pn-1# = Π (pn - 1) = (2 - 1) * (3 - 1) * (5 - 1) … * (pm - 1)
# of twins|cousins pairscnt:  (pn - 2)# = pn-2# = Π (pn - 2) = (2 - 2) * (3 - 2) * (5 - 2) … * (pm - 2)

For P23 modulus:       modp23   = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 = 223092870
For P23 residues:      rescount = 1 * 2 * 4 * 6 * 10 * 12 * 16 * 18 * 22 = 36495360
For P23 twins|cousin:  pairs    = 1 * 1 * 3 * 5 * 9 * 11 * 15 * 17 * 21 = 7952175

The primes number space % is:   (rescntpn/modpn) * 100 = (pn-1# / pn#) * 100
The pairscnt number space % is: (pairscntpn*2/modpn) * 100 = (pn-2# / pn#) * 200

Pn    modulus (modpg)   residues count (rescnt)   twins|cousins pairscnt   primes % number space   pairs % number space
P2                  2                         1                        0                   50.00                  50.00
P3                  6                         2                        1                   33.33                  33.33
P5                 30                         8                        3                   26.67                  20.00
P7                210                        48                       15                   22.86                  14.29
P11              2310                       480                      135                   20.78                  11.69
P13             30030                      5760                     1485                   19.18                   9.89
P17            510510                     92160                    22275                   18.05                   8.73
P19           9699690                   1658880                   378675                   17.10                   7.81
P23         223092870                  36495360                  7952175                   16.36                   7.13
Table 2.

As the Pn primorial primes pm increase, the number space containing primes and twins|cousins steadily decreases, and can be made an arbitrarily small value ε > 0 of the total number space as m→∞.

[Figure: Primes Number Space. Number Space % (y-axis) versus Number of Pn Primorial Primes (x-axis), with curves for primes and pairs.]

This graph shows the decreasing prime number space for Pn using the first 100 primes. Once past the knee of the curve, the differential change becomes smaller for each additional pm. For many common use cases we can effectively limit usable Pn generators to the first 10 primes or so. However, for prime searches in large number values ranges, using the largest generator possible for a system is desirable, to make the maximum searchable number space as small as possible.

Generating Sieve Primes
The SSoZ uses the necessary sieving primes ≤ sqrt(end_num) (i.e.
only those with multiples within the inputs range) to sieve out their nonprime multiples. An efficiently coded P5 Sieve of Zakiya (SoZ) generates them at runtime (though other means can be used). Below is its algorithm.

SoZ Algorithm
To find all the primes ≤ N:
1. for Prime Generator P5, create its generator parameters
2. determine kmax, the number of residue groups (resgroups) up to N
3. create byte array prms[kmax] to represent the value|residue of each resgroup pc
4. perform outer sieve loop:
   • starting from the first resgroup, determine whether each pc bit location is prime
   • if a bit location is a prime, keep its residue value in prm_r; numerate its prime value
   • exit loop when prime > sqrt(N)
5. perform inner sieve loop with each residue ri:
   • create cross-product (prm_r * ri)
   • determine the resgroup kn it’s in, and its residue rn
   • compute first prime multiple resgroup kpm for the prime with ri
   • mark in prms each primenth kpm resgroup bitn[rn] as non-prime until its end
6. repeat from 4 for next resgroup
7. when sieve ends, numerate|store from each prms resgroup the needed sieving primes ≤ N

P5’s primes candidates (pcs) table up to 541 (the 100th prime) is shown below.

k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

The function sozpg performs the P5 sieve exactly as shown.
An array prms of kmax bytes is created to represent each resgroup|column of 8 pc values|rows up to the resgroup that covers the input value. Each row represents a residue value|bit position|residue track. prms is initialized to ‘0’ to make all bit positions be prime. The sieve computes for each prime ≤ sqrt(N) its first prime multiple resgroup kpm on each row, and starting from these, sets each primenth resgroup bit on each row to ‘1’, to mark its multiples (colors), to eliminate the nonprimes. The process is explained in greater detail as follows.

Performing SoZ Sieve
To sieve the nonprimes from P5’s pcs table up to 541 we use the primes ≤ isqrt(541)=23. They are the first 6 primes|residues: 7, 11, 13, 17, 19, 23, whose first unique multiples are shown with 6 different colors. The value 541 resides in residue group k=17, so kmax=18 is the number of resgroups up to it.

Starting with the first prime in resgroup k=0, 7 multiplies each pc in the resgroup, whose multiples are in blue: 7 * [7, 11, 13, 17, 19, 23, 29, 31] = [49, 77, 91, 119, 133, 161, 203, 217]. Each 7th resgroup|col along each restrack|row from these start values are 7’s multiples.

Thus 7 * 7 = 49 in resgroup k=1, on rt4|r=19 is 7’s first multiple. Every 7th resgroup starting there (k=1, 8, 15) < kmax on rt4 is a multiple of 7 and set to ‘1’ to mark as nonprime. We repeat for 7’s other first multiples 77, 91, etc, on their rows.

We then use the next prime location in resgroup k=0 after 7, which is 11, and repeat the process with it. 11 * [7, 11, 13, 17, 19, 23, 29, 31] = [77, 121, 143, 187, 209, 253, 319, 341], whose first unique multiples are red.

Note, the first unique multiple for each prime is its square, which for 11 is 121. The first multiples with smaller primes, e.g. 11 * 7 = 77, are colored with those primes' colors (here 7|blue). Also note, each prime must multiply each member in its resgroup, whether prime or not, to map its starting first prime multiple onto each distinct row in some kpm resgroup.
As shown, this process is very simple and fast, and we can perform the multiplications very efficiently. We can also perform the sieve and primes extraction process in parallel, making it even faster.

Extracting Sieve Primes
To extract the primes from prms in sequential order, we start at resgroup k=0 and iterate over each byte bit, then continue with each successive byte. A ‘0’ bit position represents a prime value in each byte, and if ‘1’ we skip to the next bit. The prime values are numerated as: prime = modpg * k + ri, with k the resgroup index, ri the residue for the bit position, and modpg = 30 for P5’s modulus.

Alternatively we can reverse the order, and for each bit row, iterate over each resgroup byte and find the primes along them. This may provide certain software computational advantages, but the primes will no longer be extracted in sequential order (though if necessary they could be sorted afterwards). For the purposes of the SSoZ algorithm, it’s not necessary the primes be used in sequential order.

To optimize performance of the SSoZ, during the prime sieve extraction process, primes which don’t have multiples within the inputs range are discarded. This significantly increases SSoZ performance for small input ranges between large input numbers, by reducing the work the residues sieves do.

The algorithm described here is generic to all Pn generators, where only their parameters change for each. Implementations may vary based on hardware|software particulars, but the work performed is the same. Larger generators systematically reduce the primes number space, by having larger modulus sizes and more residues, but we generally want to pick the smallest Pn generator that optimizes the system resources for given input values and ranges.
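As a minimal sketch of the sequential extraction described in this section, the following Rust function numerates the primes held in a P5 prms byte array. The names extract_primes, RES and MODPG are illustrative assumptions, not names from the paper's listings, and the inputs range filtering that sozpg also performs is left out.

```rust
/// P5 residues in functional form; bit i of each prms byte tracks RES[i].
const RES: [usize; 8] = [7, 11, 13, 17, 19, 23, 29, 31];
/// P5 modulus.
const MODPG: usize = 30;

/// Numerate, in sequential order, the primes encoded in `prms`.
/// A '0' bit is a prime candidate that survived the sieve; a '1' bit is skipped.
fn extract_primes(prms: &[u8]) -> Vec<usize> {
    let mut primes = Vec::new();
    for (k, &resgroup) in prms.iter().enumerate() {   // each resgroup byte k
        for (i, &ri) in RES.iter().enumerate() {      // each bit|residue in it
            if resgroup & (1u8 << i) == 0 {           // '0' bit => prime
                primes.push(MODPG * k + ri);          // prime = modpg * k + ri
            }
        }
    }
    primes
}
```

For example, a two byte array whose second byte has only bit 4 set (marking 49 = 30 * 1 + 19 as nonprime) yields the primes from 7 through 61 in order. Swapping the two loops walks each bit row across all resgroups instead, which gives the same primes out of sequential order.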
For the implementations provided, whose inputs range are constrained to 64-bits, using P5 to perform the SoZ was the overall most efficient choice, as it’s straightforward to code, and as we’ll see, can also be done in parallel to increase its performance.

Efficient residue multiplications
To find the resgroup (column) for a pc value in the table we integer divide it by the PG modulus. To find its residue value, we find its integer remainder when dividing by the PG modulus. Thus each pc resgroup value has parameters: k = pc div modpg, with residue value: ri = pc mod modpg.

Multiplying two resgroup pcs e.g. (17 * 19) = 323 gives: k, ri = (17 * 19).divmod 30 –> k = 10, ri = 23. From P5’s pc table, we see pc = 323 is in resgroup k=10 with residue 23 on restrack rt5.

Each prime can be parameterized by its residue r and resgroup k values e.g.: prime = modk + r, where modk = modpg * k, for each resgroup, and each resgroup pc_i has form: pc_i = modk + ri. Thus the multiplication – (prime * pc_i) – translates into the following parameterized form:

(prime * pc_i) = (modk + r) * (modk + ri) = modk * (modk + r + ri) + r * ri = modpg * k * (prime + ri) + r * ri

The original multiplication has now been transformed to the form: product = modpg * kk + rr, where kk = k * (prime + ri) and rr = r * ri, which also has the general form: pc = modpg * k + r.

The (r * ri) term represents the base residues (k = 0) cross products (which can be pre-computed). We extract from it its resgroup value: kn = (r * ri) / modpg, and residue: rn = (r * ri) % modpg, which maps to a restrack bit value as rt_n = residues.index(rn). Thus for P5, r = 7 is at residues[0], so that its rt_i row value is: i = residues.index(7) = 0, whose bit mask is: bit_r = 2^i = (1 << i) in the code.

Thus, the product of two members in resgroup k maps to a higher resgroup: kp = kk + kn on rt_n, comprised of two components; kn (their cross-product resgroup), and kk (their k resgroup component).
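The decomposition kp = kk + kn can be checked numerically. The Rust sketch below is only an illustration under stated assumptions: product_resgroup and MODPG are hypothetical names, and it uses plain division and remainder as in the paragraphs above, so it covers residues below the modulus and does not model the functional form value modpg + 1.

```rust
/// P5 modulus.
const MODPG: usize = 30;

/// Resgroup and residue of `prime * pc` for two members of the same resgroup k,
/// computed as kp = k * (prime + ri) + (r * ri) / modpg, rn = (r * ri) % modpg.
fn product_resgroup(prime: usize, pc: usize) -> (usize, usize) {
    let (k, r) = (prime / MODPG, prime % MODPG);   // prime = modpg * k + r
    let ri = pc % MODPG;                           // pc    = modpg * k + ri
    debug_assert_eq!(k, pc / MODPG, "both values must be in the same resgroup");
    let kk = k * (prime + ri);                     // k resgroup component
    let kn = (r * ri) / MODPG;                     // cross-product resgroup
    let rn = (r * ri) % MODPG;                     // cross-product residue
    (kk + kn, rn)
}
```

Calling product_resgroup(17, 19) gives (10, 23), matching 323 = 30 * 10 + 23, and the result for any same-resgroup pair agrees with dividing the full product by 30 directly. The benefit in the sieve is that (r * ri) depends only on the base residues, so its quotient and remainder can be computed once and reused in every resgroup.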
To describe this verbally, to find the product resgroup kp of any two resgroup members, numerate one member (for us a prime), call its residue r, add the other’s residue ri to it, multiply their sum by the resgroup value k, then add it to their residues cross-product resgroup. For (97 * 109) with k = 3 this gives:

Ex: kp = (97 * 109) / 30 = 3 * (97 + 19) + (7 * 19) / 30 = 3 * (109 + 7) + (19 * 7) / 30 = 352

For each Pn the last resgroup pc value is: (modpg + 1) ≡ 1 mod modpg, so for P5, it’s modpg*k + 31. To ensure pc / modpg = k always produces the correct k value, 2 is subtracted before the division. Thus the resultant residue value is 2 less than the correct one, so 2 is added back to get the true value.

In sozpg: kn, rn = (prm_r * ri - 2).divmod md; kn is the correct resgroup and (rn + 2) the correct residue. The code uses rn without the addition sometimes when doing memory addressing. (In the code, the posn array performs the mapping at address (r – 2) into restrack rtn indices 0 – 7).

Ex: (7 * 43) / 30 = 301 / 30 = 10, but 301 is the last pc in resgroup 9, so (301 – 2) / 30 is the correct value. Also 301 % 30 = 1, but 299 % 30 = 29, and when 2 is added we get the correct residue 31 for pc 301.

sozpg

def sozpg(val, res_0, start_num, end_num)
  # Compute the primes r0..sqrt(input_num) and store in 'primes' array.
  # Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
  md, rscnt = 30u64, 8                              # P5's modulus and residues count
  res  = [7,11,13,17,19,23,29,31]                   # P5's residues
  bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

  kmax = (val - 2) // md + 1                        # number of resgroups upto input value
  prms = Array(UInt8).new(kmax, 0)                  # byte array of prime candidates, init '0'
  modk, r, k = 0, -1, 0                             # initialize residue parameters

  loop do                                           # for r0..sqrtN primes mark their multiples
    if (r += 1) == rscnt; r = 0; modk += md; k += 1 end # resgroup parameters
    next if prms[k] & (1 << r) != 0                 # skip pc if not prime
    prm_r = res[r]                                  # if prime save its residue value
    prime = modk + prm_r                            # numerate the prime value
    break if prime > Math.isqrt(val)                # exit loop when it's > sqrtN
    res.each do |ri|                                # mark prime's multiples in prms
      kn, rn = (prm_r * ri - 2).divmod md           # cross-product resgroup|residue
      bit_r  = bitn[rn]                             # bit mask for prod's residue
      kpm = k * (prime + ri) + kn                   # resgroup for 1st prime mult
      while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end
    end
  end
  # prms now contains the nonprime positions for the prime candidates r0..N
  # extract only primes that are in inputs range into array 'primes'
  primes = [] of UInt64                             # create empty dynamic array for primes
  prms.each_with_index do |resgroup, k|             # for each kth residue group
    res.each_with_index do |r_i, i|                 # check for each ith residue in resgroup
      if resgroup & (1 << i) == 0                   # if bit location a prime
        prime = md * k + r_i                        # numerate its value, store if in range
        # check if prime has multiple in range, if so keep it, if not don't
        n, rem = start_num.divmod prime             # if rem 0 then start_num is multiple of prime
        primes << prime if (res_0 <= prime <= val) && (prime * (n + 1) <= end_num || rem == 0)
      end
    end
  end
  primes
end

Inputs:
val – integer value for sqrt(end_num)
res_0 – first residue for selected SSoZ Pn
start_num – inputs low value
end_num – inputs high value
Output:
primes – array of sieving primes within inputs range

sozpg sieves the prime multiples ≤ val to create P5’s pcs table held in byte array prms, as described. To extract only the necessary primes for the SSoZ it uses inputs: res_0, start_num, end_num.

res_0 is the first residue of the selected Pn for the SSoZ. For P5 it’s 7, but when Pn is larger, e.g. P7, P11, P13 etc, their res_0 are greater, i.e. 11, 13, 17, etc, so only the primes ≥ res_0 are kept. The last byte prms[kmax-1] may also have bit positions for primes > val, which aren’t needed and are discarded.

We thus perform two checks for each found prime, the first being: (res_0 <= prime <= val)
This filters out from P5’s pcs table the primes outside the SSoZ inputs range for the selected Pn.

The second check determines the primes with multiples within the SSoZ inputs range that are needed. For small input ranges, primes > the range size can be discarded if they don’t have multiples within it. This is done by the check: (prime * (n + 1) <= end_num || rem == 0)

All the primes ≤ sqrt(end_num) are used if their values are ≤ range = (end_num – start_num). But if (end_num – sqrt(end_num)) < start_num some sieving primes may be discarded, i.e. when range < sqrt(end_num) some primes may not have multiples within the range.

Example: end_num = 4,000,000; sqrt(end_num) = 2,000
(end_num – sqrt(end_num)) < start_num
(4,000,000 – 2,000) < start_num
3,998,000 < start_num

If start_num ≤ 3,998,000; say 500,000; the input range is ≥ 1999, the largest prime less than 2000, and all the primes < sqrt(end_num) will have at least one multiple in the range, and must be used.

If start_num > 3,998,000, say 3,999,300, the primes < 700 (the input range) will have multiples in the range; 122 for P5. But some of the 178 primes between 700 < p < 2,000 will not, and can be discarded. The second test finds 103 are needed. So for P5 only 75% (225 of 300) of the primes < 2000 are used.

Described below is the process to determine if a prime p has at least one multiple in the inputs range.
              |<–––––––– p ––––––––>|
              |<– rem –>|
1p…2p…3p…..…np          |          np+p
                        |----------+-----------------------|
                    start_num                          end_num

For a given prime value do: n = start_num // prime; rem = start_num % prime
In Crystal, et al, can just do: n, rem = start_num.divmod prime
Then do the following test: prime * (n + 1) <= end_num || rem == 0

Here, n*p + rem = start_num, where n is the number of prime’s multiples e.g. np ≤ start_num. If rem is 0 then start_num is a multiple of p, otherwise 0 < rem < p. If p > start_num, n = 0. Thus (n*p + p) = p*(n + 1) is the next multiple of p whose value is > start_num. If p*(n + 1) ≤ end_num, p is in range; if not, but rem = 0, then p*n = start_num, so p is in range.

Also, when performing: kn, rn = (prm_r * ri - 2).divmod md, rn’s true value is reduced by 2, but we need to know its true residue bit position to mark the prime multiples for those bit positions. Conceptually, given residue rn, its bit index is: posn[rn] = res.index(rn), for P5 a value from 0..7.

Because the computed rn values are 2 less than their true residue values, (residue – 2) is used as the address into the array posn used to map them, coded as: posn=[];(0..rscnt-1).each { |n| posn[res[n]-2] = n }

Then posn[7-2] = 0, posn[11-2] = 1, etc, and each rn bit value is: bit_r = 1 << posn[rn], which are OR’d into prms to mark the prime multiples as: prms[kpm] |= bit_r. The shift values 2^i can be converted to their bit position values directly using array bitn[] e.g. now: bit_r = bitn[rn]

posn = [0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,3,0, 4,0,0,0, 5,0,0,0,0,0, 6,0,  7]
bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

In both cases byte arrays can be used to store the values, as they all can be represented by just 8 bits. This is an implementation detail to decide.

Because the processing of each row is independent from the others we can perform both the sieve and prime extraction processes in parallel. Below shows Rust code using the Rayon crate to do this.
fn atomic_slice(slice: &mut [u8]) -> &[AtomicU8] {
  unsafe { &*(slice as *mut [u8] as *const [AtomicU8]) }
}

fn sozpg(val: usize, res_0: usize, start_num : usize, end_num : usize) -> Vec<usize> {
  // Compute the primes r0..sqrt(input_num) and store in 'primes' array.
  // Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
  let (md, rscnt) = (30, 8);                  // P5's modulus and residues count
  static RES: [usize; 8] = [7,11,13,17,19,23,29,31];
  static BITN: [u8; 30] = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128];

  let kmax = (val - 2) / md + 1;              // number of resgroups upto input value
  let mut prms = vec![0u8; kmax];             // byte array of prime candidates, init '0'
  let sqrt_n = val.integer_sqrt();            // compute integer sqrt of val
  let (mut modk, mut r, mut k) = (0, 0, 0);

  loop {                                      // for r0..sqrtN primes mark their multiples
    if r == rscnt { r = 0; modk += md; k += 1 }
    if (prms[k] & (1 << r)) != 0 { r += 1; continue } // skip pc if not prime
    let prm_r = RES[r];                       // if prime save its residue value
    let prime = modk + prm_r;                 // numerate the prime value
    if prime > sqrt_n { break }               // exit loop when it's > sqrtN
    let prms_atomic = atomic_slice(&mut prms); // share mutable prms among threads
    RES.par_iter().for_each (|ri| {           // mark prime's multiples in prms in parallel
      let prod = prm_r * ri - 2;              // compute cross-product for prm_r|ri pair
      let bit_r = BITN[prod % md];            // bit mask for prod's residue
      let mut kpm = k * (prime + ri) + prod / md; // 1st resgroup for prime mult
      while kpm < kmax { prms_atomic[kpm].fetch_or(bit_r, Ordering::Relaxed); kpm += prime; };
    });
    r += 1;
  }
  // prms now contains the nonprime positions for the prime candidates r0..N
  // numerate the primes on each bit row in prms in parallel (won't be in sequential order)
  // return only the primes necessary to do SSoZ for given inputs in array 'primes'
  let primes = RES.par_iter().enumerate().flat_map_iter( |(i, ri)| {
    prms.iter().enumerate().filter_map(move |(k, resgroup)| {
      if resgroup & (1 << i) == 0 {
        let prime = md * k + ri;
        let (n, rem) = (start_num / prime, start_num % prime);
        if (prime >= res_0 && prime <= val) && (prime * (n + 1) <= end_num || rem == 0) { return Some(prime); }
      } None
    })
  }).collect();
  primes
}

Here the primes are extracted from each row in parallel using 8 threads, thus not kept in sequential order. Reversing the loops, as in the Crystal code, will extract them in order but will be slower as the number of resgroups increases. Since sequential order isn’t necessary to do the SSoZ this is optimal.

For systems with more than 8 threads, using P7 with 48 residues may be faster, especially for large input values, if P7’s smaller number space can be processed faster with those threads than using P5.

We can see the performance gain that’s achieved in going from using all the sieving primes up to end_num, to only using those with multiples within the inputs ranges, to then generating them in parallel in sozpg. The following examples using Rust show the three cases and the progressive performance increases.

This is the Rust output of the original unoptimized sozpg using these two 63-bit numbers as inputs. It shows (in nextp[2 x 129900044]) 129,900,044 sieving primes were generated, which accounted for most of the setup time. The times shown are for the i7 6700HQ 4C|8T and AMD 5900HZ 8C|16T cpus.
$ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz157
threads = 8 // 16
using Prime Generator parameters for P5
segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
twinprime candidates = 675003; resgroups = 225001
each of 3 threads has nextp[2 x 129900044] array
setup time = 13.098702568 secs // 7.089318922 secs
perform twinprimes ssoz sieve
3 of 3 twinpairs done
sieve time = 9.731177018 secs // 4.944145598 secs
total time = 22.829885781 secs // 12.033471504 secs
last segment = 28393 resgroups; segment slices = 4
total twins = 4711; last twin = 7200011139999998808+/-1

These are the results from filtering out the unnecessary primes (no multiples in inputs range), using 49x fewer primes – 2,636,377. Though the setup time increases somewhat for 8 threads, there’s a massive decrease in the sieve time, as each thread now does significantly less work (and uses less memory).

$ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz158
threads = 8 // 16
using Prime Generator parameters for P5
segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
twinprime candidates = 675003; resgroups = 225001
each of 3 threads has nextp[2 x 2636377] array
setup time = 13.743127493 secs // 6.987116498 secs
perform twinprimes ssoz sieve
3 of 3 twinpairs done
sieve time = 0.175270322 secs // 0.107544045 secs
total time = 13.918427314 secs // 7.094673324 secs
last segment = 28393 resgroups; segment slices = 4
total twins = 4711; last twin = 7200011139999998808+/-1

Finally, when sozpg performs the prime generation and filtering process in parallel the setup time drops from 13.7|6.9 to 5.3|4.7 secs, with a total time drop from 22.8|12.0 to ~5.5|4.9 secs.
$ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz159
threads = 8 // 16
using Prime Generator parameters for P5
segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
twinprime candidates = 675003; resgroups = 225001
each of 3 threads has nextp[2 x 2636377] array
setup time = 5.296482074 secs // 4.74022821 secs
perform twinprimes ssoz sieve
3 of 3 twinpairs done
sieve time = 0.180924203 secs // 0.116552963 secs
total time = 5.477426691 secs // 4.856791579 secs
last segment = 28393 resgroups; segment slices = 4
total twins = 4711; last twin = 7200011139999998808+/-1

Constructing nextp

nextp is a table of the resgroups for the first prime multiples for the sieving primes along each restrack. From P5's pcs table we can look at each row and create Table 3 of their first prime multiples resgroups.

k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

Table 3. List of resgroup values for the first prime multiples – prime * (modk + ri) – for the primes shown.
rt res    7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73
 0   7    7   6   8   6   8  22  22   7  75  64  70  64 104 104  75 203 182 192
 1  11    5  11   7   7  18   5  18  11  65  83  67  67  65  96  83 185 215 187
 2  13    4   8  13  16   4   8  16  13  60  72  87  92  72  92  87 176 196 221
 3  17    2   2  12  17  14  14  12  17  50  50  84  95  86  84  95 158 158 216
 4  19    1  10   5   9  19  17  10  19  45  80  61  73  93  80  99 149 210 177
 5  23    6   4   4  10  10  23   6  23  72  58  58  76 107  72 107 198 172 172
 6  29    3   6   9   3   6   9  29  29  57  66  75  57  75 119 119 171 186 201
 7  31    2   3   2  12  11  12  27  31  52  55  52  82  82 115 123 162 167 162

Note that on each row, when two primes have the same resgroup table value, they were multiplied together. When a value occurs only once, it's either for a prime square or a (prime * nonprime) value. Also, for a prime in any resgroup k, its first prime multiple resgroup value on its own row is just:

prime * (k + 1) + k

For P5's pcs table this is equivalent to:

k * (prime + 31) + ((prm_r * 31) - 2) / modpg

(This is a property for every pc member in a resgroup for every Pn, for its first multiple on its row.)

To construct Table 3, each prime in P5's pcs table multiplies each resgroup member, whose products are other table values. Their row|col cell locations are entries into nextp. Thus starting with first prime 7:

7 * [7, 11, 13, 17, 19, 23, 29, 31] = [49, 77, 91, 119, 133, 161, 203, 217]

We see in P5's pcs table that 49 occurs in resgroup k=1 for residue value 19, which is residue track 4 (rt4). Similarly for the remaining multiples of 7, we see their placement in the table. Repeating this process for each prime, we compute their first multiples, then determine their resgroup value for each restrack.

These first prime multiple locations in Table 3 are used to start marking off successive prime multiples along each restrack|row. The SoZ computes each prime's multiples on the fly once and doesn't need to store them for later use.
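The construction of Table 3 described above can be checked by brute force. The following is a minimal Rust sketch, assuming P5 (modulus 30) and the caption's rule that a prime in resgroup k is multiplied by the pc values of that same resgroup; the names `MODPG`, `RESIDUES` and `first_mult_resgroup` are illustrative and do not come from the paper's code.

```rust
// Illustrative constants for P5 (assumed names, not from the paper's code).
const MODPG: u64 = 30;
const RESIDUES: [u64; 8] = [7, 11, 13, 17, 19, 23, 29, 31];

/// Resgroup of the first multiple of `prime` on the restrack with residue `rt`.
/// The prime is multiplied by each pc value (MODPG * k + ri) of its own
/// resgroup k; the product whose residue equals `rt` gives the table entry.
fn first_mult_resgroup(prime: u64, rt: u64) -> Option<u64> {
    let k = (prime - 2) / MODPG; // resgroup the prime sits in
    RESIDUES
        .iter()
        .map(|&ri| prime * (MODPG * k + ri))       // candidate multiples
        .find(|&pm| (pm - 2) % MODPG + 2 == rt)    // the one landing on rt
        .map(|pm| (pm - 2) / MODPG)                // its resgroup value
}
```

For example, `first_mult_resgroup(7, 19)` gives resgroup 1 (for 49), and `first_mult_resgroup(37, 7)` gives 75, which agrees with the own-row shortcut prime * (k + 1) + k = 37 * 2 + 1.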
The SSoZ computes an initial nextp for the first segment of the inputs range, which is updated at the end of each segment slice to set the first prime multiples for the next segment(s). For each sieve prime we compute its first multiple resgroup k for the restracks of interest, e.g. for twin pair residues. We then determine its resgroup k' relative to kmin, where kmin is the resgroup for the start_num input value (kmin = 1 if one input given). Thus k' ≥ 0 is the number of resgroups starting from kmin.

In the picture below, k is a prime's 1st multiple resgroup on a row, and k' is its projection relative to kmin. If k ≥ kmin, then k' = k - kmin. Thus if kmin = 3 and k = 7, k' = 4 is its first resgroup inside the segment starting at kmin. If k = kmin then k' = 0, i.e. that first prime multiple starts at the segment's beginning.

[Figure: number line of resgroups showing a prime's first multiple at k, its later multiples spaced p (prime) resgroups apart, the remainder rem between the last multiple below kmin and kmin, and the offset k' from kmin to the next multiple.]

If k < kmin, we compute the prime's multiple closest to kmin, i.e. where k' = 0..prime-1 resgroups from kmin:

k' = (kmin - k) % prime          -> value of rem in picture
k' = prime - k' if k' > 0        -> translated k' value relative to kmin

Ex: for prime 7 on rt0, let k = 7, kmin = 21: then k' = (21 - 7) % 7 = 0; to start from (multiple of 7).
Ex: for prime 7 on rt0, let k = 7, kmin = 25: then k' = (25 - 7) % 7 = 4; k' = 7 - 4 = 3; to start from.

In software, we can reassign the variable k to use for k', so the (Crystal, et al) code just becomes:

k < kmin ? (k = (kmin - k) % prime; k = prime - k if k > 0) : k -= kmin

It should be noted that while the sieve primes have at least 1 multiple within the inputs range, some may not have multiples on each restrack, especially for small ranges, and for them k > kmax. If this happens for both residue pairs, those primes could be discarded from the primes lists for those residues sieves. For general purposes though, it won't happen often enough for the performance gain to justify the extra code.
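The translation of k to k' described above can be sketched as a small function. This is a minimal Rust illustration rather than the paper's implementation; the name `rel_resgroup` is hypothetical, and the checks use the two worked examples from the text.

```rust
/// Translate a prime's absolute first-multiple resgroup `k` on a restrack
/// into its offset k' measured from `kmin` (hypothetical helper name).
fn rel_resgroup(k: u64, kmin: u64, prime: u64) -> u64 {
    if k < kmin {
        // rem = resgroups between the last multiple below kmin and kmin
        let rem = (kmin - k) % prime;
        // a multiple sits exactly on kmin when rem == 0
        if rem > 0 { prime - rem } else { 0 }
    } else {
        // first multiple is already at or past kmin
        k - kmin
    }
}
```

With prime 7 and k = 7, `rel_resgroup(7, 21, 7)` is 0 and `rel_resgroup(7, 25, 7)` is 3; with kmin = 3 and k = 7 it is 4.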
To make the process|code simple, the k values for each sieve prime are generated and stored in nextp, without worrying whether they're > kmax. If a prime's k is larger than a segment size it's skipped for that segment (not used to mark prime multiples) and reduced|updated by kn to a smaller value for the next segment(s). When less than a segment size, it's used in the residues sieve to mark prime multiples. Thus in twins_sieve, only primes with multiples in a segment for each restrack are used to mark prime multiples; the others are skipped.

A unique nextp array is created for each residues pair in each thread for the sieving primes. Thus for twin|cousin primes, nextp holds their first prime multiples resgroups values for each segment slice for both residue pairs restracks. Thus its memory increases with inputs values (more sieving primes) and larger generators (more residue pairs), though active memory use will be determined by the number of parallel threads holding onto memory. How different languages manage memory affects the size and throughput they can achieve for various inputs and ranges, for a system's memory size and profile.

Creating nextp for SSoZ

In the SoZ, a prime's residue r multiplies each Pn residue ri, and (r * ri) mod modpg maps to a unique restrack rt in some resgroup k, which is the starting point to mark off that prime's multiples for that ri. We now want to multiply r by the ri that makes (r * ri) be on a given restrack rt, for each sieving prime. Thus if for some ri, (r * ri) mod modpg = rt, to find the ri that maps each r to a specific rt we do:

ri = (r^-1 * rt) mod modpg

Where for r^-1, r_inv = modinv(r, modpg) in the code, with r being the residue for a sieve prime. (A property of prime generators is that every residue has an inverse, either itself or another residue.)
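To make the inverse mapping concrete, here is a minimal Rust sketch that finds a residue's inverse by repeated multiplication and then the ri that places r * ri on a chosen restrack. It assumes residues are kept in the range 2..=modpg+1 (so modpg+1 stands for residue 1), which is the convention the Crystal code uses; the function names `modinv_pow` and `ri_for_restrack` are illustrative only.

```rust
/// Modular inverse of `r` mod `m`, found by raising r to successive powers.
/// Assumes gcd(r, m) == 1 (true for every Pn residue); otherwise it never ends.
fn modinv_pow(r: u64, m: u64) -> u64 {
    let r = r % m;
    let mut inv = r;
    while (r * inv) % m != 1 {
        inv = (inv * r) % m; // next power of r, reduced mod m
    }
    inv
}

/// Residue ri in 2..=modpg+1 such that (r * ri) lands on restrack `rt`.
fn ri_for_restrack(r: u64, rt: u64, modpg: u64) -> u64 {
    // shift by 2 so the result stays in 2..=modpg+1 instead of 0..modpg-1
    (modinv_pow(r, modpg) * rt - 2) % modpg + 2
}
```

For P5, `modinv_pow(7, 30)` is 13, and `ri_for_restrack(7, 11, 30)` is 23, since 7 * 23 = 161 sits on the restrack for residue 11.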
Now kn = (r * ri - 2) / modpg, and k = (prime - 2) / modpg, so again:

kpm = k * (prime + ri) + kn

If r_inv is a prime's residue inverse, and rt the desired restrack:

ri = (r_inv * rt - 2) mod modpg + 2

For each residues pair, nextp_init creates the nextp array of the sieve primes first resgroup multiples relative to kmin, for the rt values r_lo and r_hi, the lower|upper twinpair residues. With no loss of generality, it can be used to construct nextp for any architecture for any number of specified restracks.

nextp_init

def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
  # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
  # store consecutively as lo_tp|hi_tp pairs for their restracks.
  nextp = Slice(UInt64).new(primes.size*2)         # 1st mults array for twinpair
  r_hi, r_lo = rhi, rhi - 2                        # upper|lower twinpair residue values
  primes.each_with_index do |prime, j|             # for each prime r0..sqrt(N)
    k = (prime - 2) // modpg                       # find the resgroup it's in
    r = (prime - 2) % modpg + 2                    # and its residue value
    r_inv = resinvrs[r].to_u64                     # and residue inverse
    rl = (r_inv * r_lo - 2) % modpg + 2            # compute r's ri for r_lo
    rh = (r_inv * r_hi - 2) % modpg + 2            # compute r's ri for r_hi
    kl = k * (prime + rl) + (r * rl - 2) // modpg  # kl 1st mult resgroup
    kh = k * (prime + rh) + (r * rh - 2) // modpg  # kh 1st mult resgroup
    kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)  # in range
    kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)  # in range
    nextp[j * 2] = kl.to_u64                       # prime's 1st mult lo_tp resgroup val
    nextp[j * 2 | 1] = kh.to_u64                   # prime's 1st mult hi_tp resgroup val
  end
  nextp
end

Inputs:
rhi – hi residue value for this twinpair
kmin – resgroup value for start_num
modpg – modulus value for chosen pg
primes – array of sieving primes
resinvrs – array of residues modular inverses
Output:
nextp – array of primes 1st mults for given residues

Twins|Cousins SSoZ

Let's now construct the process to find twin primes ≤ N with a segmented sieve, using our P5 example. Twin primes are consecutive odd integers that are prime, the first two being [3:5] and [5:7]. Thus from our original P5 pcs table, we use just the consecutive pc residue tracks, whose residues table is below. A twin prime occurs when both twin pair pc values in a column are prime (not colored), e.g. [191:193].

Table 4. Twin Primes Residues Tracks Table for P5(541).

k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

We see from the table that the twin pair residue tracks for [11:13] have 10 twin primes ≤ 541, [17:19] have 6, and [29:31] have 7. Thus, the total twin prime count ≤ 541 is 23 + [3:5] + [5:7] = 25, with the last being [521:523]. Twin primes are usually referenced to the mid (even) number between the upper and lower consecutive odd primes pair, so the last (largest) twin pair ≤ 541 for [521:523] is written as 522 ± 1.
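The per-track counts read off Table 4 can be cross-checked without any sieve. The following is a minimal Rust sketch that uses plain trial division (not the SSoZ) to count twin pairs along one pair of residue tracks; `is_prime` and `twins_on_track` are illustrative names, not functions from the paper.

```rust
/// Trial-division primality test, adequate for the small table values.
fn is_prime(n: u64) -> bool {
    if n < 2 { return false; }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 { return false; }
        d += 1;
    }
    true
}

/// Count twin pairs [lo, lo + 2] with lo = r_lo + modpg * k and lo + 2 <= limit.
/// Returns (count, mid value of the last pair found, or 0 if none).
fn twins_on_track(r_lo: u64, modpg: u64, limit: u64) -> (u64, u64) {
    let (mut count, mut last_mid) = (0, 0);
    let mut lo = r_lo;
    while lo + 2 <= limit {
        if is_prime(lo) && is_prime(lo + 2) {
            count += 1;
            last_mid = lo + 1; // twin pairs are written as mid ± 1
        }
        lo += modpg; // step to the next resgroup on this track
    }
    (count, last_mid)
}
```

For P5 and a limit of 541, `twins_on_track(11, 30, 541)` returns (10, 522), and the [17:19] and [29:31] tracks give counts of 6 and 7.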
As shown before, the number of twin|cousin residue pairs is equal to:

(p_n - 2)# = Π (p_i - 2), taken over the odd primes p_i ≤ p_n

Thus P5 has 3 residue pairs for each. Below are the three Cousin Prime pairs taken from P5's pcs table.

Table 5. Cousin Primes Residues Tracks Table for P5(541).

k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533

The SSoZ algorithm is the same for both, with their coding only differing to account for low input value ranges, as the first cousin prime is defined as [3:7] and the first twins are [3:5], [5:7]. Up to 541, there are 25 twin and 27 cousin primes. Their ratio over increasingly larger input ranges remains close to unity, as their pairs count, and pair prime values, infinitely increase, [3], [4].

Residues Sieve Description

The Segmented Sieve of Zakiya (SSoZ) is a memory efficient way to find the primes using a given Pn. For an input range defined by a start_num and end_num, it divides the range into segments, which are efficiently sized to fit into usable memory for processing. This allows the reuse of the same memory to process long number ranges that otherwise would require more memory than a system has to use.

A standard segment slice is ks resgroups, with the last one, ks', usually less. For a given Pn and range size, set_sieve_parameters determines its optimal memory size, which is set to be a multiple of 64 (bits).

Fig. 1
     ks      ks      ks      ks      ks      ks      ks     ks'
 |.......|.......|.......|.......|.......|.......|.......|.....|
 kmin                                                      kmax
 start_num                                              end_num

Here start|end_num are the lo|hi values that define a number range of interest.
They also define the absolute values for kmin and kmax for a given Pn generator, as these resgroups cover these input values. When only one input is given it becomes end_num, whose resgroup determines kmax, and start_num is set to 3 (low prime for first twin [3:5]), and kmin set to 1 (min number of resgroups). The SSoZ sieve adjusts kmin|kmax for each residues pair when necessary, to ensure only their pc values within the inputs range are processed. For example, if start_num = 342 and end_num = 540, we see below the valid in-range pc values. Here kmin = 12 and kmax = 18 are the global resgroup values, which are adjusted as needed in twins_sieve for each residues pair. For [11:13], 341 < 342, so its kmin is increased to 13, whose values are all in the range. Conversely for twinpair [29:31], pc 541 > 540 is outside the range, so its kmax is reduced to 17, whose resgroup values are now all in the range. For twinpair [17:19] no adjustment is needed (done). Thus for each residues pair, we check if the numerated r_lo pc value in kmin is < start_num, and if so increment kmin, and check if the numerated r_hi pc value in kmax is > end_num, and decrement kmax if so. In twins_sieve the adjusted kmin|kmax values are determined then used in nextp_init to create nextp for the sieving primes to begin performing the residues sieve for the first segment in the range. Table 6. Twin Primes Residues Tracks Table for range 342 – 540. 
k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

In the twins_sieve segment array seg, the resgroups size ks is a multiple of 64-bit mem elements, where each bit represents a residues pair resgroup. Thus a resgroup k maps to bit (k mod 64) in mem elem seg[k / 64], where (k mod 64) masks k's lower 6 bits: (k & 0x3F), and (k / 64) right shifts k by 6 bits. This is coded as: seg[(kn - 1) >> 6], bit value: 1 << ((kn - 1) & 63), (>>|<< are right|left bit-shift ops).

Ex: for ks = 131072 resgroups, seg size is 2048 64-bit mem elements.
For resgroup k = 89257, it maps to seg[1394], bit 2^40, mem value = 1 << 40 = 1099511627776.

Fig. 2
 |........................ ks ........................|
 ki                                              ki+kn
 |......|......|......|......|...~~~...|......|......|
 seg[0]                                      seg[kn-1]

ki is the absolute resgroup value to start each segment slice (in Fig. 1), initialized to kmin-1 (0 indexed arrays). kn is the resgroups size for each segment slice. It's initialized to ks, but if the last segment slice ks' < ks resgroups it's set to its slice size.

To sieve for twin primes, etc, each instance of twins_sieve processes a unique twinpair for the entire inputs range split into ks resgroup size segments. It first determines the adjusted kmin|kmax values for the twinpair residues, then creates their initial nextp array of first resgroup sieve prime multiples k values. Using them, it iterates over the sieve primes, computes|updates their prime multiples k values, and sets them to ‘1’ in seg for each residues pair, until k ≥ kn, the k value past the end of the current segment.
When k ≥ kn it updates it to k = k – kn, which is the first k multiple value into the next segment, and stores it back into nextp for that prime to use for the next segment(s). This is the Crystal code to mark a prime's resgroup multiples in seg to ‘1’. This is done for the lo|hi residues pair, and if either resgroup member is a prime's multiple that resgroup isn't a twinprime.

k = nextp.to_unsafe[j * 2]               # starting from this resgroup in seg
while k < kn                             # mark primenth resgroup bits prime mults
  seg[k >> s] |= 1_u64 << (k & bmask)
  k += prime end                         # set resgroup for prime's next multiple
nextp.to_unsafe[j * 2] = k - kn          # save 1st resgroup in next eligible seg

When the residues sieve finishes, seg contains the resgroup bit positions for the twin primes. Because seg is set to all ‘0’s to start each segment, we need to set to ‘1’ any unused hi bits in the last mem elem that ks' is in when it's not a multiple of 64. Algorithmically this only needs to be done for the last segment. However, doing it after every segment is faster in software, as it eliminates the branching code to check for the last segment, and is more efficient to compile|run. Below is the Crystal code to perform this.

seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)

If kn = 89257 for the last segment, only the first 1395 64-bit seg mem elems are used, up to the 41st bit in the last elem, so we need to set to ‘1’ its bit values 2^41..2^63, because (89257-1 & 63) = 40, for bit 2^40. Thus we invert 1 to be: 11111111..1110 and left-shift it 40 bits, which is ORed with the last mem elem. If kn is a multiple of 64, (kn – 1) & bmask = 63 shifts the bits to be all 0s, and thus when ORed doesn't change seg's last mem value. Thus left shifts of n = 0..62 bits mask all the upper bit values: 2^63...2^(n+1).

Once all the nonprime bits are set we can count|numerate the primes.
We read each seg[0..kn-1] and invert the bits, and use popcount to count the ‘1’s (as primes) for each seg[i] (the Rust code counts the ‘0’s directly), and sum their segment count in variable cnt. If cnt > 0 we find the largest prime resgroup in the segment. We first update the total pairs count with sum += cnt. Then upk is set to the last resgroup value in the segment, and the code loops backward checking for the first bit that's prime (‘0’), after which upk holds the largest|last prime pair resgroup in the segment. Its absolute resgroup value in the inputs range is then: hi_tp = ki + upk. For each segment slice its value is updated to a larger value, and at the end it holds the largest absolute resgroup for this residues pair in the inputs range. The r_hi prime value is numerated and returned as: hi_tp * modpg + r_hi, along with the total prime pairs count in the range, in variable sum.

seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
cnt = 0                                  # count the twinprimes in the segment
seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount }
if cnt > 0                               # if segment has twinprimes
  sum += cnt                             # add segment count to total range count
  upk = kn - 1                           # from end of seg count back to largest tp
  while seg[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
  hi_tp = ki + upk                       # set its full range resgroup value
end

twins_sieve can be modified for different purposes. The code to find the largest prime pair can be removed if all you want is their count. I also originally had code to print out the r_hi primes in each segment as a validity check (only for small ranges). However, if you really wanted to see|record the twins, a better way may be to return ki|seg for each segment and externally store|process them later for any desired range of interest. (This, of course, would be very memory intensive.)

Twin Primes Example

Using our example to find the twin primes ≤ 541 with P5, let's see how to process the first twin pair residues [11:13] with kmax = 18.
twins_sieve can perform the sieve for each pair in a separate thread. set_sieve_parameters sets the segment size, but here I'll set it to ks = 6. Thus, the seg array will represent 6 resgroups. Below is the twin pair table for [11:13] separated into 3 segment slices of 6 resgroups each. Underneath it is what each seg array will look like after processing for each slice. (seg conceptually is a bitarray, so each seg[i] is just 1 bit. I later show an implementation using a bitarray, which makes the code simpler|shorter, and faster, depending on a language's implementation.)

Table 7.

k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt11  11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt13  13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
k      0   1   2   3   4   5   0   1   2   3   4   5   0   1   2   3   4   5
seg    0   0   0   0   1   1   0   1   1   0   0   1   1   1   0   0   1   0

nextp_init initializes nextp for the sieve primes [7, 11, 13, 17, 19, 23] for residues 11 and 13, taking the values shown in Table 3. For each lo|hi residue, their k values are stored as consecutive pairs in nextp, and seg is created and initialized to all primes (‘0’).

Initial nextp[11:13]
j        0   1   2   3   4   5
primes   7  11  13  17  19  23
2j       0   2   4   6   8  10
2j+1     1   3   5   7   9  11
rt_11    5  11   7   7  18   5
rt_13    4   8  13  16   4   8

k        0   1   2   3   4   5
seg      0   0   0   0   0   0

For each prime j in primes, nextp[2j|2j+1] give the pair's k values to start marking off the prime's multiples (by incrementing k by the prime's value). When k ≥ kn (here kn is always 6), it's reduced by it: k = k - 6, and nextp is updated with the new k values for the next segment. Below shows the changes to nextp and seg in twins_sieve. (It's coincidental here that the index size for primes and nextp equals the segment size.)
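The slice-by-slice update just described can be replayed in a few lines. The following is a minimal Rust sketch, not the paper's implementation: `sieve_pair` is a hypothetical helper, seg is modelled as a Vec<bool> rather than packed 64-bit words, and nextp is seeded by hand with the Table 3 values for residues 11 and 13 (so no kmin adjustment is involved). It returns a snapshot of seg after each slice, which can be compared with the tables that follow.

```rust
/// Sieve one twin-pair over `kmax` resgroups in slices of `ks` resgroups.
/// `nextp[j]` holds the (lo, hi) restrack first-multiple resgroups for
/// `primes[j]`, relative to the start of the current slice.
/// Returns (pair count, resgroup of largest pair, seg snapshot per slice).
fn sieve_pair(
    primes: &[usize],
    mut nextp: Vec<(usize, usize)>,
    ks: usize,
    kmax: usize,
) -> (usize, usize, Vec<Vec<bool>>) {
    let (mut ki, mut sum, mut hi_k) = (0usize, 0usize, 0usize);
    let mut snapshots = Vec::new();
    while ki < kmax {
        let kn = ks.min(kmax - ki);      // last slice may be shorter
        let mut seg = vec![false; kn];   // false = still a twin candidate
        for (j, &prime) in primes.iter().enumerate() {
            let (mut kl, mut kh) = nextp[j];
            while kl < kn { seg[kl] = true; kl += prime; } // lower restrack
            while kh < kn { seg[kh] = true; kh += prime; } // upper restrack
            nextp[j] = (kl - kn, kh - kn); // carry into the next slice
        }
        // position of the last unmarked resgroup, if the slice has any
        if let Some(upk) = seg.iter().rposition(|&marked| !marked) {
            sum += seg.iter().filter(|&&marked| !marked).count();
            hi_k = ki + upk;
        }
        snapshots.push(seg);
        ki += ks;
    }
    (sum, hi_k, snapshots)
}
```

Called with primes [7, 11, 13, 17, 19, 23], the initial nextp pairs (5,4), (11,8), (7,13), (7,16), (18,4), (5,8), ks = 6 and kmax = 18, it reports 10 twin pairs with the largest in resgroup 17, i.e. 17 * 30 + 13 = 523 as the hi prime.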
Start for Segment 1
nextp[11:13]
2j       0   2   4   6   8  10
2j+1     1   3   5   7   9  11
rt_11    5  11   7   7  18   5
rt_13    4   8  13  16   4   8
seg 1
k        0   1   2   3   4   5
seg      0   0   0   0   1   1

Start for Segment 2
nextp[11:13]
2j       0   2   4   6   8  10
2j+1     1   3   5   7   9  11
rt_11    6   5   1   1  12  22
rt_13    5   2   7  10  17   2
seg 2
k        0   1   2   3   4   5
seg      0   1   1   0   0   1

Start for Segment 3
nextp[11:13]
2j       0   2   4   6   8  10
2j+1     1   3   5   7   9  11
rt_11    0  10   8  12   6  16
rt_13    6   7   1   4  11  19
seg 3
k        0   1   2   3   4   5
seg      1   1   0   0   1   0

Below is the Crystal code to perform the residues sieve (here for twins) for a given residues pair.

twins_sieve

def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
  # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
  # First create|init 'nextp' array of 1st prime mults for given twinpair,
  # stored consecutively in 'nextp', and init seg array for ks resgroups.
  # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
  # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
  # Return the last twinprime|sum for the range for this twinpair residues.
  s = 6                                           # shift value for 64 bits
  bmask = (1 << s) - 1                            # bitmask val for 64 bits
  sum, ki, kn = 0_u64, kmin-1, ks                 # init these parameters
  hi_tp, k_max = 0_u64, kmax                      # max twinprime|resgroup
  seg = Slice(UInt64).new(((ks - 1) >> s) + 1)    # seg array of ks resgroups
  ki += 1 if ((ki * modpg) + r_hi - 2) < start_num       # ensure lo tp in range
  k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num   # ensure hi tp in range
  nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs)   # init nextp array
  while ki < k_max                                # for ks size slices upto kmax
    kn = k_max - ki if ks > (k_max - ki)          # adjust kn size for last seg
    primes.each_with_index do |prime, j|          # for each prime r0..sqrt(N)
                                                  # for lower twinpair residue track
      k = nextp.to_unsafe[j * 2]                  # starting from this resgroup in seg
      while k < kn                                # mark primenth resgroup bits prime mults
        seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
        k += prime end                            # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2] = k - kn             # save 1st resgroup in next eligible seg
                                                  # for upper twinpair residue track
      k = nextp.to_unsafe[j * 2 | 1]              # starting from this resgroup in seg
      while k < kn                                # mark primenth resgroup bits prime mults
        seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
        k += prime end                            # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2 | 1]= k - kn          # save 1st resgroup in next eligible seg
    end                                           # set as nonprime unused bits in last seg[n]
                                                  # so fast, do for every seg[i]
    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
    cnt = 0                                       # count the twinprimes in the segment
    seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount }  # invert to count ‘1’s
    if cnt > 0                                    # if segment has twinprimes
      sum += cnt                                  # add segment count to total range count
      upk = kn - 1                                # from end of seg, count back to largest tp
      while seg.to_unsafe[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
      hi_tp = ki + upk                            # set its full range resgroup value
    end
    ki += ks                                      # set 1st resgroup val of next seg slice
    seg.fill(0) if ki < k_max                     # set next seg to all primes if in range
  end                                             # when sieve done, numerate largest twin
                                                  # for ranges w/o twins set largest to 1
  hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
  {hi_tp.to_u64, sum.to_u64}                      # return largest twinprime|twins count
end

Inputs:
ks – resgroups segment size
rhi – hi residue value for this twinpair
modpg – modulus value for chosen pg
kmin – total number resgroups upto for start_num
kmax – total number resgroups upto for end_num
primes – array of sieving primes
resinvrs – array of modular inverses for residues
end_num – inputs high value
start_num – inputs low value
Outputs:
sum – count of twinpairs for input range
hi_tp – hi prime for largest twinprime in range

Starting with Crystal 1.4.0 (April 7, 2022) its bitarray implementation was highly optimized, making it faster than the 64-bit mem array for seg on the AMD 5900HX, while making the code substantially simpler to read|write and shorter. Below is the Crystal version using a bitarray for the seg array.

def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
  # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
  # First create|init 'nextp' array of 1st prime mults for given twinpair,
  # stored consecutively in 'nextp', and init seg array for ks resgroups.
  # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
  # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
  # Return the last twinprime|sum for the range for this twinpair residues.
  sum, ki, kn = 0_u64, kmin-1, ks                 # init these parameters
  hi_tp, k_max = 0_u64, kmax                      # max twinprime|resgroup
  seg = BitArray.new(ks)                          # seg array of ks resgroups
  ki += 1 if ((ki * modpg) + r_hi - 2) < start_num       # ensure lo tp in range
  k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num   # ensure hi tp in range
  nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs)   # init nextp array
  while ki < k_max                                # for ks size slices upto kmax
    kn = k_max - ki if ks > (k_max - ki)          # adjust kn size for last seg
    primes.each_with_index do |prime, j|          # for each prime r0..sqrt(N)
                                                  # for lower twinpair residue track
      k = nextp.to_unsafe[j * 2]                  # starting from this resgroup in seg
      while k < kn                                # until end of seg
        seg.unsafe_put(k, true)                   # mark primenth resgroup bits prime mults
        k += prime end                            # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2] = k - kn             # save 1st resgroup in next eligible seg
                                                  # for upper twinpair residue track
      k = nextp.to_unsafe[j * 2 | 1]              # starting from this resgroup in seg
      while k < kn                                # until end of seg
        seg.unsafe_put(k, true)                   # mark primenth resgroup bits prime mults
        k += prime end                            # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2 | 1]= k - kn          # save 1st resgroup in next eligible seg
    end
    cnt = seg[...kn].count(false)                 # count|store twinprimes in segment
    if cnt > 0                                    # if segment has twinprimes
      sum += cnt                                  # add segment count to total range count
      upk = kn - 1                                # from end of seg, count back to largest tp
      while seg.unsafe_fetch(upk); upk -= 1 end
      hi_tp = ki + upk                            # set its full range resgroup value
    end
    ki += ks                                      # set 1st resgroup val of next seg slice
    seg.fill(false) if ki < k_max                 # set next seg to all primes if in range
  end                                             # when sieve done, numerate largest twin
                                                  # for ranges w/o twins set largest to 1
  hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
  {hi_tp.to_u64, sum.to_u64}                      # return largest twinprime|twins count
end

The code to find the largest twinprime in the range comes for FREE: removing it gives no detectable increase in speed, and for Crystal may even be a wee tad bit slower.

    sum += seg[...kn].count(false)                # count|store twinprimes in segment
    ki += ks                                      # set 1st resgroup val of next seg slice
    seg.fill(false) if ki < k_max                 # set next seg to all primes if in range
  end
  sum.to_u64                                      # return twinprimes count in range
end

In general, a bitarray's performance depends on the language's implementation (test to determine), but should make the code simpler|shorter to read|write, while the memory array model should be more ubiquitous, and implementable for languages without (native or external) bitarrays.

gcd

def gcd(m, n)
  while m|1 != 1; t = m; m = n % m; n = t end
  m
end

Inputs:
n – even pg modulus value
m – an odd pc value < pg modulus n
Output:
m – gcd of inputs; (m, n) are coprime if 1

This is a customized gcd (greatest common divisor) function that uses residue properties to shorten the time of the Euclidean gcd algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm). Here m is an odd residue candidate < n, the even modulus value. Some of the language implementations just use the gcd function provided with them.

modinv

def modinv(a0, m0)
  return 1 if m0 == 1
  a, m = a0, m0
  x0, inv = 0, 1
  while a > 1
    inv -= (a // m) * x0
    a, m = m, a % m
    x0, inv = inv, x0
  end
  inv += m0 if inv < 0
  inv.to_u64
end

def modinv1(r, m)
  r = inv = r.to_u64
  while (r * inv) % m != 1
    inv = (inv % m) * r
  end
  inv % m
end

Inputs:
a0 – odd pc value < modulus m0
m0 – even pg modulus value
Output:
inv – inverse of a0 mod m0, e.g. a0*inv ≡ 1 mod m0

The first function (modinv) is the standard modular inverse function (taken from Rosetta Code). The second (modinv1) uses the residue property that ri * ri^n ≡ 1 mod modpg for some n ≥ 1, i.e. the modular inverse of residue ri is itself raised to some power n.
This is faster for generators P3 and P5, with a small number of residues, but becomes comparatively slower for generators with more residues.

For P5's residues: [7, 11, 13, 17, 19, 23, 29, 31]
Its inverses are:  [13, 11, 7, 23, 19, 17, 29, 1]
Inverse power n:   [ 3, 1, 3, 3, 1, 3, 1, 1]

For a chosen Pn generator, gen_pg_parameters produces its parameters used to perform the SSoZ. It uses gcd to determine the residues and modinv to compute their inverses.

gen_pg_parameters

def gen_pg_parameters(prime)
  # Create prime generator parameters for given Pn
  puts "using Prime Generator parameters for P#{prime}"
  primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
  modpg, res_0 = 1, 0                       # compute Pn's modulus and res_0 value
  primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm }
  restwins = [] of Int32                    # save upper twinpair residues here
  inverses = Array.new(modpg + 2, 0)        # save Pn's residues inverses here
  pc, inc, res = 5, 2, 0                    # use P3's PGS to generate pcs
  while pc < (modpg >> 1)                   # find PG's 1st half residues
    if gcd(pc, modpg) == 1                  # if pc a residue
      mc = modpg - pc                       # create its modular complement
      inverses[pc] = modinv(pc, modpg)      # save pc and mc inverses
      inverses[mc] = modinv(mc, modpg)      # if in twinpair save both hi residues
      restwins << pc << mc + 2 if res + 2 == pc
      res = pc                              # save current found residue
    end
    pc += inc; inc ^= 0b110                 # create next P3 seq pc: 5 7 11 13 17...
  end
  restwins.sort!; restwins << (modpg + 1)   # last residue is last hi_tp
  inverses[modpg+1] = 1; inverses[modpg-1] = modpg - 1   # last 2 are self inverses
  {modpg, res_0, restwins.size, restwins, inverses}
end

Inputs:
prime – Pn prime value 5, 7… 17
Outputs:
res_0 – first residue of selected Pn (next prime > Pn prime)
modpg – modulus for generator Pn; value = (prime)#
inverses – array of the pg residue inverses, size = (prime-1)#
restwins – ordered array of the hi pg twinpair (tp) values
restwins.size – the number of pg twinpairs = (prime-2)#

For a given prime number, it generates its primorial value for modpg, and keeps its r0 value in res_0. It then generates all the residues. It uses P3's PGS to generate Pn's first half rcs. It checks if they're coprime to modpg to identify the residues. For each residue it creates its modular complement (mc) and stores both inverses at their address values. It then determines if the residue is part of a twin (cousin) pair, and if so, then so is its complement, and stores both hi pair values in restwins.

Upon generating all the residues, and storing their inverses and twin (cousin) pairs hi residues, the restwins array is sorted to put them in sequential order, then the hi residue for the last twin pair modpg±1 is included as the last one. (For cousin primes, we include the hi residue for the pivot pair (modpg/2 + 2) and then sort the array.) Finally, the inverses for the last two residues modpg±1 are added at their address locations, and the outputs are returned for use in set_sieve_parameters.

Given the input values, set_sieve_parameters determines which prime generator to use, generates its parameters, then determines the range parameters and segment size to use. Here I use a rudimentary tree algorithm to determine for my laptops the switch points for using different generators.
This can be made much more sophisticated and adaptable by also accounting for the number of system threads and cache and ram memory size, to pick better segment size values and generators for a given inputs range.

set_sieve_parameters

    def set_sieve_parameters(start_num, end_num)
      # Select at runtime best PG and segment size parameters for input values.
      # These are good estimates derived from PG data profiling. Can be improved.
      nrange = end_num - start_num
      bn, pg = 0, 3
      if end_num < 49
        bn = 1; pg = 3
      elsif nrange < 77_000_000
        bn = 16; pg = 5
      elsif nrange < 1_100_000_000
        bn = 32; pg = 7
      elsif nrange < 35_500_000_000
        bn = 64; pg = 11
      elsif nrange < 14_000_000_000_000
        pg = 13
        if    nrange > 7_000_000_000_000; bn = 384
        elsif nrange > 2_500_000_000_000; bn = 320
        elsif nrange > 250_000_000_000;   bn = 196
        else  bn = 128
        end
      else
        bn = 384; pg = 17
      end
      modpg, res_0, pairscnt, restwins, resinvrs = gen_pg_parameters(pg)
      kmin = (start_num-2) // modpg + 1        # number of resgroups to start_num
      kmax = (end_num - 2) // modpg + 1        # number of resgroups to end_num
      krange = kmax - kmin + 1                 # number of resgroups in range, at least 1
      n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8)
      b = bn * 1024 * n                        # set seg size to optimize for selected PG
      ks = krange < b ? krange : b             # segments resgroups size
      puts "segment size = #{ks} resgroups for seg bitarray"
      maxpairs = krange * pairscnt             # maximum number of twinprime pcs
      puts "twinprime candidates = #{maxpairs}; resgroups = #{krange}"
      {modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs}
    end

Inputs:
    end_num – high input value (min of 3)
    start_num – low input value (min of 3)
Outputs:
    ks – number of residue groups set for segment size
    res_0 – first residue of selected Pn (next prime > Pn prime)
    modpg – modulus value for chosen pg
    kmin – number resgroups to start_num
    kmax – number resgroups to end_num
    krange – number of resgroups for inputs range (at least 1)
    pairscnt – number of twinpairs for selected pg
    resinvrs – modular inverses array for the residues
    restwins – hi residue values array for each twinpair

Finally, shown below is the Crystal version of the main routine twinprimes_ssoz. It accepts the inputs, performs the residues sieve, times the different parts of the process, and generates the program outputs.

twinprimes_ssoz

    def twinprimes_ssoz()
      end_num   = {ARGV[0].to_u64, 3u64}.max
      start_num = ARGV.size > 1 ? {ARGV[1].to_u64, 3u64}.max : 3u64
      start_num, end_num = end_num, start_num if start_num > end_num

      start_num |= 1                 # if start_num even increase by 1
      end_num = (end_num - 1) | 1    # if end_num even decrease by 1
      start_num = end_num = 7 if end_num - start_num < 2

      puts "threads = #{System.cpu_count}"
      ts = Time.monotonic            # start timing sieve setup execution
                                     # select Pn, set sieving params for inputs
      modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs = set_sieve_parameters(start_num, end_num)
      # create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
      primes = end_num < 49 ? [5] : sozpg(Math.isqrt(end_num), res_0, start_num, end_num)
      puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"

      lo_range = restwins[0] - 3     # lo_range = lo_tp - 1
      twinscnt = 0_u64               # determine count of 1st 4 twins if in range for used Pn
      twinscnt += [3, 5, 11, 17].select { |tp| start_num <= tp <= lo_range }.size unless end_num == 3

      te = (Time.monotonic - ts).total_seconds.round(6)
      puts "setup time = #{te} secs" # display sieve setup time
      puts "perform twinprimes ssoz sieve"
      t1 = Time.monotonic            # start timing ssoz sieve execution

      cnts = Array(UInt64).new(pairscnt, 0)      # number of twinprimes found per thread
      lastwins = Array(UInt64).new(pairscnt, 0)  # largest twinprime val for each thread
      done = Channel(Nil).new(pairscnt)
      threadscnt = Atomic.new(0)                 # count of finished threads

      restwins.each_with_index do |r_hi, i|      # sieve twinpair restracks
        spawn do
          lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
          print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
          done.send(nil)
        end
      end
      pairscnt.times { done.receive }            # wait for all threads to finish
      print "\r#{pairscnt} of #{pairscnt} twinpairs done"
      last_twin = lastwins.max                   # find largest hi_tp twinprime in range
      twinscnt += cnts.sum                       # compute number of twinprimes in range
      last_twin = 5 if end_num == 5 && twinscnt == 1
      kn = krange % ks                           # set number of resgroups in last slice
      kn = ks if kn == 0                         # if multiple of seg size set to seg size
      t2 = (Time.monotonic - t1).total_seconds   # sieve execution time

      puts "\nsieve time = #{t2.round(6)} secs"            # ssoz sieve time
      puts "total time = #{(t2 + te).round(6)} secs"       # setup + sieve time
      puts "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}"
      puts "total twins = #{twinscnt}; last twin = #{last_twin - 1}+/-1"
    end

    twinprimes_ssoz

Program Output

Below is typical program output, shown here for Rust, for single and two input values (order doesn’t matter), run on an Intel i7-6700HQ Linux based
laptop. The programs are run in a terminal with the command-line interface (cli) shown, and display the output shown.

    $ echo 5000000000 | ./twinprimes_ssoz
    threads = 8
    using Prime Generator parameters for P11
    segment size = 262144 resgroups; seg array is [1 x 4096] 64-bits
    twinprime candidates = 292207905; resgroups = 2164503
    each of 135 threads has nextp[2 x 6999] array
    setup time = 0.000796737 secs
    perform twinprimes ssoz sieve
    135 of 135 twinpairs done
    sieve time = 0.184892352 secs
    total time = 0.185704753 secs
    last segment = 67351 resgroups; segment slices = 9
    total twins = 14618166; last twin = 4999999860+/-1

    $ echo 100000000000 200000000000 | ./twinprimes_ssoz
    threads = 8
    using Prime Generator parameters for P13
    segment size = 524288 resgroups; seg array is [1 x 8192] 64-bits
    twinprime candidates = 4945055940; resgroups = 3330004
    each of 1485 threads has nextp[2 x 37493] array
    setup time = 0.003883411 secs
    perform twinprimes ssoz sieve
    1485 of 1485 twinpairs done
    sieve time = 3.819838338 secs
    total time = 3.823732178 secs
    last segment = 184276 resgroups; segment slices = 7
    total twins = 199708605; last twin = 199999999890+/-1

The program output is described as follows:

Line 0 is the cli input command. When 2 inputs are given their hi|lo order doesn’t matter.
Line 1 shows the number of available system threads.
Line 2 shows the Pn generator selected based on the inputs.
Line 3 shows the selected resgroup segment size ks, and number of 64-bit memory elements (ks / 64) for the segment array.
Line 4 shows the number of twinprime candidates for the number of resgroups spanning the inputs range. In the second example, (kmax – kmin + 1) = 3,330,004 resgroups x 1485 (number of P13 twinpairs) = 4,945,055,940 twinprime candidates.
Line 5 shows the number of twinpairs for the selected PG (here 1485 for P13) and the size of the nextp array, which shows the number of sieving primes used (6999 and 37493 for these examples).
Line 6 shows the time to select and generate Pn’s parameters and the sieve primes.
Line 7 announces when the residues sieve process starts.
Line 8 is a dynamic display showing in realtime how many twinpair threads are done, until finished.
Line 9 shows the runtime for the residues sieve.
Line 10 shows the combined setup and residues sieve times.
Line 11 shows how many resgroups were in the last segment slice and the number of segment slices.
Line 12 shows the number of twinprimes for the inputs range, and the value of the largest one.

Performance

The SSoZ performs optimally on multi-core systems with parallel operating threads. The more available threads the higher the possible performance. To show this, I provide data from two systems.

System 1: Intel i7-6700HQ, 2.6 – 3.5 GHz, 4C|8T, 16 MB, System76 Gazelle (2016) laptop.
System 2: AMD 5900HX, 3.3 – 4.6 GHz, 8C|16T, 16 MB, Lenovo Legion slim 7 (2022) laptop.

For a reference I used Primesieve 7.4 [5] – https://github.com/kimwalisch/primesieve – described as “a command-line program and C/C++ library for quickly generating prime numbers...using the segmented sieve of Eratosthenes with wheel factorization.” It’s a well maintained open source project of highly optimized C/C++ code libraries, which also takes inputs over the 64-bit range (but doesn’t produce results for cousin primes).

Below are sample outputs for the Rust version of twinprimes_ssoz and Primesieve performed on both systems.
Where two values are shown, the first is for System 1 and the second (after //) is for System 2.

    $ echo 378043979 1429172500581 | ./twinprimes_ssoz
    threads = 8 // 16
    using Prime Generator parameters for P13
    segment size = 802816 resgroups; seg array is [1 x 12544]
    twinprime candidates = 70654672440; resgroups = 47578904
    each of 1485 threads has nextp[2 x 92610] array
    setup time = 0.006171322 secs // 0.005839409 secs
    perform twinprimes ssoz sieve
    1485 of 1485 twinpairs done
    sieve time = 55.836745969 secs // 18.062863872 secs
    total time = 55.842928445 secs // 18.068715224 secs
    last segment = 212760 resgroups; segment slices = 60
    total twins = 2601278756; last twin = 1429172500572+/-1

    $ echo 378043979 14291725005819 | ./twinprimes_ssoz
    threads = 8 // 16
    using Prime Generator parameters for P17
    segment size = 1572864 resgroups; seg array is [1 x 24576]
    twinprime candidates = 623572052400; resgroups = 27994256
    each of 22275 threads has nextp[2 x 268695] array
    setup time = 0.036543755 secs // 0.025222812 secs
    perform twinprimes ssoz sieve
    22275 of 22275 twinpairs done
    sieve time = 675.667368646 secs // 235.003460103 secs
    total time = 675.703922948 secs // 235.027696883 secs
    last segment = 1255568 resgroups; segment slices = 18
    total twins = 22078408103; last twin = 14291725004982+/-1

    $ ./primesieve -c2 378043979 1429172500581
    Sieve size = 128 KiB // 256 KiB
    Threads = 8 // 16
    100%
    Seconds: 101.873 // 33.781
    Twin primes: 2601278756

    $ ./primesieve -c2 378043979 14291725005819
    Sieve size = 128 KiB // 256 KiB
    Threads = 8 // 16
    100%
    Seconds: 1218.502 // 471.776
    Twin primes: 22078408103

I implemented both the twins|cousins ssoz in the 6 programming languages listed here. Again, these are reference implementations, and are not necessarily optimum for each language. The Rust versions are the most optimized, and generally the fastest, as they perform the soz algorithm in parallel. The code for each is < 300 ploc (programming lines of code), which highlights the simplicity of the algorithm.
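The resgroup figures in the first sample run above (P13, inputs 378043979 and 1429172500581) can be reproduced from the formulas in set_sieve_parameters and twinprimes_ssoz. The following is a minimal Rust sketch under stated assumptions: the function names resgroup_range and segment_layout are hypothetical, the inputs are assumed to be already adjusted to odd values of at least 3, and modpg = 30030, pairscnt = 1485 and bn = 196 are the P13 values the selection tree gives for this range.

```rust
/// Resgroup bounds for an (odd) input range and generator modulus.
/// Returns (kmin, kmax, krange), following the formulas in set_sieve_parameters.
fn resgroup_range(start_num: u64, end_num: u64, modpg: u64) -> (u64, u64, u64) {
    let kmin = (start_num - 2) / modpg + 1; // number of resgroups to start_num
    let kmax = (end_num - 2) / modpg + 1; // number of resgroups to end_num
    (kmin, kmax, kmax - kmin + 1)
}

/// Segment size, number of segment slices, and size of the last slice.
/// Returns (ks, slices, kn) for a given resgroup count and size factor bn.
fn segment_layout(krange: u64, bn: u64) -> (u64, u64, u64) {
    let n = if krange < 37_500_000_000_000 {
        4
    } else if krange < 975_000_000_000_000 {
        6
    } else {
        8
    };
    let b = bn * 1024 * n;
    let ks = krange.min(b); // segment size in resgroups
    let slices = (krange - 1) / ks + 1;
    let kn = if krange % ks == 0 { ks } else { krange % ks };
    (ks, slices, kn)
}

fn main() {
    let modpg = 30030; // P13 modulus: 2 * 3 * 5 * 7 * 11 * 13
    let pairscnt = 1485; // number of P13 twin residue pairs
    let (kmin, kmax, krange) = resgroup_range(378_043_979, 1_429_172_500_581, modpg);
    let (ks, slices, kn) = segment_layout(krange, 196);
    println!("kmin = {}, kmax = {}, resgroups = {}", kmin, kmax, krange);
    println!("twinprime candidates = {}", krange * pairscnt);
    println!("segment size = {}; slices = {}; last segment = {}", ks, slices, kn);
}
```

The computed resgroups, candidates, segment size, slice count and last segment size agree with the corresponding lines of that sample output, which gives a quick sanity check when porting the parameter code to another language.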
The tables below show benchmark results for the 6 languages implementations, and Primesieve. They are the best times for both systems from multiple runs under different operating conditions. Their code was developed on System 1, and those binaries also run on System 2. Their source code was then compiled on System 2 to compare performance differences, and those were used for the benchmarks. The 6 languages, and their development environments and versions are: C++, Nim 1.6.4 (gcc 11.3.0), D (ldc2 1.28.0, LLVM 12.0.1), Crystal 1.4.1 (LLVM 10.0.0), Rust 1.60, and Go 1.18. They most likely can be improved, and I hope others will create more versions, especially for other compiled languages.

Twin Prime Benchmark Comparisons – Intel i7 6700HQ (times in seconds)

N        Rust    C++     D       Nim     Crystal Go      Prmsv   Twins Count     Largest in Range
1x10^10  0.35    0.45    0.46    0.53    0.48    0.61    0.51    27,412,679      9,999,999,703|-2
5x10^10  1.67    2.14    2.19    2.27    2.40    2.76    2.81    118,903,682     49,999,999,591|-2
1x10^11  3.41    4.24    4.31    4.34    4.69    5.51    5.91    224,376,048     99,999,999,763|-2
5x10^11  18.15   21.42   21.37   21.69   23.81   28.11   32.76   986,222,314     499,999,999,063|-2
1x10^12  37.67   44.48   44.25   44.71   49.05   58.08   69.25   1,870,585,220   999,999,999,961|-2
5x10^12  219.67  253.62  256.30  253.69  279.49  319.84  395.16  8,312,493,003   4,999,999,999,879|-2
1x10^13  482.51  543.74  542.23  541.35  602.63  678.61  825.71  15,834,664,872  9,999,999,998,491|-2

Cousin Prime Benchmark Comparisons – Intel i7 6700HQ (times in seconds)

N        Rust    C++     D       Nim     Crystal Go      Cousins Count   Largest in Range
1x10^10  0.36    0.45    0.46    0.53    0.48    0.62    27,409,998      9,999,999,707|-4
5x10^10  1.69    2.11    2.18    2.26    2.41    2.81    118,908,265     49,999,999,961|-4
1x10^11  3.35    4.20    4.46    4.32    4.64    5.52    224,373,159     99,999,999,947|-4
5x10^11  18.08   21.34   21.35   21.76   23.36   28.21   986,220,867     499,999,999,901|-4
1x10^12  37.17   44.57   44.44   44.51   49.14   58.25   1,870,585,457   999,999,998,867|-4
5x10^12  220.05  250.63  251.86  252.18  278.76  320.15  8,312,532,286   4,999,999,999,877|-4
1x10^13  478.96  534.17  541.85  540.81  597.89  678.48  15,834,656,001  9,999,999,999,083|-4

Twin Prime Benchmark Comparisons – AMD Ryzen 9 5900HX (times in seconds)

N        Rust    C++     D       Nim     Crystal Go      Prmsv   Twins Count     Largest in Range
1x10^10  0.12    0.12    0.12    0.19    0.13    0.15    0.16    27,412,679      9,999,999,703|-2
5x10^10  0.54    0.49    0.58    0.59    0.66    0.67    0.92    118,903,682     49,999,999,591|-2
1x10^11  1.12    0.97    1.13    1.08    1.23    1.32    1.95    224,376,048     99,999,999,763|-2
5x10^11  5.85    4.88    5.75    5.22    6.22    6.92    11.17   986,222,314     499,999,999,063|-2
1x10^12  12.14   10.03   12.01   11.12   13.06   14.61   23.71   1,870,585,220   999,999,999,961|-2
5x10^12  68.04   65.41   69.24   73.54   74.29   81.23   132.99  8,312,493,003   4,999,999,999,879|-2
1x10^13  145.01  155.45  156.57  172.68  170.77  185.25  307.78  15,834,664,872  9,999,999,998,491|-2

Cousin Prime Benchmark Comparisons – AMD Ryzen 9 5900HX (times in seconds)

N        Rust    C++     D       Nim     Crystal Go      Cousins Count   Largest in Range
1x10^10  0.12    0.11    0.13    0.19    0.13    0.15    27,409,998      9,999,999,707|-4
5x10^10  0.55    0.49    0.57    0.59    0.63    0.66    118,908,265     49,999,999,961|-4
1x10^11  1.12    0.96    1.13    1.07    1.22    1.32    224,373,159     99,999,999,947|-4
5x10^11  5.87    4.89    5.78    5.25    6.18    6.92    986,220,867     499,999,999,901|-4
1x10^12  12.25   10.14   12.14   11.06   12.56   14.67   1,870,585,457   999,999,998,867|-4
5x10^12  67.69   68.51   68.74   74.68   74.86   80.29   8,312,532,286   4,999,999,999,877|-4
1x10^13  145.02  157.68  156.01  173.16  170.06  179.07  15,834,656,001  9,999,999,999,083|-4

Enhanced Configurations

The software provided is designed to work on readily available 64-bit systems, and serve as reference implementations, to demonstrate how Prime Generators can be used to efficiently identify and count primes. They can be enhanced to take advantage of more hardware resources when available.

Ideally we want to use as many system threads as possible. So for P5, which has 3 twin|cousin residue pairs, instead of using 3 threads over an input range it may be faster to divide the range into 2 equal parts and use 6 threads (3 for each half).
Even if a system has only 4 threads, this may be faster as the range increases, but should definitely be faster (for sufficiently large ranges) if a system has 6 or more threads. In fact, if a system has at least 16 threads, using P7 (15 residue pairs) as the default generator for small ranges may be more efficient than P5, as they all can run in 1 parallel threads time (ptt).

Thus a more sophisticated algorithm can be devised for set_sieve_parameters to use threads count, and also cache|memory sizes, to pick the best generator and segment size for given input ranges. For best performance this would require the profiling of targeted hardware system(s), to optimize the differences between cpus and systems capabilities and resources. However, I think the algorithm would still be fairly simple to code, to dynamically compute these parameters to achieve higher performance.

Eliminating Sieving Primes

As the value for end_num becomes larger, more|bigger sieve primes must be generated, and filtered out or kept. Generating them takes increasing time with increasing input values. This also affects the time to perform the residue sieve, by increasing the time (and memory) to create the nextp array, and use it. While it’s possible to use stored lists of primes to eliminate dynamically generating them, this doesn’t get around creating nextp with them, with the associated memory issues for it in each thread.

One simple way around this is to use a fast primality test algorithm to check each residue pair pc value in each resgroup in the threads. If one value isn’t prime the other doesn’t have to be checked. By using sufficiently large generators for a given input range, the number of resgroups over a range can be made arbitrarily small to reduce the number of primality tests to perform. For example for P47, modp47 = 614,889,782,588,491,410 is the largest primorial value that can fit into (unsigned) 64-bits.
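The P47 modulus quoted above can be checked with overflow-aware arithmetic. This is a minimal Rust sketch, with the hypothetical function name largest_u64_primorial; it multiplies successive primes until the product would overflow a u64, and also accumulates the product of (p - 2) over the odd primes used, which is the residue pair count formula the paper writes as (prime-2)#.

```rust
/// Find the largest primorial that fits in a u64.
/// Returns (last prime used, primorial value, twin residue pair count),
/// where the pair count is the product of (p - 2) over the odd primes used.
fn largest_u64_primorial() -> (u64, u64, u64) {
    let primes: [u64; 16] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53];
    let (mut modpn, mut last, mut pairs) = (1u64, 1u64, 1u64);
    for p in primes {
        match modpn.checked_mul(p) {
            Some(m) => {
                modpn = m;
                last = p;
                if p > 2 {
                    pairs *= p - 2;
                }
            }
            None => break, // the next primorial no longer fits in 64 bits
        }
    }
    (last, modpn, pairs)
}

fn main() {
    let (last, modpn, pairs) = largest_u64_primorial();
    println!("largest u64 primorial is {}# = {}", last, modpn);
    println!("twin residue pairs = {}", pairs);
    // Each pair holds 2 residues, so this is the share of the number space used.
    println!("share = {:.1}%", 200.0 * pairs as f64 / modpn as f64);
}
```

The loop stops at 53 because 47# multiplied by 53 exceeds 2^64 - 1, so 47# is the last value returned.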
Its 15,681,106,801,985,625 residue pairs use 5.1% of the number space to hold the twin|cousin primes > 47.

Eliminating the use of sieving primes greatly reduces the work of the algorithm. Realizable machines to perform this would use as many parallel compute engines as possible, but each would now be much simpler, eliminating sozpg and nextp_init. Now gen_pg_parameters just identifies the residue pair values (and no longer their inverses), needing only a (fast) gcd function. This could be done with massive arrays of graphic processing units (GPUs), or better, Simple Super Computers (SSCs).

To search for yet undiscovered million digit primes, a distributed network can be constructed, similar to that for the Grand Internet Mersenne Prime Search (GIMPS) [7] and Twin Primes Search [8]. A benefit of creating this network is that with all the available (free) compute power in the world, groups of residue pairs can be dedicated to machine clusters and run full time, and deterministically identify new twins|cousins (thus two primes for the price of one) forever, as there are an infinity of each [3], [4].

The Ultimate Primes Search Machine

Using just a few basic properties of Prime Generator Theory (PGT) we can construct a conceptually simpler and more efficient machine to find as many primes as physical reality and time will allow. Because for any Pn, modpn = pm# (primorial of first m primes), r0 = pm+1 (the next prime after pm), and the residues from r0 to r0^2 are consecutive primes, we don’t have to do primality tests for them, but merely gcd tests to determine which values are coprime to modpn. Thus we can arbitrarily use any prime as r0 of a Pn whose modpn is the primorial of all the primes < r0, to directly find the consecutive primes in [r0, r0^2). After finding the new additional primes, we can then create a larger Pn modulus with them, and repeat the process, to continually find more primes.
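The gcd-only search described above can be illustrated on a small case. This is a minimal Rust sketch under stated assumptions: the function name primes_r0_to_r0_squared is hypothetical, the primorial is kept in a single u64 (so it only works for small r0), and a plain Euclidean gcd is used rather than the customized gcd in the Crystal source. For r0 = 11 the modulus is 2 * 3 * 5 * 7 = 210, and every value in [11, 121) coprime to 210 is prime.

```rust
/// Plain Euclidean gcd.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Primes in [r0, r0^2), found as the values coprime to modpn,
/// where modpn is the product of all the primes below r0.
/// Assumes `smaller_primes` holds exactly those primes and modpn fits in a u64.
fn primes_r0_to_r0_squared(smaller_primes: &[u64], r0: u64) -> Vec<u64> {
    let modpn: u64 = smaller_primes.iter().product();
    (r0..r0 * r0).filter(|&n| gcd(n, modpn) == 1).collect()
}

fn main() {
    let found = primes_r0_to_r0_squared(&[2, 3, 5, 7], 11);
    println!("{} primes in [11, 121): {:?}", found.len(), found);
}
```

This works because any composite below r0^2 must have a prime factor below r0, and that factor divides modpn. The newly found primes could then be multiplied into a larger modulus for the next round, as the text describes.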
[Figure: Primes r0 to r0^2. Y-axis: Number of Primes; X-axis: Number of Pn Primorial Primes.]

This graph shows the number of consecutive primes in the regions [r0, r0^2) for generator moduli made with the first 100 primes. Thus for the last data point for p100 = 541, from r0 = 547 to r0^2 = 299,209 there are 25,836 primes|residues, and we now know the first 25,936 primes, with 299,197 the largest prime.

Using this approach we no longer have to even identify the residue pairs, but just maintain and use the growing modulus values to perform the gcd operations with. The key here is to do the gcd operations on chunks of partial primorial values as we identify more primes and not one humongous pm# value. Thus as we identify new primes, we make partial primorial chunks with them. To check if a value is a residue we perform repeated gcd tests with all the partial primorial chunks. If the gcd with any partial chunk is not 1 (not coprime) then that rc value isn’t a residue and we can stop testing it. Only rc values that pass all the partial chunks tests (done in parallel) are residues to the full modpn value, and thus are new primes.

The main job for this machine would be to control the creation, distribution, and storage of the gcd operations, and their results, performed by a distributed network of compute engines. For each range [r0, r0^2) it would use the PGS for some smaller Pn (e.g. P3’s PGS in the code, to reduce the residues candidates search space to 1/3 of the range values) and distribute the rcs for testing. After creating a list of new consecutive primes, it can be processed to identify new primes or k-tuples of any type.

Source Code

The SSoZ is a good algorithm to assess hardware and software multi-threading capabilities. It’s very simple mathematically, needing only basic computational functions which most languages have, and which are easy to implement if they don’t.
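As an illustration of how small such a helper function can be, here is a minimal Rust sketch of a modular inverse. It is an assumption-labeled example, not a port of the Crystal modinv in the source listing: it uses the standard extended Euclidean algorithm with signed intermediates, and returns None when no inverse exists. The P5 residues and modulus 30 are taken from the paper.

```rust
/// Modular inverse of a modulo m via the extended Euclidean algorithm.
/// Returns None when gcd(a, m) != 1, i.e. when a is not a residue of m.
/// Assumes m fits in an i64.
fn modinv(a: u64, m: u64) -> Option<u64> {
    let (mut r0, mut r1) = (m as i64, (a % m) as i64);
    let (mut t0, mut t1) = (0i64, 1i64);
    while r1 != 0 {
        let q = r0 / r1;
        let r2 = r0 - q * r1;
        r0 = r1;
        r1 = r2;
        let t2 = t0 - q * t1;
        t0 = t1;
        t1 = t2;
    }
    if r0 != 1 {
        return None; // a and m share a factor, so no inverse exists
    }
    Some(t0.rem_euclid(m as i64) as u64)
}

fn main() {
    // P5's residues, modulus 30.
    let residues: [u64; 8] = [7, 11, 13, 17, 19, 23, 29, 31];
    for r in residues {
        println!("{:>2}^-1 mod 30 = {:?}", r, modinv(r, 30));
    }
}
```

For P5's residues this yields the inverses 13, 11, 7, 23, 19, 17, 29 and 1, the same values the paper lists for P5.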
The implementations I provide should be considered as references and not necessarily optimum for each language. They should be considered as starting points to improve upon, as they, most importantly, produce correct results that other implementations can check results against. The code source files can be found here [6]: https://gist.github.com/jzakiya, and individually below.

twinprimes_ssoz
Crystal – https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4
Rust – https://gist.github.com/jzakiya/b96b0b70cf377dfd8feb3f35eb437225
Nim – https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
C++ – https://gist.github.com/jzakiya/fa76c664c9072ddb51599983be175a3f
Go – https://gist.github.com/jzakiya/fbc77b8fdd12b0581a0ff7c2476373d9
D – https://gist.github.com/jzakiya/ae93bfa03dbc8b25ccc7f97ff8ad0f61

cousinprimes_ssoz
Crystal – https://gist.github.com/jzakiya/0d6987ee00f3708d6cfd6daee9920bd7
Rust – https://gist.github.com/jzakiya/8879c0f4dfda543eaf92a3186de554d7
Nim – https://gist.github.com/jzakiya/e2fa7211b52a4aa34a4de932010eac69
C++ – https://gist.github.com/jzakiya/3799bd8604bdcba34df5c79aae6e55ac
Go – https://gist.github.com/jzakiya/0ea756a8f6fd09f56cd9374d0dcf4197
D – https://gist.github.com/jzakiya/147747d391b5b0432c7967dd17dae124

Conclusion

Prime Generators allow for the creation of efficient, simple, and resource sparse generic algorithms that can be performed with any Pn generator. Generators can dynamically be chosen to optimize speed and memory use for given number ranges, to best use the hardware and software resources available. The SSoZ algorithms are inherently implementable in parallel, and can be performed on any hardware or distributed system that provides multiple cores or compute engines. As shown, the more cores and threads that are available to use the higher the inherent performance will be for a given number range.
While the code to generate Twin and Cousin primes was shown here, the basic math and principles explaining the process for them can be applied similarly to find other k-tuples, and other specific prime types, such as Mersenne Primes [2]. It is hoped this detailed explanation of how the SSoZ works and performs will encourage its use in applied applications, and its inclusion in software libraries, et al, that are used in the study of primes.

References

[1] The Segmented Sieve of Zakiya (SSoZ)
https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ
[2] The Use of Prime Generators to Implement Fast Twin Prime Sieve of Zakiya (SoZ), Applications to Number Theory and Implications for the Riemann Hypotheses
https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_for_the_Riemann_Hypotheses
[3] On The Infinity of Twin Primes and other K-tuples
https://www.academia.edu/41024027/On_The_Infinity_of_Twin_Primes_and_other_K_tuples
[4] (Simplest) Proof of Twin Primes and Polignacs’ Conjectures (video):
https://www.youtube.com/watch?v=HCUiPknHtfY&t=940s
[5] Primesieve – https://github.com/kimwalisch/primesieve
[6] Twins|Cousins SSoZ software language source files: https://gist.github.com/jzakiya
[7] Grand Internet Mersenne Primes Search (GIMPS) – https://www.mersenne.org/
[8] Twins Primes Search – https://primes.utm.edu/bios/page.php?id=949

# This Crystal source file is a multiple threaded implementation to perform an
# extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.

# Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
# Output is the number of twin primes <= N, or in range N1 to N2; the last
# twin prime value for the range; and the total time of execution.

# This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
# 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory.
# Parameter tuning probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

# Compile as: $ crystal build twinprimes_ssozgist.cr -Dpreview_mt --release
# To reduce binary size do: $ strip twinprimes_ssoz
# Thread workers default to 4, set to system max for optimum performance.
# Single val: $ CRYSTAL_WORKERS=8 ./twinprimes_ssoz val1
# Range vals: $ CRYSTAL_WORKERS=8 ./twinprimes_ssoz val1 val2

# Mathematical and technical basis for implementation are explained here:
# https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_for_the_Riemann_Hypotheses
# https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_
# https://www.academia.edu/19786419/PRIMES-UTILS_HANDBOOK

# This source code, and its updates, can be found here:
# https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4

# This code is provided free and subject to copyright and terms of the
# GNU General Public License Version 3, GPLv3, or greater.
# License copy/terms are here: http://www.gnu.org/licenses/

# Copyright (c) 2017-2022; Jabari Zakiya -- jzakiya at gmail dot com
# Last update: 2022/05/22

# Customized gcd for prime generators; n > m; m odd
def gcd(m, n)
  while m|1 != 1; t = m; m = n % m; n = t end
  m
end

# Compute modular inverse a^-1 to base m, e.g. a*(a^-1) mod m = 1
def modinv(a0, m0)
  return 1 if m0 == 1
  a, m = a0, m0
  x0, inv = 0, 1
  while a > 1
    inv -= (a // m) * x0
    a, m = m, a % m
    x0, inv = inv, x0
  end
  inv += m0 if inv < 0
  inv
end

def gen_pg_parameters(prime)
  # Create prime generator parameters for given Pn
  puts "using Prime Generator parameters for P#{prime}"
  primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
  modpg, res_0 = 1, 0                      # compute Pn's modulus and res_0 value
  primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm }

  restwins = [] of Int32                   # save upper twinpair residues here
  inverses = Array.new(modpg + 2, 0)       # save Pn's residues inverses here
  pc, inc, res = 5, 2, 0                   # use P3's PGS to generate pcs
  while pc < (modpg >> 1)                  # find PG's 1st half residues
    if gcd(pc, modpg) == 1                 # if pc a residue
      mc = modpg - pc                      # create its modular complement
      inverses[pc] = modinv(pc, modpg)     # save pc and mc inverses
      inverses[mc] = modinv(mc, modpg)     # if in twinpair save both hi residues
      restwins << pc << mc + 2 if res + 2 == pc
      res = pc                             # save current found residue
    end
    pc += inc; inc ^= 0b110                # create next P3 sequence pc: 5 7 11 13 17 19 ...
  end
  restwins.sort!; restwins << (modpg + 1)  # last residue is last hi_tp
  inverses[modpg + 1] = 1; inverses[modpg - 1] = modpg - 1  # last 2 residues are self inverses
  {modpg, res_0, restwins.size, restwins, inverses}
end

def set_sieve_parameters(start_num, end_num)
  # Select at runtime best PG and segment size parameters for input values.
  # These are good estimates derived from PG data profiling. Can be improved.
  nrange = end_num - start_num
  bn, pg = 0, 3
  if end_num < 49
    bn = 1; pg = 3
  elsif nrange < 77_000_000
    bn = 16; pg = 5
  elsif nrange < 1_100_000_000
    bn = 32; pg = 7
  elsif nrange < 35_500_000_000
    bn = 64; pg = 11
  elsif nrange < 14_000_000_000_000
    pg = 13
    if    nrange > 7_000_000_000_000; bn = 384
    elsif nrange > 2_500_000_000_000; bn = 320
    elsif nrange > 250_000_000_000;   bn = 196
    else  bn = 128
    end
  else
    bn = 384; pg = 17
  end
  modpg, res_0, pairscnt, restwins, resinvrs = gen_pg_parameters(pg)
  kmin = (start_num-2) // modpg + 1        # number of resgroups to start_num
  kmax = (end_num - 2) // modpg + 1        # number of resgroups to end_num
  krange = kmax - kmin + 1                 # number of resgroups in range, at least 1
  n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8)
  b = bn * 1024 * n                        # set seg size to optimize for selected PG
  ks = krange < b ? krange : b             # segments resgroups size
  puts "segment size = #{ks} resgroups for seg bitarray"
  maxpairs = krange * pairscnt             # maximum number of twinprime pcs
  puts "twinprime candidates = #{maxpairs}; resgroups = #{krange}"
  {modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs}
end

def sozpg(val, res_0, start_num, end_num)
  # Compute the primes r0..sqrt(input_num) and store in 'primes' array.
  # Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
  md, rscnt = 30u64, 8               # P5's modulus and residues count
  res  = [7,11,13,17,19,23,29,31]    # P5's residues
  bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

  kmax = (val - 2) // md + 1         # number of resgroups upto input value
  prms = Array(UInt8).new(kmax, 0)   # byte array of prime candidates, init '0'
  modk, r, k = 0, -1, 0              # initialize residue parameters

  loop do                            # for r0..sqrtN primes mark their multiples
    if (r += 1) == rscnt; r = 0; modk += md; k += 1 end  # resgroup parameters
    next if prms[k] & (1 << r) != 0  # skip pc if not prime
    prm_r = res[r]                   # if prime save its residue value
    prime = modk + prm_r             # numerate the prime value
    break if prime > Math.isqrt(val) # exit loop when it's > sqrtN
    res.each do |ri|                 # mark prime's multiples in prms
      kn,rn = (prm_r * ri - 2).divmod md  # cross-product resgroup|residue
      bit_r = bitn[rn]               # bit mask for prod's residue
      kpm = k * (prime + ri) + kn    # resgroup for 1st prime mult
      while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end
    end
  end
  # prms now contains the nonprime positions for the prime candidates r0..N
  # extract only primes that are in inputs range into array 'primes'
  primes = [] of UInt64                   # create empty dynamic array for primes
  prms.each_with_index do |resgroup, k|   # for each kth residue group
    res.each_with_index do |r_i, i|       # check for each ith residue in resgroup
      if resgroup & (1 << i) == 0         # if bit location a prime
        prime = md * k + r_i              # numerate its value, store if in range
        # check if prime has multiple in range, if so keep it, if not don't
        n, rem = start_num.divmod prime   # if rem 0 then start_num is multiple of prime
        primes << prime if (res_0 <= prime <= val) && (prime * (n + 1) <= end_num || rem == 0)
      end
    end
  end
  primes
end

def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
  # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
  # store consecutively as lo_tp|hi_tp pairs for their restracks.
  nextp = Slice(UInt64).new(primes.size*2)  # 1st mults array for twinpair
  r_hi, r_lo = rhi, rhi - 2                 # upper|lower twinpair residue values
  primes.each_with_index do |prime, j|      # for each prime r0..sqrt(N)
    k = (prime - 2) // modpg                # find the resgroup it's in
    r = (prime - 2) % modpg + 2             # and its residue value
    r_inv = resinvrs[r].to_u64              # and residue inverse
    rl = (r_inv * r_lo - 2) % modpg + 2     # compute r's ri for r_lo
    rh = (r_inv * r_hi - 2) % modpg + 2     # compute r's ri for r_hi
    kl = k * (prime + rl) + (r * rl - 2) // modpg  # kl 1st mult resgroup
    kh = k * (prime + rh) + (r * rh - 2) // modpg  # kh 1st mult resgroup
    kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)
    kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)
    nextp[j * 2]     = kl.to_u64            # prime's 1st mult lo_tp resgroup val in range
    nextp[j * 2 | 1] = kh.to_u64            # prime's 1st mult hi_tp resgroup val in range
  end
  nextp
end

def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
  # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
  # First create|init 'nextp' array of 1st prime mults for given twinpair,
  # stored consecutively in 'nextp', and init seg array for ks resgroups.
  # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
  # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
  # Return the last twinprime|sum for the range for this twinpair residues.
  s = 6                                  # shift value for 64 bits
  bmask = (1 << s) - 1                   # bitmask val for 64 bits
  sum, ki, kn = 0_u64, kmin-1, ks        # init these parameters
  hi_tp, k_max = 0_u64, kmax             # max twinprime|resgroup
  seg = Slice(UInt64).new(((ks - 1) >> s) + 1)          # seg array of ks resgroups
  ki += 1 if ((ki * modpg) + r_hi - 2) < start_num      # ensure lo tp in range
  k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num  # ensure hi tp in range
  nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs)  # init nextp array
  while ki < k_max                       # for ks size slices upto kmax
    kn = k_max - ki if ks > (k_max - ki) # adjust kn size for last seg
    primes.each_with_index do |prime, j| # for each prime r0..sqrt(N)
                                         # for lower twinpair residue track
      k = nextp.to_unsafe[j * 2]         # starting from this resgroup in seg
      while k < kn                       # mark primenth resgroup bits prime mults
        seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
        k += prime end                   # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2] = k - kn    # save 1st resgroup in next eligible seg
                                         # for upper twinpair residue track
      k = nextp.to_unsafe[j * 2 | 1]     # starting from this resgroup in seg
      while k < kn                       # mark primenth resgroup bits prime mults
        seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
        k += prime end                   # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2 | 1]= k - kn # save 1st resgroup in next eligible seg
    end
                                         # set as nonprime unused bits in last seg[n]
                                         # so fast, do for every seg[i]
    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
    cnt = 0                              # count the twinprimes in the segment
    seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount }
    if cnt > 0                           # if segment has twinprimes
      sum += cnt                         # add segment count to total range count
      upk = kn - 1                       # from end of seg, count back to largest tp
      while seg.to_unsafe[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
      hi_tp = ki + upk                   # set its full range resgroup value
    end
    ki += ks                             # set 1st resgroup val of next seg slice
    seg.fill(0) if ki < k_max            # set next seg to all primes if in range
  end                                    # when sieve done, numerate largest twin
                                         # for ranges w/o twins set largest to 1
  hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
  {hi_tp.to_u64, sum.to_u64}             # return largest twinprime|twins count
end

def twinprimes_ssoz()
  end_num   = {ARGV[0].to_u64, 3u64}.max
  start_num = ARGV.size > 1 ? {ARGV[1].to_u64, 3u64}.max : 3u64
  start_num, end_num = end_num, start_num if start_num > end_num

  start_num |= 1                 # if start_num even increase by 1
  end_num = (end_num - 1) | 1    # if end_num even decrease by 1
  start_num = end_num = 7 if end_num - start_num < 2

  puts "threads = #{System.cpu_count}"
  ts = Time.monotonic            # start timing sieve setup execution
                                 # select Pn, set sieving params for inputs
  modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs = set_sieve_parameters(start_num, end_num)
  # create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
  primes = end_num < 49 ? [5] : sozpg(Math.isqrt(end_num), res_0, start_num, end_num)
  puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"

  lo_range = restwins[0] - 3     # lo_range = lo_tp - 1
  twinscnt = 0_u64               # determine count of 1st 4 twins if in range for used Pn
  twinscnt += [3, 5, 11, 17].select { |tp| start_num <= tp <= lo_range }.size unless end_num == 3

  te = (Time.monotonic - ts).total_seconds.round(6)
  puts "setup time = #{te} secs" # display sieve setup time
  puts "perform twinprimes ssoz sieve"
  t1 = Time.monotonic            # start timing ssoz sieve execution

  cnts = Array(UInt64).new(pairscnt, 0)      # number of twinprimes found per thread
  lastwins = Array(UInt64).new(pairscnt, 0)  # largest twinprime val for each thread
  done = Channel(Nil).new(pairscnt)
  threadscnt = Atomic.new(0)                 # count of finished threads

  restwins.each_with_index do |r_hi, i|      # sieve twinpair restracks
    spawn do
      lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
      done.send(nil)
    end
  end
  pairscnt.times { done.receive }            #
wait for all threads to finish print "\r#{pairscnt} of #{pairscnt} twinpairs done" last_twin = lastwins.max # find largest hi_tp twinprime in range twinscnt += cnts.sum # compute number of twinprimes in range last_twin = 5 if end_num == 5 && twinscnt == 1 kn = krange % ks # set number of resgroups in last slice kn = ks if kn == 0 # if multiple of seg size set to seg size t2 = (Time.monotonic - t1).total_seconds # sieve execution time puts puts puts puts end "\nsieve time = #{t2.round(6)} secs" # ssoz sieve time "total time = #{(t2 + te).round(6)} secs" # setup + sieve time "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}" "total twins = #{twinscnt}; last twin = #{last_twin - 1}+/-1" twinprimes_ssoz 37
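The segment bookkeeping inside twins_sieve can be looked at in isolation. What follows is only a minimal Rust sketch under stated assumptions, not the program's code: the helper names mark_multiples and count_unmarked are hypothetical, and the segment is assumed to be a plain slice of 64-bit words with one bit per resgroup, where '0' means the resgroup is still a twin candidate.

```rust
/// Mark every `prime`-th resgroup bit in `seg`, starting at resgroup `k`,
/// for a segment holding `kn` resgroups. Returns the start offset to use
/// in the next segment (the role played by a 'nextp' entry).
/// Hypothetical helper, for illustration only.
fn mark_multiples(seg: &mut [u64], mut k: usize, prime: usize, kn: usize) -> usize {
    while k < kn {
        seg[k >> 6] |= 1u64 << (k & 63); // word index = k / 64, bit = k % 64
        k += prime;
    }
    k - kn
}

/// Set the unused high bits of the last word to '1' (nonprime), then count
/// the '0' bits left among the first `kn` resgroups with popcount.
/// Hypothetical helper, for illustration only.
fn count_unmarked(seg: &mut [u64], kn: usize) -> u32 {
    let last = (kn - 1) >> 6;
    seg[last] |= !1u64 << ((kn - 1) & 63); // bits above resgroup kn-1
    seg[..=last].iter().map(|m| (!m).count_ones()).sum()
}
```

For example, with a two-word segment of kn = 100 resgroups, marking from resgroup 1 in steps of 7 sets 15 bits, leaves 85 candidates, and hands back 6 as the starting resgroup for the next segment.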
Twin Primes Segmented Sieve of Zakiya (SSoZ) Explained
Jabari Zakiya © June 10, 2022
jzakiya@gmail.com

Introduction
In 2014 I released The Segmented Sieve of Zakiya (SSoZ) [1]. It described a general method to find primes using an efficient prime sieve based on Prime Generators (PG). I expanded upon it, and in 2018 I released The Use of Prime Generators to Implement Fast Twin Primes Sieve of Zakiya (SoZ), Applications to Number Theory, and Implications for the Riemann Hypotheses [2]. The algorithm has been improved and is now also used to find Cousin Primes. This paper explains in detail the what, why, and how of the algorithm and shows its implementation in 6 software languages, and performance data for these 6 languages run on 2 different cpu systems, with 8 and 16 threads.

General Description
The programs count the number of Twin|Cousin Primes between two numbers within a 64-bit range, i.e. 0 – 18,446,744,073,709,551,615 (2**64 – 1), and also return the largest twin|cousin value within it. The algorithm has no mathematical limits, but [hard|soft]ware does, so it's coded to run on commonly available 64-bit multi-core systems containing a reasonable amount of memory (the more the better). Below is a description of the major functional components of the algorithm and software.

Inputs Formatting
One or two values are entered (order doesn't matter) specifying the numerical range. They're converted to odd values, and|or defaults, after conditional checks.

Pn Selection and Parameterization
The inputs numerical range is used to select the Pn generator used to perform the residues sieve. Once determined, its generator parameters are created.

Sieve Primes Generation
The sieving primes ≤ sqrt(end_num) for the range are generated, but only those with multiples within the numerical range are used for the Pn generator.
Residues Sieves
In parallel for each twin|cousin residues pair for Pn, the sieve primes are used to create the nextp array of start locations for marking their multiples for each segment size the input numerical range is split into.

Outputs Collection and Display
The prime pairs count and largest value are collected for each residue pair thread, and their final greatest values displayed, along with timing data.

Math Fundamentals
Prime numbers do not exist randomly! When we break the number line into even sized groups of integers (the group numerical bandwidth and prime generator modulus value), the primes are evenly distributed along the residues in each group, i.e. the coprime values to the modulus (their greatest common divisor (gcd) with the modulus is 1). Thus a modulus, and its associated residues, form a Prime Generator (PG), a mathematical expression and framework for generating and identifying every prime that is not a prime factor of the modulus.

While a PG modulus can be any even number, the most efficient moduli are strictly prime primorials. These prime generators have the smallest ratios of (# of residues)/modulus, and so confine the primes to the smallest possible number space for a given number of residues. As more primes are used to form the PG moduli they systematically squeeze the primes into smaller and smaller number spaces.

The S|SoZ algorithms are based on the structure and framework of Prime Generators, whose math and properties are formalized in Prime Generator Theory (PGT). For an extensive review read [1], [2], [3] and see the video (Simplest) Proof of the Twin Primes and Polignac's Conjectures, https://www.youtube.com/watch?v=HCUiPknHtfY&t=940s [4]. Below is a list of the major properties of Prime Generators that comprise the mathematical foundation for the S|SoZ algorithms and code.
Major Properties of Prime Generators
• a prime generator has form: Pn = modpn * k + {r0 … rn}
• the modulus for the prime generator with last prime value pn has primorial form: modpn = pn#
• the number of residues is even, with count: rescntpn = (pn – 1)#
• the residues occur as modular complement pairs to its modulus: modpn = ri + rj
• the last two residues of a generator are constructed as: (modpn - 1) (modpn + 1)
• the residues, by definition, will include all the coprime primes < modpn
• the first residue r0 is the next prime > pn
• the residues from r0 to r0^2 are consecutive primes
• each generator has a characteristic Prime Generator Sequence (PGS) of even residue gaps
• the last 3 sequence gaps have form: (r0 - 1) 2 (r0 - 1)
• the gaps are distributed with a symmetric mirror image around a pivot gap size of 4
• the residue gaps sum from r0 to (r0 + modpn) equals the modulus: modpn = Σ a_i·2i
• the coefficients a_i are the frequency of each gap of size 2i
• the sum of the coefficients a_i equals the number of residues: rescntpn = Σ a_i
• coefficients a_1 = a_2 are odd and equal with form: a_1 = a_2 = (pn – 2)#
• the coefficients a_i are even for i > 2
• the number of nonzero coefficients a_i in a sequence for Pn is of order pn-1

Here (pn – 1)# and (pn – 2)# denote the products Π (p – 1) and Π (p – 2) over the primes p ≤ pn.

Residues have canonical form values (1...modpn-1), as 1 is always coprime to any modulus, but for coding|math efficiency their functional form values (r0…modpn+1) are used, with r0 defined above, and modpn+1 ≡ 1 mod modpn the permuted first congruent value for 1. Also, as the residues exist as modular complement pairs the code determines their first half values and their 2nd half values come for FREE.

To find the residues for some Pn, we can use the PGS of a smaller generator (in the code, P3) to reduce the number space of the residue candidates in larger moduli that need to be checked.

Shown here is the primes candidates (pcs) table for P5 up to the 100th prime 541.
It shows the only possible pc values that can be primes for 30 integer groupings. Each of the k columns is a residue group (resgroup) of prime candidates. The colored pc values are nonprime composites, and can be sieved out by the SoZ (Sieve of Zakiya), leaving only the prime values shown.

P5 = 30 * k + {7, 11, 13, 17, 19, 23, 29, 31}

k     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
r0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
r1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
r2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
r3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
r4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
r5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
r6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
r7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541
Table 1.

Every PG represents a pcs table like this, which visually displays all its properties. To identify all the Twin Primes we merely observe the residue pair values that differ by 2, (11, 13), (17, 19), (29, 31), and for Cousins those that differ by 4, (7, 11), (13, 17), (19, 23). These residue gaps form the basis for the Twins|Cousins SSoZ implementations, and other k-tuples of interest.

To find larger constellations of prime pairs, et al, we merely identify the residue pairs of desired size. For Sexy Primes (p, p+6), we just use the pairs (7, 13), (11, 17), (13, 19), (17, 23), (23, 29), (31, 37). Using them, we easily see and count there are 47 Sexy Primes (with [5:11]) within the first 100 primes. Larger generators have more residues and larger gaps and enable identifying more desired size k-tuples.

In my video [4], I define the residue gaps as the gaps between consecutive residues, and thus I refer to prime gaps as consecutive prime (2, n) tuples, where n is an even integer.
Thus in the video I state there are 25 Sexy Primes in the table above, i.e. 25 pairs of consecutive primes that differ by 6. However in the academic math world Sexy and Cousin primes are defined as any (2, 6) and (2, 4) tuple, thus [7:13] is a Sexy Prime even though we see 11 is between them. So [5:11] is defined as the first Sexy Prime and [3:7] the first Cousin, and [3:103] would be the first (2, 100) tuple, i.e. 2 primes that differ by 100.

However, if you want to know and understand the true distribution of primes, what you want to know is the distribution of the gaps between consecutive primes, which I'll define as prime gap kpg-tuples. So the actual first (2, 100) kpg-tuple is [396,733: 396,833], a very big difference. It's the kpg-tuples that inform you where the prime deserts are (long number stretches without primes), and that characterize the true average thinning (density) of primes as the integers grow larger. And as shown and explained in [3] and [4], there are an infinity of consecutive prime gaps of any even size.

Thus the PGS for the Pn's provides a deterministic floor (minimum) value of the number of kpg-tuples of any size, and their prime values, over any range of numbers, which we can (in theory) create an SSoZ residues sieve to identify and count.

Shown here are the PG parameters for the first 9 Pn generators P2 – P23, where modpn = pn#. Here pn = p_m is the prime value of the mth prime, thus: p2 = p_1, p3 = p_2, p5 = p_3, p7 = p_4, etc.
Pn's modulus value modpn:   (pn - 0)# = Π (pn - 0) = (2 - 0) * (3 - 0) * (5 - 0) … * (pm - 0)
Number of residues rescnt:  (pn - 1)# = Π (pn - 1) = (2 - 1) * (3 - 1) * (5 - 1) … * (pm - 1)
# of twins|cousins pairscnt: (pn - 2)# = Π (pn - 2) = (2 - 2) * (3 - 2) * (5 - 2) … * (pm - 2)

For P23 modulus:       modp23   = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 = 223092870
For P23 residues:      rescount = 1 * 2 * 4 * 6 * 10 * 12 * 16 * 18 * 22 = 36495360
For P23 twins|cousin:  pairs    = 1 * 1 * 3 * 5 *  9 * 11 * 15 * 17 * 21 = 7952175

The primes number space % is:   (rescntpn/modpn) * 100 = ((pn - 1)# / pn#) * 100
The pairscnt number space % is: (pairscntpn*2/modpn) * 100 = ((pn - 2)# / pn#) * 200

Pn     modulus      residues count   twins|cousins   primes %        pairs %
       (modpg)      (rescnt)         pairscnt        number space    number space
P2             2           1                0        50.00           50.00
P3             6           2                1        33.33           33.33
P5            30           8                3        26.67           20.00
P7           210          48               15        22.86           14.29
P11         2310         480              135        20.78           11.69
P13        30030        5760             1485        19.18            9.89
P17       510510       92160            22275        18.05            8.73
P19      9699690     1658880           378675        17.10            7.81
P23    223092870    36495360          7952175        16.36            7.13
Table 2.

As the Pn primorial primes pm increase, the number space containing primes and twins|cousins steadily decreases, and can be made an arbitrarily small value ε > 0 of the total number space as m→∞.

[Figure: Primes Number Space. Number Space % (primes and pairs curves) versus Number of Pn Primorial Primes, 1 to 100.]

This graph shows the decreasing prime number space for Pn using the first 100 primes. Once past the knee of the curve, the differential change becomes smaller for each additional pm. For many common use cases we can effectively limit usable Pn generators to the first 10 primes or so. However, for prime searches in large number values ranges, using the largest generator possible for a system is desirable, to make the maximum searchable number space as small as possible.

Generating Sieve Primes
The SSoZ uses the necessary sieving primes ≤ sqrt(end_num) (i.e.
only those with multiples within the inputs range) to sieve out their nonprime multiples. An efficiently coded P5 Sieve of Zakiya (SoZ) generates them at runtime (though other means can be used). Below is its algorithm.

SoZ Algorithm
To find all the primes ≤ N = sqrt(end_num)
1. for Prime Generator P5, create its generator parameters
2. determine kmax, the number of residue groups (resgroups) up to N
3. create byte array prms[kmax] to represent the value|residue of each resgroup pc
4. perform outer sieve loop:
   • starting from the first resgroup, determine where each pc bit location is prime
   • if a bit location is a prime, keep its residue value in prm_r; numerate its prime value
   • exit loop when prime > sqrt(N)
5. perform inner sieve loop with each residue ri:
   • create cross-product (prm_r * ri)
   • determine the resgroup kn it's in, and its residue rn
   • compute first prime multiple resgroup kpm for the prime with ri
   • mark in prms each primenth kpm resgroup bitn[rn] as non-prime until its end
6. repeat from 4 for next resgroup
7. when sieve ends, numerate|store from each prms resgroup the needed sieving primes ≤ N

P5's primes candidates (pcs) table up to 541 (the 100th prime) is shown below.

k     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
rt0   7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
rt1  11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
rt2  13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
rt3  17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
rt4  19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
rt5  23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
rt6  29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
rt7  31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

The function sozpg performs the P5 sieve exactly as shown.
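Step 1 of the algorithm, creating the generator parameters, can be illustrated with a small Rust sketch. It is not the paper's code: the function names residues and pairs_count are hypothetical, and the residues are found by a brute-force gcd test rather than by the PGS-based method the text mentions. It derives the functional-form residues (r0 … modpg+1) of a modulus and counts the residue pairs that differ by a given gap.

```rust
/// Greatest common divisor by Euclid's algorithm.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Functional-form residues of `modpg`: the values in 3..=modpg+1 that are
/// coprime to the modulus (hypothetical helper, brute force for clarity).
fn residues(modpg: u64) -> Vec<u64> {
    (3..=modpg + 1).filter(|&r| gcd(r, modpg) == 1).collect()
}

/// Number of residue pairs (r, r + gap) in one resgroup, e.g. gap = 2 for
/// twins and gap = 4 for cousins (hypothetical helper).
fn pairs_count(res: &[u64], gap: u64) -> usize {
    res.iter().filter(|&&r| res.contains(&(r + gap))).count()
}
```

Calling residues(30) gives the eight P5 row values 7, 11, 13, 17, 19, 23, 29, 31, and pairs_count on them returns 3 for both gap 2 and gap 4, matching the P5 row of Table 2; for modulus 210 the same calls give 48 residues and 15 twin pairs.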
An array prms of kmax bytes is created to represent each resgroup|column of 8 pc values|rows up to the resgroup that covers the input value. Each row represents a residue value|bit position|residue track. prms is initialized to '0' to make all bit positions be prime. The sieve computes for each prime ≤ sqrt(N) its first prime multiple resgroup kpm on each row, and starting from these, sets each primenth resgroup bit on each row to '1', to mark its multiples (colors), to eliminate the nonprimes. The process is explained in greater detail as follows.

Performing SoZ Sieve
To sieve the nonprimes from P5's pcs table up to 541 we use the primes ≤ isqrt(541) = 23. They are the first 6 primes|residues: 7, 11, 13, 17, 19, 23, whose first unique multiples are shown with 6 different colors. The value 541 resides in residue group k=17, so kmax=18 is the number of resgroups up to it.

Starting with the first prime in resgroup k=0, 7 multiplies each pc in the resgroup, whose multiples are in blue: 7 * [7, 11, 13, 17, 19, 23, 29, 31] = [49, 77, 91, 119, 133, 161, 203, 217]. Each 7th resgroup|col along each restrack|row from these start values are 7's multiples. Thus 7 * 7 = 49 in resgroup k=1, on rt4|r=19, is 7's first multiple. Every 7th resgroup starting there (k=1, 8, 15) < kmax on rt4 is a multiple of 7 and set to '1' to mark as nonprime. We repeat for 7's other first multiples 77, 91, etc, on their rows.

We then use the next prime location in resgroup k=0 after 7, which is 11, and repeat the process with it: 11 * [7, 11, 13, 17, 19, 23, 29, 31] = [77, 121, 143, 187, 209, 253, 319, 341], whose first unique multiples are red. Note, the first unique multiple for each prime is its square, which for 11 is 121. The first multiples with smaller primes, e.g. 11 * 7 = 77, are colored with those primes' colors (here 7|blue).

Also note, each prime must multiply each member in its resgroup, whether prime or not, to map its starting first prime multiple onto each distinct row in some kpm resgroup.
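The marking step for a single prime and a single row can be traced with a short Rust sketch. It is illustrative only and makes two assumptions that are stated here and in the comments: the function name multiples_on_row is hypothetical, and the prime is taken from resgroup k = 0 (as 7 and 11 are in the walk-through), so the product of the prime and a resgroup member directly gives the first multiple.

```rust
const MODPG: usize = 30;
const RES: [usize; 8] = [7, 11, 13, 17, 19, 23, 29, 31];

/// For a `prime` in resgroup 0 and a resgroup member `ri`, return the
/// restrack index of prime * ri and every resgroup < kmax on that restrack
/// holding a multiple of `prime`. Hypothetical helper; assumes k = 0.
fn multiples_on_row(prime: usize, ri: usize, kmax: usize) -> (usize, Vec<usize>) {
    let prod = prime * ri - 2; // 2 is subtracted so residue 31 stays in its resgroup
    let residue = prod % MODPG + 2; // true residue of the product
    let rt = RES.iter().position(|&r| r == residue).unwrap();
    let first_k = prod / MODPG; // resgroup of the first multiple
    (rt, (first_k..kmax).step_by(prime).collect())
}
```

With kmax = 18, multiples_on_row(7, 7, 18) returns restrack 4 and resgroups 1, 8, 15, the same cells named in the text for 49, 259 and 469; multiples_on_row(11, 11, 18) returns restrack 7 and resgroups 3 and 14.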
As shown, this process is very simple and fast, and we can perform the multiplications very efficiently. We can also perform the sieve and primes extraction process in parallel, making it even faster.

Extracting Sieve Primes
To extract the primes from prms in sequential order, we start at resgroup k=0 and iterate over each byte bit, then continue with each successive byte. A '0' bit position represents a prime value in each byte, and if '1' we skip to the next bit. The prime values are numerated as: prime = modpg * k + ri, with k the resgroup index, ri the residue for the bit position, and modpg = 30 for P5's modulus.

Alternatively we can reverse the order, and for each bit row, iterate over each resgroup byte and find the primes along them. This may provide certain software computational advantages, but the primes will no longer be extracted in sequential order (though if necessary they could be sorted afterwards). For the purposes of the SSoZ algorithm, it's not necessary that the primes be used in sequential order.

To optimize performance of the SSoZ, during the prime sieve extraction process, primes which don't have multiples within the inputs range are discarded. This significantly increases SSoZ performance for small input ranges between large input numbers, by reducing the work the residues sieves do.

The algorithm described here is generic to all Pn generators, where only their parameters change for each. Implementations may vary based on hardware|software particulars, but the work performed is the same. Larger generators systematically reduce the primes number space, by having larger modulus sizes and more residues, but we generally want to pick the smallest Pn generator that optimizes the system resources for given input values and ranges.
For the implementations provided, whose inputs range is constrained to 64-bits, using P5 to perform the SoZ was the overall most efficient choice, as it's straightforward to code, and as we'll see, can also be done in parallel to increase its performance.

Efficient residue multiplications
To find the resgroup (column) for a pc value in the table we integer divide it by the PG modulus. To find its residue value, we find its integer remainder when dividing by the PG modulus. Thus each pc resgroup value has parameters: k = pc div modpg, with residue value: ri = pc mod modpg.

Multiplying two resgroup pcs e.g. (17 * 19) = 323 gives: k, ri = (17 * 19).divmod 30 –> k = 10, ri = 23. From P5's pc table, we see pc = 323 is in resgroup k=10 with residue 23 on restrack rt5.

Each prime can be parameterized by its residue r and resgroup k values e.g.: prime = modk + r, where modk = modpg * k, for each resgroup, and each resgroup pc_i has form: pc_i = modk + ri. Thus the multiplication (prime * pc_i) translates into the following parameterized form:

prime * pc_i = (modk + r) * (modk + ri)
             = modk * (modk + r + ri) + (r * ri)
             = modpg * [k * (prime + ri)] + (r * ri)

The original multiplication has now been transformed to the form: product = modpg * kk + rr, where kk = k * (prime + ri) and rr = r * ri, which also has the general form: pc = modpg * k + r.

The (r * ri) term represents the base residues (k = 0) cross products (which can be pre-computed). We extract from it its resgroup value: kn = (r * ri) / modpg, and residue: rn = (r * ri) % modpg, which maps to a restrack bit value as rt_n = residues.index(rn). Thus for P5, r = 7 is at residues[0], so that its rt_i row value is: i = residues.index(7) = 0, whose bit mask is: bit_r = 2^i = (1 << i) in the code.

Thus, the product of two members in resgroup k maps to a higher resgroup: kp = kk + kn on rt_n, comprised of two components: kn (their cross-product resgroup), and kk (their k resgroup component).
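The parameterized product can be checked numerically with a small Rust sketch. It is only an illustration under stated assumptions: the function names resgroup_residue and product_resgroup are hypothetical, P5 (modulus 30) is hard-coded, and 2 is subtracted before each division so that the functional residue 31 stays in its own resgroup, the same adjustment the sozpg code uses.

```rust
const MODPG: u64 = 30; // P5 modulus, assumed for this sketch

/// Resgroup k and functional residue (7..=31) of a P5 prime candidate.
/// Hypothetical helper for illustration.
fn resgroup_residue(pc: u64) -> (u64, u64) {
    ((pc - 2) / MODPG, (pc - 2) % MODPG + 2)
}

/// Resgroup and residue of prime * (modk + ri), where both factors lie in
/// the same resgroup k, using kp = k * (prime + ri) + (r * ri - 2) / modpg
/// instead of dividing the full product. Hypothetical helper.
fn product_resgroup(prime: u64, ri: u64) -> (u64, u64) {
    let (k, r) = resgroup_residue(prime);
    let cross = r * ri - 2; // base residues cross product, adjusted by 2
    let kp = k * (prime + ri) + cross / MODPG; // kk + kn
    let rp = cross % MODPG + 2; // true residue of the product
    (kp, rp)
}
```

For instance product_resgroup(17, 19) returns (10, 23), the cell of 323 in the pcs table, and product_resgroup(97, 19) returns (352, 13) for 97 * 109 = 10573 = 30 * 352 + 13, without ever forming the full product.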
To describe this verbally, to find the product resgroup kp of any two resgroup members, numerate one member (for us a prime, with residue r), add the other's residue ri to it, multiply their sum by the resgroup value k, then add the result to their residues cross-product resgroup. For (97 * 109), with k = 3, this gives:

Ex: kp = (97 * 109) / 30 = 3 * (97 + 19) + (7 * 19) / 30 = 3 * (109 + 7) + (19 * 7) / 30 = 352

For each Pn the last resgroup pc value is: (modpg + 1) ≡ 1 mod modpg, so for P5 it's modpg*k + 31. To ensure pc / modpg = k always produces the correct k value, 2 is subtracted before the division. Thus the resultant residue value is 2 less than the correct one, so 2 is added back to get the true value.

In sozpg: kn, rn = (prm * ri - 2).divmod md; kn is the correct resgroup and (rn + 2) the correct residue. The code sometimes uses rn without the addition when doing memory addressing. (In the code, the posn array performs the mapping at address (r – 2) into restrack rtn indices 0 – 7.)

Ex: (7 * 43) / 30 = 301 / 30 = 10, but 301 is the last pc in resgroup 9, so (301 – 2) / 30 = 9 is the correct value. Also 301 % 30 = 1, but 299 % 30 = 29, and when 2 is added we get the correct residue 31 for pc 301.

sozpg

def sozpg(val, res_0, start_num, end_num)
  # Compute the primes r0..sqrt(input_num) and store in 'primes' array.
  # Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
  md, rscnt = 30u64, 8                           # P5's modulus and residues count
  res  = [7,11,13,17,19,23,29,31]                # P5's residues
  bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

  kmax = (val - 2) // md + 1                     # number of resgroups upto input value
  prms = Array(UInt8).new(kmax, 0)               # byte array of prime candidates, init '0'
  modk, r, k = 0, -1, 0                          # initialize residue parameters

  loop do                                        # for r0..sqrtN primes mark their multiples
    if (r += 1) == rscnt; r = 0; modk += md; k += 1 end # resgroup parameters
    next if prms[k] & (1 << r) != 0              # skip pc if not prime
    prm_r = res[r]                               # if prime save its residue value
    prime = modk + prm_r                         # numerate the prime value
    break if prime > Math.isqrt(val)             # exit loop when it's > sqrtN
    res.each do |ri|                             # mark prime's multiples in prms
      kn,rn = (prm_r * ri - 2).divmod md         # cross-product resgroup|residue
      bit_r = bitn[rn]                           # bit mask for prod's residue
      kpm = k * (prime + ri) + kn                # resgroup for 1st prime mult
      while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end
    end
  end
  # prms now contains the nonprime positions for the prime candidates r0..N
  # extract only primes that are in inputs range into array 'primes'
  primes = [] of UInt64                          # create empty dynamic array for primes
  prms.each_with_index do |resgroup, k|          # for each kth residue group
    res.each_with_index do |r_i, i|              # check for each ith residue in resgroup
      if resgroup & (1 << i) == 0                # if bit location a prime
        prime = md * k + r_i                     # numerate its value, store if in range
        # check if prime has multiple in range, if so keep it, if not don't
        n, rem = start_num.divmod prime          # if rem 0 then start_num is multiple of prime
        primes << prime if (res_0 <= prime <= val) && (prime * (n + 1) <= end_num || rem == 0)
      end
    end
  end
  primes
end

Inputs:
  val        – integer value for sqrt(end_num)
  res_0      – first residue for selected SSoZ Pn
  end_num    – inputs high value
  start_num  – inputs low value
Output:
  primes     – array of sieving primes within inputs range

sozpg sieves the prime multiples ≤ val to create P5's pcs table held in byte array prms, as described. To extract only the necessary primes for the SSoZ it uses inputs: res_0, start_num, end_num.

res_0 is the first residue of the selected Pn for the SSoZ. For P5 it's 7, but when Pn is larger, e.g. P7, P11, P13 etc, their res_0 are greater, i.e. 11, 13, 17, etc, so only the primes ≥ res_0 are kept. The last byte prms[kmax-1] may also have bit positions for primes > val, which aren't needed and are discarded.

We thus perform two checks for each found prime, the first being: (res_0 <= prime <= val)

This filters out from P5's pcs table the primes outside the SSoZ inputs range for the selected Pn. The second check determines the primes with multiples within the SSoZ inputs range that are needed. For small input ranges, primes > the range size can be discarded if they don't have multiples within it. This is done by the check: (prime * (n + 1) <= end_num || rem == 0)

All the primes ≤ sqrt(end_num) are used if their values are ≤ range = (end_num – start_num). But if (end_num – start_num) < sqrt(end_num) some sieving primes may be discarded, i.e. when (end_num – sqrt(end_num)) < start_num some primes may not have multiples within the range.

Example: end_num = 4,000,000; sqrt(end_num) = 2,000
(end_num – sqrt(end_num)) < start_num
(4,000,000 – 2,000) < start_num
3,998,000 < start_num

If start_num ≤ 3,998,000, say 500,000, the input range is ≥ 1999, the largest prime less than 2000, and all the primes < sqrt(end_num) will have at least one multiple in the range, and must be used.

If start_num > 3,998,000, say 3,999,300, the primes < 700 (the input range) will have multiples in the range; 122 for P5. But some of the 178 primes between 700 < p < 2,000 will not, and can be discarded. The second test finds 103 are needed. So for P5 only 75% (225 of 300) of the primes < 2000 are used.

Described below is the process to determine if a prime p has at least one multiple in the inputs range.
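As a compact illustration of the two checks, here is a Rust sketch of the filter on its own. It is not the program's code: the names has_multiple_in_range and needed_primes are hypothetical, the candidate primes are assumed to be already known, and no guard is added against overflow of prime * (n + 1) for values near the top of the 64-bit range.

```rust
/// Second check: true if `prime` has at least one multiple within
/// start_num..=end_num. Hypothetical helper; no overflow guard.
fn has_multiple_in_range(prime: u64, start_num: u64, end_num: u64) -> bool {
    let (n, rem) = (start_num / prime, start_num % prime);
    // rem == 0: start_num itself is a multiple of prime;
    // otherwise prime * (n + 1) is the first multiple above start_num
    rem == 0 || prime * (n + 1) <= end_num
}

/// Apply both checks to a list of candidate sieving primes: keep those in
/// res_0..=val that also have a multiple in the inputs range.
/// Hypothetical helper.
fn needed_primes(primes: &[u64], res_0: u64, val: u64, start_num: u64, end_num: u64) -> Vec<u64> {
    primes
        .iter()
        .copied()
        .filter(|&p| res_0 <= p && p <= val && has_multiple_in_range(p, start_num, end_num))
        .collect()
}
```

For the tiny range 50..=55, needed_primes(&[7, 11, 13, 17, 19, 23], 7, 23, 50, 55) keeps only 11, 13 and 17 (multiples 55, 52, 51); 7, 19 and 23 have no multiple there and are dropped.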
               |–––––––– p ––––––––|
               |– rem –|
1p…2p…3p…..….np........|-----------+-----------------------|
                    start_num     np+p                  end_num

For a given prime value do: n = start_num // prime; rem = start_num % prime
In Crystal, et al, can just do: n, rem = start_num.divmod prime
Then do the following test: prime * (n + 1) <= end_num || rem == 0

Here, n*p + rem = start_num, where n is the number of prime's multiples ≤ start_num, i.e. np ≤ start_num. If rem is 0 then start_num is a multiple of p, otherwise 0 < rem < p. If p > start_num, n = 0.

Thus (n*p + p) = p*(n + 1) is the next multiple of p whose value is > start_num. If p*(n + 1) ≤ end_num, p is in range; if not, but rem = 0, then p*n = start_num, so p is in range.

Also, when performing: kn, rn = (prm_r * ri - 2).divmod md, rn's true value is reduced by 2, but we need to know its true residue bit position to mark the prime multiples for those bit positions.

Conceptually, given residue rn, its bit index is: posn[rn] = res.index(rn), for P5 a value from 0..7. Because the rn values are 2 less than their real values, (r – 2) for each true residue r is used as the address into the array posn used to map them, coded as: posn=[];(0..rscnt-1).each { |n| posn[res[n]-2] = n }

Then posn[7-2] = 0, posn[11-2] = 1, etc, and each rn bit value is: bit_r = 1 << posn[rn], which are OR'd into prms to mark the prime multiples as: prms[kpm] |= bit_r.

The shift values 2^i can be converted to their bit position values directly using array bitn[], e.g. now: bit_r = bitn[rn]

posn =[0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,3,0, 4,0,0,0, 5,0,0,0,0,0, 6,0,7]
bitn =[0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

In both cases byte arrays can be used to store the values, as they all can be represented by just 8 bits. This is an implementation detail to decide.

Because the processing of each row is independent from the others we can perform both the sieve and prime extraction processes in parallel. Below shows Rust code using the Rayon crate to do this.
use integer_sqrt::IntegerSquareRoot;             // provides integer_sqrt()
use rayon::prelude::*;                           // provides par_iter()
use std::sync::atomic::{AtomicU8, Ordering};

fn atomic_slice(slice: &mut [u8]) -> &[AtomicU8] {
  unsafe { &*(slice as *mut [u8] as *const [AtomicU8]) }
}

fn sozpg(val: usize, res_0: usize, start_num : usize, end_num : usize) -> Vec<usize> {
  // Compute the primes r0..sqrt(input_num) and store in 'primes' array.
  // Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
  let (md, rscnt) = (30, 8);                     // P5's modulus and residues count
  static RES: [usize; 8] = [7,11,13,17,19,23,29,31];
  static BITN: [u8; 30] = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128];

  let kmax = (val - 2) / md + 1;                 // number of resgroups upto input value
  let mut prms = vec![0u8; kmax];                // byte array of prime candidates, init '0'
  let sqrt_n = val.integer_sqrt();               // compute integer sqrt of val
  let (mut modk, mut r, mut k) = (0, 0, 0);

  loop {                                         // for r0..sqrtN primes mark their multiples
    if r == rscnt { r = 0; modk += md; k += 1 }
    if (prms[k] & (1 << r)) != 0 { r += 1; continue } // skip pc if not prime
    let prm_r = RES[r];                          // if prime save its residue value
    let prime = modk + prm_r;                    // numerate the prime value
    if prime > sqrt_n { break }                  // exit loop when it's > sqrtN
    let prms_atomic = atomic_slice(&mut prms);   // share mutable prms among threads
    RES.par_iter().for_each (|ri| {              // mark prime's multiples in prms in parallel
      let prod = prm_r * ri - 2;                 // compute cross-product for prm_r|ri pair
      let bit_r = BITN[prod % md];               // bit mask for prod's residue
      let mut kpm = k * (prime + ri) + prod / md; // 1st resgroup for prime mult
      while kpm < kmax { prms_atomic[kpm].fetch_or(bit_r, Ordering::Relaxed); kpm += prime; };
    });
    r += 1;
  }
  // prms now contains the nonprime positions for the prime candidates r0..N
  // numerate the primes on each bit row in prms in parallel (won't be in sequential order)
  // return only the primes necessary to do SSoZ for given inputs in array 'primes'
  let primes = RES.par_iter().enumerate().flat_map_iter( |(i, ri)| {
    prms.iter().enumerate().filter_map(move |(k, resgroup)| {
      if resgroup & (1 << i) == 0 {
        let prime = md * k + ri;
        let (n, rem) = (start_num / prime, start_num % prime);
        if (prime >= res_0 && prime <= val) && (prime * (n + 1) <= end_num || rem == 0) {
          return Some(prime);
        }
      }
      None
  }) }).collect();
  primes
}

Here the primes are extracted from each row in parallel using 8 threads, thus not kept in sequential order. Reversing the loops, as in the Crystal code, will extract them in order but will be slower as the number of resgroups increases. Since sequential order isn't necessary to do the SSoZ this is optimal.

For systems with more than 8 threads, using P7 with 48 residues may be faster, especially for large input values, if P7's smaller number space can be processed faster with those threads than using P5.

We can see the performance gain that's achieved in going from using all the sieving primes up to sqrt(end_num), to only using those with multiples within the inputs ranges, to then generating them in parallel in sozpg. The following examples using Rust show the three cases and the progressive performance increases.

This is the Rust output of the original unoptimized sozpg using these two 63-bit numbers as inputs. It shows (in nextp[2 x 129900044]) 129,900,044 sieving primes were generated, which accounted for most of the setup time. The times shown are for the i7 6700HQ 4C|8T and AMD 5900HX 8C|16T cpus.
$ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz157
threads = 8                                      // 16
using Prime Generator parameters for P5
segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
twinprime candidates = 675003; resgroups = 225001
each of 3 threads has nextp[2 x 129900044] array
setup time = 13.098702568 secs                   // 7.089318922 secs
perform twinprimes ssoz sieve
3 of 3 twinpairs done
sieve time = 9.731177018 secs                    // 4.944145598 secs
total time = 22.829885781 secs                   // 12.033471504 secs
last segment = 28393 resgroups; segment slices = 4
total twins = 4711; last twin = 7200011139999998808+/-1

These are the results from filtering out the unnecessary primes (no multiples in inputs range), using 49x fewer primes: 2,636,377. Though there's some setup time increase for 8 threads, there's a massive decrease in the sieve time, as each thread now does significantly less work (and uses less memory).

$ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz158
threads = 8                                      // 16
using Prime Generator parameters for P5
segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
twinprime candidates = 675003; resgroups = 225001
each of 3 threads has nextp[2 x 2636377] array
setup time = 13.743127493 secs                   // 6.987116498 secs
perform twinprimes ssoz sieve
3 of 3 twinpairs done
sieve time = 0.175270322 secs                    // 0.107544045 secs
total time = 13.918427314 secs                   // 7.094673324 secs
last segment = 28393 resgroups; segment slices = 4
total twins = 4711; last twin = 7200011139999998808+/-1

Finally, when sozpg performs the prime generation and filtering process in parallel the setup time drops from 13.7|6.9 to 5.3|4.7 secs, with a total time drop from 22.8|12.0 to ~5.5|4.9 secs.
    $ echo 7200011140000000000 7200011139993250000 | ./twinprimes_ssoz159
    threads = 8 // 16
    using Prime Generator parameters for P5
    segment size = 65536 resgroups; seg array is [1 x 1024] 64-bits
    twinprime candidates = 675003; resgroups = 225001
    each of 3 threads has nextp[2 x 2636377] array
    setup time = 5.296482074 secs // 4.74022821 secs
    perform twinprimes ssoz sieve
    3 of 3 twinpairs done
    sieve time = 0.180924203 secs // 0.116552963 secs
    total time = 5.477426691 secs // 4.856791579 secs
    last segment = 28393 resgroups; segment slices = 4
    total twins = 4711; last twin = 7200011139999998808+/-1

Constructing nextp

nextp is a table of the resgroups for the first prime multiples for the sieving primes along each restrack. From P5’s pcs table we can look at each row and create Table 3 of their first prime multiples resgroups.

    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533
    rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
    rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

Table 3. List of resgroup values for the first prime multiples – prime * (modk + ri) – for the primes shown.
    rt  res     7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73
     0    7     7   6   8   6   8  22  22   7  75  64  70  64 104 104  75 203 182 192
     1   11     5  11   7   7  18   5  18  11  65  83  67  67  65  96  83 185 215 187
     2   13     4   8  13  16   4   8  16  13  60  72  87  92  72  92  87 176 196 221
     3   17     2   2  12  17  14  14  12  17  50  50  84  95  86  84  95 158 158 216
     4   19     1  10   5   9  19  17  10  19  45  80  61  73  93  80  99 149 210 177
     5   23     6   4   4  10  10  23   6  23  72  58  58  76 107  72 107 198 172 172
     6   29     3   6   9   3   6   9  29  29  57  66  75  57  75 119 119 171 186 201
     7   31     2   3   2  12  11  12  27  31  52  55  52  82  82 115 123 162 167 162

Note on each row, when two primes have the same resgroup table value they were multiplied together. When a value occurs only once, it’s either for a prime square, or a (prime * nonprime) value. Also, for a prime in any resgroup k in P5’s pcs table, its first prime multiple on its own row is just:

    prime * (k + 1) + k

For P5’s pcs table this is equivalent to:

    k * (prime + 31) + ((prm_r * 31) - 2) / modpg

(This is a property for every pc member in a resgroup for every Pn, for its first multiple on its row.)

To construct Table 3, each prime in P5’s pcs table multiplies each resgroup member, whose products are other table values. Their row|col cell locations are entries into nextp. Thus starting with first prime 7:

    7 * [7, 11, 13, 17, 19, 23, 29, 31] = [49, 77, 91, 119, 133, 161, 203, 217]

We see in P5’s pcs table, 49 occurs in resgroup k=1 for residue value 19, which is residue track 4 (rt4). Similarly for the remaining multiples of 7, we see their placement in the table. Repeating this process for each prime, we compute their first multiples, then determine their resgroup value for each restrack.

These first prime multiple locations in Table 3 are used to start marking off successive prime multiples along each restrack|row. The SoZ computes each prime’s multiples on the fly once and doesn’t need to store them for later use.
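The placement of a first prime multiple in Table 3 can be checked with a short Rust sketch. It assumes P5 (modpg = 30) and reuses the (n - 2) / modpg and (n - 2) % modpg + 2 arithmetic that the paper's code applies to primes. The function names `resgroup_of`, `residue_of`, and `first_mult_resgroup` are illustrative only and do not come from the paper's sources.

```rust
/// Resgroup index k of a value n for a generator with modulus `modpg`.
/// Assumes n >= 2.
fn resgroup_of(n: u64, modpg: u64) -> u64 {
    (n - 2) / modpg
}

/// Residue of n, normalised into 2..=modpg+1 (so 31 stays 31 for P5).
/// Assumes n >= 2.
fn residue_of(n: u64, modpg: u64) -> u64 {
    (n - 2) % modpg + 2
}

/// Resgroup of a prime's first multiple, prime * (modpg * k + ri),
/// where k is the prime's own resgroup and ri is a generator residue.
fn first_mult_resgroup(prime: u64, ri: u64, modpg: u64) -> u64 {
    let k = resgroup_of(prime, modpg);
    resgroup_of(prime * (modpg * k + ri), modpg)
}
```

For example, `resgroup_of(49, 30)` and `residue_of(49, 30)` give resgroup 1 and residue 19 for 7 * 7, and `first_mult_resgroup(37, 31, 30)` gives 75, which agrees with prime * (k + 1) + k = 37 * 2 + 1 for a prime on its own row.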
The SSoZ computes an initial nextp for the inputs range first segment, which is updated at the end of each segment slice to set the first prime multiples for the next segment(s). For each sieve prime we compute its first multiple resgroup k for the restracks of interest, e.g. for twin pair residues. We then determine its resgroup k’ ≥ kmin, where kmin is the resgroup for the start_num input value (kmin = 1 if one input given). Thus k’ ≥ 0 is the number of resgroups starting from kmin.

In the picture below, k is a prime’s 1st multiple resgroup on a row, and k’ its projection relative to kmin. If k ≥ kmin, then k’ = k - kmin. Thus if kmin = 3 and k = 7, k’ = 4 is its first resgroup inside the segment starting at kmin. If k = kmin then k’ = 0, i.e. that first prime multiple starts at the segment’s beginning.

[Figure: number line with a span of p resgroups, showing a prime’s first multiple resgroup k before kmin, the remainder rem up to kmin, and the projected k’ past kmin.]

If k < kmin, we compute prime’s multiple closest to kmin, i.e. where k’ = 0...prime-1 resgroups ≤ kmin:

    k’ = (kmin - k) % prime       –> value of rem in picture
    k’ = prime - k’ if k’ > 0     –> translated k’ value > kmin

Ex: for prime 7 on rt0, let k = 7, kmin = 21: then k’ = (21 - 7) % 7 = 0; to start from (multiple of 7).
Ex: for prime 7 on rt0, let k = 7, kmin = 25: then k’ = (25 - 7) % 7 = 4; k’ = 7 - 4 = 3; to start from.

In software, we can reassign the variable k to use for k’, so the (Crystal, et al) code just becomes:

    k < kmin ? (k = (kmin - k) % prime; k = prime - k if k > 0) : k -= kmin

It should be noted that while the sieve primes have at least 1 multiple within the inputs range, some may not have multiples on each restrack, especially for small ranges, and for them k > kmax. If this happens for both residue pairs, those primes could be discarded from the primes lists for those residues sieves. For general purposes though, it won’t happen often enough for the performance gain to justify the extra code.
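The projection of k onto a range that starts at kmin can be written as a small Rust function. This is a minimal sketch of the rule described above; the name `project_resgroup` is made up for illustration.

```rust
/// Project a prime's first-multiple resgroup k onto a range starting at
/// resgroup kmin, returning k' >= 0 counted from kmin.
fn project_resgroup(k: u64, kmin: u64, prime: u64) -> u64 {
    if k >= kmin {
        // first multiple already lies at or past kmin
        k - kmin
    } else {
        // distance from the last multiple before kmin up to kmin
        let rem = (kmin - k) % prime;
        if rem > 0 { prime - rem } else { 0 }
    }
}
```

With the values used in the text, `project_resgroup(7, 3, 7)` is 4, `project_resgroup(7, 21, 7)` is 0, and `project_resgroup(7, 25, 7)` is 3.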
To make the process|code simple, the k values for each sieve prime are generated and stored in nextp, without worry if they’re > kmax. If a prime’s k is larger than a segment size it’s skipped for it (not used to mark prime multiples) and reduced|updated by kn with smaller values for the next segments. When less than a segment size, it’s used in the residues sieve to mark prime multiples. Thus in twins_sieve, only primes with multiples in a segment for each restrack are used to mark prime multiples, and the others are skipped.

A unique nextp array is created for each residues pair in each thread for the sieving primes. Thus for twin|cousin primes, nextp holds their first prime multiples resgroups values for each segment slice for both residue pairs restracks. Thus its memory increases with inputs values (more sieving primes) and larger generators (more residue pairs), though active memory use will be determined by the number of parallel threads holding onto memory. How different languages manage memory affects the size and throughput they can achieve for various inputs and ranges, for a system’s memory size and profile.

Creating nextp for SSoZ

In the SoZ, a prime’s residue r multiplies each Pn residue ri, and (r * ri) mod modpg maps to a unique restrack rt in some resgroup k, which is the starting point to mark off that prime’s multiples for that ri. We now want to multiply r by the ri that makes (r * ri) be on a given restrack rt, for each sieving prime. Thus if for some ri, (r * ri) mod modpg = rt, to find the ri that maps each r to a specific rt we do:

    r * ri ≡ rt mod modpg
    ri ≡ r^-1 * rt mod modpg

Where for r^-1, r_inv = modinv(r, modpg) in the code, with r being the residue for a sieve prime. (A property of prime generators is that every residue has an inverse, either itself or another PG residue.)
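As an illustration of the inverse property, the Rust sketch below finds residue inverses for P5 (modpg = 30) by brute force and uses them to pick the ri that puts r * ri on a chosen restrack. It is not the paper's modinv; the names `to_residue`, `inverse_of`, and `ri_for_restrack` are assumptions made for this example, and the (x - 2) % modpg + 2 normalisation is the one used in the paper's code.

```rust
/// Normalise x to a residue in 2..=modpg+1. Assumes x >= 2.
fn to_residue(x: u64, modpg: u64) -> u64 {
    (x - 2) % modpg + 2
}

/// Brute-force modular inverse; adequate for the 8 residues of P5.
fn inverse_of(r: u64, modpg: u64) -> Option<u64> {
    (1..modpg).find(|&x| (r * x) % modpg == 1)
}

/// The residue ri for which r * ri lands on restrack rt.
fn ri_for_restrack(r: u64, rt: u64, modpg: u64) -> Option<u64> {
    let r_inv = inverse_of(r, modpg)?;
    Some(to_residue(r_inv * rt, modpg))
}
```

For r = 7 and rt = 11 this yields ri = 23, and 7 * 23 = 161 does sit on the residue 11 track of P5's pcs table.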
Now kn = (r * ri - 2) / modpg, and k = (prime - 2) / modpg, so again:

    kpm = k * (prime + ri) + kn

If r_inv is a prime’s residue inverse, and rt the desired restrack:

    ri = (r_inv * rt - 2) mod modpg + 2

For each residues pair, nextp_init creates the nextp array of the sieve primes first resgroup multiples relative to kmin, for the rt values r_lo and r_hi, the lower|upper twinpair residues. With no loss of generality, it can be used to construct nextp for any architecture for any number of specified restracks.

nextp_init

    def nextp_init(rhi, kmin, modpg, primes, resinvrs)
      # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
      # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
      # store consecutively as lo_tp|hi_tp pairs for their restracks.
      nextp = Slice(UInt64).new(primes.size*2)   # 1st mults array for twinpair
      r_hi, r_lo = rhi, rhi - 2                   # upper|lower twinpair residue values
      primes.each_with_index do |prime, j|        # for each prime r0..sqrt(N)
        k = (prime - 2) // modpg                  # find the resgroup it's in
        r = (prime - 2) % modpg + 2               # and its residue value
        r_inv = resinvrs[r].to_u64                # and residue inverse
        rl = (r_inv * r_lo - 2) % modpg + 2       # compute r's ri for r_lo
        rh = (r_inv * r_hi - 2) % modpg + 2       # compute r's ri for r_hi
        kl = k * (prime + rl) + (r * rl - 2) // modpg   # kl 1st mult resgroup
        kh = k * (prime + rh) + (r * rh - 2) // modpg   # kh 1st mult resgroup
        kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)
        kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)
        nextp[j * 2] = kl.to_u64                  # prime's 1st mult lo_tp resgroup val in range
        nextp[j * 2 | 1] = kh.to_u64              # prime's 1st mult hi_tp resgroup val in range
      end
      nextp
    end

Inputs:
    rhi – hi residue value for this twinpair
    kmin – resgroup value for start_num
    modpg – modulus value for chosen pg
    primes – array of sieving primes
    resinvrs – array of residues modular inverses
Output:
    nextp – array of primes 1st mults for given residues

Twins|Cousins SSoZ

Let’s now construct the process to find twin primes ≤ N with a segmented sieve, using our P5 example. Twin primes are consecutive odd integers that are prime, the first two being [3:5] and [5:7]. Thus from our original P5 pcs table, we use just the consecutive pc residue tracks, whose residues table is below. A twin prime occurs when both twin pair pc values in a column are prime (not colored), e.g. [191:193].

Table 4. Twin Primes Residues Tracks Table for P5(541).

    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
    rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

We see from the table that the twin pair residue tracks for [11:13] have 10 twin primes ≤ 541, [17:19] have 6, and [29:31] have 7. Thus, the total twin prime count ≤ 541 is 23 + [3:5] + [5:7] = 25, with the last being [521:523]. Twin primes are usually referenced to the mid (even) number between the upper and lower consecutive odd primes pair, so the last (largest) twin pair ≤ 541 for [521:523] is written as 522 ± 1.
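The per-track counts read off Table 4 can be cross-checked independently of the sieve. The Rust sketch below uses plain trial division, which is a stand-in chosen for brevity and is not the SSoZ method; the names `is_prime` and `count_twins_on_pair` are illustrative.

```rust
/// Trial-division primality test; adequate for this tiny range.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Count twin primes (hi - 2, hi) with hi <= limit on the restrack pair
/// whose upper residue is r_hi, stepping one resgroup (modpg) at a time.
fn count_twins_on_pair(r_hi: u64, modpg: u64, limit: u64) -> usize {
    (0u64..)
        .map(|k| k * modpg + r_hi)
        .take_while(|&hi| hi <= limit)
        .filter(|&hi| is_prime(hi - 2) && is_prime(hi))
        .count()
}
```

Calling `count_twins_on_pair` with upper residues 13, 19, and 31, modulus 30, and limit 541 gives 10, 6, and 7, so adding [3:5] and [5:7] reproduces the total of 25.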
As shown before, the number of twin|cousin residue pairs is equal to: (pn - 2)# = pn-2# = Π (pn – 2)

Thus P5 has 3 residue pairs for each. Below are the three Cousin Prime pairs taken from P5’s pcs table.

Table 5. Cousin Primes Residues Tracks Table for P5(541).

    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt0    7  37  67  97 127 157 187 217 247 277 307 337 367 397 427 457 487 517
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt5   23  53  83 113 143 173 203 233 263 293 323 353 383 413 443 473 503 533

The SSoZ algorithm is the same for both, with their coding differing only in how they account for low input value ranges, as the first cousin prime is defined as [3:7] and the first twins are [3:5], [5:7]. Up to 541, there are 25 twin and 27 cousin primes. Their ratio over increasingly larger input ranges remains close to unity, as their pairs count, and pair prime values, infinitely increase, [3], [4].

Residues Sieve Description

The Segmented Sieve of Zakiya (SSoZ) is a memory efficient way to find the primes using a given Pn. For an input range defined by a start_num and end_num, it divides the range into segments, which are efficiently sized to fit into usable memory for processing. This allows the reuse of the same memory to process long number ranges that otherwise would require more memory than a system has to use.

A standard segment slice is ks resgroups, with the last one, ks’, usually less. For a given Pn and range size set_sieve_parameters determines its optimal memory size, which is set to be a multiple of 64 (bits).

                                Fig. 1
       |  ks  |  ks  |  ks  |  ks  |  ks  |  ks  |  ks  | ks’ |
       |......|......|......|......|......|......|......|.....|
     kmin                                                    kmax
     start_num                                            end_num

Here start|end_num are the lo|hi values that define a number range of interest.
They also define the absolute values for kmin and kmax for a given Pn generator, as these resgroups cover these input values. When only one input is given it becomes end_num, whose resgroup determines kmax, and start_num is set to 3 (low prime for first twin [3:5]), and kmin set to 1 (min number of resgroups). The SSoZ sieve adjusts kmin|kmax for each residues pair when necessary, to ensure only their pc values within the inputs range are processed. For example, if start_num = 342 and end_num = 540, we see below the valid in-range pc values. Here kmin = 12 and kmax = 18 are the global resgroup values, which are adjusted as needed in twins_sieve for each residues pair. For [11:13], 341 < 342, so its kmin is increased to 13, whose values are all in the range. Conversely for twinpair [29:31], pc 541 > 540 is outside the range, so its kmax is reduced to 17, whose resgroup values are now all in the range. For twinpair [17:19] no adjustment is needed (done). Thus for each residues pair, we check if the numerated r_lo pc value in kmin is < start_num, and if so increment kmin, and check if the numerated r_hi pc value in kmax is > end_num, and decrement kmax if so. In twins_sieve the adjusted kmin|kmax values are determined then used in nextp_init to create nextp for the sieving primes to begin performing the residues sieve for the first segment in the range. Table 6. Twin Primes Residues Tracks Table for range 342 – 540. 
    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt1   11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt2   13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523
    rt3   17  47  77 107 137 167 197 227 257 287 317 347 377 407 437 467 497 527
    rt4   19  49  79 109 139 169 199 229 259 289 319 349 379 409 439 469 499 529
    rt6   29  59  89 119 149 179 209 239 269 299 329 359 389 419 449 479 509 539
    rt7   31  61  91 121 151 181 211 241 271 301 331 361 391 421 451 481 511 541

In twins_sieve segment array seg, its resgroups size ks is a multiple of 64-bit mem elements, where each bit represents a residues pair resgroup. Thus a resgroup k maps to bit: (k mod 64) in mem elem: seg[k / 64], where (k mod 64) masks k’s lower 6 bits: (k & 0x3F), and (k / 64) right shifts k by 6 bits. This is coded as: seg[(kn - 1) >> 6], bit value: 1 << ((kn - 1) & 63), (>>|<< are right|left bit-shift ops).

Ex: for ks = 131072 resgroups, seg size is 2048 64-bit mem elements; resgroup k = 89257 maps to seg[1394], bit 2^40, mem value = 1 << 40 = 1099511627776

                      Fig. 2
       |.............. ks ..............|
       ki                           ki+kn
       |....|....|....|..~~~..|....|....|
     seg[0]                      seg[kn-1]

ki is the absolute resgroup value to start each segment slice (in Fig. 1), initialized to kmin-1 (0 indexed arrays).
kn is the resgroups size for each segment slice. It’s initialized to ks, but if the last segment slice ks’ < ks resgroups it’s set to its slice size.

To sieve for twin primes, etc, each instance of twins_sieve processes a unique twinpair for the full inputs range resgroups size, by ks segments. It first determines the adjusted kmin|kmax values for the twinpair residues, then creates their initial nextp array of first resgroup sieve prime multiples k values. Using them, it iterates over the sieve primes, computes|updates their prime multiples k values, and sets them to ‘1’ in seg for each residues pair, until k ≥ kn, the k value past the end of the current segment.
When k ≥ kn it updates it to: k = k – kn, which is the first k multiple value into the next segment, and stores it back into nextp for that prime to update it to use for the next segment.

This is the Crystal code to mark a prime’s resgroup multiples in seg to ‘1’. This is done for the lo|hi residues pair, and if either resgroup member is a prime’s multiple that resgroup isn’t a twinprime.

    k = nextp.to_unsafe[j * 2]               # starting from this resgroup in seg
    while k < kn                             # mark primenth resgroup bits prime mults
      seg[k >> s] |= 1_u64 << (k & bmask)
      k += prime  end                        # set resgroup for prime's next multiple
    nextp.to_unsafe[j * 2] = k - kn          # save 1st resgroup in next eligible seg

When the residues sieve finishes seg contains the resgroup bit positions for the twin primes. Because seg is set to all ‘0’s to start each segment, we need to set to ‘1’ any unused hi bits in the last mem elem that ks’ is in when it’s not a multiple of 64. Algorithmically this only needs to be done for the last segment. However, doing it after every segment is faster in software, as it eliminates the branching code to check for the last segment, and is more efficient to compile|run. Below is the Crystal code to perform this.

    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)

If kn = 89257 for the last segment, only the first 1395 64-bit seg mem elems are used, up to the 41st bit in the last elem, so we need to set to ‘1’ its bit values 2^41..2^63, because ((89257 - 1) & 63) = 40, for bit 2^40. Thus we invert 1 to be: 11111111..1110 and left-shift it 40 bits, which is ORed with the last mem elem. If kn is a multiple of 64, (kn – 1) & bmask = 63 shifts the bits to be all 0s, and thus when ORed doesn’t change seg’s last mem value. Thus left shifts of n = 0..62 bits mask all the upper bit values: 2^63...2^(n+1).

Once all the nonprime bits are set we can count|numerate the primes.
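The bit mapping and the masking of unused high bits can be tried out in isolation. The Rust sketch below assumes a `u64`-backed segment and 0-indexed resgroups, so resgroup number 89257 is index 89256. The names `seg_slot` and `mask_unused_bits` are hypothetical and only mirror the Crystal one-liners shown above.

```rust
/// Word index and single-bit mask for 0-indexed resgroup k
/// in a segment stored as 64-bit words.
fn seg_slot(k: usize) -> (usize, u64) {
    (k >> 6, 1u64 << (k & 63))
}

/// Set to '1' (nonprime) every bit above position kn - 1 in the last
/// word used by a segment of kn resgroups. Assumes kn >= 1.
fn mask_unused_bits(seg: &mut [u64], kn: usize) {
    let last = kn - 1;
    // !1 is 111...110; shifting it left clears bits 0..=(last & 63)
    seg[last >> 6] |= (!1u64) << (last & 63);
}
```

After masking a zeroed segment with kn = 89257, exactly 89257 zero bits remain in words 0 through 1394, so counting zeros counts only real resgroups. When kn is a multiple of 64 the shifted mask is 0 and the last word is left unchanged.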
We read each mem elem seg[0..(kn-1) >> s] and invert the bits, and use popcount to count the ‘1’s (as primes) for each seg[i] (the Rust code counts the ‘0’s directly), and sum their segment count in variable cnt. If cnt > 0 we find the largest prime resgroup in the segment. We first update the total pairs count with sum += cnt. Then upk is set to the last resgroup value in the segment, and we loop backward checking for the first bit that’s prime (‘0’), after which upk holds the largest|last prime pair resgroup in the segment. Its absolute resgroup value in the inputs range is then: hi_tp = ki + upk. For each segment slice its value is updated to a larger value, and at the end holds the largest absolute resgroup for this residues pair in the inputs range. The r_hi prime value is numerated and returned as: hi_tp * modpg + r_hi, along with the total prime pairs count in the range, in variable sum.

    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
    cnt = 0                                    # count the twinprimes in the segment
    seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount }
    if cnt > 0                                 # if segment has twinprimes
      sum += cnt                               # add segment count to total range count
      upk = kn - 1                             # from end of seg count back to largest tp
      while seg[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
      hi_tp = ki + upk                         # set its full range resgroup value
    end

twins_sieve can be modified for different purposes. The code to find the largest prime pair can be removed if all you want is their count. I also originally had code to print out the r_hi primes in each segment as a validity check (only for small ranges). However, if you really wanted to see|record the twins, a better way may be to return ki|seg for each segment and externally store|process them later for any desired range of interest. (This, of course, would be very memory intensive.)

Twin Primes Example

Using our example to find the twin primes ≤ 541 with P5, let’s see how to process the first twin pair residues [11:13] with kmax = 18.
twins_sieve can perform the sieve for each pair in a separate thread. set_sieve_parameters sets the segment size, but here I’ll set it to ks = 6. Thus, the seg array will represent 6 resgroups. Below is the twin pair table for [11:13] separated into 3 segment slices of 6 resgroups each. Underneath it is what each seg array will look like after processing for each slice. (seg conceptually is a bitarray, so each seg[i] is just 1 bit. I later show an implementation using a bitarray, which makes the code simpler|shorter, and faster, depending on a language’s implementation.)

Table 7.

    k      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    rt11  11  41  71 101 131 161 191 221 251 281 311 341 371 401 431 461 491 521
    rt13  13  43  73 103 133 163 193 223 253 283 313 343 373 403 433 463 493 523

    k      0   1   2   3   4   5   0   1   2   3   4   5   0   1   2   3   4   5
    seg    0   0   0   0   1   1   0   1   1   0   0   1   1   1   0   0   1   0

nextp_init initializes nextp for the sieve primes [7, 11, 13, 17, 19, 23] for residues 11 and 13, taking the values shown in Table 3. For each lo|hi residue, their k values are stored as consecutive pairs in nextp and seg is created and initialized to all primes (‘0’).

    j        0   1   2   3   4   5
    primes   7  11  13  17  19  23

    Initial nextp[11:13]
    2j       0   2   4   6   8  10
    2j+1     1   3   5   7   9  11
    rt_11    5  11   7   7  18   5
    rt_13    4   8  13  16   4   8

    k        0   1   2   3   4   5
    seg      0   0   0   0   0   0

For each prime j in primes, nextp[2j|2j+1] give the pair’s k values to start marking off prime’s multiples (by incrementing k by prime’s value). When k ≥ kn (here kn is always 6), it’s reduced by it: k = k - 6, and nextp is updated with the new k values for the next segment. Below are the changes to nextp and seg in twins_sieve. (It’s coincidental here that the index size for primes and nextp equals the segment size.)
    Start for Segment 1, nextp[11:13]
    2j       0   2   4   6   8  10
    2j+1     1   3   5   7   9  11
    rt_11    5  11   7   7  18   5
    rt_13    4   8  13  16   4   8

    seg 1
    k        0   1   2   3   4   5
    seg      0   0   0   0   1   1

    Start for Segment 2, nextp[11:13]
    2j       0   2   4   6   8  10
    2j+1     1   3   5   7   9  11
    rt_11    6   5   1   1  12  22
    rt_13    5   2   7  10  17   2

    seg 2
    k        0   1   2   3   4   5
    seg      0   1   1   0   0   1

    Start for Segment 3, nextp[11:13]
    2j       0   2   4   6   8  10
    2j+1     1   3   5   7   9  11
    rt_11    0  10   8  12   6  16
    rt_13    6   7   1   4  11  19

    seg 3
    k        0   1   2   3   4   5
    seg      1   1   0   0   1   0

Below is the Crystal code to perform the residues sieve (here for twins) for a given residues pair.

twins_sieve

    def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
      # First create|init 'nextp' array of 1st prime mults for given twinpair,
      # stored consecutively in 'nextp', and init seg array for ks resgroups.
      # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
      # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
      # Return the last twinprime|sum for the range for this twinpair residues.
      s = 6                                        # shift value for 64 bits
      bmask = (1 << s) - 1                         # bitmask val for 64 bits
      sum, ki, kn = 0_u64, kmin-1, ks              # init these parameters
      hi_tp, k_max = 0_u64, kmax                   # max twinprime|resgroup
      seg = Slice(UInt64).new(((ks - 1) >> s) + 1) # seg array of ks resgroups
      ki += 1 if ((ki * modpg) + r_hi - 2) < start_num   # ensure lo tp in range
      k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num # ensure hi tp in range
      nextp = nextp_init(r_hi, ki, modpg, primes, resinvrs) # init nextp array
      while ki < k_max                             # for ks size slices upto kmax
        kn = k_max - ki if ks > (k_max - ki)       # adjust kn size for last seg
        primes.each_with_index do |prime, j|       # for each prime r0..sqrt(N)
                                                   # for lower twinpair residue track
          k = nextp.to_unsafe[j * 2]               # starting from this resgroup in seg
          while k < kn                             # mark primenth resgroup bits prime mults
            seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
            k += prime  end                        # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2] = k - kn          # save 1st resgroup in next eligible seg
                                                   # for upper twinpair residue track
          k = nextp.to_unsafe[j * 2 | 1]           # starting from this resgroup in seg
          while k < kn                             # mark primenth resgroup bits prime mults
            seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
            k += prime  end                        # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2 | 1] = k - kn      # save 1st resgroup in next eligible seg
        end                                        # set as nonprime unused bits in last seg[n]
                                                   # so fast, do for every seg[i]
        seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
        cnt = 0                                    # count the twinprimes in the segment
        seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount } # invert to count '1's
        if cnt > 0                                 # if segment has twinprimes
          sum += cnt                               # add segment count to total range count
          upk = kn - 1                             # from end of seg, count back to largest tp
          while seg.to_unsafe[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
          hi_tp = ki + upk                         # set its full range resgroup value
        end
        ki += ks                                   # set 1st resgroup val of next seg slice
        seg.fill(0) if ki < k_max                  # set next seg to all primes if in range
      end                                          # when sieve done, numerate largest twin
                                                   # for ranges w/o twins set largest to 1
      hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
      {hi_tp.to_u64, sum.to_u64}                   # return largest twinprime|twins count
    end

Inputs:
    ks – resgroups segment size
    rhi – hi residue value for this twinpair
    modpg – modulus value for chosen pg
    kmin – total number resgroups upto for start_num
    kmax – total number resgroups upto for end_num
    primes – array of sieving primes
    resinvrs – array of modular inverses for residues
    end_num – inputs high value
    start_num – inputs low value
Outputs:
    sum – count of twinpairs for input range
    hi_tp – hi prime for largest twinprime in range

Starting with Crystal 1.4.0 (April 7, 2022) its bitarray implementation was highly optimized, making it faster than the 64-bit mem array for seg on the AMD 5900HX, while making the code substantially simpler to read|write and shorter. Below is the Crystal version using a bitarray for the seg array.

    def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
      # First create|init 'nextp' array of 1st prime mults for given twinpair,
      # stored consecutively in 'nextp', and init seg array for ks resgroups.
      # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
      # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
      # Return the last twinprime|sum for the range for this twinpair residues.
      sum, ki, kn = 0_u64, kmin-1, ks              # init these parameters
      hi_tp, k_max = 0_u64, kmax                   # max twinprime|resgroup
      seg = BitArray.new(ks)                       # seg array of ks resgroups
      ki += 1 if ((ki * modpg) + r_hi - 2) < start_num   # ensure lo tp in range
      k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num # ensure hi tp in range
      nextp = nextp_init(r_hi, ki, modpg, primes, resinvrs) # init nextp array
      while ki < k_max                             # for ks size slices upto kmax
        kn = k_max - ki if ks > (k_max - ki)       # adjust kn size for last seg
        primes.each_with_index do |prime, j|       # for each prime r0..sqrt(N)
                                                   # for lower twinpair residue track
          k = nextp.to_unsafe[j * 2]               # starting from this resgroup in seg
          while k < kn                             # until end of seg
            seg.unsafe_put(k, true)                # mark primenth resgroup bits prime mults
            k += prime  end                        # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2] = k - kn          # save 1st resgroup in next eligible seg
                                                   # for upper twinpair residue track
          k = nextp.to_unsafe[j * 2 | 1]           # starting from this resgroup in seg
          while k < kn                             # until end of seg
            seg.unsafe_put(k, true)                # mark primenth resgroup bits prime mults
            k += prime  end                        # set resgroup for prime's next multiple
          nextp.to_unsafe[j * 2 | 1] = k - kn      # save 1st resgroup in next eligible seg
        end
        cnt = seg[...kn].count(false)              # count|store twinprimes in segment
        if cnt > 0                                 # if segment has twinprimes
          sum += cnt                               # add segment count to total range count
          upk = kn - 1                             # from end of seg, count back to largest tp
          while seg.unsafe_fetch(upk); upk -= 1 end
          hi_tp = ki + upk                         # set its full range resgroup value
        end
        ki += ks                                   # set 1st resgroup val of next seg slice
        seg.fill(false) if ki < k_max              # set next seg to all primes if in range
      end                                          # when sieve done, numerate largest twin
                                                   # for ranges w/o twins set largest to 1
      hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
      {hi_tp.to_u64, sum.to_u64}                   # return largest twinprime|twins count
    end

The code to find the largest twinprime in the range comes for FREE: removing it gives no detectable increase in speed, and for Crystal may even be a wee tad bit slower. With it removed, the end of the routine becomes:

        sum += seg[...kn].count(false)             # count|store twinprimes in segment
        ki += ks                                   # set 1st resgroup val of next seg slice
        seg.fill(false) if ki < k_max              # set next seg to all primes if in range
      end
      sum.to_u64                                   # return twinprimes count in range
    end

In general, a bitarray’s performance depends on the language’s implementation (test to determine), but it should make the code simpler|shorter to read|write, while the memory array model should be more ubiquitous, and implementable for languages without (native or external) bitarrays.

gcd

    def gcd(m, n)
      while m|1 != 1; t = m; m = n % m; n = t end
      m
    end

Inputs:
    m – an odd pc value < pg modulus n
    n – even pg modulus value
Output:
    gcd of inputs; (m, n) are coprime if 1

This is a customized gcd (greatest common divisor) function that uses residue properties to shorten the time of the Euclidean gcd algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm). Here m is an odd residue candidate < n, the even modulus value. Some of the language implementations just use the gcd function provided with them.

modinv

    def modinv(a0, m0)
      return 1 if m0 == 1
      a, m = a0, m0
      x0, inv = 0, 1
      while a > 1
        inv -= (a // m) * x0
        a, m = m, a % m
        x0, inv = inv, x0
      end
      inv += m0 if inv < 0
      inv.to_u64
    end

    def modinv1(r, m)
      r = inv = r.to_u64
      while (r * inv) % m != 1
        inv = (inv % m) * r
      end
      inv % m
    end

Inputs:
    a0 – odd pc value < modulus m0
    m0 – even pg modulus value
Output:
    inv – inverse of a0 mod m0, e.g. a0*inv ≡ 1 mod m0

The first function, modinv, is the standard modular inverse function (taken from Rosetta Code). The second, modinv1, uses the residue property that – ri * ri^n ≡ 1 mod modpg – for some n ≥ 1, i.e. the modular inverse of residue ri is itself raised to some power n.
This is faster for generators P3 and P5, with a small number of residues, but becomes comparatively slower for generators with more residues.

    For P5’s residues:  [ 7, 11, 13, 17, 19, 23, 29, 31]
    Its inverses are:   [13, 11,  7, 23, 19, 17, 29,  1]
    Inverse power n:    [ 3,  1,  3,  3,  1,  3,  1,  1]

For a chosen Pn generator, gen_pg_parameters produces its parameters used to perform the SSoZ. It uses gcd to determine the residues and modinv to compute their inverses.

gen_pg_parameters

    def gen_pg_parameters(prime)
      # Create prime generator parameters for given Pn
      puts "using Prime Generator parameters for P#{prime}"
      primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
      modpg, res_0 = 1, 0                      # compute Pn's modulus and res_0 value
      primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm }

      restwins = [] of Int32                   # save upper twinpair residues here
      inverses = Array.new(modpg + 2, 0)       # save Pn's residues inverses here
      pc, inc, res = 5, 2, 0                   # use P3's PGS to generate pcs
      while pc < (modpg >> 1)                  # find PG's 1st half residues
        if gcd(pc, modpg) == 1                 # if pc a residue
          mc = modpg - pc                      # create its modular complement
          inverses[pc] = modinv(pc, modpg)     # save pc and mc inverses
          inverses[mc] = modinv(mc, modpg)     # if in twinpair save both hi residues
          restwins << pc << mc + 2 if res + 2 == pc
          res = pc                             # save current found residue
        end
        pc += inc; inc ^= 0b110                # create next P3 seq pc: 5 7 11 13 17...
      end
      restwins.sort!; restwins << (modpg + 1)  # last residue is last hi_tp
      inverses[modpg+1] = 1; inverses[modpg-1] = modpg - 1  # last 2 are self inverses
      {modpg, res_0, restwins.size, restwins, inverses}
    end

Inputs:
    prime – Pn prime value 5, 7… 17
Outputs:
    res_0 – first residue of selected Pn (next prime > Pn prime)
    modpg – modulus for generator Pn; value = (prime)#
    inverses – array of the pg residue inverses, size = (prime-1)#
    restwins – ordered array of the hi pg twinpair (tp) values
    restwins.size – the number of pg twinpairs = (prime-2)#

For a given prime number, it generates its primorial value for modpg, and keeps its r0 value in res_0. It then generates all the residues. It uses P3’s PGS to generate Pn’s first half pcs. It checks if they’re coprime to modpg to identify the residues. For each residue it creates its modular complement (mc) and stores both inverses at their address values. It then determines if the residue is part of a twin (cousin) pair, and if so, then so is its complement, and stores both hi pair values in restwins.

Upon generating all the residues, and storing their inverses and twin (cousin) pairs hi residues, the restwins array is sorted to put them in sequential order, then the hi residue for the last twin pair modpg±1 is included as the last one. (For cousin primes, we include the hi residue for the pivot pair (modpg/2 + 2) and then sort the array.) Finally, the inverses for the last two residues modpg±1 are added at their address locations, and the outputs are returned for use in set_sieve_parameters.

Given the input values, set_sieve_parameters determines which prime generator to use, generates its parameters, then determines the range parameters and segment size to use. Here I use a rudimentary tree algorithm to determine for my laptops the switch points for using different generators.
This can be made much more sophisticated and adaptable by also accounting for the number of system threads and cache and ram memory size, to pick better segment size values and generators for a given inputs range.

set_sieve_parameters

def set_sieve_parameters(start_num, end_num)
  # Select at runtime best PG and segment size parameters for input values.
  # These are good estimates derived from PG data profiling. Can be improved.
  nrange = end_num - start_num
  bn, pg = 0, 3
  if end_num < 49
    bn = 1; pg = 3
  elsif nrange < 77_000_000
    bn = 16; pg = 5
  elsif nrange < 1_100_000_000
    bn = 32; pg = 7
  elsif nrange < 35_500_000_000
    bn = 64; pg = 11
  elsif nrange < 14_000_000_000_000
    pg = 13
    if    nrange > 7_000_000_000_000; bn = 384
    elsif nrange > 2_500_000_000_000; bn = 320
    elsif nrange > 250_000_000_000;   bn = 196
    else  bn = 128
    end
  else
    bn = 384; pg = 17
  end
  modpg, res_0, pairscnt, restwins, resinvrs = gen_pg_parameters(pg)
  kmin = (start_num-2) // modpg + 1    # number of resgroups to start_num
  kmax = (end_num - 2) // modpg + 1    # number of resgroups to end_num
  krange = kmax - kmin + 1             # number of resgroups in range, at least 1
  n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8)
  b = bn * 1024 * n                    # set seg size to optimize for selected PG
  ks = krange < b ? krange : b         # segments resgroups size
  puts "segment size = #{ks} resgroups for seg bitarray"
  maxpairs = krange * pairscnt         # maximum number of twinprime pcs
  puts "twinprime candidates = #{maxpairs}; resgroups = #{krange}"
  {modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs}
end

Inputs:
  end_num – high input value (min of 3)
  start_num – low input value (min of 3)
Outputs:
  ks – number of residue groups set for segment size
  res_0 – first residue of selected Pn (next prime > Pn prime)
  modpg – modulus value for chosen pg
  kmin – number resgroups to start_num
  kmax – number resgroups to end_num
  krange – number of resgroups for inputs range (at least 1)
  pairscnt – number of twinpairs for selected pg
  resinvrs – modular inverses array for the residues
  restwins – hi residue values array for each twinpair

Finally, shown below is the Crystal version of the main routine twinprimes_ssoz. It accepts the inputs, performs the residues sieve, times the different parts of the process, and generates the program outputs.

twinprimes_ssoz

def twinprimes_ssoz()
  end_num   = {ARGV[0].to_u64, 3u64}.max
  start_num = ARGV.size > 1 ? {ARGV[1].to_u64, 3u64}.max : 3u64
  start_num, end_num = end_num, start_num if start_num > end_num

  start_num |= 1                       # if start_num even increase by 1
  end_num = (end_num - 1) | 1          # if end_num even decrease by 1
  start_num = end_num = 7 if end_num - start_num < 2

  puts "threads = #{System.cpu_count}"
  ts = Time.monotonic                  # start timing sieve setup execution
                                       # select Pn, set sieving params for inputs
  modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs = set_sieve_parameters(start_num, end_num)
  # create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
  primes = end_num < 49 ? [5] : sozpg(Math.isqrt(end_num), res_0, start_num, end_num)
  puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"

  lo_range = restwins[0] - 3           # lo_range = lo_tp - 1
  twinscnt = 0_u64                     # determine count of 1st 4 twins if in range for used Pn
  twinscnt += [3, 5, 11, 17].select { |tp| start_num <= tp <= lo_range }.size unless end_num == 3

  te = (Time.monotonic - ts).total_seconds.round(6)
  puts "setup time = #{te} secs"       # display sieve setup time
  puts "perform twinprimes ssoz sieve"
  t1 = Time.monotonic                  # start timing ssoz sieve execution

  cnts = Array(UInt64).new(pairscnt, 0)      # number of twinprimes found per thread
  lastwins = Array(UInt64).new(pairscnt, 0)  # largest twinprime val for each thread
  done = Channel(Nil).new(pairscnt)
  threadscnt = Atomic.new(0)           # count of finished threads

  restwins.each_with_index do |r_hi, i|      # sieve twinpair restracks
    spawn do
      lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
      done.send(nil)
    end
  end
  pairscnt.times { done.receive }      # wait for all threads to finish
  print "\r#{pairscnt} of #{pairscnt} twinpairs done"
  last_twin = lastwins.max             # find largest hi_tp twinprime in range
  twinscnt += cnts.sum                 # compute number of twinprimes in range
  last_twin = 5 if end_num == 5 && twinscnt == 1
  kn = krange % ks                     # set number of resgroups in last slice
  kn = ks if kn == 0                   # if multiple of seg size set to seg size
  t2 = (Time.monotonic - t1).total_seconds   # sieve execution time

  puts "\nsieve time = #{t2.round(6)} secs"        # ssoz sieve time
  puts "total time = #{(t2 + te).round(6)} secs"   # setup + sieve time
  puts "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}"
  puts "total twins = #{twinscnt}; last twin = #{last_twin - 1}+/-1"
end

twinprimes_ssoz

Program Output

Below is typical program output, shown here for Rust, for single and two input values (order doesn’t matter), run on an Intel i7-6700HQ Linux based laptop. The programs are run in a terminal with the command-line interface (cli) shown, and display the output shown.

$ echo 5000000000 | ./twinprimes_ssoz
threads = 8
using Prime Generator parameters for P11
segment size = 262144 resgroups; seg array is [1 x 4096] 64-bits
twinprime candidates = 292207905; resgroups = 2164503
each of 135 threads has nextp[2 x 6999] array
setup time = 0.000796737 secs
perform twinprimes ssoz sieve
135 of 135 twinpairs done
sieve time = 0.184892352 secs
total time = 0.185704753 secs
last segment = 67351 resgroups; segment slices = 9
total twins = 14618166; last twin = 4999999860+/-1

$ echo 100000000000 200000000000 | ./twinprimes_ssoz
threads = 8
using Prime Generator parameters for P13
segment size = 524288 resgroups; seg array is [1 x 8192] 64-bits
twinprime candidates = 4945055940; resgroups = 3330004
each of 1485 threads has nextp[2 x 37493] array
setup time = 0.003883411 secs
perform twinprimes ssoz sieve
1485 of 1485 twinpairs done
sieve time = 3.819838338 secs
total time = 3.823732178 secs
last segment = 184276 resgroups; segment slices = 7
total twins = 199708605; last twin = 199999999890+/-1

The program output is described as follows:
Line 0 is the cli input command. When 2 inputs are given their hi|lo order doesn’t matter.
Line 1 shows the number of available system threads.
Line 2 shows the Pn generator selected based on the inputs.
Line 3 shows the selected resgroup segment size ks, and number of 64-bit memory elements (ks / 64) for the segment array.
Line 4 shows the number of twinprime candidates for the number of resgroups spanning the inputs range. In the second example, (kmax – kmin + 1) = 3,330,004 resgroups x 1485 (number of P13 twinpairs) = 4,945,055,940 twinprime candidates.
Line 5 shows the number of twinpairs for the selected PG (here 1485 for P13) and the size of the nextp array, which shows the number of sieving primes used (6999 and 37493 for these examples).
Line 6 shows the time to select and generate Pn’s parameters and the sieve primes.
Line 7 announces when the residues sieve process starts.
Line 8 is a dynamic display showing in realtime how many twinpair threads are done, until finished.
Line 9 shows the runtime for the residues sieve.
Line 10 shows the combined setup and residues sieve times.
Line 11 shows how many resgroups were in the last segment slice and the number of segment slices.
Line 12 shows the number of twinprimes for the inputs range, and the value of the largest one.

Performance

The SSoZ performs optimally on multi-core systems with parallel operating threads. The more available threads the higher the possible performance. To show this, I provide data from two systems.

System 1: Intel i7-6700HQ, 2.6 – 3.5 GHz, 4C|8T, 16 MB, System76 Gazelle (2016) laptop.
System 2: AMD 5900HX, 3.3 – 4.6 GHz, 8C|16T, 16 MB, Lenovo Legion slim 7 (2022) laptop.

For a reference I used Primesieve 7.4 [5] – https://github.com/kimwalisch/primesieve – described as “a command-line program and C/C++ library for quickly generating prime numbers...using the segmented sieve of Eratosthenes with wheel factorization.” It’s a well maintained open source project of highly optimized C/C++ code libraries, which also takes inputs over the 64-bit range (but doesn’t produce results for cousin primes).

Below are sample outputs for the Rust version of twinprimes_ssoz and Primesieve performed on both systems.

$ echo 378043979 1429172500581 | ./twinprimes_ssoz
threads = 8 // 16
using Prime Generator parameters for P13
segment size = 802816 resgroups; seg array is [1 x 12544]
twinprime candidates = 70654672440; resgroups = 47578904
each of 1485 threads has nextp[2 x 92610] array
setup time = 0.006171322 secs // 0.005839409 secs
perform twinprimes ssoz sieve
1485 of 1485 twinpairs done
sieve time = 55.836745969 secs // 18.062863872 secs
total time = 55.842928445 secs // 18.068715224 secs
last segment = 212760 resgroups; segment slices = 60
total twins = 2601278756; last twin = 1429172500572+/-1

$ echo 378043979 14291725005819 | ./twinprimes_ssoz
threads = 8 // 16
using Prime Generator parameters for P17
segment size = 1572864 resgroups; seg array is [1 x 24576]
twinprime candidates = 623572052400; resgroups = 27994256
each of 22275 threads has nextp[2 x 268695] array
setup time = 0.036543755 secs // 0.025222812 secs
perform twinprimes ssoz sieve
22275 of 22275 twinpairs done
sieve time = 675.667368646 secs // 235.003460103 secs
total time = 675.703922948 secs // 235.027696883 secs
last segment = 1255568 resgroups; segment slices = 18
total twins = 22078408103; last twin = 14291725004982+/-1

$ ./primesieve -c2 378043979 1429172500581
Sieve size = 128 KiB // 256 KiB
Threads = 8 // 16
100%
Seconds: 101.873 // 33.781
Twin primes: 2601278756

$ ./primesieve -c2 378043979 14291725005819
Sieve size = 128 KiB // 256 KiB
Threads = 8 // 16
100%
Seconds: 1218.502 // 471.776
Twin primes: 22078408103

I implemented both the twins|cousins ssoz in the 6 programming languages listed here. Again, these are reference implementations, and are not necessarily optimum for each language. The Rust versions are the most optimized, and generally the fastest, as they perform the soz algorithm in parallel. The code for each is < 300 ploc (programming lines of code), which highlights the simplicity of the algorithm.

The tables below show benchmark results for the 6 language implementations, and Primesieve. They are the best times for both systems from multiple runs under different operating conditions. Their code was developed on System 1, and those binaries also run on System 2. Their source code was then compiled on System 2 to compare performance differences, and those were used for the benchmarks.

The 6 languages, and their development environments and versions are: C++, Nim 1.6.4 (gcc 11.3.0), D (ldc2 1.28.0, LLVM 12.0.1), Crystal 1.4.1 (LLVM 10.0.0), Rust 1.60, and Go 1.18. They most likely can be improved, and I hope others will create more versions, especially for other compiled languages.

Twin Prime Benchmark Comparisons – Intel i7 6700HQ (times in seconds)

N          Rust     C++      D        Nim      Crystal  Go       Prmsv    Twins Count      Largest in Range
1x10^10    0.35     0.45     0.46     0.53     0.48     0.61     0.51     27,412,679       9,999,999,703|-2
5x10^10    1.67     2.14     2.19     2.27     2.40     2.76     2.81     118,903,682      49,999,999,591|-2
1x10^11    3.41     4.24     4.31     4.34     4.69     5.51     5.91     224,376,048      99,999,999,763|-2
5x10^11    18.15    21.42    21.37    21.69    23.81    28.11    32.76    986,222,314      499,999,999,063|-2
1x10^12    37.67    44.48    44.25    44.71    49.05    58.08    69.25    1,870,585,220    999,999,999,961|-2
5x10^12    219.67   253.62   256.30   253.69   279.49   319.84   395.16   8,312,493,003    4,999,999,999,879|-2
1x10^13    482.51   543.74   542.23   541.35   602.63   678.61   825.71   15,834,664,872   9,999,999,998,491|-2

Cousin Prime Benchmark Comparisons – Intel i7 6700HQ (times in seconds)

N          Rust     C++      D        Nim      Crystal  Go       Cousins Count    Largest in Range
1x10^10    0.36     0.45     0.46     0.53     0.48     0.62     27,409,998       9,999,999,707|-4
5x10^10    1.69     2.11     2.18     2.26     2.41     2.81     118,908,265      49,999,999,961|-4
1x10^11    3.35     4.20     4.46     4.32     4.64     5.52     224,373,159      99,999,999,947|-4
5x10^11    18.08    21.34    21.35    21.76    23.36    28.21    986,220,867      499,999,999,901|-4
1x10^12    37.17    44.57    44.44    44.51    49.14    58.25    1,870,585,457    999,999,998,867|-4
5x10^12    220.05   250.63   251.86   252.18   278.76   320.15   8,312,532,286    4,999,999,999,877|-4
1x10^13    478.96   534.17   541.85   540.81   597.89   678.48   15,834,656,001   9,999,999,999,083|-4

Twin Prime Benchmark Comparisons – AMD Ryzen 9 5900HX (times in seconds)

N          Rust     C++      D        Nim      Crystal  Go       Prmsv    Twins Count      Largest in Range
1x10^10    0.12     0.12     0.12     0.19     0.13     0.15     0.16     27,412,679       9,999,999,703|-2
5x10^10    0.54     0.49     0.58     0.59     0.66     0.67     0.92     118,903,682      49,999,999,591|-2
1x10^11    1.12     0.97     1.13     1.08     1.23     1.32     1.95     224,376,048      99,999,999,763|-2
5x10^11    5.85     4.88     5.75     5.22     6.22     6.92     11.17    986,222,314      499,999,999,063|-2
1x10^12    12.14    10.03    12.01    11.12    13.06    14.61    23.71    1,870,585,220    999,999,999,961|-2
5x10^12    68.04    65.41    69.24    73.54    74.29    81.23    132.99   8,312,493,003    4,999,999,999,879|-2
1x10^13    145.01   155.45   156.57   172.68   170.77   185.25   307.78   15,834,664,872   9,999,999,998,491|-2

Cousin Prime Benchmark Comparisons – AMD Ryzen 9 5900HX (times in seconds)

N          Rust     C++      D        Nim      Crystal  Go       Cousins Count    Largest in Range
1x10^10    0.12     0.11     0.13     0.19     0.13     0.15     27,409,998       9,999,999,707|-4
5x10^10    0.55     0.49     0.57     0.59     0.63     0.66     118,908,265      49,999,999,961|-4
1x10^11    1.12     0.96     1.13     1.07     1.22     1.32     224,373,159      99,999,999,947|-4
5x10^11    5.87     4.89     5.78     5.25     6.18     6.92     986,220,867      499,999,999,901|-4
1x10^12    12.25    10.14    12.14    11.06    12.56    14.67    1,870,585,457    999,999,998,867|-4
5x10^12    67.69    68.51    68.74    74.68    74.86    80.29    8,312,532,286    4,999,999,999,877|-4
1x10^13    145.02   157.68   156.01   173.16   170.06   179.07   15,834,656,001   9,999,999,999,083|-4

Enhanced Configurations

The software provided is designed to work on readily available 64-bit systems, and serve as reference implementations, to demonstrate how Prime Generators can be used to efficiently identify and count primes. They can be enhanced to take advantage of more hardware resources when available.

Ideally we want to use as many system threads as possible. So for P5, which has 3 twin|cousin residue pairs, instead of using 3 threads over an input range it may be faster to divide the range into 2 equal parts and use 6 threads (3 for each half).
Even if a system has only 4 threads, this may be faster as the range increases, but should definitely be faster (for sufficiently large ranges) if a system has 6 or more threads. In fact, if a system has at least 16 threads, using P7 (15 residue pairs) as the default generator for small ranges may be more efficient than P5, as they all can run in 1 parallel threads time (ptt).

Thus a more sophisticated algorithm can be devised for set_sieve_parameters to use threads count, and also cache|memory sizes, to pick the best generator and segment size for given input ranges. For best performance this would require the profiling of targeted hardware system(s), to optimize the differences between cpus and systems capabilities and resources. However, I think the algorithm would still be fairly simple to code, to dynamically compute these parameters to achieve higher performance.

Eliminating Sieving Primes

As the value for end_num becomes larger more|bigger sieve primes must be generated, and filtered out or kept. Generating them takes increasing time with increasing input values. This also affects the time to perform the residue sieve, by increasing the time (and memory) to create the nextp array, and use it. While it’s possible to use stored lists of primes to eliminate dynamically generating them, this doesn’t get around creating nextp with them, with the associated memory issues for it in each thread.

One simple way around this is to use a fast primality test algorithm to check each residue pair pc value in each resgroup in the threads. If one value isn’t prime the other doesn’t have to be checked. By using sufficiently large generators for a given input range, the number of resgroups over a range can be made arbitrarily small to reduce the number of primality tests to perform.

For example for P47, modp47 = 614,889,782,588,491,410 is the largest primorial value that can fit into (unsigned) 64-bits. Its 15,681,106,801,985,625 residue pairs use 5.1% of the number space to hold the twin|cousin primes > 47.

Eliminating the sieving primes greatly reduces the work of the algorithm. Realizable machines to perform this would use as many parallel compute engines as possible, but each would now be much simpler, eliminating sozpg and nextp_init. Now gen_pg_parameters just identifies the residue pair values (and no longer their inverses), needing only a (fast) gcd function. This could be done with massive arrays of graphic processing units (GPUs), or better, Simple Super Computers (SSCs).

To search for yet undiscovered million digit primes, a distributed network can be constructed, similar to that for the Great Internet Mersenne Prime Search (GIMPS) [7] and Twin Primes Search [8]. A benefit of creating this network is that with all the available (free) compute power in the world, groups of residue pairs can be dedicated to machine clusters and run full time, and deterministically identify new twins|cousins (thus two primes for the price of one) forever, as there are an infinity of each [1][4].

The Ultimate Primes Search Machine

Using just a few basic properties of Prime Generator Theory (PGT) we can construct a conceptually simpler and more efficient machine to find as many primes as physical reality and time will allow.

Because for any Pn, modpn = p_m# (primorial of the first m primes), r0 = p_(m+1), and the residues from r0 to r0^2 are consecutive primes, we don’t have to do primality tests for them, but merely gcd tests to determine which values are coprime to modpn. Thus we can arbitrarily use any prime as r0 of a Pn whose modpn is the primorial of all the primes < r0, to directly find the consecutive primes in [r0, r0^2). After finding the new additional primes, we can then create a larger Pn modulus with them, and repeat the process, to continually find more primes.
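
The gcd-only search described above can be sketched as follows. This is a minimal illustration in Rust (one of the six languages used for the reference implementations), not code taken from the SSoZ sources. The function names gcd, primorial_chunks, and primes_r0_to_r0_squared are hypothetical, and holding the product of the known primes as several 64-bit partial products (starting a new one whenever the running product would overflow) is an assumption made for this example. Given every prime < r0, it keeps the values in [r0, r0^2) that are coprime to all of those partial products.

```rust
/// Euclid's algorithm for the greatest common divisor of two u64 values.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Pack the known primes into partial primorial products that each fit in a u64.
/// Assumption for this sketch: a new product starts when the next multiply would overflow.
fn primorial_chunks(primes: &[u64]) -> Vec<u64> {
    let mut chunks = Vec::new();
    let mut acc: u64 = 1;
    for &p in primes {
        match acc.checked_mul(p) {
            Some(prod) => acc = prod,
            None => {
                chunks.push(acc);
                acc = p;
            }
        }
    }
    if acc > 1 {
        chunks.push(acc);
    }
    chunks
}

/// Given every prime < r0 in `known`, return the primes in [r0, r0^2).
/// Any composite below r0^2 has a prime factor < r0, so a value in this range
/// is prime exactly when it is coprime to every partial product.
/// Assumes r0 < 2^32 so that r0 * r0 fits in a u64.
fn primes_r0_to_r0_squared(known: &[u64], r0: u64) -> Vec<u64> {
    let chunks = primorial_chunks(known);
    (r0..r0 * r0)
        .filter(|&n| chunks.iter().all(|&c| gcd(n, c) == 1))
        .collect()
}
```

For example, calling primes_r0_to_r0_squared(&[2, 3, 5], 7) returns the twelve primes from 7 to 47. Taking r0 = 47 with the primes below 47 as the known list then covers [47, 2209), and the process can be repeated with the enlarged list.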

[Figure: Primes r0 to r0^2. Vertical axis: Number of Primes (0 to 30000); horizontal axis: Number of Pn Primorial Primes (1 to 100).]

This graph shows the number of consecutive primes in the regions [r0, r0^2) for generator moduli made with the first 100 primes. Thus for the last data point for p100 = 541, from r0 = 547 to r0^2 = 299,209 there are 25,836 primes|residues, and we now know the first 25,936 primes, with 299,197 the largest prime.

Using this approach we no longer have to even identify the residue pairs, but just maintain and use the growing modulus values to perform the gcd operations with. The key here is to do the gcd operations on chunks of partial primorial values as we identify more primes and not one humongous pm# value.

Thus as we identify new primes, we make partial primorial chunks with them. To check if a value is a residue we perform repeated gcd tests with all the partial primorial chunks. If the gcd with any partial chunk is not 1 (i.e. the value is not coprime to it) then that rc value isn’t a residue and we can stop testing it. Only rc values that pass all the partial chunks tests (done in parallel) are residues to the full modpn value, and thus are new primes.

The main job for this machine would be to control the creation, distribution, and storage of the gcd operations, and their results, performed by a distributed network of compute engines. For each range [r0, r0^2) it would use the PGS for some smaller Pn (e.g. P3’s PGS in the code, to reduce the residue candidates (rc) search space to 1/3 of the range values) and distribute the rcs for testing. After creating a list of new consecutive primes, it can be processed to identify new primes or k-tuples of any type.

Source Code

The SSoZ is a good algorithm to assess hardware and software multi-threading capabilities. It’s very simple mathematically, needing only basic computational functions which most languages have, and which are easy to implement if they don’t.
The implementations I provide should be considered as references and not necessarily optimum for each language. They should be considered as starting points to improve upon, as they, most importantly, produce correct results that other implementations can check results against.

The code source files can be found here [6]: https://gist.github.com/jzakiya, and individually below.

twinprimes_ssoz
  Crystal – https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4
  Rust – https://gist.github.com/jzakiya/b96b0b70cf377dfd8feb3f35eb437225
  Nim – https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
  C++ – https://gist.github.com/jzakiya/fa76c664c9072ddb51599983be175a3f
  Go – https://gist.github.com/jzakiya/fbc77b8fdd12b0581a0ff7c2476373d9
  D – https://gist.github.com/jzakiya/ae93bfa03dbc8b25ccc7f97ff8ad0f61

cousinprimes_ssoz
  Crystal – https://gist.github.com/jzakiya/0d6987ee00f3708d6cfd6daee9920bd7
  Rust – https://gist.github.com/jzakiya/8879c0f4dfda543eaf92a3186de554d7
  Nim – https://gist.github.com/jzakiya/e2fa7211b52a4aa34a4de932010eac69
  C++ – https://gist.github.com/jzakiya/3799bd8604bdcba34df5c79aae6e55ac
  Go – https://gist.github.com/jzakiya/0ea756a8f6fd09f56cd9374d0dcf4197
  D – https://gist.github.com/jzakiya/147747d391b5b0432c7967dd17dae124

Conclusion

Prime Generators allow for the creation of efficient, simple, and resource sparse generic algorithms that can be performed with any Pn generator. Generators can dynamically be chosen to optimize speed and memory use for given number ranges, to best use the hardware and software resources available.

The SSoZ algorithms are inherently implementable in parallel, and can be performed on any hardware or distributed system that provides multiple cores or compute engines. As shown, the more cores and threads that are available to use the higher the inherent performance will be for a given number range.

While the code to generate Twin and Cousin primes was shown here, the basic math and principles explaining the process for them can be applied similarly to find other k-tuples, and other specific prime types, such as Mersenne Primes [2].

It is hoped this detailed explanation of how the SSoZ works and performs will encourage its use in applications, and its inclusion in software libraries and other tools that are used in the study of primes.

References

[1] The Segmented Sieve of Zakiya (SSoZ) – https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ
[2] The Use of Prime Generators to Implement Fast Twin Prime Sieve of Zakiya (SoZ), Applications to Number Theory and Implications for the Riemann Hypotheses – https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_for_the_Riemann_Hypotheses
[3] On The Infinity of Twin Primes and other K-tuples – https://www.academia.edu/41024027/On_The_Infinity_of_Twin_Primes_and_other_K_tuples
[4] (Simplest) Proof of Twin Primes and Polignacs’ Conjectures (video) – https://www.youtube.com/watch?v=HCUiPknHtfY&t=940s
[5] Primesieve – https://github.com/kimwalisch/primesieve
[6] Twins|Cousins SSoZ software language source files – https://gist.github.com/jzakiya
[7] Great Internet Mersenne Prime Search (GIMPS) – https://www.mersenne.org/
[8] Twins Primes Search – https://primes.utm.edu/bios/page.php?id=949

# This Crystal source file is a multiple threaded implementation to perform an
# extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.

# Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
# Output is the number of twin primes <= N, or in range N1 to N2; the last
# twin prime value for the range; and the total time of execution.

# This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
# 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
# probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

# Compile as: $ crystal build twinprimes_ssozgist.cr -Dpreview_mt --release
# To reduce binary size do: $ strip twinprimes_ssoz
# Thread workers default to 4, set to system max for optimum performance.
# Single val: $ CRYSTAL_WORKERS=8 ./twinprimes_ssoz val1
# Range vals: $ CRYSTAL_WORKERS=8 ./twinprimes_ssoz val1 val2

# Mathematical and technical basis for implementation are explained here:
# https://www.academia.edu/37952623/The_Use_of_Prime_Generators_to_Implement_Fast_
# Twin_Primes_Sieve_of_Zakiya_SoZ_Applications_to_Number_Theory_and_Implications_
# for_the_Riemann_Hypotheses
# https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_
# https://www.academia.edu/19786419/PRIMES-UTILS_HANDBOOK

# This source code, and its updates, can be found here:
# https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4

# This code is provided free and subject to copyright and terms of the
# GNU General Public License Version 3, GPLv3, or greater.
# License copy/terms are here: http://www.gnu.org/licenses/

# Copyright (c) 2017-2022; Jabari Zakiya -- jzakiya at gmail dot com
# Last update: 2022/05/22

# Customized gcd for prime generators; n > m; m odd
def gcd(m, n)
  while m|1 != 1; t = m; m = n % m; n = t end
  m
end

# Compute modular inverse a^-1 to base m, e.g. a*(a^-1) mod m = 1
def modinv(a0, m0)
  return 1 if m0 == 1
  a, m = a0, m0
  x0, inv = 0, 1
  while a > 1
    inv -= (a // m) * x0
    a, m = m, a % m
    x0, inv = inv, x0
  end
  inv += m0 if inv < 0
  inv
end

def gen_pg_parameters(prime)
  # Create prime generator parameters for given Pn
  puts "using Prime Generator parameters for P#{prime}"
  primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
  modpg, res_0 = 1, 0                  # compute Pn's modulus and res_0 value
  primes.each { |prm| res_0 = prm; break if prm > prime; modpg *= prm }

  restwins = [] of Int32               # save upper twinpair residues here
  inverses = Array.new(modpg + 2, 0)   # save Pn's residues inverses here
  pc, inc, res = 5, 2, 0               # use P3's PGS to generate pcs
  while pc < (modpg >> 1)              # find PG's 1st half residues
    if gcd(pc, modpg) == 1             # if pc a residue
      mc = modpg - pc                  # create its modular complement
      inverses[pc] = modinv(pc, modpg) # save pc and mc inverses
      inverses[mc] = modinv(mc, modpg) # if in twinpair save both hi residues
      restwins << pc << mc + 2 if res + 2 == pc
      res = pc                         # save current found residue
    end
    pc += inc; inc ^= 0b110            # create next P3 sequence pc: 5 7 11 13 17 19 ...
  end
  restwins.sort!; restwins << (modpg + 1)  # last residue is last hi_tp
  inverses[modpg + 1] = 1; inverses[modpg - 1] = modpg - 1  # last 2 residues are self inverses
  {modpg, res_0, restwins.size, restwins, inverses}
end

def set_sieve_parameters(start_num, end_num)
  # Select at runtime best PG and segment size parameters for input values.
  # These are good estimates derived from PG data profiling. Can be improved.
  nrange = end_num - start_num
  bn, pg = 0, 3
  if end_num < 49
    bn = 1; pg = 3
  elsif nrange < 77_000_000
    bn = 16; pg = 5
  elsif nrange < 1_100_000_000
    bn = 32; pg = 7
  elsif nrange < 35_500_000_000
    bn = 64; pg = 11
  elsif nrange < 14_000_000_000_000
    pg = 13
    if    nrange > 7_000_000_000_000; bn = 384
    elsif nrange > 2_500_000_000_000; bn = 320
    elsif nrange > 250_000_000_000;   bn = 196
    else  bn = 128
    end
  else
    bn = 384; pg = 17
  end
  modpg, res_0, pairscnt, restwins, resinvrs = gen_pg_parameters(pg)
  kmin = (start_num-2) // modpg + 1    # number of resgroups to start_num
  kmax = (end_num - 2) // modpg + 1    # number of resgroups to end_num
  krange = kmax - kmin + 1             # number of resgroups in range, at least 1
  n = krange < 37_500_000_000_000 ? 4 : (krange < 975_000_000_000_000 ? 6 : 8)
  b = bn * 1024 * n                    # set seg size to optimize for selected PG
  ks = krange < b ? krange : b         # segments resgroups size
  puts "segment size = #{ks} resgroups for seg bitarray"
  maxpairs = krange * pairscnt         # maximum number of twinprime pcs
  puts "twinprime candidates = #{maxpairs}; resgroups = #{krange}"
  {modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs}
end

def sozpg(val, res_0, start_num, end_num)
  # Compute the primes r0..sqrt(input_num) and store in 'primes' array.
  # Any algorithm (fast|small) is usable. Here the SoZ for P5 is used.
  md, rscnt = 30u64, 8                 # P5's modulus and residues count
  res  = [7,11,13,17,19,23,29,31]      # P5's residues
  bitn = [0,0,0,0,0,1,0,0,0,2,0,4,0,0,0,8,0,16,0,0,0,32,0,0,0,0,0,64,0,128]

  kmax = (val - 2) // md + 1           # number of resgroups upto input value
  prms = Array(UInt8).new(kmax, 0)     # byte array of prime candidates, init '0'
  modk, r, k = 0, -1, 0                # initialize residue parameters

  loop do                              # for r0..sqrtN primes mark their multiples
    if (r += 1) == rscnt; r = 0; modk += md; k += 1 end  # resgroup parameters
    next if prms[k] & (1 << r) != 0    # skip pc if not prime
    prm_r = res[r]                     # if prime save its residue value
    prime = modk + prm_r               # numerate the prime value
    break if prime > Math.isqrt(val)   # exit loop when it's > sqrtN
    res.each do |ri|                   # mark prime's multiples in prms
      kn,rn = (prm_r * ri - 2).divmod md  # cross-product resgroup|residue
      bit_r = bitn[rn]                 # bit mask for prod's residue
      kpm = k * (prime + ri) + kn      # resgroup for 1st prime mult
      while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end
    end
  end
  # prms now contains the nonprime positions for the prime candidates r0..N
  # extract only primes that are in inputs range into array 'primes'
  primes = [] of UInt64                # create empty dynamic array for primes
  prms.each_with_index do |resgroup, k|  # for each kth residue group
    res.each_with_index do |r_i, i|    # check for each ith residue in resgroup
      if resgroup & (1 << i) == 0      # if bit location a prime
        prime = md * k + r_i           # numerate its value, store if in range
        # check if prime has multiple in range, if so keep it, if not don't
        n, rem = start_num.divmod prime  # if rem 0 then start_num is multiple of prime
        primes << prime if (res_0 <= prime <= val) && (prime * (n + 1) <= end_num || rem == 0)
      end
    end
  end
  primes
end

def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
  # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
  # store consecutively as lo_tp|hi_tp pairs for their restracks.
  nextp = Slice(UInt64).new(primes.size*2)  # 1st mults array for twinpair
  r_hi, r_lo = rhi, rhi - 2            # upper|lower twinpair residue values
  primes.each_with_index do |prime, j| # for each prime r0..sqrt(N)
    k = (prime - 2) // modpg           # find the resgroup it's in
    r = (prime - 2) % modpg + 2        # and its residue value
    r_inv = resinvrs[r].to_u64         # and residue inverse
    rl = (r_inv * r_lo - 2) % modpg + 2  # compute r's ri for r_lo
    rh = (r_inv * r_hi - 2) % modpg + 2  # compute r's ri for r_hi
    kl = k * (prime + rl) + (r * rl - 2) // modpg  # kl 1st mult resgroup
    kh = k * (prime + rh) + (r * rh - 2) // modpg  # kh 1st mult resgroup
    kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)
    kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)
    nextp[j * 2]     = kl.to_u64       # prime's 1st mult lo_tp resgroup val in range
    nextp[j * 2 | 1] = kh.to_u64       # prime's 1st mult hi_tp resgroup val in range
  end
  nextp
end

def twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
  # Perform in thread the ssoz for given twinpair residues for kmax resgroups.
  # First create|init 'nextp' array of 1st prime mults for given twinpair,
  # stored consecutively in 'nextp', and init seg array for ks resgroups.
  # For sieve, mark resgroup bits to '1' if either twinpair restrack is nonprime
  # for primes mults resgroups, and update 'nextp' restrack slices accordingly.
  # Return the last twinprime|sum for the range for this twinpair residues.
  s = 6                                # shift value for 64 bits
  bmask = (1 << s) - 1                 # bitmask val for 64 bits
  sum, ki, kn = 0_u64, kmin-1, ks      # init these parameters
  hi_tp, k_max = 0_u64, kmax           # max twinprime|resgroup
  seg = Slice(UInt64).new(((ks - 1) >> s) + 1)          # seg array of ks resgroups
  ki    += 1 if ((ki * modpg) + r_hi - 2) < start_num   # ensure lo tp in range
  k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num  # ensure hi tp in range
  nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs)  # init nextp array
  while ki < k_max                     # for ks size slices upto kmax
    kn = k_max - ki if ks > (k_max - ki)  # adjust kn size for last seg
    primes.each_with_index do |prime, j|  # for each prime r0..sqrt(N)
      # for lower twinpair residue track
      k = nextp.to_unsafe[j * 2]       # starting from this resgroup in seg
      while k < kn                     # mark primenth resgroup bits prime mults
        seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
        k += prime end                 # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2] = k - kn  # save 1st resgroup in next eligible seg
      # for upper twinpair residue track
      k = nextp.to_unsafe[j * 2 | 1]   # starting from this resgroup in seg
      while k < kn                     # mark primenth resgroup bits prime mults
        seg.to_unsafe[k >> s] |= 1_u64 << (k & bmask)
        k += prime end                 # set resgroup for prime's next multiple
      nextp.to_unsafe[j * 2 | 1]= k - kn  # save 1st resgroup in next eligible seg
    end
    # set as nonprime unused bits in last seg[n]
    # so fast, do for every seg[i]
    seg.to_unsafe[(kn - 1) >> s] |= ~1u64 << ((kn - 1) & bmask)
    cnt = 0                            # count the twinprimes in the segment
    seg[0..(kn - 1) >> s].each { |m| cnt += (~m).popcount }
    if cnt > 0                         # if segment has twinprimes
      sum += cnt                       # add segment count to total range count
      upk = kn - 1                     # from end of seg, count back to largest tp
      while seg.to_unsafe[upk >> s] & (1_u64 << (upk & bmask)) != 0; upk -= 1 end
      hi_tp = ki + upk                 # set its full range resgroup value
    end
    ki += ks                           # set 1st resgroup val of next seg slice
    seg.fill(0) if ki < k_max          # set next seg to all primes if in range
  end
  # when sieve done, numerate largest twin
  # for ranges w/o twins set largest to 1
  hi_tp = (r_hi > end_num || sum == 0) ? 1 : hi_tp * modpg + r_hi
  {hi_tp.to_u64, sum.to_u64}           # return largest twinprime|twins count
end

def twinprimes_ssoz()
  end_num   = {ARGV[0].to_u64, 3u64}.max
  start_num = ARGV.size > 1 ? {ARGV[1].to_u64, 3u64}.max : 3u64
  start_num, end_num = end_num, start_num if start_num > end_num

  start_num |= 1                       # if start_num even increase by 1
  end_num = (end_num - 1) | 1          # if end_num even decrease by 1
  start_num = end_num = 7 if end_num - start_num < 2

  puts "threads = #{System.cpu_count}"
  ts = Time.monotonic                  # start timing sieve setup execution
                                       # select Pn, set sieving params for inputs
  modpg, res_0, ks, kmin, kmax, krange, pairscnt, restwins, resinvrs = set_sieve_parameters(start_num, end_num)
  # create sieve primes <= sqrt(end_num), only use those whose multiples within inputs range
  primes = end_num < 49 ? [5] : sozpg(Math.isqrt(end_num), res_0, start_num, end_num)
  puts "each of #{pairscnt} threads has nextp[2 x #{primes.size}] array"

  lo_range = restwins[0] - 3           # lo_range = lo_tp - 1
  twinscnt = 0_u64                     # determine count of 1st 4 twins if in range for used Pn
  twinscnt += [3, 5, 11, 17].select { |tp| start_num <= tp <= lo_range }.size unless end_num == 3

  te = (Time.monotonic - ts).total_seconds.round(6)
  puts "setup time = #{te} secs"       # display sieve setup time
  puts "perform twinprimes ssoz sieve"
  t1 = Time.monotonic                  # start timing ssoz sieve execution

  cnts = Array(UInt64).new(pairscnt, 0)      # number of twinprimes found per thread
  lastwins = Array(UInt64).new(pairscnt, 0)  # largest twinprime val for each thread
  done = Channel(Nil).new(pairscnt)
  threadscnt = Atomic.new(0)           # count of finished threads

  restwins.each_with_index do |r_hi, i|      # sieve twinpair restracks
    spawn do
      lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
      print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
      done.send(nil)
    end
  end
  pairscnt.times { done.receive } # 
wait for all threads to finish print "\r#{pairscnt} of #{pairscnt} twinpairs done" last_twin = lastwins.max # find largest hi_tp twinprime in range twinscnt += cnts.sum # compute number of twinprimes in range last_twin = 5 if end_num == 5 && twinscnt == 1 kn = krange % ks # set number of resgroups in last slice kn = ks if kn == 0 # if multiple of seg size set to seg size t2 = (Time.monotonic - t1).total_seconds # sieve execution time puts puts puts puts end "\nsieve time = #{t2.round(6)} secs" # ssoz sieve time "total time = #{(t2 + te).round(6)} secs" # setup + sieve time "last segment = #{kn} resgroups; segment slices = #{(krange - 1)//ks + 1}" "total twins = #{twinscnt}; last twin = #{last_twin - 1}+/-1" twinprimes_ssoz 37
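The bit-packed segment handling inside twins_sieve is easier to follow on a toy case. The sketch below is illustrative only and is not part of the listing: the helpers mark_multiples and count_unmarked are hypothetical names, and the segment size (10 resgroups) and the starting resgroups for primes 3 and 5 are arbitrary values chosen by hand, not values produced by nextp_init. It uses checked indexing rather than to_unsafe, and shows three steps from the sieve loop: marking every prime-th bit, carrying the overshoot (k - kn) into the next segment, and setting the unused high bits of the last word before counting zero bits with popcount.

```crystal
S     = 6             # shift value for 64-bit words, as in the listing
BMASK = (1 << S) - 1  # mask for the bit position inside a word

# Hypothetical helper: mark every prime-th resgroup bit from `first` up to kn,
# then return where the next multiple falls in the following segment.
def mark_multiples(seg : Slice(UInt64), first : Int32, prime : Int32, kn : Int32) : Int32
  k = first
  while k < kn
    seg[k >> S] |= 1_u64 << (k & BMASK)
    k += prime
  end
  k - kn
end

# Hypothetical helper: set the unused bits above resgroup kn-1 in the last
# word, then count the zero (unmarked) bits in the words holding kn resgroups.
def count_unmarked(seg : Slice(UInt64), kn : Int32) : Int32
  last = (kn - 1) >> S
  seg[last] |= ~1_u64 << ((kn - 1) & BMASK)
  cnt = 0
  (0..last).each { |i| cnt += (~seg[i]).popcount.to_i32 }
  cnt
end

kn  = 10                                      # toy segment of 10 resgroups
seg = Slice(UInt64).new(((kn - 1) >> S) + 1)  # zero-initialized: all candidates
next3 = mark_multiples(seg, 1, 3, kn)         # marks resgroups 1, 4, 7
next5 = mark_multiples(seg, 2, 5, kn)         # marks resgroups 2, 7
puts "carry-over offsets: #{next3}, #{next5}"            # 0, 2
puts "unmarked resgroups: #{count_unmarked(seg, kn)}"    # 6
```

Because the unused bits are set rather than cleared, inverting each word and taking its popcount counts only resgroups inside the segment. That is why the listing can apply the same mask to every segment instead of treating the last one as a special case.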