User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Pollard Rho loop analysis

Fri May 21, 2021 5:35 pm

In thread "gmp (Gnu multiple precision arithmetic lib) 5x faster on intel i7 than on Pi400 (two samples)"
viewtopic.php?f=33&t=311893
factorization algorithm based on Pollard Rho method with arbitrary precision arithmetic was presented.

In this thread I want to look into the loops that are built by function "x->(x^2+a) mod n", starting with n prime, and later n=p*q product of primes.

I said "to look", by that I mean using graphviz for vizualisation (sudo apt install graphviz).
This is basic small C-program for creating graphviz dot file:

Code: Select all

$ cat loop.c 
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  unsigned n=atoi(argv[1]);
  unsigned a=atoi(argv[2]);
  unsigned x;

  printf("digraph D {\n");
  printf("size=8 8\n");
  for(x=0; x<n; ++x)
    printf("%u->%u\n", x, (x*x+a)%n);
  printf("}\n");
}
$

It is used by bash script "mak" fot creating digraph for parameters n and a, and display the result:

Code: Select all

$ cat mak
#!/bin/bash
./loop $1 $2 > $1_$2.dot
neato -Tsvgz $1_$2.dot > $1_$2.svgz
eog $1_$2.svgz
$
For the kind of digraphs created, force-based "neato" is the best layout algorithm, and not the standard "dot" or any other of the graphviz layout program alternatives.

Taking prime 83 as example, this is n=83 a=0, showing big diameter cycles:
83_0.svgz.png
83_0.svgz.png
83_0.svgz.png (77.06 KiB) Viewed 2776 times

n=83 and a=1 leads to very small diameter circles, but some very long paths to the circles:
83_1.svgz.png
83_1.svgz.png
83_1.svgz.png (62.83 KiB) Viewed 2776 times

Finally n=83 and a=15 leads to mixture of long diameter circles and long paths:
83_15.svgz.png
83_15.svgz.png
83_15.svgz.png (70.47 KiB) Viewed 2776 times
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

ejolson
Posts: 8055
Joined: Tue Mar 18, 2014 11:47 am

Re: Pollard Rho loop analysis

Fri May 21, 2021 5:42 pm

HermannSW wrote:
Fri May 21, 2021 5:35 pm
Finally n=83 and a=15 leads to mixture of long diameter circles and long paths:
The graph is supposed to look like

ρ

right? I think I see that happening.

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Fri May 21, 2021 5:55 pm

ejolson wrote:
Fri May 21, 2021 5:42 pm
The graph is supposed to look like

ρ

right? I think I see that happening.
Yes, you are right:
https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm


I looked into bigger "n"s, and that was the reason I did choose ".svgz" format for the generated graphs:
SVG (scalable vector graphics) is scalable, and .svgz uses compression as well to keep size low.

This is graph for biggest prime n=9973 below 10000. As you can see there are many circles:
9973_0.svgz.png
9973_0.svgz.png
9973_0.svgz.png (71.85 KiB) Viewed 2750 times

Zooming in to up to 2000% with gnome image viewer "eog" shows the fine details, parts from two big circles nearly touching each other:
9973_0.svgz.2000%.png
9973_0.svgz.2000%.png
9973_0.svgz.2000%.png (122.3 KiB) Viewed 2750 times

I wrote a small program to analyze the circles (it allocates two arrays of size 4*n, so n cannot be too large):
https://gist.github.com/Hermann-SW/83de ... 641c26c590

Code: Select all

$ ./analyze 5 0
1 1 
0: 1 0 1
1: 2 0 1
2: 2 2 3
3: 2 2 3
4: 2 1 2
c 2
m 2 1.0
l 1 1.0
lm 3 2.0
lm2 3
$
The first line says that two circles of length 1 are found (self-loops from number to itself, here 0->0 and 1->1 for "x->(x^2+0) mod 5".
The rows below tell for each number in range 0..n-1 (before the colon) values l, m and l+m.
"c 2" mean two circles found.
Then for l (length of circle), m (length of path until number on cycle reached) and lm (lm[x]=l[x]+m[x]) are reported.
First number is overall maximum, 2nd number is average over all numbers 0..n-1.
Finally lm2 reports lm[2] for start number "2" which is default start value for Pollard Rho method (which has default "a=1").


Using the analysis for n=9973 and a=0 we get more information about the image shown:

Code: Select all

$ ./analyze 9973 0 | grep -v :
1 1 92 92 92 92 92 92 276 276 92 276 276 92 276 276 92 2 6 
c 19
m 2 1.2
l 276 213.9
lm 278 215.1
lm2 94
$
There are 19 circles in total, 6 of which have length 276 and 9 of which have length 92.
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Fri May 21, 2021 6:18 pm

Now lets look at N=997 and a=0:
997_0.svgz.png
997_0.svgz.png
997_0.svgz.png (120.63 KiB) Viewed 2735 times

As can be seen there are three big circles, and analysis says they are of length 82:

Code: Select all

$ ./analyze 997 0 | grep -v :
1 1 82 82 82 2 
c 6
m 2 1.2
l 82 81.0
lm 84 82.2
lm2 84
$
Now doing prime factorization for n-1=996 = 2^2 * 3 * 83, biggest prime factor 83 is just 1 bigger than longest circles found in analysis. This phenomenon was true for nearly all "n"s I tried with a=0.

For finding biggest circles, I looked at primes p, where (p-1)/2 is also prime (safe primes):
https://oeis.org/A005385

This is graph for safe prime n=107 (a=0):
107_0.svgz.png
107_0.svgz.png
107_0.svgz.png (80.89 KiB) Viewed 2735 times
Analysis shows a single big circle of length 52, and 107=2*53+1:

Code: Select all

$ ./analyze 107 0 | grep -v :
1 1 52 
c 3
m 1 0.5
l 52 50.6
lm 53 51.1
lm2 53
$

So likely "a=0" is not a good choice for Pollard Rho factorization algorithm.
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Fri May 21, 2021 6:38 pm

Now looking at RHO-100 1st RSA factoring challenge number:
https://en.wikipedia.org/wiki/RSA_numbers#RSA-100

The factors p and q are no safe primes, but both [p-1 and q-1] have a very long biggest prime factor (18/26 digits):
$ ./factorize 37975227936943673922808872755445627854565536638198
37975227936943673922808872755445627854565536638198: 2 3167 3613 587546788471 3263521422991 865417043661324529
$
$ ./factorize 40094690950920881030683735292761468389214899724060
40094690950920881030683735292761468389214899724060: 2 2 5 41 2119363 602799725049211 38273186726790856290328531
$
$ bc -q
n=1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
p=37975227936943673922808872755445627854565536638199
q=40094690950920881030683735292761468389214899724061
n-p*q
0
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Sun May 23, 2021 8:30 pm

All Pollard Rho algorithms I have seen sofar use "x->(x^2+a) mod n" with different, but fixed values for a.
We have seen that a=0 is likely not a good choice.

Some "a"s are good for some inputs, but not that good for other inputs.

I found a method to use a mixture of a1, a2, ..., ak, use:
(x->(x^2+a1) mod n) + n
then
(x->(x^2+a2) mod n) + 2*n
then
...
finally
x->(x^2+ak) mod n
and then repeat with a1, a2, ...

For multiplication, difference building and gcd() computation, the multiple of n offsets don't care.
I need to rewrite analysis.c tool to report numbers for using a1, a2, ..., ak ...
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Thu Jun 10, 2021 10:15 pm

In other thread posting simple hare and tortoise Pollard Rho algorithm was started in parallel on 9 cores with different values for "a":
viewtopic.php?f=33&t=311893&p=1876289#p1876289

Instead of having to wait for 37 seconds to get 31digit number factored with default value "a=1", first program stopped with factorization after only 8 seconds:
multi-factorize.png
multi-factorize.png
multi-factorize.png (46.72 KiB) Viewed 2445 times

Question now is how to choose set of 24 "a" values (I have access to a machine with 24 physical cores) so that first factorization gets reported fast. It needs to be considered that algorithm gets recursively called with "a+1" in case "gcd(hare-tortoise,n)==n":
https://gist.github.com/Hermann-SW/bfa0 ... rize-c-L73

Because of that choosing successive numbers for "a" as in above example is likely not good ...
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Thu Jun 10, 2021 11:02 pm

Next experiment, factoring (2^61-1)^2 that took 1534 seconds on i7.
I started 24 programs in background, with values "1 and primes <89" for a.
First program finished in 178 seconds only!
Default a=1 took 1631 seconds.
Last finisher was a=53 in 2240 seconds:
# for i in 1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83; do ./factorize 5316911983139663487003542222693990401 $i & done
# 5316911983139663487003542222693990401(37): 2305843009213693951 2305843009213693951
[239379195] 178081430292ns
5316911983139663487003542222693990401(61): 2305843009213693951 2305843009213693951
[381929703] 296462510871ns
5316911983139663487003542222693990401(79): 2305843009213693951 2305843009213693951
[564229691] 463837995445ns
5316911983139663487003542222693990401(2): 2305843009213693951 2305843009213693951
[916265383] 727448510261ns
5316911983139663487003542222693990401(29): 2305843009213693951 2305843009213693951
[966671113] 730703246286ns
5316911983139663487003542222693990401(11): 2305843009213693951 2305843009213693951
[1060573965] 791627178335ns
5316911983139663487003542222693990401(73): 2305843009213693951 2305843009213693951
[1158177900] 861668098355ns
5316911983139663487003542222693990401(71): 2305843009213693951 2305843009213693951
[1195387796] 929350141287ns
5316911983139663487003542222693990401(83): 2305843009213693951 2305843009213693951
[1246192557] 929880773927ns
5316911983139663487003542222693990401(23): 2305843009213693951 2305843009213693951
[1335540845] 1009334758484ns
5316911983139663487003542222693990401(17): 2305843009213693951 2305843009213693951
[1432697673] 1073120510105ns
5316911983139663487003542222693990401(59): 2305843009213693951 2305843009213693951
[1448906908] 1096388994755ns
5316911983139663487003542222693990401(13): 2305843009213693951 2305843009213693951
[1801257367] 1346426834616ns
5316911983139663487003542222693990401(7): 2305843009213693951 2305843009213693951
[1809569828] 1347825243786ns
5316911983139663487003542222693990401(31): 2305843009213693951 2305843009213693951
[1861347840] 1390923977909ns
5316911983139663487003542222693990401(47): 2305843009213693951 2305843009213693951
[1880798495] 1400540741266ns
5316911983139663487003542222693990401(67): 2305843009213693951 2305843009213693951
[1900420809] 1415451734862ns
5316911983139663487003542222693990401(41): 2305843009213693951 2305843009213693951
[2003126736] 1493349090289ns
5316911983139663487003542222693990401(43): 2305843009213693951 2305843009213693951
[2133065076] 1589138138134ns
5316911983139663487003542222693990401(1): 2305843009213693951 2305843009213693951
[2163790305] 1631049634692ns
5316911983139663487003542222693990401(19): 2305843009213693951 2305843009213693951
[2212260579] 1648327735182ns
5316911983139663487003542222693990401(5): 2305843009213693951 2305843009213693951
[2511875037] 1945819858899ns
5316911983139663487003542222693990401(3): 2305843009213693951 2305843009213693951
[2932193235] 2190418722150ns
5316911983139663487003542222693990401(53): 2305843009213693951 2305843009213693951
[3010950498] 2240799600863ns
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Fri Jun 11, 2021 6:06 am

I tried to factor 54 decimal digit square of Mersenne prime M89=(2^89-1).
There are 46 primes below 200, and I chose 1 and 65537 as values for "a" in addition (see very right of long lines).
I had to kill all processes now, after more than 6 CPU hours runtime on each, with none of the 48 programs having reported factorization:

Code: Select all

# ps -wl | grep "factorize [0-9]*6321" | head -3
R     0 116151 111188  6888   624 pts7  23:06 06:20:57 ./factorize 383123885216472214589586755549637256619304505646776321 1
R     0 116152 111188  6888   632 pts7  23:06 06:21:54 ./factorize 383123885216472214589586755549637256619304505646776321 2
R     0 116153 111188  6888   624 pts7  23:06 06:20:36 ./factorize 383123885216472214589586755549637256619304505646776321 3
# ps -wl | grep "factorize [0-9]*6321" | tail -3
R     0 116196 111188  6888   684 pts7  23:06 06:21:54 ./factorize 383123885216472214589586755549637256619304505646776321 197
R     0 116197 111188  6888   680 pts7  23:06 06:41:54 ./factorize 383123885216472214589586755549637256619304505646776321 199
R     0 116198 111188  6888   684 pts7  23:06 06:19:31 ./factorize 383123885216472214589586755549637256619304505646776321 65537
#

So better algorithm and/or better "set of 48 'a's" is needed to factor M89^2 ...
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Fri Jun 11, 2021 11:55 am

I did add original brent cycle finding algorithm as option to factorize.c:
https://gist.github.com/Hermann-SW/bfa0 ... /revisions

Now simple help is printed when run without arguments:

Code: Select all

# ./factorize 
factorize [n [a] [floyd|brent]]
  Determines 1st factor for composite n with timing;
  hangs indefinitely for prime n.
# 

"brent" option is typically more than double as fast than floyd.
I did repeat the 24 parallel programs run to factor (2 ^61-1)^2.
First solution was reported after 68 seconds for brent, compared to 178 seconds for floyd.
The order of completion for the different "a" values is identical for first three entries only (68/114/180 vs. 178/296/463 seconds):
# for i in 1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83; do ./factorize 5316911983139663487003542222693990401 $i brent & done
# 5316911983139663487003542222693990401(37): 2305843009213693951 2305843009213693951
[239379200] 68965957453ns (brent)
5316911983139663487003542222693990401(61): 2305843009213693951 2305843009213693951
[381929728] 114078558342ns (brent)
5316911983139663487003542222693990401(79): 2305843009213693951 2305843009213693951
[564229696] 180483982185ns (brent)
5316911983139663487003542222693990401(71): 2305843009213693951 2305843009213693951
[597693920] 189778132962ns (brent)
5316911983139663487003542222693990401(11): 2305843009213693951 2305843009213693951
[707049312] 203749813136ns (brent)
5316911983139663487003542222693990401(2): 2305843009213693951 2305843009213693951
[916265408] 248934910426ns (brent)
...
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Sun Jun 13, 2021 2:56 pm

There are two reasons why "Brent cycle finding" is faster that "Floyd cycle finding" in factorize.c:
  1. for Floyd there are 3 f() computations per loop [f(x) and f(f(y))]
    https://gist.github.com/Hermann-SW/bfa0 ... rize-c-L56
    while for Brent there is one "f(x)" inside loop
    https://gist.github.com/Hermann-SW/bfa0 ... ize-c-L112
    and one matching outside the loop
    https://gist.github.com/Hermann-SW/bfa0 ... ize-c-L135
  2. the costly "gcd()" is computed for each loop for Floyd, but only every (when k becomes >=32) 32nd loop for Brent
    https://gist.github.com/Hermann-SW/bfa0 ... ize-c-L120
So ratio 3/2 for f() computation and 32/1 for gcd() computation per loop for Floyd compared to Brent, resulting in Brent being more than 2x faster than Floyd.


While Brent is faster than Floyd for Pollard Rho, it is no magic.
I did start 40 "Brent" factorize programs Friday night (on CPUs with 24 cores/48 threads), with values for "a" of 1..20, 101..110 and 1001..1010.
None of them has completed yet after 40 hours of runtime each, to factor 54 decimal digit number (2^89-1)^2 ...
2021-06-13-163323_800x480_scrot.png
2021-06-13-163323_800x480_scrot.png
2021-06-13-163323_800x480_scrot.png (111.82 KiB) Viewed 2311 times

Code: Select all

$ bc -q
(2^89-1)^2
383123885216472214589586755549637256619304505646776321

P.S:
For a prime p: 2^(p-1) mod p = 1
For product of two primes n=p*q: 2^((p-1)*(q-1)) mod n = 1
"bc -q" output:

Code: Select all

print n, "=", p, "*", q, "\n"
1081=23*47
(2^(p-1))%p
1
(2^(q-1))%q
1
(2^(n-1))%n
165
(2^((p-1)*(q-1)))%n
1
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Tue Jun 15, 2021 11:11 pm

I wanted to know more about the cycle lengths of "f(x)=(x^2+a) mod n", for simplicity with a=0. Instead of composite numbers n=p*q being product of two primes, for analysis looking at primes n. This is diagram for "f(x)=(x^2+0) mod 31":

Code: Select all

30: 2 3 5 | 1 1 4 4 2 4 
31_0.svgz.jpg
31_0.svgz.jpg (20.45 KiB) Viewed 2245 times

I used original "factorize.c" from gmplib to factor "p-1" for prime "p", and then used "analyze.c" shown earlier in this thread to determine the cycle lengths:
$ for p in 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97;\
> do for t in `./factorize.orig $(($p-1))`; do echo -n "$t "; done;\
> echo -n " | "; ./analyze $p 0 | grep -v : | head -1; done
2: 2 | 1 1
4: 2 2 | 1 1
6: 2 3 | 1 1 2
10: 2 5 | 1 1 4
12: 2 2 3 | 1 1 2
16: 2 2 2 2 | 1 1
18: 2 3 3 | 1 1 6 2
22: 2 11 | 1 1 10
28: 2 2 7 | 1 1 3 3
30: 2 3 5 | 1 1 4 4 2 4
36: 2 2 3 3 | 1 1 6 2
40: 2 2 2 5 | 1 1 4
42: 2 3 7 | 1 1 3 6 2 3 6
46: 2 23 | 1 1 11 11
52: 2 2 13 | 1 1 12
58: 2 29 | 1 1 28
60: 2 2 3 5 | 1 1 4 4 4 2
66: 2 3 11 | 1 1 10 10 10 2
70: 2 5 7 | 1 1 12 4 12 3 3
72: 2 2 2 3 3 | 1 1 6 2
78: 2 3 13 | 1 1 12 12 12 2
82: 2 41 | 1 1 20 20
88: 2 2 2 11 | 1 1 10
96: 2 2 2 2 2 3 | 1 1 2
$

Most times biggest prime factor of "p-1" (minus 1) is a cycle length of "p":

Code: Select all

52: 2 2 13  |  1 1 12

Sometimes biggest cycle length of "p" is smaller (11) than biggest prime factor of "p-1":

Code: Select all

46: 2 23  |  1 1 11 11 

And sometimes it is bigger (6):

Code: Select all

72: 2 2 2 3 3  |  1 1 6 2 

Not that easy to fully describe how the relation is ...


P.S:
Now nearly 40*100=4000 CPU hours completed since last Friday night, still Brent cycle finding Pollard Rho algorithm has not factorized 54 decimal digit number (2^89-1)^2 for any of the 40 "a" values:
(on workstation with two 2.6GHz Intel® Xeon® Gold 6126 processors with 24 cores (48 hardware threads) in total)
nearly_4000_CPU_hours.jpg
nearly_4000_CPU_hours.jpg
nearly_4000_CPU_hours.jpg (235.68 KiB) Viewed 2257 times
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Wed Jun 16, 2021 6:59 pm

I got more insight into the number of cycles and maximal cycle lengths!
And I used the Online Encyclopedia of Integer Sequences (oeis.org) for that.

#cycles can be determined by copy+paste of the numbers reported into oeis.org input field:

Code: Select all

$ for((i=3; i<=100; ++i)); do ./analyze $i 0 | grep -v : | head -2 | tail -1 | cut -f2 -d\ ; done
There is exactly one matching sequence A023153:
https://oeis.org/search?q=2+2+2+4+3+2+3 ... &go=Search

Maximal cycle length:

Code: Select all

$ for((i=1; i<=88; ++i)); do ./analyze $i 0 | grep -v : | head -1 | sed "s/ /,/g" | python -c "print(max(input()))"; done
Again exactly one matching sequence A279186:
https://oeis.org/search?q=1+1+1+1+1+1+2 ... &go=Search
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Tue Jun 22, 2021 11:02 pm

Factorization is still running, 40 processes with 263.5 CPU hours each is 10540 CPU hours in total.

I always thought that maximal cycle length in Pollard Rho is of order sqrt(n), but that is wrong in general.
I learned from sequence A248218 (period of x->(x^2+1) mod n with start value 0)
https://oeis.org/A248218
example a(357911) = 54670, that powers of 71 have long cycles.
I used analyze.c to compute a(71^i) with 2<=i<=6.
And the period length is 770*71^(i-2) for n=71^i.
770/71^2=0.15274747073993255306, so period for those composite numbers is linear.

I used analyze.c
https://gist.github.com/Hermann-SW/83de ... 641c26c590
to further determine all start values that end up on start value 0 cycle (that numbered 1).
A(n) being the number of thos start values, A(71^i) = 3711 * 71^(i-2) for i=2..5:

Code: Select all

$ ./analyze $((71*71)) 1 | grep ": 1 " | wc --lines
3711
$
Now 3711/71^2=0.73616345963102559016, so majority of start values end on start value 0 cycle.

I submitted new results to oeis.org, the draft is in review currently:
https://oeis.org/draft/A248218

Computing a(71^5) with analyze.c did use 26.9g ram during computation.
For computing a(71^6) I did not have access to a computer with 26.9*71=1910GB ram, so I used new period.c
https://gist.github.com/Hermann-SW/eeb0 ... 761ab38998
that only determines period for cycle starting at some given start value.
Took less than an hour (3528 seconds) to compute a(128,100,283,921) = a(71^6) = 19,566,994,370 on Intel i7:

Code: Select all

$ ./period $((71*71*71*71*71*71))
n=128100283921 a=1 s=0: 19566994370 [71536325699] 3528148502656ns
$ 

Let P be period computed. In case start value is on cycle already, then variable x moved P and variable z moved 2*P steps, 3*P steps in total (58.7 billion steps for previous example, those needing 1 CPU hour to compute and compare for equality (hare z and tortoise x)).


P.S
> I submitted new results to oeis.org, the draft is in review currently:
> https://oeis.org/draft/A248218
>
Submission was approved after review process:
  • results under Comments section
  • analyze.c and period.c in Links section
  • explanation in Prog section under "(C)"
https://oeis.org/A248218
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Tue Jun 29, 2021 6:11 pm

I thought that linear cycle length (15%) for n=71^i with 2 <= i <= 5 is bad for runtime of Pollard-Rho factorization algorithm. The opposite is true, every value on start value 0 cycle has gcd of 71 to the value after applying "x -> (x^2 + 1) mod n" exactly 11 times, so factorization completes quickly.
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Tue Jul 13, 2021 12:18 pm

HermannSW wrote:
Fri May 21, 2021 5:35 pm
...
This is basic small C-program for creating graphviz dot file:

Code: Select all

$ cat loop.c 
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  unsigned n=atoi(argv[1]);
  unsigned a=atoi(argv[2]);
  unsigned x;

  printf("digraph D {\n");
  printf("size=8 8\n");
  for(x=0; x<n; ++x)
    printf("%u->%u\n", x, (x*x+a)%n);
  printf("}\n");
}
$

It is used by bash script "mak" for creating digraph for parameters n and a, and display the result:

Code: Select all

$ cat mak
#!/bin/bash
./loop $1 $2 > $1_$2.dot
neato -Tsvgz $1_$2.dot > $1_$2.svgz
eog $1_$2.svgz
$
...
I played with this again, and sometimes after changes in loop.c, I forgot to compile before calling "mak n a" and wodered why things did not change.

One option would be to add Makefile, but then 3 files are needed (Makefile, loop.c and mak).
Instead I reduced everything into single file "loop2":
https://gist.github.com/Hermann-SW/004f ... 2a37eef09f

It is combined bash script and C file.
This part is the excutable bash part:

Code: Select all

#!/bin/bash

#if 0
if [ $0 -nt /tmp/$0 ]; then gcc -xc -Wall -Wextra <(tail -n+2 $0) -o /tmp/$0; fi
/tmp/$0 $1 $2 > $1_$2.dot
neato -Tsvg $1_$2.dot > $1_$2.svg
eog $1_$2.svg
exit
#endif
...
For executable name "loop2" it tests whether "/tmp/loop2" has not been compiled, or whether "loop2" timestamp is newer than that of "/tmp/loop2". If so, it extracts the C portion of "loop2" as file with "<(tail -n+2 $0)" and compiles it (as "C" with "-xc").
I changed output format from compressed SVG to just SVG, because browsers can deal with SVG.
For displaying the graph still gnome viewer is used, but opening the generated SVG in browsers allows to search for strings in the graph labels with CTRL-F.

For supporting browser search, all node labels get pre- and suffixed with non-break space Unicode U+00A0 character (that keeps labels "looking" the same). When calling "lab()" more than once in a print statement, separate buffers are needed:

Code: Select all

char nbsp[3] = { 0xC2, 0xA0, 0x00 };
char buf[10][100];
unsigned int buc=0;
char *lab(unsigned u)
{
  if (++buc==10)  buc=0;
  sprintf(buf[buc], "\"%s%u%s\"", nbsp, u, nbsp);
  return buf[buc];
}
Now searching for label "8" in browser press CTRL+F, and then <STRG+SHIFT>a0<ENTER>8<STRG+SHIFT>a0<ENTER><ENTER>.

This part of C code created the Graphviz graph, and creates different node box shape for labels "x", when difference of "x" and "n-x" has greatest common divisor with "n" greater than 1:

Code: Select all

  printf("digraph D {\n");
  printf("size=\"8 8\"\n");

  for(x=0; x<n; ++x)
    if (gcd(dif(x, n-x), n) > 1)
      printf("%s [shape=box]\n", lab(x));

  for(x=0; x<n; ++x)
    printf("%s->%s\n", lab(x), lab((x*x+a)%n));

  printf("}\n");

Executing this command ...

Code: Select all

pi@raspberrypi4B:~/rho $ ./loop2 91 1

... displays this graph:
2021-07-13-110347_800x480_scrot.png
2021-07-13-110347_800x480_scrot.png
2021-07-13-110347_800x480_scrot.png (55.15 KiB) Viewed 1566 times

If you want to play with loop2, make sure that graphviz is installed ("sudo apt install graphviz").
For using the many features of Graphviz, here is guide:
http://graphviz.org/pdf/dotguide.pdf


Now its time to investigate something new with this tool ...
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Tue Jul 13, 2021 9:01 pm

HermannSW wrote:
Tue Jul 13, 2021 12:18 pm
For using the many features of Graphviz, here is guide:
http://graphviz.org/pdf/dotguide.pdf


Now its time to investigate something new with this tool ...
I played with modified tools and tried non-standard polynomial transformations.
Looking at n=11*19=209 the rectangular box shapes were hard to spot when not zooming in.
So this print statement tells Graphviz to fill the box with lightgrey color:

Code: Select all

printf("%s [shape=box,style=filled,bgcolor=lightgrey]\n", lab(x));

The first experiments went into one direction compared to the trees seen sofar in this thread:
2021-07-13-222154_800x480_scrot.png
2021-07-13-222154_800x480_scrot.png
2021-07-13-222154_800x480_scrot.png (89.85 KiB) Viewed 1517 times

Finally I did a completely different approach, resulting in self-loop 1->1 as the only cycle (9 o'clock of center):
2021-07-13-222351_800x480_scrot.png
2021-07-13-222351_800x480_scrot.png
2021-07-13-222351_800x480_scrot.png (54.66 KiB) Viewed 1517 times

Will see whether the approaches will provide advantages for factorization of n ...
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Thu Jul 29, 2021 8:18 am

For prime p and a in 0..p-1:

Code: Select all

a^(p-1) = 1 mod p

Today I learned that arbitrary precision binary calculator "bc" is not capable of computing above equation for really big exponents. I took smaller prime factor of RHO-100:
https://en.wikipedia.org/wiki/RSA_numbers#RSA-100

And got error using "bc":

Code: Select all

$ bc -q
p=37975227936943673922808872755445627854565536638199
2^(p-1) % p
Runtime error (func=(main), adr=8): exponent too large in raise
$ 

After building gmplib
https://gmplib.org/#DOWNLOAD

building tool "pexpr" in demos ("make pexpr") allows to compute that in less than a second!

Code: Select all

$ export p=37975227936943673922808872755445627854565536638199
$ ./pexpr "2^($p-1) % $p"
1
$ ./pexpr "2^($p-2) % $p"
18987613968471836961404436377722813927282768319100
$ 

P.S:
Playing with Rho100 prime factors:

Code: Select all

$ a="(2*3*5*7*11*13*17*19*23*29*31*37*41*43*47*53*59*61*67*71*73*79*83*89*97)"
$ p=37975227936943673922808872755445627854565536638199
$ q=40094690950920881030683735292761468389214899724061
$ ./pexpr "$a^(($p-1)*($q-1)) % ($p*$q)"
1
$
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Thu Aug 05, 2021 1:21 pm

HermannSW wrote:
Sun Jun 13, 2021 2:56 pm
I did start 40 "Brent" factorize programs Friday night (on CPUs with 24 cores/48 threads), with values for "a" of 1..20, 101..110 and 1001..1010.
None of them has completed yet after 40 hours of runtime each, to factor 54 decimal digit number (2^89-1)^2 ...
HermannSW wrote:
Tue Jun 22, 2021 11:02 pm
Factorization is still running, 40 processes with 263.5 CPU hours each is 10540 CPU hours in total.
Back from hospital I have access to the computer running the factorizations on 40 cores.
What I had not mentioned, I redirected output for "a=$i" to file "out.$i".
Today I logged into system and did an "ls -lst", and found exactly one out file with non-zero length, with a=1:

Code: Select all

# ls -lst out.* | head -2
     4 -rw-r--r--    1 root     root           159 Jul 19 20:37 out.1
     0 -rw-r--r--    1 root     root             0 Jun 11 22:03 out.10
# date
Thu Aug  5 13:00:34 UTC 2021
# 
# 

That means that the other 39 "a"s are still running, each of the 39 cores having taken (19+31+4)*24+15=1335 CPU hours, or 52065(!) CPU hours in total -- without having found factor of "(2^89-1)^2".

"out.1" file with non-0 length completed on July 19, after (19+18)*24+22.5 = 910.5 CPU hours based on file timestamps.
But we know it better from the nanosecond resolution timestamp in factoring output, 910.6 CPU hours!

Code: Select all

# echo "3278071011998301/(10^9*60^2)" | bc -ql
910.57528111063916666666
#
# cat out.1
383123885216472214589586755549637256619304505646776321(1): 618970019642690137449562111 618970019642690137449562111
 [3201775069376] 3278071011998301ns (brent)
# 

The computer had 2 CPUs with 12 cores each, and 48 hardware threads, here are the details:

Code: Select all

# cat /proc/cpuinfo | tail -26 | head -9
processor	: 47
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
stepping	: 4
microcode	: 0x2000030
cpu MHz		: 2600.000
cache size	: 19712 KB
# 

P.S:
So factorization for "a=1" was done in 1.25 CPU months on a single core:

Code: Select all

$ echo "((19+18)*24+22.5)/(24*365/12)" | bc -ql
1.24726027397260273972
$
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Fri Aug 06, 2021 5:52 pm

I enhanced output of factorize.c gist:
https://gist.github.com/Hermann-SW/bfa0 ... ab28057bf6
factorize.diff.png
factorize.diff.png
factorize.diff.png (21.94 KiB) Viewed 819 times
Now the two values x and z are printed.
They are on the circle of the "ρ" of Pollard Rho algorithm.
And their difference has greatest common divisor >1 with to factor number n.

This is new factoring output for square of Mersenne prime M31:

Code: Select all

$ ./factorize `./pexpr "(2^31-1)^2"`
4611686014132420609(1): 2147483647 2147483647 (894488611079900130 2067272507081075303)
 [27081] 9542418ns (floyd)
$
The number 27081 reported in square brackets is the number of applications of "f(x)=(x^2+1) mod n", when started with 894488611079900130 resulting in 2067272507081075303, and when starting with 2 resulting in 894488611079900130.

This python script verifies what I stated:

Code: Select all

def _f(_t):
    return (_t[0]**2+_t[1])%_t[2]

def _repeat(_func, _m, _t):
    for _i in range(_m):
        _t = (_func(_t), _t[1], _t[2])
    return _t[0]

Code: Select all

def _gcd(_a, _b):
    if _a < _b:
        return _gcd(_b, _a)
    return _a if _b == 0 else _gcd(_b, _a%_b)

def _dif(_a, _b):
    return _a-_b if _a > _b else _b-_a

Code: Select all

S = 894488611079900130
A = 1
N = (2**31-1)**2
print(_repeat(_f, 27081, (2, A, N)) == S)

T = _repeat(_f, 27081, (S, A, N))
print(_gcd(_dif(T, S), N))
Execution results in True and >1 greatest common divisor with M31^2:

Code: Select all

$ python3 apply_funtion_n_times.py 
True
2147483647
$

From previous posting 1.25 CPU month computation we know the iteration count as well ("[3201775069376]").
I wrote a gmplib program that does compute f^i(2) much more efficiently than Python.
This is the core:

Code: Select all

...
  mpz_inits (t, t2, NULL);
  for (; i > 0; --i)
  {
    mpz_mul (t, y, y);
    mpz_add_ui (y, t, 1);
    mpz_mod (y, y, n);
  }
  gmp_printf ("%Zd\n", y);
...
I tested this code for i=10^8 and it took 13.85 seconds on another computer with Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz CPU.
I started the run 80 minutes ago, expected runtime to compute S is 32017*13.85/3600=123.2 hours, so shoud be available at 10pm next Wednesday ... (better wait 5 days than 1.25 months). Computation of T from S (for square of Mersenne prime M89) will take another 5 days after Wednesday ...

P.S:
Square of M81 has 54 digits, iteration count 3201775069376 has 13 digits, which is roughly 4th root of M81^2 [bc "l()" function is logarithm]:

Code: Select all

$ bc -ql
l((2^89-1)^2)/l(10)
53.58333922818865274822
l(3201775069376)/l(10)
12.50539081866384008770
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Sat Aug 07, 2021 7:12 am

The Python script I did show in previous posting is here
https://stamm-wilbrandt.de/apply_funtion_n_times.py.txt
because of this posting:
https://stackoverflow.com/questions/282 ... 9_28259967

gmplib "primes" demo allows to compute primes in a range.
Generating two different example primes bigger than 2^31 and not related to powers of 2:

Code: Select all

$ gmp-6.2.1/demos/primes 9999999900 10000000000
9999999929
9999999943
9999999967
$ gmp-6.2.1/demos/primes 99999999900 100000000000
99999999907
99999999943
99999999947
99999999977
$

Running factorize to determine iteration count in square brackets as well as S and T:

Code: Select all

$ ./factorize `./pexpr "9999999967*99999999977"` 
999999996470000000759(1): 9999999967 99999999977 (105284260825628478721 31982818247523240027)
 [242536] 77006230ns (floyd)
$

This is same computation as in previous posting, just for new values, only the last few lines changed compared to apply_funtion_n_times.py:

Code: Select all

$ tail -8 b.py

S = 105284260825628478721
A = 1
N = 9999999967 * 99999999977
print(_repeat(_f, 242536, (2, A, N)) == S)

T = _repeat(_f, 242536, (S, A, N))
print(_gcd(_dif(T, S), N))
$
Of course output as expected:

Code: Select all

$ python3 b.py 
True
9999999967
$

Without proof, but with strong evidence by 242536 examples, any value S on that cycle, with T applying _f() 242536 times on S has greatest common divisor with N ">1" (S and T move in sync one application of _f() by one):

Code: Select all

$ tail bl.py

T = _repeat(_f, 242536, (S, A, N))
print(_gcd(_dif(T, S), N))

for i in range(242536):
    S = _f((S, A, N))
    T = _f((T, A, N))
    assert _gcd(_dif(T, S), N) > 1

print("done")
$
Output confirming what I stated (no assertion miss):

Code: Select all

$ python3 bl.py 
True
9999999967
done
$

This script finds the very first S on cycle, starting with "2":

Code: Select all

S = 2
A = 1
N = 9999999967 * 99999999977

T = _repeat(_f, 242536, (S, A, N))
C = 0
while _gcd(_dif(T, S), N) == 1:
    C = C+1
    S = _f((S, A, N))
    T = _f((T, A, N))

print(_gcd(_dif(T, S), N), C)
The determined count 216699 shows, that in this case significant path length of original 242536 function iterations was done outside of the "ρ" cycle:

Code: Select all

$ python3 bf.py 
9999999967 216699
$
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Sat Aug 07, 2021 2:51 pm

More analysis on cylce C stating at 2 for n=9999999967*99999999977,

I used period.c to determine length of C:
https://gist.github.com/Hermann-SW/eeb0 ... 761ab38998
Took nearly 17 minutes to compute length(C)=4782220904 on Intel i7:

Code: Select all

$ time ./period `./pexpr "9999999967*99999999977"` 
n=999999996470000000759 a=1 s=0: 4782220904 [157836619584084515515] 1006591834245ns

real	16m46.617s
user	16m45.582s
sys	0m0.019s
$ 
Next I computed last value f^216698(2) not on C as determined in previous posting, with new gist:
https://gist.github.com/Hermann-SW/eb4a ... b1708aac1f

Code: Select all

$ time ./fn `./pexpr "9999999967*99999999977"` 2 `./pexpr "216698"` 
315110942281694727266
 17396119ns

real	0m0.065s
user	0m0.037s
sys	0m0.038s
$

And the value after applying f() length(C) times to the value just computed:

Code: Select all

$ time ./fn `./pexpr "9999999967*99999999977"` 2 `./pexpr "216698+4782220904"` 
183800949111896025702
 334670086624ns

real	5m34.710s
user	5m34.294s
sys	0m0.035s
$

And the first value on cycle C:

Code: Select all

$ time ./fn `./pexpr "9999999967*99999999977"` 2 `./pexpr "216699"`  
906808939944183030876
 19755085ns

real	0m0.060s
user	0m0.035s
sys	0m0.034s
$



By definition both values computed before result in first value on C when applying f():

Code: Select all

$ ./pexpr "(183800949111896025702^2+1)%(9999999967*99999999977)"
906808939944183030876
$ ./pexpr "(315110942281694727266^2+1)%(9999999967*99999999977)"
906808939944183030876
$

I created a Graphviz file showing all the facts determined for this n and cycle C sofar.
Here is Graphviz link for opening in your browser, and generated .png below:
Graphviz Fiddle link
analyze.png
analyze.png
analyze.png (55.8 KiB) Viewed 725 times
(If you are interested in Graphviz [Fiddle], you find link to dotguide.pdf when opening above link)


If "y=(x^2+1) mod n", then "y=((-x)^2+1) mod n" as well. Since both values determined that enter first value on C are not negative of each other, their difference has a very interesting feature. Their difference has gcd() with n bigger than 1 ! So if computing "f^4782220904(x)" could be done in time polylogarithmic in 4782220904 (eg. log(4782220904)^k for some constant k), factorization would be easy and RSA not secure anymore ...

Code: Select all

$ ./pexpr "gcd(183800949111896025702-315110942281694727266,9999999967*99999999977)"
99999999977
$

P.S:
length(C)=4782220904 is 47.8% of smaller prime factor 9999999967.
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Sun Aug 08, 2021 7:36 am

I thought it would be interesting to have a cycle id in order to be able to tell whether two start values end up in same cycle or not, new gist:
https://gist.github.com/Hermann-SW/1f57 ... 40a85de254

In order to avoid the overhead of hare and tortoise running around to determine cycle, cycle_id.c assumes that start value s argument is on a cycle -- if you pass s not on a cycle the program will not halt.

Code: Select all

do {
    ++c;

    mpz_mul (t, x, x);
    mpz_add (t, t, a);
    mpz_mod (x, t, n);

    if (mpz_cmp(z, x) > 0)
      mpz_set (z, x);
  }
  while (mpz_cmp (x, s) != 0);

First value on cycle (173078234037530422551) needs to be determined, here for s=3:

Code: Select all

$ ./factorize `./pexpr "9999999967*99999999977"` 1 3
999999996470000000759(1): 9999999967 99999999977 (173078234037530422551 55130608956757586602)
 [242536] 85002632ns (floyd)
$

With that value it takes less than 7 minutes to detemine (minimal value on cycle) cycle id (36623924692), cycle length is reported as well (4782220904):

Code: Select all

$ time ./cycle_id `./pexpr "9999999967*99999999977"` 1 173078234037530422551
n=999999996470000000759 a=1 s=173078234037530422551: 4782220904 [36623924692] 407692850321ns

real	6m47.717s
user	6m46.518s
sys	0m0.079s
$

Running with first value on cycle starting with 2 from previous posting we get same cycle id (36623924692):

Code: Select all

$ time ./cycle_id `./pexpr "9999999967*99999999977"` 1 906808939944183030876
n=999999996470000000759 a=1 s=906808939944183030876: 4782220904 [36623924692] 346275948831ns

real	5m46.304s
user	5m45.732s
sys	0m0.032s
$


I created digraph for n=13*43 (with "mak" tool discussed before), the cycles there are of lengths 2 and 4, so small compared to n -- maybe there are only few small cycles for n=9999999967*99999999977 as well (of length 242536 or smaller). The very long path from 2 to first value on cycle (216699) seems to indicate that huge trees exist outside of cycles for that value as well:
https://stamm-wilbrandt.de/en/forum/559_1.svg
559_1.png
559_1.png
559_1.png (203.46 KiB) Viewed 669 times

n=9999999967*99999999977 is a 70bit number, so exhaustive search for all cycles as done with analyze.c
https://gist.github.com/Hermann-SW/83de ... 641c26c590
before in this thread will fail due to too low RAM available (biggest RAM machine I have access to has 192GB memory). 128GB with storing boolean information in bits allows for 2^40 with 128GB of memory.

Code: Select all

$ echo "l(9999999967*99999999977)/l(2)" | bc -ql
69.76048998754189580376
$
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Sun Aug 08, 2021 5:13 pm

As mentioned I already wrote about analyze.c determining all cycles and a lot of statistics information in previous postings of this thread:
https://gist.github.com/Hermann-SW/83de ... 641c26c590

My Intel laptop has 32GB of main memory, 23GB free.
Since analyze.c "unsig" type is 64bit, and two arrays of unsigs are needed, there is definitely a limit.

I did determine highest prime with 4 digits 9973:

Code: Select all

$ gmp-6.2.1/demos/primes 9900 10000
9901
9907
9923
9929
9931
9941
9949
9967
9973
$

And highest prime with 5 digits 99991:

Code: Select all

$ gmp-6.2.1/demos/primes 99900 100000
99901
99907
99923
99929
99961
99971
99989
99991
$

Check that both are real primes, and n=9973*99991=997210243 is a 30bit number:

Code: Select all

$ echo $((9973*99991))
997210243
$ gmp-6.2.1/demos/factorize 997210243
997210243: 9973 99991
$ echo "l($((9973*99991)))/l(2)" | bc -ql
29.89332246087076002758
$

analyze.c did take 2 seconds for highest 3digit times highest 4digit prime.
And it took 20 seconds on Intel i7 for square of highest 4digit prime.
Product for highest 4digit and highest 5digit prime took 5:22min:

Code: Select all

$ time ./analyze 997210243 1 > out.997210243.txt

real	5m22.831s
user	4m28.168s
sys	0m15.579s
s
"top" did show 14.9G resident memory for analyze running -- if really needed I could run analyze.c for bigger numbers on machine with 192GB main memory.

Wow, a lot of that time was taken by storing result on SSD, a 20.2GB file!

Code: Select all

$ du -sm out.997210243.txt 
20677	out.997210243.txt
$

tail of that file gives lots of statistics as discussed in previous postings of this thread -- most importantly that n has c=22 cycles:

Code: Select all

$ tail -9 out.997210243.txt 
997210242: 1 202 2302
c 22
m 451 239.4
l 2260 2173.4
lm 2711 2412.8
lm1 2302
lm2 2301
lm3 2330
lm4 2617
$ 

And "head" shows the 22 cycle lengths -- they are much smaller than square root 31579 of n=997210243:

Code: Select all

$ head -7 out.997210243.txt 
2100 2100 2260 2100 2100 2100 60 60 60 60 525 525 113 113 20 20 12 12 1 1 1 1 
0: 1 203 2303
1: 1 202 2302
2: 1 201 2301
3: 2 230 2330
4: 3 357 2617
5: 1 200 2300
$
For "f(x)=(x^2+1) mod n" we know 0->1->2->5 -- all those end on cycle "1". The length until reaching the cycle decreases by one for each of them, from 203 for 0 to 200 for 5. Third column is sum of path length to cycle plus cycle length.

Like yesterday digraph can be computed and generated:

Code: Select all

$ ./factorize 997210243
997210243(1): 9973 99991 (618572983 883346160)
 [220] 312979ns (floyd)
$ ./period 997210243 1 2
n=997210243 a=1 s=2: 2100 [307255815] 1157287ns
$ ./cycle_id 997210243 1 618572983
n=997210243 a=1 s=618572983: 2100 [648258] 449558ns
$ ./fn 997210243 2 $((200))
525888605
 141150ns
$
bf.py not needed since we know 201 for s=2 until first value on cycle already by analyze.c.

This is resulting digraph:
Graphviz Fiddle link
analyze30.png
analyze30.png
analyze30.png (33.81 KiB) Viewed 605 times

P.S:
grep on generated 997210252 lines "out.997210243.txt" file takes time:

Code: Select all

$ time grep "^525888605 " out.997210243.txt 
525888605: 1 1 2101

real	0m15.098s
user	0m12.473s
sys	0m2.608s
$ time wc --lines out.997210243.txt 
997210252 out.997210243.txt

real	0m8.384s
user	0m5.687s
sys	0m2.684s
$ 
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
HermannSW
Posts: 4516
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Pollard Rho loop analysis

Mon Aug 09, 2021 7:17 am

Simple "grep" allows to select all numbers that end up on 1st cycle discussed in previous posting:

Code: Select all

$ time grep ": 1 " out.997210243.txt > cycle.1.txt

real	0m32.490s
user	0m28.884s
sys	0m3.518s
$

89980657 numbers or 9% of all numbers end up on 1st cycle:

Code: Select all

$ time wc --lines cycle.1.txt 
89980657 cycle.1.txt

real	0m0.770s
user	0m0.531s
sys	0m0.239s
$ bc -ql
89980657/997210243
.09023238342328178432

Now I extracted all path lengths from a number to cycle 1, sorted them and printed them together with count -- we already know that 2100 numbers are on the cycle and have path length 0 to enter cycle:

Code: Select all

$ time cut -f3 -d\  cycle.1.txt | sort -n | uniq -c
   2100 0
   6300 1
  11172 2
  18932 3
  23492 4
  28976 5
  32692 6
...  

Wow, maximal path length to cycle 1 is 360, much more than 201 for number 2:

Code: Select all

...
  36144 357
  36628 358
  17980 359
  17980 360

real	3m29.384s
user	3m45.972s
sys	0m1.905s
$

I did copy the 361 path length counts into a Libre Office calc, computed weighted sum and divided by 89980657 to determine average path length to enter cycle 1 as 198.58.

Minimal counts are 2100 numbers of path length 0 on the cycle, and 6300 numbers with path length 1 to cycle 1.
Maximal count is 915394 numbers with path length 240 to cycle 1.

Reminder, all these numbers live on directed trees finally hitting the cycle.
The indegree of a number in that digraph is in range 0..4 because f(x) is a quadratic formula "mod n".

I took the 17980 numbers with path length 360 to cycle 1 and copied them into spreadsheet.
These numbers are all examples for indegree 0, since 360 was maximal path length to cycle 1.
Then I computed f(x)=(x^2+1) mod n in column B to have all the edges from path length 360 to path length 359 numbers.
Finally I sorted the data by column B, and here we see examples for indegree 1 (858232), 2 (957952), 3 (858240) and 4 (1087616):
example.edges.level_360.png
example.edges.level_360.png
example.edges.level_360.png (20.83 KiB) Viewed 546 times

Finally I copied column B (numbers with path length 359 to cycle 1) into file "359".
Then I determined distribution of indegrees for those distinct 10515 numbers:

Code: Select all

$ sort -nu 359 | wc --lines
  10515
$ sort -n 359 | uniq -c | cut -b-7 | sort -n | uniq -c
   4662       1
   4736       2
    622       3
    495       4
$

This is all I have to tell on statistics for numbers entering cycle 1 for now.


P.S:
Factorization alorithm determined difference of numbers on cycle with gcd() with n bigger than 1 as 220.
I did start factorization with number on cycle 1 as reported by that factorization.
As can be seen the minimal distance on cycle for gcd() with n bigger than 1 is 20!

Code: Select all

∀s on C: gcd(s-f^20(s), n)>1

Code: Select all

$ ./factorize 997210243 1 2
997210243(1): 9973 99991 (618572983 883346160)
 [220] 75796ns (floyd)
$ ./factorize 997210243 1 618572983
997210243(1): 9973 99991 (212552207 25129618)
 [20] 73279ns (floyd)
$

Code: Select all

$ ./fn 997210243 212552207 1
167197160
 95499ns
$ ./fn 997210243 212552207 21
353502773
 93171ns
$ ./pexpr "gcd(353502773-167197160, 9973*99991)"
9973
$ 
Last edited by HermannSW on Tue Aug 10, 2021 12:29 pm, edited 2 times in total.
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

Return to “C/C++”