ejolson
Posts: 7242
Joined: Tue Mar 18, 2014 11:47 am

Re: Advent of Code

Thu Mar 04, 2021 5:12 pm

algorithm wrote:
Thu Mar 04, 2021 3:56 pm
OK, I wrote day 24 in C. Runs in 0.035 s in a Pi4 @1.8 GHz. https://github.com/ednl/aoc2020/blob/main/day24.c

Room for improvement in memory usage: bitpacking (factor 8) and perhaps parsing the input before executing it, to get a tighter range than just "longest line". But, seems unnecessary for the Pico.
My calculation is that

2 x 401 x 401 = 321602

which seems a bit bigger than the 264K memory of the Pico. To fit the calculation in 32K the canine coder created a moving window for scratch space (questionably named neigh without any underscores), which I preserved in the torus program to perform the convolution in place so only one copy of the grid is needed.

I'm looking into some micro-optimisations for the PET version of Day 24, as it seems I learn more by trying to improve a program than writing it in the first place.

Do you have Fuzix running on your Pico?

User avatar
algorithm
Posts: 265
Joined: Mon Nov 25, 2013 9:09 pm
Location: Flatland

Re: Advent of Code

Thu Mar 04, 2021 6:05 pm

No, no Fuzix. I forgot about trying to fit it into the SRAM instead of flash, but it runs anyway, in 1.324 s with which I am satisfied. The actual longest line is 43 characters for me. Code at https://github.com/ednl/pico/blob/main/day24.c (which includes 471 lines of my puzzle input..). Output on repeat:

Code: Select all

Part 1: 411
Part 2: 4092
Time: 1.32437 s

User avatar
algorithm
Posts: 265
Joined: Mon Nov 25, 2013 9:09 pm
Location: Flatland

Re: Advent of Code

Thu Mar 04, 2021 9:25 pm

I changed my original program (non-Pico) which now parses the input twice to get the smallest grid possible. Turns out that for my input, counting from (0,0) the minimum x and y are both -18 and the maximum x and y both +18. So it's still a square, but smaller. With a border of zeros to not have to account for edge cases, the length in both dimensions is 2*100 + 18 - (-18) + 3 = 239 for a total data size of 2*239*239 = 112 KB. For my input, the black tiles do reach the border after 100 turns, so those extra zeros were necessary (aka useful). https://github.com/ednl/aoc2020/blob/main/day24.c

ejolson
Posts: 7242
Joined: Tue Mar 18, 2014 11:47 am

Re: Advent of Code

Thu Mar 04, 2021 10:56 pm

algorithm wrote:
Thu Mar 04, 2021 9:25 pm
I changed my original program (non-Pico) which now parses the input twice to get the smallest grid possible. Turns out that for my input, counting from (0,0) the minimum x and y are both -18 and the maximum x and y both +18. So it's still a square, but smaller. With a border of zeros to not have to account for edge cases, the length in both dimensions is 2*100 + 18 - (-18) + 3 = 239 for a total data size of 2*239*239 = 112 KB. For my input, the black tiles do reach the border after 100 turns, so those extra zeros were necessary (aka useful). https://github.com/ednl/aoc2020/blob/main/day24.c
With a total grid size of 239x239 it would seem that a 256x256 torus could manage the calculation no wrapping around. Would you mind running my code too, for comparison?

User avatar
algorithm
Posts: 265
Joined: Mon Nov 25, 2013 9:09 pm
Location: Flatland

Re: Advent of Code

Fri Mar 05, 2021 7:22 am

ejolson wrote:
Thu Mar 04, 2021 10:56 pm
With a total grid size of 239x239 it would seem that a 256x256 torus could manage the calculation no wrapping around. Would you mind running my code too, for comparison?
Correct result for my input. Reports running time ~3x slower on Core i5-4570 3.2 GHz, ~2x slower on Pi 400.

User avatar
algorithm
Posts: 265
Joined: Mon Nov 25, 2013 9:09 pm
Location: Flatland

Re: Advent of Code

Fri Mar 05, 2021 9:17 am

I switched the timer because it didn't work on the Pico. Also read from included text instead of file. Plus a few more standard Pico things like the init and CMakeLists.txt. Unfortunately the code is probably not laid out for multiple loops; correct values on first run (at least for part 2; serial connection loses the first one) but then garbage and the run times go up for some reason (but not catastrophically?? Edit: OK I realised it's probably because the black tiles keep piling up until some equilibrium). So, you should probably get your own Pico! :)

Code: Select all

Part 2: 4092
Time: 2.40400 s

Part 1: 4217
Part 2: 12301
Time: 3.59685 s
                                                                                                
Part 1: 12400                                                                                   
Part 2: 22837                                                                                   
Time: 5.62482 s                                                                                 
                                                                                                
Part 1: 22968
Part 2: 23121
Time: 6.62881 s

Part 1: 23226
Part 2: 23174
Time: 6.63686 s

Part 1: 23237
Part 2: 23461
Time: 6.64224 s

Part 1: 23608
Part 2: 22933
Time: 6.63561 s

Part 1: 23036
Part 2: 22952
Time: 6.63413 s
Edit2: I also ported my second version with the double parsing (but this time with single parsing and storing the tile coordinates in a ~1K array) and that runs in 0.7525 s in standard mode (data in flash) and 0.7504 s in "don't use flash" mode. See https://github.com/ednl/pico/blob/main/day24.c

ejolson
Posts: 7242
Joined: Tue Mar 18, 2014 11:47 am

Re: Advent of Code

Fri Mar 05, 2021 5:09 pm

algorithm wrote:
Fri Mar 05, 2021 9:17 am
I switched the timer because it didn't work on the Pico. Also read from included text instead of file. Plus a few more standard Pico things like the init and CMakeLists.txt. Unfortunately the code is probably not laid out for multiple loops; correct values on first run (at least for part 2; serial connection loses the first one) but then garbage and the run times go up for some reason (but not catastrophically?? Edit: OK I realised it's probably because the black tiles keep piling up until some equilibrium). So, you should probably get your own Pico! :)

Code: Select all

Part 2: 4092
Time: 2.40400 s

Part 1: 4217
Part 2: 12301
Time: 3.59685 s
                                                                                                
Part 1: 12400                                                                                   
Part 2: 22837                                                                                   
Time: 5.62482 s                                                                                 
                                                                                                
Part 1: 22968
Part 2: 23121
Time: 6.62881 s

Part 1: 23226
Part 2: 23174
Time: 6.63686 s

Part 1: 23237
Part 2: 23461
Time: 6.64224 s

Part 1: 23608
Part 2: 22933
Time: 6.63561 s

Part 1: 23036
Part 2: 22952
Time: 6.63413 s
Edit2: I also ported my second version with the double parsing (but this time with single parsing and storing the tile coordinates in a ~1K array) and that runs in 0.7525 s in standard mode (data in flash) and 0.7504 s in "don't use flash" mode. See https://github.com/ednl/pico/blob/main/day24.c
Woohoo! Thanks for running it. No Picos are available here, but I'm planning to get enough when they are.

You're right that the grid is not reinitialised except to zero at program load. Thus, the first run of 2.4 seconds is the one to compare. Run times increase as more tiles are flipped to black.

As a model for the spread of an epidemic, I wonder how quickly the 256x256 torus saturates to the point most tiles have been flipped at least once. Then the shape of the domain could be changed to represent political boundaries and quarantine efforts. Could this more accurately determine the spread of an epidemic than the models currently in use?

ejolson
Posts: 7242
Joined: Tue Mar 18, 2014 11:47 am

Re: Advent of Code

Sun Mar 07, 2021 12:24 am

ejolson wrote:
Fri Mar 05, 2021 5:09 pm
Woohoo! Thanks for running it. No Picos are available here, but I'm planning to get enough when they are.
So without any Picos I'm in the middle of Lent finishing up the Advent of Code. The last puzzle--except for the hypercube version of crab cups--is Day 25. The dog developer got very interested and barked, this seems like a good opportunity to catch a kangaroo.

Thinking Fido was taking about the Beginners All-purpose RISC Kangaroo, I remarked that the BARK™ is even further behind schedule than before due to the semiconductor shortage and the fact that the Waupaca Foundry is really an iron works. With hair bristling the developer growled, not that kangaroo but the Monte-Carlo method.
J.M. Pollard wrote: We illustrate with the problem of catching a kangaroo which is traveling along a known path in a series of what appears to be unpredictable bounds; in reality, their lengths are a function of the state of the ground at the point of take-off.
I looked at the link

https://en.wikipedia.org/wiki/Pollard%2 ... _algorithm

and pointed out that this was Reno. With statistical certainty said I, there must be a more appropriate algorithm for computing logarithms that would work better here.

At this the dog developer abruptly ended our Zoom conference. For a time it seemed like the sound of slot machines emanating from the dog house would never end, but then the noise suddenly stopped and the PET produced the output
    pet25.png
    pet25.png (132.4 KiB) Viewed 1876 times
      All those pseudo-random numbers seem like a strange way to solve the puzzle to me. Do you think Fido's 8-bit code will be full of tame or wild kangaroos? Maybe there is no need for a Pico after all.

      lurk101
      Posts: 590
      Joined: Mon Jan 27, 2020 2:35 pm
      Location: Cumming, GA (US)

      Re: Advent of Code

      Sun Mar 07, 2021 2:22 pm

      Code: Select all

      #include <iostream>
      #include <algorithm>
      
      using namespace std;
      
      typedef uint64_t u64;
      
      const u64 pK1 = 16915772, pK2 = 18447943;
      //const u64 pK1 = 5764801, pK2 = 17807724;
      
      static u64 transform(u64 s, u64 l)
      {
          u64 r = 1;
          for (u64 n = 1; n <= l; n++) {
              r = (r * s) % 20201227ULL;
          }
          return r;
      }
      
      static u64 loopCount(u64 p) {
          for (u64 l = 1;; l++) {
              u64 t = transform(7, l);
              if (t == p)
                  return l;
          }
      }
      
      int main()
      {
          cout << transform(7, 11) << endl;
          u64 l1 = loopCount(pK1);
          u64 l2 = loopCount(pK2);
          cout << l1 << " " << l2 << endl;
          u64 t1 = transform(pK1, l2);
          u64 t2 = transform(pK2, l1);
          cout << t1 << " " << t2 << endl;
      }
      
      The old semiconductor paradigms are rapidly becoming a thing of the past.
      Today, it's about the best transistors, architectures, and accelerators for the job.

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Sun Mar 07, 2021 6:30 pm

      lurk101 wrote:
      Sun Mar 07, 2021 2:22 pm

      Code: Select all

      #include <iostream>
      #include <algorithm>
      
      using namespace std;
      
      typedef uint64_t u64;
      
      const u64 pK1 = 16915772, pK2 = 18447943;
      //const u64 pK1 = 5764801, pK2 = 17807724;
      
      static u64 transform(u64 s, u64 l)
      {
          u64 r = 1;
          for (u64 n = 1; n <= l; n++) {
              r = (r * s) % 20201227ULL;
          }
          return r;
      }
      
      static u64 loopCount(u64 p) {
          for (u64 l = 1;; l++) {
              u64 t = transform(7, l);
              if (t == p)
                  return l;
          }
      }
      
      int main()
      {
          cout << transform(7, 11) << endl;
          u64 l1 = loopCount(pK1);
          u64 l2 = loopCount(pK2);
          cout << l1 << " " << l2 << endl;
          u64 t1 = transform(pK1, l2);
          u64 t2 = transform(pK2, l1);
          cout << t1 << " " << t2 << endl;
      }
      
      That sure takes a long time to execute, even on the Pi 4B named turbo. Could the difficulty be a lack of kangaroos? It seems the code from your earlier post

      viewtopic.php?p=1785841#p1785841

      is much faster. Did an O(n) algorithm get changed to O(n^2)?

      User avatar
      algorithm
      Posts: 265
      Joined: Mon Nov 25, 2013 9:09 pm
      Location: Flatland

      Re: Advent of Code

      Sun Mar 07, 2021 9:02 pm

      Visualisation of the hexagonal grid: https://ednl.github.io/hexgridjs/

      (Earlier I thought it went all the way to the border but apparently that was a different edge effect a.k.a. an error made by me; it comes nowhere near.)

      lurk101
      Posts: 590
      Joined: Mon Jan 27, 2020 2:35 pm
      Location: Cumming, GA (US)

      Re: Advent of Code

      Sun Mar 07, 2021 9:56 pm

      ejolson wrote:
      Sun Mar 07, 2021 6:30 pm
      lurk101 wrote:
      Sun Mar 07, 2021 2:22 pm

      Code: Select all

      #include <iostream>
      #include <algorithm>
      
      using namespace std;
      
      typedef uint64_t u64;
      
      const u64 pK1 = 16915772, pK2 = 18447943;
      //const u64 pK1 = 5764801, pK2 = 17807724;
      
      static u64 transform(u64 s, u64 l)
      {
          u64 r = 1;
          for (u64 n = 1; n <= l; n++) {
              r = (r * s) % 20201227ULL;
          }
          return r;
      }
      
      static u64 loopCount(u64 p) {
          for (u64 l = 1;; l++) {
              u64 t = transform(7, l);
              if (t == p)
                  return l;
          }
      }
      
      int main()
      {
          cout << transform(7, 11) << endl;
          u64 l1 = loopCount(pK1);
          u64 l2 = loopCount(pK2);
          cout << l1 << " " << l2 << endl;
          u64 t1 = transform(pK1, l2);
          u64 t2 = transform(pK2, l1);
          cout << t1 << " " << t2 << endl;
      }
      
      That sure takes a long time to execute, even on the Pi 4B named turbo. Could the difficulty be a lack of kangaroos? It seems the code from your earlier post

      viewtopic.php?p=1785841#p1785841

      is much faster. Did an O(n) algorithm get changed to O(n^2)?
      Not sure, I've completely lost interest in it, and any other fictitious animal.
      The old semiconductor paradigms are rapidly becoming a thing of the past.
      Today, it's about the best transistors, architectures, and accelerators for the job.

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Sun Mar 07, 2021 10:42 pm

      lurk101 wrote:
      Sun Mar 07, 2021 9:56 pm
      Not sure, I've completely lost interest in it, and any other fictitious animal.
      That's understandable. I reverted the C++ code to the faster version posted earlier. Since I remember you originally being interested in cryptography, here are some timing comparisons between the two algorithms running on the Raspberry Pi 4B.

      Code: Select all

                puzzle 1   puzzle 2
      card       6270530   16915772
      door      14540258   18447943
      
      C++         1.130s     0.458s
      kangaroo    0.023s     0.018s
      
      For reference, the kangaroo code is

      Code: Select all

      /*  advent of code day 25 https://adventofcode.com/2020/day/10
          written march 5, 2021 by eric olson using the kangaroo method */
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>
      
      #define M 20201227
      
      #ifdef PET
      
      typedef unsigned long ui32;
      typedef unsigned char ui8;
      
      #define fopen myfopen
      #define fgets myfgets
      #define fclose myfclose
      static char *myday25[] = {
          "6270530\n",
          "14540258\n",
          0 };
      
      static int myline=0;
      static FILE *myfopen(char *,char *){
          myline=0;
          return (FILE *)1;
      }
      static char *myfgets(char *s,int n,FILE *){
          if(myday25[myline]){
              strncpy(s,myday25[myline],n);
              myline++;
              return s;
          }
          return 0;
      }
      static int myfclose(FILE *){
          return 0;
      }
      
      static ui32 tstart;
      static void tic(){
          tstart=clock();
      }
      static void toc(){
          ui32 tstop=clock();
          ui32 tsec=tstop-tstart;
          ui32 tcen=10000ul*(tsec%CLOCKS_PER_SEC);
          tsec/=CLOCKS_PER_SEC;
          tcen=(tcen+CLOCKS_PER_SEC/2)/CLOCKS_PER_SEC;
          printf("\nTotal execution time %lu.%04lu seconds.\n",tsec,tcen);
      }
      
      static ui32 r[13];
      /*  this 32-bit modmul is seven times slower than slow */
      static ui32 modmul(ui32 x,ui32 y){
          int i,j;
          ui8 p[7],q[7];
          ui32 z;
          for(i=0;i<7;i++){
              p[i]=x&15; x>>=4;
              q[i]=y&15; y>>=4;
          }
          z=0;
          for(i=0;i<7;i++){
              ui32 t=0;
              for(j=0;j<7;j++){
                  t+=q[j]*r[i+j];
              }
              z+=p[i]*(t%M);
          }
          return z%M;
      }
      
      #else
      
      typedef unsigned int ui32;
      typedef unsigned char ui8;
      
      static struct timespec tic_start;
      static void tic(){
          clock_gettime(CLOCK_MONOTONIC_RAW,&tic_start);
      }
      static void toc(){
          struct timespec tic_stop;
          clock_gettime(CLOCK_MONOTONIC_RAW,&tic_stop);
          double sec=tic_stop.tv_sec-tic_start.tv_sec;
          double r=sec+(tic_stop.tv_nsec-tic_start.tv_nsec)*1.0e-9;
          printf("\nTotal execution time %g seconds.\n",r);
      }
      
      static ui32 r[13];
      static ui32 modmul(ui32 x,ui32 y){
          return ((unsigned long long)x*y)%M;
      }
      
      #endif
          
      static ui32 sqrtM,shake[2];
      
      static ui32 ui32sqr(ui32 x){
          ui32 a=0,b=65536;
          for(;;){
              ui32 c=(a+b)/2;
              if(c*c>=x) b=c;
              else a=c;
              if(b-a<=1) return b;
          }
      }
      
      static void doinit(){
          FILE *fp;
          int i;
          r[0]=1;
          for(i=1;i<13;i++){
              r[i]=(16*r[i-1])%M;
          }
          sqrtM=ui32sqr(M);
          fp=fopen("day25.dat","r");
          if(fp==0){
              printf("Error opening day25.dat for import!\n");
              exit(1);
          }
          for(i=0;i<2;i++){
              char buf[128];
              if(!fgets(buf,sizeof(buf),fp)){
                  printf("Error reading day25.dat at line %d!\n",i);
                  exit(1);
              }
              shake[i]=atol(buf);
          }
          fclose(fp);
      }
      
      static ui32 modpow(ui32 p,ui32 n){
          ui32 r;
          for(r=1;n!=0;p=modmul(p,p)){
              if((n&1)!=0) r=modmul(r,p);
              n>>=1;
          }
          return r;
      }
      
      static ui32 rooparm[2]={ 3659326121ul, 3048033998ul };
      static void rooseed(){
          int k;
          for(k=0;k<2;k++){
              rooparm[k]=srand()|1;
          }
      }
      static ui32 roojump(ui32 x){
          x=x*(x+rooparm[1])+rooparm[0];
          return ((x>>16)|(x<<16))%sqrtM+1;
      }
      
      static ui32 xi,yi,d,dn,alpha,beta;
      static void rootame(){
          int i;
          xi=1;
          d=0;
          for(i=0;i<sqrtM;i++){
              ui32 fi=roojump(xi);
              d+=fi;
              xi=modmul(xi,modpow(alpha,fi));
          }
      }
      static ui32 roowild(){
          yi=beta;
          dn=0;
          for(;;){
              ui32 fi;
              if(yi==xi) return 1;
              if(dn>=d+M) return 0;
              fi=roojump(yi);
              dn+=fi;
              yi=modmul(yi,modpow(alpha,fi));
          }
      }
      static ui32 modlog(ui32 base,ui32 y){
          int k;
          alpha=base;
          beta=y;
          for(k=0;k<sqrtM;k++){
              rootame();
              if(roowild()){
                  return (M-1+d-dn)%(M-1);
              }
              rooseed();
          }
          return 0;
      }
      
      static void part1(){
          ui32 x=modlog(7,shake[0]);
          ui32 key=modpow(shake[1],x);
          printf("Part 1 The encryption key is %lu\n",(unsigned long)key);
      }
      
      int main(){
          tic();
          printf("Advent of Code Day 25\n\n");
          doinit();
          part1();
          toc();
          return 0;
      }
      
      Please ignore the ifdef section for the PET. The program should just run on any C compiler that supports 64-bit arithmetic.
      Last edited by ejolson on Mon Mar 08, 2021 5:40 pm, edited 7 times in total.

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Sun Mar 07, 2021 11:43 pm

      ejolson wrote:
      Sun Mar 07, 2021 10:42 pm
      Please ignore the ifdef section for the PET. The program should just run on any C compiler that supports 64-bit arithmetic.
      It turns out the nibble-at-a-time modmul routine I wrote for the PET is actually faster on Raspberry Pi OS than using 64-bit integers. Calling the modified program slow25 yields the updated timing table

      Code: Select all

                puzzle 1   puzzle 2
      card       6270530   16915772
      door      14540258   18447943
      
      C++         1.130s     0.458s
      kangaroo    0.023s     0.018s
      slow25      0.013s     0.011s
      
      While that routine is definitely slower than slow when running in 64-bit mode, it seems not quite as slow as it looks. I wonder if it could be further optimized.

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Mon Mar 08, 2021 4:06 am

      ejolson wrote:
      Sun Mar 07, 2021 11:43 pm
      While that routine is definitely slower than slow when running in 64-bit mode, it seems not quite as slow as it looks.
      Here are some runs on the 64-bit beta test of Raspberry Pi OS.

      Code: Select all

      $ su
      # echo performance >/sys/devices/system/cpu/cpufreq/policy0/scaling_governor
      $ time ./pet25
      Advent of Code Day 25
      
      Part 1 The encryption key is 16311885
      
      Total execution time 0.00156378 seconds.
      
      real    0m0.003s
      user    0m0.000s
      sys 0m0.003s
      $ time ./slow25 
      Advent of Code Day 25
      
      Part 1 The encryption key is 16311885
      
      Total execution time 0.00734076 seconds.
      
      real    0m0.009s
      user    0m0.009s
      sys 0m0.001s
      
      Which is faster is reversed such that when in 64-bit mode the nibble-based modmul is about 4.6 times slower.

      The updated table is

      Code: Select all

                puzzle 1   puzzle 2
      card       6270530   16915772
      door      14540258   18447943
      
                    32-bit results
      C++         1.130s     0.458s
      kangaroo    0.023s     0.018s
      slow25      0.013s     0.011s
      
                    64-bit results
      C++         0.359s     0.145s
      kangaroo    0.003s     0.003s
      slow25      0.009s     0.008s
      
      Solving the puzzle for Day 25 is definitely a case where 64-bit mode is advantageous. For this reason I suspect the nibble-based approach will be faster on the Pico.

      User avatar
      algorithm
      Posts: 265
      Joined: Mon Nov 25, 2013 9:09 pm
      Location: Flatland

      Re: Advent of Code

      Mon Mar 08, 2021 10:18 am

      I didn't do much myself for day 25 after realising it was DHKE. I stole the precise description from Reddit, put in a completely naive discrete logarithm and stole the modular exponentiation from Wikipedia: https://github.com/ednl/aoc2020/blob/main/day25.c

      I thought it was interesting that there is potentially a lot of runtime difference between calculating card+door vs. door+card, and I don't think it is predictable which one will be faster. Especially obvious in non-symmetrical algorithms like this one I saw on Reddit, the second one in my Python solution: https://github.com/ednl/aoc2020/blob/main/day25.py

      Timings for the supposedly symmetrical C version on a Pi 4 with Raspbian 32-bit:

      Code: Select all

      algorithm (ednl)
        15113849  4206373 :  1890859  (0.09833 s)
         4206373 15113849 :  1890859  (0.05571 s)
      
      ejolson
         6270530 14540258 : 16311885  (0.01467 s)
        14540258  6270530 : 16311885  (0.01471 s)
      
      lurk101
        16915772 18447943 :  6011069  (0.17018 s)
        18447943 16915772 :  6011069  (0.16997 s)
      (So, these times are all for my C program, just with different puzzle inputs.) Same thing on a Pi 400 with Ubuntu 64-bit is MUCH faster:

      Code: Select all

      algorithm (ednl)
        15113849  4206373 :  1890859  (0.01575 s)
         4206373 15113849 :  1890859  (0.01036 s)
      
      ejolson
         6270530 14540258 : 16311885  (0.00329 s)
        14540258  6270530 : 16311885  (0.00327 s)
      
      lurk101
        16915772 18447943 :  6011069  (0.03821 s)
        18447943 16915772 :  6011069  (0.03815 s)

      User avatar
      algorithm
      Posts: 265
      Joined: Mon Nov 25, 2013 9:09 pm
      Location: Flatland

      Re: Advent of Code

      Mon Mar 08, 2021 12:45 pm

      Your modmul() might not be the fastest version possible because there's a couple of implicit conversions, starting with the #define of M as a signed integer (could be 32- or 64-bit, should be unsigned). Architecture independent 32/64-bit postfixes (which would be defined as either UL or ULL) are defined in stdint.h. In the same way, "unsigned long long" is ambiguous; could be different on PET, RaspiOS 32bit, RaspiOS 64bit.

      See https://en.cppreference.com/w/c/languag ... etic_types and https://en.cppreference.com/w/c/types/integer

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Mon Mar 08, 2021 4:33 pm

      algorithm wrote:
      Mon Mar 08, 2021 12:45 pm
      Your modmul() might not be the fastest version possible because there's a couple of implicit conversions, starting with the #define of M as a signed integer (could be 32- or 64-bit, should be unsigned). Architecture independent 32/64-bit postfixes (which would be defined as either UL or ULL) are defined in stdint.h. In the same way, "unsigned long long" is ambiguous; could be different on PET, RaspiOS 32bit, RaspiOS 64bit.

      See https://en.cppreference.com/w/c/languag ... etic_types and https://en.cppreference.com/w/c/types/integer
      As I wrote my 32-bit modmul, I was hoping cc65 would recognize a 32-bit integer multiplied by an 8-bit integer doesn't need to call the full 32-bit by 32-bit software multiply routine.

      It just occurred to me, that since the base of 7 fits in a nibble, there is an easy way to code the non-kangaroo way of computing logarithms using only 32-bit arithmetic. Since you've explicitly written all the constants into the source, I wonder if gcc data flow analysis realises that and can optimize the loop at the beginning of dhke in day25.c to use only 32-bit numbers.

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Mon Mar 08, 2021 5:03 pm

      algorithm wrote:
      Mon Mar 08, 2021 10:18 am
      I thought it was interesting that there is potentially a lot of runtime difference between calculating card+door vs. door+card, and I don't think it is predictable which one will be faster.
      Since the computation time is so short, I think you are seeing power-saving effects of the default CPU governor. Did you try setting the governor to performance as shown in my post above?

      viewtopic.php?p=1832513#p1832513

      If that doesn't fix the predictability, I'd try forcing the minimum CPU clock speed to be equal the maximum. It can also help to repeat the test and take the minimum time as the one absent the weird scheduling irregularities that for want of a Pico distinguish not just Linux but all preemptive time-sharing systems.

      Did you ever get the solution

      viewtopic.php?p=1829958#p1829958

      for Day 24 to run on a Pico?

      User avatar
      algorithm
      Posts: 265
      Joined: Mon Nov 25, 2013 9:09 pm
      Location: Flatland

      Re: Advent of Code

      Mon Mar 08, 2021 6:28 pm

      ejolson wrote:
      Mon Mar 08, 2021 5:03 pm
      Since the computation time is so short, I think you are seeing power-saving effects of the default CPU governor.
      Perhaps that was it. I didn't mean to sound so sure about the tiny variations in my version which was SUPPOSED to be symmetrical (at least, I couldn't figure out why there would be a difference) but that other version in the Python file definitely showed a large difference for a different ordering of my input. Anyway, here's wonder -Wall, on a Pi 4 with RaspiOS 32bit, using uint_fast32_t where possible, looping 100 times each (with 3 warmup loops). So yeah, symmetrical. Poor lurk got shafted with his input, it seems.

      Code: Select all

      algorithm (ednl)
        15113849  4206373 :  1890859 (min 0.00761 avg 0.00764 max 0.00936 s)
         4206373 15113849 :  1890859 (min 0.00761 avg 0.00764 max 0.00901 s)
      
      ejolson
         6270530 14540258 : 16311885 (min 0.00243 avg 0.00244 max 0.00244 s)
        14540258  6270530 : 16311885 (min 0.00243 avg 0.00244 max 0.00244 s)
      
      lurk101
        16915772 18447943 :  6011069 (min 0.02824 avg 0.02825 max 0.02847 s)
        18447943 16915772 :  6011069 (min 0.02824 avg 0.02825 max 0.02850 s)

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Fri Apr 09, 2021 1:29 am

      ejolson wrote:
      Sat Jan 09, 2021 6:25 am
      Paeryn wrote:
      Thu Jan 07, 2021 12:43 am
      Though why he wants to learn prolog I'll never know, cat logic is quite different from anything we understand.
      If Scratch reflects any measure of cat logic, I would agree about not understanding it.
      I just noticed the new thread on a faster version of Scratch 3 for the Pi.

      viewtopic.php?f=77&t=308834

      I wonder if TurboWarp is only faster for graphics or whether these programming algorithm problems also run faster.

      ejolson
      Posts: 7242
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Advent of Code

      Wed May 05, 2021 5:02 pm

      ejolson wrote:
      Fri Feb 19, 2021 5:44 pm
      I've now made some changes to my Go program to make sure it doesn't detect an infinite loop where there wasn't one. Along the way, I discovered some random shuffles of cards which seem to result in very long games. An example is

      Code: Select all

      Player 1
      23
      21
      30
      6
      33
      48
      8
      50
      36
      28
      11
      45
      7
      42
      4
      41
      34
      9
      17
      44
      26
      22
      32
      37
      47
      
      Player 2
      39
      15
      1
      49
      38
      43
      40
      2
      46
      10
      5
      18
      35
      25
      24
      29
      16
      20
      12
      19
      14
      27
      13
      3
      31
      
      which on my 2GB Pi 4B now hits swap.
      Inspired by the recent Barnsely fern program

      http://fractal.math.unr.edu/~ejolson/

      which runs 4 times faster using gccgo compared to the golang gc compiler, I decided to test my go22.go solution to day 22 using gccgo. Running on an 8-core Ryzen 1700 the results were

      Code: Select all

      $ /usr/bin/time ./go22.gcc 
      Advent of Code Day 22
      
      Part 1 Player 1 won with score 32701
      Part 2 Player 1 won with score 35440
      
      Total execution time 546.860303401947 seconds.
      1984.48user 40.93system 9:07.69elapsed 369%CPU (0avgtext+0avgdata 14046440maxresident)k
      0inputs+0outputs (0major+11979611minor)pagefaults 0swaps
      $ /usr/bin/time ./go22
      Advent of Code Day 22
      
      Part 1 Player 1 won with score 32701
      Part 2 Player 1 won with score 35440
      
      Total execution time 400.439382314682 seconds.
      1680.11user 46.73system 6:41.41elapsed 430%CPU (0avgtext+0avgdata 13884584maxresident)k
      0inputs+0outputs (0major+10086156minor)pagefaults 0swaps
      
      While the wall time shows the gccgo executable to be about 30 percent slower (rather than the 4-fold improvement hoped for), what I found more interesting was that this non-parallel program used on average 4 cores while it was executing. Go appears to be so heavily threaded and processor intensive that while my computation is running on one core the runtime system consumes three extra cores. Could all of that be the garbage collector?

      Return to “General programming discussion”