ejolson
Posts: 7079
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Sun Apr 11, 2021 5:58 pm

ejolson wrote:
Wed Apr 07, 2021 4:48 pm
Just as the real epidemic has led to extended depreciation schedules for practical computing clusters, the resulting silicon drought has prolonged the life of the super-cheap cluster for learning.
It looks like a new 64-bit cluster will not be needed, as a 32-bit ARM Singularity container with Julia version 1.6.0 has appeared out of thin air here in the high desert. For more information, see

viewtopic.php?p=1849972#p1849972

After updating the computational nodes so they know how to run Singularity containers, the next step is to test everything with a simple Julia test program. Hopefully, the Slurm batch scheduler will then manage to launch any container as needed without further incident.
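
For concreteness, a minimal batch script for such a test might look like the following sketch. The file name jltest.slm, the single-task layout and the assumption that the container's runscript passes its arguments straight to julia are all mine; only the container path comes from later in this thread.

Code: Select all

#!/bin/bash
#SBATCH -n 1
#SBATCH -J jltest
# run the containerized Julia on one core and print its version
singularity run $HOME/lib/julia160.sif -e 'println(VERSION)'

Submitting this with sbatch jltest.slm and checking the resulting output file would confirm that Slurm can launch the container at all before trying anything more ambitious.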

ejolson
Posts: 7079
Joined: Tue Mar 18, 2014 11:47 am

Re: Super-cheap Computing Cluster for Learning

Sun Apr 11, 2021 7:56 pm

ejolson wrote:
Sun Apr 11, 2021 5:58 pm
After updating the computational nodes so they know how to run Singularity containers, the next step is to test everything with a simple Julia test program.
Since the super-cheap cluster was first set up some time ago, it's worth summarizing the configuration. From a hardware point of view things are pretty simple: five Pi Zeros are plugged into a powered USB hub that is in turn connected to a Pi B+ serving as the head node. Since the original configuration, the SD cards have been removed from the Pi Zeros, which now run without locally attached storage.

Note that the only reason for removing the SD cards was that they were needed for another project. This change also parallels the development of the Rabbit integrated storage system for the El Capitan exascale machine, which replaces the directly attached SSD storage used in the Sierra supercomputer.

https://www.nextplatform.com/2021/03/09 ... e-storage/

Another hardware modification was setting the clock speed of the Zeros from their overclocked default of 1 GHz down to the same 700 MHz used on the head node. This was done for long-term stability, much as turning off the L1 cache on Blue Gene traded performance for the stability necessary to make extended use of that supercomputer.
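
For reference, on Raspbian this amounts to pinning the ARM frequency in config.txt; exactly where that file lives depends on how the boot files are served to the Zeros by rpiboot, so treat the following as a sketch.

Code: Select all

# config.txt fragment: clock the Zero at 700 MHz to match the head node
arm_freq=700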

The software configuration is a bit more involved. Each Pi Zero boots using rpiboot through USB and then mounts its root file system over NFS from BTRFS copy-on-write snapshots of the same system image used on the head node. This ensures the system software stays synchronized and allows the Linux filesystem buffer on the head node to efficiently cache the common system image served to all nodes.
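
A configuration sketch of this arrangement is given below. Only the /x/s0 snapshot path and the 192.168.174.0/24 network appear elsewhere in this thread; the head-node address, export options and NFS version are assumptions.

Code: Select all

# /etc/exports on the head node: export one snapshot per node
/x/s0  192.168.174.0/24(rw,no_root_squash,async,no_subtree_check)

# kernel command line fragment for node s0: mount that snapshot as / over NFS
root=/dev/nfs nfsroot=192.168.174.1:/x/s0,vers=3 rw ip=dhcp rootwait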

Note that this way of updating the system software on the super-cheap cluster allows the computational nodes to be isolated from the Internet. In particular, there is no need to set up any type of IP masquerade or packet forwarding so the Zeros can run apt-get. Such quarantine measures are important as computer viruses have been spreading almost as fast as human ones during the epidemic.

Scripts to automatically update the snapshots used as the root filesystems for the computational nodes were presented earlier in this thread. Over the years the names of these scripts have changed slightly. As the next step is to update the system images on the Zeros so they can run Singularity containers, it is worth reviewing the necessary scripts and their new names.
      • sdown -- shut down the computational nodes.
      • supdate -- create new snapshots of the current system image for each computational node.
      • srun -- reboot the computational nodes.

      On a larger cluster it would be advisable to perform a similar update one node at a time by consecutively draining the Slurm queue for each node and updating it separately, as sketched below. Automating such a procedure is outside the scope of the present learning project and is not necessary here: with only 5 computational nodes, updating them all at once is not a big inconvenience.
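
      In rough outline a rolling update of a single node might look like the following; the scontrol and squeue commands are standard Slurm, while the snapshot commands simply mirror what supdate does below.

      Code: Select all

      # drain node s0 so Slurm stops scheduling new work on it
      scontrol update nodename=s0 state=drain reason="image update"
      squeue -w s0                    # wait until no jobs remain on s0
      ssh s0 poweroff                 # shut down just that node
      btrfs subvolume delete /x/s0    # refresh its copy-on-write snapshot
      btrfs subvolume snapshot / /x/s0
      # reboot s0 via rpiboot, then return the node to service
      scontrol update nodename=s0 state=resume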

      Note that we are not updating the system software from Raspbian Stretch to Buster, but simply propagating the newly installed version of Singularity out to the computational nodes.

      Custom-built software like Singularity could also be provided by a separate read-only NFS mount, which could then be updated from the head node without rebooting the computational nodes. Since this is not very difficult to configure, I've not focused on it for this project; moreover, using the same snapshot technique for all software on a small cluster is just simpler.

      Now, since Singularity is installed on the head node and working well, the same system image can be propagated to the computational nodes via

      Code: Select all

      # ./sdown
      Connection to s2 closed by remote host.
      Connection to s3 closed by remote host.
      Connection to s1 closed by remote host.
      Connection to s0 closed by remote host.
      Connection to s4 closed by remote host.
      # ./supdate 
      Configuring s0...
      Delete subvolume (no-commit): '/x/s0'
      Create a snapshot of '/' in './s0'
      Configuring s1...
      Delete subvolume (no-commit): '/x/s1'
      Create a snapshot of '/' in './s1'
      Configuring s2...
      Delete subvolume (no-commit): '/x/s2'
      Create a snapshot of '/' in './s2'
      Configuring s3...
      Delete subvolume (no-commit): '/x/s3'
      Create a snapshot of '/' in './s3'
      Configuring s4...
      Delete subvolume (no-commit): '/x/s4'
      Create a snapshot of '/' in './s4'
      # ./srun 
      Starting rpiboot to boot nodes...
      
      When rpiboot finishes reloading the kernel and initial RAM filesystem, each node mounts a newly made copy-on-write snapshot as its root filesystem. After some time all the computational nodes should be back online and the system software fully synchronized across the cluster.

      The command

      Code: Select all

      # bridge link
      16: s0 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 7418 master san state forwarding priority 32 cost 100 
      17: s1 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 7418 master san state forwarding priority 32 cost 100 
      18: s2 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 7418 master san state forwarding priority 32 cost 100 
      19: s3 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 7418 master san state forwarding priority 32 cost 100 
      20: s4 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 7418 master san state forwarding priority 32 cost 100 
      
      verifies that all the Zero computers have rebooted and the 5 separate Ethernet gadgets are again networked together by means of a bridge device.
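
      For anyone reproducing the setup, a bridge like this can be created with plain iproute2 commands along the following lines; this is only a sketch, and on the actual head node the gadget interfaces are presumably enslaved from a hotplug or interface configuration script as they appear.

      Code: Select all

      # create the bridge and attach each USB Ethernet gadget to it
      ip link add name san type bridge
      ip link set dev san up
      for i in s0 s1 s2 s3 s4
      do
          ip link set dev $i master san
          ip link set dev $i up
      done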

      For some reason Slurm always needs a little help after such an update. This does not present a serious difficulty as

      Code: Select all

      # for i in s0 s1 s2 s3 s4
      > do
      >     scontrol update nodename=$i state=idle
      > done
      
      resets the state of the computational nodes so they again can start accepting jobs. Before testing whether Julia jobs now work from the batch queue, let's log into one of the Zeros to see whether Julia works interactively.

      Code: Select all

      $ slogin s1
      Linux s1 4.9.59+ #1047 Sun Oct 29 11:47:10 GMT 2017 armv6l
      Dec 7 2017:  Installed Raspbian Stretch
      ------------------------------------------------------------------------
       Broadcom BCM2835 700Mhz/512MB                               snail.wulf
      ------------------------------------------------------------------------
       Welcome to the Snail cluster!
      
       This system based on Raspbian Stretch.  Unauthorized use prohibited.
      
      ------------------------------------------------------------------------
      Last login: Sun Apr 11 11:35:03 2021 from 192.168.174.67
      $ singularity run lib/julia160.sif
                     _
         _       _ _(_)_     |  Documentation: https://docs.julialang.org
        (_)     | (_) (_)    |
         _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
        | | | | | | |/ _` |  |
        | | |_| | | | (_| |  |  Version 1.6.0 (2021-03-24)
       _/ |\__'_|_|_|\__'_|  |  
      |__/                   |
      
      julia> n=500
      500
      
      julia> A=rand(n,n);
      
      julia> x=zeros(n);
      
      julia> b=A*x;
      
      julia> @time A\b;
       42.304877 seconds (2.53 M allocations: 107.392 MiB, 3.28% gc time, 98.75% compilation time)
      
      julia> @time A\b;
        0.492899 seconds (4 allocations: 1.913 MiB)
      
      julia> @time A\b;
        0.490753 seconds (4 allocations: 1.913 MiB)
      
      Woohoo! What this shows is an up-to-date version of Julia inside a Singularity container based on the most recent version of 32-bit ARMv6 Alpine Linux running on a cluster whose system software is many years out of date, though not as old as RHEL7 or CentOS.

      Such containers imply there is no need to update the highly-customized system software that powers the super-cheap cluster in order to run the latest version of Julia. For a real supercomputer not only would everything be behind multiple firewalls, but only trusted users who have successfully applied for a grant to use the system for a specific project would have access.

      Around here additional precautions have been taken because Raspbian Stretch (unlike Red Hat) is not receiving security updates. There is also that little detail in which the dog developer has a login account.

      ejolson
      Posts: 7079
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Super-cheap Computing Cluster for Learning

      Thu Apr 15, 2021 2:44 am

      ejolson wrote:
      Sun Apr 11, 2021 7:56 pm
      Around here additional precautions have been taken because Raspbian Stretch (unlike Red Hat) is not receiving security updates. There is also that little detail in which the dog developer has a login account.
      Some time ago the super-cheap cluster was expanded by logging the head node into a network block device hosted by a QNAP NAS and then sharing the mounted file system over NFS with the computational nodes. More details about setting up iSCSI are described in

      viewtopic.php?f=36&t=274553
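
      In outline, attaching such a block device on the head node with open-iscsi looks something like the following; the NAS address is a placeholder and the device name will vary.

      Code: Select all

      # discover and log into the iSCSI target exported by the NAS
      iscsiadm -m discovery -t sendtargets -p 192.168.2.10
      iscsiadm -m node --login
      # the LUN then appears as an ordinary block device (for example /dev/sdb)
      # which can be mounted on the head node and re-exported over NFS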

      Things are a bit simpler here as we are still using an SD card for the root file system. At the same time, things are more complicated as Fido insisted, on account of the epidemic, that the network connection to the NAS be virus proof. I suggested using VLAN tagging, but since (among other things) all the switches in the doghouse turned out to be unmanageable and dumb, the final solution was to isolate the NAS subnet using WireGuard.
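
      A minimal point-to-point WireGuard link between the head node and the NAS can be sketched with iproute2 and the wg tool as follows; the addresses, port and key paths are placeholders rather than the actual doghouse configuration.

      Code: Select all

      # create and configure the tunnel interface on the head node
      ip link add dev wg0 type wireguard
      ip addr add 10.9.9.1/24 dev wg0
      # <nas-public-key> is a placeholder for the peer's actual public key
      wg set wg0 private-key /etc/wireguard/head.key listen-port 51820 \
          peer <nas-public-key> allowed-ips 10.9.9.2/32
      ip link set dev wg0 up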

      Since real supercomputers include various types of storage, adding the NAS to store data and as a scratch space regains part of the realism that was lost when the SD cards were removed from the individual Pi Zero computers. As might have been expected, complicated storage configurations pose only a minor problem when running Julia from a Singularity container: By default Singularity mounts the user's home directory into the container but not the scratch space.

      In the current setup the scratch space is mounted as /x/snab on the head node and shared over NFS to the five computational nodes. Note it is important for a cluster that each file system be mounted at the same point on each node.

      Now, suppose the current working directory is on the above mentioned scratch space which contains a Julia script called linpack.jl given by

      Code: Select all

      # linpack.jl -- time the solution of a random n x n linear system
      function linpack(n)
          A=rand(n,n)
          x=ones(n)
          b=A*x
          tsec=@elapsed A\b       # first timing includes JIT compilation
          for j=1:10              # keep the fastest of 10 further solves
              t=@elapsed A\b
              if t<tsec
                  tsec=t
              end
          end
          mflops=(2/3*n^3+2*n^2)/tsec/1000/1000
          println("n=$n tsec=$tsec MFLOPS=$mflops")
      end

      linpack(1000)
      
      To make the scratch space visible from within the Singularity container, use the --bind option.

      In detail, the job submission file

      Code: Select all

      #!/bin/bash
      singularity run --bind /x/snab $HOME/lib/julia160.sif linpack.jl
      
      first binds the /x/snab file system into the Singularity container and then runs Julia inside that container with the program argument linpack.jl, which Julia reads from the bound scratch directory and executes.
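
      Assuming the submission file is saved as linpack.slm on the scratch space, the job is queued in the usual way.

      Code: Select all

      $ sbatch linpack.slm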

      For reference the resulting Slurm output file looks like

      Code: Select all

      n=1000 tsec=3.312522957 MFLOPS=201.86023624489752
      
      Given the fact that the computational nodes on the super-cheap cluster are currently running at 700 MHz, this compares well to the 213 MFLOPS obtained by Vince for the Pi B+ at

      http://web.eece.maine.edu/~vweaver/group/machines.html

      Thus, our custom-built Julia container comes within about 5 percent of a tuned HPL run when solving large systems of linear equations.

      ejolson
      Posts: 7079
      Joined: Tue Mar 18, 2014 11:47 am

      Re: Super-cheap Computing Cluster for Learning

      Fri Apr 23, 2021 5:06 am

      This post begins a description of how to launch a parallel computation on the super-cheap cluster using an up-to-date version of Julia in a Singularity container. Rather than MPI, the goal is to use the built-in Distributed package.

      Ideally, one might want to use the ClusterManagers package for tighter integration with Slurm. Although ClusterManagers was easily installed on the 32-bit ARMv6 version of Julia, the way it works turns out to be incompatible with running Julia inside a container.

      The difficulty is that the package relies on being able to execute the Slurm srun command from within a running instance of Julia. However, running Julia means you are already inside the container while Slurm is outside the container. Therefore, there is no way to execute srun from Julia. There seem to be two possible solutions:
      • Install Slurm into the container.
      • Create a service that allows a program inside the container to run commands outside.

        I tried the first approach and immediately ran into two problems: the current version of Slurm is not compatible with the musl C library, and 32-bit support has been deprecated. Rather than be tormented by what felt like yet another irritating retro-computing project, I also noted a philosophical difficulty: including a custom version of Slurm inside the container would make it less portable to other systems.

        The service approach is interesting due to its generality. Although Singularity is designed only to solve the problem of incompatible dynamic libraries--it is specifically not a sandbox--there may be other reasons why a hostexec command, which would allow a program running inside the container to execute a command outside it, is not already included.
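
        For illustration only, one crude way to provide such a hostexec service would be a named pipe bind mounted into the container, with a loop on the host executing whatever gets written to it. This sketch ignores exit codes and output, is a deliberate hole in any isolation, and is not what is used below.

        Code: Select all

        # on the host, outside the container: execute each line written to the FIFO
        mkfifo -m 600 /tmp/hostexec.fifo
        while true
        do
            read -r cmd < /tmp/hostexec.fifo && sh -c "$cmd"
        done

        # inside the container (with /tmp bind mounted): a hypothetical hostexec
        echo "srun -l /bin/hostname" > /tmp/hostexec.fifo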

        A web search didn't reveal a solution to this ridiculous chicken-and-egg problem, but it did turn up the fact that Slurm integration for R in a Singularity container poses a similar difficulty. Therefore, to get something done without too much effort, I decided to install ssh into the container and use that to distribute the communicating tasks.

        As openssh is already available in the 32-bit ARMv6 version of Alpine Linux, installation was easily accomplished by unpacking julia160.sif into a sandbox directory (called juliawork for definiteness), entering that sandbox with a writable Singularity shell and installing openssh.
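
        The unpacking step itself is not shown; presumably it amounts to a sandbox build along the following lines.

        Code: Select all

        # unpack the image into a writable sandbox directory
        singularity build --sandbox juliawork julia160.sif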

        Code: Select all

        # singularity shell -w juliawork
        Singularity> apk add openssh
        Singularity> ssh-keygen -f /etc/ssh/ssh_host_rsa_key
        Generating public/private rsa key pair.
        Enter passphrase (empty for no passphrase): 
        Enter same passphrase again: 
        Your identification has been saved in /etc/ssh/ssh_host_rsa_key
        Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub
        
        After exiting the container I copied ssh_known_hosts from /etc/ssh on the snail cluster to juliawork/etc/ssh so the public keys for the computational nodes would be recognized. Finally pack the modified container back up with

        Code: Select all

        # singularity build julia160.sif juliawork
        
        The next post describes testing ssh from within the Singularity container, installing the container and using it to run a parallel Julia computation.

        ejolson
        Posts: 7079
        Joined: Tue Mar 18, 2014 11:47 am

        Re: Super-cheap Computing Cluster for Learning

        Fri Apr 23, 2021 4:06 pm

        ejolson wrote:
        Fri Apr 23, 2021 5:06 am
        The next post describes testing ssh from within the Singularity container, installing the container and using it to run a parallel Julia computation.
        To check whether ssh is properly installed in the Singularity container type

        Code: Select all

        singularity shell --bind `pwd` julia160.sif
        Singularity> ssh s0
        $ hostname
        s0
        $ exit
        logout
        Connection to s0 closed.
        Singularity> exit
        
        The point of this test is to make sure ssh does not ask for a password and already knows the host keys of the nodes. This is needed for Julia to start a parallel job from within the container later.

        The next step is to install a convenience script that automatically runs the Julia container when julia is typed at the command line. We do this because the Distributed package expects to be able to start additional Julia processes by simply running the julia command on the system. Although Distributed can be configured to use any command for this purpose, it is convenient to place a script which runs the Singularity container in more or less the same path where the real Julia lives inside the container.

        To this end, create a julia-1.6.0 directory and copy julia160.sif to it.

        Code: Select all

        # mkdir -p /usr/local/julia-1.6.0/bin
        # cp julia160.sif /usr/local/julia-1.6.0
        
        Then create a script named julia in /usr/local/julia-1.6.0/bin containing

        Code: Select all

        #!/bin/bash
        exec singularity run --bind `pwd` /usr/local/julia-1.6.0/julia160.sif "$@"
        
        Note that the --bind option mounts the current working directory inside the Singularity container, which I found sufficient in all the use cases considered so far.

        Set the execute permission and link the startup script to /usr/local/bin with

        Code: Select all

        # chmod 755 /usr/local/julia-1.6.0/bin/julia
        # cd /usr/local/bin
        # ln -s ../julia-1.6.0/bin/julia .
        
        After making these changes, shut the cluster down, update the system images on the worker nodes and then reboot.

        Code: Select all

        # cd /x
        # ./sdown
        # ./supdate
        # ./srun
        
        If everything goes well, one should be able to log into a remote node and run Julia as

        Code: Select all

        $ slogin s0
        $ julia
                       _
           _       _ _(_)_     |  Documentation: https://docs.julialang.org
          (_)     | (_) (_)    |
           _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
          | | | | | | |/ _` |  |
          | | |_| | | | (_| |  |  Version 1.6.0 (2021-03-24)
         _/ |\__'_|_|_|\__'_|  |  
        |__/                   |
        
        julia> 1+1
        2
        
        julia> exit()
        
        The Julia container is now ready to use for distributed parallel processing on the super-cheap cluster.

        ejolson
        Posts: 7079
        Joined: Tue Mar 18, 2014 11:47 am

        Re: Super-cheap Computing Cluster for Learning

        Fri Apr 23, 2021 9:59 pm

        ejolson wrote:
        Fri Apr 23, 2021 4:06 pm
        The Julia container is now ready to use for distributed parallel processing on the super-cheap cluster.
        To test Julia 1.6.0 on the five Raspberry Pi Zero computers which constitute the super-cheap cluster, this post describes a parallel computation that renders a publication-quality image of Barnsley's fern-shaped fractal at a resolution of 3108 x 6292 pixels using 800000000 random iterations.

        This computation can be split into multiple independent tasks, each running on a separate processor, in a way that is sometimes called embarrassingly parallel. The idea is the same as when multiple random processes were used to approximate pi with the dartboard method earlier.

        viewtopic.php?p=1265703#p1265703

        Since srun can't be executed from within the Singularity container, we take inspiration from the last section of

        https://slurm.schedmd.com/srun.html

        and execute srun ahead of time, in a way similar to making a machine file for versions of MPI that lack Slurm integration.

        This led to a job description file which looks like

        Code: Select all

        #!/bin/bash
        #SBATCH -n 5
        #SBATCH -J ferndist
        export JULIA_MACHINES=`srun -l /bin/hostname | sort -n | awk '{print $2}'`
        export JULIA_WORKER_TIMEOUT=1000
        time julia ferndist.jl
        
        Note since the super-cheap cluster is named snail, it was necessary to increase the default timeout to ensure the parallel Julia instances had time to start.

        Submitting this job with

        Code: Select all

        $ sbatch ferndist.slm
        
        eventually led to the output

        Code: Select all

        Main task running on s0
        Worker 3 running on s4
        Worker 5 running on s1
        Worker 2 running on s2
        Worker 4 running on s3
        
        ferndist.jl -- Compute Barnsley's Fern (JULIA_MACHINES="s0 s1 s2 s3 s4")
        
        Iteration rate is 3.4813772532331985e6 per second.
        Total execution time 229.794113596 seconds.
        
        real    16m39.041s
        user    10m36.830s
        sys 0m18.500s
        
        I found it interesting that the execution time was only about 3.8 minutes while the total wall-clock time was more than 16 minutes. The extra 12.8 minutes appears to be startup time, related to the fact that the cluster networking fabric is made from a USB2 hub.

        It is important to remember the goal with the super-cheap cluster is to learn the tools and techniques used in high-performance computing, not to actually create a fast machine. Thus, the main success here is getting the distributed computing package to work from a container.

        For reference, the source code is

        Code: Select all

        #  ferndist.jl -- Compute the Barnsley Fern
        
        using Distributed
        
        if "JULIA_MACHINES" in keys(ENV)
            global machines=split(ENV["JULIA_MACHINES"],[' ','\n'])
            addprocs(machines[2:length(machines)])
        else
            global machines=["localhost"]
            addprocs(Sys.CPU_THREADS-1)
        end
        
        @everywhere begin
        
        using StaticArrays, Random
        
        const N=800000000
        
        const A=[SA[0.85 0.04; -0.04 0.85],
            SA[0.20 -0.26; 0.23 0.22],
            SA[-0.15 0.28; 0.26 0.24],
            SA[0.00 0.00; 0.00  0.16]]
        
        const B=[SA[0.00, 1.60],
            SA[0.00, 1.60],
            SA[0.00, 0.44],
            SA[0.00, 0.00]]
        
        const P=[0.85, 0.07, 0.07, Inf]
        const cdf=[sum(P[1:i]) for i=1:4]
        
        f(i,x)=A[i]*x+B[i]            # apply the i-th affine map to the point x
        i(p)=findfirst(x->p<=x,cdf)   # select a map index with probabilities P
        
        const xmin=SA[-2.1820,0]
        const xmax=SA[2.6558,9.9983]
        const border=0.1
        const scale=617.0
        
        const nu=Integer.(floor.(scale*(xmax-xmin.+2*border)))
        const image=zeros(Int8,nu...)
        
        function getnode(c)
            put!(c,(myid(),gethostname()))
        end
        
        function point(x)
            i1=Integer(floor(scale*(x[1]-xmin[1]+border)))
            i2=Integer(floor(scale*(x[2]-xmin[2]+border)))
            image[i1,i2]=1
        end
        
        # iterate the random IFS jmax times from seed s, then pack the local image
        # into a 1-bit-per-pixel bitmap and send it back through channel c
        function work(s,jmax,c)
            gen=MersenneTwister(s)
            xn=SA[0.0,0.0]
            point(xn)
            for j=1:jmax
                xn=f(i(rand(gen)),xn)
                point(xn)
            end
            flip=size(image)[2]+1
            bmap=zeros(UInt8,(size(image)[1]+7)÷8,size(image)[2])
            for iy=1:size(image)[2]
                rx=1; rb=UInt8(0)
                ib=UInt8(128)
                for ix=1:size(image)[1]
                    if image[ix,iy]!=0
                        rb|=ib
                    end
                    ib>>=1
                    if ib==0
                        bmap[rx,flip-iy]=rb; ib=UInt8(128)
                        rb=UInt8(0); rx+=1
                    end
                end
                if ib!=0
                    bmap[rx,flip-iy]=rb
                end
            end
            put!(c,bmap)
        end
        
        end # everywhere
        
        # write the combined bitmap as a binary PBM (P4) image file
        function plot(bmap)
            open("fern.pnm","w") do io
                println(io,"P4")
                println(io,size(image)[1]," ",size(image)[2])
                write(io,bmap)
            end
        end
        
        # distribute the work, OR the returned bitmaps together and plot the result
        function main()
            ncpu=nprocs()
            println("ferndist.jl -- Compute Barnsley's Fern ",
                "(JULIA_MACHINES=\"",join(machines,' '),"\")")
            ret=RemoteChannel(()->Channel{Array{UInt8,2}}(nprocs()))
            for n in workers()
                remotecall(work,n,rand(UInt),N÷ncpu,ret)
            end
            work(rand(UInt),N÷ncpu,ret)
            bfern=take!(ret)
            for n in workers()
                bfern.=bfern.|take!(ret)
            end
            plot(bfern)
        end
        
        function startup()
            ret=RemoteChannel(()->Channel{Tuple}(nprocs()))
            for n in workers()
                remotecall(getnode,n,ret)
            end
            getnode(ret)
            for n=1:nprocs()
                id,name=take!(ret)
                if id==1
                    println("Main task running on ",name)
                else
                    println("Worker ",id," running on ",name)
                end
            end
            println()
            flush(stdout)
        end
        
        startup()
        tsec=@elapsed main()
        println("\nIteration rate is ",N/tsec," per second.")
        println("Total execution time ",tsec," seconds.")
        rmprocs(workers()...; waitfor=1000)
        

        Gavinmc42
        Posts: 5479
        Joined: Wed Aug 28, 2013 3:31 am

        Re: Super-cheap Computing Cluster for Learning

        Sat Apr 24, 2021 2:01 am

        Would a cluster of the cheapest CM4s be the way to go?
        No wireless, no eMMC, just booting from high-speed Ethernet.
        1GB of RAM is plenty?

        ejolson
        Posts: 7079
        Joined: Tue Mar 18, 2014 11:47 am

        Re: Super-cheap Computing Cluster for Learning

        Sat Apr 24, 2021 4:08 am

        Gavinmc42 wrote:
        Sat Apr 24, 2021 2:01 am
        Would a cluster of the cheapest CM4s be the way to go?
        No wireless, no eMMC, just booting from high speed Ethernet.
        1GB of ram is plenty?
        To reach the same ratio of memory per CPU core as a Pi Zero, one would need the 2GB model. Other than being out of stock, the main difficulty in building a cluster with the CM4 seems to be the lack of a suitable carrier board. While I expect the supply-chain problems will eventually be solved, given the way the CM4 mounts flat side down rather than using an edge connector, I'm not sure there will ever be a carrier board that reaches the kind of density one can get with a CM3 or even a stack of Pi 4B computers. Do you have an idea for a carrier board?

        Gavinmc42
        Posts: 5479
        Joined: Wed Aug 28, 2013 3:31 am

        Re: Super-cheap Computing Cluster for Learning

        Sat Apr 24, 2021 4:48 am

        Do you have an idea for a carrier board?
        Not till you just asked :D
        Assuming we want a nice 19" rack format?
        44.45mm, 1U height, going to be tight mounting on edge.

        The only thing we need would be power and Ethernet?
        A simple PCB with the Mag jack and PoE could slot into that rack.
        Two rows of edge mounted PCBs?

        Hmm, 1/2 width racks, that would be a cool desktop enclosure.
        1/2 rack width PoE switch?
        Those might be hard to get.

        Version 2 PCB would have NVMe slot as well.
        What is the smallest PoE power circuit?

        Prototype PCB, Mag socket and USB power/boot connector.
        That would probably be a cheap PCB.
        I would stick a few i2c connectors on it for making IoT thingies.

        Why stick to Rack format?
        Could make a cluster into all sorts of shapes.
        Stick some LEDs on it, Clustered Chandelier?

        You have me wanting to fire up the PCB software now.

        ejolson
        Posts: 7079
        Joined: Tue Mar 18, 2014 11:47 am

        Re: Super-cheap Computing Cluster for Learning

        Tue Apr 27, 2021 9:52 pm

        Gavinmc42 wrote:
        Sat Apr 24, 2021 4:48 am
        Why stick to Rack format?
        Could make a cluster into all sorts of shapes.
        Stick some LEDs on it, Clustered Chandelier?

        You have me wanting to fire up the PCB software now.
        I originally wanted to arrange the super-cheap cluster into a C shape with the power supply at the bottom, like a Cray-1. Unfortunately, 5 Pi Zero computers weren't enough for a convincing C. I would have used more, but didn't want to run into limitations on the number of USB-connected devices with the Pi B+ that serves as the head node.

        In my opinion a CM4-based cluster board should not use power over Ethernet but instead include its own power management. I think it would also be nice to skip the usual Ethernet physical layer by using a built-in switch that connects all the modules directly, without the sockets, magnetics and associated CAT6 wires. Reserving the PCIe bus for NVMe SSDs sounds like a good idea, as one could then learn about Ceph, Hadoop and other clustered-storage compute solutions.

        Since the 2GB CM4 costs US$ 30, I think a carrier under US$ 120 with space for 8 modules could make a marketable product. Given the nearly 13 minutes of start-up time for Julia on the super-cheap cluster,

        An Affordable 64-bit Cluster for Learning

        might be a good title for a new thread.

        I could reconfigure the Pi cloud as a parallel computing cluster, but it's currently working so well as a cloud. Anyway, it would look better on the resume to receive a grant for building something new. Where do I apply? A new cluster based on the CM4 should even work well enough to use in certain production settings.
