User avatar
bensimmo
Posts: 4152
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 8:59 am

did i forget to mention, my overclocked Pi4 ran it (insane) in 1946ms?

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 9:02 am

Seems I described WASM slightly wrongly above.

The byte codes of Java and C# are designed to be run by a virtual machine. They can be read, decoded, executed one at a time. Or the VM can do some just-in-time compilation of the byte codes to native instructions as it runs thus speeding things up.

Conversely WASM binaries are designed to be compiled to native instructions as a whole and then executed.

As far as I can tell...

Soruk
Posts: 21
Joined: Thu Jun 20, 2019 11:25 am

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 10:21 am

ejolson wrote:
Thu Aug 08, 2019 4:58 am
ejolson wrote:
Wed Aug 07, 2019 3:42 pm
Since cream rises to the top, it's not surprising that Basic has taken over the top of the bar chart. As Basic was the language responsible for the widespread computer literacy of the golden age of personal computing, I've been looking at hanagram.bas to see if there are any obvious ways to improve the performance.
After a few micro-optimizations to hanagram.bas I arrived at
(code)
Running this code on the Raspberry Pi 4B yields

Code: Select all

$ make
/usr/local/fbc-300/bin/fbc -R -lang qb hanav2.bas
$ time ./hanav2 >hanav2.insane

real    0m10.634s
user    0m9.699s
sys     0m0.931s
$ time ./hanav2 >hanav2.insane

real    0m10.497s
user    0m9.628s
sys     0m0.870s
$ time ./hanav2 >hanav2.insane

real    0m10.543s
user    0m9.682s
sys     0m0.859s
$ md5sum hanav2.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  hanav2.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
Although the new code is about 2 times faster than before, it still appears near the top of the bar chart.
I've modified the code for BBC BASIC, initially just to make it run on Matrix Brandy, then some optimisations available in BASIC V/VI.

Code: Select all

1000 REM hanabrandy.bas -- Find anagrams
1010 REM Written July 24, 2019 by Eric Olson
1020 REM
1030 REM This program demonstrates using letter counts and a heap to
1040 REM find anagrams in the insane British dictionary.
1050 REM
1060 REM Converted to BBC BASIC for Matrix Brandy by Michael McConnell
1990 IF HIMEM < &2200000 THEN PRINT "Insufficient memory, please use -size 34M":END
2000 F$="/usr/share/dict/british-english-insane"
2001 REM F$="tail.dat"
2010 m0%=0:F%=OPENIN(F$)
2020 REPEAT A$=GET$#F%:m0%+=1:UNTIL EOF#F%
2030 DIM L$(m0%),K$(m0%),h%(m0%),v0%(26)
2040 CLOSE #F%:F%=OPENIN(F$)
2050 A1%=ASC("a")-1:A2%=ASC("0"):N0%=0:H0%=0
2060 FOR I%=1 TO m0%:A$=GET$#F%
2063 FOR J%=1 TO 26:v0%(J%)=0:NEXT J%:J%=0
2065 IF J%>=LEN(A$) THEN 2075
2068 J%+=1:C%=ASC(MID$(A$,J%,1))-A1%
2070 IF (C%<1) OR (C%>26) THEN 2130
2071 v0%(C%)+=1:GOTO 2065
2075 V$="00000000000000000000000000":FOR J%=1 TO 26
2080 MID$(V$,J%,1)=CHR$(v0%(J%)+A2%):NEXT J%
2120 N0%+=1:L$(N0%)=A$:K$(N0%)=V$:PROCinsert
2130 NEXT I%:CLOSE #F%
2140 IF H0%=0 THEN END
2150 PROCremove
2160 j0%=J%:L$(j0%)=L$(j0%)+":"
2170 IF H0%=0 THEN 2400 
2180 PROCremove
2190 IF K$(J%)<>K$(j0%) THEN 2160
2200 L$(j0%)=L$(j0%)+" "+L$(J%)+",":GOTO 2170
2400 FOR I%=1 TO N0%:A$=L$(I%)
2410 IF RIGHT$(A$,1)="," THEN PRINT LEFT$(A$,LEN(A$)-1)
2430 NEXT I%
2900 END
3999:
4000 DEFPROCinsert
4001 REM Insert a key into the heap
4010 REM  inputs: N0% the key to insert
4020 H0%+=1:R%=H0%:h%(R%)=N0%
4030 S%=R%DIV2:IF S%=0 THEN ENDPROC
4035 IF K$(h%(R%))>K$(h%(S%)) THEN ENDPROC
4036 IF K$(h%(R%))<K$(h%(S%)) THEN 4060
4040 IF L$(h%(R%))>=L$(h%(S%)) THEN ENDPROC
4060 T%=h%(S%):h%(S%)=h%(R%):h%(R%)=T%:R%=S%:GOTO 4030
4499:
4500 DEFPROCremove
4501 REM Remove a key from the heap
4510 REM  output: J% the key removed
4520 J%=h%(1):h%(1)=h%(H0%):H0%-=1:S%=1
4530 R0%=S%<<1:R1%=R0%+1:IF R0%>H0% THEN ENDPROC
4535 IF R1%>H0% THEN 4550
4536 IF K$(h%(R0%))>K$(h%(R1%)) THEN 4600
4537 IF K$(h%(R0%))<K$(h%(R1%)) THEN 4550
4540 IF L$(h%(R0%))>=L$(h%(R1%)) THEN 4600
4550 IF K$(h%(R0%))>K$(h%(S%)) THEN ENDPROC
4551 IF K$(h%(R0%))<K$(h%(S%)) THEN 4560
4552 IF L$(h%(R0%))>=L$(h%(S%)) THEN ENDPROC
4560 T=h%(R0%):h%(R0%)=h%(S%):h%(S%)=T:S%=R0%:GOTO 4530
4600 IF K$(h%(R1%))>K$(h%(S%)) THEN ENDPROC
4601 IF K$(h%(R1%))<K$(h%(S%)) THEN 4660
4602 IF L$(h%(R1%))>=L$(h%(S%)) THEN ENDPROC
4660 T=h%(R1%):h%(R1%)=h%(S%):h%(S%)=T:S%=R1%:GOTO 4530
Sadly I don't have a RasPi4B to run it on, but on my ready-to-hand RasPi2 I get:

Code: Select all

$ time sbrandy -size 34M hanabrandy.bas > insane

real    3m58.576s
user    3m50.410s
sys     0m8.030s
....which isn't exactly quick.

User avatar
Gavinmc42
Posts: 3622
Joined: Wed Aug 28, 2013 3:31 am

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 10:43 am

hmm just had a crazy thought.
1) Will node.js run WebGL samples?
2) Can it do it without x11?

I have not touched node.js for about 4-5 years, seems usable now :D
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 1:46 pm

Gavibmc42,

1) No.
2) No

What? We have been using node.js in production services since it was the new shiney thing. It was always very useable, performant and reliable.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 1:48 pm

Soruk wrote:
Thu Aug 08, 2019 10:21 am
....which isn't exactly quick.
Congratulations! Another challenge solution has floated towards the top of the Insane British Anagram Bar Chart of Fame.

Image

After changing the line

Code: Select all

2410 IF RIGHT$(A$,1)="," THEN PRINT LEFT$(A$,LEN(A$)-1);CHR$(10);
to avoid the MSDOS-style line terminators, the output agrees with the reference Perl code. Running on the Pi 4B I obtained

Code: Select all

$ time sbrandy -size 34M hanabrandy.bas >hanabrandy.insane

real    1m26.964s
user    1m18.896s
sys 0m8.062s
$ time sbrandy -size 34M hanabrandy.bas >hanabrandy.insane

real    1m27.247s
user    1m19.136s
sys 0m8.111s
$ time sbrandy -size 34M hanabrandy.bas >hanabrandy.insane

real    1m27.202s
user    1m19.149s
sys 0m8.051s
$ md5sum hanabrandy.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  hanabrandy.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
For a pure interpreter, the runtime seems relatively quick. For reference I also tried running hanav2.bas using bwBasic. The program finished in about 16 minutes with an obviously wrong answer. It looked like some sort of memory corruption, hopefully not terminal reaction vessel meltdown in the turboencabulator.

Do you think it would help to clean the flux ports?
Last edited by ejolson on Thu Aug 08, 2019 11:29 pm, edited 1 time in total.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 5:29 pm

ejolson wrote:
Wed Aug 07, 2019 5:04 pm
John_Spikowski wrote:
Wed Aug 07, 2019 4:56 pm
VB doen't run on the RPi so would it still count?
Visual Basic .NET runs on the Raspberry Pi using Mono and the open-source vbnc compiler.
Here is anagram-vb.bas, a direct translation of anagram-sb.bas to Visual Basic.

Code: Select all

module anagram

'   anagram-vb.bas -- Find anagrams
'   Written August 8, 2019 by Eric Olson
'   Based on anagram-sb.bas written by John Spikowski
'
'   This program demonstrates using associative arrays and a bubble
'   sort to find anagrams in the insane British dictionary.

dim filename, wordlist() as string

function isword(s as string) as boolean
    dim i,c as integer
    for i=1 to len(s)
        c=asc(mid(s,i,1))
        if (c<asc("a")) or (c>asc("z")) then
            return false
        end if
    next i
    return true
end function

function strkey(s as string) as string
    dim sortword as string
    dim i,j as integer
    sortword=s
    for i=1 to len(sortword)
        for j=i+1 to len(sortword)
            if mid(sortword,i,1)>mid(sortword,j,1) then
                dim temp as string
                temp=mid(sortword,i,1)
                mid(sortword,i,1)=mid(sortword,j,1)
                mid(sortword,j,1)=temp
            end if
        next j
    next i
    return sortword
end function

dim anagram as new system.collections.generic.dictionary(of string,string)
dim index as new system.collections.generic.list(of string)

function main() as integer
    dim thisword,sortword as string
    filename="/usr/share/dict/british-english-insane"
    wordlist=system.io.file.readalllines(filename)
    for each thisword in wordlist
        if isword(thisword) then
            sortword=strkey(thisword)
            if anagram.containskey(sortword) then
                dim temp as string
                anagram.trygetvalue(sortword,temp)
                anagram.remove(sortword)
                anagram.add(sortword,temp+" "+thisword+",")
            else
                anagram.add(sortword,thisword+":")
                index.add(sortword)
            end if
        end if
    next thisword
    dim anaidx as string
    for each anaidx in index
        dim temp as string
        anagram.trygetvalue(anaidx,temp)
        if right(temp,1)="," then
            console.writeline(left(temp,len(temp)-1))
        end if
    next anaidx
end function

end module
Running on the Raspberry Pi 4B yields the following runtimes.

Code: Select all

$ time mono anagram-vb.exe >anagram-vb.insane

real    0m39.643s
user    0m39.434s
sys 0m0.520s
$ time mono anagram-vb.exe >anagram-vb.insane

real    0m38.955s
user    0m38.617s
sys 0m0.680s
$ time mono anagram-vb.exe >anagram-vb.insane

real    0m39.141s
user    0m38.812s
sys 0m0.640s
$ md5sum anagram-vb.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  anagram-vb.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
I'm happy the turboencabulator appears to be working fine and correct results were obtained with Visual Basic. The problem before must have been bwBasic. The bar chart has been updated with the new challenge entry.
Last edited by ejolson on Thu Aug 08, 2019 11:23 pm, edited 1 time in total.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 6:42 pm

ejolson wrote:
Thu Aug 08, 2019 5:29 pm
Here is anagram-vb.bas, a direct translation of the ScriptBasic program anagram-sb.bas to Visual Basic.
I showed my Visual Basic code to Fido and the dog developer barked out loud laughing. That bubble sort will certainly create some global warming, why not use dogarithms?

Modifications to anagram-vb.bas after consulting with the canine coder produced the program

Code: Select all

module anagram

'   dogagram.bas -- Find anagrams
'   Written August 8, 2019 by Eric Olson
'   Based on anagram-sb.bas written by John Spikowski
'
'   This program demonstrates using associative arrays and dogarithms
'   to find anagrams in the insane British dictionary.

dim filename, wordlist() as string
dim prime() as integer = { 
    0, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101 }
dim dog(26) as int64 

function isword(s as string) as boolean
    dim i,c as integer
    for i=1 to len(s)
        c=asc(mid(s,i,1))
        if (c<asc("a")) or (c>asc("z")) then
            return false
        end if
    next i
    return true
end function

function strkey(s as string) as int64
    dim key as int64
    dim i as integer
    key=0
    for i=1 to len(s)
        dim c as integer
        c=asc(mid(s,i,1))-asc("a")+1
        key=key+dog(c)
    next i
    return key
end function

dim anagram as new system.collections.generic.dictionary(of int64,string)
dim index as new system.collections.generic.list(of int64)

function main() as integer
    dim i as integer
    for i=1 to 26
        dog(i)=math.log(prime(i))*4503599627370496.0+0.5
    next i
    dim word as string
    dim key as int64
    filename="/usr/share/dict/british-english-insane"
    wordlist=system.io.file.readalllines(filename)
    for each word in wordlist
        if isword(word) then
            key=strkey(word)
            if anagram.containskey(key) then
                dim temp as string
                anagram.trygetvalue(key,temp)
                anagram.remove(key)
                anagram.add(key,temp+" "+word+",")
            else
                anagram.add(key,word+":")
                index.add(key)
            end if
        end if
    next word
    dim anaidx as int64
    for each anaidx in index
        dim temp as string
        anagram.trygetvalue(anaidx,temp)
        if right(temp,1)="," then
            console.writeline(left(temp,len(temp)-1))
        end if
    next anaidx
end function

end module
On the Pi 4B the output was

Code: Select all

$ time mono ./dogagram.exe >dogagram.insane 

real    0m7.317s
user    0m7.058s
sys     0m0.461s
$ time mono ./dogagram.exe >dogagram.insane 

real    0m7.449s
user    0m7.060s
sys     0m0.591s
$ time mono ./dogagram.exe >dogagram.insane 

real    0m7.322s
user    0m6.796s
sys     0m0.729s
$ md5sum dogagram.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  dogagram.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
The bar chart has been updated with the improved timing.
Last edited by ejolson on Fri Aug 09, 2019 12:15 am, edited 3 times in total.

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 6:54 pm

I spent enough time on the insane-british-anagram in Rust to run as a native executable, under node.js and in the browser that I thought it was time it had it's own Github repository: https://github.com/ZiCog/insane-british-anagram-rust

If nothing else it might serve as a small, simple, working example for anyone who wants to get high speed code running in a web page.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 8:03 pm

Heater wrote:
Thu Aug 08, 2019 6:54 pm
If nothing else it might serve as a small, simple, working example for anyone who wants to get high speed code running in a web page.
How difficult do you think it would be to run this Pi Pie Chart program in the browser? The graphical output is already in SVG format, so should be easy to display along with the text. Is OpenMP available yet?

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 08, 2019 8:34 pm

No, no, no, I'm not going to look down that rabbit hole.

If it's all plain C that does no file I/O and does not use threads you can simple compile it with emscripten instead of gcc. emcc takes all the same options. Arrange for a function that returns the SVG as a sting.

If there is any file I/O in there it emscripten simulates a lot of the standard C library API and maintains a file system in memory.

I suspect threads and are right out.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 12:28 am

Heater wrote:
Thu Aug 08, 2019 8:34 pm
No, no, no, I'm not going to look down that rabbit hole.

If it's all plain C that does no file I/O and does not use threads you can simple compile it with emscripten instead of gcc. emcc takes all the same options. Arrange for a function that returns the SVG as a sting.

If there is any file I/O in there it emscripten simulates a lot of the standard C library API and maintains a file system in memory.

I suspect threads and are right out.
From what I read, plans to support parallel processing by means of threads in WASM have been planned for some time. Here is some interesting research exploring task-based parallelism.

User avatar
John_Spikowski
Posts: 1330
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 9:01 am

Nice work @ejolson using VB. It was enlightening to see that there is a stable VB compiler for Linux.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 4:42 pm

John_Spikowski wrote:
Fri Aug 09, 2019 9:01 am
Nice work @ejolson using VB. It was enlightening to see that there is a stable VB compiler for Linux.
Mono seems to work pretty well on the Raspberry Pi. Although C# is being promoted more (even my brother has become a C# expert), Visual Basic has maintained feature parity until recently.

I made some modifications to dogagram.bas to make better use of dictionaries and ended up with cleaner and faster code.

Code: Select all

module anagram

'   fastgram.bas -- Find anagrams
'   Modified to use dictionaries in a more paradigmatic way
'   Written August 8, 2019 by Eric Olson
'   Based on anagram-sb.bas written by John Spikowski
'
'   This program demonstrates using associative arrays and dogarithms
'   to find anagrams in the insane British dictionary.

dim filename, wordlist() as string
dim prime() as integer = { 
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101 }
dim dog(26) as int64 

function isword(s as string) as boolean
    dim i,c as integer
    for i=1 to len(s)
        c=asc(mid(s,i,1))
        if (c<asc("a")) or (c>asc("z")) then
            return false
        end if
    next i
    return true
end function

function strkey(s as string) as int64
    dim key as int64
    dim i as integer
    key=0
    for i=1 to len(s)
        dim c as integer
        c=asc(mid(s,i,1))-asc("a")
        key=key+dog(c)
    next i
    return key
end function

dim anagram as new system.collections.generic.dictionary(of int64,string)
dim index as new system.collections.generic.list(of int64)

function main() as integer
    dim i as integer
    for i=0 to 25
        dog(i)=math.log(prime(i))*4503599627370496.0+0.5
    next i
    dim word as string
    dim key as int64
    filename="/usr/share/dict/british-english-insane"
    wordlist=system.io.file.readalllines(filename)
    for each word in wordlist
        if isword(word) then
            dim temp as string
            key=strkey(word)
            if anagram.trygetvalue(key,temp) then
                anagram(key)=temp+" "+word+","
            else
                anagram.add(key,word+":")
                index.add(key)
            end if
        end if
    next word
    dim anaidx as int64
    for each anaidx in index
        dim temp as string
        anagram.trygetvalue(anaidx,temp)
        if right(temp,1)="," then
            console.writeline(left(temp,len(temp)-1))
        end if
    next anaidx
end function

end module
Results on the Pi 4B are

Code: Select all

$ make
vbnc /optimize /removeintchecks fastgram.bas
Visual Basic.Net Compiler version 0.0.0.5943 (Mono 4.0.1 - tarball)
Copyright (C) 2004-2010 Rolf Bjarne Kvinge. All rights reserved.

Compilation successful
Compilation took 00:00:02.3132840
$ time mono fastgram.exe >fastgram.insane

real    0m6.879s
user    0m6.535s
sys     0m0.541s
$ time mono fastgram.exe >fastgram.insane

real    0m6.660s
user    0m6.411s
sys     0m0.439s
$ time mono fastgram.exe >fastgram.insane

real    0m6.783s
user    0m6.528s
sys     0m0.451s
$ md5sum fastgram.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  fastgram.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
While performance improvements are admittedly minor, the readability of the code has improved as well as my knowledge about the Visual Basic language and compiler.

In my opinion, whether learning to program for the first time or simply learning a new language, it is important to make experiments and continue to refine a small project--the insane British anagram challenge--in order to learn things well. This is also what has been happening here with the Rust programming language.
Last edited by ejolson on Fri Aug 09, 2019 7:44 pm, edited 1 time in total.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 6:44 pm

ejolson wrote:
Fri Aug 09, 2019 4:42 pm
This is also what has been happening here with the Rust programming language.
I realized that I haven't run the last two Rust programs on the Pi 4B. Here are the results:

Code: Select all

$ make
rustc -C opt-level=3 iba-5.rs
rustc -C opt-level=3 iba-4.rs
$$ time ./iba-4 >iba-4.insane

real    0m0.631s
user    0m0.471s
sys 0m0.160s
$ time ./iba-4 >iba-4.insane

real    0m0.637s
user    0m0.403s
sys 0m0.232s
$ time ./iba-4 >iba-4.insane

real    0m0.630s
user    0m0.396s
sys 0m0.234s
$ time ./iba-5 >iba-5.insane

real    0m0.614s
user    0m0.443s
sys 0m0.171s
$ time ./iba-5 >iba-5.insane

real    0m0.622s
user    0m0.438s
sys 0m0.179s
$ time ./iba-5 >iba-5.insane

real    0m0.614s
user    0m0.322s
sys 0m0.292s
$ md5sum iba-4.insane iba-5.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  iba-4.insane
addeda36a4631afc983f3bff13742e36  iba-5.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
It looks like the latest versions in Rust are slightly faster.

Along different lines, I've now run anagram-sb.bas on the Pi 4B. It successfully completed in twelve hours. That's about 1000 times slower than anagram-vb.bas, which is essentially the same program running under a different version of Basic; however, what's important is that it got the right answer. I wonder how difficult it would be to replace the linear lists used for associative arrays in the internals of ScriptBasic with hash tables or trees.

The insane anagram bar chart of fame has been updated.
Last edited by ejolson on Fri Aug 09, 2019 7:26 pm, edited 2 times in total.

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 7:20 pm

Dang, you beat me to it.

I was about to ask: If anyone out there with a Pi 4 based Turboencabulator could run the latest Rust anagram solution?

This has just received some simplifications and optimizations via a pull request on Githhub from a Rust guru. The result is a good bit faster on my PC. Marginally so on a Pi 3.

I removed the date and version indication from the comments in there. All that meta data is now in the git repo. The file is actually main.rs in the source directory with a git tag of v0.1.2. https://github.com/ZiCog/insane-british-anagram-rust. There is build.sh script in the top level directory.

Code: Select all

//
// insane-british-anagram.rs - Find words that have valid anagrams
//                             Words sourced from Debian's british-english-insane dictionary
//
// WARNING: This is not a good solution. Only a crazy experiment in trying to write Rust like C.
//          It's verbose, complex but marginally faster.

// LOOK AT:  Bare Metal WASM by Cliff L Biffle:
//           https://users.rust-lang.org/t/writing-a-213-byte-webassembly-graphics-demo-with-rust/29099
//           http://cliffle.com/blog/bare-metal-wasm/

use std::io::{self, Write};
use hashbrown::HashMap;             // Google's faster HashMap
use std::time::{Duration, Instant};
use wasm_bindgen::prelude::*;
use arrayvec::ArrayVec;

#[derive(Copy, Clone)]
struct SliceSpec {
    begin: usize,
    end: usize,
}

#[derive(Clone)]
struct AnagramSet {
    word_slices: ArrayVec<[SliceSpec; 17]>,
}

impl AnagramSet {
    fn new(word: SliceSpec) -> AnagramSet {
        let mut word_slices = ArrayVec::new();
        word_slices.push(word);
        AnagramSet {
            word_slices,
        }
    }
    fn push(&mut self, slice: SliceSpec) {
        self.word_slices.push(slice);
    }
}

#[cfg(not(feature = "web"))]
fn line_break() -> String {
    let br: String = String::from("\n");
    br
}

#[cfg(feature = "web")]
fn line_break() -> String {
    let br: String = String::from("<br>");
    br
}

fn output_anagrams(
    index: &[u64],
    anagram_map: &HashMap<u64, usize>,
    anagram_sets: &[AnagramSet],
    dictionary: &[u8],
) -> String {
    let mut output: String = std::string::String::from("");
    for hash in index {
        if let Some(anagram_sets_idx) = anagram_map.get(hash).copied() {
            let size = anagram_sets[anagram_sets_idx as usize].word_slices.len();
            if size > 1 {
                let mut separator = "";
                for (i, slice) in anagram_sets[anagram_sets_idx as usize].word_slices.iter().enumerate() {
                    let begin = slice.begin;
                    let end = slice.end;
                    let word = &dictionary[begin..end];
                    output += separator;
                    output += &String::from_utf8_lossy(word);

                    if i == 0 {
                        separator = ": ";
                    } else {
                        separator = ", ";
                    }
                }
                output += &line_break();
            }
        }
    }
    output
}

fn find_anagrams(
    index: &mut Vec<u64>,
    anagram_map: &mut HashMap<u64, usize>,
    anagram_sets: &mut Vec<AnagramSet>,
    dictionary: &[u8],
) {
    let mut word_index = 0;
    let mut character_index = 0;
    let mut reject = false;
    let mut hash: u64 = 1;

    for &c in dictionary {
        if c.is_ascii_lowercase() {
            // We are scanning a valid word
            let prime_index = (c - b'a') as usize;
            hash = hash.wrapping_mul(PRIMES[prime_index].into());
            character_index += 1;
        } else if c == b'\n' {
            // We have hit the end of a word, use the word if it's valid
            if !reject {
                // Do we have a word with this key (potential anagram)?
                let word_spec = SliceSpec {
                    begin: word_index,
                    end: character_index,
                };
                match anagram_map.get_mut(&hash).copied() {
                    Some(idx) => {
                        // Found: Append it to the existing anagram set
                        anagram_sets[idx].push(word_spec);
                    }
                    None => {
                        // Not found: Add it to the map as start of new anagram set.
                        // Make a new anagram set with one word in it.
                        let anagram_set = AnagramSet::new(word_spec);
                        // Add the new anagram set to our list of anagram sets
                        anagram_map.insert(hash, anagram_sets.len());
                        anagram_sets.push(anagram_set);

                        // And add the new anagram set to index
                        index.push(hash);
                    }
                }
            }
            character_index += 1;
            word_index = character_index;
            hash = 1;
            reject = false;
        } else {
            // Invalid character
            hash = 1;
            reject = true;
            character_index += 1;
        }
    }
}

// One prime number for each lower case letter of the alphabet
static PRIMES: [u8; 26] = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
    101,
];

pub fn anagrams(dictionary: &[u8]) -> String {
    // Container for sets of anagrams
    // An anagram set is simply an array of offets into the anagram_sets array
    let mut anagram_map = HashMap::with_capacity(376877);

    // Vector of AnagramSets
    let mut anagram_sets = Vec::with_capacity(376877);

    // An ordered index of anagram set keys
    let mut index = Vec::with_capacity(376877);

    find_anagrams(&mut index, &mut anagram_map, &mut anagram_sets, &dictionary);
    let output: String = output_anagrams(&index, &anagram_map, &anagram_sets, &dictionary);
    output
}

#[wasm_bindgen]
pub fn anagrams_html(s: String) -> String {
    let output: String = anagrams(s.as_bytes());
    output
}

// Called when the wasm module is instantiated
#[cfg(target_arch = "wasm32")]
#[wasm_bindgen(start)]
pub fn start() -> Result<(), JsValue> {
    // Use `web_sys`'s global `window` function to get a handle on the global
    // window object.
    //let window = web_sys::window().expect("no global `window` exists");
    //let document = window.document().expect("should have a document on window");
    //let body = document.body().expect("document should have a body");

    // Manufacture the element we're gonna append
    //let val = document.create_element("p")?;
    //val.set_inner_html("Hello from Rust!");
    //body.append_child(&val)?;

    Ok(())
}

fn main() {
    match std::fs::read("/usr/share/dict/british-english-insane") {
        // Takes 25ms on PC
        Ok(dictionary) => {

            let mut start = Instant::now();
            let output1 = anagrams(&dictionary);
            let mut end = Instant::now();
            let mut elapsed = end - start;
            eprintln!("{}ms", elapsed.as_nanos() / 1000_000);

            start = Instant::now();
            let output2 = anagrams(&dictionary);
            end = Instant::now();
            elapsed = end - start;
            eprintln!("{}ms", elapsed.as_nanos() / 1000_000);

            let stdout = io::stdout();
            let mut stdout_handle = stdout.lock();
            match stdout_handle.write_all(output1.as_bytes()) {
                Ok(()) => {}
                Err(e) => eprintln!("Error writing reult {}", e),
            }

            match stdout_handle.write_all(output2.as_bytes()) {
                Ok(()) => {}
                Err(e) => eprintln!("Error writing reult {}", e),
            }
        }
        Err(e) => {
            eprintln!("Error reading dictionary: {}", e);
        }
    }
}

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 8:08 pm

Heater wrote:
Fri Aug 09, 2019 7:20 pm
I was about to ask: If anyone out there with a Pi 4 based Turboencabulator could run the latest Rust anagram solution?
I'm having trouble compiling main.rs because I seem to be missing Google's hashbrown library. I tried a web search and found

https://en.wikipedia.org/wiki/Hash_browns

Image

Installing the hashbrowns clogged the flux ports on the entropy distributor and made me feel a bit sick, so there may be a slight delay getting the results from the turboencabulator. Luckily, I didn't land on the "not to be confused with" link.
Last edited by ejolson on Sat Aug 10, 2019 3:16 am, edited 3 times in total.

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 8:19 pm

I think the best thing to do is clone the git repository and follow the build/run instructions on the repos front page.

Probably the WASM parts of the build for node.js will fail on a Pi but never mind (Seems building for the WASM target on ARM is busted at this time)

User avatar
John_Spikowski
Posts: 1330
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 10:43 pm

I'm guessing that once the HASH extension module is fixed the ScriptBasic time will be under 10 seconds.

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 11:12 pm

OK. Bets are on.

If you can get it down to 1 hour I'll be impressed.

But I'm wondering how come ScriptBasic slowed down from nearly 5 hours to 12 hours in recent reports by ejolson:

17123 seconds here: https://www.raspberrypi.org/forums/view ... 0#p1512079

43562 seconds here: https://www.raspberrypi.org/forums/view ... 0#p1515612

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 11:20 pm

Heater wrote:
Fri Aug 09, 2019 11:12 pm
But I'm wondering how come ScriptBasic slowed down from nearly 5 hours to 12 hours in recent reports by ejolson:

17123 seconds here: https://www.raspberrypi.org/forums/view ... 0#p1512079

43562 seconds here: https://www.raspberrypi.org/forums/view ... 0#p1515612
The previous times were those you reported from an x86 machine. In a day or three I should have correct timings for the Pi 3B+.

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 11:31 pm

Ah, yeah, I remember.

Luckily it's on a log scale so the factor or 2 or 3 in Pi/PC performance hardly shows up!

How goes the latest Rust effort on the encabulator?

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 11:44 pm

Heater wrote:
Fri Aug 09, 2019 11:31 pm
Ah, yeah, I remember.

Luckily it's on a log scale so the factor or 2 or 3 in Pi/PC performance hardly shows up!

How goes the latest Rust effort on the encabulator?
I've been eating the burgers while cleaning potatoes out of the flux ports. It appears the salt created rust on the entropy distributor, but that might be helpful. There seems to be some dog hair in there too.

Here is the post where I got the timings for the x86 machine.
Last edited by ejolson on Sat Aug 10, 2019 3:18 am, edited 2 times in total.

Heater
Posts: 13088
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Aug 09, 2019 11:55 pm

If there is one thing worse than dog hair in the encabulator's entropy distributor it's salt water on the reaction core.

It causes rapid accumulation of Rust. Particularly a problem when you live by the C.

ejolson
Posts: 3407
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Aug 10, 2019 1:44 am

Heater wrote:
Fri Aug 09, 2019 11:55 pm
If there is one thing worse than dog hair in the encabulator's entropy distributor it's salt water on the reaction core.

It causes rapid accumulation of Rust. Particularly a problem when you live by the C.
Much to my astonishment

Code: Select all

$ cargo build --release
automatically downloaded, compiled and installed 63 crates. In addition to the popular bumpalo and wheedle crates, this included the necessary hashbrown crate. After my experience with fried potatoes in the flux ports, I simply let cargo do its thing until finally the global warming was over. At that point main.rs was compiled and insane-british-anagram appeared mysteriously in the target/release subdirectory.

The output obtained by running the executable contained two copies of the anagram list and consequently didn't agree with the md5sum of the reference Perl program. I showed the code to the lead developer of FidoBasic who growled that the new turbo hair dryer kept pulling and tangling the dog's hair.

Fido then deleted some lines so the main routine looked like

Code: Select all

fn main() {^M
    match std::fs::read("/usr/share/dict/british-english-insane") {^M
        // Takes 25ms on PC^M
        Ok(dictionary) => {^M
            let output1 = anagrams(&dictionary);^M
            let stdout = io::stdout();^M
            let mut stdout_handle = stdout.lock();^M
            match stdout_handle.write_all(output1.as_bytes()) {^M
                Ok(()) => {}^M
                Err(e) => eprintln!("Error writing reult {}", e),^M
            }^M
        }^M
        Err(e) => {^M
            eprintln!("Error reading dictionary: {}", e);^M
        }^M
    }^M
}^M
I asked why there were ^M's at the end of each line. In reply the canine coder whined something about the digital apocalypse--then was quiet.

Running on the Pi 4B resulted in

Code: Select all

$ mv target/release/insane-british-anagram iba-main
$ time ./iba-main >iba-main.insane

real    0m0.462s
user    0m0.291s
sys 0m0.171s
$ time ./iba-main >iba-main.insane

real    0m0.467s
user    0m0.298s
sys 0m0.169s
$ time ./iba-main >iba-main.insane

real    0m0.465s
user    0m0.283s
sys 0m0.182s
$ md5sum iba-main.insane selfgrams.pl.insane 
addeda36a4631afc983f3bff13742e36  iba-main.insane
addeda36a4631afc983f3bff13742e36  selfgrams.pl.insane
The bar chart has been updated.

Have you watched this presentation about cache-friendly hash tables?

Return to “General programming discussion”