User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 6:50 pm

@ejolson,

Would you be kind enough to sudo code how using an associative array may work?

ejolson
Posts: 4077
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 7:58 pm

John_Spikowski wrote:
Sat Jul 27, 2019 6:50 pm
Would you be kind enough to sudo code how using an associative array may work?
According to this webpage, ScriptBasic uses curly braces to denote an associative array. The typical algorithm might look like
  1. Read a word as WS from the insane British dictionary. If no more words, goto step 6.
  2. Check that all letters are lower case. If not, goto step 1.
  3. Sort the letters of WS to construct a key KS for the associative array. Alternatively, use a hashing method, as discussed in many of the posts above, to construct the key.
  4. Add WS to the associative array using something like

    Code: Select all

    AS{KS}=AS{KS} & ", " & WS
  5. Goto step 1.
  6. Start reading the dictionary again from the beginning.
  7. Read a word as WS from the insane British dictionary. If no more words, stop.
  8. Check that all letters are lower case. If not, goto step 7.
  9. Construct the key KS from the word WS in the same way as performed in step 3.
  10. If the entry AS{KS} contains more than one word it is a list of anagrams. In that case, format the line correctly for printing, print it and finally remove it with code such as

    Code: Select all

    AS{KS}=""
  11. Goto step 7.
A concrete example that is based on associative arrays may be found in this Python code.

gkreidl
Posts: 6174
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 8:46 pm

ejolson wrote:
Sat Jul 27, 2019 6:32 pm
How big is the largest integer that appears in the prime number algorithm? What assignment of prime numbers to letters minimises that number?
Biggest hash:
208741561674716660518143595093314485576064635609153569802552562500
in Hex:
0x1fb6c315794d440acd056a8a6ef9865c586884adb355dede5235344

prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
Minimal Kiosk Browser (kweb)
Slim, fast webkit browser with support for audio+video+playlists+youtube+pdf+download
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

ejolson
Posts: 4077
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 9:00 pm

gkreidl wrote:
Sat Jul 27, 2019 8:46 pm
ejolson wrote:
Sat Jul 27, 2019 6:32 pm
How big is the largest integer that appears in the prime number algorithm? What assignment of prime numbers to letters minimises that number?
Biggest hash:
208741561674716660518143595093314485576064635609153569802552562500
in Hex:
0x1fb6c315794d440acd056a8a6ef9865c586884adb355dede5235344

prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
If you swap 3 with 71 in the above list, does the maximum increase or decrease?

Heater
Posts: 14288
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 9:05 pm

ejolson,

I thought about GMP. Meh, that's cheating.

gkreidl has beaten me to it, I was preparing this list of fascinating numbers and words:

For your reading pleasure here is the biggest integer produced by the primeHash, preceded by some interesting maximums it found along the way:

a 2
aaa 8
aaerially 16395154952
aaerialness 9699577409896
aardwolves 14353576221908
abalienation 31882652141736
abandonments 673429603929852
abarticulation 106555126311709080
abdominoanterior 1016389111132065569052
abdominohysterectomy 16154021201112963451594978590
acetylphenylhydrazine 82867095735835194491994004180
adrenocorticosteroids 169014055156743597689597520350
adrenocorticotrophins 1418584532340018322069637958650
adrenoleukodystrophies 129911950569011937339428441697286
anthropomorphologically 4162001394160724346696210600733780
antidisestablishmentarianism 10816738360843396874796178492197077328
antidisestablishmentarianisms 724721470176507590611343958977204180976
cyclotrimethylenetrinitramine 1816546187380118721203679993771742512144550
diaminopropyltetramethylenediamine 195794488566152155619577322001494110603013566376
pneumonoultramicroscopicsilicovolcanoconioses 99832920800951446334764328088106927884204825726116924688177312500
pneumonoultramicroscopicsilicovolcanoconiosis 208741561674716660518143595093314485576064635609153569802552562500

Of those "antidisestablishmentarianisms" is what we considered to be the longest word in English when I was in school.

It's not clear to me how one would squish letter frequencies into 96 bits.
Last edited by Heater on Sat Jul 27, 2019 9:18 pm, edited 1 time in total.
Memory in C++ is a leaky abstraction .

gkreidl
Posts: 6174
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 9:08 pm

ejolson wrote:
Sat Jul 27, 2019 9:00 pm
if you swap 3 with 71 in the above list, does the maximum increase or decrease?
8820065986255633543020151905351316291946393053907897315600812500
0x1570bd93ed0c92f546194ba2188bd908589d3209432806723ec9d4
Minimal Kiosk Browser (kweb)
Slim, fast webkit browser with support for audio+video+playlists+youtube+pdf+download
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

Heater
Posts: 14288
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 9:17 pm

John_Spikowski
Would you be kind enough to sudo code how using an associative array may work?
A concrete example that is based on associative arrays may be found in this Javascript code.

https://www.raspberrypi.org/forums/view ... 5#p1507542

I know you have a downer of JS but oddly enough I find it makes a really good pseudo code from which to develop a solution in C/C++ or whatever.
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 9:19 pm

Thanks @ejolson for the example. It gives me something to run with.

gkreidl
Posts: 6174
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 11:01 pm

An optimized prime list based on character frequency :
[7, 61, 29, 41, 2, 67, 53, 47, 3, 101, 73, 23, 43, 11, 13, 37, 97, 17, 5, 19, 31, 71, 79, 83, 59, 89]

Largest number is now:
47581657613600031769481261874157211293117463088750
0x208e8351dede28a8f5dbf77ba51925c1458d734e6e
Minimal Kiosk Browser (kweb)
Slim, fast webkit browser with support for audio+video+playlists+youtube+pdf+download
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

ejolson
Posts: 4077
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 1:51 am

gkreidl wrote:
Sat Jul 27, 2019 11:01 pm
An optimized prime list based on character frequency :
[7, 61, 29, 41, 2, 67, 53, 47, 3, 101, 73, 23, 43, 11, 13, 37, 97, 17, 5, 19, 31, 71, 79, 83, 59, 89]

Largest number is now:
47581657613600031769481261874157211293117463088750
0x208e8351dede28a8f5dbf77ba51925c1458d734e6e
That was quick work. How did you find the optimal arrangement? While 168 bits is still more than 64 or even 96, it's significantly less than before.

I'm planning to create an Insane British Anagram Bar Chart of Fame by running all the entries on an actively-cooled non-throttling Raspberry Pi 4B. Unfortunately, Micro Center sold their Pi 4 stock last week. Therefore, except for a couple Zeros and SD cards, my shopping resulted in failure.

However, if anyone volunteers to run the programs on their Pi 4, I'd be happy to make the chart.

gkreidl
Posts: 6174
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 3:46 am

ejolson wrote:
Sun Jul 28, 2019 1:51 am
gkreidl wrote:
Sat Jul 27, 2019 11:01 pm
An optimized prime list based on character frequency :
[7, 61, 29, 41, 2, 67, 53, 47, 3, 101, 73, 23, 43, 11, 13, 37, 97, 17, 5, 19, 31, 71, 79, 83, 59, 89]

Largest number is now:
47581657613600031769481261874157211293117463088750
0x208e8351dede28a8f5dbf77ba51925c1458d734e6e
That was quick work. How did you find the optimal arrangement? While 168 bits is still more than 64 or even 96, it's significantly less than before.

I'm planning to create an Insane British Anagram Bar Chart of Fame by running all the entries on an actively-cooled non-throttling Raspberry Pi 4B. Unfortunately, Micro Center sold their Pi 4 stock last week. Therefore, except for a couple Zeros and SD cards, my shopping resulted in failure.

However, if anyone volunteers to run the programs on their Pi 4, I'd be happy to make the chart.
In fact it was quite simple. I counted the number of character ocurrences of a-z in the British insane Dictionary. Then I made sure, that the character with the highest number (e) uses prime 2, the next uses 3 etc.
Minimal Kiosk Browser (kweb)
Slim, fast webkit browser with support for audio+video+playlists+youtube+pdf+download
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

Heater
Posts: 14288
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 5:15 am

I plugged gkreidl's letter frequency ordered primes into the primeHash in C++.

After multiple runs I couldn't see any appreciable difference in run time, lost in the noise.

I suspect all the over heads of making Bints and such dominates the little multiplication loop time. Of course using 1.5 gigs of RAM to get the job done is massively cache unfriendly.

It's appalling that the C++ using Bints is 3 times slower than Javascript using its BigInts !

What it needs is a customized NotSoBigInteger multiply and comparison. Or use GMP. I leave that as an exercise for the reader...
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 5:40 am

I was able to reduce the time from 1 minute and 35 seconds to 13 seconds using the associative array method.

Code: Select all

flen = FILELEN("tail.dat")
OPEN "tail.dat" FOR INPUT AS #1
fraw = INPUT(flen, 1)
SPLITA fraw BY "\n" TO wordlist
CLOSE(1)

FOR lstidx = 0 TO UBOUND(wordlist)
  SPLITA wordlist[lstidx] BY "" TO wordarray
  FOR wrdidx = 0 TO UBOUND(wordarray)
    IF wordarray[wrdidx] < "a" OR wordarray[wrdidx] > "z" THEN GOTO NextWord
  NEXT
  SPLITA wordlist[lstidx] BY "" TO thisword

  FOR i = 0 TO UBOUND(thisword)
    FOR j = i + 1 TO UBOUND(thisword)
      IF thisword[i] > thisword[j] THEN
        temp = thisword[i]
        thisword[i] = thisword[j]
        thisword[j] = temp
      END IF
    NEXT
  NEXT
  FOR x = 0 TO UBOUND(thisword)
    sortword &= thisword[x]
  NEXT
  IF anagram{sortword} = undef THEN
    anagram{sortword} = wordlist[lstidx] & ":"
  ELSE
    anagram{sortword} &= " " & wordlist[lstidx] & ","
  END IF
  sortword = ""
NextWord:
NEXT
FOR anaidx = 0 TO UBOUND(anagram)
  IF RIGHT(anagram[anaidx], 1) = "," THEN
    PRINT LEFT(anagram[anaidx], LEN(anagram[anaidx]) - 1), "\n"
  END IF
NEXT
Output

Code: Select all

[email protected]:~/sb/ejolson$ time scriba ana2.sb
wheedle: wheeled
wheer: where
whein: whine
whemmel: whemmle
whenso: whosen
whereat: wreathe
whereer: wherere
wherrit: whirret, writher
whilend: whindle
whilter: whirtle
whipray: yarwhip
whips: whisp
whirliest: wiltshire
whirrets: writhers
whirtles: whistler
whissing: wishings
whist: whits, wisht, withs
whister: withers, writhes
whisting: whitings
whistling: whitlings
whit: with
white: withe
whited: withed
whiten: withen
whitepot: whitetop
whiter: wither, wrihte, writhe
whites: withes
whitest: withset
whitewares: wreathwise
whitewood: withewood
whitier: withier
whities: withies
whitiest: withiest
whitin: within
whiting: withing
whitret: whitter
whitrets: whitster, whitters
whity: withy
whomsoever: whosomever
whoopies: whoopsie
whooses: wooshes
whort: worth, wroth
whorts: worths
whoso: woosh
wicht: witch
wickeder: wickered
widdle: wilded
wide: wied
widely: wieldy
widen: wined
wider: wierd, wired, wride, wried
wides: wised
widest: wisted
widgeon: wongied
wiel: wile
wield: wiled
wiels: wiles
wigeons: wongies
wiggler: wriggle
wigglers: wriggles
wildflower: wildfowler
wildflowers: wildfowlers
wildness: windless
willest: willets
wilroun: wournil
wilting: witling
wimpler: wrimple
windel: windle
windocks: windsock
windores: windrose
windroses: wordiness
wines: wisen
winkel: winkle
winkler: wrinkle
winklers: wrinkles
winklot: wotlink
winnel: winnle
winster: winters
winzes: wizens
wipings: wisping
wips: wisp
wirble: wrible
wirer: wrier
wirerooms: worrisome
wires: wiser, wries
wisents: witness
wises: wisse
wisest: witess
wisewoman: womanwise
wishes: wisshe
wishmay: yahwism
wissel: wissle
wist: wits
wiste: wites
wister: wriest, writes
withdrawers: witherwards
withering: wrightine
witherso: worthies
wiver: wrive
woeful: woulfe
woiwodes: woodwise
wolfkins: wolfskin
wolof: woolf
wolter: wortle
wonnot: wonton
wooden: wooned
woodlark: workload
woodlarks: workloads
woodnote: woodtone
woodnotes: woodstone, woodtones
woodworm: wormwood
woodworms: wormwoods
wordier: worried
workyard: yardwork
worst: worts
worthful: wrothful
worthily: wrothily
worthiness: wrothiness
worthy: wrothy
wost: wots
woy: yow
wraitly: wrytail
wrate: wreat
wrist: writs
wrister: writers
wristing: writings
wye: yew
wyes: yews
xanthein: xanthine
xantheins: xanthines
xiv: xvi
xix: xxi
xxiv: xxvi
xylidin: xylinid
xylitone: xylonite
xylomas: xylosma
yachan: yancha
yachtmanships: yachtsmanship
yade: yead
yager: yerga
yamun: yuman
yander: yarned
yare: year
yarely: yearly
yarta: yatra
yartas: yatras
yate: yeat, yeta
yates: yeast, yeats
yearend: yearned
yede: yeed
yedes: yeeds
yedo: yode
yees: yese
yelk: ylke
yelks: ylkes
yelm: ylem
yelms: ylems
yeo: yoe
yeti: yite
yetis: yites
yeuk: yuke
yeuks: yukes
yirm: ymir
yodel: yodle
yodels: yodles
youk: yuko
youks: yukos
yuckel: yuckle
zambo: zomba
zaminder: zemindar
zaniest: zeatins
zanze: zazen
zanzes: zazens
zapus: zupas
zari: zira
zedonk: zonked
zein: zine
zeins: zines
zemmi: zimme
zendic: zinced
zendik: zinked
zendo: zoned
zenick: zincke
zeno: zone
zinciest: zincites
zinco: zonic
zolotink: zolotnik
zoography: zoogrpahy

real	0m12.901s
user	0m12.775s
sys	0m0.108s
[email protected]:~/sb/ejolson$ 

Heater
Posts: 14288
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 6:58 am

I broke down. I cheated. I couldn't resist unleashing the awesome numeric power of GMP on the C++ anagram finders primeHash() function.

With interesting results:

Code: Select all

[email protected]:~ $ g++ -Wall -O3 -I. -o insane-british-anagram insane-british-anagram.cpp
[email protected]:~ $ g++ -Wall -O3 -I. -o insane-british-anagram-bint insane-british-anagram-bint.cpp
[email protected]:~ $ g++ -Wall -O3 -I. -o insane-british-anagram-gmp insane-british-anagram-gmp.cpp -lgmp
[email protected]:~ $ time ./insane-british-anagram > anagrams-1.txt

real    0m1.560s
user    0m1.364s
sys     0m0.072s
[email protected]:~ $ time ./insane-british-anagram-bint > anagrams-2.txt
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted

real    0m8.392s
user    0m7.334s
sys     0m1.028s
[email protected]:~ $ time ./insane-british-anagram-gmp > anagrams-3.txt

real    0m12.161s
user    0m12.003s
sys     0m0.131s
[email protected]:~ $ time node insane-british-anagram.js > anagrams-4.txt
rss 112.34 MB
heapTotal 80.87 MB
heapUsed 76.89 MB
external 0.6 MB

real    0m8.523s
user    0m9.705s
sys     0m0.181s
[email protected]:~ $ time ./selfgrams.pl > anagrams-5.txt

real    0m11.406s
user    0m11.215s
sys     0m0.160s
[email protected]:~ $ md5sum anagrams-*.txt
addeda36a4631afc983f3bff13742e36  anagrams-1.txt
d41d8cd98f00b204e9800998ecf8427e  anagrams-2.txt
addeda36a4631afc983f3bff13742e36  anagrams-3.txt
addeda36a4631afc983f3bff13742e36  anagrams-4.txt
addeda36a4631afc983f3bff13742e36  anagrams-5.txt
[email protected]:~ $
In summary:

insane-british-anagram 1.560s
insane-british-anagram.js 8.523s
selfgrams.pl 11.406s
insane-british-anagram-gmp 12.161s
insane-british-anagram-bint FAIL

The fastest solution is my C++ using integer hashes. If you are prepared to accept that it suffers from overflows and may give incomplete/incorrect results if new long words get added to the dictionary. For now it's correct and "good enough".

If that is not acceptable then the fastest solution is Javascript!

Using GMP for the prime hash uses less memory than my Bint but is disappointingly slow.

My Bint just fails with out of memory on a Pi 3.

I don't want to hear anymore know it all's making fun of Javascript. They should bow their heads with respect.
Memory in C++ is a leaky abstraction .

Heater
Posts: 14288
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 1:58 pm

Yayyy... ScriptBasic completes the Insane British Anagram challenge correctly:

I killed off the old scriba effort after running for over a day and used John's new solution with assoc. arrays on my x86 PC:

Code: Select all

 
$ time scriba insane-british-anagram.sb > insane-british-anagram-sb.txt

real    285m23.856s
user    256m19.672s
sys     27m12.656s
$ md5sum insane-british-anagram-sb.txt
addeda36a4631afc983f3bff13742e36  insane-british-anagram-sb.txt
$

Only 4 hours and 45 minutes.

During the time that was running the temperature round here rose from 34C to 42C in the shade.

I'm not suggesting cause and effect there but it is notable given how far north I am. I dare not run this on my Pi for fear of it melting!

Now, that is 8500 times slower than the original Perl solution. One wonders what scriba is doing in there? Are those associative arrays implemented as linked lists?
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 3:24 pm

Thanks Heater for running the full dictionary.
Are those associative arrays implemented as linked lists?
Yes.

Any points for readability and code size?

jalih
Posts: 95
Joined: Mon Apr 15, 2019 3:54 pm

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 5:50 pm

I modified 8th anagram program and replaced key sorting with prime hash resulting a nice speed boost:

Code: Select all

\ Find anagrams in a word list:

m:new constant anamap
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101] constant primes

: prime-hash  \ s -- s
  1 swap
  ( 'a n:- primes swap a:@ nip n:*
  ) s:each!
  "%x" strfmt ;

: process-words \ word --
  /^[a-z]{3,}$/ r:match if
    r:str nip dup >r prime-hash
    anamap over m:@ null? if
      drop swap a:new r> a:push m:!
    else
      r> a:push 2drop
    then
  then drop ;

\ load and process the file requested:
: read-and-check-words  \ fname --
  f:slurp ' process-words s:eachline ;

\ for sorting the word keys array:
: key-val-cmp         \ key1 key2 -- cmp
  anamap swap m:@     \ key1 anamap key2val
  swap rot m:@        \ val2 anamap val1
  0 a:@ nip           \ val2 anamap v1
  rot 0 a:@ nip       \ map v1 v2
  s:cmp nip ;

: sort-keys-by-first-word \ keys1 -- keys2
  ' key-val-cmp a:sort ;

: app:main
  0 args scriptdir "unixdict.txt" s:+ ?:
  d:msec >r
  read-and-check-words

  \ get an array of those keys which have at least 2 elements:
  a:new anamap ( a:len nip 1 n:> if a:push else drop then ) m:each swap

  \ print the anagram list in order of first-word:
  sort-keys-by-first-word
  tuck m:@@ nip ( ( . space ) a:each! cr drop ) a:each! drop

  \ get stats:
  a:len d:msec r> n:- 1000 /mod swap 3 a:close
  "\n%d anagrams found in %d.%03d seconds\n" s:strfmt .
  bye ;

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 6:34 pm

I'm going to try and replace the associative array part with the ScriptBasic hash extension module. This should reduce the time even further.

ejolson
Posts: 4077
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 6:39 pm

Heater wrote:
Sun Jul 28, 2019 1:58 pm
I dare not run this on my Pi for fear of it melting!
I'm pretty sure Pi computers generate less heat than Linux subsystem running on a Windows desktop.

It is convenient that the entries of an associative array in Script Basic can be read out in the order they were inserted using square braces. This, along with the length of the runtime, suggests that some sort of linear structure such as a linked list is being used behind the scenes.

The code itself seems clear, concise and well-written to me.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 6:46 pm

Thank you @ejolson!

My guess is the hash extension module version should make you smile more.

ejolson
Posts: 4077
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 6:53 pm

jalih wrote:
Sun Jul 28, 2019 5:50 pm
I modified 8th anagram program and replaced key sorting with prime hash resulting a nice speed boost
Does the new 8-th code use big number arithmetic or integer arithmetic modulo 2^64 to compute the hash?

Aside from the word "zoogrpahy," one of the things I find most insane about the British anagram challenge is how big number arithmetic is somehow making things faster. Do you think Fibonacci heaps could help as well?

jalih
Posts: 95
Joined: Mon Apr 15, 2019 3:54 pm

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 7:07 pm

ejolson wrote:
Sun Jul 28, 2019 6:53 pm
jalih wrote:
Sun Jul 28, 2019 5:50 pm
I modified 8th anagram program and replaced key sorting with prime hash resulting a nice speed boost
Does the new 8-th code use big number arithmetic or integer arithmetic modulo 2^64 to compute the hash?
8th promotes number to big-int and to big-float if needed. I can just do the math and format result as hex string.

Code: Select all

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101] constant primes

: prime-hash  \ s -- s
  1 swap
  ( 'a n:- primes swap a:@ nip n:*
  ) s:each!
  "%x" strfmt ;

User avatar
scruss
Posts: 2752
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 7:32 pm

Heater wrote:
Sat Jul 27, 2019 9:05 pm
It's not clear to me how one would squish letter frequencies into 96 bits.
Since we're¹ not above developing custom hash functions here to fit the wbritish-insane list, consider the subset matching /^[a-z]+$/² :
  • the word with the most repeated letters is floccinaucinihilipilification, with 9 letter ‘i”s
  • the letter frequency, from least to most (l→r), is jqzxwkvfbyghmdpucltronasie
  • the word with the most letter ‘j’s is ajaja with only two ‘j’s
  • if you create a decimal hash based on the frequency table above (e → units, i → tens, s → hundreds, … , j → 10²⁶) the largest unique value for this input will be less than 3 × 10²⁶
  • 3 × 10²⁶ fits snugly into 88 bits (just: 2⁸⁸ is 3.09485 × 10²⁶)
The decimals won't overflow, as the highest count is 9. You'll have to keep track of significant zeroes too. The implementation is left to someone else, as I'm too busy putting old Apple II user group disks up on the internet archive.

---
¹: well, some of us aren't above it. Note that Perl's using its standard dynamic hash algorithm in my code and it still manages a decent speed. Even pre-allocating a big static hash with 2²⁰ entries (%grams = 2**20;) doesn't make it usefully quicker. All that money that booking.com put into Perl optimization really paid off.
²: PCRE 4 lyfe!
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 7:46 pm

You may not be perfect but I would never ask you to leave the room.

User avatar
scruss
Posts: 2752
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Project Digital Apocalypse Not Now

Sun Jul 28, 2019 7:56 pm

John_Spikowski wrote:
Sun Jul 28, 2019 7:46 pm
You may not be perfect but I would never ask you to leave the room.
Nobody's perfect, and so am I …

Incidentally, I saw in you last ScriptBASIC example that it featured an entire sort of the array. The rest of the solutions have been relying on the list being already sorted and the anagrams appearing in sorted order. This saves a lot of time.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

Return to “General programming discussion”