Page 31 of 67
Re: Project Digital Apocalypse Not Now
Posted: Mon Jul 22, 2019 6:17 pm
by Heater
ejolson,
The JavaScript timings look promising. I wonder what needs to be done so that code runs on the Raspberry Pi. The incompatibility reminds me of Basic.
Nothing. It runs just fine. See below.
What incompatibility? It's the same node.js engine running the language to the same ECMA standard.
Nothing like BASIC at all.
OK. On a Pi 3 (not +) with out of the box Buster and node.js install I get this:
Code: Select all
pi@pi3-buster-1:~ $ uname -a
Linux pi3-buster-1 4.19.50-v7+ #896 SMP Thu Jun 20 16:11:44 BST 2019 armv7l GNU/Linux
pi@pi3-buster-1:~ $ node -v
v10.15.2
pi@pi3-buster-1:~ $ time node selfgrams.js > selfgrams_js.txt
rss 147.39 MB
heapTotal 114.88 MB
heapUsed 100.21 MB
external 0.01 MB
real 0m16.848s
user 0m18.010s
sys 0m0.401s
pi@pi3-buster-1:~ $ time ./selfgrams.pl > selfgrams_pl.txt
real 0m11.214s
user 0m11.042s
sys 0m0.171s
pi@pi3-buster-1:~ $ md5sum selfgrams_pl.txt selfgrams_js.txt
addeda36a4631afc983f3bff13742e36 selfgrams_pl.txt
addeda36a4631afc983f3bff13742e36 selfgrams_js.txt
Note the heap use report I added there.
What was running on what exactly to cause that FATAL ERROR?
Re: Project Digital Apocalypse Not Now
Posted: Mon Jul 22, 2019 6:52 pm
by Heater
node.js version 12.6.0 is doing much better. Speed is up. Memory consumption is down:
Code: Select all
pi@pi3-buster-1:~ $ node -v
v12.6.0
pi@pi3-buster-1:~ $ time ./selfgrams.pl > selfgrams_pl.txt
real 0m11.297s
user 0m11.161s
sys 0m0.111s
pi@pi3-buster-1:~ $ time node selfgrams.js > selfgrams_js.txt
rss 119.24 MB
heapTotal 83.82 MB
heapUsed 77.32 MB
external 0.64 MB
real 0m11.817s
user 0m13.635s
sys 0m0.292s
pi@pi3-buster-1:~ $ md5sum selfgrams_js.txt selfgrams_pl.txt
addeda36a4631afc983f3bff13742e36 selfgrams_js.txt
addeda36a4631afc983f3bff13742e36 selfgrams_pl.txt
pi@pi3-buster-1:~ $
Re: Project Digital Apocalypse Not Now
Posted: Mon Jul 22, 2019 9:12 pm
by davidcoton
Heater wrote: ↑Mon Jul 22, 2019 5:13 pm
Give me two ticks...
But where in all Unicode do we find ticks?

Re: Project Digital Apocalypse Not Now
Posted: Mon Jul 22, 2019 9:40 pm
by Heater
✓✓

Re: Project Digital Apocalypse Not Now
Posted: Mon Jul 22, 2019 10:08 pm
by davidcoton
OK I'll bookmark your post, so I can mine it for ticks next time I need them.

Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 3:39 pm
by Heater
jamesh will be glad to hear that he is correct about 32 bit vs 64 bit OS when it comes to this anagrams problem.
On a Pi 3 running pi64:
pi@pi64-aalto:~$ time node insane-british-anagram.js > insane-british-anagram.txt
rss 231.5 MB
heapTotal 194.57 MB
heapUsed 169.08 MB
external 0.01 MB
real 0m15.261s
user 0m15.748s
sys 0m0.572s
pi@pi64-aalto:~$ time ./selfgrams.pl > selfgrams.txt
real 0m11.489s
user 0m11.155s
sys 0m0.332s
Execution time is much the same. Memory usage is twice as much!
Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 4:31 pm
by John Spikowski
Heater,
And you gave up crossword puzzles for this?
Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 5:20 pm
by scruss
Heater wrote: ↑Mon Jul 22, 2019 6:17 pm
What was running on what exactly to cause that FATAL ERROR?
Your code, on a Raspberry Pi Zero, running whatever the stock version of node is (dunno, but old) on Stretch. I can't reach that particular machine right now to check.
I just ran it on a 3A+ running Stretch (node 8.11.1) and it completed in a rather tidy 19.7 s. ISTR node having problems on the single-core Raspberry Pis.
Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 5:44 pm
by ejolson
scruss wrote: ↑Tue Jul 23, 2019 5:20 pm
Heater wrote: ↑Mon Jul 22, 2019 6:17 pm
What was running on what exactly to cause that FATAL ERROR?
Your code, on a Raspberry Pi Zero, running whatever the stock version of node is (dunno, but old) on Stretch. I can't reach that particular machine right now to check.
I just ran it on a 3A+ running Stretch (node 8.11.1) and it completed in a rather tidy 19.7 s. ISTR node having problems on the single-core Raspberry Pis.
Do you need a different binary for Node.js compiled as ARMv6 to run on the Pi Zero?
You might end up with wrong binary if switching cards between different flavours of Raspberry Pi, but I had problems with the default
openjdk-9 binaries in Raspbian on the Zero earlier. While I know Java and JavaScript are different, maybe something similar is happening in the present case.
Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 5:46 pm
by Heater
scruss,
Thanks for looking into this. Something odd going on here.
I may still be wrong but as far as I can see there should be problem running this in 512MB RAM. I can only see it using a shade less than 90MB. Presuming there is not something else using a lot of RAM.
I don't have a Stretch here anymore but it does run under my Buster which has node 10.15.2 out of the box.
I have been running node.js on Pi since the first models, never heard of any problems running on a single core machine. Out of curiosity I disabled all but one core on a Pi 3. Everything runs fine in 14 seconds. A bit slower than with more cores.
Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 5:48 pm
by Heater
ejolson,
Do you need a different binary for Node.js compiled as ARMv6 to run on the Pi Zero?
In theory the node that comes with Raspbian runs on all models of Pi. Like the rest of Raspbian.
Sadly I don't have a Zero here to experiment with.
Re: Project Digital Apocalypse Not Now
Posted: Tue Jul 23, 2019 11:00 pm
by scruss
Ah, it's all good now.
For some reason the Zero only had ~32 MB free memory and the GPU memory set very high. I fixed that. It works now. Sorry about that.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 2:11 am
by Heater
Phew. No worries. Mystery solved. I have done similar in the past.
Given that the British insane dictionary is 6.6MB it could be argued that my using nearly 100MB here is insane!
The way that JS works it:
Reads the entire file to RAM -- ~7MB,
builds the anagrams list -- another 7MB
then builds a formatted output string --- another 7MB.
That's 21MB. What is JS doing with the rest of it?
There has to be be about a million pointers used in there, so that's another 8MB.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 5:48 am
by ejolson
ejolson wrote: ↑Sun Jul 21, 2019 7:33 pm
I'm presently working on a version in classic line-numbered Basic that organizes all the anagrams into a
heap data structure.
At FidoBasic headquarters I entered what looked like a lift. It turned out to be a time machine. Suddenly I was standing in the midst of the golden age of personal computing, still holding my Raspberry Pi. In true canine fashion Fido's head turned to one side as I explained the insane British anagram challenge to the wondering crowd of teenage student Basic programmers. By the time I was finished explaining how with a challenge anyone who writes a working program is a winner, one of the children motioned to me.
With the hunt-and-peck quickness of a ninja and stealth, a working program had already been written without my notice. On the screen of a nearby 8-bit microcomputer was the following:
Code: Select all
1000 REM banagram.bas -- Find anagrams
1010 REM Written July 16, 1978 by Eric Olson
1020 REM
1030 REM This program demonstrates using letter counts and linear
1040 REM lists to find anagrams in the insane British dictionary.
1050 REM
2000 REM F$="/usr/share/dict/british-english-insane"
2001 F$="tail.dat"
2010 M0=0:OPEN F$ FOR INPUT AS #1
2015 IF EOF(1)<>0 THEN 2030
2020 INPUT #1,A$:M0=M0+1:GOTO 2015
2030 DIM L$(M0),K$(M0),V0(26)
2040 CLOSE #1:OPEN F$ FOR INPUT AS #1
2050 A1=ASC("a")-1:A2=ASC("0"):N0=0
2060 FOR I=1 TO M0:INPUT #1,A$
2063 FOR J=1 TO 26:V0(J)=0:NEXT J:J=0
2065 IF J>=LEN(A$) THEN 2075
2068 J=J+1:C=ASC(MID$(A$,J,1))-A1
2070 IF (C<1) OR (C>26) THEN 2130
2071 V0(C)=V0(C)+1:GOTO 2065
2075 V$="":FOR J=1 TO 26
2080 V$=V$+CHR$(V0(J)+A2):NEXT J:J=0
2090 IF J>=N0 THEN 2120
2100 J=J+1:IF K$(J)<>V$ THEN 2090
2110 L$(J)=L$(J)+" "+A$+",":GOTO 2130
2120 N0=N0+1:L$(N0)=A$+":":K$(N0)=V$
2130 NEXT I:CLOSE #1
2400 FOR I=1 TO N0:A$=L$(I)
2410 IF RIGHT$(A$,1)=":" THEN 2430
2420 PRINT LEFT$(A$,LEN(A$)-1)
2430 NEXT I
9000 END
Two things caught my attention: The first was the fact that the comments listed me as the author; the second that the program would surely run with non-ninja turtle-like slowness because it used an O(n^2) algorithm based on linear lists to match and store the anagrams.
I turned to Fido and whispered under my breath, for a program written by someone who has not formally studied algorithms at the university this code shows remarkable skill and insight. The canine sneezed and and looked at me as if I had just congratulated myself. Then with a bark the dog developer announced, let's see if it really works.
Since the 4K memory of the PET 2001 was not sufficient to hold the insane British dictionary and the algorithm was too slow anyway, I connected the Pi 3B to a nearby television and typed
Code: Select all
$ tail -n 10000 /usr/share/dict/british-english-insane >tail.dat
$ wc tail.dat
10000 10000 93607 tail.dat
$ time ./bwbasic banagram.bas >banagram-bw.tail
real 18m30.145s
user 18m29.419s
sys 0m0.120s
$ time ./sbrandy banagram-mb.bas >banagram-mb.tail
real 0m23.888s
user 0m23.797s
sys 0m0.071s
$ time ./gplbasic banagram.bas >banagram-gpl.tail
real 0m9.476s
user 0m9.460s
sys 0m0.011s
$ fbc -lang qb banagram.bas
$ time ./banagram >banagram-fbc.tail
real 0m5.330s
user 0m5.313s
sys 0m0.011s
$ time ./anagram.perl >perl.tail
real 0m0.226s
user 0m0.205s
sys 0m0.021s
$ md5sum *.tail
2f0427e2de42059e20bb335e25c5c1ca banagram-bw.tail
2f0427e2de42059e20bb335e25c5c1ca banagram-fbc.tail
2f0427e2de42059e20bb335e25c5c1ca banagram-gpl.tail
2f0427e2de42059e20bb335e25c5c1ca banagram-mb.tail
2f0427e2de42059e20bb335e25c5c1ca perl.tail
The code for Matrix Brandy Basic was slightly different and read as
Code: Select all
1000 REM banagram-mb.bas -- Find anagrams
1010 REM Written July 17, 1978 by Eric Olson
1020 REM
1030 REM This program demonstrates using letter counts and linear
1040 REM lists to find anagrams in the insane British dictionary.
1050 REM
2000 REM F$="/usr/share/dict/british-english-insane"
2001 F$="tail.dat"
2010 M0=0:F1=OPENIN(F$)
2015 IF EOF#(F1)<>0 THEN 2030
2020 A$=GET$#(F1):M0=M0+1:GOTO 2015
2030 DIM L$(M0),K$(M0),V0(26)
2040 CLOSE# F1:F1=OPENIN(F$)
2050 A1=ASC("a")-1:A2=ASC("0"):N0=0
2060 FOR I=1 TO M0:A$=GET$#(F1)
2063 FOR J=1 TO 26:V0(J)=0:NEXT J:J=0
2065 IF J>=LEN(A$) THEN 2075
2068 J=J+1:C=ASC(MID$(A$,J,1))-A1
2070 IF (C<1) OR (C>26) THEN 2130
2071 V0(C)=V0(C)+1:GOTO 2065
2075 V$="":FOR J=1 TO 26
2080 V$=V$+CHR$(V0(J)+A2):NEXT J:J=0
2090 IF J>=N0 THEN 2120
2100 J=J+1:IF K$(J)<>V$ THEN 2090
2110 L$(J)=L$(J)+" "+A$+",":GOTO 2130
2120 N0=N0+1:L$(N0)=A$+":":K$(N0)=V$
2130 NEXT I:CLOSE# F1
2400 FOR I=1 TO N0:A$=L$(I)
2410 IF RIGHT$(A$,1)=":" THEN 2430
2420 PRINT LEFT$(A$,LEN(A$)-1);CHR$(10);
2430 NEXT I
9000 END
Fido's paw motioned that is was time to go, so we headed to the time machine. Once inside I suggested, let's go back to the future so I can get a Pi 4B. Fido became barking mad and growled, the 4B has already been released. I replied, but there won't be such a short supply in the future so shipping times will be quicker.
Instead we returned to the present and Fido created a bar chart depicting the performance of the different versions of Basic.
Fido's tail happily wagged back and forth while explaining, I've used dogarithmic coordinates because the timings involved so many different orders of magnitude. I think the canine coder meant logarithmic.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 7:23 am
by Heater
Barking mad indeed.
I thought dogs calculated by chewing on bones, Napier's bones.
It's just as well you made that trip back in time. Else the code written my your young self there in the audience of teenage student Basic programmers would have been lost forever, neglected and forgotten as it never completed on any machine at the time.
All we ever got done on a PET was a lunar lander game.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 4:13 pm
by ejolson
Heater wrote: ↑Wed Jul 24, 2019 7:23 am
I thought dogs calculated by chewing on bones, Napier's bones.
I questioned the lead developer of FidoBasic about Napier's bones. With a bark Fido proudly replied, Napier's bones are based on dogarithms. You mean logarithms, I suggested. The canine coder looked confused and whined, have you ever seen a log chew on a bone?
Before this thread goes completely to the dogs, a plea hereby goes out to the non-canine members of this forum to attempt the insane British anagram challenge and thereby help to avoid a dogital apocalypse.
Speaking of the apocalypse, I just received another letter from the zombies which read
zombies wrote:We have found a way to use dogarithms and prime numbers to speed up our anagram finding programs. Let us in now.
I turned to Fido, have any dogs been helping the zombies? The reply: Anyone writing code deserves and will receive help around here.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 5:08 pm
by Paeryn
My feline coder is halfway through it, though he was annoyed that the insane list ruined his word hashing algorithm (taking it over a 64 bit value). He's gone for a sulk and a lie down.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 5:11 pm
by gkreidl
Here is the Python 3 version for the anagram challenge.
Code: Select all
#!/usr/bin/python3
import sys,re
if sys.version[0:3] < "3.6":
from collections import OrderedDict
anagrams = OrderedDict()
else:
anagrams = {}
f = open('/usr/share/dict/british-english-insane','rt')
dictionary = f.read().split('\n')
f.close()
check = re.compile('^[a-z]+$')
for word in dictionary:
if check.match(word):
keyl = list(word)
keyl.sort()
key = ''.join(keyl)
if key in anagrams:
anagrams[key].append(word)
else:
anagrams[key] = [word]
output = ''
for v in anagrams.values():
if len(v) > 1:
op = ', '.join(v) + '\n'
output += op.replace(',',':',1)
print(output,end='')
Running it on a RPi 3+ using Stretch and comparing it to Perl I get:
Code: Select all
pi@raspberrypi4:~ $ time ./selfgrams.py > selfgrams_py.txt
real 0m14,493s
user 0m14,221s
sys 0m0,180s
pi@raspberrypi4:~ $ time ./selfgrams.pl > selfgrams_pl.txt
real 0m11,729s
user 0m11,417s
sys 0m0,120s
pi@raspberrypi4:~ $ md5sum selfgrams_pl.txt
bec74aa3b31577edbb291aeb7269a4d5 selfgrams_pl.txt
pi@raspberrypi4:~ $ md5sum selfgrams_py.txt
bec74aa3b31577edbb291aeb7269a4d5 selfgrams_py.txt
On Buster it will run more than a second faster, because we do not have to use the slower subclass OrderedDict.
BTW, the British Insane Dictionary includes words containing one or more of the following characters "èóüäçéáåöñëíâôûøêîïàùú". Why should we exclude them from the anagrams (with the exception of Brexit fans)? It also includes words containing apostrophs, but it seems natural (to me) to exclude them.
To include the special characters, the Perl code has to be modified like this:
Code: Select all
#!/usr/bin/perl -l
# selfgrams4.pl - find words that have valid anagrams
# scruss - 2019-07
my ( %grams, $key, $fh, @out, $k );
open( $fh, '<', '/usr/share/dict/british-english-insane' ) or die "$!\n";
foreach ( grep { /^[a-zèóüäçéáåöñëíâôûøêîïàùú]+$/ && chomp; } <$fh> ) {
$key = join( '', sort( split( '', $_ ) ) );
if ( exists( $grams{$key} ) ) {
$out[ $grams{$key} ] .= ', ' . $_;
}
else {
$grams{$key} = $k;
$out[$k] = $_;
$k++;
}
}
close($fh);
print join( "\n", grep { index( $_, ',' ) >= 0 && s/,/:/; } @out );
exit;
The Python code can be simplified because we don't have to use regular expressions any more:
Code: Select all
#!/usr/bin/python3
import sys
if sys.version[0:3] < "3.6":
from collections import OrderedDict
anagrams = OrderedDict()
else:
anagrams = {}
f = open('/usr/share/dict/british-english-insane','rt')
dictionary = f.read().split('\n')
f.close()
for word in dictionary:
if word.islower() and "'" not in word:
keyl = list(word)
keyl.sort()
key = ''.join(keyl)
if key in anagrams:
anagrams[key].append(word)
else:
anagrams[key] = [word]
output = ''
for v in anagrams.values():
if len(v) > 1:
op = ', '.join(v) + '\n'
output += op.replace(',',':',1)
print(output,end='')
Timing on Stretch:
Code: Select all
pi@raspberrypi4:~ $ time ./selfgramsall.py > selfgrams_py.txt
real 0m12,821s
user 0m12,599s
sys 0m0,160s
pi@raspberrypi4:~ $ time ./selfgramsall.pl > selfgrams_pl.txt
real 0m11,504s
user 0m11,320s
sys 0m0,150s
pi@raspberrypi4:~ $ md5sum selfgrams_pl.txt
a6073146aceaa463645c7685ddd6a9a0 selfgrams_pl.txt
pi@raspberrypi4:~ $ md5sum selfgrams_py.txt
a6073146aceaa463645c7685ddd6a9a0 selfgrams_py.txt
On Buster both language versions should need about the same time.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 5:27 pm
by John Spikowski
This challenge seems to favor languages with built in functions like sort , islower, ...
The same bias was shown with fibo and languages with BIGINT support.
Let's rename this thread "Obscure challenges with the goal to define Infinity".
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 6:54 pm
by ejolson
John Spikowski wrote: ↑Wed Jul 24, 2019 5:27 pm
This challenge seems to favor languages with built in functions like sort , islower, ...
The goal is to avoid the digital apocalypse by writing some code that finds anagrams. As stated in the challenge description, you are free to use any programming language you prefer, whether suitable or not.
By the design of the ASCII as well as UTF-8 character encodings, one can determine whether a string contains only lowercase characters by checking that the byte values of the string fall between the byte values for the letters a and z inclusive. In a modern version of the Basic programming language example code to do this is
Code: Select all
module wordtest
function isword(s as string) as boolean
dim i,c as integer
for i=1 to len(s)
c=asc(mid(s,i,1))
if (c<asc("a")) or (c>asc("z")) then
return false
end if
next i
return true
end function
function main() as integer
if isword("test") then
console.writeline("test is lowercase")
else
console.writeline("test is not lowercase")
end if
if isword("tEst") then
console.writeline("tEst is lowercase")
else
console.writeline("tEst is not lowercase")
end if
return 0
end function
end module
To run the above program using the mono Visual Basic compiler type
Code: Select all
$ vbnc wordtest.bas
Visual Basic.Net Compiler version 0.0.0.5943 (Mono 4.0.1 - tarball)
Copyright (C) 2004-2010 Rolf Bjarne Kvinge. All rights reserved.
Compilation successful
Compilation took 00:00:18.4694050
$ mono wordtest.exe
test is lowercase
tEst is not lowercase
If I understood correctly, ScriptBasic has built-in associative arrays. Since one can assume the dictionary is already sorted, associative arrays are actually all one needs to finish the challenge. I think this is the approach taken in the recently posted JavaScript and Python codes.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 7:10 pm
by ejolson
gkreidl wrote: ↑Wed Jul 24, 2019 5:11 pm
Here is the Python 3 version for the anagram challenge.
It's interesting that the Python code to find all anagrams with those extra letters in them seems to run faster than the original. Is that due to timing variations or is it a real effect? What happens if you rerun the original program a couple times?
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 7:44 pm
by scruss
ejolson wrote: ↑Wed Jul 24, 2019 7:10 pm
BTW, the British Insane Dictionary includes words containing one or more of the following characters "èóüäçéáåöñëíâôûøêîïàùú". Why should we exclude them from the anagrams (with the exception of Brexit fans)?
No brexitry here, I assure you. They're excluded because the English version of Bananagrams (developed in Pawtucket, RI, USA) does not have them. A truly British set would have tiles for 'ff', 'll', 'ng', 'ch' and 'gh' at the very least, since they are compound letters used in languages of the British Isles. Collating on compound letters is loads of laffs, let me tell you.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 8:14 pm
by gkreidl
ejolson wrote: ↑Wed Jul 24, 2019 7:10 pm
gkreidl wrote: ↑Wed Jul 24, 2019 5:11 pm
Here is the Python 3 version for the anagram challenge.
It's interesting that the Python code to find all anagrams with those extra letters in them seems to run faster than the original. Is that due to timing variations or is it a real effect? What happens if you rerun the original program a couple times?
The speed difference is in using str.lower() instead of using a regular expression, which is a bit slower in Python than in Perl. If I want to exclude all special chracters I have to use the regular expression method.
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 8:19 pm
by Paeryn
Kira the Koding Kitty (as he insists he be called) says he's only including words containing the 26 English lower-case letters, though he can be purr-suaded to allow hyphens which will be ignored and possibly upper-case which will be treated as lower-case, but only if I give him some turkey. And it will have to wait until he's been fishing!
Re: Project Digital Apocalypse Not Now
Posted: Wed Jul 24, 2019 10:02 pm
by Heater
No cats or dogs around here I'm afraid. I have to do everything myself.
After a days graft parsing an obscure proprietary binary protocol I found myself making this insane British anagram solution in C/C++:
Code: Select all
//
// insane-british-anagram.cpp - Find words that have valid anagrams
// Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-07-25
//
#include <fstream>
#include <string>
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <unordered_map>
std::string getFileContents(const char *filename) {
std::ifstream in(filename, std::ios::in | std::ios::binary);
if (in) {
std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());
in.close();
return(contents);
}
throw(std::runtime_error("Shit happened!"));
}
char* sortString (const char* s) {
char* string = (char*)malloc(strlen(s) + 1);
strcpy (string, s);
char temp;
int i, j;
int n = strlen(string);
for (i = 0; i < n-1; i++) {
for (j = i+1; j < n; j++) {
if (string[i] > string[j]) {
temp = string[i];
string[i] = string[j];
string[j] = temp;
}
}
}
return string;
}
int main (int argc, char* argv[]) {
// Map container for sets of anagrams
// An anagram set is simply a string of format like: "whiter: wither, wrihte, writhe"
std::unordered_map<std::string, std::string> anagramMap;
std::unordered_map<std::string, std::string>::iterator it;
auto dictionary = getFileContents("/usr/share/dict/british-english-insane");
char* dictionaryPtr = (char*)dictionary.c_str();
char* wordPtr = dictionaryPtr;
char* charPtr = wordPtr;
bool reject = false;
while (1) {
if (islower(*charPtr)) {
// We are scanning a valid word
charPtr++;
} else if ((*charPtr) == '\n') {
// We have hit the end of a word, use the word if it's valid
*charPtr = 0;
if (!reject) {
// Do we have a word with this key (potential anagram)?
char* key = sortString(wordPtr);
it = anagramMap.find(key);
if (it == anagramMap.end()) {
// No: Add it to the map as start of new anagram set.
anagramMap[key] = std::string(wordPtr);
} else {
// Yes: Append it to the existing anagram set.
it->second.append(", ").append(wordPtr);
}
}
charPtr++;
wordPtr = charPtr;
reject = false;
} else if ((*charPtr) != 0) {
// Invalid character
reject = true;
charPtr++;
} else {
// End of dictionary
break;
}
}
// Output all the sets that have more than one word.
for( auto const& [key, val] : anagramMap )
{
// Sets with a "," contain more than one word, the anagrams!
size_t found = val.find(',');
if (found != std::string::npos) {
std::string fart = val;
fart.replace(found, 1, ":");
std::cout << fart << std::endl ;
}
}
return (0);
}
Which runs nicely on a Pi 3 like so (with comparison to JS):
Code: Select all
pi@pi3-buster-1:~ $ g++ -std=c++17 -Wall -O3 -o insane-british-anagram insane-british-anagram.cpp
pi@pi3-buster-1:~ $ time ./insane-british-anagram > insane-british-anagram-cpp.txt
real 0m1.931s
user 0m1.580s
sys 0m0.350s
pi@pi3-buster-1:~ $ time node insane-british-anagram.js > insane-british-anagram-js.txt
rss 119.62 MB
heapTotal 83.82 MB
heapUsed 77.3 MB
external 0.64 MB
real 0m11.967s
user 0m13.957s
sys 0m0.231s
A useful 6 times faster than JS and PL.
Only one problem....
It finds all the right anagrams but the lines are printed in random order!
$ tail insane-british-anagram-cpp.txt
noup: upon
nourisher: renourish
nibbles: snibble
nouses: onuses
nov: von
koso: skoo, soko, sook
novale: novela
anorectous: courantoes
pagnes: paseng
novial: violan
Well of course, I used an unordered map (hash table) to keep track of the anagrams.
Maybe I'll be inspired to fix it after a cat nap...