Page 33 of 67
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 1:13 pm
by John_Spikowski
What is absurd is your belief that the JavaScript monkey can do unlimited tricks
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 1:32 pm
by Heater
I believe no such thing.
Which is why my solution is also in C/C++.
Although it has to be noted that the JS solution is only 3 or 4 times slower. Which I find pretty amazing.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 1:46 pm
by Heater
ejolson,
If that is the same code as posted here, then it does not include the heap data structure and is consequently an O(n^2) algorithm.
I was thinking about that...
The user formerly known as "ScriptBasic" once intimated that in ScriptBasic, the language interpreter, arrays are not actually arrays. They are implemented as linked lists.
That means that when estimating algorithmic complexity we have to count an array access, e.g. A[n], as linear time rather than constant time.
That means, as far as I can tell, we move up from an O(n^2) problem to an O(n^3) problem.
So the 1 minute 35 seconds becomes 95 * 60^3 = 20520000 seconds. Or about 240 days!
Am I right?
If so, this is never going to produce a result before Win 10 upgrades itself, we have a power failure, it's caused so much global warming nobody is left alive to care.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 2:00 pm
by ejolson
While one of the topics of this thread is what makes a good first programming language, the way I see it, the focus of the insane British anagram challenge is not so much on programming languages but on people writing code to perform a particular task.
As already demonstrated, there are a number of different algorithms and data structures that can be used--quick sort, merge sort, bubble sort, unique factorisation theorem, associative arrays, hash tables, collisionless hash functions and a heap. Maybe even dogarithms, but then again maybe not.
The reason the current ScriptBasic code is slow is because it was derived from my original Basic code that was written with the hypothetical style of a 12 to 16 year old living during the golden age of personal commuting who hasn't taken an algorithms course. In particular, that code employs a naive O(n^2) algorithm based on searching linear lists, while every other algorithm presented so far is O(nlogn) or better.
Replacing the linear search by a heap data structure, as in
this improved version, results in better performance. However, my understanding is that ScriptBasic also has associative arrays built in, in which case different techniques should result in even better performance.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 2:18 pm
by John_Spikowski
I'm all ears!
Please post the updated ScriptBasic source.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 2:21 pm
by Heater
Personally I hate anagrams. Reminds me too much of school, all that immensely boring spelling and grammar stuff they torture tested us with at a tender age (Do they still do that?). I could never understand why people might want to think about this kind of thing for fun.
However, this "absurd" challenge has had me thinking about two or three useful programming techniques I have never given much time to. One of which is far from literary, using the Fundamental Theorem of Arithmetic no less.
It's not the madness, it's the method!
Do I get a job at Google after all these weird challenges?

Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 2:25 pm
by Heater
John_Spikowski,
Please post the updated ScriptBasic source.
Call me old fashioned John but I thought the idea of such a challenge problem was to have a go at thinking about it for yourself. Not waiting around for someone else to do it for you. What would be the point of that?
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 2:52 pm
by John_Spikowski
I'll give it some thought.
Would using a SQL sever be outside the rules?
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 3:58 pm
by Heater
Sounds like a novel approach.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 6:38 pm
by Heater
I was thinking about this technique to find anagrams using prime numbers...
I was a bit proud of myself in that it occurred to me as soon as we started discussing the problem. A bit more proud that I got it to work and that it boosted performance significantly. That is not much to boast about but it's not often I have such insights so I'm happy.
So I started wondering a couple of things about it...
Firstly I presumed that this is not a new idea so I went googling for it. Did not get many hits but the discussions I did find wrote it off as being inefficient or suggesting there are faster ways to do this. Well, given this is the fastest we have here what are we missing? What is that more efficient way?
We see that it overflows the integers for long words, which does not stop it producing the correct result for this particular problem. Perhaps in the general case all that huge integer maths required to get it right for any long string would be detrimental.
Secondly, what we have here is a way to deliberately create hash collisions. We can give identity strings to millions of different objects in such a way that a whole subsets of those objects end up in the same class or category. If their identity strings are anagrams of each other they are all in the same set. I can't help thinking this is actually useful for applications other than messing with insane dictionaries. A very fast way to classify things and search for members of the same set.
Anyone know of such applications?
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 6:39 pm
by ejolson
John_Spikowski wrote: ↑Fri Jul 26, 2019 2:52 pm
I'll give it some thought.
Would using a SQL sever be outside the rules?
A novel approach is all the better.
The description of the challenge can be found
here. The relevant guideline is
Insane British Anagram Challenge wrote:- Choose any suitable or possibly unsuitable programming language.
which appears quite inclusive.
Although one of the planned future
titles for this thread "Why Turing was Incomplete" suggests there is a lot more needed in a good first programming language beyond being Turing complete, the present title "Digital Apocalypse Not Now" concerns the question whether people control computers or computers control people and how to avoid the latter.
In particular, since it is necessary for a person to know how to write a program to tell a computer what to do, an important aspect of the anagram challenge is a focus on people developing algorithms and writing code. The exact languages used and whether they are Turing complete is secondary.
Returning to the idea of using SQL for the anagram challenge, apparently certain implementations of SQL are, in fact, Turing complete. For example in
this remark on Stack Overflow
Mathiew Vines wrote:Andrew Gierth proves that with CTE and Windowing SQL is Turing Complete, by constructing a cyclic tag system, which has been proved to be Turing Complete. The CTE feature is the important part however -- it allows you to create named sub-expressions that can refer to themselves, and thereby recursively solve problems.
The interesting thing to note is that CTE was not really added to turn SQL into a programming language -- just to turn a declarative querying language into a more powerful declarative querying language.
Of course a combination of ScriptBasic and SQL would be even more powerful.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 6:55 pm
by Heater
I can't wait to get to the "The New Paper Tape Project" part.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 6:59 pm
by ejolson
Heater wrote: ↑Fri Jul 26, 2019 6:38 pm
Well, given this is the fastest we have here what are we missing? What is that more efficient way?
The note from the zombies said dogarithms are more efficient, but I haven't let them in (yet) so I don't know any more details. Even Fido has been uncharacteristically quiet. Maybe that dog developer is concerned the Koding Kitty has discovered some better magic.
Why is everyone so competitive when the real dangers lie with deep learning neural networks and the resulting powerful but emotional artificial intelligences?
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 7:33 pm
by Heater
ejolson,
Why is everyone so competitive when real dangers lie with the deep learning neural networks and the resulting powerful but emotional artificial intelligences?
That's what makes us human
Arguably the real danger we face is global warming and climate change. Which despite my humorous quips about inefficient languages and software causing it, does not seem to be caused by digital beings, as yet.
Anyway, it's too late. We cannot exist as we do now with out the networks of rude and crude digital beings we have already. If we could shut them down today the civilized world would stop functioning immediately, within hours would be mass panic as we would be incapable of obtaining food, or moving around or doing anything much, except riot and fight over the remaining scraps. Mass starvation, warfare and death being the inevitable outcome.
Already we cannot turn off the ever growing and more ever more capable digital intelligence we are surrounded by.
Currently my opinion is it is not the AI that we need to be afraid of. It's those humans that can make use of it, and the gigantic amounts of data it needs to monitor us all. Those people with the power.
Having everyone learn to code is great and all, but it's not going to fix that societal problem. In fact, as far as I can tell most people who are skilled at programming are actively employed to make the situation worse!
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 10:31 pm
by ejolson
Heater wrote: ↑Fri Jul 26, 2019 1:46 pm
So the 1 minute 35 seconds becomes 95 * 60^3 = 20520000 seconds. Or about 240 days!
Am I right?
Judging from
these scaling results
array access in ScriptBasic has the same asymptotic time complexity as arrays in bwBasic, gplBasic and Matrix Brandy Basic.
Therefore, unless the apocalypse called Windows automatic update and reboot happens first, I would expect the insane O(n^2) anagram algorithm to finish within 5 days. To avoid any unscheduled reboots, maybe it would be better to use a Raspberry Pi.
Re: Project Digital Apocalypse Not Now
Posted: Fri Jul 26, 2019 10:44 pm
by John_Spikowski
I believe the SQL table approach should drastically reduce the time to completion.
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 3:21 pm
by jamesh
Just cleaned the thread of cruft. I don't like cleaning threads of cruft, especially on the weekend. Please don't let it happen to me again, or me and the banhammer will start making our feelings more well known.
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 3:39 pm
by John_Spikowski
Sorry!
I put pondering back in the cage.
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 4:37 pm
by bensimmo
Heater, you fibo javascript web page takes 30.3 seconds on an almost 5year old Tesco Hudl2 (think it was about £100 at the time).
just happen to be using it and you link was in a post.

Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 4:52 pm
by Heater
bensimmo,
Thanks for that fibo report. That is about twice as fast as my not young low end Samsung phone.
I had never heard of a Tesco Hudl2. Grief, finding out was a nightmare. Nearly every site I find that reviews it is so loaded with advertising and popups and requests for push notification, I had to back out immediately!
I'm amazed that fibo works at all !
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 4:59 pm
by Heater
So I could not let it rest. It niggled at me that the CPP fibo using the prime number based hashing suffered from integer overflows and could miss or incorrectly identify anagrams. Even if it did happen to find the correct result for the insane British anagram challenge.
So I had to put the Bint big integer class in there. Sadly it makes the run time over 4 times longer on my PC and crashes with out of memory on my Pi 3. It does get the right result on the PC mind:
Code: Select all
$ time ./insane-british-anagram-bint > insane-british-anagram-bint-cpp.txt
real 0m4.742s
user 0m1.859s
sys 0m2.875s
$ time ./insane-british-anagram > insane-british-anagram-cpp.txt
real 0m0.398s
user 0m0.250s
sys 0m0.141s
$ md5sum insane-british-anagram.txt insane-british-anagram-bint-cpp.txt
addeda36a4631afc983f3bff13742e36 insane-british-anagram.txt
addeda36a4631afc983f3bff13742e36 insane-british-anagram-bint-cpp.txt
Here is the code if anyone is crazy enough to be interested. Not recommended:
Code: Select all
//
// insane-british-anagram-bint.cpp - Find words that have valid anagrams
// Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-07-25
//
#include <fstream>
#include <string>
#include <stdlib.h>
#include <string.h>
#include <unordered_map>
#include <vector>
std::string getFileContents(const char *filename) {
std::ifstream in(filename, std::ios::in | std::ios::binary);
if (in) {
std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());
in.close();
return(contents);
}
throw(std::runtime_error("Shit happened!"));
}
char* sortString (const char* s) {
int n = strlen(s);
char* string = (char*)malloc(n + 1);
strcpy (string, s);
char temp;
int i, j;
for (i = 0; i < n-1; i++) {
for (j = i+1; j < n; j++) {
if (string[i] > string[j]) {
temp = string[i];
string[i] = string[j];
string[j] = temp;
}
}
}
return string;
}
// One prime number for each lower case letter of the alphabet
int primes[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};
uint64_t primeHash (char* word) {
uint64_t hash = 1;
for (size_t i = 0; i < strlen(word); i++) {
int index = word[i] - 97;
hash = hash * primes[index];
}
return hash;
}
int main (int argc, char* argv[]) {
// Map container for sets of anagrams
// An anagram set is simply a vector of pointers to words in the dictionary
// Keys for the anagram sets in the map are the ordered characters of the words.
std::unordered_map<uint64_t, std::vector<char*>>anagramMap;
std::unordered_map<uint64_t, std::vector<char*>>::iterator it;
// An ordered index of anagram set keys
std::vector<uint64_t> index;
auto dictionary = getFileContents("/usr/share/dict/british-english-insane");
char* dictionaryPtr = (char*)dictionary.c_str();
char* wordPtr = dictionaryPtr;
char* charPtr = wordPtr;
bool reject = false;
while (1) {
if (islower(*charPtr)) {
// We are scanning a valid word
charPtr++;
} else if ((*charPtr) == '\n') {
// We have hit the end of a word, use the word if it's valid
*charPtr = 0;
if (!reject) {
// Do we have a word with this key (potential anagram)?
uint64_t key = primeHash(wordPtr);
it = anagramMap.find(key);
if (it == anagramMap.end()) {
// No: Add it to the map as start of new anagram set.
anagramMap[key].push_back(wordPtr);
// And add the new anagram set to index
index.push_back(key);
} else {
// Yes: Append it to the existing anagram set.
it->second.push_back(wordPtr);
}
}
charPtr++;
wordPtr = charPtr;
reject = false;
} else if ((*charPtr) != 0) {
// Invalid character
reject = true;
charPtr++;
} else {
// End of dictionary
break;
}
}
// Iterate over the collected anagram sets in order of the index.
for(auto const& key: index) {
it = anagramMap.find(key);
if (it->second.size() > 1) {
int count = 0;
for(const auto& value: it->second) {
if (count == 1) {
fputs(": ", stdout);
} else if (count > 1) {
fputs(", ", stdout);
}
fputs(value, stdout);
count++;
}
fputs("\n", stdout);
}
}
return (0);
}
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 5:04 pm
by bensimmo
Heater wrote: ↑Sat Jul 27, 2019 4:52 pm
bensimmo,
Thanks for that fibo report. That is about twice as fast as my not young low end Samsung phone.
I had never heard of a Tesco Hudl2. Grief, finding out was a nightmare. Nearly every site I find that reviews it is so loaded with advertising and popups and requests for push notification, I had to back out immediately!
I'm amazed that fibo works at all !
one if the nicest tablets i have used, a bit dated now, ( not in design or feel) still seems better than comparable Amazon Fire tablets. OS is old and lacking now that's all

.
wikipedia is so much easier
https://en.m.wikipedia.org/wiki/Tesco_Hudl_2 than uk click bait sites.
For reference, it quicker than my 2016 Huawei P9 lite
HiSilicon Kirin 650 (16 nm)
Octa-core (4x2.0 GHz Cortex-A53 & 4x1.7 GHz Cortex-A53)
Which takes
41.2seconds
And a Pi4 4GB at 1.75GHz Overclock takes
30.2 seconds.
So an old now defunct supermarket tablet for often less than £100 (often seen at £70 iirc) is slower than a 16% overclocked Pi4

[By 0.1 seconds]
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 5:49 pm
by Heater
So it turns out that the CPP anagram finder using Bint big integer class for the primeHash function uses over a giga byte of RAM!
I'm not sure where it is all going. Valgrind says there are no leaks.
My back of the envelope calculation only accounts for about half a gig. We need a Bint for every word saved in the index vector and another for every key saved in the anagram map. They have a minimum size of 32 * uint64_t.
Even then it's too much for my Pi 3. It just ain't going to fly there.
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 5:52 pm
by ejolson
Heater wrote: ↑Sat Jul 27, 2019 4:59 pm
So I could not let it rest. It niggled at me that the CPP fibo using the prime number based hashing suffered from integer overflows and could miss or incorrectly identify anagrams.
Most hashing algorithms (as well as prime number generation routines) rely on the intrinsic 2^n modulo-arithmetic properties of unsigned n-bit integer variables. Since the prime p=2 is a zero divisor of the resulting ring, under the assumption that overflow will occur, it is likely better to use only odd primes.
Given that zombie parents rely on randomly generated UUIDs rather than proper names to distinguish their offspring, I expect even a politician would be satisfied by a probabilistic argument showing the chances of an overflow-induced hash collision when finding anagrams using prime numbers and 64-bit integer arithmetic are negligible compared to the chances of a bit flip due to cosmic radiation.
While most people are too irritated by UUIDs to use them for identifying disk drives let alone children, standards may be different for the type of people who attempt the insane British anagram challenge. I asked the lead developer of FidoBasic and received the comment, dogs are people too but maybe not cats.
Why is there so much conflict in the world when we could all just code along with the Raspberry Pi?
Re: Project Digital Apocalypse Not Now
Posted: Sat Jul 27, 2019 6:32 pm
by ejolson
Heater wrote: ↑Sat Jul 27, 2019 5:49 pm
We need a Bint for every word saved in the index vector and another for every key saved in the anagram map. They have a minimum size of 32 * uint64_t.
Have you tried the gmp library?
According to Kira the Koding Kitty
Paeryn wrote: ↑Thu Jul 25, 2019 11:12 pm
an approach of counting letter distributions making a 96-bit key
results in a collision-free hash that can be used to identity anagrams.
How big is the largest integer that appears in the prime number algorithm? What assignment of prime numbers to letters minimises that number?
I asked Fido and he whined that alpha-beta pruning might help when solving the resulting min-max problem. Since finding anagrams isn't a game like chess I objected, but the dog developer became barking mad and replied: What fun is chess? You can't even use dogarithms to find the next move.