User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 1:13 pm

What is absurd is your belief that the JavaScript monkey can do unlimited tricks

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 1:32 pm

I believe no such thing.

Which is why my solution is also in C/C++.

Although it has to be noted that the JS solution is only 3 or 4 times slower. Which I find pretty amazing.
Memory in C++ is a leaky abstraction .

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 1:46 pm

ejolson,
If that is the same code as posted here, then it does not include the heap data structure and is consequently an O(n^2) algorithm.
I was thinking about that...

The user formerly known as "ScriptBasic" once intimated that in ScriptBasic, the language interpreter, arrays are not actually arrays. They are implemented as linked lists.

That means that when estimating algorithmic complexity we have to count an array access, e.g. A[n], as linear time rather than constant time.

That means, as far as I can tell, we move up from an O(n^2) problem to an O(n^3) problem.

So the 1 minute 35 seconds becomes 95 * 60^3 = 20520000 seconds. Or about 240 days!

Am I right?

If so, this is never going to produce a result before Win 10 upgrades itself, we have a power failure, it's caused so much global warming nobody is left alive to care.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5377
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 2:00 pm

John_Spikowski wrote:
Fri Jul 26, 2019 1:13 pm
What is absurd
While one of the topics of this thread is what makes a good first programming language, the way I see it, the focus of the insane British anagram challenge is not so much on programming languages but on people writing code to perform a particular task.

As already demonstrated, there are a number of different algorithms and data structures that can be used--quick sort, merge sort, bubble sort, unique factorisation theorem, associative arrays, hash tables, collisionless hash functions and a heap. Maybe even dogarithms, but then again maybe not.

The reason the current ScriptBasic code is slow is because it was derived from my original Basic code that was written with the hypothetical style of a 12 to 16 year old living during the golden age of personal commuting who hasn't taken an algorithms course. In particular, that code employs a naive O(n^2) algorithm based on searching linear lists, while every other algorithm presented so far is O(nlogn) or better.

Replacing the linear search by a heap data structure, as in this improved version, results in better performance. However, my understanding is that ScriptBasic also has associative arrays built in, in which case different techniques should result in even better performance.
Last edited by ejolson on Fri Jul 26, 2019 3:38 pm, edited 1 time in total.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 2:18 pm

I'm all ears!

Please post the updated ScriptBasic source.

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 2:21 pm

Personally I hate anagrams. Reminds me too much of school, all that immensely boring spelling and grammar stuff they torture tested us with at a tender age (Do they still do that?). I could never understand why people might want to think about this kind of thing for fun.

However, this "absurd" challenge has had me thinking about two or three useful programming techniques I have never given much time to. One of which is far from literary, using the Fundamental Theorem of Arithmetic no less.

It's not the madness, it's the method!

Do I get a job at Google after all these weird challenges? :)
Memory in C++ is a leaky abstraction .

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 2:25 pm

John_Spikowski,
Please post the updated ScriptBasic source.
Call me old fashioned John but I thought the idea of such a challenge problem was to have a go at thinking about it for yourself. Not waiting around for someone else to do it for you. What would be the point of that?
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 2:52 pm

I'll give it some thought.

Would using a SQL sever be outside the rules?

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 3:58 pm

Sounds like a novel approach.
Memory in C++ is a leaky abstraction .

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 6:38 pm

I was thinking about this technique to find anagrams using prime numbers...

I was a bit proud of myself in that it occurred to me as soon as we started discussing the problem. A bit more proud that I got it to work and that it boosted performance significantly. That is not much to boast about but it's not often I have such insights so I'm happy.

So I started wondering a couple of things about it...

Firstly I presumed that this is not a new idea so I went googling for it. Did not get many hits but the discussions I did find wrote it off as being inefficient or suggesting there are faster ways to do this. Well, given this is the fastest we have here what are we missing? What is that more efficient way?

We see that it overflows the integers for long words, which does not stop it producing the correct result for this particular problem. Perhaps in the general case all that huge integer maths required to get it right for any long string would be detrimental.

Secondly, what we have here is a way to deliberately create hash collisions. We can give identity strings to millions of different objects in such a way that a whole subsets of those objects end up in the same class or category. If their identity strings are anagrams of each other they are all in the same set. I can't help thinking this is actually useful for applications other than messing with insane dictionaries. A very fast way to classify things and search for members of the same set.

Anyone know of such applications?
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5377
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 6:39 pm

John_Spikowski wrote:
Fri Jul 26, 2019 2:52 pm
I'll give it some thought.

Would using a SQL sever be outside the rules?
A novel approach is all the better.

The description of the challenge can be found here. The relevant guideline is
Insane British Anagram Challenge wrote:
  • Choose any suitable or possibly unsuitable programming language.
which appears quite inclusive.

Although one of the planned future titles for this thread "Why Turing was Incomplete" suggests there is a lot more needed in a good first programming language beyond being Turing complete, the present title "Digital Apocalypse Not Now" concerns the question whether people control computers or computers control people and how to avoid the latter.

In particular, since it is necessary for a person to know how to write a program to tell a computer what to do, an important aspect of the anagram challenge is a focus on people developing algorithms and writing code. The exact languages used and whether they are Turing complete is secondary.

Returning to the idea of using SQL for the anagram challenge, apparently certain implementations of SQL are, in fact, Turing complete. For example in this remark on Stack Overflow
Mathiew Vines wrote:Andrew Gierth proves that with CTE and Windowing SQL is Turing Complete, by constructing a cyclic tag system, which has been proved to be Turing Complete. The CTE feature is the important part however -- it allows you to create named sub-expressions that can refer to themselves, and thereby recursively solve problems.

The interesting thing to note is that CTE was not really added to turn SQL into a programming language -- just to turn a declarative querying language into a more powerful declarative querying language.
Of course a combination of ScriptBasic and SQL would be even more powerful.

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 6:55 pm

I can't wait to get to the "The New Paper Tape Project" part.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5377
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 6:59 pm

Heater wrote:
Fri Jul 26, 2019 6:38 pm
Well, given this is the fastest we have here what are we missing? What is that more efficient way?
The note from the zombies said dogarithms are more efficient, but I haven't let them in (yet) so I don't know any more details. Even Fido has been uncharacteristically quiet. Maybe that dog developer is concerned the Koding Kitty has discovered some better magic.

Why is everyone so competitive when the real dangers lie with deep learning neural networks and the resulting powerful but emotional artificial intelligences?

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 7:33 pm

ejolson,
Why is everyone so competitive when real dangers lie with the deep learning neural networks and the resulting powerful but emotional artificial intelligences?
That's what makes us human :)

Arguably the real danger we face is global warming and climate change. Which despite my humorous quips about inefficient languages and software causing it, does not seem to be caused by digital beings, as yet.

Anyway, it's too late. We cannot exist as we do now with out the networks of rude and crude digital beings we have already. If we could shut them down today the civilized world would stop functioning immediately, within hours would be mass panic as we would be incapable of obtaining food, or moving around or doing anything much, except riot and fight over the remaining scraps. Mass starvation, warfare and death being the inevitable outcome.

Already we cannot turn off the ever growing and more ever more capable digital intelligence we are surrounded by.

Currently my opinion is it is not the AI that we need to be afraid of. It's those humans that can make use of it, and the gigantic amounts of data it needs to monitor us all. Those people with the power.

Having everyone learn to code is great and all, but it's not going to fix that societal problem. In fact, as far as I can tell most people who are skilled at programming are actively employed to make the situation worse!
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5377
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 10:31 pm

Heater wrote:
Fri Jul 26, 2019 1:46 pm
So the 1 minute 35 seconds becomes 95 * 60^3 = 20520000 seconds. Or about 240 days!

Am I right?
Judging from these scaling results

Image

array access in ScriptBasic has the same asymptotic time complexity as arrays in bwBasic, gplBasic and Matrix Brandy Basic.

Therefore, unless the apocalypse called Windows automatic update and reboot happens first, I would expect the insane O(n^2) anagram algorithm to finish within 5 days. To avoid any unscheduled reboots, maybe it would be better to use a Raspberry Pi.
Last edited by ejolson on Fri Jul 26, 2019 11:39 pm, edited 3 times in total.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Jul 26, 2019 10:44 pm

I believe the SQL table approach should drastically reduce the time to completion.

jamesh
Raspberry Pi Engineer & Forum Moderator
Raspberry Pi Engineer & Forum Moderator
Posts: 26660
Joined: Sat Jul 30, 2011 7:41 pm

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 3:21 pm

Just cleaned the thread of cruft. I don't like cleaning threads of cruft, especially on the weekend. Please don't let it happen to me again, or me and the banhammer will start making our feelings more well known.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 3:39 pm

Sorry!

I put pondering back in the cage.

User avatar
bensimmo
Posts: 4626
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 4:37 pm

Heater, you fibo javascript web page takes 30.3 seconds on an almost 5year old Tesco Hudl2 (think it was about £100 at the time).
just happen to be using it and you link was in a post.
:-)

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 4:52 pm

bensimmo,

Thanks for that fibo report. That is about twice as fast as my not young low end Samsung phone.

I had never heard of a Tesco Hudl2. Grief, finding out was a nightmare. Nearly every site I find that reviews it is so loaded with advertising and popups and requests for push notification, I had to back out immediately!

I'm amazed that fibo works at all !
Memory in C++ is a leaky abstraction .

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 4:59 pm

So I could not let it rest. It niggled at me that the CPP fibo using the prime number based hashing suffered from integer overflows and could miss or incorrectly identify anagrams. Even if it did happen to find the correct result for the insane British anagram challenge.

So I had to put the Bint big integer class in there. Sadly it makes the run time over 4 times longer on my PC and crashes with out of memory on my Pi 3. It does get the right result on the PC mind:

Code: Select all

$ time ./insane-british-anagram-bint > insane-british-anagram-bint-cpp.txt

real    0m4.742s
user    0m1.859s
sys     0m2.875s
$ time ./insane-british-anagram > insane-british-anagram-cpp.txt

real    0m0.398s
user    0m0.250s
sys     0m0.141s

$ md5sum insane-british-anagram.txt insane-british-anagram-bint-cpp.txt
addeda36a4631afc983f3bff13742e36  insane-british-anagram.txt
addeda36a4631afc983f3bff13742e36  insane-british-anagram-bint-cpp.txt
Here is the code if anyone is crazy enough to be interested. Not recommended:

Code: Select all

//
// insane-british-anagram-bint.cpp - Find words that have valid anagrams
//                              Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-07-25
// 
#include <fstream>
#include <string>
#include <stdlib.h>
#include <string.h>
#include <unordered_map> 
#include <vector> 

std::string getFileContents(const char *filename) {
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if (in) {
        std::string contents;
        in.seekg(0, std::ios::end);
        contents.resize(in.tellg());
        in.seekg(0, std::ios::beg);
        in.read(&contents[0], contents.size());
        in.close();
        return(contents);
    }
    throw(std::runtime_error("Shit happened!"));
}

char* sortString (const char* s) {
    int n = strlen(s);
    char* string = (char*)malloc(n + 1); 
    strcpy (string, s); 
    char temp;
    int i, j;

    for (i = 0; i < n-1; i++) {
        for (j = i+1; j < n; j++) {
            if (string[i] > string[j]) {
            temp = string[i];
            string[i] = string[j];
            string[j] = temp;
            }
        }
    }
    return string;
}

// One prime number for each lower case letter of the alphabet
int primes[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

uint64_t primeHash (char* word) {
    uint64_t hash = 1;
    for (size_t i = 0; i < strlen(word); i++) {
        int index = word[i] -  97;  
        hash = hash * primes[index];
    }
    return hash;
}

int main (int argc, char* argv[]) {
    // Map container for sets of anagrams 
    // An anagram set is simply a vector of pointers to words in the dictionary
    // Keys for the anagram sets in the map are the ordered characters of the words.
    std::unordered_map<uint64_t, std::vector<char*>>anagramMap;
    std::unordered_map<uint64_t, std::vector<char*>>::iterator it;

    // An ordered index of anagram set keys 
    std::vector<uint64_t> index;

    auto dictionary = getFileContents("/usr/share/dict/british-english-insane");
    char* dictionaryPtr = (char*)dictionary.c_str();
    char* wordPtr = dictionaryPtr;
    char* charPtr = wordPtr;
    bool reject = false;
    while (1) {
        if (islower(*charPtr)) {
            // We are scanning a valid word
            charPtr++;
        } else if ((*charPtr) == '\n') {
            // We have hit the end of a word, use the word if it's valid
            *charPtr = 0;
            if (!reject) {
                // Do we have a word with this key (potential anagram)?
                uint64_t key = primeHash(wordPtr);

                it = anagramMap.find(key);
                if (it == anagramMap.end()) {
                    // No: Add it to the map as start of new anagram set.
                    anagramMap[key].push_back(wordPtr);

                    // And add the new anagram set to index
                    index.push_back(key);
                } else {
                    // Yes: Append it to the existing anagram set.
                    it->second.push_back(wordPtr); 
                }
            }
            charPtr++;
            wordPtr = charPtr;
            reject = false;
        } else if ((*charPtr) != 0) {
            // Invalid character
            reject = true;
            charPtr++;
        } else {
            // End of dictionary
            break;
        }
    }

    // Iterate over the collected anagram sets in order of the index.
    for(auto const& key: index) {
        it = anagramMap.find(key);
        if (it->second.size() > 1) {
            int count = 0;
            for(const auto& value: it->second) {
                if (count == 1) {
                    fputs(": ", stdout);                
                } else if (count > 1) {
                    fputs(", ", stdout);                
                }
                fputs(value, stdout);                
                count++;
            }
            fputs("\n", stdout);                
        }
    }
    return (0);
}
Memory in C++ is a leaky abstraction .

User avatar
bensimmo
Posts: 4626
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 5:04 pm

Heater wrote:
Sat Jul 27, 2019 4:52 pm
bensimmo,

Thanks for that fibo report. That is about twice as fast as my not young low end Samsung phone.

I had never heard of a Tesco Hudl2. Grief, finding out was a nightmare. Nearly every site I find that reviews it is so loaded with advertising and popups and requests for push notification, I had to back out immediately!

I'm amazed that fibo works at all !
one if the nicest tablets i have used, a bit dated now, ( not in design or feel) still seems better than comparable Amazon Fire tablets. OS is old and lacking now that's all :-(.
wikipedia is so much easier https://en.m.wikipedia.org/wiki/Tesco_Hudl_2 than uk click bait sites.


For reference, it quicker than my 2016 Huawei P9 lite
HiSilicon Kirin 650 (16 nm)
Octa-core (4x2.0 GHz Cortex-A53 & 4x1.7 GHz Cortex-A53)
Which takes
41.2seconds

And a Pi4 4GB at 1.75GHz Overclock takes
30.2 seconds.

So an old now defunct supermarket tablet for often less than £100 (often seen at £70 iirc) is slower than a 16% overclocked Pi4
;-)
[By 0.1 seconds]

Heater
Posts: 15950
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 5:49 pm

So it turns out that the CPP anagram finder using Bint big integer class for the primeHash function uses over a giga byte of RAM!

I'm not sure where it is all going. Valgrind says there are no leaks.

My back of the envelope calculation only accounts for about half a gig. We need a Bint for every word saved in the index vector and another for every key saved in the anagram map. They have a minimum size of 32 * uint64_t.

Even then it's too much for my Pi 3. It just ain't going to fly there.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5377
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 5:52 pm

Heater wrote:
Sat Jul 27, 2019 4:59 pm
So I could not let it rest. It niggled at me that the CPP fibo using the prime number based hashing suffered from integer overflows and could miss or incorrectly identify anagrams.
Most hashing algorithms (as well as prime number generation routines) rely on the intrinsic 2^n modulo-arithmetic properties of unsigned n-bit integer variables. Since the prime p=2 is a zero divisor of the resulting ring, under the assumption that overflow will occur, it is likely better to use only odd primes.

Given that zombie parents rely on randomly generated UUIDs rather than proper names to distinguish their offspring, I expect even a politician would be satisfied by a probabilistic argument showing the chances of an overflow-induced hash collision when finding anagrams using prime numbers and 64-bit integer arithmetic are negligible compared to the chances of a bit flip due to cosmic radiation.

While most people are too irritated by UUIDs to use them for identifying disk drives let alone children, standards may be different for the type of people who attempt the insane British anagram challenge. I asked the lead developer of FidoBasic and received the comment, dogs are people too but maybe not cats.

Why is there so much conflict in the world when we could all just code along with the Raspberry Pi?

ejolson
Posts: 5377
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Jul 27, 2019 6:32 pm

Heater wrote:
Sat Jul 27, 2019 5:49 pm
We need a Bint for every word saved in the index vector and another for every key saved in the anagram map. They have a minimum size of 32 * uint64_t.
Have you tried the gmp library?

According to Kira the Koding Kitty
Paeryn wrote:
Thu Jul 25, 2019 11:12 pm
an approach of counting letter distributions making a 96-bit key
results in a collision-free hash that can be used to identity anagrams.

How big is the largest integer that appears in the prime number algorithm? What assignment of prime numbers to letters minimises that number?

I asked Fido and he whined that alpha-beta pruning might help when solving the resulting min-max problem. Since finding anagrams isn't a game like chess I objected, but the dog developer became barking mad and replied: What fun is chess? You can't even use dogarithms to find the next move.

Return to “General programming discussion”