swampdog
Posts: 230
Joined: Fri Dec 04, 2015 11:22 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 1:32 am

Fwiw, GEM from Digital Research (aka early Mac and Atari ST) GUI used pointers. Each graphical OBJECT had two pointers, "head" and "tail" which would point to the first OBJECT of the current OBJECT's child node and last OBJECT child node respectively plus a "next" pointer which would point to the next sibling of the current OBJECT. A tree structure in effect with a single root OBJECT which together describes all the graphical elements.

Code: Select all

                              +--------------------+
                              | head | next | tail |
                              +--------------------+
                                 /              \
  -------------------------------                ---
 /                                                  \
+--------------------+    +--------------------+    +--------------------+
| head | next | tail |    | head | next | tail |    | head | next | tail |
+--------------------+    +--------------------+    +--------------------+
           \              /          \             /
            --------------            -------------
Each OBJECT would have its position defined as an offset from its parent so to move a tree visually onscreen all that was required was to change the offset of the root OBJECT (that offset effectively being the absolute co-ordinates). It also lends itself to very simple operation for cutting or adding one tree into another. Interestingly, as originally conceived the OBJECT offsets were defined in terms of character cells with a subpixel component for fine adjustment. In other words the structure would work even on a terminal (or curses etc). A program could write directly to its OBJECT and absolutely nothing would happen visually .. until the program invoked a redraw itself which it would typically do in response to a redraw message and clipping rectangle from the system. This tree had to fit into 32K (http://toshyp.atari.org/en/008016.html#OBJECT).

What was the tree in reality? Just an array of OBJECT. The head/tail/next pointers were actually just the (C language) OBJECT[] array index. There was only a single real pointer "ob_spec" in an OBJECT (because different visual objects have different attributes).

Today people shy away from (real) pointers because (a) they can't really program but more importantly (b) because nobody in education gave them the opportunity to find out if (a) is true.

You could replace "ob_spec" today with a map/dictionary and it would be substantially larger and it would work just fine. It's the "all you have is a hammer" syndrome: a modern implementation of this tree would likely replace head/next/tail with a map/dictionary also and it would be massive and it would work just fine.

I blame java. If you've ever seen "big iron" pausing for nearly 15 minutes while it garbage collects on a 24/7 outsourced app then brush off your CV because you don't want to be piggy in the middle on that.

Unusually for an advocacy thread, I'm going to agree with everyone! :-)

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 3:59 am

swampdog,

This is perhaps why people have problems with understanding pointers, assuming the do. You start out describing a tree structure built with pointers. You then go on to explain that you are not talking about pointers at all but rather actually just indexes into the array OBJECT[]. Which is a totally different thing. How confused do you want people to be?
Today people shy away from (real) pointers because (a) they can't really program but more importantly (b) because nobody in education gave them the opportunity to find out if (a) is true.
Do they?

The tree structure that you describe there is very common in all kind of programming. People will often come across it in GUI systems, like the GEM you describe, even if they don't normally get themselves into linked lists, trees and other data structures that use "pointer like" links between elements.

Last week I started experimenting with Microsoft's Babylon.js. Babylon is a library for creating accelerated 3D scenes. Typically a complex 3D object is composed of many parts. Those parts are linked to a parent, they can be assembled recursively into a tree, they all have position coordinates relative to the parent, move the parent and the whole thing moves.

For example, here is a simple car model, being loaded from a model file as a bunch of parts, body, wheels, windows, etc and assembled into such a tree structure:

Code: Select all

    var car_mesh = BABYLON.Mesh.CreateBox("carParent", 2, scene);
        car_mesh.position = new BABYLON.Vector3(5, 0, 5);
        car_mesh.visibility = .0;

    // Use asset manager to fetch our car
    const carMeshTask = assetsManager.addMeshTask('carTask', '', './Car-Model/', 'Car.obj');
    carMeshTask.onSuccess = (task) => {
        // Add all parts of car as children to car_mesh 
        task.loadedMeshes.forEach(function (b) {
            b.parent = car_mesh
        })
    }

    // Start the asset manager loading...
    assetsManager.load();
And here is how it looks: https://otaniemi.conveqs.fi:3000/public/jatkasaari.html

There are millions of JS programmers in the world do this kind of thing every day with "pointer like" things. No doubt in the Python and other worlds as well.
You could replace "ob_spec" today with a map/dictionary and it would be substantially larger and it would work just fine. It's the "all you have is a hammer" syndrome: a modern implementation of this tree would likely replace head/next/tail with a map/dictionary also and it would be massive and it would work just fine.
No doubt true. Don't forget though, if you are working in a high level language you should have more hammers available than just the pointers in C.
I blame java. If you've ever seen "big iron" pausing for nearly 15 minutes while it garbage collects on a 24/7 outsourced app then brush off your CV because you don't want to be piggy in the middle on that.
So do I.

I don't know about "big iron" but .... We can have all kind of "language war" debates but of all the major languages we have today Java is the only one that has absolutely no reason to exist at all or even to have been created in the first place.

Java was not created to bring us anything new conceptually, unlike languages like C++, Haskell, Scheme, Forth where the idea is to embody some different programming paradigm.

Java was not created as a solution to any particular problem at the time, unlike BASIC. In fact the initial vague idea for Java was as a language for embedded systems where it is particularly unsuitable. Then they tried targeting Java at "applets" in web pages, which was a disaster.

Java is big, slow, unpredictable, unnecessarily complicated to use, has no standard. Need I go on.

Yes, I will... I have watched online videos of lectures of under graduate courses in introductory CS from MIT and Berkeley. The oldest, and best, by Prof. Brian Harvey from Berkeley over 10 years ago.

These course covered introductory CS material including learning a programming language and moving on to data structures (lists, trees, etc) and various algorithms and algorithmic analysis etc. Over the years the language used has changed, even if the conceptual stuff remained the same. Brian Harvey originally used Scheme the last one I saw used a version of Scratch would you believe!

I was amazed to find that in the year they used Java the poor students were having a really hard time even programming simple things. Half way through the course they were still lecturing on how to program in Java. The actual CS material was way behind. Lost in the noise of all that language complexity.

Contrast to the Scheme version of the course where the language was pretty much done with after the first week and they moved on to to the interesting CS stuff immediately.

So yes, I blame Java.

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 6:48 am

scruss wrote:
Sat Jul 13, 2019 8:01 pm
Further to this, here's an example of something I do that would be very annoying to write in C, but Perl does it quickly and efficiently enough. The problem can be stated as:
List the English words (no proper nouns, no possessives) from the system word list that are valid anagrams of other words in that list. The results should be grouped by words containing the same letters and sorted alphabetically by first match
What I'm looking for is something like:

Code: Select all

abed: bade, bead
abet: bate, beat, beta
abets: baste, bates, beast, beats, betas
 …
woodworm: wormwood
wriest: writes
wrist: writs
Here's my mostly idiomatic Perl code to do this:

Code: Select all

#!/usr/bin/env perl
# selfgrams.pl - find words that have valid anagrams
# scruss - 2019-07 - may need wcanadian/wbritish/wamerican package
use v5.20;
my (@words, %grams, @elem, %output, $key, $fh);

# create source word list from system file
open( $fh, '<', '/usr/share/dict/words' ) or die "$!\n";
chomp( @words = grep { /^[a-z]+$/ } <$fh> );
close($fh);

# process source word list into hash of anagrams
foreach (@words) {
    # sort word's letters to generate key - eg: 'maps' => 'amps'
    $key = join( '', sort( split( '', $_ ) ) );
    # append word to existing hash match or create if word is new
    @elem =
        ( exists( $grams{$key} ) )
      ? ( @{ $grams{$key} }, $_ )
      : ($_);
    $grams{$key} = [@elem];
}

# generate output hash of anagrams where key matched > 1 word
foreach ( grep { scalar( @{$_} ) > 1 } values(%grams) ) {
    @elem = @{ $_ };
    $key = shift(@elem);
    $output{$key} = join( ', ', @elem );
}

foreach ( sort( keys(%output) ) ) {
    say join( ': ', $_, $output{$_} );
}
exit;
The code uses some Perlisms (hashes of arrays, heavy use of list context and the grep filter function, the @{ … } array dereference operator [aka rose]) but should be fairly clear. References are the nearest thing that Perl gets to pointers. You know you've done something wrong with a reference when Perl spits out something like ARRAY(0xcd3a28) instead of the data you wanted.

Running this on a Raspberry Pi Zero:
  • takes about 7½ seconds
  • uses a little under 19 MB or RAM (if /usr/bin/time keeps good stats)
  • scans a 920 KB words file with a little over 99,000 entries
  • roughly 63,000 words from the list pass the “no proper nouns, no possessives” test, which I've assumed to be matched by /^[a-z]+$/
  • of those, around 7800 anagrams are found with ~3500 distinct forms / lines of output.
Note that nowhere do I know anything explicit about the input data. I don't allocate any memory, and I don't make any assumptions about how many lines, how many words or how many anagrams match each pattern. Perl does all that, and I'd far rather it did than me trying to hash arrays of unknown length in C without memory leaks. Many user processes use much more memory than my little program (the desktop 30x as much, VNC services 3x as much), and while it could be a bit quicker, it's hardly worth the bother.

(And why do I do this? To get better at Bananagrams. No, seriously …)
Since this thread is in the C forum, I wrote a C code using lots of pointers to process the same dictionary file. On a Raspbery Pi 3B+ the program is about 4 times as many lines of code, runs about 15 times faster and creates exactly the same output.

While the program was fun to write, it probably says more about my time-management skills and sense of humour than anything else.

Code: Select all

/*  anagram.c -- Find anagrams in /usr/share/dict/words
    Written July 16, 2019 by Eric Olson

    This program is an example of how irritating it is to work with
    pointers and C to find anagrams and yet how efficient.

    On a Raspberry Pi 3B+ we obtain

$ gcc -O3 -mtune=Cortex-A53 -mcpu=Cortex-A53 -Wall -o anagram anagram.c
$ echo performance >/sys/devices/system/cpu/cpufreq/policy0/scaling_governor
$ time ./anagram >anagram.new

real    0m0.116s
user    0m0.115s
sys 0m0.001s
$ time ./anagram.perl >anagram.old

real    0m1.792s
user    0m1.751s
sys 0m0.040s
$ echo ondemand >/sys/devices/system/cpu/cpufreq/policy0/scaling_governor
$ diff anagram.new anagram.old

    which implies that the C version is about 15.5 times faster than
    the version written in Perl.  However there are about 4 times as
    many lines of code.
*/

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

static void mergechar(int m,char x[m],int n, char y[n],char z[n+m]){
    int k=0,i=0,j=0;
    while(i<m&&j<n){
        if(x[i]<=y[j]) z[k++]=x[i++];
        else z[k++]=y[j++];
    }
    while(i<m) z[k++]=x[i++];
    while(j<n) z[k++]=y[j++];
}
static void msortchar(int n,int p,char x[n],char y[n]){
    if(n==1) {
        if(p%2) y[0]=x[0];
        return;
    }
    int m=n/2;
    msortchar(m,p+1,x,y);
    msortchar(n-m,p+1,x+m,y+m);
    if(p%2) mergechar(m,x,n-m,x+m,y);
    else mergechar(m,y,n-m,y+m,x);
}
static char *words,*keys,*cursor;
static char *skipword(){
    for(;*cursor;cursor++){
        if(*cursor=='\n') return ++cursor;
    }
    return 0;
}
static char *getword(){
    do {
        char *r=cursor;
        while(islower(*cursor)) cursor++;
        if(*cursor=='\n'){
            *cursor++=0;
            return r;
        }
    } while(skipword());
    return 0;
}

typedef struct { int p,n; } indtype;
static indtype *ind; static int indmax=0;

static int indcomp(const void *a,const void *b){
    int r;
    r=strcmp(keys+((indtype *)a)->p,keys+((indtype *)b)->p);
    if(r) return r;
    return strcmp(words+((indtype *)a)->p,words+((indtype *)b)->p);
}
static int anacomp(const void *a,const void *b){
    return strcmp(words+ind[((indtype *)a)->p].p,
        words+ind[((indtype *)b)->p].p);
}
int main(){
    int fd=open("/usr/share/dict/words",O_RDONLY);
    struct stat fdstat;
    fstat(fd,&fdstat);
    words=malloc(fdstat.st_size+1);
    read(fd,words,fdstat.st_size);
    words[fdstat.st_size]=0;
    close(fd);
    int lines=2;
    for(char *p=words;*p;p++) if(*p=='\n') lines++;
    ind=malloc(lines*sizeof(indtype));
    int bufmax=0;
    cursor=words;
    for(;;){
        char *word=getword();
        if(!word) break;
        ind[indmax].p=word-words;
        int r=cursor-word-1;
        ind[indmax++].n=r;
        if(bufmax<r) bufmax=r;
    }
    char aux[bufmax];
    keys=malloc(fdstat.st_size+1);
    memcpy(keys,words,fdstat.st_size+1);
    for(int i=0;i<indmax;i++){
        msortchar(ind[i].n,0,keys+ind[i].p,aux);
    }
    qsort(ind,indmax,sizeof(indtype),indcomp);
    int flag=0;
    indtype *ana; int anamax=0;
    ana=malloc(sizeof(indtype)*lines/2);
    for(int i=1;i<indmax;i++){
        if(!strcmp(keys+ind[i-1].p,keys+ind[i].p)){
            if(!flag){
                ana[anamax].p=i-1;
                flag=1;
            }
        } else {
            if(flag){
                ana[anamax++].n=i;
                flag=0;
            }
        }
    }
    if(flag) ana[anamax++].n=indmax;
    qsort(ana,anamax,sizeof(indtype),anacomp);
    for(int i=0;i<anamax;i++){
        int j=ana[i].p;
        printf("%s: %s",words+ind[j].p,words+ind[j+1].p);
        for(j+=2;j<ana[i].n;j++){
            printf(", %s",words+ind[j].p);
        }
        printf("\n");
    }
    return 0;
}
Improvements to the run time and ways to make the code clearer (but not longer) would be appreciated.

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 7:42 am

15 time faster hey?

Now Perl is also on my hit list.

Remember folks, using Python or Perl causes climate change and global warming.

Apparently it also pushes C programmers over the edge into insanity :)

User avatar
Michiel O.
Posts: 178
Joined: Mon Dec 12, 2016 12:06 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 8:23 am

Heater wrote: Remember folks, using Python or Perl causes climate change and global warming
Using Intel chips instead of ARM chips causes global warming, not dynamically typed languages.
"You can't actually make computers run faster, you can only make them do less." - RiderOfGiraffes

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 8:46 am

I'm all for moving away from Intel x86 to RISC. Not just for power consumption reasons.

But I bet you will find that Perl is still 15 times slower than C on ARM. So still 15 times as much contribution to global warming and climate change.

jahboater
Posts: 4604
Joined: Wed Feb 04, 2015 6:38 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 8:50 am

ejolson wrote:
Tue Jul 16, 2019 6:48 am
Improvements to the run time and ways to make the code clearer (but not longer) would be appreciated.
Would it be possible to replace the msortchar() function (which looks very slow) with qsort() ?
It would make the code a bit shorter/simpler too.

User avatar
bensimmo
Posts: 4156
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 9:27 am

What was the overall time of each.

From scratch, too write, to compile and run (or just run) and get the answer?

More typing must take longer (assuming same skill level)
Was the compile and run faster than the just run?

Which is more efficent then?

Which one can most people understand without dedication to the language.

That's probably why Science likes Python & Perl.
Is does the job quickly for what they want.
If anything needed to be quicker they can plug in a faster bits of code if available.

And more importantly, is the skill level of programming reduced?

hippy
Posts: 5782
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 9:49 am

hippy wrote:
Sat Jul 13, 2019 7:53 pm
There are no data or data structure pointers in Python
I need to retract that.

It is entirely possible to create a linked list where one node points to another -

Code: Select all

node2 = [ 2, None  ]
node1 = [ 1, node2 ]
node0 = [ 0, node1 ]

ptr = node0
while ptr != None:
  print(ptr[0])
  ptr = ptr[1]
And if one adds "node2[1] = node0" that creates a circular linked list.

I cannot recall exactly what had me thinking Python did not have pointers in the context of the conversation but that was wrong. It was probably that one cannot set a variable to a value, eg 0x2F201050, and then use that to indirectly read what's at that memory address. That's true, one cannot. What a pointer can point to is more restrictive.

In fact, almost everything is a pointer in Python though that is often hidden from the programmer. For example "print" on that linked list will show it as a nested data structure, though that neatly terminates when presented with a circular or infinite construct.

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 10:10 am

bensimmo,

Hmm...

So what we need, with global warming and climate change in mind, is a complete life cycle energy usage analysis from the moment a person starts learning how to program a computer to the moment they finally create a program that computes the correct result.

We can assume they stop running it when they get the correct result for this particular problem.

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 10:24 am

hippy,
I cannot recall exactly what had me thinking Python did not have pointers...
But you were right the first time. The Python language does not have pointers.

The code you posted there looks very similar to my previous Javascript example.

Javascript does not have pointers either.

They do have names for things, symbols, that can refer to variables. Those variables can be simple things like numbers or they can be complicated things like arrays, objects, dictionaries....

They may well, and almost certainly are, using pointers under the hood. But their is no concept of a pointer in the languages.

See for example the Python language documentation. It does not use the word "pointer" until in comes to dealing with interfacing to C and other languages. For example:
from ctypes import *
i = c_int(42)
pi = pointer(i)
Of course "pointer" there is not a pointer as we know it but an instance of Python object called "pointer" that presumably wraps a real pointer some how.

hippy
Posts: 5782
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 12:18 pm

Heater wrote:
Tue Jul 16, 2019 10:24 am
hippy,
I cannot recall exactly what had me thinking Python did not have pointers...
But you were right the first time. The Python language does not have pointers.

The code you posted there looks very similar to my previous Javascript example.

Javascript does not have pointers either.

They do have names for things, symbols, that can refer to variables. Those variables can be simple things like numbers or they can be complicated things like arrays, objects, dictionaries....
Or, as in the case of my 'ptr' variable, can point to some other object.

I guess the question is whether a variable which points to something else is "a pointer" or not ?

Today I am believing it is. I had accepted that for function pointers but not recognised that in the general, data, case.

Python itself refers to them as "references" rather than "pointers". Today I'm not seeing there's much difference between the two.

I suppose it comes down to whether an object can ever be considered a pointer.

Perhaps we need a more absolute and universal definition of "pointer" ?

jalih
Posts: 76
Joined: Mon Apr 15, 2019 3:54 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 1:06 pm

ejolson wrote:
Mon Jul 15, 2019 6:08 pm
To understand how your results compare with a real Pi, would you mind running my pichart program available here and reporting the output either here or in that thread?
Here are results for my ROCK64 board:
Attachments
pichart.png
pichart.png (45.74 KiB) Viewed 780 times

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 2:28 pm

hippy,

We should not fret too much about the definition of "pointer". Like many terms it is often used very sloppily, other times it takes on very specific meanings, i.e. in C language specifications.

This conversation started when somebody here was giggling because Python did not have pointers and that languages like C and some languages with "BASIC" in their name do. And that it was necessary to have pointers to able to write good code and having pointers made those languages superior.

We have demonstrated the ignorance of such giggling by showing that you can do all the same things in high level languages like Python and JS even if they don't have things as crude as C or BASIC pointers.
I suppose it comes down to whether an object can ever be considered a pointer.
As it happens in the C++ standard we now have "pointers" called unique_ptr, shared_ptr and whatever else. As far as I know they are objects!

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 2:54 pm

jalih wrote:
Tue Jul 16, 2019 1:06 pm
ejolson wrote:
Mon Jul 15, 2019 6:08 pm
To understand how your results compare with a real Pi, would you mind running my pichart program available here and reporting the output either here or in that thread?
Here are results for my ROCK64 board:
Thanks. For purposes of comparison the processor speed of the Raspberry Pi 3B+ is quite similar--not surprising since both use a quad-core Cortex-A53. The differences may come from two things: the 3B+ uses LPDDR2 memory while the Rock64 uses LPDDR3. The 3B+ is running in 32-bit mode while the other 64-bit. It would be interesting to know which difference has the biggest effect.

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 3:36 pm

jahboater wrote:
Tue Jul 16, 2019 8:50 am
ejolson wrote:
Tue Jul 16, 2019 6:48 am
Improvements to the run time and ways to make the code clearer (but not longer) would be appreciated.
Would it be possible to replace the msortchar() function (which looks very slow) with qsort() ?
It would make the code a bit shorter/simpler too.
I think you might be right. I initially thought the extra indirection for the comparison function in the standard qsort routine was to heavy to use for sorting a handful of letters. Even an O(n^2) bubble sort might be faster. Alternatively, one could optimise the merge sort to end in bubbles once the list is less than four or five characters instead of recursing all the way down.

In order to save the planet I did not try any of these things, because as I sat there writing C code I myself was generating greenhouse gases. It must have been the beans.

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 4:29 pm

Heater wrote:
Tue Jul 16, 2019 2:28 pm
This conversation started when somebody here was giggling because Python did not have pointers and that languages like C and some languages with "BASIC" in their name do.
What wrong with being happy?

I just heard from the lead developer that FidoBasic will have a better kind of pointer:

Image

While not so good at anagrams or bananagrams, they are better than mmap for fetching things not already resident in core.
Last edited by ejolson on Tue Jul 16, 2019 4:33 pm, edited 1 time in total.

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 4:33 pm

Maybe I have misunderstood the problem but as far as I can tell there is no need to sort the characters within each word.

If you want to know if word A is an anagram of word B then simply count the number of occurrences of each letter of the alphabet in each word. Store these frequencies in an array.

Then, to test if A and B are anagrams of each other simply compare those character frequency arrays for equality.

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 4:39 pm

ejolson,
What wrong with being happy?
No idea. Never tried it.

I do object to people giggling at us, well me. But then I feel better when I realize they know not about what they speak.

Fido looks like a smart dog. Did he ever read "Electronics for Dogs"?

Image

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 4:54 pm

Heater wrote:
Tue Jul 16, 2019 4:33 pm
Maybe I have misunderstood the problem but as far as I can tell there is no need to sort the characters within each word.

If you want to know if word A is an anagram of word B then simply count the number of occurrences of each letter of the alphabet in each word. Store these frequencies in an array.

Then, to test if A and B are anagrams of each other simply compare those character frequency arrays for equality.
Comparing every word in the dictionary to every other word is O(N^2) where N is the number of words in the dictionary. Sorting is O(N log N).

It may be possible to use a 64-bit integer as the sort key where the bit fields of the integer count the letter frequencies: 3-bits for etaoin shrdlu and 2-bits for the others. This has the advantage of converting the primary key for the largest sort from a string to an integer.

If you are planning to write a program, note that the problem size is so small that the ondemand cpu governor doesn't react quickly enough to obtain accurate timings. How to change the governor is in the comments of my C code.

jalih
Posts: 76
Joined: Mon Apr 15, 2019 3:54 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 5:12 pm

Heater wrote:
Tue Jul 16, 2019 4:33 pm
Maybe I have misunderstood the problem but as far as I can tell there is no need to sort the characters within each word.

If you want to know if word A is an anagram of word B then simply count the number of occurrences of each letter of the alphabet in each word. Store these frequencies in an array.

Then, to test if A and B are anagrams of each other simply compare those character frequency arrays for equality.
It's a lot easier to just sort the word and used it as a key into map where to store words.

I somewhat simplified my 8th version and think it's quite readable and easy to follow (compared to the C version that is):

Code: Select all

m:new var, anamap
a:new var, anakeys


: s:sort \ s -- s
  "" s:/
  ' s:cmp a:sort
  "" a:join ;


: process-words \ word --
  dup /^[a-z]+$/ r:match nip if
    dup
    >r
    s:sort
    anamap @
    over
    m:exists? if
      over m:@ r> a:push ' s:cmp a:sort rot swap m:!
    else
      swap a:new r> a:push m:!
    then
    drop
  else
    drop
  then ;


: read-and-check-words  \ fname --
  f:open-ro
  ' process-words f:eachline
  f:close ;


: len<  \ key  --
  drop ;


: len>=  \ key --
  anakeys @ swap a:push drop ;


: fetch-ana-list \ key array --
  a:len 2 n:cmp
  1 n:+
  nip
  [ ' len< , ' len>= , ' len>= ]
  swap
  caseof ;


: key-val-cmp          \ key1 key2
  anamap @ swap m:@    \ key1 anamap key2val
  0 a:@ nip            \ key1 anamap val2
  swap rot             \ val2 anamap key1
  m:@ nip              \ val2 key1val
  0 a:@ nip            \ val2 val1
  swap                 \ val1 val2
  s:cmp ;


: sort-keys-by-first-word
  anakeys @ ' key-val-cmp a:sort drop ;


: list-words \ value --
  " " a:join . cr ;


: app:main
  "/usr/share/dict/words" read-and-check-words
  anamap @ ' fetch-ana-list m:each drop
  sort-keys-by-first-word
  anamap @ anakeys @ m:@@ nip ' list-words a:each! drop
  anakeys @ a:len nip "\nAnagrams: %d\n" s:strfmt .
  bye ;
  

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 5:27 pm

I was not thinking of comparing every word in the dictionary to every other word.

Rather I was vaguely thinking of:

1) Scan the input words and create a character frequency table for each word. Each of those tables has length of the size of your alphabet. Elements could be just bytes as we probably don't have words more than 255 chars long.

2) Along with each frequency table calculated above store a pointer to the word it was calculated from.

3) All those frequency tables get added to an array. Or pointers to them get added to an array.

4) Now you can sort that array of frequency tables. After which you have all your anagrams collected together.

5) Scan the array of frequency tables, and print the words they point back to.
`
So one scan of the word list, one quick sort, and one scan of the frequency table array.

Or something like that.

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 5:33 pm

jalih wrote:
Tue Jul 16, 2019 5:12 pm
Heater wrote:
Tue Jul 16, 2019 4:33 pm
Maybe I have misunderstood the problem but as far as I can tell there is no need to sort the characters within each word.

If you want to know if word A is an anagram of word B then simply count the number of occurrences of each letter of the alphabet in each word. Store these frequencies in an array.

Then, to test if A and B are anagrams of each other simply compare those character frequency arrays for equality.
It's a lot easier to just sort the word and used it as a key into map where to store words.

I somewhat simplified my 8th version and think it's quite readable and easy to follow (compared to the C version that is):

Code: Select all

m:new var, anamap
a:new var, anakeys


: s:sort \ s -- s
  "" s:/
  ' s:cmp a:sort
  "" a:join ;


: process-words \ word --
  dup /^[a-z]+$/ r:match nip if
    dup
    >r
    s:sort
    anamap @
    over
    m:exists? if
      over m:@ r> a:push ' s:cmp a:sort rot swap m:!
    else
      swap a:new r> a:push m:!
    then
    drop
  else
    drop
  then ;


: read-and-check-words  \ fname --
  f:open-ro
  ' process-words f:eachline
  f:close ;


: len<  \ key  --
  drop ;


: len>=  \ key --
  anakeys @ swap a:push drop ;


: fetch-ana-list \ key array --
  a:len 2 n:cmp
  1 n:+
  nip
  [ ' len< , ' len>= , ' len>= ]
  swap
  caseof ;


: key-val-cmp          \ key1 key2
  anamap @ swap m:@    \ key1 anamap key2val
  0 a:@ nip            \ key1 anamap val2
  swap rot             \ val2 anamap key1
  m:@ nip              \ val2 key1val
  0 a:@ nip            \ val2 val1
  swap                 \ val1 val2
  s:cmp ;


: sort-keys-by-first-word
  anakeys @ ' key-val-cmp a:sort drop ;


: list-words \ value --
  " " a:join . cr ;


: app:main
  "/usr/share/dict/words" read-and-check-words
  anamap @ ' fetch-ana-list m:each drop
  sort-keys-by-first-word
  anamap @ anakeys @ m:@@ nip ' list-words a:each! drop
  anakeys @ a:len nip "\nAnagrams: %d\n" s:strfmt .
  bye ;
  
It is possible my choice of variable names was suboptimal as well as other things. On either of the standard English dictionaries (slightly less than 100,000 words) are your run times less than a second?

Heater
Posts: 13103
Joined: Tue Jul 17, 2012 3:02 pm

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 5:37 pm

You should install British English insane:

Code: Select all

$ wc -l /usr/share/dict/british-english-insane
654276 /usr/share/dict/british-english-insane

ejolson
Posts: 3419
Joined: Tue Mar 18, 2014 11:47 am

Re: I dont get it... Why is everyone from science schools so obsessed with Perl and Python when we got C?

Tue Jul 16, 2019 5:41 pm

Heater wrote:
Tue Jul 16, 2019 5:27 pm
I was not thinking of comparing every word in the dictionary to every other word.

Rather I was vaguely thinking of:

1) Scan the input words and create a character frequency table for each word. Each of those tables has length of the size of your alphabet. Elements could be just bytes as we probably don't have words more than 255 chars long.

2) Along with each frequency table calculated above store a pointer to the word it was calculated from.

3) All those frequency tables get added to an array. Or pointers to them get added to an array.

4) Now you can sort that array of frequency tables. After which you have all your anagrams collected together.

5) Scan the array of frequency tables, and print the words they point back to.
`
So one scan of the word list, one quick sort, and one scan of the frequency table array.

Or something like that.
You could probably eliminate the final sort in my C program by relying on the fact that the original word list is in alphabetical order. I suspect the Perl script relies on this property. Otherwise, you will need one more sort to print the final list of anagrams ordered by the first member.

Return to “C/C++”