User avatar
Paeryn
Posts: 2836
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 3:33 am

When the Koding Kitty looked at the zombie's code he arched his back, his fur stood on end (quite an impressive look for a long haired cat) and he hissed (paraphrased to make it kid-friendly) "Get Fido to safety, there be some strong voodoo magic going on there!".

Kira's been helping me with the Haskell version, we're nearly there, just got a slight hiccough where I know what I want to do but the last bit fails to compile. Annoying since if I pass a literal list to that function it works, but when the list is a variable the compiler complains about incorrect types. It definitely won't be a speed demon though.
She who travels light — forgot something.

ejolson
Posts: 4480
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 4:08 pm

Paeryn wrote:
Fri Aug 02, 2019 3:33 am
When the Koding Kitty looked at the zombie's code he arched his back, his fur stood on end (quite an impressive look for a long haired cat) and he hissed (paraphrased to make it kid-friendly) "Get Fido to safety, there be some strong voodoo magic going on there!".
The lead developer of FidoBasic is back from the veterinarian. Thankfully the foaming was caused by eating too much ice cream rather than something more serious.

I spotted an easy opportunity for additional parallelization when cleaning up the zombie's code. The result is now

Code: Select all

$ time ./zanagram >zanagram.insane

real    0m0.351s
user    0m0.760s
sys 0m0.068s
$ md5sum zanagram.insane selfgrams.insane
bec74aa3b31577edbb291aeb7269a4d5  zanagram.insane
bec74aa3b31577edbb291aeb7269a4d5  selfgrams.insane
The dezombified code is

Code: Select all

/*  zanagram.c -- Find anagrams
    Written by the zombies with help from an anonymous dog
    Modified August 2, 2019 by Eric Olson

    This program demonstrates using dogarithms and a parallel merge
    sort to find anagrams in the insane British dictionary.
*/

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>
#include <math.h>

#include <stdio.h>
#include <stdint.h>

#ifdef _OPENMP
# include <omp.h>
# define do_pragma(Z) _Pragma(#Z)
# define spawn(Y) do_pragma(omp task default(shared) Y)
# define forall _Pragma("omp taskloop default(shared)") for
# define sync _Pragma("omp taskwait")
# define get_ncpu(X) omp_get_num_procs(X)
# define workers(X) \
    omp_set_dynamic(0); \
    if(X) omp_set_num_threads(X); \
    else omp_set_num_threads(1); \
    _Pragma("omp parallel default(shared)") \
    _Pragma("omp single")
#else
# define spawn(Y)
# define forall for
# define sync
# define get_ncpu(X) 1
# define workers(X)
#endif

static int ncpu,logcpu;
static char *words,*cursor;
static char *skipword(){
    for(;*cursor;cursor++){
        if(*cursor=='\n') return ++cursor;
    }
    return 0;
}
static char *getword(){
    do {
        char *r=cursor;
        while(islower(*cursor)) cursor++;
        if(*cursor=='\n'){
            *cursor++=0;
            return r;
        }
    } while(skipword());
    return 0;
}

typedef uint64_t keytype;
typedef struct anatype {
    char *word; struct anatype *next; keytype key; } anatype;

static double pr[26]={
    7, 61, 29, 41, 2, 67, 53, 47, 3, 101, 73, 23, 43,
    11, 13, 37, 97, 17, 5, 19, 31, 71, 79, 83, 59, 89 };
static keytype dog[26];
static keytype strkey(char *p){
    keytype r=0;
    for(;*p;p++) r+=dog[*p-'a'];
    return r;
}
static int anacmp(anatype *ap,anatype *bp){
    if(ap->key<bp->key) return -1;
    if(ap->key>bp->key) return 1;
    return strcmp(ap->word,bp->word);
}
static void mergeserial(int m,anatype *x[m],
    int n,anatype *y[n],anatype *z[n+m]){
    int k=0,i=0,j=0;
    while(i<m&&j<n){
        if(anacmp(x[i],y[j])<=0) z[k++]=x[i++];
        else z[k++]=y[j++];
    }
    while(i<m) z[k++]=x[i++];
    while(j<n) z[k++]=y[j++];
}
static void msortserial(int n,int p,anatype *x[n],anatype *y[n]){
    if(n==1){
        if(p%2) y[0]=x[0];
        return;
    }
    int m=n/2;
    msortserial(m,p+1,x,y);
    msortserial(n-m,p+1,x+m,y+m);
    if(p%2) mergeserial(m,x,n-m,x+m,y);
    else mergeserial(m,y,n-m,y+m,x);
}
static int binsearch(int n,anatype *y[n],anatype *r){
    int a=0,b=n-1;
    if(anacmp(y[a],r)>0) return 0;
    if(anacmp(y[b],r)<0) return n;
    for(int m=(a+b)/2;a!=m;m=(a+b)/2){
        if(anacmp(y[m],r)<=0) a=m;
        else b=m;
    }
    return b;
}
static void mergeparallel(int p,int m,anatype **x,
    int n, anatype **y,anatype *z[n+m]){
    if(m<n){
        int k=m; m=n; n=k;
        anatype **t=x; x=y; y=t;
    }
    if(p>logcpu||m<4096){
        mergeserial(m,x,n,y,z);
        return;
    }
    int m2=m/2;
    int n2=binsearch(n,y,x[m2]);
    spawn() mergeparallel(p+1,m2,x,n2,y,z);
    mergeparallel(p+1,m-m2,x+m2,n-n2,y+n2,z+m2+n2);
    sync;
}
static void msortparallel(int n,int p,anatype *x[n],anatype *y[n]){
    if(p>=logcpu||n<1024){
        msortserial(n,p,x,y);
        return;
    }
    int m=n/2;
    spawn() msortparallel(m,p+1,x,y);
    msortparallel(n-m,p+1,x+m,y+m);
    sync;
    if(p%2) mergeparallel(p,m,x,n-m,x+m,y);
    else mergeparallel(p,m,y,n-m,y+m,x);
}

static anatype **sra,**srb;
static anatype *an; int anmax=0;
static void linkslice(int n){
    int imin=n*anmax/ncpu,imax=(n+1)*anmax/ncpu;
    int i=imin;
    if(i) for(;i<imax;i++){
        if(sra[i-1]->key!=sra[i]->key) break;
    }
    for(;i<imax;){
        int j=i, k=i;
        for(i++;i<anmax;j=i++){
            if(sra[i]->key!=sra[k]->key) break;
            sra[j]->next=sra[i];
        }
        if(k!=j) sra[j]->next=sra[k];
    }
}
int main(){
    for(int i=0;i<26;i++) dog[i]=log(pr[i])*(1ull<<52)+0.5;
    int fd=open("/usr/share/dict/british-english-insane",O_RDONLY);
    struct stat fdstat;
    fstat(fd,&fdstat);
    words=malloc(fdstat.st_size+1);
    read(fd,words,fdstat.st_size);
    words[fdstat.st_size]=0;
    close(fd);
    int lines=2;
    for(char *p=words;*p;p++) if(*p=='\n') lines++;
    an=malloc(lines*sizeof(anatype));
    sra=malloc(lines*sizeof(anatype *));
    srb=malloc(lines*sizeof(anatype *));
    for(cursor=words;;anmax++){
        char *wp=getword();
        if(!wp) break;
        an[anmax].word=wp;
        an[anmax].next=0;
        sra[anmax]=an+anmax;
    }
    ncpu=get_ncpu(); if(ncpu>4) ncpu=4;
    workers(ncpu){
        forall(int i=0;i<anmax;i++){
            an[i].key=strkey(an[i].word);
        }
        logcpu=log2(ncpu)+2.99;
        msortparallel(anmax,0,sra,srb);
        for(int n=0;n<ncpu;n++){
            spawn(firstprivate(n)) linkslice(n);
        }
        sync;
    }
    for(int i=0;i<anmax;i++){
        if(an[i].next){
            fputs(an[i].word,stdout);
            anatype *bp=an[i].next; an[i].next=0;
            char *s=": ";
            while(bp->next){
                fputs(s,stdout); s=", ";
                fputs(bp->word,stdout);
                anatype *op=bp; bp=bp->next; op->next=0;
            }
            fputc('\n',stdout);
        }
    }
    return 0;
}
I'm not sure it's that much easier to read, though.

The bar chart now appears as

Image
Last edited by ejolson on Wed Aug 07, 2019 7:46 pm, edited 2 times in total.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 5:28 pm

From here (4 hour 45 minutes) the concept of less than a second sounds like mission impossible.

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 6:38 pm

Just now the concept of getting down from ~1 second to half a second is giving me headache. Before even thinking about going multi-core...
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 8:41 pm

What is your best interpretive language result?

Does the language used have to run on Linux?

ejolson
Posts: 4480
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 11:33 pm

John_Spikowski wrote:
Fri Aug 02, 2019 8:41 pm
Does the language used have to run on Linux?
Ideally the program would run on the Raspberry Pi. For example, a language available only under RISC OS, Windows IOT or Plan 9 would be interesting. In that case, one could copy the insane British dictionary from Raspbian to the other operating system and complete the challenge.

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 2:39 am

That would be 8.4 seconds in Javascript. See bar chart above.

As this is the Raspberry Pi forum solutions had better run on a Pi, preferably all of them.

Which basically means it had better run on Linux otherwise it may as well not exist.

I guess a bare metal solution that can be flashed to SD and runs from directly from boot up would be interesting...
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4480
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 2:41 am

ejolson wrote:
Sun Jul 28, 2019 1:51 am
Unfortunately, Micro Center sold their Pi 4 stock last week. Therefore, except for a couple Zeros and SD cards, my shopping resulted in failure.
Today the shopping went better and thanks to my generous wife I now have a Raspberry Pi 4B. After I figure out cooling and a case, it should be possible to rerun the programs and make a new bar chart. Woohoo! I think the phase of the moon may be about to change.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 3:44 am

Does it feel like adopting kids when you get a new RPi?

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 4:28 am

If you are going to make a new bar chart perhaps you could add version 5 of the Rust solution or replace version 4. It shaves off 10% running time on a Pi 3:

Code: Select all

//
// insane-british-anagram-5.rs - Find words that have valid anagrams
//                               Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-08-02
// 

#![allow(non_snake_case)]

use std::fs::File;
use std::io::Read;
use std::io::{self, Write};
use std::collections::HashMap;

#[derive(Copy, Clone)]
struct SliceSpec {
    begin: usize,
    end: usize,
}

#[derive(Copy, Clone)]
struct AnagramSet {
    wordSlices : [SliceSpec; 17],
    size : usize,
}

impl AnagramSet {
    fn new(word: SliceSpec) -> AnagramSet {
        return AnagramSet {
             wordSlices: [word; 17],
             size: 1,
        };
    }    
    fn push(&mut self, slice: SliceSpec) {
        self.wordSlices[self.size] = slice;
        self.size = self.size + 1;
    }
}

fn readInsaneBritishDictionary(mut dictionary: &mut Vec<u8>) -> std::io::Result<()> {
    let mut file = File::open("/usr/share/dict/british-english-insane")?;
    file.read_to_end(&mut dictionary)?;
    return Ok(());
}

fn isLowerCase (c : &u8) -> bool {
    if (*c < 'a' as u8) || (*c > 'z' as u8) {
        return false;
    } else {
        return true;
    }
}

// One prime number for each lower case letter of the alphabet
static PRIMES: [u64; 26] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

fn anagrams() {
    let stdout = io::stdout();
    let mut stdoutHandle = stdout.lock();

    // Container for sets of anagrams 
    // An anagram set is simply an array of offets into the anagramSets array 
    let mut anagramMap: HashMap<u64, usize> = HashMap::new(); 

    // Vector of AnagramSets
    let mut anagramSets: Vec<AnagramSet> = Vec::new(); 
    let mut anagramSetsCount: usize = 0;

    // An ordered index of anagram set keys 
    let mut index: Vec<u64> = Vec::new();

    let mut dictionary = Vec::new();

    match readInsaneBritishDictionary(&mut dictionary) {    // Takes 25ms on PC
        Ok(()) => {
            let mut wordIndex = 0;
            let mut characterIndex = 0;
            let mut reject = false;
            let mut hash : u64 = 1;

            for c in &dictionary  {
                if isLowerCase(&c) {
                    // We are scanning a valid word
                    let primeIndex = (c -  97) as usize;  
                    hash = hash.wrapping_mul(PRIMES[primeIndex]);
                    characterIndex = characterIndex + 1;
                } else if *c == '\n' as u8 {
                    // We have hit the end of a word, use the word if it's valid
                    if !reject {
                        // Do we have a word with this key (potential anagram)?
                        let wordSpec = SliceSpec {begin: wordIndex, end:characterIndex}; 
                        match anagramMap.get_mut(&hash) {
                            Some(anagramSetsCount) => {
                                // Found: Append it to the existing anagram set
                                anagramSets[*anagramSetsCount].push(wordSpec);
                            },
                            None => {
                                // Not found: Add it to the map as start of new anagram set.
                                // Make a new anagram set with one word in it.
                                let anagramSet = AnagramSet::new(wordSpec);
                                // Add the new anagram set to our list of anagram sets
                                anagramSets.push(anagramSet);
                                anagramMap.insert(hash, anagramSetsCount);
                                anagramSetsCount = anagramSetsCount + 1;

                                // And add the new anagram set to index
                                index.push(hash);
                            }
                        }
                    }
                    characterIndex = characterIndex + 1;
                    wordIndex = characterIndex;
                    hash = 1;
                    reject = false;
                } else {
                    // Invalid character
                    hash = 1;
                    reject = true;
                    characterIndex = characterIndex + 1;
                }
            }

            let mut output: String = "".to_string();
            for hash in index {
                match anagramMap.get(&hash) {
                    Some(AnagramSetsCount) => {
                        let size = anagramSets[*AnagramSetsCount as usize].size;   
                        if size > 1 {    
                            let mut separator = "";
                            let mut i = 0;
                            while i < size {
                                let begin = anagramSets[*AnagramSetsCount].wordSlices[i].begin;
                                let end = anagramSets[*AnagramSetsCount].wordSlices[i].end;
                                let slice = &dictionary[begin .. end];
                                let word = String::from_utf8_lossy(&slice).to_string();
                                output = output + &separator;
                                output = output + &word;

                                if i == 0 {
                                    separator = ": ";
                                } else {
                                    separator = ", ";
                                }
                                i = i + 1;
                            }
                            output = output + "\n";
                        }
                    },
                    _ => (),
                }
            }

            match stdoutHandle.write_all(output.as_bytes()) {
                Ok(()) => {
                },
                Err(e) => println!("Error writing reult {}", e),
            }
        },  
        Err(e) => {
            println!("Error reading dictionary: {}", e);
        }
    }
}

fn main() {
    anagrams();
}

Code: Select all

[email protected]:~/insane-british-anagram-rust $ cargo build --release
   Compiling insane-british-anagram v0.1.0 (/home/pi/insane-british-anagram-rust)
    Finished release [optimized] target(s) in 9.26s
[email protected]:~/insane-british-anagram-rust $ time target/release/insane-british-anagram > temp.txt

real    0m0.907s
user    0m0.718s
sys     0m0.189s
Not brilliant but better than a slap in the belly with a wet fish.

Edit: It occurred to me that while scanning the input and calculating prime hashes my previous code had to scan the characters twice, once to find a word and again to calculate it's hash. The latest code accumulates the hash as is scans characters, boom 10% time saving. Which is nice.
Memory in C++ is a leaky abstraction .

jalih
Posts: 118
Joined: Mon Apr 15, 2019 3:54 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 7:44 am

Heater,

To me Rust seems like unnecessary complex language (and yes, I program in PL/I :D ). What features do you like about it the most?

There is a comparison of V and other programming languages including Rust available here. There is simple example that fetches top Hacker News stories concurrently.

Below is my 8th programming language version of that program. I eliminated some locking by giving each task it's own slice of work:

Code: Select all

needs net/http

var ids
[] var, tasks

"https://hacker-news.firebaseio.com/v0/topstories.json" constant STORIES_URL
"https://hacker-news.firebaseio.com/v0/item/"           constant ITEM_URL_BASE

: get-ids
  STORIES_URL net:get if
    nip json> ids !
  else
    "Error retrieving data." throw
  then ;

: task  \ ids-slice --
  ( ITEM_URL_BASE "%s%s.json" s:strfmt net:get if
      nip json>
      "title" m:@ nip
      . cr
    else
      "Error retrieving data." throw
    then
  ) a:each! drop ;

: start-tasks
  tasks @
  ( ids @ swap -1 8 a:slice+ 1 ' task t:task-n a:push  ) 0 7 loop
  drop ;

: wait-tasks
  tasks @ t:wait ;

: app:main
  get-ids
  start-tasks
  wait-tasks
  bye ;

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 9:08 am

jalih,
What features do you like about it[Rust] the most?
I'm not sure I want to answer that question, having only been toying with Rust for six days and produced only two little programs in it. I'm far away from finding out what will annoy me about it. However... Given that C/C++ has been my language of choice for decades for many situations I could say the following attracts me to Rust:

1) It fixes a whole bunch of issues with C/C++ that I have been moaning about for decades. Crappy syntax, undefined behaviors, ridiculous complexity (C++).

2) It's the first language I have seen for a long time that offers something conceptually new. That is of course it's ideas about data sharing, aliasing, mutability ... generally checking as much as possible at compile time for safety. All in a compiled language with the speed of C/C++. Perhaps other obscure languages have done that but they have not made it onto my radar. The last I'm aware of, and used a lot, was Ada.

With that in mind I made some notes about what I liked and disliked about Rust as I was hacking together my first Rust programs. If you are interested in the thoughts of a newbie approaching the thing from C/C++:

Likes:

0. Operator precedence is same as C. - Saves confusing users of C/C++ etc.

1. No need for redundant parenthesis around condition in "if" and "while" statements. - I always wondered why C required those.

2. Curly braces required around conditional blocks, even if they are only one line. - Consistency makes for easier reading and maintaining.

3. No more potentially confusing "++", "--" operators.

4. No more clunky "for" loop syntax.

5. No global variables.

6. No messing with clunky include files.

7. No need for #define and other preprocessor kludges.

8. No more void pointer hacks required.

9. No forward declarations required.

10. No more out of bounds array access. I could remove some messy assertions from my code.

11. Array/Vector slices clean up a lot of pointer indirection mess.

12. Arrays/Vectors are sensibly initialized with "[x, y, z...]" rather than "{x, y, z...}".

13. When I translated my fast Fourier transform from C to Rust it ran correctly on the second attempt after getting rid of all the compile errors. On the first run it failed with an out of bounds array access which was a one character typo. That might have taken some head scratching without array bounds checking.

14. The "match" syntax is far nicer than "switch"/"case".

15. No need for make or CMake or other clunky build system files. Rust's Cargo build system is brilliant.

16. No more messy and error prone returning of error status and values mixed up in the same variable.

17. "u32" etc is much nicer than "uint32_t".

18. A lot of useful features of C++ without the mind bending complexity of C++.

19. Performance is up there with C/C++.

Dislikes:

None so far. Let me work on it... Oh wait:

0) Rust devs favor symbols with underscores, "snake_case". I hate underscores. By default the compiler complains about using camel case. Luckily that can be turned off.
To me Rust seems like unnecessary complex language...
I'm curious to know why you say that?

I don't see it's complexity as any greater than C and for sure it's a lot less than the brain damaging complexity of C++. For example take a look at my Fast Fourier Transform in C https://github.com/ZiCog/fftbench/blob/ ... fftbench.c and my redition of the same in Rust: https://github.com/ZiCog/fftbench/blob/ ... ftbench.rs. I find the latter easier on the eyes to read, as a bonus it has 25% less lines in it.

Or see the anagram solutions in C++ and Rust in this thread.

What is V? Looks like YAFL that I won't be paying attention to until it supports multiple platforms and has some community traction behind it.

Now 8th is what I call complex :)
Memory in C++ is a leaky abstraction .

jahboater
Posts: 5205
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 9:12 am

Heater wrote:
Sat Aug 03, 2019 9:08 am
0. Operator precedence is same as C. - Saves confusing users of C/C++ etc.
Really? even with & and | lower than == which has always been considered a mistake?
(although chosen with good reason at the time).
Heater wrote:
Sat Aug 03, 2019 9:08 am
2. Curly braces required around conditional blocks, even if they are only one line. - Consistency makes for easier reading and maintaining.

3. No more potentially confusing "++", "--" operators.

4. No more clunky "for" loop syntax.

5. No global variables.

6. No messing with clunky include files.

7. No need for #define and other preprocessor kludges.
Nothing mandates the use of these in C.
Nice to have the choice though, rather than a nanny state language forbidding them.
As for the curly braces in (2), I see your code uses a mixture, and worse, has stuff like this.

Code: Select all

if ((b0 < 0) || (b0 >= slen)) rangeError = 1;
Worse still, you were complaining about the extra () in if statements, yet you have voluntarily added an extra four brackets here that are obviously not needed :)
Heater wrote:
Sat Aug 03, 2019 9:08 am
0) Rust devs favor symbols with underscores, "snake_case". I hate underscores. By default the compiler complains about using camel case. Luckily that can be turned off.
I don't like the "can be turned off" bit. Why cant identifier styles just be chosen by the user as in most languages?
/* BTW when I write a letter to my mother, I don't capitalize every word and remove all the spaces - do you?
hello_world is closer to normal English and is therefore easier to read. */

On a related note, I remember reading something about swift and its strong typing.
For example stuff like this is not allowed:

int32 a;
int16 b;
if( a == b )

which is perfectly sensible and safe (all int16 values can be safely represented in an int32 and the conversion is mandatory anyway on many CPU's). I would never use a language like that.
Ditto Pascal where two strings of differing lengths are (or were) regarded as incompatible types (how the heck are you supposed to write something as simple as strcpy() ? )
</rant>

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 10:17 am

Nice to have the choice though, rather than a nanny state type language forbidding them.
We obviously have a philosophical difference there.

From purely selfish motive I would be happy if every one who ever published code that I end up reading was forced to use the same style and bared from using confusing constructs. It's not just me, work on any large project and they will likely have a style guides and coding rules that get checked at review time. Why, because it's long recognized that such consistency reduces overall developer headache and confusion in the long run. The good of all outweighs personal preference.

So:

Requiring braces around even one line conditionals is more consistent and hence less prone to error my maintainers. Having seen a number of bugs caused by new statements getting added to "if"s where the lack of braces was unnoticed, and having done it myself a few times, I can only requiring braces as a good idea. There is no down side, except some programmers needless whining about a couple of extra characters they have to type.

The ++ and -- operators along with += etc are totally redundant. Again consistency rules. Again there is no down side, except some programmers needless whining about a couple of extra characters they have to type.

Global variables are totally redundant. Especially when you can define functions inside functions. They are the cause of much confusion and error in large projects. Most of the world recommends not to use globals. There is no down side to banning them.

Where have you ever seen a C program that does not use include files? Certainly you can get away without them, if you can accept a lot of compiler warnings. Or perhaps concatenate all your source into one file. I don't think so. Include files are totally redundant, all the information you need is in the actual source, why have to write it twice? There is no down side to not having include files.

As for #defines and the prepocessor, I always though of that as a hack on top of C that get's things done you can't express in C proper. Other languages have far better ways to make the equivalent of macros.
I don't like the "can be turned off" bit. Why cant identifier styles just be chosen by the user as in C, or most languages.
/* when I write a letter to my mother, I don't remove all the spaces and capitalize every word. aaa_bbb is closer to normal english and therefore easer to read. IMHO */
You have a good point. In order to be consistent in my own arguments for, well, consistency I should be happier if I could not turn off that snake_case rule. I should just man up and use underscores in the Rust way.

You know, actually putting spaces between words in human languages to make things easier to read is quite a recent idea. Even now in languages like Finnish they stick lots of little words together to make one horrible long word. Nowhere have I seen underscores used. Anyway, the point is not which way is best but to achieve a common consistency. My English teacher would complain like hell if my spelling, capitalization, punctuation and paragraph layout was all wrong, so should a compiler.

I'm in to minds about that automatic promotion of uint16_t into uint32_t and the like. On the one hand they are obviously correct and safe things to do. On the other hand the fact that you are tying to do it may be hinting at an error on your part. More importantly when someone comes to read your code and sees that they may have to wonder if you have made a mistake or not. But if they see you wrote a type conversion there to stop the compiler complaining, they can be more confident that what they are reading is what you really had in mind. It's all about communicating programmer intention to the next programmer to read the code.

Oh, I think you are right about that operator precedence thing. Looks like they have actually fixed that issue.
Memory in C++ is a leaky abstraction .

jahboater
Posts: 5205
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 1:18 pm

Heater wrote:
Sat Aug 03, 2019 10:17 am
From purely selfish motive I would be happy if every one who ever published code that I end up reading was forced to use the same style and bared from using confusing constructs. It's not just me, work on any large project and they will likely have a style guides and coding rules that get checked at review time. Why, because it's long recognized that such consistency reduces overall developer headache and confusion in the long run. The good of all outweighs personal preference.
Yes, I like (good) coding standards. But I think they are the responsibility of the project management not the language.
Heater wrote:
Sat Aug 03, 2019 10:17 am
The ++ and -- operators along with += etc are totally redundant. Again there is no down side.
Again, looking at your own published code (fftbench.c posted above) you use +=, <<=, etc throughout the code (look at the integer square root function for a good example).
If you don't like them so much, why use them?
And surprisingly with modern compilers, things like c = *ptr++ may actually produce faster code (it will use auto-increment addressing if available). a = a + 3 might have to evaluate "a" twice, a += 3 is more than just a convenience.
Heater wrote:
Sat Aug 03, 2019 10:17 am
Global variables are totally redundant. Especially when you can define functions inside functions. They are the cause of much confusion and error in large projects. Most of the world recommends not to use globals. There is no down side to banning them.
Agreed 100%. There is no doubt their use should be strictly minimized. But banning them? No. That could create horrid code where everything has to be passed all the way down as lots of extra parameters.

I have a piece of code written long long ago I think by Ken Thompson and/or Denis Ritchie. It is the early Regular Expression code, the compiler and the DFA handler with recursive back tracking. It was published as a header with UNIX and originally intended to be the common RE code for sed, ed, grep etc, 40+ years ago.

This code has global variables and goto's. Yet as far as I am aware it has zero defects, is totally portable, never gets complaints from static or run-time checkers, over several decades of use on countless platforms and countless compilers. It is easy to read, robust, and very fast.

I have tried more than once to remove the goto's and global variables, and in all cases the code ends up larger, slower, more bloated, and less readable. The original authors were much cleverer than I am. Now I still use their code, goto's and all, with my own minor performance tweaks.

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 3:50 pm

jahboater,
I like (good) coding standards. But I think they are the responsibility of the project management not the language.
OK, good, we are on the same page.

For a single project the project manger may dictate the standards, the team has to put up and shut up.

Now what if we scale things up a bit? Google, for example has hundreds of projects and project managers. As far as I can tell they have exactly one coding standard. All their code lives in one repository and is subject to the same company wide standards. project managers don't get to have a say in that.

Let's scale it up some more. The "project" is Free and Open Source software. For example all those modules available to Python users with pip, or nodejs users with npm or Rust users with cargo. The "team" is potentially every one of the 7 billion people on this planet. At this point individuals, project managers, companies should get out of the way and a common standard put in place for the project. What better way to do that than have the compiler do it?
...looking at your own published code (fftbench.c posted above) you use +=, <<=, etc throughout the code (look at the integer square root function for a good example).
If you don't like them so much, why use them?
Hangs head in shame, "It's a fair cop. Society is to blame". Guilty as charged.

What can I say. It's an old habit going back to 1980. It's "the way it's done in C". That is the way they teach it. Perhaps I had not thought about it much when I wrote that code. Actually, looking at it now, given the clunky nature of C "for loops" using ++ makes them a bit less clunky.
And surprisingly with modern compilers, things like c = *ptr++ may actually produce faster code (it will use auto-increment addressing if available). a = a + 3 might have to evaluate "a" twice, a += 3 is more than just a convenience.
I thought it was the other way around. That modern compilers would detect that +1 as an increment and use what ever instructions do that best. This calls for an experiment...

I did not understand the "a = a + 3" thing. Surely "a" is going to be clobbered so there is no need to evaluate it twice.
Agreed 100%. There is no doubt their use should be strictly minimized. But banning them? No. That could create horrid code where everything has to be passed all the way down as lots of extra parameters.
Hmm... perhaps, maybe, if they are only global to a module, "static". I think the need for this kind of global is due to limitations on the rest of C. Not being able to define functions inside functions for example.

I dare not say a word about the work of Ken Thompson or Denis Ritchie. But if that header has globals in it I'm sure there is a way I could cause it to fail in interesting ways if I used :)


Edit: When I replace all those short circuit operatirs, "++", "|=", etc, with their long hand equivalents in fftbench it runs at exactly the same speed.
Memory in C++ is a leaky abstraction .

User avatar
bensimmo
Posts: 4330
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 4:32 pm

i was reading this earlier, partII is somewhat related to what you are discussing at the moment.

Another point if view
https://medium.com/dwelo-r-d/we-rewrote ... 8867c61b67

partIi
https://medium.com/dwelo-r-d/abusing-fi ... e6774289fd

jahboater
Posts: 5205
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 4:53 pm

Heater wrote:
Sat Aug 03, 2019 3:50 pm
What better way to do that than have the compiler do it?
Interesting idea. But I am sorry, I still don't like it. Imagine if Denis Ritchie had dictated the coding standards for all C programmers for all time - 48 years ago. (I doubt if C would have been so successful, people would rebel against the autocracy). See this by the way: https://www.gnu.org/prep/standards/
Heater wrote:
Sat Aug 03, 2019 3:50 pm
I thought it was the other way around. That modern compilers would detect that +1 as an increment and use what ever instructions do that best. This calls for an experiment...
Experiment on a machine with auto-increment addressing like the Pi. You are actually right, the compiler will use all sorts of things and merge +n's from all over place into single address offsets and such. However I have noticed that c = *ptr++ does more often use say "ldrb r1,[r2,1]" on the Pi. Long ago I used to think that *p++ derived from the PDP11 "MOV R1,(R2)+" but Mr Ritchie carefully points out that that instruction post dated C.
Heater wrote:
Sat Aug 03, 2019 3:50 pm
I dare not say a word about the work of Ken Thompson or Denis Ritchie. But if that header has globals in it I'm sure there is a way I could cause it to fail in interesting ways if I used :)
I never knew why it was in a header file </usr/include/regexp.h>. I avoid including actual code in header files in C. Done in the days before threads were invented I suppose!

jahboater
Posts: 5205
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 5:17 pm

Heater wrote:
Sat Aug 03, 2019 3:50 pm
Edit: When I replace all those short circuit operatirs, "++", "|=", etc, with their long hand equivalents in fftbench it runs at exactly the same speed.
Not surprising I guess.

So which version did you think was the most readable?

I have been looking at C for so many decades that I find the += form quicker to mentally scan. For a = a + n you have to check its actually "a" used in both places, with a += n you can see at a quick glance that its just adding n to a.
Just me, no right or wrong way.

User avatar
DougieLawson
Posts: 37666
Joined: Sun Jun 16, 2013 11:19 pm
Location: A small cave in deepest darkest Basingstoke, UK
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 5:22 pm

jahboater wrote:
Sat Aug 03, 2019 5:17 pm
Heater wrote:
Sat Aug 03, 2019 3:50 pm
Edit: When I replace all those short circuit operatirs, "++", "|=", etc, with their long hand equivalents in fftbench it runs at exactly the same speed.
Not surprising I guess.

So which version did you think was the most readable?

I have been looking at C for so many decades that I find the += form quicker to mentally scan. For a = a + n you have to check its actually "a" used in both places, with a += n you can see at a quick glance that its just adding n to a.
Just me.
I prefer i++ to i =+ 1 to i = i + 1. If you want to go to extremes write it out COBOL style

Code: Select all

            ADD FOO TO BAR GIVING FOOBAR ROUNDED.
Note: Any requirement to use a crystal ball or mind reading will result in me ignoring your question.

Any DMs sent on Twitter will be answered next month.
All non-medical doctors are on my foes list.

jahboater
Posts: 5205
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 5:31 pm

DougieLawson wrote:
Sat Aug 03, 2019 5:22 pm
jahboater wrote:
Sat Aug 03, 2019 5:17 pm
Heater wrote:
Sat Aug 03, 2019 3:50 pm
Edit: When I replace all those short circuit operatirs, "++", "|=", etc, with their long hand equivalents in fftbench it runs at exactly the same speed.
Not surprising I guess.

So which version did you think was the most readable?

I have been looking at C for so many decades that I find the += form quicker to mentally scan. For a = a + n you have to check its actually "a" used in both places, with a += n you can see at a quick glance that its just adding n to a.
Just me.
I prefer i++ to i =+ 1 to i = i + 1. If you want to go to extremes write it out COBOL style

Code: Select all

            ADD FOO TO BAR GIVING FOOBAR ROUNDED.
:)
Its about 40 years or so since I wrote any COBOL. I soon learned to copy the environment division around from program to program ...

By the way, "++i" is often more efficient that "i++" if the order doesn't matter.

User avatar
DougieLawson
Posts: 37666
Joined: Sun Jun 16, 2013 11:19 pm
Location: A small cave in deepest darkest Basingstoke, UK
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 6:05 pm

jahboater wrote:
Sat Aug 03, 2019 5:31 pm
:)
Its about 40 years or so since I wrote any COBOL. I soon learned to copy the environment division around from program to program ...

By the way, "++i" is often more efficient that "i++" if the order doesn't matter.
There's about 50 lines of noise (including a large chunk of the DATA DIVISION) that moves from program to program with any COBOL I write. I've worked with a professional COBOL programmer in the past year (getting V6.2 running), I learned more from him in a week than I'd learned hacking ugly COBOL for the previous 30+ years.
Note: Any requirement to use a crystal ball or mind reading will result in me ignoring your question.

Any DMs sent on Twitter will be answered next month.
All non-medical doctors are on my foes list.

ejolson
Posts: 4480
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 7:14 pm

Heater wrote:
Sat Aug 03, 2019 3:50 pm
I did not understand the "a = a + 3" thing.
That is a particularly confusing expression, because there is no number except perhaps infinity that is equal to itself after adding three.

While gcc and clang create identical code for many of the equivalent but longer expressions, it is likely
  • Kernighan and Ritchie's original C compiler did not.
  • The short-hand notation was considered clearer.
There are indications, including the remarkable longevity of Unix and the C programming language, that the team charged with creating a word processor for the patent office at Bell Labs knew what they were doing. Moreover, even recently created languages such as Go, Kotlin, Rust and Swift seem to have been designed to be easy for people already familiar with C.

According to Carol Nichols, member of the Rust programming language core team and co-author of The Rust Programming Language book,
Carol Nichols wrote:The biggest strength of Rust is that it's an empowering technology. To write extremely fast code with a low memory footprint previously meant using C or C++. However, using those languages in production code requires you to manage memory manually and know all the ways you might cause undefined behavior.
What's particularly striking is that being able to code an efficient version of any algorithm an individual is able to think up is exactly the kind of liberation though computer literacy to which the upcoming title of this thread refers.

Although it is important to reflect upon how language features and practical performance fit together, one must not forget the more important goal is avoiding a digital apocalypse in which emotional but powerful robots control a computer-illiterate human population. When it comes to avoiding that apocalypse, being able to read and write any kind of computer program is much more powerful than a pea shooter.

Image

Back on topic, since one of the main advantages of Rust is memory safety in concurrent programming through ownership of mutable data types, it would sure be nice to see a parallel version on the insane British dictionary challenge in Rust.
Last edited by ejolson on Sat Aug 03, 2019 8:30 pm, edited 2 times in total.

User avatar
rpdom
Posts: 16365
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 7:32 pm

DougieLawson wrote:
Sat Aug 03, 2019 6:05 pm
There's about 50 lines of noise (including a large chunk of the DATA DIVISION) that moves from program to program with any COBOL I write. I've worked with a professional COBOL programmer in the past year (getting V6.2 running), I learned more from him in a week than I'd learned hacking ugly COBOL for the previous 30+ years.
I don't remember having that much common code in my COBOL programs from my years as a professional COBOL programmer. There was still a fair chunk though.

Heater
Posts: 14732
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Sat Aug 03, 2019 7:43 pm

ejolson,
Back on topic, since one of the main advantages of Rust is memory safety in concurrent programming through ownership of mutable data types, it would sure be nice to see a parallel version on the insane British dictionary challenge in Rust.
Dammit ej. Guess what I have spent this evening doing. Yes, that's right, getting to grips with threads in Rust!

I did not want to get into a concurrent anagram finder, rather, it's with a view to trying some rust on one of our real projects that could do with some threads. But now you are egging be on....

Actually, I don't have much of a clue what a good way to parallelize that beast would be...
Memory in C++ is a leaky abstraction .

Return to “General programming discussion”