Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 5:08 am

Wow, impressive.

What's up with your Pi3, qanagrams is 25% faster on mine:

Code: Select all

$ g++ -O3 insane-british-anagram.cpp -o insane-british-anagram -Wall -Wextra
$ gcc -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ time ./insane-british-anagram > temp.txt

real    0m1.471s
user    0m1.418s
sys     0m0.050s
$ time ./quanagrams > temp.txt

real    0m0.609s
user    0m0.556s
sys     0m0.052s
$ md5sum temp.txt
addeda36a4631afc983f3bff13742e36  temp.txt
Sweet.

Sadly qanagrams fails to run on my Debian x86-64 PC:

Code: Select all

$ gcc -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
$ gcc -O0 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
$ clang -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
$ clang -O0 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
Clang diagnoses it as follows:

Code: Select all

$ clang -O1 -g -fsanitize=address -fno-omit-frame-pointer -o qanagrams qanagrams.c
[email protected]:/mnt/c/Users/heater/conveqs$ ./qanagrams
=================================================================
==29437==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f0c63cf0168 at pc 0x00000052f4f8 bp 0x7fffec977390 sp 0x7fffec977388
READ of size 1 at 0x7f0c63cf0168 thread T0
    #0 0x52f4f7 in buildAnagramList /mnt/c/Users/heater/conveqs/qanagrams.c:210:14
Memory in C++ is a leaky abstraction .

Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 5:30 am

I get it, you are advancing dictionary[position] off the end of dictionary in buildAnagramList() :

Code: Select all

    char letter = dictionary[position];
    if (letter < 'a' || letter > 'z') {
      while (dictionary[position] != '\n') {
        position++;                                                     <<---- Oops!
        if (position == dictionarySize) {                  <<---  This hack gets it working.
            return;
        }
      }
      continue;
    }
With that hack in place it now runs on my PC. Pretty speedily too:

$ clang -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ clang++ -O3 insane-british-anagram.cpp -o insane-british-anagram -Wall -Wextra
$ time ./insane-british-anagram | md5sum
addeda36a4631afc983f3bff13742e36 -

real 0m0.360s
user 0m0.203s
sys 0m0.141s
$ time ./qanagrams | md5sum
addeda36a4631afc983f3bff13742e36 -

real 0m0.220s
user 0m0.109s
sys 0m0.109s
Memory in C++ is a leaky abstraction .

User avatar
Paeryn
Posts: 2900
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 5:37 am

Heater wrote:
Tue Jul 30, 2019 5:08 am
Wow, impressive.

What's up with your Pi3, qanagrams is 25% faster on mine:

Code: Select all

$ g++ -O3 insane-british-anagram.cpp -o insane-british-anagram -Wall -Wextra
$ gcc -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ time ./insane-british-anagram > temp.txt

real    0m1.471s
user    0m1.418s
sys     0m0.050s
$ time ./quanagrams > temp.txt

real    0m0.609s
user    0m0.556s
sys     0m0.052s
$ md5sum temp.txt
addeda36a4631afc983f3bff13742e36  temp.txt
Sweet.

Sadly qanagrams fails to run on my Debian x86-64 PC:

Code: Select all

$ gcc -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
$ gcc -O0 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
$ clang -O3 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
$ clang -O0 qanagrams.c -o qanagrams -Wall -Wextra
$ ./qanagrams
Segmentation fault (core dumped)
Clang diagnoses it as follows:

Code: Select all

$ clang -O1 -g -fsanitize=address -fno-omit-frame-pointer -o qanagrams qanagrams.c
[email protected]:/mnt/c/Users/heater/conveqs$ ./qanagrams
=================================================================
==29437==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f0c63cf0168 at pc 0x00000052f4f8 bp 0x7fffec977390 sp 0x7fffec977388
READ of size 1 at 0x7f0c63cf0168 thread T0
    #0 0x52f4f7 in buildAnagramList /mnt/c/Users/heater/conveqs/qanagrams.c:210:14
Oops, it can run off the end of the buffer when ignoring lines that don't start with a letter, like the very last line does Kira added a newline to the end of the buffer just incase there wasn't one. This is the real error, I thought a newline was added to the end, the loops terminate correctly if the very last character is a newline. The only time it checks if it is at the end is at the start of a word/line but Kira had put \0 as the last character rather than \n so the buffer over-ran looking for the end of line.
Change this line in readWordFile() so as to write '\n' rather than '\0'

Code: Select all

// change this line in readWordFile() from
newWords[fileSize] = '\0';  // Make sure last word is terminated
// to
newWords[fileSize] = '\n';  // Make sure last word is terminated
<edited the above fix, simpler than what I originally thought, and is what was intended. Note to self: don't write programs when you haven't slept>

Not sure why your RPi is faster, maybe yours jumped to max speed earlier than mine? Mine's on a pretty much vanilla Buster at the moment, I'll change back to Stretch later and run it on that, I have gcc 9.1 installed on the Stretch SD card too.

I was surprised that Kira's binary tree beat C++'s hashmap, especially as there is no balancing being done on the tree. I haven't taught Kira about red-black trees and didn't feel like adding it for him as the bluetooth keyboard I have for the RPi is horrible to type on for anything other than quick edits. I wonder how balanced that tree is currently? Casper said he'll run up it later and tell me, he likes climbing trees but is no good at typing.
Last edited by Paeryn on Tue Jul 30, 2019 1:14 pm, edited 3 times in total.
She who travels light — forgot something.

User avatar
bensimmo
Posts: 4457
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 7:46 am

Heater wrote:
Tue Jul 30, 2019 3:25 am
bensimmo,
I really want a browser based solution to sit alongside the prime one, so I can test on various devices.
Trust me, that notion already crossed my mind :)

It's not going to happen. Unlike the browser based fibo challenge the anagram challenge has the issue of getting the nearly 7 megabytes of input data into the browser and producing a gigantic output. Not impossible, just crazy! It requires creating I/O plumping that I don't want to spend time fiddling with.
:-)
...when you get bored then ...

Compared to a click-bait web site, you input and output are nothing for a web browser to deal with ;-)

I still surprised the Pi4 is so slow on the fibo-browser bench. I'd have thought 5 or so years and the large steps in architecture would have increased the speed. But the old defunct ARM/Intel tablet setups still cut the mustard.
It a shame these tablets are completely locked down that they cannot be rooted and no alternative modern OS can be put on them, so resigned to the electrical rotting heap next to my none recyclable packaging.

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 3:32 pm

Paeryn wrote:
Tue Jul 30, 2019 5:37 am
I was surprised that Kira's binary tree beat C++'s hashmap, especially as there is no balancing being done on the tree. I haven't taught Kira about red-black trees and didn't feel like adding it for him as the bluetooth keyboard I have for the RPi is horrible to type on for anything other than quick edits. I wonder how balanced that tree is currently? Casper said he'll run up it later and tell me, he likes climbing trees but is no good at typing.
As a Pi 4 is not available, I have created the Insane British Anagram Bar Chart of Fame using a Raspberry Pi 3B+.

Image

Remember the goal is to avoid the digital apocalypse by writing code that works. Provided the code runs fast enough to finish before the reboot, speed is secondary.

To save work some of the timing results were taken from the forum. If you think there is an error or don't like the color assigned to your name, please let me know so I can change things. The files used to make the chart are available here. The chart and files will be updated as time goes on.

Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 4:13 pm

That's pretty.

I'm getting results on mp Pi 3 that make qanagrams about 0.1 slower and iba-2rs about 0.1 quicker. But I'm not quibbling. Still half the speed of the leader :(

What's the secret there? Do I need to write my own binary tree in Rust?
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 5:20 pm

I'm going to do another submission using the HASH extension module I have some issues AIR and I need to resolve with the module.

My initial test show the extension module is much faster than using an associative array.

Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 6:17 pm

Version 3 of my Rust solution looks like it overtakes managram.c on my Pi 3
Not sure really, can't find any source for managram.c.

Code: Select all

[email protected]:~ $ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-3.rs
[email protected]:~ $ time ./insane-british-anagram-rs > temp1.txt
real    0m1.084s
user    0m0.973s
sys     0m0.111s
The code:

Code: Select all

//
// insane-british-anagram-3.rs - Find words that have valid anagrams
//                               Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-07-30
// 

#![allow(non_snake_case)]

use std::fs::File;
use std::io::Read;
use std::io::{self, Write};
use std::collections::HashMap;

struct SliceSpec {
    begin: usize,
    end: usize,
}

fn readInsaneBritishDictionary(mut dictionary: &mut Vec<u8>) -> std::io::Result<()> {
    let mut file = File::open("/usr/share/dict/british-english-insane")?;
    file.read_to_end(&mut dictionary)?;
    return Ok(());
}

fn primeHash (slice: &[u8]) -> u64 {
    // One prime number for each lower case letter of the alphabet
    let primes: [u64; 26] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

    let mut hash : u64 = 1;
    for c in slice {
        let index = (c -  97) as usize;  
        hash = hash.wrapping_mul(primes[index]);
    }
    return hash;
}

fn isLowerCase (c : &u8) -> bool {
    if (*c < 'a' as u8) || (*c > 'z' as u8) {
        return false;
    } else {
        return true;
    }
}

fn main() {
    let stdout = io::stdout();
    let mut stdoutHandle = stdout.lock();

    // Map container for sets of anagrams 
    // An anagram set is simply a vector of pointers to word strings
    let mut anagramMap: HashMap<u64, Vec<SliceSpec>> = HashMap::new();

    // An ordered index of anagram set keys 
    let mut index: Vec<u64> = Vec::new();

    let mut dictionary = Vec::new();
    match readInsaneBritishDictionary(&mut dictionary) {
        Ok(()) => {
            let mut wordIndex = 0;
            let mut characterIndex = 0;
            let mut reject = false;
            for c in &dictionary  {
                if isLowerCase(&c) {
                    // We are scanning a valid word
                    characterIndex = characterIndex + 1;
                } else if *c == '\n' as u8 {
                    // We have hit the end of a word, use the word if it's valid
                    if !reject {
                        // Do we have a word with this key (potential anagram)?
                        let word = &dictionary[wordIndex .. characterIndex];
                        let hash = primeHash(&word);
                        //let string = String::from_utf8_lossy(&word).to_string();
                        let wordSpec = SliceSpec {begin: wordIndex, end:characterIndex}; 
                        match anagramMap.get_mut(&hash) {
                            Some(anagramSet) => {
                                // Found: Append it to the existing anagram set
                                anagramSet.push(wordSpec);
                            },
                            None => {
                                // Not found: Add it to the map as start of new anagram set.
                                let mut anagramSet: Vec<SliceSpec> = Vec::new();
                                anagramSet.push(wordSpec);
                                anagramMap.insert(hash, anagramSet);

                                // And add the new anagram set to index
                                index.push(hash);
                            }
                        }
                    }
                    characterIndex = characterIndex + 1;
                    wordIndex = characterIndex;
                    reject = false;
                } else {
                    // Invalid character
                    reject = true;
                    characterIndex = characterIndex + 1;
                }
            }

            let mut output: String = "".to_string();
            for hash in index {
                match anagramMap.get(&hash) {
                    Some(anagramSet) => {
                        if anagramSet.len() > 1 {
                            let mut slice = &dictionary[anagramSet[0].begin .. anagramSet[0].end];
                            let mut word = String::from_utf8_lossy(&slice).to_string();
                            output = output + &word;
                            let mut separator = ": ";
                            for wordSlice in &anagramSet[1..] {
                                slice = &dictionary[wordSlice.begin .. wordSlice.end];
                                word = String::from_utf8_lossy(&slice).to_string();
                                output = output + &separator;
                                output = output + &word;
                                separator = ", ";
                            }
                            output = output + "\n";
                        }
                    },
                    _ => (),
                }
            }

            match stdoutHandle.write_all(output.as_bytes()) {
                Ok(()) => (),
                Err(e) => println!("Error writing reult {}", e),
            }

        },
        Err(e) => {
            println!("Error reading dictionary: {}", e);
        }
    }
}
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 6:51 pm

Heater wrote:
Tue Jul 30, 2019 6:17 pm
Version 3 of my Rust solution looks like it overtakes managram.c on my Pi 3
Not sure really, can't find any source for managram.c.

Code: Select all

[email protected]:~ $ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-3.rs
[email protected]:~ $ time ./insane-british-anagram-rs > temp1.txt
real    0m1.084s
user    0m0.973s
sys     0m0.111s
The code:

Code: Select all

//
// insane-british-anagram-3.rs - Find words that have valid anagrams
//                               Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-07-30
// 

#![allow(non_snake_case)]

use std::fs::File;
use std::io::Read;
use std::io::{self, Write};
use std::collections::HashMap;

struct SliceSpec {
    begin: usize,
    end: usize,
}

fn readInsaneBritishDictionary(mut dictionary: &mut Vec<u8>) -> std::io::Result<()> {
    let mut file = File::open("/usr/share/dict/british-english-insane")?;
    file.read_to_end(&mut dictionary)?;
    return Ok(());
}

fn primeHash (slice: &[u8]) -> u64 {
    // One prime number for each lower case letter of the alphabet
    let primes: [u64; 26] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

    let mut hash : u64 = 1;
    for c in slice {
        let index = (c -  97) as usize;  
        hash = hash.wrapping_mul(primes[index]);
    }
    return hash;
}

fn isLowerCase (c : &u8) -> bool {
    if (*c < 'a' as u8) || (*c > 'z' as u8) {
        return false;
    } else {
        return true;
    }
}

fn main() {
    let stdout = io::stdout();
    let mut stdoutHandle = stdout.lock();

    // Map container for sets of anagrams 
    // An anagram set is simply a vector of pointers to word strings
    let mut anagramMap: HashMap<u64, Vec<SliceSpec>> = HashMap::new();

    // An ordered index of anagram set keys 
    let mut index: Vec<u64> = Vec::new();

    let mut dictionary = Vec::new();
    match readInsaneBritishDictionary(&mut dictionary) {
        Ok(()) => {
            let mut wordIndex = 0;
            let mut characterIndex = 0;
            let mut reject = false;
            for c in &dictionary  {
                if isLowerCase(&c) {
                    // We are scanning a valid word
                    characterIndex = characterIndex + 1;
                } else if *c == '\n' as u8 {
                    // We have hit the end of a word, use the word if it's valid
                    if !reject {
                        // Do we have a word with this key (potential anagram)?
                        let word = &dictionary[wordIndex .. characterIndex];
                        let hash = primeHash(&word);
                        //let string = String::from_utf8_lossy(&word).to_string();
                        let wordSpec = SliceSpec {begin: wordIndex, end:characterIndex}; 
                        match anagramMap.get_mut(&hash) {
                            Some(anagramSet) => {
                                // Found: Append it to the existing anagram set
                                anagramSet.push(wordSpec);
                            },
                            None => {
                                // Not found: Add it to the map as start of new anagram set.
                                let mut anagramSet: Vec<SliceSpec> = Vec::new();
                                anagramSet.push(wordSpec);
                                anagramMap.insert(hash, anagramSet);

                                // And add the new anagram set to index
                                index.push(hash);
                            }
                        }
                    }
                    characterIndex = characterIndex + 1;
                    wordIndex = characterIndex;
                    reject = false;
                } else {
                    // Invalid character
                    reject = true;
                    characterIndex = characterIndex + 1;
                }
            }

            let mut output: String = "".to_string();
            for hash in index {
                match anagramMap.get(&hash) {
                    Some(anagramSet) => {
                        if anagramSet.len() > 1 {
                            let mut slice = &dictionary[anagramSet[0].begin .. anagramSet[0].end];
                            let mut word = String::from_utf8_lossy(&slice).to_string();
                            output = output + &word;
                            let mut separator = ": ";
                            for wordSlice in &anagramSet[1..] {
                                slice = &dictionary[wordSlice.begin .. wordSlice.end];
                                word = String::from_utf8_lossy(&slice).to_string();
                                output = output + &separator;
                                output = output + &word;
                                separator = ", ";
                            }
                            output = output + "\n";
                        }
                    },
                    _ => (),
                }
            }

            match stdoutHandle.write_all(output.as_bytes()) {
                Ok(()) => (),
                Err(e) => println!("Error writing reult {}", e),
            }

        },
        Err(e) => {
            println!("Error reading dictionary: {}", e);
        }
    }
}
That's a nice increase in speed. I'll add it to the chart.

All code including managram.c is provided at the link appearing at the end of the bar-chart post. For reference the same C choice was posted earlier here. The letter m was added to the original filename to indicate the use of a merge sort for the anagram keys.

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Tue Jul 30, 2019 8:49 pm

Heater wrote:
Tue Jul 30, 2019 6:17 pm
Version 3 of my Rust solution looks like it overtakes managram.c on my Pi 3
Not sure really, can't find any source for managram.c.

Code: Select all

[email protected]:~ $ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-3.rs
[email protected]:~ $ time ./insane-british-anagram-rs > temp1.txt
real    0m1.084s
user    0m0.973s
sys     0m0.111s
I tried compiling on my Raspberry Pi 3B+ and obtained

Code: Select all

$ rustc --version
rustc 1.24.1 (f2dda7fd7 2018-08-16)
$ rustc -C opt-level=3 -o iba-3 iba-3.rs
error[E0499]: cannot borrow `anagramMap` as mutable more than once at a time
  --> iba-3.rs:84:33
   |
75 |                         match anagramMap.get_mut(&hash) {
   |                               ---------- first mutable borrow occurs here
...
84 |                                 anagramMap.insert(hash, anagramSet);
   |                                 ^^^^^^^^^^ second mutable borrow occurs here
...
89 |                         }
   |                         - first borrow ends here

error: aborting due to previous error
It seems the Rust provisions for concurrent memory safety may not be properly followed by the code.

User avatar
Paeryn
Posts: 2900
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 12:09 am

ejolson wrote:
Tue Jul 30, 2019 8:49 pm
I tried compiling on my Raspberry Pi 3B+ and obtained

Code: Select all

$ rustc --version
rustc 1.24.1 (f2dda7fd7 2018-08-16)
$ rustc -C opt-level=3 -o iba-3 iba-3.rs
error[E0499]: cannot borrow `anagramMap` as mutable more than once at a time
  --> iba-3.rs:84:33
   |
75 |                         match anagramMap.get_mut(&hash) {
   |                               ---------- first mutable borrow occurs here
...
84 |                                 anagramMap.insert(hash, anagramSet);
   |                                 ^^^^^^^^^^ second mutable borrow occurs here
...
89 |                         }
   |                         - first borrow ends here

error: aborting due to previous error
It seems the Rust provisions for concurrent memory safety may not be properly followed by the code.
You need to use rustup to get the latest version of the compiler (1.36 at the time of writing) rather than relying on Raspbian's version, 1.24 was released back in March 2018 so is over a year old. The borrow checker was improved around 1.31 to be smarter about what is valid.

Casper has come down from the top of the anagram tree and reported that Kira's basic binary tree implementation has a maximum depth of 61 nodes and on average it has to search to a depth of 34 so balancing it might be worth it. And my new modem finally turned up just before 5pm!
She who travels light — forgot something.

Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 12:35 am

Ah, sorry, forgot to mention I installed a newer version of Rust. Like so:

Code: Select all

$ rustup install 1.36.0
info: syncing channel updates for '1.36.0-armv7-unknown-linux-gnueabihf'
info: latest update on 2019-07-04, rust version 1.36.0 (a53f9df32 2019-07-03)
info: downloading component 'rustc'
 75.2 MiB /  75.2 MiB (100 %)   4.2 MiB/s in 30s ETA:  0s
info: downloading component 'rust-std'
 65.0 MiB /  65.0 MiB (100 %)   4.4 MiB/s in 31s ETA:  0s
info: downloading component 'cargo'
info: installing component 'rustc'
 75.2 MiB /  75.2 MiB (100 %)   3.1 MiB/s in  1m  2s ETA:  0s
info: installing component 'rust-std'
 65.0 MiB /  65.0 MiB (100 %)   3.9 MiB/s in  1m  1s ETA:  0s
info: installing component 'cargo'
  4.2 MiB /   4.2 MiB (100 %)   3.3 MiB/s in 32s ETA:  0s

  1.36.0-armv7-unknown-linux-gnueabihf installed - rustc 1.36.0 (a53f9df32 2019-07-03)

info: checking for self-updates
[email protected]:~ $ rustup default 1.36.0
info: using existing install for '1.36.0-armv7-unknown-linux-gnueabihf'
info: default toolchain set to '1.36.0-armv7-unknown-linux-gnueabihf'

  1.36.0-armv7-unknown-linux-gnueabihf unchanged - rustc 1.36.0 (a53f9df32 2019-07-03)

[email protected]:~ $ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-3.rs
[email protected]:~ $ time ./insane-british-anagram-rs > temp1.txt

real    0m1.095s
user    0m0.995s
sys     0m0.100s
In short:

$ rustup install 1.36.0
$ rustup default 1.36.0
$ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-3.rs

Note: I just tried Rust version 1.38.0 on Buster and it segfaulted when trying to compile my code.
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 1:23 am

I would apply some WD40 to that Rusty program.

jalih
Posts: 123
Joined: Mon Apr 15, 2019 3:54 pm

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 3:39 pm

I modified my 8th anagram challenge program a little bit and got some speed improvement:

Code: Select all

\ Find anagrams in a word list:

m:new constant anamap
[
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
  73, 79, 83, 89, 97, 101 ] constant primes

: prime-hash  \ s -- s
  1 swap
  ( primes swap a:@ nip n:* )
  s:each! "%x" s:strfmt ;

: process-words \ word --
  /^[a-z]{3,}$/ r:match if
    r:str nip dup >r prime-hash
    anamap over m:@ null? if
      drop swap a:new r> a:push m:!
    else
      r> a:push 2drop
    then
  then drop ;

\ load and process the file requested:
: read-and-check-words  \ fname --
  f:slurp "Cannot open and read file, nothing to process" thrownull
  ' process-words s:eachline ;

\ for sorting the word keys array:
: key-val-cmp         \ key1 key2 -- cmp
  anamap swap m:@     \ key1 anamap key2val
  swap rot m:@        \ val2 anamap val1
  0 a:@ nip           \ val2 anamap v1
  rot 0 a:@ nip       \ map v1 v2
  s:cmp nip ;

: sort-keys-by-first-word \ keys1 -- keys2
  ' key-val-cmp a:sort ;

: app:main
  0 args "/usr/share/dict/british-english-insane" ?:
  read-and-check-words

  \ get an array of those keys which have at least 2 elements:
  a:new anamap ( a:len nip 1 n:> if a:push else drop then ) m:each swap

  \ print the anagram list in order of first-word:
  sort-keys-by-first-word
  m:@@ nip ( 0 a:@ ": " s:+ swap 0 a:- ", " a:join s:+ "\n" s:+ . ) a:each! drop
  bye ;
Running on ROCK64 and using 64 bit 8th binary:

Code: Select all

[email protected]:~/Downloads# time /opt/8th/bin/rpi64/8th anag.8th >a.txt

real	0m7,993s
user	0m7,408s
sys	0m0,552s
[email protected]:~/Downloads# 

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 8:45 pm

Curious.

Where you live do they use a , instead of a period for factional numbers.

Your seconds look weird for the time utility.

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 8:51 pm

jalih wrote:
Wed Jul 31, 2019 3:39 pm
Running on ROCK64 and using 64 bit 8th binary:

Code: Select all

[email protected]:~/Downloads# time /opt/8th/bin/rpi64/8th anag.8th >a.txt

real	0m7,993s
user	0m7,408s
sys	0m0,552s
[email protected]:~/Downloads# 
Thats a good improvement. I'll run it on my system and then add it to the insane bar chart of fame.

I converted the Basic program hanagram.bas to C to see how much difference the choice of language makes to the execution speed and readability of the code. The result was

Code: Select all

/*  hanagram.c -- Find anagrams
    Written July 31, 2019 by Eric Olson

    This program demonstrates using letter counts and a heap to
    find anagrams in the insane British dictionary.
*/

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

#include <stdio.h>
#include <stdint.h>

static char *words,*cursor;
static char *skipword(){
    for(;*cursor;cursor++){
        if(*cursor=='\n') return ++cursor;
    }
    return 0;
}
static char *getword(){
    do {
        char *r=cursor;
        while(islower(*cursor)) cursor++;
        if(*cursor=='\n'){
            *cursor++=0;
            return r;
        }
    } while(skipword());
    return 0;
}

typedef struct { char v[27]; } keytype;
typedef struct anatype {
    char *word; struct anatype *next; keytype key; } anatype;

static keytype strkey(char *p){
    keytype r;
    memset(&r,'0',26*sizeof(char)); r.v[26]=0;
    for(;*p;p++) r.v[*p-'a']++;
    return r;
}
static int anacmp(anatype *ap,anatype *bp){
    int r=strcmp(ap->key.v,bp->key.v);
    if(r) return r;
    return strcmp(ap->word,bp->word);
}
static anatype **hp; int hpind=0;
static void anaput(anatype *ap){
    int r=++hpind,s;
    for(hp[r]=ap;(s=r/2);r=s){
        if(anacmp(hp[r],hp[s])>=0) return;
        anatype *t=hp[s]; hp[s]=hp[r]; hp[r]=t;
    }
}
static anatype *anaget(void){
    if(!hpind) return 0;
    anatype *ap=hp[1];
    hp[1]=hp[hpind--];
    for(int s=1;;){
        int r0=2*s; 
        if(r0>hpind) return ap;
        int r1=r0+1;
        if(r1<=hpind && anacmp(hp[r0],hp[r1])>=0){
            if(anacmp(hp[r1],hp[s])>=0) return ap;
            anatype *t=hp[r1]; hp[r1]=hp[s]; hp[s]=t; s=r1;
        } else {
            if(anacmp(hp[r0],hp[s])>=0) return ap;
            anatype *t=hp[r0]; hp[r0]=hp[s]; hp[s]=t; s=r0;
        }
    }
}

static anatype *an; int anmax=0;
int main(){
    int fd=open("/usr/share/dict/british-english-insane",O_RDONLY);
    struct stat fdstat;
    fstat(fd,&fdstat);
    words=malloc(fdstat.st_size+1);
    read(fd,words,fdstat.st_size);
    words[fdstat.st_size]=0;
    close(fd);
    int lines=2;
    for(char *p=words;*p;p++) if(*p=='\n') lines++;
    an=malloc(lines*sizeof(anatype));
    hp=malloc(lines*sizeof(anatype *)); hp--;
    char *wp;
    for(cursor=words;(wp=getword());anmax++){
        an[anmax].word=wp;
        an[anmax].next=0;
        an[anmax].key=strkey(wp);
        anaput(an+anmax);
    }
    for(anatype *ap=anaget();ap;){
        anatype *op=ap,*bp=ap;
        for(;(ap=anaget());bp=ap){
            if(strcmp(ap->key.v,op->key.v)) break;
            bp->next=ap;
        }
        if(op!=bp) bp->next=op;
    }
    for(int i=0;i<anmax;i++){
        if(an[i].next){
            fputs(an[i].word,stdout);
            anatype *bp=an[i].next; an[i].next=0;
            char *s=": ";
            while(bp->next) {
                fputs(s,stdout); s=", ";
                fputs(bp->word,stdout);
                anatype *op=bp; bp=bp->next; op->next=0;
            }
            fputc('\n',stdout);
        }
    }
    return 0;
}
Running the program on a Raspberry Pi 3B+ led to the results

Code: Select all

$ make
gcc -O3 -mtune=Cortex-A53 -mcpu=Cortex-A53 -Wall -o hanagram hanagram.c -lm
$ time ./hanagram >hanagram.insane

real    0m1.681s
user    0m1.629s
sys     0m0.050s
$ time perl selfgrams.pl  >selfgrams.insane

real    0m14.559s
user    0m14.343s
sys     0m0.200s
$ md5sum *.insane
bec74aa3b31577edbb291aeb7269a4d5  hanagram.insane
bec74aa3b31577edbb291aeb7269a4d5  selfgrams.insane
Compared to the same algorithm coded in classic line-numbered Basic and compiled with FreeBasic, the C program was 27.7 times faster. It is more difficult to judge which program is easier to read.
Last edited by ejolson on Thu Aug 01, 2019 6:08 pm, edited 5 times in total.

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 9:03 pm

jalih wrote:
Wed Jul 31, 2019 3:39 pm
I modified my 8th anagram challenge program a little bit and got some speed improvement
On my 32-bit Raspberry Pi 3B+ I obtain

Code: Select all

$ time 8th ./anahashv2.8th >anahashv2.insane

real    0m12.973s
user    0m12.514s
sys     0m0.440s
$ diff anahashv2.insane selfgrams.insane 
732a733
> ad: da
1179a1181
> ag: ga
1366a1369
> ah: ha
1391a1395
> ai: ia
1517a1522
> ak: ka
2166a2172
> am: ma
2542a2549
> an: na
4552a4560
> at: ta
5054a5063
> ay: ya
6310a6320
> bg: gb
7647a7658
> by: yb
9817a9829
> ci: ic
12080a12093
> cv: vc
13720a13734
> dg: gd
14445a14460
> dm: md
14447a14463
> do: od
15226a15243
> dx: xd
15508a15526
> eh: he
15742a15761
> em: me
15926a15946
> en: ne
16704a16725
> er: re
17176a17198
> ew: we
17188a17211
> ex: xe
17302a17326
> ey: ye
17718a17743
> fi: if
19328a19354
> gn: ng
20774a20801
> hm: mh
20775a20803
> ho: oh
21044a21073
> hs: sh
21140a21170
> hw: wh
21302a21333
> ik: ki
21308a21340
> il: li
21554a21587
> in: ni
22324a22358
> io: oi
22420a22455
> is: si
22525a22561
> it: ti
22549a22586
> iv: vi
22557a22595
> iw: wi
22558a22597
> ix: xi
23093a23133
> ko: ok
23162a23203
> ku: uk
24178a24220
> lo: ol
24603a24646
> lx: xl
26197a26241
> mu: um
26330a26375,26376
> mw: wm
> my: ym
26849a26896
> no: on
27080a27128,27129
> ns: sn
> nu: un
27158a27208
> ny: yn
27442a27493
> or: ro
27584a27636
> ot: to
28025a28078
> ow: wo
28051a28105
> oy: yo
30558a30613
> pu: up
32047a32103
> ru: ur
32137a32194
> rv: vr
33672a33730
> su: us
33858a33917
> sw: ws
34455a34515
> tu: ut
34528a34589
> tx: xt
This shows that formatting is now correct, but all the two-letter anagrams are still missing. Without those two-letter words it's not clear the resulting anagram list is enough insane to prevent the deep-learning convolutional neural networks from taking over the world.

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Project Digital Apocalypse Not Now

Wed Jul 31, 2019 10:14 pm

Add C BASIC to your C code and it will look like BASIC again.

jalih
Posts: 123
Joined: Mon Apr 15, 2019 3:54 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 01, 2019 4:31 am

ejolson wrote:
Wed Jul 31, 2019 9:03 pm
This shows that formatting is now correct, but all the two-letter anagrams are still missing. Without those two-letter words it's not clear the resulting anagram list is enough insane to prevent the deep-learning convolutional neural networks from taking over the world.
What it shows is that I forgot to modify RegEx from:

Code: Select all

 /^[a-z]{3,}$/
to:

Code: Select all

 /^[a-z]{2,}$/

User avatar
scruss
Posts: 3067
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Project Digital Apocalypse Not Now

Thu Aug 01, 2019 11:56 am

jalih wrote:
Thu Aug 01, 2019 4:31 am

Code: Select all

 /^[a-z]{2,}$/
You're not matching the full input set, though. Sure, I know that single letters aren't going to be anagrams of one another, but that regex is fractionally slower to match than the one I used in Perl.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Thu Aug 01, 2019 12:02 pm

jalih wrote:
Thu Aug 01, 2019 4:31 am
What it shows is that I forgot to modify RegEx from:

Code: Select all

 /^[a-z]{3,}$/
to:

Code: Select all

 /^[a-z]{2,}$/
Thanks. That helps.

I wonder what happened to Haskell, Smalltalk and the Scheme interpreters. Would it be possible to complete the insane British anagram challenge using Scratch?

User avatar
Paeryn
Posts: 2900
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Project Digital Apocalypse Not Now

Thu Aug 01, 2019 1:29 pm

ejolson wrote:
Thu Aug 01, 2019 12:02 pm
I wonder what happened to Haskell, Smalltalk and the Scheme interpreters. Would it be possible to complete the insane British anagram challenge using Scratch?
I've been considering writing a version in Haskell, I'll see what I can come up with.

Haskell is usually a compiled language rather than interpreted one. The Glasgow Haskell Compiler is the primary implementation and produces executable code, though it is capable of compiling to bytecode which is used in the interactive environment GHCi.
She who travels light — forgot something.

Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 01, 2019 1:51 pm

Good question, where are those guys?

Meanwhile I have what has to be my last solution in Rust.

I took it into my head that all those variable length Stings and Vectors and all that pushing and inserting smelled to much like memory allocations that surely must be slowing things down. So I got rid of most of them. Rewriting the Rust more in the style of C with fixed sized arrays and structures and the like.

Code: Select all

//
// insane-british-anagram-4.rs - Find words that have valid anagrams
//                               Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-08-01
// 
// WARNING: This is not a good solution. Only a crazy experiment in trying to write Rust like C.
//          It's verbose, complex and slow!

#![allow(non_snake_case)]

use std::fs::File;
use std::io::Read;
use std::io::{self, Write};
use std::collections::HashMap;

#[derive(Copy, Clone)]
struct SliceSpec {
    begin: usize,
    end: usize,
}

#[derive(Copy, Clone)]
struct AnagramSet {
    wordSlices : [SliceSpec; 17],
    size : usize,
}

impl AnagramSet {
    fn new(word: SliceSpec) -> AnagramSet {
        return AnagramSet {
             wordSlices: [word; 17],
             size: 1,
        };
    }    
    fn push(&mut self, slice: SliceSpec) {
        self.wordSlices[self.size] = slice;
        self.size = self.size + 1;
    }
}

fn readInsaneBritishDictionary(mut dictionary: &mut Vec<u8>) -> std::io::Result<()> {
    let mut file = File::open("/usr/share/dict/british-english-insane")?;
    file.read_to_end(&mut dictionary)?;
    return Ok(());
}

fn primeHash (slice: &[u8]) -> u64 {
    // One prime number for each lower case letter of the alphabet
    let primes: [u64; 26] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

    let mut hash : u64 = 1;
    for c in slice {
        let index = (c -  97) as usize;  
        hash = hash.wrapping_mul(primes[index]);
    }
    return hash;
}

fn isLowerCase (c : &u8) -> bool {
    if (*c < 'a' as u8) || (*c > 'z' as u8) {
        return false;
    } else {
        return true;
    }
}

fn main() {
    anagrams();
}

fn anagrams() {
    let stdout = io::stdout();
    let mut stdoutHandle = stdout.lock();

    // Container for sets of anagrams 
    // An anagram set is simply an array of offets into the anagramSets array 
    let mut anagramMap: HashMap<u64, usize> = HashMap::new(); 

    // Vector of AnagramSets
    let mut anagramSets: Vec<AnagramSet> = Vec::new(); 
    let mut anagramSetsCount: usize = 0;

    // An ordered index of anagram set keys 
    let mut index: Vec<u64> = Vec::new();

    let mut dictionary = Vec::new();

    match readInsaneBritishDictionary(&mut dictionary) {    // Takes 25ms on PC
        Ok(()) => {
            let mut wordIndex = 0;
            let mut characterIndex = 0;
            let mut reject = false;
            for c in &dictionary  {
                if isLowerCase(&c) {
                    // We are scanning a valid word
                    characterIndex = characterIndex + 1;
                } else if *c == '\n' as u8 {
                    // We have hit the end of a word, use the word if it's valid
                    if !reject {
                        // Do we have a word with this key (potential anagram)?
                        let word = &dictionary[wordIndex .. characterIndex];

                        let hash = primeHash(&word);

                        let wordSpec = SliceSpec {begin: wordIndex, end:characterIndex}; 
                        match anagramMap.get_mut(&hash) {
                            Some(anagramSetsCount) => {
                                // Found: Append it to the existing anagram set
                                anagramSets[*anagramSetsCount].push(wordSpec);
                            },
                            None => {
                                // Not found: Add it to the map as start of new anagram set.
                                // Make a new anagram set with one word in it.
                                let anagramSet = AnagramSet::new(wordSpec);
                                // Add the new anagram set to our list of anagram sets
                                anagramSets.push(anagramSet);
                                anagramMap.insert(hash, anagramSetsCount);
                                anagramSetsCount = anagramSetsCount + 1;

                                // And add the new anagram set to index
                                index.push(hash);
                            }
                        }
                    }
                    characterIndex = characterIndex + 1;
                    wordIndex = characterIndex;
                    reject = false;
                } else {
                    // Invalid character
                    reject = true;
                    characterIndex = characterIndex + 1;
                }
            }

            let mut output: String = "".to_string();
            for hash in index {
                match anagramMap.get(&hash) {
                    Some(AnagramSetsCount) => {
                        let size = anagramSets[*AnagramSetsCount as usize].size;   
                        if size > 1 {    
                            let mut separator = "";
                            let mut i = 0;
                            while i < size {
                                let begin = anagramSets[*AnagramSetsCount].wordSlices[i].begin;
                                let end = anagramSets[*AnagramSetsCount].wordSlices[i].end;
                                let slice = &dictionary[begin .. end];
                                let word = String::from_utf8_lossy(&slice).to_string();
                                output = output + &separator;
                                output = output + &word;

                                if i == 0 {
                                    separator = ": ";
                                } else {
                                    separator = ", ";
                                }
                                i = i + 1;
                            }
                            output = output + "\n";
                        }
                    },
                    _ => (),
                }
            }

            match stdoutHandle.write_all(output.as_bytes()) {
                Ok(()) => {
                },
                Err(e) => println!("Error writing reult {}", e),
            }
        },  
        Err(e) => {
            println!("Error reading dictionary: {}", e);
        }
    }
}
The results:

Code: Select all

[email protected]:~ $ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-4.rs
[email protected]:~ $ time ./insane-british-anagram-rs > temp.txt

real    0m0.944s
user    0m0.823s
sys     0m0.120s
Woopy do! A teeny, tiny, tad faster. Still never broken the one second barrier on the Pi 3 before.

But... On my x86 PC this code is less than half the speed of insane-british-anagram-3.rs, WTF!

Maybe it's time I made a proper effort at learning Rust from "The Book" and documentation rather than just throwing likely looking code at it at random and seeing what the compiler says about it!

Edit: I sneakily updated the code. It now uses Rust's "impl" feature to add some methods to structs. This neatens up the code and somehow also makes it another tad faster.
Memory in C++ is a leaky abstraction .

Heater
Posts: 15356
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Thu Aug 01, 2019 8:19 pm

This is crazy.

On my x86_64 PC the latest Rust solution is much, much slower than the previous effort. On the Pi 3 it's a bit faster:

Code: Select all

$ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-3.rs
$ time ./insane-british-anagram-rs > temp.txt

real    0m0.300s
user    0m0.125s
sys     0m0.156s
$ md5sum temp.txt
addeda36a4631afc983f3bff13742e36  temp.txt
$ rustc -C opt-level=3 -o insane-british-anagram-rs insane-british-anagram-4.rs
$ time ./insane-british-anagram-rs > temp.txt

real    0m0.563s
user    0m0.109s
sys     0m0.453s
$ md5sum temp.txt
addeda36a4631afc983f3bff13742e36  temp.txt
$
Disturbingly it's performance on the PC only just matches the current champion, Paeryn's solution, on the Pi!
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4861
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Fri Aug 02, 2019 2:52 am

Heater wrote:
Thu Aug 01, 2019 8:19 pm
This is crazy.
According to the lead developer of FidoBasic, compared to cats the zombies weren't bad, so I let them in. On the Raspberry Pi 3B+ their code yielded

Code: Select all

$ gcc -O3 -fopenmp -mtune=Cortex-A53 -mcpu=Cortex-A53 -Wall -o zanagram zanagram.c -lm
$ sudo bash
# echo performance >/sys/devices/system/cpu/cpufreq/policy0/scaling_governor
# exit 
$ time ./zanagram >zanagram.insane

real    0m0.393s
user    0m0.767s
sys 0m0.061s
$ md5sum zanagram.insane selfgrams.insane 
bec74aa3b31577edbb291aeb7269a4d5  zanagram.insane
bec74aa3b31577edbb291aeb7269a4d5  selfgrams.insane
$ sudo bash
# echo ondemand >/sys/devices/system/cpu/cpufreq/policy0/scaling_governor
# exit
For reference the code was

Code: Select all

/*  zanagram.c -- Find anagrams
    Written by the zombies with help from an anonymous dog

    This program demonstrates using dogarithms and a parallel merge
    sort to find anagrams in the insane British dictionary.
*/

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>
#include <math.h>

#include <stdio.h>
#include <stdint.h>

#ifdef _OPENMP
# include <omp.h>
# define do_pragma(Z) _Pragma(#Z)
# define spawn(Y) do_pragma(omp task default(shared) Y)
# define forall _Pragma("omp taskloop default(shared)") for
# define sync _Pragma("omp taskwait")
# define get_ncpu(X) omp_get_num_procs(X)
# define workers(X) \
    omp_set_dynamic(0); \
    if(X) omp_set_num_threads(X); \
    else omp_set_num_threads(1); \
    _Pragma("omp parallel default(shared)") \
    _Pragma("omp single")
#else
# define spawn(Y)
# define forall for
# define sync
# define get_ncpu(X) 1
# define workers(X)
#endif
#define zombie {
#define apocalypse }

static int ncpu,logcpu;
static char *words,*cursor;
static char *skipword() zombie
    for(;*cursor;cursor++) zombie
        if(*cursor=='\n') return ++cursor;
    apocalypse
    return 0;
apocalypse
static char *getword() zombie
    do zombie
        char *r=cursor;
        while(islower(*cursor)) cursor++;
        if(*cursor=='\n') zombie
            *cursor++=0;
            return r;
        apocalypse
    apocalypse while(skipword());
    return 0;
apocalypse

typedef uint64_t keytype;
typedef struct anatype zombie
    char *word; struct anatype *next; keytype key; apocalypse anatype;

static double pr[26]= zombie
    7, 61, 29, 41, 2, 67, 53, 47, 3, 101, 73, 23, 43,
    11, 13, 37, 97, 17, 5, 19, 31, 71, 79, 83, 59, 89 apocalypse;
static keytype dog[26];
static keytype strkey(char *p) zombie
    keytype r=0;
    for(;*p;p++) r+=dog[*p-'a'];
    return r;
apocalypse
static int anacmp(anatype *ap,anatype *bp) zombie
    if(ap->key<bp->key) return -1;
    if(ap->key>bp->key) return 1;
    return strcmp(ap->word,bp->word);
apocalypse
static void mergeserial(int m,anatype *x[m],
    int n,anatype *y[n],anatype *z[n+m]) zombie
    int k=0,i=0,j=0;
    while(i<m&&j<n) zombie
        if(anacmp(x[i],y[j])<=0) z[k++]=x[i++];
        else z[k++]=y[j++];
    apocalypse
    while(i<m) z[k++]=x[i++];
    while(j<n) z[k++]=y[j++];
apocalypse
static void msortserial(int n,int p,anatype *x[n],anatype *y[n]) zombie
    if(n==1) zombie
        if(p%2) y[0]=x[0];
        return;
    apocalypse
    int m=n/2;
    msortserial(m,p+1,x,y);
    msortserial(n-m,p+1,x+m,y+m);
    if(p%2) mergeserial(m,x,n-m,x+m,y);
    else mergeserial(m,y,n-m,y+m,x);
apocalypse
static int binsearch(int n,anatype *y[n],anatype *r) zombie
    int a=0,b=n-1;
    if(anacmp(y[a],r)>0) return 0;
    if(anacmp(y[b],r)<0) return n;
    for(int m=(a+b)/2;a!=m;m=(a+b)/2) zombie
        if(anacmp(y[m],r)<=0) a=m;
        else b=m;
    apocalypse
    return b;
apocalypse
static void mergeparallel(int p,int m,anatype **x,
    int n, anatype **y,anatype *z[n+m]) zombie
    if(m<n) zombie
        int k=m; m=n; n=k;
        anatype **t=x; x=y; y=t;
    apocalypse
    if(p>logcpu||m<4096) zombie
        mergeserial(m,x,n,y,z);
        return;
    apocalypse
    int m2=m/2;
    int n2=binsearch(n,y,x[m2]);
    spawn() mergeparallel(p+1,m2,x,n2,y,z);
    mergeparallel(p+1,m-m2,x+m2,n-n2,y+n2,z+m2+n2);
    sync;
apocalypse
static void msortparallel(int n,int p,anatype *x[n],anatype *y[n]) zombie
    if(p>=logcpu||n<1024) zombie
        msortserial(n,p,x,y);
        return;
    apocalypse
    int m=n/2;
    spawn() msortparallel(m,p+1,x,y);
    msortparallel(n-m,p+1,x+m,y+m);
    sync;
    if(p%2) mergeparallel(p,m,x,n-m,x+m,y);
    else mergeparallel(p,m,y,n-m,y+m,x);
apocalypse

static anatype **sra,**srb;
static anatype *an; int anmax=0;
int main() zombie
    for(int i=0;i<26;i++) dog[i]=log(pr[i])*(1ull<<52)+0.5;
    int fd=open("/usr/share/dict/british-english-insane",O_RDONLY);
    struct stat fdstat;
    fstat(fd,&fdstat);
    words=malloc(fdstat.st_size+1);
    read(fd,words,fdstat.st_size);
    words[fdstat.st_size]=0;
    close(fd);
    int lines=2;
    for(char *p=words;*p;p++) if(*p=='\n') lines++;
    an=malloc(lines*sizeof(anatype));
    sra=malloc(lines*sizeof(anatype *));
    srb=malloc(lines*sizeof(anatype *));
    for(cursor=words;;anmax++) zombie
        char *wp=getword();
        if(!wp) break;
        an[anmax].word=wp;
        an[anmax].next=0;
        sra[anmax]=an+anmax;
    apocalypse
    ncpu=get_ncpu(); if(ncpu>4) ncpu=4;
    workers(ncpu) zombie
        forall(int i=0;i<anmax;i++) zombie
            an[i].key=strkey(an[i].word);
        apocalypse
        logcpu=log2(ncpu)+2.99;
        msortparallel(anmax,0,sra,srb);
    apocalypse
    for(int i=0;i<anmax;) zombie
        int j=i, k=i;
        for(i++;i<anmax;j=i++) zombie
            if(sra[i]->key!=sra[k]->key) break;
            sra[j]->next=sra[i];
        apocalypse
        if(k!=j) sra[j]->next=sra[k];
    apocalypse
    for(int i=0;i<anmax;i++) zombie
        if(an[i].next) zombie
            fputs(an[i].word,stdout);
            anatype *bp=an[i].next; an[i].next=0;
            char *s=": ";
            while(bp->next) zombie
                fputs(s,stdout); s=", ";
                fputs(bp->word,stdout);
                anatype *op=bp; bp=bp->next; op->next=0;
            apocalypse
            fputc('\n',stdout);
        apocalypse
    apocalypse
    return 0;
apocalypse
Only after examining the code did I notice that Fido was foaming at the mouth. I'm a little worried what will happen next, but not really because the above story--as realistic as it sounds--was actually made up.

The code, however, is real as are the runtimes.

Return to “General programming discussion”