lurk101
Posts: 380
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Binary search

Sat Aug 01, 2020 5:30 pm

And here I thought the binary search algorithm was a settled thing!

https://github.com/scandum/binary_search

Heater
Posts: 17392
Joined: Tue Jul 17, 2012 3:02 pm

Re: Binary search

Sat Aug 01, 2020 8:00 pm

Interesting.

I guess for that to be useful you need your arrays to be sorted before you start. Which might be a lot more processing time that the search itself.

Or perhaps one always keeps ones arrays sorted by using the binary search to find the insertion point for new items and moving things after that up the array.

As always, ones application and typical data will determine what is optimal.
Memory in C++ is a leaky abstraction .

pidd
Posts: 1627
Joined: Fri May 29, 2020 8:29 pm
Location: Wirral, UK
Contact: Website

Re: Binary search

Sat Aug 01, 2020 8:05 pm

lurk101 wrote:
Sat Aug 01, 2020 5:30 pm
And here I thought the binary search algorithm was a settled thing!

https://github.com/scandum/binary_search
It still is! They aren't comparing like for like, binary search is searching, others are predicting but you can't predict random.

Obviously if you have predictable data set then a prediction algorithm should beat the binary search.

The binary search will win in the long term with large randomly distributed ordered sets.

lurk101
Posts: 380
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Binary search

Sat Aug 01, 2020 8:31 pm

pidd wrote:
Sat Aug 01, 2020 8:05 pm
The binary search will win in the long term with large randomly distributed ordered sets.
Good to know. I was a little disturbed... thinking how many less than optimal ones I might have coded.

Heater
Posts: 17392
Joined: Tue Jul 17, 2012 3:02 pm

Re: Binary search

Sat Aug 01, 2020 9:09 pm

It all depends on what you need to optimize and the size/shape of your data.

In some recent little programs I have been working on a hash map has turned out faster than any binary search.

But there is an interesting point about all this. Traditionally CS students are taught about algorithmic complexity and how to estimate run time vs data size. See "Big O notation" and all that: https://en.wikipedia.org/wiki/Big_O_notation. All good stuff.

Traditionally these analysis consider how many fundamental operations it takes to get the job done, arithmetic operations, comparisons, moves, swaps etc.

They have ignored modern day things like cache memory, processor branch predictors, multiple cores, and so on. Never mind ignoring data set size/shape/distribution etc.

So, for example, such analysis would conclude that operating on a 2D array from left to right, top to bottom, would take the same time as working from top to bottom, left to right. But in reality, now a days, one way is 10 or 100 or more times faster thanks to cache memory.

Similarly, the beloved linked list turns out to be terrible in most use cases.
Memory in C++ is a leaky abstraction .

User avatar
jahboater
Posts: 6681
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Binary search

Sat Aug 01, 2020 9:41 pm

And for the small tables - 10 or 100 elements, a simple linear scan might be best.
Pi4 8GB (Raspberry Pi OS 64-bit), Pi4 4GB, Pi4 2GB, Pi1 Rev 1 256MB, Pi Zero

enedil
Posts: 80
Joined: Sat Feb 21, 2015 4:22 pm
Location: Toruń, Poland
Contact: Website

Re: Binary search

Sun Aug 16, 2020 1:36 am

Heater wrote:
Sat Aug 01, 2020 8:00 pm
Interesting.

I guess for that to be useful you need your arrays to be sorted before you start. Which might be a lot more processing time that the search itself.

Or perhaps one always keeps ones arrays sorted by using the binary search to find the insertion point for new items and moving things after that up the array.

As always, ones application and typical data will determine what is optimal.
I think you might be a little bit confused here. It happens sometimes that you have some read-only sorted structure. Simple example, you have a function of two parameters x and y, and you know that if you fix x, then this function is increasing with increasing values of y. Then, to find such pairs x,y, so that f(x,y)==z (where z is of your choosing), you could iterate over suitable values of x, and for each such x, check all possible values of y. This takes quite a long time. What can you do instead, is to choose x and use binary search over possible values of y. This is a considerable speedup (and contrary to alternative approaches involving hashtables, you don't need to store the array in memory at all, so no initial sorting time). Of course, the example with a function that computes the result instead of an array is a bit divergent from the idea from the repository about optimizing cycles in binary seach, but I'm pretty sure there are lots of examples where this simple approach (iterate over one dimension, binary search over the second one) pays off really quickly.
- What Can a Thoughtful Man Hope for Mankind on Earth, Given the Experience of the Past Million Years?
- Nothing.

Kurt Vonnegut, Cat's Cradle

Heater
Posts: 17392
Joined: Tue Jul 17, 2012 3:02 pm

Re: Binary search

Sun Aug 16, 2020 5:26 pm

enedil wrote:
Sun Aug 16, 2020 1:36 am
I think you might be a little bit confused here.
Hmm....I go through life in a permanent state of confusion. Generally because I am always batting my head against something new and different.

However, I have no confusion in this case. What I said was: "... ones application and typical data will determine what is optimal"

Experience teaches that whatever funky algorithm may be proved optimal in some case will almost certainly not be so in other cases. Typically algorithms honed for huge data sets do not outperform simpler algorithms below some size of data set.

And then one has to be sure what one is optimizing for. Could be average performance over all kind of random data. Could be minimum worst case run time. Could be memory size. Whatever your application needs. What size data do you typically need to deal with? And so on.

Much of which can depend heavily on the actual machine you are running on. What size instruction and data caches does it have? What kind of branch predictor does it have? Can you split the work over many cores efficiently? Etc, etc.

I think you might be interest in this presentation by Andrei Alexandrescu : “Speed Is Found In The Minds of People" : https://www.youtube.com/watch?v=FJJTYQYB1JQ where he comes up with all kind of surprising results as he tries to optimize sorting. Along the way he shows how binary search was sub-optimal for what he was doing. He ends with results that are inexplicable, his code looks like it does more work, comparisons, swaps, etc, but it runs faster!

You will be confused as well :)
Memory in C++ is a leaky abstraction .

pidd
Posts: 1627
Joined: Fri May 29, 2020 8:29 pm
Location: Wirral, UK
Contact: Website

Re: Binary search

Sun Aug 16, 2020 5:51 pm

@Heater agreed!

Goes back to what I said earlier, if there are patterns an algorithm knows about it should be better than binary search when it finds those patterns. However if it wastes time looking for patterns that don't exist then its going to double up on its losses against a binary search.

Heater
Posts: 17392
Joined: Tue Jul 17, 2012 3:02 pm

Re: Binary search

Sun Aug 16, 2020 5:57 pm

Yes, and a binary search can be sub-optimal as it totally defeats the branch predictors on modern CPU's. See linked video.
Memory in C++ is a leaky abstraction .

pidd
Posts: 1627
Joined: Fri May 29, 2020 8:29 pm
Location: Wirral, UK
Contact: Website

Re: Binary search

Sun Aug 16, 2020 6:11 pm

I can't watch the youtube now, I'll look later.

The argument for use of a limited number cores doesn't hold up, as the sets get larger eventually the binary search will catch up and overtake again, the sets might have to be stupidly large, so for practical use the binary search may lose but theoretically it still wins.

Basically any finite set has at least one pattern so a binary search can be beaten on any finite set.

Heater
Posts: 17392
Joined: Tue Jul 17, 2012 3:02 pm

Re: Binary search

Sun Aug 16, 2020 6:31 pm

pidd wrote:
Sun Aug 16, 2020 6:11 pm
The argument for use of a limited number cores doesn't hold up, as the sets get larger eventually the binary search will catch up and overtake again, the sets might have to be stupidly large, so for practical use the binary search may lose but theoretically it still wins.

Basically any finite set has at least one pattern so a binary search can be beaten on any finite set.
I'm not sure what you are getting at there. Many algorithms can scale their performance with the number of cores nicely. Although Amdahl's law always gets in the way there. For smallish data sets the overheads of setting up those threads will often defeat an algorithm that works well for bigger data.

But there is that thing about "theoretically" vs "practically". Which the linked video has nice examples of.
Memory in C++ is a leaky abstraction .

enedil
Posts: 80
Joined: Sat Feb 21, 2015 4:22 pm
Location: Toruń, Poland
Contact: Website

Re: Binary search

Sun Aug 16, 2020 11:23 pm

I know that there are examples where simple linear scan is faster than a binary search. From my experience though, once your data set has about 100 elements, bsearch is really faster, especially since you're all taking about searching in arrays, while for me, when I actually used binary search, it was most commonly on functions that took an integer as a parameter. That way the branch predictor, the RAM cache, and lots of optimizations in current CPUs start not to matter. After all, the function computing result is still called, and it's code is in the cache. I appreciate though the deep-dive into what absurd patterns in programming can generate better performance.
- What Can a Thoughtful Man Hope for Mankind on Earth, Given the Experience of the Past Million Years?
- Nothing.

Kurt Vonnegut, Cat's Cradle

Return to “General programming discussion”