How would you find a name in a phone book?
The method is an *algorithm* if you can
describe it precisely enough for a computer or
a person to follow the same steps.
In this activity, we explore
the problem of finding a word
in an alphabetized
book. We explore two questions:

- How can various search methods be written as algorithms?
- Why might we choose one algorithm instead of another?

**Introduction to Sorted Search**

Run the following short program in Pencil Code. It loads a script from a webpage, and then it runs the "find" function from that script.

await loadscript 'http://csp.pencilcode.net/lib/find.coffee', defer()do find

Each box has a word, and the words are in alphabetical order. Click on a box to get at a word.

**Exercise: An Algorithm for Searching by Hand**

- Design an algorithm to find the goal word manually:
- Click on boxes to get under them until you find the goal.
- Click on the box "
__-1 if not here__" if you have found that the word is not here. - Think about what you did, and describe your algorithm in English in a way that would let somebody else follow exactly the same procedure.

- Working with a partner or alone, analyze your
manual algorithm:
- Use your algorithm at least three times with different data.
- Each time, count how many gets you need to find the word (or to determine that it is not there).
- The number "n" is the number of boxes. Make a table relating the number of gets you needed to the number "n".

- Write a hypothesis relating n with the number of gets used by your algorithm.

**Programming a Random Search**

Here is a simple algorithm somebody might use if they did not know the order of the alphabet:

- Loop forever:
- Look on a random page.
- We have an answer if it matches.

Here is that algorithm, written in CoffeeScript.

await loadscript 'http://csp.pencilcode.net/lib/find.coffee', defer() # Careful: put a space after the word "find" below.find (goal, get, n) -># define "goal" and "get" and "n".loop# loop forever:i = random n# pick i randomly from 0 to n-1.w = get i# let w be the word on the ith page.if w is goal# check if w matches the goal:return i# if so, return the location.

In CoffeeScript, indenting is used to define blocks of code under the control of a command, so it is important to match the indenting when writing the program above.

In this program, the space after
`find`

is
also important. (Why? Because the input to
`find`

is an algorithm to use. It
must see *your whole algorithm*,
from the parentheses past the arrow to the end of
your indented code. If you forget the space,
CoffeeScript will think that the inputs to
`find`

are just the contents of the
parentheses.)

**Exercise: Analyze Random Search**

- Enter the program above, and get it to run.
Save it as "randomsearch".
- How many steps did it take to run?
- Does it always get the right answer?

- Explain why you would not want to use this
algorithm in real life.
- When searching
`n`

pages, what is the largest number of gets you should ever need? - Discuss the reliability of this algorithm.

- When searching
- Graph the performance by changing
`find`

to`findperf`

, as in the code below.- How often does the program fail to give an answer?
- Understand the bar graph.
How does the number of gets compare to
`n`

?

await loadscript 'http://csp.pencilcode.net/lib/find.coffee', defer()findperf(goal, get, n) -> # run a performance test. loop # everything below is unchanged. i = random n w = get i if w is goal return i

`find`

and `findperf`

are functions defined by the script at
http://csp.pencilcode.net/lib/find.coffee.
Although the script is long, it is ordinary
CoffeeScript: take a look if you are curious.

**Running a Linear Search**

The algorithm below is simple yet systematic. Is it better than random search?

- Look at every page from the 0th to the last:
- If the word on the page matches, we're done.

- If not found after looking at the last page, say "it's not here."

In CoffeeScript (go back to using `find`

):

await loadscript 'http://csp.pencilcode.net/lib/find.coffee', defer()find(goal, get, n) -> # animate this algorithm:for i in [0...n]# try each i from 0 to (n-1):w = get i# let w be the word under ith button.if w is goal# check if w matches the goal:return i# if so, return the location.return -1# if we get past the loop, return -1.

The loop `for i in [0...n]`

repeats
the indented block once for each value of `i`

,
starting at 0, and up to, but *not including*
`n`

. In CoffeeScript, a numeric range using
three dots will include the first number but
omit the last number. Using two dots will include both
ends.

**Exercise: Analyze Linear Search**

- Enter the program above, and get it to run.
Save it as "linearsearch".
- How many steps did it take to run?

- Analyze the performance of the algorithm.
- When searching in
`n`

pages, what is the number of gets that you would expect (on average) to do with Linear Search? - Discuss the reliability of this algorithm.

- When searching in
- Use
`findperf`

to measure the performance of the algorithm.- Do the tests differ from your prediction, and why?

**Divided Search**

When looking in a phone book, you might divide your search in half by geting first at the middle of the book. Then, taking advantage of alphabetical order, you could look through only half of the book.

- get at page #m, halfway through the book.
- If the goal is alphabetized before the middle, use the search range [0...m).
- Otherwise (when the goal is not before the middle) use the search range [m...n).
- For each page in the search range:
- If the word on the page matches, we're done.

- If not found after looking at the last page in the range, say "it's not here."

In the program below, the variable `m`

is
the half-way-page number, and the variables
`a`

and `z`

represent the
start and end of the search range. This algorithm
chooses `a`

and `z`

after comparing
the goal to the word at `m`

.

await loadscript 'http://csp.pencilcode.net/lib/find.coffee', defer() find (goal, get, n) ->m = floor(n/2)# m is the halfway location, rounded down.if goal < get(m)# compare goal to the word on #m alphabetically:a = 0# if before, then search from 0z = m# .... up to (not including) m.else# otherwise:a = m# if after, then search from mz = n# ... up to (not including) n. for i in [a...z] # do linear search on [a...z]. w = get i if w is goal return i return -1

`floor x` |
is `x` rounded down. |

`ceil x` |
is `x` rounded up. ("ceiling") |

`round x` |
is `x` rounded to the nearest integer. |

`floor`

is a math function for rounding down.
`floor(n/2)`

is used to
pick a halfway page when there
might be an odd number of pages.
**Exercise: Analyze Divided Search**

- Enter the program above, and get it to run.
Save it as "dividedsearch".
- How many steps did it take to run?

- Analyze the performance of the algorithm.
- When searching in
`n`

pages, what is the number of gets that you would expect (on average) to do with Divided Search?

- When searching in
- Measure the performance using
`findperf`

.- Explain any difference between the test results and your prediction.

**Binary Search**

After dividing the problem in half, we could
divide it in half again; then again; and again.
Eventually we will be left with just one page to
look at. This is the idea behind
*Binary Search*:

- Start with a search range [0...n)
- Loop as long as the search range has more than one page:
- get at page #m, halfway through the range.
- If the goal is alphabetized before the middle, adjust the top the search range down to m.
- Otherwise, adjust the bottom the search range down to m.

- Look at the one page in the range. Our word is either here, or it is not in the book.

**Divide and Conquer Algorithms**

Binary search is one of the most important
algorithms in computer science. It is the basic
example of a *divide-and-conquer* strategy.

In divide-and-conquer algorithms, some way is found to divide the problem in to smaller problems that are similar. Then each smaller problem can be further divided until the problem is easy to solve.

The reason divide-and-conquer algorithms are so important is that they can be blazingly fast. Even when searching a very large book, Binary Search can get an answer with just a few steps.

**Coding Binary Search**

In the code below, we again use the variables
`a`

and `z`

to represent
the current search range, and `m`

to represent the page halfway between `a`

and `z`

.

The difference between this algorithm and Divided Search is that we continue to adjust the search range repeatedly instead of doing it just once.

await loadscript 'http://csp.pencilcode.net/lib/find.coffee', defer() find (goal, get, n) ->a = 0# the start of the search range starts as 0.z = n# the end of the search range starts as n.while z - a > 1# repeat while the distance from a to z is more than 1:m = floor((a+z)/2)# m is the halfway location, rounded down.if goal < get(m)# compare goal to the word on #m alphabetically:z = m# if before, adjust the end of the range.else#a = m# otherwise, adjust the start. # when the loop is over, "a" is the only page in the range.if get(a) is goal# if the goal word is on this page:return a# this page is our answer.else# otherwisereturn -1# the word is not in the book.

The `while`

keyword in CoffeeScript
repeats a block of code as long as a condition is
true. Here, `while z - a > 1`

repeats the indented code as long as the gap
between `z`

and `a`

is
more than one page.

The first line of code that is *not*
indented under the `while`

is
reached when `z - a`

is 1 (or less).
In other words, the loop is finished when
there is only one page to look at.

**Exercise: Analyze Binary Search**

- Enter the program above, and get it to run.
Save it as "binarysearch".
- How many steps did it take to run?
- Try running with 1000 items by changing
the first line to
. How many steps does it take to run?`find 1000, (goal, get, n) ->`

- Analyze the performance of the algorithm.
- When searching in
`n`

pages, what is the number of gets that you would expect (on average) to do with Binary Search?

- When searching in
- Measure the performance using
`findperf`

.- Explain any difference between the test results and your prediction.

- Can you do better? Is there an algorithm that will use fewer gets than binary search?