The square root algorithm presented in Chapter 12 is fairly easy to understand and implement,
but it is a bit slow because it needs to compute 8 squares.
This post presents a more efficient algorithm that only requires addition, subtraction and
shifting.
Algorithm
The algorithm is derived from the "long division" pencil and paper method of computing square
roots. If you don't know this method,
this page is the best
presentation I found in a quick search.
| Base-10 | | Binary |
|
5 6 0 4. 9 9
\/31 41 59 27.
5*5 25
6 41
106*6 6 36
5 59
1120*0 0
5 59 27
11204*4 4 48 16
1 11 11 00
112089*9 1 00 88 01
10 22 99 00
1120989*9 10 08 89 01
14 09 99
| |
1 1 0 1. 0 1
\/10 11 00 01.
01 1
1 11
101 1 01
10 00
1101 0
10 00 01
11001 1 10 01
10 00 00
110101 0
10 00 00 00
1101001 1 10 10 01
1 01 11
|
The tricky part of this manual method can be finding the correct digit to append to the partial
square root such that the partial difference isn't larger than the partial remainder. In the
example above, my first guess for the 4th digit was 5, but 11205*5 =56025, which is too
big. The algorithm can be converted to base-2 which makes the digit choice 0 or 1, and
that's an easy decision; just a single comparison.
Notice that the base-10 "double and append a digit" is mathematically "double, multiply by
the radix, add the digit." In binary, given the partial square root bbb, the partial
difference will either be zero or bbb*2*2+1 = bbb01.
Implementation
To generated the square root of x:
- Start with square root q and remainder r initialized to 0.
- For each iteration:
- Shift the two most significant bits of x into the least significant end
of r. This can be thought of as the double width shift
r:x = r:x << 2.
- Shift q left 1 bit to make room for the next square root bit.
- Generate subtrahend s.
- If s ≤ r, subtract s from r and add 1 to q.
- q contains the integer square root of x.
Notes:
- Although this implementation actually computes the square root of 16-bit unsigned values, your Jack
code should still check that x is ≥ 0 and raise System Error 4 if it is not.
- Don't use r = r*4 to do shifting. That will result in a call to Math.multiply!
- Step 2.1 is a bit tricky in Jack. Hints: What does the MSB of x mean?
How do you test if the MSB of x is set?
Results
Running in the VM Emulator, my Jack implementation of this algorithm runs 5 times faster than
the Math.sqrt in the supplied Math.vm file. Half the time in each iteration is spent on
step 2.1. Jack is not good at bit manipulation!