Lesser loop execution times for mult


Lesser loop execution times for mult

karmic
As I was doing the project for chapter 4 and writing the code for Mult, I thought of a way to make it less time-consuming.
After the initialisation of the variables, if we were to branch the execution into LOOP1 or LOOP2 depending on whether R0>R1 or R1>=R0, then for multiplications where the two values differ a lot (say, 750*20), we could add 750 twenty times instead of always computing A+A+A... B times, hence decreasing the number of loop iterations for most multiplications.
Intuitively this should be faster, since each addition is a 16-bit addition regardless of how big the numbers are, but I've only reached the end of chapter 4, so I can't be sure. Can someone knowledgeable about this give their thoughts?
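
Roughly, the dispatch I have in mind looks like the untested sketch below (the labels LOOP1, LOOP2 and END are just placeholders I picked; it assumes R0 and R1 are non-negative, and it counts R0 or R1 down directly, so they would have to be copied to temporaries first if the inputs must be preserved):

    // Dispatch: use the smaller of R0/R1 as the loop counter
    @R2
    M=0         // result starts at 0
    @R0
    D=M
    @R1
    D=D-M       // D = R0 - R1
    @LOOP1
    D;JLE       // R0 <= R1: loop R0 times, adding R1 each pass
    @LOOP2
    0;JMP       // R1 <  R0: loop R1 times, adding R0 each pass

(LOOP1)         // counts R0 down to 0, adding R1 into R2
    @R0
    D=M
    @END
    D;JEQ
    @R1
    D=M
    @R2
    M=D+M       // R2 = R2 + R1
    @R0
    M=M-1
    @LOOP1
    0;JMP

(LOOP2)         // counts R1 down to 0, adding R0 into R2
    @R1
    D=M
    @END
    D;JEQ
    @R0
    D=M
    @R2
    M=D+M       // R2 = R2 + R0
    @R1
    M=M-1
    @LOOP2
    0;JMP

(END)
    @END
    0;JMP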


Another question I have is: if we can use machine-language-level optimizations to make our higher layers faster, as in this example (assuming it works), then, since the machine language is so closely bound to the hardware, can we use hardware improvements to add functionality like multiplication, or is it easier to just put it in the machine-level language? (Making a chip specifically for multiplication or division operations... Should I wait until the next chapter, which deals with architecture, before tackling this last question?)

Re: Lesser loop execution times for mult

cadet1620
Administrator
karmic wrote
After the initialisation of the variables, if we were to branch the execution into LOOP1 or LOOP2 depending on whether R0>R1 or R1>=R0, then for multiplications where the two values differ a lot (say, 750*20), we could add 750 twenty times instead of always computing A+A+A... B times, hence decreasing the number of loop iterations for most multiplications.

Yes, this is a valid optimization. Since the largest non-overflowing square is 181^2 = 32761, the smaller of the two inputs can be at most 181, so your multiply will never need to do more than 180 additions.

You can probably make the program smaller by exchanging R0 and R1 as required before entering a single loop. Your two-loop version will be a few instructions faster because it doesn't need to do the swap, but those few instructions are minuscule compared to the overall runtime.
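
One possible shape for that, as a rough untested sketch (the variable names a, b and temp are arbitrary, and non-negative inputs are assumed): copy the inputs, swap if needed so the smaller value becomes the loop counter, then fall into a single addition loop.

    @R0
    D=M
    @a
    M=D         // a = R0
    @R1
    D=M
    @b
    M=D         // b = R1
    @R2
    M=0         // result starts at 0

    @a
    D=M
    @b
    D=D-M       // D = a - b
    @LOOP
    D;JLE       // a <= b: a is already the smaller value
    @a          // otherwise exchange a and b
    D=M
    @temp
    M=D
    @b
    D=M
    @a
    M=D
    @temp
    D=M
    @b
    M=D

(LOOP)          // adds b into R2, a times
    @a
    D=M
    @END
    D;JEQ
    @b
    D=M
    @R2
    M=D+M       // R2 = R2 + b
    @a
    M=M-1
    @LOOP
    0;JMP

(END)
    @END
    0;JMP

The swap costs a handful of instructions once, while the two-loop version avoids the swap but needs a second copy of the loop body; that is the size/speed trade-off mentioned above.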

Since the runtime of your multiply is linearly proportional to the smaller input value, this is called an order n algorithm, written O(n). In chapter 12 you will learn how to write a faster multiply that is O(log2 n). This means that the program's runtime will be proportional to the number of bits in the input numbers. The program will loop at most 16 times for 16-bit numbers.
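
To give a feel for the shape of that faster algorithm (chapter 12 implements it in Jack as part of the OS), here is a rough, untested shift-and-add sketch in Hack assembly; shifted, mask and i are arbitrary variable names, and it assumes the product fits in 16 bits. Each pass examines one bit of R1 and, when that bit is 1, adds the correspondingly doubled copy of R0:

    @R0
    D=M
    @shifted
    M=D         // shifted = R0, doubled on every pass
    @mask
    M=1         // mask = 1, doubled on every pass; selects bit i of R1
    @i
    M=0         // pass counter
    @R2
    M=0         // result starts at 0

(BITLOOP)
    @i
    D=M
    @16
    D=D-A
    @DONE
    D;JEQ       // all 16 bits examined

    @R1
    D=M
    @mask
    D=D&M       // isolate bit i of R1
    @SKIP
    D;JEQ       // bit is 0: nothing to add
    @shifted
    D=M
    @R2
    M=D+M       // bit is 1: R2 = R2 + shifted
(SKIP)
    @shifted
    D=M
    M=D+M       // shifted = shifted * 2
    @mask
    D=M
    M=D+M       // mask = mask * 2
    @i
    M=M+1
    @BITLOOP
    0;JMP

(DONE)
    @DONE
    0;JMP

This loops exactly 16 times no matter how big the inputs are, versus up to roughly 181 passes for the repeated-addition version.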

Another question I have is: if we can use machine-language-level optimizations to make our higher layers faster, as in this example (assuming it works), then, since the machine language is so closely bound to the hardware, can we use hardware improvements to add functionality like multiplication, or is it easier to just put it in the machine-level language? (Making a chip specifically for multiplication or division operations... Should I wait until the next chapter, which deals with architecture, before tackling this last question?)

Yes, it is possible to design multiplication and division circuits, but they are large and complex compared to the rest of the Hack CPU, and you would need to modify the assembly language to include new instructions to use that hardware.

This will make more sense after you have made your CPU and Computer in chapter 5.

Software implementation of multiply and divide was used in early microcomputers. The Intel 8080 and MOS Technology 6502, which were the first really popular microprocessors, did not have multiply or divide instructions.

--Mark


Re: Lesser loop execution times for mult

karmic
Thanks for the great response, very informative.