Points of interests and improvements

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Points of interests and improvements

Marco Bellocchi
This has been  great and satisfying course.

After concluding it few things remain to improve at least in what I did:

1. My OS implementation works fine but I have noticed that my screen class perform almost twice as slow as the OS provided, even after I gave it a good shot at optimizing it. In particular I know drawRectangle is the main culprit even if I avoid using drawPixel and try to commit entire words in memory. The main problem is with the far left and far right words in each line which take a bit longer to calculate, at least in my implementation, and I have not attempted to optimize further, as Pong is anyway playable also with my OS.

2. My Virtual Machine translates vm to assembly code in an inefficient way, in fact my OS takes the entire space available in ROM, so I cannot really play Pong compiled by my own compiler for example; Having said that I have not attempt to "shrink" the assembly generated.

I would be curious to know if anyone else encountered similar or other challenges during the course and what solutions have you found to overcame them.

Reply | Threaded
Open this post in threaded view
|

Re: Points of interests and improvements

ivant
You have several cases for the rectangles. We only need to describe how to process one horizontal line. Let's first think of lines, which occupy 2 words. On the left you have: XXXX11111 (16 bits in total), or XXXX0000, where X are bits which need to remain the same, and 0 and 1 are bits that need to be set to 0 or 1 respectively. On the right you have that mirrored: 11111XXX or 00000XXX. (Of course, the number of zeroes, ones and Xes may vary for different rectangles.)

You will need to fetch the left word from the display memory, change the specified bits to ones or zeroes and put the new value back to the display memory. The same for the right one. How can you do the bit change? Would it be helpful if you have an array with the following binary values:

0000000000000000
0000000000000001
0000000000000011
0000000000000111
...
1111111111111111

Hint: Operations like AND, OR and NOT are quite fast, because they work on whole words.

If the line is more than 2 words long, the middle words are all just ones or just zeros, so you only need to write them to memory.

That leaves us with lines contained in a single word. Let's say that we need to create the following line: XXX111XXXXXXXXXX. We again need to read the value, set the several of the bits to 1 without changing the rest and put the new value back. It would be helpful if we had the following value available:

0001110000000000

The problem is, that it would only work for this rectangle. What if we needed it to begin on the next pixel? We would have to have this value then:

0000111000000000

And we would need different values for 1 pixels wide, 2 pixels wide, etc. So having these masks precomputed and stored into memory doesn't seem optimal. But there is a fast way to compute these values on the fly. You already have these:

0000001111111111
0001111111111111

How can you get this:

0001110000000000

P.S. You can implement this in a function called drawHLine, which you can use in drawRectangle and drawCircle to speed them up. You can use it in drawLine to draw the horizontal parts of each line as well.

P.P.S. I hope I gave you enough hints without spoiling the fun of the a-ha! moment. I'm sorry if I failed.


The main thread for discussing code optimizations is Generated code size. There are a few good ideas there, like call and return optimizations, dead code elimination, etc.

If you want to diverge from the book, you can check out register machines as alternative to the stack-based VM. You can check the Some fun with SSA thread for . SSA stands for Static Single Assignment -- a code transformation, which enables a wide range of optimizations. It is used in LLVM, GCC and many more compilers.
Reply | Threaded
Open this post in threaded view
|

Re: Points of interests and improvements

Marco Bellocchi
This post was updated on .
Thanks Ivant for the tips and the quick reply,

I appreciate it.

Yes indeed for rectangle (and indeed for circle and for any horizontal line) I have an optimized routine for drawHorizontal.

I do have the array you are talking about and indeed I have implemented what you mentioned. But are you matching like to like their OS speed?

I also have noticed (I had a look at their Screen.vm) that while I use memory peek and poke, they directly access memory with the array trick, maybe that will help too.

Thanks a lot also for pointing me in the right direction for code optimization, I think it would be nice if it was part of the course and one exercise would be to indeed work on optimizing in term of number of instructions.

Again many thanks for your suggestions
Reply | Threaded
Open this post in threaded view
|

Re: Points of interests and improvements

ivant
Marco Bellocchi wrote
But are you matching like to like their OS speed?
I haven't compared the speeds, so I don't know.

Marco Bellocchi wrote
I also have noticed (I had a look at their Screen.vm) that while I use memory peek and poke, they directly access memory with the array trick, maybe that will help too.
That will help, because call and return are expensive operations. One way to do this is to write your Screen code with the array access. Another way is to make your compiler inline the peek and poke methods. You can use special comments to mark which methods to inline. Or you can make it decide automatically, by some heuristic.
Reply | Threaded
Open this post in threaded view
|

Re: Points of interests and improvements

cadet1620
Administrator
Another thing that can speed up drawRectangle is to make your drawHorizontalLine routine take a repeat count argument so that it internally does
// compute starting and ending addresses and bit masks
while (n)
    // draw line loop
    startAddress += 32
    endAddress += 32
For narrow rectangles like the ends of the paddle on Pong, the overhead computing the addresses and bit masks is noticeable.

This is also a case where you can make a size/speed tradeoff.  It runs faster if you have move the "if (color)"
out of the drawing loop so that you have two loops, one that sets bits and one that clears bits.

--Mark
Reply | Threaded
Open this post in threaded view
|

Re: Points of interests and improvements

Marco Bellocchi
Thanks Mark,

Another very good point, I have not added this optimization., many thanks!
I am sure your OS runs as quick as the provided one.

I have also noticed my drawing of characters is slightly slower, but I have not done any particular optimization there.

I have tested also with my Mine Sweeper with my OS and since I am drawing a lot of rectangles, I can see it is a bit slower compared to the provided OS.

I guess it depends on where you want to take it in term of optimization.

Marco
Reply | Threaded
Open this post in threaded view
|

Re: Points of interests and improvements

cadet1620
Administrator
Marco Bellocchi wrote
I have also noticed my drawing of characters is slightly slower, but I have not done any particular optimization there.
Drawing characters to upper half words requires shifting the font left 8 bits which is a slow operation.

You can move this into the font initialization code. Create the font words with the bits in both the half words. Then in drawCharacter() you only need to AND with 0xFF00 or 0x00FF to get the required bits.

--Mark