|
|
My program draws the graphics for the test file as the expected one. But when it draws a rectangle, like for the window and the door, It is somewhat slow(but I used the optimized way of drawing a line). My assumption for it being slow was, I've only drawLine method which draws vertical, horizontal and diagonal lines, so to check all those conditions I've one big condition inside the while loop, that selects the correct condition for the current input values(like a >= dx or a <= dx), and that condition is executed in every iteration. But I can only check the condition only one time, so that I can get the correct condition for the while loop.
What I want to do
let xCondition = (~(a > dx));
let yCondition = (~(b > dy));
if(dx < 0) {
let xCondition = (~(a < dx));
}
if(dy < 0){
let yCondition = (~(b < dy));
}
while(xCondition & yCondition) {}
But this one is not working because of refactoring the conditions.
When I didn't refactor the condition inside the while loop, it works fine. But this one makes my program slow I think.
if(dx < 0) {
let xDirection = -1;
}
if(dy < 0){
let yDirection = -1;
}
while((((xDirection = 1) & ~(a > dx)) | ((xDirection = -1) & ~(a < dx))) & (((yDirection = 1) & ~(b > dy)) | ((yDirection = -1) & ~(b < dy)))) { }
|
|
If you have a single function for drawing any kind of line, it might be slow. Especially if it draws the line one pixel at a time. There is one kind of line, which can be drawn very fast compared to the others. And this line drawing can also be used in the drawRectangle, drawCircle, etc, and even in the general drawLine.
|
|
Okay Ivant, now I refactored into drawHorizontal, drawVertical and drawDiagonal functions. And It is not bad now, so how can I draw line without drawing pixel by pixel like for the horizontal one?
|
|
You can't always do so, but if the line's slope is less than 45 degrees, it will consist of small horizontal lines. The closer to horizontal the line is, the bigger the effect.
|
|
So is there any other way drawing a horizontal line without drawing pixel by pixel?
|
|
Yes, there is. What are the steps to write a single pixel on the screen?
|
|
1. Find the physical memory location of the given x, y in terms of bytes.
2. Then find the mask for actual location of the bit we want to set.
3. Then depending on the color, which is currently selected we may "or" or "and" with the mask
so as to change the bit value to the selected color.
|
|
Well, your 3rd step is actually more like 3 separate steps:
3a. read the word
3b. change the bit
3c. write the changed word back
In horizontal line, lots of pixels are in the same word and they are also next to each other. Let's use the underscore _ for pixels that are not part of the line, so we don't want to change them, and X for pixels that we want to change. Basically, you have 4 cases:
1: ______XXXXXX (some _, rest are X total 16 - left part of the line)
2. XXXX_____ (right part of the line)
3. XXXXXXXX (16 X - middle part of the line)
4. ___XXX___ (the whole line is shorter than the word)
You can construct bit masks to change several pixels at the same time. And for the 3rd case, you don't even need to read the previous value on this address, because you are replacing it all.
|
|
Great Ivant. Yeah I was thinking when you asked me the steps for drawing a pixel?
If you can please give me the idea for constructing bit masks to change several pixels at a time?
|
|
Well, you look at the first pattern, how many bit masks would you need? If they aren't too many, perhaps it makes sense to pre-compute them in Screen.init.
The bit-patterns for the second case seem to be easily computable from the ones we already have (mostly inverting them).
The third case doesn't need any.
So the only remaining is the fourth case. Now is there a fast and easy way to compute the pattern using the ones we already have?
|
|
Thank you Ivant, you helped me a lot.
Since we handle all lines in a word, XXXXXX, in its own we only need 15 bit mask. And I precompute it in Screen.init.
The fourth case is by AND, the first and the second case.
And now my program draws fast. But the output is not as expected.
Like drawing from 0 to 18 px is not correct.
do Screen.drawLine(0,0, 18,0);
This program should set the first word all 1 and in the second word the first 3 bits. And my program handles those cases.
And I even print memory locations at
16384[0] = -1,
16384[1] = -8192 and 16384[2] = 0;
-1 = 1111111111111111
-8192 = 1110000000000000
And this seems correct, but screen draws not correctly.
|
|
It looks like you are using the "inverted" maps, in the sense that you are drawing from the wrong end. That is, instead of -8192, you should write 7 (0000000000000111). It is a bit confusing, but that's how the HACK computer works.
An easy way to test this is to start the CPUEmulator and manually change the contents of the addresses.
|
|
Make sense, but I don't have any idea now, to make the change .
|
|
Won't just fixing the masks do the trick?
|
|
Finally it is working correctly. Thank you so much Ivant.
Thank you all .
|
|
Interesting discussion. I implemented a simpler form of this:
-check if the pixel we are at is the start of a word,
-if there's 16 pixels to write coming up,
-we are writing pixels left to right (reasonable simplification since both drawRectangle() and drawCircle() do this)
Was able to gain a 3x in performance, from 20s to 7s to draw the ScreenTest image. Clearly the overhead from the condition checking is significant, since without that you'd expect about a 8x benefit.
|
|
Very good improvement! The provided OS routines are not all well optimised. The drawRectangle is amongst the better ones, but still can be improved significantly (depending on the size of the rectangle between 2x and 4x).
drawLine is the one with the most potential, especially for horizontal lines which are not treated separately (here my own OS is up to 400x faster for horizontal, 43x faster for verical and 20x faster for random).
Also drawCircle can do much better by drawing bigger areas at once (my implementation is 8.8x faster).
These improvements can all be done without compromising on the code size. Instead of adding more specialised routines, I am using the drawRectangle as the main internal helper. E.g. if drawLine detects that it is a horizontal or vertical line, it will call drawRectangle which is already optimised to do the job much faster. Also drawCircle can make use of drawRectangle and draws multiple lines at once. I haven't seen such algorithm in the literature, but I guess I'm not the only one who had that idea: instead of drawing every line immediately, you keep track when the column is changing and in which line the last column change occurred. Then you draw the whole area as rectangle in one shot. If you look at the pixel structure of a rasterized circle, it is essentially a stacking of rectangles with different heights. The closer you come to the top, the rectangles will approach a height of 1 (or mathematically, if you reach the 45 degrees your x steps will be faster than your y steps). Same like when using drawLines, you can draw symmetrically 4 rectangles at once. My algorithm also ensures that pixels are not painted twice by overlapping rectangles. (The same idea works as well for circle outlines (non-filled) btw., where instead of drawing pixels, you can draw lines in one shot.)
Another thing that is generally slowing down Screen is the calculation of the screen address. For that reason, you can see that many traditional raster graphics APIs had routines like drawLineTo(x, y). So the starting coordinates are omitted and the initial screen address didn't need to be calculated, but was kept in the state of the module. Every call was relative to the last pixel position. Anyway, for Hack the screen address calculation can be accelerated a lot by using a customised shifting routine instead of calling multiply and divide.
For the whole test image, the above improvements led to an average of 13x faster drawing versus the stock OS for me.
|
|
This is the above mentioned fast screen address calculation routine, just using addition/subtraction and taking advantage of the property that a comparison returns either 0 or -1.
function int getScreenAddress(int x, int y) {
var int i;
// (x / 16) + (y * 32) + 16384, with x < 512, y < 256
// = (x shr 4) + (y shl 5) + 16384
// = ((x ror 4) & 31) + (y shl 5) + 16384
// = ((x rol 12) & 31) + (y shl 5) + 16384
// = ((x rol (5 + 7)) & 31) + (y shl 5) + 16384
// = (((x shl 5) rol 7) & 31) + (y shl 5) + 16384
let i = 12;
while (~(i = 7)) { // shift left x and y by 5
let y = y + y;
let x = x + x; // x < 512 cannot produce a carryover
let i = i - 1;
}
while (~(i = 0)) { // rotate left x by 7 -> x = x rol (5 + 7) = x rol 12
let x = x + x - (x < 0);
let i = i - 1;
}
return (x & 31) + y + 16384; // 16384 = start of SCREEN RAM
}
|
|
That's awesome work!
I'm mostly interested in completing the course, but I couldn't help myself with this small side task.
|
|
This post was updated on .
Another small thing that could be done is to eliminate the loop variable in the first part. This can be achieved by factoring out 32 from the base address:
(x / 16) + (y * 32) + 16384
= (x / 16) + (y * 32) + 512 * 32
(x / 16) + ((y + 512) * 32)
...
So the base address will be implicitly added prior to the loop. The first loop condition would then become
while (y < 16384) { //or while (~(y > 16383))
...
Though comparisons with 0 or +/-1 increases/decreases as used in the above version can be well optimised by the vm translator, this approach should still be slightly faster.
Since the routine is not too long, loop unrolling is an option as well. In this case the carryover check for x can be limited to the last 5 shifts instead of 7 as in the loop version.
Ultimately for such a central support function, an asm version can be considered. Then it is possible to do the calculation in below 60 cycles including internal call and return.
|
|