[I wrote a more detailed post about Faster Character Drawing.]
Writing directly to the screen memory is tricky because each word of screen memory holds 16 pixels, 8 from each of 2 characters.
Address    Bit number      Hex   Dec
                  111111
        0123456789012345
16384   ..##.....####...   1E0C  7692
16416   .###....##..##..   330E 13070
16448   ####........##..   300F 12303
16480   ..##.......##...   180C  6156
16512   ..##......##....   0C0C  3084
16544   ..##.....##.....   060C  1548
16576   ..##....##......   030C   780
16608   ..##....##..##..   330C 13068
16640   ######..######..   3F3F 16191
16672   ................
16704   ................
(Recall from writing drawPixel() that the bit order on the screen is LSB on the left.)
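If you want to check a row of the table by hand, it can help to turn the picture into a number. Here is a small sketch of that in C rather than Jack, so it can be compiled and run on a PC. The function name pattern_to_word is made up for this example. Character i of the pattern becomes bit i of the word, which is just the LSB-on-the-left rule.

```c
#include <stdint.h>

/* Convert a 16-character row pattern ('#' = pixel on, anything else = off)
   into the screen word that displays it. Character i of the pattern is
   bit i of the word, so the leftmost pixel is the least significant bit. */
uint16_t pattern_to_word(const char *pattern)
{
    uint16_t word = 0;
    for (int bit = 0; bit < 16 && pattern[bit] != '\0'; bit++) {
        if (pattern[bit] == '#') {
            word |= (uint16_t)(1u << bit);
        }
    }
    return word;
}
```

For the first row of the table, pattern_to_word("..##.....####...") returns 0x1E0C: the "1" contributes the low byte 0x0C and the "2" the high byte 0x1E.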
You need to do a position computation similar to the one in drawPixel() to locate the starting word address, and similar ANDing and ORing to set the pixels, but you work in 8-bit groups instead of single bits.  For example, when drawing the "2" you would find that it starts in the upper half-word of 16384, so the computation for its first row would be
pix = font["2"][0]
Screen[16384] = Screen[16384] & 0x00FF    // clear bits in upper half-word
Screen[16384] = Screen[16384] | (pix<<8)  // set bits in upper half-word
"<<" means left shift. The easiest way to do the 8-bit shift in Jack is to multiply by 256.
I suggest that you get your Output.jack working using drawPixel() first, then finish the rest of the OS. Once the OS is working, you can come back to Output and improve its performance.
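For when you do come back to it, here is a rough sketch of the half-word update for a whole character, again in C so it can be tested off the target. Everything named here (draw_char, ram, glyph, the constants) is hypothetical. It assumes the layout shown in the table: 32 words per pixel row, 11 pixel rows per character, and one byte per glyph row with the leftmost pixel in bit 0. Your own font storage may differ.

```c
#include <stdint.h>

#define SCREEN_BASE   16384  /* address of the first screen word */
#define WORDS_PER_ROW 32     /* 512 pixels / 16 pixels per word */
#define CHAR_HEIGHT   11     /* pixel rows per character (assumed) */

/* Stand-in for the Hack RAM so the sketch can run on a PC. */
static uint16_t ram[24576];

/* Draw one character at text position (row, col).
   glyph[i] holds the 8 pixels of pixel row i, leftmost pixel in bit 0. */
void draw_char(int row, int col, const uint8_t glyph[CHAR_HEIGHT])
{
    /* Two characters share each word, so the word column is col / 2. */
    int addr = SCREEN_BASE + row * CHAR_HEIGHT * WORDS_PER_ROW + col / 2;

    for (int i = 0; i < CHAR_HEIGHT; i++) {
        uint16_t word = ram[addr];
        if (col % 2 == 0) {
            /* Even column: keep the upper half-word, replace the lower. */
            word = (uint16_t)((word & 0xFF00) | glyph[i]);
        } else {
            /* Odd column: keep the lower half-word, replace the upper.
               Multiplying by 256 is the 8-bit left shift. */
            word = (uint16_t)((word & 0x00FF) | (glyph[i] * 256));
        }
        ram[addr] = word;
        addr += WORDS_PER_ROW;  /* same word column, next pixel row */
    }
}
```

Jack has no hex constants, so there the masks become 255 for 0x00FF and -256 for 0xFF00.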
--Mark