FPGA BRAM: Which Operating Mode for HACK RAM?


FPGA BRAM: Which Operating Mode for HACK RAM?

gjd02
When setting up Block RAM (BRAM) in an FPGA, I noticed there are three operating modes: WRITE_FIRST, READ_FIRST, and NO_CHANGE. Below I have listed their descriptions from the Synplify Pro synthesis tool user guide. Which is the correct setting for the RAM in the HACK computer? And how would an assembly instruction like M=M+1 work in terms of read and write conflicts? That instruction translates to RAM[A] = RAM[A]+1, where A is the value in the A register. In other words, we are reading from a certain address and then writing to that same address.

When write enable (WE) is active ...

WRITE_FIRST:
This is a transparent mode, and the input data is simultaneously written into memory and stored in the RAM data output (DO). DO uses the value of the RAM data input (DI).

READ_FIRST:
This mode is read before write. The data previously stored at the write address appears at the RAM data output (DO) first, and then the RAM input data is stored in memory. DO uses the value of the memory content.

NO_CHANGE:
RAM data output (DO) remains the same during a write operation, with DO containing the last read data.
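
For concreteness, here is roughly how those three modes map onto a single-port RAM template (a minimal Verilog sketch based on common inference templates; the module and signal names are mine, not from the Synplify guide):

// Single-port RAM; MODE selects the read-during-write behaviour.
module ram_modes #(parameter MODE = "WRITE_FIRST") (
    input  wire        clk, we,
    input  wire [14:0] addr,
    input  wire [15:0] din,
    output reg  [15:0] dout
);
    reg [15:0] mem [0:32767];
    generate
        if (MODE == "WRITE_FIRST") begin
            always @(posedge clk)
                if (we) begin
                    mem[addr] <= din;
                    dout      <= din;       // DO shows the new input data
                end else
                    dout <= mem[addr];
        end else if (MODE == "READ_FIRST") begin
            always @(posedge clk) begin
                if (we) mem[addr] <= din;   // nonblocking: the old content
                dout <= mem[addr];          // is read out before the write
            end
        end else begin                      // NO_CHANGE
            always @(posedge clk)
                if (we) mem[addr] <= din;   // DO keeps the last read value
                else    dout <= mem[addr];
        end
    endgenerate
endmodule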

Any help would be appreciated!

Re: FPGA BRAM: Which Operating Mode for HACK RAM?

dolomiti7
Short answer: I would generally say WRITE_FIRST is the correct choice for your FPGA, since it is the closest representation of the nand2tetris memory.

More specifically, you have to study the timing diagrams of your FPGA's block RAM. The memory implementation of nand2tetris uses a purely combinatorial addressing/memory-cell-selection circuit. This means that as soon as the address changes, the memory immediately provides the data of the newly addressed cell, regardless of the clock. Only the writing of new data to a cell is sequential and occurs on the next clock edge. A typical block RAM in an FPGA will usually differ from this behaviour by at least registering the address: the address has to be clocked into the block RAM's address register, and the data read from memory only becomes available in the next cycle.
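
As a rough illustration of that difference (hypothetical Verilog; the module and signal names are mine):

// nand2tetris-style RAM: combinational read, only the write is clocked.
// A change on addr is visible on dout within the same cycle.
module ram_async (
    input  wire        clk, we,
    input  wire [14:0] addr,
    input  wire [15:0] din,
    output wire [15:0] dout
);
    reg [15:0] mem [0:32767];
    always @(posedge clk)
        if (we) mem[addr] <= din;
    assign dout = mem[addr];       // asynchronous read
endmodule

// Typical block RAM: the address is registered, so the data for a
// new address only appears on dout in the following cycle.
module ram_sync (
    input  wire        clk, we,
    input  wire [14:0] addr,
    input  wire [15:0] din,
    output reg  [15:0] dout
);
    reg [15:0] mem [0:32767];
    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        dout <= mem[addr];         // synchronous read, one-cycle latency
    end
endmodule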

That is especially relevant because of an oddity in the n2t instruction set, which allows reading/writing the same cell AND changing the address for the next memory instruction at the same time:
AM=M+1

Coming back to your question:
Consecutive M=M+1 commands all refer to the same address, and you want the new value written to memory to be available immediately as M for the next instruction. This is what the WRITE_FIRST mode of your FPGA seems to provide.

More critical would be something like this:
AM=M+1
M=M+1

Nand2tetris takes advantage of the combinatorial memory because the data at the new A is provided "in between" the cycles. So effectively, n2t can read AND write a value in one cycle.

I am pretty sure that your block RAM cannot do all of this within one cycle:
- read from old A
- combinatorial update ALU
- write to old A
- read from new A
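
To make the four steps above concrete, here is the cycle-by-cycle picture (hypothetical timing, assuming a BRAM with a registered, one-cycle read):

cycle t   : AM=M+1 reads RAM[old A]; the ALU computes RAM[old A]+1
edge t+1  : RAM[old A]+1 is written back, the A register loads the new
            address, and the combinatorial n2t memory immediately starts
            presenting RAM[new A]
cycle t+1 : on the n2t memory, M=M+1 already sees RAM[new A]; on a BRAM
            with a registered address, RAM[new A] only arrives at t+2,
            so the second instruction would read stale data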

Some block RAMs can be configured as dual port. With such a configuration you could manage two memory accesses in parallel. However, that leaves you with the issue that the VRAM portion has to be read in parallel by the video controller, which typically blocks one port (not a problem in practice, since AM=M... commands rarely target VRAM addresses). In any case, that is why it is so important to study the timing diagrams of your specific FPGA.

To summarise: to deal with parallel read/write operations and address changes, you need either a fully combinatorial memory decoding circuit that matches nand2tetris 100% (it is unlikely that your block RAM can do that), or wait states that give the block RAM time to write the data and update its address register, or (most complicated) a write-back cache circuit.
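
As a sketch of the wait-state option (hypothetical Verilog; writes_m and writes_a are decode signals I made up for illustration, and whether a single wait state suffices depends on your design):

// Insert one wait state whenever an instruction both writes M and
// loads A, giving the BRAM's registered address/data time to catch up.
module wait_state_ctrl (
    input  wire clk,
    input  wire writes_m,   // instruction writes RAM[A]
    input  wire writes_a,   // instruction also loads the A register
    output wire pc_enable   // freeze the PC while stalled
);
    reg stall = 1'b0;
    always @(posedge clk)
        stall <= !stall && writes_m && writes_a;
    assign pc_enable = !stall;
endmodule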

Re: FPGA BRAM: Which Operating Mode for HACK RAM?

gcpd
In the HACK specification, all registers and memory accesses are implied to be clocked, since the RAM modules are derived from the Register part, e.g. from Register.hdl:

/**
 * 16-bit register:
 * If load[t] == 1 then out[t+1] = in[t]
 * else out does not change
 */

A possible point of confusion is that the ALU circuit is combinatorial, but the value of M being expressed in the ALU is from no later than [t-1], because the A register can't have been updated any later than [t-1].

So in an instruction like AM=M+1, the value of M expressed in the ALU won't be from any later than whatever was written to RAM[A] in the previous cycle; the new value will be expressed in the A register and the current RAM[A] no earlier than [t+1].

The HACK ISA is specified with READ_FIRST behaviour (all RHS values are evaluated, then written simultaneously), so if the BRAM has a 1-cycle read/write latency it will work without any other changes.
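
In RTL terms, that is just the read-first template applied to the HACK RAM (a sketch of my understanding, matching out[t+1] = in[t] from Register.hdl; the module name is mine):

module hack_ram (
    input  wire        clk, load,
    input  wire [14:0] address,
    input  wire [15:0] in,
    output reg  [15:0] out
);
    reg [15:0] mem [0:32767];
    always @(posedge clk) begin
        if (load) mem[address] <= in; // new value lands at [t+1]
        out <= mem[address];          // out[t+1] is the pre-write content
    end
endmodule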

I'm an FPGA novice myself, but I have gotten as far as running HACK programs on real hardware to test some of these assertions for my own edification, and so far it all seems to work when implemented as described, using READ_FIRST mode and synchronized read/write for the BRAM.

I've seen at least one other project attempt to treat reads as combinatorial, but this becomes unstable when operating at a clock speed close to the read/write latency of the BRAM (i.e. you mask the problem by slowing down the clock, but I feel that just proves the implementation is not actually correct).

Happy to be corrected if I've misunderstood something but that is how it seems to me so far.

Re: FPGA BRAM: Which Operating Mode for HACK RAM?

gcpd
> I've seen at least one other project attempt to treat reads as being combinatorial but this becomes unstable if operating at a clock speed close to the read/write latency of the BRAM

Never mind, the timing became unstable for other reasons. I guess it works even without the read being synchronized, which feels wrong, but if it works, it works.

Re: FPGA BRAM: Which Operating Mode for HACK RAM?

WBahn
Administrator
In reply to this post by gcpd
In general, you always have the issue of needing to ensure that the combinatorial processing of signals from sequential elements has enough time to settle before the results are written into sequential elements. This is what sets the maximum clock frequency that your system can operate at.
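
In rough numbers, f_max ≈ 1 / (t_clk-to-Q + t_comb + t_setup), where t_comb is the longest combinatorial path between registers. For example (illustrative figures, not from any datasheet), with 1 ns of clock-to-Q delay, 17 ns of logic, and 2 ns of setup time, the clock cannot exceed 1/(20 ns) = 50 MHz.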

If you are willing and able to accept a relatively slow clock, then you can use the basic architectural approach that the Hack uses, which is generally referred to as a single-cycle processor. It's simple, reliable, cheap, and slow. To get more speed, you can break the processing of instructions up into defined steps and then use multiple clock cycles to process a single instruction. This is known as pipelining. A common approach is a five-stage pipeline with FETCH, DECODE, EXECUTE, MEMORY ACCESS, and WRITE BACK stages. The downside is that you need registers between the stages to walk the data and control signals from one stage to the next, which adds a lot of complexity. The upside is that you can have several instructions working their way through the stages at the same time. Since each stage has a shorter propagation delay, you can run your clock faster, perhaps three to four times faster. Of course, it now takes five clock cycles to execute an instruction, so there is a penalty there. But once the pipeline is filled, a new instruction finishes on every clock cycle (unless something, like a branch, disrupts it). So your program ends up running three to four times faster overall. This is the key concept that drove the development of the RISC-type architectures.
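
A back-of-the-envelope example with made-up numbers: if the single-cycle design needs 20 ns to settle (50 MHz), splitting it into five roughly equal stages of about 5 ns each (the stage registers add some overhead) allows a clock near 200 MHz. A straight-line run of 1,000 instructions then takes 1,000 + 4 = 1,004 cycles, about 5 µs, versus 20 µs for the single-cycle version, approaching the four-fold speedup once the pipeline is full.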

Lots of other issues come up in the design of real computers that are trying to get as much processing power as possible, including dealing with the fact that the speed the CPU can run at is significantly faster than what today's memory systems can keep up with. So we've developed all kinds of tricks, such as memory cache systems, to mitigate this.