Questions about the virtual machine


cyclogenesis
This chapter has been the most difficult, not because of the introduction to the stack or the project itself, but because it introduces the idea of the virtual machine. I am unsure about the existence of this virtual machine. Let's start off with some quotes from the book.

The basic idea is as follows: Instead of running on a real platform, the intermediate
code is designed to run on a Virtual Machine. The VM is an abstract computer
that does not exist for real, but can rather be realized on other computer platforms.


The result of this elaborate translation process, known as compilation, will
be yet another text file, containing machine-level code.
Of course machine language is also an abstraction—an agreed upon set of binary
codes.


The definition of an abstract machine according to wiki is:

An abstract machine, also called an abstract computer, is a theoretical computer used for defining a model of computation. An abstract machine can also refer to a microprocessor design which has yet to be (or is not intended to be) implemented as hardware. An abstract machine implemented as a software simulation, or for which an interpreter exists, is called a virtual machine.

My first concern is that when translating assembly into binary, we never talked about the existence of a virtual machine. If intermediate code is meant to run on some abstract machine, why isn't assembly considered an even lower intermediate code that runs on some VM?

I know assembly is a one-to-one translation to binary. Perhaps this is why it doesn't make sense to say that assembly runs on some virtual machine. The binary runs on a real machine and ergo, there isn't a virtual machine.
Let's come back to this.

The first diagram presented in the book is this (Figure 1.1, right before Chapter 1):


This shows a hierarchy of abstractions built upon abstractions.
Looking at the VM block, I will assume the abstract interface is the VM intermediate code itself.
The implementation of this VM code is the abstract interface of assembly code. Perhaps this is an incorrect assumption.

My question is this. The VM code runs on a virtual machine. Are there any other virtual machines or abstract machines in this hierarchy?

If we had the microprocessor design in front of us (which, according to the wiki, is an abstract machine),
can't this be used as the abstract interface of the assembly code/machine code?

I want to say that assembly/machine code can run on an abstract machine (the microprocessor architecture),
but that it is not a virtual machine because it isn't "implemented as a software simulation". It is implemented with real hardware. Please correct me if I am wrong.

Let's skip to the other end of the abstraction diagram.
High-level code is compiled into VM code.
If VM code is meant to run on some abstract machine that's implemented with software simulation (a VM),
does high level code run on some virtual machine as well?

Thanks for your time.

Re: Questions about the virtual machine

WBahn
Administrator
You are trying to take one definition you found somewhere on the internet and force every use of those terms to have to conform to that definition. There are strong similarities here, but not exact matches.

At the end of the day you have a Jack program that gets compiled down to Hack machine code and that code is run on a piece of hardware (or in the case here, using a simulator that behaves suitably like the hardware). Whether you are using physical hardware or running it in a simulator, it is reasonable to say that the machine code is running on the Hack CPU hardware architecture. The architecture is just the set of supported instructions, each of which results in a well-defined action when executed.

Now imagine that we design an entirely different CPU whose architecture supports the set of VM commands. We could then run a .vm file directly on this architecture. No need to go any further.

But we don't actually have a different CPU that supports the VM language. So instead we create an assembly language program that takes each VM command and creates assembly code that implements the required behavior of that command. Hence we have embedded a simulator for the VM CPU into our assembly language program. That is what is meant, in this context, by a virtual machine.
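The translation step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's reference translator: the function names `translate` and `translate_push_constant` are made up here, and only the single command `push constant n` is covered. The assembly it emits is the same six-instruction sequence quoted later in this thread.

```python
def translate_push_constant(value: int) -> list[str]:
    """Return Hack assembly that implements the VM command `push constant <value>`."""
    if not 0 <= value <= 32767:
        raise ValueError("a Hack A-instruction can only load a 15-bit constant")
    return [
        f"@{value}",  # A = value
        "D=A",        # D = value
        "@SP",        # A = 0, the address that holds the stack pointer
        "M=M+1",      # SP = SP + 1
        "A=M-1",      # A = old SP, i.e. the slot being filled
        "M=D",        # RAM[old SP] = value
    ]


def translate(vm_line: str) -> list[str]:
    """Translate one VM command; this sketch knows only `push constant n`."""
    parts = vm_line.split()
    if len(parts) == 3 and parts[:2] == ["push", "constant"]:
        return translate_push_constant(int(parts[2]))
    raise NotImplementedError(f"not covered by this sketch: {vm_line!r}")


print("\n".join(translate("push constant 3")))
```

Note that nothing in the output "reads" VM code at run time. The behavior of the VM command has been baked into the emitted assembly, which is the sense of "embedded simulator" used in the post.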


Re: Questions about the virtual machine

cyclogenesis
Thanks for the reply!

"You are trying to take one definition you found somewhere on the internet and force every use of those terms to have to conform to that definition. There are strong similarities here, but not exact matches."

Of course. That is the point of definitions: to give exact (as can be) meaning to terms. The book didn't really define the term "abstract machine" and maybe thought it was obvious just by reading it. I know that definitions can be fluid among disciplines and perhaps not even well defined within a field, though. I have to anchor my understanding somewhere, and terminology seems like a great place to start.

"But we don't actually have a different CPU that supports the VM language."
This is where the "virtual" part comes into play.
But then I might ask whether we could have a different CPU that supports the VM language.
What I mean is, when you say we don't actually have a different CPU that supports the VM language,
does that mean that we don't currently have one in front of us but that one is certainly possible, or that it is impossible to realize such a CPU?

Let's assume we didn't know about the VM language. We only know about the high level Jack and the Hack instructions. Can we change your sentence to say this:

"Now imagine that we design an entirely different CPU whose architecture supports the set of Jack commands. We could then run a .jack file directly on this architecture. No need to go any further."

Can we embed a simulator into our assembly program to run the Jack CPU?
This might sound funny because we never think of high-level code running directly on a computer, but if intermediate code runs on a (virtual) machine, why would high-level code not do the same?

Thanks for your time.

Re: Questions about the virtual machine

WBahn
Administrator
There is nothing that would prevent someone from designing and building hardware that directly supports the VM command set. I doubt anyone has done it, but it would be possible. If someone were to do it, there are two paths. The first, and by far the simplest, would be to implement it using microcode and run it on top of a traditional architecture. This would, in reality, be properly described as a hardware emulator. The other approach would be to design it directly into silicon. This has some hurdles because one of the reasons that we have the VM instruction set is that it is a better match for human-level thinking and not a good match for direct implementation in hardware -- if it were, we would be using CPUs that operate with that as their instruction set.

Regardless of whether such a CPU does or could exist, what we are still doing would still be using a virtual machine. It is just like when someone uses VMware or other VM software to run software targeted for one architecture on a different one. Examples of this are the myriad programs that allow you to run code developed for old arcade game consoles directly on a PC without modification. The processors for the consoles most definitely existed, and in some cases are still in production or at least available on the used aftermarket, but when you run that code on your PC you are using a virtual machine.

Another example is when you run a Java program or a .NET program. You are using the JVM (Java Virtual Machine) or the CLR (Common Language Runtime), but this VM is a bit different. Here you are running the actual VM code on a general-purpose VM implementation and not embedding the code that is needed to get the desired behavior into the program itself. The reason for this is specifically to divorce the compilation of the high-level language from the underlying hardware. Instead, you target the virtual machine, and then every hardware platform has a different VM that can execute the same VM code but does so using the capabilities of the hardware it is running on.

As for making a CPU that runs Jack code directly: yes, it is theoretically possible. I imagine people have considered and tinkered with the idea. I'm not aware of any serious effort to do so, however. Having said that, many processors were moving down the CISC path with the aim of implementing CPU instructions that had very direct mappings to high-level language statements and constructs. The goal was to take very commonly used coding blocks and make them extremely fast and efficient in hardware. But that is far easier said than done, and few compilers even attempt to use those instructions.

Re: Questions about the virtual machine

cyclogenesis
Apologies if these are obvious or stupid questions. Someone told me in the other thread that I am reading into this too much. Perhaps, but someone had to come up with the idea of a virtual machine, and I feel that the concept is important. Sometimes people get things quickly and sometimes they don't. This is one of those times they don't.

"Regardless of whether such a CPU does or could exist, what we are still doing would still be using a virtual machine."

Okay. I can kind of accept this. But I will admit it is hard for me to imagine a real CPU that would execute "push constant 3" much differently than how we emulate it below. Perhaps this is where my understanding is lacking. Maybe this is the crux of the whole thing.

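@3       // A = 3
D=A      // D = 3
@SP      // A = 0, the address that holds the stack pointer
M=M+1    // SP = SP + 1
A=M-1    // A = old SP, the slot being filled
M=D      // RAM[old SP] = 3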
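For anyone who wants to trace those six instructions rather than take them on faith, here is a rough Python sketch that steps through them against a toy RAM. The `run` function and its small computation table are invented for this illustration and handle only the four computations used above (no jumps, no 16-bit wraparound). The starting value SP = 256 follows the course's convention for the base of the stack.

```python
def run(program: list[str], ram: dict[int, int]) -> tuple[int, int]:
    """Execute straight-line Hack assembly; return the final (A, D) registers."""
    a = d = 0
    symbols = {"SP": 0}  # SP is the predefined name for RAM[0]
    for line in program:
        if line.startswith("@"):  # A-instruction: load an address or constant
            token = line[1:]
            a = symbols[token] if token in symbols else int(token)
            continue
        dest, comp = line.split("=")  # C-instruction of the form dest=comp
        m = ram.get(a, 0)             # M always means RAM[A]
        value = {"A": a, "D": d, "M": m, "M+1": m + 1, "M-1": m - 1}[comp]
        if "M" in dest:
            ram[a] = value  # written at the address A held before this instruction
        if "D" in dest:
            d = value
        if "A" in dest:
            a = value
    return a, d


ram = {0: 256}  # SP starts at 256
a, d = run(["@3", "D=A", "@SP", "M=M+1", "A=M-1", "M=D"], ram)
print(ram)  # the stack slot RAM[256] now holds 3 and SP has advanced to 257
```

The net effect on memory is exactly what the VM command promises: one value appears on top of the stack and the stack pointer moves up by one.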

If the above is considered emulating (as opposed to pure translation?) the VM command push constant 3, then in the same sense, a flip-flop, which is specified by the interface at the top of the photo, can be emulated using the logic gate diagram below it:


What I mean by this is, just like the assembly produced for the statement "push constant 3" is unique for systems with different ISAs, the data flip-flop gate can be implemented (emulated?) in different ways. This is not the only way to structure logic gates to effect the DFF specification. You can see how I am trying to roll back to the levels-of-abstraction diagram I posted in my first post.

I believe I understand your high-level view about emulating games. PS1 games are meant to run on specific hardware (namely the PS1 architecture), so how can I play them on the PC? We translate the PS1 code into code my PC can run, and this is the "emulation". I am copying the way the PS1 carries out its commands in my own native tongue (x86).

So what is emulation then? What is the opposite of it? Again, sorry to delve into semantics but perhaps this is the key. Here is what I want to say.

This hack machine command:
1111 1101 1101 1000 effects MD=M+1. There is no emulation here. This is the only language the hack system truly understands.
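As a sanity check on that bit pattern, here is a small Python sketch that decodes a 16-bit Hack instruction back into its mnemonic. The field layout (three leading 1s, the a-bit plus six comp bits, three dest bits, three jump bits) is the one from the Hack specification; the `COMP` table below is deliberately partial, and the `decode` function is just an illustrative helper, not part of any course tool.

```python
# Partial comp table, keyed by the a-bit followed by the six c-bits.
COMP = {
    "0101010": "0",
    "0001100": "D",
    "0110000": "A",   "1110000": "M",
    "0110111": "A+1", "1110111": "M+1",
    "0110010": "A-1", "1110010": "M-1",
    "0011111": "D+1",
    "0000010": "D+A", "1000010": "D+M",
}
DEST = ["", "M", "D", "MD", "A", "AM", "AD", "AMD"]           # indexed by ddd
JUMP = ["", "JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"]  # indexed by jjj


def decode(word: str) -> str:
    """Turn 16 binary digits (spaces allowed) into Hack assembly text."""
    bits = word.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 16 binary digits")
    if bits[0] == "0":  # A-instruction: 0 followed by a 15-bit value
        return "@" + str(int(bits[1:], 2))
    comp = COMP[bits[3:10]]  # KeyError means the entry is missing from this partial table
    dest = DEST[int(bits[10:13], 2)]
    jump = JUMP[int(bits[13:16], 2)]
    text = f"{dest}={comp}" if dest else comp
    return f"{text};{jump}" if jump else text


print(decode("1111 1101 1101 1000"))  # MD=M+1
```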

Similarly, this next command, from a non-Hack machine (secretly a VM command):
push argument 12 effects this (pretend there is an argument segment nearby in the diagram):


There is no emulation here. This is the only language of this machine (again, let's assume we don't know it's a VM).

But, because we wish to run this command on the hack platform, we must emulate it.
This then changes the definition of the previous non-Hack machine to a virtual machine rather than just a normal machine like the Hack system.
If this is true, then I understand virtual machines.
But, like I mentioned in the previous post, if this is true, then it can be extended to the level of Jack as well.

For example take the C++ command:

int x = 3; which effects whatever the C++ standard specifically says it should. Basically, bringing into existence the integer 3 which can be referred to by the identifier x. This is the only language of this C++ machine. There is no emulation here.

But, since we don't have a C++ machine and only a Hack machine, we must emulate it.
The C++ machine is a virtual machine too then, just like the VM code mentioned earlier.
The C++ code will have to be emulated using the Hack ISA (probably a brutal task).

I really really hope this is how it is.
Otherwise, it really makes no sense why the idea of a virtual machine only exists at the intermediate code level rather than any other level above or below it.

Once again, thanks for your time.

Re: Questions about the virtual machine

WBahn
Administrator
One thing to keep in mind is that what something is called often depends on the context and purpose of the discussion. That's the nature of the beast with nearly any abstraction, particularly if it is multi-layered.

When trying to communicate the notion of a complex computer hardware and software system being visualized at several different, largely independent levels, it makes perfect sense to talk about the compiler producing VM code and that VM code running on a machine that runs VM code -- a VM machine.

The fact that we know that we are not, in fact, using such a machine makes it perfectly reasonable to talk about the code running on a virtual machine, but we should just say that it is running on a VM machine because we shouldn't know that it isn't real -- it's on the other side of the wall that separates the input to the VM machine from what the VM machine does with that code. We can talk about pushing and popping the stack. We can talk about this memory segment or that memory segment. We don't need to know about how the stack or the memory segments are implemented. That, and other such details, are the responsibility of this black box we call the VM machine, and maybe it's implemented in hardware, maybe it's implemented in software, maybe it's a bunch of guys with punch cards who can write really fast. We don't know and we don't care.

For those same reasons, when talking at that level we should never mention stack pointers or heaps or anything else related to the details of how a particular VM machine functions under the hood. If we want to talk about those things, then we need to move to the other side of the VM wall, where we are talking about how to implement (i.e., emulate, or "reproduce the function or action of") the behavior of the VM machine using a particular architecture at the next step down (in this case, the Hack assembly language architecture). At that next lower level we don't talk about parts of the instruction word or ALUs or instruction decoding or jump logic -- those belong on the other side of yet another, lower wall between the assembler and the processor. Here we talk about the 28 defined instructions, the A and D registers, the RAM, and to a lesser degree the PC. We shouldn't refer to "the output of the ALU"; we should talk about the result of the opcode being executed. But we do talk about the output of the ALU because humans are good at being sloppy with these boundaries and still getting the point across -- in fact, the sloppiness usually facilitates that communication.

So whether it is reasonable to talk about the execution of a bunch of instructions as just a bunch of instructions being executed versus being part of the virtual machine depends on the nature of the conversation that sets the context of the discussion.

Re: Questions about the virtual machine

cyclogenesis
"One thing to keep in mind is that what something is called often depends on the context and purpose of the discussion."

That's reasonable. When you look up the wiki on virtual machines, you'll see they attempt to remove ambiguity by specifying two different types of virtual machines:
System virtual machines and Process virtual machines

"The fact that we know that we are not, in fact, using such a machine makes it perfectly reasonable to talk about the code running on a virtual machine, but we should just say that it is running on a VM machine because we shouldn't know that it isn't real"

Can you rephrase this? Specifically "the code running on a virtual machine, but we should just say that it is running on a VM machine because we shouldn't know that it isn't real".
It sounds like you are saying that because we are dealing with an imaginary machine, we should just call it a virtual machine. In which case, I agree.

I understand the idea of abstractions and hiding away implementation details in a "black box", then layering these abstractions on top of each other.
What I don't understand is which abstraction layers can be considered a VM and which cannot. I apologize if this makes you repeat yourself.

Please answer this.
If I am compiling Jack to an IR (the VM language specified in the book) and that is further compiled into assembly, there is a virtual machine. This is established.

If I am compiling Jack straight to Hack assembly, there isn't a virtual machine, correct?
If there is no virtual machine when compiling Jack straight into Hack, then that tells me a virtual machine is an abstract machine that runs intermediate code and only intermediate code.





Re: Questions about the virtual machine

WBahn
Administrator
What I'm saying is that the person that has a VM file that they want to execute should, in general, not know whether (1) that file is being run directly on a piece of hardware that implements the VM command set architecture, or (2) that file is being run on a software emulator that is emulating the machine, or (3) that file's instructions are being translated into some other program in some other language and it is that program that is being run on a completely different hardware architecture. In that last case, there is no separation between the virtual machine and the program being run -- they have been melded into one executable.

More to the point, the person writing the compiler that compiles the Jack down to VM code has no idea which of these three will eventually be used. So the thing that makes the most sense for them to do is to talk and think as if the VM code they produce is going to be running on a VM machine -- not a virtual VM machine, but just a VM machine. Maybe it's virtual, maybe it's not; they neither know nor care.

Consider someone writing a compiler to take a program and produce code that will run on an Android smartphone. They work from the mindset that the output of their program will be running on an Android smartphone. But, in fact, there's a good chance that, at some point, that program will be run on a virtual machine that is running on a PC, Linux, or Mac, particularly while it is being developed. But, again, the person writing the compiler can't tell the difference.

But, in reality, we know that the output of a Java compiler (known as Java Byte Code) or a .NET compiler (known as Microsoft Intermediate Language) is not going to be run directly on hardware, but rather on a virtual machine (the JVM for Java and the CLR for .NET), and therefore we don't even pretend that it might not be and we just call it a virtual machine.

As for your last question, the answer depends on the bounds of what you mean by a virtual machine. I know that's not very satisfying, but the meaning is highly context dependent and the very fact that we are talking about abstract concepts makes this somewhat inevitable.

Consider the VM Emulator. When you load your .vm files into it and it runs them, it does NOT translate them to assembly language. It maintains a set of data structures that are the same things that a hardware implementation of a VM CPU would have to maintain, and it interacts with them directly in the way specified by the VM language when it executes each command. The only things that get pushed and popped to/from the stack are what the commands explicitly do. There's no such thing as a stack frame and the stack isn't used by the function calling process to pass information or store context information.
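To make the "interprets directly, never translates" idea concrete, here is a toy Python interpreter for a handful of VM commands. It is only a sketch of the approach and makes no claim about how the course's actual VM Emulator is written: the class name `TinyVM`, the fixed segment sizes, and the choice of supported commands are all assumptions made for this illustration.

```python
class TinyVM:
    """Interpret a few VM commands against plain Python data structures."""

    def __init__(self, argument=None):
        self.stack = []
        # Segments are ordinary lists; no Hack RAM, SP, or ARG pointer exists here.
        self.segments = {"argument": list(argument or []), "local": [0] * 8}

    def execute(self, command: str) -> None:
        parts = command.split()
        op = parts[0]
        if op == "push":
            segment, index = parts[1], int(parts[2])
            if segment == "constant":
                self.stack.append(index)
            else:
                self.stack.append(self.segments[segment][index])
        elif op == "pop":
            segment, index = parts[1], int(parts[2])
            self.segments[segment][index] = self.stack.pop()
        elif op == "add":
            y = self.stack.pop()
            x = self.stack.pop()
            self.stack.append((x + y) & 0xFFFF)  # keep results within 16 bits
        else:
            raise NotImplementedError(f"not covered by this sketch: {command!r}")


vm = TinyVM(argument=range(100, 113))  # argument[12] is 112
for line in ["push constant 3", "push argument 12", "add", "pop local 0"]:
    vm.execute(line)
print(vm.segments["local"][0])  # 115
```

Each command is looked up and carried out at run time by a separate program, which is the opposite of the VM Translator approach, where the commands disappear and only their Hack-level consequences remain.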

But that VM Emulator is probably running on a PC or a Linux system and is ultimately being executed as native instructions on, most likely, an x86 processor (or some variant). If the VM Emulator is written in Java (which it is), then there is the JVM running Java Byte Code. To make matters even more interesting, modern x86 architectures actually don't execute x86 instructions at the hardware level. Instead, they translate them to an internal representation that runs on a RISC processor buried deep within the CPU, because that was found to be more scalable, but they couldn't go to it directly and, instead, had to maintain the same instruction set as far as the person outside the processor is concerned. So there are multiple virtual machines at play here.

But the Hack VM machine that we are talking about when we talk about the VM Translator is not like any of these. It is NOT a separate piece of software that is reading VM code and making things happen. If that were the case, we would simply load a VM emulator program (which would be in Hack machine code, obviously) into ROM and the VM code somewhere (maybe ROM, maybe RAM) and then run the VM emulator and let it start executing the VM code. I don't think you would have any problem saying that there was a virtual machine being used in this case, right? Instead, we translate each VM instruction in the VM program into a sequence of Hack machine code instructions that result in the same behavior that the VM emulator would perform. So are we using an actual VM or not? Depends on what you mean by a virtual machine and what the focus of the discussion is.

When we talk about the virtual machine in the context of this course, we are talking about a truly abstract concept -- we compile our programs using a theoretical model of computing that is stack based. Any time we talk about passing arguments on the stack, or local variables being stored on the stack, or all functions within a class sharing a common set of static variables, we are NOT talking about anything running at the level of the Hack CPU because the Hack CPU doesn't have a stack, anywhere, of any kind. If our program runs that way it is because we are adhering to the framework that our program is running on a machine that implements this stack-oriented theoretical model of computing. But there is NO requirement that a Jack compiler use a stack-based model of computation and, if it doesn't, then there may truly be no notion of a virtual machine at play. But if it does use that model, even if it is compiling directly from Jack to Hack, then there would still be that notion of a virtual machine.
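One way to see what "compiling against a stack-based model" means is a tiny expression compiler. The Python sketch below is illustrative only: the tuple representation of expressions and the function `compile_expr` are invented here, and a real Jack compiler must also handle variables, calls, and much more. It emits VM commands in postfix order, so that each operator finds its operands already on top of the stack.

```python
def compile_expr(node) -> list[str]:
    """Compile an int or an (op, left, right) tuple into VM stack commands."""
    if isinstance(node, int):
        return [f"push constant {node}"]
    op, left, right = node
    vm_op = {"+": "add", "-": "sub"}[op]
    # Operands first, operator last: the stack model needs no named temporaries.
    return compile_expr(left) + compile_expr(right) + [vm_op]


# 1 + (5 - 2)
for line in compile_expr(("+", 1, ("-", 5, 2))):
    print(line)
```

Nothing in this output mentions registers, RAM addresses, or a stack pointer. The compiler reasons entirely in terms of the abstract stack machine, whether that machine ends up being hardware, an emulator, or a translation into Hack assembly.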