|
|
I'm trying to understand how a first compiler could be written.
I'm still confused after reading about the topic, including what is said about it in "Perspectives".
If the very first compiler is written on the very first computer, then it would be written in machine language (only 0's and 1's).
But still, for that first compiler, how would I feed it "asm" code like "@100" and "D=M"?
It still only understands 0's and 1's...
Is it because the first compiler would then understand the binary representation of the "@100"/"D=M" sequence of characters?
On Wikipedia: "self-compiling compiler — that is, compiler (or assembler) written in the source programming language that it intends to compile".
Does "written in the source programming language" mean written in binary?
With this Hack computer, we always have to use the host computer to translate the assembly commands into the machine language (binary) that the Hack computer understands. If we always have to rely on the host computer for that translation, can I say that the Hack computer is incomplete, because it depends on my host computer? Is this where the Operating System comes into play?
|
Administrator
|
The earliest computers were hard wired to solve a single problem. To solve a different problem, the wiring had to be changed.
Then came stored-program computers, which had a way to enter program codes (in machine language) into memory. The earliest papers I've read about stored-program computers include mnemonics alongside the machine language, so the programming process was probably to write the programs on paper using the mnemonics and hand-translate them into machine language once you were satisfied with your program.
Someone, probably many different people in different computer projects, realized that if the computer could process data, it could process mnemonics into machine language, and simple assemblers were hand coded. The simple assemblers were used to write more capable assemblers.
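To make "process mnemonics into machine language" concrete, here is a tiny sketch of my own (in Python, not anything historical) that handles just the two instructions from your question, "@100" and "D=M":

def assemble(line):
    line = line.strip()
    if line.startswith("@"):                  # A-instruction, e.g. "@100"
        return format(int(line[1:]), "016b")  # the value as a 16-bit binary word
    if line == "D=M":                         # one hard-coded C-instruction
        return "1111110000010000"
    raise ValueError("this toy assembler doesn't know: " + line)

for source in ["@100", "D=M"]:
    print(source, "->", assemble(source))    # @100 -> 0000000001100100, D=M -> 1111110000010000

An assembler is just an ordinary program that reads text and writes bits; the hand-coded first one simply had to be small enough to key in as raw machine code.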
Assembly language would have been used to write early compilers. Better compilers and compilers for new languages and computers were written with earlier compilers.
Once you have a compiler for a language, you can use it to write a compiler that can compile itself. That's the self-compiling compiler.
For an example of a Jack Compiler written in Jack, see this forum post.
|
|
OK.
So, about "Once you have a compiler for a language, you can use it to write a compiler that can compile itself. That's the self-compiling compiler. " :
I have Python assembler that changes Hack assembly code to Hack machine langauge (binary).
Is that a "compiler for a language" ? A python compiler for the language "Hack Assembly" (like "@100" and "D=M" ).
You say I can use this python compiler to write an Hack Assembler that can compile itself ?
And that by compile itself, you mean that I can feed instructions like "@100" and "D=M" directly inside the new "Hack Assembler" ?
Still, the only way to send "@100" and "D=M" to the "Hack Assembler", is by sending the binary representation of those strings to the "Hack Assembler". Right ?
And why "Once you have a compiler for a language" ? Why do I first need the python compiler to write the self-hosting compiler using python ?
|
Administrator
|
Theoretically, given your Hack Assembler in Python you could write a Hack Assembler in Hack. You will need some way to read the .asm files and write the .hack files. This is identical in concept to my Jack Compiler written in Jack.
It would be a lot of work to write and debug an assembly language program that large. I did write a Hack Assembler in Jack that uses the same File System built-in OS class that I wrote for the compiler, but it ran out of RAM assembling Pong. The symbol table is too big to fit in 16K!
You won't be able to run the generated Assembler.hack files on the CPUEmulator, however, because the CPUEmulator has no provisions for simulating new hardware. There is no way that it could access the PC's disk the way my File System built-in class can do in the VMEmulator.
You might better understand the concept of self-compiling by thinking about writing a [subset of] Python interpreter in Python. As long as your interpreter is written in the subset you implement, it could run a copy of itself:
$ python mypy.py
would start your interpreter in interactive mode.
$ python mypy.py mypy.py
would start your interpreter in script mode and your interpreter would then load a copy of mypy.py and run that copy in interactive mode.
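A toy sketch of what mypy.py could look like (my own, and it cheats: it leans on Python's built-in exec() rather than parsing the subset itself, which is enough to show the self-hosting shape but skips all the real interpreter work):

import sys

def interactive(ns):
    # Minimal REPL: read one statement per line and execute it.
    sys.stdout.write(">>> ")
    sys.stdout.flush()
    for line in sys.stdin:
        try:
            exec(line, ns)
        except Exception as e:           # keep the toy REPL alive after errors
            print("error:", e)
        sys.stdout.write(">>> ")
        sys.stdout.flush()

def run(path, ns):
    # Script mode: execute a whole file -- possibly a copy of mypy.py itself.
    with open(path) as src:
        exec(src.read(), ns)

if __name__ == "__main__":
    ns = {"__name__": "__mypy__"}                # a loaded copy skips its own __main__ block
    if len(sys.argv) > 1:
        run(sys.argv[1], ns)                     # python mypy.py mypy.py loads a copy of this file
        ns.get("interactive", interactive)(ns)   # then hands control to the copy's REPL
    else:
        interactive(ns)

The second command line then behaves roughly as described: the outer interpreter executes the inner copy's source and ends up inside the inner copy's interactive loop.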
The only reason you want a high-level language compiler to write another high-level language compiler is that the work is Herculean if you try to do it in assembly language.
The Hack assembler written in Hack would get its source as characters from a hardware peripheral, just like reading the keyboard.
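On that point, and on your earlier question about sending "@100" as binary: the characters are already numbers by the time any program sees them. A quick Python illustration (printable characters in the Hack character set use the same codes as ASCII):

line = "@100"
print([ord(c) for c in line])   # [64, 49, 48, 48] -- the numbers the assembler actually receives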
|
|
Thanks a lot !
The first compiler is now much clearer in my head; it no longer feels like a chicken-and-egg dilemma.
|
|
cadet1620 wrote
Theoretically, given your Hack Assembler in Python you could write a Hack Assembler in Hack. You will need some way to read the .asm files and write the .hack files. This is identical in concept to my Jack Compiler written in Jack.
Hi :)
I am also struggling with the idea of a self-compiling compiler. This post may have shed some light on it, but I'm hoping to drive it home with an example of my own. I would greatly appreciate any feedback on the following...
Ignoring the implementation limitations you have already mentioned, such as the size of the symbol table and a way to store and retrieve the programs: suppose I wrote an assembler in Python that takes code written in the Hack assembly language and translates it into Hack machine instructions, and then wrote a program in the Hack assembly language that does the same. Could I then run the Python-based assembler, passing it my Hack assembly language assembler, to produce a Hack machine language assembler as output?
If so, could I then use this new Hack machine language assembler to translate any future Hack assembly language programs I please?
Perhaps this Hack machine language assembler could even be stored directly on the hardware somehow, to create a distributable hardware system able to be expanded in all of the ways outlined in chapters 7-12 of the book.
Thanks a lot
|
|
I'm not sure the example I gave here was exactly relevant to self-compiling compilers (I'm beginning to understand the part about the binary subset needed to compile the whole API now), but I feel fairly confident it is a related idea. Perhaps it's called porting.
For anyone interested in more information, this playlist from Professor David Brailsford was very helpful:
https://youtube.com/playlist?list=PLzH6n4zXuckoJaMwuI1fhr5n8cJL18hYd
And here is the final video in the series, which isn't included in the playlist:
https://youtu.be/9W0Vxa6eqjA
|
Administrator
|
I think people get confused by this concept because they take it too literally. The compiler is not actually compiling itself. The compiler has already been written and compiled and is a program that is sitting there ready to be run.
When it is run, it is just doing what most programs do: reading data from some input source, doing some processing on that data, and producing some output. In the case of a compiler, it is reading source code from a file and writing executable code to another file.
The notion that a compiler can compile itself merely means that if you run the compiler on a file containing the source code that was used to make the compiler, it will produce as its output an executable file identical to the one that is currently being run. The compiler hasn't "compiled itself", but rather compiled a copy of its source code into a copy of itself.
One of the big values of reaching this point is that, by their nature, compilers are pretty complex pieces of code. So imagine you do the following:
On a PC you write a Jack compiler in C that will take a program written in Jack as its input and produce executable code that will run on the Hack. Call this program jcFromC. You now have a program, running on the PC, that will take a file written in Jack and compile it into a program that will run on the Hack. You don't yet have any programs written in Jack, but now you have a compiler for them.
Now you set about writing a compiler in Jack that will take an input file written in Jack and produce executable code that will run on the Hack. Call this program jcFromJack. You only have one compiler right now, namely jcFromC. But after you run it on the jcFromJack source code, you have a second executable program, jcFromJack, which is a second Jack compiler and one that runs on the Hack. So use it to compile the jcFromJack source code to get a third Jack compiler. You now have two executables generated from the same jcFromJack source code file but by two different compilers. The resulting executable files should be identical. If they are, then you have high confidence that your compiler is good.
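A rough sketch of that final comparison in Python (the file names here are made up for illustration; the point is just to compare the two generated executables byte for byte):

import hashlib

def digest(path):
    # Hash a file's contents so two builds can be compared cheaply.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical outputs of the bootstrap described above:
#   gen1 = jcFromC compiling the jcFromJack source
#   gen2 = the gen1 executable compiling that same source
if digest("jcFromJack_gen1.hack") == digest("jcFromJack_gen2.hack"):
    print("Identical executables: good confidence in the compiler.")
else:
    print("Outputs differ: one of the compilers needs a closer look.")

(Some bootstrap setups add a third generation and compare generations two and three instead, so that any code-generation differences between jcFromC and the Jack-written compiler drop out.)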
|
|
Thank you very much for your reply.
I had in fact come to this point already and your explanation consolidated that for me.
From that place, however, I was trying to conceive of a small program that somehow expands as it executes, compiling itself along the way. I imagined that the system would pull itself up by its bootstraps every single time it started up. I took it too literally.
Thanks
|
Administrator
|
The term "bootstrapping" has come to mean two related but significantly different things. Historically, bootstrapping a compiler meant the process we are talking about in which you start with a piece of hardware that about all it can do is load machine code from some device (a paper tape was commonly used back then) into memory and then, perhaps when a reset button is pressed, start executing that code. So you write a very simple assembler than can read simple instructions from the input device and write the assembled machine code back out onto a tape. This assembler probably has no features like labels or variables -- you have to determine all of those values and hardcode them as their numerical values. But now you have the ability to write programs in assembly language and assemble them, so you probably write some kind of simple editor program so that you can type your assembly code at a keyboard and save it to tape. Now you use your editor to write a more capable assembler that allows your code to be written at a higher level using variable names, and labels. But you have to write this assembly program without using them in the code itself as your only assembler can't handle it. But after you assemble the new one it can handle it, so now you write another assembler that handles macros and other more advanced features. You are now at a point where you can write a program in assembly that can work at a higher level. Writing a VM translator is probably a reasonable route to take and, once again, you probably start out with a no-frills translator written in assembly and then use the resulting program to write a more capable translator in VM code. Once you have enough features that it's reasonable to write significant programs with you write a very simple high-level compiler that you translate/assemble using your existing tool chain. But now you can write more complex programs in your high level language that is more complete or has more features.
The other kind of bootstrapping is along the lines of what you mentioned, and it is being actively explored and researched from a secure-computing perspective. The idea is that instead of using a normal compiler (which might contain malicious code) to compile the source code, you have a tool chain that basically walks up a similar bootstrapping process to generate the final executable code, thus making the process much more transparent.
|
|
Thank you for your comprehensive replies. That's very kind of you.
|
|