2 AM thoughts - How do you compile a compiler?

  • 12
    With the compiler and initial bootstrapping:
  • 10
    Using your hands like this
    *gestures vaguely*
  • 2
    @Fast-Nop So C compilers are written in assembly?
    I am horrified by the thought of doing the parsing and tokenization in assembly. Brave souls, those compiler guys.
  • 6
    Lower-level languages, basically, at least initially. Over time the compiler is sometimes rewritten in the language it's supposed to compile, and then compiled with the previous version, because it's kinda cool to have a self-sustained language. But until then you basically need a compiler for any programming language that already exists. The first proper compiler was built in assembly, and the first assembler was coded in machine code by hand.
  • 4
    @NickyBones The first version was written in assembly, but once they had a working compiler they rewrote it in C and compiled the new compiler with the assembly version.

    Once the new one was ready, they switched to it and have been improving it ever since.

    So modern versions of the C compiler are written in C.

    The first one also only had to handle the first version of C, which lacked many of the modern features and had very few optimizations, if any.

    For more info

  • 3
    @NickyBones No, only the first C compiler may have been written in assembly, and then probably not fully featured (e.g. lacking the preprocessor). Consider also that you can cross-compile to get an existing compiler to a new system.

    On top of that, C as a language is pretty compact, which is one of its greatest features. The standard library itself is written in C, so it counts more as the "first application" in this context.
  • 1
    This is a cool series where the guy writes a compiler in C#. He does everything in C# and does not use "compiler tools".


  • 1
    @Voxera @NickyBones
    Today, NO!
    C compilers are written in CPP.
    If that sounds stupid, it is.
    If it bothers you enough, maybe write an email to GNU and ask what the fuck they were thinking.
    In the case of Clang, I think it's okay, as LLVM is developed as a general-purpose code generation back end.
  • 0
    With another compiler.
  • 1
    @Voxera But what happens when you add extensions to the ISA? You have more assembly instructions than before, so how can you use earlier versions that did not include them?
  • 2
    @NickyBones You don't have to use new machine instructions, and the program will still run fine, though maybe slower than it could using the new ones. For bootstrapping, performance doesn't matter.

    That's also why it's not too hard to parse things using assembly if performance is not important; it's basically just some loops and status variables.

    C especially is, tongue in cheek, sometimes referred to as "portable macro assembler", and that's because there's a pretty direct connection between C and assembly. Experienced C programmers have a good guess at the resulting machine code.
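To make the "loops and status variables" point concrete, here is a minimal sketch of a tokenizer built from one loop and one state variable. It is written in Python for readability rather than assembly, and the token categories (`ident`, `number`, `punct`) are assumptions chosen for illustration, not the rules of any real C compiler.

```python
def tokenize(src):
    """Split a C-like snippet into tokens using one loop and one state variable."""
    tokens = []
    state = "start"   # one of: "start", "ident", "number"
    current = ""
    for ch in src + " ":   # the trailing space flushes the last token
        # Keep extending the token we are in the middle of.
        if state == "ident" and (ch.isalnum() or ch == "_"):
            current += ch
            continue
        if state == "number" and ch.isdigit():
            current += ch
            continue
        # Anything else ends the current token.
        if state != "start":
            tokens.append((state, current))
            current = ""
            state = "start"
        if ch.isspace():
            continue
        if ch.isalpha() or ch == "_":
            state, current = "ident", ch
        elif ch.isdigit():
            state, current = "number", ch
        else:
            tokens.append(("punct", ch))   # single-character punctuation only
    return tokens


print(tokenize("x1 = foo+42;"))
```

The same structure maps fairly directly onto a compare-and-branch loop in assembly: the state lives in a register and each `if` becomes a conditional jump.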
  • 3
    @metamourge Looks to me like the gcc devs were trying to duplicate C++ features in C to write the C compiler. This resulted in ugly, hard-to-maintain code, so they decided to refactor to C++ to clean it up. Sounds like a solid decision to me:


    Also, C++ was bootstrapped from C by writing a custom preprocessor. This preprocessor compiled C++ code to C. Once they were able to get this running they could then write the C++ compiler in C++.
  • 1
    @Fast-Nop Say the previous compiler version has code that handles loops, and now I have hardware loops or even vector operations, so the resulting assembly would be drastically different. I assume loop parsing is pretty basic and is probably part of the "genesis" compiler. But that compiler can't use a new ADDVV instruction it doesn't know about...
  • 3

    There is a book:


    The Amazon price is like $500. Not sure why it's so high. Search around. It is often referred to as the dragon book.
  • 4
    @NickyBones It absolutely can. Why couldn't it? You add the new features to the code compiled at the last stage of bootstrapping, and recompile that stage. The instructions the compiler is made with have nothing to do with the instructions it produces. They're just bytes. Otherwise cross-compilation wouldn't work.
  • 1
    @Lor-inc I'm asking about how you introduce new instructions to *produce*. At no point did I say anything about the instructions the compiler is made of. But if you change the ISA you need to adjust both backend and frontend.
  • 2
    @NickyBones The instructions in the output are just byte sequences it puts out, so you can compile that updated compiler using the previous compiler version without issues.

    But yeah, if you want the compiler to use new machine instructions, you have to update the compiler, sure.
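A small sketch of the "just byte sequences" argument, assuming a made-up ISA: the opcode values below are invented for illustration, and `ADDVV` is only the name borrowed from the question above. The point is that the emitter never executes the bytes it writes, so adding a new instruction is a change to the compiler's tables, not to the machine that runs the compiler.

```python
# Hypothetical opcode table for an invented ISA; none of these encodings are real.
OPCODES_V1 = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}

# The "ISA extension": the updated compiler source simply knows one more entry.
OPCODES_V2 = {**OPCODES_V1, "ADDVV": 0x10}


def assemble(program, opcodes):
    """Turn (mnemonic, operand) pairs into bytes: one opcode byte, one operand byte."""
    out = bytearray()
    for mnemonic, operand in program:
        out.append(opcodes[mnemonic])   # KeyError if this compiler version lacks it
        out.append(operand)
    return bytes(out)


vector_program = [("LOAD", 0), ("ADDVV", 4), ("STORE", 8)]

# The updated compiler emits the new instruction as plain data.
print(assemble(vector_program, OPCODES_V2).hex())

# The old compiler cannot emit it, but it can still compile the updated
# compiler's own source, because that source is ordinary code.
try:
    assemble(vector_program, OPCODES_V1)
except KeyError as missing:
    print("old compiler does not know", missing)
```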
  • 2
    @NickyBones You just add them to the last layer. The final compiler doesn't have to use the initial compiler's implementation of loops. Usually it doesn't use the initial compiler's code at all, because you have better abstractions by then.
  • 1
    @NickyBones You would have to recompile the compiler after changing the source of the compiler to do it.

    But the compiler does not even have to run on the same type of hardware; it only has to know how to write the right binary sequence (if compiling to a binary format), or bytecode for Java and other languages that run on the JVM.

    Compilers that target the new WebAssembly standard are not written in WebAssembly; they just create the binary code that the WebAssembly runtime executes.

    I could, if I wanted, write a C compiler in Visual Basic for Windows running on an Intel CPU that creates a binary for Linux on ARM.

    There are quite few compilers that are written in the language they compile.
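As a rough illustration of the WebAssembly point, here is a toy "compiler" in Python that writes out a WebAssembly module byte by byte. The function names `section` and `compile_constant` are made up for this sketch, and the "language" it compiles is just a single constant. The byte layout follows the WebAssembly binary format as far as I understand it; treat it as a sketch and validate the output with a real runtime before relying on it.

```python
def section(section_id, payload):
    """A module section: one id byte, the payload length, then the payload."""
    # Lengths here stay below 128, so one byte is a valid LEB128 length.
    assert len(payload) < 128
    return bytes([section_id, len(payload)]) + payload


def compile_constant(name, value):
    """Emit a module exporting a zero-argument function `name` that returns `value`."""
    assert 0 <= value < 64   # keeps the signed LEB128 immediate to a single byte
    encoded_name = name.encode("utf-8")
    header = b"\x00asm" + b"\x01\x00\x00\x00"     # magic number and version 1
    types = section(1, b"\x01\x60\x00\x01\x7f")   # one type: () -> i32
    funcs = section(3, b"\x01\x00")               # one function, using type 0
    exports = section(
        7, b"\x01" + bytes([len(encoded_name)]) + encoded_name + b"\x00\x00"
    )                                             # export function 0 under `name`
    body = b"\x00" + b"\x41" + bytes([value]) + b"\x0b"   # no locals; i32.const; end
    code = section(10, b"\x01" + bytes([len(body)]) + body)
    return header + types + funcs + exports + code


module = compile_constant("answer", 42)
print(module.hex())
```

Nothing in this script is WebAssembly itself, and the Python interpreter never executes the bytes it produces; it only writes them.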
  • 0
    @Voxera You can cross compile VLIW?
    I recall there was a binary incompatibility issue there
  • 1
    @NickyBones VLIW is binary incompatible with everything including other VLIWs, but you can nonetheless cross-compile to a specific one. You just won't be able to execute the resulting blob as instructions on the compiling machine.
  • 1
    Here is a great (historical) story about adding a feature that doesn't exist yet:

  • 1
    tl;dr: to add a feature, you first compile, once, a version of the compiler whose source supports the feature but does not yet use it; after that, you're good to go
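The tl;dr above can be modelled as a toy, assuming a compiler binary is nothing more than the set of language features it understands. Everything here (`compile_compiler`, the feature names, the dictionary layout) is hypothetical and only meant to show the ordering constraint, not how any real compiler tracks features.

```python
def compile_compiler(source, existing_binary):
    """Build a new compiler binary from `source` using `existing_binary`.

    `existing_binary` is the set of features the old compiler understands.
    The build fails if the new source is written with features it lacks.
    """
    missing = source["uses"] - existing_binary
    if missing:
        raise ValueError(f"old compiler cannot parse: {sorted(missing)}")
    return frozenset(source["implements"])


stage0 = frozenset({"loops"})   # the compiler we already have

# Step 1: source that implements the new feature but is written without it.
supports_only = {"implements": {"loops", "new_feature"}, "uses": {"loops"}}
# Step 2: source that is free to use the new feature in its own code.
uses_feature = {"implements": {"loops", "new_feature"}, "uses": {"loops", "new_feature"}}

stage1 = compile_compiler(supports_only, stage0)   # works
stage2 = compile_compiler(uses_feature, stage1)    # works, thanks to stage1

try:
    compile_compiler(uses_feature, stage0)         # skipping step 1 fails
except ValueError as err:
    print(err)
```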
  • 0
    A tad unrelated but I feel the need to say this.

    When people ask me, "Which came first, the chicken or the egg?", I answer: "A chicken who performed cross-compilation."

    From wiki: "A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running."

    Now, the platform is the body containing the executable code, or genome. We have the chicken-ancestor platform, which with its compiler (reproductive organs) created executable code (the genome of a proper chicken, the first one) for a platform other than the one on which the compiler was running (the first chicken body, a different platform).

    Ergo, the egg was first.
  • 0
    I have done this for gcc.
    You will need binutils and libc.
    I do have a detailed step-by-step written somewhere.
  • 0
    OK, I might be wrong, so correct me:

    1. For a custom ISA, I first need to write binutils for that ISA.

    2. Compile/cross-compile those binutils using a compiler that runs on the current OS, producing binutils executables for the target arch.

    3. These binutils are then used to compile a compiler, with or without an EABI, for the target arch,
    like arm-none-eabi-gcc or arm-linux-eabi-gcc.

    Here, the target arch is the arch where your final compiler executes, but
    the binaries it produces when it runs will be for your custom ISA.
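One way to keep the machines in these steps apart is to track, for every compiler binary, where it runs and what it emits code for. GNU configure terminology calls these the host and the target, which differs slightly from how "target arch" is used in the comment above. The sketch below is a toy model under that assumption; `Compiler`, `build_compiler`, and the names `x86_64-linux` and `custom-isa` are placeholders, not real tools or triplets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    runs_on: str     # machine the compiler executable runs on ("host")
    emits_for: str   # machine its output runs on ("target")


def build_compiler(using: Compiler, new_target: str) -> Compiler:
    """Compile compiler source with `using`.

    The resulting executable runs wherever `using` emits code for, and it
    generates code for `new_target`, as chosen when configuring the source.
    """
    return Compiler(runs_on=using.emits_for, emits_for=new_target)


native = Compiler(runs_on="x86_64-linux", emits_for="x86_64-linux")

# Built on the PC, runs on the PC, emits code for the custom ISA.
cross = build_compiler(native, "custom-isa")

# Built with the cross compiler, so it runs on the custom ISA itself.
self_hosted = build_compiler(cross, "custom-isa")

print(cross)
print(self_hosted)
```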