Programming comes to mind when most people are presented with the term “computer science”. This is may not necessarily be the case, but a basic skill inherent with most computer scientists is the ability to program. What most people don’t know is that even the most sophisticated of programs will not be able to do anything without a compiler.
A compiler is a program that decodes instructions written in a higher order language and produces an assembly language program [1], and just like there are seven deadly sins, there are seven phases of a compiler—okay, that may have well been a really bad analogy. *grins*
Anyhow, a compiler is divided into 7 phases [2], basically from the original character stream, to the target machine code, assuming you want to run it on a machine, or virtual machine for that matter.
So basically what we’re working with is shown in the following diagram: [3]
Lexical Analyzer – lexical analysis breaks down the entire text of the program into the smallest atomic units, called tokens, at the same time throwing out unnecessary information such as white space and comments. [4]
Syntax Analyzer – syntax analysis, otherwise known as parsing, takes the tokens produced from lexical analysis and groups them into grammatical structures. [5]
Semantic Analyzer – semantic analysis utilizes the syntax tree and information in the symbol table to check the semantic consistency of the source program with the language definition. This is also where type-checking is done. [6]
Intermediate Code Generator – generates an explicit low-level or machine-like intermediate representation [7], having two properties: it should be easy to produce and it should be easy to translate into the target machine. [8]
Code Optimization – code optimization attempts to improve the intermediate code so a better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power. [9]
Code Generator – code generation takes as input the intermediate representation as a result of code optimization, and produces the target code (e.g. machine code) to be executed by the machine (or virtual machine). [10]
Bibliography:
1 - http://wordnet.princeton.edu/perl/webwn
2 - Aho, Lam, Sethi and Ullman (2007), Compilers: Principles, Techniques, Tools (2nd Edition) pp.5, Boston: Pearson/Addison-Wesley.
3 - ibid.
4 - P. N. Hilfinger, CS 164: Programming Languages and Compilers (Class Notes #2: Lexical) pp.1 - http://www.cs.berkeley.edu/~hilfingr/cs164/public_html/lectures/note2.pdf
5 - ibid.
6 - Aho, Lam, Sethi and Ullman (2007), Compilers: Principles, Techniques, Tools (2nd Edition) pp.8, Boston: Pearson/Addison-Wesley.
7 - C.Sc. 131: Systems Architecture (Topic 3: Controlling the Computer) pp. 3 - http://www.comp.lancs.ac.uk/computing/staff/kc/Lecturing/131/7.pdf
8 - Aho, Lam, Sethi and Ullman (2007), Compilers: Principles, Techniques, Tools (2nd Edition) pp.9, Boston: Pearson/Addison-Wesley.
9 - Aho, Lam, Sethi and Ullman (2007), Compilers: Principles, Techniques, Tools (2nd Edition) pp.10, Boston: Pearson/Addison-Wesley.
10 - http://en.wikipedia.org/wiki/Code_generation_(compiler)