🔨

Language Processing (compilers, interpreters, NLP)

TagsAlgorithmsComputer scienceNLPautomata theorycompilersinterpreters
Created
Updated

Requisites

Programing Language basics

Resources

Foundations

NameTags
Compilers, principles, techniques, and tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. UllmanField-defining textbookThe purple dragon bookbook
Compilers by Stanford and EDXAlex AikenEDXStanfordcourse
Crafting Interpreters, Nystromdslfreeinterpreter
Modern Compiler Implementation in ML, Andrew W. AppelThe tiger bookbook
CS 421: Programming Languages and Compilerscourse
CSEP501: Compiler Construction by Washington Universitycourse
Compiler Design Analysis And Transformation by Helmut Seidl, Reinhard Wilhelm, and Sebastian Hack Springerbook
Understanding and Writing Compilers A do-it-yourself guide by Richard Bornatbook
Engineering: A Compiler: Cooper, Keith D., Torczon, Lindabook
Modern Compiler Design by Dick Grune, Kees van Reeuwijk, Henri E. Bal, Ceriel J.H. Jacobs, Koen Langendoen. Springerbook
CS143 Compilers by Stanfordcourse
Programming Languages and Compilers by CS 164 @ UC Berkeleycourse
Engineering a compiler. Second edition
Writing a C CompilerBuild a Real Programming Language from Scratchby Nora Sandler
LOUDEN, Kenneth C. Compiler Construction. Principles and Practice U.S.A. Thompson Learning, 1997
PITTMAN, Thomas, PETERS, James The Art of Compiler Design; Theory and Practice Englewood cliffs New Jersey USA Prentice Hall, 1992
TREMBLAY, Jean-Paul, SORENSON, Paul G. The Theory and practice of compiler writing U.S.A. Mc. Graw-Hill, 1985
BENNETT, J. P. Introduction to Compiling Techniques. A first Course using Ansi C, LEX and YACC U.S.A. Mc. Graw-Hill. Book Company Europe, 1996
SCHREINER A. T. ; FRIEDMAN H. G. Introduction to compiler construction with Unix EU, 2nd edition, Prentice - Hall 1990
LEMONE, KAREN A. Fundamentos de compiladores. México, Cecsa, 2000
LLORCA, SANCHIS Compiladores. Teoría y Construcción España, Paraninfo, 1991.
CS 164. Programming Languages and Compilerscourse
COMS W4115 Programming Languages and Translators Spring 2015
Andrew W. Appel Modern Compiler Implementation in Java, second editionCambridge University Press, 2002
Keith D. Cooper and Linda TorczonEngineering a Compiler, Second EditionMorgan Kaufmann, 2012
Michael L. ScottProgramming Language Pragmatics, Third EditionMorgan Kaufman, 2009
Robert W. SebestaConcepts of Programming Languages, Tenth EditionPearson/Addison-Wesley, 2012
6.035. Computer Language Engineering MITcourse
Art of programming. Volume 7 – Compiler Techniques. Knuth
Compiler Design in C; Holub,
Princeton COS 320: Compiling Techniquescourse

Complement

NameTags
9. What Compilers Can and Cannot Do by MITcourse
(How to Write a (Lisp) Interpreter (in Python)) by Norvig
The Lisp Interpreter by Northwestern
Linkers and Loaders, by John Levine (Morgan Kaufmann, 1999)
Article series about linkers by Ian Lance Taylor
Tutorialspoint. (2021). Compiler Design Tutorial. Recuperado el 16 de noviembre de 2021, de:https://www.tutorialspoint.com/compiler_design/index.htm
https://norasandler.com/2022/03/29/Write-a-C-Compiler-the-Book.html
Lex & Yacc by Doug Brown, John R. Levine, and Tony Mason. O’Reilly
MAK, Ronald Writing Compilers and Interprets 2a. Edición U.S.A. Willey, 1996
RATT, T. W, ZELKOWITZ, M. V. Lenguajes de Programación. Diseño e Implementación. México Prentice Hall, 1998
SCOTT, Michael L. Programming Language Pragmatics U.S.A. Morgan Kaufmann, 2000
Aho’s site
JEFFERY, C.L. (2021). BUILD YOUR OWN PROGRAMMING LANGUAGE: A PROGRAMMER'S GUIDE TO DESIGNING COMPILERS, INTERPRETERS, AND DSLS FOR SOLVING MODERN COMPUTING PROBLEMS. PACKT PUBLISHING.
FLORES-FIGUEROA, J., SAMANO-HERMOSILLO, E. (2016). CONSTRUCCIÓN DE COMPILADORES BÁSICOS: FLEX, BISON &; MINGW. EDITORIAL ACADÉMICA ESPAÑOLA.
NYSTROM, R. (2021). CRAFTING INTERPRETERS. GENEVER BENNING.
THAIN, D. (2020). INTRODUCTION TO COMPILERS AND LANGUAGE DESIGN: SECOND EDITION. INDEPENDENTLY PUBLISHED.
Columbia COMS W4117: Compilers and Interpreters: Software Verification Tools, Fall 2007, Prof. Alfred Aho
Niklaus Wirth and Jürg Gutknecht Project Oberon The Design of an Operating System and Compiler
Pietro. (2020, June 13). Compilers. Retrieved from https://pgrandinetti.github.io/compilers
OUCompilers. (2022, November 09). Retrieved from https://github.com/OUCompilers
Steinberger, T. (2020, January 29). Grumpy2 Web Compiler. Retrieved from https://oucompilers.github.io
Types and Programming Languages, Benjamin Pierce

Introduction

What is a language processor?

Story

A misconception is Grace Hopper made the first compiler, indeed she didn’t. She did the first linker.

John Backus made “Speedcoding”, the first high-level programming language, in 1953 for the IBM 701 because software cost is higher than hardware cost. He made Fortran I for IBM 704 in 1954.

💡
"It is a waste of a valuable scientific computing instrument to use it to do clerical work." Donald Gillies paraphrased what John Louis von Neumann said about Assembly. He preferred machine code over fancy things such as Assembly and Fortran I —indeed, he knew Backus’s work. John von Neumann. (2002, February 09). Retrieved from https://ei.cs.vt.edu/~history/VonNeumann.html

TODO. People.

1957. Internal Translator (IT): A Compiler for the 650. With J. W. Smith and H. R. Van Zoeren.

Programming languages generations

Research

ACM Symposium on Programming Language Design and Implementation (PLDI), ACM Symposium on Principles of Programming Languages (POPL), ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), and ACM Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA).

https://popl22.sigplan.org/

Compiler Research: The Next 50 Years.

ACM SIGPLAN International Conference on Compiler Construction

Ecosystem

Alan-Kay interview

Esoteric programming language

JScrewIt

https://en.wikipedia.org/wiki/Esoteric_programming_language

Jobs

Open-source.

Pietro. (2020, June 13). Compilers. Retrieved from https://pgrandinetti.github.io/compilers/page/can-i-get-a-job-in-compilers

Apa, C. (2021, November 03). Compiler engineering (Compiler Organization, C and C ++ standards, career, market, industry, etc.). Youtube. Retrieved from https://www.youtube.com/watch?v=aZbVvl_eeMA&ab_channel=ConferênciaAPA

LLVM

The LLVM Compiler Infrastructure Project. (2022, October 05). Retrieved from https://llvm.org

https://www.youtube.com/watch?v=BT2Cv-Tjq7Q&ab_channel=Fireship

Compiler architecture

Frontend & Backend

Compiler architecture

https://www.scientificbulletin.upb.ro/rev_docs_arhiva/full83b_887019.pdf

Just in Time compilers

https://www.youtube.com/watch?v=d7KHAVaX_Rs&ab_channel=Computerphile

FAQ

Are compilers a theory of computation applied project?

Kind of.

Is a Parse tree a derivative tree?

No.

Why should you study Languages and Compilers?

Stevey's Blog Rants: Rich Programmer Food. (2022, November 09). Retrieved from https://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html

What can compilers and cannot do?

https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/resources/lecture-9-what-compilers-can-and-cannot-do/

Why There Are So Many Programming Languages?

Educational languages and other learning project ideas

Projects

https://projectbook.code.brettchalupa.com/command-line-interfaces/markdown-processor.html

FAQ

Why do I make a compiler to COOL, and not a real language such as C, C++, or Java?

Of course, some schools use C subsets to teach.

References

Alexander Aiken. 1996. Cool: a portable project for teaching compiler construction. SIGPLAN Not. 31, 7 (July 1996), 19–24. https://doi.org/10.1145/381841.381847 https://theory.stanford.edu/~aiken/software/cool/cool.html

PLT

PL/0 (Oberon??)

Music language. Reading 26: Little Languages I. (2022, May 23). Retrieved from https://web.mit.edu/6.031/www/sp22/classes/26-little-languages-1

UpBeat. Alfred Aho's students made a language that transforms the stock market into music.

Logic programming. Valverde, J. A. R. (2022, October 31). JARV's Blog. Retrieved from https://jariaza.es

Language design. Language specification. Manual.

Specification and Description Language (SDL)

Dimensions.

Interpreters, Compilers.

Domain-specific languages (DSL), general-purpose language (GPL).

A language specification ≠ a language implementation.

You can follow the C99 specification and make an interpreter for it.

Examples

Basic elements of Artificial Languages

Lexical Analysis (aka. scanning). Lexer

Where are you?

??

💡
Tokens = Terminal Symbols.

String to tokens.

writing a lexer with awk

A little interpreter called a calculator

https://www.jflex.de/

Worked examples

Project

https://github.com/sanchezcarlosjr/compilers-uabc/blob/main/lexical-analysis-pa2/PA2.pdf

https://github.com/sanchezcarlosjr/compilers-uabc/tree/main/lexical-analysis-pa2

Syntax Analysis (aka. parsing)

TODO: Lexical versus Syntactic parsing.

Grammar

In parsing, when people say grammar, they refer to context-free grammar.

Grammar

Backtracking

Closure

First

Top-down parsing

Top-down parsing was initiated by Lewis and Stearns (1968) and Knuth (1971).

Recursive descent parser

LL(1) grammars

Algorithm.

References

LL(1) Parser Generator. (2020, March 24). Retrieved from https://www.cs.princeton.edu/courses/archive/spring20/cos320/LL1

Nonrecursive predictive parsing

Error recovery in predictive parsing

Bottom-up parsing

Reductions

Handle Pruning

Shift-reduce parsing

Conflicts During Shift-Reduce Parsing

LR parsing

LR parsers

Simple LR, LR(0)

More Power LR, Canonical-LR Parser LR(1)

Lookahead-LR parsers, LALR

Ambiguous Grammars

References

Home. (2021, April 22). Retrieved from https://ashutoshbsathe.github.io/yacv

LR(1) Parser Generator. (2012, December 02). Retrieved from https://jsmachines.sourceforge.net/machines/lr1.html

Sathe, A. (2021, March 07). Visualizing parsing using yacv: Yet Another Compiler Visualizer. Youtube. Retrieved from https://www.youtube.com/watch?v=BozB0O0__Qg&ab_channel=AshutoshSathe

Using Ambiguous Grammars

Parser Generators (aka parsing tools)

We need a pair of tools (Yacc, Lex), (Bison, Flex), (JFlex, CUP) in order to facilitate the construction of the front end of a compiler.

YACC

Parser Tools: lex and yacc-style Parsing. (2022, August 09). Retrieved from https://docs.racket-lang.org/parser-tools

BISON

Bison - GNU Project - Free Software Foundation. (2022, October 12). Retrieved from https://www.gnu.org/software/bison

CUP (Constructor of Useful Parsers)

ultimate-pa. (2022, October 12). javacup. Retrieved from https://github.com/ultimate-pa/javacup

https://github.com/ultimate-pa/javacup

Worked examples

Project

https://github.com/sanchezcarlosjr/compilers-uabc/tree/main/parser-pa3

https://github.com/sanchezcarlosjr/compilers-uabc/blob/main/parser-pa3/PA3.pdf

References

Detecting Ambiguity in Programming Language Grammars. Naveneetha Vasudevan and Laurence Tratt https://doi.org/10.1016/S0019-9958(82)91016-6

The Art of Computer Programming. Volume 5 – Syntactic Algorithms

Semantic Analysis

“Semantics analysis” is the third phase in a compiler that relates syntactic structures to give and verify meanings. It is the last “front end” phase too. It catches all remaining errors since parsing cannot catch them and some language constructs are not context-free. So, Semantics analysis income is AST from syntax analysis, and semantics analysis outcome is annotating AST, in fact, much of it can be expressed as recursive descent. In order words, you must check and annotate AST by requirements.

annote_ast(AST node)
   annote an AST node
   for child in node.children:
       annote_ast(child)
   finish to annotate an AST node

annote_ast(ast_from_syntax_analysis.root) # a pass

You must consider Semantic analysis requires multiple passes and it is highly dependent on the language’s requirements. However, we’re considering two typical problems in semantics analysis which are name resolution and typing.

Some language requirements are

Symbol table

The symbol table is an abstract data type that keeps track of symbol information. It stores diverse symbol information such as location, kind of symbol, and scope. It can be built in the semantics analysis phase or lexical semantics analysis. Indeed, it can be used in runtime for metaprogramming o debugging. Here, we’ll build a symbol table for semantics analysis.

classDiagram
    class SymbolTable {
      enterScope()
      find_symbol(x)
      add_symbol(x)
      check_scope(x)
      exit_scope()
   }

Symbol Table

https://ipfs.io/ipfs/QmQdtJry8zfndwXufxPNuTemQ7nzNwUYDxN9iAmrjKfgGm?filename=L09-LP-Handouts.pdf

Name resolution

It involves namespace (packages), scopes, visibility rules, overloading, and accessibility.

Scope

The scope or visibility of an identifier is the portion of the program (formally context or environment) in which that identifier applies. Scoping is useful because it prevents name collisions.

A simple language can not have any complex scoping logic, it can have only a global scoping or anything at all.

Is there a language to express scope precisely? No, we don’t any language.

Note that OOP languages have Access modifiers e.g. private, protected, or public. An access modifier is a static mechanism that allows to modification of the designed scope by the language specification.

Dynamically scoped or dynamic scope

The scope is context-dependent, but how we can determine the current context? Language specification has to say it since language can support static scope (also called lexical scope), dynamic scope, or both. You don’t think ahead a lot choosing one or another, use static scope.

The former means the context is where the identifier is declared and the context can not be changed. The latter means the context is determined in runtime.

Static scope example (it is a valid program in Python)

x = 99
def g():
      print(x) # 99. The 'x' context is the global scope.

def f():
  x = 10
  g()

f()

Dynamic scope example (it is not a valid program in Python)

x = 99
def g():
      print(x) # 10. The 'x' context is the "f" function.

def f():
  local x = 10 # local is a fake keyword, but it ilustrates the point.
  g()

f()

Bash and Perl support dynamic scope.

int x = 42;
int func() {
   int x = 3840;
   {
      extern int x;
      return x; // 42
   }
}

Nested declaration identifiers

Languages support some nested declaration identifiers and others do not.

Python support nested declaration functions, so this program is invalid:

def f():
   x = 10
   def g():
      print(x) # 10

f()

C cannot support nested declaration functions, so this program is invalid:

void f()
{
        int x = 10;
        int g()
        {
            printf("%d", x);
        } 
}
f();

Override identifiers

Forward declarations.

Name masking

Hoisting

Access modifier

t

Inheritance

Type system

💡
term: type

Although the notion varies from language to language, type is a set of values and a set of operations on those values. In fact, low-level language and machine code don’t have any types.

The goal of type systems and type checking is to ensure that operations are used with the correct types.

Types are computed up the AST from the leaves towards the root.

Type theory

Three kinds of languages:

They are not black and white. Real-world languages such that Java or C have “escape” mechanics, while Python’s new version has types.

Static typing means catching many programming errors at compile time and avoiding the overhead of running type checks. Dynamics typing proponents say static type systems are restrictive.

The type system is the piece of semantics analysis that does type checking and type inference.

Type checking is the process of verifying fully typed programs.

Type inference is the process of filling in missing type information.

The formalism for type checking is Rules of Inference.

Previous formal notations were regular expressions for the lexer and context-free grammar for the parsers. In a type system, the formal notation is the rule of inference. Those rules of inference are the form

e1:T1e2:T2...en:Tn    f(e1:T1,e2:T2,...,en:Tn):Tme1:T1e2:T2...en is Truef(e1:T1,e2:T2,...,en:Tn):Tme_1:T_1\land e_2:T_2\land... e_n:T_n \implies f(e_1:T_1,e_2:T_2,...,e_n:T_n):T_m\\ e_1:T_1\land e_2:T_2\land... e_n \text{ is True} \\ f(e_1:T_1,e_2:T_2,...,e_n:T_n):T_m

The simplest examples are

i is a an integer literali:inti \text{ is a an integer literal}\\ \vdash i:int
e1:Inte2:Inte1+e2:Int\vdash e_1:Int \vdash e_2: Int \\ \vdash e_1+e_2:Int

A type system is sound if whenever e:T\vdash e:T then ee evaluates to a value of type TT. In other words, a sound-type system actually reflects the kind of value you get when you run the program.

\vdash is the turnstile symbol it means “it is provable that …”

Sound type rule examples.

e:T1isvoid e1:Bool\vdash e:T_1\\ \vdash \text{isvoid } e_1 : Bool

Unsound type rule examples.

e1:Int e2:Inte1<e2:Int\vdash e_1:Int \text{ } \vdash e_2:Int\\ \vdash e_1 <e_2 : Int

https://users.sussex.ac.uk/~mfb21/compilers/slides/6-handout.pdf

Type environments.

The type environments give types to the free identifiers in the current scope.

Let OO be a function from Identifiers to Types.

The sentence Oe:TO\vdash e:T is read: Under the assumption that free variables have the types given by OO, it is probable that the expression ee has the type TT.

Subtyping.

lub(X,Y) the least upper bound.

Typing methods.

Other

Worked examples

References

Type Theory

Algebraic data type

Intermediate code generation

Run-Time environments

Code Optimization

Code Generator

Memory, Stack, Heap

FAQ

How to make an executable without a linker, compiler, or assembler?

https://web.archive.org/web/20140130143820/http://robinhoksbergen.com/papers/howto_elf.html

Worked example

Interpreter

REPL evaluate, print and loop

References

Crafting Interpreters. (2021, July 31). Retrieved from https://www.craftinginterpreters.com

Just in Time (JIT) compilers

https://www.youtube.com/watch?v=d7KHAVaX_Rs&ab_channel=Computerphile

Domain-specific languages (DSLs)

Graphviz, PlantUML, MermaidJS

Drag and drop tools to draw sometimes are so frustrating for many reasons. So, why not make a language that draws things?

Diagrams Through Ascii Art

SVG

Build tools

Make, Gradle, CMake

Configuration tools

XML or JSON +XPath

AWK

SED

GREP

Symbolic mathematics

Your own DSL

References

Parr, T. (2009). Language implementation patterns: create your own domain-specific and general programming languages. Language Implementation Patterns, 1-380.

OnSoftware. (2012, August 15). Domain-Specific Languages with Martin Fowler. Youtube. Retrieved from https://www.youtube.com/watch?v=ZfdAwV0HlEU&ab_channel=OnSoftware

Domain Specific Languages by Martin Fowler, with Rebecca Parsons 2010

DSL Guide. (2022, November 14). Retrieved from https://martinfowler.com/dsl.html

Swift, F. (2017, April 23). Writing Domain Specific Languages – Rahul Malik. Youtube. Retrieved from https://www.youtube.com/watch?v=YLeaRtB3GfY&ab_channel=FunctionalSwift

JetBrainsTV. (2021, November 11). Why we need domain-specific languages for non-developers. Youtube. Retrieved from https://www.youtube.com/watch?v=YuOb53LU_8A&ab_channel=JetBrainsTV

GeekcampSG. (2022, November 18). The L4 Compiler: a toolchain for a DSL for law - GeekcampSG 2022. Youtube. Retrieved from https://www.youtube.com/watch?v=VfA4rIX2VuU&ab_channel=GeekcampSG

Front End

Introduction to the Lambda Calculus

Quantum mechanics compilers

Quantum programming languages

Q Sharp