11# Lexing and Parsing
22
3- The very first thing the compiler does is take the program (in Unicode) and
4- transmute it into a data format the compiler can work with more conveniently
5- than strings. This happens in two stages: Lexing and Parsing.
3+ The very first thing the compiler does is take the program (in UTF-8 Unicode text)
4+ and turn it into a data format the compiler can work with more conveniently than strings.
5+ This happens in two stages: Lexing and Parsing.
66
77 1. _Lexing_ takes strings and turns them into streams of [tokens]. For
88 example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`,
@@ -13,38 +13,36 @@ than strings. This happens in two stages: Lexing and Parsing.
1313
1414 2. _Parsing_ takes streams of tokens and turns them into a structured form
1515 which is easier for the compiler to work with, usually called an [*Abstract
16- Syntax Tree* (`AST`)][ast].
16+ Syntax Tree* (AST)][ast].
1717
1818
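As a rough illustration of the lexing stage described above, here is a minimal toy lexer. It is not `rustc_lexer`: the `Token` enum and the `lex` function are hypothetical names invented for this sketch, and it only understands identifiers, `.`, and `+`.

```rust
// Toy lexer for illustration only. This is NOT how `rustc_lexer` is written;
// `Token` and `lex` are hypothetical names used just for this sketch.

#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    Ident(String),
    Dot,
    Plus,
}

/// Turn a source string into a flat list of tokens, skipping whitespace.
/// Returns an error for any character this toy lexer does not understand.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '.' {
            chars.next();
            tokens.push(Token::Dot);
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }

    Ok(tokens)
}

fn main() {
    let tokens = lex("foo.bar + buz").expect("input uses only supported characters");
    println!("{tokens:?}");
}
```

A parser would then consume this token list instead of the raw string, which is the point of splitting the work into two stages.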
19- An `AST` mirrors the structure of a Rust program in memory, using a `Span` to
20- link a particular `AST` node back to its source text. The `AST` is defined in
19+ An AST mirrors the structure of a Rust program in memory, using a `Span` to
20+ link a particular AST node back to its source text. The AST is defined in
2121[`rustc_ast`][rustc_ast], along with some definitions for tokens and token
22- streams, data structures/`trait`s for mutating `AST`s, and shared definitions for
23- other `AST`-related parts of the compiler (like the lexer and
24- `macro`-expansion).
22+ streams, data structures/traits for mutating ASTs, and shared definitions for
23+ other AST-related parts of the compiler (like the lexer and
24+ macro-expansion).
2525
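To make the role of a span concrete, the sketch below models a span as a plain byte range and attaches one to every node of a tiny hand-built tree for `foo.bar + buz`. This is a simplification under stated assumptions: the `Span`, `Node`, `NodeKind`, and `snippet` definitions are hypothetical and are not the types defined in `rustc_ast` or used by the real compiler.

```rust
// Toy model only: rustc's real `Span` and AST types are more involved.
// `Span`, `Node`, `NodeKind`, and `snippet` are hypothetical names.

/// Half-open byte range `lo..hi` into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    lo: usize,
    hi: usize,
}

#[derive(Debug)]
enum NodeKind {
    /// A bare identifier such as `buz`.
    Ident,
    /// A field access such as `foo.bar`; the child is the base expression.
    Field(Box<Node>),
    /// A binary `+` with its left and right operands.
    Add(Box<Node>, Box<Node>),
}

#[derive(Debug)]
struct Node {
    kind: NodeKind,
    span: Span,
}

/// Recover the source text that a node was parsed from.
fn snippet(src: &str, span: Span) -> &str {
    &src[span.lo..span.hi]
}

fn main() {
    let src = "foo.bar + buz";

    // Hand-built tree; a real parser would compute these spans itself.
    let foo = Node { kind: NodeKind::Ident, span: Span { lo: 0, hi: 3 } };
    let field = Node { kind: NodeKind::Field(Box::new(foo)), span: Span { lo: 0, hi: 7 } };
    let buz = Node { kind: NodeKind::Ident, span: Span { lo: 10, hi: 13 } };
    let add = Node {
        kind: NodeKind::Add(Box::new(field), Box::new(buz)),
        span: Span { lo: 0, hi: 13 },
    };

    // Each node can point back at exactly the text it came from.
    if let NodeKind::Add(lhs, rhs) = &add.kind {
        println!("lhs: {}", snippet(src, lhs.span));
        println!("rhs: {}", snippet(src, rhs.span));
    }
    println!("whole: {}", snippet(src, add.span));
}
```

Keeping only offsets in each node, rather than a copy of the text, is what lets diagnostics quote the original source later without the tree owning any strings.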
2626The lexer is developed in [`rustc_lexer`][lexer].
2727
2828The parser is defined in [`rustc_parse`][rustc_parse], along with a
2929high-level interface to the lexer and some validation routines that run after
30- `macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains
30+ macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
3131the parser implementation.
3232
3333The main entrypoint to the parser is via the various `parse_*` functions and others in
3434[rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile]
3535(e.g. the source in a single file) into a token stream, create a parser from
36- the token stream, and then execute the parser to get a [`Crate`] (the root `AST`
36+ the token stream, and then execute the parser to get a [`Crate`] (the root AST
3737node).
3838
39- To minimize the amount of copying that is done, both [`StringReader`] and
40- [`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This
41- contains all the information needed while parsing, as well as the [`SourceMap`]
42- itself.
39+ To minimize the amount of copying that is done,
40+ both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent [`ParseSess`].
41+ This contains all the information needed while parsing, as well as the [`SourceMap`] itself.
4342
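The lifetime relationship described above can be sketched with a toy reader that borrows from a session instead of copying out of it. The `Session` and `Reader` types here are hypothetical stand-ins chosen for illustration; they do not reproduce the fields or methods of the real `ParseSess`, `StringReader`, or `Parser`.

```rust
// Toy sketch of "a reader bound to its parent session by a lifetime".
// `Session` and `Reader` are hypothetical; they are not rustc's types.

/// Owns the source text for the whole parsing session.
struct Session {
    source: String,
}

/// Borrows from a `Session`; it cannot outlive the session it reads from.
struct Reader<'sess> {
    sess: &'sess Session,
    pos: usize,
}

impl<'sess> Reader<'sess> {
    fn new(sess: &'sess Session) -> Self {
        Reader { sess, pos: 0 }
    }

    /// Return the next whitespace-separated word as a slice of the
    /// session's source. No text is copied: the returned `&str` borrows
    /// from the session, not from the reader.
    fn next_word(&mut self) -> Option<&'sess str> {
        let sess: &'sess Session = self.sess;
        let rest = &sess.source[self.pos..];
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = self.pos + (rest.len() - trimmed.len());
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(&sess.source[start..start + len])
    }
}

fn main() {
    let sess = Session { source: String::from("foo + bar") };
    let mut reader = Reader::new(&sess);
    while let Some(word) = reader.next_word() {
        println!("{word}");
    }
}
```

Because the returned slices carry the `'sess` lifetime, the borrow checker rejects any attempt to drop the session while words read from it are still in use, which is the guarantee that makes sharing without copying safe.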
44- Note that while parsing, we may encounter `macro` definitions or invocations. We
45- set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
46- Expansion itself may require parsing the output of a `macro`, which may reveal
47- more `macro`s to be expanded, and so on.
43+ Note that while parsing, we may encounter macro definitions or invocations.
44+ We set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
45+ Expansion itself may require parsing the output of a macro, which may reveal more macros to be expanded, and so on.
4846
4947## More on Lexical Analysis
5048