After my lecture that covered scanning and parsing, you should be able to read and comprehend the parts of Calingaert's chapter 6 that deal with lexical analysis and parsing. Be sure you read it by Tuesday. Also read the following code and extensive comments about the reader before Tuesday. What follows is code for a reader that's just a little fancier than the one I described in class. Be sure you understand how the mutual recursion between read and read-list implements top-down parsing of nested structures. Also be sure you understand the following * why you can't use regular expressions to parse nested expressions. * what read-char and peek-char do. * how the local procedures in read-symbol and read-number implement looping by recursion, and that looping consumes characters that consititute consecutive characters of a symbol or digit's textual representation. * how to desugar the local procedure definion syntax into letrec and lambda. (It's in the Scheme notes.) * how the regular expressions for the lexical syntax relate to finite-state machines. (E.g., the state transition diagram I showed you in class. Work out a state transition diagram for the reader presented here.) * how the finite-state machine relates to the code. * how the BNF notation works, and how the mutual recursion between the rule for lists and the rule for s-expressions captures the possible nestings of lists in s-expressions. * How read-list uses simple recursion to read any number of list elements, and mutual recursion through micro-read to read nested expressions. * what list->string, string->symbol, and string->number do. This new reader has a few new features: * it recognizes symbol printnames that have digits in them as well as letters, though they must start with a letter. * it recognizes the end of file, and returns the end-of-file object, which is a standard Scheme object. read-char and peek-char return this special object if they can't read because there are no characters left in the input. Our little reader returns it if there are no s-expressions left to read. ------------------------ cut here ---------------------------------------- ; micro-read is a simplified Scheme reader. ; It does not handle all of Scheme's lexical syntax. ; ; It handles: ; * symbols whose names start with a letter and whose ; other characters are letters or digits ; * numbers that consist of digits only (no minus sign, ; no decimal point, etc.) ; * lists of anything it can read. ; The subset of scheme s-expressions that this reader reads can ; be recursively defined this way: ; ; We parse textual s-expressions into so-called s-expression ; data structures (parse trees) where an s-expression may be: ; ; * a symbol, or ; * a number, or ; * a list of s-expressions ; The lexical syntax for the tokens is: ; ; token class regular expression ; ----------- ------------------ ; symbol letter {letter | digit}* ; number digit {digit}* ; leftparen `(' ; rigthparen `)' ; ; Here's a BNF grammar for the language we're parsing. Note ; that this is a language of data structures, not a language ; of programs. ; ; ::= symbol ; | number ; | ; ::= leftpar ; ::= rightpar ; | ; ; Here I'm using fairly standard BNF notation. ; The ::= symbol means that the thing on the left can be expanded to ; the thing on the right if you're producing sentences in the language, ; that the thing on the right can be parsed to the thing on the left ; if you're parsing sentences of the language. That is, the thing ; on the left corresponds to a parse tree node, and the thing on ; the right corresponds to the structure of a phrase. The vertical ; bar means "or". ; ; The things in angle brackets are nonterminals---these aren't tokens ; in the language, but abstractions of phrases. ; ; Notice the familiar structure of the first rule. An s-expression ; can be a symbol, or a number, or a list. (The first two are tokens, ; but the last one is a compound structure.) The last alternative, ; , is defined recursively. ; ; Lists are defined recursively as things starting with a left ; parenthesis and followed by a sequence of s-expressions, followed ; by a right parenthesis. A sequence of s-expressions can be ; empty, in which case you only get a right parenthesis that ends ; a list. Alternatively, it can be an s-expression followed ; by an s-expression sequence that ends with a right parenthesis. ; Make sure you understand that this recursive definition covers lists ; of any length. ; The basic parsing strategy for nested lists is to call read-list ; when we hit a left parenthesis, and it will read the items in ; the list, and stop when it hits a matching right parenthesis. ; Left parentheses thus tell the reader to invoke itself recursively, ; and right parentheses tell it to return back up a level. This ; recursion is indirect, through read-list (define (micro-read) (let ((first-char (read-char))) (cond ((eof-object? first-char) ; char is EOF? first-char) ; returns this EOF object ((char-alphabetic? first-char) ; char a letter? (read-symbol first-char)) ; goes to read-symbol state ((char-numeric? first-char) ; char a digit? (read-number first-char)) ; goes to read-number state ((char-whitespace? first-char) ; char a whitespace? (micro-read)) ; skip it and return to this state ((char-leftpar? first-char) ; char a left parenthesis? (read-list '())) ; goes to read-list state (else ; default (error "illegal character in input"))))) ; read-list is very special. It implements not just a state in terms ; of lexical analysis, but a parsing step. We're recognizing a production ; in the grammar. Mutual recursion between micro-read and read-list is ; what implements top-down parsing. ; The basic idea is that when micro-read sees a left parenthesis, ; it recognizes that as the beginning of a list and calls read-list. ; Read list reads list elements by calling micro-read, until it ; runs into the right parenthesis that closes the one that started ; the list. Then it returns a list of the list elements it read. ; ; read-list doesn't have to do anything special to keep track of ; matching parentheses, because a recursive call to read that ; encounters another left parenthesis will call read-list again, ; recursively, and *that* call to read list will read up to ; its matching parenthesis, and consume it before it returns. ; So when a particular call to read-list encounters a right parenthesis, ; it is guaranteed to be the one that ends this list. Any other ; right parentheses that close nested lists will be consumed by ; the recursive calls that read those nested lists. (define (read-list list-so-far) (let ((next-char (peek-char))) ; if we hit a right parenthesis, (cond ((char-rightpar? next-char) ; then return list we've read, reversing it into proper order (read-char) ; get rid of the right parenthesis (reverse list-so-far)) ; else read next item and call self recursively to read rest (else (read-list (cons (micro-read) list-so-far)))))) ; read-symbol calls read-symbol-helper and changes the result from ; list to string then to symbol (define (read-symbol chr) ; read-symbol-helper reads in one character a time and puts it into ; a list. If it finds the character is a finishing character, then ; it reverses the list and returns. (define (read-symbol-helper list-so-far) (let ((next-char (peek-char))) ; if next-char is a letter or a digit then call self recursively (if (or (char-alphabetic? next-char) (char-numeric? next-char)) (read-symbol-helper (cons (read-char) list-so-far)) ; else return list we've read, reversing it into proper order (reverse list-so-far)))) (string->symbol (list->string (read-symbol-helper (list chr))))) ; read-string calls read-string-helper and changes the result from ; list to string (define (read-string) ; read-string-helper reads one character at a time and puts it into ; a list. If it finds a double quote ", it reverses the list and ; returns. (define (read-string-helper list-so-far) (let ((next-char (peek-char))) (cond ((eq? next-char #\") ; if we hit a double-quote (read-char) ; get rid of the closing quote (reverse list-so-far)) ; then reverse and return list ; else read next item and call self recursively to read rest (else (read-string-helper (cons (read-char) list-so-far)))))) (list->string (read-string-helper '()))) ; read-number calls read-number-helper and changes the result from ; list to string then to number (define (read-number chr) ; read-number-helper reads one character at a time and checks if ; it is a number. If so puts it into a list. Otherwise, it reverse ; the list and return. (define (read-number-helper list-so-far) (let ((next-char (peek-char))) ; if next-char is a digit then call self resursively (if (char-numeric? next-char) (read-number-helper (cons (read-char) list-so-far)) ; else return the list we've read, reversing it into proper order (reverse list-so-far)))) (string->number (list->string (read-number-helper (list chr))))) ;------------------------------------------------------------------- ; Procedures for classifying characters according to our lexical ; syntax. Note that char-alphabetic?, char-numeric?, and eof-object? ; are standard Scheme procedures, so we don't need to define them ; We don't really need char-leftpar? and char-rightpar? because we ; could just do an eq? test against character literals inline, ; but it's convenent to separate them out because the character ; literals confuse some editors' parenthesis matchers. ; whitespace? check whether char is either space or newline (define (char-whitespace? char) (or (eq? char #\space) (eq? char #\newline))) (define (char-leftpar? char) (eq? char #\( )) (define (char-rightpar? char) (eq? char #\) ))