Why?

A few months ago I joined Marc-André Cournoyer’s Great Code Club. I will right my own blog post about this club, but let me say that t $30, its well worth it to expand my programming life past Saas sites, exercism and Euler projects. Where was I? The second project from the Great Code Club is a transpiler, creating a css parser from scratch. Far out right? Well I bobbled through M-A’s use of Jison. At this stage of my career, I can pick up this stuff pretty fast. But for me its always: can I get a deeper understanding? Can I learn more? And in the world of computers, the answer is always an emphatic YES!!!!.

Javascript and Go are the languages I am learning now. Ruby is the language I am currently most fluent in. I looked at RACC and Rexical, both developed by Tenderlove. The documentation on these tools is parse. I couldn’t find any good information on them.

So, I says: “looks like its time to go back to grand-daddy C.” I begin looking at Bison (gnu’s YACC). There is some documentation. Its going to be an interesting journey.

In the next series of posts, I hope to illuminate you to the world of compiler compilers, grammars, lexers, and parsers.

We will start this adventure looking a Bison and Flex, move on to RACC and Rexical, then onto Jison and Jisonlex, and possibly look at Go’s YACC tools.

Grab your helmet, its going to be a bumpy ride.

This information is grabbed from the painful Bison docs.

Learn Me Some Bison!

Terms

Token - A terminal input Grouping - A non-terminal input of groupings and/or tokens

A single character is always a token, because it cannot be broken down further.

Lexers, Grammars, Parsers

When you get into grammars you can easily find yourself going down the rabbit hole on wikipedia. Lets keep it simple, with an analogy like Coinstar, the automated coin counting machine:

(the input is the 1000s of coins you found in your couch, as well as lent, pebbles and small pieces of plastic)

Lexical analysis (lexers) looks through the mess that is dumped into the machine and identifies coinage of a certain criteria: U.S. minted pennies, nickels, dimes and quarters. Everything else is discarded.

Grammars instruct how to total the proper coinage. Perhaps the coins are first siloed into individual denomination collections. Those collections are then subtotaled based on proper coin value. Finally the total of the subtotals is calcualted and printed to a redemption voucher to take to the store manager for cash monies.

Parsers, the are the automated output of running a compiler compiler using the lexical declaractions and grammar, is the magic of the machine: I dump in coinage, out pops a voucher

The takeaway here is that the “parser” is the output of the process. The parser is the code that can be inserted in any of our programs. The parser will be shipped along with any final product.

To take it home lets look at a contrived example:

Lexical declarations:

Symbol What to do with it
digit keep it, as token
+ (plus sign) keep it
other discard it

Grammar:

Pattern What to do with it
“token” return it
“token + token” return first plus second

Parser Generated:

Parser.parse("1")           // returns 1
Parser.parse("1 + 2")       // returns 3
Parser.parse("- A 1 + 2")   // discards - A returns 3

Bison

The layout of the Bison grammar file is:

%{
C declarations
%}

Bison declarations

%%
Grammar rules
%%
Additional C code

A lexer like flex would replace the “Bison declaration” section.



blog comments powered by Disqus

Published

11 May 2014

Tags