Defining formal languages is a tricky business. We need another language to do it; a metalanguage. Crazy.We are learning ...- About Backus-Naur Form
So that we can ...- Check language syntax be referring to BNF productions or syntax diagrams
- Formulate simple production rules
- Explain why BNF can represent some languages that cannot be representing with regular expressions
metasyntax / metalanguage used to express 'context-free' grammars; a formal way to describe formal languages where each element in the language is not constricted by it's context / surroundings. Context free grammars are easier to parse than context sensitive grammars. Two computer scientists in the late 1950's developed BNF to help define the structure of the programming language Algol↗. Backus-Naur Form is used because defining the syntax of a formal language using a regular expression is very tedious for languages with large alphabets. It would be virtually impossible to use RE to define the Python programming language for instance.This is an example of a Backus-Naur form grammar. Considering the above example, a BNF grammar consists of 4 parts ...*A set of literals, terminals or tokens* Terminals are the smallest units of syntax in a grammar. The alphabet. There are atomic and cannot be defined in terms of anything else;
*Productions* A statement written in the BNF metalanguage;
*A set of meta-variables or non-terminals* A non-terminal symbol is not part of the alphabet and is further defined in another production. Non-terminals take the place of larger pieces of syntax from a grammar;
*A start symbol* There is a special non-terminal called the start symbol which represents the highest level of the grammar; the top of the tree as it were.
Non-terminals are defined using production rules. The components of a production rule are ...`<...>` allows definition of*metavariables*(or non-terminal symbols) which may be further defined in terms of other meta-variables found elsewhere in the BNF and terminal symbols (components of the language alphabet which require no further definition);
`::=` means 'is defined as';
`<LHS> ::= RHS` means 'LHS is defined as RHS'
`|` (pipe character) represents alternation (OR);
`<LHS> ::= a|b` means 'LHS is defined as a OR b'
This is the same as two separate production rules ...
`<LHS> ::= a`
`<LHS> ::= b`
- A meta-symbol can be defined in terms of itself,
*recursively*;
`<LHS> ::= a | a <LHS>` means 'LHS is defined as a OR a followed by LHS'
`<meta-variable> ::= expression` For example, a full definition of signed and unsigned numbers in BNF uses the following productions ... `<signed_real> ::= -<unsigned_real>|<unsigned_real> `
`<digit> ::= 0|1|2|3|4|5|6|7|8|9 `
The grammar is a set of rules which explain how to build a 'parse tree'. A parse tree can be used to generate valid sentences in the language or to check whether sentences are valid in the language. When writing parse trees, we always work from the start symbol down to the terminals. The resulting string is read from left to right. Example : Writing a valid English sentence.
Example : Checking the syntax of a mathematical expression - syntactical analysis.The following BNF rule defines a simple mathematical expression. `<exp> ::= <exp>+<exp> | <exp>*<exp> | (<exp>) | a | b | c` This is equivalent to 6 separate productions ...`<exp> ::= <exp>+<exp> ` `<exp> ::= <exp>*<exp>` `<exp> ::= (<exp>)` `<exp> ::= a` `<exp> ::= b` `<exp> ::= c` If we check the validity of a string, we work from the outside of the string to the inside, choosing the most suitable production each time working down from the start string. If we cannot choose a suitable production or a terminal, the string is not valid.
An alternative is to use bottom-up or shift-reduce parsing to examine the string to see if it is indeed part of the language we have defined with the BNF. This is the reverse of the technique of drawing the parse tree we have just seen. We start with the sentence and work up to the start symbol.Earlier on in this section, we met Pencil hardness. Here is a set of BNF rules which define the structure of the Pencil hardness grammar. I've numbered the productions so I can refer to them in the example which follows. `<lead_hardness> ::= <simple_hardness>|<scale_of_hardness>` `<scale_of_hardness> ::= <numeric_value><simple_hardness>` `<simple_hardness> ::= H|B|F|HB` `<numeric_value> ::= 1|2|3|4|5|6|7|8|9`
Here is another example using the simple mathematical expression grammar. `<exp> ::= <exp>+<exp> | <exp>*<exp> | (<exp>) | a | b | c` We'll prove that `((a+b)*c)` is a valid mathematical expression using bottom up parsing.Now time for some more questions to see if you 'get it'!
For complex grammars, BNF is too restrictive. Extended BNF, or EBNF, adds regular expression style notation ...- Literals are enclosed by double quotes for example "word";
- Optional items shall be enclosed in square brackets indicating that the item appears either zero or one times.
- Repetition is introduced using curly brace meta-symbols '{' and '}'.
- Curly braces followed by a * character indicate zero or more sequential instances of the item
- Curly braces followed by a + character indicate one or more sequential instances (i.e. not optional).
- Double period (..) shall appear within a literal definition as a shortcut denoting a set of ASCII characters between the letters on either side of them. For example "A..Z" representing the set of uppercase letters.
`signed_real = ("-" | "+") unsigned_real .` `unsigned_real = integer "." integer .` `integer = digit {digit} .` `digit = "0..9" .` Note that EBNF is no more powerful than BNF, just more convenient. Any EBNF production can be written in BNF, it just might look a lot more complicated.
In the last task (the BNF club), you saw some diagrams that looked like railway sidings. These are called railroad diagrams or, more correctly, syntax diagrams. Syntax diagrams are an alternative way of defining the syntax rules of a language. They can even be used to describe a subset of the English language. The following EBNF production ... Syntax = S (Y (N (T (A (X | ctic | gma) | he (sis | tic)) | o (nym | (p (sis | tic)))) | stem [[at] ic]) | em (an | iom) tic) .... could be used to generate this syntax diagram. This is an example of a programming statement which is defined both as a BNF grammar and a syntax diagram. Look carefully at the popup. Modern programming languages still use syntax diagrams to define their structure. For instance, the following syntax diagram is from the JSON (JavaScript Object Notation) website ... If you can get from the left hand side to the right hand side whilst parsing the number (in this case) and not leaving the lines, the the number is valid as shown in the following beautiful giffy example ... How to use a syntax diagram to check the validity of a string
We can't get away without setting some challenging questions can we?
There is a free piece of software called 'EBNF Visualiser' which is available to download from the Software directory for this section as well as a number of '.ebnf' files from the resources for this section which define various grammars.
Wasn't this challenging enough? Have a night off! Railroad diagram generator : http://www.bottlecaps.de/rr/ui
END OF TOPIC ASSESSMENT |