s5cs37 bachus naur form
This page is mainly about s5cs37 bachus naur form

Defining formal languages is a tricky business. We need another language to do it; a metalanguage. Crazy stuff...
We are learning...
About Backus-Naur Form
So that we can ...
Check language syntax be referring to BNF productions or syntax diagrams
Formulate simple production rules
Explain why BNF can represent some languages that cannot be representing with regular expressions
Backus-naur form (BNF) is another metasyntax / metalanguage used to express 'context-free' grammars; a formal way to describe formal languages where each element in the language is not constricted by it's context/surroundings. Context free grammars are easier to parse than context sensitive grammars.
Two computer scientists in the late 1950's developed BNF to help define the structure of the programming language Algol.

Backus-Naur Form is used because defining the syntax of a formal language using a regular expression is very tedious for languages with large alphabets. It would be virtually impossible to use RE to define the Python programming language, for instance.

An example of a Backus-Naur form grammar
Considering the above example, a BNF grammar consists of 4 parts ...
1
A set of literals, terminals or tokens
2
Productions
3
A set of meta-variables or non-terminals
4
A start symbol

Activity 1
Metasymbols
Non-terminals are defined using production rules. The components of a production rule are...
<...>
allows definition of metavariables (or non-terminal symbols) which may be further defined in terms of other meta-variables found elsewhere in the BNF and terminal symbols (components of the language alphabet which require no further definition);::=
means 'is defined as';<LHS> ::= RHS
means 'LHS is defined as RHS'|
(pipe character) represents alternation (OR);<LHS> ::= a|b
means 'LHS is defined as a OR b'This is the same as two separate production rules ...
<LHS> ::= a
<LHS> ::= b
A meta-symbol can be defined in terms of itself, recursively;
<LHS> ::= a | a <LHS>
means 'LHS is defined as a OR a followed by LHS'The left hand side of a BNF production is always a single non-terminal whilst the right hand side can consist of any combination of terminals and non-terminals (an expression).
<meta-variable> ::= expression
For example, a full definition of signed and unsigned numbers in BNF uses the following productions...
<signed_real> ::= -<unsigned_real>|<unsigned_real>
<unsigned_real> ::= <integer>.<integer>
<integer> ::= <digit>|<digit><integer>
<digit> ::= 0|1|2|3|4|5|6|7|8|9

Task 1.1 Describing BNF in English
Can you 'describe' the above BNF productions in a natural language of your choice (probably English)?
OUTCOME : Write your 'descriptions' in your notes along with a copy of the BNF rule.


Activity 2
Parse trees
The grammar is a set of rules which explain how to build a 'parse tree'. A parse tree can be used to generate valid sentences in the language or to check whether sentences are valid in the language. When writing parse trees, we always work from the start symbol down to the terminals. The resulting string is read from left to right.
Example: Writing a valid English sentence.


Task 2.1 Constructing valid numbers
Can you use a parse tree to construct a range of valid,
<signed_real>
numbers from the BNF grammar you looked at in the last task?OUTCOME: Draw the parse trees out in your notes along with the valid numbers you have generated.

Example: Checking the syntax of a mathematical expression - syntactical analysis.
The following BNF rule defines a simple mathematical expression.
<exp> ::= <exp>+<exp>|<exp>*<exp>|(<exp>)|a|b|c
This is equivalent to 6 separate productions ...
<exp> ::= <exp>+<exp> <exp> ::= <exp>*<exp> <exp> ::= (<exp>) <exp> ::= a <exp> ::= b <exp> ::= c
If we check the validity of a string, we work from the outside of the string to the inside, choosing the most suitable production each time working down from the start string. If we cannot choose a suitable production or a terminal, the string is not valid.


Task 2.2 Parse trees
Show parse trees for each of these mathematical expressions and determine whether the given string is valid or not.
1
a+b
2
a*b+c
3
(a+b)
4
(a-b)
5
a
6
(a+(b))
OUTCOME: Draw the parse trees in your notes, state whether the string is valid and if it is not, explain why.

An alternative is to use bottom-up or shift-reduce parsing to examine the string to see if it is indeed part of the language we have defined with the BNF. This is the reverse of the technique of drawing the parse tree we have just seen. We start with the sentence and work up to the start symbol.
Time to get dull
Those of you who have been to the Keswick Pencil Museum will be fascinated to discover the connection to context-free grammars. Here is a set of BNF rules which define the structure of the 'pencil hardness' grammar. I've numbered the productions so I can refer to them in the example which follows.
<lead_hardness> ::= <simple_hardness>|<scale_of_hardness> <scale_of_hardness> ::= <numeric_value><simple_hardness> <simple_hardness> ::= H|B|F|HB <numeric_value> ::= 1|2|3|4|5|6|7|8|9
Let's try to prove that 2H is a valid lead hardness using the shift-reduce technique.

Shift-reduce parsing
Here is another example using the simple mathematical expression grammar.
<exp> ::= <exp>+<exp> | <exp>*<exp> | (<exp>) | a | b | c
We'll prove that
((a+b)*c)
is a valid mathematical expression using bottom-up parsing.
Bottom-up parsing
Now time for some more questions to see if you 'get it'!

Task 2.3 BNF Challenges
Use parse tree AND bottom-up parsing methods to answer the following set of questions.
1
Compilers
Backus Naur Form (BNF) is used by compiler writers to express the syntax of a programming language. The syntax for a part of one such language is written in BNF as follows:
<expression> ::= <integer> | <expression> <operator> <expression> <integer> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 <operator> ::= + | - | * | /
Do the following expressions conform to this grammar?
1
4*9
2
8+6/2
3
-6*2
4
(4+5)*5
2
Identifiers
Identifiers in a particular programming language are valid if they are constructed based on the following BNF productions: (NEED TO CHECK PRODUCTION RULES)
<letter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z <digit> ::= 1|2|3|4|5|6|7|8|9 <character> ::= <letter>|<digit>|_ <string> ::= <character>|<character><string> <identifier> ::= <letter>|<letter><string>
Are the following identifiers valid?
1
Z__
2
3AXY
3
D_E
4
D12
5
PETER's
6
_AbC
7
ABC
OUTCOME: Draw the parse trees and bottom-up parsing tables in your notes, state whether the string is valid and if it is not, explain why.


Activity 3
Extended Backus-Naur Form
For complex grammars, BNF is too restrictive. Extended BNF, or EBNF, adds regular expression style notation...
Literals are enclosed by double quotes for example "word";
Optional items shall be enclosed in square brackets indicating that the item appears either zero or one times.
Repetition is introduced using curly brace meta-symbols '{' and '}'.
Curly braces followed by a * character indicate zero or more sequential instances of the item
Curly braces followed by a + character indicate one or more sequential instances (i.e. not optional).
Double period (..) shall appear within a literal definition as a shortcut denoting a set of ASCII characters between the letters on either side of them. For example "A..Z" representing the set of uppercase letters.
For instance, the EBNF definitions of the signed and unsigned numbers uses the following productions ...
signed_real = ("-" | "+") unsigned_real;
unsigned_real = integer "." integer;
integer = digit {digit};
digit = "0..9"
;Note that EBNF is no more powerful than BNF, just more convenient. Any EBNF production can be written in BNF, it just might look a lot more complicated.

Task 3.1 NoSQL Grammar
Visit the Oracle NoSQL Reference Guide and spend a little time investigating the BNF rules for NoSQL. Do you see any you recognise from your work on databases? If not, it's probably because you haven't done about them yet!
Write some brief notes for your folders so that you have a reduced set of EBNF productions which demonstrate the basic concepts.
OUTCOME: A reduced set of notes on the EBNF productions for NoSQL


Activity 4
Syntax diagrams
Conveniently, productions rules aren't the only way of representing grammars. Allow me to introduce 'railroad' or 'syntax' diagrams. Syntax diagrams are an alternative way of defining the syntax rules of a language. They can even be used to describe a subset of the English language.
The following EBNF production ...
Syntax = S(Y(N(T(A(X|ctic|gma)|he(sis|tic))|o(nym|(p(sis|tic))))|stem[[at]ic])|em(an|iom)tic);
...could be used to generate this syntax diagram.

This is an example of a programming statement which is defined both as a BNF grammar and a syntax diagram. Look carefully...

Railroad diagram
Modern programming languages still use syntax diagrams to define their structure. For instance, the following syntax diagram is from the JSON (JavaScript Object Notation) website...
If you can get from the left hand side to the right hand side whilst parsing the number (in this case) and not leaving the lines, the the number is valid as shown in the following beautiful giffy example...

How to use a syntax diagram to check the validity of a string

Task 4.1 Advantages of syntax diagrams
1
Using this number syntax diagram from the Json website, can you use it to define a set of 5 valid numbers?
2
Use this number syntax diagram to determine whether the following numbers are valid in that language:
+12.33E14
-0.99
12.244E-12
-1233.34e+45
0
3
How much easier is it to use a Syntax diagram like this than the BNF grammar you used in the Task called 'Constructing valid numbers'?
OUTCOME: Make a note of the syntax diagram in your notebook and use it to write down at least 3 examples of valid numbers. You could use a different coloured pen to 'trace' your way through the railroad diagram. Include a statement justifying the advantage of syntax diagrams over BNF grammars.

We can't get away without setting some challenging questions can we?

Task 4.2 Syntax diagram challenges
1
User ID
A user ID for logging onto a computer system consists of ...
...two digits followed by
...one of the abbreviations H, L, D, R, Pa, Ph, followed by
...a surname consisting of one or more upper (A..Z) or lower case (a..z) letters followed by
...a single initial chosen from the set of upper case letters, A..Z.
Draw a syntax diagram for the user ID's in this system.
2
Identifier
An identifier in a programming language consists of a letter followed by zero or more letters or digits. Draw a syntax diagram for valid identifiers in this language.
3
What have the Romans ever done for us?
Draw a syntax diagram which defines the Roman Numerals from 1 to 10 ...
I, II, III, IV, V, VI, VII, VIII, IX, X
Your diagram should contain only 3 terminal symbols and should not consist simply of a series of alternatives. HINT: Start with I and build the alternatives up from there.
OUTCOME: Draw your syntax diagrams carefully in your notes and get your teacher to check them over. Ask for hints if you need to 😃


Activity 5
EBNF Visualiser
There is a free piece of software called 'EBNF Visualiser' which is available to download for free from a 'Not secure' website (don't worry, it's fine). I have also provided you with a number of 'grammars' in .ebnf format...
3
Pencil hardness

The user interface of EBNF Visualiser (Windows 7+ version)

Task 5.1 EBNF visualiser - a challenge!
Can you get EBNF visualiser to work? Can you find the
case_statement
syntax diagram from the screenshot above? You can inspect the *.ebnf files using notepad to see what the flavour of BNF looks like.OUTCOME: Nothing apart from the satisfaction of getting it to work!

Extension Activities
Wasn't this challenging enough? Have a night off!

Wasn't this challenging enough? Have a night off!

Other useful places on the web
🌐
🌐
www.research-collection.ethz.ch
The revised report on the Pascal programming language by Niklaus Wirth. Contains a full definition of Pascal as well as EBNF and Syntax Diagram representations.
🌐
www.bottlecaps.de
This is a tool for creating syntax diagrams, also known as railroad diagrams, from context-free grammars specified in EBNF.
🌐
What's next?
Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.
END OF TOPIC ASSESSMENT
Last modified: May 8th, 2024