Login

Please fill in your details to login.





s5cs37 bachus naur form

This page is mainly about s5cs37 bachus naur form
image

Defining formal languages is a tricky business. We need another language to do it; a metalanguage. Crazy stuff...

We are learning...

About Backus-Naur Form

So that we can ...

Check language syntax be referring to BNF productions or syntax diagrams
Formulate simple production rules
Explain why BNF can represent some languages that cannot be representing with regular expressions

Backus-naur form (BNF) is another metasyntax / metalanguage used to express 'context-free' grammars; a formal way to describe formal languages where each element in the language is not constricted by it's context/surroundings. Context free grammars are easier to parse than context sensitive grammars.

Two computer scientists in the late 1950's developed BNF to help define the structure of the programming language Algol.

image

Backus-Naur Form is used because defining the syntax of a formal language using a regular expression is very tedious for languages with large alphabets. It would be virtually impossible to use RE to define the Python programming language, for instance.

image
An example of a Backus-Naur form grammar

Considering the above example, a BNF grammar consists of 4 parts ...

1
A set of literals, terminals or tokens
Terminals are the smallest units of syntax in a grammar. The alphabet. There are atomic and cannot be defined in terms of anything else;

2
Productions
A statement written in the BNF metalanguage;

3
A set of meta-variables or non-terminals
A non-terminal symbol is not part of the alphabet and is further defined in another production. Non-terminals take the place of larger pieces of syntax from a grammar;

4
A start symbol
There is a special non-terminal called the start symbol which represents the highest level of the grammar; the top of the tree as it were.

Activity 1
Metasymbols

Non-terminals are defined using production rules. The components of a production rule are...

<...>
allows definition of metavariables (or non-terminal symbols) which may be further defined in terms of other meta-variables found elsewhere in the BNF and terminal symbols (components of the language alphabet which require no further definition);

::=
means 'is defined as';
<LHS> ::= RHS
means 'LHS is defined as RHS'

|
(pipe character) represents alternation (OR);
<LHS> ::= a|b
means 'LHS is defined as a OR b'
This is the same as two separate production rules ...
<LHS> ::= a
<LHS> ::= b

A meta-symbol can be defined in terms of itself, recursively;
<LHS> ::= a | a <LHS>
means 'LHS is defined as a OR a followed by LHS'

The left hand side of a BNF production is always a single non-terminal whilst the right hand side can consist of any combination of terminals and non-terminals (an expression).

<meta-variable> ::= expression


For example, a full definition of signed and unsigned numbers in BNF uses the following productions...

<signed_real> ::= -<unsigned_real>|<unsigned_real>
<unsigned_real> ::= <integer>.<integer>
<integer> ::= <digit>|<digit><integer>
<digit> ::= 0|1|2|3|4|5|6|7|8|9


time limit
Task 1.1 Describing BNF in English

Can you 'describe' the above BNF productions in a natural language of your choice (probably English)?

OUTCOME : Write your 'descriptions' in your notes along with a copy of the BNF rule.

Checkpoint

Activity 2
Parse trees

The grammar is a set of rules which explain how to build a 'parse tree'. A parse tree can be used to generate valid sentences in the language or to check whether sentences are valid in the language. When writing parse trees, we always work from the start symbol down to the terminals. The resulting string is read from left to right.

Example: Writing a valid English sentence.

image

time limit
Task 2.1 Constructing valid numbers

Can you use a parse tree to construct a range of valid,
<signed_real>
numbers from the BNF grammar you looked at in the last task?

OUTCOME: Draw the parse trees out in your notes along with the valid numbers you have generated.

Checkpoint

Example: Checking the syntax of a mathematical expression - syntactical analysis.

The following BNF rule defines a simple mathematical expression.

<exp> ::= <exp>+<exp>|<exp>*<exp>|(<exp>)|a|b|c


This is equivalent to 6 separate productions ...

<exp> ::= <exp>+<exp>
<exp> ::= <exp>*<exp>
<exp> ::= (<exp>)
<exp> ::= a
<exp> ::= b
<exp> ::= c


If we check the validity of a string, we work from the outside of the string to the inside, choosing the most suitable production each time working down from the start string. If we cannot choose a suitable production or a terminal, the string is not valid.

image

time limit
Task 2.2 Parse trees

Show parse trees for each of these mathematical expressions and determine whether the given string is valid or not.

1
a+b
2
a*b+c
3
(a+b)
4
(a-b)
5
a
6
(a+(b))

OUTCOME: Draw the parse trees in your notes, state whether the string is valid and if it is not, explain why.

Checkpoint

An alternative is to use bottom-up or shift-reduce parsing to examine the string to see if it is indeed part of the language we have defined with the BNF. This is the reverse of the technique of drawing the parse tree we have just seen. We start with the sentence and work up to the start symbol.

Time to get dull

Those of you who have been to the Keswick Pencil Museum will be fascinated to discover the connection to context-free grammars. Here is a set of BNF rules which define the structure of the 'pencil hardness' grammar. I've numbered the productions so I can refer to them in the example which follows.

<lead_hardness> ::= <simple_hardness>|<scale_of_hardness>
<scale_of_hardness> ::= <numeric_value><simple_hardness>
<simple_hardness> ::= H|B|F|HB
<numeric_value> ::= 1|2|3|4|5|6|7|8|9


Let's try to prove that 2H is a valid lead hardness using the shift-reduce technique.

image
Shift-reduce parsing

Here is another example using the simple mathematical expression grammar.

<exp> ::= <exp>+<exp> | <exp>*<exp> | (<exp>) | a | b | c


We'll prove that
((a+b)*c)
is a valid mathematical expression using bottom-up parsing.

image
Bottom-up parsing

Now time for some more questions to see if you 'get it'!


time limit
Task 2.3 BNF Challenges

Use parse tree AND bottom-up parsing methods to answer the following set of questions.

1
Compilers

Backus Naur Form (BNF) is used by compiler writers to express the syntax of a programming language. The syntax for a part of one such language is written in BNF as follows:

<expression> ::= <integer> | <expression> <operator> <expression>
<integer> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<operator> ::= + | - | * | /


Do the following expressions conform to this grammar?

1
4*9
2
8+6/2
3
-6*2
4
(4+5)*5

2
Identifiers

Identifiers in a particular programming language are valid if they are constructed based on the following BNF productions: (NEED TO CHECK PRODUCTION RULES)

<letter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<digit> ::= 1|2|3|4|5|6|7|8|9
<character> ::= <letter>|<digit>|_
<string> ::= <character>|<character><string>
<identifier> ::= <letter>|<letter><string>


Are the following identifiers valid?

1
Z__
2
3AXY
3
D_E
4
D12
5
PETER's
6
_AbC
7
ABC

OUTCOME: Draw the parse trees and bottom-up parsing tables in your notes, state whether the string is valid and if it is not, explain why.

Checkpoint

Activity 3
Extended Backus-Naur Form

For complex grammars, BNF is too restrictive. Extended BNF, or EBNF, adds regular expression style notation...

Literals are enclosed by double quotes for example "word";
Optional items shall be enclosed in square brackets indicating that the item appears either zero or one times.
Repetition is introduced using curly brace meta-symbols '{' and '}'.
Curly braces followed by a * character indicate zero or more sequential instances of the item
Curly braces followed by a + character indicate one or more sequential instances (i.e. not optional).
Double period (..) shall appear within a literal definition as a shortcut denoting a set of ASCII characters between the letters on either side of them. For example "A..Z" representing the set of uppercase letters.

For instance, the EBNF definitions of the signed and unsigned numbers uses the following productions ...

signed_real = ("-" | "+") unsigned_real;
unsigned_real = integer "." integer;
integer = digit {digit};
digit = "0..9"
;

Note that EBNF is no more powerful than BNF, just more convenient. Any EBNF production can be written in BNF, it just might look a lot more complicated.

time limit
Task 3.1 NoSQL Grammar

Visit the Oracle NoSQL Reference Guide and spend a little time investigating the BNF rules for NoSQL. Do you see any you recognise from your work on databases? If not, it's probably because you haven't done about them yet!

Write some brief notes for your folders so that you have a reduced set of EBNF productions which demonstrate the basic concepts.

OUTCOME: A reduced set of notes on the EBNF productions for NoSQL

Checkpoint

Activity 4
Syntax diagrams

Conveniently, productions rules aren't the only way of representing grammars. Allow me to introduce 'railroad' or 'syntax' diagrams. Syntax diagrams are an alternative way of defining the syntax rules of a language. They can even be used to describe a subset of the English language.

The following EBNF production ...

Syntax = S(Y(N(T(A(X|ctic|gma)|he(sis|tic))|o(nym|(p(sis|tic))))|stem[[at]ic])|em(an|iom)tic);


...could be used to generate this syntax diagram.

image

This is an example of a programming statement which is defined both as a BNF grammar and a syntax diagram. Look carefully...

image
Railroad diagram

Modern programming languages still use syntax diagrams to define their structure. For instance, the following syntax diagram is from the JSON (JavaScript Object Notation) website...

image
Click to visit http://www.json.org/

If you can get from the left hand side to the right hand side whilst parsing the number (in this case) and not leaving the lines, the the number is valid as shown in the following beautiful giffy example...

image
How to use a syntax diagram to check the validity of a string

time limit
Task 4.1 Advantages of syntax diagrams

1
Using this number syntax diagram from the Json website, can you use it to define a set of 5 valid numbers?

2
Use this number syntax diagram to determine whether the following numbers are valid in that language:

+12.33E14
-0.99
12.244E-12
-1233.34e+45
0

3
How much easier is it to use a Syntax diagram like this than the BNF grammar you used in the Task called 'Constructing valid numbers'?

OUTCOME: Make a note of the syntax diagram in your notebook and use it to write down at least 3 examples of valid numbers. You could use a different coloured pen to 'trace' your way through the railroad diagram. Include a statement justifying the advantage of syntax diagrams over BNF grammars.

Checkpoint

We can't get away without setting some challenging questions can we?

time limit
Task 4.2 Syntax diagram challenges

1
User ID

A user ID for logging onto a computer system consists of ...

...two digits followed by
...one of the abbreviations H, L, D, R, Pa, Ph, followed by
...a surname consisting of one or more upper (A..Z) or lower case (a..z) letters followed by
...a single initial chosen from the set of upper case letters, A..Z.

Draw a syntax diagram for the user ID's in this system.

2
Identifier

An identifier in a programming language consists of a letter followed by zero or more letters or digits. Draw a syntax diagram for valid identifiers in this language.

3
What have the Romans ever done for us?

Draw a syntax diagram which defines the Roman Numerals from 1 to 10 ...

I, II, III, IV, V, VI, VII, VIII, IX, X

Your diagram should contain only 3 terminal symbols and should not consist simply of a series of alternatives. HINT: Start with I and build the alternatives up from there.

OUTCOME: Draw your syntax diagrams carefully in your notes and get your teacher to check them over. Ask for hints if you need to 😃

Checkpoint

Activity 5
EBNF Visualiser

There is a free piece of software called 'EBNF Visualiser' which is available to download for free from a 'Not secure' website (don't worry, it's fine). I have also provided you with a number of 'grammars' in .ebnf format...

3
Pencil hardness

image
The user interface of EBNF Visualiser (Windows 7+ version)

time limit
Task 5.1 EBNF visualiser - a challenge!

Can you get EBNF visualiser to work? Can you find the
case_statement
syntax diagram from the screenshot above? You can inspect the *.ebnf files using notepad to see what the flavour of BNF looks like.

OUTCOME: Nothing apart from the satisfaction of getting it to work!

Checkpoint

Extension Activities

Wasn't this challenging enough? Have a night off!

image

Other useful places on the web
🌐
en.wikipedia.org
EBNF at our favourite wiki.
🌐
www.research-collection.ethz.ch
The revised report on the Pascal programming language by Niklaus Wirth. Contains a full definition of Pascal as well as EBNF and Syntax Diagram representations.
🌐
www.bottlecaps.de
This is a tool for creating syntax diagrams, also known as railroad diagrams, from context-free grammars specified in EBNF.
🌐
en.wikipedia.org
Syntax diagrams from our favourite wiki

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT

Last modified: May 8th, 2024
The Computing Café works best in landscape mode.
Rotate your device.
Dismiss Warning