CS38 : Just a regular guy


A regular expression is a shorthand way of representing a set of strings which fit within a certain language.

We are learning ...
  • How to use regular expressions
So that we can ...
  • State that a language is regular if it can be described using a regular expression
  • State that regular expressions are a simple way to represent a set
  • Describe simple languages using regular expressions
  • Use regular expressions for string matching
  • Describe the relationship between a regular expression and a finite state automata

Activity 1 What are regular expressions? (20)

A regular language is a formal language that will be ...
  • accepted by a deterministic finite state automaton
  • accepted by a non-deterministic finite state automaton
  • represented by a regular expression
A regular expression describes a set of strings to define a formal language. They are mainly used for pattern matching to find valid strings in a formal language :
  • words in documents
  • files in folders
  • spelling checkers
  • commands in a programming language
  • validating email addresses
  • validating postcodes
  • syntax highlighting
  • virus signatures
Regular expressions are often abbreviated to RegEx or RE.

Task 1.1
Regular expression metacharacter reference
Regular expression metacharacters.docx
Regular expression quick reference guide.pdf

Print out and review the metacharacters on the handouts and make sure these are filed for reference. Your teacher will talk you through the contents if necessary.

Outcome : Handouts in your notebooks.


Activity 2 Developing and testing regular expressions (85)

There are lots of different flavours of regular expressions depending on the implementation, so be careful.

There are also lots of different pieces of software and platform implementations of regex that you can use. A nice free one is The Regex Coach available as an installer from http://www.weitz.de/regex-coach/

However, since you have to install it, we will be using an online regular expression engine called Regexr.com ...

http://regexr.com/
Click image to visit website

Task 2.1 Developing regular expressions
http://regexr.com/
Brain

Step one

There is a series of really good videos on YouTube using Regexr which will help you to learn how to use the platform. Watch all of them in order, trying things out on the Regexr website as you watch. Take notes if you need to.

https://www.youtube.com/watch?v=GVZOJ1rEnUg&list=PLfdtiltiRHWGRPyPMGuLPWuiWgEI9Kp1w

Step two


Next, a few exercises to help you to understand regular expressions.
  1. Check out these regular expressions on the Regexr.com website.  Can you find two matching strings for each one and state whether the language is finite or infinite?  HINT : The presence of particular metacharacters makes a language infinite ...

    https://drive.google.com/file/d/0B83yXMOilskabUtlOXhMbGpBb2c/view?usp=drive_web
    Click to view

  2. Which of the following matches the regular expression [a-zA-Z]*[^,]=

    a) Butt=
    b) BotHEr,=
    c) Ample
    d) FIdDiE7h=
    e) Brittle =

  3. Design and test a regular expression which defines the following language.

    a) It comprises the alphabet {#,  , a, b, c, d, e, f, g, h, i, j}
    b) Valid words in the language must start with the # sign
    c) The hash sign must be followed by 1 or more letters in the range specified
    d) Valid words must end in two hash signs and a space to denote the end of the word.

  4. A language L is defined by the alphabet {a, b, c} together with the regular expression [ab]+bb

    a) Explain in English what constitutes a valid string in L.
    b) Give five examples of valid strings in L.
    c) Explain whether the language is finite or infinite.

  5. Say whether the following strings match the regular expression be[ea]n

    a) ben
    b) been
    c) bean
    d) beean

  6. Say whether the following strings match the regular expression 10[1]*01

    a) 1001
    b) 100101
    c) 10101
    d) 1011101

  7. The word 'licence' has been spelled in the following ways in a piece of text: licence, license, lisence, license. Write a regular expression that will find all instances of the word 'licence' spelt in these different ways.

  8. Say whether the following strings match the regular expression \d+[-\s]\d+

    a) 01296-433006
    b) 01296--433006
    c) 01793  234589
    d) 01793 234589

  9. Which of the following matches the regular expression [a-z]+[\.\?!]
    If the string does not match, tell me why!

    1. battle!
    2. Hot
    3. green?
    4. swamping.
    5. jump up.

  10. Design a regular expression which matches only the following strings (finite language) ...

    x, yz, zx

Write up carefully what you have done in your notebooks. It is quite likely that you will simple copy and paste the regular expression and the strings into Regexr, however, to demonstrate your understanding, you will need to explain the results that you have got rather than stating them.

Outcome : Worked solutions to all 10 regular expression questions


Activity 3 The relationship between regular expression and finite state machines (45)

Regular Expression to FSA

Since a regular language can be described by either a finite state machine or a regular expression, this means that the two are equivalent. You may wish to revise your work on Finite State Machines before you attempt this!

First, watch Barry Browns videos in which he shows you how to convert a regular expression into a Finite State Automata. There are some examples for you to try in the Task which follows.

Converting regular expressions to finite state automata

More advanced regular expression to finite state automata examples

For instance, consider the following regular expression ...

RegEx: ab*c
: "One 'a' followed by 0 or more 'b' followed by one 'c'"
Alphabet: {a, b, c}
Language: {'ac', 'abc', 'abbc', 'abbbc', ...}
: Language is infinite

This is equivalent to the following finite state automata.


Notice that this representation has a 'dead state', s3, into which the machine falls if there are any unmatched characters in the input string.

Task 3.1
 Regular Expression to FSA
Brain

Convert the following regular expressions (RE) into their corresponding finite state automata (FSA). Copy the regular expression into your notes and construct the corresponding finite state automata alongside it. Add details of the alphabet, written explanation of the grammar, examples of valid words over the alphabet and whether the machine is finite or infinite.

Look at the example and the videos for guidance if you need to.
  • (c|ab)
  • (a|b)a*
  • b*ac+
Outcome : Regular expressions and their corresponding finite state automata


FSA to Regular Expression

Conversion of a given FSA to a Regular Expression by formal mathematical means is rock hard. Instead, we will learn to approximate the corresponding RE from the given FSM which should give you enough ammo to get a reasonable mark in an examination.

Strategy : Look for as many unique patterns in the input strings as you can find which will lead to the goal state and write them down. If you can, simplify this collection of expressions by looking for common starting and ending strings and then alternate the bits in between. Simples!

See how far you get with these. I'm imagining that ...



Task 3.2
 FSA to Regular Expressions
Brain

Try to find as many patterns in the strings accepted by these Finite State Automata and write them all down in a list. Look at the list of regular expressions and see whether you can spot any similarities between them. Try to reduce the number of terms as far as you can.

Then use Regexr to test out yor regular expression with strings that you know will and won't be accepted by the Finite State Automata. Adjust your regex as appropriate.

Easy

Medium

Hard

I'll be impressed if you manage to get the same RegEx as me. That's not a problem because there are lots of matching RegEx for each FSA - it depends on how much you simplify it. Try your best. Identify as many 'parts' as you can - the more parts you recognise, the more marks you will get in the exam.

Outcome
 : An attempt to convert FSA into Regular Expressions.


Extension activities

Isn't it difficult enough?

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT