A LANGUAGE IS A POSSIBLY INFINITE SET OF STRINGS

100% Pass and No Plagiarism Guaranteed

A LANGUAGE IS A POSSIBLY INFINITE SET OF STRINGS

Alphabets, Strings, and Languages

DEFINITIONS

Most are familiar with the concept of a “language”. Obviously, we each speak a native tongue or language such as English or French or Japanese. Many are also familiar with the concepts of programming languages – C#, Java, PHP for example. In the study of computational theory, however, a language is a very important concept that is used to explore the boundaries of computing.

For the purpose of studying theory, a language is a possibly infinite set of strings. A string is simply a finite sequence of symbols taken from some given finite alphabet.

For example, we could choose an alphabet A consisting of the symbols 0 and 1. That is, A = { 0, 1}. This finite alphabet is used to build strings of a given language. For example if we were to define our language as L = { w | w is a some string built using alphabet A containing an equal number of 0’s and 1’s} then one string in our language would be 0011. Other strings in our language L would be 01010101, 1100110100 and 01.

Notice in the example above that our alphabet A was a finite set of symbols that we use to build strings. Alphabets are always finite. For example, words in our English language consist of sequences of the letters A-Z (along with occasional special symbols), a finite alphabet. In the study of the theory of computation we will often find ourselves dealing with the alphabet {0,1}. Why do you suppose that is? Well, the study of the theory of computation is, in effect, a search for the boundaries of what a computer can and cannot do. And how do computers do anything? With programs. And what, when you boil it down to its lowest level, are programs? They can be thought of as a sequence of 0’s and 1’s that are interpreted by the hardware of the computer.

Sample Alphabets

A1 = { 0,1 } A2 = {0, 1, 2} A3 = { a,b }

Strings are simply a sequence of symbols, concatenated one after the other, taken from our given alphabet. Each string must be finite in length. There is no maximum length that a string can be but whatever length it is it must be finite. So, theoretically, we could have a string consisting of a billion 0’s and 1’s but there must be a first symbol in the string and there must be a last symbol. Often, we refer to an arbitrary string using the symbol w.

If w is a string, then the notation |w| is used to indicate the length – or number of symbols – in the string w. Since all strings must be finite in length, |w| is some non-negative integer. So if our alphabet is {0,1} and a string w = 01010 then |w| = 5 since 01010 is 5 symbols in length.

An interesting string we will frequently see is known as the empty string. Different sources use different symbols to represent the empty string but we shall use the symbol  (Greek alphabet symbol lamda) to represent the empty string. The empty string is a string consisting of 0 symbols from the alphabet. That is || = 0.

Sample Strings given alphabet {0,1}

w1 = 01111 w2 = 0111111110 w3 = 0000000000011100000010000 w4 = 0

Also, in our previous example, the language L was defined using one of the techniques you studied in the set-theory component of this course. Since L is obviously an infinite set of strings (if it is not then there must be some “longest string” in L right? Well now concatenate 01 to the end of that supposed “longest” string and what do you end up with? That’s right – a longer string with an equal number of 0’s and 1’s and so, by contradiction, we see that our assumption that L was finite and thus had a longest string was wrong), we could not attempt to exhaustively list each member of the set. Languages can be finite but such languages are rarely ever interesting. Usually, when we study languages in computational theory we will focus upon languages consisting of an infinite collection of strings.

There are two languages that we see often in the study of theory. The first is the Empty Language L = { }. This is exactly the same concept as the empty set you studied earlier in set theory. It is a set that contains nothing – not a single string. If our chosen alphabet was say {0,1} then the empty language would simply be the set consisting of no strings at all. Do NOT confuse this with the language { } which is a language consisting of a single string which happens to be the empty string. This language contains one element – the empty string whereas the empty language contains nothing at all not even  .

The second language we frequently deal with is represented as L* (we shall see why later) and is the infinite language that contains all possible strings built using the given alphabet. So again, if our alphabet consisted of {0,1}, then L* would be the set of every string we could possibly construct using 0’s and 1’s. Again, remember, strings themselves are finite in length BUT languages can contain an infinite number of them! We can often define L* as follows.

L0 = { set of all strings of length 0 } = { }
L1 = { set of all strings of length 1 } = { 0, 1 } = A (the alphabet itself )
L2 = L1 concat L1 = {00, 01, 10, 11 } = { set of all strings of length 2)
L3 = L2 concat L1 = {000, 001, 010, 011, 100, 101, 110, 111 } = length 3 strings
…
…
L* = L0 U L1 U L2 U L3…. Where U is set union.

So L* consists of all strings of all finite lengths built from the given alphabet.

Sample Languages consisting of strings over the alphabet { 0,1}

Li = { w | w contains equal #s of 1’s and 0’s }
Lj = { w | w begins and ends with 0 }
Lk = { w | w reads exactly the same forwards and backwards } (palindromes)

Membership in a language

Although it may seem odd at first, much of the theory of computation comes down to the seemingly simple problem of attempting to identify a machine/program that can determine if a given string w is a member of a given language L. That is, we are simply trying to find the answer to the question “Is the string w a member of the language L?”.

This sounds simple enough but looks can be deceiving! The first problem we have is if we are going to solve such questions then what model should we use to do it? Should we try to use your trust desk-top computer as our model? That would seem to make sense but if you really stop and think about it, that desktop computer is very very complex. It has a CPU chip with registers and it has RAM and video RAM and various other accelerator chips or floating point processors etc – to simply and quickly talk about the state of that machine would take a great deal of time and effort and would lend itself to mistakes.

What we’d like, and what a course in the Theory of Computation attempts to find, is a simple model for computation. That is, a very basic machine that is easy to define but which has the same ability to solve problems as that desktop computer or any super-computer for that matter, but without all the ugly details that clutter things up.

Our first attempt at defining such a machine is known as a Deterministic Finite Automata or DFA for short.

DFA

DFAs are very basic machines. Every DFA is designed to carry out a very simple operation. It is given a string w as input and its job is to either accept or reject the string. If it accepts the string, then we say that w is in the language accepted by the DFA. If w is rejected, then we say that w is not a member of the language accepted by the DFA. We build DFAs to accept specific languages of interest to us. For example the language

L = { w | w is string of 0’s and 1’s that begins with 01 }

Obviously, L is an infinite language. The DFA below accepts the language L. To understand how, we need to look a bit deeper into the operation of the machine. First note that every DFA has a single input tape where the string in question is placed. The tape is divided up into consecutive cells each of which can contain a single symbol of the string.

DFAs also have a set of states which are represented as nodes/circles in the drawing. The machine begins its execution located in one special, specific state known as the start state. The start state is often represented by the label q0. This node will also be unique and will be represented with an incoming thick arrow. The machine begins its work by reading the first symbol of the string and making a transition based upon the state it is in as well as the symbol it has read.

A transition is represented in the diagram as an arrow connecting one state/node to another. Notice that each of these transitions/arrows are labeled with a symbol from the alphabet. For example, if you look at the diagram you will see that there is a transition from state q0 to state q1 if a 0 is read from the tape. Symbols are read from the tape one at a time from left to right. Each time a symbol is read, the matching transition is taken.

This process is repeated for each symbol of the input string w until the final symbol is read and the final transition is made. At this point, the machine determines if it accepts the input string or if it rejects it. It does this by examining the final state it ends up in. You may have noticed that the state q2 looks different from the other states in that it is represented with two circles rather than one. This means that it is an accepting state. If, after reading the input string w, the machine ends up in an accepting state then it accepts the string w as part of its language. If, however, it does not end in an accepting state, then the machine rejects the string w as not being part of its language.

To better understand how the given DFA works, Let us follow through its execution on the string w = 010001110 . As stated, our machine will begin operation in the start-state q0. It begins by reading the leftmost symbol of the string w which is a 0. The machine must now make a transition. Note the arrow/transition that leaves q0 and goes to q1 which has the label 0 on it. This is the correct transition and so now the machine is in state q1.
At this point, the machine will read the next symbol from the input which is a 1. It is now in state q1 having just read an input symbol of 1. If we examine the diagram once more we will se that we should make a transition into state q2, which is exactly what happens.

If we look at the diagram closely we will see that once we get into state q2 all of our transitions, whether they are 0’s or 1’s will leave us in state q2. This means that as the machine reads the rest of the string w it continues to transition back to q2. thus we read the remaining portion of the string, one symbol at a time 0001110 and upon each single transition we end up back in q2. Thus, when the machine has read the final symbol of w it is in q2 which is an accepting state and so it is a member of the language of the machine.

Now consider the string w = 0011. We would again start in q0 reading the first symbol of w. In q0 after reading a 0 we would transition to q1. Once in q1 we would read the next symbol which is also a 0. Now we would make a transition into state q3. Once in q3 we would read the next symbol which is a 1. The diagram indicates that we should transition into q3 again and we do. Next we read the final symbol of the input which is again a 1 and we transition into state q3 again. Since the final symbol has been read the machine checks its final state and sees that q3 is NOT an accepting state and thus it rejects the string 0011 as not being part of its language.

If you look carefully at the diagram of the machine you should easily notice that the only way we can ever start in q0 and end up in q2 (the accepting state) is for the string w to begin with the sequence 01 exactly. Any other variation will result in our ending up in a non-accepting state q3. Thus our machine accepts the language described earlier of all strings of 0’s and 1’s that begin with the sequence 01.

Formal DFA Definition

More formally, a DFA is defined as a 5-tuple. That is, an ordered collection of 5 things. We define a DFA M as:

M = ( Q,∑, , q0, F) where,

Q = finite set of states in the machine (in our example {q0, q1, q2, q3} )
∑ = the input alphabet ( in our example {0,1} )
 = the transition function which takes the form (qi, x) = qk. That is, it is a function that takes a current state that the machine is in qi along with the current symbol on the input string, x, and maps it to a new state qk to which the transition is made. See figure above.
q0 = the unique start state of the machine. NOTE – it must be unique.
F = the set of accepting states. Note F⊆Q obviously. In our example, {q2}

DFA Examples

Design a DFA for the given languages.

1) L = { w | w consists of 0’s and 1’s and must contain the substring 010 }

The first thing many people do when trying to design a DFA for a given language is to identify some of the words that are members of the language. Below are some of the words in L with the pertinent 010 pattern identified via underlining.

010 (the shortest word in L), 11111100101 , 00011010 , 1101011

The next step in designing the machine is to think about how it will work. Clearly, this machine will need to pay attention to any 0 it sees as it reads the input. A 0 followed by a 10 sequence will indicate that the string is in the language. However, what happens if we have a sequence such as 0010? The first 0 is a “red-herring” of sorts – that is, it is not part of the accepting 010 pattern and needs to be ignored. The machine below accepts the language L described above.

Notice a few things about this DFA. First, note that although we have stuck with the unique start state being labeled q0, the final state of this machine is now q3. Also note that transition from state q2 to state q0 on input symbol 1. This is a key transition to the correctness of this machine. IT helps if we look at the “purpose” of each state in our machine construction.

q0 - This start state effectively sits and waits for a 0 to be read from the input. If it reads a 1 it simply ignores it. Upon reading a 0 it transitions the machine to state q1.

q1 – We reach this state knowing that our previous symbol was a 0. Now if we see a 1 we can transition to state q2 in hopes of seeing the final 0 of the 010 sequence that is required of any string we accept. If, however, we read a 0 this means that the previous 2 symbols are 00 and thus we may just now have read the beginning 0 of the 010 sequence we want and so we stay put.

q2 – We reach this state knowing that our previous 2 symbols must have been 01. This means that upon reading a 0 we can transition to the final state q3 and accept the given string (once the remaining characters have been read of course). However, if we read a 1 this means that the sequence 11 has been read and now we must wiat for another 0 to occur in order for the needed 010 sequence to be read and so we return to the start state and begin again.

q3 – This is the final state of the machine. If we reach this state we know the string already contains the necessary 010 subsequence and so we can simply effectively ignore any 0’s or 1’s we read from this point on as we are assured that the string will be accepted.

You should walk through the execution of the machine on the following inputs:
1111 ( this should reject and remain in state q0)
1010 ( this should accept on the final symbol)
10000001 (this should reject and finish in state q2)
1011011 (reject and end in state q0)
1000 (reject and end in state q1)
1010111111 (accept in state q3)

2)L = { w | w is a string of 0’s and 1’s and DOES NOT contain the substring 010 }

This language is obviously different from the previous one. In fact, it is clearly the complement of the previous language. That is, any string accepted by the previous DFA should be rejected by any DFA accepting this language and any string rejected by the previous machine should be accepted by the DFA for this language.

At first glance this may seem to be a difficult thing to do but if we think carefully about the problem we will see that we effectively already have solved the problem! That is, we have a machine the rejects words in this language and accepts words that are not in it – how can we alter the machine to make it do the opposite?

Well, simply put, all we need to do is to change all of our accepting states to non-accepting states and all of the non-accepting states to accepting states! Clearly, anything that was accepted in the previous machine will now be rejected and vice versa. The machine is pictured below. Try the previous strings on this machine to confirm that those that had been rejected are now accepted and those that were accepted are now rejected.

Note the following about the machine shown above. First, it accepts the empty string ! Does this make sense? Yes it does – obviously, the empty string contains no symbols and thus trivially does not contain the substring 010.

Also note that the string 010 is rejected as it leads to the now non-final state q3.

3)L = { w | w contains an odd number of 0’s and an even # of 1’s }

This is an interesting language. If we don’t think closely about the language we might assume that we must count the number of 0’s in our string as well as the number of 1’s. But how can we do that?

To see the problem, let’s take a quick look at the following language LX = { w | w has one more 0 than it does the number of 1’s } – for example if w has 23 1’s then it must also contain 24 0’s. If it contains 88 1’s then it must have 89 0’s and so on. BUT note that the order of the 0’s and 1’s doesn’t matter! They can be spread about any way we choose.

How can we build a DFA for that language? Note that there is no limit to length of a string in LX. It is conceivable that we have a string with one-million 1’s and one-million and one 0’s. Although any particular string is finite in length (for example the string just described is of length 2-million and one) we do not bound the length that any particular string could be. However, a DFA is known as a finite-state-machine. That is, the machine must have a fixed number of states – the states of the machine cannot grow or shrink dynamically. DFA’s are limited in that they cannot count in an unbounded way. To understand this, look at the previous machine we designed. When the DFA is in state q1 does it have any way of knowing how it got there? That is, does it have any memory of whether or not it was there before? Perhaps it has been looping on 0’s. DFA’s do not have that sort of memory capability and thus our language LX can never be recognized by any DFA.

Now let us return to language example #3. After looking at LX we might decide that this language too is impossible to build a DFA to recognize. But, it turns out we can using the observation that we need not keep track of the actual numbers of 0’s and/or 1’s we have read but rather we need only keep track of whether or not we have previously seen an odd or even number of 0’s and an odd or even number of 1’s. If we stop and think about it, at any point there are only 4 possible combinations! That is quite finite and we can keep track of that.

For example, if at some point we have read the substring 000100 then we can know that so far we have seen an odd number of 0’s and an odd number of 1’s. We don’t need to know exactly how many just whether or not we have seen an even or odd number of 0’s and/or 1’s. The only 4 possible combinations are

Odd 0’s Odd 1’s
Odd 0’s Even 1’s
Even 0’s Odd 1’s
Even 0’s Even 1’s

So at any point we can be in one of only 4 possible states! We can build a DFA with 4 states, each representing one of the 4 possible combinations and transition between them appropriately. For example, if we were in the “Even 0 Even 1” state, and read a 1 then what state do we transition to? The “Even 0 Odd 1” state. The machine below recognizes the given language.

Regular Languages

We have seen that any number of languages are accepted by DFAs but obviously there are others for which we cannot build a machine such as LX we looked at earlier. There are many other similar languages. For example consider Leq = { w | w contains and equal number of 0’s and 1’s}. This language too would require counting. This being said, there are still a great number of useful languages for which we can build DFAs and which correspond to useful tools used in computing.

Any language which can be accepted by some DFA is called a Regular Language.

Sample Applications

Regular languages are surprisingly common in computer programming. Two quick examples are searching text and virus detection. Many text editors and word procesors and other similar tools allow you to search for key words or phrases. This is done by simply building a DFA for the search phrase and reading the text character by character until a final state is reached and thus a match is found. Below is a simple DFA searching for the phrase “Hello”.

In the case of computer viruses, they are often found attached to other applications or files. They are usually in an executable, binary form (thus an alphabet of 0’s and 1’s!!!!) and can be identified by a unique sequence of bits that are unlikely to have occurred by chance. The virus-detection software simply scans binary files looking for this “fingerprint” sequence of consecutive 0’s and 1’s and flags it as infected if it is found. It can use a form of DFA to carry out this action.

Many other examples exist in computer science for the use and importance of regular languages. Components in all modern programming language compilers use regular languages as do many design techniques and tools for user-interface design.

100% Plagiarism Free & Custom Written,
Tailored to your instructions

International House, 12 Constance Street, London, United Kingdom,
E16 2DQ

UK Registered Company # 11483120

100% Pass Guarantee

STILL NOT CONVINCED?

View our samples written by our professional writers to let you comprehend how your work is going to look like. We have categorised this into 3 categories with a few different subject domains

View Our Samples

We offer a £ 2999

If your assignment is plagiarised, we will give you £ 2999 in compensation