Tuesday, August 12, 2008

Grammars



The field of software in the Computer Science is growing rapidly fast due to the development of well-structured high-level languages. Grammars impart a structure to a program in a high-level language that is useful for translation into a low-level language (machine language). The study of grammar constitutes an important area of computer science called Formal languages. This fascinating area emerged in the middle of 1950, due to the effort of Noam Chomsky, who gave an interesting mathematical model of a grammar in connection with the study of natural languages.

A grammar is defined by a 4-tuple G = (VN , VT , S , f ), where VT and VN are sets of terminal and non-terminal symbols respectively, S a distinguished elements of VN, is called the starting symbol, f is a finite subset of the relation from

(VT È VN)*VN(VT È VN)* to (VT È VN)*.

The elements of f are called production or a rewriting rule.

Example 1:


G1 = ({E, T, F} , {a, +, *, (, )} , E, f ) , where f contains the productions

E ® E + T, E ® T, T ® T * F, F ® (E), F® a,
This grammar is used for generating a certain set of arithmetic expression.

Example 2:

G2 = ({S, B, C} , {a, b, c} , S , f ), where f consists of the productions.
S ® aSBC, S ® aBC, CB ® BC, aB ® ab, bB ® bb, bC ® bc, cC ® cc.

Example 3:


G = ({S, A, B, C} , {a, b}, S , f ), where f consist of the set of productions,

S ® aS, S ® aB, B ® bC, C ® aC, C ® a.

Let G = (VN, VT, S, f ) be a grammar. The string y produces s (s reduces to y or s is the derivation of y ), written as
ys if there are strings f 0, f 1, … ,f n ( n > 0) such that y = f 0 Þ f 1 , f 1 Þ f 2, … , f n-1 Þ f n and f n Þ s .

For example the string a2 b2c2 can be derived from the grammar given in Example 2.

For, S Þ aSBC Þ aaBCBC Þ aabCBC Þ aabBCC Þ aabbCC Þ aabbcC Þ aabbcc = a2b2c2.

Remark:

Observe that as long as we have non-terminal character in the string, we can produce a new string from it. On the other hand, if a string contains only terminal symbols, then the derivation is complete and we cannot produce any further string from it.

A sentential form is any derivative of the unique non-terminal symbol S. The language L generated by a grammar G is the set of all sentential forms whose symbols are terminal. More precisely,

L(G) = {s / S s and s Î }.

That is, the language is a subset of the set of all terminal strings over VT.

For example in the grammar G4= ({S,C}, {a, b}, S, f ), where f is the set of productions

S ® aCa , C ® aCa , C ® b, has the sentential forms of form aabaa, aaabaaa, etc. In general L(G4) consists of the sentential forms of the form anban, n ³ 1. Since every sentential form has to start with the production rule S ® aCa, and the non-terminal symbol C can always be replaced either by the production C ® aCa or by C ® b. As long as we use the rule
C ® aCa the length of the sentential form will grow and it will have only one non-terminal symbol C and other terminal symbols are the symbol "a’s" appearing equally on either side of C. If we use the production C ® b, then the sentential form will consist of only terminal symbols, hence the derivation is complete. Thus all the sentential form are of the form anban.

Remark:

It is clear that the language L generated by any grammar G = (VN, VT, S, f ), will be a subset of the free semigroup on the alphabet VT.

There are four different types of grammars depending on the nature of its production rules.

A grammar is called context-sensitive, if all of its productions are of the form a ® b with | a | £ | b |.

In the context sensitive grammar, a and b in the production a ® b can be expressed as a = f 1Af 2 and b = f1 y f 2
(f 1 and / or f 2 are possibly empty) where y must be non-empty.

Thus, in the production a ® b , A is replaced by y in the context of f 1 and f 2 . That is the reason we call this grammar as context sensitive.

Language generated by context-sensitive grammar is called context sensitive language.

Example:

The grammar given in Example 2 is context sensitive.

Next important and most useful grammar in many high-level languages is context free grammar. A grammar is called context-free grammar, if all its production are of the form a ® b with |a | £ | b | and a Î VN.

In the context free grammar, the rewriting variable in a sentential form is rewritten regardless of other symbols in its context. That is the reason we call this grammar as context-free grammar.

Language generated by a context free grammar is called a context-free language
.

The grammar given in the Example 1 is context-free grammar.

The other simple yet powerful type of grammar is regular grammar.

A grammar is said to be regular grammar if all of its production rules are of the form

a
® b with | a | | b |, a Î VN and b has the form aB or a, where aÎ VT and BÎ VN .

The grammar given in Example 3 is a regular grammar.

A language generated by a regular grammar is called regular language.

A grammar is sometimes called unrestricted grammar and language generated by a unrestricted grammar is called unrestricted language.
Observe that every grammar is an unrestricted grammar.

Observe from the Examples 1, 2 and 3 that every regular grammar is a context-free grammar, every context-free grammar is a context-sensitive grammar but the converse is not necessarily true.

Let T0 , T1, T2 and T3 denote unrestricted, context-sensitive, context-free and regular grammar respectively. Let L(Ti) denote the class of all languages generated by the grammar Ti , 0 £ i £ 3. Thus it follows from the above discussion that L(T3) Ì L(T2) Ì L(T1) Ì L(T0).

No comments: