An Introduction to Lexicography

DICTIONARY MAKING PHASE I - PREPARATION

4.0 Dictionary making, General nature: The work on the compilation of a dictionary, from the beginning to the final printing, may be divided into the following three phases, each phase having different steps:
(1) Preparation,
(2) Editing,
(3) Preparation of the Press copy.

(1) Preparation: this phase includes the planning of the dictionary, the collection of the material and the selection of entries of the dictionary.

(2) Editing: this phase involves the setting of entries. The work includes fixing the head word, its pronunciation and its grammatical characteristics, and selecting and fixing the definitions etc. of the head word.

(3) Preparation of the press copy: this phase involves the arrangement of entries, the use of notations and the preparation of an introduction to the dictionary, which covers the general features of the dictionary, a guide to pronunciation etc.

But these phases are not strict divisions of work, and they are not exclusive of each other. As a matter of fact, the lexicographer faces problems relating to all the phases, except the final one, at every phase. There is a lot of repetition. Let us take the second step of the first phase, viz. the collection of data. This is the beginning of the work, and logically, once it is over, the collection should stop and work on the other steps and phases should begin. But this cannot happen in actual practice. During the course of the preparation of the dictionary, which takes years, many new texts may appear in the language and many new words may be added to its lexical stock. Many lexical units may acquire new shades of meaning. It may also happen that, while scrutinizing the data, some information hitherto not available is found. Some of the lexical units which occurred only as occasional and ephemeral at the beginning of the work might have stabilized in the meantime. Some of the meanings which appeared as mere nuances might have become quite regular and systematized in the language. In order to keep the dictionary up to date, all these facts must be taken note of. Thus the collection of data does not stop at the first phase.

The setting of entries is closely linked with the selection of entries. While writing the entries the different types of lexical units to be included are scrutinized on the basis of their form and meaning and then a final selection is made.

The notations, although they come in the third phase, are required in the second phase also, because while editing the entries the lexicographer has to use them to separate meanings and submeanings. As a matter of fact, the use of notations and the format should be decided tentatively at the planning stage itself, because they provide guidelines for the other stages also.


4.1 Planning: Dictionary making is a long, complex and time-consuming activity. The preparation of dictionaries takes several years. The following table shows how lengthy the process of dictionary making is:

Name of the dictionary          Year of beginning    Year of completion

Oxford English Dictionary       1888                 1928
Tamil Lexicon                   1913                 1938
Malayalam Lexicon               1953                 4 volumes have appeared so far
Sanskrit Dictionary (Poona)     1952                 2 volumes have appeared

The re-edition of Webster's III took 757 editorial years and cost 3.5 million dollars.

As the work involved is stupendous, detailed planning is necessary before the work begins. Some of the basic issues crucial for planning the work on a dictionary are discussed below:

The first point to be considered is the type of the dictionary. The work on a dictionary differs according to its type. The list of words in a reference dictionary is different from the one in a learner's dictionary. A dialect dictionary contains a different type of word list from an academic or normative dictionary. The word list in a special dictionary is governed by the special purpose or restrictedness of the dictionary.

The word list in a concise dictionary is much smaller than the word list of an unabridged dictionary.

Next, the lexicographer should decide about the language of the dictionary. As a matter of fact, the method of collection differs from language to language. For this, the variations in the language are to be considered. If there are many dialectal variations, the dictionary maker has to decide whether all the dialectal forms are to be included in the dictionary or only a few of them. For example, for a dictionary of Hindi, it has to be decided whether the lexical units from Braj, Awadhi etc. are to be entered in the dictionary or not.

Another point to be considered is whether the dictionary is to be based on purely contemporary material of the language or is to incorporate earlier literature also. Idioms and proverbs represent an earlier and older stage of the language. Ordinary speakers quote from the old texts. So would it be desirable for a dictionary of Hindi to include lexical units from earlier writers like Kabir, Tulasi, Bihari etc.? Although these units do not form part of the lexical stock of the contemporary language, they are at times needed by some general readers, especially by students, for comprehending texts in the language.

The lexicographer has to consider whether the language has a diglossic situation, as in Tamil and Bengali. If there is diglossia, which variety is to be included in the dictionary, or is the lexicographer going to include both varieties?

The social and stylistic variations of the language are also to be considered by the lexicographer: does the dictionary aim at presenting all the professional registers, all the slang, jargonisms, vulgarisms etc.? The inclusion of all this is very difficult, if not impossible. So the lexicographer has to decide how much of it is to be given in the dictionary.

These decisions should be taken before starting the actual work on the dictionary and be strictly adhered to. There is practically no scope for making any large-scale changes in the basic format of the dictionary at a later stage, when the work has made some progress. Suppose it is decided at the beginning not to include slang and vulgarisms, but later on the decision is revised: the lexicographer would have to go back and start the work almost afresh.

All these decisions must be recorded so that when new hands join the project there is no difficulty in following the line of work. The instructions must be complete with minutest details. In order to do this it would be useful if a blue print is prepared for the project. This blue print may contain description and instructions regarding the following:

Collection of material (the sources may be mentioned), preparation and filling up of cards (with sample cards), the compilation of word list, the structure of the dictionary entry, description and definition of meaning (their order etc.), labels, phraseology, illustrations, grammatical characteristics of words, script and pronunciation etc. For all this, the actual examples could be given. This blue print or project might also contain a few sample entries for the dictionary. The entries may be subject to certain modifications, but they will provide basic guidelines for those working in the project. Again, these entries should be as varied as possible so that different types of lexical items are given in them.

Besides these details, the project or blue print should contain the scope of the dictionary, its purpose and the readership, the range of coverage, etc. The preparation of such blue print will not only help as a guide book for the compilers but can also be used to prepare the introduction of the dictionary.


4.2 Collection of Material: the collection of data differs for different types of dictionaries. For languages which have written literature, the material is collected from written texts. For unwritten languages the word list is to be collected by the field method from the spoken form.

Collection of data for languages with written literature: the work of collecting data for such languages has two basic components which should be kept in view by the lexicographer:

(1) The source from which the material is collected.
(2) The method of collection.

The nature of the source material differs for different types of dictionaries.


4.2.1 Historical dictionaries: For a historical dictionary the collection is done from the available representative texts of the language, from the earliest period of their availability to the present time. The lexicographer should examine the data for a historical dictionary keeping in view 'the evidential value of data' from the following (Kelkar 1973):

(1) ancestral language, (2) cognate languages, (3) descendant languages, (4) donor or recipient languages, (5) substratum and superstratum languages.

The sources for a dictionary of frequency counts may be determined by any criterion. There may be frequency counts of journals and newspapers only, or of general literature, or of general scientific texts.

The source material for a learner's dictionary may be based on frequency dictionaries. Words may also be collected from contemporary literature and from available dictionaries of basic words.

The material for children's dictionaries is collected from textbooks. The writings, answer scripts, notebooks, compositions etc. of the students should be studied, and the words used by children can be tabulated with their frequencies of use. These frequencies can be used in addition to the general basic vocabularies.
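The tabulation of words with their frequencies can be sketched in a few lines of Python. This is only a minimal illustration, not a method described in the source: the function name tabulate_frequencies, the two sample sentences and the simple rule for splitting text into words (lower-case Roman letters and apostrophes only) are all assumptions made for the example.

```python
from collections import Counter
import re


def tabulate_frequencies(texts):
    """Count how often each word form occurs across a list of text samples."""
    counts = Counter()
    for text in texts:
        # Assumption: words are runs of Roman letters or apostrophes.
        # Other scripts would need a different splitting rule.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(words)
    return counts


# Hypothetical samples standing in for pupils' compositions.
samples = [
    "The cat sat on the mat.",
    "The dog sat on the cat.",
]

frequencies = tabulate_frequencies(samples)
for word, frequency in frequencies.most_common(3):
    print(word, frequency)
```

A table of this kind only ranks word forms; deciding which of them belong in a children's dictionary remains the lexicographer's judgement.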


4.2.2 For dictionaries of written languages: The source material for a normative dictionary may be different from that of a reference dictionary. For a bilingual dictionary, generally an existing monolingual dictionary is taken as the source material.

For a normative dictionary the material may be extracted from the following sources:

This material is used to give the actual context, with a reference to the place of occurrence, so as to authenticate the meaning and usage of the lexical unit. It includes creative works of literary writers, texts on technical and scientific subjects and also works in other branches of human knowledge like history, philosophy, logic etc. The reference sources consist of texts of a different nature, like rule books, orders, notices, manuals etc. These help in finding more varied usages of lexical units.

Besides these, articles, sketches, and other types of texts from journals and newspapers can be used as source material. They provide specimens of the contemporary language. They are good sources for extracting words and phrases newly introduced in the language or words or phrases used in new senses. Special terms introduced in the language are sometimes found in such sources. The language of mass media like Radio, Television etc. may also be utilized for collection of the material for a normative dictionary.

For a reference dictionary the sources are a little more varied. Since the focus of this dictionary is not only the standard language but also its regional and social variations, such dictionaries may also use oral literature, which has a very marginal role in a normative dictionary. The extraction is done from different types of oral literature to add variety to the lexical units used in a dictionary. So different types of discourse, e.g. narrations, eyewitness accounts, conversations, arguments, dialogues etc., can be used as source material for these dictionaries.

Besides the above source materials, some dictionaries, for example the Malayalam Lexicon, also utilize other recorded materials like inscriptions and manuscripts for the collection of material. Collection of data from such sources involves problems of textual criticism and of the decipherment of older scripts and inscriptions. Abstraction of lexical units poses the problem of segmentation from the recorded continuum of graphemes.

If there are some dictionaries in the language, these could also be used for the collection of lexical items. In many cases the dictionaries can provide some additional senses of lexical items which are not otherwise available in the corpus of the dictionary. But here the lexicographer should be very careful lest he give in his dictionary many words and meanings which are not used in the language. For example, ta has 15 meanings in a Hindi dictionary and go has 18. Again, many lexical units in a particular dictionary might have gone out of use. A lexicographer should be careful in examining such cases.

But the collection of data from all the above sources may not be enough for a dictionary. For bigger dictionaries there are usually advisory boards consisting of experts in different branches of human knowledge. These experts not only provide terms special to their discipline but also help at a later stage in defining these terms. Even common people could be associated in providing material for dictionaries. An appeal by the Fowler brothers for the COD received quite a commendable response. Many of the illustrative quotations in the OED, numbering nearly two million, were supplied by an army of more than thirteen hundred contributors (Whitaker 1966, 31).

A reference dictionary which aims at presenting regional or other variations should include on its staff or advisory board persons who could provide material for all such variations.

The lexicographer, if he is a native speaker of the language, could himself provide a lot of information. He may construct his own examples in order to disambiguate polysemous lexical items.

Another type of text which could be extensively used by all dictionaries, especially the bilingual ones, is translations from different languages. These translations provide new technical terms and other types of words related to the life and culture of the people of the source language.

The collection of data for a dictionary is done by the method of extraction. A single lexical unit is extracted on one card with its full context, which should be adequate to express the meaning of the lexical unit clearly and unambiguously. The two basic qualities of a good context are that it should be short and that it should be clear and unambiguous. The shortness of the context is conditioned by the practical problem of space in a dictionary. A concise dictionary can ill afford to provide space for very lengthy contexts. But the shortness of the context should not be achieved at the cost of clarity.

As for the cards, they are prepared for this purpose keeping in view the volume of the information to be given with each lexical unit. Space is marked, sometimes printed also, for each type of information. A typical card for a Kannada-Kannada-English Dictionary is given below:
___________________________________________________________
Spelling in Kannada Script               Meaning in English
______________________                ____________________________

______________________                _____________________________
Pronunciation in IPA or                     Meaning in Kannada
Roman Transliteration  
______________________                _____________________________

______________________                ____________________________

Grammatical Category

Reference……………………………………………….

This is a sample card. But the space on the card and the different information on it vary from dictionary to dictionary. Some dictionaries may provide space for etymology, synonyms and antonyms also. Sometimes a dictionary project may use cards of different colours for recording different types of information. The Etymological Dictionary of Telugu uses cards of different colours for recording words from different sources, e.g. Sanskrit, Dravidian, Desi etc.

It will be useful if the cards are numbered. This will help in knowing the number of extractions made.
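If the cards are kept in electronic form, a numbered card can be sketched as a small record type. The following is a minimal sketch under stated assumptions, not the format of any actual project: the class name Card, its field names and the reference strings are hypothetical, and the fields merely follow the sample card above. The two example citations are the Hindi sentences for ghar and aam discussed later in this section.

```python
from dataclasses import dataclass, field
from itertools import count

# Running counter, so that every new card receives the next serial number.
_serial_numbers = count(1)


@dataclass
class Card:
    """One extraction card; the field names are illustrative assumptions."""
    headword: str                    # spelling in script or transliteration
    context: str                     # citation in which the unit occurs
    meaning_english: str
    reference: str                   # place of occurrence of the citation
    pronunciation: str = ""          # IPA or Roman transliteration
    grammatical_category: str = ""
    meaning_in_language: str = ""    # e.g. the Kannada meaning on the sample card
    number: int = field(default_factory=lambda: next(_serial_numbers))


first = Card(
    headword="ghar",
    context="vah ghar meN rehtaa hE.",
    meaning_english="house",
    reference="(hypothetical source, page 1)",
    grammatical_category="noun",
)
second = Card(
    headword="aam",
    context="Aam cunaav meN kaangres kii jiit ho gaii",
    meaning_english="general",
    reference="(hypothetical source, page 2)",
    grammatical_category="adjective",
)

print(first.number, second.number)
print("extractions made so far:", second.number)
```

Because the serial number is assigned automatically, the number on the latest card gives the count of extractions made so far.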

The collection of data on cards takes several years. The information collected is very useful from different points of view, and its utility is not lost as soon as the work on the dictionary is over. The cards should be preserved not only till the final printing is over, as some of them may have to be referred to even at the stage of printing, but for further work also. Such collections of cards, called lexicographical archives or scriptoria, are valuable assets of any language. Bigger dictionaries have a very large number of cards. The Malayalam Lexicon has twenty-eight lakh slips. Nancy (France) has four hundred million slips. The scriptorium of the Sanskrit Dictionary (Poona) is not merely the source of the dictionary; it is a storehouse of information on Indian life and a repository of various branches of knowledge. Dictionaries of different types, e.g. dictionaries of phrases and idioms, collegiate dictionaries, and dictionaries of synonyms and of antonyms, can be prepared from these cards. Besides providing material for the preparation of dictionaries, the cards can also be used as sources of much cultural information.

The context of the lexical units, called the lexicographical context, may vary according to the nature of the lexical unit and its usage. Sometimes even very short contexts may be adequate to give the meaning of the lexical unit. But sometimes the context may have to be a full stanza. From this point of view the contexts may be of the following types:

(1) It may be a word or a single lexical unit, e.g. Hindi acchaa 'so', 'yes'; Sanskrit gaccha 'go' (imperative second person singular).

(2) It may be a phrase or a sentence, e.g. Hindi Isliye log usase ghr?n?aa karne lage. 'So people started hating him'; English a new hired hand, the head of the firm etc.

(3) It may be a full stanza or even a collection of sentences e.g. Sanskrit akr?ta. Adj. 1A. VIII. 'not done or prepared (specifically somebody)'. akr?ta kaaritaam bhiks?aam manasaanaanumoditam gr?hyataam vidhinaa yuktaam tapah? pus?yati yoginaam. Padm. P. (Ra.) 92.48.

aks?atayoni. adj. '(a woman) who is not deflowered' saa cedaks?atayonih? syaad gataa pratyaagataapi vaa paunarbhaveina bhartraa saa punah sam?skaaramarhati. Maha. XIII. 314. 3

Hindi yakiin karnaa : 'to be convinced'

muniim ne niicaa sir kiye hue kahaa, paaNc ser duudh aur paav bhar jalebii kii rasiid. samiti vale yakiin nahi"i"N kareNge ki vakiil pakkaa savaa paaNc serd?aal gayc sak kareNge ki muniim pacaa gayaa (DabepaaNv. 5) (quoted from Bahl 1974, 117)

Sometimes, it may take a full paragraph to give a full context especially when the lexical unit has some cultural significance.

As we can see, the extraction is done for full collocations which give clear and unambiguous meanings. A word is extracted in all its possible contexts. The occurrence of a word in different contexts in the same sense should not deter the lexicographer from collecting more extracts for the lexical unit. It is likely that a new meaning will be available in a further extraction. Again, even if many cards have the same contexts, i.e. give the same meaning, these should be preserved. It is quite possible that a particular context may at the last moment bring out the meaning more clearly.

The lexicographer usually makes two inferences on the basis of the lexicographic cards.

(1) An inference is made regarding the contextual sense of the word. A meaning is tentatively fixed for the word from the first extract. It is later verified against the meanings derived from all the other possible contexts. For example, from the sentence

vah ghar meN rehtaa hE. 'He lives in the house', a tentative meaning 'house' for ghar is arrived at. Then the following contexts are examined:

(1) vah bar?e ghar kii bet?ii hE. 'She is a daughter (or girl) of high family'.

(2) Mere kurte meN bat?an ke liye ghar banaanaa hE. 'A hole is to be made for buttons in my kurta'.

(3) Is makaan meN caar bar?e aur do chot?e ghar hEN. 'There are four big and two small rooms in this house'.

From sentence (1) the meaning 'family' is determined, from sentence (2) the meaning 'hole' and from (3) 'room'. The lexicographer puts all these meanings for ghar in his dictionary. As the meanings are related they are treated as the multiple meanings of the same word. We may take another example. From the following contexts the lexicographer finds out the different meanings of chair:

(1) he sat on a chair,
(2) the chair of philosophy,
(3) he will chair the meeting,
(4) he was condemned to the chair.

as

(1) separate movable seat for one person,
(2) position of Professor,
(3) to preside,
(4) electric chair for death.

But let us compare the following contexts in which the word aam occurs:

Aam cunaav meN kaangres kii jiit ho gaii. 'Congress won in the general elections'.

Banaaras kaa aam bahut miit?haa hotaa hE. 'The mango from Banaras is very sweet'.

Here the meanings (1) 'general' and (2) 'mango' are not related. Wherever the meanings do not appear to be related the lexicographer treats them as separate words.
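The treatment of related and unrelated meanings can be illustrated with a small Python sketch. It assumes that the lexicographer has already judged which meanings are related and has recorded that judgement as a group label on each extract; the labels "A" and "B", the function name build_entries and the numbering of separate words as "aam 1" and "aam 2" are conventions invented for this example.

```python
from collections import defaultdict

# Each tuple: (word form, gloss, sense group).
# The sense group records the lexicographer's own judgement about which
# meanings are related; the labels "A" and "B" are arbitrary.
extracts = [
    ("ghar", "house", "A"),
    ("ghar", "family", "A"),
    ("ghar", "hole", "A"),
    ("ghar", "room", "A"),
    ("aam", "general", "A"),
    ("aam", "mango", "B"),
]


def build_entries(extracts):
    """Group glosses into entries: one entry per (form, sense group)."""
    grouped = defaultdict(list)
    for form, gloss, group in extracts:
        if gloss not in grouped[(form, group)]:
            grouped[(form, group)].append(gloss)

    # Collect the sense groups found for each form, in order of appearance.
    groups_per_form = defaultdict(list)
    for form, group in grouped:
        groups_per_form[form].append(group)

    # Number the entries of a form only when it has more than one group.
    entries = {}
    for form, groups in groups_per_form.items():
        for index, group in enumerate(groups, start=1):
            label = f"{form} {index}" if len(groups) > 1 else form
            entries[label] = grouped[(form, group)]
    return entries


entries = build_entries(extracts)
for headword, senses in entries.items():
    print(headword, senses)
```

In this sketch ghar comes out as a single entry with four meanings, while aam is split into two separately numbered entries, which mirrors the decision described above.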

(2) The lexicographer can make some abstractions of the canonical form which is to be set up as the head word.

While making the extractions the following points must be kept in view:

The lexicographer should be careful about depleted, incomplete and ambiguous contexts. Such contexts do not give the full and clear meaning of the lexical unit. Contexts like you cannot live on the moon are no good. Here the meaning of the phrase live on is ambiguous. It can be interpreted both as (1) 'reside' and (2) 'sustain'. Similarly, ring in He gave me a ring may mean both 'a metal ring' and 'a telephone call'. In Hindi siitaa gaanewaalii hE the meaning of gaanewaalii is not clear. It may mean both 'Sita is to sing' and 'Sita is a singer'. Similarly, Raamko kurtaa acchaa nahu_N lagtaa has two meanings, 'does not like' and 'does not suit'. The lexicographer has to examine such contexts and collect only those which are self-sufficient to determine the meaning of the lexical unit.

For function words, attempt should be made to collect extracts to the maximum. Many of the complex and diverse uses of such lexical units may not be easily available if the extracts are few.

Closely related to this is the question of the nature of extractions. What type of extraction should be done from different sources? Although an ideal situation would be to extract data from all works, there is the practical difficulty of dealing with an enormous amount of data. So the extraction has to be selective at some stage. From this point of view, extractions can be of two types:

(1) General and (2) Special.

(1) General Extraction: when lexical units of a general nature are extracted, it is general extraction. Lexical units, although belonging to definite thematic groups, are extracted for their general meanings. For example, from stories and articles on the theme of hunting one may collect lexical units related to the field of hunting. From a general work on sky and sea, words related to sky, water, climate etc. may be extracted without any specification of the technical meaning or explanation of the word. For example, from a book on hunting, words like H. machaan 'a raised platform (for shooting wild animals)' and kheddaa 'Khedda' can be extracted. Similarly, words like water, tide, wave etc. may be extracted from a book on the sea.

(2) Special Extraction: this is done for finding out the special technical meanings of words belonging to any subject field. For example, from a book on general linguistics one may get a detailed list of linguistic terms in their special meanings. Similarly, from a book like 'The Language of Kabir', words for philosophical terms used by Kabir may be found. From a book on botany, the words for flora in their special meanings may be found. Textbooks in different subjects provide details of technical terms related to the particular branch of knowledge.

On the basis of its quantity, the extraction can be of two types:

(1) Concordance or total extraction and (2) Selected or partial extraction.

(1) Concordance or total extraction: this is done from all the general works of a language. All the words in a text are extracted in all the contexts of their occurrence. In the beginning the extraction is of the nature of a thesaurus, i.e. a collection of all the occurrences of a word with actual citations. But after some time, the extractor knows that the word is being repeated in some senses. Collection of such repeated information is discontinued at a later stage. A useful way to do this is to have concordance-type extraction for the beginning portions of every work. But after some time, if some words are found again and again without adding any new sense, the extraction is to be stopped. But before deciding to stop the collection of extracts in which the meaning of a lexical unit is repeated, the lexicographer should make a thorough comparison of contexts to find out the similarity and difference in the components of the meaning of the word in its two or more occurrences. If there is any difference, the context should be extracted, because it would give a further sense.
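For texts available in electronic form, total extraction amounts to building a concordance. The following Python sketch is an illustration only: the function name build_concordance and the rule that a word is a run of Roman letters are assumptions, and the three sentences are the contexts for chair quoted earlier in this section. It collects every citation for every word form; comparing the citations to decide whether a new sense has appeared is still left to the lexicographer.

```python
import re
from collections import defaultdict


def build_concordance(sentences):
    """Map every word form to the list of sentences in which it occurs."""
    concordance = defaultdict(list)
    for sentence in sentences:
        # A set is used so that a sentence is listed once per word form,
        # even if the form occurs twice in it.
        for word in set(re.findall(r"[a-z]+", sentence.lower())):
            concordance[word].append(sentence)
    return concordance


sentences = [
    "He sat on a chair.",
    "He will chair the meeting.",
    "He was condemned to the chair.",
]

concordance = build_concordance(sentences)
for citation in concordance["chair"]:
    print(citation)
```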

(2) Selected or partial Extraction: selected extraction is done for collecting such lexical units as have not been covered by general extraction. This situation arises when special types of lexical units are found in a text dealing with the life of people of some profession or social group. For example, when one reads Amrut Santan, an Oriya novel by Gopinath Mahanti, one finds a large number of lexical items and expressions related to the tribal life of Orissa. From such a text, words relating to tribal life may be extracted. A story or novel with regional and local colour in it, e.g. the stories and novels of Phan?ishwarnath Renu in Hindi, may be selected for extracting lexical units particular to a region in Bihar in India. From Nisi Kut?umba, a Bengali novel by Manoj Basu, a large number of words related to stealing and house-breaking may be extracted.

All these extractions can go on side by side. But after the extractions have been done, it is necessary to check the source material. It may be considered essential at some stage to include more words from some other types of works.

4.2.3 Collection of data for unwritten languages: for unwritten languages the data is collected by the field method with the help of informants. The criteria for the selection of informants, their age, sex, and cultural and psychological qualities like intelligence, memory, alertness, patience, honesty, dependability, cheerfulness etc. have been discussed in works on field linguistics (Samarin 1967; Nida 1947, 138-146). Also discussed therein are the ways in which a field worker should approach and deal with the informants to elicit as much data as desirable without causing annoyance to the informant or antagonism in him. This would ensure faithful and proper elicitation of the language data.

The number of informants to be employed depends on the scope and the type of the dictionary. If all the regional varieties of the language are to be included informants should be selected from all the regions. In order to ensure optimum data it is advisable to select more than one informant for every variety. This would also help in checking and rechecking the data.

In order to elicit lexical units of as many varied types as possible, it would be advisable to select informants from all the following groups:
(a) from both sexes,
(b) from persons of all ages,
(c) from persons belonging to different economic and social groups.

Many a typical lexical item common among women may not be elicited from male informants, and it is not unlikely that, at times, only males may be able to provide certain other items. Many unwritten languages, especially a large number of tribal languages, are fast coming under the influence of their neighbouring languages. As a result, new lexical items are introduced, replacing the older stock of the language. The younger generation is adopting the new lexical items in place of the older ones, leading to a gradual loss of older vocables in the language. Only the older generation knows many of the lexical units that are at present dying out. Therefore, informants of older age would be useful for providing a larger number of lexical units of this type. Informants of the younger generation would be equally useful for providing new words introduced in the language. Representation from different social groups will ensure the inclusion of words from those groups.
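A project keeping a list of its informants could check this kind of coverage mechanically. The sketch below is a hedged illustration: the informant records, the region names, the two age groups and the function name coverage_gaps are all hypothetical, and economic and social groups are left out only to keep the example short. It reports the regions that have fewer than two informants and any sex or age group that is not represented at all.

```python
from collections import Counter

# Hypothetical informant records: (name, region, sex, age group).
informants = [
    ("Informant 1", "north", "F", "older"),
    ("Informant 2", "north", "M", "younger"),
    ("Informant 3", "south", "F", "younger"),
]


def coverage_gaps(informants, regions, minimum_per_region=2):
    """Return under-represented regions and the missing sexes and age groups."""
    per_region = Counter(region for _, region, _, _ in informants)
    sexes = {sex for _, _, sex, _ in informants}
    ages = {age for _, _, _, age in informants}
    return {
        # Regions with fewer informants than the required minimum.
        "regions": [r for r in regions if per_region[r] < minimum_per_region],
        # Sexes and age groups with no informant at all.
        "sexes": sorted({"F", "M"} - sexes),
        "age_groups": sorted({"older", "younger"} - ages),
    }


gaps = coverage_gaps(informants, regions=["north", "south"])
print(gaps)
```

Here the hypothetical southern region has only one informant, so it would be reported as needing at least one more.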

The method of collecting data for a dictionary is slightly different from that involved in writing a linguistic description of a language. As Samarin observed, "the compilation of a dictionary is a goal very much different from that of a language description, especially when the dictionary has a strong ethnographic bias" (Samarin 1967, 46). It might not take much time for an investigator to identify the phonemes and the grammatical classes of a language, and this may not require as large an amount of data as is needed for a dictionary, because "the collection of a mountain of texts, whether he can translate them or not is insufficient corpus for such a project, for it has been adequately demonstrated that long texts do not necessarily show-up new words" (Lawton 1963, 139, quoted from Samarin 1967, 46). For the preparation of a dictionary of an unwritten language the lexicographer should have a knowledge of the life and culture of the people. For eliciting words in an unwritten language, a word list, especially that of a neighbouring language, and other elicitation instruments might be utilized. But in making use of a word list the following limitations must be taken into consideration.

(a) The lists may contain a good number of lexical units which are quite unknown, and not infrequently even irrelevant, to the native language situation. Words like gulf, sex, hyena, hernia, ophthalmia, sapphire, niche, sash, ivory, asafetida and several others were not known to many informants in Jaintia, a dialect of Khasi. The general response to queries about such items is either 'I don't know' or 'there is no word like this'. In such situations, when asked repeatedly, the informant either gives generic words for specific objects or tries to coin what we may call emasculated equivalents. In Jaintia, there is only one word, khlor, for stars, planets and all heavenly bodies. When asked to give an equivalent for Jupiter, Venus or Neptune, all that the informant gives is khlor. In the same way, admiral gets its equivalent as wahE? chipaaii (lit. big soldier), gun powder as bam suloi (lit. food of the gun), picnic as bam khana (lit. eat food), breakfast as jastep (lit. morning meal), lunch as ja sngi (lit. day meal) and dinner as jammed (lit. night meal).

(b) Some of the newly coined lexical items are very artificial. Their artificiality can be tested by getting them checked with other speakers of the language. In some cases the native speaker confesses total ignorance of a lexical item, in some there is agreement with reservation and in others quite different words are cited.

(c) There is a total unawareness of local environment and objects in such lists. Malto has the following words for 'mushroom':
naqlo, dule kora, teele kuttto, tupo, taakno, jibra, kuta pura gejo, edroosdu, peetqo pot´lo, mookrooosdu and some others.

No word list of Hindi, Bengali or English is likely to contain these words. Every speech community has its own lexicon. The flora and fauna differ from place to place, and so do the customs and the rituals. The richness of flora and fauna, the varied uses of the flora and the different lexical units that signify them are difficult to find in any word list. For example, bamboo has great significance in the life of the tribals of India, especially those of the North Eastern part, as can be seen from the following lexical units in Angami:

kerie 'bamboo' (generic)
khopri 'a common type of bamboo used for the construction of houses etc.'
vu_prie 'a kind of bamboo used for making ropes, basket etc.'
vu_ni 'the biggest type of bamboo used for walling purposes'.
kuccierie 'a kind of bamboo used for making post etc.'
riinyu~/rutsu 'a kind of bamboo used for constructing granary of rice etc.'
luouu_ 'a kind of bamboo used for making flutes'.

This can be compared to different names associated with the products and uses of the item illustrated here by the following lexical units in Kokborok, a language spoken in Tripura.
wa (n) 'bamboo'
wakkitor 'crooked bamboo'
wajar 'a variety of bamboo'
wakhum 'bamboo earring'
wakolok 'a long bamboo lamp'
watuy 'a ring of bamboo'
wathuy 'a variety of bamboo'
wathop 'bamboo decoration'
wamlang 'a variety of bamboo'
wamlik 'a variety of bamboo'
wasun 'a name given to bamboo'

Elicitation of such lexical items becomes very difficult in these languages. In elicitation of data from such lists as noted earlier, many objects which are an integral part of the life and culture of a people are likely to be missed by a lexicographer.

In some cases the initial glosses may not give sufficient clue to identify the contrasting semantic features of the lexical unit. The possibility of interchange of words for animals and human beings is not ruled out, e.g. pregnant in Malto has two equivalents, qabnii and kocitaanii, the former used for animals and the latter for human beings. An initial gloss 'those' may be inadequate to bring out the contrast between the following words in Jaintia:

kitu 'those there (near)'
kitay 'those there (at a distance)'
kita 'those there (not seen)'
kitey 'those there (up there)'
kiti 'those there (down there)'

Angami has the following words related to 'wine'.

Zhu 'Angami wine made of rice, rice beer'.
Zhutlo 'Angami rice beer'.
ruohi 'a kind of Angami wine'.
Khe 'a kind of Angami wine'.
Zhuhaelu_ 'a kind of Angami wine'.

The initial gloss 'wine' may not be sufficient for eliciting all these words.

How can a lexicographer ensure maximum elicitation for a dictionary of such languages? A source book of encyclopaedic nature, a book on basketry, a book on flora and fauna and any other book with pictures may serve the purpose well. If the words in the list are grouped in grammatical classes and semantic domains the lexicographer may find it easy to elicit the lexical items he is looking for.

A list of basic words belonging to different semantic domains and grammatical classes, some of which are listed below, may be tentatively prepared for elicitation of data for unwritten languages.
(1) Nature- earth, water, sky, events, geographical and astronomical items, directions, winds, weather, seasons, etc.
(2) Mankind - sex, family, relationship, body parts, bodily functions and conditions, diseases and cures,
(3) Clothing and personal adornments,
(4) Food and drink - methods of preparation,
(5) Dwelling - part of the house, furniture etc.,
(6) Cooking utensils, tools, weapons, etc.,
(7) Flora and fauna - (including parts of animal anatomy, diseases, cures etc.)
(8) Occupations and professions - equipment, rituals and customs connected with them,
(9) Road and transport,
(10) Sense perception,
(11) Emotions, temperamental, moral and aesthetic (includes insults, curses etc.).
(12) Government, war, law,
(13) Religion,
(14) Education,
(15) Games and amusement, entertainment, music, dance, drama,
(16) Metals,
(17) Numerals and system of enumeration,
(18) Measurement of time, space, volume, weight, quantity,
(19) Function words including classifiers,
(20) Fairs, festivals, customs, beliefs etc.
(21) Verbs:
(a) Physical activity
(b) Instrument verbs
(c) Verbs of fighting
(d) Music verbs
(e) Motion verbs
(f) Occupational verbs
(g) Culinary verbs
(h) Cosmetic verbs
(i) Communicative verbs
(j) Stationary verbs
(k) Cognitive verbs
(l) Sensory verbs
(m) Emotive verbs
(n) Other verbs.

This list is by no means exhaustive6. It might be treated as a sort of reference point and the related words in the semantic domain might be elicited on the basis of this list. For example while collecting words about agriculture, words about the different agricultural products, the sowing and harvesting time, rituals and ceremonies connected with them, names of the different parts at different times of growth of these products and the verbs connected with different actions connected with them may be elicited. We may take another example. While eliciting words for teeth the informant may be asked to give words for different types of teeth, diseases of teeth and cures for them.
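The use of such a list as a reference point can be sketched as a simple elicitation worksheet. This is only an illustrative sketch: the two domains and their glosses are a hypothetical subset of the list above, the function names are invented for the example, and the Angami forms are the bamboo words cited earlier. Each gloss may receive several forms, each with a note on its specific sense, and the worksheet reports which glosses still have nothing recorded against them.

```python
# Hypothetical subset of the semantic domains listed above, with a few glosses each.
DOMAINS = {
    "Flora and fauna": ["bamboo", "mushroom"],
    "Food and drink": ["wine", "curry"],
}

def new_worksheet(domains):
    """Build {domain: {gloss: []}} so every gloss starts with no recorded forms."""
    return {domain: {gloss: [] for gloss in glosses}
            for domain, glosses in domains.items()}

def record(sheet, domain, gloss, form, note=""):
    """Store one elicited form together with a note on its specific sense."""
    sheet[domain].setdefault(gloss, []).append((form, note))

def pending(sheet):
    """List the (domain, gloss) pairs for which nothing has been elicited yet."""
    return [(domain, gloss)
            for domain, glosses in sheet.items()
            for gloss, forms in glosses.items()
            if not forms]

sheet = new_worksheet(DOMAINS)
# One gloss, several forms: the single gloss 'bamboo' does not exhaust the field.
record(sheet, "Flora and fauna", "bamboo", "kerie", "generic")
record(sheet, "Flora and fauna", "bamboo", "khopri", "used for the construction of houses")
print(pending(sheet))
```

Keeping a list of forms under each gloss, rather than a single slot, reflects the point made above that one gloss often corresponds to several lexical units in the language under study.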

It should not be understood from the foregoing statements that elicitation through word lists is a complete and perfect method. The role of collection and elucidation of the lexical units from different types of texts providing greater contextual possibilities should in no way be underestimated. The data collected by word lists might not be adequate, especially from the semantic point of view. The different meanings of a lexical unit cannot be determined and demonstrated if only isolated lexical items are collected, and the dictionary may suffer from the shortcomings pointed out by Samarin: "The chief failure of a field dictionary is that it indicates not so much the meaning of words, but the fact that they exist. They do not define, they document" (Samarin, 1967, 208).

The data for a dictionary collected from the word list as noted above should be supplemented by data from different types of discourses, some of which are listed below. (Samarin 1967, 208).

Narrations: eye witness accounts, reminiscences, instructions on how to perform certain tasks or how to get to certain destination.

Conversations: Arguments, dialogues over 'where have you been',
Songs : Lullabies, dirges, dance songs,

Folk tales: legends, how things come to be, amusing stories, Proverbs and Riddles,

Names: personal, topographic, village,
Pseudo-onomatopoetic calls of animals or birds.

The collection of data in an unwritten language has many problems some of which are given below:

One of the most vital problems of collection of lexical units is the segmentation and identification of a word from the phonetic continuum of the texts. In written languages which have some tradition of grammar, there are certain devices and fixed criteria to identify a word. A word in written languages is generally identified as a meaningful unit, a cluster of sounds or letters written between spaces, or with potential pauses. In unwritten languages there is no such device. The lexicographer has to analyse the data, make a grammatical analysis of the language and fix the word and the lexicographic unit.

The determination of the lexicographic word is a more ticklish problem for languages of the isolating-agglutinating type, e.g. Khasi, an Austric language, and many Tibeto-Burman languages in India. In these languages the whole grammatical process involves prefixation and suffixation of morphemes. Any number of words may be derived by mere juxtaposition of various morphemes to the root or the stem. The grammatical system is very simple for a lexicographer. What items should be included and in what way has always to be viewed carefully. Many lexical units formed in this way might be treated under both the main root and the prefix; the former is necessary from the semantic point of view, the latter from the point of view of alphabetical arrangement, e.g. Khasi nong thaaw aayn 'legislature' might be treated both under nong 'agentive marker' and aayn 'law'.

The problem of collection and selection of set-combinations and compounds is no less intriguing for a field lexicographer. For written languages, besides getting some clues about compounds from solid and hyphenated spelling, the lexicographer has at his disposal enough data wherein he comes across many occurrences of such units. This helps him in determining set expressions. For unwritten languages the field lexicographer has to collect different contexts with varied collocations of words to find out compounds and set expressions.

In elicitation of words from glosses there is always a possibility of not getting an appropriate word. For example with a gloss 'blow' one gets a word p?ut in Jaintia for 'blowing flute' but there are different other words denoting the meaning of 'blow', e.g.

slu 'blow (mouth)'
be? 'blow (wind)'
s?er 'blow (nose)'

Similarly the following words in Jaintia with shades of differentiation might not be elicited by the simple gloss 'break'.

pnkhan 'break (stick)'
pya? 'break (bottle)'
tkuc 'break (rope)'

A great difficulty in elicitation of data for unwritten languages is presented by anisomorphisms7. Two or more words for one object may be found in the source language for which there is only one gloss. Some examples are given in the preceding section. We may examine some more of them.

In Shina there are two words hankal and kutsur for the upper and lower parts of latch expressed by latch in English. English razor has two equivalents in Jaintia yuukhi and rati? and frying pan has kharai and talai in the same language. The words basket and spear have several equivalents in many languages.

Even when some object is in the sight of the informant, he might not identify it for the gloss and give another related word. When pointed to curry and asked to give word for it the Jaintia informant may provide mnchit 'curry juice' and not yute? the word for 'curry'.

Identification and determination of meaning in respect of flora and fauna and culture bound words is a problematic area for the lexicographer of an unwritten language8. For culture bound words mere one word gloss may not be adequate. The whole cultural information related with the word is to be provided. The dictionary of an unwritten language, especially a tribal language, is not merely a linguistic dictionary. It is more of an ethnographic dictionary with a considerable amount of encyclopaedic information in it. So for items with cultural significance data should be collected about the culture also. For Khasi s?iem the one word equivalent 'king' may not give the cultural significance. It should be accompanied with full cultural description.

For Angami keciesu a one word gloss 'a ritual' would not give the desired information. It should be accompanied by the following cultural information:

'Practice of the son of the deceased dragging a boulder in his father's remembrance if his father has died after four Sha's'.

For a good dictionary it is not only the number of words collected that matters. All the different meanings in different contexts should also be given. For this the lexicographer should record as far as possible all the linguistic and physical contexts9, and the data should be thoroughly and systematically collated. The excerpts should be compared, and similarities and dissimilarities in the usage of lexical units noted, to mark the different meanings of the word. The informant may be asked to produce examples of the collocational possibilities of a particular lexical unit. He may also be asked to provide synonyms and antonyms, which will give additional help in the determination of meaning.

Function words should be given special treatment in the collection of data. Because of their frequency of use they have diverse and varied meanings. A larger number of contexts needs to be scrutinized for getting their varied meanings and uses.


4.3. Selection of Entries: vrhaspatirindraaya divyam vars??a sahasram praatipadoktaanaam sabdaanaam sabdapaaraavan?am provaaca naantam jagaama

'vr?haspati taught Indra vocabulary by oral recitation for one thousand divine years, but it was endless'. (Mahaabhaas?ya-Paspasaahniika, p. 43)

This may sound poetical and highly improbable yet it is significant in that it points to the unlimited number of words of a language or the potentiality of the language to create new words. As the number of words is always increasing no dictionary, however voluminous it may be, can claim to include all the lexical units of a language with all their meanings, sub-meanings and collocational possibilities. A lexicographer has to make selection of entries for his dictionary. The general nature of the lexicographic word has been discussed elsewhere10.

The selection of entries is determined by various factors viz., size, type and purpose of the dictionary, the status and formal variation of words and the different local and social variations in the language. The lexicographer has to consider specially the following types of lexical units besides others.

(1) Neologism: the lexical stock of a language does not remain static. New objects and concepts are introduced in the speech community. These objects and concepts are expressed in the language by different ways:
(1) new words and expressions are coined, (2) new meanings are given to the existing words, and (3) the words are borrowed from other languages11.

The new words and expressions are coined for different themes ranging from day to day fashion to nuclear warfare. "Any word or set expression formed according to the productive structured patterns or borrowed from another language and felt by the speakers as something new is a neologism" (Arnold, 1973, 232). Some of the neologisms are occasional and ephemeral. They are born today and die tomorrow. Such neologisms arise under certain conditions, and as soon as these conditions disappear the words also lose their existence. But generally they have a longer life. From individual and occasional usages they become social and frequent. Gradually they are stabilized and become part of the language. The lexicographer's problem is whether he should enter all such lexical units in the dictionary or not.

It depends on the size and type of the dictionary. Bigger reference dictionaries may include all such neologisms. Smaller and abridged dictionaries may not have such a scope.

As a matter of fact, till the words become a part of the language their inclusion may be doubted. But when they are in a transitory stage, i.e. they are on the way to gaining their place in the language, what should be done with them? For example, after 1967 a particular phrase aayaaraam gayaaraam, used for persons who changed their political loyalties frequently, came into usage. Should a dictionary include this word? A word rephujii lataa was introduced in Tripura with the coming of refugees from erstwhile East Pakistan (now Bangladesh) after 1947. This is a creeper unknown earlier but now very popular (because of its luxuriant growth). Should such a word find place in the dictionary? Such words may be included, but suitable labels (like frequent, rare etc.) should be employed with them to mark their currency.

(2) Obsolete and Archaic words: As new words are born in a language, some words die also, although their number is smaller than the new words. Some concepts and objects become outdated. Words and expressions for them are dropped out of the language in course of time. Such words are called obsolete words. Similar to these words are another class of words, which are no longer in general use, but have not become completely obsolete. Such words are archaic or pracaaralvpta. How many of such words should find place in a general purpose dictionary? In a dictionary based on purely contemporary language there may not be scope for the inclusion of such words. But in a general purpose dictionary whose aim is also to help understanding texts of the earlier language viz., Kabir, Tulasi in Hindi, Namdev in Gujarati, Shakespeare in English etc. the dictionary should include such words, because some of the words and phrases used by the writers are commonly used by people. Many idioms and proverbs contain words which have become obsolete and archaic in the general language. Should such words be included in a dictionary or not?

(3) Technical Terms: It is a debatable point whether all the scientific and technical terms can find place in a general purpose dictionary. The influx of technical terms in a language is quite considerable. Every day either a new technical term is being coined or new meanings are attached to old words. Merriam and Company has nearly four lakh terms for Chemistry alone. How can all these terms be included in a dictionary? The Webster's III has in all only four lakh and fifty thousand words.

A problem related to technical terminology concerns the commonness and uncommonness of some terms. Some terms are very artificial and ambiguous. Should these terms be included in a dictionary and be given preference over their common counterparts? For example almost all the Indo-Aryan languages have a word bansii, 'fishing hook'. A technical term aakhet?adand?a is coined for this term. This latter term is very uncommon and also ambiguous. It may even mean a 'hunting stick' to some speakers. Sometimes more than one term is used for an object or concept. Which one should be included in a dictionary?

(4) Proper names: A lexicographer is faced with the problem of the inclusion of proper names in his dictionary. Proper names do not form part of the language system. Their inclusion in monolingual dictionaries has been questioned by many scholars. But some proper names in the course of the history of a language attain a special significance. They form an integral part of the cultural life of the people. Any information about them is a help in the interpretation of that cultural life to the outside world. Let us take the word gangaa. Ganga is a river intimately connected with the culture of India; rather, it is a very vital part of Indian culture. The name appears frequently in the literature of Indian languages. So this name should be included in any dictionary which presents information on Indian themes. The national heroes, mythological characters etc. should find a place in the dictionary. Christ, Mohammed, Mecca, Bethlehem and Vatican, although proper names, convey something more than their mere referential meaning. They should find place in a dictionary.

Many a time proper names used in a special sense develop into general words and become a part of the lexical stock of the language. We may examine the following entries in a Hindi dictionary with the word gangaa, which has attained certain special meanings of 'any river' or 'white colour'.

gangabaraar. 'the land formed by the shifting of the current of a river (gangaa=river)'

gangasikasta. 'the land eroded by any river'

gangaajamnii 'mixed, white and black (cf. The meaning white and black of Ganga and Yamuna respectively, based on the colour of their water)'.

cf. Bengali gangaa yamunaa- (2) 'of white and black colours', 'mixed with gold and silver'.

Gangaala 'a pot for storing water'. (ganga= water)
Cf. also

Banaarasii, an adjective from Banaras, means 'a sari', and Magahii, from Magah (< Magadha), means 'betels'.

When proper names are related to common nouns they can be given in a dictionary. Shcherba gives the example of khlestakov, a character of Gogol's Revizor, as an impudent liar and a flap. The word has lost its specificity and has a derivative xlestakov-scina (Srivastava 1968, p. 122; Zgusta 1971, 245). In Hindi we have kumbhakarn?a for a man who sleeps too much, with its derivative kumbhakarn?ii nidraa 'the sleep of the type of kumbhakarn?a'. In Bhagiirathaprayatna the name bhagiiratha has lost its specificness and is used for 'great perseverance'.

Another point to be noted about proper names is that if some derivatives from proper names are included in the dictionary, the proper names themselves must also be included, e.g. Webster's Seventh New Collegiate has the entry:

Muhammadan adj. of or relating to Muhammed or Islam.

But Muhammad is not an entry. The meaning of Muhammadan is derived from Muhammad. In such cases it is desirable to include such proper names even though the general policy may be against their inclusion in a dictionary.

(5) Empty words: Some words occur in certain constructions only; they are not used independently, e.g. Hindi aas used only in aas paas 'nearby'

ar?os used only in ar?os par?os 'neighbourhood'

ajaayab (pl. of ajab 'strange') used only in ajaayabghar 'museum' and ajaayabkhaana 'a curio collection centre'.
aamne used only in aamne saamne 'face to face'.

This type also includes words which are used in some collocations only, e.g. fro only in to and fro. Such words should be included in the dictionary with suitable cross reference and specific indication of their peculiarities of occurrence e.g.

Hindi aas see aas paas
Eng fro see to and fro
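The cross-reference treatment of empty words can be sketched as follows. This is a minimal illustration, not a prescribed dictionary format: the entry layout, the "see" key and the function names are assumptions made for the example, while the Hindi forms and glosses are those given above. Each bound word gets a stub entry that points to the expression in which it occurs, and a look-up follows the pointer.

```python
def add_cross_references(entries, bound_words):
    """Return a copy of entries with a 'see ...' stub for every bound word.

    bound_words maps a word that never occurs independently to the
    expression (already an entry) in which it is used.
    """
    result = dict(entries)
    for word, host in bound_words.items():
        if host not in result:
            raise KeyError(f"host expression {host!r} has no entry")
        # Do not overwrite a full entry if the word already has one.
        result.setdefault(word, {"see": host})
    return result

def look_up(entries, word):
    """Return the entry for word, following one cross reference if present."""
    entry = entries[word]
    if "see" in entry:
        return entries[entry["see"]]
    return entry

# Main entries, using forms and glosses cited in the text.
entries = {
    "aas paas": {"lang": "Hindi", "gloss": "nearby"},
    "aamne saamne": {"lang": "Hindi", "gloss": "face to face"},
}
dictionary = add_cross_references(
    entries, {"aas": "aas paas", "aamne": "aamne saamne"}
)
print(look_up(dictionary, "aas"))
```

The stub carries no gloss of its own, which mirrors the observation that such words have no independent use; the check on the host expression guards against a cross reference that leads nowhere.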

(6) Affixes: Should the dictionary enter all the prefixes and suffixes or only a few of them? As a matter of fact, all the productive prefixes and suffixes should find place in a dictionary, e.g.
English anti-, mal-, Skt. prati-, anu-, Hindi su-, ku-, -pan, Khasi nong- 'agentive marker', jing- 'prefix for forming abstract noun and noun of action'.

(7) Function words: As the function words have no referent or denotatum they have no lexical meaning. They have only functional or relational meaning. So it may be debated whether they should be included in a dictionary or not. Their inclusion in a dictionary can be pleaded on the following grounds:

(1) The function of relation or function words especially in all their collocations cannot be predicted.
(2) A word may have lexical meaning in one context and grammatical in other. e.g. Eng. does in
He does not like to hear this argument.
He does his works in time.
And Hindi laayaa in
Vah kitaab nahiiN laayaa 'he did not bring the book'.
Vah bazaar se sar?ii sabjii ut?haa laayaa.

'he brought rotten vegetables from the market'

(3) The function words have greater occurrence in the language than content words. Because of the frequency of their use they have larger variety of functions and greater collocational possibilities. Hence there is a greater scope of sense discrimination in them12.

(8) Compounds: In the process of their formation and in type, compounds differ from language to language. Compounding involves the joining of more than one stem or affix, either free or bound forms. In compounds the components are integrated and function as a single lexical unit in a sentence. The chief characteristics of compounds are their indivisibility (no word can be inserted between their elements) and the specific order of the components, so rigidly fixed in the arrangement in which they follow each other that no element can be reversed. Semantically the meaning generally, though not invariably, cannot be derived from the sum total of the meanings of the components. In a book of this type there is no scope for going into details of the formation and types of compounds. Our concern is to consider whether all the compounds can be entered in a dictionary. While entering them in the dictionary some peculiarities of compounds should be kept in view by the dictionary maker. These peculiarities relate to the formal and semantic characteristics of the compounds.

Formal Characteristics: formally the components entering into a compound are united phonetically or graphically or both. They are characterized by unity of stress or intonation or solid spelling, hyphen13 etc.
(a) Some of the components do not undergo any morphophonemic change while entering into a compound e.g.

Hindi cir?iimaar 'hunter'
sahabaasii 'one who lives in a city'
niilkamal 'blue lotus'

Bengali kaalaraatri 'the night on which death or some calamity occurs'.
balipus?t?a 'crow'

Skt. arun?anetra 'red eyed'
ghoraruupa 'of a frightful appearance'

Marathi pEsaapaavalii 'as cheap as dust'

Malayalam anangakriid?aa 'amorous sport'

(b) Some components undergo morphophonemic changes while forming a compound: e.g.

Hindi hathakar?ii 'hand cuffs'
pancakkii 'a water mill'

In some languages the morphophonemic change is so very significant that the components lose their formal identity. Such compounds are very common in Khasi e.g.

langbrot 'sheep' < blang 'goat'
?erlangthari 'whirlwind' < l?er 'wind'
theysotti 'virgin' < knthey 'female', 'woman'

Since the compounds function as lexical units they are naturally candidates for dictionary entry. But longer compounds with many components as found in Sanskrit cannot be entered in a dictionary. The Sanskrit Dictionary (Poona) includes compounds with two or three components. Longer compounds are avoided.

As for the first group of compounds the lexicographer does not face any problem. The second group has to be carefully scrutinised because of the change in their shape.

Semantic characteristics: the compounds have the following two semantic features which a lexicographer should take in view:

(a) The meaning of some compounds is not derivable from the combined meaning of their components. The meaning of the whole is not a mere sum of the meanings of its components. One or both of the components usually lose their meaning partially. The components collectively refer to another word, e.g.

English chatterbox 'person who talks a great deal', hot-house 'a building for growing plants'.

Hindi dasaanana 'one who has ten faces' i.e. Raavan?a
ganesa 'the chief or lord of the people', 'name of a God'.
Bengali balipus?t?a 'crow'
Basantaduuto 'cuckoo'

The whole group of Bahuvriihi compounds comes under this.

(b) In some other cases none of the components loses its meaning and the whole meaning is equivalent to the combined meaning of its elements e.g.

Hindi cir?iimaar 'hunter' (lit. one who kills birds).
niilkamal 'blue lotus'.
raajaputra 'the son of the king'
Eng. oil rich 'rich in oil'.
Beng. Pallimangal 'the welfare of the village'.

For a lexicographer the compounds whose meanings can be predicted are not so important as those whose meanings cannot be predicted from the combined meanings of their components. The compounds whose meanings can be predicted may be given as sub-entries under the entries for the first component, whereas those whose meanings cannot be predicted are generally given separate entries. Selection of compounds as main or sub-entry, at least in certain cases, is decided on extra-linguistic considerations.
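The placement rule just described can be sketched in a few lines. This is a hedged illustration only: the Compound record, the predictable flag and the function name are invented for the example, the flag stands for the lexicographer's own judgement rather than anything computed, and the segmentation of niilkamal into a first component niil is an assumption. The glosses are those cited in this section.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    form: str
    first_component: str  # headword under which a sub-entry would be filed
    gloss: str
    predictable: bool     # editorial judgement: is the meaning the sum of the parts?

def place_compounds(compounds):
    """Split compounds into main entries and sub-entries.

    Predictable compounds become sub-entries under their first component;
    unpredictable ones receive separate main entries.
    """
    main_entries = {}
    sub_entries = {}
    for c in compounds:
        if c.predictable:
            sub_entries.setdefault(c.first_component, []).append((c.form, c.gloss))
        else:
            main_entries[c.form] = c.gloss
    return main_entries, sub_entries

compounds = [
    Compound("niilkamal", "niil", "blue lotus", predictable=True),
    Compound("balipus?t?a", "bali", "crow", predictable=False),
    Compound("chatterbox", "chatter", "person who talks a great deal", predictable=False),
]
main_entries, sub_entries = place_compounds(compounds)
print(main_entries)
print(sub_entries)
```

The sketch deliberately leaves out the extra-linguistic considerations mentioned above; in practice these may override the flag in individual cases.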

(9) Set expressions of words or multiword lexical units: Also called phraseological units, set combinations of words or set expressions are word groups consisting of two or more words whose combination is integrated as a unit with or without specialized meanings of the whole e.g.

Hindi nau do gyaarah honaa 'to make good one's escape'
aaNkh dikhaanaa 'to be angry'

Bengali cokhe aangul diye dekhaano 'to make clear by use'.
cokherbaali (fig) 'a man who is sore to the eyes'

English fall out, give in, cut no ice, bread and butter.

The set expressions are to be distinguished from the free combinations. In free combinations words are combined to express different meanings in different ways. e.g.

Hindi t?hand?aa paanii 'cold water' t?hand?aa mausam. 'cold weather'
garam duudh 'hot milk' garam hawaa 'hot wind'
tej churii 'sharp knife' tej dimaag 'sharp mind'

English live happily, live miserably, live comfortably.
good man, good news, good book etc.,

Bengali bhaalo chele 'good boy' bhaalo khobor 'good news'.
bhaalo byabohaar 'good behaviour'

The free combinations are created as and when the speaker wants to communicate such ideas. They are not stable in use. The maximum communicative function of the language is performed by such expressions. But as their meanings are predictable i.e. the meanings are the sum total of the meanings of the constituents, the lexicographer does not enter them in his dictionary14.

As opposed to the free combinations, the set expressions are not created on the need of communication. They are fairly stable in use.

Let us compare the following examples:

Hindi tej 'sharp' and churii 'knife' combine to form a phrase tej churii 'sharp knife'. In this tej may be substituted by any other word like naii 'new', puraanii 'old', acchii 'good', kharaab 'bad' to form phrases naii churii 'new knife', puraanii churii 'old knife', acchii churii 'good knife', kharaab churii 'bad knife'. There is no change in the denotational meaning of churii. In the same way the word churii may be replaced by any word like talwaar 'sword', kulhaar?ii 'axe' to form combinations tej talwaar 'sharp sword', tej kulhaar?ii 'sharp axe'. Here again, there is no change in the denotational meaning of tej in combination with these words. Not only that, these two constituents can also be substituted by words semantically similar to them e.g. pEnii 'sharp' (pEnii churii 'sharp knife') and caakuu (pEnaa caakuu 'sharp knife') and there will be no change in the total meaning of the combination. But in miit?hii churii 'a sweet spoken traitor', 'a cheat in friend's garb' neither miit?hii nor churii can be substituted by any other word, whether synonymic or functionally similar, without a change in the meaning of the phrase. Thus miit?hii churii is a set expression.

In the same way the constituent red in red flower may be substituted by blue, white or any other word denoting colour without in any way changing the meaning of flower. But if the word red in red tape is changed to blue or white it would mean 'a tape or ribbon of certain colour' and the total meaning of red tape 'bureaucratic method' would be changed.

Another difference to be noted between the free combination and the set expression concerns the semantic relationship between the constituents of the phrase. In the former the relationship is additive. Each element has greater semantic independence. In the latter the information or meaning is not additive. The constituents are fused semantically.

Besides these, the set expressions have some other characteristic features, which are given below:

(a) They are generally indivisible. Nothing can be inserted between their elements e.g.

Eng. red tape, good morning
Hindi gudar?ii kaa laal. 'a diamond in rags' (a precious thing in most shabby quarters)
niir? kaa panchii ' home sick'.
niim hakiim 'a quack'
Bengali chelemaanusii 'senseless'
din duupure 'in the broad daylight'

(b) They can be substituted, sometimes, by single lexical units, and thus they can be called word-equivalents.

Eng. be in a brown study 'be gloomy'
Hindi caaNd kaa t?ukr?aa - sunder 'a beauty'
cal basnaa - marnaa 'to die'.
caltaa purjaa - caalaak 'cunning'.

The set expressions have the same function in a sentence as a lexical unit. So they are included in the dictionary.

(10) Proverb15: Proverbs resemble set expressions in many respects. They are traditional. Their constituents cannot be interchanged, nor can any element usually be inserted in them. They are usually formed on set expressions. But can all the proverbs be included in a dictionary? The proverbs contain many words which are not found in the current language. They provide information about the cultural milieu of the speech community. A bigger dictionary may include them but not all dictionaries. Proverbs are not lexical units in the same way as the set expressions and the compounds. They are groups of lexical units. So whether to include them depends on the scope of the dictionary.

(11) Quotations, clichés etc.: Quotations are different from proverbs. They are taken from literature but gradually, by constant use, they become part and parcel of the language and their source is forgotten. As a matter of fact the proverbs themselves have basically the character of repeated quotations. (Zgusta 1971, 153) Clichés are quotations which have become 'hackneyed and stale' or stereotypes. The question of their inclusion depends on the scope of the dictionary. A general purpose dictionary may not include them.

(12) Acronyms and Abbreviations: Some of the acronyms, abbreviations and clippings become very much the part of the language. The usual practice is to present them in appendices. But some abbreviations can be presented in the main body.

The selection of lexical units on the basis of social variations depends on the scope of the dictionary. the general purpose dictionaries may include the colloquialisms although normative dictionaries do not enter them. In the dictionaries using more of oral literature and unwritten texts the possibility of inclusion of colloquialism is greater than the dictionaries which are mainly based on written literature16.

Similarly, a dictionary with a normative character might not include words pertaining to slang, taboo etc. Smaller dictionaries and dictionaries for learners also do not have scope for their inclusion.

So much for the formal characteristics of the lexical entries. How is the number or density of entries in a dictionary to be decided? What criteria help in the selection of entries? One very common and widely accepted criterion is the frequency of the lexical items. Frequency counts are made a basis for the selection of entries especially in a learner's dictionary, because they provide a vocabulary minimum. But this criterion has several limitations.

(a) There are not many frequency counts, especially in Indian languages. Whatever frequency counts exist are based on a very limited corpus, and many common words are not found in them. For example, in Phonemic and Morphemic frequency in Hindi some basic words like akaṛnaa 'to be stiff', akhaaṛaa 'a wrestling arena' and pakauraa 'a fried, salty preparation of gram flour stuffed with vegetables' are not included. Everything depends on the corpus on which the frequency count is based. Many words of daily use may not be frequent in the corpus, or may not be found in it at all.

(b) For larger dictionaries, frequency counts cannot be made the basis for the selection of entries from a practical point of view. It would involve analysing a large and unwieldy body of data.
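The frequency criterion and limitation (a) above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the procedure of any actual dictionary project: the function names, the threshold `min_count`, the cap `max_entries` and the toy corpus are all hypothetical, and the tokens are assumed to be already reduced to their canonical forms.

```python
from collections import Counter


def select_entries(tokens, min_count=2, max_entries=None):
    """Rank candidate head words by corpus frequency.

    tokens      : word forms from the corpus (assumed already lemmatized)
    min_count   : hypothetical threshold; rarer items are dropped
    max_entries : hypothetical cap on the number of entries (None = no cap)
    """
    counts = Counter(tokens)
    # Descending frequency first, then alphabetical order for stability.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    selected = [word for word, n in ranked if n >= min_count]
    return selected if max_entries is None else selected[:max_entries]


def missing_from_corpus(tokens, everyday_words):
    """List everyday words that never occur in the corpus (limitation a)."""
    counts = Counter(tokens)
    return sorted(w for w in everyday_words if counts[w] == 0)


# Toy corpus, invented purely for illustration.
corpus = "ghar paanii ghar roTii paanii ghar kitaab".split()

entries = select_entries(corpus, min_count=2)
gaps = missing_from_corpus(corpus, ["akhaaraa", "pakauraa", "ghar"])

print(entries)  # words that meet the frequency threshold
print(gaps)     # everyday words the frequency count cannot see
```

A check of this kind makes the dependence on the corpus visible: a word absent from the corpus can never be selected by frequency alone, however common it is in daily use, so the list of gaps has to be filled from other sources.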

Frequency counts may be judged from another point of view. Some words are used only once or twice by some eminent writer of a language, while others may be used quite frequently by lesser known writers, in cheap periodicals, spy thrillers etc. What should be the criterion for their selection in a dictionary? It depends on the scope and size of the dictionary. A normative dictionary may not include all the words of the second category; a reference dictionary may do so. Moreover, if certain styles are preferred over others, many words and expressions may not find a place in the dictionary.


NOTES

1. For the dictionary of a dead language the collection of data may be completed at one stage and need not continue. In a living language, by contrast, the possibility of new words and meanings being created is never closed.
2. Only the academic points have been taken into consideration. The financial and organizational matters are not discussed.
3. From Sanskrit Dictionary (Poona).
4. Collection of dialectal materials from novels and poems may sometimes give a distorted idea of the dialectal form, since the novelist or poet (by his untrained faculties) may create imitative forms which may not exist in the dialect.
5. Samarin (1967, 61) gives the following factors which are correlated with speech diversity, each of which should in some way be represented in a good linguistic corpus: age, sex and social class or occupation of the speaker, speaker's emotion, speed of utterance, topic, type and style of discourse.
6. Nida (1975, 178-186) gives a very detailed list of semantic domains grouped under four heads, I Entities, II Events, III Abstracts and IV Relational, with a large number of sub-heads.
7. Difference between two or more languages in their phonological, grammatical and semantic structure.
8. For meaning and definition of flora and fauna see Chapter 5.
9. Malinowski emphasizes the role of three kinds of context for the study of words: the context of culture, the context of situation and the context of language ('The problem of meaning in primitive languages' in Ogden and Richards 1952, 305).
10. Setting of entries 'head word'.
11. See Bull, William, 'The use of vernacular languages in education', in Dell Hymes (ed.), Language in Culture and Society, 530.
12. See also 5.2.
13. See Introduction to Oxford English Dictionary for hyphenated compounds p. XXXIII.
14. Some dictionaries, especially those for learners, also give free combinations, but even they do not give all possible collocations.
15. A proverb is a short familiar epigrammatic saying expressing popular wisdom, a truth or a moral lesson in a concise and imaginative way.
16. Sledd and Ebbit (1962, 50 ff) have a number of articles on the controversy over the inclusion of such colloquial words as 'ain't' in Webster's III.