A Review on Electronic Dictionary and Machine Translation System Developed in North-East India

Electronic dictionaries and machine translation systems are two of the most important language learning tools for acquiring knowledge of both familiar and unfamiliar natural languages. Natural language is the most important means of human communication, so these two tools are used frequently in daily life. Electronic dictionaries (E-dictionaries) and Machine Translation (MT) systems are especially helpful for students, research scholars, teachers, travellers and businesspeople. They are also important applications and research tasks in Natural Language Processing (NLP), and demand for research on E-dictionaries and MT systems is growing in India and around the world. North-East (NE) India is a prominent multilingual region of the country, yet only a small number of E-dictionaries and MT systems have been developed for NE languages. In this paper, we describe the importance, approaches and features of E-dictionaries and MT systems, and review the existing E-dictionaries and MT systems developed for NE languages in NE India.


INTRODUCTION
Natural language is the most important medium of communication for human beings. Natural Language Processing (NLP) is an interdisciplinary research area spanning Computer Science, Computational Linguistics and Artificial Intelligence, and it is concerned with the interaction between computers and natural languages. The purpose of NLP is to design and develop software that can analyze, understand and produce speech or text in natural languages [21]. It is currently a research area in high demand in computer science, exploring how computers can be used to understand and manipulate natural-language text or speech to perform useful tasks. A large number of NLP applications have been developed worldwide as well as in India. The most commonly used applications of NLP include Electronic Dictionary, Machine Translation, Machine Transliteration, Information Retrieval, Morphological Segmentation, Named Entity Recognition, Optical Character Recognition, Part of Speech Tagging, Parsing, Question Answering, Speech Processing, Speech Recognition, Speech Segmentation, Spelling Checker, WordNet and Word Sense Disambiguation.
North-East India is one of the most linguistically and ethnically diverse regions of India. The NE region lies between two great traditions, those of Indic Asia and Mongoloid Asia. North-East India (NEI) consists of eight states: Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim and Tripura. Each state has its own culture, languages and traditions [1,2]. The people of NEI belong to different communities, and each community has its own language. NE India is therefore known as a multilingual and multicultural region of the country. In this setting, E-dictionaries and MT systems are very important for communication among people of different communities, yet only a small number of them have been developed for NE languages.
In this paper, we discuss two of the most important NLP applications developed for NE languages in NE India, namely E-dictionaries and Machine Translation. Both have tremendous potential and are used frequently in daily life.

North-East Languages
Natural languages act as a bridge among people and help create bonds between their cultures. There are about 220 spoken languages in NEI, and they belong mainly to three language families: Indo-Aryan, Sino-Tibetan and Austro-Asiatic [2]. The Indo-Aryan languages of the region include Assamese, Bengali and Nepali, and the Austro-Asiatic languages include Khasi and Jaintia.

Electronic Dictionary
A dictionary is an important language learning book that contains a large number of words of one or more languages, arranged alphabetically with their meanings, part of speech (POS), synonyms, phonetic transcriptions and examples. It helps students, research scholars, teachers, travellers and others improve their knowledge of various languages. Dictionaries fall into two broad categories, namely paper dictionaries and electronic dictionaries. A paper dictionary is also known as a hard or printed dictionary, and an electronic dictionary is also known as a soft dictionary. An electronic dictionary (E-dictionary) is a dictionary whose data exist in digital form and can be accessed through different media. It lets people learn about specific natural languages anywhere and at any time using computers, smartphones and PDAs. It is very convenient to use and has considerable advantages over a paper dictionary. The E-dictionary is an important NLP application and can be used to implement other NLP research tasks such as machine translation and WordNet.

Fig. 1: Example of H-ABE E-dictionary

Different types of E-dictionary
Generally, E-dictionaries can be divided into two types, as follows:

Online E-dictionary
An online E-dictionary is accessed in digital form over the Internet, using a web browser, from anywhere in the world; it is therefore also known as an Internet dictionary. It is very convenient whenever an Internet connection is available, and a large number of users can access it simultaneously.

Offline E-dictionary
An offline E-dictionary is accessed on a computer, personal digital assistant (PDA) or smartphone without an Internet connection. It can be carried around and backed up on compact discs, DVDs, hard disks and pen drives. Therefore, this dictionary is also known as a …

According to the languages involved, E-dictionaries can be divided into three categories [5]:

Monolingual E-dictionary
In a monolingual E-dictionary, users look up the meaning of a word and related information such as its POS, synonyms and examples within a single natural language. Assamese-Assamese, English-English and Manipuri-Manipuri dictionaries are examples of monolingual E-dictionaries.

Bilingual E-dictionary
In a bilingual E-dictionary, users look up the meaning of a word and related information such as its POS, synonyms and examples from a source natural language in a target natural language. Assamese-English and English-Bengali dictionaries are examples of bilingual E-dictionaries.

Multilingual E-dictionary
In a multilingual E-dictionary, users look up the meaning of a word and related information such as its POS, synonyms and examples from one natural language in two or more other natural languages. Assamese to English and Bengali, or English to Manipuri and Nepali, are examples of multilingual E-dictionaries.
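The three categories differ only in which languages an entry's meanings are given in. The sketch below is a minimal, hypothetical data model (the class names `Entry` and `EDictionary`, the `meanings` field and the ISO 639-1 codes "en", "as" and "bn" are assumptions, not taken from any surveyed system) that stores each headword with its POS, synonyms and examples and classifies a dictionary as monolingual, bilingual or multilingual from its source and target languages.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One headword and its related information (hypothetical schema)."""
    headword: str
    pos: str
    synonyms: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)
    # language code -> meanings of the headword in that language
    meanings: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class EDictionary:
    source_lang: str
    target_langs: list[str]
    entries: dict[str, Entry] = field(default_factory=dict)

    def kind(self) -> str:
        """Classify the dictionary by the languages it covers."""
        targets = set(self.target_langs)
        if targets == {self.source_lang}:
            return "monolingual"
        if len(targets) == 1:
            return "bilingual"
        return "multilingual"

    def add(self, entry: Entry) -> None:
        # Store headwords case-insensitively so "Book" and "book" match.
        self.entries[entry.headword.casefold()] = entry

    def look_up(self, word: str) -> Optional[Entry]:
        # Returns None when the word is not in the dictionary.
        return self.entries.get(word.casefold())


en_en = EDictionary("en", ["en"])
en_en.add(Entry("Book", "noun", synonyms=["volume"],
                meanings={"en": ["a set of printed pages bound together"]}))
print(en_en.kind())               # monolingual
print(en_en.look_up("book").pos)  # noun
```

Keeping meanings in a per-language map lets the same entry type serve all three categories; only the set of target languages changes.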

Different Techniques of E-dictionary
Many word search techniques are available for developing an E-dictionary, and different developers use different techniques to look up words in both online and offline E-dictionaries.
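The paper does not list the techniques at this point, so the following is only a hedged sketch of four common general-purpose ones a developer might choose from: linear search, binary search over an alphabetically sorted headword list (using Python's standard `bisect` module), hash-table lookup, and prefix search for auto-suggestion. The headword list `WORDS` is hypothetical.

```python
import bisect

# Hypothetical sorted headword list; a real E-dictionary holds thousands.
WORDS = ["apple", "bank", "book", "river", "school", "water"]
# Hash table built once: headword -> position in WORDS.
INDEX = {word: i for i, word in enumerate(WORDS)}


def linear_search(words, target):
    """O(n): compare the target with each headword in turn."""
    for i, word in enumerate(words):
        if word == target:
            return i
    return -1


def binary_search(words, target):
    """O(log n): repeatedly halve the search range; needs a sorted list."""
    i = bisect.bisect_left(words, target)
    return i if i < len(words) and words[i] == target else -1


def hash_search(index, target):
    """O(1) on average: a single hash-table lookup."""
    return index.get(target, -1)


def prefix_search(words, prefix):
    """All headwords starting with prefix (e.g. for auto-suggest)."""
    i = bisect.bisect_left(words, prefix)
    matches = []
    # In a sorted list, all words sharing the prefix are contiguous.
    while i < len(words) and words[i].startswith(prefix):
        matches.append(words[i])
        i += 1
    return matches


print(binary_search(WORDS, "book"))  # 2
print(prefix_search(WORDS, "b"))     # ['bank', 'book']
```

Python compares strings by Unicode code point, which may not match the traditional alphabetical order of scripts such as Assamese; in that case the list would need to be sorted and searched with a collation-aware key.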

E-dictionary Developed in NEI
In this section, we discuss the existing electronic dictionaries developed in NEI for NE languages. The literature survey shows that lexicographers have compiled a large number of paper dictionaries for most of the major languages of NEI. With the spread of computers and the Internet, a small number of online and offline E-dictionaries have also been developed for the languages of NEI. Some of the electronic dictionaries developed for NE languages in NE India are listed in Table 2.

Machine Translation
Machine Translation (MT) is one of the most important applications and research tasks of NLP. It investigates the use of software to translate text or speech from one natural language to another by computer, with or without human assistance. An MT system that translates between two specific languages is called a bilingual MT system, and it may work in one direction or in both directions.

Machine translation is as old as the computer itself, and it was the first computer-based application related to NLP. MT research generally began around 1950, although earlier work can be found. The first non-military computers were developed in 1947, and from that time the idea of translating text from a source language (SL) into a target language (TL) by computer was proposed [14]. Today, MT remains a very challenging research task in computational linguistics and NLP, both worldwide and in India. Research in India is relatively young: machine translation gained momentum only from 1980 onwards, at institutions such as IIT Kanpur, IIT Bombay, IIIT Hyderabad, the University of Hyderabad and NCST Mumbai. Technology Development for Indian Languages (TDIL), the Centre for Development of Advanced Computing (CDAC) and the Ministry of Communications and Information Technology play a major role in developing MT systems [16].

MT systems are very important today for the following reasons:
• An MT system can translate a huge amount of text from one natural language to another, which is not feasible for human translators.
• It reduces human effort and produces translations quickly.
• It increases the volume and speed of translation throughput.
• Manually translating a huge amount of text is not only time-consuming but also expensive.

Therefore, MT systems can be used to save time and reduce cost.

Problems with Machine Translation
Machine translation is a difficult research task in NLP because of problems such as word order, word sense ambiguity, part of speech (POS) and idioms. These problems vary from one language pair to another. They are discussed below for English and Bodo:

Word Order
Word order is different between English and Bodo languages.
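To make the word-order problem concrete, the sketch below reorders a simple clause from English subject-verb-object (SVO) order into subject-object-verb (SOV) order, assuming, as is typical of Tibeto-Burman languages, that Bodo places the verb at the end of the clause. It is a hypothetical step that a rule-based system might perform before lexical transfer: the `Token` type and role labels are assumptions (the labels would come from an upstream parser), and English words are kept as stand-ins for the translated Bodo words.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    role: str  # "S" (subject), "V" (verb) or "O" (object); assumed from a parser


# Position of each role in an SOV clause (hypothetical rule).
SOV_RANK = {"S": 0, "O": 1, "V": 2}


def svo_to_sov(tokens):
    """Reorder the chunks of a simple SVO clause into SOV order.

    sorted() is stable, so words sharing a role (e.g. "the rice")
    keep their original relative order.
    """
    return [t.word for t in sorted(tokens, key=lambda t: SOV_RANK[t.role])]


sentence = [Token("Ram", "S"), Token("eats", "V"), Token("the", "O"), Token("rice", "O")]
print(" ".join(svo_to_sov(sentence)))  # Ram the rice eats
```

Real clauses with adverbials, auxiliaries or embedded clauses need richer rules than a single rank per role; this only illustrates why a word-by-word translation is not enough.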

Word sense ambiguity
The same word may have a different meaning or sense when translated into another language. For example:

POS
POS usage differs between English and Bodo: English uses prepositions, whereas Bodo uses postpositions. For example:

Idioms
Idioms differ between English and Bodo, so they cannot be translated word for word. For example:

Different approaches of MT
There are various approaches to machine translation, which can generally be divided into three main categories [12,13]. The Statistical Machine Translation (SMT) approach can be further divided into three categories, namely word-based translation, phrase-based translation and hierarchical phrase-based translation. Rule-Based Machine Translation (RBMT) can likewise be divided into three categories, namely direct MT, transfer-based MT and interlingua MT.
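The paper does not detail how word-based SMT works, so the following is a minimal sketch of IBM Model 1, the standard word-based statistical model, which learns word-translation probabilities t(f | e) from a sentence-aligned parallel corpus by expectation-maximization (EM). It is offered as an illustration, not as the method of any system surveyed here: the NULL word and the reordering and fertility models of the later IBM models are omitted, and because no English-Bodo or English-Assamese parallel data is given, a three-sentence English-German toy corpus stands in. The function names `ibm_model1` and `best_translation` are hypothetical.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate word-translation probabilities t(f | e) with EM.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e). The NULL word is omitted.
    """
    target_vocab = {f for _, fs in corpus for f in fs}
    # Uniform initialisation: every target word equally likely for any e.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in corpus:
            for f in fs:
                # E-step: share f's alignment mass among all e in the sentence.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, e):
    """Most probable target word for the source word e."""
    candidates = {f: p for (f, src), p in t.items() if src == e}
    return max(candidates, key=candidates.get)


corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = ibm_model1(corpus, iterations=20)
for e in ["the", "house", "book", "a"]:
    print(e, "->", best_translation(t, e))
```

On this toy corpus the learned table should pair "the" with "das", "house" with "Haus", "book" with "Buch" and "a" with "ein", even though the input gives no word alignments; EM infers them from how words co-occur across sentence pairs.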

Example of a MT System
Consider an English-to-Assamese (E-A) bilingual MT system, in which users translate large amounts of English text (the SL) into Assamese (the TL) by computer [19]. Some example sentences translated by the English-to-Assamese MT system are shown in Figure 2.

Machine Translation System Developed in NEI
In this section, we discuss the existing machine translation systems developed in NEI for NE languages. The literature survey shows that only a very small number of MT systems have been developed in NEI, by different developers using different approaches. Some of the MT systems developed for North-East languages in NEI are listed in Table 3.

CONCLUSION
An electronic dictionary is a powerful dictionary whose data exist in digital form and can be accessed online or offline, from anywhere, using a computer, smartphone or PDA. With it, a user can look up the meaning of a word and related information such as its POS, synonyms and examples from a source natural language in one or more target natural languages. Machine translation is the automatic translation of text from a source natural language into another natural language by computer, online or offline. With an MT system, users can translate large numbers of words, phrases or sentences from one natural language into another. E-dictionaries and MT systems are tremendously helpful for people extending their knowledge of known and unknown natural languages, and they are also important for implementing other NLP research tasks. The main purpose of this paper is to focus on the existing E-dictionaries and MT systems developed for NE languages in NE India. Only a small number of such systems have been developed so far, although some research scholars are now working on E-dictionaries and MT systems for NE languages, both in the NE region and elsewhere in India. Since NE India is a multilingual region, E-dictionaries and MT systems will be helpful for the people of the NE as well as for other people in India and abroad.