A language identifier for Indonesian and Malay text document

Indra, Z. and Jaafar, J. and Zamin, N. and Bakar, Z.A. (2016) A language identifier for Indonesian and Malay text document. In: UNSPECIFIED.

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

The number of online text documents on the Internet has grown enormously. Documents written in languages from all over the world can be found with a single click, and this growth increases the availability of information on the Internet. However, no one can understand all the languages in which these digital documents are written. Hence, there is a significant need for a language identifier to help users understand the information. To date, language identification research has focused mostly on European languages and remains limited for Asian languages, while language identification for similar languages among popular languages has attracted the attention of many researchers. In this research, a new language identification method for languages with similar topology, Malay and Indonesian, is proposed. The algorithm is tested on a set of Indonesian and Malay text documents to support the limited research on language identification for Asian languages. An experiment on 100 Indonesian and Malay text documents produced satisfactorily accurate results. © 2015 IEEE.

Item Type:	Conference or Workshop Item (UNSPECIFIED)
Impact Factor:	Cited by 1
Uncontrolled Keywords:	Algorithms; Computer programming; Computer science, Asian languages; Digital Documents; European languages; Indonesian languages; Language identification; N-grams; Natural language processing; nocv1; Text document, Natural language processing systems
Depositing User:	Ms Sharifah Fahimah Saiyed Yeop
Date Deposited:	25 Mar 2022 07:41
Last Modified:	25 Mar 2022 07:41
URI:	http://scholars.utp.edu.my/id/eprint/30897
