Sains Malaysiana 49(9)(2020): 2113-2118

http://dx.doi.org/10.17576/jsm-2020-4909-09

 

Query Translation for Multilingual Content with Semantic Technique

(Terjemahan Pertanyaan untuk Kandungan Pelbagai Bahasa dengan Teknik Semantik)

 

NORITA MD NORWAWI*, SUNDRESAN A/L PERUMAL, EMRAN HUDA & WAKA JENG

 

Faculty of Science and Technology, Universiti Sains Islam Malaysia, 71800 Nilai, Negeri Sembilan Darul Khusus, Malaysia

 

Received: 23 January 2020/Accepted: 1 April 2020

 

ABSTRACT

Cross-lingual information retrieval (CLIR) allows a user to submit a query in a language different from that of the target resources. Translation is therefore the key element in query processing. There are three translation approaches: query translation, document translation, and hybrid query-document translation. However, query translation is very challenging because of the polysemy problem. Differences in the linguistic nature of the languages lead to ambiguity of meaning, so the user’s true intention can be misinterpreted. This paper presents a semantic technique for query translation in a multilingual knowledge repository to improve query processing. Offline-translated documents, or parallel corpora, in English, Arabic, and Malay, including Jawi text, were used as the data. A set of keywords related to prophetic food was pre-identified by experts. These keywords were annotated with the relevant Quranic verses, Hadith texts, manuscript text images, and scientific articles determined by the experts. Synonyms and context-based translations were annotated together with each specific keyword. A query performs a three-way pattern match against the keyword indexing list, which links to the relevant documents. A one-stop knowledge repository on prophetic food was developed as a proof of concept, using sources from the al-Quran, Hadith, classical manuscripts, and scientific articles verified by experts to ensure the authenticity and integrity of the content.
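The keyword-index lookup described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the three-way pattern match, so the sketch assumes the three ways are a match on the keyword itself, on its annotated synonyms, and on its context-based translations. The names `KeywordEntry`, `normalize`, and `match_query`, the sample keywords, and the document identifiers are all hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordEntry:
    """One expert-annotated keyword (hypothetical structure, not the paper's schema)."""
    keyword: str
    synonyms: set[str] = field(default_factory=set)
    translations: set[str] = field(default_factory=set)
    documents: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(text.casefold().split())


def match_query(query: str, index: list[KeywordEntry]) -> list[str]:
    """Return documents linked to every entry the query matches.

    Assumed three-way match: keyword, synonym, or context-based translation.
    """
    q = normalize(query)
    hits: list[str] = []
    for entry in index:
        matched = (
            q == normalize(entry.keyword)
            or q in {normalize(s) for s in entry.synonyms}
            or q in {normalize(t) for t in entry.translations}
        )
        if matched:
            for doc in entry.documents:
                if doc not in hits:  # keep first-seen order, skip duplicates
                    hits.append(doc)
    return hits


# Illustrative index; document identifiers are placeholders, not real records.
index = [
    KeywordEntry(
        keyword="dates",
        synonyms={"date fruit"},
        translations={"kurma", "تمر"},
        documents=["doc-dates-quran", "doc-dates-hadith", "doc-dates-article"],
    ),
    KeywordEntry(
        keyword="honey",
        translations={"madu", "عسل"},
        documents=["doc-honey-quran"],
    ),
]

results = match_query("Kurma", index)
```

Because the translations are annotated per keyword by experts rather than produced by a general-purpose translator, a Malay or Arabic query in this sketch resolves directly to the same document list as its English keyword, which is one way the polysemy problem could be sidestepped for a closed domain.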

 

Keywords: Cross-lingual information retrieval; one-stop knowledge repository; prophetic food; query translation; semantic technique

 

ABSTRAK

Dapatan semula maklumat silang bahasa (CLIR) membolehkan pertanyaan pengguna diajukan dalam bahasa yang berbeza daripada bahasa bahan sumber sasaran. Oleh itu, terjemahan menjadi kunci utama dalam pemprosesan pertanyaan. Terdapat 3 jenis pendekatan terjemahan: terjemahan pertanyaan, dokumen atau pertanyaan-dokumen hibrid. Walau bagaimanapun, terjemahan pertanyaan adalah mencabar berpunca daripada masalah polisemi. Gaya linguistik pelbagai bahasa yang berbeza menimbulkan kesamaran makna yang menyebabkan hasrat sebenar pengguna boleh disalah tafsir. Kajian ini membentangkan teknik semantik terjemahan pertanyaan repositori pelbagai bahasa untuk menambahbaik pemprosesan pertanyaan. Dokumen sumber yang diterjemahkan secara manual atau corpora selari dalam Bahasa Inggeris, Arab dan Melayu termasuk teks Jawi digunakan sebagai data kajian. Set kata kunci telah dikenal pasti oleh pakar bidang berkaitan dengan makanan sunnah. Kata kunci ini dianotasikan dengan ayat-ayat Al-Quran teks Hadith, teks dan imej manuskrip dan artikel saintifik yang berkaitan oleh pakar bidang berkenaan. Perkataan sinonim dan terjemahan secara konteks dianotasikan juga kepada kata kunci berkaitan. Setiap pertanyaan akan menggunakan 3 kaedah pemadanan ke atas senarai indeks kata kunci yang akan menghubungkan kepada dokumen yang relevan. Repositori pengetahuan sehenti berkaitan makanan sunnah dibangunkan sebagai bukti konsep menggunakan sumber daripada Al-Quran, Hadith, manuskrip klasik dan artikel saintifik yang disahkan oleh pakar bidang untuk menjamin kesahihan dan integriti.

 

Kata kunci: Dapatan semula maklumat silang bahasa; makanan sunnah; repositori pengetahuan sehenti; teknik semantik; terjemahan pertanyaan

 

 

*Corresponding author; email: norita@usim.edu.my

 

 

 
