Methods for Handling Spontaneous e-commerce Arabic SMS: CATS, an Operational Proof of Concept

Authors: Maher Daoud, Christian Boitet

Polibits, 37, pp. 31-42, 2008.

Abstract: The purpose of this paper is to show that it is both necessary and possible to build (multilingual) NL-based e-commerce systems with mixed sublanguage and content-oriented methods. Analyzing the sublanguage and integrating content-oriented methods is expected to increase the accuracy and robustness of processing. To verify this assumption, we built an experimental system, CATS, as a proof of concept. The system is an SMS-based platform for buying and selling through classified ads. To analyze the sublanguage, we first used a web-based corpus to build the basic system. A content representation language is defined to capture the meaning of a classified ad post. The semantic grammars for content extraction are coded in EnCo. Response generation is based on semantic matching (of “looking for” and “sell” posts) and reasoning, and is able to handle “no answer” situations. CATS is currently deployed in Jordan by Fastlink (the largest mobile operator). Testing the content extraction component on real, noisy free text shows a 90% F-measure.

Keywords: Spontaneous NL interface; SMS services; sublanguages; content extraction; classified ads; Arabic processing
