
Loanword Pattern Extraction for Assessing the Difficulty Level of Thai Text



Publisher

Prince of Songkla University

Abstract

We read many documents every day, and it is desirable to be able to choose which ones to read more easily and quickly. For that choice, not only what is written but also how it is written matters. A text can be written in many ways, and one important characteristic is text readability, which is determined by the difficulty level of the words, phrases, grammar, and other elements employed. Text readability for Thai documents, however, has not been extensively investigated. This research proposes a new approach to text readability assessment for Thai documents, consisting of new text readability features and new readability assessment techniques. Based on human observation of a large Thai document set, this research focuses specifically on the use of seven different types of loanwords in Thai: 1) Pali word (P), 2) Sanskrit word (S), 3) Orthography (O), 4) Pali and Sanskrit word (PS), 5) Pali word and Orthography (PO), 6) Sanskrit word and Orthography (SO), and 7) Pali, Sanskrit word and Orthography (PSO). Employing features of these loanwords, we propose three new Thai text readability assessment techniques and compare them with human assessment. All three techniques count the frequency of these loanwords to cluster Thai documents into three levels: Easy, Medium, and Hard. Each technique is based on a different clustering method: 1) document clustering using the proportion of the total number of documents (DoC-A), 2) document clustering using a class interval calculated from the actual maximum value (DoC-B), and 3) document clustering using a class interval calculated from the actual maximum and minimum values (DoC-C). Our comparative experiment among these three techniques shows that DoC-A comes closest to human assessment, with 75% accuracy.
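The abstract names the three clustering methods but does not give their formulas, so the following is only a minimal sketch of one plausible reading, not the thesis's implementation. It assumes each document is reduced to a single loanword count, that DoC-A splits the documents into three equal-sized groups by rank, that DoC-B uses three equal-width intervals over the range from 0 to the maximum count, and that DoC-C uses three equal-width intervals over the range from the minimum to the maximum count. The function names `doc_a`, `doc_b`, `doc_c`, and `by_interval` are hypothetical.

```python
from typing import List

LEVELS = ["Easy", "Medium", "Hard"]


def doc_a(counts: List[int]) -> List[str]:
    """Assumed DoC-A: rank documents by loanword count and give each
    level roughly one third of the documents."""
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        labels[i] = LEVELS[rank * 3 // n]
    return labels


def by_interval(counts: List[int], low: float, high: float) -> List[str]:
    """Assign levels using three equal-width class intervals on [low, high]."""
    width = (high - low) / 3
    labels = []
    for c in counts:
        if width == 0:
            # All documents share the same count; nothing to separate.
            labels.append(LEVELS[0])
        else:
            idx = int((c - low) / width)
            labels.append(LEVELS[max(0, min(idx, 2))])
    return labels


def doc_b(counts: List[int]) -> List[str]:
    """Assumed DoC-B: interval width derived from the maximum count only."""
    if not counts:
        return []
    return by_interval(counts, 0, max(counts))


def doc_c(counts: List[int]) -> List[str]:
    """Assumed DoC-C: interval width derived from the maximum and minimum counts."""
    if not counts:
        return []
    return by_interval(counts, min(counts), max(counts))


if __name__ == "__main__":
    # Illustrative loanword counts for six documents (made-up numbers).
    counts = [10, 12, 14, 16, 18, 40]
    print("DoC-A:", doc_a(counts))
    print("DoC-B:", doc_b(counts))
    print("DoC-C:", doc_c(counts))
```

Under these assumptions, a single document with an unusually high count stretches the intervals in `doc_b` and `doc_c` and pushes most other documents into the lower levels, whereas the rank-based `doc_a` always fills the three levels evenly. This is a property of the sketch, not a finding reported in the abstract.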

Description

Thesis (M.Sc. (Information Technology))--Prince of Songkla University, 2017 (B.E. 2560)
