Loanword Pattern Extraction for Assessing Text Readability Levels in Thai
Publisher
Prince of Songkla University
Abstract
We read many documents every day, and it is desirable to be able to choose which documents to read more easily and quickly. Not only what is written but also how it is written is an important factor in choosing a document. How a text is written has many characteristics, and one important one is text readability, which is characterized by the difficulty level of the words, phrases, grammar, etc. that the text employs. Text readability for Thai documents, however, has not been extensively investigated. This research proposes a new method for text readability assessment for Thai documents, consisting of new text readability features and new readability assessment techniques. Based on human observation of a large Thai document set, this research focuses specifically on the use of seven different types of loanwords in Thai: 1) Pali word (P), 2) Sanskrit word (S), 3) Orthography (O), 4) Pali and Sanskrit word (PS), 5) Pali word and Orthography (PO), 6) Sanskrit word and Orthography (SO), and 7) Pali, Sanskrit word and Orthography (PSO). Employing features of these loanwords, we propose three new Thai text readability assessment techniques and compare them with human assessment. The three techniques count the frequency of these loanwords to cluster Thai documents into three levels: Easy, Medium, and Hard. Each technique is based on a different clustering method: 1) document clustering using the proportion of the total number of documents (DoC-A), 2) document clustering using a class interval calculated from the actual maximum value (DoC-B), and 3) document clustering using a class interval calculated from the actual maximum and minimum values (DoC-C). Our comparative experiment among these three techniques shows that DoC-A agrees most closely with human assessment, at 75% accuracy.
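The three clustering methods can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumed reading rather than the thesis's actual procedure: DoC-A is taken to mean ranking documents by loanword count and giving each level an equal share of the documents, DoC-B to mean three equal-width intervals from 0 to the maximum count, and DoC-C to mean three equal-width intervals from the minimum to the maximum count. The function names `doc_a`, `doc_b`, `doc_c` and the helper `_by_interval` are hypothetical, as is the assumption that a higher loanword count means a harder document.

```python
from typing import List

LEVELS = ["Easy", "Medium", "Hard"]


def doc_a(counts: List[int]) -> List[str]:
    """Assumed DoC-A: rank documents by loanword count and assign
    each level an equal proportion of the documents."""
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        labels[i] = LEVELS[rank * 3 // n]  # rank < n, so the index is 0..2
    return labels


def _by_interval(counts: List[int], low: float, high: float) -> List[str]:
    """Split [low, high] into three equal-width class intervals."""
    width = (high - low) / 3
    labels = []
    for c in counts:
        if width == 0:
            idx = 0  # all documents identical: one class only
        else:
            # The maximum value falls on the upper edge, so clamp to 2.
            idx = min(int((c - low) / width), 2)
        labels.append(LEVELS[idx])
    return labels


def doc_b(counts: List[int]) -> List[str]:
    """Assumed DoC-B: class interval = max / 3, starting from 0."""
    return _by_interval(counts, 0, max(counts))


def doc_c(counts: List[int]) -> List[str]:
    """Assumed DoC-C: class interval = (max - min) / 3, starting from min."""
    return _by_interval(counts, min(counts), max(counts))


if __name__ == "__main__":
    # Hypothetical loanword counts for six documents.
    counts = [10, 12, 14, 16, 18, 22]
    for name, fn in [("DoC-A", doc_a), ("DoC-B", doc_b), ("DoC-C", doc_c)]:
        print(name, fn(counts))
```

Under these assumptions, the rank-based split always yields balanced classes, whereas the interval-based splits depend on how the counts are spread, so the same document can land in different levels depending on the method.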
Description
Thesis (M.Sc. (Information Technology))--Prince of Songkla University, 2017 (B.E. 2560)


