Efficient Text Feature Extraction for Opinion Polarity Classification

ณิชาภัทร ปิ่นโพธิ์

Please use this identifier to cite or link to this item: http://kb.psu.ac.th/psukb/handle/2016/18177

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	นิวรรณ วัฒนกิจรุ่งโรจน์	-
dc.contributor.author	ณิชาภัทร ปิ่นโพธิ์	-
dc.date.accessioned	2023-05-16T08:51:01Z	-
dc.date.available	2023-05-16T08:51:01Z	-
dc.date.issued	2020	-
dc.identifier.uri	http://kb.psu.ac.th/psukb/handle/2016/18177	-
dc.description	Thesis (M.Sc. (Computer Science))--Prince of Songkla University, 2020	en_US
dc.description.abstract	Social media users now routinely post text comments to express their opinions, and these texts can be analyzed to classify them as positive or negative. Before a classifier can be built, feature vectors representing the texts must be prepared. Texts are generally represented by vectors of the weights or frequencies of the terms that appear in them, where the number of dimensions equals the number of terms in a dictionary built from all words occurring in the texts. A large dictionary therefore yields high-dimensional vectors and long processing times for training and testing text classification models. This thesis proposes two low-dimensional text representations, V4D and V8D. The vectors are built from sets of positive and negative words, and they also take into account negation words, which carry significant meaning in opinion classification. Four classification techniques were studied: k-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machine. In experiments on eight data sets from various domains, the proposed V4D and V8D vectors were compared with the traditional TF and TF-IDF vectors in terms of classification performance. The experimental results show that the proposed vectors improve the performance of opinion text classification and are the most efficient in terms of storage space and processing time.	en_US
dc.language.iso	th	en_US
dc.publisher	Prince of Songkla University	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Thailand	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/th/	*
dc.subject	Vector analysis	en_US
dc.title	การสกัดคุณลักษณะข้อความที่มีประสิทธิภาพเพื่อการจำแนกขั้วความคิดเห็น	en_US
dc.title.alternative	Efficient Text Feature Extraction for Opinion Polarity Classification	en_US
dc.type	Thesis	en_US
dc.contributor.department	Faculty of Science (Computer Science)	-
dc.contributor.department	Faculty of Science, Department of Computer Science	-
dc.description.abstract-th	Today, social media users can freely express opinions on topics of interest by posting text. These texts can be analyzed to classify the direction of each opinion as positive or negative. To analyze opinion direction, a vector representing each text must first be created. The common approach represents a text by a vector of word weights or frequencies whose number of dimensions equals the number of words in a dictionary containing every word that can occur in the texts under consideration. As the number of words grows, the dictionary grows and the vector representing each text grows with it, so building and using a model to classify opinion polarity takes a long processing time. This thesis proposes feature extraction that represents texts as two kinds of low-dimensional vectors, V4D and V8D, which consider features derived from the weights of the positive and negative words appearing in the text. It also considers features derived from negation words, which are important to the meaning of a text and to opinion polarity classification. The proposed vectors are used as input for building classification models, and this thesis studies four model-building methods: k-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machine. Experiments on eight opinion text data sets from a variety of domains compared the proposed feature extraction (the V4D and V8D vectors) with traditional feature extraction (the TF and TF-IDF vectors), each used as input for building opinion polarity classification models. The results show that the proposed vectors improve the accuracy of opinion polarity classification and give the best efficiency in terms of data storage space and processing time.	en_US
Appears in Collections:	344 Thesis
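
The abstract contrasts dictionary-sized TF vectors with low-dimensional vectors built from positive, negative, and negation words. The sketch below illustrates that contrast only. This record does not define the components of V4D or V8D, so the four-feature layout, the toy word lists, and the rule that a negation word affects only the next token are all assumptions made for illustration, not the thesis's method.

```python
from collections import Counter

# Hypothetical toy lexicons; the thesis's actual word lists are not given in this record.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}
NEGATION = {"not", "never", "no"}


def tf_vector(tokens, vocabulary):
    """Term-frequency vector: one dimension per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def lexicon_vector(tokens):
    """Illustrative 4-dimensional vector (an assumed layout, not the thesis's V4D):
    [positive, negative, negated positive, negated negative] word counts.
    A sentiment word counts as negated when the previous token is a negation word.
    """
    vec = [0, 0, 0, 0]
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in NEGATION
        if tok in POSITIVE:
            vec[2 if negated else 0] += 1
        elif tok in NEGATIVE:
            vec[3 if negated else 1] += 1
    return vec


texts = [
    "the food was good and the staff was great".split(),
    "not good service and bad food".split(),
]
# The TF dimensionality grows with the vocabulary; the lexicon vector stays at 4.
vocabulary = sorted({tok for text in texts for tok in text})

for text in texts:
    print(len(tf_vector(text, vocabulary)), lexicon_vector(text))
```

With these two toy sentences the TF vectors already have one dimension per distinct word, while the lexicon-based vectors keep a fixed length regardless of vocabulary size, which is the space and time advantage the abstract claims for the proposed representations.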

Files in This Item:

File	Description	Size	Format
448170.pdf		3.31 MB	Adobe PDF

This item is licensed under a Creative Commons License