Sentiment analysis of the Burmese language using the distributed representation of n-gram-based words

Myat lay phyu

Please use this identifier to cite or link to this item: http://kb.psu.ac.th/psukb/handle/2016/19011

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Kiyota Hashimoto	-
dc.contributor.author	Myat lay phyu	-
dc.date.accessioned	2023-11-02T02:51:02Z	-
dc.date.available	2023-11-02T02:51:02Z	-
dc.date.issued	2018	-
dc.identifier.uri	http://kb.psu.ac.th/psukb/handle/2016/19011	-
dc.description	Thesis (M.Sc., Information Technology)--Prince of Songkla University, 2018	en_US
dc.description.abstract	Due to the availability of people's opinions and customer reviews, the need to analyze those texts has become more important. Sentiment analysis, or opinion mining, estimates their polarity, that is, whether they are positive or negative, using machine learning techniques. Many methods have been proposed, but they assume basic preprocessing of the text data, including word segmentation and word sentiment values. However, such preprocessing is not easily available for low-resource languages such as Burmese, Khmer, and Lao because large annotated corpora and basic natural language processing tools are unavailable. The objective of this research is to address these difficulties of low-resource language processing. The goal is to propose an effective and efficient method that enables sentiment analysis without considering language-specific characteristics. The scope of the research is languages written without word boundaries, specifically Burmese. The methodology consists of two proposals: a character-based variable-length n-gram word model and a word grouping method that uses word similarities calculated with distributed word representation models. The proposed method is compared with a Conditional Random Field (CRF) baseline approach, which is also newly proposed in this thesis, and achieves results similar to CRF-based word segmentation with a small amount of supervised data. The proposed method is also validated on a larger data set of Amazon product reviews. Thus, the methods proposed in this thesis provide an effective and efficient way to process low-resource languages without focusing on language-specific characteristics.	en_US
dc.language.iso	en	en_US
dc.publisher	Prince of Songkla University	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Thailand	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/th/	*
dc.subject	Natural language processing (Computer science)	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Sentiment analysis	en_US
dc.subject	Burmese language	en_US
dc.title	Sentiment analysis of the Burmese language using the distributed representation of n-gram-based words	en_US
dc.type	Thesis	en_US
dc.contributor.department	College of Computing (Information Technology)	-
dc.contributor.department	วิทยาลัยการคอมพิวเตอร์ สาขาเทคโนโลยีสารสนเทศ	-
Appears in Collections:	976 Thesis
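
The abstract mentions a character-based variable-length n-gram word model for text written without word boundaries. This record does not describe the thesis's actual procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: each Unicode code point is treated as one "character" (Burmese syllables often span several code points, so a real system might segment differently), whitespace is ignored, and the function names `char_ngrams` and `ngram_counts` and the length bounds `min_n` and `max_n` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, min_n=1, max_n=3):
    """Return every character n-gram of length min_n..max_n.

    Assumption: one Unicode code point counts as one character and
    whitespace is dropped, so no word segmentation is needed.
    """
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the character sequence.
        for i in range(len(chars) - n + 1):
            grams.append("".join(chars[i:i + n]))
    return grams


def ngram_counts(texts, min_n=1, max_n=3):
    """Count n-gram occurrences over a collection of texts."""
    counts = Counter()
    for t in texts:
        counts.update(char_ngrams(t, min_n, max_n))
    return counts


if __name__ == "__main__":
    # Latin letters are used here only to keep the illustration readable.
    print(char_ngrams("abc", 1, 2))  # ['a', 'b', 'c', 'ab', 'bc']
```

The resulting variable-length n-grams could then serve as pseudo-words for a distributed representation model; how the thesis selects or groups them is not stated in this record.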

Files in This Item:

File	Description	Size	Format
432309.pdf		1.39 MB	Adobe PDF

This item is licensed under a Creative Commons License