Author: Hassan, Noha Yehia. / Title: Credibility Detection Approach in Social Media

Title

Credibility Detection Approach in Social Media

Author

Hassan, Noha Yehia.

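Committee

Researcher / Noha Yehia Hassan (نهى يحيى حسن)

Supervisor / Mohamed Hassan Haggag (محمد حسن حجاج)

Supervisor / Ghada Khoriba (غادة خريبة)

Supervisor / Wael Hassan Gomaa (وائل حسن جمعة)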

Subject

Social media. Online social networks.

Publication date

2020.

Number of pages

Various pages

Language

English

Degree

Doctorate

Specialization

Computer Science (miscellaneous)

Date of approval

5/5/2020

Place of approval

Helwan University - Faculty of Computers and Information - Computer Science

Index

Only 14 of 137 pages are available for public view

Abstract

With the evolution of social media platforms, the Internet has become a common source of
news about current events. Twitter has recently become one of the most popular social
media platforms that allow the public to share news. The platform is growing rapidly,
especially among young people, who may be influenced by information from anonymous
sources. Predicting the credibility of news on Twitter has therefore become a necessity,
especially during emergencies.
In this thesis, we propose four models for credibility prediction and test them on two
datasets in two languages (English and Arabic). The first model relies on extracting an
extensive set of content and source-related features. Five different classifiers are
trained on different feature sets to determine whether content features alone or source
features alone are good indicators of credibility. The best performance is achieved with
a combined set of content and source features and Random Forests as the classifier. The
second model focuses on textual features and uses word-based N-gram analysis. The
experiments examine two feature representations (TF and TF-IDF) and different word
N-gram ranges. The best results are achieved using a combination of unigrams and
bigrams, 30,000 TF-IDF extracted features, and Linear Support Vector Machines as the
classifier. The third model relies on semantic features extracted using the Skip-Thoughts
algorithm. Finally, a hybrid model that concatenates the feature vectors of the previous
three models is proposed; it shows significant improvements over each of the three
models.
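
The word N-gram representation used by the second model can be illustrated with a small, dependency-free sketch. This is not the thesis implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, the L2 normalization, and the three example texts are all assumptions made for illustration. Only the unigram-plus-bigram range and the 30,000-feature cap come from the abstract.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return word n-grams of length n_min..n_max (unigrams and bigrams by default)."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_vocabulary(docs, max_features=30000):
    """Keep the max_features most frequent n-grams and map each to a column index."""
    totals = Counter()
    for doc in docs:
        totals.update(word_ngrams(doc))
    # Rank by corpus frequency (descending), breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = sorted(gram for gram, _ in ranked[:max_features])
    return {gram: idx for idx, gram in enumerate(kept)}


def tfidf_vectors(docs, vocab):
    """Turn each document into an L2-normalized TF-IDF vector over vocab."""
    n_docs = len(docs)
    counts = [Counter(g for g in word_ngrams(d) if g in vocab) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0 for g in vocab}
    vectors = []
    for c in counts:
        vec = [0.0] * len(vocab)
        for gram, tf in c.items():
            vec[vocab[gram]] = tf * idf[gram]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            vec = [v / norm for v in vec]
        vectors.append(vec)
    return vectors


# Hypothetical example texts, not taken from the thesis datasets.
docs = [
    "breaking news flood in city",
    "flood warning issued for city",
    "cat video goes viral",
]
vocab = fit_vocabulary(docs)
vectors = tfidf_vectors(docs, vocab)
```

Each row of `vectors` could then be passed to a linear classifier. Dropping the IDF weighting (using the raw counts) would give the TF representation that the experiments also examine.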
The proposed models are evaluated using five machine learning classifiers and 10-fold
cross-validation. The best results are achieved using Linear Support Vector Machines,
with 85.3% accuracy, 89.2% precision, 91.6% recall, and 90.4% F-measure. Moreover, the
evaluation shows that the proposed hybrid model outperforms two existing models from
the literature on the same dataset.
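
The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the standard F1 definition (the harmonic mean of precision and recall); the abstract does not state which F-measure variant was used, and the function name is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.892
reported_recall = 0.916

f1 = f_measure(reported_precision, reported_recall)
f1_percent = round(f1 * 100, 1)  # 90.4, matching the reported F-measure
```

Under that assumption, the three reported figures are mutually consistent.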