Abstract

Automatic text summarization (ATS) for long documents is a challenging task. A long document covers more than one topic, so the generated summary must capture the most important information from each topic in the input document. ATS has attracted researchers since the 1950s, and many ATS techniques exist, but the generated summaries remain far less accurate than human-written ones. Accordingly, automatic summarization is considered one of the most challenging tasks in Natural Language Processing, especially for long documents. ATS approaches include extractive, abstractive, and hybrid methods. This thesis focuses on the extractive approach based on neural networks. It enhances an existing extractive neural attentive model for summarizing long documents using two methods. The existing model uses a bidirectional Gated Recurrent Unit (GRU) as the sentence encoder and averaged word embeddings as the word encoder, together with a Multi-Layer Perceptron (MLP) consisting of one linear layer with a ReLU activation function for dimensionality reduction. The first proposed method replaces this setup with two bidirectional GRUs serving as word and sentence encoders; two ensembles are then built by applying an average ensemble to the proposed model with each of two different neural models separately. The second proposed method replaces the MLP with one consisting of two linear layers with ReLU activations. Both methods are evaluated with the ROUGE-1, ROUGE-2, and ROUGE-L metrics. The first method, evaluated on the PubMed dataset, shows promising improvements of 0.14% and 0.97% for ROUGE-1, 0.33% and 1.12% for ROUGE-2, and 0.25% for ROUGE-L. The second method is evaluated on the PubMed and arXiv datasets. On PubMed it shows promising improvements of 1.20–1.50% for ROUGE-1, 1.29–1.54% for ROUGE-2, and 1.13–1.33% for ROUGE-L; on arXiv, 0.10–1.87% for ROUGE-1, 0.10–1.84% for ROUGE-2, and 0.003–1.48% for ROUGE-L.
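The architectural changes described above can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the thesis implementation: the class name HierarchicalExtractor, the embedding and hidden dimensions, the mean-pooling of word-level GRU states into sentence vectors, the final sigmoid scoring layer, and the omission of the attention mechanism are all simplifying assumptions. It combines the two proposed changes in one module: bidirectional GRUs at both the word and sentence levels (first method) and a two-layer MLP with ReLUs for dimensionality reduction (second method).

    import torch
    import torch.nn as nn

    class HierarchicalExtractor(nn.Module):
        """Illustrative hierarchical extractive scorer (assumed structure):
        a word-level BiGRU builds sentence vectors, a sentence-level BiGRU
        adds document context, and a two-layer MLP with ReLUs reduces the
        dimensionality before a final scoring layer."""

        def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            # Word encoder: BiGRU over the tokens of each sentence
            # (replaces the original word-embedding averaging).
            self.word_gru = nn.GRU(emb_dim, hidden_dim,
                                   bidirectional=True, batch_first=True)
            # Sentence encoder: BiGRU over the sequence of sentence vectors.
            self.sent_gru = nn.GRU(2 * hidden_dim, hidden_dim,
                                   bidirectional=True, batch_first=True)
            # Two linear layers with ReLUs (second proposed method),
            # followed by an assumed single scoring unit.
            self.mlp = nn.Sequential(
                nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim // 2), nn.ReLU(),
                nn.Linear(hidden_dim // 2, 1),
            )

        def forward(self, doc):
            # doc: (num_sents, max_words) tensor of token ids for one document.
            words = self.embedding(doc)                        # (S, W, E)
            word_out, _ = self.word_gru(words)                 # (S, W, 2H)
            sent_vecs = word_out.mean(dim=1)                   # (S, 2H), assumed pooling
            sent_out, _ = self.sent_gru(sent_vecs.unsqueeze(0))  # (1, S, 2H)
            scores = self.mlp(sent_out.squeeze(0)).squeeze(-1)   # (S,)
            return torch.sigmoid(scores)  # per-sentence extraction probability

Under the same assumptions, the average ensemble of the first method would combine the per-sentence probabilities of two trained models, e.g. probs = 0.5 * (model_a(doc) + model_b(doc)), and the highest-scoring sentences would then be extracted as the summary.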