



Faculty of Agriculture, Dalhousie University, Truro, NS, Canada B2N 5E. Email: tayab.soomro@dal.ca
Dept. of Computer Science, Dalhousie University, Halifax, NS, Canada B3H 4R. Email: ab249429@dal.ca
Abstract—With the increasing number of gadgets (e.g., cellphones, smart-watches, laptops, etc.), an enormous amount of data is generated every second. It is equally crucial to be able to glean important information from large amounts of data quickly and efficiently. This is where text summarizers come into the picture. Text summarizers are machine learning models which, given a large document, return a summarized version. In this project, we sought to compare two types of text summarizers (extraction and abstraction) on the WikiHow dataset in order to gain insight into which method works best for article-style documents. We implemented SumBasic as our extraction summarizer and Seq2Seq (i.e., sequence-to-sequence) as our abstraction summarizer. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) method for evaluation, SumBasic obtained ROUGE-1 F-measures of 22.6%, 25.2%, and 39.6% and ROUGE-L F-measures of 9.7%, 15.0%, and 13.19% on the three test documents, respectively. The Seq2Seq model obtained ROUGE-1 F-measures of 19.8%, 20.6%, and 30.0% and ROUGE-L F-measures of 15.4%, 14.7%, and 18.9% on the same three documents, respectively. This analysis concludes that the SumBasic model is marginally superior to the Seq2Seq model in terms of ROUGE scores.
In this day and age, the amount of textual data being generated by various technological means is phenomenal. As an example, there are roughly 5 million tweets sent every day [1], and some 2 million research articles are published each year [2]. With all this data being generated at an unprecedented pace, it is extremely important to be able to automatically parse these gigantic document sets and to extract the most important, summarized information from them. The applications for such tools are endless. Search engines such as Google try to summarize the information from a web page or PDF document right within the search results, eliminating the need for users to actually
go into the webpage and find the answer manually. This is especially true for general knowledge questions. For example, if you type the search query "what is inertia?" into Google, you will get a card-like view (shown in Figure 1) containing the most relevant summary, obtained from Wikipedia. This information is generated using some form of text summarization technique.
Figure 1. Search result from Google when queried with the sentence: “what is inertia?”.
There are various other applications for text summarization techniques, for example in academia. With enormous numbers of research articles being written and published every day in countless journals, it is paramount for various parties, such as medical and industry professionals, to be able to summarize papers and read only the important points in order to make informed decisions.
The aim of this project is to compare two different automatic text summarization techniques for summarizing WikiHow articles. The two techniques are compared in terms of their performance on the given WikiHow dataset. The summaries generated by the two summarizers are evaluated using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) method, which compares the N-grams of a reference summary with the system-generated summary. Based on this comparison and the evaluation scores, we conclude with a recommendation of a single summarizer for this type of dataset.
This dataset was created from the online WikiHow platform and can be downloaded from https://ucsb.app.box.com/s/ap23l8gafpezf4tq3wapr6u8241zz358. The WikiHow dataset contains 215,364 articles. Each article has multiple paragraphs which, combined together, form the article to be summarized. Apart from the article text, each record also has a Headline field, which serves as the reference summary of the article, and a Title field, which gives the title of the article. Figure 2 below describes the dataset.
Figure 2. The metadata for the dataset used in this analysis. A more detailed description can be found at https://github.com/mahnazkoupaee/WikiHow-Dataset.
For our assignment, the WikiHow dataset was divided into three parts: training, development, and test sets.
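As a minimal sketch of the loading and splitting step (the file name, column names, and 80/10/10 proportions below are assumptions based on the dataset's GitHub page, not necessarily the exact values used):

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # 'text' holds the full article and 'headline' its reference summary
    df = pd.read_csv("wikihowAll.csv").dropna()

    # Illustrative 80/10/10 split into training, development, and test sets
    train, rest = train_test_split(df, test_size=0.2, random_state=42)
    dev, test = train_test_split(rest, test_size=0.5, random_state=42)
    print(len(train), len(dev), len(test))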
Data clean-up and pre-processing is one of the main steps that needs to be performed in any machine learning project, to ensure that the model is built on quality data. Data pre-processing is the process of extracting relevant information from raw data. Raw data obtained from its sources will likely contain inconsistencies such as null values and other deformities. It is crucial to clean these up before model training, which helps in achieving optimal and accurate scores [4]. In our case, the WikiHow dataset contained null values, emoticons, symbols, contractions, and other deformities that needed to be cleaned. In this assignment, we performed multiple steps to clean and process the data. First, we dropped the columns which were unnecessary for modelling. After eliminating the redundant columns, we performed the following data pre-processing operations on the textual data:
A custom function was developed using the regular expression library to remove hyperlinks, hashtags, emoticons, and contractions. We also used natural language processing techniques to remove stop words and to lemmatize words. The most common stop words, such as "I", "am", etc., tend to skew the model results. Lemmatization is a technique whereby words are transformed into their lemma; for example, the word "walking" becomes "walk". To perform these modifications, the prebuilt NLTK library for Python was used.
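A minimal sketch of such a cleaning function, assuming NLTK's stop-word list and WordNet lemmatizer (the exact regular expressions and contraction handling in our implementation differ slightly):

    import re
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    STOP_WORDS = set(stopwords.words("english"))  # requires nltk.download("stopwords")
    LEMMATIZER = WordNetLemmatizer()              # requires nltk.download("wordnet")

    def clean_text(text):
        """Lower-case, strip hyperlinks/hashtags/symbols, drop stop words, lemmatize."""
        text = text.lower()
        text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # hyperlinks
        text = re.sub(r"#\w+", " ", text)                   # hashtags
        text = re.sub(r"[^a-z\s]", " ", text)               # symbols, emoticons, digits
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        return " ".join(LEMMATIZER.lemmatize(t) for t in tokens)

    print(clean_text("I am walking to https://example.com #fitness :)"))  # -> "walking"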
Quite similar to other components of machine learning, data visualization is a crucial component, as it provides visual insight into the kind of data the model is dealing with. More accurate and predictable insights can be drawn from a model whose input data is well understood. As such, we created various visualizations of the input data to our model. We also did a comparative analysis of the data before and after the pre-processing step, to assess the impact pre-processing has on the data.
Figure 3. Frequency of words in articles before data pre-processing.

The figure above plots the article lengths: the x-axis shows the number of words in an article's text and the y-axis shows the count of articles with that length. The large lengths of some articles can be explained by characters added to the article text that do not count towards the total number of characters allowed.
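A matplotlib sketch of how such a plot can be produced, assuming the DataFrame loaded earlier (the column name 'text' is an assumption from the dataset description):

    import matplotlib.pyplot as plt

    # Histogram of article lengths (in words), as in Figure 3
    word_counts = df["text"].str.split().str.len()
    plt.hist(word_counts, bins=50)
    plt.xlabel("Number of words in article")
    plt.ylabel("Number of articles")
    plt.title("Article length distribution before pre-processing")
    plt.show()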
Algorithm 1: SumBasic
initialize an empty summary;
Step 1: compute the probability distribution over all words w_i in the document, p(w_i) = n_i / N, where n_i is the number of occurrences of w_i and N is the total number of words;
while the desired summary length is not reached do
    Step 2: calculate the weight of each sentence S_j as the average of the probabilities of the words it contains;
    Step 3: pick the sentence S_j which scores the best and add it to the summary;
    for each word w_i in S_j do
        update the probability of w_i: p_new(w_i) = p_old(w_i) · p_old(w_i);
    end
end
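A compact Python sketch of Algorithm 1; the tokenization and the summary-length stopping criterion are simplified assumptions:

    from collections import Counter

    def sumbasic(sentences, max_words=100):
        """Minimal SumBasic sketch. `sentences` is a list of pre-tokenized
        sentences (lists of lower-cased words); returns the selected sentences."""
        total = sum(len(s) for s in sentences)
        counts = Counter(w for s in sentences for w in s)
        p = {w: c / total for w, c in counts.items()}      # Step 1: p(w_i) = n_i / N

        summary, used = [], set()
        while sum(len(s) for s in summary) < max_words and len(used) < len(sentences):
            # Step 2: weight each remaining sentence by its average word probability
            scores = {i: sum(p[w] for w in s) / len(s)
                      for i, s in enumerate(sentences) if i not in used and s}
            if not scores:
                break
            best = max(scores, key=scores.get)             # Step 3: best-scoring sentence
            summary.append(sentences[best])
            used.add(best)
            for w in sentences[best]:                      # squash seen-word probabilities
                p[w] = p[w] * p[w]
        return summary

    # Hypothetical usage with three toy sentences:
    doc = [["cats", "like", "mats"], ["dogs", "like", "mats"], ["birds", "fly"]]
    print(sumbasic(doc, max_words=5))

The probability update in the inner loop is what prevents SumBasic from repeatedly selecting sentences dominated by the same frequent words.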
3.1.2. Seq2Seq. For textual data that involves sequential information, a sequence-to-sequence model can be developed. In our assignment, we built a text summarization model using the concept of sequence-to-sequence modelling, where the inputs were articles and the outputs were the predicted summaries of those articles. The figure below shows the sequence-to-sequence architecture.
Figure 5. A representative model diagram for Sequence-to-Sequence text summarization model. Image borrowed from Analytics Vidhya
The sequence-to-sequence model developed in the assignment has two core components: an encoder and a decoder. The encoder side was supplied with the tokenized articles as input, and the output side was supplied with the maximum vocabulary of the summaries, both built using the Keras tokenizer [6]. The training steps, sketched below, followed the standard encoder-decoder recipe.
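A minimal Keras sketch of such an encoder-decoder model (the layer sizes, vocabulary limits, and sequence lengths below are illustrative assumptions, not the exact values used):

    from tensorflow.keras.layers import Input, LSTM, Embedding, Dense
    from tensorflow.keras.models import Model

    MAX_ARTICLE_LEN = 400   # assumed maximum article length (tokens)
    MAX_SUMMARY_LEN = 40    # assumed maximum summary length (tokens)
    ARTICLE_VOCAB = 20000   # assumed tokenizer vocabulary sizes
    SUMMARY_VOCAB = 8000
    LATENT_DIM = 256

    # Encoder: embeds article tokens and compresses them into state vectors
    encoder_inputs = Input(shape=(MAX_ARTICLE_LEN,))
    enc_emb = Embedding(ARTICLE_VOCAB, LATENT_DIM)(encoder_inputs)
    _, state_h, state_c = LSTM(LATENT_DIM, return_state=True)(enc_emb)

    # Decoder: generates the summary token by token, seeded with encoder states
    decoder_inputs = Input(shape=(MAX_SUMMARY_LEN,))
    dec_emb = Embedding(SUMMARY_VOCAB, LATENT_DIM)(decoder_inputs)
    dec_outputs, _, _ = LSTM(LATENT_DIM, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])
    outputs = Dense(SUMMARY_VOCAB, activation="softmax")(dec_outputs)

    model = Model([encoder_inputs, decoder_inputs], outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")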
3.1.3. Hyperparameter tuning and prediction on the test set. To find the optimal parameters for our machine learning algorithms, which were trained using the training and development datasets, we performed hyperparameter tuning. For the TensorFlow Keras model, we selected the parameters based on the lowest loss obtained during the epoch runs, and these parameters were used to predict the target summaries for the test file.
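As a sketch, the lowest-loss criterion can be expressed with standard Keras callbacks; the dataset variable names and the patience value below are illustrative assumptions:

    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    # Keep the weights from the epoch with the lowest validation loss
    callbacks = [
        EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True),
        ModelCheckpoint("best_seq2seq.h5", monitor="val_loss", save_best_only=True),
    ]

    # train_articles / dev_articles etc. are placeholders for the tokenized,
    # padded splits; summaries_in is the summary shifted right for teacher
    # forcing, summaries_out the summary shifted left (the targets).
    model.fit([train_articles, train_summaries_in], train_summaries_out,
              validation_data=([dev_articles, dev_summaries_in], dev_summaries_out),
              epochs=20, batch_size=64, callbacks=callbacks)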
3.1.4. Evaluation. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is a metric for evaluating the automatic text summarization results of machine learning models. The evaluation is reference-based and thus relies on a reference summary against which the system-generated summary is compared. In the most basic terms, ROUGE scores a summary based on how much of the reference summary is recovered by the system summary, and on how much of the system summary is relevant to the reference. These two concepts are referred to as recall and precision, respectively [7]. To see why both are needed for the final score, consider a hypothetical pair of summaries: if the reference is "the cat sat on the mat" and the system output is "the cat was found right on the mat", then five of the six reference words are recovered (recall = 5/6 ≈ 0.83), while only five of the eight system words match the reference (precision = 5/8 ≈ 0.63).
These two quantities are combined into a single F-score:

    F-score = 2 · (precision · recall) / (precision + recall)

For the hypothetical pair above, this gives 2 · (5/8 · 5/6) / (5/8 + 5/6) = 5/7 ≈ 0.71. For the evaluation of the two models, we used ROUGE-1 and ROUGE-L and their respective F-scores to make the final conclusions about the choice of model. ROUGE-1 uses uni-grams of a summary to compare the system and reference summaries; a uni-gram is just a single word, so it compares the individual words of the summaries (as in the example mentioned above). In contrast to ROUGE-1, ROUGE-L uses the longest common subsequence of words to evaluate a summary. This reflects sentence-level word order and is thus a good complementary measure of summary quality.
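In practice, these scores can be computed with an off-the-shelf package. The sketch below uses the open-source rouge-score package, which is an assumption for illustration, not necessarily the implementation we used:

    # pip install rouge-score
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    reference = "the cat sat on the mat"
    system = "the cat was found right on the mat"

    scores = scorer.score(reference, system)   # score(target, prediction)
    for name, s in scores.items():
        print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F={s.fmeasure:.2f}")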
In this project, we set out to compare two automatic text summarization methods: extraction and abstraction. For the extraction method we used SumBasic, and for the abstraction method we used Seq2Seq. We used WikiHow articles as our input dataset and generated summaries for those articles. Using the ROUGE evaluation metric, we compared the two models on ROUGE-1, ROUGE-L, and their respective F-scores. The scores for our models are tabulated in Tables 1 and 2. Based on the evaluation of the two models, we concluded that for the given WikiHow dataset, SumBasic performs marginally better.
4.0.1. Comparison of ROUGE between SumBasic and Seq2Seq. As shown in Tables 1 and 2, Seq2Seq has an average F-score of 23% for ROUGE-1 and 16% for ROUGE-L, whereas SumBasic has an average F-score of 29% for ROUGE-1 and 13% for ROUGE-L. To our surprise, we find that SumBasic outperforms, albeit marginally, the Seq2Seq model on ROUGE-1. SumBasic has an average ROUGE-1 recall of 23%, compared with 16% for Seq2Seq, indicating that SumBasic on average recovers 23% of the words that are in the reference summary. On average ROUGE-1 precision, Seq2Seq (52%) is somewhat ahead of SumBasic (45%). As mentioned in the model evaluation earlier, the precision of a model indicates how relevant the output results are; in our case, roughly half of the words in a generated summary are relevant.

Seq2Seq        Text 1   Text 2   Text 3   Average
ROUGE-1
  Precision     0.36     0.28     0.92     0.52
  Recall        0.14     0.16     0.17     0.16
  F-Score       0.20     0.20     0.29     0.23
ROUGE-L
  Precision     0.28     0.20     0.60     0.36
  Recall        0.11     0.12     0.11     0.11
  F-Score       0.15     0.15     0.19     0.16
TABLE 1. EVALUATION OF THE SEQ2SEQ MODEL USING PRECISION, RECALL, AND F-SCORE FOR ROUGE-1 AND ROUGE-L.
SumBasic       Text 1   Text 2   Text 3   Average
ROUGE-1
  Precision     0.41     0.51     0.44     0.45
  Recall        0.15     0.17     0.36     0.23
  F-Score       0.23     0.25     0.40     0.29
ROUGE-L
  Precision     0.19     0.30     0.15     0.21
  Recall        0.06     0.10     0.13     0.10
  F-Score       0.10     0.15     0.14     0.13
TABLE 2. EVALUATION OF THE SUMBASIC MODEL USING PRECISION, RECALL, AND F-SCORE FOR ROUGE-1 AND ROUGE-L.
It is quite interesting to find that SumBasic, a frequentist approach which does not use any high-end machine learning machinery such as feature engineering or hyperparameter tuning, outperforms Seq2Seq, a very versatile model which leverages a series of encoders and decoders. One of the key characteristics of SumBasic is its frequentist approach to generating extraction-based summaries. Briefly, the frequentist approach decides whether to add a sentence to the summary based on the score of the sentence, which is the average of the probabilities of all the words in that sentence. Therefore, sentences which contain highly frequent words will have higher scores and, by extension, a higher chance of being selected into the summary. This is a very basic concept; however, it has profound effects. It has previously been shown in the literature that this phenomenon, that highly frequent words are likely to be included in the summary, is to some extent also true of human-generated summaries [3].
[8] Vashisht, A. (2019, November 5). Edmundson Heuristic Method for text summarization. OpenGenus IQ: Learn Computer Science. https://iq.opengenus.org/edmundson-heuristic-method-for-text-summarization/
[9] Misra, S. (n.d.). Let's give some 'Attention' to Summarising Texts. Towards Data Science. Retrieved November 30, 2020, from https://towardsdatascience.com/lets-give-some-attention-to-summarising-texts-d0af2c4061d
[10] Nallapati, R., Zhou, B., Dos Santos, C., Gulcehre, C., & Xiang, B. (2016). Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. 280–290. https://doi.org/10.18653/v1/K16-