YouTube Transcript Summarizer
Chrome Extension
A
Major Project Report
Submitted
In partial fulfilment
BACHELOR OF TECHNOLOGY
In the Department of Computer Science and Engineering
Submitted by:
PRANJAL JAIN(18etccs073)
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that this project report “YouTube Transcript Summarizer Chrome
Extension” is the work of Pranjal Jain (18etccs073), who has carried out the project
work under my supervision. I approve this project for submission of the Bachelor of
Kota.
Project In-charge
ACKNOWLEDGEMENT
At the end I would like to express my sincere thanks to all my friends and
others who helped me directly or indirectly during this project work.
Pranjal Jain(18etccs073)
30/05/2022
ABSTRACT
1. Purpose
1.1. Introduction
This Software Requirements Specification provides a complete description of all
the functions and specifications of the Chrome Extension YouTube Transcript
Summarizer.
1.2. In this project, I have created a Chrome Extension that sends a request
to a backend REST API, which performs NLP on the transcript and responds
with a summarized version of the YouTube transcript.
1.3. Scope
The scope of this project is as follows. An enormous number of video recordings
are created and shared on the Internet every day. It has become difficult to
spend time watching videos that may run longer than expected, and the effort is
wasted if the viewer cannot find relevant information in them. Automatically
summarizing the transcripts of such videos lets us quickly look for the important
points in a video and saves the time and effort of going through its whole
content.
2. Document overview
The remainder of this document is organized into eight chapters. The first
chapter introduces the project and lists all the functions performed by the
system. The second chapter contains the software requirements specification. The
third chapter details the system analysis and design. The fourth chapter presents
the data dictionary. The fifth chapter contains screenshots of the complete
project. The sixth chapter covers testing of the project. The seventh chapter
presents the conclusion and future enhancements of the project. The final chapter
contains the bibliography.
TABLE OF CONTENTS
Acknowledgement
Abstract
Table of Contents
List of Tables
List of Figures
2.1. Purpose
2.5. Software Specification
List of Tables
List of Figures
List of Symbols
Term Definition
QA Quality assurance
CHAPTER – I
INTRODUCTION
Introduction
1.1. Purpose
1.1.1. Introduction
This Software Requirements Specification provides a complete description of all
the functions and specifications of the Chrome Extension YouTube Transcript
Summarizer.
In this project, I have created a Chrome Extension that sends a request to a
backend REST API, which performs NLP on the transcript and responds with a
summarized version of the YouTube transcript.
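As a rough illustration of the first steps of this flow, the sketch below shows how a backend might pull the video ID out of a YouTube URL and join transcript segments into one text before summarizing. It is a minimal sketch under stated assumptions, not the project's actual code: the function names `extract_video_id` and `join_transcript` are hypothetical, and the transcript segments are assumed to be dictionaries with a `text` key.

```python
from urllib.parse import urlparse, parse_qs

# Hostnames accepted for standard watch links (an assumption for this sketch)
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_video_id(url):
    """Return the video ID from a YouTube watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in WATCH_HOSTS and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def join_transcript(segments):
    """Concatenate transcript segments (assumed dicts with a 'text' key)."""
    return " ".join(seg["text"].strip() for seg in segments if seg.get("text"))
```

The joined text would then be handed to the summarization step, and the result returned to the extension in the API response.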
1.1.2. Scope
The scope of this project is as follows. An enormous number of video recordings
are created and shared on the Internet every day. It has become difficult to
spend time watching videos that may run longer than expected, and the effort is
wasted if the viewer cannot find relevant information in them. Automatically
summarizing the transcripts of such videos lets us quickly look for the important
points in a video and saves the time and effort of going through its whole
content.
1.1.3. References
• Building RESTful APIs with Flask in Python BoilerPlate
• HuggingFace Transformer Python Installation
• YouTube Transcript API Documentation
• How to Perform Text Summarization using Transformers in Python
• The Ultimate Guide to Building a Chrome Extension
• How to Create Chrome Extensions
• Design the user interface
• Content Scripts
• Message Passing in Chrome
• How to use XMLHttpRequest to issue HTTP requests
• Language Translator Using Google API in Python
• Parsing REST API Payload and Query Parameters With Flask
1.1.4. Document overview
The remainder of this document is organized into eight chapters. The first
chapter introduces the project and lists all the functions performed by the
system. The second chapter contains the software requirements specification and
all the dependencies. The third chapter details the system analysis and design.
The fourth chapter covers backend programming and the data dictionary. The fifth
chapter covers Chrome Extension development. The sixth chapter covers testing of
the project, checking that the extension works properly without distracting from
the browsing experience. The seventh chapter presents the conclusion and future
enhancements of the project. The final chapter contains the bibliography.
This document describes all the features of the Extension and the procedures
followed while developing it. In particular, it details how the project was
developed, its primary requirements, its features and functionality, and the
procedures followed to achieve these objectives.
YouTube Transcript Summarizer is a Chrome Extension that provides a short
summary of the contents of a video, so that users can save time and get the
value they were seeking from the video as quickly as possible.
For example, consider a student watching a video to understand a topic for their
studies. My aim is to help the student by transcribing the audio, generating
subtitles for the video, and summarizing that content so that the student can
understand the topic in the fastest and simplest way. The benefit is that the
Extension summarizes the content, which helps the student learn and make notes.
This Chrome Extension transcribes the audio of a video and generates a short
summary of its content. It is useful for people who are looking for a specific
video, and most useful in video conferences for making summarized notes.
The Chrome Extension leads the user to a page showing the summary of a YouTube
video as text or as summarized audio, whichever form of summarization they
require.
The system can serve users in several ways: it covers not only YouTube video
summarization but also videos from other websites and video conferences from
different regions, with language-based summarization so that users can
understand the content in their own language.
• Summarization in text
• Summarization in audio
• Make notes
• Understand the main highlights
• Create questionnaires for exam purposes
• Quick Revision
1.2.4. Constraints
The video should not be too long: the summarizer has a word limit, and a video
that exceeds it may cause an error, so that no summary is generated.
Multilingual support for the audio summary is not available in the Extension.
CHAPTER – II
SOFTWARE REQUIREMENT SPECIFICATION
Software Requirement Specification and Dependencies
2.1. Purpose
2.1.1. Introduction
This Software Requirements Specification provides a complete description of all
the functions and specifications of the Chrome Extension YouTube Transcript
Summarizer.
2.1.2. Scope
This project is useful for quickly finding exactly the content we are looking
for.
2.1.3. Glossary
Table 2.1
Term Definition
QA Quality assurance
2.1.4. References
• Content Scripts
• Message Passing in Chrome
• How to use XMLHttpRequest to issue HTTP requests
• Language Translator Using Google API in Python
• Parsing REST API Payload and Query Parameters With Flask
It lists all the functions performed by the system. The final chapter gives
details of each of the system functions, actions, and dependencies in full, for
the assistance of Chrome application developers. These two sections are
cross-referenced by topic to increase understanding for the developers and
users involved.
YouTube Transcript Summarizer is designed for every person who needs to find
satisfactory content in a short amount of time.
2.2.1. Functional requirements definitions
Functional Requirements are those that refer to the functionality of the
system, i.e., what services it will provide to the user. Nonfunctional
(supplementary) requirements pertain to other information needed to produce the
correct system and are detailed separately.
The system can serve users in several ways: it covers not only YouTube video
summarization but also videos from other websites and video conferences from
different regions, with language-based summarization so that users can
understand the content in their own language.
• Summarization in text
• Summarization in audio
• Make notes
• Understand the main highlights
• Create questionnaires for exam purposes
• Quick Revision for study with summarized material
Fig. 2.1 Accessing Chrome Extension
Brief Description:
To initiate this use case, the user accesses the Chrome Extension YouTube
Transcript Summarizer as follows:
1. The user connects to the system using a web browser, which must be the Chrome
browser.
2. The user selects the Extensions icon at the top-right corner of the Chrome
browser, which looks like a small greyish puzzle piece.
3. The system takes the user to the Chrome Extensions page, where all the
Extensions are available.
4. The user finds the search box on the Chrome Extensions page and types the
name of the extension, “YTSUMMARIZER”.
5. The user clicks on the Extension that appears in the search results and then
clicks the blue “Add to Chrome” button; the extension is added to the user's
Chrome browser.
6. The user should then pin the extension using the pin icon next to the
extension's name.
7. Once pinned, the extension can be accessed directly whenever needed, without
searching for it again.
Brief Description:
The user does not need to log in or sign up to access a Chrome Extension if
they already have a Chrome browser account.
Initial step-by-step description:
For this use case to be initiated, the user must be on the Chrome browser.
1. The system takes the user to the Chrome Extensions page, where all the
Extensions are available.
2. The user finds the search box on the Chrome Extensions page and types
the name of the extension, “YTSUMMARIZER”.
3. The user clicks on the Extension that appears in the search results and
then clicks the blue “Add to Chrome” button; the extension is added to the
user's Chrome browser.
4. The user should then pin the extension using the pin icon next to the
extension's name.
5. Once pinned, the extension can be accessed directly whenever needed,
without searching for it again.
Fig. 2.3 Admin has provided two forms of summarization: Text and Audio.
The admin has provided two forms of summarization for the user: text and audio.
1. The user clicks the button for whichever form of summarization they want.
3. When the user selects text summarization, processing starts if the video has
subtitles available, and the summary is then shown on the same page in a
popup-like box, from which it can be copied.
4. When the user selects the audio summarization button, a summarized audio clip
is processed and played.
5. The user can then make notes, or record the audio on a recording device by
playing it aloud.
Knowledge of how to use an extension, or at least of what extensions are.
where all the Extensions are
available.
Other
coloured button and the
extension will be added to the
user's Chrome browser.
4. The user should then pin the
extension using the pin icon
next to the extension's name.
5. Once pinned, the extension can
be accessed directly without
searching for it again.
Alternate Path: Pinning the extension is up to the
user.
Other
Hard Disk: All
Server Side:
Processor: All
RAM: All
Disk space: All
User Side:
Chrome Browser
Chrome Extension icon on Chrome browser
Database Server:
All
Processor: All
Chrome Browser (92, 93): an updated version, if not the latest
RAM: All
Hard Disk: All
Disk space: All
Software Requirements:
Microsoft Visual Studio is an integrated development
environment (IDE) from Microsoft. It can be used to develop
console and graphical user interface applications along with
Windows Forms applications, web sites, web applications, and
web services in both native code together with managed code
for all platforms supported by Microsoft Windows, Windows
Mobile, Windows CE, .NET Framework, .NET Compact
Framework and Microsoft Silverlight.
Python 3.10 (the latest version)
Pipeline
Flask, flask_RESTful
Transformers
YouTubeTranscriptApi
TensorFlow
CHAPTER – III
SYSTEM ANALYSIS AND DESIGN
System Analysis and Design
Text summarization is the task of shortening long pieces of text into a concise
summary that preserves the key information and overall meaning. Two approaches
are widely used for text summarization:
• Extractive Summarization: the model identifies the important sentences and
phrases in the original text and outputs only those.
• Abstractive Summarization: the model produces a completely different text that
is shorter than the original, generating new sentences in a new form, just as
humans do.
In this project, we use transformers for the abstractive approach.
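For contrast with the transformer-based approach used in this project, the extractive approach can be illustrated with a small frequency-based sketch. This is only an illustration of the idea, not the method used in the Extension: the function name `extractive_summary` and the scoring rule (average word frequency per sentence) are assumptions chosen for simplicity.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, scored by average word frequency."""
    # Split on whitespace that follows sentence-ending punctuation
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top ones, restore original order
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because the output consists only of sentences copied from the input, this is extractive. An abstractive model instead generates new sentences, which is why it needs a trained language model rather than simple counting.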
Speed
Accuracy
Larger videos eligible for summarization
Summarization of videos that have no subtitles.
economic, and time factors. The purpose of the study is to determine if the system
request should proceed further.
3.3.1. Does the New System Contribute to the Overall Objectives of the Extension?
The new system would contribute to the overall objectives of the Extension. It
would provide a quick, error-free, and zero-cost solution to the current process.
It would provide a solution to many issues in the current system. As the new
system is flexible and scalable, it can also be upgraded and extended to meet
other complex requirements that may be raised in the future.
limit. The new system will generate the result as soon as the summarization is
processed by user and will also store it in the database for future usage.
1.4 High-quality Audio
The new system makes it easy to store and retrieve information as required. Users
do not have to store information themselves, since storage is handled
automatically from the cloud. It thus avoids the data management problems faced
in the current system, as it has a Database Management System with only
one-time access.
1.5 Zero Cost and No Advertisements
This extension is a unique service provider, as it shows no advertisements and
provides its service at no cost.
The DFD (also known as a bubble chart) is a simple graphical formalism that
can be used to represent a system in terms of the data input to the system, the
various processes carried out on these data, and the output data generated by
the system.
The main reason the DFD technique is so popular is that the DFD is a very
simple formalism: it is simple to understand and use. A DFD model uses a very
limited number of primitive symbols to represent the functions performed by a
system and the data flow among those functions. Starting with a set of
high-level functions that a system performs, a DFD model hierarchically
represents the various sub-functions.
Fig. 3.1 DFD level 0
Fig. 3.2 DFD level 1
3.3 UML Modelling
3.6.3 Context Diagram
The context diagram is a top-level view of an information system that shows the
boundaries and scope. It describes the main objective of the system and the entities
involved.
CHAPTER – IV
DATA DICTIONARY
Data Dictionary
Field Name Description Constraints Size Data Type
4.2. E-R Diagram
CHAPTER – V
SCREEN SHOTS
5.1. Google Chrome Browser
5.1.2. Chrome Extensions icon
5.1.4. Searching the chrome extension YTSummarizer
5.1.5. YTSummarizer chrome extension add to chrome blue button
5.1.6. Chrome Extension for YouTube Transcript Summarizer
YTSummarizer
5.2.14. Result
CHAPTER – VI
TESTING
Testing
Testing Methodology
People no longer rely mainly on books for reference or research; for almost
everything they want to find out, they search in a browser. To make such
searches easier, the YTSummarizer extension was created to generate a summary
of a YouTube video in text or audio format, saving time and delivering the
required content. A sound backend leads to sound development, and testing is
the most important part of that.
Rising customer expectations for fault-free systems that meet requirements
exactly have increased awareness of the importance of software testing as a
critical activity.
We begin the testing process by developing a comprehensive plan to test the general
functionality and special features on a variety of platform combinations. Strict
quality control procedures are used. The process verifies that the application
meets the requirements specified in the system requirements document and is bug
free. At the end of each testing day, we prepare a summary of completed and failed
tests. Applications are not allowed to launch until all identified problems are fixed.
A report is prepared at the end of testing to show exactly what was tested and to list
the final outcomes.
Our software testing methodology is applied in three distinct phases: unit testing,
system testing, and acceptance testing.
Unit Testing: The programmers conduct unit testing during the development phase.
Programmers can test their specific functionality individually or with other units.
However, unit testing is designed to test small pieces of functionality rather than the
system as a whole. This allows the programmers to conduct the first round of
testing to eliminate bugs before they reach the testing staff. In unit testing the
analyst tests the programs making up a system.
For this reason, unit testing is sometimes called program testing. Unit testing
focuses on the modules independently of one another, to find errors. This helps
the tester detect errors in coding and logic that are contained within that module
alone. Errors resulting from the interaction between modules are initially
avoided.
For example, a hotel information system consists of modules to handle
reservations; guest check-in and checkout; restaurant, room service and
miscellaneous charges; convention activities; and accounts receivable billing. For
each, it provides the ability to enter, modify or retrieve data and respond to different
types of inquiries or print reports. The test cases needed for unit testing should
exercise each condition and option.
Unit testing can be performed from the bottom up, starting with the smallest and
lowest-level modules and proceeding one at a time. For each module in bottom-up
testing, a short program is used to execute the module and provide the needed data,
so that the module is asked to perform the way it will when embedded within the
larger system.
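A short driver of the kind described above can be sketched with Python's built-in unittest module. The module under test, `count_words`, is a hypothetical stand-in for a low-level module and is not taken from the project; the driver supplies the data and checks the results independently of the rest of the system.

```python
import unittest


def count_words(text):
    """Hypothetical low-level module under test: count whitespace-separated words."""
    return len(text.split())


class CountWordsTest(unittest.TestCase):
    """Exercises the module with the data it would receive inside the larger system."""

    def test_empty_text(self):
        self.assertEqual(count_words(""), 0)

    def test_extra_whitespace(self):
        self.assertEqual(count_words("  two   words "), 2)


def run_driver():
    """Driver: load the test case, run it, and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountWordsTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling `run_driver()` runs the module in isolation, so errors in its own coding and logic surface before it is combined with other modules.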
System Testing: The objective of system testing is to ensure that all individual
programs are working as expected, that the programs link together to meet the
requirements specified and to ensure that the computer system and the associated
clerical and other procedures work together.
The initial phase of system testing is the responsibility of the analyst, who
determines what conditions are to be tested, generates test data, produces a schedule
of expected results, runs the tests, and compares the computer-produced results with
the expected results.
The analyst may also be involved in procedures testing. When the analyst is
satisfied that the system is working properly, he hands it over to the users for
testing. The importance of system testing by the user must be stressed. Ultimately,
it is the user who must verify the system and give the go-ahead.
During testing, the system is used experimentally to ensure that the software does
not fail, i.e., that it will run according to its specifications and in the way users
expect it to. Special test data is input for processing (test plan) and the results are
examined to locate unexpected results.
A limited number of users may also be allowed to use the system so analysts can
see whether they try to use it in unexpected ways. It is preferable to find these
surprises before the organization implements the system and depends on it. In many
organizations, testing is performed by persons other than those who write the
original programs. Using persons who do not know how certain parts were
designed or programmed ensures more complete and unbiased testing and more
reliable software.
The system is tested as a complete, integrated system. System testing first occurs in
the development environment but eventually is conducted in the production
environment. Functionality and performance testing are designed to catch bugs in
the system, unexpected results, or other ways in which the system does not meet the
stated requirements.
The testers create detailed scenarios to test the strength and limits of the system,
trying to break it if possible. Editorial reviews not only correct typographical and
grammatical errors, but also improve the system’s overall usability by ensuring that
on-screen language is clear and helpful to users. Accessibility reviews ensure that
the system is accessible to users with disabilities.
i. Program testing
ii. String testing
iii. System testing
iv. System documentation
v. User acceptance testing
Program Testing
A program represents the logical elements of a system. For a program to run
satisfactorily, it must compile and test data correctly and tie in properly with other
programs. It is the responsibility of a programmer to have an error-free program. At
the time of testing the system, there exist two types of errors that should be
checked: syntax errors and logic errors.
A syntax error is a program statement that violates one or more rules of the
language in which it is written. An improperly defined field dimension or omitted
key words are common syntax errors. These errors are shown through error
messages generated by the computer. A logic error, on the other hand, deals with
incorrect data fields, out-of-range items, and invalid combinations.
Since logic errors are not detected by the compiler, the programmer must examine
the output carefully to detect them. When a program is tested, the actual output is
compared with the expected output. When there is a discrepancy, the sequence of
instructions must be traced to determine the problem. The process is facilitated
by breaking the program down into self-contained portions, each of which can be
checked at certain key points.
String Testing
Programs are invariably related to one another and interact in a total system. Each
program is tested to see whether it conforms to related programs in the system.
Each part of the system is tested against the entire module with both test and live
data before the whole system is ready to be tested.
System Testing
System testing is designed to uncover weaknesses that were not found in earlier
tests. This includes forced system failure and validation of total system as it will be
implemented by its user in the operational environment. Under this testing,
generally we take low volumes of transactions based on live data. This volume is
increased until the maximum level for each transaction type is reached.
The total system is also tested for recovery and fallback after various major failures
to ensure that no data are lost during the emergency.
All this is done with the old system still in operation. When we see that the
proposed system is successful in the test, the old system is discontinued.
System Documentation
All design and test documentation should be well prepared and kept in the library
for future reference. The library is the central location for maintenance of the new
system.
an acceptance test is actually the user's show. User motivation is very important for
the successful performance of the system. After that a comprehensive test report is
prepared. This report shows the system's tolerance, performance range, error rate
and accuracy.
A. INTERFACE TESTING
1) User-friendliness OK
2) Consistent menus NA
B. CONTROL FLOW TESTING
1) IF-THEN-ELSE OK
2) DO WHILE OK
3) CASE-SWITCH OK
C. VALIDATION TESTING
1) Check for improper or inconsistent typing OK
2) Check for erroneous initialization or default values OK
3) Check for incorrect variable names OK
4) Check for inconsistent Data Types OK
D. DATA INTEGRITY/SECURITY TESTING
1) Data Insertion/ Deletion/ Updating OK
CHAPTER – VII
7.1. Limitations
The new system has been designed to meet almost all of the user requirements, but
it has certain limitations, some of which can be addressed in future
enhancements or updates.
The existing system's modules and models can only generate a summary of up
to 1024 words, and sometimes fewer, because of limited accuracy. The new system
will be able to overcome this in future, with updated models used in the
backend.
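One possible way to work within such a limit, offered here only as an assumption and not as the approach used in this project, is to split a long transcript into chunks of at most 1024 words and summarize each chunk separately. The helper name `chunk_words` is hypothetical, and the limit is treated as a plain word count as stated above; models generally count tokens rather than words, so a real limit may need a tokenizer.

```python
def chunk_words(text, max_words=1024):
    """Split text into consecutive chunks of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list in strides of max_words
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk could then be summarized on its own and the partial summaries joined, at the cost of losing some context across chunk boundaries.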
The existing system does not provide a high rate of accuracy, and it is hard to
depend on this model alone for future work, because an inaccurate summary might
cause us to miss some good video content.
The existing system imposes an eligibility criterion on YouTube videos: because
it cannot transcribe unclear audio to summarize a video, it can generate a
summary, in whatever format is required, only for videos with built-in subtitles.
The new system would contribute to the overall objectives of the Extension. It
would provide a quick, error-free, and zero-cost solution to the current process.
It would provide a solution to many issues in the current system. As the new
system is flexible and scalable, it can also be upgraded and extended to meet
other complex requirements that may be raised in the future.
7.2.3 Accuracy
One of the most important drawbacks of the current system is that the audio is
not at its best accuracy and cannot be generated for longer videos because of the
word limit. The new system will generate the result as soon as the summarization
is processed by the user and will also store it in the database for future use.
7.2.4 High-quality Audio
The new system makes it easy to store and retrieve information as required. Users
do not have to store information themselves, since storage is handled
automatically from the cloud. It thus avoids the data management problems faced
in the current system, as it has a Database Management System with only
one-time access.
7.3 Conclusion
Proper design builds upon this foundation to give a blueprint, which is
actually implemented by the developers.
On realizing the importance of systematic documentation, all the processes
were implemented using a software engineering approach. Working in a live
environment enables one to appreciate the intricacies involved in the System
Development Life Cycle (SDLC).
I have gained a lot of practical knowledge from this project, which, I think,
will stand me in good stead in the future.
CHAPTER – VIII
BIBLIOGRAPHY
Bibliography