nxli | Nancy Li

Wikipedia Q&A

Project Info

Location: Pittsburgh, PA
Start Date: September 2016
End Date: December 2016
Related: 11-411, Natural Language Processing

Project Features

An asking program which takes in a Wikipedia article text file and a non-negative integer n, and produces n questions based on the article.
An answering program which takes in a Wikipedia article text file and a text file of questions based on the article, and produces answers to those questions.
Completed in a team of 2.

Overview

Siri, Google Assistant, Cortana, Bixby, Alexa... What do they have in common? I'll give you a hint: no, they are not tiny magical creatures living inside our electronics and answering our every demand. :P Instead, they are smart assistants that are becoming more and more prevalent in today's devices, and they rely heavily on a field of computer science called Natural Language Processing (NLP). I've always been curious about how these assistants are able to help us the way they do, and this project gave me some insight and a little taste of NLP.

I mainly worked on the answer-generation portion of the project. Some of the tools/concepts I used include:

Python
NLTK
scikit-learn
tf-idf and cosine similarity
AMALGrAM 2.0 Supersense tagger
NodeBox English Linguistics library
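
As a rough illustration of the tf-idf and cosine similarity item in the list above, here is a minimal sketch of how a question can be matched to the article sentence most likely to contain its answer. It is not the project's actual code: it uses only the Python standard library instead of scikit-learn, and the names tokenize, tfidf_vectors, cosine and best_sentence, as well as the sample sentences, are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf, so that no term ever gets a zero or negative weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_sentence(sentences, question):
    """Return the sentence most similar to the question, with its score."""
    # Vectorize sentences and question together so they share one idf table.
    vectors = tfidf_vectors(sentences + [question])
    question_vec = vectors[-1]
    scores = [cosine(vec, question_vec) for vec in vectors[:-1]]
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best], scores[best]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    article = [
        "Pittsburgh is a city in Pennsylvania.",
        "The Steelers play football in Pittsburgh.",
        "Carnegie Mellon University was founded in 1900.",
    ]
    print(best_sentence(article, "When was Carnegie Mellon University founded?"))
```

The selected sentence is only a candidate source for the answer; turning it into an actual answer would still need further processing, such as the tagging and linguistic tools listed above.
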

Content of the video report:

Question generation: 0:05
- Sentence preprocessing: 0:11
- Making questions: 1:11
- Question scoring: 2:29
Question answering: 3:03
- Source and question preprocessing: 3:10
- Answer generation: 3:56