Wikipedia Q&A
Project Info
- Location: Pittsburgh, PA
- Start Date: September 2016
- End Date: December 2016
- Related: 11-411, Natural Language Processing
Project Features
- An asking program which takes in a Wikipedia article text file and a non-negative integer n, and produces n questions based on the article.
- An answering program which takes in a Wikipedia article text file and a text file of questions based on the article, and produces answers to those questions.
- Completed in a team of 2.
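As a rough illustration of the asking program's interface described above, here is a minimal sketch of how its two inputs might be parsed. The program name `ask`, the function names, and the argument names are hypothetical and not taken from the original project.

```python
import argparse


def non_negative_int(text):
    """Convert a command-line string to an int, rejecting negative values."""
    value = int(text)
    if value < 0:
        raise argparse.ArgumentTypeError("n must be a non-negative integer")
    return value


def parse_ask_args(argv):
    """Parse the two inputs of the (hypothetical) asking program."""
    parser = argparse.ArgumentParser(prog="ask")
    parser.add_argument("article", help="path to a Wikipedia article text file")
    parser.add_argument("n", type=non_negative_int,
                        help="number of questions to generate")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example invocation with made-up arguments.
    args = parse_ask_args(["article.txt", "5"])
    print(args.article, args.n)
```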
Overview
Siri, Google Assistant, Cortana, Bixby, Alexa... What do they have in common? I'll give you a hint: no, they are not tiny magical creatures living inside our electronics and answering our every demand. :P Instead, they are smart assistants that are becoming more and more prevalent in today's devices, and they rely heavily on a field of computer science called Natural Language Processing (NLP). I've always been curious about how these assistants are able to help us the way they do, and this project gave me some insight and a little taste of NLP.
I mainly worked on the answer-generation portion of the project. Some of the tools/concepts I used include:
- Python
- NLTK
- scikit-learn
- tf-idf and cosine similarity
- AMALGrAM 2.0 Supersense tagger
- Nodebox English Linguistics library
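To give a feel for how tf-idf and cosine similarity can support answer generation, here is a minimal sketch that picks the article sentence most similar to a question using scikit-learn. It is an illustration under my own assumptions, not the project's actual pipeline: the function name `best_sentence` and the sample sentences are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def best_sentence(sentences, question):
    """Return the sentence whose tf-idf vector is closest to the question's."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Learn the vocabulary and idf weights from the article sentences,
    # then project the question into the same vector space.
    sentence_vectors = vectorizer.fit_transform(sentences)
    question_vector = vectorizer.transform([question])
    # One similarity score per sentence; the highest score wins.
    scores = cosine_similarity(question_vector, sentence_vectors)[0]
    return sentences[scores.argmax()]


# Made-up sample sentences standing in for an article.
sentences = [
    "Pittsburgh is a city in the state of Pennsylvania.",
    "The city is known for its many steel bridges.",
    "Carnegie Mellon University was founded in 1900.",
]

if __name__ == "__main__":
    print(best_sentence(sentences, "When was Carnegie Mellon University founded?"))
```

A retrieved sentence like this is only a candidate: a full answering program would still need to extract or rephrase the specific answer from it.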
Content of the video report: