Your term project is to design a project similar to the homework assignments you have completed
in this class. Your final project will consist of the following components:
A description of the problem, in the style and format of the homework assignment descriptions.
Training and evaluation data.
A commented implementation of the simplest possible solution to the problem. For instance, this could be a majority class baseline or a random baseline.
A commented implementation of a baseline published in the literature, along with
skeleton code obtained by removing the parts that students should implement.
One extension per team member that attempts to improve on the baseline, along
with a brief (one- to three-paragraph) accompanying write-up for each extension
describing the general approach and whether it worked.
An evaluation script that can be used to score
submissions, as on the class leaderboard. The output of any model
implementation should be gradeable with this script.
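To make the "simplest possible solution" concrete, here is a minimal sketch of a majority-class baseline paired with an accuracy-based evaluation function. The labels and function names here are invented for illustration; your project will define its own data formats and metric.

```python
from collections import Counter

def majority_baseline(train_labels, test_size):
    """Predict the most frequent training label for every test example."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * test_size

def accuracy(gold, predicted):
    """Simple evaluation metric: fraction of predictions that match gold."""
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Toy data: the majority training label is "pos".
train_labels = ["pos", "pos", "neg", "pos"]
gold = ["pos", "neg", "pos"]
preds = majority_baseline(train_labels, len(gold))
print(accuracy(gold, preds))  # 2 of 3 gold labels are "pos", so 0.666...
```

A random baseline would simply replace `majority_baseline` with a uniform draw from the training label set.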
We’ll vote on the projects, and the best ones will be available for other students to complete as an optional final homework assignment.
We’re going to split the work on the term project into several deliverables, each with its own due date. You don’t have to wait to start working on each part of the project; we encourage you to begin early so that you have a polished final product.
Milestones and Due Dates
Here are the milestones for the term project:
Mar 14, 2018 - Milestone 1 - Form a team and submit three project ideas.
Mar 28, 2018 - Milestone 2 - Collect your data, and write an evaluation script and a simple baseline.
Apr 11, 2018 - Milestone 3 - Implement a published baseline.
Apr 18, 2018 - Milestone 4 - Submit your project writeup and a pitch video, and finish one of your extensions to the public baseline (no late days allowed).
Apr 18, 2018 - Vote on your favorite projects from the class.
Apr 25, 2018 - Milestone 5 - Finish all your extensions to the public baseline.
Apr 25, 2018 - Do one or more of your classmates' projects.
For Milestone 1, you’ll need to form a team and come up with 3 project ideas. For each idea you should describe:
A problem definition (1 to 2 paragraphs, plus an illustrative example)
A pointer to two or more papers or textbook sections that describe the problem
What evaluation metrics you could use to score system outputs
What type of data you will need for evaluation, and how much data is available
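When you propose evaluation metrics, it helps to be precise about how they are computed. As one example, here is a sketch of precision, recall, and F1 for a binary task; the `positive` label name is just a placeholder.

```python
def precision_recall_f1(gold, predicted, positive="yes"):
    """Compute precision, recall, and F1 for one class of interest."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["yes", "yes", "no", "no"]
pred = ["yes", "no", "yes", "no"]
print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```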
The term project is a team exercise. The minimum team size is 4, and the max team size is 6. If you need help finding a team, you can post on this Piazza thread.
You should identify what topic you would like to work on. Pretty much any topic in natural language processing is fair game. The first milestone for the term project is to pick 3 topic ideas that your team might be interested in exploring. The course staff will help assess the feasibility of your ideas and will make a recommendation of which of your 3 initial ideas is the best fit for the scope of the term project.
The NLP community has a great tradition of “shared tasks”. Many of these are perfect for a term-project for this class, since they give you a great starting point for a problem definition, training and test data, a standard evaluation metric, and lots of published baselines. Here are some pointers to shared tasks that were featured at CoNLL, SemEval, WMT, and Kaggle.
You are welcome to choose a shared task topic or to develop your own, provided that it is related to NLP.
CoNLL Shared Tasks
The Conference on Computational Natural Language Learning (CoNLL) hosts a shared task every year. Here are the past CoNLL shared tasks:
Multilingual Parsing from Raw Text to Universal Dependencies
Universal Morphological Reinflection
Multilingual Shallow Discourse Parsing
Shallow Discourse Parsing
Grammatical Error Correction
Modelling Multilingual Unrestricted Coreference in OntoNotes
Modelling Unrestricted Coreference in OntoNotes
Hedge Detection
Syntactic and Semantic Dependencies in Multiple Languages
Joint Parsing of Syntactic and Semantic Dependencies
Kaggle
Kaggle is a platform for machine learning competitions where people compete to produce the best models for a huge range of different datasets. Companies often offer rewards for their competitions. There are tons of cool datasets and competitions that you can base your final project on.
Here is a list of potential project ideas that were brainstormed by the course staff:
Rank scalar adjectives. Adjectives like good, tasty, yummy, delicious, scrumptious all describe some property of a noun (how good something tastes), but they vary in intensity. Can you write an algorithm to put them in the correct order by intensity? For instance, good < tasty < yummy < delicious < scrumptious. Here are some good papers about ranking scalar adjectives:
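One family of approaches in the literature ranks adjectives using lexical patterns such as "X but not Y", which suggests that Y is more intense than X. Here is a toy sketch of that idea; the mini-corpus is made up, and a real system would count pattern matches over a large web corpus.

```python
import re
from collections import defaultdict

# Invented mini-corpus for illustration only.
corpus = [
    "the soup was good but not tasty",
    "it was tasty but not delicious",
    "good but not delicious food",
]

def intensity_order(adjectives, sentences):
    """Rank adjectives by how often each appears on the weak side of
    the pattern 'X but not Y', which hints that Y is stronger than X."""
    weaker_count = defaultdict(int)
    pattern = re.compile(r"(\w+) but not (\w+)")
    for sentence in sentences:
        for weak, strong in pattern.findall(sentence):
            if weak in adjectives and strong in adjectives:
                weaker_count[weak] += 1
    # Adjectives that show up as "weaker" more often sort first.
    return sorted(adjectives, key=lambda a: -weaker_count[a])

print(intensity_order(["delicious", "good", "tasty"], corpus))
# ['good', 'tasty', 'delicious']
```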
Order prenominal modifiers. In English, prenominal modifiers must come in a certain order. It sounds fluent to say the big beautiful white wooden house, but not the white wooden beautiful big house. Here’s a good NLP paper describing a class-based approach to ordering prenominal modifiers.
You could collect all of the prenominal modifiers from a large parsed corpus like the WaCky corpora or the Annotated Gigaword, and then train a model to predict their order. Here’s a rule from a grammar book about what order adjectives are supposed to come in. Is it true?
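A very simple starting point is to count how often one modifier precedes another in training sequences, and order new pairs accordingly. Here is a toy sketch; the observed sequences are invented, and in practice they would come from a parsed corpus.

```python
from collections import defaultdict
from itertools import combinations

# Invented examples of observed prenominal modifier sequences.
observed = [
    ["big", "beautiful", "white", "wooden"],
    ["big", "white"],
    ["beautiful", "wooden"],
]

# Count how often each ordered pair was seen in that order.
precedes = defaultdict(int)
for seq in observed:
    for first, second in combinations(seq, 2):
        precedes[(first, second)] += 1

def order_pair(a, b):
    """Order two modifiers by which order was seen more often in training."""
    return (a, b) if precedes[(a, b)] >= precedes[(b, a)] else (b, a)

print(order_pair("white", "big"))  # ('big', 'white')
```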
Predict the star rating of Amazon reviews. Amazon released a collection of 130 million customer reviews from 1995 until 2015. How well can you predict the star rating based on the text of the review? There are tons of papers on NLP and sentiment analysis. Chapter 6 and Chapter 18 of the textbook are a good place to start.
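As a naive first baseline for this idea, you could average the star ratings of the reviews each word appears in, then score a new review by averaging its words' scores. Here is a toy sketch with invented training reviews; a published baseline would of course do much better.

```python
from collections import defaultdict

def train_word_ratings(reviews):
    """Average the star ratings of the reviews each word appears in."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, stars in reviews:
        for word in set(text.lower().split()):
            totals[word] += stars
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

def predict_stars(text, word_ratings, default=3.0):
    """Predict a rating by averaging the known words' average ratings."""
    scores = [word_ratings[w] for w in text.lower().split() if w in word_ratings]
    return sum(scores) / len(scores) if scores else default

train = [("great phone", 5), ("terrible battery", 1), ("great battery", 4)]
ratings = train_word_ratings(train)
print(predict_stars("great great phone", ratings))  # (4.5 + 4.5 + 5) / 3 ≈ 4.67
```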
Help assess depression and risks of self-harm in social media. One way that NLP might be used for social good is to try to identify people who are at risk of suicide based on their social media posts. There’s a ton of good academic work on this.
Generate text descriptions of images. There’s been a lot of cool work that combines computer vision and natural language processing. One thread of that research tries to generate captions for images. A good overview is provided in these papers:
What can we learn about people on social media through self-identification? Google BigQuery has a set of all Reddit comments since 2005. Past NLP work on mental health examined social media users who self-identified as having clinical depression or PTSD by looking for public comments like “I was diagnosed with depression”. Can we use self-identification to learn the patterns of language use of different demographic groups? Here’s a search through a subset of the Reddit comments for the phrase “I’m a [blank]”:
You get back a set of usernames and how they self-identified. Can you develop self-identification queries for different demographic information, retrieve all of the comments from those users, and then analyze the language differences between groups like men v. women or straight people v. gays and lesbians?
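Here is a toy sketch of the matching step, run with a regular expression over a handful of invented comments rather than the actual BigQuery results:

```python
import re

# Invented sample comments; a real project would pull (user, comment)
# pairs from the BigQuery Reddit dump.
comments = [
    ("user1", "As background: I'm a nurse and I work nights."),
    ("user2", "I'm a teacher, ask me anything."),
    ("user3", "Honestly I just like the memes."),
]

pattern = re.compile(r"\bI'm a (\w+)", re.IGNORECASE)

def self_identifications(comments):
    """Map usernames to the noun they used in an "I'm a ..." statement."""
    found = {}
    for user, text in comments:
        match = pattern.search(text)
        if match:
            found[user] = match.group(1).lower()
    return found

print(self_identifications(comments))  # {'user1': 'nurse', 'user2': 'teacher'}
```

A real query would need more careful patterns (e.g. "I am a", "as a ...") and filtering of false positives like "I'm a little tired".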
What do you need to turn in?
For Milestone 1, you’ll need to turn in writeups for your 3 project ideas.
For the whole project, here’s a provisional list of the deliverables that you’ll need to submit:
report.md: The final version of your write-up, incorporating any additional changes to your revised draft (if any).
readme.md: A brief description of your task and the included code.
data-train/: A directory containing the training data.
data-dev/: A directory containing the development data for local evaluation.
data-test/: A directory containing the test data for leaderboard evaluation.
default: A full implementation of the default system.
baseline: A skeleton of the baseline system to be provided to students.
baseline-solution: A full implementation of the baseline system.
extension-1, extension-2, …: Full implementations of the extensions, one per group member.
extensions.md: A brief write-up describing your extensions and their performance.
grade-dev: A grading script for local evaluation. This may be a wrapper around a generic grading script grade.
grade-test: A grading script for leaderboard evaluation. This may be a wrapper around a generic grading script grade.
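As an illustration of how grade-dev and grade-test could wrap a generic grade script, here is a sketch of a line-by-line exact-match grader. The two-file, one-label-per-line format is an assumption; your task will define its own output format and metric.

```python
import sys

def grade(gold_file, predictions_file):
    """Generic grader: per-line exact-match accuracy between two files."""
    with open(gold_file) as g, open(predictions_file) as p:
        gold = [line.strip() for line in g]
        pred = [line.strip() for line in p]
    if len(gold) != len(pred):
        raise ValueError("prediction file has the wrong number of lines")
    return sum(a == b for a, b in zip(gold, pred)) / len(gold)

if __name__ == "__main__" and len(sys.argv) == 3:
    # grade-dev and grade-test could call this with their data paths baked in.
    print(grade(sys.argv[1], sys.argv[2]))
```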