
Term Project : Overview

Your term project is to design a project similar to the homework assignments you have completed in this class. Your final project will consist of the following components:

  1. A description of the problem, in the style and format of the homework assignment descriptions.
  2. Training and evaluation data.
  3. A commented implementation of the simplest possible solution to the problem. For instance, this could be a majority class baseline or a random baseline.
  4. A commented implementation of a baseline published in the literature, along with skeleton code obtained by removing the parts that students should implement.
  5. One extension per team member that attempts to improve on the baseline, along with a brief (one- to three-paragraph) accompanying write-up for each extension describing the general approach and whether it worked.
  6. An evaluation script that can be used to score submissions, as on the class leaderboard. The output of any model implementation should be gradeable with this script.
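As a concrete illustration of component 3 above, a majority class baseline for a classification task can be only a few lines long. This sketch is hypothetical; the label values and example inputs are made up:

```python
from collections import Counter

def train_majority_baseline(labels):
    """Return the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]

def predict(majority_label, examples):
    """Predict the majority label for every example, ignoring the input."""
    return [majority_label for _ in examples]

# Hypothetical usage with binary sentiment labels:
train_labels = ["pos", "neg", "pos", "pos", "neg"]
majority = train_majority_baseline(train_labels)
print(predict(majority, ["great movie", "terrible plot"]))  # prints: ['pos', 'pos']
```

A baseline this simple is still useful: it tells you the score any real system has to beat.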

We’ll vote on the best projects, and the best ones will be available for other students to complete as an optional final homework assignment.

We’re going to split up the work on the term project into several deliverables, each with its own due date. You don’t have to wait to start working on each part of the project. We encourage you to begin work early, so that you have a polished final product.

Milestones and Due Dates

Here are the milestones for the term project:
  • Mar 14, 2018 - Milestone 1 - Form a team and submit three project ideas.
  • Mar 28, 2018 - Milestone 2 - Collect your data, and write an evaluation script and a simple baseline.
  • Apr 11, 2018 - Milestone 3 - Implement a published baseline.
  • Apr 18, 2018 - Milestone 4 - Submit your project writeup and a pitch video. Finish one of your extensions to the public baseline (no late days allowed).
  • Apr 18, 2018 - Vote on your favorite projects from the class.
  • Apr 25, 2018 - Milestone 5 - Finish all your extensions to the public baseline.
  • Apr 25, 2018 - Do one or more of your classmates' projects.

Milestone 1

For Milestone 1, you’ll need to form a team and come up with 3 project ideas. For each idea you should describe:

  1. A problem definition (1 to 2 paragraphs, plus an illustrative example)
  2. A pointer to two or more papers or textbook sections that describe the problem
  3. What evaluation metrics could be used to score system outputs
  4. What type of data you will need for evaluation, and how much data is available

The term project is a team exercise. The minimum team size is 4, and the max team size is 6. If you need help finding a team, you can post on this Piazza thread.

Project Ideas

You should identify what topic you would like to work on. Pretty much any topic in natural language processing is fair game. The first milestone for the term project is to pick 3 topic ideas that your team might be interested in exploring. The course staff will help assess the feasibility of your ideas and will make a recommendation of which of your 3 initial ideas is the best fit for the scope of the term project.

The NLP community has a great tradition of “shared tasks”. Many of these are perfect for a term-project for this class, since they give you a great starting point for a problem definition, training and test data, a standard evaluation metric, and lots of published baselines. Here are some pointers to shared tasks that were featured at CoNLL, SemEval, WMT, and Kaggle.

You are welcome to choose a shared task topic or to develop your own topic, provided that it is related to NLP.

CoNLL Shared Tasks

The Conference on Computational Natural Language Learning (CoNLL) hosts a shared task every year. Here are the past CoNLL shared tasks:

  1. Multilingual Parsing from Raw Text to Universal Dependencies
  2. Universal Morphological Reinflection
  3. Multilingual Shallow Discourse Parsing
  4. Shallow Discourse Parsing
  5. Grammatical Error Correction
  6. Modelling Multilingual Unrestricted Coreference in OntoNotes
  7. Modelling Unrestricted Coreference in OntoNotes
  8. Hedge Detection
  9. Syntactic and Semantic Dependencies in Multiple Languages
  10. Joint Parsing of Syntactic and Semantic Dependencies
  11. Dependency Parsing: Multilingual & Domain Adaptation
  12. Multi-Lingual Dependency Parsing
  13. Semantic Role Labeling
  14. Language-Independent Named Entity Recognition
  15. Clause Identification
  16. Chunking
  17. NP Bracketing


SemEval Shared Tasks

The International Workshop on Semantic Evaluation (SemEval) hosts a range of shared tasks every year. Here are links to the SemEval tasks:


SemEval 2017

  1. Semantic Textual Similarity
  2. Multi­lingual and Cross­-lingual Semantic Word Similarity
  3. Community Question Answering
  4. Sentiment Analysis in Twitter
  5. Fine-Grained Sentiment Analysis on Financial Microblogs and News
  6. #HashtagWars: Learning a Sense of Humor
  7. Detection and Interpretation of English Puns
  8. RumourEval: Determining rumour veracity and support for rumours
  9. Abstract Meaning Representation Parsing and Generation
  10. Extracting Keyphrases and Relations from Scientific Publications
  11. End-User Development using Natural Language
  12. Clinical TempEval


SemEval 2016

  1. Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation
  2. Interpretable Semantic Textual Similarity
  3. Community Question Answering
  4. Sentiment Analysis in Twitter
  5. Aspect-Based Sentiment Analysis
  6. Detecting Stance in Tweets
  7. Determining Sentiment Intensity of English and Arabic Phrases
  8. Meaning Representation Parsing
  9. Chinese Semantic Dependency Parsing
  10. Detecting Minimal Semantic Units and their Meanings
  11. Complex Word Identification
  12. Clinical TempEval
  13. TExEval-2 – Taxonomy Extraction
  14. Semantic Taxonomy Enrichment


SemEval 2015

  1. Paraphrase and Semantic Similarity in Twitter
  2. Semantic Textual Similarity
  3. Answer Selection in Community Question Answering
  4. TimeLine: Cross-Document Event Ordering
  5. QA TempEval
  6. Clinical TempEval
  7. Diachronic Text Evaluation
  8. SpaceEval
  9. CLIPEval Implicit Polarity of Events
  10. Sentiment Analysis in Twitter
  11. Sentiment Analysis of Figurative Language in Twitter
  12. Aspect Based Sentiment Analysis
  13. Multilingual All-Words Sense Disambiguation and Entity Linking
  14. Analysis of Clinical Text
  15. A CPA dictionary-entry-building task
  16. Taxonomy Extraction Evaluation
  17. Semantic Dependency Parsing


SemEval 2014

  1. Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment
  2. Grammar Induction for Spoken Dialogue Systems
  3. Cross-Level Semantic Similarity
  4. Aspect Based Sentiment Analysis
  5. L2 Writing Assistant
  6. Supervised Semantic Parsing of Spatial Robot Commands
  7. Analysis of Clinical Text
  8. Broad-Coverage Semantic Dependency Parsing
  9. Sentiment Analysis in Twitter
  10. Multilingual Semantic Textual Similarity


SemEval 2013

  1. TempEval-3 Temporal Annotation
  2. Sentiment Analysis in Twitter
  3. Spatial Role Labeling
  4. Free Paraphrases of Noun Compounds
  5. Evaluating Phrasal Semantics
  6. Semantic Textual Similarity (becomes *Sem Shared Task)
  7. The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
  8. Cross-lingual Textual Entailment for Content Synchronization
  9. Extraction of Drug-Drug Interactions from BioMedical Texts
  10. Cross-lingual Word Sense Disambiguation
  11. Evaluating Word Sense Induction & Disambiguation within An End-User Application
  12. Multilingual Word Sense Disambiguation
  13. Word Sense Induction for Graded and Non-Graded Senses
  14. The Coarse-Grained and Fine-Grained Chinese Lexical Sample and All-Words Task


SemEval 2012

  1. English Lexical Simplification
  2. Measuring Degrees of Relational Similarity
  3. Spatial Role Labeling
  4. Evaluating Chinese Word Similarity
  5. Chinese Semantic Dependency Parsing
  6. Semantic Textual Similarity
  7. COPA: Choice Of Plausible Alternatives, an evaluation of commonsense causal reasoning
  8. Cross-lingual Textual Entailment for Content Synchronization

Previous years


Kaggle

Kaggle is a platform for machine learning competitions where people compete to produce the best models for a huge range of different datasets. Companies often offer rewards for their competitions. There are tons of cool datasets and competitions that you can base your final project on.

Here are a few relevant competitions:

You can also check out the Linguistics tag and the Languages tag for lots of other ideas. Want 130,000 wine reviews with their ratings, or 55,000 song lyrics? Find them on Kaggle.

Course staff ideas

Here is a list of potential project ideas that were brainstormed by the course staff. One of them is to study how Reddit users describe themselves: you can query the public Reddit comments dataset on Google BigQuery for self-identification statements, for example:

  SELECT author, body FROM [fh-bigquery:reddit_comments.2015_05] WHERE LENGTH(body) < 255 AND LENGTH(body) > 30 AND 
  (body LIKE 'i\'m a %'
  or body LIKE 'I\'m a %'
  or body LIKE 'a\'m a %'
  or body LIKE 'i\'m an %'
  or body LIKE 'I\'m an %'
  or body LIKE 'a\'m an %') LIMIT 10000;

And you get back a set of usernames and how they self-identified. Can you develop self-identification queries for different demographic information, retrieve all of the comments from those users, and then analyze the language differences between groups like men vs. women or straight people vs. gays and lesbians?
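One simple way to surface language differences between two groups of comments is to compare smoothed per-word relative frequencies. The sketch below is hypothetical and uses invented toy data; a real analysis would want a more robust statistic (e.g., log-odds with an informative prior):

```python
from collections import Counter

def freq_ratio(comments_a, comments_b, alpha=1.0):
    """Rank words by smoothed relative-frequency ratio between two groups."""
    counts_a = Counter(w for c in comments_a for w in c.lower().split())
    counts_b = Counter(w for c in comments_b for w in c.lower().split())
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    # Add-alpha smoothing so words unseen in one group don't divide by zero
    scores = {
        w: ((counts_a[w] + alpha) / (total_a + alpha * len(vocab)))
           / ((counts_b[w] + alpha) / (total_b + alpha * len(vocab)))
        for w in vocab
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example with invented comments:
group_a = ["i love hiking and climbing", "climbing is great"]
group_b = ["i love cooking", "cooking and baking are great"]
ranked = freq_ratio(group_a, group_b)
print(ranked[0][0])  # prints: climbing
```

Words at the top of the ranking are characteristic of the first group; words at the bottom, of the second.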

What do you need to turn in?

For Milestone 1, you’ll need to turn in writeups for your 3 project ideas.

For the whole project, here’s a provisional list of the deliverables that you’ll need to submit:

  1. The final version of your write-up, incorporating any additional changes to your revised draft.
  2. A brief description of your task and the included code.
  3. data-train/: A directory containing the training data.
  4. data-dev/: A directory containing the development data for local evaluation.
  5. data-test/: A directory containing the test data for leaderboard evaluation.
  6. default: A full implementation of the default system.
  7. baseline: A skeleton of the baseline system to be provided to students.
  8. baseline-solution: A full implementation of the baseline system.
  9. extension-1, extension-2, …: Full implementations of the extensions, one per group member.
  10. A brief write-up describing your extensions and their performance.
  11. grade-dev: A grading script for local evaluation. This may be a wrapper around a generic grading script grade.
  12. grade-test: A grading script for leaderboard evaluation. This may be a wrapper around a generic grading script grade.
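For reference, a generic grade script can be a short program that compares a predictions file to a gold-standard file line by line. The accuracy metric and one-label-per-line file format below are just one hypothetical choice; your task may call for a different metric:

```python
import sys

def accuracy(gold_labels, predicted_labels):
    """Fraction of positions where the prediction matches the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predictions differ in length")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)

def main(gold_path, pred_path):
    # Assumes one label per line in each file
    with open(gold_path) as f:
        gold = [line.strip() for line in f]
    with open(pred_path) as f:
        pred = [line.strip() for line in f]
    print(f"Accuracy: {accuracy(gold, pred):.4f}")

if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The grade-dev and grade-test wrappers would then just call a script like this with the appropriate data directory.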