Shared Tasks
The NLP community has a great tradition of “shared tasks”. Many of these are perfect for a term project for this class, since they give you a great starting point: a problem definition, training and test data, a standard evaluation metric, and lots of published baselines. Here are some pointers to shared tasks that were featured at CoNLL, SemEval, WMT, and Kaggle.
You are welcome to choose a shared task topic for your term project.
CoNLL Shared Tasks
The Conference on Computational Natural Language Learning (CoNLL) hosts a shared task every year. Here are the past CoNLL shared tasks:
- Multilingual Parsing from Raw Text to Universal Dependencies
- Universal Morphological Reinflection
- Multilingual Shallow Discourse Parsing
- Shallow Discourse Parsing
- Grammatical Error Correction (English)
- Modelling Multilingual Unrestricted Coreference in OntoNotes
- Modelling Unrestricted Coreference in OntoNotes (English)
- Hedge Detection (English)
- Syntactic and Semantic Dependencies in Multiple Languages
- Joint Parsing of Syntactic and Semantic Dependencies
- Dependency Parsing: Multilingual & Domain Adaptation
- Multi-Lingual Dependency Parsing
- Semantic Role Labeling (English)
- Language-Independent Named Entity Recognition
- Clause Identification
- Chunking
- NP Bracketing
SemEval
The International Workshop on Semantic Evaluation (SemEval) hosts a range of shared tasks every year. Here are links to the SemEval tasks:
SemEval-2017
- Semantic Textual Similarity
- Multilingual and Cross-lingual Semantic Word Similarity
- Community Question Answering
- Sentiment Analysis in Twitter
- Fine-Grained Sentiment Analysis on Financial Microblogs and News
- #HashtagWars. Learning a Sense of Humor
- Detection and Interpretation of English Puns
- RumourEval. Determining rumour veracity and support for rumours
- Abstract Meaning Representation Parsing and Generation
- Extracting Keyphrases and Relations from Scientific Publications
- End-User Development using Natural Language
- Clinical TempEval
SemEval-2016
- Semantic Textual Similarity. A Unified Framework for Semantic Processing and Evaluation
- Interpretable Semantic Textual Similarity
- Community Question Answering
- Sentiment Analysis in Twitter
- Aspect-Based Sentiment Analysis
- Detecting Stance in Tweets
- Determining Sentiment Intensity of English and Arabic Phrases
- Meaning Representation Parsing
- Chinese Semantic Dependency Parsing
- Detecting Minimal Semantic Units and their Meanings
- Complex Word Identification
- Clinical TempEval
- TExEval-2 – Taxonomy Extraction
- Semantic Taxonomy Enrichment
SemEval-2015
- Paraphrase and Semantic Similarity in Twitter
- Semantic Textual Similarity
- Answer Selection in Community Question Answering
- TimeLine. Cross-Document Event Ordering
- QA TempEval
- Clinical TempEval
- Diachronic Text Evaluation
- SpaceEval
- CLIPEval Implicit Polarity of Events
- Sentiment Analysis in Twitter
- Sentiment Analysis of Figurative Language in Twitter
- Aspect Based Sentiment Analysis
- Multilingual All-Words Sense Disambiguation and Entity Linking
- Analysis of Clinical Text
- A CPA dictionary-entry-building task
- Taxonomy Extraction Evaluation
- Semantic Dependency Parsing
SemEval-2014
- Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment
- Grammar Induction for Spoken Dialogue Systems
- Cross-Level Semantic Similarity
- Aspect Based Sentiment Analysis
- L2 Writing Assistant
- Supervised Semantic Parsing of Spatial Robot Commands
- Analysis of Clinical Text
- Broad-Coverage Semantic Dependency Parsing
- Sentiment Analysis in Twitter
- Multilingual Semantic Textual Similarity
SemEval-2013
- TempEval-3 Temporal Annotation
- Sentiment Analysis in Twitter
- Spatial Role Labeling
- Free Paraphrases of Noun Compounds
- Evaluating Phrasal Semantics
- Semantic Textual Similarity (became the *SEM Shared Task)
- The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
- Cross-lingual Textual Entailment for Content Synchronization
- Extraction of Drug-Drug Interactions from BioMedical Texts
- Cross-lingual Word Sense Disambiguation
- Evaluating Word Sense Induction & Disambiguation within An End-User Application
- Multilingual Word Sense Disambiguation
- Word Sense Induction for Graded and Non-Graded Senses
- The Coarse-Grained and Fine-Grained Chinese Lexical Sample and All-Words Task
SemEval-2012
- English Lexical Simplification
- Measuring Degrees of Relational Similarity
- Spatial Role Labeling
- Evaluating Chinese Word Similarity
- Chinese Semantic Dependency Parsing
- Semantic Textual Similarity
- COPA. Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning
- Cross-lingual Textual Entailment for Content Synchronization
Previous years
Kaggle
Kaggle is a platform for machine learning competitions where people compete to produce the best models for a huge range of datasets. Companies often offer a reward for their competitions. There are tons of cool datasets and competitions that you can base your final project on.
Here are a few relevant competitions:
You can also check out the Linguistics tag and the Languages tag for lots of other ideas. Want 130,000 wine reviews with their ratings, or 55,000 song lyrics? Find them on Kaggle.