CIS 530 - Computational Linguistics - University of Pennsylvania

Image credit: Edwige and Xavier

Shared Tasks

The NLP community has a great tradition of “shared tasks”. Many of these are perfect for a term project in this class, since they give you a well-defined problem, training and test data, a standard evaluation metric, and lots of published baselines. Here are some pointers to shared tasks that were featured at CoNLL, SemEval, WMT, and Kaggle.

You are welcome to choose a shared task topic for your term project.

CoNLL Shared Tasks

The Conference on Computational Natural Language Learning (CoNLL) hosts a shared task every year. Here are the past CoNLL shared tasks:

Multilingual Parsing from Raw Text to Universal Dependencies
Universal Morphological Reinflection
Multilingual Shallow Discourse Parsing
Shallow Discourse Parsing
Grammatical Error Correction English Proceedings
Modelling Multilingual Unrestricted Coreference in OntoNotes
Modelling Unrestricted Coreference in OntoNotes English
Hedge Detection English Proceedings
Syntactic and Semantic Dependencies in Multiple Languages
Joint Parsing of Syntactic and Semantic Dependencies
Dependency Parsing: Multilingual & Domain Adaptation
Multi-Lingual Dependency Parsing
Semantic Role Labeling English
Language-Independent Named Entity Recognition
Clause Identification
Chunking
NP Bracketing

SemEval

The International Workshop on Semantic Evaluation (SemEval) hosts a range of shared tasks every year. Here are links to the SemEval tasks:

SemEval-2017

Semantic Textual Similarity
Multilingual and Cross-lingual Semantic Word Similarity
Community Question Answering
Sentiment Analysis in Twitter
Fine-Grained Sentiment Analysis on Financial Microblogs and News
#HashtagWars. Learning a Sense of Humor
Detection and Interpretation of English Puns
RumourEval. Determining rumour veracity and support for rumours
Abstract Meaning Representation Parsing and Generation
Extracting Keyphrases and Relations from Scientific Publications
End-User Development using Natural Language
Clinical TempEval

SemEval-2016

Semantic Textual Similarity. A Unified Framework for Semantic Processing and Evaluation
Interpretable Semantic Textual Similarity
Community Question Answering
Sentiment Analysis in Twitter
Aspect-Based Sentiment Analysis
Detecting Stance in Tweets
Determining Sentiment Intensity of English and Arabic Phrases
Meaning Representation Parsing
Chinese Semantic Dependency Parsing
Detecting Minimal Semantic Units and their Meanings
Complex Word Identification
Clinical TempEval
TExEval-2 – Taxonomy Extraction
Semantic Taxonomy Enrichment

SemEval-2015

Paraphrase and Semantic Similarity in Twitter
Semantic Textual Similarity
Answer Selection in Community Question Answering
TimeLine. Cross-Document Event Ordering
QA TempEval
Clinical TempEval
Diachronic Text Evaluation
SpaceEval
CLIPEval Implicit Polarity of Events
Sentiment Analysis in Twitter
Sentiment Analysis of Figurative Language in Twitter
Aspect Based Sentiment Analysis
Multilingual All-Words Sense Disambiguation and Entity Linking
Analysis of Clinical Text
A CPA dictionary-entry-building task
Taxonomy Extraction Evaluation
Semantic Dependency Parsing

SemEval-2014

Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment
Grammar Induction for Spoken Dialogue Systems
Cross-Level Semantic Similarity
Aspect Based Sentiment Analysis
L2 Writing Assistant
Supervised Semantic Parsing of Spatial Robot Commands
Analysis of Clinical Text
Broad-Coverage Semantic Dependency Parsing
Sentiment Analysis in Twitter
Multilingual Semantic Textual Similarity

SemEval-2013

TempEval-3 Temporal Annotation
Sentiment Analysis in Twitter
Spatial Role Labeling
Free Paraphrases of Noun Compounds
Evaluating Phrasal Semantics
Semantic Textual Similarity (becomes *Sem Shared Task)
The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
Cross-lingual Textual Entailment for Content Synchronization
Extraction of Drug-Drug Interactions from BioMedical Texts
Cross-lingual Word Sense Disambiguation
Evaluating Word Sense Induction & Disambiguation within An End-User Application
Multilingual Word Sense Disambiguation
Word Sense Induction for Graded and Non-Graded Senses
The Coarse-Grained and Fine-Grained Chinese Lexical Sample and All-Words Task

SemEval-2012

English Lexical Simplification
Measuring Degrees of Relational Similarity
Spatial Role Labeling
Evaluating Chinese Word Similarity
Chinese Semantic Dependency Parsing
Semantic Textual Similarity
COPA. Choice Of Plausible Alternatives: An evaluation of commonsense causal reasoning
Cross-lingual Textual Entailment for Content Synchronization

Previous years

Kaggle

Kaggle is a platform for machine learning competitions where people compete to produce the best models for a huge range of different datasets. Companies often offer a reward for their competitions. There are tons of cool datasets and competitions that you can base your final project on.

Here are a few relevant competitions:

You can also check out the Linguistics tag and the Languages tag for lots of other ideas. Want 130,000 wine reviews with their ratings, or 55,000 song lyrics? Find them on Kaggle.