References

Tutorials

NLP courses at other universities

Python resources

  • Python itself has good official documentation and a decent getting-started page.
  • The Python site also lists good tutorials. Many are aimed at people with no programming background, but two that seem a bit less introductory are the Python in 10 Minutes tutorial and Google’s Python Class.
  • There is a Coursera course on Python running now.
  • Scikit-learn is an easy-to-use library for doing machine learning in Python. It is also thoroughly documented, with many examples.
  • Kaggle has some tutorials on scikit-learn (sklearn).
  • spaCy is an excellent Python NLP library. It also has a cleverly named visualization tool, displaCy.
  • AllenNLP (https://github.com/allenai/allennlp) is an open-source NLP research library from AI2, built on PyTorch.

Using Python 3.5+ on biglab

Biglab has Python 3.4 installed, which is a little out of date. If you want to use a more modern Python, follow these steps. First, log in to biglab:

$ ssh USERNAME@biglab.seas.upenn.edu

(where USERNAME is your Penn username)

You can either use an existing miniconda installation or install your own.

1. Use existing miniconda installation

For this, open up ~/.bashrc and add this line to the end:

export PATH="/home1/m/mayhew/miniconda3/bin:$PATH"

Restart your terminal (exit and ssh in again), and python should now be version 3.6 from Anaconda.
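
To confirm that the PATH change took effect, a quick check from inside Python can help. The following is a minimal sketch rather than part of the original instructions: the helper name python_is_recent is hypothetical, and the 3.5 threshold is simply taken from the title of this section.

```python
import sys


def python_is_recent(minimum=(3, 5), version=None):
    """Return True if `version` (default: the running interpreter) is >= `minimum`.

    Both arguments are (major, minor) pairs. Comparing tuples avoids the
    string-comparison trap where "3.10" sorts before "3.5".
    """
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return current >= tuple(minimum)


if __name__ == "__main__":
    # sys.executable shows which python binary was found first on PATH.
    print("interpreter:", sys.executable)
    print("version:", ".".join(str(part) for part in sys.version_info[:3]))
    print("meets 3.5+:", python_is_recent())
```

If the interpreter path printed does not point into the miniconda directory you added to PATH, the export line in ~/.bashrc was probably not picked up by the current session.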

2. Install miniconda in your home directory

This is more involved, but may give you more freedom. Anaconda is a collection of scientific packages for Python, and also a virtual environment manager. I suggest miniconda, which is a stripped-down version. To install it, go to https://conda.io/miniconda.html. Alternatively, run this:

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ chmod +x Miniconda3-latest-Linux-x86_64.sh
$ ./Miniconda3-latest-Linux-x86_64.sh

Then restart your terminal (exit and ssh in again), and run this:

$ conda install gensim
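
Once the install finishes, you may want to confirm that the package is visible to the interpreter you are actually running. The following is a small sketch, assuming the import name matches the package name (gensim here); is_installed is a hypothetical helper, not something conda or gensim provides.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it.

    Only pass top-level names such as "gensim"; for dotted names,
    find_spec raises an error when the parent package is missing.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("gensim",):
        status = "found" if is_installed(name) else "NOT found"
        print(name, status)
```

If the package is reported as not found, the python on your PATH is likely not the one that conda installed into.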

Bash resources

  • John has a basic introduction to bash for NLP, and a discussion of advanced topics in bash.
  • Kevin Knight of the University of Southern California has a nice tutorial on Unix skills for NLP.

Screen / byobu / tmux

Since you will be running code remotely, we strongly recommend that you use some sort of session manager. I (Stephen) use screen, but byobu and tmux are other options. These let you ssh to a remote machine, start a terminal session, disconnect from it, and reconnect at a later time. This is especially useful when you want to run long jobs. Here’s a sample screenrc file.