Natural Language Processing Fundamentals

Overview

A large portion of the data being generated today is unstructured text. Making this data accessible for analysis and useful for data science requires skills in Natural Language Processing. This module covers the following topics:

  • text data processing methods (e.g. tokenization, lemmatization, bag of words, tf-idf) and associated packages such as NLTK and spaCy
  • part of speech tagging
  • word embeddings
  • deep learning for text
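
To make the first topic concrete, the following is a minimal sketch of tokenization, bag of words, and tf-idf in plain Python. It is an illustration only, not the module's own material: the function names (tokenize, bag_of_words, tf_idf) and the regex-based tokenizer are assumptions made for this example, and the unsmoothed weighting tf * log(N / df) is only one common variant. Library implementations often use a different tokenizer and add smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(tokens):
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    Assumed weighting (hypothetical choice for this sketch):
    tf = count / tokens in document, idf = log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["The cat sat on the mat", "The dog sat"]
    for doc, doc_weights in zip(docs, tf_idf(docs)):
        print(doc, doc_weights)
```

Under this weighting, a token that appears in every document gets a weight of zero, which is why tf-idf tends to favor terms that distinguish one document from the rest.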

Learning Objectives

By the end of this module, students should be able to carry out Natural Language Processing work on large-scale, real-world projects with little to no supervision.

Prework

This module makes a few assumptions about your existing programming and data skills. The assumptions are listed below, along with some resources to read through to make sure you are properly prepared.

Assumptions

Schedule

Main Content Page is here.

Session Description
1 Text Data Processing
2 Word Embeddings
3 Supervised Learning for NLP
4 Unsupervised Learning for NLP

See the Notion board for a link to the slides.