Kyubyong/nlp_made_easy
Explains NLP building blocks in a simple manner.
| field | value |
| --- | --- |
| repo name | Kyubyong/nlp_made_easy |
| repo link | https://github.com/Kyubyong/nlp_made_easy |
| homepage | |
| language | Jupyter Notebook |
| size (curr.) | 285 kB |
| stars (curr.) | 215 |
| created | 2019-01-18 |
| license | |
NLP Made Easy
Simple code notes for explaining NLP building blocks
- Subword Segmentation Techniques
  - Let’s compare various tokenizers, e.g., nltk, BPE, SentencePiece, and the BERT tokenizer (a quick comparison is sketched after this list).
- Beam Decoding
  - Beam decoding is essential for seq2seq tasks, but it’s notoriously complicated to implement. Here’s a relatively easy implementation that batchifies the candidates (an unbatched skeleton follows the list).
- How to get the last hidden vector of RNNs properly
  - We’ll see how to get the last hidden states of RNNs in TensorFlow and PyTorch (see the packing example below).
- TensorFlow seq2seq template based on the g2p task
  - We’ll write a simple template for seq2seq using TensorFlow. For demonstration, we attack the g2p task, which converts graphemes (spelling) to phonemes (pronunciation). It’s a very good testbed because it’s simple enough to get up and running quickly (a Keras-style skeleton follows the list).
- PyTorch seq2seq template based on the g2p task
  - The same seq2seq template, this time in PyTorch, again demonstrated on the g2p task (a PyTorch skeleton follows the list).
- Attention Mechanism (WIP)
- POS-tagging with BERT Fine-tuning
  - BERT is known to be good at sequence tagging tasks like named entity recognition. Let’s see whether that holds for POS-tagging (a fine-tuning sketch follows the list).
- Dropout in a minute
  - Dropout is arguably the most popular regularization technique in deep learning. Let’s check again how it works (a short train/eval demo follows the list).
- N-gram LM vs. RNN LM (WIP)
- Data Augmentation for Quora Question Pairs
  - Let’s see whether augmenting the training data helps on the Quora Question Pairs task (one simple augmentation is sketched below).
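Below are a few minimal sketches of the ideas above. First, the tokenizer comparison: a quick taste of how differently two of these tokenizers split the same sentence. This assumes `nltk` and the Hugging Face `transformers` package are installed; the exact subword splits depend on the pretrained vocabulary.

```python
import nltk
from transformers import BertTokenizer

nltk.download("punkt", quiet=True)  # data for nltk's word tokenizer

sent = "Tokenization is unavoidable."

# rule-based word tokenizer: splits into words and punctuation
print(nltk.word_tokenize(sent))

# BERT's WordPiece tokenizer: rare words break into '##'-marked subword pieces
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
print(bert_tok.tokenize(sent))
```

SentencePiece, by contrast, first needs a model trained on a corpus you supply, e.g. `spm.SentencePieceTrainer.train(input="corpus.txt", model_prefix="m", vocab_size=1000)` (with `corpus.txt` a hypothetical file), after which a `SentencePieceProcessor` loaded from `m.model` tokenizes in the same one-call style.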
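For beam decoding, the notebook batchifies candidates for speed; here is the unbatched skeleton of the same algorithm. `step_fn` is a hypothetical stand-in for one decoder step of a seq2seq model.

```python
import torch
import torch.nn.functional as F

def beam_search(step_fn, bos, eos, beam_size=3, max_len=20):
    """Keep the `beam_size` best partial hypotheses at every step."""
    beams = [([bos], 0.0)]                        # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:                 # finished beams pass through unchanged
                candidates.append((tokens, score))
                continue
            log_probs = F.log_softmax(step_fn(tokens), dim=-1)
            topv, topi = log_probs.topk(beam_size)
            for lp, idx in zip(topv.tolist(), topi.tolist()):
                candidates.append((tokens + [idx], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(t[-1] == eos for t, _ in beams):   # stop once every beam has ended
            break
    return beams[0][0]

# toy usage: a fixed-logit "decoder" over a 5-token vocab where token 4 is <eos>
dummy_step = lambda tokens: torch.tensor([0.1, 0.2, 0.3, 0.1, 0.9])
print(beam_search(dummy_step, bos=0, eos=4))      # [0, 4]
```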
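For the last-hidden-state pitfall, here is the PyTorch half of the story (the notebook also covers TensorFlow): with padded batches, `outputs[:, -1]` reads padding positions for the shorter sequences, while packing makes `h_n` the state at each sequence’s true last step.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

x = torch.randn(3, 5, 8)            # 3 padded sequences, max length 5, 8 features
lengths = torch.tensor([5, 3, 2])   # true lengths (sorted, as pack expects by default)
rnn = nn.GRU(8, 16, batch_first=True)

# naive: the last time step of the padded outputs
outputs, _ = rnn(x)
last_naive = outputs[:, -1]          # correct only for the length-5 sequence

# proper: pack so the GRU records the state at each sequence's true final step
_, h_n = rnn(pack_padded_sequence(x, lengths, batch_first=True))
last_proper = h_n[-1]                # (batch, hidden)

print(torch.allclose(last_naive[0], last_proper[0]))  # True: full-length sequence
print(torch.allclose(last_naive[1], last_proper[1]))  # False: naive read padding
```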
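For the TensorFlow template, a minimal Keras-style skeleton: a GRU encoder whose final state seeds a teacher-forced GRU decoder. The vocabulary sizes are hypothetical placeholders; a real g2p setup would build them from a pronouncing dictionary such as CMUdict.

```python
import tensorflow as tf

G_VOCAB, P_VOCAB, HID = 30, 70, 128   # placeholder grapheme/phoneme vocab sizes

# encoder: embed graphemes and keep only the GRU's final state
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(G_VOCAB, HID, mask_zero=True)(enc_in)
_, enc_state = tf.keras.layers.GRU(HID, return_state=True)(enc_emb)

# decoder: teacher forcing, i.e. the gold previous phoneme is the input at each step
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(P_VOCAB, HID, mask_zero=True)(dec_in)
dec_seq = tf.keras.layers.GRU(HID, return_sequences=True)(dec_emb, initial_state=enc_state)
logits = tf.keras.layers.Dense(P_VOCAB)(dec_seq)

model = tf.keras.Model([enc_in, dec_in], logits)
model.compile("adam", tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```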
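The PyTorch counterpart has the same shape; again the vocabulary sizes are placeholders.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder for g2p, trained with teacher forcing."""
    def __init__(self, g_vocab=30, p_vocab=70, hid=128):
        super().__init__()
        self.g_emb = nn.Embedding(g_vocab, hid, padding_idx=0)
        self.p_emb = nn.Embedding(p_vocab, hid, padding_idx=0)
        self.encoder = nn.GRU(hid, hid, batch_first=True)
        self.decoder = nn.GRU(hid, hid, batch_first=True)
        self.proj = nn.Linear(hid, p_vocab)

    def forward(self, graphemes, phonemes_in):
        _, h = self.encoder(self.g_emb(graphemes))        # h: (1, batch, hid)
        dec_out, _ = self.decoder(self.p_emb(phonemes_in), h)
        return self.proj(dec_out)                         # (batch, T, p_vocab)

model = Seq2Seq()
logits = model(torch.randint(1, 30, (4, 5)), torch.randint(1, 70, (4, 6)))
print(logits.shape)   # torch.Size([4, 6, 70])
```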
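For the BERT fine-tuning notebook, a sketch of the moving parts using Hugging Face `transformers`; the tag set here is a toy one, and the actual training loop is omitted.

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tags = ["DET", "NOUN", "VERB", "ADJ"]   # toy tag set; a real run would use e.g. PTB tags
tok = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased",
                                                   num_labels=len(tags))

enc = tok("The cat sleeps", return_tensors="pt")
logits = model(**enc).logits            # (1, num_wordpieces, num_labels)
preds = logits.argmax(-1)[0]

# WordPiece may split a word into several pieces; word_ids() maps pieces back
# to words, so only each word's first piece carries the tag during training
print(enc.word_ids())
```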
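And the dropout demo: at train time each unit is zeroed with probability p and the survivors are scaled by 1/(1-p) (“inverted dropout”), so at test time the layer is a no-op.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print(drop(x))   # about half the entries are 0, the rest are 2.0 (= 1/(1-p))

drop.eval()
print(drop(x))   # identity at test time: all ones
```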
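Finally, for the Quora notebook, one cheap augmentation the task invites (whether the notebook uses this exact trick is an assumption here): “is duplicate” is symmetric in the two questions, so swapping them yields a new training row for free.

```python
import pandas as pd

df = pd.DataFrame({
    "question1": ["How do I learn NLP?"],
    "question2": ["What is the best way to study NLP?"],
    "is_duplicate": [1],
})

# rename maps old->new names simultaneously, so this swaps the two columns
swapped = df.rename(columns={"question1": "question2", "question2": "question1"})
augmented = pd.concat([df, swapped], ignore_index=True)
print(len(augmented))   # 2: the original pair plus its mirror
```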