# allenai/longformer

**Longformer: The Long-Document Transformer**

| | |
| --- | --- |
| repo name | allenai/longformer |
| repo link | https://github.com/allenai/longformer |
| homepage | https://arxiv.org/abs/2004.05150 |
| language | Python |
| size (curr.) | 591 kB |
| stars (curr.) | 265 |
| created | 2020-03-31 |
| license | Apache License 2.0 |
## Longformer

`Longformer` is a BERT-like model for long documents.
## How to use

1. Download the pretrained model

2. Install the environment and code

   Our code relies on a custom CUDA kernel, so for now it only works on GPUs under Linux. We tested it on Ubuntu with Python 3.7, CUDA 10, and PyTorch 1.2.0. If it doesn't work in your environment, please create a new issue. A quick way to verify the environment is shown in the sketch after this list.

   ```bash
   conda create --name longformer python=3.7
   conda activate longformer
   conda install cudatoolkit=10.0
   pip install git+https://github.com/allenai/longformer.git
   ```

3. Run the model

   ```python
   import torch
   from longformer.longformer import Longformer
   from transformers import RobertaTokenizer

   model = Longformer.from_pretrained('longformer-base-4096/')
   tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
   tokenizer.max_len = model.config.max_position_embeddings

   SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000)  # long input document
   SAMPLE_TEXT = f'{tokenizer.cls_token}{SAMPLE_TEXT}{tokenizer.eos_token}'

   input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0)  # batch of size 1
   model = model.cuda()  # the custom kernel doesn't work on CPU
   input_ids = input_ids.cuda()

   # Attention mask values -- 0: no attention, 1: local attention, 2: global attention
   attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device)  # initialize to local attention
   attention_mask[:, [1, 4, 21]] = 2  # set global attention based on the task. For example,
                                      # classification: the <s> token
                                      # QA: question tokens
   output = model(input_ids, attention_mask=attention_mask)[0]
   ```
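Because the custom kernel is GPU-only, it can save debugging time to verify the environment from step 2 before running anything. Below is a minimal sanity-check sketch (not part of the repo) that only assumes PyTorch is installed:

```python
# Hedged sanity check, not from the repo: verify the environment before
# running the model. The custom CUDA kernel requires a GPU.
import torch

print(torch.__version__)          # the repo was tested with PyTorch 1.2.0
print(torch.cuda.is_available())  # must print True; the kernel won't run on CPU
```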
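The snippet in step 3 returns the model's final hidden states. As a hedged illustration of how they might be used downstream, the sketch below pools the `<s>` token's representation for classification; `NUM_CLASSES` and `classifier` are hypothetical names, not part of the repo, and the sketch assumes `model` and `output` from the snippet above.

```python
# Minimal sketch (hypothetical, not from the repo): use the <s> token's
# hidden state for document classification.
import torch.nn as nn

NUM_CLASSES = 2  # hypothetical number of labels
classifier = nn.Linear(model.config.hidden_size, NUM_CLASSES).cuda()

cls_hidden = output[:, 0, :]     # hidden state of the <s> token at position 0
logits = classifier(cls_hidden)  # shape: [batch_size, NUM_CLASSES]
```

For an actual classification task, the attention mask would give the `<s>` token global attention, as the comment in the step 3 snippet suggests.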
## TriviaQA

- Training scripts: `scripts/triviaqa.py`
- Pretrained large model: here (replicates leaderboard results)
- Instructions: `scripts/cheatsheet.txt`
## Compiling the CUDA kernel

We already include the compiled binaries of the CUDA kernel, so most users won't need to compile it, but if you are interested, check `scripts/cheatsheet.txt` for instructions.
## Known issues

Please check the repo issues for a list of known issues that we are planning to address soon. If your issue is not discussed there, please create a new one.
## Citing

If you use `Longformer` in your research, please cite [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150).

```bibtex
@article{Beltagy2020Longformer,
  title={Longformer: The Long-Document Transformer},
  author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
  journal={arXiv:2004.05150},
  year={2020},
}
```
`Longformer` is an open-source project developed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.