June 26, 2019

atlanhq/camelot

Camelot: PDF Table Extraction for Humans


repo name	atlanhq/camelot
repo link	https://github.com/atlanhq/camelot
homepage	https://camelot-py.readthedocs.io
language	Python
size (curr.)	16826 kB
stars (curr.)	2691
created	2016-06-18
license	Other

Camelot is a Python library that makes it easy for anyone to extract tables from PDF files!

Note: You can also check out Excalibur, which is a web interface for Camelot!

Here’s how you can extract tables from PDF files. Check out the PDF used in this example here.
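A minimal sketch of that workflow follows. It assumes Camelot is installed and that a text-based PDF called foo.pdf (a placeholder file name) sits in the working directory. The helper name extract_tables is ours for illustration and is not part of Camelot. For the example PDF, the first DataFrame would hold a table like the one shown after the code.

```python
from pathlib import Path


def extract_tables(pdf_path, pages="1"):
    """Return one pandas DataFrame per table Camelot finds on the given pages."""
    # Imported lazily so this module still loads where Camelot is not installed.
    import camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    return [table.df for table in tables]


if __name__ == "__main__":
    pdf = Path("foo.pdf")  # placeholder name, replace with your own PDF
    if pdf.exists():
        for df in extract_tables(pdf):
            print(df)
    else:
        print(f"{pdf} not found; nothing to extract")
```

Returning plain DataFrames keeps the rest of a pipeline independent of Camelot; keep the table objects themselves if you also need their parsing reports or export methods.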

Cycle Name	KI (1/km)	Distance (mi)	Percent Fuel Savings
			Improved Speed	Decreased Accel	Eliminate Stops	Decreased Idle
2012_2	3.30	1.3	5.9%	9.5%	29.2%	17.4%
2145_1	0.68	11.2	2.4%	0.1%	9.5%	2.7%
4234_1	0.59	58.7	8.5%	1.3%	8.5%	3.3%
2032_2	0.17	57.8	21.7%	0.3%	2.7%	1.2%
4171_1	0.07	173.9	58.1%	1.6%	2.1%	0.5%

There’s a command-line interface too!

Note: Camelot only works with text-based PDFs and not scanned documents. (As Tabula explains, “If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based”.)

Why Camelot?

You are in control: Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
Bad tables can be discarded based on metrics like accuracy and whitespace, without ever having to manually look at each table.
Each table is a pandas DataFrame, which seamlessly integrates into ETL and data analysis workflows.
Export to multiple formats, including JSON, Excel, HTML and Sqlite.
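
The metric-based filtering mentioned above could look roughly like the sketch below. Each Camelot table carries a parsing_report dictionary with accuracy and whitespace entries; the function name keep_good_tables and both threshold values are assumptions chosen for illustration, not Camelot defaults. Stand-in objects replace real tables so the sketch runs without a PDF.

```python
from types import SimpleNamespace


def keep_good_tables(tables, min_accuracy=90.0, max_whitespace=30.0):
    """Keep tables whose parsing report passes both (assumed) thresholds."""
    good = []
    for table in tables:
        report = table.parsing_report
        if (report["accuracy"] >= min_accuracy
                and report["whitespace"] <= max_whitespace):
            good.append(table)
    return good


# Stand-ins that mimic only the parsing_report attribute of a Camelot table.
fake_tables = [
    SimpleNamespace(parsing_report={"accuracy": 99.02, "whitespace": 12.24,
                                    "order": 1, "page": 1}),
    SimpleNamespace(parsing_report={"accuracy": 41.5, "whitespace": 68.0,
                                    "order": 2, "page": 1}),
]

kept = keep_good_tables(fake_tables)
print([t.parsing_report["order"] for t in kept])  # prints [1]
```

With real data, the argument would be the object returned by camelot.read_pdf, and each kept table could then be written out with its to_csv, to_json, to_excel, to_html or to_sqlite method.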

See comparison with other PDF table extraction libraries and tools.

Installation

Using conda

The easiest way to install Camelot is with conda, the package manager and environment management system for the Anaconda distribution:

    conda install -c conda-forge camelot-py

Using pip

After installing the dependencies (tk and ghostscript), you can simply use pip to install Camelot:

    pip install camelot-py[cv]
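
To confirm that the two dependencies are visible before or after running pip, a rough check along these lines may help. It is only a sketch: the function name is hypothetical, and finding a Ghostscript executable on PATH (gs on Linux and macOS, gswin64c or gswin32c on Windows) is a proxy for a working Ghostscript install, not a guarantee.

```python
import importlib.util
import shutil


def check_camelot_dependencies():
    """Report whether tkinter and a Ghostscript executable can be located."""
    ghostscript_names = ("gs", "gswin64c", "gswin32c")
    return {
        # find_spec returns None when the module cannot be imported.
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ghostscript": any(shutil.which(name) is not None
                           for name in ghostscript_names),
    }


if __name__ == "__main__":
    for name, found in check_camelot_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```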

From the source code

After installing the dependencies, clone the repo using:

    git clone https://github.com/atlanhq/camelot

and install Camelot using pip:

    cd camelot
    pip install ".[cv]"

Documentation

Great documentation is available at http://camelot-py.readthedocs.io/.

Development

The Contributor’s Guide has detailed information about contributing code, documentation, tests and more. We’ve included some basic information in this README.

Source code

You can check out the latest sources with:

    git clone https://github.com/atlanhq/camelot

Setting up a development environment

You can install the development dependencies using pip:

    pip install camelot-py[dev]

Testing

After installation, you can run tests using:

    python setup.py test

Versioning

Camelot uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.

License

This project is licensed under the MIT License; see the LICENSE file for details.
