google/diff-match-patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
| field | value |
| --- | --- |
| repo name | google/diff-match-patch |
| repo link | https://github.com/google/diff-match-patch |
| homepage | |
| language | Python |
| size (curr.) | 675 kB |
| stars (curr.) | 2982 |
| created | 2018-01-23 |
| license | Apache License 2.0 |
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.
- Diff:
  - Compare two blocks of plain text and efficiently return a list of differences.
  - Diff Demo
- Match:
  - Given a search string, find its best fuzzy match in a block of plain text. Matches are weighted for both accuracy and location.
  - Match Demo
- Patch:
  - Apply a list of patches onto plain text, using a best-effort strategy to apply each patch even when the underlying text doesn’t match.
  - Patch Demo
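To make "a list of differences" concrete, here is a minimal sketch of what such a list can look like. It does not call this library: it uses Python's standard `difflib` as a stand-in diff engine, and the names `simple_diff`, `source_text`, and `target_text` are hypothetical. The `(op, text)` tuple shape with `-1` / `0` / `1` codes reflects my understanding of the Python port's output and should be treated as an assumption.

```python
from difflib import SequenceMatcher

# Operation codes (assumed convention: -1 delete, 0 equal, 1 insert).
DELETE, EQUAL, INSERT = -1, 0, 1


def simple_diff(text1, text2):
    """Return a list of (op, text) tuples that turns text1 into text2.

    Stand-in built on difflib, not this library's own diff engine.
    """
    diffs = []
    matcher = SequenceMatcher(None, text1, text2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((EQUAL, text1[i1:i2]))
            continue
        # 'delete', 'insert', and 'replace' reduce to a delete and/or an insert.
        if i1 < i2:
            diffs.append((DELETE, text1[i1:i2]))
        if j1 < j2:
            diffs.append((INSERT, text2[j1:j2]))
    return diffs


def source_text(diffs):
    """Rebuild the original text: everything except insertions."""
    return "".join(text for op, text in diffs if op != INSERT)


def target_text(diffs):
    """Rebuild the new text: everything except deletions."""
    return "".join(text for op, text in diffs if op != DELETE)


if __name__ == "__main__":
    for op, text in simple_diff("The quick brown fox", "The quick red fox"):
        print(op, repr(text))
```

Because both input texts can be rebuilt from the same list, a diff in this shape is enough to derive a patch or to render a side-by-side comparison.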
Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective-C, and Python.
Reference
- API - Common API across all languages.
- Line or Word Diffs - Less detailed diffs.
- Plain Text vs. Structured Content - How to deal with data like XML.
- Unidiff - The patch serialization format.
- Support - Newsgroup for developers.
Languages
Although each language port of Diff Match Patch uses the same API, there are some language-specific notes.
A standardized speed test tracks the relative performance of diffs in each language.
Algorithms
This library implements Myers’ diff algorithm, which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surrounds the diff algorithm, improving both performance and output quality.
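The sketch below illustrates the idea of a Myers-style greedy search wrapped in a pre-diff speedup. It is a simplified illustration, not the library's code: it only computes the size of the shortest insert/delete script, and the one speedup shown (trimming a common prefix and suffix before diffing) is an assumption about the kind of shortcut meant here. All function names are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def common_suffix_len(a, b):
    """Number of trailing characters shared by a and b."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[-1 - i] == b[-1 - i]:
        i += 1
    return i


def myers_edit_distance(a, b):
    """Size of the shortest insert/delete script turning a into b.

    Greedy O(ND) search: for each edit count d, track the furthest x
    reached on every diagonal k = x - y.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return n + m
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # move down: insert a character of b
            else:
                x = furthest[k - 1] + 1  # move right: delete a character of a
            y = x - k
            # Follow the diagonal while characters match (free moves).
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    raise AssertionError("unreachable: d = n + m always suffices")


def diff_size(a, b):
    """Edit-script size with the cheap speedup applied first."""
    if a == b:
        return 0
    prefix = common_prefix_len(a, b)
    a, b = a[prefix:], b[prefix:]
    suffix = common_suffix_len(a, b)
    if suffix:
        a, b = a[:-suffix], b[:-suffix]
    return myers_edit_distance(a, b)


if __name__ == "__main__":
    print(diff_size("The quick brown fox", "The quick red fox"))
```

Trimming shared ends does not change the result, but it shrinks the search to the region that actually differs, which is where the greedy loop spends its time.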
This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.
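As a rough illustration of the Bitap idea, the sketch below runs a bit-parallel search that tolerates a bounded number of errors (insertions, deletions, substitutions). It is a textbook shift-and variant, not the library's implementation: it ignores the location weighting mentioned earlier, and the function name `bitap_search` and its return shape are hypothetical.

```python
def bitap_search(text, pattern, max_errors=0):
    """Find the first place in text where pattern ends with <= max_errors edits.

    Returns (end_index, errors), where end_index is the index of the last
    matched text character, or None if there is no such match.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    high_bit = 1 << (m - 1)

    # masks[c] has bit j set when pattern[j] == c.
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)

    # state[d] has bit j set when pattern[:j + 1] matches a suffix of the
    # text read so far with at most d errors. Before reading any text,
    # the first d pattern characters can be "matched" by d deletions.
    state = [(1 << d) - 1 for d in range(max_errors + 1)]

    for i, ch in enumerate(text):
        mask = masks.get(ch, 0)
        old = state[:]
        state[0] = ((old[0] << 1) | 1) & mask
        for d in range(1, max_errors + 1):
            state[d] = (
                (((old[d] << 1) | 1) & mask)  # characters match
                | old[d - 1]                  # extra character in text
                | (old[d - 1] << 1)           # substituted character
                | (state[d - 1] << 1)         # pattern character missing in text
                | 1
            )
        for d in range(max_errors + 1):
            if state[d] & high_bit:
                return i, d
    return None


if __name__ == "__main__":
    print(bitap_search("the quick brown fox", "brwn", max_errors=1))
```

Each text character costs a handful of integer operations per allowed error, regardless of pattern length, which is what makes this family of algorithms attractive for fuzzy matching on short patterns.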