lovasoa/bad_json_parsers
Exposing problems in json parsers of several programming languages.
repo name | lovasoa/bad_json_parsers |
repo link | https://github.com/lovasoa/bad_json_parsers |
language | Python |
size (curr.) | 226 kB |
stars (curr.) | 335 |
created | 2017-10-21 |
license | MIT License |
Nesting levels for JSON parsers
Documenting how JSON parsers of several programming languages deal with deeply nested structures.
Introduction
Many JSON parsers (and many parsers in general) use recursion to parse nested structures. This is very convenient when writing the parser, but it limits what the parser can handle: the call stack is usually capped at a size several orders of magnitude smaller than the available RAM, so a program with too many levels of recursion will fail.
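The failure described above can be reproduced with Python's standard json module, whose decoder is recursive. This is only a minimal sketch: the helper name try_parse and the depth of 100000 are arbitrary choices made for illustration.

```python
import json

def try_parse(depth):
    """Return True if json.loads accepts `depth` nested arrays."""
    document = "[" * depth + "]" * depth
    try:
        json.loads(document)
        return True
    except RecursionError:
        # The recursive decoder exhausted its recursion budget.
        return False

print(try_parse(10))       # a shallow document
print(try_parse(100_000))  # far deeper than the default recursion limit
```

Here Python turns the problem into a catchable exception. Parsers written in languages without such a guard may crash the whole process instead.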
The two most recent JSON standards, RFC 8259 and RFC 7159, both say “An implementation may set limits on the maximum depth of nesting”. However, the ECMA-404 specification doesn’t contain any limit on how deeply nested JSON structures can be.
This means that the JSON specifications define no nesting level as correct or incorrect, and JSON parsers may differ when parsing nested structures.
Some recursive parser libraries implement a safety check in order to avoid crashing the calling program: they artificially limit the maximum depth they accept (often making that limit configurable), hoping that the size of the stack at the moment they are called plus the artificial limit will always be smaller than the total stack size. This limit is an arbitrary choice of the library implementer, and it explains all the lower values of the comparison you’ll see below.
Some parsers do not use the operating system stack at all to parse nested structures (they usually implement a state machine instead). These can usually accept arbitrarily deeply nested structures. Of course, for non-streaming parsers, they cannot physically be provided infinitely large inputs, and thus cannot produce infinitely-large outputs.
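To illustrate the idea, here is a minimal sketch of a non-recursive parser that keeps its own stack on the heap. It is not taken from any of the libraries tested here, it only handles the arrays-only subset of JSON, and the function name and state names are hypothetical.

```python
def parse_nested_arrays(text):
    """Parse JSON made only of arrays, e.g. "[[],[[]]]", without recursion."""
    stack = []       # arrays that are currently open (explicit stack on the heap)
    root = None
    state = "value"  # what the small state machine expects next
    for ch in text:
        if ch in " \t\r\n":
            continue
        if ch == "[" and state in ("value", "value_or_close"):
            new = []
            if stack:
                stack[-1].append(new)
            else:
                root = new
            stack.append(new)
            state = "value_or_close"
        elif ch == "]" and state in ("value_or_close", "comma_or_close"):
            stack.pop()
            state = "comma_or_close" if stack else "end"
        elif ch == "," and state == "comma_or_close":
            state = "value"
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if state != "end":
        raise ValueError("unexpected end of input")
    return root

# 100000 nesting levels, parsed without touching the recursion limit.
deep = parse_nested_arrays("[" * 100_000 + "]" * 100_000)
```

The nesting depth is bounded only by the memory available for the explicit stack. Note that other operations on the result, such as printing or comparing it, may themselves be recursive.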
You should note that parsers that set an arbitrary limit on the input nesting level are not safer and do not provide any more memory consumption guarantees than parsers that can handle arbitrarily nested input: they still consume an amount of resources proportional to the size of their input.
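As a rough illustration of this point, the sketch below builds a document with a nesting level of only 2 that would pass any depth check, yet still makes the parser allocate one object per inner array. The helper name wide_document and the count are assumptions made for the example.

```python
import json

def wide_document(count):
    """A nesting level of only 2, but `count` inner arrays."""
    return "[" + ",".join(["[]"] * count) + "]"

document = wide_document(100_000)
parsed = json.loads(document)

# Both the input size and the number of allocated lists grow with `count`,
# regardless of any limit on nesting depth.
print(len(document), len(parsed))
```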
This repository contains tools to measure the nesting limits of JSON parsers of different languages.
How to use
This repository contains a script called test_parser.py that takes a JSON parser, uses binary search to find the smallest JSON structure it fails to parse, and prints that structure's nesting level.
The JSON parser must be a program that reads JSON on its standard input and exits with a status of 0 if it managed to parse it, or any other status if an error occurred.
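A program honoring this contract can be very small. The following is a minimal sketch built on Python's standard json module; the function name exit_status is hypothetical, and the actual wrappers in this repository may be written differently.

```python
import json
import sys

def exit_status(stream):
    """Return 0 if `stream` contains parseable JSON, 1 otherwise."""
    try:
        json.load(stream)
    except (ValueError, RecursionError):
        # ValueError covers malformed JSON (json.JSONDecodeError is a subclass);
        # RecursionError covers input nested too deeply for the decoder.
        return 1
    return 0
```

Saved as a script, it would end with sys.exit(exit_status(sys.stdin)) so that the result becomes the exit status of the process.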
How it works
test_parser.py constructs JSON structures composed solely of nested arrays and feeds them to the program under test. For instance, for a depth of 3, it builds the following JSON: [[[]]]. This makes it possible to create a structure of only 2n bytes that has n nesting levels. It uses binary search to find the smallest structure for which the program fails.
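The search can be sketched as follows. This is not the code of test_parser.py: the parser is modeled as a Python callable instead of an external program, the names are hypothetical, and the sketch assumes that a parser which fails at some depth also fails at every greater depth.

```python
def nested_arrays(depth):
    """Build `depth` nested arrays using 2 * depth bytes, e.g. 3 -> "[[[]]]"."""
    return "[" * depth + "]" * depth

def smallest_failing_depth(parses, max_depth=1 << 24):
    """Return the smallest depth that `parses` rejects, or None if none is found."""
    lo, hi = 0, 1
    # Double the depth until the parser fails, to bracket the limit.
    while parses(nested_arrays(hi)):
        lo = hi
        hi *= 2
        if hi > max_depth:
            return None
    # Invariant: depth `lo` parses (or is 0), depth `hi` fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parses(nested_arrays(mid)):
            lo = mid
        else:
            hi = mid
    return hi

# A stand-in parser that accepts at most 512 nesting levels.
def fake_parser(document):
    return len(document) // 2 <= 512

print(smallest_failing_depth(fake_parser))
```

Because each probe document is only 2n bytes long, even limits in the millions can be located with a few dozen parser invocations.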
Results
The various implementations in this repository are continuously tested by Travis CI on a virtual machine running Ubuntu 18.04, with 8 GB of RAM and a maximum stack size of 8.192 MB.
Here are the results we found, sorted from least nesting allowed by default to the most:
language | json library | nesting level | file size | notes |
---|---|---|---|---|
C# | System.Text.Json | 65 | 130 bytes | configurable (JsonSerializerOptions.MaxDepth) * |
ruby | json | 101 | 202 bytes | configurable (:max_nesting) * |
rust | serde_json | 128 | 256 bytes | disableable (disable_recursion_limit) * |
shell | jq | 257 | 514 bytes | undocumented |
php | json_decode | 512 | 1.0 KB | configurable ($depth) * |
perl | JSON::PP | 513 | 1.0 KB | configurable (max_depth) * |
swift | JSONDecoder | 514 | 1.0 KB | undocumented |
python3 | json | 995 | 2.0 KB | configurable (sys.setrecursionlimit) *, undocumented |
C | jansson | 2049 | 4.0 KB | |
javascript | JSON.parse | 5712 | 11.4 KB | Node.js 8 LTS |
java | Gson | 6100 | 12 KB | |
java | Jackson | 6577 | 13 KB | |
go | json-iterator | 10002 | 20 KB | configurable (Config.MaxDepth) * |
PostgreSQL | json type | 11887 | 23 KB | configurable (max_stack_depth), undocumented |
D | std.json | 37370 | 74.7 KB | segfaults |
C++ | RapidJSON | 87266 | 175 KB | segfaults |
Nim | json | 104769 | 209 KB | segfaults |
OCaml | yojson | 130380 | 260 KB | |
go | encoding/json | 1973784 | 3.9 MB | fatal error, goroutine stack exceeds 1000000000-byte limit |
C++ | JSON for Modern C++ | ∞ | ∞ | segfault fixed in v3.7.2 |
C# | Newtonsoft.Json | ∞ | ∞ | |
ruby | Oj | ∞ | ∞ | |
Haskell | Aeson | ∞ | ∞ | |
* Note that configurable and disableable mean only that the default depth check inside the parser itself can be configured or disabled, not that the parser can be made to accept any nesting depth. When disabling the limit or increasing it too much, the parser will crash the calling program instead of returning a clean error.
Remarks
I tried to test the most popular JSON library of each language. If you want to add a new language or a new library, feel free to open a pull request. All the parameters were left at their default values.