Slow toml parsing #2597

@cj81499

Describe the bug

TOML decoding/parsing is extremely slow.

For a uv.lock (toml) file:

# convert toml to json (using python), then convert json to yaml using yq (fast!)
$ time cat uv.lock | python -c 'import tomllib, json, sys; print(json.dumps(tomllib.loads(sys.stdin.read())))' | yq --input-format json --output-format yaml '.'
...
cat uv.lock  0.00s user 0.00s system 8% cpu 0.032 total
python -c   0.09s user 0.01s system 98% cpu 0.099 total
yq --input-format json --output-format yaml '.'  0.17s user 0.05s system 100% cpu 0.222 total

# convert toml to yaml using yq (slow!)
$ time yq --input-format toml --output-format yaml '.' uv.lock
...
yq --input-format toml --output-format yaml '.' uv.lock  4.18s user 0.52s system 245% cpu 1.916 total

Observe that --input-format toml takes substantially longer than --input-format json: 1.916s total (4.18s user) versus 0.222s total (0.17s user) for the yq step of the JSON pipeline.
The performance difference grows even larger as the TOML file grows. I haven't measured whether it scales linearly, exponentially, logarithmically, etc.
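
One way to check how the slowdown scales is to time yq against synthetic TOML files of doubling size. The following is only a rough sketch, not a measurement from this report: `make_toml`, `time_yq`, `main`, the package names, and the `example.invalid` registry URL are all made up for illustration, and a synthetic file may not hit the same slow path as a real uv.lock. It assumes the mikefarah `yq` binary is on PATH and reuses the exact flags from the command above.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def make_toml(n_packages: int) -> str:
    """Build a synthetic TOML document shaped loosely like a uv.lock file."""
    blocks = ["version = 1\n"]
    for i in range(n_packages):
        blocks.append(
            "[[package]]\n"
            f'name = "pkg-{i}"\n'
            'version = "1.0.0"\n'
            # Hypothetical registry URL, only here to add an inline table.
            'source = { registry = "https://example.invalid/simple" }\n'
        )
    return "\n".join(blocks)


def time_yq(toml_text: str) -> float:
    """Return wall-clock seconds for one yq TOML-to-YAML conversion."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "synthetic.toml"
        path.write_text(toml_text, encoding="utf-8")
        start = time.perf_counter()
        subprocess.run(
            ["yq", "--input-format", "toml", "--output-format", "yaml", ".", str(path)],
            check=True,
            stdout=subprocess.DEVNULL,  # discard the YAML, only timing matters
        )
        return time.perf_counter() - start


def main() -> None:
    """Time yq on inputs that double in size each step."""
    if shutil.which("yq") is None:
        print("yq not found on PATH; nothing to measure")
        return
    for n in (100, 200, 400, 800):
        elapsed = time_yq(make_toml(n))
        print(f"{n:>5} packages: {elapsed:.3f}s")
```

Calling `main()` prints one timing per size. If the time roughly doubles each time the package count doubles, the cost is about linear; if it roughly quadruples, it is closer to quadratic. Each timing includes process startup, so the smallest sizes are dominated by fixed overhead.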

Version of yq:

$ yq --version
yq (https://github.com/mikefarah/yq/) version v4.52.2

Operating system: Observed on both macOS and Linux (Ubuntu 24.04).
Installed via: standalone binary package (via mise)

Input TOML

The performance discrepancy is not observable with small files.

The uv.lock (a TOML file, 2874 lines) used to generate the above performance measurements: https://gist.github.com/cj81499/75c681ca5da21e8c2888192dd04d3702

Command

yq --input-format toml --output-format yaml '.'

Actual behaviour

Slow performance

Expected behaviour

Comparable performance to other file formats

Additional context

None
