Conversation

@dbrgn (Contributor) commented Oct 3, 2019

No more vector pre-allocation when the data size claimed by the input cannot be trusted in advance. See the discussion in #151.

Fixes #151 / RUSTSEC-2017-0006.

@dbrgn (Contributor, Author) commented Oct 3, 2019

I'm not sure whether https://github.com/3Hren/msgpack-rust/blob/master/rmp-serde/src/decode.rs#L807 needs to be adapted as well (I don't use the serde bindings).

@kornelski (Collaborator) commented Oct 4, 2019

I'm concerned that this may be too drastic. Would it suffice to put some arbitrary limit on it? e.g. with_capacity(len.min(LIMIT)) where the limit could be something like 4K or 64K?
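
For illustration, a minimal sketch of that capping idea, using a hypothetical helper and constant rather than the crate's actual internals:

```rust
use std::io::{self, Read};

// Hypothetical cap on the up-front reservation; the exact value is what's
// being debated here (4K? 64K?).
const PREALLOC_LIMIT: usize = 64 * 1024;

// Sketch only: reserve at most PREALLOC_LIMIT bytes regardless of what the
// untrusted length header claims; the Vec still grows normally if the data
// really is present.
fn read_bin_data<R: Read>(rd: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(len.min(PREALLOC_LIMIT));
    rd.take(len as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```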

This would reduce the problem from an outright DoS to mere amplification of memory usage, in the worst case of deeply nested arrays overpromising their sizes. Amplification seems like the lesser problem, and it likely happens already anyway: all memory management has some overhead, and even a non-preallocated Vec grows by doubling its capacity, so there is up to 2x amplification there in any case.

For a truly robust solution we'd need a different approach altogether, e.g. letting the user specify a maximum total memory budget for the decoding process, tracking it precisely, and bailing out when the limit is reached.
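
As a rough illustration of that budget-tracking approach (purely a sketch, not an API this crate exposes):

```rust
/// Sketch of a decode-wide memory budget: every allocation the decoder wants
/// to make is charged against the budget first, and decoding aborts once the
/// configured limit would be exceeded.
struct MemoryBudget {
    remaining: usize,
}

impl MemoryBudget {
    fn new(limit: usize) -> Self {
        MemoryBudget { remaining: limit }
    }

    fn charge(&mut self, bytes: usize) -> Result<(), &'static str> {
        if bytes > self.remaining {
            Err("decoding exceeded the configured memory limit")
        } else {
            self.remaining -= bytes;
            Ok(())
        }
    }
}
```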

@dbrgn (Contributor, Author) commented Oct 4, 2019

Yeah, a limit of 4K would probably cover most cases and make the amplification much less serious. However, I don't think the spec limits the nesting depth, so if unlimited nesting is possible, an attacker using nested array16 items in a 2 MiB input (1048576 nested arrays) could still cause an allocation of at least 4 GiB (amplification factor 2048). (I think it's even worse in practice, since it's a Vec<Value> and not a Vec<u8>.)
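
Spelling that worst case out, taking the figures above at face value and counting each reserved slot as only one byte:

```rust
// Back-of-the-envelope for the nested-array16 worst case described above.
fn main() {
    let nested_arrays: u64 = 1_048_576;     // nested array16 headers in a ~2 MiB input
    let capped_slots: u64 = 4 * 1024;       // the proposed 4K preallocation cap
    let reserved_bytes = nested_arrays * capped_slots;
    let input_bytes: u64 = 2 * 1024 * 1024; // 2 MiB
    assert_eq!(reserved_bytes / input_bytes, 2048); // amplification factor
    println!("at least {} GiB reserved", reserved_bytes >> 30); // prints 4
}
```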

What if we don't preallocate for arrays at all, and only preallocate (with a limit) for bin/str types? The problem is much less serious for binary data than for arrays, because binary data cannot be nested.

There could also be a nesting limit for arrays, but that would probably need to be user-configurable.
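
A minimal sketch of the bin/str-only preallocation proposed above, with illustrative names and a placeholder Value type rather than the crate's actual internals:

```rust
// Illustrative cap; not necessarily the value that would end up in the crate.
const PREALLOC_CAP: usize = 64 * 1024;

/// Placeholder standing in for rmpv::Value in this sketch.
enum Value {
    Nil,
}

// Arrays can nest, so their claimed length is never trusted for reservation;
// the buffer simply grows as elements are actually decoded.
fn array_buffer(_claimed_len: usize) -> Vec<Value> {
    Vec::new()
}

// bin/str payloads cannot nest, so a capped reservation is safe and keeps the
// common small-payload case fast.
fn bin_buffer(claimed_len: usize) -> Vec<u8> {
    Vec::with_capacity(claimed_len.min(PREALLOC_CAP))
}
```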

@dbrgn (Contributor, Author) commented Oct 4, 2019

And in any case, we are speculating and should probably have some benchmarks.

@kornelski the current rmpv benchmarks are from 2017 and broken. Do we still need them, or can they be disabled or removed for now?

@kornelski (Collaborator) commented:

It'd be nice to fix the benchmarks, so we don't have to speculate :)

Reserving capacity with a limit for non-nested types, and no reservation for nested types, sounds safe and fast enough.

dbrgn mentioned this pull request on Oct 8, 2019
dbrgn force-pushed the 151-no-preallocation branch from eed631a to 02856a8 on October 8, 2019 09:22
@dbrgn (Contributor, Author) commented Oct 8, 2019

Rebased on #222. I added a few benchmarks.

Before the change:

test from_complex_read_value              ... bench:       2,757 ns/iter (+/- 1,079) = 49 MB/s
test from_complex_read_value_ref          ... bench:       2,509 ns/iter (+/- 826) = 54 MB/s
test from_complex_read_value_ref_to_owned ... bench:         432 ns/iter (+/- 171) = 118 MB/s
test from_complex_write_value_ref         ... bench:          86 ns/iter (+/- 7) = 744 MB/s
test from_string_read_value               ... bench:          58 ns/iter (+/- 5)
test from_string_read_value_ref           ... bench:          37 ns/iter (+/- 0)
test read_array_50kib                     ... bench:      11,600 ns/iter (+/- 138) = 22069 MB/s
test read_array_100kib                    ... bench:      27,137 ns/iter (+/- 377) = 18867 MB/s
test read_array_1mib                      ... bench:     568,895 ns/iter (+/- 267,342) = 9215 MB/s
test read_array_20mib                     ... bench:  47,857,478 ns/iter (+/- 1,093,477) = 2191 MB/s
test read_bin32_50kib                     ... bench:       1,887 ns/iter (+/- 39) = 27133 MB/s
test read_bin32_100kib                    ... bench:       3,845 ns/iter (+/- 49) = 26631 MB/s
test read_bin32_1mib                      ... bench:      54,690 ns/iter (+/- 525) = 19173 MB/s
test read_bin32_20mib                     ... bench:   2,587,674 ns/iter (+/- 128,720) = 8104 MB/s
test read_bin32_100mib                    ... bench:  47,660,737 ns/iter (+/- 1,960,405) = 2200 MB/s

With no pre-allocation at all:

test from_complex_read_value              ... bench:       4,089 ns/iter (+/- 161) = 33 MB/s
test from_complex_read_value_ref          ... bench:       3,551 ns/iter (+/- 154) = 38 MB/s
test from_complex_read_value_ref_to_owned ... bench:         560 ns/iter (+/- 255) = 91 MB/s
test from_complex_write_value_ref         ... bench:          84 ns/iter (+/- 21) = 761 MB/s
test from_string_read_value               ... bench:          63 ns/iter (+/- 0)
test from_string_read_value_ref           ... bench:          37 ns/iter (+/- 9)
test read_array_50kib                     ... bench:       7,642 ns/iter (+/- 59) = 33499 MB/s
test read_array_100kib                    ... bench:      16,934 ns/iter (+/- 365) = 30235 MB/s
test read_array_1mib                      ... bench:     363,263 ns/iter (+/- 24,070) = 14432 MB/s
test read_array_20mib                     ... bench:  47,641,955 ns/iter (+/- 1,763,511) = 2200 MB/s
test read_bin32_50kib                     ... bench:       1,393 ns/iter (+/- 42) = 36755 MB/s
test read_bin32_100kib                    ... bench:       2,584 ns/iter (+/- 97) = 39628 MB/s
test read_bin32_1mib                      ... bench:      33,507 ns/iter (+/- 218) = 31294 MB/s
test read_bin32_20mib                     ... bench:  10,209,202 ns/iter (+/- 439,572) = 2054 MB/s
test read_bin32_100mib                    ... bench:  47,599,682 ns/iter (+/- 362,712) = 2202 MB/s

I can't quite explain why, but the 50 KiB, 100 KiB and 1 MiB array cases were actually faster without the preallocation. The 20 MiB array case was the same speed.

The same goes for bin32: the cases up to 1 MiB were faster without preallocation. The 20 MiB bin32 test was roughly four times slower, while the 100 MiB bin32 test was again the same speed.

(times in ns/iter)

                 prealloc  no-prealloc
array  50 KiB      11,600        7,642 (- 34%)
array 100 KiB      27,137       16,934 (- 38%)
array   1 MiB     568,895      363,263 (- 36%)
array  20 MiB  47,857,478   47,641,955 (-  0%)

bin32  50 KiB       1,887        1,393 (- 26%)
bin32 100 KiB       3,845        2,584 (- 33%)
bin32   1 MiB      54,690       33,507 (- 39%)
bin32  20 MiB   2,587,674   10,209,202 (+294%)
bin32 100 MiB  47,660,737   47,599,682 (-  0%)

Can someone explain this?

In any case, it indicates that we're probably doing premature optimization with those preallocations.

@dbrgn (Contributor, Author) commented Oct 8, 2019

When preallocating up to 64 KiB for bin data:

test read_bin32_100kib                    ... bench:       2,347 ns/iter (+/- 403) = 43630 MB/s
test read_bin32_100mib                    ... bench:  48,031,872 ns/iter (+/- 1,750,672) = 2183 MB/s
test read_bin32_1mib                      ... bench:      33,701 ns/iter (+/- 719) = 31114 MB/s
test read_bin32_20mib                     ... bench:   9,528,805 ns/iter (+/- 729,030) = 2200 MB/s
test read_bin32_50kib                     ... bench:       1,102 ns/iter (+/- 50) = 46460 MB/s

Comparison:

(times in ns/iter)

                 prealloc         no-prealloc    prealloc, 64K cap
bin32  50 KiB       1,887       1,393 (- 26%)       1,102 (- 42%)
bin32 100 KiB       3,845       2,584 (- 33%)       2,347 (- 39%)
bin32   1 MiB      54,690      33,507 (- 39%)      33,701 (- 38%)
bin32  20 MiB   2,587,674  10,209,202 (+294%)   9,528,805 (+268%)
bin32 100 MiB  47,660,737  47,599,682 (-  0%)  48,031,872 (+  1%)

Doesn't hurt, but doesn't help a lot either.

dbrgn force-pushed the 151-no-preallocation branch from 8ad31a7 to 1c384a0 on October 8, 2019 16:33
@dbrgn (Contributor, Author) commented Oct 8, 2019

Rebased against master.

kornelski merged commit ab9563d into 3Hren:master on Oct 8, 2019
@kornelski (Collaborator) commented:

Thanks for fixing this

dbrgn deleted the 151-no-preallocation branch on October 9, 2019 07:18