mirror of https://github.com/ultrajson/ultrajson.git synced 2024-05-05 11:56:20 +02:00
Commit Graph

126 Commits

Author SHA1 Message Date
Dimitri Papadopoulos Orfanos b7a4dfda0a Fix typos found by codespell (#610) 2023-10-17 18:45:44 +01:00
William Ayd 0959d18cfa Fix undefined behavior in Buffer_AppendLongUnchecked (#606)
This was flagged when running with UBSAN
2023-10-01 23:41:23 +01:00
Mario Garcia-Armas 1161d5d27d Add checks to test suite and improve comments 2022-10-14 17:52:37 +00:00
Mario Garcia-Armas 6f6e69c119 Added unittest where int overflow causes exception 2022-10-11 19:48:32 +00:00
Mario Garcia-Armas fb96b9037c Fix len integer overflow issue 2022-10-04 23:16:08 +00:00
JustAnotherArchivist 8a946e5830 Add separators encoding parameter
Closes #283
2022-07-11 00:43:29 +01:00
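For reference, the standard library's `json.dumps` takes `separators` as an `(item_separator, key_separator)` tuple. The minimal sketch below shows the effect, using the stdlib as the reference. This log does not say that the `ujson` parameter added here accepts exactly the same tuple; that is an assumption.

```python
import json

data = {"a": 1, "b": [1, 2]}

# Default separators when indent is None: (", ", ": ").
default = json.dumps(data)
# Compact output: no space after "," or ":".
compact = json.dumps(data, separators=(",", ":"))

print(default)  # {"a": 1, "b": [1, 2]}
print(compact)  # {"a":1,"b":[1,2]}
```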
JustAnotherArchivist b21da40ead Fix double free on string decoding if realloc fails 2022-06-27 22:26:31 +00:00
JustAnotherArchivist bc7bdff051 Replace wchar_t string decoding implementation with a uint32_t-based one
This fixes character handling on platforms with 16-bit wchar_t (notably, Windows), which was broken (in different ways) on both CPython and PyPy.

Fixes #552
2022-06-19 23:11:17 +00:00
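The problem with a 16-bit `wchar_t` is that a character outside the Basic Multilingual Plane needs two UTF-16 code units (a surrogate pair), while it fits in a single 32-bit code point. The sketch below shows this with Python's codecs. It illustrates the motivation only and is not the C implementation.

```python
ch = "\U0001F600"  # U+1F600, outside the Basic Multilingual Plane

# UTF-16 (what a 16-bit wchar_t holds) needs two code units: a surrogate pair.
utf16_units = len(ch.encode("utf-16-le")) // 2
# UTF-32 (one uint32_t per character) needs a single code unit.
utf32_units = len(ch.encode("utf-32-le")) // 4

print(utf16_units, utf32_units)      # 2 1
print(ch.encode("utf-16-be").hex())  # d83dde00
```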
JustAnotherArchivist aa068e335f Add support for arbitrary size integers 2022-06-16 17:26:19 +00:00
Hugo van Kemenade b47c3a70b5 Merge pull request #550 from JustAnotherArchivist/fix-decode-surrogates
Fix handling of surrogates on decoding
2022-06-16 11:05:31 +03:00
JustAnotherArchivist e0e5db9a46 Fix handling of surrogates on decoding
This implements surrogate handling on decoding as it is in the standard library. Lone escaped surrogates and any raw surrogates in the input result in surrogates in the output, and escaped surrogate pairs get decoded into non-BMP characters. Note that raw surrogate pairs get treated differently on platforms/compilers with 16-bit `wchar_t`, e.g. Microsoft Windows.
2022-06-09 18:01:22 +00:00
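The stdlib behavior this commit matches can be checked directly. The sketch also shows the standard arithmetic for folding an escaped surrogate pair into one code point. `combine_surrogates` is a hypothetical helper written for illustration, not a function from the ujson sources.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# An escaped surrogate pair decodes to a single non-BMP character...
pair = json.loads('"\\ud83d\\ude00"')
# ...while a lone escaped surrogate is kept as a surrogate in the output.
lone = json.loads('"\\ud800"')

print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600
print(pair == "\U0001F600", lone == "\ud800")   # True True
```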
JustAnotherArchivist 666d159db8 Fix memory leak on encoding errors when the buffer was resized
`JSON_EncodeObject` returns `NULL` when an error occurs, but without freeing the buffer. This leads to a memory leak when the buffer is internally allocated (because the caller's buffer was insufficient or none was provided at all) and any error occurs. Similarly, `objToJSON` did not clean up the buffer in all error conditions either.

This adds the missing buffer free in `JSON_EncodeObject` (iff the buffer was allocated internally) and refactors the error handling in `objToJSON` slightly to also free the buffer when a Python exception occurred without the encoder's `errorMsg` being set.
2022-06-04 19:32:56 +00:00
Hugo van Kemenade f71d7c28ad Merge pull request #544 from NaN-git/main
Integer parsing: always detect overflows
2022-06-01 22:56:20 +03:00
JustAnotherArchivist 9b9af1ab70 Fix handling of surrogates on encoding
This allows surrogates anywhere in the input, compatible with the json module from the standard library.

This also refactors two interfaces:
- The `PyUnicode` to `char*` conversion is moved into its own function, separated from the `JSONTypeContext` handling, so it can be reused for other things in the future (e.g. indentation and separators) which don't have a type context.
- Converting the `char*` output to a Python string with surrogates intact requires the string length for `PyUnicode_Decode` & Co. While `strlen` could be used, the length is already known inside the encoder, so the encoder function now also takes an extra `size_t` pointer argument to return that and no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if an object's `__json__` method return value were to contain them.

Fixes #156
Fixes #447
Fixes #537
Supersedes #284
2022-05-30 01:58:12 +00:00
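The sketch below shows the stdlib behavior the encoder is meant to match, assuming Python 3. By default, lone surrogates are escaped as `\uXXXX`; with `ensure_ascii=False` they pass through unchanged. Strict UTF-8 cannot encode such a string, so the round trip uses Python's `surrogatepass` error handler. Treating this handler as equivalent to the C-level `PyUnicode_Decode` call described above is an assumption.

```python
import json

s = "a\ud800b"  # contains a lone surrogate

# Escaped as \ud800 with the default ensure_ascii=True...
escaped = json.dumps(s)
# ...and passed through unchanged with ensure_ascii=False.
raw = json.dumps(s, ensure_ascii=False)

# Strict UTF-8 rejects lone surrogates; "surrogatepass" encodes them anyway,
# and decoding the known-length byte string restores the original text.
encoded = raw.encode("utf-8", "surrogatepass")
roundtrip = encoded.decode("utf-8", "surrogatepass")

print(escaped)           # "a\ud800b"
print(roundtrip == raw)  # True
```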
Philipp Otterbein 0a0e111701 fix typo: LLONG_MIN 2022-05-29 19:24:07 +02:00
Philipp Otterbein 9c42263c80 fix integer decoding: always detect overflows 2022-05-28 17:01:30 +02:00
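One common way to detect overflow while accumulating decimal digits is to compare against the limit before multiplying. C requires this, because signed overflow there is undefined behavior. The sketch below emulates signed 64-bit limits in Python, under these assumptions:
- `parse_int64` and `parse_json_int` are hypothetical names.
- The input is an already-tokenized JSON integer.
- The fallback to an unbounded `int` only echoes the later "Add support for arbitrary size integers" commit; it does not reproduce ujson's code.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def parse_int64(text: str) -> int:
    """Parse an optionally signed decimal integer within signed 64-bit range.

    Raises OverflowError rather than wrapping around. Does not validate
    the full JSON number grammar (e.g. leading zeros).
    """
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError(f"not an integer: {text!r}")
    # The negative range is one larger: |INT64_MIN| == INT64_MAX + 1.
    limit = -INT64_MIN if negative else INT64_MAX
    value = 0
    for c in digits:
        d = ord(c) - ord("0")
        # value * 10 + d <= limit  <=>  value <= (limit - d) // 10
        if value > (limit - d) // 10:
            raise OverflowError(f"integer out of 64-bit range: {text}")
        value = value * 10 + d
    return -value if negative else value


def parse_json_int(text: str) -> int:
    """Try the 64-bit fast path, then fall back to an unbounded Python int."""
    try:
        return parse_int64(text)
    except OverflowError:
        return int(text)


print(parse_int64("-9223372036854775808"))    # -9223372036854775808
print(parse_json_int("9223372036854775808"))  # 9223372036854775808
```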
JustAnotherArchivist 66060a0fab Add and fix debug memory checks for all buffer appends on encoding
In DEBUG mode, this ensures that all buffer appends are safe.
It also refactors direct `memcpy` calls into a helper `Buffer_memcpy` function that ensures correct buffer pointer movement and has a similar safety check.
2022-04-24 13:58:00 -07:00
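Here is the idea behind a `Buffer_memcpy` helper, sketched in Python. A single function copies bytes at the current write position and advances that position. In debug mode, it refuses any append that an earlier reservation did not cover. The `Buffer` class and its `reserve`/`memcpy` methods are hypothetical and only model the C behavior.

```python
class Buffer:
    """Hypothetical model of an encoder output buffer with a debug check."""

    def __init__(self, capacity: int, debug: bool = True) -> None:
        self.data = bytearray(capacity)  # allocated storage
        self.offset = 0                  # current write position
        self.debug = debug

    def reserve(self, length: int) -> None:
        """Ensure at least `length` more bytes fit after the write position."""
        needed = self.offset + length
        if needed > len(self.data):
            self.data.extend(bytes(needed - len(self.data)))

    def memcpy(self, src: bytes) -> None:
        """Copy `src` at the write position, then advance the position."""
        end = self.offset + len(src)
        if self.debug and end > len(self.data):
            # In C this would be a write past the end of the allocation.
            raise RuntimeError("append exceeds reserved space")
        # Unlike C, a bytearray slice assignment would silently grow here.
        self.data[self.offset:end] = src
        self.offset = end

    def getvalue(self) -> bytes:
        return bytes(self.data[: self.offset])


buf = Buffer(4)
buf.reserve(5)         # reserve before appending
buf.memcpy(b"hello")
print(buf.getvalue())  # b'hello'
```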
JustAnotherArchivist f4d2c87ab6 Refactor buffer reservations to ensure sufficient space on all additions
* Removed the reservations in Buffer_EscapeStringUnvalidated and Buffer_EscapeStringValidated as those are not needed and may hide other bugs.
* Debug check in Buffer_EscapeStringValidated was triggering incorrectly.
* The reservation on JT_RAW was much larger than necessary; the value is copied directly, so the factor six is not needed, and this may hide other bugs.
* Explicit accurate reservations everywhere else.
2022-04-05 21:04:39 +01:00
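The "factor six" comes from escaping. With ASCII-only output, a single input byte such as the control character `0x01` can expand to the six bytes `\u0001`. A raw value copied verbatim needs only its own length. The sketch below computes both reservation sizes and uses the stdlib encoder to confirm the worst case. The helper names are hypothetical, and the bound assumes `\uXXXX`-style escaping.

```python
import json


def reserve_for_escaped(utf8_len: int) -> int:
    """Worst-case bytes for a quoted string with ASCII-only escaping."""
    # Any single input byte expands to at most 6 output bytes
    # (e.g. 0x01 -> \u0001), plus 2 bytes for the surrounding quotes.
    return 6 * utf8_len + 2


def reserve_for_raw(raw_len: int) -> int:
    """A raw value is copied verbatim, so its own length is enough."""
    return raw_len


s = "\x01" * 10
out = json.dumps(s)  # each control character is escaped as \u0001
print(len(out), reserve_for_escaped(len(s.encode("utf-8"))))  # 62 62
```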
Brénainn Woodsend 5875168c41 Fix some more seg-faults on encoding. 2022-04-05 21:04:39 +01:00
Brénainn Woodsend 1a39406b3a Remove the hidden JSON_NO_EXTRA_WHITESPACE compile knob.
Unsetting it can lead to seg-faults. I don't think it's worth having to fix and
then test this undocumented permutation.
2022-04-05 21:04:39 +01:00
Brénainn Woodsend 61dd6f19e8 Fix unchecked buffer overflows (CVE-2021-45958).
Add a few extra memory reserve calls to account for the extra space that
indentation needs.

These kinds of memory issues are hard to spot because the buffer is resized in
powers of 2 meaning that a miscalculation would only show any symptoms if the
required buffer size is estimated to be just below a 2 power but is actually
just above. Add a debug mode which replaces the 2 power scheme with reserving
only the memory explicitly requested and adds some overflow checks.
2022-04-05 21:04:39 +01:00
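A toy capacity model reproduces the masking effect described above. `grow` and `write_fits` are hypothetical helpers. The non-debug branch doubles the capacity like the power-of-two scheme, and the debug branch reserves exactly what was requested.

```python
def grow(capacity: int, needed: int, debug: bool) -> int:
    """Capacity after reserving `needed` total bytes (capacity must be > 0)."""
    if needed <= capacity:
        return capacity
    if debug:
        return needed      # debug mode: exactly what was requested
    while capacity < needed:
        capacity *= 2      # normal mode: double until it fits
    return capacity


def write_fits(estimate: int, actual: int, start: int, debug: bool) -> bool:
    """Reserve `estimate` bytes, then check whether `actual` bytes fit."""
    return actual <= grow(start, estimate, debug)


# Underestimate by 10 bytes: doubling 64 -> 128 hides the bug...
print(write_fits(100, 110, 64, debug=False))  # True
# ...while exact reservation exposes it.
print(write_fits(100, 110, 64, debug=True))   # False
# With doubling, a symptom appears only when the estimate sits just below
# a power of two and the real size just above it.
print(write_fits(128, 130, 64, debug=False))  # False
```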
joncrall 13aa30e152 Fix nan bug in pandas port 2022-04-04 13:56:19 -04:00
joncrall f090103b31 NaN and Inf in loads - Port of Pandas #30295 2022-04-04 13:56:11 -04:00
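`NaN`, `Infinity` and `-Infinity` are not valid JSON, but the standard library accepts and emits them by default. The short sketch below shows that reference behavior. The log does not state which exact tokens `ujson.loads` accepts after this port, and the sketch does not assume it.

```python
import json
import math

values = json.loads("[NaN, Infinity, -Infinity]")
print(math.isnan(values[0]), values[1] == math.inf, values[2] == -math.inf)  # True True True

# The stdlib also emits these tokens by default (allow_nan=True)...
print(json.dumps([math.nan, math.inf, -math.inf]))  # [NaN, Infinity, -Infinity]

# ...but rejects them when strict JSON is requested.
try:
    json.dumps(math.nan, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)
```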
JustAnotherArchivist f9aa23b5e6 Remove dead code that used to handle the separate int type in Python 2 2022-02-20 10:59:11 +00:00
JustAnotherArchivist 7f269a4818 Clean up iterators, type contexts, and recursion level on errors 2022-02-16 08:17:47 +00:00
JustAnotherArchivist 4bd21e2483 Fix exceptions on encoding list or dict elements and non-overflow errors on int handling getting silenced
Fixes #273
2022-02-16 08:17:47 +00:00
Dr. Nick e00caaebd5 dconv no longer uses global instances of StringToDoubleConverter/DoubleToStringConverter 2021-08-03 10:17:10 -04:00
Hugo van Kemenade f2d79b89c4 Remove unused variable 2021-04-07 10:23:26 +03:00
Filip Salomonsson 7a8a614017 Fix typos in error message 2021-02-03 17:32:42 +01:00
David W.H. Swenson 6013e71381 Merge remote-tracking branch 'upstream/master' into fix_large_floats 2020-11-15 19:56:19 +01:00
Hugo van Kemenade 13e2ac7eea Merge pull request #443 from dwhswenson/match_python_exponents 2020-11-15 19:25:37 +02:00
David W.H. Swenson 954a9a0a00 cleanup 2020-11-11 16:54:57 +01:00
David W.H. Swenson a48f8b22f1 Set same bounds as std lib for negative exponent 2020-11-11 16:41:34 +01:00
David W.H. Swenson b773bf05dc Fix errors on reading long decimal floats 2020-11-11 14:51:27 +01:00
David W.H. Swenson af699c3cd0 Match Python json output for exponents 2020-11-11 14:41:51 +01:00
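In practice, matching Python's exponent output means producing the strings that `float.__repr__` gives, which is what CPython's `json` encoder uses for floats. The expected strings below come from the stdlib. They are a reference point, not a claim about ujson's output for every float.

```python
import json

# CPython's json encoder formats floats with float.__repr__.
for x in (1e16, 1e-7, 1.5e300, 0.1):
    print(json.dumps(x))
# 1e+16
# 1e-07
# 1.5e+300
# 0.1
```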
Sam Sneddon f4029cc6ef Fix #429: Make empty dict/list indented serialization match stdlib json
Previously, we'd output a couple of new lines between the start and end
of the object, whereas the stdlib doesn't bother with whitespace if
they're empty.

In my testing, the only difference in indented serialization now is
float representation.
2020-11-10 10:57:15 +00:00
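The sketch below shows the stdlib behavior this commit references. With `indent` set, empty containers are written as `{}` and `[]` with no inner newlines, while non-empty ones are broken across lines.

```python
import json

print(json.dumps({}, indent=4))  # {}
print(json.dumps([], indent=4))  # []
print(json.dumps({"a": [], "b": {}}, indent=2))
# {
#   "a": [],
#   "b": {}
# }
```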
Hugo 4ae63bee5c Lint end-of-file-fixer 2020-05-12 09:36:47 +03:00
Hugo 5f1e8479fa Lint trailing-whitespace 2020-05-12 09:21:45 +03:00
Eric Le Lay e0c113e6a2 Merge branch 'master' into 264-reject_bytes 2020-05-08 17:34:35 +02:00
Hugo van Kemenade f953a0978a Update comment 2020-05-04 09:23:39 +03:00
Hugo van Kemenade d9ca1c9b5b Merge branch 'master' into add_nan_support 2020-03-27 21:41:33 +02:00
Hugo 61453ad7fd Fix typo 2020-03-08 00:17:27 +02:00
Hugo c810a5b8a6 Also define LIKELY/UNLIKELY for _WIN32 2020-03-08 00:17:27 +02:00
Hugo 75695ba61e Indent ifdefs 2020-03-08 00:17:27 +02:00
Natanael Copa 0f52df8f9b Reduce default buffer on stack size
Fix segfaults on musl libc when ultrajson runs in a thread. On musl libc
the default thread stack size is only 80k so allocating a 128k buffer on
stack will guarantee a crash. There seems not to be any evident
performance benefit using big buffer on stack either so we just reduce
the default.

fixes #254
2020-03-02 23:45:56 +02:00
Hugo f0b428ea37 Merge branch 'master' into 50-object-trailing-comma 2020-03-01 23:54:09 +02:00
Eric Le Lay b69b37f6d0 fix typo in doc (2)
Co-Authored-By: Hugo van Kemenade <hugovk@users.noreply.github.com>
2020-03-01 15:53:43 +01:00
Hugo van Kemenade 631850788d Merge branch 'master' into add_nan_support 2020-02-25 22:34:37 +02:00
Hugo van Kemenade 1588690257 Merge branch 'master' into 264-reject_bytes 2020-02-25 22:28:14 +02:00
Hugo d53480c332 http -> https 2020-02-18 21:57:13 +02:00