* Use PyDict_Next() to iterate over dicts (see the sketch after this list).
* Use macros to access lists, tuples and bytes.
* Avoid calling PyErr_Occurred() if not necessary.
* Fix a memory leak when encoding very large ints.
* Delete dead and duplicate code.
Also,
* Raise TypeError if toDict() returns a non-dict instead of silently
converting it to null.
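As a sketch of the first two points (the function names here are illustrative, not ujson's actual code): `PyDict_Next` iterates a dict without materializing an items list and hands out borrowed references, and the `PyList_GET_ITEM`-style macros skip the type and bounds checks their function counterparts perform.

```c
#include <Python.h>

/* Iterate a dict with PyDict_Next: no items list is created, and the
 * key/value pointers are borrowed references (no DECREF needed). */
static int encode_dict(PyObject *dict)
{
    PyObject *key, *value;
    Py_ssize_t pos = 0;

    while (PyDict_Next(dict, &pos, &key, &value)) {
        /* encode key and value here ... */
    }
    return 0;
}

/* Access sequence items via macros; the caller has already verified
 * the type and the index, so the unchecked forms are safe and fast. */
static PyObject *nth_item(PyObject *list, Py_ssize_t i)
{
    return PyList_GET_ITEM(list, i);  /* borrowed reference */
}
```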
Infinity was being encoded as 'Inf'. While the JSON spec doesn't include
any non-finite floats, this differs from the convention of other JSON
libraries and of JavaScript, which use 'Infinity'. It also differs from what
`ujson.loads()` expects, so that `ujson.loads(ujson.dumps(math.inf))` raises
an exception.
Closes #80.
There are small typos in:
- python/objToJSON.c
- tests/test_ujson.py
Fixes:
- Should read `standard` rather than `stanard`.
- Should read `gibberish` rather than `jibberish`.
Signed-off-by: Tim Gates <tim.gates@iress.com>
`JSON_EncodeObject` returns `NULL` when an error occurs, but without freeing the buffer. This leads to a memory leak when the buffer is internally allocated (because the caller's buffer was insufficient or none was provided at all) and any error occurs. Similarly, `objToJSON` did not clean up the buffer in all error conditions either.
This adds the missing buffer free in `JSON_EncodeObject` (iff the buffer was allocated internally) and refactors the error handling in `objToJSON` slightly to also free the buffer when a Python exception occurred without the encoder's `errorMsg` being set.
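A minimal sketch of the cleanup pattern, with an illustrative encoder struct (the real ujson encoder has more fields and its own allocator hooks):

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative encoder state; the real ujson struct differs. */
typedef struct {
    char *start;          /* beginning of the output buffer */
    const char *errorMsg; /* set when encoding failed */
} Encoder;

/* Error path of an encode function: free the buffer iff it was
 * allocated internally (the caller's buffer was too small or NULL). */
static char *fail(Encoder *enc, char *callerBuffer)
{
    if (enc->start != callerBuffer) {
        free(enc->start);  /* internal allocation: must not leak */
        enc->start = NULL;
    }
    return NULL;           /* signal the error to the caller */
}
```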
This allows surrogates anywhere in the input, compatible with the json module from the standard library.
This also refactors two interfaces:
- The `PyUnicode` to `char*` conversion is moved into its own function, separated from the `JSONTypeContext` handling, so it can be reused for other things in the future (e.g. indentation and separators) which don't have a type context.
- Converting the `char*` output to a Python string with surrogates intact requires the string length for `PyUnicode_Decode` & Co. While `strlen` could be used, the length is already known inside the encoder, so the encoder function now also takes an extra `size_t` pointer argument to return that and no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if an object's `__json__` method return value were to contain them.
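A sketch of the consumer side, assuming a hypothetical `encode` entry point that reports the length through the `size_t` out-parameter; `surrogatepass` is one way to keep surrogates intact (buffer ownership is elided here):

```c
#include <Python.h>

/* Hypothetical declaration of the encoder entry point described above:
 * returns a buffer (not necessarily NUL-terminated) and stores the
 * output length in *len. */
char *encode(PyObject *obj, size_t *len);

static PyObject *to_str(PyObject *obj)
{
    size_t len;
    char *output = encode(obj, &len);

    if (output == NULL) {
        return NULL;  /* encoder set a Python exception */
    }
    /* Decode with an explicit length: the buffer may even contain NUL
     * bytes, and "surrogatepass" keeps lone surrogates intact. */
    return PyUnicode_Decode(output, (Py_ssize_t)len, "utf-8",
                            "surrogatepass");
}
```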
Fixes #156
Fixes #447
Fixes #537
Supersedes #284
Errors during `__repr__` itself, as well as during the conversion of its result to a bytes object, were not handled, resulting in a NULL pointer dereference.
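A sketch of the missing checks; both calls can return NULL and must be checked before the result is used:

```c
#include <Python.h>

/* Fall back to repr(obj), propagating errors instead of dereferencing
 * a NULL result. */
static PyObject *repr_as_bytes(PyObject *obj)
{
    PyObject *repr = PyObject_Repr(obj);
    if (repr == NULL) {
        return NULL;  /* __repr__ itself raised */
    }

    PyObject *bytes = PyUnicode_AsEncodedString(repr, "utf-8", "strict");
    Py_DECREF(repr);
    return bytes;     /* NULL if the conversion to bytes failed */
}
```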
Cf. #382
For bytes keys, there was an extraneous INCREF even though PyIter_Next returns a new reference. For other non-string keys, the original itemName was never released after being converted to a string.
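A sketch of the corrected reference handling (the helper is illustrative):

```c
#include <Python.h>

/* PyIter_Next returns a NEW reference, so no extra INCREF is needed;
 * when a non-string key is converted, the original must be released. */
static PyObject *next_key_as_string(PyObject *iterator)
{
    PyObject *itemName = PyIter_Next(iterator);  /* new reference */
    if (itemName == NULL) {
        return NULL;  /* exhausted, or an error is set */
    }

    if (!PyUnicode_Check(itemName)) {
        PyObject *str = PyObject_Str(itemName);
        Py_DECREF(itemName);  /* release the original non-string key */
        itemName = str;       /* may be NULL if str() failed */
    }
    return itemName;
}
```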
Fixes #419
The dump and dumps functions in Python's json stdlib have a default keyword argument.
It's useful for serializing complex objects, and supporting this argument will improve the compatibility and flexibility of ujson.
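A hypothetical sketch of what honoring `default` looks like on the C side; `apply_default` and `default_fn` are assumed names, not ujson's implementation:

```c
#include <Python.h>

/* Apply the user-supplied `default` callable to an otherwise
 * unserializable object; the caller encodes the returned object
 * (a new reference) and then DECREFs it. */
static PyObject *apply_default(PyObject *default_fn, PyObject *obj)
{
    if (default_fn == NULL || default_fn == Py_None) {
        PyErr_Format(PyExc_TypeError, "%R is not JSON serializable", obj);
        return NULL;
    }
    return PyObject_CallFunctionObjArgs(default_fn, obj, NULL);
}
```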
Add a function to check if an object is of type `decimal.Decimal`.
Since that type was previously cached as a static variable, this commit
makes it a member of the module state instead. Add the associated module
state machinery.
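A minimal sketch of that machinery, with illustrative names (the real module definition has more slots and more state):

```c
#include <Python.h>

/* Illustrative module state: the cached decimal.Decimal type lives
 * here instead of in a static variable. */
typedef struct {
    PyObject *type_decimal;
} modulestate;

#define modulestate(m) ((modulestate *)PyModule_GetState(m))

/* Type check against the cached type; returns -1 on error. */
static int object_is_decimal_type(PyObject *module, PyObject *obj)
{
    return PyObject_IsInstance(obj, modulestate(module)->type_decimal);
}

/* GC support for the state, required so the cached type can be
 * traversed and cleared with the module. */
static int module_traverse(PyObject *m, visitproc visit, void *arg)
{
    Py_VISIT(modulestate(m)->type_decimal);
    return 0;
}

static int module_clear(PyObject *m)
{
    Py_CLEAR(modulestate(m)->type_decimal);
    return 0;
}
```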
Only enable the compact ASCII shortcut when not building against the Limited API.
Also check if the module exists before creating it anew in the init
function.
Also remove an unnecessary and leaky Py_INCREF; PyObject_GetAttrString
returns a new reference.
See PEP 384 (Defining a Stable ABI):
https://www.python.org/dev/peps/pep-0384/ and PEP 3121 (Extension Module
Initialization and Finalization):
https://www.python.org/dev/peps/pep-3121/
The new reference returned by PyObject_GetItem was never released,
which meant a memory leak. PyObject_GetItem returns a new reference
(and goes through the abstract object[key] API), whereas PyDict_GetItem
returns a borrowed reference and goes directly to the dict hash lookup.
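The difference in a sketch:

```c
#include <Python.h>

static void contrast(PyObject *dict, PyObject *key)
{
    /* PyObject_GetItem: abstract object[key] protocol, NEW reference.
     * Forgetting the DECREF is exactly the leak described above. */
    PyObject *owned = PyObject_GetItem(dict, key);
    if (owned != NULL) {
        /* ... use owned ... */
        Py_DECREF(owned);
    }   /* else: an exception is set and must be handled */

    /* PyDict_GetItem: direct hash lookup, BORROWED reference, and no
     * exception is set when the key is missing. Do not DECREF. */
    PyObject *borrowed = PyDict_GetItem(dict, key);
    if (borrowed != NULL) {
        /* ... use borrowed while the dict stays alive ... */
    }
}
```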
With this, ujson matches the builtin json behavior for NaN and Inf.
If a user wants to retain the old behavior, they can pass allow_nan=False
to ensure strict JSON compatibility.
To fix issues with floating-point precision, we've made use of Google's
double-conversion lib to handle conversions of doubles to and from strings.
In addition to fixing our precision problems, this will improve double
encoding by 4-5x. Decoding is slightly slower according to the
benchmarks, but at least it is accurate.
This change removes the double_precision encoding option and the
precise_float decoding option.
To better align with the standard json module, this removes ujson's
default serialization of date/datetime objects to Unix timestamps.
Trying to serialize such an object will now raise a TypeError: "repr(obj)
is not JSON serializable".
The behavior of ujson has always been to try to serialize all objects in
any way possible. This has been quite a deviation from other json
libraries, including Python's standard json module, and the source of a
lot of confusion and bugs. Removing this quirk moves ultrajson closer to
the expected behavior.
Instead of trying to coerce serialization, ultrajson now raises a
TypeError: "repr(obj) is not JSON serializable".
Previously a None dict item key was output in JSON as "None".
To better align with the standard json module, this was changed to output
"null". There's no proper representation of null object keys in JSON, so
this is implementation-specific, but it seems more natural to follow
suit when it can be done without a significant performance hit.
Added and used branch prediction macros (LIKELY/UNLIKELY) as well.
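The macros typically wrap compiler branch hints; a common definition, assuming GCC/Clang's `__builtin_expect` with a no-op fallback elsewhere:

```c
/* Branch prediction hints: tell the compiler which outcome is the
 * common case so it can lay out the fast path without jumps. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Example usage: the error path is the rare case.
 *   if (UNLIKELY(buffer == NULL)) return NULL;
 */
```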
This was caused by checking for "__json__" using PyObject_HasAttrString,
which clears the error set by a previous long overflow. Thus the bug
depended on the order in which dict items were processed, which explains
why it seemed random: the dict items are likely ordered by a hash of
the key.
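A sketch of the failure mode (the overflow trigger is illustrative):

```c
#include <Python.h>

static void demonstrate(PyObject *obj)
{
    /* A value too large for long long leaves OverflowError pending. */
    long long v = PyLong_AsLongLong(obj);
    (void)v;

    /* PyObject_HasAttrString performs a getattr and clears any error
     * raised along the way, wiping the already-pending OverflowError. */
    int has = PyObject_HasAttrString(obj, "__json__");
    (void)has;

    /* Too late: PyErr_Occurred() is now NULL and the overflow is
     * silently lost. The fix is to check for a pending error before
     * the attribute lookup. */
}
```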
This fixes GH224 and GH240.