
Python json.load: Missing Array Hook and Parse Callbacks?

Why does json.loads only allow object_hook, parse_float, parse_int and parse_constant? json.loads(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, ob

Solution 1:

json.loads hands the input string to a scanner that is built from the decoder's parse_* attributes, so every one of these behaviors can be hooked by rebuilding the scanner in a customized decoder class.

import json
from json.scanner import py_make_scanner
from json.decoder import JSONArray

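class CustomizedDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        def parse_array(*_args, **_kwargs):
            # Let the stdlib parse the array, then inspect (or transform) the result
            values, end = JSONArray(*_args, **_kwargs)
            for item in values:
                print(item)  # Is this what you want? Replace with your own handling.
            return values, end

        self.parse_array = parse_array
        # The default (C) scanner ignores parse_array, so build a pure-Python one
        self.scan_once = py_make_scanner(self)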

json.loads('{"a": [1, [2, [3.1], ["4"]]]}', cls=CustomizedDecoder)

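This prints (innermost arrays come first, because each array is fully parsed before the enclosing one finishes):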

3.1
4
2
[3.1]
['4']
1
[2, [3.1], ['4']]
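Replacing the print with a transformation gives something close to the missing array hook. The sketch below adds a hypothetical array_hook keyword (it is not part of the stdlib json API) that is called on every finished list, mirroring object_hook; passing tuple turns all arrays into tuples. It relies on JSONArray and py_make_scanner, which are undocumented internals of the json package and may change between Python versions. Rebuilding the scanner matters because the default C-accelerated scanner only reads the public hooks (object_hook, parse_float, and so on) and ignores parse_array; the pure-Python scanner is slower, which is the performance cost of this approach.

```python
import json
from json.decoder import JSONArray
from json.scanner import py_make_scanner


class ArrayHookDecoder(json.JSONDecoder):
    """Decoder with a hypothetical ``array_hook`` keyword, mirroring object_hook."""

    def __init__(self, *args, array_hook=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.array_hook = array_hook

        def parse_array(s_and_end, scan_once, *rest):
            # Let the stdlib build the list (nested arrays come back through
            # scan_once, so they are hooked before their parent is)
            values, end = JSONArray(s_and_end, scan_once, *rest)
            if self.array_hook is not None:
                values = self.array_hook(values)
            return values, end

        self.parse_array = parse_array
        # Pure-Python scanner, so that our parse_array is actually used
        self.scan_once = py_make_scanner(self)


# json.loads forwards unknown keyword arguments to the cls constructor
data = json.loads('{"a": [1, [2, [3.1], ["4"]]]}', cls=ArrayHookDecoder, array_hook=tuple)
print(data)  # {'a': (1, (2, (3.1,), ('4',)))}
```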

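The other parsing functions can be hooked in exactly the same way: assign your own callable to the corresponding attribute, then rebuild the scanner. For reference, these are the attributes JSONDecoder.__init__ sets: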

self.object_hook = object_hook
self.parse_float = parse_float or float
self.parse_int = parse_int or int
self.parse_constant = parse_constant or _CONSTANTS.__getitem__
self.object_pairs_hook = object_pairs_hook
self.parse_object = JSONObject
self.parse_array = JSONArray
self.parse_string = scanstring
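As an illustration, parse_string can be swapped out the same way. This sketch (StripStringsDecoder is a hypothetical name) wraps the stdlib scanstring to strip surrounding whitespace from strings. One caveat, based on how the pure-Python scanner is written: object keys are read by JSONObject, which calls scanstring directly, so this hook only sees string values, not keys.

```python
import json
from json.decoder import scanstring
from json.scanner import py_make_scanner


class StripStringsDecoder(json.JSONDecoder):
    """Hypothetical example: strip surrounding whitespace from string values."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        def parse_string(s, end, strict=True):
            # Delegate to the stdlib string scanner, then post-process the result
            value, end = scanstring(s, end, strict)
            return value.strip(), end

        self.parse_string = parse_string
        # Rebuild the scanner so it picks up the new parse_string attribute
        self.scan_once = py_make_scanner(self)


# Keys bypass the hook (JSONObject calls scanstring itself); values are stripped
print(json.loads('{" key ": [" a ", "b "]}', cls=StripStringsDecoder))
```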

Solution 2:

If you can accept somewhat slower parsing, you can use the ruamel.yaml parser for this (disclaimer: I am the author of that package). As YAML 1.2 is, for all practical purposes, a superset of JSON, you can subclass the Constructor:

import sys
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor

json_str = '{"a": [1, [2.0, true, [3, null]]]}'


class MyConstructor(SafeConstructor):
    def construct_yaml_null(self, node):
        print('null')
        data = SafeConstructor.construct_yaml_null(self, node)
        return data

    def construct_yaml_bool(self, node):
        print('bool')
        data = SafeConstructor.construct_yaml_bool(self, node)
        return data

    def construct_yaml_int(self, node):
        print('int')
        data = SafeConstructor.construct_yaml_int(self, node)
        return data

    def construct_yaml_float(self, node):
        print('float')
        data = SafeConstructor.construct_yaml_float(self, node)
        return data

    def construct_yaml_str(self, node):
        print('str')
        data = SafeConstructor.construct_yaml_str(self, node)
        return data

    def construct_yaml_seq(self, node):
        print('seq')
        # The base method is a generator (create the list, then fill it);
        # exhaust it so the list is complete before returning it
        for data in SafeConstructor.construct_yaml_seq(self, node):
            pass
        return data

    def construct_yaml_map(self, node):
        print('map')
        # Same two-step generator as for sequences
        for data in SafeConstructor.construct_yaml_map(self, node):
            pass
        return data


MyConstructor.add_constructor(
    u'tag:yaml.org,2002:null',
    MyConstructor.construct_yaml_null)

MyConstructor.add_constructor(
    u'tag:yaml.org,2002:bool',
    MyConstructor.construct_yaml_bool)

MyConstructor.add_constructor(
    u'tag:yaml.org,2002:int',
    MyConstructor.construct_yaml_int)

MyConstructor.add_constructor(
    u'tag:yaml.org,2002:float',
    MyConstructor.construct_yaml_float)

MyConstructor.add_constructor(
    u'tag:yaml.org,2002:str',
    MyConstructor.construct_yaml_str)

MyConstructor.add_constructor(
    u'tag:yaml.org,2002:seq',
    MyConstructor.construct_yaml_seq)

MyConstructor.add_constructor(
    u'tag:yaml.org,2002:map',
    MyConstructor.construct_yaml_map)


yaml = YAML(typ='safe')
yaml.Constructor = MyConstructor

data = yaml.load(json_str)
print(data)

Just replace the body of each construct_yaml_XYZ method with code that creates the objects you want, and return those.

The "funny business" with the for loop in the mapping/dict and sequence/list constructors unwraps the two-step process of creating these objects (first an empty container, then its contents). That two-step process is necessary for "real" YAML input, where anchors/aliases can produce recursive data structures.

The above outputs:

map
str
seq
int
seq
float
bool
seq
int
null
{'a': [1, [2.0, True, [3, None]]]}

You can also hook into the YAML parser at a lower level, but that doesn't make the implementation easier and is probably only marginally faster.

Solution 3:

Martijn Pieters' comment had the correct approach:

Performance. And the hooks are not meant to allow for a wholesale replacement of the JSON format.

I was going in the wrong direction by attempting to use hooks to parse JSON. They exist to augment parsing, not to replace it.
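In that spirit, the documented hooks handle many augmentation tasks without touching the scanner. A small sketch: parse_float=Decimal keeps exact decimal values, and an object_pairs_hook (which receives every object's key/value pairs in order) rejects duplicate keys. reject_duplicate_keys is a hypothetical helper name, not part of the json module.

```python
import json
from decimal import Decimal


def reject_duplicate_keys(pairs):
    """object_pairs_hook: build a dict, but fail on repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


# parse_float receives the number's original text, e.g. '1.10'
data = json.loads('{"price": 1.10, "qty": 3}',
                  parse_float=Decimal,
                  object_pairs_hook=reject_duplicate_keys)
print(data)  # {'price': Decimal('1.10'), 'qty': 3}

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print(exc)  # duplicate key: 'a'
```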
