Relaxed JSON parsing in Go
In the past, I have encountered situations where I needed a relaxed approach to parsing JSON documents in Go, because the server side occasionally sent unexpected data that was beyond my control.
Case 1: API inconsistently returns an empty array instead of null
When I was implementing a RabbitMQ client as part of my rabtap RabbitMQ tool, I stumbled over the problem that the RabbitMQ management API returned an empty array ([]) where an object was expected. When trying to unmarshal the JSON into a struct such as
type RabbitMQChannel struct {
	ConnectionDetails ConnectionDetails `json:"connection_details"`
	// more attributes of RabbitMQChannel ...
}

type ConnectionDetails struct {
	// attributes of ConnectionDetails ...
}
and RabbitMQ returns something like
{
  "connection_details": []
}
the json.Unmarshal function fails with an error like "cannot unmarshal array into Go struct field", because the ConnectionDetails attribute, which is a struct (a JSON object), cannot be deserialized from an array. Since I could not change the server's response, I resolved this problem on the client side by implementing a custom unmarshaler that checks whether the value to be unmarshaled is an (empty) array or an object:
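import (
	"bytes"
	"encoding/json"
)

func (d *ConnectionDetails) UnmarshalJSON(data []byte) error {
	// alias ConnectionDetails to avoid recursion when calling Unmarshal:
	// Alias has the same fields, but not the UnmarshalJSON method
	type Alias ConnectionDetails
	aux := struct {
		*Alias
	}{
		Alias: (*Alias)(d),
	}
	return unmarshalEmptyArrayOrObject(data, &aux)
}

// unmarshalEmptyArrayOrObject leaves v untouched if data is a JSON array and
// unmarshals data into v otherwise. Only the first non-whitespace byte is
// inspected, so non-empty arrays are ignored as well.
func unmarshalEmptyArrayOrObject(data []byte, v any) error {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) > 0 && trimmed[0] == '[' {
		// JSON array detected, keep the zero value
		return nil
	}
	return json.Unmarshal(data, v)
}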
The custom UnmarshalJSON above accepts both an object and an array (only the leading [ is checked; the contents of the array are never inspected). When an array is detected, unmarshaling stops early and the value is left untouched, i.e. it keeps its zero value when decoding into a fresh struct. Otherwise a JSON object is expected and unmarshaled. This makes our code flexible enough to accept the inconsistent data sometimes returned by the RabbitMQ API.
A working example can be found on the Go Playground.
Case 2: Skip unparsable array elements
Recently I worked on some code that parsed large JSON documents, basically consisting of arrays with thousands of elements. The schema of the JSON was not formally typed on the server side, but on the client side, using Go structs with JSON annotations. That worked as long as the server returned the data in the expected format, e.g. a price attribute of type int and not string. Since the data returned by the server was partially stored schemaless (as JSON data in a PostgreSQL database), eventually some records were created with the wrong types, e.g. price was no longer an int like 23, but a string like "23". This resulted in an unmarshaling error, rejecting the whole document.
Take, for example, the following JSON returned by the server, which contains a faulty record (the one with mz-800):
{
  "items": [
    { "name": "st", "price": 1000 },
    { "name": "amiga", "price": 900 },
    { "name": "mz-800", "price": "500" },
    { "name": "archimedes", "price": 2000 }
  ]
}
The corresponding Go type definitions look like this:
type PriceData struct {
	Items []PriceDataItem `json:"items"`
}

type PriceDataItem struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}
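Unmarshaling the JSON with json.Unmarshal() will now fail with an error. Strictly speaking, encoding/json keeps decoding after such a type mismatch and fills in what it can, but the faulty item ends up with a zero price and the caller only learns that some value was invalid, so the result is usually discarded as a whole.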
In certain situations, however, we might want to read the data on a best-effort basis: simply reading all array elements and skipping only those elements that cannot be unmarshaled (in the example above, that would mean skipping the "mz-800" element).
We can achieve this by using json.RawMessage and performing the unmarshaling in two passes: first we unmarshal an array of RawMessages, then each RawMessage is unmarshaled into a PriceDataItem, skipping erroneous ones:
import "encoding/json"

// PriceData is the type returned to callers.
type PriceData struct {
	CommonPriceData
	Items []PriceDataItem `json:"items"`
}

// priceData mirrors PriceData, but keeps the array elements as raw,
// not yet decoded JSON for the first pass.
type priceData struct {
	CommonPriceData
	Items []json.RawMessage `json:"items"`
}

type CommonPriceData struct {
	// common attributes ...
	Version int `json:"version"`
}

type PriceDataItem struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

func relaxedUnmarshalPriceDataJSON(s string) (PriceData, error) {
	// first pass: decode the envelope, keeping the items as raw JSON
	var raw priceData
	if err := json.Unmarshal([]byte(s), &raw); err != nil {
		return PriceData{}, err
	}
	// second pass: unmarshal the inner array item-by-item, skipping erroneous items
	result := PriceData{CommonPriceData: raw.CommonPriceData}
	for _, rawItem := range raw.Items {
		var item PriceDataItem
		if err := json.Unmarshal(rawItem, &item); err == nil {
			result.Items = append(result.Items, item)
		}
	}
	return result, nil
}
A working example can be found on the Go Playground.
Summary
If needed, JSON unmarshaling in Go can easily be tweaked to also accept unexpected formats. In this blog post, I demonstrated two such scenarios and provided a possible solution for each.