Relaxed JSON parsing in Go
In the past, I have run into situations where I needed a relaxed approach to parsing JSON documents in Go, because the server occasionally sent unexpected data that was beyond my control.
Case 1: API inconsistently returns an empty array instead of null
When I was implementing a RabbitMQ client as part of my rabtap RabbitMQ tool, I stumbled over the problem that the RabbitMQ management API returned an empty array ([]) where an object was expected. When trying to unmarshal the JSON into a struct like

type RabbitMQChannel struct {
	ConnectionDetails ConnectionDetails `json:"connection_details"`
	// more attributes of RabbitMQChannel ...
}
type ConnectionDetails struct {
	// attributes of ConnectionDetails ...
}

and RabbitMQ returns something like

{
	"connection_details": []
}

the json.Unmarshal function fails with the error "cannot unmarshal array into Go struct field", because the ConnectionDetails attribute, which is an object, cannot be deserialized from an array. Since I could not change the server's response, I resolved this problem on the client side by implementing a custom unmarshaler that checks whether the value to be unmarshaled is an (empty) array or an object:

func (d *ConnectionDetails) UnmarshalJSON(data []byte) error {
	// alias ConnectionDetails to avoid recursion when calling Unmarshal
	type Alias ConnectionDetails
	aux := struct {
		*Alias
	}{
		Alias: (*Alias)(d),
	}
	return unmarshalEmptyArrayOrObject(data, &aux)
}

func unmarshalEmptyArrayOrObject(data []byte, v any) error {
	// guard against empty input before inspecting the first byte
	if len(data) > 0 && data[0] == '[' {
		// JSON array detected: leave v untouched
		return nil
	}
	return json.Unmarshal(data, v)
}

The custom UnmarshalJSON above accepts both an object and an array. Only the leading [ is checked, so a non-empty array is accepted as well. When an array is detected, unmarshaling stops and the struct is left at its zero value. Otherwise, a JSON object is expected and unmarshaled. This makes our code flexible enough to accept the inconsistent data sometimes returned by the RabbitMQ API.
A working example can be found on the Go Playground.
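As noted above, checking only the leading [ also lets non-empty arrays through. The following is a minimal sketch of a stricter variant that accepts an object or an empty array and rejects anything else. The fields Name and PeerPort and the helper decodeChannel are assumptions made for illustration, not part of the rabtap code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// RabbitMQChannel and ConnectionDetails mirror the types from the text.
// The concrete fields are illustrative assumptions.
type RabbitMQChannel struct {
	Name              string            `json:"name"`
	ConnectionDetails ConnectionDetails `json:"connection_details"`
}

type ConnectionDetails struct {
	Name     string `json:"name"`
	PeerPort int    `json:"peer_port"`
}

// UnmarshalJSON accepts a JSON object or an empty JSON array.
func (d *ConnectionDetails) UnmarshalJSON(data []byte) error {
	// alias has no methods, so json.Unmarshal does not call
	// this UnmarshalJSON again.
	type alias ConnectionDetails

	if bytes.HasPrefix(bytes.TrimSpace(data), []byte("[")) {
		// Decode the array elements as raw JSON to count them.
		var elems []json.RawMessage
		if err := json.Unmarshal(data, &elems); err != nil {
			return err
		}
		if len(elems) > 0 {
			return fmt.Errorf("connection_details: expected object or empty array, got array with %d element(s)", len(elems))
		}
		// Empty array: leave d untouched.
		return nil
	}
	return json.Unmarshal(data, (*alias)(d))
}

// decodeChannel is a hypothetical helper used for the demo below.
func decodeChannel(doc string) (RabbitMQChannel, error) {
	var ch RabbitMQChannel
	err := json.Unmarshal([]byte(doc), &ch)
	return ch, err
}

func main() {
	docs := []string{
		`{"name": "ch1", "connection_details": {"name": "conn1", "peer_port": 5672}}`,
		`{"name": "ch2", "connection_details": []}`,
		`{"name": "ch3", "connection_details": [1, 2]}`,
	}
	for _, doc := range docs {
		ch, err := decodeChannel(doc)
		fmt.Printf("%+v err=%v\n", ch, err)
	}
}
```

Whether the stricter check is worth the extra decoding pass depends on how much you trust the API to send only empty arrays in this position.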
Case 2: Skip unparsable array elements
Recently I worked on some code that parsed large JSON documents, basically consisting of arrays with thousands of elements. The schema of the JSON was not formally defined on the server side, only on the client side, using Go structs with JSON annotations. That worked as long as the server returned the data in the expected format, e.g. the price attribute being of type int and not string. Since the data returned by the server was partially stored schemaless (as JSON data in a PostgreSQL database), eventually some records were created with the wrong types, e.g. price was no longer an int like 23, but a string like "23". This resulted in an unmarshaling error, rejecting the whole document.
Take this as an example of a JSON document returned by the server, with a faulty record (the one with mz-800):

{
	"items": [
		{ "name": "st", "price": 1000 },
		{ "name": "amiga", "price": 900 },
		{ "name": "mz-800", "price": "500" },
		{ "name": "archimedes", "price": 2000 }
	]
}

The corresponding Go type definitions look like:

type PriceData struct {
	Items []PriceDataItem `json:"items"`
}
type PriceDataItem struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

Unmarshaling the JSON with json.Unmarshal() now fails with an error, so the whole document is rejected.
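To make the failure concrete, here is a small sketch that feeds the example document to a strict decoder and inspects the resulting error. The helpers strictUnmarshal and mismatch are hypothetical names introduced for this illustration. json.UnmarshalTypeError is the error type the standard library uses for a JSON value that does not fit the target Go type.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type PriceData struct {
	Items []PriceDataItem `json:"items"`
}

type PriceDataItem struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

// doc is the example document from the text, including the faulty record.
const doc = `{
  "items": [
    { "name": "st", "price": 1000 },
    { "name": "amiga", "price": 900 },
    { "name": "mz-800", "price": "500" },
    { "name": "archimedes", "price": 2000 }
  ]
}`

// strictUnmarshal (hypothetical) rejects the whole document on any error.
func strictUnmarshal(s string) (PriceData, error) {
	var pd PriceData
	if err := json.Unmarshal([]byte(s), &pd); err != nil {
		return PriceData{}, err
	}
	return pd, nil
}

// mismatch (hypothetical) reports the kind of JSON value that did not
// fit, if err is a type mismatch reported by encoding/json.
func mismatch(err error) (string, bool) {
	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		return typeErr.Value, true
	}
	return "", false
}

func main() {
	_, err := strictUnmarshal(doc)
	if value, ok := mismatch(err); ok {
		fmt.Printf("rejected: a JSON %s did not fit the Go type (%v)\n", value, err)
	}
}
```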
In certain situations, however, we might want to read the data on a best-effort basis: read all array elements and skip only those that cannot be unmarshaled (in the example above, that would mean skipping the "mz-800" element). We can achieve this by using json.RawMessage and performing the unmarshaling in two passes: first we unmarshal an array of RawMessages, then each RawMessage is unmarshaled into a PriceDataItem, skipping erroneous ones:

type PriceData struct {
	CommonPriceData
	Items []PriceDataItem `json:"items"`
}

// priceData mirrors PriceData, but keeps the items as raw JSON for the first pass
type priceData struct {
	CommonPriceData
	Items []json.RawMessage `json:"items"`
}

type CommonPriceData struct {
	// common attributes ...
	Version int `json:"version"`
}

type PriceDataItem struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

func relaxedUnmarshalPriceDataJSON(s string) (PriceData, error) {
	// first pass: unmarshal the envelope, leaving the items as raw JSON
	var rawPriceData priceData
	if err := json.Unmarshal([]byte(s), &rawPriceData); err != nil {
		return PriceData{}, err
	}
	// second pass: unmarshal the inner array item-by-item, skipping erroneous items
	var result PriceData
	for _, rawItem := range rawPriceData.Items {
		var item PriceDataItem
		if err := json.Unmarshal(rawItem, &item); err == nil {
			result.Items = append(result.Items, item)
		}
	}
	result.CommonPriceData = rawPriceData.CommonPriceData
	return result, nil
}

A working example can be found on the Go Playground.
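If the same two-pass pattern is needed for several element types, it can be factored into a generic helper. The following is a sketch under the assumption that Go 1.18 or later is available. The names unmarshalLenient and parseItems are made up for this example. Unlike the function above, the helper also reports which array indexes were skipped, so that dropped records can at least be logged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unmarshalLenient decodes each raw element into a T and drops the
// elements that fail to decode. It returns the decoded items and the
// indexes of the skipped elements.
func unmarshalLenient[T any](raw []json.RawMessage) (items []T, skipped []int) {
	for i, r := range raw {
		var item T
		if err := json.Unmarshal(r, &item); err != nil {
			skipped = append(skipped, i)
			continue
		}
		items = append(items, item)
	}
	return items, skipped
}

type PriceDataItem struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

// doc is the example document from the text, including the faulty record.
const doc = `{
  "items": [
    { "name": "st", "price": 1000 },
    { "name": "amiga", "price": 900 },
    { "name": "mz-800", "price": "500" },
    { "name": "archimedes", "price": 2000 }
  ]
}`

// parseItems (hypothetical) runs both passes: first the envelope with
// raw items, then each item on its own.
func parseItems(s string) ([]PriceDataItem, []int, error) {
	var envelope struct {
		Items []json.RawMessage `json:"items"`
	}
	if err := json.Unmarshal([]byte(s), &envelope); err != nil {
		return nil, nil, err
	}
	items, skipped := unmarshalLenient[PriceDataItem](envelope.Items)
	return items, skipped, nil
}

func main() {
	items, skipped, err := parseItems(doc)
	if err != nil {
		fmt.Println("document rejected:", err)
		return
	}
	fmt.Printf("decoded %d items, skipped indexes %v\n", len(items), skipped)
}
```

For the example document, this yields three items and reports index 2 (the "mz-800" record) as skipped. A document that is malformed as a whole is still rejected in the first pass.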
Summary
If needed, unmarshaling of JSON in Go can easily be tweaked to also accept unexpected formats. In this blog post, I demonstrated two scenarios and provided a possible solution for each.