The implementation of serde-arrow features many serializers, one for each Arrow
data type and some additional serializers to connect everything. Each of these serializers requires
the implementation of all of the 28 required methods of the serde::Serializer trait. This
combination results in lots of repetitive code. In this post, I would like to explain my attempt to
simplify this setup using macros, and a trick for writing nested macros that I encountered along the way.
The serde::Serializer trait requires implementing methods like
fn serialize_bool(self, v: bool) -> Result<Self::Ok, Self::Error>;
fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error>;
fn serialize_i16(self, v: i16) -> Result<Self::Ok, Self::Error>;
// ...
However, most serializers in serde-arrow only implement a small subset of these methods. The rest
should return an error indicating that the type isn't supported. To simplify the implementation, I
am currently using a custom trait with default implementations for all methods. Serializers
implement this simpler trait and an adapter handles the Serde interface.
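To make the current setup a bit more concrete, here is a minimal sketch of the idea. The names `SimpleSerializer`, `Int8Builder`, and `Error` are hypothetical stand-ins and not the actual serde-arrow types, only two methods are shown, and the adapter that forwards the Serde calls is left out.

```rust
// Hypothetical error type, standing in for the crate's real error
#[derive(Debug, PartialEq)]
struct Error(String);

// Hypothetical simplified trait: every method has a default that reports
// the type as unsupported, so implementors only write what they need
trait SimpleSerializer {
    fn name(&self) -> &str;

    fn serialize_i8(&mut self, _v: i8) -> Result<(), Error> {
        Err(Error(format!("{} does not support serialize_i8", self.name())))
    }

    fn serialize_str(&mut self, _v: &str) -> Result<(), Error> {
        Err(Error(format!("{} does not support serialize_str", self.name())))
    }
}

// A serializer that only accepts i8 values
struct Int8Builder {
    values: Vec<i8>,
}

impl SimpleSerializer for Int8Builder {
    fn name(&self) -> &str {
        "Int8Builder"
    }

    fn serialize_i8(&mut self, v: i8) -> Result<(), Error> {
        self.values.push(v);
        Ok(())
    }
}
```

Calling `serialize_str` on an `Int8Builder` falls through to the default and returns the "does not support" error, while `serialize_i8` stores the value.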
While this setup works, it obscures the underlying Serde calls. I started to wonder whether
macros could simplify a direct implementation of the serde::Serializer trait.
The goal is to allow implementors to specify which methods they want to override. All other methods
get default implementations that return errors. A simple serializer could look like
impl<'a> serde::Serializer for &'a MySerializer {
// Declare which methods will be overridden
implement_serializer!(
&'a MySerializer,
override serialize_i8,
override serialize_str,
);
// Implement the overridden methods
fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error> {
Ok(())
}
fn serialize_str(self, v: &str) -> Result<Self::Ok, Self::Error> {
Ok(())
}
}
For each method of the serde::Serializer trait, the implementation needs to check whether it is
in the override list. It relies on a helper macro impl_no_match that takes a method to check and
an item to emit if the method does not appear in the override list.
This helper walks the list of overrides recursively. If the head of the list matches the method
being checked, the recursion stops without emitting anything. If the recursion reaches the end of
the list without a match, it emits the passed item. Because macro_rules patterns cannot compare two
captured identifiers for equality, each method that can be overridden needs to be spelled out in
its own rule. An implementation could look like
macro_rules! impl_no_match {
// If a known method is encountered, stop the recursion
(serialize_i8, [serialize_i8 $(, $tail:ident)*], $item:item) => {};
(serialize_i16, [serialize_i16 $(, $tail:ident)*], $item:item) => {};
(serialize_i32, [serialize_i32 $(, $tail:ident)*], $item:item) => {};
// ...
// If there are additional items, continue
($needle:ident, [$head:ident $(, $tail:ident)*], $item:item) => {
impl_no_match!($needle, [$($tail),*], $item);
};
// If no method matched, emit the item
($needle:ident, [], $item:item) => { $item };
}
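To see the helper in action, here is a small self-contained sketch. It repeats the rules for just two known names and uses free functions instead of trait methods. The function names `describe_i8` and `describe_i16` are made up for this illustration.

```rust
macro_rules! impl_no_match {
    // The needle is at the head of the override list: emit nothing
    (serialize_i8, [serialize_i8 $(, $tail:ident)*], $item:item) => {};
    (serialize_i16, [serialize_i16 $(, $tail:ident)*], $item:item) => {};
    // Otherwise drop the head and keep searching in the tail
    ($needle:ident, [$head:ident $(, $tail:ident)*], $item:item) => {
        impl_no_match!($needle, [$($tail),*], $item);
    };
    // List exhausted without a match: emit the item
    ($needle:ident, [], $item:item) => { $item };
}

// serialize_i8 is not in the list, so the default below is emitted
impl_no_match!(
    serialize_i8,
    [serialize_i16],
    fn describe_i8() -> &'static str { "default" }
);

// serialize_i16 is in the list, so this invocation expands to nothing ...
impl_no_match!(
    serialize_i16,
    [serialize_i8, serialize_i16],
    fn describe_i16() -> &'static str { "default" }
);

// ... and the hand-written function does not clash with a generated one
fn describe_i16() -> &'static str { "override" }
```

If the second invocation did emit its item, the hand-written `describe_i16` would be a duplicate definition and the code would not compile.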
Writing out these rules by hand was quite error-prone and repetitive. Luckily, we can use nested
macros to remove the repetition by using one macro to define another one. Nested macros are
straightforward as long as the nested macro does not require repetitions or rules that reference
outer metavariables (issue).
In this case, the outer macro can simply include a nested macro_rules! definition (see also my other
post on this topic). For example:
macro_rules! outer {
($a:ident) => {
macro_rules! inner {
($b:stmt) => { fn $a() { $b } };
}
};
}
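As a usage sketch, here is a slight variation of the example above in which the inner macro takes an expression instead of a statement, so that the generated function can return a value. The function name `answer` is chosen only for this illustration.

```rust
macro_rules! outer {
    ($a:ident) => {
        // $b is not bound by outer!, so it is passed through unchanged
        // and becomes a metavariable of inner!
        macro_rules! inner {
            ($b:expr) => { fn $a() -> i32 { $b } };
        }
    };
}

// Defines inner!, with the function name fixed to `answer`
outer!(answer);

// Defines `fn answer() -> i32 { 40 + 2 }`
inner!(40 + 2);
```

The call to outer! fixes the name of the generated function, and the call to inner! supplies its body.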
Note that the outer metavariable $a appears only in the body of inner!,
not in its pattern. Hence, there is no ambiguity. For the impl_no_match macro, however, we would
like to generate the patterns themselves. Here, we need to escape the $ used to introduce the new
metavariables. There is a proposal to allow $$ for such escapes. However, it
is not yet stabilized. Luckily, there is a workaround on stable Rust. We can use a token tree
metavariable to capture $ and then use this metavariable to escape the $ occurrences. I discovered
this trick while reading this answer on GitHub. With this trick, we can generate the match rules
from a list of known names as follows:
macro_rules! define_impl_no_match {
(
// Must be a literal "$" when calling this macro
$d:tt;
$($known_name:ident),*
) => {
macro_rules! impl_no_match {
// Iterate over the instances of $known_name
$(
// Use $d for any $ that should be part of the generated pattern
(
$known_name,
[$known_name $d (, $d haystack:ident)*],
$d item:item
) => {
// Macro body, here empty
};
)*
// The rest of the impl_no_match rules
}
};
}
The macro is called with a literal $ as its first argument, which binds $d for use as an escape,
followed by all known serializer methods, as in
define_impl_no_match!($; serialize_i8, serialize_i16, /* .. */);
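Putting the pieces together, the following sketch fills in the remaining rules (the recursion and the final emit rule) so that the generated macro can be tried out on its own. It uses only three known names and, as an assumption made for this illustration, free functions called `describe_i8` and `describe_i16` instead of trait methods.

```rust
macro_rules! define_impl_no_match {
    ($d:tt; $($known_name:ident),* $(,)?) => {
        macro_rules! impl_no_match {
            // One stop rule per known name, generated by the outer repetition
            $(
                ($known_name, [$known_name $d(, $d tail:ident)*], $d item:item) => {};
            )*
            // No match at the head: recurse on the tail
            ($d needle:ident, [$d head:ident $d(, $d tail:ident)*], $d item:item) => {
                impl_no_match!($d needle, [$d($d tail),*], $d item);
            };
            // List exhausted: emit the item
            ($d needle:ident, [], $d item:item) => { $d item };
        }
    };
}

// The leading `$` binds $d; the rest is the list of known names
define_impl_no_match!($; serialize_i8, serialize_i16, serialize_i32);

// Not overridden: the default is emitted
impl_no_match!(
    serialize_i8,
    [serialize_i16, serialize_i32],
    fn describe_i8() -> &'static str { "default" }
);

// Overridden: nothing is emitted, the hand-written function is used
impl_no_match!(
    serialize_i16,
    [serialize_i8, serialize_i16],
    fn describe_i16() -> &'static str { "default" }
);
fn describe_i16() -> &'static str { "override" }
```

Adding a new known name now only requires extending the list passed to define_impl_no_match!, instead of writing another rule by hand.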
The user-facing implement_serializer macro uses the impl_no_match macro to generate defaults for
all methods not marked as an override:
macro_rules! implement_serializer {
(
& $lifetime:lifetime $name:ident,
$(override $override:ident),*
$(,)?
) => {
impl_no_match!(
serialize_i8,
[$($override),*],
fn serialize_i8(self, _v: i8) -> Result<Self::Ok, Self::Error> {
Err(Error::unsupported(concat!(
stringify!($name),
" does not support serialize_i8"
)))
}
);
impl_no_match!(
serialize_i16,
[$($override),*],
fn serialize_i16(self, _v: i16) -> Result<Self::Ok, Self::Error> {
Err(Error::unsupported(concat!(
stringify!($name),
" does not support serialize_i16"
)))
}
);
// ..
};
}
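To check that the pieces fit together without pulling in Serde, here is a scaled-down sketch of the same design. The trait `MiniSerializer`, the `Error` type, the macro `implement_mini_serializer`, and the serializer `OnlyI8` are all hypothetical stand-ins with just two methods. The real macro would cover every method of serde::Serializer.

```rust
// Hypothetical error type for this sketch
#[derive(Debug, PartialEq)]
struct Error(String);

// Hypothetical two-method stand-in for serde::Serializer
trait MiniSerializer: Sized {
    fn serialize_i8(self, v: i8) -> Result<String, Error>;
    fn serialize_i16(self, v: i16) -> Result<String, Error>;
}

macro_rules! impl_no_match {
    (serialize_i8, [serialize_i8 $(, $tail:ident)*], $item:item) => {};
    (serialize_i16, [serialize_i16 $(, $tail:ident)*], $item:item) => {};
    ($needle:ident, [$head:ident $(, $tail:ident)*], $item:item) => {
        impl_no_match!($needle, [$($tail),*], $item);
    };
    ($needle:ident, [], $item:item) => { $item };
}

macro_rules! implement_mini_serializer {
    ($name:ident, $(override $method:ident),* $(,)?) => {
        // Emit an erroring default unless serialize_i8 is overridden
        impl_no_match!(
            serialize_i8,
            [$($method),*],
            fn serialize_i8(self, _v: i8) -> Result<String, Error> {
                Err(Error(
                    concat!(stringify!($name), " does not support serialize_i8").to_string(),
                ))
            }
        );
        // Same for serialize_i16
        impl_no_match!(
            serialize_i16,
            [$($method),*],
            fn serialize_i16(self, _v: i16) -> Result<String, Error> {
                Err(Error(
                    concat!(stringify!($name), " does not support serialize_i16").to_string(),
                ))
            }
        );
    };
}

struct OnlyI8;

impl MiniSerializer for OnlyI8 {
    implement_mini_serializer!(OnlyI8, override serialize_i8);

    fn serialize_i8(self, v: i8) -> Result<String, Error> {
        Ok(format!("i8:{}", v))
    }
}
```

Here `OnlyI8.serialize_i8(7)` runs the hand-written method, while `OnlyI8.serialize_i16(7)` hits the generated default and returns the "does not support serialize_i16" error.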
Using one macro to define another macro eliminates duplication when you have a fixed list that needs
to appear in multiple places. The trick of capturing an external $ symbol in a token tree
metavariable makes it possible to dynamically generate pattern matching rules. I am still
experimenting with this approach and am figuring out whether it introduces too much "magic." While
nested macros help to reduce boilerplate and make code less error-prone to write, they also make the
code harder to read and understand.
The complete implementation of the define_impl_no_match macro is given below. It hides the
machinery described in this post behind an internal @find rule. This setup makes it possible to
check the passed needle and haystack items against the list of known methods and to give clear
error messages.
macro_rules! define_impl_no_match {
(
$d:tt;
$($known_name:ident),*
$(,)?
) => {
macro_rules! impl_no_match {
(
$d needle:ident,
[$d ($d haystack:ident),*],
$d item:item
) => {
// Validate that needle is a known method
impl_no_match!(
@find,
$d needle,
[$($known_name),*],
compile_error!(concat!(
"Unknown name: ",
stringify!($d needle)
));
);
// Validate each haystack item
$d (
impl_no_match!(
@find,
$d haystack,
[$($known_name),*],
compile_error!(concat!(
"Unknown name: ",
stringify!($d haystack)
));
);
)*
// Do the actual check
impl_no_match!(
@find,
$d needle,
[$d ($d haystack),*],
$d item
);
};
// Generate a match arm for each known method
$(
// Needle matches the head, stop the recursion and expand to nothing
(
@find,
$known_name,
[$known_name $d(, $d ident:ident)*],
$d item:item
) => {};
)*
// Needle does not match the head, recurse on tail
(
@find,
$d needle:ident,
[$d head:ident $d(, $d tail:ident)*],
$d item:item
) => {
impl_no_match!(
@find,
$d needle,
[$d($d tail),*],
$d item
);
};
// Needle not found in haystack, emit the item
(
@find,
$d needle:ident,
[],
$d item:item
) => {
$d item
};
}
};
}