Reducing boilerplate in Serde serializers with nested Rust macros

The implementation of serde-arrow features many serializers, one for each Arrow data type and some additional serializers to connect everything. Each of these serializers must implement all 28 required methods of the serde::Serializer trait. This combination results in lots of repetitive code. In this post, I would like to explain my attempt to simplify this setup using macros and a trick for writing nested macros I encountered along the way.

The serde::Serializer trait requires implementing methods like

fn serialize_bool(self, v: bool) -> Result<Self::Ok, Self::Error>;
fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error>;
fn serialize_i16(self, v: i16) -> Result<Self::Ok, Self::Error>;
// ...

However, most serializers in serde-arrow only implement a small subset of these methods. The rest should return an error indicating that the type isn't supported. To simplify the implementation, I am currently using a custom trait with default implementations for all methods. Serializers implement this simpler trait and an adapter handles the Serde interface. While this setup works, it obscures the underlying Serde calls. I started to wonder whether using macros can simplify the direct implementation of the serde::Serializer trait.
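The adapter setup described above can be sketched roughly as follows. This is a simplified illustration, not the serde-arrow code: FullSerializer is a stand-in for serde::Serializer reduced to three methods, and the names SimpleSerializer, Adapter and Int8Builder are hypothetical.

```rust
// Stand-in for serde::Serializer, reduced to three required methods
// (assumption: the real trait has 28 and different signatures).
trait FullSerializer {
    fn serialize_bool(&mut self, v: bool) -> Result<(), String>;
    fn serialize_i8(&mut self, v: i8) -> Result<(), String>;
    fn serialize_i16(&mut self, v: i16) -> Result<(), String>;
}

// Hypothetical simpler trait: every method defaults to an "unsupported" error.
trait SimpleSerializer {
    fn name(&self) -> &str;

    fn serialize_bool(&mut self, _v: bool) -> Result<(), String> {
        Err(format!("{} does not support serialize_bool", self.name()))
    }
    fn serialize_i8(&mut self, _v: i8) -> Result<(), String> {
        Err(format!("{} does not support serialize_i8", self.name()))
    }
    fn serialize_i16(&mut self, _v: i16) -> Result<(), String> {
        Err(format!("{} does not support serialize_i16", self.name()))
    }
}

// Adapter that exposes any SimpleSerializer through the full interface.
struct Adapter<S>(S);

impl<S: SimpleSerializer> FullSerializer for Adapter<S> {
    fn serialize_bool(&mut self, v: bool) -> Result<(), String> {
        self.0.serialize_bool(v)
    }
    fn serialize_i8(&mut self, v: i8) -> Result<(), String> {
        self.0.serialize_i8(v)
    }
    fn serialize_i16(&mut self, v: i16) -> Result<(), String> {
        self.0.serialize_i16(v)
    }
}

// A serializer that only supports i8 values and inherits all other defaults.
struct Int8Builder {
    values: Vec<i8>,
}

impl SimpleSerializer for Int8Builder {
    fn name(&self) -> &str {
        "Int8Builder"
    }
    fn serialize_i8(&mut self, v: i8) -> Result<(), String> {
        self.values.push(v);
        Ok(())
    }
}
```

The indirection is visible here: a caller talks to Adapter, and the method that actually runs lives in a different trait.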

The goal is to allow implementors to specify which methods they want to override. All other methods get default implementations that return errors. A simple serializer could look like

impl<'a> serde::Serializer for &'a MySerializer {
    // Declare which methods will be overridden
    implement_serializer!(
        &'a MySerializer,
        override serialize_i8,
        override serialize_str,
    );

    // Implement the overridden methods
    fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error> {
        Ok(())
    }
    
    fn serialize_str(self, v: &str) -> Result<Self::Ok, Self::Error> {
        Ok(())
    }
}

For each method of the serde::Serializer trait, the implementation needs to check whether it is in the override list. It relies on a helper macro impl_no_match that takes a method to check and an item to emit if the method was not passed in the override list. This helper loops over the list of overrides using recursion. If an override matches the provided method, it stops and emits nothing. If the recursion finishes without a match, it generates the passed item. Due to a limitation of Rust macros (a pattern cannot test whether two captured identifiers are equal), each method that can be overridden needs to be specified in its own rule. An implementation could look like

macro_rules! impl_no_match {
    // If a known method is encountered, stop the recursion
    (serialize_i8, [serialize_i8 $(, $tail:ident)*], $item:item) => {}; 
    (serialize_i16, [serialize_i16 $(, $tail:ident)*], $item:item) => {}; 
    (serialize_i32, [serialize_i32 $(, $tail:ident)*], $item:item) => {}; 
    // ...

    // If there are additional items, continue
    ($needle:ident, [$head:ident $(, $tail:ident)*], $item:item) => {
       impl_no_match!($needle, [$($tail),*], $item);  
    };

    // If no method matched, emit the item
    ($needle:ident, [], $item:item) => { $item };
}
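To see both outcomes of the recursion, here is a small sketch that uses the macro outside of any impl block. It only lists two known methods, and the free functions are placeholders invented for this illustration.

```rust
macro_rules! impl_no_match {
    // The needle equals the head of the override list: expand to nothing
    (serialize_i8, [serialize_i8 $(, $tail:ident)*], $item:item) => {};
    (serialize_i16, [serialize_i16 $(, $tail:ident)*], $item:item) => {};

    // Otherwise drop the head and recurse on the tail
    ($needle:ident, [$head:ident $(, $tail:ident)*], $item:item) => {
        impl_no_match!($needle, [$($tail),*], $item);
    };

    // The list is exhausted without a match: emit the item
    ($needle:ident, [], $item:item) => { $item };
}

// serialize_i8 is in the override list, so the macro expands to nothing and
// the hand-written function below does not clash with a generated one.
impl_no_match!(
    serialize_i8,
    [serialize_i16, serialize_i8],
    fn serialize_i8() -> &'static str { "default" }
);

fn serialize_i8() -> &'static str {
    "override"
}

// serialize_i16 is not in the override list, so the default is emitted.
impl_no_match!(
    serialize_i16,
    [serialize_i8],
    fn serialize_i16() -> &'static str { "default" }
);
```

Matching a forwarded $needle against the literal serialize_i8 works because ident fragments, unlike most other fragment types, can still be matched by literal tokens after being captured.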

Writing out these rules by hand was quite error-prone and repetitive. Luckily, nested macros can remove the repetition: one macro defines another. Nested macros are straightforward as long as the nested macro does not require repetitions and its patterns do not need to reference outer metavariables (issue). In this case, the outer macro can simply include a nested macro_rules! definition (see also my other post on this topic). For example:

macro_rules! outer { 
  ($a:ident) => { 
    macro_rules! inner { 
      ($b:stmt) => { fn $a() { $b } };
    }
  };
}
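A usage sketch of this pattern follows. It is a slight variant of the macro above: the inner macro takes an expression instead of a statement and the generated function returns an i32, so that the result can be checked. The name answer is arbitrary.

```rust
// The outer macro fixes the function name; the generated inner macro
// supplies the body.
macro_rules! outer {
    ($a:ident) => {
        macro_rules! inner {
            // `$b` is not bound by `outer!`, so it is passed through
            // unchanged and becomes a metavariable of `inner!`.
            ($b:expr) => {
                fn $a() -> i32 {
                    $b
                }
            };
        }
    };
}

// Defines `inner!`, which in turn defines a function named `answer`.
outer!(answer);

// Expands to `fn answer() -> i32 { 40 + 2 }`.
inner!(40 + 2);
```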

Note that the outer metavariable $a appears only in the body of inner!, not in its pattern. Hence, there is no ambiguity. For the impl_no_match macro, however, we would like to generate the patterns themselves. Here, we need to escape the $ used to introduce the new metavariables. There is a proposal to allow $$ for such escapes. However, it is not yet stabilized. Luckily, there is a workaround on stable Rust. We can use a token tree metavariable to capture a $ and then use this token tree wherever an escaped $ is needed. I discovered this trick reading this answer on GitHub. With this trick, we can generate the match case patterns from a list of known names as follows

macro_rules! define_impl_no_match {
    (
        // Should be "$", when calling this macro
        $d:tt;
        $($known_name:ident),* 
    ) => {
        macro_rules! impl_no_match {
            // Iterate over the instances of $known_name
            $(
                // Use $d for any $ that should be part of the generated pattern
                (
                    $known_name, 
                    [$known_name $d (, $d haystack:ident)*], 
                    $d item:item
                ) => {
                    // Macro body, here empty
                };
            )*
            // The rest of the impl_no_match rules
        }
    };
}

The macro is called with a literal $ as its first argument, which makes $d usable as an escape, followed by all known serializer methods, as in

define_impl_no_match!($;  serialize_i8, serialize_i16, /* .. */);
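For a version that can be compiled and tried out, the sketch below fills in the remaining rules (the recursion and the base case) with the same $d escape. It is a condensed illustration with only three known names and placeholder free functions, not the final implementation.

```rust
macro_rules! define_impl_no_match {
    ($d:tt; $($known_name:ident),* $(,)?) => {
        macro_rules! impl_no_match {
            // One generated rule per known name: needle equals head, stop
            $(
                (
                    $known_name,
                    [$known_name $d(, $d tail:ident)*],
                    $d item:item
                ) => {};
            )*

            // Needle differs from the head: recurse on the tail
            (
                $d needle:ident,
                [$d head:ident $d(, $d tail:ident)*],
                $d item:item
            ) => {
                impl_no_match!($d needle, [$d($d tail),*], $d item);
            };

            // Nothing matched: emit the item
            ($d needle:ident, [], $d item:item) => { $d item };
        }
    };
}

// Generates impl_no_match! with a stop rule for each of the three names.
define_impl_no_match!($; serialize_i8, serialize_i16, serialize_i32);

// serialize_i32 is overridden, so only the hand-written function exists.
impl_no_match!(
    serialize_i32,
    [serialize_i8, serialize_i32],
    fn serialize_i32() -> &'static str { "default" }
);

fn serialize_i32() -> &'static str {
    "override"
}

// serialize_i16 is not overridden, so the default is generated.
impl_no_match!(
    serialize_i16,
    [serialize_i8, serialize_i32],
    fn serialize_i16() -> &'static str { "default" }
);
```

Adding a further known method now only means extending the list in the define_impl_no_match! call.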

The user-facing implement_serializer macro uses the impl_no_match macro to generate defaults for all methods not marked as an override:

macro_rules! implement_serializer {
    (
        & $lifetime:lifetime $name:ident,
        $(override $override:ident),* 
        $(,)?
    ) => {
        impl_no_match!(
            serialize_i8, 
            [$($override),*], 
            fn serialize_i8(self, _v: i8) -> Result<Self::Ok, Self::Error> {
                Err(Error::unsupported(concat!(
                    stringify!($name), 
                    " does not support serialize_i8"
                )))
            }
        );
        impl_no_match!(
            serialize_i16, 
            [$($override),*], 
            fn serialize_i16(self, _v: i16) -> Result<Self::Ok, Self::Error> {
                Err(Error::unsupported(concat!(
                    stringify!($name), 
                    " does not support serialize_i16"
                )))
            }
        );
        // ..
    };
}
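To illustrate how the pieces fit together inside an impl block, here is a reduced end-to-end sketch. It makes several assumptions: MiniSerializer is a two-method stand-in for serde::Serializer, String replaces the Error type used above, and impl_no_match is written out by hand to keep the example short.

```rust
// Stand-in for serde::Serializer with only two required methods (assumption).
trait MiniSerializer: Sized {
    type Ok;
    type Error;
    fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error>;
    fn serialize_i16(self, v: i16) -> Result<Self::Ok, Self::Error>;
}

macro_rules! impl_no_match {
    (serialize_i8, [serialize_i8 $(, $tail:ident)*], $item:item) => {};
    (serialize_i16, [serialize_i16 $(, $tail:ident)*], $item:item) => {};
    ($needle:ident, [$head:ident $(, $tail:ident)*], $item:item) => {
        impl_no_match!($needle, [$($tail),*], $item);
    };
    ($needle:ident, [], $item:item) => { $item };
}

macro_rules! implement_serializer {
    (
        & $lifetime:lifetime $name:ident,
        $(override $override:ident),*
        $(,)?
    ) => {
        // Default for serialize_i8, emitted unless it is overridden
        impl_no_match!(
            serialize_i8,
            [$($override),*],
            fn serialize_i8(self, _v: i8) -> Result<Self::Ok, Self::Error> {
                Err(concat!(stringify!($name), " does not support serialize_i8").to_string())
            }
        );
        // Default for serialize_i16, emitted unless it is overridden
        impl_no_match!(
            serialize_i16,
            [$($override),*],
            fn serialize_i16(self, _v: i16) -> Result<Self::Ok, Self::Error> {
                Err(concat!(stringify!($name), " does not support serialize_i16").to_string())
            }
        );
    };
}

struct MySerializer;

impl<'a> MiniSerializer for &'a MySerializer {
    type Ok = ();
    type Error = String;

    // Only serialize_i8 is overridden; serialize_i16 gets the default
    implement_serializer!(&'a MySerializer, override serialize_i8);

    fn serialize_i8(self, _v: i8) -> Result<Self::Ok, Self::Error> {
        Ok(())
    }
}
```

Calling serialize_i16 on a &MySerializer then returns the generated error, while serialize_i8 runs the hand-written method.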

Using one macro to define another macro eliminates duplication when you have a fixed list that needs to appear in multiple places. The trick of capturing an external $ symbol in a token tree metavariable makes it possible to dynamically generate pattern matching rules. I am still experimenting with this approach and am figuring out whether it introduces too much "magic." While nested macros help to reduce boilerplate and make code less error-prone to write, they also make the code harder to read and understand.

The complete implementation of the define_impl_no_match macro is given below. It hides the machinery described in this post behind an internal @find rule. This setup makes it possible to check the passed needle and haystack items against the list of known methods and to give clear error messages for unknown names.

macro_rules! define_impl_no_match {
    (
        $d:tt; 
        $($known_name:ident),* 
        $(,)?
    ) => {
        macro_rules! impl_no_match {
            (
                $d needle:ident, 
                [$d ($d haystack:ident),*], 
                $d item:item
            ) => {
                // Validate that needle is a known method
                impl_no_match!(
                    @find, 
                    $d needle, 
                    [$($known_name),*],  
                    compile_error!(concat!(
                        "Unknown name: ", 
                        stringify!($d needle)
                    ));
                );
                
                // Validate each haystack item
                $d (
                    impl_no_match!(
                        @find, 
                        $d haystack, 
                        [$($known_name),*],  
                        compile_error!(concat!(
                            "Unknown name: ", 
                            stringify!($d haystack)
                        ));
                    );
                )*
                
                // Do the actual check
                impl_no_match!(
                    @find, 
                    $d needle, 
                    [$d ($d haystack),*], 
                    $d item
                );
            };
            
            // Generate a match arm for each known method
            $(
                // Needle matches the head, stop the recursion and expand to nothing
                (
                    @find,
                    $known_name, 
                    [$known_name $d(, $d ident:ident)*], 
                    $d item:item
                ) => {}; 
            )*
            
            // Needle does not match the head, recurse on tail
            (
                @find,
                $d needle:ident, 
                [$d head:ident $d(, $d tail:ident)*], 
                $d item:item
            ) => {
                impl_no_match!(
                    @find, 
                    $d needle, 
                    [$d($d tail),*], 
                    $d item
                );
            };
            
            // Needle not found in haystack, emit the item
            (
                @find,
                $d needle:ident, 
                [], 
                $d item:item
            ) => {
                $d item
            };
        }
    };
}