Go Go Gnome
the website of Sander Kooijmans

The Visitor Pattern Revisited

Posted: November 27, 2017

Introduction

Recently I was working on (gogo text search), a Java library for text search. One of the features of this library is that it can search text in strings using expressions like foo and bar and not (blub or blob).

A parser converts patterns to a tree of Criterion objects. Here is a class diagram of the Criterion interface and its implementations:

Criterion hierarchy class diagram.

After parsing the pattern foo and bar and not (blub or blob) to a tree, the tree looks like this (object diagram):

Tree object diagram.

When a text, for example barefoot, is to be searched, this text has to be matched with all leaves of the tree, resulting in the value true if the text contains the text in the leave, and false if the text does not contain the text in the leave. After that, the logical operators in the other nodes must be applied, combining the intermediate results into a final result. For barefoot the final result is true.

The Criterion hierarchy is only intended to represent a parsed pattern. I want to use it in several places to search texts in different ways. For example: I want to search case sensitively and case insensitively.
The Criterion hierarchy itself should not be extended with different search methods. At this point I decided to apply the Visitor pattern to implement the different search methods without modifying the Criterion hierarchy.

Enter Visitor

Wikipedia describes the Visitor pattern as follows

In object-oriented programming and software engineering, the visitor design pattern is a way of separating an algorithm from an object structure on which it operates. A practical result of this separation is the ability to add new operations to existent object structures without modifying the structures. It is one way to follow the open/closed principle.

Here is how I applied the Visitor pattern. The Criterion hierarchy was extended with a CriterionVisitor interface:

Criterion visitor class diagram

Below is the code of the class that implements the CriterionVisitor interface.

public class CriterionMatcher {

    public boolean matches(Criterion criterion, String text) {
        CriterionVisistor criterionVisitor = 
            new CriterionVisistor(text);
        criterion.accept(criterionVisitor);
        return criterionVisitor.matches();
    }

    private static class CriterionVisistor 
            implements CriterionVisitor {

        private final String text;
        private Deque<Boolean> stack = new LinkedList<>();

        private CriterionVisistor(String text) {
            this.text = text;
        }

        @Override
        public void visit(And and) {
            stack.push(stack.pop() & stack.pop());
        }

        @Override
        public void visit(Or or) {
            stack.push(stack.pop() | stack.pop());
        }

        @Override
        public void visit(Not not) {
            stack.push(!stack.pop());
        }

        @Override
        public void visit(StringLiteral stringLiteral) {
            stack.push(text.contains(stringLiteral.getLiteral()));
        }

        public boolean matches() {
            return stack.pop();
        }
    }

}

The accept() method of the class And looks like this:

@Override
public void accept(CriterionVisitor visitor) {
    left.accept(visitor);
    right.accept(visitor);
    visitor.visit(this);
}

Before I explain the code above, I quickly show how you can check if a text contains the pattern foo and bar:

Criterion fooAndBar = new And(
        new StringLiteral("foo"), 
        new StringLiteral("bar"));
boolean containsFooAndBar = 
        new CriterionMatcher().matches(fooAndBar, "barefoot");

Now take a closer look to the CriterionMatcher above. The code looks clean. Lots of small methods that seem to make sense. Except, why are we using a stack to determine if a criterion matches a text?

For a StringLiteral it is very easy to determine if it matches a text: text.contains(stringLiteral.getLiteral()). There is nothing more to it.

For a Not it is also pretty easy: the negation of not.getCriterion() matching the text. And for completeness sake: for And it looks like and.getLeft() matches the text and and.getRight() matches the text.

The challenge now is that each of the Criterion classes, in their accept() method, first call accept() for their child (Not) or children (And, Or), before calling visitor.visit(this). This means that the visitor's visit() methods will first be called for the children and then for the actual operator. The visitor has no control over the order in which the visit() methods are called, and thus must store push the result of the children on the stack first, before it can combine these results for the current operation.

Note that the the visit(And and) uses the non-conditional and operator & instead of the conditional and operator &&, to ensure that it always pops two values from the stack. For the same reason visit(Or or) uses the non-conditional or operator |.

Problems with the Visitor pattern

The first thing that surprised me when I read about the Visitor pattern, is that the object hierarchy that gets visited must cooperate in an intimate fashion. Each and every node within the hierarchy must implement an accept() method that takes a visitor object as argument.

The second thing that worries me, is the order in which the hierarchy is traversed. The code above shows that for each node, first the child nodes are visited, before the node itself is visited. This fits nicely with the stack used by the visitor implementation. If the order was to be changed, i.e. if each node is visited before its child nodes, the visitor implementation becomes way more complicated than it is now.

But if the order in which nodes are visited is so important, why is it left to the builder of the library to choose an order? What if one client of the library needs this order and another client of the library needs a different order?

The third problem I have with the Visitor pattern, is that the Visitor interface mentions every class of the hierarchy. This makes it impossible to add new classes to the hierarchy outside of the library. Only the person in control of releasing the library can add new classes.

There is one advantage though to the Visitor interface mentioning every class of the hierarchy: if the is updated and classes are added or removed, then at least the clients of the library will get a compilation error for the implementors of the Visitor interface.

Bob C. Martin describes a variation to the Visitor pattern, called Acyclyc Visitor. Not only does this variation break a cyclic dependency between the hierarchy classes and the visitor implementors; it also
gets rid of the Visitor interface mentioning all hierarchy classes. Finally, it even allows visitors to visit part of the hierarchy classes, instead of all of them.

Exit Visitor

An alternative for using the visitor pattern is simply let the client iterate over the hierarchy and inspect each and every node. This leads to a big switch statement to invoke specific code per node-type. This big switch statement is, as Bob C. Martin claims, where the classes hide. I must admit that the Visitor pattern gets rid of this smelly switch statement.

Another way to get rid of the switch-statement is to use a map. The keys of the map are the classes of the hierarchy, the values of the map are functions that evaluate if a criterion matches a text:

public class CriterionMatcher {

    private final Criterion criterion;

    private final Map<Class<?>, BiFunction<Criterion, String, Boolean>> classToMatchPredicate = ImmutableMap.of(
            And.class, (criterion, text) -> matches(((And) criterion).getLeft(), text) && matches(((And) criterion).getRight(), text),
            Or.class, (criterion, text) -> matches(((Or) criterion).getLeft(), text) || matches(((Or) criterion).getRight(), text),
            Not.class, (criterion, text) -> !matches(((Not) criterion).getCriterion(), text),
            StringLiteral.class, (criterion, text) -> text.contains(((StringLiteral) criterion).getLiteral()));

    public CriterionMatcher(Criterion criterion) {
        this.criterion = criterion;
    }

    public boolean matches(String text) {
        return matches(criterion, text);
    }

    private boolean matches(Criterion criterion, String text) {
        BiFunction<Criterion, String, Boolean> matchPredicate = classToMatchPredicate.get(criterion.getClass());
        return matchPredicate.apply(criterion, text);
    }

}

This code is short compared to the code that implemented the Visitor pattern. Here are the pros and cons about this variation:

Pros:

Cons:

Conclusion

Upon reading about the Visitor pattern I had my thoughts about it. When I first applied it, I was happy to get rid of a smelly switch-statement. But after rethinking and trying out a couple of variations of the Visitor pattern, I now conclude it gets rid of ugly type casts in return for a lot more code, more complex dependencies and more indirection.

Over the years I have come to think about code readability as the most important thing next to code correctness. Therefore, I will no longer use the Visitor pattern and use my solution with a map instead.