Posted: November 27, 2017
Recently I was working on (gogo text search),
a Java library for text search. One of the features of this library is that it
can search text in strings using expressions like foo and bar and not (blub or blob)
.
A parser converts patterns to a tree of Criterion
objects. Here is a class diagram of the Criterion
interface
and its implementations:
.
After parsing the pattern foo and bar and not (blub or blob)
to a tree, the tree looks like this (object diagram):
.
When a text, for example barefoot
, is to be searched, this text has to be matched with all leaves of the
tree, resulting in the value true
if the text contains the text in the leave, and false
if the text does
not contain the text in the leave. After that, the logical operators in the other nodes must be applied,
combining the intermediate results into a final result. For barefoot
the final result is true
.
The Criterion
hierarchy is only intended to represent a parsed pattern. I want to use it in several places
to search texts in different ways. For example: I want to search case sensitively and case insensitively.
The Criterion
hierarchy itself should not be extended with different search methods. At this point I decided
to apply the Visitor pattern to implement the different search methods without modifying the Criterion
hierarchy.
Wikipedia describes the Visitor pattern as follows
In object-oriented programming and software engineering, the visitor design pattern is a way of separating an algorithm from an object structure on which it operates. A practical result of this separation is the ability to add new operations to existent object structures without modifying the structures. It is one way to follow the open/closed principle.
Here is how I applied the Visitor pattern. The Criterion
hierarchy was extended with a CriterionVisitor
interface:
Below is the code of the class that implements the CriterionVisitor
interface.
public class CriterionMatcher {
public boolean matches(Criterion criterion, String text) {
CriterionVisistor criterionVisitor =
new CriterionVisistor(text);
criterion.accept(criterionVisitor);
return criterionVisitor.matches();
}
private static class CriterionVisistor
implements CriterionVisitor {
private final String text;
private Deque<Boolean> stack = new LinkedList<>();
private CriterionVisistor(String text) {
this.text = text;
}
@Override
public void visit(And and) {
stack.push(stack.pop() & stack.pop());
}
@Override
public void visit(Or or) {
stack.push(stack.pop() | stack.pop());
}
@Override
public void visit(Not not) {
stack.push(!stack.pop());
}
@Override
public void visit(StringLiteral stringLiteral) {
stack.push(text.contains(stringLiteral.getLiteral()));
}
public boolean matches() {
return stack.pop();
}
}
}
The accept()
method of the class And
looks like this:
@Override
public void accept(CriterionVisitor visitor) {
left.accept(visitor);
right.accept(visitor);
visitor.visit(this);
}
Before I explain the code above, I quickly show how you can check if a text contains the pattern foo and bar
:
Criterion fooAndBar = new And(
new StringLiteral("foo"),
new StringLiteral("bar"));
boolean containsFooAndBar =
new CriterionMatcher().matches(fooAndBar, "barefoot");
Now take a closer look to the CriterionMatcher
above. The code looks clean. Lots of small methods
that seem to make sense. Except, why are we using a stack to determine if a criterion matches a text?
For a StringLiteral
it is very easy to determine if it matches a text: text.contains(stringLiteral.getLiteral())
.
There is nothing more to it.
For a Not
it is also pretty easy: the negation of not.getCriterion()
matching the text.
And for completeness sake: for And
it looks like and.getLeft()
matches the text and and.getRight()
matches
the text.
The challenge now is that each of the Criterion
classes, in their accept()
method, first call accept()
for their child (Not
) or children (And
, Or
), before calling visitor.visit(this)
. This means that the
visitor's visit()
methods will first be called for the children and then for the actual operator. The visitor
has no control over the order in which the visit()
methods are called, and thus must store push the result
of the children on the stack first, before it can combine these results for the current operation.
Note that the the visit(And and)
uses the non-conditional and operator &
instead of the conditional
and operator &&
, to ensure that it always pops two values from the stack. For the same reason
visit(Or or)
uses the non-conditional or operator |
.
The first thing that surprised me when I read about the Visitor pattern, is that the object hierarchy that
gets visited must cooperate in an intimate fashion. Each and every node within the hierarchy must implement
an accept()
method that takes a visitor object as argument.
The second thing that worries me, is the order in which the hierarchy is traversed. The code above shows that for each node, first the child nodes are visited, before the node itself is visited. This fits nicely with the stack used by the visitor implementation. If the order was to be changed, i.e. if each node is visited before its child nodes, the visitor implementation becomes way more complicated than it is now.
But if the order in which nodes are visited is so important, why is it left to the builder of the library to choose an order? What if one client of the library needs this order and another client of the library needs a different order?
The third problem I have with the Visitor pattern, is that the Visitor
interface mentions every class
of the hierarchy. This makes it impossible to add new classes to the hierarchy outside of the library.
Only the person in control of releasing the library can add new classes.
There is one advantage though to the Visitor
interface mentioning every class of the hierarchy: if the
is updated and classes are added or removed, then at least the clients of the library will get a compilation
error for the implementors of the Visitor
interface.
Bob C. Martin describes a variation to the Visitor pattern,
called Acyclyc Visitor. Not only does this
variation break a cyclic dependency between the hierarchy classes and the visitor implementors; it also
gets rid of the Visitor
interface mentioning all hierarchy classes. Finally, it even allows visitors
to visit part of the hierarchy classes, instead of all of them.
An alternative for using the visitor pattern is simply let the client iterate over the hierarchy and inspect each and every node. This leads to a big switch statement to invoke specific code per node-type. This big switch statement is, as Bob C. Martin claims, where the classes hide. I must admit that the Visitor pattern gets rid of this smelly switch statement.
Another way to get rid of the switch-statement is to use a map. The keys of the map are the classes of the hierarchy, the values of the map are functions that evaluate if a criterion matches a text:
public class CriterionMatcher {
private final Criterion criterion;
private final Map<Class<?>, BiFunction<Criterion, String, Boolean>> classToMatchPredicate = ImmutableMap.of(
And.class, (criterion, text) -> matches(((And) criterion).getLeft(), text) && matches(((And) criterion).getRight(), text),
Or.class, (criterion, text) -> matches(((Or) criterion).getLeft(), text) || matches(((Or) criterion).getRight(), text),
Not.class, (criterion, text) -> !matches(((Not) criterion).getCriterion(), text),
StringLiteral.class, (criterion, text) -> text.contains(((StringLiteral) criterion).getLiteral()));
public CriterionMatcher(Criterion criterion) {
this.criterion = criterion;
}
public boolean matches(String text) {
return matches(criterion, text);
}
private boolean matches(Criterion criterion, String text) {
BiFunction<Criterion, String, Boolean> matchPredicate = classToMatchPredicate.get(criterion.getClass());
return matchPredicate.apply(criterion, text);
}
}
This code is short compared to the code that implemented the Visitor pattern. Here are the pros and cons about this variation:
Pros:
Visitor
interface anymore.
Thus this variation is more strict than the Visitor pattern about "the ability to add new operations to existent object
structures without modifying the structures."Cons:
Criterion
than the current version of the library. This can be remedied by unit tests.Upon reading about the Visitor pattern I had my thoughts about it. When I first applied it, I was happy to get rid of a smelly switch-statement. But after rethinking and trying out a couple of variations of the Visitor pattern, I now conclude it gets rid of ugly type casts in return for a lot more code, more complex dependencies and more indirection.
Over the years I have come to think about code readability as the most important thing next to code correctness. Therefore, I will no longer use the Visitor pattern and use my solution with a map instead.