DFA Query

Usage

1. Constructing the keyword tree

WordTree tree = new WordTree();
tree.addWord("big");
tree.addWord("big potato");
tree.addWord("potato");
tree.addWord("just out of the pot");
tree.addWord("out of the pot");

2. Searching for keywords

// Text to be searched
String text = "I have a big potato, just out of the pot";
  1. Case 1: Standard matching: match the shortest keyword and skip text that has already been matched
// "big" is matched first, so the longer "big potato" is not matched
// "just out of the pot" is matched and its characters are skipped, so "out of the pot" is not matched
// (shortest matching only chooses among keywords that start at the same character; "just out of the pot" starts earlier, so the longer keyword wins here)
List<String> matchAll = tree.matchAll(text, -1, false, false);
Assert.assertEquals("[big, potato, just out of the pot]", matchAll.toString());
  2. Case 2: Match the shortest keyword without skipping text that has already been matched
// "big" is matched; under shortest matching "big potato" is skipped, and "potato" is then matched
// "just out of the pot" is matched; because matched text is not skipped, "out of the pot" is matched as well
matchAll = tree.matchAll(text, -1, true, false);
Assert.assertEquals("[big, potato, just out of the pot, out of the pot]", matchAll.toString());
  3. Case 3: Match the longest keyword and skip text that has already been matched
// "big" is matched; because matching is greedy, the scan continues and "big potato" is matched too
// "big potato" already covers "potato", so "potato" is skipped; likewise "just out of the pot" covers "out of the pot", which is skipped
matchAll = tree.matchAll(text, -1, false, true);
Assert.assertEquals("[big, big potato, just out of the pot]", matchAll.toString());
  4. Case 4: Match the longest keyword without skipping text that has already been matched (the most complete result)
// "big" is matched; because matching is greedy, "big potato" is matched too; because matched text is not skipped, "potato" is matched as well
// "just out of the pot" is matched; because matched text is not skipped, "out of the pot" is matched as well
matchAll = tree.matchAll(text, -1, true, true);
Assert.assertEquals("[big, big potato, potato, just out of the pot, out of the pot]", matchAll.toString());
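
To make the effect of the two boolean flags easier to follow, here is a minimal, self-contained sketch of a keyword tree that reproduces the four results above. It is not Hutool's implementation: the class MiniWordTree, its field names, and the parameter names densityMatch and greedyMatch are hypothetical, it does no special-character handling, and it treats any limit of 0 or less as "no limit" as an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not Hutool's WordTree.
class MiniWordTree {
    private final Map<Character, MiniWordTree> children = new HashMap<>();
    private boolean wordEnd; // true if some keyword ends at this node

    void addWord(String word) {
        MiniWordTree node = this;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new MiniWordTree());
        }
        node.wordEnd = true;
    }

    // Assumption of this sketch: limit <= 0 means "no limit".
    List<String> matchAll(String text, int limit, boolean densityMatch, boolean greedyMatch) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            int start = i; // where the current candidate keyword begins
            MiniWordTree node = this;
            for (int j = start; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) {
                    break; // no keyword continues with this character
                }
                if (node.wordEnd) {
                    found.add(text.substring(start, j + 1));
                    if (limit > 0 && found.size() >= limit) {
                        return found;
                    }
                    if (!densityMatch) {
                        i = j; // resume the outer scan after the matched text
                    }
                    if (!greedyMatch) {
                        break; // shortest match: stop at the first keyword end
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        MiniWordTree tree = new MiniWordTree();
        String[] words = {"big", "big potato", "potato", "just out of the pot", "out of the pot"};
        for (String w : words) {
            tree.addWord(w);
        }
        String text = "I have a big potato, just out of the pot";
        System.out.println(tree.matchAll(text, -1, false, false)); // Case 1
        System.out.println(tree.matchAll(text, -1, true, false));  // Case 2
        System.out.println(tree.matchAll(text, -1, false, true));  // Case 3
        System.out.println(tree.matchAll(text, -1, true, true));   // Case 4
    }
}
```

In this sketch the first flag decides whether the outer scan jumps past a matched keyword, and the second decides whether the inner walk stops at the first keyword end or keeps going to find longer keywords that share the same start.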

In addition to matchAll, WordTree provides two more methods: match and isMatch. Both look only for the first match and stop as soon as one keyword is found, so they avoid scanning the rest of the text and are considerably faster when a single hit is enough.

Dealing with special characters

Keywords in the text are sometimes interspersed with special characters, such as “〓key☆word”. Hutool provides the StopChar class to handle this case: special characters are skipped automatically when the match or matchAll method runs.
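
The idea of ignoring special characters can be sketched in a few lines of plain Java. Everything here is hypothetical: the class StopCharSketch, the two-character set, and the helper containsIgnoringStopChars are illustrations only. Hutool's StopChar defines its own character set, and this sketch strips the characters up front instead of skipping them during the tree walk.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not Hutool's StopChar.
class StopCharSketch {
    // Assumed stop characters for this example: '〓' (U+3013) and '☆' (U+2606).
    static final Set<Character> STOP_CHARS =
            new HashSet<>(Arrays.asList('\u3013', '\u2606'));

    static boolean isStopChar(char c) {
        return STOP_CHARS.contains(c);
    }

    // Returns true if keyword occurs in text once stop characters are ignored.
    static boolean containsIgnoringStopChars(String text, String keyword) {
        StringBuilder cleaned = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!isStopChar(c)) {
                cleaned.append(c);
            }
        }
        return cleaned.indexOf(keyword) >= 0;
    }

    public static void main(String[] args) {
        // The decorated text still contains "keyword" after stop characters are dropped.
        System.out.println(containsIgnoringStopChars("\u3013key\u2606word", "keyword"));
    }
}
```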