Know Your Tokens: Real Life Tokenization for Search Engines

Attribution: Wikimedia Commons

Tokenization is a core concept of full-text search. It addresses the question of how texts (which, in the end, are nothing more than byte arrays) can be broken down into individually searchable parts, so-called tokens. It is the token that is stored in the full-text index, and it is thus the smallest unit that can be searched for. This article will cover some strategies to cope with cases your search…
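To make the idea concrete, here is a minimal sketch (not from the article) of how a search engine's simplest analyzer might turn a text into index tokens: split on non-word characters and lowercase each piece. The function name `tokenize` is illustrative, not part of any particular search engine's API.

```python
import re

def tokenize(text: str) -> list[str]:
    # Split the text on runs of non-word characters and lowercase
    # each token, roughly what a basic "standard" analyzer does.
    return [t.lower() for t in re.findall(r"\w+", text)]

tokens = tokenize("Know Your Tokens: Real-Life Tokenization!")
# ['know', 'your', 'tokens', 'real', 'life', 'tokenization']
```

Each resulting token is what actually ends up in the inverted index; a query is tokenized the same way, and matching happens token by token.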