Bling FIRE Tokenizer for BERT

Bling Fire Tokenizer is a blazing fast tokenizer that we use in production at Bing for our deep learning models. In online scenarios, where the tokenizer is on the critical path to returning a result to the user in the shortest possible time, every millisecond matters. This is where Bling Fire's performance helps us achieve sub-second response times, leaving more execution time for complex deep models rather than spending it on tokenization.

We released the Bling Fire Tokenizer as open source a year ago, to great community interest. We also received feedback and feature requests from you, which we have addressed since then.

We've since added support for BERT-style tokenizers with normalization and sub-token extraction. We provide precompiled tokenization models for BERT Base/Large, BERT Base/Large cased, BERT Chinese, and BERT Multilingual Cased, and we've added instructions for creating your own tokenization models here and here.
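As a quick illustration, a minimal sketch of using one of the precompiled models through the Python bindings might look like the following. The model file name, maximum sequence length, and unknown-token id below are illustrative placeholders; check the repository for the exact values that ship with each model.

```python
# Minimal sketch using the blingfire Python bindings (pip install blingfire).
# The model path and the max-length / unk-id arguments are illustrative.
from blingfire import load_model, text_to_words, text_to_ids, free_model

# Load a precompiled BERT tokenization model (file name is a placeholder).
h = load_model("./bert_base_tok.bin")

text = "Bling Fire tokenizes text for deep learning models."

# Plain word breaking (no model handle required).
print(text_to_words(text))

# Normalization plus sub-token extraction into fixed-length BERT input ids:
# 128 is the assumed maximum sequence length, 100 the assumed unknown-token id.
ids = text_to_ids(h, text, 128, 100)
print(ids)

free_model(h)
```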

In terms of speed, we've now measured how the Bling Fire Tokenizer compares with the current BERT-style tokenizers: the original WordPiece BERT tokenizer and the Hugging Face tokenizer. Using the BERT Base Uncased tokenization task, we ran the original BERT tokenizer, the latest Hugging Face tokenizer, and Bling Fire v0.0.13 with the following results:

As you can see, Bling Fire is much faster than existing tokenizers for BERT-based models. You can find the details of the benchmark here.
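If you want to run a similar comparison yourself, the sketch below shows one way to time Bling Fire against the Hugging Face tokenizer on the same text. The corpus, model file, and iteration count are placeholders, and this is not the harness behind the published numbers; it only illustrates the setup.

```python
# Illustrative timing harness, not the published benchmark.
# Assumes: pip install blingfire transformers
import time
from blingfire import load_model, text_to_ids, free_model
from transformers import BertTokenizer

# Placeholder corpus; substitute the text you want to measure on.
texts = ["Replace this with your own evaluation corpus."] * 1000

hf = BertTokenizer.from_pretrained("bert-base-uncased")
bf = load_model("./bert_base_tok.bin")  # precompiled model file name is a placeholder

start = time.perf_counter()
for t in texts:
    hf.encode(t, add_special_tokens=False)
hf_seconds = time.perf_counter() - start

start = time.perf_counter()
for t in texts:
    text_to_ids(bf, t, 128, 100)  # assumed max length and unk id
bf_seconds = time.perf_counter() - start

print(f"Hugging Face: {hf_seconds:.3f}s, Bling Fire: {bf_seconds:.3f}s")
free_model(bf)
```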

Bling Fire provides state-of-the-art latency and is available for Windows, Linux, and Mac OS X. You can get access to the library and find out more details at https://github.com/Microsoft/BlingFire. To reach the team with questions or comments, connect with us on Stack Overflow.
 
- Bling Web Data Team