BERT: Bidirectional Encoder Representations from Transformers

This three-minute video explains how BERT works, walking step by step through the math behind the algorithm.

Check out the original paper here: https://arxiv.org/abs/1810.04805
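The core computation inside each BERT encoder layer is scaled dot-product self-attention, softmax(QK^T / sqrt(d_k)) V, as defined in the underlying Transformer architecture. As a rough illustration (a minimal NumPy sketch with made-up toy dimensions, not BERT's actual weights or full multi-head implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d_k)) V, where Q, K, V are learned
    projections of the input token representations X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len) similarity scores
    weights = softmax(scores)         # each row is a distribution over tokens
    return weights @ V, weights

# Toy example: 4 tokens, 8-dimensional embeddings (illustrative sizes only).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Because attention looks at all tokens at once, each output row mixes information from the entire sequence in both directions, which is what makes BERT's encoding bidirectional.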

@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}