Distilling knowledge from neural networks to build smaller and faster models

This article discusses GPT-2 and BERT models, as well as using knowledge distillation to create highly accurate models with fewer parameters than their teachers.

OpenTeams, November 11, 2019