Training a Basic Language Model (Trigram Language Model)
Large Language Models (LLMs) have, in my opinion, revolutionized artificial intelligence. When I first explored OpenAI's GPT-3.5 Turbo two years ago, I was shocked by how the model worked. Over time, other LLMs such as Google's Gemini, Anthropic's Claude, Microsoft's Copilot, and many more amazed us with their longer context lengths, larger parameter counts, and improved accuracy, as well as the use of retrieval-augmented generation (RAG) with these models. These advances sparked my curiosity: how do these giant language models work behind the scenes? How do they predict things? What is inside them? What kind of mathematics and statistics is involved? So, to quench my thirst, I decided to go back to the very basics and understand from there how language models are built, where the mathematics comes in, how to train them, and how to improve the accuracy of their predictions. In this blog post, I will train a trigram language model. Trigram means three ...