Posts

Showing posts from September, 2024

Training a Basic Language Model (Trigram Language Model)

Large Language Models (LLMs) have, in my opinion, revolutionized artificial intelligence. When I first explored OpenAI's GPT-3.5 Turbo two years ago, I was shocked by how the model worked. Over time, other LLMs like Google's Gemini, Anthropic's Claude, Microsoft's Copilot, and many more amazed us with their longer context lengths, larger parameter counts, and better accuracy, as well as RAG support. This advancement sparked my curiosity: how do these giant language models work in the backend? How do they predict things? What's inside them? What kind of mathematics and statistics is involved? So, to quench my thirst, I decided to go deep down to the very basics and understand from there how language models are built, how the mathematics is involved, how to train them, and how to increase their accuracy so they predict accurately. In this blog post, I will train a trigram language model. Trigram means three ...
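As a quick taste of what the full post builds up to, here is a minimal sketch of a count-based trigram model; the tiny corpus, whitespace tokenization, and helper names are illustrative assumptions, not the post's actual code.

```python
# Minimal trigram language model sketch (illustrative corpus and tokenization).
from collections import defaultdict, Counter
import random

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows a pair of preceding words.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    """Sample the next word given the previous two, proportional to observed counts."""
    followers = counts[(w1, w2)]
    if not followers:
        return None
    words, freqs = zip(*followers.items())
    return random.choices(words, weights=freqs)[0]

print(predict_next("the", "cat"))  # e.g. 'sat' or 'ate'
```

A real model would be trained on a much larger corpus and would smooth the counts for unseen trigrams, but the core idea is the same table of conditional frequencies.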

Analysis of the Paper "RAGE Against the Machine: Retrieval-Augmented LLM Explanations": A Personal Take

In recent years, we've witnessed remarkable advancements, particularly with large language models like GPT, Gemini, Claude, and others. Research scientists are now training increasingly complex models, extending their context lengths, equipping them with billions or even trillions of parameters, and boosting their computational capabilities. However, as these models improve rapidly, issues arise regarding their outputs. Researchers are now focusing on quantifying the accuracy, timeliness, and reasoning behind these outputs, as well as troubleshooting their limitations. In my last blog post [https://www.blogger.com/blog/post/edit/3763838017499822545/1365082907163370755], I discussed a paper exploring whether we can generalize LLMs, rely on their capabilities, and avoid falling into the trap of fixed effects. One of the key advancements that sets modern Large Language Models (LLMs) apart from previous iterations is the introduction of Retrieval-Augmented Generat...

How Does a Basic Concept of Calculus (the Derivative) Play a Key Role in Training Neural Networks?

A neural network is a machine learning model that works like the human brain. Just as our brain has neurons that send signals, a neural network has artificial neurons that pass information along. The network takes an input, does some math, and passes the result to the next neurons until it reaches a decision. Each connection has a weight, which shows how important that neuron's signal is, much like how our brain gives more attention to important thoughts or emotions. When the network doesn't make the right decision, it uses a method called backpropagation to go back and check how much each weight affected the decision, and then adjusts those weights to get a better result next time. This process helps the neural network learn and improve over time, just like how we learn from our mistakes. Below is the basic structure of a neural network. Adjusting weights in backpropagation is very important to reach...
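To make the derivative's role concrete, here is a minimal single-neuron sketch; the toy data, squared-error loss, and learning rate are illustrative assumptions rather than the post's actual setup.

```python
# Gradient descent on a single weight: the derivative tells us which way to adjust.
x, target = 2.0, 1.0        # one training example (input, desired output)
w = 0.1                     # the neuron's weight, starting from a poor guess
learning_rate = 0.1

for step in range(20):
    y = w * x                         # forward pass: neuron output
    loss = (y - target) ** 2          # squared error
    dloss_dw = 2 * (y - target) * x   # derivative of the loss with respect to the weight
    w -= learning_rate * dloss_dw     # backpropagation-style update: step downhill

print(round(w, 3))  # approaches 0.5, the weight that makes w * x equal the target
```

The same derivative-and-update loop, applied layer by layer via the chain rule, is what backpropagation does across an entire network.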