Posts

How to Correctly Initialize a Neural Network: Mechanistic Interpretability Part 1

Research in artificial intelligence is evolving very rapidly, with new models and more advanced architectures appearing all the time. But at the same time, research dynamics are shifting toward analyzing and understanding the hidden bugs in training neural networks. This line of research tries to understand neural networks by breaking them down into smaller, understandable parts. The goal is to understand each smaller part and how these parts interact with each other to make up the entire behavior of a neural network. It also identifies hidden bugs during initialization, training, and optimization; bugs which we might unconsciously ignore and which can cause problems in network training and pattern learning. This whole field is called "Mechanistic Interpretability". So, basically, to get better output now, you don't just have to change your input, the number of parameters, hyperparameters, embedding dimensions, activation functions, or anything else. You can al...
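As a small taste of the kind of fix the post builds toward, here is a minimal sketch in NumPy of one standard initialization remedy, Kaiming (He) scaling. The layer sizes and the comparison against naive unit-variance weights are purely illustrative assumptions, not necessarily the post's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_init(fan_in: int, fan_out: int) -> np.ndarray:
    """Kaiming (He) initialization for ReLU layers: scale weights by
    sqrt(2 / fan_in) so activation variance stays roughly constant
    from layer to layer."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Hypothetical layer sizes, purely illustrative.
W1 = kaiming_init(784, 256)
W2 = kaiming_init(256, 10)

# Naive init for comparison: unit-variance weights blow up activations.
x = rng.normal(size=(32, 784))
h_good = np.maximum(x @ W1, 0.0)
h_bad = np.maximum(x @ rng.normal(size=(784, 256)), 0.0)
print(h_good.std(), h_bad.std())
```

Running this shows the naively initialized activations coming out roughly twenty times larger, which is exactly the sort of silent bug that compounds layer by layer during training.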

LLMs Will Always Generate Plausible yet Incorrect Output and We Have to Make Peace With It: A Paper Review

Large language models have seen tremendous growth across all domains over the past few years. Researchers are actively engaged in fine-tuning these models, increasing parameter counts and context and token lengths, as well as developing new architectures for better performance. But, unfortunately, as we make advancements in LLMs, we also run into various issues and limitations of these large language models. One of the biggest limitations of LLMs is generating plausible yet incorrect output: hallucination. It means that a language model gives an output that is not 100% based on facts, not 100% correct, and not even 100% aligned with its training data or with retrieved information (as in Retrieval-Augmented Generation, RAG). Over the course of time, various techniques have been applied, but hallucination hasn't been dealt with completely. And the paper I am going to cover, "LLMs Will Always Hallucinate, and We Need to Live With This", makes some interesting claims and proves them...

Training a Basic Language Model (Trigram Language Model)

Large Language Models (LLMs) have, in my opinion, revolutionized artificial intelligence. When I first explored OpenAI's GPT-3.5 Turbo two years back, I was stunned by how the model works. Then, with the passage of time, other LLMs like Google's Gemini, Anthropic's Claude, Microsoft's Copilot, and many more amazed us with their increased context lengths, increased parameters, and accuracy, as well as RAG in these models. This advancement sparked a curiosity in me: how do these giant language models work in the backend? How do they predict things? What's inside there? What kind of mathematics and statistics is involved? So, to quench my thirst, I decided to go deep down to the very basics and understand from there how language models are built, how the mathematics is involved, how to train them, and how to increase their accuracy so they predict accurately. In this blog post, I will train a trigram language model. Trigram means three ...
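Before the full walkthrough, here is a minimal, self-contained sketch of the idea: count trigrams, turn the counts into conditional probabilities P(w3 | w1, w2), and sample from them. The toy corpus is an illustrative stand-in for real training text, not the data used in the post:

```python
from collections import Counter, defaultdict
import random

# Toy corpus standing in for real training text (purely illustrative).
corpus = "the cat sat on the mat the cat ate the rat".split()

# Count how often each word follows each pair of preceding words.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def next_word(w1: str, w2: str) -> str:
    """Sample the next word from the maximum-likelihood trigram estimate
    P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2)."""
    followers = counts[(w1, w2)]
    if not followers:  # dead end: this bigram never appears mid-corpus
        return random.choice(corpus)
    words, freqs = zip(*followers.items())
    return random.choices(words, weights=freqs)[0]

# Generate a short continuation from a seed bigram.
w1, w2 = "the", "cat"
out = [w1, w2]
for _ in range(5):
    w3 = next_word(w1, w2)
    out.append(w3)
    w1, w2 = w2, w3
print(" ".join(out))
```

The whole "model" is just a table of counts; everything else in the post is about estimating those conditional probabilities well.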

Analysis of the Paper: RAGE Against the Machine: Retrieval-Augmented LLM Explanations. A Personal Take.

In recent years, we've witnessed remarkable advancements, particularly with large language models like GPT, Gemini, Claude, and others. Research scientists are now training increasingly complex models, extending their context lengths, and equipping them with billions or even trillions of parameters, boosting their computational capabilities. However, as these models improve rapidly, issues arise regarding their outputs. Researchers are now focusing on quantifying the accuracy, timeliness, and reasoning behind these outputs, as well as troubleshooting their limitations. In my last blog post, I discussed a paper exploring whether we can generalize LLMs, rely on their capabilities, and avoid falling into the trap of fixed effects. One of the key advancements that sets modern Large Language Models (LLMs) apart from previous iterations is the introduction of Retrieval-Augmented Generat...
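For readers new to the idea, here is a minimal sketch of the retrieve-then-generate pattern behind Retrieval-Augmented Generation. The documents, the word-overlap scoring, and the prompt template are all illustrative assumptions; production systems use embedding similarity and a real model call:

```python
# Minimal sketch of the retrieve-then-generate pattern behind RAG.
# Corpus, scoring, and prompt template are illustrative stand-ins.

documents = [
    "RAG augments a language model with retrieved passages.",
    "Trigram models predict a word from the two preceding words.",
    "Kaiming initialization scales weights by sqrt(2 / fan_in).",
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved context so the model answers from evidence."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How does RAG help a language model?"))
```

The point of the pattern is that the model is asked to answer from retrieved evidence rather than from its parameters alone, which is exactly what the paper's explanation method builds on.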