What I Learned from the Paper "Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities": A Personal Take

This paper is under blind review at the journal Transactions on Machine Learning Research. It aims to quantify the capabilities of LLMs, specifically GPT-4, on trivial tasks, and investigates whether GPT-4's responses are consistent across scenarios or change when the input data or the prompt is slightly altered. The tasks are deliberately simple: counting the numbers in a list; finding the mean, median, or maximum of the list; or multiplying numbers from the list, ranging from two-digit numbers (e.g., 45) up to five-digit numbers. To explore this, the same list of input data is given to GPT-4 under different prompts; conversely, the same prompt is used with different input lists to check the model's response. The goal is to determine whether GPT-4 gives the correct response and whether its responses change when the prompt or the input data is altered. Several conditions per task have bee...
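The experimental setup described above can be sketched in a few lines of Python. This is my own illustrative reconstruction, not the paper's actual code: `ask_model` is a placeholder that would wrap a real GPT-4 API call, and the prompt templates are invented for the sketch. The idea is simply to pose the same simple task with several prompt phrasings and several random input lists, then check whether the answers agree.

```python
import random
import statistics

# Ground-truth functions for the simple tasks the paper probes.
TASKS = {
    "count": len,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
}

# Hypothetical prompt variants; the paper perturbs wording like this.
PROMPTS = [
    "What is the {task} of the following numbers: {nums}?",
    "Compute the {task} for this list: {nums}.",
    "Given the list {nums}, report its {task}.",
]

def ask_model(prompt, task, nums):
    # Placeholder: a real experiment would send `prompt` to GPT-4
    # and parse its reply. Here we just return the true answer.
    return TASKS[task](nums)

def consistency_check(task, trials=5, list_len=20, seed=0):
    """For each random list, ask the task under every prompt variant
    and record whether all phrasings produced the same answer."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        nums = [rng.randint(10, 99) for _ in range(list_len)]
        answers = {
            ask_model(p.format(task=task, nums=nums), task, nums)
            for p in PROMPTS
        }
        results.append(len(answers) == 1)  # consistent across prompts?
    return results

print(consistency_check("count"))
```

With a real model behind `ask_model`, the fraction of `True` entries would measure how stable GPT-4's answers are under prompt and data perturbations, which is exactly the kind of variance the paper quantifies.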