Self-Awareness Generalization in Large Language Models
Research code: https://github.com/pranavc28/self-awareness-grpo
LLM: openai/gpt-oss-20b
Datasets: https://github.com/tdiggelm/climate-fever-dataset, https://fever.ai/download/fever/train.jsonl, https://github.com/TalSchuster/VitaminC

Claim

With effective Reinforcement Learning techniques, it is possible to make Large Language Models (LLMs) more self-aware and better at letting users know when they cannot provide a high-confidence response to a request. We will call such an outcome "not enough information/not applicable". This outcome will become crucial as AI embeds itself in more workflows: it builds trust in situations where an incorrect verdict such as PASS/FAIL would worsen the user experience. ...
Tinkering with Generative UI
“Form and function should be one, joined in a spiritual union” - Frank Lloyd Wright

Research code: https://github.com/pranavc28/generative-ui
Dataset: https://huggingface.co/datasets/cfahlgren1/react-code-instructions/

Purpose

This blog explores how I fine-tuned an open-source model, Qwen/Qwen3-30B-A3B, to produce React code for a fraction of the cost of any of the top-tier AI research labs. I used Tinker, a product released by the team at Thinking Machines Lab, to fine-tune my own LLM. It did a great job, producing generally usable React code at a better level than I expected from an online dataset. ...
From Thinking to Knowing: Using Natural Language Confidence From LLM Thought Processes
“And this is wisdom and temperance and self-knowledge — for a man to know what he knows, and what he does not know.” - Plato, Charmides
“To say you know when you know, and to say you do not when you do not — that is knowledge.” - Confucius

Special thanks to Yash Sharma for a lot of valuable feedback on my idea and evaluation methodology.

Research code: https://github.com/pranavc28/thought-engineering-and-self-awareness

Claim

Thought-engineering techniques (overthinking and automated confidence refinement) improve multi-class classification performance across all model architectures. The purpose of this blog is to explain these terms and show why this claim holds. ...
Temperature Sampling for OCR-VQA: Does It Matter?
Research code: https://github.com/pranavc28/temperature-ocr-vqa

Definitions

Temperature in LLMs controls how predictable or exploratory a model’s outputs are. Low temperature = consistent and factual, good for precise tasks. High temperature = more diverse, good for creative tasks—but also riskier.

Visual Question Answering (VQA) is about answering questions directly from images. For OCR tasks, like reading a book cover, VQA can outperform raw OCR because it focuses only on what’s asked (e.g., “Who’s the author?”) instead of dumping every piece of text. ...
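A minimal sketch of what temperature does under the hood (illustrative only; not taken from the post's research code): the model's raw next-token logits are divided by the temperature before the softmax, so a low temperature sharpens the distribution toward the top token while a high temperature flattens it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into next-token probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical next-token logits (made up for illustration).
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # near-greedy: mass piles onto the top token
high = softmax_with_temperature(logits, 2.0)  # flatter: sampling explores more tokens

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At temperature 0.2 the top token dominates, which is why low temperature gives consistent OCR-style answers; at 2.0 the probabilities spread out, which helps creative tasks but risks wrong answers on factual ones.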