đź‘‹ Welcome to my blog

Hey! Thanks for visiting :) This is Pranav. I work at Google Labs, building enterprise AI agents to power Google’s workforce for the future. I’m currently interested in RL, self-awareness, and evals. Feel free to DM me, chat, or share feedback!

Dataset Distribution for Generalization: A Case Study in Self-Awareness

“In all things the middle state is to be praised.” - Aristotle

Research code: https://github.com/pranavc28/self-awareness-grpo
LLM: openai/gpt-oss-20b
Datasets: https://github.com/tdiggelm/climate-fever-dataset, https://fever.ai/download/fever/train.jsonl, https://github.com/TalSchuster/VitaminC

Claim
Diverse training datasets improve generalization, with harder examples requiring stronger representation than easier ones to effectively shift model behavior. However, training exclusively on difficult examples fails to transfer to simpler domains; a mixture of both easy and hard examples is necessary for robust cross-domain performance. The domain considered here is self-awareness. See my previous blogs for motivation and background. ...

January 20, 2026

Self-Awareness Generalization in Large Language Models

Research code: https://github.com/pranavc28/self-awareness-grpo
LLM: openai/gpt-oss-20b
Datasets: https://github.com/tdiggelm/climate-fever-dataset, https://fever.ai/download/fever/train.jsonl, https://github.com/TalSchuster/VitaminC

Claims
GRPO with calibration-aware rewards can improve NA recall by >10% on in-domain fact-verification tasks. With effective reinforcement learning techniques, it is possible to make large language models (LLMs) more self-aware: better at letting users know when they cannot provide a high-confidence response to a request. We will call such an outcome “not enough information / not applicable” (NA). This outcome will be crucial, as AI embeds itself in more workflows, for building trust in situations where an incorrect PASS/FAIL verdict would worsen the user experience. ...

January 5, 2026

Tinkering with Generative UI

“Form and function should be one, joined in a spiritual union” - Frank Lloyd Wright

Research code: https://github.com/pranavc28/generative-ui
Dataset: https://huggingface.co/datasets/cfahlgren1/react-code-instructions/

Purpose
This blog explores how I fine-tuned an open source model, Qwen/Qwen3-30B-A3B, to produce React code for a fraction of the cost of any of the top-tier AI research labs. I used Tinker, a product released by the team at Thinking Machines Lab, to fine-tune my own LLM. It did a great job, producing generally usable React code at a better level than I expected from an online dataset. ...

November 14, 2025

From Thinking to Knowing: Using Natural Language Confidence From LLM Thought Processes

“And this is wisdom and temperance and self-knowledge — for a man to know what he knows, and what he does not know.” - Plato, Charmides
“To say you know when you know, and to say you do not when you do not — that is knowledge.” - Confucius

Special thanks to Yash Sharma for a lot of valuable feedback on my idea and evaluation methodology.

Research code: https://github.com/pranavc28/thought-engineering-and-self-awareness

Claim
Thought engineering techniques (overthinking and automated confidence refinement) improve multi-class classification performance across all model architectures. The purpose of this blog is to explain these terms and show why this claim holds. ...

October 19, 2025

Temperature Sampling for OCR-VQA: Does It Matter?

Research code: https://github.com/pranavc28/temperature-ocr-vqa

Definitions
Temperature in LLMs controls how predictable or exploratory a model’s outputs are. Low temperature = consistent and factual, good for precise tasks. High temperature = more diverse, good for creative tasks, but also riskier. Visual Question Answering (VQA) is about answering questions directly from images. For OCR tasks, like reading a book cover, VQA can outperform raw OCR because it focuses only on what’s asked (e.g., “Who’s the author?”) instead of dumping every piece of text. ...
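The low-vs-high temperature behavior described above can be sketched in a few lines. This is a minimal illustration of temperature-scaled sampling over a toy logit vector, not code from the linked repo; the function name and example logits are mine:

```python
import numpy as np

def sample_with_temperature(logits, temperature, seed=0):
    """Sample a token index from logits divided by temperature.

    Temperature near 0 approaches greedy decoding (argmax);
    higher temperatures flatten the distribution, making
    lower-probability tokens more likely to be picked.
    """
    rng = np.random.default_rng(seed)
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens
# At temperature 0.01 nearly all mass sits on the top logit,
# so sampling is effectively deterministic.
print(sample_with_temperature(logits, temperature=0.01))  # → 0
```

At temperature 1.0 the same call would return index 1 or 2 a nontrivial fraction of the time, which is the extra diversity (and risk) the post refers to.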

September 27, 2025