Sachin Grover

I am a Post-Doctoral Researcher at the Interactive Robotics Laboratory, Arizona State University (ASU), working with Prof. Heni Ben Amor at the intersection of LLMs and robotics. Before this, I was a Research Scientist at PARC, part of SRI International (formerly Xerox PARC). I earned my Ph.D. from the Yochan Lab at ASU under the supervision of Prof. Subbarao Kambhampati. During my Ph.D., I completed summer internships as an Applied Scientist Intern with Alexa at Amazon, Pittsburgh (2018 and 2021).

Currently, I am looking for full-time opportunities in research and engineering roles.

sachin . grover @ asu . edu

Research Interest: LLM Reasoning and Agents, Learning+Planning systems, Neurosymbolic techniques.

Research

I am interested in LLM-based reasoning and agent behavior, with a focus on enhancing these capabilities through both pre-training and post-training techniques. With over a decade of research experience in automated reasoning and planning, I have worked on learning-based methods such as Reinforcement Learning (RL) and on formal, symbolic approaches such as Automated Task Planning, with applications in robotics. My long-term goal is to develop neuro-symbolic systems for robotics that are both data-efficient and capable of operating in dynamic environments with minimal human supervision. These agents/robots will integrate the generalization abilities of LLMs with the structure and reliability of symbolic reasoning, enabling robust decision-making and adaptation in real-world settings.

projects

on-going

LLMs for Reinforcement Learning

Developed the ProPS and ProPS⁺ methodologies for generating parameterized RL policies directly from LLMs, drawing on their capability for linguistic and numerical reasoning and coupling it with an iterative refinement process. This process is driven by a closed-loop feedback mechanism that provides the LLM with policy reward data, which, along with semantic and contextual task information, enables effective in-context learning. After evaluating the approach across 15 tasks and comparing it with state-of-the-art RL approaches, we are extending the method to enhance the RL optimization capabilities of smaller, open-source LLMs. We are actively fine-tuning models such as Qwen2.5 and Qwen3.0, and our initial experiments with 14B models have shown promising results in generating RL policies of a similar scale.
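The closed loop described above can be illustrated with a minimal sketch. This is not the ProPS implementation: the toy reward function, the prompt format, and `stub_proposer` (a seeded random perturbation standing in for the LLM call) are all assumptions made so the example runs offline. Only the loop structure is taken from the description: propose parameters, measure reward, feed the history back, repeat.

```python
import random
from typing import List, Tuple

History = List[Tuple[List[float], float]]  # (policy parameters, reward) pairs


def episode_reward(params: List[float]) -> float:
    """Toy stand-in for an RL rollout: reward peaks at params == [1.0, -2.0]."""
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def build_prompt(task: str, history: History) -> str:
    """Format the task description and past (params, reward) pairs as text."""
    lines = [f"Task: {task}", "Previous attempts (params -> reward):"]
    lines += [f"{p} -> {r:.3f}" for p, r in history]
    lines.append("Propose new parameters.")
    return "\n".join(lines)


def stub_proposer(prompt: str, history: History, rng: random.Random) -> List[float]:
    """Hypothetical placeholder for the LLM call: perturb the best params so far.

    A real system would send `prompt` to a model and parse its reply.
    """
    best_params, _ = max(history, key=lambda item: item[1])
    return [p + rng.gauss(0.0, 0.5) for p in best_params]


def closed_loop_search(task: str, init: List[float], iterations: int, seed: int = 0) -> History:
    """Repeat: build prompt from reward history, get a proposal, evaluate it."""
    rng = random.Random(seed)
    history: History = [(init, episode_reward(init))]
    for _ in range(iterations):
        prompt = build_prompt(task, history)
        candidate = stub_proposer(prompt, history, rng)
        history.append((candidate, episode_reward(candidate)))
    return history


history = closed_loop_search("reach the target", [0.0, 0.0], iterations=50)
best_params, best_reward = max(history, key=lambda item: item[1])
```

Swapping `stub_proposer` for an actual model call would leave the rest of the loop unchanged, which is the point of the sketch: the reward history is the only feedback channel.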

Relevant Papers

Zhou, Y., Grover, S., El Mistiri, M., Kalirathinam, K., Kerhalkar, P., Mishra, S., Kumar, N., Gaurav, S., Aran, O., & Ben Amor, H. (2025). Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs. Submitted to NeurIPS, Main Track.

Role of Data and Fine-Tuning on LLM Agents

Curating data recipes for TBench and SWEBench agent tasks to evaluate their effect on LLM agents. The image shows an LLM agent working on interactive tasks (taken from the AgentBench GitHub repository).

completed

Role of Data in Fine-Tuning LLMs for Reasoning Tasks

Empirically evaluated the role of data construction and training recipes in fine-tuning LLMs for reasoning tasks. The project created the OpenThoughts fine-tuned models, whose early versions matched DeepSeek-R1 performance on benchmarks such as AIME and LiveCodeBench.

Relevant Papers

Guha, E., Marten, R., Keh, S., Raoof, N., Smyrnis, G., Bansal, H., Nezhurina, M., Mercat, J., Vu, T., Sprague, Z., Suvarna, A., Feuer, B., Chen, L., Khan, Z., Frankel, E., Grover, S., Choi, C., Muennighoff, N., Su, S., … Schmidt, L. (2025). OpenThoughts: Data Recipes for Reasoning Models. https://arxiv.org/abs/2506.04178

Evalchemy — One-stop Evaluations for LLMs

Designed a one-stop shop for evaluating LLMs on more than 30 benchmarks covering downstream tasks such as coding, mathematical reasoning, and instruction following. It is built on top of LM-Eval-Harness.

LLM for Natural Language Understanding - A Demonstration

Developed an end-to-end pipeline in the Kitchen domain of AI2Thor, in which a robot is given commands, uses an LLM to parse each command into a formal language, and converts the resulting formal structure into a plan using a PDDL+ planner.

Relevant Papers

Grover, S., & Mohan, S. (2024). A Demonstration of Natural Language Understanding for Embodied Agent using LLMs. ICAPS Demonstration.

Open-World AI Agent

Designed the Hydra framework for domain-independent agents working in dynamic environments. A dynamic environment is one in which the assumptions made by the agent during the training phase change during the test phase. The agent needs to (1) understand the assumptions that changed, and (2) accommodate them in its own model. We used PDDL+ to model these environments and incorporate the changes.

This was a DARPA Sail-On project involving over 10 teams of academic and industry partners, undertaken by PARC, in which we were the top-performing team. The evaluation was performed by a third party on the Angry Birds game, Minecraft-based grid environments, and 3D versions of Cartpole. We also presented a real-world demonstration in which fighter jet trajectories were updated in response to changes in the environment.
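The two steps named above, noticing that a training-time assumption no longer holds and accommodating the change in the agent's model, can be sketched on a toy problem. This is an illustration only and not the Hydra implementation: the one-parameter motion model, the inconsistency measure, the threshold, and the candidate list are all assumptions chosen for brevity.

```python
from typing import List, Sequence, Tuple

Observation = Tuple[float, float]  # (time, observed position)


def predict(speed: float, time: float) -> float:
    """The agent's model: position grows linearly with an assumed speed."""
    return speed * time


def inconsistency(speed: float, observations: Sequence[Observation]) -> float:
    """Mean absolute gap between model predictions and observations."""
    return sum(abs(predict(speed, t) - x) for t, x in observations) / len(observations)


def detect_and_repair(
    speed: float,
    observations: Sequence[Observation],
    candidates: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Return (speed to use, whether the model was changed).

    Step 1: flag a change when the model disagrees with what was observed.
    Step 2: pick the candidate parameter that best explains the observations.
    """
    if inconsistency(speed, observations) <= threshold:
        return speed, False
    repaired = min(candidates, key=lambda c: inconsistency(c, observations))
    return repaired, True


trained_speed = 2.0  # assumption learned during training
shifted: List[Observation] = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # world now moves at 3.0
new_speed, changed = detect_and_repair(trained_speed, shifted, [1.0, 2.0, 3.0, 4.0])

unchanged: List[Observation] = [(1.0, 2.0), (2.0, 4.0)]
same_speed, changed_again = detect_and_repair(trained_speed, unchanged, [1.0, 2.0, 3.0, 4.0])
```

In a planning-based agent the repaired parameter would be written back into the domain model before replanning; here the model is a single number so the repair is a direct substitution.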

Relevant Papers

Mohan, S., Piotrowski, W., Stern, R., Grover, S., Kim, S., Le, J., Sher, Y., & de Kleer, J. (2024). A domain-independent agent architecture for adaptive operation in evolving open worlds. Artificial Intelligence, 104–161.

Piotrowski, W., Stern, R., Grover, S., & Mohan, S. (2024). Self-monitoring Adaptive AI Agents Operating in Open Worlds. AAAI Spring Symposium on User-Aligned Assessment of Adaptive AI Systems.

Piotrowski, W., Chao, J., Grover, S., Stern, R., Mohan, S., & Lange, D. S. (2024). Self-adaptive Mission Planning in High Fidelity Open World Simulation. ICAPS Demonstration.

Piotrowski, W., Sher, Y., Grover, S., Stern, R., & Mohan, S. (2023). Heuristic search for physics-based problems: angry birds in PDDL+. Proceedings of the International Conference on Automated Planning and Scheduling, 33(1), 518–526.

Nyx PDDL+ Planner

PDDL+ is a modeling language for hybrid discrete/continuous systems with exogenous events, and is among the most expressive planning modeling languages. Nyx is a planner for PDDL+ that uses a time discretization defined for the task and produces an executable plan specifying when each action is to be executed.
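The idea of planning over a discretized timeline can be illustrated with a small sketch. This is not Nyx and does not use PDDL+ syntax: the heater domain, the constants, and the breadth-first search are assumptions chosen to show the three ingredients mentioned above, namely instantaneous actions, a continuous process advanced in fixed time steps, and an exogenous event (overheating) that prunes a branch. The result is a list of timestamped actions.

```python
from collections import deque
from typing import List, Optional, Tuple

DT = 1          # time discretization step (assumed, task-specific)
HEAT_RATE = 5   # temperature gain per step while the heater is on
COOL_RATE = 1   # temperature loss per step while the heater is off
AMBIENT = 20    # temperature never drops below this value
OVERHEAT = 100  # exogenous event: reaching this temperature ends the run
HORIZON = 20    # latest time point the search will consider

Plan = List[Tuple[int, str]]  # (time, action name)


def apply_action(heater_on: bool, action: Optional[str]) -> Optional[bool]:
    """Heater state after an instantaneous action, or None if inapplicable."""
    if action is None:
        return heater_on
    if action == "switch_on" and not heater_on:
        return True
    if action == "switch_off" and heater_on:
        return False
    return None


def advance(temp: int, heater_on: bool) -> int:
    """Continuous process, discretized: heat or cool for one step of length DT."""
    if heater_on:
        return temp + HEAT_RATE * DT
    return max(AMBIENT, temp - COOL_RATE * DT)


def plan_heating(start_temp: int, goal_temp: int) -> Optional[Plan]:
    """Breadth-first search for a plan that reaches goal_temp with the heater off."""
    start = (start_temp, False, 0)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (temp, heater_on, t), plan = queue.popleft()
        for action in (None, "switch_on", "switch_off"):
            new_heater = apply_action(heater_on, action)
            if new_heater is None:
                continue
            new_plan = plan if action is None else plan + [(t, action)]
            if temp >= goal_temp and not new_heater:
                return new_plan
            next_temp = advance(temp, new_heater)
            # Prune branches that trigger the overheat event or exceed the horizon.
            if next_temp >= OVERHEAT or t + DT > HORIZON:
                continue
            state = (next_temp, new_heater, t + DT)
            if state not in seen:
                seen.add(state)
                queue.append((state, new_plan))
    return None


plan = plan_heating(start_temp=20, goal_temp=60)
# plan == [(0, "switch_on"), (8, "switch_off")]
```

A coarser `DT` shrinks the search space but can miss plans that need finer timing, which is why the discretization is chosen per task.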

Relevant Papers

Piotrowski, W., Perez, A., & Grover, S. (2024). Nyx: Domain Independent PDDL+ planner for Classic Control Problems. ICAPS Workshop on Knowledge Engineering for Planning and Scheduling.