Projects

2026 · Bachelor's thesis (TFG) · Universidade de Vigo

Partial observability in deep RL

My bachelor's thesis: a reproducible benchmark for how reinforcement-learning agents cope when they can only see part of the world — and which architectures recover the missing state.

The problem

Reinforcement-learning agents are usually shown the whole board; the real world hands them a keyhole. The question isn't the lazy "PPO vs Markov": the Markov property describes the state an agent observes, not an algorithm, so the two can't be pitted against each other. It's two cleaner questions: how much does performance drop when the observation is partial instead of near-Markovian, and which method best recovers the missing state?
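One way to make the first question measurable is a per-seed paired difference between the near-Markovian and the partial setting. The sketch below is only an illustration of that idea: the function name `paired_drop`, the seed-to-return dictionaries, and the numbers are all assumptions, not the thesis code or its results.

```python
from statistics import mean


def paired_drop(full_returns, partial_returns):
    """Mean per-seed drop in return when moving from full to partial observations.

    Both arguments map seed -> mean evaluation return for the same algorithm
    and environment. Only seeds present in both mappings are compared.
    """
    shared = sorted(set(full_returns) & set(partial_returns))
    if not shared:
        raise ValueError("no seeds in common, nothing to pair")
    diffs = [full_returns[s] - partial_returns[s] for s in shared]
    return mean(diffs), diffs


# Made-up numbers, only to exercise the function. They are NOT thesis results.
full = {0: 1.0, 1: 0.75, 2: 0.5}
partial = {0: 0.5, 1: 0.5, 2: 0.25}
drop, per_seed = paired_drop(full, partial)
```

Pairing by seed keeps seed-to-seed variance out of the comparison, so the difference reflects the change in observability rather than a lucky initialization.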

Approach & tradeoffs

A modular, reproducible MiniGrid harness where observability is the independent variable. Three baselines:

They run across two environments — FourRooms for navigation and exploration, MemoryS13Random for memory under partial observability — while FullyObsWrapper and frame-stacking approximate a more Markovian state. That design isolates the effect of state representation from the choice of algorithm, which is what makes the comparison defensible.
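Frame-stacking is simple enough to sketch without any RL library. The class below is a stand-in written for illustration, assuming observations can be any Python object; the `FrameStack` name and the choice to pad with the first frame at reset are assumptions, and the harness itself may rely on a library wrapper instead.

```python
from collections import deque


class FrameStack:
    """Keep the k most recent observations and expose them as one tuple."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # At episode start there is no history, so repeat the first frame k times.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return tuple(self.frames)

    def step(self, obs):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(obs)
        return tuple(self.frames)


stack = FrameStack(k=3)
s0 = stack.reset("o0")
s1 = stack.step("o1")
s2 = stack.step("o2")
s3 = stack.step("o3")  # "o0" has now fallen out of the window
```

The last step shows why stacking only approximates a Markovian state: anything older than k steps is gone, which is exactly the situation a memory task is designed to punish.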

Results

The framework runs config-driven experiments with versioned runs and a paired evaluation/compare pipeline, so each (algorithm × observability) cell is reproducible. The deliberate split between the observability question and the algorithm question is the methodological core of the thesis.
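A config-driven grid with stable run identifiers could look roughly like the following. Everything named here is hypothetical: the `RunConfig` fields, the placeholder algorithm names (the page does not list the baselines), the observability labels, the seed count, and the hash-based run id are assumptions made for the sketch, and the environment ids are assumed to be the standard MiniGrid registrations for the two tasks.

```python
import hashlib
import itertools
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    algorithm: str
    observability: str
    env_id: str
    seed: int

    def run_id(self):
        # Hash the sorted JSON form so the same config always maps to the same id.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()[:10]


# Placeholder names: the three baselines are not specified here.
ALGORITHMS = ["baseline_a", "baseline_b", "baseline_c"]
# Labels for the raw partial view, FullyObsWrapper, and frame-stacking.
OBSERVABILITY = ["partial", "full", "frame_stack"]
# Assumed registered ids for the two MiniGrid environments.
ENV_IDS = ["MiniGrid-FourRooms-v0", "MiniGrid-MemoryS13Random-v0"]
SEEDS = [0, 1, 2]  # assumed seed count


def build_grid():
    """Enumerate every (algorithm, observability, environment, seed) run."""
    combos = itertools.product(ALGORITHMS, OBSERVABILITY, ENV_IDS, SEEDS)
    return [RunConfig(a, o, e, s) for a, o, e, s in combos]


grid = build_grid()
```

Deriving the id from the config itself means a rerun of the same cell lands on the same identifier, which makes it easy to pair evaluation outputs across observability settings.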

Results in progress — the experimental sweep (multiple seeds, the memory-vs-navigation contrast) is the body of the TFG. The repository is private while the thesis is in development.