In Part 1, I explored the RLM paradigm - giving LLMs a REPL to programmatically interact with data instead of processing massive contexts in a single forward pass. This post covers how I’ve been applying RLM and other techniques to manage context in my day-to-day work with coding agents like Claude Code.
I’ve been exploring Recursive Language Models (RLM), a new inference paradigm from MIT’s OASYS lab. The core idea is compelling: instead of forcing LLMs to process massive contexts in a single forward pass, give them a Python REPL and let them programmatically interact with the data.
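To make the idea concrete, here is a minimal sketch of an RLM-style loop under my own simplifying assumptions: the long document lives in a REPL variable, the model emits small Python snippets to inspect it, and only the snippet outputs flow back into its context. The `query_llm` helper and the prompt/response conventions are hypothetical placeholders, not the OASYS lab's implementation.

```python
# Illustrative RLM-style loop (sketch, not the reference implementation).
# The long context is stored as a REPL variable; the model writes small Python
# snippets to probe it, and only the truncated outputs enter its prompt.

def query_llm(prompt: str) -> str:
    """Hypothetical helper wrapping whatever LLM backend you use."""
    raise NotImplementedError

def rlm_answer(question: str, long_context: str, max_steps: int = 8) -> str:
    namespace = {"context": long_context}  # the full document never enters the prompt
    transcript = (
        f"Question: {question}\n"
        "You can run Python. The variable `context` holds the document.\n"
    )
    for _ in range(max_steps):
        reply = query_llm(transcript)
        if reply.startswith("FINAL:"):  # model signals it has enough evidence
            return reply.removeprefix("FINAL:").strip()
        try:
            # Execute the model's snippet; it might slice, grep, or summarize `context`
            # and leave its finding in a `result` variable.
            exec(compile(reply, "<rlm>", "exec"), namespace)
            observation = str(namespace.get("result", ""))[:2000]  # cap REPL output
        except Exception as exc:
            observation = f"Error: {exc}"
        transcript += f"\n>>> {reply}\n{observation}\n"
    return "No answer within step budget."
```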
This is Part 3 of my Titans implementation series. Part 1 covered the basic memory mechanism. Part 2 focused on performance optimization. This post tackles the full HOPE (Hierarchical Optimized Parallel Encoding) architecture - the multi-level memory system that makes Titans truly interesting - and the hard lessons learned when evaluation results didn’t match expectations.
This is a follow-up to my previous post on implementing Titans. After getting the basic implementation working, I dove deeper into performance optimization with Claude Code as a pair programmer. This post covers the journey from a working but slow implementation to something that trains efficiently on commodity GPUs.
Context engineering has become extremely intriguing to me recently. Building a couple of agentic platform projects hands-on, one aspect of context engineering that keeps coming up is memory and continuous learning. Manus shared an excellent write-up on this from the harness perspective: Context Engineering for AI Agents.
Google Research dropped a paper in January 2025 that caught my attention: Titans: Learning to Memorize at Test Time. The core idea is elegant - give transformers a learnable memory that updates during inference, not just training. I decided to implement it from scratch to understand it deeply.
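Before getting into the implementation, here is roughly the test-time update rule as I read the paper: the memory module takes a gradient step on an associative recall loss for each incoming token, with a momentum term (the "surprise") and a decay term (forgetting). The sketch below is my own PyTorch paraphrase with invented names (`titans_memory_step`, `decay`), not the authors' code.

```python
import torch
import torch.nn.functional as F

def titans_memory_step(memory, state, key, value, lr=0.1, momentum=0.9, decay=0.01):
    """One test-time update of the neural memory (a sketch of the paper's update).

    memory: an nn.Module mapping keys -> values (the learnable memory M)
    state:  dict of momentum buffers (the running 'surprise'), keyed by param name
    """
    # Associative-memory loss: how badly does M currently recall v_t from k_t?
    loss = F.mse_loss(memory(key), value)
    grads = torch.autograd.grad(loss, list(memory.parameters()))

    with torch.no_grad():
        for (name, p), g in zip(memory.named_parameters(), grads):
            # Surprise with momentum: S_t = eta * S_{t-1} - theta * grad
            state[name] = momentum * state.get(name, torch.zeros_like(p)) - lr * g
            # Forgetting gate plus update: M_t = (1 - alpha) * M_{t-1} + S_t
            p.mul_(1.0 - decay).add_(state[name])
    return loss.item()
```

In other words, the memory keeps "training" on the sequence it is reading, which is what makes the mechanism interesting to implement and to debug.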