I've been doing some work with Valgrind recently, and the suggested way to get a big-picture understanding of how Valgrind works was to read this paper. Having read it, I think this is a good recommendation. Some notes, primarily for my own benefit.
Dynamic recompilation seems very similar to my under-informed understanding of QEMU's approach. It is substantially more complex than our hacked-up approach to static binary instrumentation, and would probably be a lot easier to implement nowadays with LLVM than it was in 2007. The loading procedure is interesting, though it has the same issue that Pin does: the tool shares an address space with its target, so a target seeking to interfere with analysis will likely be able to. The dispatcher / scheduler mechanism for executing translations is also interesting. It doesn't do translation block chaining like QEMU does (we ran into an issue with QEMU's tb-linking a couple of weeks ago), but it has a very tight "dispatcher" that checks a cache and executes known / hot translations, with the slower "scheduler" as the fallback. Coming from writing system call models in Pin, the events system sounds pretty great; I wonder how many of Valgrind's syscall models are stealable for use in other dynamic instrumentation frameworks.
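To make the dispatcher / scheduler split concrete for myself, here is a toy model in Python. It is only a sketch of the general idea (a small, fast cache lookup in the hot loop, with a slower full lookup and translation step behind it), not how Valgrind actually implements it. Every name in it (`ToyCore`, `fast_cache`, `scheduler_lookup`, the direct-mapped cache of 8 slots) is my own invention for illustration, and a "translation" is just a closure that returns the next guest address.

```python
# Toy model of a fast dispatcher with a slow scheduler fallback.
# All names are hypothetical; none are taken from Valgrind's source.

CACHE_SIZE = 8  # tiny direct-mapped cache, sized for illustration only


class ToyCore:
    def __init__(self, program):
        # program maps a guest address to the next guest address
        # (None ends the run). It stands in for real guest code.
        self.program = program
        self.fast_cache = [None] * CACHE_SIZE  # slots hold (guest_addr, translation)
        self.table = {}                        # full translation table
        self.stats = {"fast_hits": 0, "slow_lookups": 0, "translated": 0}

    def translate(self, addr):
        # Stand-in for the JIT: the "translation" is a closure that
        # returns the next guest address to execute.
        self.stats["translated"] += 1
        next_addr = self.program[addr]
        return lambda: next_addr

    def scheduler_lookup(self, addr):
        # Slow path: consult the full table, translate on a miss,
        # then refill the fast cache so the next visit is a fast hit.
        self.stats["slow_lookups"] += 1
        if addr not in self.table:
            self.table[addr] = self.translate(addr)
        translation = self.table[addr]
        self.fast_cache[addr % CACHE_SIZE] = (addr, translation)
        return translation

    def run(self, addr, max_blocks=1000):
        # Dispatcher loop: one cache probe per block, falling back to
        # the scheduler only when the probe misses.
        executed = 0
        while addr is not None and executed < max_blocks:
            slot = self.fast_cache[addr % CACHE_SIZE]
            if slot is not None and slot[0] == addr:
                self.stats["fast_hits"] += 1
                translation = slot[1]
            else:
                translation = self.scheduler_lookup(addr)
            addr = translation()
            executed += 1
        return executed


# A three-block guest loop: 0 -> 1 -> 2 -> 0 -> ...
core = ToyCore({0: 1, 1: 2, 2: 0})
core.run(0, max_blocks=9)
print(core.stats)  # {'fast_hits': 6, 'slow_lookups': 3, 'translated': 3}
```

In this toy, each block goes through the slow path once and hits the fast cache on every later visit, which is the property that makes the lack of chaining tolerable: without chaining, every block transition pays for one cache probe, so that probe has to be cheap.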
Follow-up topics I should read more on: