Doodle Agent

Exploring Freeform Visual Generation with Multimodal LLMs

How do LLMs engage in a creative act as unstructured and instinctive as doodling? We explore this question with Doodle Agent, a system that translates natural language prompts into drawing actions. We use it to study how large multimodal language models can take part in open-ended visual creation.