Animals can adopt diverse strategies to find rewards, including exploiting statistical structure in their environment. How animals develop strategies, and how distinct strategies differentially engage brain circuits, are not well understood. We investigated these questions in two behavioral paradigms. In the first, mice perform a visual change detection task. Using a dynamic logistic regression model to characterize mouse behavior, we find that individual mice use mixtures of a visual comparison strategy and a statistical timing strategy. Separately, mice also show periods of task engagement and disengagement. Two-photon calcium imaging in visual cortex shows large strategy-dependent differences in neural activity in excitatory, Sst inhibitory, and Vip inhibitory cells. These strategy correlates can be understood parsimoniously as increased activation of the Vip-Sst disinhibitory circuit during the visual comparison strategy, which facilitates task-appropriate responses. In the second paradigm, mice perform a two-armed bandit task with reward probabilities that change in blocks. To maximize rewards, mice must sample the environment to continuously learn which alternative provides reward. Preliminary analysis suggests that mice perform this task with a diversity of strategies, including state inference to exploit the block structure of the task. In ongoing work, we are measuring the activity of neuromodulatory systems (dopamine, serotonin, norepinephrine, and acetylcholine) over learning, which may reveal neural correlates of this behavioral diversity.