Prompt anatomy: the three axes a generative-music prompt actually controls

Instrumentation, mood, and tempo are the only three axes a SONICHAOS AI Studio prompt steers. Here is how to write each one without picking a fight with the persona.

May 9, 2026 · 4 min read · SONICHAOS editorial

A prompt is not a wish. It is three sliders dressed up as a sentence, and the model only listens to the sliders.

A free-text prompt on SONICHAOS AI Studio looks like one thing and works like three. The model parses what you wrote into a small set of axes, attaches them to the persona embedding, and renders. Most of the prompts that come back flat are flat because two axes were ornamented and the third was missing. The fix is to write each axis on purpose. There are three of them, and they want different grammar: instrumentation, mood, tempo.

Instrumentation: timbre nouns over genre nouns

The first axis is what the song is made of. Genre nouns underperform here because the model has to guess the timbre that satisfies a label. "Jazz" could be a 1959 quartet or a Robert Glasper trio in 2017, and the renders fight each other across re-rolls. Timbre nouns collapse that ambiguity. "Rhodes electric piano" is a specific tone. "Upright bass with bow" is more specific again. "808 sub, finger-snapped hi-hat, Wurlitzer through a Leslie" is three timbre nouns that the diffusion stack can place on the mel grid without negotiation.

Two grammar notes. Adjectives in front of timbre nouns earn their keep, because they describe the recording, not the instrument. Examples that work: "dusty Rhodes", "close-mic'd upright", "dry 808 sub". Genre adjectives in front of timbre nouns do not earn the same keep. Compare "jazzy Rhodes" against "Rhodes, walking left hand", and the second phrase wins on every render. Write what the microphone heard.

Mood: prepositional phrases over adjective stacks

The second axis is the room and the hour. Adjective stacks ("moody, atmospheric, cinematic") compress to almost nothing in the embedding because the three words point at the same vector. A prepositional phrase carries more signal because it forces a scene. "For a fog-lit hotel lobby at 2 a.m." is a render direction. So is "for the credits of a film about a missing sister". The model uses the scene to choose reverb tail, headroom, and whether the kick sits on the grid or behind it.

A short rule: if you can replace your mood phrase with the word "vibe" and the prompt still parses, the phrase is doing nothing. Cut it and write the scene. Keep to one scene per prompt, because the model averages two scenes into a third one you did not ask for.

Tempo: a number, not a feeling

The third axis is groove and clock. "Chill" is not a tempo. "92 BPM, half-time" is. A specific BPM with a feel tag (half-time, double-time, swung 16ths, straight quarters) gives the model both the metronome and the pocket. The persona then decides where the voice sits against that pocket, which is the whole point of the persona contract.

A short spec list of tempo tags that survive the embedding cleanly:

72 BPM, half-time — slow ballads, R&B verses.
96 BPM, swung 16ths — late-night neo-soul.
120 BPM, four-on-the-floor — house, gospel-house.
140 BPM, double-time hats — drum-and-bass, trailer scoring.
92 BPM, broken-beat — UK garage, two-step adjacent.

If you do not name a tempo, the persona's default tempo wins. That is fine for sketching. It is not fine for a brief.
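The tempo grammar is regular enough to check before you hit render. Here is a minimal Python sketch that pulls a BPM and a feel tag out of a prompt. The names parse_tempo, TempoTag, and FEEL_TAGS are hypothetical, the tag list is only the one used in this note, and nothing here is the composer's actual parser.

```python
import re
from typing import NamedTuple, Optional

# Feel tags taken from the examples in this note. Assumption: the real
# composer may accept more tags, or different spellings.
# Longer tags come first so "double-time hats" wins over "double-time".
FEEL_TAGS = (
    "double-time hats",
    "double-time",
    "half-time",
    "swung 16ths",
    "straight quarters",
    "four-on-the-floor",
    "broken-beat",
)


class TempoTag(NamedTuple):
    bpm: int
    feel: Optional[str]  # None when no known feel tag follows the BPM


# Two or three digits, then "BPM", then an optional comma and trailing words.
_TEMPO_RE = re.compile(r"(\d{2,3})\s*BPM(?:,?\s*([a-z0-9\- ]+))?", re.IGNORECASE)


def parse_tempo(prompt: str) -> Optional[TempoTag]:
    """Return the first BPM and feel tag found in the prompt, or None."""
    match = _TEMPO_RE.search(prompt)
    if match is None:
        return None
    bpm = int(match.group(1))
    tail = (match.group(2) or "").strip().lower()
    feel = next((tag for tag in FEEL_TAGS if tail.startswith(tag)), None)
    return TempoTag(bpm, feel)
```

A prompt with no number comes back as None, which is the case where the persona's default tempo takes over. A bare "120 BPM" parses with feel set to None, a cue that the pocket is still unspecified.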

Three before/after prompts

The same three prompts, run through the demo composer with the Arc Koi persona, before and after axis-by-axis rewriting:

Before: chill jazzy track, kind of moody, late night vibe. After: Rhodes electric piano, brushed kit, upright bass with bow, for a fog-lit hotel lobby at 2 a.m., 84 BPM, half-time.
Before: epic cinematic build, very emotional. After: low strings into 24-piece brass, for the final shot of a film about a missing sister, 92 BPM, straight quarters.
Before: summer pop song, fun and bright. After: Wurlitzer through a Leslie, finger-snapped hi-hat, dry 808 sub, for a rooftop in Lisbon at golden hour, 104 BPM, swung 16ths.

Each rewrite swaps a genre noun for a timbre noun, a vibe word for a scene, and a feeling for a number. None of the rewrites argue with the persona. The persona still owns the voice. The prompt owns the room, the gear, and the clock. That split is the whole job.
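One way to keep the three axes separate while drafting is to store them as separate fields and join them only at the end. The sketch below is an illustration in Python, not part of the composer: AxisPrompt and its field names are hypothetical, and the output order (gear, room, clock) simply mirrors the rewrites above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AxisPrompt:
    """Hypothetical container: one field per axis, joined at render time."""

    timbres: Tuple[str, ...]  # instrumentation: timbre nouns, not genre nouns
    scene: str                # mood: a single prepositional phrase
    bpm: int                  # tempo: the number
    feel: str                 # tempo: the feel tag, e.g. "half-time"

    def render(self) -> str:
        # Refuse to build a prompt with an axis missing.
        if not self.timbres:
            raise ValueError("instrumentation axis is empty")
        if not self.scene.strip():
            raise ValueError("mood axis is empty")
        if self.bpm <= 0 or not self.feel.strip():
            raise ValueError("tempo axis needs a BPM and a feel tag")
        gear = ", ".join(self.timbres)
        return f"{gear}, {self.scene.strip()}, {self.bpm} BPM, {self.feel.strip()}"


rooftop = AxisPrompt(
    timbres=("Wurlitzer through a Leslie", "finger-snapped hi-hat", "dry 808 sub"),
    scene="for a rooftop in Lisbon at golden hour",
    bpm=104,
    feel="swung 16ths",
)
print(rooftop.render())
```

There is no field for the voice, on purpose. That stays with the persona.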

What to leave out

Three things the prompt should not carry, because they belong somewhere else in the composer:

Voice and singer notes. That is the persona's job. Naming a voice in the prompt while a persona is selected pulls the render in two directions.
Lyric content. When Lyrics mode is active, the lyric is the primary mood signal. The prompt mood drops to a secondary cue. Write the prompt as if the lyric will arrive separately, because it does.
Mix language. Loud, quiet, mastered, boomy — the loudness audit handles this and tags renders that fall outside −18 to −12 LUFS. The prompt is not the mastering chain.
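For reference, the loudness window in the last item reduces to a range check. This Python sketch is an assumption about how such a check could look, not the audit's actual code: the function name and tag strings are made up, and treating both ends of the −18 to −12 LUFS window as inclusive is a guess.

```python
# Window quoted in this note. Assumption: both bounds count as in range.
LUFS_MIN = -18.0
LUFS_MAX = -12.0


def loudness_tag(integrated_lufs: float) -> str:
    """Classify a render's integrated loudness against the window."""
    if integrated_lufs < LUFS_MIN:
        return "below window"
    if integrated_lufs > LUFS_MAX:
        return "above window"
    return "in window"
```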

Three axes, three grammars, one persona on top. A render that lands on the first try almost always has a timbre noun, a prepositional mood phrase, and a BPM with a feel tag. A render that drifts is usually missing one of the three.
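That checklist can be roughed out as a lint pass over a draft. The Python sketch below is crude by design, and every name in it is hypothetical. It flags the vague words this note calls out, looks for a scene phrase opening with "for a", "for an", or "for the", and looks for a BPM number. It cannot tell whether a timbre noun is present, so the instrumentation axis still needs a human read.

```python
import re
from typing import List

# Vague words named in this note. Illustrative, not exhaustive.
VAGUE_WORDS = {
    "chill", "jazzy", "moody", "atmospheric",
    "cinematic", "vibe", "epic", "emotional",
}

_BPM_RE = re.compile(r"\b\d{2,3}\s*BPM\b", re.IGNORECASE)
# Assumption: scene phrases open with "for", as every example here does.
_SCENE_RE = re.compile(r"\bfor (?:a|an|the)\b", re.IGNORECASE)


def lint_prompt(prompt: str) -> List[str]:
    """Return a list of complaints; an empty list means nothing was flagged."""
    problems: List[str] = []
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    vague = sorted(words & VAGUE_WORDS)
    if vague:
        problems.append("vague words: " + ", ".join(vague))
    if not _SCENE_RE.search(prompt):
        problems.append("no scene phrase")
    if not _BPM_RE.search(prompt):
        problems.append("no BPM")
    return problems
```

Run against the first before/after pair, the draft collects three complaints and the rewrite collects none.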
