Instrumentation, mood, and tempo are the only three axes a SONICHAOS AI Studio prompt steers. Here is how to write each one without picking a fight with the persona.
May 9, 2026 · 4 min read · SONICHAOS editorial

A prompt is not a wish. It is three sliders dressed up as a sentence, and the model only listens to the sliders.
A free-text prompt on SONICHAOS AI Studio looks like one thing and works like three. The model parses what you wrote into a small set of axes, attaches them to the persona embedding, and renders. Most of the prompts that come back flat are flat because two axes were ornamented and the third was missing. The fix is to write each axis on purpose. There are three of them, and they want different grammar: instrumentation, mood, tempo.
The first axis is what the song is made of. Genre nouns underperform here because the model has to guess which timbre satisfies the label. "Jazz" could mean a 1959 quartet or a Robert Glasper trio in 2017, so successive re-rolls of the same prompt pull in different directions. Timbre nouns collapse that ambiguity. "Rhodes electric piano" is a specific tone; "upright bass with bow" is more specific still. "808 sub, finger-snapped hi-hat, Wurlitzer through a Leslie" is three timbre nouns that the diffusion stack can place on the mel grid without negotiation.
Two grammar notes. Adjectives in front of timbre nouns earn their keep when they describe the recording rather than the instrument: "dusty Rhodes", "close-mic'd upright", "dry 808 sub". Genre adjectives in front of timbre nouns do not. Compare "jazzy Rhodes" against "Rhodes, walking left hand": the second phrase wins on every render. Write what the microphone heard.
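The genre-adjective rule is mechanical enough to lint before you spend a render. A minimal Python sketch, assuming the instrumentation axis is written as comma-separated items; `flag_genre_adjectives` and the `GENRE_ADJECTIVES` word list are hypothetical, hand-picked for illustration, and not part of AI Studio:

```python
# Hypothetical word list (not an AI Studio feature): adjectives that name a
# style rather than describing what the microphone heard. Extend as needed.
GENRE_ADJECTIVES = {"jazzy", "funky", "bluesy", "poppy", "rocky", "soulful", "folky", "punky"}


def flag_genre_adjectives(instrumentation: str) -> list[str]:
    """Return the comma-separated items whose first word is a genre adjective."""
    flagged = []
    for item in instrumentation.split(","):
        item = item.strip()
        words = item.split()
        # Only the leading word is checked: "jazzy Rhodes" is flagged,
        # "dusty Rhodes" (a recording adjective) is not.
        if words and words[0].lower() in GENRE_ADJECTIVES:
            flagged.append(item)
    return flagged


print(flag_genre_adjectives("jazzy Rhodes, walking left hand"))
# ['jazzy Rhodes']
print(flag_genre_adjectives("dusty Rhodes, close-mic'd upright, dry 808 sub"))
# []
```

The check only looks at the first word of each item, which matches the article's pattern of an adjective sitting directly in front of a timbre noun.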
The second axis is the room and the hour. Adjective stacks (moody, atmospheric, cinematic) compress to almost nothing in the embedding, because the three words point at nearly the same vector. A prepositional phrase carries more signal because it forces a scene. "For a fog-lit hotel lobby at 2 a.m." is a render direction. So is "for the credits of a film about a missing sister." The model uses the scene to choose the reverb tail, the headroom, and whether the kick sits on the grid or behind it.
A short rule: if you can replace your mood phrase with the word "vibe" and the prompt still says the same thing, the phrase is doing nothing. Cut it and write the scene. Use one scene per prompt: the model averages two scenes into a third one you did not ask for.
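The "vibe" test itself needs a human ear, but two of its symptoms can be checked mechanically: an adjective stack instead of a scene, and more than one scene. A rough heuristic sketch, assuming the mood axis is written as a "for ..." phrase like the examples above; `mood_problems` and `VAGUE_MOOD_WORDS` are hypothetical, and counting the word "for" will also count non-scene uses of it:

```python
import re

# Hypothetical list (illustration only) of mood words that the article says
# collapse toward the same point in the embedding.
VAGUE_MOOD_WORDS = {"moody", "atmospheric", "cinematic", "chill", "vibe", "emotional", "epic"}


def mood_problems(phrase: str) -> list[str]:
    """Return reasons a mood phrase is unlikely to steer the render."""
    problems = []
    words = re.findall(r"[a-z'-]+", phrase.lower())
    vague = [w for w in words if w in VAGUE_MOOD_WORDS]
    if vague:
        problems.append("vague mood words: " + ", ".join(vague))
    # Rough proxy: each "for" is treated as the start of one scene.
    scenes = words.count("for")
    if scenes == 0:
        problems.append("no scene: start the phrase with 'for'")
    elif scenes > 1:
        problems.append("more than one scene: the model will average them")
    return problems


print(mood_problems("for a fog-lit hotel lobby at 2 a.m."))
# []
print(mood_problems("kind of moody, late night vibe"))
# ['vague mood words: moody, vibe', "no scene: start the phrase with 'for'"]
```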
The third axis is groove and clock. "Chill" is not a tempo; "92 BPM, half-time" is. A specific BPM with a feel tag (half-time, double-time, swung 16ths, straight quarters) gives the model both the metronome and the pocket. The persona then decides where the voice sits against that pocket, which is the whole point of the persona contract.
A short spec list of tempo tags that survive the embedding cleanly:

- 72 BPM, half-time: slow ballads, R&B verses.
- 96 BPM, swung 16ths: late-night neo-soul.
- 120 BPM, four-on-the-floor: house, gospel-house.
- 140 BPM, double-time hats: drum-and-bass, trailer scoring.
- 92 BPM, broken-beat: UK garage, two-step adjacent.

If you do not name a tempo, the persona's default tempo wins. That is fine for sketching. It is not fine for a brief.
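The tempo axis is the easiest one to make strict, because it is a number plus a feel tag. A sketch that keeps the spec list above as data and refuses to format a tempo without a BPM; `tempo_clause`, `TEMPO_TAGS`, and `FEEL_TAGS` are hypothetical names, and the feel vocabulary is only the tags this article mentions, not a complete list of what AI Studio accepts:

```python
# The spec list above as data, keyed by (BPM, feel tag). The structure is
# illustrative; the use-case notes are copied from the article.
TEMPO_TAGS = {
    (72, "half-time"): "slow ballads, R&B verses",
    (96, "swung 16ths"): "late-night neo-soul",
    (120, "four-on-the-floor"): "house, gospel-house",
    (140, "double-time hats"): "drum-and-bass, trailer scoring",
    (92, "broken-beat"): "UK garage, two-step adjacent",
}

# Assumed feel vocabulary: the tags named in the article, nothing more.
FEEL_TAGS = {"half-time", "double-time", "swung 16ths", "straight quarters"} | {
    feel for _, feel in TEMPO_TAGS
}


def tempo_clause(bpm: int, feel: str) -> str:
    """Format the tempo axis as 'N BPM, feel', rejecting vibes in place of numbers."""
    if not isinstance(bpm, int) or bpm <= 0:
        raise ValueError(f"tempo needs a positive BPM, got {bpm!r}")
    if feel not in FEEL_TAGS:
        raise ValueError(f"unknown feel tag {feel!r}")
    return f"{bpm} BPM, {feel}"


print(tempo_clause(92, "half-time"))
# 92 BPM, half-time
```

Calling `tempo_clause("chill", "half-time")` raises `ValueError`, which is the point: "chill" is not a tempo.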
Three prompts, each run through the demo composer with the Arc Koi persona, before and after an axis-by-axis rewrite:
Before: chill jazzy track, kind of moody, late night vibe.
After: Rhodes electric piano, brushed kit, upright bass with bow, for a fog-lit hotel lobby at 2 a.m., 84 BPM, half-time.

Before: epic cinematic build, very emotional.
After: low strings into 24-piece brass, for the final shot of a film about a missing sister, 92 BPM, straight quarters.

Before: summer pop song, fun and bright.
After: Wurlitzer through a Leslie, finger-snapped hi-hat, dry 808 sub, for a rooftop in Lisbon at golden hour, 104 BPM, swung 16ths.

Each rewrite swaps a genre noun for a timbre noun, a vibe word for a scene, and a feeling for a number. None of the rewrites argue with the persona. The persona still owns the voice. The prompt owns the room, the gear, and the clock. That split is the whole job.
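Because each axis has its own grammar, a rewrite can be assembled mechanically once the three parts are chosen. A minimal sketch; `build_prompt` is a hypothetical helper, not an AI Studio API, and the comma-joined order (gear, room, clock) simply mirrors the "After" prompts:

```python
def build_prompt(instruments: list[str], scene: str, bpm: int, feel: str) -> str:
    """Join the three axes in the order the rewrites use: gear, room, clock."""
    if not instruments:
        raise ValueError("instrumentation axis is empty: name at least one timbre")
    if not scene.startswith("for "):
        raise ValueError("mood axis should be a scene phrase starting with 'for '")
    if bpm <= 0:
        raise ValueError("tempo axis needs a positive BPM")
    return ", ".join([*instruments, scene, f"{bpm} BPM", feel]) + "."


print(build_prompt(
    ["Wurlitzer through a Leslie", "finger-snapped hi-hat", "dry 808 sub"],
    "for a rooftop in Lisbon at golden hour",
    104,
    "swung 16ths",
))
# Wurlitzer through a Leslie, finger-snapped hi-hat, dry 808 sub, for a rooftop in Lisbon at golden hour, 104 BPM, swung 16ths.
```

Keeping the axes as separate arguments makes it hard to forget one: a missing scene or BPM fails loudly instead of rendering flat.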
Three things the prompt should not carry, because they belong somewhere else in the composer:

- Lyrics. When Lyrics mode is active, the lyric is the primary mood signal and the prompt's mood drops to a secondary cue. Write the prompt as if the lyric will arrive separately, because it does.
- Loudness. Loud, quiet, mastered, boomy: the loudness audit handles these and tags renders that fall outside −18 to −12 LUFS. The prompt is not the mastering chain.

Three axes, three grammars, one persona on top. A render that lands on the first try almost always has a timbre noun, a prepositional mood phrase, and a BPM with a feel tag. A render that drifts is usually missing one of the three.
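That closing checklist works as a pre-flight check before spending a render. A rough heuristic sketch; `missing_axes` is hypothetical, and its `TIMBRE_NOUNS` vocabulary is deliberately small (taken only from this article's examples), so a prompt naming other instruments will be reported as missing instrumentation until the set is extended:

```python
import re

# Hypothetical timbre vocabulary, drawn only from the examples in this article.
TIMBRE_NOUNS = {"rhodes", "wurlitzer", "808", "upright", "brass", "strings", "hi-hat", "kit", "bass"}

# Feel tags named in the article; a BPM must be followed by one of them.
FEEL_PATTERN = r"half-time|double-time|swung 16ths|straight quarters|four-on-the-floor|broken-beat"
TEMPO_RE = re.compile(rf"\b\d{{2,3}} BPM,? (?:{FEEL_PATTERN})\b", re.IGNORECASE)


def missing_axes(prompt: str) -> list[str]:
    """Return the axes a prompt appears to be missing, in article order."""
    lowered = prompt.lower()
    words = set(re.findall(r"[a-z0-9'-]+", lowered))
    missing = []
    if not words & TIMBRE_NOUNS:
        missing.append("instrumentation")
    # A scene is assumed to be a clause that starts with "for".
    if not re.search(r"(?:^|,\s*)for\s", lowered):
        missing.append("mood")
    if not TEMPO_RE.search(prompt):
        missing.append("tempo")
    return missing


print(missing_axes("chill jazzy track, kind of moody, late night vibe."))
# ['instrumentation', 'mood', 'tempo']
```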