Lyric craft and the renderer: how phrasing meets phoneme alignment

Lyrics on SONICHAOS AI Studio are not free text. The composer treats them as singable input, collapses long lines to roughly twelve syllables, and aligns phonemes against the singer persona. Here is how to write for that.

May 9, 2026 · 5 min read · SONICHAOS editorial

A lyric line is not a sentence. It is a breath with words attached, and the renderer scores the breath first.

The composer has a Lyrics mode that opens a multi-line textarea and treats every blank-line block as a verse. It looks like a notes app and behaves like a score. What you type goes through a phrasing pass before it ever reaches the diffusion stack, and that pass is where most of the "why does this verse sound rushed" feedback comes from. This is a short tour of what the pass does, why it does it, and how to write a verse that the renderer can actually sing.

What `Lyrics` mode does to your text

When you switch the composer to Lyrics, three things change. The prompt mood drops to a secondary cue, because the lyric is now the primary mood signal. The persona embedding picks up a phrasing tag set keyed to the persona's vowel placement and breath length. And the lyric itself is run through a phrasing tokenizer that splits each line on phrasing tokens (soft commas, mid-line breaths, rhyme cadences) before being aligned to a phoneme grid.

The phoneme grid is the load-bearing piece. Every singer persona ships with a stored vowel-placement profile (front, middle, back vowels mapped to formant targets) and a breath-length distribution measured from the reference clips. The renderer uses the profile to decide which syllables stretch, which ones get clipped, and where the persona will steal a sixteenth-note breath. You can hear the difference between two personas singing the same verse without changing a word — the phrasing tokens land in different places.
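The text-handling half of that pipeline, before any phoneme alignment, can be sketched in a few lines. This is a minimal illustration and not the composer's actual tokenizer: `split_verses` is a hypothetical name, and a plain comma stands in for the "soft comma" phrasing token because it is the only one visible in raw text.

```python
import re

def split_verses(lyrics: str) -> list[list[list[str]]]:
    """Split lyric text into verses -> lines -> phrases.

    Assumptions (not confirmed by the article): a verse is any block
    separated by one or more blank lines, and commas are the only
    phrasing token this sketch can detect.
    """
    verses = []
    for block in re.split(r"\n\s*\n", lyrics.strip()):
        lines = []
        for raw in block.splitlines():
            # Split each line on commas and drop empty fragments.
            phrases = [p.strip() for p in raw.split(",") if p.strip()]
            if phrases:
                lines.append(phrases)
        if lines:
            verses.append(lines)
    return verses
```

Each inner list is one line broken into phrases, which is roughly the shape a later alignment step would need.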

Why we collapse lines to about twelve syllables

The phrasing tokenizer collapses any line longer than about twelve syllables before sending the verse to the model. Twelve is not a hard cap. It is the median breath length across the v1 singer personas, measured from the reference acapellas. A line of fourteen to sixteen syllables can survive the collapse with a mid-line breath inserted. A line of twenty-plus syllables either loses internal words or gets sung at a tempo the persona does not normally hold, and both outcomes feel rushed on playback.

The composer surfaces a small note next to any verse with a line over fifteen syllables, suggesting a rewrite. We do not auto-rewrite the lyric. The note is advisory because lyric writing is the part of the loop where the listener has the most opinions, and a silent auto-edit is the wrong place to spend that trust.
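The thresholds in this section can be checked before you render. The sketch below uses a rough vowel-group heuristic to estimate syllables. That heuristic, the function names, and the band labels are all assumptions for illustration. The article gives no guidance for lines of seventeen to nineteen syllables, so the sketch marks them as borderline, and it treats a thirteen-syllable line like the fourteen-to-sixteen case.

```python
import re

BREATH_TARGET = 12  # "about twelve syllables", the median breath length
ADVISORY_OVER = 15  # the composer notes lines over fifteen syllables
RUSHED_FROM = 20    # "twenty-plus" lines tend to sound rushed

def estimate_syllables(word: str) -> int:
    """Rough English estimate: count vowel groups, drop a silent final 'e'."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def review_line(line: str) -> dict:
    """Return the estimated syllable count, a band, and the advisory flag."""
    n = sum(estimate_syllables(w) for w in line.split())
    if n <= BREATH_TARGET:
        band = "fits one breath"
    elif n <= 16:
        band = "may need a mid-line breath"
    elif n < RUSHED_FROM:
        band = "borderline"  # range not described in the article
    else:
        band = "likely rushed"
    return {"syllables": n, "band": band, "advisory": n > ADVISORY_OVER}
```

The heuristic will miscount some words (it reads "every" as three syllables, for instance), so treat the result as a prompt to read the line aloud rather than a verdict.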

Singer personas and vowel placement

Vowel placement is the difference between a persona that sings a word and one that announces it. Noor Vex puts most of her weight on front vowels (ee, i, e) and pulls back vowels (oo, o) toward the front of the mouth — that is the breathy, close quality. Tempest does the opposite, sitting back vowels deep in the throat and leaning on consonant attacks. Arc Koi lands somewhere between, which is why hooky tenor lines tend to render cleanly on him without much rewriting.

A short practical rule. If a verse rhymes on back vowels (alone, gone, home) and you render it on a front-vowel persona, the rhymes will read as slightly thin. Either swap the persona or rewrite one of the rhymes onto a front vowel. The renderer will not flag this. It will sing exactly what you gave it.
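Because the renderer will not flag this, a small pre-render check can. The sketch below is hypothetical throughout: the vowel classes are hand-labelled for the example words in this article, the persona leanings are read off the descriptions above, and a real check would need a pronunciation dictionary.

```python
# Hand-labelled vowel classes for the article's own example words (assumption).
VOWEL_CLASS = {
    "alone": "back", "gone": "back", "home": "back",
    "spring": "front", "street": "front", "keep": "front",
}

# Persona leanings as described in the article; "mixed" is an assumed label.
PERSONA_LEAN = {"Noor Vex": "front", "Tempest": "back", "Arc Koi": "mixed"}

def thin_rhymes(rhyme_words: list[str], persona: str) -> list[str]:
    """Return rhyme words likely to read thin on a front-vowel persona."""
    if PERSONA_LEAN.get(persona) != "front":
        return []  # the article only describes the back-on-front mismatch
    return [w for w in rhyme_words if VOWEL_CLASS.get(w.lower()) == "back"]
```

An empty list means the rule found nothing to flag. It does not mean the rhymes are good, because words missing from the table are silently skipped.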

What happens when you send an unsingable line

A line that has no phrasing the tokenizer can find (an alphabet-soup paragraph, a single hundred-character word, a list of unrelated nouns separated by commas) falls through to a fallback path. The fallback inserts a mid-line breath every eight syllables, applies the persona's default phrasing, and renders. The result is a verse that sings, but it sings the same shape regardless of the lyric, which is almost never what you wanted.

The audit row tags this fallback with `phrasing: fallback` so you can see, after the render, that the renderer did not find any phrasing in your input. It is the cleanest signal that the lyric needs a rewrite, not the prompt and not the persona.
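The fallback's fixed shape is easy to see in a sketch. The code below assumes the caller supplies a syllable count for each word, breaks only at word boundaries, and uses `/` as a made-up breath marker. The returned `audit` dictionary mirrors the `phrasing: fallback` tag but is not the real audit row format.

```python
BREATH = "/"  # hypothetical breath marker, for illustration only

def fallback_phrasing(words: list[tuple[str, int]], every: int = 8) -> dict:
    """Insert a breath roughly every `every` syllables, ignoring meaning.

    `words` is a list of (word, syllable_count) pairs. A breath goes in
    before any word that would push the current run past `every`.
    """
    out, run = [], 0
    for word, syllables in words:
        if run > 0 and run + syllables > every:
            out.append(BREATH)
            run = 0
        out.append(word)
        run += syllables
    return {"line": " ".join(out), "audit": {"phrasing": "fallback"}}
```

Nothing in the function looks at commas, rhyme, or sense, so every unsingable input comes back with the same spacing. That is the even, dull shape described above.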

A worked example: one verse, three takes

The first version, written as prose:

I keep finding pieces of last summer in the corners of every room I
walk into and I do not know what to do with any of them yet

About thirty-six syllables, one line, no phrasing. The fallback runs, the verse comes back even and dull. Rewrite once for breath:

I keep finding pieces of last summer
in the corners of every room I walk into
and I do not know what to do with them yet

Three lines, ten to twelve syllables each, no internal breath needed. The persona sings it cleanly. Rewrite once more, this time for vowel placement on Noor Vex:

I keep finding pieces of last spring
in the corners of every empty street
and I do not know what to keep

Same shape. Front-vowel rhymes (spring, street, keep) for the breathy persona. Ten syllables on the longest line. The renderer holds the breath where you wrote the line breaks, places the vowels forward, and the verse lands.

The contract, in one sentence

Write the lyric for a singer with a twelve-syllable breath. The composer will do the rest, and the persona page is where you pick the mouth.

