Send an array of interaction messages between the player and the NPC character. The API processes the conversation history and returns a new response generated by the character.

Response modes
Batch HTTP (default): Send a regular POST request to https://api.app.npcbuilder.com/api/interactions. The service buffers the model response and returns a single JSON payload that includes response, user_events, and character_events.
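As a sketch, a batch call could be assembled as follows in Python. The helper name and the use of the standard-library urllib are illustrative; only the URL, the POST method, and the bearer header come from this page.

```python
import json
import urllib.request

API_URL = "https://api.app.npcbuilder.com/api/interactions"

def build_batch_request(token: str, body: dict) -> urllib.request.Request:
    """Build the POST request for a batch interaction (illustrative sketch)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # bearer auth as described below
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_batch_request("my-token", {"messages": []})
# Sending req with urllib.request.urlopen(req) returns the buffered JSON
# payload containing response, user_events, and character_events.
```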
Streaming WebSocket: Open a WebSocket connection to wss://api.app.npcbuilder.com/api/interactions. Once the socket is accepted, send a JSON envelope in the following format to start the interaction:
{
  "type": "interaction.start",
  "payload": { ...interaction body... },
  "meta": {
    "language": "en-US",
    "session_id": "optional-session"
  }
}
The server answers with:
- stream.accepted – confirms the language/session and that generation has started.
- stream.delta – each event contains the next safe sentence so that you can render audio/voice progressively.
- stream.safe_response – emitted if a banned phrase is detected mid-stream (it also aborts the upstream model).
- stream.end – delivers the final transcript plus user_events, character_events, and usage totals.

You may cancel an ongoing generation by sending {"type":"interaction.cancel"}.
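A client-side dispatcher for these frames could be sketched as below. The event type names follow the list above, but the text field name on delta frames and the accumulation strategy are assumptions.

```python
import json

def handle_stream_event(raw: str, transcript: list[str]) -> bool:
    """Route one server frame; returns False once the stream has ended.

    Event names follow the docs; the "text" field is an assumed shape.
    """
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "stream.accepted":
        pass  # language/session confirmed; generation has started
    elif kind == "stream.delta":
        transcript.append(event.get("text", ""))  # next safe sentence
    elif kind == "stream.safe_response":
        transcript.clear()                        # banned phrase: model aborted
        transcript.append(event.get("text", ""))
    elif kind == "stream.end":
        return False  # final transcript, events, and usage totals arrive here
    return True
```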
Voice live add-on (WebSocket only)
Connect to wss://api.app.npcbuilder.com/api/interactions?voice=true for full-duplex voice, or use voice=input / voice=output for one-way voice. You can also provide meta.voice inside interaction.start.

When voice is enabled, the service mirrors the normal stream.* text flow and emits supplemental voice.* events:
- voice.ready, voice.session.updated — Azure Voice Live session lifecycle.
- voice.audio.delta, voice.audio.timestamp, voice.audio.done, voice.response.done — assistant text-to-speech audio.
- voice.transcript.delta, voice.input.started/stopped, voice.usage — microphone audio that was ingested through speech-to-text.

Client commands accepted during an active session:
{ "type": "voice.input.append", "data": { "audio": "<base64 pcm16 chunk>" } }
{ "type": "voice.input.commit" }
{ "type": "voice.input.clear" }
{ "type": "voice.session.update", "data": { "voice_name": "en-US-Ava:DragonHDLatestNeural" } }
{ "type": "voice.cancel" }
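The voice.input.append frame above can be produced from a raw PCM16 chunk with a small helper; the function name is hypothetical, while the frame shape and base64 encoding follow the command list.

```python
import base64
import json

def encode_audio_chunk(pcm16: bytes) -> str:
    """Wrap a raw PCM16 audio chunk in a voice.input.append frame (sketch)."""
    return json.dumps({
        "type": "voice.input.append",
        "data": {"audio": base64.b64encode(pcm16).decode("ascii")},
    })

# 160 little-endian 16-bit samples of dummy audio
frame = encode_audio_chunk(b"\x00\x01" * 160)
```

After streaming chunks with voice.input.append, send voice.input.commit to finalize the buffered audio, or voice.input.clear to discard it.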
Text streaming is never disabled: stream.delta events and the final stream.end payload always include the transcript, even when voice playback is enabled.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
The language code for the interaction. Supported values:
- en-US: English (United States)
- es-ES: Spanish (Spain)

Allowed values: en-US, es-ES. Default: "en-US".
(Optional) The unique identifier for the player's session.
"60Z5aZjIuFlyYbjbZZKe"
Optional. Controls Azure Voice Live streaming when using WebSockets. Allowed values:
- false (default): disable voice; text only.
- true: enable full duplex (speech-to-text + text-to-speech).
- input: only capture microphone input (speech-to-text).
- output: only synthesize assistant audio (text-to-speech).

Allowed values: false, true, input, output.

A JSON, XML, or URL-encoded payload containing the complete conversation history. Ensure messages are sent as an array of objects following the defined schema.
The Interaction schema defines the structure for a conversation between a player and an NPC. It includes metadata such as character ID, game ID, world ID, and an array of messages.
The unique identifier of the NPC character.
"a96c6161-59f5-40f7-955e-459cd11"
The unique identifier of the game.
"tx65BrETVN2vMrrUIrlV"
The unique identifier of the world.
"PtUYW5bLMZNnXPN8qAUJ"
(Optional) The unique identifier for the conversation history. If provided together with an empty messages array, the stored history will be loaded.
"conv_12345"
An array of messages exchanged between the player and the NPC.
An object containing player-specific information for character context.
Successful operation. Returns the character's response along with any triggered user events.
The ApiResponse schema represents the successful response from the API, including the NPC's response and any user-triggered events.