Obtain a response from an NPC character
Send an array of interaction messages between the player and the NPC character. The API processes the conversation history and returns a new response generated by the character.
Response modes
- Batch HTTP (default): Perform a regular POST https://app.npcbuilder.com/api/interactions request. The service buffers the model response and returns a single JSON payload that includes response, user_events, and character_events.
- Streaming WebSocket: Open a WebSocket connection to wss://api.app.npcbuilder.com/api/interactions. Once the socket is accepted, send a JSON envelope in the following format to start the interaction:
  { "type": "interaction.start", "payload": { ...interaction body... }, "meta": { "language": "en-US", "session_id": "optional-session" } }
  The server answers with:
  - stream.accepted: confirms language / session and that generation started.
  - Multiple stream.delta events: each contains the next safe sentence so that you can render audio/voice progressively.
  - Optional stream.safe_response: emitted if a banned phrase is detected mid-stream (it also aborts the upstream model).
  - stream.end: delivers the final transcript plus user_events, character_events, and usage totals.
You may cancel an ongoing generation by sending {"type":"interaction.cancel"}.
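To make the streaming flow above concrete, here is a minimal Python sketch that builds the interaction.start envelope and folds a sequence of server messages into a summary. The event type names come from the list above. The helper names and the "text" field read from each stream.delta are assumptions made for illustration, not the documented event schema, and the WebSocket transport itself is left out so that any client library can be used.

```python
import json


def build_start_envelope(interaction_body, language="en-US", session_id=None):
    """Serialize the interaction.start envelope sent once the socket is accepted."""
    meta = {"language": language}
    if session_id is not None:
        meta["session_id"] = session_id  # optional, per the envelope format
    return json.dumps(
        {"type": "interaction.start", "payload": interaction_body, "meta": meta}
    )


def build_cancel_message():
    """Serialize the command that cancels an ongoing generation."""
    return json.dumps({"type": "interaction.cancel"})


def consume_stream(raw_messages):
    """Fold raw JSON server messages into a summary dict.

    ASSUMPTION: each stream.delta carries its sentence under a "text" key.
    The real field name is not given in this reference.
    """
    state = {"accepted": False, "sentences": [], "safe_response": None, "final": None}
    for raw in raw_messages:
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "stream.accepted":
            state["accepted"] = True
        elif kind == "stream.delta":
            state["sentences"].append(event.get("text", ""))
        elif kind == "stream.safe_response":
            # Emitted when a banned phrase is detected mid-stream.
            state["safe_response"] = event
        elif kind == "stream.end":
            # Final transcript, user_events, character_events, usage totals.
            state["final"] = event
            break
    return state
```

In a real client, consume_stream would be fed the text frames received from the socket, and each sentence could be rendered as soon as it arrives rather than after the loop ends.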
Voice live add-on (WebSocket only)
- Connect to wss://api.app.npcbuilder.com/api/interactions?voice=true for full duplex voice, or use voice=input / voice=output for one-way voice. You can also provide meta.voice inside interaction.start.
- When voice is enabled the service mirrors the normal stream.* text flow and emits supplemental voice.* events:
  - voice.ready, voice.session.updated: Azure Voice Live session lifecycle.
  - voice.audio.delta, voice.audio.timestamp, voice.audio.done, voice.response.done: assistant text-to-speech audio.
  - voice.transcript.delta, voice.input.started/stopped, voice.usage: microphone audio that was ingested through speech-to-text.
- Client commands accepted during an active session:
  { "type": "voice.input.append", "data": { "audio": "<base64 pcm16 chunk>" } }
  { "type": "voice.input.commit" }
  { "type": "voice.input.clear" }
  { "type": "voice.session.update", "data": { "voice_name": "en-US-Ava:DragonHDLatestNeural" } }
  { "type": "voice.cancel" }
Text streaming is never disabled: stream.delta and the final stream.end payload always include the transcript, even if voice playback is on.
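The voice commands above can be sketched in Python as follows. The command shapes and the voice query values are taken from this page. The helper names and the 3200-byte chunk size are assumptions chosen for illustration, since the expected sample rate and chunk length are not stated here.

```python
import base64
import json
from urllib.parse import urlencode

WS_URL = "wss://api.app.npcbuilder.com/api/interactions"
VOICE_MODES = {"false", "true", "input", "output"}


def voice_url(mode="true"):
    """Return the WebSocket URL with the requested voice mode."""
    if mode not in VOICE_MODES:
        raise ValueError(f"unsupported voice mode: {mode!r}")
    return f"{WS_URL}?{urlencode({'voice': mode})}"


def audio_append_messages(pcm16_bytes, chunk_size=3200):
    """Yield voice.input.append commands for consecutive PCM16 chunks.

    ASSUMPTION: chunk_size is arbitrary. It is kept even so that 16-bit
    samples are never split across two chunks.
    """
    for start in range(0, len(pcm16_bytes), chunk_size):
        chunk = pcm16_bytes[start:start + chunk_size]
        yield json.dumps(
            {
                "type": "voice.input.append",
                "data": {"audio": base64.b64encode(chunk).decode("ascii")},
            }
        )


def commit_message():
    """Return the voice.input.commit command that follows the appended audio."""
    return json.dumps({"type": "voice.input.commit"})
```

A client would send each string from audio_append_messages over the open socket and then send commit_message() once the utterance is complete.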
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Query Parameters
The language code for the interaction. Supported values include:
- en-US: English (United States)
- es-ES: Spanish (Spain)
"en-US"
(Optional) The unique identifier for the player's session.
"60Z5aZjIuFlyYbjbZZKe"
Optional. Controls Azure Voice Live streaming when using WebSockets. Allowed values:
- false (default): disable voice, text only.
- true: enable full duplex (speech-to-text + text-to-speech).
- input: only capture microphone input (speech-to-text).
- output: only synthesize assistant audio (text-to-speech).
Body
A JSON, XML, or URL-encoded payload containing the complete conversation history. Ensure messages are sent as an array of objects following the defined schema.
The Interaction schema defines the structure for a conversation between a player and an NPC. It includes metadata such as character ID, game ID, world ID, and an array of messages.
The unique identifier of the NPC character.
"a96c6161-59f5-40f7-955e-459cd11"
The unique identifier of the game.
"tx65BrETVN2vMrrUIrlV"
The unique identifier of the world.
"PtUYW5bLMZNnXPN8qAUJ"
(Optional) The unique identifier for the conversation history. If provided with empty messages, history will be loaded.
"conv_12345"
An array of messages exchanged between the player and the NPC.
An object containing player-specific information for character context.
Response
Successful operation. Returns the character's response along with any triggered user events.
The ApiResponse schema represents the successful response from the API, including the NPC's response and any user-triggered events.
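For the default batch mode, a request might be assembled as in the Python sketch below, using only the standard library. The endpoint, the Bearer header, and the response keys (response, user_events, character_events) come from this page. The query parameter names language and session_id are assumptions that mirror the meta fields of the WebSocket envelope, and the body is passed in as-is because the exact field names of the Interaction schema are not listed here.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_URL = "https://app.npcbuilder.com/api/interactions"


def build_interaction_request(token, body, language="en-US", session_id=None):
    """Build (but do not send) the batch POST request.

    ASSUMPTION: the query parameters are named "language" and "session_id".
    """
    params = {"language": language}
    if session_id:
        params["session_id"] = session_id
    return urllib.request.Request(
        f"{API_URL}?{urlencode(params)}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_interaction_response(raw_json):
    """Extract the NPC reply and both event lists from the JSON payload."""
    data = json.loads(raw_json)
    return (
        data.get("response"),
        data.get("user_events", []),
        data.get("character_events", []),
    )
```

Sending the request is then a matter of passing it to urllib.request.urlopen and feeding the decoded body to parse_interaction_response.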