
Litecord Voice Server Protocol (LVSP)

LVSP is a protocol that lets Litecord communicate with an external component dedicated to voice data. The voice server is responsible for the Discord Voice Websocket and Voice UDP connections.

LVSP runs over a long-lived websocket with TLS. The encoding is JSON.

OP code table

"client" is litecord. "server" is the voice server.

| opcode | name | sent by |
|---|---|---|
| 0 | HELLO | server |
| 1 | IDENTIFY | client |
| 2 | RESUME | client |
| 3 | READY | server |
| 4 | HEARTBEAT | client |
| 5 | HEARTBEAT_ACK | server |
| 6 | INFO | client / server |
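The opcode table can be mirrored in code so that each side rejects opcodes it should never receive. A minimal Python sketch; the names `OP`, `SENT_BY` and `may_send` are hypothetical and not taken from Litecord's codebase.

```python
from enum import IntEnum


class OP(IntEnum):
    """LVSP opcodes, mirroring the opcode table."""
    HELLO = 0
    IDENTIFY = 1
    RESUME = 2
    READY = 3
    HEARTBEAT = 4
    HEARTBEAT_ACK = 5
    INFO = 6


# Which side may send each opcode ("client" is Litecord, "server" is the voice server).
SENT_BY = {
    OP.HELLO: {"server"},
    OP.IDENTIFY: {"client"},
    OP.RESUME: {"client"},
    OP.READY: {"server"},
    OP.HEARTBEAT: {"client"},
    OP.HEARTBEAT_ACK: {"server"},
    OP.INFO: {"client", "server"},
}


def may_send(side: str, op: OP) -> bool:
    """Return True if `side` ("client" or "server") is allowed to send `op`."""
    return side in SENT_BY[op]
```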

Message structure

Message data is defined by each opcode.

Note: the snowflake type follows the same rules as the Discord Gateway's snowflake type: a string encoding a Discord Snowflake.

| field | type | description |
|---|---|---|
| op | integer (opcode) | operation code |
| d | map[string, any] | message data |
| s | Optional[int] | sequence number |

  • The s field is explained in the Sequence numbers section of the RESUME message.
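A minimal sketch of encoding and decoding this envelope with Python's standard `json` module. `encode_message` and `decode_message` are hypothetical helper names; per the error code table, a server would presumably close the connection with 4002 when decoding fails.

```python
import json
from typing import Any, Optional


def encode_message(op: int, data: dict, seq: Optional[int] = None) -> str:
    """Serialize an LVSP message; "s" is only included for replayable messages."""
    message: dict[str, Any] = {"op": op, "d": data}
    if seq is not None:
        message["s"] = seq
    return json.dumps(message)


def decode_message(raw: str) -> tuple[int, dict, Optional[int]]:
    """Parse an LVSP message into (op, d, s).

    Raises ValueError (json.JSONDecodeError is a subclass) on malformed input.
    """
    message = json.loads(raw)
    if not isinstance(message, dict) or not isinstance(message.get("op"), int):
        raise ValueError("message must be a JSON object with an integer 'op'")
    data = message.get("d", {})
    if not isinstance(data, dict):
        raise ValueError("'d' must be a JSON object")
    return message["op"], data, message.get("s")
```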

High level overview

  • connect, receive HELLO
  • send IDENTIFY or RESUME
  • if RESUME, process incoming messages as if they were received after READY
  • receive READY
  • start sending HEARTBEAT messages
  • send INFO / VSU_REQUEST messages as needed

Error codes

| code | meaning |
|---|---|
| 4000 | general error; reconnect |
| 4001 | authentication failure |
| 4002 | decode error: the given message failed to decode as JSON |
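The close codes map naturally onto an `IntEnum`. The reconnect policy below is an assumption that goes beyond the table: the table only says to reconnect on 4000, and this sketch additionally treats 4001 and 4002 as terminal, on the reasoning that retrying with the same secret or the same encoder would fail again. `CloseCode` and `should_reconnect` are hypothetical names.

```python
from enum import IntEnum


class CloseCode(IntEnum):
    GENERAL_ERROR = 4000  # general error; the client should reconnect
    AUTH_FAILURE = 4001   # authentication failure
    DECODE_ERROR = 4002   # message failed to decode as JSON


# Assumption: these codes indicate a configuration or encoding bug that a retry will not fix.
TERMINAL = {CloseCode.AUTH_FAILURE, CloseCode.DECODE_ERROR}


def should_reconnect(code: int) -> bool:
    """Reconnect on 4000 and on any code outside the table (e.g. plain network drops)."""
    return code not in TERMINAL
```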

HELLO message

Sent by the server when a connection is established.

| field | type | description |
|---|---|---|
| heartbeat_interval | integer | heartbeat interval, in milliseconds |
| nonce | string | random 10-character string used in authentication |
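A sketch of how a server might build HELLO, using Python's `secrets` module for the nonce. The spec only requires "a random 10-character string", so the letters-and-digits alphabet is an assumption, and `make_hello` is a hypothetical name.

```python
import secrets
import string

# Assumption: the spec does not fix an alphabet, so ASCII letters and digits are used here.
NONCE_ALPHABET = string.ascii_letters + string.digits


def make_hello(heartbeat_interval_ms: int) -> dict:
    """Build a HELLO message (op 0) carrying a fresh nonce for IDENTIFY."""
    nonce = "".join(secrets.choice(NONCE_ALPHABET) for _ in range(10))
    return {"op": 0, "d": {"heartbeat_interval": heartbeat_interval_ms, "nonce": nonce}}
```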

IDENTIFY message

Sent by the client to identify itself.

| field | type | description |
|---|---|---|
| token | string | HMAC(SHA256, key=[secret shared between server and client], message=[nonce from HELLO]) |
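A sketch of the token computation with Python's standard `hmac` and `hashlib` modules. The spec does not say how the secret and nonce are turned into bytes or how the digest is rendered as a string, so UTF-8 input and a lowercase hex digest are assumptions; `make_token` and `verify_token` are hypothetical names.

```python
import hashlib
import hmac


def make_token(shared_secret: str, nonce: str) -> str:
    """IDENTIFY.token: HMAC-SHA256 keyed with the shared secret over the HELLO nonce."""
    # Assumption: UTF-8 encoding for both inputs, hex encoding for the output.
    return hmac.new(shared_secret.encode("utf-8"), nonce.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_token(shared_secret: str, nonce: str, token: str) -> bool:
    """Server-side check of a received IDENTIFY (or RESUME) token."""
    expected = make_token(shared_secret, nonce)
    # compare_digest runs in constant time, so a failed check does not leak how much matched.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```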

RESUME message

Sent by the client to resume its session after the websocket connection failed.

The server will replay the messages the client missed, then send a READY message.

| field | type | description |
|---|---|---|
| token | string | same value as IDENTIFY.token |
| seq | integer | last sequence number to resume from |

Sequence numbers

Sequence numbers are used to resume a failed connection and make the voice server replay the events the client missed.

They are non-negative integers, starting from 0. There is no upper limit; a 64-bit integer type will probably be enough for most use cases.

Replayable messages MUST have sequence numbers embedded into the message itself, in an s field. The field lives at the root of the message, alongside op and d.
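A server-side sketch of stamping replayable messages and answering a RESUME. Two assumptions are made here: RESUME.seq is read as the last sequence number the client received (so replay covers everything with a larger s), and only a bounded number of messages is kept, so a RESUME that reaches past the buffer fails and the client would have to IDENTIFY again. `ReplayBuffer`, `stamp`, `replay_after` and the default `maxlen` are hypothetical.

```python
from collections import deque
from typing import Any


class ReplayBuffer:
    """Server-side store of replayable messages, ordered by sequence number."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._next_seq = 0
        self._sent: deque[dict[str, Any]] = deque(maxlen=maxlen)

    def stamp(self, op: int, data: dict) -> dict:
        """Attach the next sequence number as a root-level "s" field and remember the message."""
        message = {"op": op, "d": data, "s": self._next_seq}
        self._next_seq += 1
        self._sent.append(message)
        return message

    def replay_after(self, seq: int) -> list[dict]:
        """Return every stored message with s > seq, oldest first."""
        if self._sent and self._sent[0]["s"] > seq + 1:
            # Some messages after `seq` were already evicted, so the replay would have a gap.
            raise LookupError("cannot resume: messages after seq were evicted")
        return [m for m in self._sent if m["s"] > seq]
```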

READY message

Sent by the server once IDENTIFY or RESUME has been processed.

| field | type | description |
|---|---|---|
| health | float | server health |

  • The health field is described in more detail in the HEARTBEAT_ACK message.

HEARTBEAT message

Sent by the client as a keepalive / health monitoring method.

The server MUST reply with a HEARTBEAT_ACK message back in a reasonable time period.

| field | type | description |
|---|---|---|
| s | integer | sequence number |
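A client-side heartbeat loop sketched with `asyncio`. `send` stands in for whatever websocket library sends a JSON message, and `last_seq` is assumed to return the last sequence number the client received (the spec only says "sequence number"). `heartbeat_loop` and its `count` parameter (used to stop the loop in tests) are hypothetical; detecting a missing HEARTBEAT_ACK is left out.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def heartbeat_loop(
    send: Callable[[dict], Awaitable[None]],
    interval_ms: int,
    last_seq: Callable[[], int],
    count: Optional[int] = None,
) -> None:
    """Send a HEARTBEAT (op 4) every interval_ms milliseconds.

    `count` limits the number of heartbeats; None means run until cancelled.
    """
    sent = 0
    while count is None or sent < count:
        await asyncio.sleep(interval_ms / 1000)  # HELLO gives the interval in milliseconds
        await send({"op": 4, "d": {"s": last_seq()}})
        sent += 1
```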

HEARTBEAT_ACK message

Sent by the server in reply to a HEARTBEAT message coming from the client.

The health field is a measure of the server's overall health. It is a float from 0 to 1, where 0 is the worst health possible and 1 is the best health possible.

Servers SHOULD all use the same algorithm to determine health. It MAY be based on the following, among other factors:

  • Machine resource usage (RAM, CPU, etc.), though these metrics are very general and can be unreliable.
  • Total users connected.
  • Total bandwidth used over some time window.

| field | type | description |
|---|---|---|
| s | integer | sequence number |
| health | float | server health |
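The spec does not fix a health algorithm, so the following is only one possible choice, built from two of the factors listed above: whichever of user load and bandwidth load is higher determines health, clamped to [0, 1]. `compute_health` and its capacity parameters (`max_users`, `bandwidth_budget_bytes`) are hypothetical; since servers SHOULD share one algorithm, whatever is chosen would need to be deployed on every voice server.

```python
def compute_health(
    connected_users: int,
    max_users: int,
    bytes_sent_last_minute: int,
    bandwidth_budget_bytes: int,
) -> float:
    """Health in [0, 1]: 1.0 is an idle server, 0.0 is one at (or past) capacity.

    Both capacity parameters must be positive.
    """
    user_load = connected_users / max_users
    bandwidth_load = bytes_sent_last_minute / bandwidth_budget_bytes
    # The busiest resource dominates; clamp so an overloaded server still reports 0.0.
    load = max(user_load, bandwidth_load)
    return max(0.0, min(1.0, 1.0 - load))
```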

INFO message

Sent by either the client or the server to exchange information. The INFO message is extensible: many request / response scenarios are built on top of it.

| field | type | description |
|---|---|---|
| type | InfoType | info type |
| data | Any | info data, varies depending on InfoType |

InfoType Enum

| value | name | description |
|---|---|---|
| 0 | CHANNEL_REQ | channel assignment request |
| 1 | CHANNEL_ASSIGN | channel assignment reply |
| 2 | CHANNEL_UPDATE | channel update |
| 3 | CHANNEL_DESTROY | channel destroy |
| 4 | VST_CREATE | voice state create request |
| 5 | VST_UPDATE | voice state update |
| 6 | VST_LEAVE | voice state leave |
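A sketch of the InfoType enum and of wrapping a payload into an INFO message (op 6) with the `type` and `data` fields described above. `InfoType` mirrors the table; `make_info` is a hypothetical helper.

```python
from enum import IntEnum
from typing import Any


class InfoType(IntEnum):
    CHANNEL_REQ = 0
    CHANNEL_ASSIGN = 1
    CHANNEL_UPDATE = 2
    CHANNEL_DESTROY = 3
    VST_CREATE = 4
    VST_UPDATE = 5
    VST_LEAVE = 6


def make_info(info_type: InfoType, data: Any) -> dict:
    """Wrap an InfoType payload in an INFO message (op 6)."""
    return {"op": 6, "d": {"type": int(info_type), "data": data}}
```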

TODO: finish all infos

CHANNEL_REQ

Request a channel to be created inside the voice server.

The Server MUST reply with a CHANNEL_ASSIGN once resources are allocated for the channel.

| field | type | description |
|---|---|---|
| channel_id | snowflake | channel id |
| guild_id | Optional[snowflake] | guild id, not provided if DM / group DM |
| channel_properties | ChannelProperties | channel properties |

ChannelProperties

| field | type | description |
|---|---|---|
| bitrate | integer | channel bitrate |
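A sketch of a CHANNEL_REQ payload using dataclasses. "Not provided" for `guild_id` is read here as "key omitted" rather than "null", which is an assumption; `ChannelReq` and `to_data` are hypothetical names, and the bitrate value in the test is illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChannelProperties:
    bitrate: int


@dataclass
class ChannelReq:
    channel_id: str                        # snowflake, encoded as a string
    channel_properties: ChannelProperties
    guild_id: Optional[str] = None         # None for DMs / group DMs

    def to_data(self) -> dict[str, Any]:
        """Build the INFO `data` payload for a CHANNEL_REQ."""
        data: dict[str, Any] = {
            "channel_id": self.channel_id,
            "channel_properties": {"bitrate": self.channel_properties.bitrate},
        }
        if self.guild_id is not None:  # assumption: "not provided" means the key is omitted
            data["guild_id"] = self.guild_id
        return data
```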

CHANNEL_ASSIGN

Sent by the Server to signal the successful creation of a voice channel.

| field | type | description |
|---|---|---|
| channel_id | snowflake | channel id |
| guild_id | Optional[snowflake] | guild id, not provided if DM / group DM |
| token | string | authentication token |

CHANNEL_UPDATE

Sent by the client to signal an update to the properties of a channel, such as its bitrate.

Same data as CHANNEL_REQ.

CHANNEL_DESTROY

Sent by the client to signal the destruction of a voice channel, either because the channel was deleted or because all of its members left.

Same data as CHANNEL_ASSIGN, but without token.
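Because CHANNEL_DESTROY reuses the CHANNEL_ASSIGN fields minus the token, a client that keeps the assignment data around could derive the destroy payload directly. A minimal sketch; `destroy_data_from_assign` is a hypothetical helper.

```python
def destroy_data_from_assign(assign_data: dict) -> dict:
    """CHANNEL_DESTROY payload: the CHANNEL_ASSIGN fields without "token"."""
    return {key: value for key, value in assign_data.items() if key != "token"}
```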

Common logic scenarios

User joins an uninitialized voice channel

Since the channel is uninitialized, this scenario covers both the initialization logic AND the user-join logic.

  • Client will send a CHANNEL_REQ.
  • Client MAY send a VST_CREATE right after as well.
  • The Server MUST process CHANNEL_REQ first, so that it can hold a lock on the channel's operations while the channel is being initialized.
  • The Server replies with CHANNEL_ASSIGN once initialization is done.
  • Process VST_CREATE TODO
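One way a server could satisfy the ordering rule above is a per-channel `asyncio.Lock`: CHANNEL_REQ holds it during initialization, and a VST_CREATE for the same channel waits on it. This sketch assumes the server starts handling messages in the order they arrive on the websocket (`asyncio.Lock` wakes waiters in FIFO order). `ChannelManager`, its methods, and the sleep standing in for resource allocation are hypothetical; VST_CREATE processing itself is still TODO in the spec.

```python
import asyncio
from collections import defaultdict


class ChannelManager:
    """Hypothetical server-side handler that serializes work per channel."""

    def __init__(self) -> None:
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.initialized: set[str] = set()
        self.log: list[str] = []

    async def channel_req(self, channel_id: str) -> dict:
        async with self._locks[channel_id]:
            await asyncio.sleep(0.01)  # stand-in for allocating voice resources
            self.initialized.add(channel_id)
            self.log.append(f"assign {channel_id}")
            return {"channel_id": channel_id}  # would become the CHANNEL_ASSIGN data

    async def vst_create(self, channel_id: str, user_id: str) -> None:
        async with self._locks[channel_id]:
            # Waiting on the same lock means CHANNEL_REQ has finished initializing the channel.
            if channel_id not in self.initialized:
                raise RuntimeError("VST_CREATE for an uninitialized channel")
            self.log.append(f"join {user_id}")
```

Usage: if both messages arrive back to back, `await asyncio.gather(mgr.channel_req("10"), mgr.vst_create("10", "u1"))` records the assignment before the join.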

Updating a voice channel

  • Client sends CHANNEL_UPDATE.
  • Server DOES NOT reply.

Destroying a voice channel

  • Client sends CHANNEL_DESTROY.
  • Server MUST disconnect any users currently connected to that channel through its voice websocket.

User joining an (initialized) voice channel

  • Client sends VST_CREATE
  • TODO

User moves to another channel

TODO

User leaves a channel

TODO