6.5 KiB

Raw Blame History

Litecord Voice Server Protocol (LVSP)

LVSP is a protocol for Litecord to communicate with an external component dedicated for voice data. The voice server is responsible for the Voice Websocket Discord and Voice UDP connections.

LVSP runs over a long-lived websocket with TLS. The encoding is JSON.

OP code table

"client" is litecord. "server" is the voice server.

opcode	name	sent by
0	HELLO	server
1	IDENTIFY	client
2	RESUME	client
3	READY	server
4	HEARTBEAT	client
5	HEARTBEAT_ACK	server
6	INFO	client / server

Message structure

Message data is defined by each opcode.

Note: the snowflake type follows the same rules as the Discord Gateway's snowflake type: A string encoding a Discord Snowflake.

field	type	description
op	integer, opcode	operator code
d	map[string, any]	message data
s	Optional[int]	sequence number

The s field is explained in the RESUME message.

High level overview

connect, receive HELLO
send IDENTIFY or RESUME
if RESUME, process incoming messages as they were post-ready
receive READY
start HEARTBEAT'ing
send INFO / VSU_REQUEST messages as needed

Error codes

code	meaning
4000	general error. Reconnect
4001	authentication failure
4002	decode error, given message failed to decode as json

HELLO message

Sent by the server when a connection is established.

field	type	description
heartbeat_interval	integer	amount of milliseconds to heartbeat with
nonce	string	random 10-character string used in authentication

IDENTIFY message

Sent by the client to identify itself.

field	type	description
token	string	`HMAC(SHA256, key=[secret shared between server and client]), message=[nonce from HELLO]`

RESUME message

Sent by the client to resume itself from a failed websocket connection.

The server will resend its data, then send a READY message.

field	type	description
token	string	same value from IDENTIFY.token
seq	integer	last sequence number to resume from

Sequence numbers

Sequence numbers are used to resume a failed connection back and make the voice server replay its missing events to the client.

They are positive integers, starting from 0. There is no default upper limit. A "long int" type in languages will probably be enough for most use cases.

Replayable messages MUST have sequence numbers embedded into the message itself with a s field. The field lives at the root of the message, alongside op and d.

READY message

The health field is described with more detail in the HEARTBEAT_ACK message.

field	type	description
`health`	Health	server health

HEARTBEAT message

Sent by the client as a keepalive / health monitoring method.

The server MUST reply with a HEARTBEAT_ACK message back in a reasonable time period.

field	type	description
s	integer	sequence number

HEARTBEAT_ACK message

Sent by the server in reply to a HEARTBEAT message coming from the client.

The health field is a measure of the servers's overall health. It is a float going from 0 to 1, where 0 is the worst health possible, and 1 is the best health possible.

Servers SHOULD use the same algorithm to determine health, it CAN be based off:

Machine resource usage (RAM, CPU, etc), however they're too general and can be unreliable.
Total users connected.
Total bandwidth used in some X amount of time.

Among others.

field	type	description
s	integer	sequence number
health	float	server health

INFO message

Sent by either client or a server to send information between eachother. The INFO message is extensible in which many request / response scenarios are laid on.

field	type	description
type	InfoType	info type
data	Any	info data, varies depending on InfoType

InfoType Enum

value	name	description
0	CHANNEL_REQ	channel assignment request
1	CHANNEL_ASSIGN	channel assignment reply
2	CHANNEL_UPDATE	channel update
3	CHANNEL_DESTROY	channel destroy
4	VST_CREATE	voice state create request
5	VST_UPDATE	voice state update
6	VST_LEAVE	voice state leave

TODO: finish all infos

CHANNEL_REQ

Request a channel to be created inside the voice server.

The Server MUST reply back with a CHANNEL_ASSIGN when resources are allocated for the channel.

field	type	description
channel_id	snowflake	channel id
guild_id	Optional[snowflake]	guild id, not provided if dm / group dm
channel_properties	ChannelProperties	channel properties

ChannelProperties

field	type	description
bitrate	integer	channel bitrate

CHANNEL_ASSIGN

Sent by the Server to signal the successful creation of a voice channel.

field	type	description
channel_id	snowflake	channel id
guild_id	Optional[snowflake]	guild id, not provided if dm / group dm
token	string	authentication token

CHANNEL_UPDATE

Sent by the client to signal an update to the properties of a channel, such as its bitrate.

Same data as CHANNEL_REQ.

CHANNEL_DESTROY

Sent by the client to signal the destruction of a voice channel. Be it a channel being deleted, or all members in it leaving.

Same data as CHANNEL_ASSIGN, but without token.

Common logic scenarios

User joins an unitialized voice channel

Since the channel is unitialized, both logic on initialization AND user join is here.

Client will send a CHANNEL_REQ.
Client MAY send a VST_CREATE right after as well.
The Server MUST process CHANNEL_REQ first, so the Server can keep a lock on channel operations while it is initialized.
Reply with CHANNEL_ASSIGN once initialization is done.
Process VST_CREATE TODO

Updating a voice channel

Client sends CHANNEL_UPDATE.
Server DOES NOT reply.

Destroying a voice channel

Client sends CHANNEL_DESTROY.
Server MUST disconnect any users currently connected with its voice websocket.

User joining an (initialized) voice channel

Client sends VST_CREATE
TODO

User moves a channel

TODO

User leaves a channel