Why we used STOMP with WebSocket?

April 13, 2026 • 7 minute read

webSocketSession.send("call someone") sends one text frame over a persistent connection. That’s it. No routing, no subscriptions, no message types. When you’re building a voice call signaling system that handles incoming calls, WebRTC negotiation, and call state events simultaneously, raw WebSocket becomes a routing nightmare you have to solve yourself.

This is where STOMP comes in, and understanding why matters if you’re designing any real-time feature beyond a simple chat.

The Problem with Raw WebSocket

WebSocket gives you a bidirectional pipe. You open a connection, you send text or binary frames, and the other side receives them. That’s the entire contract.

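// client is assumed to be a Ktor HttpClient with the WebSockets plugin installed
client.webSocket("ws://server/signaling") {
    send("some json string")
    // incoming is a ReceiveChannel<Frame>; adapt it to a Flow before collecting
    incoming.receiveAsFlow().collect { frame ->
        // What kind of message is this?
        // Who is it for?
        // Which handler should process it?
    }
}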

Now imagine building a voice call system. You need to handle three completely different message streams over a single connection:

Incoming calls — someone is calling you
Call events — the call ended, was rejected, timed out
WebRTC signaling — SDP offers, ICE candidates, answers

With raw WebSocket, every incoming frame hits the same collect block. You’d have to inspect each message, parse its type, and route it yourself. You’d also need to build your own subscription mechanism, your own heartbeat, and your own reconnection logic.

We’ve seen teams do this. It works until it doesn’t — usually at 2 AM when messages arrive out of order and nobody can reproduce it.

What STOMP Actually Is

STOMP (Simple Text Oriented Messaging Protocol) rides on top of WebSocket the same way HTTP rides on top of TCP. It adds structure to the raw pipe:

SEND
destination:/topic/signal.create
content-type:application/json

{"callId":"abc-123"}
^@

That’s a real STOMP frame. It has a command (SEND), headers (destination, content-type), and a body, terminated by a NUL byte (shown here as ^@). Compare this to raw WebSocket, where you’d send {"callId":"abc-123"} with no metadata about where it should go.
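To make the frame layout concrete, here is a minimal sketch of encoding and decoding that text format. StompFrame, encodeFrame, and decodeFrame are hypothetical names for illustration, not the API of any STOMP library, and the sketch skips header escaping, repeated headers, and content-length handling.

```kotlin
// Hypothetical helper type for illustration; not part of any STOMP library.
data class StompFrame(
    val command: String,
    val headers: Map<String, String>,
    val body: String = "",
)

// Serializes a frame as: COMMAND, one "name:value" line per header,
// a blank line, the body, and a terminating NUL character.
fun encodeFrame(frame: StompFrame): String = buildString {
    append(frame.command).append('\n')
    for ((name, value) in frame.headers) {
        append(name).append(':').append(value).append('\n')
    }
    append('\n')
    append(frame.body)
    append('\u0000')
}

// Parses text produced by encodeFrame. Escaping, repeated headers, and
// content-length are deliberately left out of this sketch.
fun decodeFrame(text: String): StompFrame {
    val payload = text.removeSuffix("\u0000")
    val headerEnd = payload.indexOf("\n\n")
    require(headerEnd >= 0) { "Frame has no blank line between headers and body" }
    val lines = payload.substring(0, headerEnd).split('\n')
    val headers = lines.drop(1).associate { line ->
        val separator = line.indexOf(':')
        require(separator > 0) { "Malformed header line: $line" }
        line.substring(0, separator) to line.substring(separator + 1)
    }
    return StompFrame(lines.first(), headers, payload.substring(headerEnd + 2))
}
```

Encoding a SEND frame with the destination and content-type headers from the example and then decoding the result should give back an equal StompFrame, which is the property that lets both ends agree on what a message means before looking at its body.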

STOMP gives you three things that raw WebSocket doesn’t:

Destinations — messages have addresses. Instead of sending into the void, you send to /topic/signal.create or subscribe to /topic/public. The server knows exactly where to route each message.

Subscriptions — you explicitly declare what you want to receive. The server only sends you messages for topics you subscribed to, not everything happening on the connection.

Frame structure — every message has a command, headers, and body. No ambiguity about what a message means or how to parse it.
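The destination and subscription ideas can be sketched with a tiny in-memory broker. TinyBroker is a hypothetical class written for this illustration; a real STOMP broker also deals with sessions, acknowledgements, and per-user destinations, none of which appear here.

```kotlin
// Hypothetical in-memory broker for illustration only.
class TinyBroker {
    // destination -> (subscription id -> callback that receives the body)
    private val subscriptions =
        mutableMapOf<String, MutableMap<String, (String) -> Unit>>()

    // Mirrors a SUBSCRIBE frame: the client names a destination and a subscription id.
    fun subscribe(destination: String, id: String, onMessage: (String) -> Unit) {
        subscriptions.getOrPut(destination) { mutableMapOf() }[id] = onMessage
    }

    // Mirrors an UNSUBSCRIBE frame: only the subscription id is needed.
    fun unsubscribe(id: String) {
        subscriptions.values.forEach { it.remove(id) }
    }

    // Mirrors a SEND frame: the body goes only to subscribers of that destination.
    // Returns how many subscribers received it.
    fun send(destination: String, body: String): Int {
        val targets = subscriptions[destination].orEmpty()
        targets.values.forEach { deliver -> deliver(body) }
        return targets.size
    }
}
```

A client subscribed to /user/queue/webrtc receives a message sent to that destination, while a subscriber to /user/queue/call-events on the same broker sees nothing. That is the filtering a raw WebSocket connection leaves to the application.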

How Our Signaling System Uses It

Here’s how the connection layer works. We wrap a STOMP client around Ktor’s WebSocket transport, injecting authentication at the connection level:

class SignalingStompClient(private val httpClient: HttpClient) {

    suspend fun connect(url: String, token: String): SignalingSession {
        val transport = WebSocketDelegate(httpClient, token)
        
        val stomp = StompClient(transport) {
            heartBeat = HeartBeat(
                minSendPeriod = 150.seconds,
                expectedServerPeriod = 150.seconds
            )
            connectionTimeout = 30.seconds
        }
        
        val raw = stomp.connect(url)
        
        return SignalingSession(raw)
    }
}

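Notice the heartbeat configuration: 150 seconds in each direction. This is how STOMP keeps the connection alive. The client advertises these values when it connects, the server answers with its own, and the two sides then exchange heartbeat frames at the negotiated interval (every 150 seconds here, provided the server does not ask for a longer one). If one side goes quiet, the other knows the connection is dead. With raw WebSocket, you’d have to schedule your own ping/pong loop and timeout detection.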
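The negotiation rule from the STOMP 1.2 specification is small enough to sketch. Each side sends a heart-beat header with two numbers: how often it can send, and how often it wants to receive, both in milliseconds, with 0 meaning "not at all". The type and function names below are hypothetical; a STOMP client library normally does this for you.

```kotlin
// One side's heart-beat header: "<canSendEveryMs>,<wantsToReceiveEveryMs>".
// A value of 0 means "cannot send" or "does not want to receive".
data class HeartBeatOffer(val canSendEveryMs: Long, val wantsToReceiveEveryMs: Long)

// Negotiated result; null means no heartbeats flow in that direction.
data class NegotiatedHeartBeat(val clientToServerMs: Long?, val serverToClientMs: Long?)

fun negotiate(client: HeartBeatOffer, server: HeartBeatOffer): NegotiatedHeartBeat {
    // For one direction: disabled if either side says 0, otherwise the larger value wins.
    fun direction(senderCanSend: Long, receiverWants: Long): Long? =
        if (senderCanSend == 0L || receiverWants == 0L) null
        else maxOf(senderCanSend, receiverWants)

    return NegotiatedHeartBeat(
        clientToServerMs = direction(client.canSendEveryMs, server.wantsToReceiveEveryMs),
        serverToClientMs = direction(server.canSendEveryMs, client.wantsToReceiveEveryMs),
    )
}
```

With a client offering 150000 ms both ways and a server offering, say, 10000 ms both ways (an illustrative figure), both directions settle on 150000 ms. If the server offers 0 for both values, no heartbeats flow at all, so it is worth checking what the server actually replies.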

Once connected, the real power shows. We subscribe to three separate topics over one connection:

class SignalingConnection(private val client: SignalingStompClient) {
    fun observe(url: String, token: String): Flow<String> = flow {
        val session = client.connect(url, token)
        val streams = listOf(
            session.subscribe("/topic/public"),
            session.subscribe("/user/queue/call-events"),
            session.subscribe("/user/queue/webrtc"),
        )

        emitAll(streams.merge())
    }
}

Three subscriptions. Three independent message streams. One WebSocket connection. Each subscription gets only the messages it cares about — incoming calls don’t mix with ICE candidates, call events don’t arrive in the WebRTC handler.

With raw WebSocket, you’d receive all messages in one stream and write a routing switch:

// What you'd have to build without STOMP
incoming.receiveAsFlow().collect { frame ->
    // Only text frames carry JSON; skip binary and control frames
    if (frame !is Frame.Text) return@collect
    val json = Json.parseToJsonElement(frame.readText()).jsonObject

    when {
        "callerId" in json -> handleIncomingCall(json)
        "status" in json   -> handleCallEvent(json)
        "sdp" in json      -> handleWebRtc(json)
        else               -> log("Unknown: $json")
    }
}

This works for three message types. It breaks down when you add ten. And debugging becomes detective work — “why did this message end up in the wrong handler?”

The Send Side — Destination-Based Routing

Sending messages shows the same advantage. Every action has a clear destination:

class SignalingSession(private val stomp: StompSession) {

    suspend fun createCall(callId: String) =
        send("/topic/signal.create", CallAction(callId))

    suspend fun acceptCall(callId: String) =
        send("/topic/signal.accept", CallAction(callId))

    suspend fun sendAnswer(callId: String, sdp: String) =
        send("/topic/signal.answer", WebRtcAnswer(callId, sdp))

    suspend fun sendIceCandidate(callId: String, candidate: IceCandidate) =
        send("/topic/signal.ice", candidate.toDto(callId))

    private suspend inline fun <reified T> send(destination: String, body: T) {
        stomp.send(
            headers = StompSendHeaders(destination),
            body = FrameBody.Text(Json.encodeToString(body))
        )
    }
}

The server maps these destinations to specific handlers. /topic/signal.accept goes to the call acceptance handler. /topic/signal.ice goes to the WebRTC relay. No ambiguity, no routing logic on the client side.

The destination is a header, not part of the payload. This means your message body stays clean — just the data, no routing metadata mixed in.

The Parsing Layer — Clean Separation

Because STOMP handles routing, our parsing layer only deals with content. Messages arrive already separated by topic, so the repository can focus on one thing — transforming JSON into domain objects:

class DefaultSignalingRepository(
    private val connection: SignalingConnection,
    private val json: Json,
) : SignalingRepository {

    // url and token are assumed to come from elsewhere (for example, injected configuration)
    override fun observe(): Flow<SignalingMessage> =
        connection.observe(url, token).mapNotNull { raw -> raw.toMessage() }

    // Try each known shape in turn; the first one that decodes wins
    private fun String.toMessage(): SignalingMessage? =
        asCallEvent() ?: asIncomingCall() ?: asWebRtcSignal()

    private fun String.asCallEvent(): SignalingMessage? =
        tryDecode<CallEventDto>()
            ?.takeIf { it.status == "ENDED" }
            ?.let { SignalingMessage.CallEnded(it.callId, it.reason) }

    private fun String.asIncomingCall(): SignalingMessage? =
        tryDecode<IncomingCallDto>()
            ?.let { SignalingMessage.IncomingCall(it.callId, it.callerName, it.type) }

    private fun String.asWebRtcSignal(): SignalingMessage? =
        tryDecode<WebRtcMessageDto>()?.let { dto ->
            when (dto.type) {
                "offer"         -> SignalingMessage.Offer(dto.sdp, dto.callId)
                "ice-candidate" -> SignalingMessage.Ice(dto.candidate, dto.callId)
                else            -> null
            }
        }

    // Returns null instead of throwing when the JSON does not match T
    private inline fun <reified T> String.tryDecode(): T? =
        runCatching { json.decodeFromString<T>(this) }.getOrNull()
}

Notice there’s no routing logic here — no “which topic did this come from?” checks. The repository just parses content. STOMP already handled delivery to the right subscription.

Why Not Just Use a REST API?

Fair question. You could poll a REST endpoint for incoming calls and post WebRTC messages via HTTP. But consider the timing:

WebRTC negotiation requires exchanging SDP offers, answers, and ICE candidates in rapid succession — often within milliseconds. A polling interval of even 500ms would make call setup noticeably slow. And HTTP’s request-response model means the server can’t push an incoming call notification to you — you have to ask “any calls?” repeatedly.
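A rough back-of-the-envelope sketch shows how quickly polling adds up. It assumes each message has to wait for the receiver's next poll and that messages arrive at uniformly random points within the interval; the function names and the message count in the note below are illustrative assumptions, not measurements from our system.

```kotlin
// Average extra delay from polling: each of the sequential messages waits,
// on average, half a polling interval before the receiver picks it up.
fun averagePollingDelayMs(pollIntervalMs: Long, sequentialMessages: Int): Long =
    sequentialMessages * pollIntervalMs / 2

// Worst case: every message just misses a poll and waits a full interval.
fun worstCasePollingDelayMs(pollIntervalMs: Long, sequentialMessages: Int): Long =
    sequentialMessages * pollIntervalMs
```

With a 500 ms interval and a hypothetical four messages that must be delivered one after another, the average added delay is 1000 ms and the worst case is 2000 ms, on top of normal network latency. A pushed message skips that wait entirely.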

WebSocket gives you the push capability. STOMP gives you the structure on top of it.

The Bottom Line

Raw WebSocket is a pipe. STOMP turns that pipe into a messaging system with routing, subscriptions, and structured frames. For a voice call feature juggling incoming calls, call events, and WebRTC signaling over a single connection, that structure isn’t optional — it’s what keeps the system debuggable and maintainable.

You could build all of this yourself on top of raw WebSocket. We’ve seen the code that results from that approach. It starts simple, grows custom routing, adds ad-hoc heartbeats, and eventually becomes the framework you should have used from the start.

Reference

Why we used STOMP with WebSocket? was originally published in ProAndroidDev on Medium.