SpechPhone's SIP Breakthrough: Inbound Calls Without WebRTC

Remember SpechPhone, the PHP SIP softphone that dared to go WebRTC-free? It just took a giant leap forward, embracing inbound calls and transforming from a web dialer into a true SIP endpoint.

Screenshot of the SpechPhone web interface showing call controls.

Key Takeaways

  • SpechPhone has successfully implemented inbound SIP call handling and chat functionality without relying on WebRTC.
  • The project is built on PHP and Swoole: it handles SIP signaling on the server and bridges RTP media to PCM audio carried over WebSockets.
  • State management for calls and SIP sessions is handled efficiently using Swoole Tables for inter-worker communication.
  • A dedicated `audio.php` server isolates media processing from SIP signaling, contributing to architectural clarity.

Forget what you thought you knew about browser-based voice. For years, the industry consensus has been clear: if you want real-time audio and video in a browser, you’re tethered to WebRTC. It’s the lingua franca, the king, the absolute necessity. Or so we’ve all been told.

Well, the SpechPhone project is here to politely — and quite loudly — suggest otherwise. The latest release not only cements its status as a serious contender in the open-source SIP space but does so by sidestepping the WebRTC behemoth entirely. This isn’t just an incremental update; it’s a fundamental architectural statement.

What was once a proof-of-concept for outbound SIP calls in PHP, powered by Swoole’s asynchronous magic, has morphed into a fully-fledged softphone capable of receiving inbound calls, managing call states with surprising grace, and even integrating chat functionality. All this, mind you, without a single line of WebRTC signaling or media handling.

The WebRTC Heresy

It’s a bold move. WebRTC, for all its complexity, its reliance on browser-native APIs, and its often opaque ICE/STUN/TURN mechanisms, has become the de facto standard. Companies have sunk years and fortunes into mastering it. SpechPhone’s existence, and now its significant expansion, throws a wrench into that established narrative.

The original SpechPhone was impressive enough: it used PHP and Swoole to manage SIP signaling and to transport PCM audio over WebSockets. It proved you could build a browser-based SIP client that didn’t feel like a clunky, plugin-dependent relic. But the big missing piece, the one that separated a novelty from a tool, was the ability to actually receive calls.

This new branch, appropriately named inbound, plugs that critical gap. It now handles the full dance of SIP invitations: INVITE, ACK, CANCEL, and BYE. This isn’t just about picking up the phone; it’s about a server correctly interpreting and responding to the nuances of SIP protocol headers. One wrong To-tag or Call-ID, and your elegant solution devolves into a digital paperweight. The author’s dry observation is spot-on:

"In practical terms, SpechPhone can now ring in the browser when a SIP call arrives. A small technical detail: this looks simple written out in eight lines, but in the SIP world every header has its own self-esteem."
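To make that header bookkeeping concrete, here is a minimal sketch in Python (not the project's PHP code) of what answering an INVITE with 180 Ringing involves: the response has to echo the request's Via, From, Call-ID, and CSeq headers, and the callee adds its own tag to To. The function names, the sample addresses, and the tag value are all hypothetical, and the parser deliberately ignores compact header forms and folded lines.

```python
import secrets


def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body).

    Header names are lower-cased; repeated headers keep arrival order.
    Simplification: compact header forms and folded lines are not handled.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return lines[0], headers, body


def build_response(headers, code, reason, to_tag=None):
    """Build a body-less response that echoes the request's dialog headers."""
    to_value = headers["to"][0]
    if ";tag=" not in to_value:
        # The callee adds its own tag to To. It must stay the same for the
        # rest of the dialog, so a real server would store it with the call.
        to_value += ";tag=" + (to_tag or secrets.token_hex(4))
    out = [f"SIP/2.0 {code} {reason}"]
    out += [f"Via: {via}" for via in headers["via"]]
    out += [
        f"From: {headers['from'][0]}",
        f"To: {to_value}",
        f"Call-ID: {headers['call-id'][0]}",
        f"CSeq: {headers['cseq'][0]}",
        "Content-Length: 0",
    ]
    return "\r\n".join(out) + "\r\n\r\n"


# A made-up inbound INVITE, only for illustration.
invite = "\r\n".join([
    "INVITE sip:alice@example.com SIP/2.0",
    "Via: SIP/2.0/UDP pbx.example.com;branch=z9hG4bK776asdhds",
    "From: <sip:bob@example.com>;tag=1928301774",
    "To: <sip:alice@example.com>",
    "Call-ID: a84b4c76e66710@pbx.example.com",
    "CSeq: 314159 INVITE",
    "Content-Length: 0",
]) + "\r\n\r\n"

start_line, headers, _ = parse_sip_message(invite)
ringing = build_response(headers, 180, "Ringing", to_tag="callee42")
print(ringing)
```

Every later message in the same call (the 200 OK, the ACK, an eventual BYE or CANCEL) is matched against these same identifiers, which is why a single mismatched tag or Call-ID is enough to break the exchange.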

Orchestrating Chaos: State Management with Swoole Tables

For any real-time application, especially one spread across multiple workers, managing state is paramount. In the world of asynchronous PHP with Swoole, local variables are as ephemeral as a snowflake on a hot plate, and they are invisible to every other worker process. SpechPhone tackles this head-on by leveraging Swoole Tables. These in-memory, shared-memory data structures give all workers a common place to track everything from incoming calls and active sessions to SIP bindings, remote RTP IPs, and negotiated codecs. It’s the central nervous system for the softphone, ensuring that when a call comes in, or needs to be hung up, every worker knows exactly what’s happening.

This centralized state management is the bedrock for reliably handling events like accepting or rejecting calls, managing session timeouts, and crucially, synchronizing behavior across multiple browser tabs from the same user. Without it, the asynchronous backend would quickly become a tangled mess of conflicting information.
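The idea can be sketched with a small Python stand-in (not the project's PHP code, and not Swoole's API): one table of call rows keyed by Call-ID, with state changes applied atomically so that two browser tabs cannot both accept the same call. The column names, the state names, and the `CallTable` class are assumptions made for illustration; a lock-protected dict plays the role that shared memory plays between Swoole workers.

```python
import threading
from dataclasses import dataclass, replace

# Hypothetical call states and the transitions allowed between them.
ALLOWED = {
    "ringing": {"active", "rejected", "cancelled"},
    "active": {"ended"},
}


@dataclass(frozen=True)
class CallRow:
    call_id: str
    user: str
    state: str
    remote_rtp_ip: str
    remote_rtp_port: int
    codec: str


class CallTable:
    """In-process stand-in for a table shared by all workers."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def set(self, row: CallRow) -> None:
        with self._lock:
            self._rows[row.call_id] = row

    def get(self, call_id: str):
        with self._lock:
            return self._rows.get(call_id)

    def transition(self, call_id: str, new_state: str) -> bool:
        """Apply a state change only if it is allowed; return True on success."""
        with self._lock:
            row = self._rows.get(call_id)
            if row is None or new_state not in ALLOWED.get(row.state, set()):
                return False
            self._rows[call_id] = replace(row, state=new_state)
            return True


table = CallTable()
call_id = "a84b4c76e66710@pbx.example.com"
table.set(CallRow(call_id, "alice", "ringing", "203.0.113.10", 40000, "PCMU"))

# Two tabs of the same user click "accept"; only the first one wins.
first_tab = table.transition(call_id, "active")
second_tab = table.transition(call_id, "active")
print(first_tab, second_tab)  # True False
```

Because the check and the update happen under one lock, the losing tab gets a clear `False` and can simply stop ringing instead of racing the winner.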

The Media Bridge: A Tale of Two Sockets

But here’s where SpechPhone really shines — its approach to media. Instead of funneling RTP directly into the browser via WebRTC’s complex handshake, SpechPhone constructs a clever intermediary. The path looks something like this:

SIP/RTP peer
↓
RTP over UDP
↓
PHP + Swoole + libspech
↓
Internal PCM
↓
Audio WebSocket
↓
Browser

And the reverse:

Browser microphone
↓
PCM over WebSocket
↓
audio.php
↓
Internal UDP
↓
Encode to the SIP codec
↓
RTP to the remote peer

The CallMediaBridge class is the linchpin, managing the translation between raw RTP packets arriving over UDP and the PCM audio streams transmitted via WebSocket. It isolates the complex SIP control plane from the often-temperamental real-time audio. The dedicated audio.php server acts as a highly specialized hub, insulating audio processing (buffering, frequency management, reconnection logic) from the core SIP signaling handled by server.php. It’s a pragmatic separation of concerns: one process manages the existential drama of SIP messages, the other deals with the tangible, chirping reality of PCM data.
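The translation step at the heart of such a bridge can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CallMediaBridge implementation: the article does not name the negotiated codec, so the sketch assumes G.711 mu-law (PCMU, RTP payload type 0) and 16-bit little-endian PCM on the WebSocket side. All function names are hypothetical, and the UDP and WebSocket transports are left out.

```python
import struct

BIAS = 0x84   # G.711 mu-law bias
CLIP = 32635  # largest magnitude that still fits after adding the bias


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF
    magnitude = ((((u & 0x0F) << 3) + BIAS) << ((u >> 4) & 0x07)) - BIAS
    return -magnitude if u & 0x80 else magnitude


def pcm16_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent, mask = exponent - 1, mask >> 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def rtp_to_pcm(packet: bytes):
    """Inbound path: return (sequence, timestamp, pcm_bytes) for a PCMU packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if (b0 >> 6) != 2:
        raise ValueError("not RTP version 2")
    if b0 & 0x30:
        raise ValueError("padding and header extensions are not handled here")
    if (b1 & 0x7F) != 0:
        raise ValueError("only payload type 0 (PCMU) is handled here")
    payload = packet[12 + 4 * (b0 & 0x0F):]  # skip any CSRC identifiers
    samples = [ulaw_to_pcm16(b) for b in payload]
    return seq, timestamp, struct.pack(f"<{len(samples)}h", *samples)


def pcm_to_rtp(pcm: bytes, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Outbound path: wrap 16-bit little-endian PCM in a PCMU RTP packet."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:2 * count])
    header = struct.pack("!BBHII", 0x80, 0, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + bytes(pcm16_to_ulaw(s) for s in samples)


# Round trip: microphone PCM -> RTP packet -> PCM for the speaker.
mic_pcm = struct.pack("<4h", 0, 1000, -1000, 32124)
packet = pcm_to_rtp(mic_pcm, seq=1, timestamp=160, ssrc=0x1234)
seq, ts, speaker_pcm = rtp_to_pcm(packet)
print(seq, ts, struct.unpack("<4h", speaker_pcm))
```

The round trip is lossy by design, since mu-law squeezes each 16-bit sample into 8 bits. A production bridge would also need the buffering and reconnection handling the article attributes to audio.php, none of which is modeled here.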

Why This Matters: A Challenger Emerges

SpechPhone’s commitment to avoiding WebRTC for media is more than just a technical curiosity. It suggests that with clever asynchronous programming and careful protocol handling, developers can build robust, browser-based voice and chat solutions without being beholden to WebRTC’s constraints. This could open doors for developers working in environments where WebRTC is difficult to implement or maintain, or for those who simply want finer-grained control over their media streams.

Will it dethrone WebRTC? Unlikely, given the latter’s deep integration into browser standards. But for the open-source community and developers looking for alternatives, SpechPhone is now a very compelling option, demonstrating that the road less traveled – the one without WebRTC for media – can lead to some surprisingly functional destinations.



Frequently Asked Questions

What does SpechPhone do?

SpechPhone is an open-source web-based SIP softphone built with PHP and Swoole. It allows users to make and receive phone calls directly from their web browser.

Does SpechPhone use WebRTC?

No, SpechPhone deliberately avoids WebRTC for its media handling. It uses WebSockets and a custom RTP to PCM bridge for audio transmission.

Is this suitable for enterprise use?

While the project is evolving rapidly and demonstrating advanced features, its suitability for enterprise-grade production environments would depend on further testing, feature completeness, and dedicated support options, which are not typically offered by open-source projects in their early stages.

Written by
Open Source Beat Editorial Team


Originally reported by Dev.to
