When you’re developing interactive applications with simultaneous editing ability (for example, a virtual collaborative whiteboard, chat, online game or real-time reporting system over the web), using the traditional loosely coupled HTTP request/response web model is obviously not an efficient way to go. That approach is simply not designed for a real-time model. We need a more lightweight protocol that can provide a full-duplex communication channel between endpoints of the system to achieve as near a real-time experience as possible.
This need is becoming critical as such applications are deployed and run in the mobile world, where the resources for staying connected are sometimes very limited: limited bandwidth, limited memory, lots of potential latency.
A number of creative approaches—work-arounds—aiming to create a real-time feeling for users have been implemented (for example, Ajax, Comet). So far these have served the connected world well by bringing together good user experience with the ability to shorten the time in which data is being sent between client and server. But these approaches still have several limitations from a resource-consumption perspective: a huge redundancy of network traffic, server demands and the complication of maintaining two HTTP connections between endpoints (one for the upstream and another for the downstream).
Using WebSockets is a big step forward in the effort to create an engaging, interactive user experience. It could provide capabilities such as real bidirectional communication, low latency, significant reduction of overhead and dramatically reduced complexity of implementation.
From a security standpoint, though, some people are afraid of using WebSockets due to some risks that would create vulnerabilities. WebSockets’ application programming interface (API) allows establishing WebSockets connections across domains without the user’s acknowledgement, and requests are sent without notifying the user. This makes it possible for the attacker to inject malicious JavaScript code into the victim’s client application (the user agent; for example, browser, mobile app and so forth) to establish a WebSockets connection to an arbitrary target. The connection can then be utilized by the attacker for malicious purposes, such as:
- Remote shell, web-based botnet, port scanning
Cross-site scripting (XSS) vulnerability has been common in web technology, but utilizing WebSockets introduces some threats that would give an attacker more power to control the victims—assuming that the user somehow visits a malicious service, or a website that has XSS vulnerability exploited, from the user agent. Once loaded in the user agent, malicious JavaScript code can be executed and the attacker can easily establish a WebSockets connection to a malicious server and create a remote shell to utilize the victims for malicious purposes: a Distributed Denial of Service (DDoS) attack (with lots of victims), access to the company’s intranet services for information, port scanning or using the victim’s user agent as a proxy, a springboard for other attacks.
- Friendship between WebSockets and proxies, firewalls
In November 2010, a serious security issue involving WebSockets was reported. WebSockets was still not adopted widely enough, so some transparent proxies didn’t correctly understand the HTTP upgrade mechanism being used for the handshaking of WebSockets and thus can potentially allow a cache poisoning attack. Frame-masking was added to avoid that vulnerability, but in turn the frame-masking and other natural lightweight features of the protocol (lack of metadata like HTTP header, content length) challenge the virus and malware scanning tools in analyzing the data patterns to detect malicious content in a malicious usage of WebSockets channel.
The vulnerabilities are mostly not specific to WebSockets API or the protocol, but the freedom of the new data exchange model opens up more threats and more attention is needed to secure the communication. Best practices for traditional web programming should still be applied for WebSockets.
- Maximize the validation on both client and server side against the received input. Client and server basically should not trust each other by default.
- Maximize the use of Transport Layer Security (TLS) encryption to achieve integrity.
- Carefully implement authentication and session management between endpoints.
When you are struggling with the trade-off between security and performance while deciding whether or not to use WebSockets, it might be a good choice to utilize a well-known solution for your particular need. One of the proposals to deploy your application that uses WebSockets is to not make it a mobile web application that runs on web browsers of the users’ mobile devices, but instead to use an alternative way—to build a hybrid mobile application and stick your client application with a proven server-side solution, for example, Node.JS (and Socket.IO, or Worlize). IBM Worklight offers you a way to easily build a hybrid mobile application (and much more). You can basically build your app in a web-based code such as JavaScript, CSS or HTML just like a web application, but the code will eventually run on top of a thin native container that utilizes the device’s webkit engine, instead of using the mobile browser itself (you need some work-around for dealing with Android though, because unfortunately WebSockets has not been widely supported in its embedded webkit engine yet).
Be well aware of the security vulnerabilities of using WebSockets. Dealing with them properly will help you to build a secure, interactive mobile application and enjoy the near real-time experience on your mobile devices in a collaborative world where time is precious and conserving resources is critical