Cross-Site Scripting – An eXceSSive Discussion about XSS

Cross-site scripting (XSS) is a security concern that has persisted for over two decades in the world of application security. For newcomers to this field, XSS is often one of the first vulnerabilities they encounter, due to its prevalence in web applications and its relatively straightforward nature. In this article, we delve into the origins of the term “cross-site scripting,” explore the various types of XSS vulnerabilities and their evolution, and contemplate whether simplifying the terminology could enhance our understanding of this issue.

So why do we call it cross-site scripting? Microsoft coined the term around the year 2000. Originally, the expression referred to loading a vulnerable site from an attacker-controlled origin (this could be a website or even an email) in a way that allowed a fragment of JavaScript to execute in the context of the target domain. It’s important to note that this form of the vulnerability was reflected and non-persistent. As the ever-evolving World Wide Web made persistent forms of script injection possible, and browsers added support for more scripting languages and plugin technologies such as VBScript, ActiveX, Java and Flash, the definition broadened. This caused some confusion amongst newcomers to the field of information security (and it still does!).

As Jeremiah Grossman explains below, the original acronym for the term was actually “CSS”, but this was slowly phased out as we already had Cascading Style Sheets in the web development world: 

What was soon discovered was that a malicious website could load another website into an adjacent frame or window, then use JavaScript to read into it. One website could cross a boundary and script into another page. Pull data from forms, re-write the page, etc. Hence the name cross-site scripting (CSS). Notice the use of “CSS”. Netscape fired back with the “same-origin policy”, designed to prevent such behavior. And the browser hackers took this as a challenge and began finding what seems like hundreds of ways to circumvent the security. 
 
As more became familiar with CSS, people confused the terms with Cascading Style Sheets. Another newly born web browser technology in the 90’s. Then a couple of years ago, via the webappsec mailing list, someone made the suggestion of changing cross-site scripting to XSS to avoid confusion. And just like that it stuck. XSS had an identity and dozens of newly minted white papers and thousands of advisories were released. Over time the original meaning of XSS changed, from one of reading across domains to anytime you can get a website to display user-supplied content laced with HTML/JavaScript. That’s why so many people are still confused by the name cross-site scripting, because it really doesn’t describe what it’s become.

What Jeremiah describes as confusing for so many is that the definition of XSS has changed, from reading content across domains to displaying and executing user-supplied content. We now have persistent vectors that allow an attacker to bring a malicious JavaScript payload into the same site (not cross-site). For this kind of attack vector, everything takes place in the context of the same site, with no attacker site needed. Yet the term “cross-site” has stuck around, which is why we still call it XSS. Even the word “scripting”, which once covered languages other than JavaScript that could be used maliciously, no longer fits well. Older browsers may support Flash or other languages, but most of the time, when we think of XSS, we think of JavaScript.

What has changed over the years is the set of XSS types we recognize. We started off with reflected XSS, where the malicious script comes from the current HTTP request and is echoed back in the immediate response.
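To make the reflected case concrete, here is a minimal Python sketch of a hypothetical search handler. The function names, the q parameter and the example.com URL are illustrative assumptions, not taken from any real application. The first renderer copies a query parameter straight into HTML; the second applies output encoding with the standard library's html.escape.

```python
import html
from urllib.parse import parse_qs, urlsplit


def get_search_term(url: str) -> str:
    """Extract the hypothetical 'q' parameter from the request URL."""
    query = parse_qs(urlsplit(url).query)
    return query.get("q", [""])[0]


def render_search_vulnerable(url: str) -> str:
    # VULNERABLE: the request parameter is concatenated into markup unchanged,
    # so any <script> it contains is reflected straight back to the browser.
    return f"<p>Results for: {get_search_term(url)}</p>"


def render_search_encoded(url: str) -> str:
    # Safer: special characters are HTML-encoded, so the browser shows them as text.
    return f"<p>Results for: {html.escape(get_search_term(url))}</p>"


# A link an attacker might send to a victim (hypothetical URL).
attack = "https://example.com/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E"
print(render_search_vulnerable(attack))  # → <p>Results for: <script>alert(1)</script></p>
print(render_search_encoded(attack))     # → <p>Results for: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The payload only executes for a victim who follows the crafted link, which is why reflected XSS is described as non-persistent.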

Then, as discussed above, we realized that malicious payloads could be persisted in an application’s database or other storage and returned later, potentially to every user who views the affected page. We called this stored XSS.
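The stored case differs only in where the payload waits. In this minimal Python sketch, a hypothetical in-memory list stands in for the application's database, and the helper names are illustrative assumptions. The attacker submits one comment, and it is then included in the page for every later visitor unless it is encoded on output.

```python
import html

# Hypothetical in-memory store standing in for a real database table.
comments: list[str] = []


def save_comment(text: str) -> None:
    """Persist a comment exactly as it was submitted."""
    comments.append(text)


def render_comments(encode: bool) -> str:
    """Build the comment list that every later visitor to the page receives."""
    items = (html.escape(c) if encode else c for c in comments)
    return "<ul>" + "".join(f"<li>{item}</li>" for item in items) + "</ul>"


# The attacker submits the payload once...
save_comment("<img src=x onerror=alert(document.cookie)>")
save_comment("Nice post!")

# ...and, without encoding, it is served to every user who views the page.
print(render_comments(encode=False))
print(render_comments(encode=True))
```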

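In 2005, a paper published by Amit Klein discussed a third, previously overlooked type of XSS, where the attack does not rely on sending a malicious payload to the server in the first place; instead, client-side JavaScript on the page writes attacker-controlled data into the page unsafely. We called this DOM-based XSS (DOM XSS).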
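One detail of how URLs work illustrates DOM XSS in its original form: browsers never send the fragment (the part after #) to the server. The Python sketch below, using a hypothetical link and hypothetical helper names, contrasts the request target the server receives with what page script could read from the fragment. The second helper only approximates a script calling decodeURIComponent on location.hash; it is not browser code.

```python
from urllib.parse import unquote, urlsplit


def request_target(url: str) -> str:
    """Path and query that a browser places in the HTTP request line.

    The fragment (everything after '#') stays in the browser and is
    never transmitted to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def fragment_as_decoded_by_script(url: str) -> str:
    """Approximates a script doing decodeURIComponent(location.hash.slice(1))."""
    return unquote(urlsplit(url).fragment)


# Hypothetical malicious link: the payload lives entirely in the fragment.
link = "https://example.com/welcome#%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E"
print(request_target(link))                 # → /welcome
print(fragment_as_decoded_by_script(link))  # → <img src=x onerror=alert(1)>
# If page script then assigns that value to element.innerHTML, the payload
# runs even though the server never received it.
```

Server-side encoding cannot help here, because the server never sees the payload. The fix has to be in the client-side script that handles the fragment.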

Later, the security community, including OWASP, recognized that DOM XSS could also be combined with reflected or stored data.

These combined types, where DOM XSS meets reflected or stored data, have caused some confusion, as they overlap with the other types. DOM-based vulnerabilities can manifest purely within a single web page, where scripts read data from a source such as the URL and potentially pass it to a dangerous sink such as innerHTML, document.write() or eval(), making the vulnerability entirely client-side. These vulnerabilities can also stem from data originating from the server, such as data reflected in the HTTP response, leading to reflected DOM XSS vulnerabilities: a script on the page mishandles the reflected data and passes it to a dangerous sink. Applications may also store data on the server and later include it in a response, potentially leading to stored DOM XSS vulnerabilities. In both of these cases, the page contains a sink that processes the reflected or stored source data unsafely.

What is common between all these types of XSS is the outcome. Each one results in a malicious JavaScript payload being executed in the context of the victim’s browser. We initially tried to distinguish between these types of XSS based on persistence, partly because persistent XSS can have a higher impact, as it may affect more people. The overlap between the types has since caused a lot of confusion about how to classify vulnerabilities of this nature.

We’ve now established the following: 

  1. “cross-site” only applies to one current type of XSS (reflected/non-persistent)  
  2. “scripting” is vague and no longer relevant, as modern browsers no longer support other scripting languages such as VBScript, or plugins such as Flash
  3. the current XSS types don’t necessarily correspond to the number of affected users, and they all lead to the same outcome: malicious JavaScript, injected either server-side or client-side, executed in the victim’s browser.

With these points in mind, it’s worth considering whether we need all of the different types. In fact, in 2012, OWASP considered a proposal to change the types to Client XSS and Server XSS, while maintaining that each could have reflected and stored variants. These proposed types don’t seem to have taken off, and most of the security community still uses the old types. It’s worth asking why. It could be argued that these new types don’t really solve the underlying problem, that they still overlap, and that we might be overcomplicating the definition of the vulnerability.

Ultimately, the outcome is JavaScript being used to steal something from your site or to induce the user to perform an action on the attacker’s behalf. Since the outcome is always the same, we could simplify the term to something more approachable, such as JavaScript Injection (JSi).

The cons of this simplification are that: 

  1. We don’t classify whether the vulnerability is server-side or client-side. 
  2. There is no indicator of the vulnerability being persistent or non-persistent. 

Whether the vulnerability is server-side or client-side, the top controls that most would normally recommend as part of a defense-in-depth approach to mitigate this vulnerability are largely the same: 

  1. Validate input server-side to ensure that it is of the expected format, length, type and range. 
  2. Encode output for the context it is written into (HTML body, attribute, JavaScript, URL) so that input containing special characters is displayed as text, not executed as code in the browser.
  3. Implement a Content Security Policy (CSP). 
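The three controls can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete defense: the username rule, the helper names and the exact CSP directives are hypothetical choices for the example, and html.escape is the standard library's HTML encoder.

```python
import html
import re

# Hypothetical rule for a username field: 3 to 20 letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_username(value: str) -> bool:
    """Control 1: allow-list validation of format and length, done server-side."""
    return USERNAME_PATTERN.fullmatch(value) is not None


def render_greeting(display_name: str) -> str:
    """Control 2: HTML-encode data at the point where it is written into markup."""
    return f"<h1>Hello, {html.escape(display_name)}</h1>"


def csp_header() -> tuple[str, str]:
    """Control 3: an example restrictive Content-Security-Policy header.

    Scripts may only load from the page's own origin; because 'unsafe-inline'
    is absent, inline <script> blocks and event-handler attributes are blocked.
    Plugins (object-src) are disallowed, and the page may not set a <base> URL.
    """
    directives = [
        "default-src 'self'",
        "script-src 'self'",
        "object-src 'none'",
        "base-uri 'none'",
    ]
    return ("Content-Security-Policy", "; ".join(directives))


print(validate_username("alice_01"))  # → True
print(validate_username("<script>"))  # → False
print(render_greeting('<b onmouseover="x">'))
print(csp_header())
```

Note that html.escape only covers HTML text and quoted attribute values; data written into JavaScript, URLs or CSS needs encoding for that specific context, and a real policy has to be tuned to the scripts an application actually loads.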

The first two controls have been around for a long time, and web application security authorities such as PortSwigger consider them the first you should implement. The third is a newer control, but when configured properly, it makes an XSS attack very difficult to pull off. With this in mind, does it matter whether the vulnerability is server-side or client-side? Food for thought, and we’d love to hear your opinions on the matter.