Understanding Text Encoding in ASP.NET MVC (ASP.NET MVC Foundations Series)

[The code for this post is available on GitHub]

This article covers the various ways in which you might handle text encoding in ASP.NET MVC. If you are writing a forum web app, you should absolutely be paranoid about what your users type into your site, and very careful about how you redisplay their input. For example, a friendly forum user might write something like:

Nice post, thanks for sharing!

On the other hand, they may write:

<script src="http://evilserver.com/xss.js"></script>
<script>xss.doBadDeeds();</script>

If you turn around and show this “post” to your other users, their browsers will run the attacker’s script and they may well get hacked. At a minimum, the evil-doers could be a nuisance to your real users.

On the other hand, if you’re building a CMS or a utility helper method, you do not want to filter out the HTML a user might type. They probably need to enter some HTML which you’ll want to show to all the other users. The same goes for HTML that your own code generates.

There are at least three ways in which MVC manages and encodes (or does not encode) text data. Knowing which scenario you’re targeting allows you to choose the right option. We’ll look at four examples in this post:

  1. A forum app which can be hacked
  2. A forum app which is safe from XSS injection
  3. A CMS app with rich text editing
  4. Generating HTML in code for use in MVC Razor views

Protecting Against Unwanted HTML Inputs

First, the good news. MVC protects you in several ways against HTML / JS injection issues. When you write out a string in a Razor view using @ (for example, @commentText), Razor HTML-encodes it by default.

If we assume commentText = "<script src='evil.js'></script>", then the browser would simply display the markup as text:

Comment text:
<script src='evil.js'></script>

In view source that is &lt;script src=&#39;evil.js&#39;&gt;&lt;/script&gt;, which is perfectly safe.
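The view behind this output could be as small as the following sketch. It assumes the controller placed the raw comment in ViewBag.CommentText; that property name and the surrounding markup are illustrative and are not taken from the sample app.

```cshtml
@{
    // Assumption: the controller put the raw, unencoded comment into ViewBag.CommentText.
    string commentText = ViewBag.CommentText;
}

<h3>Comment text:</h3>
@* Anything written with @ is HTML-encoded, so markup in the string is shown as plain text. *@
<p>@commentText</p>
```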

Next, it is unlikely that this input ever makes it into your app. By default, ASP.NET request validation rejects a form post like this before your action method runs, and the request errors out with the following message:

Error on submit:
A potentially dangerous Request.Form value was detected from the client…

Of course, we could disable this check by decorating the action method with [ValidateInput(false)].
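A minimal sketch of an action that opts out of request validation. The controller name, action name, and redirect target are hypothetical, and persistence is left out.

```csharp
using System.Web.Mvc;

// Hypothetical controller and action names, for illustration only.
public class ForumController : Controller
{
    [HttpPost]
    [ValidateInput(false)] // turns off request validation for this action only
    public ActionResult AddComment(string commentText)
    {
        // commentText may now contain raw markup such as <script> tags.
        // Store it as-is, and make sure every view that displays it encodes it.
        // (Saving the comment is omitted in this sketch.)
        return RedirectToAction("Index");
    }
}
```

Keeping the attribute on the single action that needs it, rather than disabling validation site-wide, limits how much unvalidated input the rest of the app has to worry about.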

In this case, you must be VERY careful when you write out the commentText values later.

So far we have seen that, by default, Razor writes text out safely when you use @value. Also, POST requests that contain dangerous content are blocked unless you explicitly let them through.

In order to demonstrate these concepts, I created a working sample app here:

http://text-encoding-aspnet-mvc-by-example.azurewebsites.net/

View the safe forum and unsafe forum sections to see what happens. You can download the code from the sample as well.

Allowing Direct HTML Inputs

But what if you trust the input and need MVC out of the way so you can write true HTML content to the browser? One such example might be a CMS you’re writing. There are two cases you would treat differently here. Is your HTML coming from data given to your view or from code called by your view?

Let’s assume it’s handed to you as a string in a variable called cmsSectionData (i.e. data). Then we can use the helper method:

   @Html.Raw(cmsSectionData)

rather than @cmsSectionData. This will make the contents of cmsSectionData part of your HTML in the view. You will also need to disable validation on any edit pages using [ValidateInput(false)] as shown above.

Check out the CMS section of the demo to see it in action.

Finally, if you are writing little helper methods to make your views cleaner (a good idea!), you’ll do something totally different. For example, suppose we frequently need to wrap images in links in our views. We could write it out in HTML each time, or we could write a method called LinkWithImage, on a class of our own called OurHtmlHelper, that builds the markup and returns it as a string.
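A minimal sketch of such a helper, written as a first attempt that returns a plain string. The parameter names and the exact markup are assumptions, since the original listing is not reproduced here.

```csharp
using System.Web;

// First attempt: builds the markup and returns it as an ordinary string.
public static class OurHtmlHelper
{
    // linkUrl and imageUrl are assumed parameter names.
    public static string LinkWithImage(string linkUrl, string imageUrl)
    {
        // Attribute-encode the values so a stray quote cannot break out of the attribute.
        return string.Format(
            "<a href=\"{0}\"><img src=\"{1}\" /></a>",
            HttpUtility.HtmlAttributeEncode(linkUrl),
            HttpUtility.HtmlAttributeEncode(imageUrl));
    }
}
```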

You might think we could simply call this method from a view with @, as we would any other expression that returns a string.
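For instance, a view might call it along these lines. The URLs are placeholders, and the sketch assumes the helper's namespace is already imported into the view.

```cshtml
@* Assumes OurHtmlHelper's namespace is imported in the view; the URLs are placeholders. *@
@OurHtmlHelper.LinkWithImage("/products", "/images/products.png")
```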

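But MVC’s encoding for @ would block it for sure: the returned markup would be HTML-encoded and shown to the user as literal text instead of a clickable image. You could wrap the call in @Html.Raw(), but there is a better way.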

Introducing the MvcHtmlString class

The purpose of this class is to inform MVC to get out of the way and NOT encode the contents. So simply changing the return type of LinkWithImage to MvcHtmlString (and wrapping the returned string with MvcHtmlString.Create) fixes it.
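A sketch of the helper with the new return type. The parameter names and markup are assumptions for illustration; only the class name, method name, and MvcHtmlString come from the post.

```csharp
using System.Web;
using System.Web.Mvc;

public static class OurHtmlHelper
{
    // Returning MvcHtmlString tells Razor the value is already HTML and must not be encoded again.
    public static MvcHtmlString LinkWithImage(string linkUrl, string imageUrl)
    {
        // The helper is now responsible for encoding anything it interpolates.
        string html = string.Format(
            "<a href=\"{0}\"><img src=\"{1}\" /></a>",
            HttpUtility.HtmlAttributeEncode(linkUrl),
            HttpUtility.HtmlAttributeEncode(imageUrl));

        return MvcHtmlString.Create(html);
    }
}
```

Because Razor no longer encodes the result, the attribute encoding inside the helper is the only protection left. Any user-supplied value passed in should be encoded there.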

Check out the Helpers section of the demo to see this in action.

There you have it. Three ways to encode or avoid encoding HTML data in ASP.NET MVC applications.

Cheers,
@mkennedy

15 thoughts on “Understanding Text Encoding in ASP.NET MVC (ASP.NET MVC Foundations Series)”

  2. So, are you basically left to either HTML encode everything you display or, if you want to display raw HTML, to “trust the input”?
    What if you both want to be protected against evil input AND output raw HTML?

    • Hi,

      Yes, to a degree that is what MVC / ASP.NET provide. What I’ve done on recent projects that require untrusted but formatted input is to use Markdown.

      There is a cool editor and server-side library called MarkdownDeep which has worked the best for me.

      http://www.toptensoftware.com/markdowndeep/editor

      That allows some formatted input but there are no script issues provided you disallow that type of input (easy enough as you’ve already seen).

      Cheers,
      Michael
