10

As we all know by now, XSS attacks are dangerous and really easy to pull off. Various frameworks make it easy to encode HTML, like ASP.NET MVC does:

<%= Html.Encode("string") %>

But what happens when your client requires that they be able to upload their content directly from a Microsoft Word document?

Here's the scenario: People can copy and paste content from Microsoft Word into a WYSIWYG editor (in this case TinyMCE), and then that information is posted to a web page.

The website is public, but only members of that organization will have access to post information to a webpage.

What is the best way to handle this requirement? Currently there is no checking done on what the client posts (since only 'trusted' users can post), but I'm not particularly happy with that and would like to lock it down further in case an account is hacked.

The platform in question is ASP.NET MVC.

The only conceptual method that I'm aware of that meets these requirements is to whitelist HTML tags and let those pass through. Is there another way? If not, is the best way to let them store it in the Database in any form, but only display it properly encoded and stripped of bad tags?

NB: The questions differ in that he only assumes there's one way. I'm also asking the following questions:
1. Is there a better way that doesn't rely on HTML Whitelists?
2. Is there a better way that relies on a different view engine?
3. Is there a WYSIWYG editor that includes the ability to whitelist on the fly?
4. Should I even worry about this since it will only be for 'private posting' (much in the same way that a private blog allows HTML from the author, but since only he can post, it's not an issue)?

Edit #2:

If suggesting a WYSIWYG editor, it must be free (as in speech, or as in beer).

Update:

All of the suggestions thus far revolve around a specific Rich Text Editor to use. Please only suggest an editor if it allows for sanitization of HTML tags and it fulfills the requirement of accepting content pasted from a WYSIWYG editor like Microsoft Word.

There are three methods that I know of:
1. Not allow HTML.
2. Allow HTML, but sanitize it.
3. Find a Rich Text Editor that sanitizes and allows HTML.

The previous questions remain (1-4 above).


Related Question

Preventing Cross Site Scripting (XSS)

5

The easiest way (for you as a developer) is probably to implement one of many variations of Markdown, for example Markdown.NET or, even better (imho), a wmd-editor.

Then your users would be able to paste simple HTML, but nothing dangerous, and they would be able to preview their entered data and straighten out any problems even before posting.

2 accepted

Whitelisting is indeed the best way to prevent XSS attacks when allowing users to enter HTML, either directly or using a Rich Text Editor.
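To make the whitelisting idea concrete, here is a minimal sketch, not a production sanitizer. It encodes everything first and then re-enables a small set of tags in bare form, dropping all attributes (so Word's `class` and `style` noise goes away too). The class name `HtmlWhitelist`, the method `Sanitize`, and the tag list are all hypothetical choices for illustration, and it assumes `System.Net.WebUtility.HtmlEncode` is available (.NET 4 or later; on older frameworks `HttpUtility.HtmlEncode` would be the closest equivalent).

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlWhitelist
{
    // Hypothetical allow-list: matches an *already encoded* opening or closing
    // tag whose name is on the list. Anything between the tag name and the
    // closing "&gt;" (i.e. attributes) is matched but never copied to the output.
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(p|br|b|i|em|strong|ul|ol|li)(?:\s.*?)?(/?)&gt;",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Sanitize(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Step 1: encode everything, so no markup survives by default.
        string encoded = WebUtility.HtmlEncode(html);

        // Step 2: turn whitelisted tags back into real tags, without attributes.
        return AllowedTag.Replace(encoded, "<$1$2$3>");
    }
}
```

With this sketch, `HtmlWhitelist.Sanitize("<p class=\"MsoNormal\">Hi <b>there</b></p><script>alert(1)</script>")` should give `<p>Hi <b>there</b></p>&lt;script&gt;alert(1)&lt;/script&gt;`. One known limitation: entities already present in the input (such as `&nbsp;` from Word) get encoded a second time, so for real use a proper HTML parser with an allow-list of tags and attributes is likely the better route.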

About your other questions:

Is there a WYSIWYG editor that includes the ability to whitelist on the fly?

I don't think this could work. You need server-side code for this, and the RTE runs on the client.

TinyMCE can filter tags if you want, but since this takes place in the browser, you can't trust it. See extended_valid_elements. TinyMCE (Moxie) also suggests whitelisting, see here.

Should I even worry about this since it will only be for 'private posting'

You should always filter HTML unless there are specific reasons not to (very rare). Some reasons: a) functionality that is for internal users today may be for the public tomorrow; b) unauthorized access will have less of an impact.

is the best way to let them store it in the Database in any form, but only display it properly encoded and stripped of bad tags?

That is the way I prefer it. I don't like to change user input before inserting into the database for various reasons.

1

Regarding point #4: You bet it's still an issue! Most hacks are an inside job, after all.

For a specific editor, I've had good luck using FreeTextBox but I can't speak to how well it matches up to your requirements, especially MVC.

1

IMHO, keep trusting your users until you go public.

Well, there is no reliable way to achieve your needs. For example, no WYSIWYG editor can protect you from users inserting images by URL (indirect usage tracking, illegal content) or unwanted text (illegal, misspelled, or badly sized text).

My point of view is that if you can trust your users, simply allow everything, and just warn users if there is KNOWN dangerous markup (to keep them from making mistakes).

If you do not trust them, use some sort of special markup (e.g. Markdown).

In my project we use special types for potentially dangerous content and special methods for rendering and accepting such content. This code is rated high-risk in our threat model and gets a lot of attention (for example, each change must be reviewed by two independent coders, we have a comprehensive test suite, and so on).
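As a rough illustration of what a "special type" for dangerous content could look like (this is a guess at the idea, not the actual code from that project), the sketch below wraps untrusted markup so that it is encoded by default and can only be rendered as HTML through an explicitly supplied sanitizer. The names `UntrustedHtml`, `Raw`, and `Render` are made up for the example.

```csharp
using System;
using System.Net;

// Hypothetical wrapper: keeps the user's input exactly as posted, but makes
// the safe (fully encoded) form the default whenever it is turned into a string.
public sealed class UntrustedHtml
{
    private readonly string _raw;

    public UntrustedHtml(string raw)
    {
        _raw = raw ?? string.Empty;
    }

    // The unmodified input, e.g. for storing in the database.
    public string Raw
    {
        get { return _raw; }
    }

    // Safe default: all markup is HTML-encoded.
    public override string ToString()
    {
        return WebUtility.HtmlEncode(_raw);
    }

    // Rendering as markup requires the caller to pass a sanitizer explicitly,
    // which keeps every such call site easy to find and review.
    public string Render(Func<string, string> sanitizer)
    {
        if (sanitizer == null)
        {
            throw new ArgumentNullException("sanitizer");
        }
        return sanitizer(_raw);
    }
}
```

The point of the design is that forgetting to sanitize fails safe: `new UntrustedHtml("<b>x</b>").ToString()` yields `&lt;b&gt;x&lt;/b&gt;`, and the only way to emit real tags is through `Render`.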

1

Use FCKeditor. It's extremely customizable, integrates into ASP.NET quite well, and has a built-in feature for pasting Word text into it.

0

One option might be the HTML Edit Control for .NET (which I wrote).

It's a WYSIWYM HTML editor for .NET, which only supports a subset of the HTML elements, excluding <script> elements: so in that way it acts as a whitelist.

If it's for internal use (i.e. an intranet site), then the control can be embedded in a web page.

I haven't integrated support for pasting from Word, but I do have a component which is a step in that direction: a Doc to HTML converter; so I have the building blocks which you could use in ASP.NET to convert a Doc to HTML, display the HTML in the editor, etc.

-1

I am doing the same thing. I am using TinyMCE and allowing pasting from Word documents. Only certain people who maintain the site can do this, via an admin area that is secured by ASP.NET Membership. I'm simply doing the Html.Encode when it gets sent out to the public site.

You could use the code below before it gets put in the database if you like, but I'm not sure what knock-on effect it would have. You may have to go with your whitelist.

    /// <summary>
    /// Strip HTML
    /// </summary>
    /// <param name="str"></param>
    /// <returns></returns>
    public static string StripHTML(string str)
    {
        // Strips the HTML tags from str
        System.Text.RegularExpressions.Regex objRegExp = new System.Text.RegularExpressions.Regex("<(.|\n)+?>");

        // Replace all tags with a space, otherwise words either side 
        // of a tag might be concatenated 
        string strOutput = objRegExp.Replace(str, " ");

        // Replace any remaining < and > with &lt; and &gt;
        strOutput = strOutput.Replace("<", "&lt;");
        strOutput = strOutput.Replace(">", "&gt;");

        return strOutput;
    }