Filtering HTML to exclude all but a small number of HTML elements and attributes

How much do we trust user input? Not. At. All. That’s how much we trust user input. You just don’t know where they’ve been!

WordPress has such a plethora of functions for escaping and filtering input and output that I’m always discovering new possibilities. One I found recently is wp_kses, which strips out all HTML except a limited set of elements and attributes that you explicitly allow.

Here’s a quick snippet that demonstrates what I’m talking about. All HTML is stripped except links, strong (bold) and em (italic), and even those tags keep only a restricted set of attributes: links may have href, title and target, while strong and em get none (so no class, no style, etc.):


$html = 'Wisi <a href="#" style="color: red;">defui nunc</a> dignissim <strong class="weird">transverbero ideo vel</strong> utinam blandit, iaceo meus epulae enim amet nibh sed brevitas. Pala consequat <script type="text/javascript" src="http://example.com/certainly/do/not/want/this.js"></script> capio sino regula typicus <small>luptatum</small> olim ullamcorper uxor in verto.';
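// Only these elements survive, and only with the attributes listed
// for them; every other tag and attribute is removed.
$allowed_html = array(
	'a'      => array( 'href' => array(), 'title' => array(), 'target' => array() ),
	'em'     => array(),
	'strong' => array(),
);
$html = wp_kses( $html, $allowed_html );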
echo $html;
// Outputs: Wisi <a href="#">defui nunc</a> dignissim 
// <strong>transverbero ideo vel</strong> utinam blandit, 
// iaceo meus epulae enim amet nibh sed brevitas. 
// Pala consequat  capio sino regula typicus luptatum 
// olim ullamcorper uxor in verto.

Pretty cool, right? Now you too need never trust your users again.
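If you're curious what an allowlist filter is doing under the hood, here's a rough standalone sketch of the idea in plain PHP, using the DOM extension instead of WordPress. The function names allowlist_filter and allowlist_filter_children are hypothetical, this is not how wp_kses is implemented, and it skips things wp_kses handles (such as checking link protocols), so treat it as an illustration rather than a replacement:

```php
<?php
// Illustrative sketch only: a simplified allowlist filter built on PHP's
// DOM extension. It is NOT how wp_kses works internally, and it does not
// validate URLs (a javascript: href would survive), so use wp_kses instead.

// Walk $node's children, dropping or unwrapping anything not allowed.
function allowlist_filter_children(DOMNode $node, array $allowed): void
{
    // Snapshot the children because the tree is modified while we walk it.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMComment) {
            $node->removeChild($child); // comments never survive
            continue;
        }
        if (!($child instanceof DOMElement)) {
            continue; // plain text is kept as-is
        }

        allowlist_filter_children($child, $allowed); // clean descendants first
        $tag = strtolower($child->tagName);

        if (!isset($allowed[$tag])) {
            if ($tag !== 'script' && $tag !== 'style') {
                // Unwrap: keep the (already cleaned) children, drop the tag.
                while ($child->firstChild !== null) {
                    $node->insertBefore($child->firstChild, $child);
                }
            }
            // script/style elements are removed together with their contents.
            $node->removeChild($child);
            continue;
        }

        // Allowed tag: remove every attribute not listed for it.
        foreach (iterator_to_array($child->attributes) as $attr) {
            if (!isset($allowed[$tag][strtolower($attr->name)])) {
                $child->removeAttribute($attr->name);
            }
        }
    }
}

// $allowed uses the same shape as the $allowed_html array above,
// e.g. array( 'a' => array( 'href' => array() ), 'em' => array() ).
function allowlist_filter(string $html, array $allowed): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings about messy markup
    $doc = new DOMDocument();
    // The XML declaration makes libxml read the fragment as UTF-8;
    // the wrapper div gives us one known root to collect output from.
    $doc->loadHTML('<?xml encoding="UTF-8" ?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first div in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    allowlist_filter_children($root, $allowed);

    // Serialize only the wrapper's (filtered) children.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Run as allowlist_filter( $html, $allowed_html ) against the sample above, it should keep the link without its style, the strong tag without its class and all the text, while dropping the script and unwrapping small; exact whitespace and entity encoding may differ from what wp_kses produces. Parsing into a DOM first means the parser normalises broken markup before any filtering happens, which is one reason this approach is popular for sketches like this.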
