Various Helpful Information: [Programming] Regular Expression: ASP Strip Tags

This week, I have been mostly coding MAZIN. You will find out what that means when listening to Ram FM soon.

As part of the project, I had to create a page with a WYSIWYG editor (also known as a Rich Text Editor) that would allow users to compose copy that may or may not include simple HTML tags such as bold, italic, lists, breaks and paragraphs.

As all sites we develop these days are XHTML based and standards compliant, I found that FCKEditor was the best choice - even though it is what Rich would call “Bloatware” - i.e., it’s ridiculously large in terms of directories, language files, etc. That reminds me, I need to go through it and delete all the unwanted languages and plug-in scripts before going live.

The problem with FCKEditor is that it will still allow users to post HTML that is not allowed, even when you have told FCKEditor that you only want users to be able to make text bold, italic or whatever. This means that when we process the submitted form, we need a strip_tags function, like the one PHP has.

Haven’t you already posted an ASP one!? I hear you ask. Well, I did. But the PHP version of strip_tags allows you to specify which tags you want to remain: www.php.net/strip_tags

After some researching, it seems that nobody has come up with an ASP version of this sweet function, so I wrote my own et voila:

'	=============================================================================================================
'	@name		stripHTML
'	@desc 		strips all HTML from strHTML except for the tags listed, separated by commas, in the param "allowedTags"
'	@returns	string
'	=============================================================================================================
function stripHTML(strHTML, allowedTags)

	dim objRegExp, strOutput, strAllowed, matches, match, tagName
	set objRegExp = new regexp

	strOutput = strHTML
	' wrap the list in commas so that each allowed tag can be looked up as ",tag,"
	' (a local copy is used so the caller's variable is not changed)
	strAllowed = "," & lcase(replace(allowedTags, " ", "")) & ","

	objRegExp.IgnoreCase = true
	objRegExp.Global = true
	objRegExp.MultiLine = true
	objRegExp.Pattern = "<(.|\n)+?>" ' match all tags, even XHTML ones
	set matches = objRegExp.execute(strHTML)

	objRegExp.Pattern = "<(/?)(\w+)[^>]*>" ' $2 captures the tag name
	for each match in matches
		tagName = objRegExp.Replace(match.value, "$2")
		tagName = "," & lcase(tagName) & ","

		if instr(strAllowed, tagName) = 0 then
			strOutput = replace(strOutput, match.value, "")
		end if
	next

	stripHTML = strOutput    'Return the value of strOutput
	set objRegExp = nothing

end function

Usage is simple, just do:

html = stripHTML(html, "b,i,strong,em,p,br")

Where b, i, strong, em, p and br are the tags you are allowing.
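
To see what the allow-list does to a concrete input, here is a minimal sketch of the same idea in Python rather than ASP, so it can be run without a web server. The name strip_html and the sample string are made up for illustration, and the sketch filters in a single substitution pass instead of looping over the matches the way the VBScript function does.

```python
import re

def strip_html(html, allowed_tags):
    """Remove every tag whose name is not in the comma-separated allowed_tags list."""
    allowed = {t.strip().lower() for t in allowed_tags.split(",") if t.strip()}

    def keep_or_drop(match):
        # Pull the tag name out of "<name ...>" or "</name>"
        name = re.match(r"</?(\w+)", match.group(0))
        # Comments, doctypes and similar markup have no tag name, so they are dropped
        if name and name.group(1).lower() in allowed:
            return match.group(0)
        return ""

    # (.|\n) lets a tag span several lines, as in the ASP pattern
    return re.sub(r"<(.|\n)+?>", keep_or_drop, html)

sample = '<p>Hello <b>bold</b> <script>alert(1)</script><a href="#">link</a></p>'
print(strip_html(sample, "b,i,strong,em,p,br"))
# <p>Hello <b>bold</b> alert(1)link</p>
```

Only the tags themselves are removed; the text between a stripped pair of tags stays, and an allowed tag is kept exactly as the user typed it, attributes included.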

That’s all for now

Feb 05

Useful Regular Expressions in ASP


While working on an ASP ticket system today that required regular expressions, I came up with a couple of useful regular expression patterns that may save people a few hours of thinking time.

Matching and extracting a string

Problem: I have the following chunk of arbitrary text and I want to extract the order number prefixed “ORD_”:

The quick brown fox... ORD_1012345678 ...jumped over the lazy dog

Solution: ORD_[a-zA-Z0-9_-]*

What is going on? Well, quite simply the regular expression engine is being asked to match the three letters “ORD” followed by an underscore “_”. It then accepts a series (*) of letters, numbers, underscores or dashes (but nothing else). Therefore, once the regular expression engine has found the order number “ORD_1012345678” and comes to whitespace, a new line, a period or whatever, it stops matching.

ASP VBScript Code:

Set regEx = New RegExp
With regEx
	.Pattern = "ORD_[a-zA-Z0-9_-]*"
	.IgnoreCase = true
	.Global = false
End With

set matches = regEx.Execute(text)
if matches.count > 0 then
	result = matches.item(0).value
end if

The string “ORD_1012345678”, extracted from the chunk of text, will be stored in the variable “result”.
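
As a quick cross-check of the pattern outside ASP, the sketch below runs it in Python. The function name extract_order_number is hypothetical, and re.IGNORECASE is assumed to be the equivalent of the IgnoreCase setting used in the VBScript.

```python
import re

# Same pattern as in the post; re.IGNORECASE mirrors .IgnoreCase = true
ORDER_PATTERN = re.compile(r"ORD_[a-zA-Z0-9_-]*", re.IGNORECASE)

def extract_order_number(text):
    """Return the first order number found in text, or None if there is none."""
    match = ORDER_PATTERN.search(text)
    return match.group(0) if match else None

text = "The quick brown fox... ORD_1012345678 ...jumped over the lazy dog"
print(extract_order_number(text))         # ORD_1012345678
print(extract_order_number("ORD_ only"))  # ORD_ (the * also accepts zero characters)
```

If a bare “ORD_” prefix should not count as an order number, changing the * to + would require at least one character after the underscore.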

A very similar version of string extraction

Problem: I have the following chunk of arbitrary text and I want to extract the ID number in square brackets (prefixed “[#”):

The quick brown fox jumped over the lazy dog [#101234-56789]

Solution: \[#([a-zA-Z0-9_-]*)

What is going on? In a similar way to the first one, this regular expression pattern is asking for a square bracket followed by a hash “[#” - but because the opening square bracket is a reserved character (used to define sets), we have to escape it with a backslash beforehand. We then surround the series of allowed characters with parentheses ( ), which groups that part of the match as a “sub match”.

ASP VBScript Code:

Set regEx = New RegExp
With regEx
	.Pattern = "\[#([a-zA-Z0-9_-]*)" ' the backslash escapes the opening square bracket
	.IgnoreCase = true
	.Global = false
End With

set matches = regEx.Execute(text)
if matches.count > 0 then
	result = matches(0).subMatches(0)
end if

The ID number “101234-56789” will be stored in “result”.

The important difference to note in this code is the use of “subMatches(0)”, which returns the text captured by the first pair of parentheses rather than the whole match.
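
The difference between the whole match and the sub match is easy to see side by side. This is a small Python sketch, not ASP; the function name extract_id_parts is made up, and Python's group(1) is used here as the counterpart of VBScript's subMatches(0).

```python
import re

def extract_id_parts(text):
    """Return (whole match, captured ID), or None if no "[#..." marker is found."""
    match = re.search(r"\[#([a-zA-Z0-9_-]*)", text)
    if match is None:
        return None
    # group(0) is everything the pattern matched, including the "[#" prefix;
    # group(1) is only what the parentheses captured
    return match.group(0), match.group(1)

text = "The quick brown fox jumped over the lazy dog [#101234-56789]"
whole, captured = extract_id_parts(text)
print(whole)     # [#101234-56789
print(captured)  # 101234-56789
```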

Stripping HTML tags

This function can be used to strip HTML tags from a string. It is very similar to the PHP function strip_tags(), but this one is not as advanced (yet).

A more advanced version, stripHTML, is shown at the top of this page.

Let’s just jump straight to the code, you don’t really need to know what is going on (you can probably guess anyway)…

ASP VBScript Code:

function stripTags(strHTML)
	dim regEx
	Set regEx = New RegExp
	With regEx
		.Pattern = "<(.|\n)+?>" ' match any tag, including one that spans lines
		.IgnoreCase = true
		.Global = true ' replace every tag, not just the first one
	End With
	stripTags = regEx.replace(strHTML, "")
end function
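
The Global setting matters for a function like this: when it is false, the RegExp object's Replace method only replaces the first match. The Python sketch below imitates the two settings with the count argument of re.sub, which is an assumption made for illustration rather than anything from the ASP code; the sample string is made up.

```python
import re

TAG_PATTERN = r"<(.|\n)+?>"
html = "<p>One</p><p>Two</p>"

# count=1 stands in for Global = false: only the first tag is removed
first_only = re.sub(TAG_PATTERN, "", html, count=1)

# the default (no limit) stands in for Global = true: every tag is removed
all_tags = re.sub(TAG_PATTERN, "", html)

print(first_only)  # One</p><p>Two</p>
print(all_tags)    # OneTwo
```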

Trimming unwanted whitespace

If you want to trim unwanted whitespace from a string, e.g: turning “Text[space]spaced[space]normally[space][space][space]or[space][space]not?” into: “Text[space]spaced[space]normally[space]or[space]not?” use the following method:

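function trimWhitespace(strIn, singleSpacing)
	dim regEx, strReplacement
	Set regEx = New RegExp
	With regEx
		.Pattern = "\s+" ' one or more whitespace characters
		.IgnoreCase = true
		.Global = true ' replace every run of whitespace, not just the first
	End With
	' "space" is avoided as a variable name because VBScript has a built-in Space function
	if singleSpacing then
		strReplacement = " "
	else
		strReplacement = ""
	end if
	trimWhitespace = regEx.replace(strIn, strReplacement)
end function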

When set to false, the second parameter “singleSpacing” will simply remove all whitespaces from a string, giving: “Textspacednormallyornot?”
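
Both settings of “singleSpacing” can be checked against the example sentence with a short Python sketch. The function name trim_whitespace is hypothetical and simply mirrors the ASP function's two parameters.

```python
import re

def trim_whitespace(text, single_spacing):
    """Collapse each run of whitespace to one space, or remove it entirely."""
    replacement = " " if single_spacing else ""
    return re.sub(r"\s+", replacement, text)

sample = "Text spaced normally   or  not?"
print(trim_whitespace(sample, True))   # Text spaced normally or not?
print(trim_whitespace(sample, False))  # Textspacednormallyornot?
```

Tabs and line breaks count as whitespace too, and leading or trailing whitespace is collapsed to a single space rather than removed when single spacing is on.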

I hope the above examples help someone!


Wednesday, August 25, 2010
