
Regex - Match Not In Tag

This should be easy, but somehow I can't figure it out. I have an HTML snippet like this one:

This is 201 some 20 text 1

Solution 1:

It is really simple: extract only the text with an HTML parser, then use regular expressions on that.
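A minimal sketch of that approach, using only Python's standard library. The markup in `sample` is an assumption (the tags in the question's snippet did not survive), and `TextExtractor` and `match_outside_tags` are hypothetical names chosen for illustration:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes only; tag names and attribute values are skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags, never for attribute values.
        self.chunks.append(data)


def match_outside_tags(html, pattern):
    """Return all regex matches found in the text content of `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return re.findall(pattern, text)


# Assumed markup: "20" appears inside attributes as well as in the text.
sample = (
    'This is <span class="20">201</span> some '
    '<b id="x20">20</b> text <a href="/p/20">1</a>'
)

print(match_outside_tags(sample, r"\b20\b"))  # ['20']
```

Only the standalone `20` in the text is reported; the occurrences inside `class`, `id` and `href` never reach the regex. Note that the contents of `script` and `style` elements are also delivered to `handle_data`, so they would need to be filtered out separately if the input contains them.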

Solution 2:

Regular expressions are meant to match regular languages, that is, those that can be recognized by finite automata. HTML is not a regular language. Parsing HTML with regular expressions is the Cthulhu way; see "Parsing Html The Cthulhu Way".
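To make this concrete, here is a small sketch of a workaround people often try: a negative lookahead that rejects a match when a `>` follows before any `<`. The pattern and both inputs are assumptions for illustration, not taken from the question:

```python
import re

# Hypothetical "not inside a tag" pattern: match "20" unless a ">" comes
# next without an intervening "<" (i.e. we appear to be inside a tag).
NOT_IN_TAG = re.compile(r"20(?![^<>]*>)")

# Works on simple markup: the "20" inside the id attribute is skipped.
simple = '<b id="x20">20</b>'
print(NOT_IN_TAG.findall(simple))  # ['20']

# Fails on equally valid markup: a literal ">" in the text makes the
# pattern believe the "20" is inside a tag, so nothing is found.
tricky = "<p>20 > 10</p>"
print(NOT_IN_TAG.findall(tricky))  # []
```

The pattern only looks at nearby angle brackets and has no notion of where a tag actually starts or ends, which is why a real parser is the safer tool here.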

Solution 3:

HTML should not be parsed with regex because it's not a regular language. You might be able to get away with it for well-formed XHTML, but I wouldn't recommend it. See the most upvoted answer on Stack Overflow.
