
Best Way To "fix" Malformed Html For Use In An Xsl Transform

I have an input XML document that contains malformed HTML which has been XML-encoded, i.e. the XML document itself is technically valid. Now I am applying an XSL transform to the

Solution 1:

Is using a third-party library acceptable? The HTML Agility Pack (available on NuGet) might go part of the way to solving your invalid HTML, and it also (according to the website) supports XSLT.
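As a rough illustration of that idea, here is a minimal sketch that loads the malformed HTML with the HTML Agility Pack and writes it back out as XML. The class and method names (`HtmlRepair`, `ToWellFormedXml`) are made up for this example, and it assumes the `HtmlAgilityPack` NuGet package is referenced. I haven't checked how it copes with every kind of broken markup, so inspect the output before feeding it to a transform.

```csharp
using System.IO;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of the library.
public static class HtmlRepair
{
    // Parses possibly malformed HTML and re-serializes it as XML.
    public static string ToWellFormedXml(string malformedHtml)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputAsXml = true;   // ask the library to serialize as XML rather than HTML
        doc.LoadHtml(malformedHtml);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

You would call `HtmlRepair.ToWellFormedXml(...)` on the decoded HTML string before (or instead of) handing it to the stylesheet.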

Solution 2:

I went with an SGML parsing library to convert the HTML to valid XML.

I used MindTouch's library: https://github.com/MindTouch/SGMLReader

Once it was compiled and added to the GAC, I could use this XSL:

<msxsl:script language="C#" implements-prefix="myns">
  <msxsl:assembly name="SgmlReaderDll, Version=1.8.11.0, Culture=neutral, PublicKeyToken=46b2db9ca481831b"/>
  <![CDATA[
  public XPathNodeIterator SGMLStringToXml(string strSGML)
  {
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    sgmlReader.InputStream = new System.IO.StringReader(strSGML);

    // create document
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc.CreateNavigator().Select("/*");
  }

  public string CurDir()
  {
    return (new System.IO.DirectoryInfo(".")).FullName;
  }
  ]]>
</msxsl:script>

<xsl:template match="node()" mode="PreventSelfClosingTags">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    <xsl:text></xsl:text>
  </xsl:copy>
</xsl:template>

<xsl:template match="@*" mode="PreventSelfClosingTags">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

and use it like so:

<xsl:apply-templates select="myns:SGMLStringToXml(.)/body/*" mode="PreventSelfClosingTags"/>

N.B. You have to run the transform manually with an XslCompiledTransform instance. The asp:xml control doesn't like the DLL reference.
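For what "manually" can look like, here is a minimal sketch of running the transform with XslCompiledTransform. The class name `TransformRunner` and the three path parameters are placeholders for this example. The important part is the XsltSettings argument: script blocks are disabled by default, so a stylesheet containing msxsl:script will not load unless scripting is enabled. As far as I know this only works on the .NET Framework, since script blocks are not supported on .NET Core and later.

```csharp
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper; the paths are supplied by the caller.
public static class TransformRunner
{
    public static void Run(string stylesheetPath, string inputPath, string outputPath)
    {
        // enableDocumentFunction: false, enableScript: true
        var settings = new XsltSettings(false, true);

        var xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath, settings, new XmlUrlResolver());

        // Reads the input document and writes the result to a file.
        xslt.Transform(inputPath, outputPath);
    }
}
```

Only enable scripting for stylesheets you trust, since the embedded C# runs with the privileges of your application.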
