HTMLKit Swift Parsing
Parse the text between elements, e.g. 7:33AM \n Dinner \n \n 12:23
Solution 1:
You can solve this the same way you would in a browser; the problem is not specific to HTMLKit.
Since CSS selectors cannot select HTML text nodes, you have to select the parent element and then either read its textContent property or walk its child nodes.
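Both approaches can be illustrated without HTMLKit. The sketch below uses a hypothetical Node enum (not part of HTMLKit or any DOM library) to show that textContent concatenates every descendant text node, while walking the text nodes one by one lets you keep or drop each of them individually.

```swift
import Foundation

// Hypothetical, minimal DOM model for illustration only (NOT HTMLKit's API):
// a node is either an element with children or a text node.
enum Node {
    case element(name: String, children: [Node])
    case text(String)

    // Mirrors the DOM's textContent: all descendant text, concatenated in order.
    var textContent: String {
        switch self {
        case .text(let value):
            return value
        case .element(_, let children):
            return children.map { $0.textContent }.joined()
        }
    }

    // Walks the subtree in document order and returns the text nodes
    // accepted by `filter` (by default, all of them).
    func textNodes(where filter: (String) -> Bool = { _ in true }) -> [String] {
        switch self {
        case .text(let value):
            return filter(value) ? [value] : []
        case .element(_, let children):
            return children.flatMap { $0.textNodes(where: filter) }
        }
    }
}

// Models the Dinner entry of the sample DOM: <dd id="Dinner"><span>12:23</span>PM</dd>
let dinner = Node.element(name: "dd", children: [
    .element(name: "span", children: [.text("12:23")]),
    .text("PM"),
])

print(dinner.textContent)  // 12:23PM
print(dinner.textNodes())  // ["12:23", "PM"]
print(dinner.textNodes(where: { $0.contains("AM") || $0.contains("PM") }))  // ["PM"]
```

The text of the span is still found by the text-node walk because it traverses the whole subtree, not just direct children.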
Here are some options, using HTMLKit and the following sample DOM:
import HTMLKit

let html = """
<html>
<body>
<dl>
<dt>Breakfast</dt>
<dd id="Breakfast"><span>10:00</span>AM</dd>
<dt>Dinner</dt>
<dd id="Dinner"><span>12:23</span>PM</dd>
</dl>
</body>
</html>
"""

let doc = HTMLDocument(string: html)
let elements = doc.querySelectorAll("dd")
- Option 1: Select the dd elements and read their textContent:
elements.forEach { ddElement in
    print(ddElement.textContent)
}
// Would produce:
// 10:00AM
// 12:23PM
- Option 2: Select the dd elements and iterate through their descendant nodes, filtering out everything except HTMLText nodes. You can additionally provide your own custom filter:
elements.forEach { ddElement in
    let iter: HTMLNodeIterator = ddElement.nodeIterator(showOptions: [.text], filter: nil)
    iter.forEach { node in
        let textNode = node as! HTMLText
        print(textNode.textContent)
    }
}
// Would produce:
// 10:00
// AM
// 12:23
// PM
- Option 3: Expanding on the previous option, you can provide a custom filter for the node iterator:
for dd in elements {
    let iter: HTMLNodeIterator = dd.nodeIterator(showOptions: [.text]) { node in
        if !node.textContent.contains("AM") && !node.textContent.contains("PM") {
            return .reject
        }
        return .accept
    }
    iter.forEach { node in
        let textNode = node as! HTMLText
        print(textNode.textContent)
    }
}
// Would produce:
// AM
// PM
- Option 4: Wrap the AM and PM in their own <span> elements and access those, e.g. with the dd > span selector:
doc.querySelectorAll("dd > span").forEach { elem in
    print(elem.textContent)
}
// Given the sample DOM, this would produce:
// 10:00
// 12:23
// If you also wrap AM/PM in spans, those would appear in the output too.
Your snippet produces ["", ""] with the sample DOM from above. Here is why:
let test: [String] = doc.querySelectorAll("span")
    .compactMap { element in
        // element is a <span> HTMLElement, but it is never used.
        // Instead, the first <dt> in the document is queried on every iteration.
        guard let span = doc.querySelector("dt") else {
            return nil
        }
        // The <dt> elements in the DOM have no id, so elementId is an empty string.
        return span.elementId
    }
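The fix is to use the element the closure receives instead of re-querying the document; with HTMLKit this should amount to doc.querySelectorAll("span").map { $0.textContent }. To show the difference without depending on HTMLKit, the sketch below reproduces both shapes with hypothetical Element and Document stand-ins (not HTMLKit types) whose querySelectorAll/querySelector only match by tag name over a flattened list of elements.

```swift
// Hypothetical stand-ins (NOT HTMLKit types) that mimic just enough of the
// sample DOM and the query API to reproduce the snippet's behavior.
struct Element {
    let tag: String
    let elementId: String   // empty when the element has no id attribute
    let textContent: String
}

struct Document {
    let elements: [Element]  // flattened, in document order

    // Simplified: matches by tag name only, not full CSS selectors.
    func querySelectorAll(_ tag: String) -> [Element] {
        elements.filter { $0.tag == tag }
    }

    func querySelector(_ tag: String) -> Element? {
        querySelectorAll(tag).first
    }
}

let doc = Document(elements: [
    Element(tag: "dt", elementId: "", textContent: "Breakfast"),
    Element(tag: "span", elementId: "", textContent: "10:00"),
    Element(tag: "dt", elementId: "", textContent: "Dinner"),
    Element(tag: "span", elementId: "", textContent: "12:23"),
])

// Buggy shape: the closure ignores its argument, re-queries the first <dt>
// and returns its (empty) id, once per matched <span>.
let buggy: [String] = doc.querySelectorAll("span").compactMap { _ in
    doc.querySelector("dt")?.elementId
}

// Fixed shape: read the text of the element the closure actually receives.
let fixed: [String] = doc.querySelectorAll("span").map { span in span.textContent }

print(buggy)  // ["", ""]
print(fixed)  // ["10:00", "12:23"]
```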
I hope this helps and clarifies some things.