Extract Values From Html Td And Tr
I have some HTML source that I get from a website for option quotes (please see below). What is the best way to extract the various text values in tr and store in a collection base
Solution 1:
After some fiddling I have derived a regex/VBA solution using:
- XMLHTTP to access the site (change strSite to suit)
- a Regexp to get the required numbers
- a variant array with 20 rows (headers in row 1) to hold the numbers, which are then dumped to the active sheet
Looking at the source HTML to find Regex patterns
The Call options have a common starting and finishing string that delimit the 10 values, but there are three different strings:
- Strings 1-4 and 7-10 for each record match <td class="ylwbg">X</td>
- String 6 has a Style (and other text) preceding the > before the X
- String 5 contains a much longer <a href text ending in X</a>
A regex of
.Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
extracts all the needed strings, but string 5 needs further work later to strip the surrounding <a href markup.
The Put options start with <td class="nobg", so these are happily not extracted by a regex that gets points 1-3.
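To see how the pattern treats each case, here is a minimal sketch that runs it over four made-up cells. The sample strings, the URL fragment and the name DemoCallPattern are assumptions for illustration only, not text taken from the live page. The samples are written with whitespace already removed, because the code strips all whitespace before matching.

```vba
Sub DemoCallPattern()
    ' Hypothetical, whitespace-stripped cells: plain, Style, anchor, and a Put cell
    Dim samples As Variant
    Dim objRegex As Object
    Dim i As Long

    samples = Array( _
        "<tdclass=""ylwbg"">1,250</td>", _
        "<tdclass=""ylwbg""Style=""color:red;"">-3.10</td>", _
        "<tdclass=""ylwbg""><ahref=""quote.jsp?strike=4700.00""target=""_blank"">206.40</a></td>", _
        "<tdclass=""nobg"">310.00</td>")

    Set objRegex = CreateObject("vbscript.regexp")
    objRegex.Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"

    For i = LBound(samples) To UBound(samples)
        If objRegex.Test(samples(i)) Then
            ' SubMatches is zero-based, so index 2 is the third capture group
            Debug.Print objRegex.Execute(samples(i)).Item(0).SubMatches(2)
        Else
            Debug.Print "no match"
        End If
    Next i

    Set objRegex = Nothing
End Sub
```

Run it from the VBA editor and read the Immediate window. The anchor sample should come back with its <ahref ...> markup still attached, which is why string 5 needs the extra step, and the nobg sample should report no match.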
Actual Code
Sub GetTxt()
    Dim objXmlHTTP As Object
    Dim objRegex As Object
    Dim objRegMC As Object
    Dim objRegM As Object
    Dim strResponse As String
    Dim strSite As String
    Dim lngCnt As Long
    Dim strTemp As String
    Dim X(1 To 20, 1 To 10)

    'Row 1 of the output array holds the column headers
    X(1, 1) = "OI"
    X(1, 2) = "Chng in vol"
    X(1, 3) = "Volume"
    X(1, 4) = "IV"
    X(1, 5) = "LTP"
    X(1, 6) = "Net Chg"
    X(1, 7) = "Bid Qty"
    X(1, 8) = "Bid Price"
    X(1, 9) = "Ask Price"
    X(1, 10) = "Ask Qnty"

    Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
    strSite = "http://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"

    On Error GoTo ErrHandler
    With objXmlHTTP
        .Open "GET", strSite, False
        .Send
        If .Status = 200 Then strResponse = .ResponseText
    End With
    On Error GoTo 0

    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        '*cleaning regex* to remove all spaces
        .Pattern = "[\xA0\s]+"
        .Global = True
        strResponse = .Replace(strResponse, vbNullString)
        .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
        If .Test(strResponse) Then
            'Start at 20 so that the first match lands in row 2, column 1
            lngCnt = 20
            Set objRegMC = .Execute(strResponse)
            For Each objRegM In objRegMC
                lngCnt = lngCnt + 1
                If Right$(objRegM.submatches(2), 2) <> "a>" Then
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = objRegM.submatches(2)
                Else
                    'Get submatches of the form <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=CE&expiry=23FEB2012" target="_blank"> 206.40</a>
                    strTemp = Val(Right(objRegM.submatches(2), Len(objRegM.submatches(2)) - InStrRev(objRegM.submatches(2), """") - 1))
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = strTemp
                End If
            Next
        Else
            MsgBox "Parsing unsuccessful", vbCritical
        End If
    End With

    Set objRegex = Nothing
    Set objXmlHTTP = Nothing
    'Dump the array to the active sheet, starting at A1
    [a1].Resize(UBound(X, 1), UBound(X, 2)) = X
    Exit Sub

ErrHandler:
    MsgBox "Site not accessible"
    If Not objXmlHTTP Is Nothing Then Set objXmlHTTP = Nothing
End Sub
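The anchor handling in the Else branch can be tried on its own. The sketch below wraps the same Val/Right/InStrRev expression in a helper. The names AnchorValue and DemoAnchorValue and the shortened sample string are hypothetical, chosen only for illustration.

```vba
Function AnchorValue(ByVal strCell As String) As Double
    ' Keep what follows the last double quote, skip the ">" after it,
    ' then let Val read the leading number and ignore the trailing "</a>"
    AnchorValue = Val(Right$(strCell, Len(strCell) - InStrRev(strCell, """") - 1))
End Function

Sub DemoAnchorValue()
    ' Made-up, whitespace-stripped anchor standing in for string 5
    Debug.Print AnchorValue("<ahref=""quote.jsp""target=""_blank"">206.40</a>")
End Sub
```

Val stops reading at the first character it cannot treat as part of a number. That is what drops the closing </a>, but it also means a price containing a thousands separator would be cut off at the comma.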