Extract Values From Html Td And Tr

I have some HTML source that I get from a website for option quotes (please see below). What is the best way to extract the various text values in tr and store in a collection base

Solution 1:

After some fiddling I have derived a regex/VBA solution using:

  1. XMLHTTP to access the site (change strSite to suit)
  2. a Regexp to get the required numbers
  3. a variant array with 20 rows (a header row plus the data records) to hold the numbers, which is then dumped to the active sheet

Looking at the source HTML to find Regex patterns

The Call options have common starting and finishing strings that delimit the 10 values in each record, but the cells come in three different forms:

  1. Strings 1-4,7-10 for each record match <td class="ylwbg">X</td>
  2. String 6 has a Style (and other text) preceding the > before the X
  3. String 5 contains a much longer <a href textX</a>

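A regex of .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)" extracts all the needed strings (it reads <tdclass rather than <td class because the code first strips all whitespace from the response). The value lands in the third capture group, submatches(2). Further work is needed later on string 5, where that group holds the whole <a href ...>X</a> element rather than just the number.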

The Put options start with <td class="nobg", so they are conveniently skipped by a regex that targets forms 1-3.
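To see the pattern at work without hitting the site, here is a minimal sketch that runs the same two regexes over a few made-up cells. The function name CallCellValues, the demo sub, and the sample markup are all hypothetical, written only to mirror the three forms described above. It assumes late binding to vbscript.regexp, as in the main routine.

```vba
Option Explicit

' Returns the third capture group (SubMatches(2)) for every Call-option
' cell in strHtml. Hypothetical helper, not part of the original answer.
Function CallCellValues(ByVal strHtml As String) As Collection
    Dim objRegex As Object
    Dim objRegM As Object
    Dim colOut As New Collection

    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        ' Same cleaning step as GetTxt: drop every space, tab, line break
        ' and non-breaking space, so "<td class=" becomes "<tdclass="
        .Pattern = "[\xA0\s]+"
        strHtml = .Replace(strHtml, vbNullString)
        ' Same extraction pattern as GetTxt
        .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
        For Each objRegM In .Execute(strHtml)
            colOut.Add objRegM.SubMatches(2)
        Next
    End With
    Set CallCellValues = colOut
End Function

Sub DemoCallPattern()
    Dim strSample As String
    Dim varItem As Variant

    ' Made-up sample cells: a plain cell, a cell with a Style attribute,
    ' and a Put cell (class "nobg") that the pattern should ignore
    strSample = "<td class=""ylwbg"">1,234</td>" & vbCrLf & _
                "<td class=""ylwbg"" Style=""color:red;""> -4.35</td>" & vbCrLf & _
                "<td class=""nobg"">55</td>"

    For Each varItem In CallCellValues(strSample)
        Debug.Print varItem
    Next
End Sub
```

Running DemoCallPattern should list only the two ylwbg cells in the Immediate window; the nobg cell never matches because the pattern is anchored on the class name.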

Actual Code

Sub GetTxt()
    Dim objXmlHTTP As Object
    Dim objRegex As Object
    Dim objRegMC As Object
    Dim objRegM As Object
    Dim strResponse As String
    Dim strSite As String
    Dim lngCnt As Long
    Dim strTemp As String
    Dim X(1 To 20, 1 To 10)

    'Row 1 of the output array holds the column headers
    X(1, 1) = "OI"
    X(1, 2) = "Chng in vol"
    X(1, 3) = "Volume"
    X(1, 4) = "IV"
    X(1, 5) = "LTP"
    X(1, 6) = "Net Chg"
    X(1, 7) = "Bid Qty"
    X(1, 8) = "Bid Price"
    X(1, 9) = "Ask Price"
    X(1, 10) = "Ask Qnty"

    Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
    strSite = "http://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"

    On Error GoTo ErrHandler
    With objXmlHTTP
        .Open "GET", strSite, False
        .Send
        If .Status = 200 Then strResponse = .ResponseText
    End With
    On Error GoTo 0

    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        '*cleaning regex* to remove all spaces
        .Pattern = "[\xA0\s]+"
        .Global = True
        strResponse = .Replace(strResponse, vbNullString)
        .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
        If .Test(strResponse) Then
            'Start at 20 so the first match (lngCnt = 21) lands in row 2, column 1
            lngCnt = 20
            Set objRegMC = .Execute(strResponse)
            For Each objRegM In objRegMC
                lngCnt = lngCnt + 1
                If Right$(objRegM.submatches(2), 2) <> "a>" Then
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = objRegM.submatches(2)
                Else
                    'Get submatches of the form <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=CE&expiry=23FEB2012" target="_blank"> 206.40</a>
                    strTemp = Val(Right(objRegM.submatches(2), Len(objRegM.submatches(2)) - InStrRev(objRegM.submatches(2), """") - 1))
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = strTemp
                End If
            Next
        Else
            MsgBox "Parsing unsuccessful", vbCritical
        End If
    End With

    Set objRegex = Nothing
    Set objXmlHTTP = Nothing
    'Dump the whole array to the active sheet in one write
    [a1].Resize(UBound(X, 1), UBound(X, 2)) = X
    Exit Sub

ErrHandler:
    MsgBox "Site not accessible"
    If Not objXmlHTTP Is Nothing Then Set objXmlHTTP = Nothing
End Sub
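The two densest expressions in the routine are the counter-to-cell arithmetic and the string-5 clean-up. As a rough sketch, they can be pulled out into small helpers that are easier to check in isolation. CounterToCell and AnchorValue are hypothetical names that do not appear in the original answer; they are intended to give the same results as the inline expressions for the inputs the routine produces.

```vba
Option Explicit

' Maps the running match counter from GetTxt (21 for the first match)
' to a 1-based row and column of the 10-column output array.
Sub CounterToCell(ByVal lngCnt As Long, ByRef lngRow As Long, ByRef lngCol As Long)
    ' Integer division; equals Int((lngCnt - 1) / 10) for positive counters
    lngRow = (lngCnt - 1) \ 10
    ' Yields 1..10; equals IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)
    lngCol = ((lngCnt - 1) Mod 10) + 1
End Sub

' Extracts the number from a whitespace-stripped anchor such as
' <ahref="..."target="_blank">206.40</a>. The last double quote is
' followed by ">", so the value starts two characters after it.
Function AnchorValue(ByVal strCell As String) As Double
    Dim lngQuote As Long

    lngQuote = InStrRev(strCell, """")
    If lngQuote = 0 Then Exit Function      ' no quote found: returns 0
    ' Val reads leading digits and stops at the "<" of the closing tag
    AnchorValue = Val(Mid$(strCell, lngQuote + 2))
End Function

Sub DemoHelpers()
    Dim lngRow As Long
    Dim lngCol As Long

    CounterToCell 21, lngRow, lngCol
    Debug.Print lngRow, lngCol
    CounterToCell 30, lngRow, lngCol
    Debug.Print lngRow, lngCol
    ' Made-up anchor in the whitespace-stripped form the routine sees
    Debug.Print AnchorValue("<ahref=""/quote?type=CE""target=""_blank"">206.40</a>")
End Sub
```

One thing to watch for: Val stops at the first character it cannot read as part of a number, so a price containing a thousands separator (for example 1,206.40) would come back as 1. If the site formats prices that way, stripping commas before calling Val would be one way round it.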