
Parse A Html File With Table Using Python

I have a problem with my Python parser. This is part of my file:
03.12. 10:45:00
<

Solution 1:

Find all tr tags and get the td tags by their class attribute:

# encoding: utf-8
from bs4 import BeautifulSoup

data = u"""
<table>
<tr>
  <td class="zeit"><div>03.12. 10:45:00</div></td>
  <td class="system"><div><a target="_blank" href="detail.php?host=CG&factor=2&delay=1&Y=15">CG</div></a></td>
  <td class="fehlertext"><div>System steht nicht zur Verfügung!</div></td>
</tr>
<tr>
  <td class="zeit"><div>03.12. 10:10:01</div></td>
  <td class="system"><div><a target="_blank" href="detail.php?host=DEXProd&factor=2&delay=5&Y=15">DEX</div></a></td>
  <td class="fehlertext"><div>ssh: Connection refused Couldn't read packet: Connection reset by peer</div></td>
</tr>
<tr>
  <td class="zeit"><div>03.12. 06:23:06</div></td>
  <td class="system"><div><a target="_blank" href="detail.php?host=FRAUD&factor=2&delay=1&Y=15">Boni</div></a></td>
  <td class="fehlertext"><div>ID Fehler</div></td>
</tr>
</table>
"""

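# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, 'html.parser')
for tr in soup.find_all('tr'):
    # Each row has one cell per class; get_text(strip=True) drops surrounding whitespace
    zeit = tr.find('td', class_='zeit').get_text(strip=True)
    system = tr.find('td', class_='system').get_text(strip=True)
    fehlertext = tr.find('td', class_='fehlertext').get_text(strip=True)

    print(zeit, system, fehlertext)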

Prints:

03.12. 10:45:00 CG System steht nicht zur Verfügung!
03.12. 10:10:01 DEX ssh: Connection refused Couldn't read packet: Connection reset by peer
03.12. 06:23:06 Boni ID Fehler
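
One thing to watch out for: tr.find(...) returns None when a row has no cell with the requested class, so the loop above raises AttributeError on such rows. If the real file also contains header or spacer rows, a more defensive variant might look like the sketch below. The parse_rows helper, the FIELDS tuple and the header row in the sample are assumptions made for illustration; they are not part of the original file.

```python
from bs4 import BeautifulSoup

# Class names taken from the table in the question
FIELDS = ("zeit", "system", "fehlertext")


def parse_rows(html):
    """Return one dict per <tr> that contains all three expected cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = {name: tr.find("td", class_=name) for name in FIELDS}
        if any(cell is None for cell in cells.values()):
            continue  # skip rows without the expected cells, e.g. a header row
        rows.append({name: cell.get_text(strip=True) for name, cell in cells.items()})
    return rows


# Hypothetical sample: a header row (assumed) plus one data row from the question
sample = """
<table>
<tr><th>Zeit</th><th>System</th><th>Fehlertext</th></tr>
<tr>
  <td class="zeit"><div>03.12. 06:23:06</div></td>
  <td class="system"><div><a target="_blank" href="detail.php?host=FRAUD&factor=2&delay=1&Y=15">Boni</a></div></td>
  <td class="fehlertext"><div>ID Fehler</div></td>
</tr>
</table>
"""

rows = parse_rows(sample)
for row in rows:
    print(row["zeit"], row["system"], row["fehlertext"])
```

Returning a list of dicts instead of printing inside the loop also makes it easier to filter or sort the entries afterwards.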
