I'm trying to extract data from a webpage. However, if I use the `urllib.request` module, the result is messy, since the data is in an HTML table (and not a wonderfully done one, either).
I found that if I open the page in a browser, press Ctrl+A, Ctrl+C, and then Ctrl+V into Notepad, I get exactly what I want. Is there a way to simulate this in Python? There's a large number of pages I need to do this on.
I've tried using BeautifulSoup, but as I said, the tables are done badly, so it would come down to modifying the code for extracting the text from every table, which would take more time than manually copy-pasting.
There are a few alternatives:
- Keep using `urllib` and strip the HTML tags yourself, or use Beautiful Soup.
- If you're familiar with Qt, you can use the `QtWebKit` module to load the web pages and extract the text.
- Use the Selenium driver to control a web browser.
- If you're using Windows and portability is not in your plans, you can use the WinAPI (`SendMessage` or `PostMessage`) to simulate Ctrl+A, Ctrl+C, and Ctrl+V.
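As a rough illustration of the first option, here is a minimal, standard-library-only sketch that fetches a page with `urllib.request` and strips the tags with `html.parser`. It inserts line breaks at block elements and tabs between table cells, so the output loosely resembles what a browser's select-all/copy produces. The names `TextExtractor`, `html_to_text`, and `page_text` and the chosen tag sets are my own assumptions, not part of any library. It does not run JavaScript or apply CSS, so pages that build their tables with scripts will still need Selenium or QtWebKit.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumed tag sets; adjust them for the pages you are scraping.
SKIP_TAGS = {"script", "style", "title"}        # text a browser doesn't show in the page body
CELL_TAGS = {"td", "th"}                        # table cells, separated by tabs
BLOCK_TAGS = {"p", "div", "br", "tr", "li", "table", "ul", "ol",
              "h1", "h2", "h3", "h4", "h5", "h6", "pre", "blockquote"}


class TextExtractor(HTMLParser):
    """Collects visible text, roughly mimicking a browser's select-all/copy."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in CELL_TAGS:
            self.parts.append("\t")  # works even when </td> is omitted
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Source newlines/tabs are not layout; collapse them like a browser does.
            self.parts.append(re.sub(r"\s+", " ", data))

    def get_text(self):
        lines = []
        for line in "".join(self.parts).split("\n"):
            # Tidy spaces inside each cell, then drop the tab added before the first cell.
            line = "\t".join(" ".join(cell.split()) for cell in line.split("\t"))
            if line.startswith("\t"):
                line = line[1:]
            if line.strip():
                lines.append(line.rstrip())
        return "\n".join(lines)


def html_to_text(html):
    """Convert an HTML string to tab/newline-separated plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()


def page_text(url):
    """Download url with urllib and return its approximate visible text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Usage would be something like `print(page_text("https://example.com/some-page"))` (a placeholder URL), looping over your list of pages and writing each result to a file. Because every row becomes one line with tab-separated cells, the output can also be pasted into a spreadsheet or read with the `csv` module using `delimiter="\t"`.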
Hope this helps!