Given the HTML below, I want to output the text of the h1 but not "details ", which is the text of the span nested inside the h1.

My current output is:

details new men's genuine leather bifold id credit card money holder wallet black

What I would like is:

new men's genuine leather bifold id credit card money holder wallet black

Here is the HTML I am working with:

<h1 class="it-ttl" itemprop="name" id="itemtitle"><span class="g-hdn">details </span>new men's genuine leather bifold id credit card money holder wallet black</h1>

Here is my current code:

    for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
        print(line.get_text())

Note: I do not want to simply truncate the string, because the code needs to be reusable. The best solution would be code that crops out whatever text is bounded by the span.
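You can use extract() to remove the span tags before reading the text. Note that extract() modifies the parsed tree in place, so the spans are gone from soup afterwards:

    for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
        # line('span') is shorthand for line.find_all('span')
        for s in line('span'):
            s.extract()
        print(line.get_text())
        # => new men's genuine leather bifold id credit card money holder wallet black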
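If you would rather leave the parsed tree untouched, one possible alternative is to collect only the text nodes that are direct children of the h1. The following is a minimal sketch, not part of the original answer. It assumes the bs4 package (version 4.4 or later, for the string argument) and the built-in html.parser. The helper name title_without_spans is made up for illustration.

```python
from bs4 import BeautifulSoup

# The HTML from the question, split across lines for readability.
HTML = (
    '<h1 class="it-ttl" itemprop="name" id="itemtitle">'
    '<span class="g-hdn">details </span>'
    "new men's genuine leather bifold id credit card money holder wallet black"
    '</h1>'
)


def title_without_spans(h1):
    """Hypothetical helper: return the h1 text, skipping text inside child tags."""
    # string=True matches text nodes only; recursive=False restricts the
    # search to direct children, so text inside the span is not included.
    return ''.join(h1.find_all(string=True, recursive=False)).strip()


soup = BeautifulSoup(HTML, 'html.parser')
titles = [title_without_spans(h1)
          for h1 in soup.find_all('h1', attrs={'itemprop': 'name'})]
print(titles[0])
```

Unlike extract(), this leaves the span in soup, which may matter if the same tree is reused later. The trade-off is that it also drops text inside any other child tag of the h1, not only spans.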