Python - BeautifulSoup .get_text() is not specific enough for my HTML parsing


Given the HTML code below, I want to output the text of the h1 but not "details  ", which is the text of the span (which is encapsulated by the h1).

My current output gives:

details   new men's genuine leather bifold id credit card money holder wallet black 

I would like:

new men's genuine leather bifold id credit card money holder wallet black 

Here is the HTML I am working with:

<h1 class="it-ttl" itemprop="name" id="itemtitle"><span class="g-hdn">details  &nbsp;</span>new men&#039;s genuine leather bifold id credit card money holder wallet black</h1> 

Here is my current code:

for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    print(line.get_text())

Note: I do not want to just truncate the string, because I would like this code to have some re-usability. The best would be some code that crops out any text bounded by the span.

You can use extract() to remove the span tags from the h1 before calling get_text():

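for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    # line('span') is shorthand for line.find_all('span'); extract() removes each span from the tree
    [s.extract() for s in line('span')]
    print(line.get_text())
    # => new men's genuine leather bifold id credit card money holder wallet black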
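Note that extract() modifies the parsed tree, so the span is gone afterwards. If the tree needs to stay intact, one possible alternative is to collect only the text nodes that sit directly inside the h1. The sketch below assumes the bs4 package and the built-in "html.parser" backend; the helper name own_text is made up for illustration and is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString


def own_text(tag):
    """Return the text placed directly inside `tag`, skipping all child tags.

    Hypothetical helper, not a BeautifulSoup API. Comments are also
    NavigableString instances, so they would be included as well.
    """
    pieces = [child for child in tag.contents if isinstance(child, NavigableString)]
    return "".join(pieces).strip()


# Sample markup copied from the question.
html = (
    '<h1 class="it-ttl" itemprop="name" id="itemtitle">'
    '<span class="g-hdn">details  &nbsp;</span>'
    "new men&#039;s genuine leather bifold id credit card money holder wallet black"
    "</h1>"
)

soup = BeautifulSoup(html, "html.parser")

# Collect the direct text of every matching h1 without touching the tree.
titles = [own_text(h1) for h1 in soup.find_all("h1", attrs={"itemprop": "name"})]

for title in titles:
    print(title)
```

Because nothing is removed, the span is still available afterwards (for example via soup.find('span')), which may matter if the same soup object is reused for other lookups.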
