php - How to parse a website that uses the infinite scroll technique to display content?


How can I scrape data from there? I am writing a PHP script to scrape data from a website that has a dynamic loader. I am using HTML DOM Parser and scoopy to scrape the following website: https://www.lyoness.com/au/search/partner/ . I am a beginner and am not able to identify how to parse the infinite scroller.

<input id="btnnextpage" type="button" class="btn btn-primary" style="width: 100%" value="next page"> 

This link is used to pull the content using AJAX:

https://www.lyoness.com/au/search/loadpage?cp=1&area=2&st=&rz=&rzc=&f=&ft=basic&c=au&r=12&la=en-au&s=default&ispreviouspageclick=false&_= 

The cp variable is the number of the page being loaded. That means you can loop through the page numbers for as long as content is still returned.
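As a minimal sketch of that idea, the URL for any page can be built from the query parameters shown above, with only cp changing. The function name `buildPageUrl` is hypothetical and not part of the site or of any library; the parameter values are copied from the link above.

```php
<?php
// Hypothetical helper: builds the loadpage URL for a given page number (cp).
// All other parameters are copied unchanged from the link shown above.
function buildPageUrl(int $cp): string
{
    $params = [
        'cp'   => $cp,      // page number, the only value that changes
        'area' => 2,
        'st'   => '',
        'rz'   => '',
        'rzc'  => '',
        'f'    => '',
        'ft'   => 'basic',
        'c'    => 'au',
        'r'    => 12,
        'la'   => 'en-au',
        's'    => 'default',
        'ispreviouspageclick' => 'false',
        '_'    => '',
    ];

    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return 'https://www.lyoness.com/au/search/loadpage?' . http_build_query($params, '', '&');
}
```

Calling `buildPageUrl(1)`, `buildPageUrl(2)`, and so on gives the links to request in turn, stopping once a page comes back empty.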

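It looked like you can't access this link from PHP, because opening it directly in the browser is not possible. I tried AJAX and it works. Here is AJAX code that you can type into the page's console; change cp to print the AJAX content of another page. You can also add a loop with a delay.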

    $.ajax({
        url: 'https://www.lyoness.com/au/search/loadpage?cp=5&area=2&st=&rz=&rzc=&f=&ft=basic&c=au&r=12&la=en-au&s=default&ispreviouspageclick=false&_=',
        success: function (data) {
            console.log(data);
        }
    });

After scraping the returned data with jQuery (which is easier than using PHP libraries), you can send it to your own server with a POST or GET request and save it to the database using some sort of API, or you can disable the cross-domain security option in your browser.

Edit:

Here is PHP code to retrieve the first page using cURL:

    if (!function_exists('curl_init')) {
        die('sorry curl not installed!');
    }
    $url = 'https://www.lyoness.com/au/search/loadpage?cp=1&ft=basic&c=au&r=12&la=en-au&s=default';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_ENCODING, "gzip");
    // Certificate checks are switched off here; enable them outside of quick tests.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "mozilla-djokage/1.0");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Mark the request as AJAX, as the page's own script does.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'X-Requested-With: XMLHttpRequest'
    ));
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $output = curl_exec($ch);
    echo $output;
    //echo 'curl error: ' . curl_error($ch);
    curl_close($ch);

You need to loop through the cp variable in the URL so that you can parse all pages, and you need to scrape the $output HTML variable and save the results to the DB. I have tried this code and it works fine. I hope you accept this solution.
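The loop over cp could be sketched roughly as follows. This is only an illustration under assumptions: the function names `fetchPage`, `extractLinks` and `scrapeAllPages` are made up for this example, and since the real markup of the returned HTML is not shown here, the parser simply collects every `<a>` element. Adjust it to the actual structure of `$output`. Unlike the code above, this sketch leaves certificate verification at its defaults.

```php
<?php
// Fetches one result page with cURL; returns '' if the request fails.
function fetchPage(int $cp): string
{
    $url = 'https://www.lyoness.com/au/search/loadpage?cp=' . $cp
         . '&ft=basic&c=au&r=12&la=en-au&s=default';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => 'gzip',
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ]);
    $output = curl_exec($ch);
    curl_close($ch);

    return is_string($output) ? $output : '';
}

// Assumed markup: collects the text and href of every <a> in the fragment.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Requests cp = 1, 2, ... until a page comes back empty or $maxPages is hit.
function scrapeAllPages(callable $fetch, int $maxPages = 50, int $delaySeconds = 1): array
{
    $all = [];
    for ($cp = 1; $cp <= $maxPages; $cp++) {
        $html = trim($fetch($cp));
        if ($html === '') {
            break;                       // no more content returned
        }
        $all = array_merge($all, extractLinks($html));
        sleep($delaySeconds);            // be polite to the server
    }
    return $all;
}
```

A call such as `$rows = scrapeAllPages('fetchPage');` would then return one array of rows that can be saved to the DB. Passing the fetcher in as a callable keeps the loop easy to try out with canned HTML before hitting the real site.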

