We had an issue yesterday that prevented the GSA crawler from logging in to our website during the crawl. Because of this, many of the URLs were indexed as the login page. We see a lot of results on the search page titled "Please log in" (the title of the login page). When I check Index Diagnostics, the crawl status for these URLs is "Retrying URL: Connection reset by peer during fetch.".
Now the login problem is resolved. Once a page is re-crawled, its crawl status goes to successful, the crawler picks up the page content, and the search results show the proper title. But since I cannot control what is being crawled, there are pages that still haven't been re-crawled and still have the problem.
There is no uniform URL pattern that I can use to force a re-crawl. Hence my question: is there a way to force a re-crawl based on crawl status ("Retrying URL: Connection reset by peer during fetch.")? If that is too specific, how can I re-crawl based on crawl status type (errors/successful/excluded)?
Export the error URLs to a CSV file using "Index > Diagnostics > Index Diagnostics".
Open the CSV, apply a filter on the crawl status column, and pick out the URLs having the error you are looking for.
Copy the URLs, go to "Content Sources > Web Crawl > Freshness Tuning > Recrawl these URL Patterns", paste them, and click on Recrawl.
That's it. Done!
PS: If there are a lot of error URLs (more than 10,000, if I am not wrong), you may not be able to get all of them in a single CSV file. In that case you can do it in batches.
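If filtering by hand in a spreadsheet gets tedious, the filter-and-batch steps above could be scripted roughly like this. It is only a sketch: the column names `URL` and `Crawl Status` are my assumptions about the exported CSV (check the header row of your own export), the batch size is an arbitrary illustrative value, and the sample rows are made up. The output is just a list of URLs that you would still paste into "Recrawl these URL Patterns" yourself.

```python
import csv
import io

# Status text to look for; compared case-insensitively.
TARGET_STATUS = "Retrying URL: Connection reset by peer during fetch."


def failed_urls(csv_file, status=TARGET_STATUS,
                url_col="URL", status_col="Crawl Status"):
    """Return the URLs whose crawl status equals `status`.

    `url_col` and `status_col` are assumed header names, not confirmed
    ones; adjust them to match the exported file.
    """
    wanted = status.strip().lower()
    reader = csv.DictReader(csv_file)
    return [
        row[url_col].strip()
        for row in reader
        if (row.get(status_col) or "").strip().lower() == wanted
    ]


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Made-up sample standing in for the exported CSV file.
    sample = io.StringIO(
        "URL,Crawl Status\n"
        "http://example.com/a,Retrying URL: Connection reset by peer during fetch.\n"
        "http://example.com/b,Crawled: New Document\n"
    )
    # Hypothetical batch size; pick whatever the paste box handles comfortably.
    for batch in batches(failed_urls(sample), 500):
        print("\n".join(batch))
        print()  # blank line between batches
```

For a real export, replace the sample with `open("export.csv", newline="")`, where `export.csv` is whatever name you saved the file under.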
Regards,
Mohan