perl - Unable to get the web content using LWP::Simple but able to get content from LWP::UserAgent
I am trying to run the code below to parse the contents of the HTML page at the URL below:

    #!/usr/bin/perl
    use LWP::Simple;
    use HTML::TreeBuilder;

    $response = get("http://www.viki.com/");
    print $response;

Nothing gets printed. It works when a browser is emulated.

When I try to access http://www.viki.com using LWP::UserAgent, I get the following response:
    <html><body><h1>403 Forbidden</h1>
    Request forbidden by administrative rules.
    </body></html>

The get subroutine in LWP::Simple is implemented as follows (at least in version 6.13):

    sub get ($) {
        my $response = $ua->get(shift);
        return $response->decoded_content if $response->is_success;
        return undef;
    }

As you can see, get returns the content only if the response is a success; otherwise it returns undef.

The response that LWP::UserAgent receives is a 403 error, in other words not a success. Therefore, LWP::Simple returns undef for the same URL, and printing undef produces no output.
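The behaviour described above can be reproduced without touching the network. The following is a minimal sketch, assuming the HTTP::Response module that ships with libwww-perl is installed; `content_or_undef` is a hypothetical helper that copies the logic of the quoted subroutine, and the two responses are built by hand rather than received from any real server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper mirroring the logic of LWP::Simple::get quoted above:
# return the decoded body on success, undef otherwise.
sub content_or_undef {
    my ($response) = @_;
    return $response->decoded_content if $response->is_success;
    return undef;
}

# Hand-built responses standing in for what a server might send.
my $forbidden = HTTP::Response->new(
    403, 'Forbidden',
    [ 'Content-Type' => 'text/html' ],
    '<html><body><h1>403 Forbidden</h1></body></html>',
);
my $ok = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain' ],
    'hello',
);

for my $response ($forbidden, $ok) {
    my $content = content_or_undef($response);
    printf "%s -> %s\n",
        $response->status_line,
        defined $content ? "content: $content" : 'undef';
}
```

The 403 response still carries a body, but because `is_success` is false the helper discards it and returns undef. When debugging a case like this, it helps to use LWP::UserAgent directly and print `status_line`.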
It appears that the website (http://www.viki.com) is checking the user agent string and only returning content to "valid" user agents. LWP::Simple is hard-coded to use LWP::Simple/$VERSION as its user agent.

If you must use LWP::Simple, you can force the user agent like this:

    use LWP::Simple qw/ $ua get /;

    $ua->agent('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0');
    print get('http://www.viki.com');

LWP::Simple exposes the LWP::UserAgent instance it uses internally through the optionally exported $ua variable. Note that an explicit import list replaces the default exports, so get has to be listed alongside $ua. It is still necessary to configure the user agent on that instance before this particular page will load.
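If LWP::Simple is not a requirement, the same idea can be written against LWP::UserAgent directly, which also makes the failure reason visible. This is a rough sketch rather than a definitive solution: `make_ua` and `fetch` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and whether the site accepts this particular User-Agent string today is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Browser-like User-Agent string taken from the answer above.
my $BROWSER_AGENT =
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0';

# Hypothetical helper: build a client that identifies itself as a browser.
sub make_ua {
    my $ua = LWP::UserAgent->new(timeout => 10);   # timeout is an arbitrary choice
    $ua->agent($BROWSER_AGENT);
    return $ua;
}

# Hypothetical helper: return the decoded body on success; on failure,
# warn with the HTTP status line and return nothing.
sub fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->decoded_content if $response->is_success;
    warn "GET $url failed: ", $response->status_line, "\n";
    return;
}

my $ua = make_ua();

# Only touch the network when a URL is passed on the command line.
if (@ARGV) {
    my $content = fetch($ua, $ARGV[0]);
    print $content if defined $content;
}
```

Saved as, say, fetch.pl, it could be run as `perl fetch.pl http://www.viki.com/`. Unlike LWP::Simple's get, a failure here reports the status line (for example a 403) instead of silently producing undef.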