PHP cURL returns empty header and body despite HTTP code 200
I'm trying to scrape the URL xxxx.fr with cURL, but I can't get at the page's HTML: both the header and the body are empty, even though the HTTP code returned is 200. I tried another URL (on a different domain) and it works like a charm. I also tried a different user agent and referer.
Do you know what's wrong? Or at least, could you try this code on your own server and let me know whether you have the same issue?
Thank you.
Below is my code:
```php
<?php
$url = 'http://www.xxxx.fr';

$header = [];
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: timeout=5, max=100";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = ""; // browsers leave this blank

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($curl, CURLOPT_REFERER, "http://www.google.fr");
curl_setopt($curl, CURLOPT_HEADER, 1);        // include response headers in $curldata
curl_setopt($curl, CURLINFO_HEADER_OUT, 1);   // record the request headers for curl_getinfo()
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, getcwd() . '/cookies.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, getcwd() . '/cookies.txt');
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it

$curldata = curl_exec($curl);
$infos = curl_getinfo($curl);
print_r($infos);
curl_close($curl);

echo "<hr>page:<br />";
// Returns '' when $curldata is not valid UTF-8 (see the edit).
echo htmlentities($curldata);
```
And here is the result of print_r($infos):
```
Array
(
    [url] => http://www.xxxx.fr
    [content_type] => text/html
    [http_code] => 200
    [header_size] => 625
    [request_size] => 465
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 0.032535
    [namelookup_time] => 0.001488
    [connect_time] => 0.002581
    [pretransfer_time] => 0.002639
    [size_upload] => 0
    [size_download] => 10234
    [speed_download] => 314553
    [speed_upload] => 0
    [download_content_length] => -1
    [upload_content_length] => 0
    [starttransfer_time] => 0.032088
    [redirect_time] => 0
    [certinfo] => Array
        (
        )

    [primary_ip] => xxx
    [primary_port] => 80
    [local_ip] => xxx
    [local_port] => 37319
    [redirect_url] =>
    [request_header] => GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Host: www.xxxx.fr
Accept-Encoding: gzip,deflate
Referer: http://www.google.fr
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Cache-Control: max-age=0
Connection: keep-alive
Keep-Alive: timeout=5, max=100
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5

)
```
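The print_r() output already suggests the response was not really empty: size_download is 10234 bytes. A minimal diagnostic sketch, using a hypothetical helper `splitResponse()` (not part of the original code), separates headers from body with header_size (because CURLOPT_HEADER is on, curl_exec() returns both in one string) and checks whether the body is valid UTF-8.

```php
<?php
// Hypothetical diagnostic helper: did curl_exec() really return nothing?
// With CURLOPT_HEADER enabled, the returned string is the response headers
// followed by the body; curl_getinfo()'s header_size marks the boundary.
function splitResponse(string $response, int $headerSize): array
{
    $header = substr($response, 0, $headerSize);
    $body   = substr($response, $headerSize);

    return [
        'header_length' => strlen($header),
        'body_length'   => strlen($body),
        // preg_match() with the /u modifier fails (returns false) on invalid
        // UTF-8, the kind of input htmlentities() turns into '' when told
        // the text is UTF-8 and ENT_SUBSTITUTE is not set.
        'body_is_utf8'  => preg_match('//u', $body) === 1,
    ];
}

// Usage after the question's curl_exec()/curl_getinfo() calls:
// print_r(splitResponse($curldata, $infos['header_size']));
```

A non-zero body_length together with body_is_utf8 = false points at the escaping step rather than at cURL.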
Edit:
htmlentities($curldata) returns an empty string because the source page is not encoded in UTF-8 (see link).
This should work:
```php
htmlentities($curldata, ENT_QUOTES, 'ISO-8859-1');
```
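A self-contained reproduction of the problem and of the fix, with no network involved. The string stands in for a page served as ISO-8859-1 (an assumption about xxxx.fr, consistent with the edit). Passing ENT_QUOTES and the charset explicitly makes the result the same on any PHP version from 5.4 on.

```php
<?php
// "Café" as an ISO-8859-1 page would send it: é is the single byte 0xE9.
$page = "<p>Caf\xE9</p>";

// Read as UTF-8 without ENT_SUBSTITUTE, the lone 0xE9 byte is an invalid
// sequence, so htmlentities() returns an empty string.
$asUtf8 = htmlentities($page, ENT_QUOTES, 'UTF-8');

// Declaring the real charset escapes the page as expected.
$asLatin1 = htmlentities($page, ENT_QUOTES, 'ISO-8859-1');

var_dump($asUtf8);   // string(0) ""
echo $asLatin1, "\n"; // &lt;p&gt;Caf&eacute;&lt;/p&gt;
```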
As of the PHP 5.4 release, htmlspecialchars() no longer uses ISO-8859-1 as its default encoding; in PHP 5.4 it uses UTF-8. You might expect htmlspecialchars() to skip non-UTF-8 byte sequences or translate them to a "not found" character. In fact, htmlspecialchars() returns a blank string: no error is generated, no error code is returned, no exception is raised; a blank string is simply returned if invalid UTF-8 sequences are passed in. htmlentities() behaves the same way.
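Hard-coding ISO-8859-1 works for this one site. A more defensive variant is sketched below as a hypothetical helper `escapePage()`: it prefers a charset declared in the Content-Type header and otherwise falls back to ISO-8859-1 when the bytes are not valid UTF-8. That fallback is an assumption (French sites are also often Windows-1252). Note too that since PHP 8.1 the default flags of htmlentities() and htmlspecialchars() include ENT_SUBSTITUTE, so invalid sequences become U+FFFD instead of an empty string; that avoids the blank page but still loses the original characters.

```php
<?php
// Hypothetical helper (not from the original answer): escape a scraped page
// using its real charset instead of htmlentities()' UTF-8 default.
// Assumption: a page that is neither declared nor valid UTF-8 is ISO-8859-1.
function escapePage(string $body, ?string $contentType): string
{
    $charset = null;

    // Prefer a charset declared in the Content-Type response header.
    if ($contentType !== null
        && preg_match('/charset=["\']?([\w-]+)/i', $contentType, $m) === 1) {
        $charset = $m[1];
    }

    // No declared charset: keep UTF-8 if the bytes validate, else assume Latin-1.
    if ($charset === null) {
        $charset = preg_match('//u', $body) === 1 ? 'UTF-8' : 'ISO-8859-1';
    }

    // ENT_SUBSTITUTE turns any remaining invalid sequence into U+FFFD
    // instead of making the whole result an empty string.
    return htmlentities($body, ENT_QUOTES | ENT_SUBSTITUTE, $charset);
}

// Usage in the question's script, after curl_exec()/curl_getinfo():
// echo escapePage($curldata, $infos['content_type']);
```

For the question's page, curl_getinfo() reported content_type as plain `text/html`, so the helper falls through to the UTF-8 check and picks ISO-8859-1.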