python - How to get Scrapy results in order?
Please help me with Scrapy. The output of my code is not printed the way I expect.
I tried extracting the fields inside the loop, but that does not give the correct result. If I have missed something there, please tell me.
Code:

    import scrapy

    class yelpscrapy(scrapy.Spider):
        name = 'yelp'
        start_urls = ["http://www.yelp.com/search?find_desc=pet+grooming+services&find_loc=starnberg%2c+bayern",]

        def print_link(self, link):
            return link

        def parse(self, response):
            website = scrapy.Selector(response)
            items = []
            for obj in website.xpath("//div[@class='main-attributes']"):
                item = yelpitem()
                # getting name
                item['name'] = obj.xpath("//div[@class='media-story']//h3//a/text()").extract()
                # getting address
                item['address'] = obj.xpath("//div[@class='secondary-attributes']//address").extract()
                items.append(item)
            return items
The resulting output looks like this:
'name': [u'tierschutzverein starnberg u. umgebung', u'm\xfcmmelpension', u'hundesportverein starnberg e. v.', u'bellness hundesalon', u'california dog spa', u'gassi germering', u'hundesalon tanaka beauty & spa', u'hundesalon popp', u'neuhauser hundeladen', u'therafelis katja r\xfcssel'], {'address': [u'<address>\n franziskusweg 34<br>82319 starnberg<br>germany\n </address>', u'<address>\n st.-michael-str. 19<br>82319 starnberg<br>germany\n </address>', u'<address>\n j\xe4gersbrunner str. 1<br>82319 starnberg<br>germany\n </address>', u'<address>\n baierbrunner str. 1<br>81379 munich<br>germany\n </address>', u'<address>\n geigenbergerstr. 51<br>81477 solln<br>germany\n </address>', u'<address>\n donnersbergerstr. 30<br>80634 munich<br>germany\n </address>', u'<address>\n els\xe4sser stra\xdfe 24<br>81667 munich<br>germany\n </address>', u'<address>\n schluderstr. 40<br>80634 munich<br>germany\n </address>', u'<address>\n fliederstr. 23<br>82131 gauting<br>germany\n </address>'],
Why doesn't it come out in order, as {{name, address}, {name, address}}?
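That's because your locators match multiple elements and are not context-specific. An XPath expression that starts with // searches the whole document, even when it is called on a sub-selector such as obj. To search only inside the current element, the expression should start with a dot (.//). Fix it like this: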
    def parse(self, response):
        # One <li> per search result, so each item gets its own name and address.
        for obj in response.css("ul.search-results li"):
            item = yelpitem()
            # The leading dot keeps each lookup inside the current <li>.
            item['name'] = obj.xpath(".//div[@class='media-story']//h3//a/text()").extract()[0]
            item['address'] = ''.join(obj.xpath(".//div[@class='secondary-attributes']//address/text()").extract()).strip()
            yield item
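To see the difference between document-wide and context-specific lookups without running a spider, here is a minimal sketch using only the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. The markup, the names and the addresses are made up for illustration, and ElementTree supports only a subset of XPath, so treat this as an analogy rather than what Scrapy does internally. The functions scrape_unscoped and scrape_scoped are hypothetical helpers, not part of any library.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup standing in for a search results page.
PAGE = """
<ul class="search-results">
  <li>
    <div class="media-story"><h3><a>Salon A</a></h3></div>
    <div class="secondary-attributes"><address>1 Example Str.<br/>12345 Town</address></div>
  </li>
  <li>
    <div class="media-story"><h3><a>Salon B</a></h3></div>
    <div class="secondary-attributes"><address>2 Sample Weg<br/>67890 City</address></div>
  </li>
</ul>
""".strip()

root = ET.fromstring(PAGE)


def scrape_unscoped(root):
    """Mimics the bug: inside the loop, every lookup still searches the whole document."""
    items = []
    for _ in root.findall("li"):
        # Searching from `root` ignores the current <li>, so all names come back each time.
        items.append({"name": [a.text for a in root.findall(".//h3/a")]})
    return items


def scrape_scoped(root):
    """Searches relative to each <li>, so every item holds exactly one name and address."""
    items = []
    for li in root.findall("li"):
        name = li.find(".//div[@class='media-story']/h3/a").text
        address = li.find(".//div[@class='secondary-attributes']/address")
        # itertext() yields the text pieces on either side of the <br/> tags.
        parts = [text.strip() for text in address.itertext() if text.strip()]
        items.append({"name": name, "address": ", ".join(parts)})
    return items


if __name__ == "__main__":
    print(scrape_unscoped(root))
    print(scrape_scoped(root))
```

The unscoped version returns the full list of names for every item, which is the same shape as the output in the question. The scoped version returns one {name, address} pair per listing. Joining the address parts with ", " is a choice made here for readability; the answer above concatenates them with an empty string instead.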