Starting Out

This section outlines the most basic uses of Lassie.


What Lassie Returns

Lassie aims to return a clean, well-structured dictionary of the most important information about a web page.

Beginning

So, let’s say you want to retrieve details about a YouTube video.

Specifically: http://www.youtube.com/watch?v=dQw4w9WgXcQ

>>> import lassie
>>> lassie.fetch('http://www.youtube.com/watch?v=dQw4w9WgXcQ')
{
    'description': u'Music video by Rick Astley performing Never Gonna Give You Up. YouTube view counts pre-VEVO: 2,573,462 (C) 1987 PWL',
    'videos': [{
        'src': u'http://www.youtube.com/v/dQw4w9WgXcQ?version=3&autohide=1',
        'height': 480,
        'type': u'application/x-shockwave-flash',
        'width': 640
    }, {
        'src': u'https://www.youtube.com/embed/dQw4w9WgXcQ',
        'height': 480,
        'width': 640
    }],
    'title': u'Rick Astley - Never Gonna Give You Up',
    'url': u'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
    'keywords': [u'Rick', u' Astley', u' Sony', u' BMG', u' Music', u' UK', u' Pop'],
    'images': [{
        'src': u'http://i1.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg?feature=og',
        'type': u'og:image'
    }, {
        'src': u'http://i1.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg',
        'type': u'twitter:image'
    }, {
        'src': u'http://s.ytimg.com/yts/img/favicon-vfldLzJxy.ico',
        'type': u'favicon'
    }, {
        'src': u'http://s.ytimg.com/yts/img/favicon_32-vflWoMFGx.png',
        'type': u'favicon'
    }],
    'locale': u'en_US'
}
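Once you have that dictionary, it is ordinary Python data. Here is a minimal sketch that works on a trimmed copy of the YouTube result above. In that output the keywords appear to be split on commas without trimming whitespace, so most carry a leading space. Also, not every video entry has a 'type' key. The variable names here are illustrative only and are not part of Lassie.

```python
# A trimmed copy of the dictionary returned for the YouTube example above.
result = {
    'title': 'Rick Astley - Never Gonna Give You Up',
    'videos': [
        {'src': 'http://www.youtube.com/v/dQw4w9WgXcQ?version=3&autohide=1',
         'height': 480, 'type': 'application/x-shockwave-flash', 'width': 640},
        {'src': 'https://www.youtube.com/embed/dQw4w9WgXcQ',
         'height': 480, 'width': 640},
    ],
    'keywords': ['Rick', ' Astley', ' Sony', ' BMG', ' Music', ' UK', ' Pop'],
}

# Strip the stray whitespace around each keyword.
keywords = [k.strip() for k in result.get('keywords', [])]

# Keep only non-Flash videos; .get() is used because 'type' may be missing.
non_flash = [v['src'] for v in result.get('videos', [])
             if v.get('type') != 'application/x-shockwave-flash']
```

Using `.get()` with a default keeps the code safe for pages where Lassie finds no keywords or videos.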

Or what if you wanted to get information about an article?

Specifically: http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/

>>> import lassie
>>> lassie.fetch('http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/')
{
    'description': u"GitHub has surpassed the 3 million-developer mark, a milestone for the collaborative platform for application development.\xa0GitHub said it happened Monday night on the first day of the company's\xa0all-hands winter summit. Launched\xa0in April 2008, GitHub\xa0celebrated\xa0its first million users in..",
    'videos': [],
    'title': u'GitHub Passes The 3 Million Developer Mark | TechCrunch',
    'url': u'http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/',
    'locale': u'en_US',
    'images': [{
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png?w=150',
        'type': u'og:image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png',
        'type': u'twitter:image'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g',
        'type': u'favicon'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g',
        'type': u'favicon'
    }]
}
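Notice that the same favicon URL shows up twice in the images list. This is probably because the page declares it in more than one <link> tag. If you only want distinct images, a small helper can drop repeats while keeping the original order. `unique_images` is a hypothetical helper written for this example; it is not part of Lassie.

```python
# The images list from the TechCrunch result above.
images = [
    {'src': 'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png?w=150', 'type': 'og:image'},
    {'src': 'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png', 'type': 'twitter:image'},
    {'src': 'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g', 'type': 'favicon'},
    {'src': 'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g', 'type': 'favicon'},
]


def unique_images(images):
    """Return images with duplicate 'src' values removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for image in images:
        if image['src'] not in seen:
            seen.add(image['src'])
            unique.append(image)
    return unique


deduped = unique_images(images)
```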

By default, Lassie also collects Twitter Card data and grabs favicons and touch icons.

Priorities

Open Graph values take priority over other values (Twitter Card data, generic data, etc.).

In other words, suppose a web page sets its title as <title>YouTube</title> and also sets an Open Graph title with <meta property="og:title" content="YouTube | A Video Sharing Site" />.

When you fetch that page, the returned title will be “YouTube | A Video Sharing Site” instead of just “YouTube”.
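The priority rule can be illustrated with a small standalone sketch built on the standard library's html.parser. This is not Lassie's implementation. It only mimics the rule described here for the title. The relative order of Twitter Card titles and the plain <title> tag is an assumption, because the text only says Open Graph wins. `TitleCollector` and `pick_title` are hypothetical names.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect candidate titles from <title>, og:title and twitter:title tags."""

    def __init__(self):
        super().__init__()
        self.candidates = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta':
            # Open Graph uses property="...", Twitter Cards commonly use name="...".
            key = attrs.get('property') or attrs.get('name')
            if key in ('og:title', 'twitter:title') and attrs.get('content'):
                self.candidates.setdefault(key, attrs['content'])

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.candidates.setdefault('title', data.strip())


# Highest priority first. Open Graph first matches the docs; the rest is assumed.
PRIORITY = ('og:title', 'twitter:title', 'title')


def pick_title(html):
    """Return the highest-priority title found in html, or None."""
    parser = TitleCollector()
    parser.feed(html)
    for key in PRIORITY:
        if key in parser.candidates:
            return parser.candidates[key]
    return None


page = ('<html><head><title>YouTube</title>'
        '<meta property="og:title" content="YouTube | A Video Sharing Site" />'
        '</head></html>')
```

Calling `pick_title(page)` returns the Open Graph title, not the plain "YouTube".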

But what if I don’t want Open Graph data?

Then pass open_graph=False to the fetch method.

>>> lassie.fetch('http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/', open_graph=False)
{
    'description': u"GitHub has surpassed the 3 million-developer mark, a milestone for the collaborative platform for application development.\xa0GitHub said it happened Monday night on the first day of the company's\xa0all-hands winter summit. Launched\xa0in April 2008, GitHub\xa0celebrated\xa0its first million users in..",
    'videos': [],
    'title': u'GitHub Passes The 3 Million Developer Mark | TechCrunch',
    'url': u'http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/',
    'locale': u'en_US',
    'images': [{
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png',
        'type': u'twitter:image'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g',
        'type': u'favicon'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g',
        'type': u'favicon'
    }]
}

If you don’t want Twitter Card data, favicons or touch icons, pass any combination of the following parameters to fetch:

  • Pass twitter_card=False to skip Twitter Card data
  • Pass touch_icon=False to keep Apple touch icons out of the images list
  • Pass favicon=False to keep favicons out of the images list
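Repeating these flags on every call can get tedious. One plain-Python option is functools.partial, which pre-binds keyword arguments to a function. To keep this sketch runnable offline, `fake_fetch` is a hypothetical stand-in that just echoes its inputs. In real code you would pass lassie.fetch in its place.

```python
from functools import partial


def fake_fetch(url, **options):
    """Stand-in for lassie.fetch so this sketch runs offline; it echoes its inputs."""
    return {'url': url, 'options': options}


# Pre-bind the three opt-out flags once instead of repeating them on every call.
# In real code, replace fake_fetch with lassie.fetch.
bare_fetch = partial(fake_fetch, twitter_card=False, touch_icon=False, favicon=False)

result = bare_fetch('http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/')
```

Keywords given at call time override the pre-bound ones. For example, `bare_fetch(url, favicon=True)` turns favicons back on for that single call.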

Obtaining All Images

To obtain a list of all the images on a web page, pass all_images=True to fetch.

>>> lassie.fetch('http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/', all_images=True)
{
    'description': u"GitHub has surpassed the 3 million-developer mark, a milestone for the collaborative platform for application development.\xa0GitHub said it happened Monday night on the first day of the company's\xa0all-hands winter summit. Launched\xa0in April 2008, GitHub\xa0celebrated\xa0its first million users in..",
    'videos': [],
    'title': u'GitHub Passes The 3 Million Developer Mark | TechCrunch',
    'url': u'http://techcrunch.com/2013/01/16/github-passes-the-3-million-developer-mark/',
    'locale': u'en_US',
    'images': [{
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png?w=150',
        'type': u'og:image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png',
        'type': u'twitter:image'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g',
        'type': u'favicon'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/favicon.ico?m=1357660109g',
        'type': u'favicon'
    }, {
        'src': u'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/site-logo-cutout.png?m=1342508617g',
        'alt': u'',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/countdown4.jpg?w=640',
        'alt': u'Main Event Page',
        'type': u'body_image'
    }, {
        'src': u'http://2.gravatar.com/avatar/b4e205744ae2f9b44921d103b4d80e54?s=60&d=identicon&r=G',
        'alt': u'',
        'height': 60,
        'type': u'body_image',
        'width': 60
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png?w=300',
        'alt': u'github-logo',
        'height': 300,
        'type': u'body_image',
        'width': 300
    }, {
        'src': u'http://crunchbase.com/assets/images/resized/0001/7208/17208v9-max-150x150.png',
        'alt': u'',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/tardis-egg.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/made-in-space-zero-gravity.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/04/apple1.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/p9130014.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/htc.png?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/screen-shot-2013-08-13-at-8-18-25-pm.png?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/24112v5-max-250x250.jpg?w=89&h=63&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/surface-14.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/sprawl_tuned_robot.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/ashton-kutcher-jobs.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/facebook-commerce.png?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/screen-shot-2013-08-14-at-10-23-20-am.png?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2012/10/ibm_logo.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/screen-shot-2013-08-15-at-12-09-16.png?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/inklogo.jpg?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }, {
        'src': u'http://tctechcrunch2011.files.wordpress.com/2013/08/screen-shot-2013-08-15-at-9-31-21-am.png?w=89&h=64&crop=1',
        'alt': '',
        'type': u'body_image'
    }]
}
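With all_images=True, images found in the page body are tagged 'body_image'. Only some of them carry width and height, presumably when the page's <img> tag declares them. As an example of working with this list, the sketch below picks the body image with the largest declared area from a trimmed copy of the output above. `largest_body_image` is a hypothetical helper, not a Lassie function. Images with no declared size are simply ignored.

```python
# A trimmed copy of the images list from the all_images=True output above.
images = [
    {'src': 'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png?w=150', 'type': 'og:image'},
    {'src': 'http://2.gravatar.com/avatar/b4e205744ae2f9b44921d103b4d80e54?s=60&d=identicon&r=G',
     'alt': '', 'height': 60, 'type': 'body_image', 'width': 60},
    {'src': 'http://tctechcrunch2011.files.wordpress.com/2013/01/github-logo.png?w=300',
     'alt': 'github-logo', 'height': 300, 'type': 'body_image', 'width': 300},
    {'src': 'http://tctechcrunch2011.files.wordpress.com/2013/08/htc.png?w=89&h=64&crop=1',
     'alt': '', 'type': 'body_image'},
]


def largest_body_image(images):
    """Return the body_image with the largest declared area, or None if none declare a size."""
    sized = [img for img in images
             if img.get('type') == 'body_image' and 'width' in img and 'height' in img]
    if not sized:
        return None
    return max(sized, key=lambda img: img['width'] * img['height'])


best = largest_body_image(images)
```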

So, now you know the basics. What if you don’t want to pass the same parameters to fetch every time? Head over to the advanced usage section to learn about the Lassie class.