Advanced Usage
This section covers how to use the Lassie class to maintain settings across all fetch calls.
Class Level Attributes
Creating a Lassie instance and calling its fetch method uses the default value of every parameter that fetch accepts.
>>> from lassie import Lassie
>>> l = Lassie()
>>> l.fetch('https://github.com/michaelhelmick')
{
'images': [{
'src': u'https://github.global.ssl.fastly.net/images/modules/logos_page/Octocat.png',
'type': u'og:image'
}, {
'src': u'https://github.com/favicon.ico',
'type': u'favicon'
}],
'url': 'https://github.com/michaelhelmick',
'description': u'michaelhelmick has 22 repositories written in Python, Shell, and JavaScript. Follow their code on GitHub.',
'videos': [],
'title': u'michaelhelmick (Mike Helmick) \xb7 GitHub'
}
>>> l.fetch('https://github.com/ashibble')
{
'images': [{
'src': u'https://github.global.ssl.fastly.net/images/modules/logos_page/Octocat.png',
'type': u'og:image'
}, {
'src': u'https://github.com/favicon.ico',
'type': u'favicon'
}],
'url': 'https://github.com/ashibble',
'description': u'Follow ashibble on GitHub and watch them build beautiful projects.',
'videos': [],
'title': u'ashibble (Alexander Shibble) \xb7 GitHub'
}
If you decide that you don’t want to filter for Open Graph data, then instead of passing open_graph=False to every fetch call:
>>> from lassie import Lassie
>>> l = Lassie()
>>> l.fetch('https://github.com/michaelhelmick', open_graph=False)
>>> l.fetch('https://github.com/ashibble', open_graph=False)
you can set the attribute once on a Lassie instance:
>>> from lassie import Lassie
>>> l = Lassie()
>>> l.open_graph = False
>>> l.fetch('https://github.com/michaelhelmick')
{
'images': [{
'src': u'https://github.com/favicon.ico',
'type': u'favicon'
}],
'url': 'https://github.com/michaelhelmick',
'description': u'michaelhelmick has 22 repositories written in Python, Shell, and JavaScript. Follow their code on GitHub.',
'videos': [],
'title': u'michaelhelmick (Mike Helmick) \xb7 GitHub'
}
>>> l.fetch('https://github.com/ashibble')
{
'images': [{
'src': u'https://github.com/favicon.ico',
'type': u'favicon'
}],
'url': 'https://github.com/ashibble',
'description': u'Follow ashibble on GitHub and watch them build beautiful projects.',
'videos': [],
'title': u'ashibble (Alexander Shibble) \xb7 GitHub'
}
Notice that the Open Graph data (here, the og:image entry) is missing from these responses. That’s because setting open_graph to False tells Lassie not to filter for those properties.
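To see exactly which entries the Open Graph lookup contributed, you can compare the images lists of the two responses by their src and type fields. The sketch below uses the image entries copied from the michaelhelmick responses above; the helper entries_only_in is hypothetical and not part of Lassie.

```python
# Image entries copied from the two michaelhelmick responses above.
with_og = [
    {'src': 'https://github.global.ssl.fastly.net/images/modules/logos_page/Octocat.png',
     'type': 'og:image'},
    {'src': 'https://github.com/favicon.ico', 'type': 'favicon'},
]
without_og = [
    {'src': 'https://github.com/favicon.ico', 'type': 'favicon'},
]


def entries_only_in(images_a, images_b):
    """Hypothetical helper: image entries present in images_a but not in images_b."""
    seen = {(img['src'], img['type']) for img in images_b}
    return [img for img in images_a if (img['src'], img['type']) not in seen]


missing = entries_only_in(with_og, without_og)
print([img['type'] for img in missing])  # ['og:image']
```

Only the og:image entry differs; the favicon is found either way.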
If you occasionally need to override the attribute, pass the parameter to fetch directly. For that call, Lassie uses the value you passed instead of the attribute.
>>> from lassie import Lassie
>>> l = Lassie()
>>> l.open_graph = False
>>> l.fetch('https://github.com/michaelhelmick')
{
'images': [{
'src': u'https://github.com/favicon.ico',
'type': u'favicon'
}],
'url': 'https://github.com/michaelhelmick',
'description': u'michaelhelmick has 22 repositories written in Python, Shell, and JavaScript. Follow their code on GitHub.',
'videos': [],
'title': u'michaelhelmick (Mike Helmick) \xb7 GitHub'
}
>>> l.fetch('https://github.com/ashibble', open_graph=True)
{
'images': [{
'src': u'https://github.global.ssl.fastly.net/images/modules/logos_page/Octocat.png',
'type': u'og:image'
}, {
'src': u'https://github.com/favicon.ico',
'type': u'favicon'
}],
'url': 'https://github.com/ashibble',
'description': u'Follow ashibble on GitHub and watch them build beautiful projects.',
'videos': [],
'title': u'ashibble (Alexander Shibble) \xb7 GitHub'
}
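The attribute-plus-override behavior can be sketched as a common Python pattern: the per-call keyword defaults to None, and None means "fall back to the instance attribute". This is a hypothetical illustration (SettingsDemo is made up), not Lassie’s actual implementation. The demo’s default of True mirrors the first example above, where Open Graph data was returned without any extra arguments.

```python
class SettingsDemo:
    """Hypothetical stand-in showing instance-wide defaults with per-call overrides."""

    def __init__(self):
        self.open_graph = True  # instance-wide default, like l.open_graph

    def fetch(self, url, open_graph=None):
        # None means "not passed": fall back to the instance attribute.
        if open_graph is None:
            open_graph = self.open_graph
        return {'url': url, 'open_graph': open_graph}


d = SettingsDemo()
d.open_graph = False
first = d.fetch('https://github.com/michaelhelmick')               # uses the attribute (False)
second = d.fetch('https://github.com/ashibble', open_graph=True)   # per-call override (True)
```

Using None rather than True or False as the keyword default is what lets an explicit open_graph=False override an attribute set to True, and vice versa.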
Manipulate the Request (headers, proxies, etc.)
Sometimes you may want to turn off SSL verification, send custom headers, or route the request through a proxy. Lassie uses the requests library to make web requests, and requests accepts several parameters that let developers manipulate the actual HTTP request. You set these options in the request_opts dict on a Lassie instance, using the requests parameter names as keys.
Here is an example of sending custom headers with a Lassie request:
from lassie import Lassie

l = Lassie()
# Custom headers sent with the requests this instance makes
l.request_opts = {
    'headers': {
        'User-Agent': 'python lassie'
    }
}
l.fetch('http://google.com')
You may also want to set a request timeout. Here’s another example:
from lassie import Lassie

l = Lassie()
l.request_opts = {
    'timeout': 10  # seconds
}
# If the server doesn't respond within 10 seconds, the request fails
# (in requests, this limits the wait for the server, not the total download time)
l.fetch('http://google.com')
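The opening paragraph of this section also mentions turning off SSL verification and using proxies. In requests these correspond to the verify and proxies keyword arguments, and timeout also accepts a (connect, read) tuple. The sketch below builds such a dict and forwards it with ** unpacking to fake_get, a hypothetical stand-in for an HTTP call, so it runs without a network connection. The proxy addresses are placeholders. It assumes, as the examples above suggest, that Lassie hands request_opts to requests as keyword arguments.

```python
# Options named after requests' keyword arguments; proxy URLs are placeholders.
request_opts = {
    'verify': False,  # skip SSL certificate verification (use with care)
    'proxies': {
        'http': 'http://10.10.1.10:3128',
        'https': 'http://10.10.1.10:1080',
    },
    'timeout': (3.05, 27),  # (connect timeout, read timeout) in seconds
}


def fake_get(url, **kwargs):
    """Hypothetical stand-in for an HTTP call that just records its arguments."""
    return {'url': url, 'kwargs': kwargs}


# Unpacking the dict passes each entry as a separate keyword argument.
call = fake_get('http://google.com', **request_opts)
print(sorted(call['kwargs']))  # ['proxies', 'timeout', 'verify']
```

With a real Lassie instance you would assign the dict with l.request_opts = request_opts before calling l.fetch(...).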
Playing Nice with non-HTML Files
Sometimes you may want to grab information about an image or another type of file. Only images are supported, but for those you can retrieve a nicely structured dict.
Pass handle_file_content=True to lassie.fetch, or set it as an attribute on a Lassie instance:
>>> import lassie
>>> lassie.fetch('https://camo.githubusercontent.com/d19b279de191489445d8cfd39faf93e19ca2df14/68747470733a2f2f692e696d6775722e636f6d2f5172764e6641582e676966', handle_file_content=True)
{
'title': '68747470733a2f2f692e696d6775722e636f6d2f5172764e6641582e676966',
'videos': [],
'url': 'https://camo.githubusercontent.com/d19b279de191489445d8cfd39faf93e19ca2df14/68747470733a2f2f692e696d6775722e636f6d2f5172764e6641582e676966',
'images': [{
'type': 'body_image',
'src': 'https://camo.githubusercontent.com/d19b279de191489445d8cfd39faf93e19ca2df14/68747470733a2f2f692e696d6775722e636f6d2f5172764e6641582e676966'
}]
}
>>> lassie.fetch('http://2.bp.blogspot.com/-vzGgFFtW-VY/Tz-eozaHw3I/AAAAAAAAM3k/OMvxpFYr23s/s1600/The-best-top-desktop-cat-wallpapers-10.jpg', handle_file_content=True)
{
'title': 'The-best-top-desktop-cat-wallpapers-10.jpg',
'images': [{
'type': 'body_image',
'src': 'http://2.bp.blogspot.com/-vzGgFFtW-VY/Tz-eozaHw3I/AAAAAAAAM3k/OMvxpFYr23s/s1600/The-best-top-desktop-cat-wallpapers-10.jpg'
}],
'videos': [],
'url': 'http://2.bp.blogspot.com/-vzGgFFtW-VY/Tz-eozaHw3I/AAAAAAAAM3k/OMvxpFYr23s/s1600/The-best-top-desktop-cat-wallpapers-10.jpg'
}
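In both responses above the dict has the same shape: the title is the last segment of the URL path, and the URL appears both as url and as the single body_image entry. The sketch below rebuilds that shape from a URL using only the standard library. It is a hypothetical reconstruction of the output format (image_url_to_dict is a made-up name), not Lassie’s implementation, and it neither downloads nor inspects the file.

```python
import posixpath
from urllib.parse import urlparse


def image_url_to_dict(url):
    """Hypothetical: build a dict shaped like the image-file responses above."""
    # Title: last segment of the URL path, e.g. 'cat.jpg' for '/images/cat.jpg'
    title = posixpath.basename(urlparse(url).path)
    return {
        'title': title,
        'url': url,
        'images': [{'type': 'body_image', 'src': url}],
        'videos': [],
    }


info = image_url_to_dict(
    'http://2.bp.blogspot.com/-vzGgFFtW-VY/Tz-eozaHw3I/AAAAAAAAM3k/OMvxpFYr23s/s1600/'
    'The-best-top-desktop-cat-wallpapers-10.jpg'
)
print(info['title'])  # The-best-top-desktop-cat-wallpapers-10.jpg
```

Lassie itself may derive these fields differently; the sketch only matches the two outputs shown.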