I’m writing a small crawler with Scrapy. I want to be able to pass the start_url argument to my spider, which will later let me run it via Celery (or something else).
I’ve hit a wall with passing arguments, and I’m getting this error:
2016-03-13 08:50:50 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2016-03-13 08:50:50 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 153, in crawl
d = crawler.crawl(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 70, in crawl
self.spider = self._create_spider(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 80, in _create_spider
return self.spidercls.from_crawler(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/crawl.py", line 91, in from_crawler
spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 50, in from_crawler
spider = cls(*args, **kwargs)
exceptions.TypeError: __init__() takes at least 3 arguments (1 given)
2016-03-13 08:50:50 [twisted] CRITICAL:
The spider code is below:

import datetime

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as in the traceback above

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
# OnetItem is the project's item class; its import is not shown in the original.


class OnetSpider(CrawlSpider):
    name = 'OnetSpider'

    def __init__(self, ur, *args, **kwargs):
        super(OnetSpider, self).__init__(*args, **kwargs)
        self.start_urls = [kwargs.get('start_url')]

    #allowed_domains = ['katalog.onet.pl']
    #start_urls = ['http://katalog.onet.pl/']
    response_url = ""

    rules = [Rule(LinkExtractor(unique=True),
                  callback="parse_items",
                  follow=True)]

    def parse_start_url(self, response):
        # Remember the start page URL so parse_items can skip internal links.
        self.response_url = response.url
        return self.parse_items(response)

    def parse_items(self, response):
        baseDomain = self.get_base_domain(self.response_url)
        for sel in response.xpath('//a'):
            l = sel.xpath('@href').extract()[0]
            t = sel.xpath('text()').extract()
            if self.is_relative(l) or baseDomain.upper() in l.upper():
                # Skip relative links and links back to the crawled domain.
                continue
            itm = OnetItem()
            itm['anchorTitle'] = t
            itm['link'] = self.process_url(l)
            itm['timeStamp'] = datetime.datetime.now()
            itm['isChecked'] = 0
            itm['responseCode'] = 0
            itm['redirecrURL'] = ''
            yield itm

    def is_relative(self, url):
        # True if url is a relative path (it has no network location).
        return urlparse(url).netloc == ""

    def get_base_domain(self, url):
        # Returns the host name stripped of www./ftp. and any port.
        base = urlparse(url).netloc
        if base.upper().startswith("WWW."):
            base = base[4:]
        if base.upper().startswith("FTP."):
            base = base[4:]
        base = base.split(':')[0]
        return base

    def process_url(self, url):
        u = urlparse(url)
        if u.scheme == '':
            # ParseResult is an immutable namedtuple, so build a modified copy.
            u = u._replace(scheme='http')
        finalURL = u.scheme + '://' + u.netloc + '/'
        return finalURL.lower()
I’m pretty sure it has something to do with passing arguments, because without the def __init__ the spider runs fine.
Any idea what’s the issue?
I’m running this on my VPS Ubuntu server.
2019-03-04 13:01:00 [scrapy.crawler] INFO: Overridden settings: {}
2019-03-04 13:01:00 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-03-04 13:01:01 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-03-04 13:01:01 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-03-04 13:01:01 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-03-04 13:01:01 [scrapy.core.engine] INFO: Spider opened
Unhandled error in Deferred:
2019-03-04 13:01:01 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "site-packages\scrapy\crawler.py", line 172, in crawl
File "site-packages\scrapy\crawler.py", line 176, in _crawl
File "site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
File "site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
--- <exception caught here> ---
File "site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
File "site-packages\scrapy\crawler.py", line 82, in crawl
builtins.ModuleNotFoundError: No module named '_sqlite3'
Contents
- Twisted critical unhandled error in deferred
- Re: Unhandled error in Deferred:
- Re: Unhandled error in Deferred:
- Re: Unhandled error in Deferred:
- Re: Unhandled error in Deferred:
- [twisted] CRITICAL: Unhandled error in Deferred #2402
- scrapy version -v
- [twisted] CRITICAL: Unhandled error in Deferred: #1
- Deferred Reference
- Deferreds
- Callbacks
- Multiple callbacks
- Visual Explanation
- Errbacks
- Unhandled Errors
- Handling either synchronous or asynchronous results
- Handling possible Deferreds in the library code
- Cancellation
- Motivation
- Cancellation for Applications which Consume Deferreds
- Default Cancellation Behavior
- Creating Cancellable Deferreds: Custom Cancellation Functions
- Timeouts
- DeferredList
- Other behaviours
- gatherResults
- Class Overview
- Basic Callback Functions
- Chaining Deferreds
- See also
Twisted critical unhandled error in deferred
Post by SSamiK » Sun Jan 24, 2021 1:17 pm
So this happened on a fresh install of Linux Mint. I installed Deluge and deluged as per my normal routine.
Set everything up so that it worked.
But there were a couple of issues. First off, not all torrents showed in the thin client, so I had to restart it a few times to get everything to show, even on the host that runs both deluge and deluged. Labels would not do "move when completed" as set up.
And all of a sudden deluged just won’t start anymore.
Throws out this bunch of lines:
lsb_release -a
No LSB modules are available.
Distributor ID: Linuxmint
Description: Linux Mint 20.1
Release: 20.1
Codename: ulyssa
deluged --version
deluged 2.0.3
libtorrent: 1.2.12.0
Python: 3.8.5
OS: Linux Linux Mint 20.1 ulyssa
Any ideas what I can do to fix this? Deluge has always been my favorite client, so I really don’t want to move to something else.
Re: Unhandled error in Deferred:
Post by SSamiK » Mon Jan 25, 2021 1:19 pm
I have not been able to solve this yet, so if anyone can chime in I would appreciate it very much.
I tried purging Deluge and deluged from my system and reinstalling, but no change. I admit I haven’t spent a ton of time trying to figure this out, but that’s life. I just can’t find the same amount of time to spend on things like this as in my younger years.
For the time being I’m checking out rTorrent and Qbit as possible replacements; both lack a thin client, so they might not fill the Deluge gap, but they’ll have to do for now.
Re: Unhandled error in Deferred:
Post by shamael » Tue Jan 26, 2021 7:55 am
Re: Unhandled error in Deferred:
Post by ross104 » Tue Jan 26, 2021 6:42 pm
Install patch log.py
It helped me, might help you.
Re: Unhandled error in Deferred:
Post by mentor » Fri Feb 19, 2021 11:17 am
ross104 wrote:
Install patch log.py
It helped me, might help you.
How does one install a patch? I’m using the ppa:deluge-team/stable ppa.
Source
[twisted] CRITICAL: Unhandled error in Deferred #2402
Can you share your spider code? Without more information, I’m tempted to think this has to do with your spider
What system are you on? What does scrapy version -v output?
Have you googled the error? I can see a few questions on StackOverflow with answers mentioning that the sqlite3 installation is somehow corrupt.
scrapy version -v gives more information. Can you paste that output instead of scrapy -V?
Also, can you try:
scrapy version -v
/home/tutorial/tutorial/spiders/dmoz_spider.py:1: ScrapyDeprecationWarning: Module scrapy.spider is deprecated, use scrapy.spiders instead
from scrapy.spider import BaseSpider
/home/tutorial/tutorial/spiders/dmoz_spider.py:3: ScrapyDeprecationWarning: tutorial.spiders.dmoz_spider.DmozSpider inherits from deprecated class scrapy.spiders.BaseSpider, please inherit from scrapy.spiders.Spider. (warning only on first subclass, there may be others)
class DmozSpider(BaseSpider):
Scrapy : 1.2.1
lxml : 3.6.4.0
libxml2 : 2.7.6
Twisted : 16.6.0rc1
Python : 2.7.6 (default, Nov 21 2016, 21:01:03) - [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]
pyOpenSSL : 16.2.0 (OpenSSL 1.0.1e-fips 11 Feb 2013)
Platform : Linux-2.6.32-573.18.1.el6.x86_64-x86_64-with-centos-6.7-Final
Can you try importing sqlite3 from a Python shell?
If that fails, maybe you need to check your Python installation.
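The suggested check can also be scripted. A minimal sketch (the helper name check_module is hypothetical) that reports whether a module imports, which separates a missing _sqlite3 extension from a problem in Scrapy itself:

```python
import importlib


def check_module(name):
    """Return (True, None) if `name` can be imported, else (False, reason)."""
    try:
        importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, reason = check_module("sqlite3")
    print("sqlite3 OK" if ok else "sqlite3 unavailable: " + reason)
```

If it reports _sqlite3 as missing, the Python build itself lacks SQLite support (commonly because the SQLite development files were absent when Python was compiled), which matches the fix reported below.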
Following your suggestion, I found the problem: the SQLite3 module did not exist. It is solved now.
Thank you very much. You can leave a WeChat?
@Lampere1021 , I don’t know what you mean
WeChat is a communication tool in China, similar to email. @redapple
Right, but what do you mean by "you can leave a WeChat?" (if the question is "do I have a WeChat account", the answer is no)
Source
[twisted] CRITICAL: Unhandled error in Deferred: #1
(xueqiu_user) E:\git_root\xueqiu\xueqiu_user>python run.py
2019-03-07 18:43:48 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: xueqiuCrawler)
2019-03-07 18:43:48 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 2.7.13 | Continuum Analytics, Inc.| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-7-6.1.7601-SP1
2019-03-07 18:43:48 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'xueqiuCrawler.spiders', 'SPIDER_MODULES': ['xueqiuCrawler.spiders'], 'BOT_NAME': 'xueqiuCrawler', 'COOKIES_ENABLED': False, 'SCHEDULER': 'xueqiuCrawler.scrapy_redis.scheduler.Scheduler', 'DOWNLOAD_DELAY': 3}
2019-03-07 18:43:48 [scrapy.extensions.telnet] INFO: Telnet Password: b2a63b4ecc111b7b
2019-03-07 18:43:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2019-03-07 18:43:48 [xueqiu] DEBUG: Reading URLs from redis list 'xueqiu:start_urls'
2019-03-07 18:43:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-03-07 18:43:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-03-07 18:43:48 [scrapy.middleware] INFO: Enabled item pipelines:
['xueqiuCrawler.pipelines.DuplicatesPipeline',
'xueqiuCrawler.pipelines.MongoPipeline']
2019-03-07 18:43:48 [scrapy.core.engine] INFO: Spider opened
2019-03-07 18:43:50 [scrapy.core.engine] INFO: Closing spider (shutdown)
2019-03-07 18:43:50 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\core\engine.py", line 311, in <lambda>
dfd.addBoth(lambda _: self.scraper.close_spider(spider))
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\core\scraper.py", line 86, in close_spider
slot.closing = defer.Deferred()
AttributeError: 'NoneType' object has no attribute 'closing'
2019-03-07 18:43:50 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'shutdown',
'finish_time': datetime.datetime(2019, 3, 7, 10, 43, 50, 963000),
'log_count/DEBUG': 1,
'log_count/ERROR': 1,
'log_count/INFO': 7}
2019-03-07 18:43:50 [scrapy.core.engine] INFO: Spider closed (shutdown)
Unhandled error in Deferred:
2019-03-07 18:43:51 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\crawler.py", line 172, in crawl
return self._crawl(crawler, *args, **kwargs)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\crawler.py", line 176, in _crawl
d = crawler.crawl(*args, **kwargs)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
_inlineCallbacks(None, g, status)
--- <exception caught here> ---
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\crawler.py", line 98, in crawl
six.reraise(*exc_info)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\crawler.py", line 82, in crawl
yield self.engine.open_spider(self.spider, start_requests)
redis.exceptions.ConnectionError: Error 10061 connecting to localhost:6379. .
2019-03-07 18:43:51 [twisted] CRITICAL:
Traceback (most recent call last):
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\crawler.py", line 98, in crawl
six.reraise(*exc_info)
File "E:\git_root\xueqiu\xueqiu_user\lib\site-packages\scrapy\crawler.py", line 82, in crawl
yield self.engine.open_spider(self.spider, start_requests)
ConnectionError: Error 10061 connecting to localhost:6379. .
Source
Deferred Reference
This document is a guide to the behaviour of the twisted.internet.defer.Deferred object, and to various ways you can use them when they are returned by functions.
This document assumes that you are familiar with the basic principle that the Twisted framework is structured around: asynchronous, callback-based programming, where instead of having blocking code in your program or using threads to run blocking code, you have functions that return immediately and then begin a callback chain when data is available.
After reading this document, the reader should expect to be able to deal with most simple APIs in Twisted and Twisted-using code that return Deferreds. It covers:
- what sorts of things you can do when you get a Deferred from a function call; and
- how you can write your code to robustly handle errors in Deferred code.
Deferreds
Twisted uses the Deferred object to manage the callback sequence. The client application attaches a series of functions to the deferred to be called in order when the results of the asynchronous request are available (this series of functions is known as a series of callbacks, or a callback chain), together with a series of functions to be called if there is an error in the asynchronous request (known as a series of errbacks or an errback chain). The asynchronous library code calls the first callback when the result is available, or the first errback when an error occurs, and the Deferred object then hands the results of each callback or errback function to the next function in the chain.
Callbacks
A twisted.internet.defer.Deferred is a promise that a function will at some point have a result. We can attach callback functions to a Deferred, and once it gets a result these callbacks will be called. In addition, Deferreds allow the developer to register a callback for an error, with the default behavior of logging the error. The deferred mechanism standardizes the application programmer’s interface with all sorts of blocking or delayed operations.
Multiple callbacks
Multiple callbacks can be added to a Deferred. The first callback in the Deferred’s callback chain will be called with the result, the second with the result of the first callback, and so on. Why do we need this? Well, consider a Deferred returned by twisted.enterprise.adbapi, which fires with the result of a SQL query. A web widget might add a callback that converts this result into HTML, and pass the Deferred onwards, where the callback will be used by Twisted to return the result to the HTTP client. The callback chain will be bypassed in case of errors or exceptions.
Pay particular attention to the handling of self.d in the gotResults method. Before the Deferred is fired with a result or an error, the attribute is set to None so that the Getter instance no longer has a reference to the Deferred about to be fired. This has several benefits. First, it avoids any chance Getter.gotResults will accidentally fire the same Deferred more than once (which would result in an AlreadyCalledError exception). Second, it allows a callback on that Deferred to call Getter.getDummyData (which sets a new value for the d attribute) without causing problems. Third, it makes the Python garbage collector’s job easier by eliminating a reference cycle.
Visual Explanation
Errbacks
Deferred’s error handling is modeled after Python’s exception handling. In the case that no errors occur, all the callbacks run, one after the other, as described above.
If the errback is called instead of the callback (e.g. because a DB query raised an error), then a twisted.python.failure.Failure is passed into the first errback (you can add multiple errbacks, just like with callbacks). You can think of your errbacks as being like except blocks of ordinary Python code.
Unless you explicitly raise an error in an except block, the Exception is caught and stops propagating, and normal execution continues. The same thing happens with errbacks: unless you explicitly return a Failure or (re-)raise an exception, the error stops propagating, and normal callbacks continue executing from that point (using the value returned from the errback). If the errback does return a Failure or raise an exception, then that is passed to the next errback, and so on.
Note: If an errback doesn’t return anything, then it effectively returns None, meaning that callbacks will continue to be executed after this errback. This may not be what you expect to happen, so be careful. Make sure your errbacks return a Failure (probably the one that was passed to it), or a meaningful return value for the next callback.
Also, twisted.python.failure.Failure instances have a useful method called trap, allowing you to do the effective equivalent of an except clause that names only specific exception types.
If none of the arguments passed to failure.trap match the error encapsulated in that Failure, then it re-raises the error.
There’s another potential “gotcha” here. There’s a method twisted.internet.defer.Deferred.addCallbacks which is similar to, but not exactly the same as, addCallback followed by addErrback. In particular, consider two cases. In Case 1, callback1 is added with addCallback (line A) and errback1 with addErrback (line B), followed by callback2 and errback2 added the same way. In Case 2, the pairs are added with addCallbacks(callback1, errback1) (line C) and addCallbacks(callback2, errback2).
If an error occurs in callback1, then for Case 1 errback1 will be called with the failure. For Case 2, errback2 will be called. Be careful with your callbacks and errbacks.
What this means in a practical sense is that in Case 1, the callback in line A will handle a success condition from getDeferredFromSomewhere, and the errback in line B will handle any errors that occur from either the upstream source, or that occur in A. In Case 2, the errback in line C will only handle an error condition raised by getDeferredFromSomewhere; it will not do any handling of errors raised in callback1.
Unhandled Errors
If a Deferred is garbage-collected with an unhandled error (i.e. it would call the next errback if there was one), then Twisted will write the error’s traceback to the log file. This means that you can typically get away with not adding errbacks and still get errors logged. Be careful though; if you keep a reference to the Deferred around, preventing it from being garbage-collected, then you may never see the error (and your callbacks will mysteriously seem to have never been called). If unsure, you should explicitly add an errback after your callbacks, even if all it does is log the error.
Handling either synchronous or asynchronous results
In some applications, there are functions that might be either asynchronous or synchronous. For example, a user authentication function might be able to check in memory whether a user is authenticated, allowing the authentication function to return an immediate result, or it may need to wait on network data, in which case it should return a Deferred to be fired when that data arrives. However, a function that wants to check if a user is authenticated will then need to accept both immediate results and Deferreds.
Consider a library function, authenticateUser, that uses the application function isValidUser to authenticate a user and then reports whether the user is authenticated.
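A minimal sketch of that synchronous version. Returning the message as well as printing it is an addition here so the result is easy to check, and knownUsersOnly is a hypothetical validator:

```python
def authenticateUser(isValidUser, user):
    # Assumes isValidUser answers immediately with True or False.
    if isValidUser(user):
        message = "User is authenticated"
    else:
        message = "User is not authenticated"
    print(message)
    return message


def knownUsersOnly(user):
    # Hypothetical validator for the usage example.
    return user in ("Alice", "Angus", "Agnes")


authenticateUser(knownUsersOnly, "Alice")    # prints "User is authenticated"
authenticateUser(knownUsersOnly, "Mallory")  # prints "User is not authenticated"
```

If isValidUser returned a Deferred instead, the if test would see a truthy Deferred object and report every user as authenticated, which is the problem the next paragraph describes.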
However, it assumes that isValidUser returns immediately, whereas isValidUser may actually authenticate the user asynchronously and return a Deferred. It is possible to adapt this trivial user authentication code to accept either a synchronous isValidUser or an asynchronous isValidUser, allowing the library to handle either type of function. It is, however, also possible to adapt synchronous functions to return Deferreds. This section describes both alternatives: handling functions that might be synchronous or asynchronous in the library function (authenticateUser) or in the application code.
Handling possible Deferreds in the library code
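A synchronous user authentication function that might be passed to authenticateUser simply checks the user against data it already has, such as an in-memory list of names, and returns True or False.
An asynchronous version, asynchronousIsValidUser, instead returns a Deferred that fires with the answer later.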
Our original implementation of authenticateUser expected isValidUser to be synchronous, but now we need to change it to handle both synchronous and asynchronous implementations of isValidUser. For this, we use maybeDeferred to call isValidUser, ensuring that the result of isValidUser is a Deferred, even if isValidUser is a synchronous function.
Now isValidUser could be either synchronousIsValidUser or asynchronousIsValidUser.
It is also possible to modify synchronousIsValidUser to return a Deferred, see Generating Deferreds for more information.
Cancellation
Motivation
A Deferred may take any amount of time to be called back; in fact, it may never be called back. Your users may not be that patient. Since all actions taken when the Deferred completes are in your application or library’s callback code, you always have the option of simply disregarding the result when you receive it, if it’s been too long. However, while you’re ignoring it, the underlying operation represented by that Deferred is still chugging along in the background, possibly consuming resources such as CPU time, memory, network bandwidth and maybe even disk space. So, when the user has closed the window, hit the cancel button, disconnected from your server or sent a “stop” network message, you will want to announce your indifference to the result of that operation so that the originator of the Deferred can clean everything up and free those resources to be put to better use.
Cancellation for Applications which Consume Deferreds
Here’s a simple example. You’re connecting to an external host with an endpoint, but that host is really slow. You want to put a “cancel” button into your application to terminate the connection attempt, so the user can try connecting to a different host instead. Such an application, with the actual user interface left as an exercise for the reader, needs a startConnecting function that begins the connection attempt and a cancelClicked handler that calls cancel() on the resulting connectionAttempt Deferred.
Obviously (I hope), startConnecting is meant to be called by some UI element that lets the user choose what host to connect to and then constructs an appropriate endpoint (perhaps using twisted.internet.endpoints.clientFromString). Then, a cancel button, or similar, is hooked up to the cancelClicked function.
When connectionAttempt.cancel is invoked, that will:
- cause the underlying connection operation to be terminated, if it is still ongoing
- cause the connectionAttempt Deferred to be completed, one way or another, in a timely manner
- likely cause the connectionAttempt Deferred to be errbacked with CancelledError
You may notice that that set of consequences is very heavily qualified. Although cancellation indicates the calling API’s desire for the underlying operation to be stopped, the underlying operation cannot necessarily react immediately. Even in this very simple example, there is already one thing that might not be interruptible: platform-native name resolution blocks, and therefore needs to be executed in a thread; the connection operation can’t be cancelled if it’s stuck waiting for a name to be resolved in this manner. So, the Deferred that you are cancelling may not callback or errback right away.
A Deferred may wait upon another Deferred at any point in its callback chain (see “Handling…asynchronous results”, above). There’s no way for a particular point in the callback chain to know if everything is finished. Since multiple layers of the callback chain may wish to cancel the same Deferred, any layer may call .cancel() at any time. The .cancel() method never raises any exception or returns any value; you may call it repeatedly, even on a Deferred which has already fired, or which has no remaining callbacks. The main reason for all these qualifications, aside from specific examples, is that anyone who instantiates a Deferred may supply it with a cancellation function; that function can do absolutely anything that it wants to. Ideally, anything it does will be in the service of stopping the operation you requested, but there’s no way to guarantee any exact behavior across all Deferreds that might be cancelled. Cancellation of Deferreds is best effort. This may be the case for a number of reasons:
- The Deferred doesn’t know how to cancel the underlying operation.
- The underlying operation may have reached an uncancellable state, because some irreversible operation has been done.
- The Deferred may already have a result, and so there’s nothing to cancel.
Calling cancel() will always succeed without an error regardless of whether or not cancellation was possible. In cases 1 and 2 the Deferred may well errback with a twisted.internet.defer.CancelledError while the underlying operation continues. Deferreds that support cancellation should document what they do when cancelled, if they are uncancellable in certain edge cases, etc.
If the cancelled Deferred is waiting on another Deferred, the cancellation will be forwarded to the other Deferred.
Default Cancellation Behavior
All Deferreds support cancellation. However, by default, they support a very rudimentary form of cancellation which doesn’t free any resources.
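Consider a Deferred, operation, which is ignorant of cancellation: it is created without a cancellation function, and an operationDone function fires it with a result once the underlying work finishes.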
A caller of an API that receives operation may call cancel on it. Since operation does not have a cancellation function, one of two things will happen.
If operationDone has been called, and the operation has completed, nothing much will change. operation will still have a result, and there are no more callbacks, so there’s no observable change in behavior.
If operationDone has not yet been invoked, then operation will be immediately errbacked with a CancelledError.
However, once it’s cancelled, there’s no way to tell operationDone not to run; it will eventually call operation.callback later. In normal operation, issuing callback on a Deferred that has already called back results in an AlreadyCalledError, and this would cause an ugly traceback that could not be caught. Therefore, .callback can be invoked exactly once, causing a no-op, on a Deferred which has been cancelled but has no canceller. If you call it multiple times, you will still get an AlreadyCalledError exception.
Creating Cancellable Deferreds: Custom Cancellation Functions
Let’s imagine you are implementing an HTTP client, which returns a Deferred firing with the response from the server. Cancellation is best achieved by closing the connection. In order to make cancellation do that, all you have to do is pass a function to the constructor of the Deferred (it will get called with the Deferred that is being cancelled).
Now if someone calls cancel() on the Deferred returned from HTTPClient.request(), the HTTP request will be cancelled (assuming it’s not too late to do so). Care should be taken not to callback() a Deferred that has already been cancelled.
Timeouts
Timeouts are a special case of cancellation. Let’s say we have a Deferred representing a task that may take a long time. We want to put an upper bound on that task, so we want the Deferred to time out X seconds in the future.
A convenient API to do so is Deferred.addTimeout. By default, it will fail with a TimeoutError if the Deferred hasn’t fired (with either an errback or a callback) within timeout seconds.
Deferred.addTimeout uses the Deferred.cancel function under the hood, but can distinguish between a user’s call to Deferred.cancel and a cancellation due to a timeout. By default, Deferred.addTimeout translates a CancelledError produced by the timeout into a TimeoutError.
However, if you provided a custom cancellation when creating the Deferred, then cancelling it may not produce a CancelledError. In this case, the default behavior of Deferred.addTimeout is to preserve whatever callback or errback value your custom cancellation function produced. This can be useful if, for instance, a cancellation or timeout should produce a default value instead of an error.
Deferred.addTimeout also takes an optional callable onTimeoutCancel which is called immediately after the deferred times out. onTimeoutCancel is not called if it the deferred is otherwise cancelled before the timeout. It takes an arbitrary value, which is the value of the deferred at that exact time (probably a CancelledError Failure), and the timeout . This can be useful if, for instance, the cancellation or timeout does not result in an error but you want to log the timeout anyway. It can also be used to alter the return value.
Note that where in the callback chain you call Deferred.addTimeout determines how much of the chain is timed out. The timeout encompasses all the callbacks and errbacks added to the Deferred before the call to addTimeout, and none of those added after it. The timeout also starts counting down as soon as addTimeout is called.
DeferredList
Sometimes you want to be notified after several different events have all happened, rather than waiting for each one individually. For example, you may want to wait for all the connections in a list to close. twisted.internet.defer.DeferredList is the way to do this.
To create a DeferredList from multiple Deferreds, you simply pass a list of the Deferreds you want it to wait for:
You can now treat the DeferredList like an ordinary Deferred; you can call addCallbacks and so on. The DeferredList will call its callback when all the deferreds have completed. The callback will be called with a list of the results of the Deferreds it contains, like so:
A standard DeferredList will never call errback, but failures in Deferreds passed to a DeferredList will still errback unless consumeErrors is passed True. See below for more details about this and other flags which modify the behavior of DeferredList.
If you want to apply callbacks to the individual Deferreds that go into the DeferredList, you should be careful about when those callbacks are added. The act of adding a Deferred to a DeferredList inserts a callback into that Deferred (when that callback is run, it checks to see if the DeferredList has been completed yet). The important thing to remember is that it is this callback which records the value that goes into the result list handed to the DeferredList’s callback.
Therefore, if you add a callback to the Deferred after adding the Deferred to the DeferredList, the value returned by that callback will not be given to the DeferredList’s callback. To avoid confusion, we recommend not adding callbacks to a Deferred once it has been used in a DeferredList.
Other behaviours
DeferredList accepts three keyword arguments that modify its behaviour: fireOnOneCallback, fireOnOneErrback and consumeErrors. If fireOnOneCallback is set, the DeferredList will immediately call its callback as soon as any of its Deferreds calls its callback. Similarly, fireOnOneErrback will call errback as soon as any of the Deferreds calls its errback. Note that DeferredList is still one-shot, like ordinary Deferreds, so after a callback or errback has been called the DeferredList will do nothing further (it will just silently ignore any other results from its Deferreds).
The fireOnOneErrback option is particularly useful when you want to wait for all the results if everything succeeds, but also want to know immediately if something fails.
The consumeErrors argument will stop the DeferredList from propagating any errors along the callback chains of any Deferreds it contains (normally, creating a DeferredList has no effect on the results passed along the callbacks and errbacks of its Deferreds). Stopping errors at the DeferredList with this option will prevent “Unhandled error in Deferred” warnings from the Deferreds it contains without needing to add extra errbacks [1]. Passing a true value for the consumeErrors parameter will not change the behavior of fireOnOneCallback or fireOnOneErrback.
gatherResults
A common use for DeferredList is to “join” a number of parallel asynchronous operations, finishing successfully if all of the operations were successful, or failing if any one of the operations fails. In this case, twisted.internet.defer.gatherResults is a useful shortcut:
The consumeErrors argument has the same meaning as it does for DeferredList: if true, it causes gatherResults to consume any errors in the passed-in Deferreds. Always use this argument unless you are adding further callbacks or errbacks to the passed-in Deferreds, or unless you know that they will not fail. Otherwise, a failure will result in an unhandled error being logged by Twisted. This argument is available since Twisted 11.1.0.
Class Overview
This is an overview API reference for Deferred from the point of using a Deferred returned by a function. It is not meant to be a substitute for the docstrings in the Deferred class, but can provide guidelines for its use.
There is a parallel overview of functions used by the Deferred’s creator in Generating Deferreds.
Basic Callback Functions
addCallbacks(self, callback[, errback, callbackArgs, callbackKeywords, errbackArgs, errbackKeywords])
This is the method you will use to interact with Deferred. It adds a pair of callbacks “parallel” to each other (see diagram above) to the list of callbacks run when the Deferred fires. The signature of a method added using addCallbacks should be myMethod(result, *methodArgs, **methodKeywords). If your method is passed in the callback slot, for example, all arguments in the tuple callbackArgs will be passed as *methodArgs to your method.
There are various convenience methods that are derivative of addCallbacks. I will not cover them in detail here, but it is important to know about them in order to create concise code.
addCallback(callback, *callbackArgs, **callbackKeywords)
Adds your callback at the next point in the processing chain, while adding an errback that will re-raise its first argument, not affecting further processing in the error case.
Note that, while addCallbacks (plural) requires the arguments to be passed in a tuple, addCallback (singular) takes all its remaining arguments as things to be passed to the callback function. The reason is obvious: addCallbacks (plural) cannot tell whether the arguments are meant for the callback or the errback, so they must be specifically marked by putting them into a tuple. addCallback (singular) knows that everything is destined to go to the callback, so it can use Python’s “*” and “**” syntax to collect the remaining arguments.
addErrback(errback, *errbackArgs, **errbackKeywords)
Adds your errback at the next point in the processing chain, while adding a callback that will return its first argument, not affecting further processing in the success case.
addBoth(callbackOrErrback, *callbackOrErrbackArgs, **callbackOrErrbackKeywords)
This method adds the same callback to both sides of the processing chain at the same point. Keep in mind that the type of the first argument is indeterminate if you use this method: it may be a result or a Failure. Use it for the kind of cleanup you would put in a finally: block.
Chaining Deferreds
If you need one Deferred to wait on another, all you need to do is return a Deferred from a method added to addCallbacks. Specifically, if you return Deferred B from a method added to Deferred A using A.addCallbacks, Deferred A’s processing chain will stop until Deferred B’s .callback() method is called; at that point, the next callback in A will be passed the result of the last callback in Deferred B’s processing chain at the time.
If a Deferred is somehow returned from its own callbacks (directly or indirectly), the behavior is undefined. The Deferred code will make an attempt to detect this situation and produce a warning. In the future, this will become an exception.
If this seems confusing, don’t worry about it right now – when you run into a situation where you need this behavior, you will probably recognize it immediately and realize why this happens. If you want to chain deferreds manually, there is also a convenience method to help you.
chainDeferred(otherDeferred)
Add otherDeferred to the end of this Deferred’s processing chain. When self.callback is called, the result of this Deferred’s processing chain up to this point will be passed to otherDeferred.callback. Further additions to this Deferred’s callback chain do not affect otherDeferred.
This is the same as self.addCallbacks(otherDeferred.callback, otherDeferred.errback) .
See also
- Generating Deferreds, an introduction to writing asynchronous functions that return Deferreds.
[1] Unless of course a later callback starts a fresh error, but as we’ve already noted, adding callbacks to a Deferred after it’s used in a DeferredList is confusing and usually avoided.
© Copyright 2017, Twisted Matrix Labs. Revision 6bee026e.