feat: provide Request instances in skipped request callbacks #1999
vdusek wants to merge 4 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1999 +/- ##
==========================================
+ Coverage 93.09% 93.17% +0.08%
==========================================
Files 167 167
Lines 11775 11815 +40
==========================================
+ Hits 10962 11009 +47
+ Misses 813 806 -7
…ts skips out of transform
def _skipped_request_callback_expects_request(callback: Callable[..., Awaitable[None]]) -> bool:
Maybe we can wait for the v2 release and make a breaking change with a clear signature, which would avoid this kind of fragile runtime inspection.
    # A string annotation may not resolve to the class (e.g. a `TYPE_CHECKING`-only import), so match by name.
    if isinstance(annotation, str):
        return _annotation_names_request(annotation)
If I'm not mistaken, this won't match when the annotation uses an import alias under TYPE_CHECKING:

    if TYPE_CHECKING:
        from crawlee import Request as CrawleeRequest

    async def skipped_hook(request: CrawleeRequest, _reason: SkippedReason) -> None:
        pass

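To make the alias concern concrete, here is a minimal, self-contained sketch. It is not the PR's implementation: `Request` is a local stand-in class, and `annotation_names_request` and `expects_request` are hypothetical helpers that only approximate a name-based check on the first parameter's annotation. Quoted annotations are used so the annotations stay strings, as they would for a `TYPE_CHECKING`-only import.

```python
import inspect
from typing import Any, Awaitable, Callable


class Request:
    """Stand-in for crawlee's Request so the sketch runs without crawlee installed."""


def annotation_names_request(annotation: str) -> bool:
    # Hypothetical name-based check: compare the last dotted component to 'Request'.
    return annotation.strip('\'"').rsplit('.', 1)[-1] == 'Request'


def expects_request(callback: Callable[..., Awaitable[None]]) -> bool:
    # Inspect only the first parameter; a missing annotation means the legacy URL form.
    params = list(inspect.signature(callback).parameters.values())
    if not params:
        return False
    annotation: Any = params[0].annotation
    if annotation is inspect.Parameter.empty:
        return False
    if isinstance(annotation, str):
        # Unresolved string annotation: all we can do is match by name.
        return annotation_names_request(annotation)
    return annotation is Request


async def by_name(request: 'Request', reason: str) -> None: ...


async def by_alias(request: 'CrawleeRequest', reason: str) -> None: ...


async def legacy(url: str, reason: str) -> None: ...


results = {f.__name__: expects_request(f) for f in (by_name, by_alias, legacy)}
```

Under these assumptions, `by_alias` is classified the same way as `legacy`, so an aliased handler would silently receive the URL string rather than the object.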
Converting to draft. Based on #1999 (comment), this will be done as a clean breaking change in v2 instead of the annotation-based dual dispatch. Tracked in #2007.
`on_skipped_request` callbacks can now receive the full `Request` object instead of only the URL string, so handlers can access request metadata such as `user_data`.

This stays backward compatible by dual-dispatching on the type annotation of the callback's first parameter: annotate it as `Request` to receive the object, while a `str` annotation (or no annotation) keeps receiving the URL string as before. Skipped URLs are normalized to `Request` once at the choke points (`add_requests` and both `extract_links` implementations), the duplicated request-building helper was extracted to `crawlee._utils.requests.create_request_from_options`, and link extraction now applies `transform_request_function` to robots-skipped requests for consistency with enqueued ones.

Reworked from #1927 (original author inactive).
Closes #2007.
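
For readers skimming the description, the dual dispatch can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Request` is a local stand-in dataclass, `wants_request` and `notify_skipped` are hypothetical names, and the skip reason is passed as a plain string.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Request:
    """Stand-in for crawlee's Request; only the fields this sketch needs."""

    url: str
    user_data: dict[str, Any] = field(default_factory=dict)


def wants_request(callback) -> bool:
    # Look only at the first parameter's annotation; anything else counts as legacy.
    params = list(inspect.signature(callback).parameters.values())
    return bool(params) and params[0].annotation in (Request, 'Request')


async def notify_skipped(callback, request: Request, reason: str) -> None:
    # Hypothetical dispatcher: pass the full object, or only its URL for legacy handlers.
    await callback(request if wants_request(callback) else request.url, reason)


seen: list[tuple[str, str]] = []


async def new_style(request: Request, reason: str) -> None:
    # The object form gives access to metadata such as user_data.
    seen.append((request.user_data['label'], reason))


async def legacy(url: str, reason: str) -> None:
    seen.append((url, reason))


async def main() -> None:
    skipped = Request(url='https://example.com/private', user_data={'label': 'detail'})
    await notify_skipped(new_style, skipped, 'robots_txt')
    await notify_skipped(legacy, skipped, 'robots_txt')


asyncio.run(main())
```

Because the skipped URL is wrapped in a `Request` before dispatch, both handler styles are served from the same normalized value, which is the point of normalizing once at the choke points.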