When processing a FetchJob:
- findAdapter :: URL -> Adapter
- setupRequest :: URL -> Request
- fetchRequest :: Request -> Content
- parseContent :: Content -> Information
- storeInformation :: Information -> Collection | FileCollection
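The FetchJob steps listed above compose into a pipeline in which each step's output feeds the next. The following is a minimal sketch of that flow, not the module's actual implementation: only the step names and their order come from the list, while the function bodies, the `adapters` table, the job shape (`{ url }`), the `processFetchJob` wrapper and the in-memory `collection` are hypothetical stand-ins.

```javascript
// Hypothetical stand-ins for the five FetchJob steps; only the names and
// the order of the steps come from the list above.
const adapters = [
  { name: "example", matches: (url) => url.includes("example.com") },
];
const collection = []; // stands in for a Collection or FileCollection

// findAdapter :: URL -> Adapter
function findAdapter(url) {
  const adapter = adapters.find((a) => a.matches(url));
  if (!adapter) throw new Error(`No adapter for ${url}`);
  return adapter;
}

// setupRequest :: URL -> Request
function setupRequest(url) {
  return { url, method: "GET", options: {} };
}

// fetchRequest :: Request -> Content (stubbed: no real network access)
async function fetchRequest(request) {
  return `<title>Stub page for ${request.url}</title>`;
}

// parseContent :: Content -> Information
function parseContent(content) {
  const match = /<title>(.*?)<\/title>/.exec(content);
  return { title: match ? match[1] : null };
}

// storeInformation :: Information -> Collection | FileCollection
function storeInformation(information) {
  collection.push(information);
  return collection;
}

// Run the steps in the documented order for one job.
async function processFetchJob(job) {
  const adapter = findAdapter(job.url);
  const request = setupRequest(job.url);
  const content = await fetchRequest(request);
  const information = { adapter: adapter.name, ...parseContent(content) };
  return storeInformation(information);
}
```

Calling `processFetchJob({ url: "https://example.com/page" })` resolves to the collection with one stored item. Keeping each step a separate function means the stubbed `fetchRequest` can be swapped for a real one without touching the rest.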
When processing an InteractJob:
- interactWithBrowser ::
- storeInformation :: Information -> Collection | FileCollection
When fetching a Request:
- readFromCache ::
- readFromNetwork ::
- readFromBrowser ::
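One plausible way to combine the three readers is to try the cache first and fall back to the network or the browser. The sketch below illustrates that idea under stated assumptions: the reader bodies are stubs, the `source` field is added purely for illustration, and `options.useBrowser` is a hypothetical flag rather than an option documented by this module.

```javascript
// Hypothetical stubs: the real readers talk to MongoDB, HTTP and a browser.
const cache = new Map();

async function readFromCache(request) {
  return cache.get(request.url); // undefined on a cache miss
}

async function readFromNetwork(request) {
  return { url: request.url, body: "static html", source: "network" };
}

async function readFromBrowser(request) {
  return { url: request.url, body: "rendered html", source: "browser" };
}

// Try the cache first; otherwise fetch and remember the result.
// `options.useBrowser` is an assumed flag, not taken from the module.
async function fetchRequest(request) {
  const cached = await readFromCache(request);
  if (cached !== undefined) return { ...cached, source: "cache" };

  const useBrowser = Boolean(request.options && request.options.useBrowser);
  const content = useBrowser
    ? await readFromBrowser(request)
    : await readFromNetwork(request);
  cache.set(request.url, content);
  return content;
}
```

With this arrangement a second request for the same URL is served from the cache, and only requests flagged for the browser pay the cost of rendering client-side content.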
This module provides a managed approach to fetching HTML content over the network. It implements the following three best practices:
- Minimal Impact - Don't spam servers by asking for the same content repeatedly. Cache all requests.
- Minimal Latency - Wherever possible, use a single HTTP request for the content.
- Dynamic Content - Cater for pages that generate content using client-side JavaScript.
When a URL is requested, the module checks whether a copy of it is stored in the MongoDB cache. If there is a cached copy that is not stale, the module returns it instead of going out over the network.
Checks whether a fresh copy of the contents of an HTTP request has been cached. If request.options.checkCacheAge has been specified, it checks whether there is a cached result younger than this age. Returns undefined when there is no fresh cached result or when request.options.checkCacheAge is not present.
Arguments
- request (Object): The HTTP method, URL and options being requested.
- request.url (String): The URL to retrieve, including query params. Used as an index for the cache and should be unique per page.
- request.method (String): The HTTP method used for the original request. Either "GET" or "POST".
- request.options (Object): The HTTP request options object. May contain various fields, but only one is checked.
- request.options.checkCacheAge (Object, Number, or String): The maximum age of the cache to return, e.g. { days: 7 } or { seconds: 10 }, usable as a duration by moment. If there isn't anything younger than this duration then undefined will be returned.
Returns
- Object or undefined: The cached content as an object, or undefined when no valid cache entry can be found.
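The freshness check described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: an in-memory `Map` stands in for the MongoDB cache, `writeToCache` and `durationToMs` are hypothetical helpers, and `durationToMs` replaces moment by handling only the object form (a few units) and plain numbers, treated as milliseconds. String durations are not handled here.

```javascript
// Milliseconds per unit for the object form of checkCacheAge.
// The real module hands the value to moment; this helper is a stand-in.
const UNIT_MS = { seconds: 1000, minutes: 60000, hours: 3600000, days: 86400000 };

function durationToMs(age) {
  if (typeof age === "number") return age; // treated as milliseconds
  if (typeof age !== "object" || age === null) {
    throw new Error("Only object and number durations are handled in this sketch");
  }
  return Object.entries(age).reduce((total, [unit, amount]) => {
    if (!(unit in UNIT_MS)) throw new Error(`Unsupported unit: ${unit}`);
    return total + UNIT_MS[unit] * amount;
  }, 0);
}

// Hypothetical in-memory cache indexed by URL; each entry records
// the content and the time at which it was fetched.
const cacheEntries = new Map();

function writeToCache(request, content, fetchedAt = Date.now()) {
  cacheEntries.set(request.url, { content, fetchedAt });
}

// Returns the cached content if it is younger than checkCacheAge,
// otherwise undefined. `now` is injectable to make the check testable.
function readFromCache(request, now = Date.now()) {
  const maxAge = request.options && request.options.checkCacheAge;
  if (maxAge === undefined || maxAge === null) return undefined; // no age given
  const entry = cacheEntries.get(request.url);
  if (!entry) return undefined; // never cached
  const ageMs = now - entry.fetchedAt;
  return ageMs < durationToMs(maxAge) ? entry.content : undefined;
}
```

For example, an entry written one day ago is returned for `{ days: 7 }` but not for `{ seconds: 10 }`, and a request without `checkCacheAge` always yields undefined.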
readFromNetwork
readFromBrowser