2.7 Fetching resources

When a user agent is to fetch a resource or URL, optionally from an origin origin, and optionally with a synchronous flag, a manual redirect flag, a force same-origin flag, and/or a block cookies flag, the following steps must be run. (When a URL is to be fetched, the URL identifies a resource to be obtained.)

  1. Let document be the appropriate Document as given by the following list:

    When navigating
    The active document of the source browsing context.
    When fetching resources for an element
    The element's Document.
    When fetching resources in response to a call to an API
    The entry script's document.
  2. While document is an iframe srcdoc document, let document be document's browsing context's browsing context container's Document instead.

  3. Generate the address of the resource from which Request-URIs are obtained as required by HTTP for the Referer (sic) header from the document's current address of document. [HTTP]

    Remove any <fragment> component from the generated address of the resource from which Request-URIs are obtained.

    If the origin of the appropriate Document is not a scheme/host/port tuple, then the Referer (sic) header must be omitted, regardless of its value.

  4. If the algorithm was not invoked with the synchronous flag, perform the remaining steps asynchronously.

  5. This is the main step.

    If the resource is identified by an absolute URL, and the resource is to be obtained using an idempotent action (such as an HTTP GET or equivalent), and it is already being downloaded for other reasons (e.g. another invocation of this algorithm), and this request would be identical to the previous one (e.g. same Accept and Origin headers), and the user agent is configured such that it is to reuse the data from the existing download instead of initiating a new one, then use the results of the existing download instead of starting a new one.

    Otherwise, if the resource is identified by an absolute URL with a scheme that does not define a mechanism to obtain the resource (e.g. it is a mailto: URL) or that the user agent does not support, then act as if the resource was an HTTP 204 No Content response with no other metadata.

    Otherwise, if the resource is identified by the URL about:blank, then the resource is immediately available and consists of the empty string, with no metadata.

    Otherwise, at a time convenient to the user and the user agent, download (or otherwise obtain) the resource, applying the semantics of the relevant specifications (e.g. performing an HTTP GET or POST operation, or reading the file from disk, dereferencing javascript: URLs, etc).

    For the purposes of the Referer (sic) header, use the address of the resource from which Request-URIs are obtained generated in the earlier step.

    For the purposes of the Origin header, if the fetching algorithm was explicitly initiated from an origin, then the origin that initiated the HTTP request is origin. Otherwise, this is a request from a "privacy-sensitive" context. [ORIGIN]

  6. If the algorithm was not invoked with the block cookies flag, and there are cookies to be set, then the user agent must run the following substeps:

    1. Wait until ownership of the storage mutex can be taken by this instance of the fetching algorithm.

    2. Take ownership of the storage mutex.

    3. Update the cookies. [COOKIES]

    4. Release the storage mutex so that it is once again free.

  7. If the fetched resource is an HTTP redirect or equivalent, then:

    If the force same-origin flag is set and the URL of the target of the redirect does not have the same origin as the URL for which the fetch algorithm was invoked

    Abort these steps and return failure from this algorithm, as if the remote host could not be contacted.

    If the manual redirect flag is set

    Continue, using the fetched resource (the redirect) as the result of the algorithm. If the calling algorithm subsequently requires the user agent to transparently follow the redirect, then the user agent must resume this algorithm from the main step, but using the target of the redirect as the resource to fetch, rather than the original resource.

    Otherwise

    First, apply any relevant requirements for redirects (such as showing any appropriate prompts). Then, redo main step, but using the target of the redirect as the resource to fetch, rather than the original resource.

    The HTTP specification requires that 301, 302, and 307 redirects, when applied to methods other than the safe methods, not be followed without user confirmation. That would be an appropriate prompt for the purposes of the requirement in the paragraph above. [HTTP]

  8. If the algorithm was not invoked with the synchronous flag: When the resource is available, or if there is an error of some description, queue a task that uses the resource as appropriate. If the resource can be processed incrementally, as, for instance, with a progressively interlaced JPEG or an HTML file, additional tasks may be queued to process the data as it is downloaded. The task source for these tasks is the networking task source.

    Otherwise, return the resource or error information to the calling algorithm.

If the user agent can determine the actual length of the resource being fetched for an instance of this algorithm, and if that length is finite, then that length is the file's size. Otherwise, the subject of the algorithm (that is, the resource being fetched) has no known size. (For example, the HTTP Content-Length header might provide this information.)

The user agent must also keep track of the number of bytes downloaded for each instance of this algorithm. This number must exclude any out-of-band metadata, such as HTTP headers.

The application cache processing model introduces some changes to the networking model to handle the returning of cached resources.

The navigation processing model handles redirects itself, overriding the redirection handling that would be done by the fetching algorithm.

Whether the type sniffing rules apply to the fetched resource depends on the algorithm that invokes the rules — they are not always applicable.

2.7.1 Protocol concepts

User agents can implement a variety of transfer protocols, but this specification mostly defines behavior in terms of HTTP. [HTTP]

The HTTP GET method is equivalent to the default retrieval action of the protocol. For example, RETR in FTP. Such actions are idempotent and safe, in HTTP terms.

The HTTP response codes are equivalent to statuses in other protocols that have the same basic meanings. For example, a "file not found" error is equivalent to a 404 code, a server error is equivalent to a 5xx code, and so on.

The HTTP headers are equivalent to fields in other protocols that have the same basic meaning. For example, the HTTP authentication headers are equivalent to the authentication aspects of the FTP protocol.

Anything in this specification that refers to HTTP also applies to HTTP-over-TLS, as represented by URLs representing the https scheme.

User agents should report certificate errors to the user and must either refuse to download resources sent with erroneous certificates or must act as if such resources were in fact served with no encryption.

User agents should warn the user that there is a potential problem whenever the user visits a page that the user has previously visited, if the page uses less secure encryption on the second visit.

Not doing so can result in users not noticing man-in-the-middle attacks.

If a user connects to a server with a self-signed certificate, the user agent could allow the connection but just act as if there had been no encryption. If the user agent instead allowed the user to override the problem and then displayed the page as if it was fully and safely encrypted, the user could be easily tricked into accepting man-in-the-middle connections.

If a user connects to a server with full encryption, but the page then refers to an external resource that has an expired certificate, then the user agent will act as if the resource was unavailable, possibly also reporting the problem to the user. If the user agent instead allowed the resource to be used, then an attacker could just look for "secure" sites that used resources from a different host and only apply man-in-the-middle attacks to that host, for example taking over scripts in the page.

If a user bookmarks a site that uses a CA-signed certificate, and then later revisits that site directly but the site has started using a self-signed certificate, the user agent could warn the user that a man-in-the-middle attack is likely underway, instead of simply acting as if the page was not encrypted.

2.7.3 Determining the type of a resource

The Content-Type metadata of a resource must be obtained and interpreted in a manner consistent with the requirements of the Media Type Sniffing specification. [MIMESNIFF]

The sniffed type of a resource must be found in a manner consistent with the requirements given in the Media Type Sniffing specification for finding the sniffed-type of the relevant sequence of octets. [MIMESNIFF]

The rules for sniffing images specifically and the rules for distinguishing if a resource is text or binary are also defined in the Media Type Sniffing specification. Both sets of rules return a MIME type as their result. [MIMESNIFF]

It is imperative that the rules in the Media Type Sniffing specification be followed exactly. When a user agent uses different heuristics for content type detection than the server expects, security problems can occur. For more details, see the Media Type Sniffing specification. [MIMESNIFF]

2.7.4 Extracting encodings from meta elements

The algorithm for extracting an encoding from a meta element, given a string s, is as follows. It either returns an encoding or nothing.

  1. Let position be a pointer into s, initially pointing at the start of the string.

  2. Loop: Find the first seven characters in s after position that are an ASCII case-insensitive match for the word "charset". If no such match is found, return nothing and abort these steps.

  3. Skip any U+0009, U+000A, U+000C, U+000D, or U+0020 characters that immediately follow the word "charset" (there might not be any).

  4. If the next character is not a U+003D EQUALS SIGN (=), then move position to point just before that next character, and jump back to the step labeled loop.

  5. Skip any U+0009, U+000A, U+000C, U+000D, or U+0020 characters that immediately follow the equals sign (there might not be any).

  6. Process the next character as follows:

    If it is a U+0022 QUOTATION MARK (") and there is a later U+0022 QUOTATION MARK (") in s
    If it is a U+0027 APOSTROPHE (') and there is a later U+0027 APOSTROPHE (') in s
    Return the encoding corresponding to the string between this character and the next earliest occurrence of this character.
    If it is an unmatched U+0022 QUOTATION MARK (")
    If it is an unmatched U+0027 APOSTROPHE (')
    If there is no next character
    Return nothing.
    Otherwise
    Return the encoding corresponding to the string from this character to the first U+0009, U+000A, U+000C, U+000D, U+0020, or U+003B character or the end of s, whichever comes first.

This algorithm is distinct from those in the HTTP specification (for example, HTTP doesn't allow the use of single quotes and requires supporting a backslash-escape mechanism that is not supported by this algorithm). While the algorithm is used in contexts that, historically, were related to HTTP, the syntax as supported by implementations diverged some time ago. [HTTP]

2.7.5 CORS settings attributes

A CORS settings attribute is an enumerated attribute. The following table lists the keywords and states for the attribute — the keywords in the left column map to the states in the cell in the second column on the same row as the keyword.

Keyword State Brief description
anonymous Anonymous Cross-origin CORS requests for the element will not have the credentials flag set.
use-credentials Use Credentials Cross-origin CORS requests for the element will have the credentials flag set.

The empty string is also a valid keyword, and maps to the Anonymous state. The attribute's invalid value default is the Anonymous state. The missing value default, used when the attribute is omitted, is the No CORS state.

2.7.6 CORS-enabled fetch

When the user agent is required to perform a potentially CORS-enabled fetch of an absolute URL URL, with a mode mode that is either "No CORS", "Anonymous", or "Use Credentials", an origin origin, and a default origin behaviour default which is either "taint" or "fail", it must run the first applicable set of steps from the following list. The default origin behaviour is only used if mode is "No CORS". This algorithm wraps the fetch algorithm above, and labels the obtained resource as either CORS-same-origin or CORS-cross-origin, or blocks the resource entirely.

If the URL has the same origin as origin
If mode is "No CORS"

Run these substeps:

  1. Let result have no value.

  2. Fetch URL, with the manual redirect flag set.

  3. Loop: Wait for the fetch algorithm to know if the result is a redirect or not.

  4. If the result of the fetch is a redirect, and the mode is not "No CORS", and the origin of the target URL of the redirect is not the same origin as origin, then set URL to the the target URL of the redirect and return to the top of the potentially CORS-enabled fetch algorithm (this time, the branch below will be taken, resulting in the fetch being done in a CORS-aware fashion).

    Otherwise, if the result of the fetch is a redirect, and result still has no value, then apply the CORS redirect steps, with the CORS credential flag set to true and the request rules being that the user agent continue to follow these steps. If this resumes the fetch algorithm, then return to the loop step. If it failed due to a failure of the CORS resource sharing check, then: if default is fail, then set result to fail and jump to the step labeled end; if default is taint, then set result to taint, transparently follow the redirect but with the manual redirect flag no longer set, and jump to the step labeled end below.

    Otherwise, if the resource is not available (e.g. there is a network error) then set result to the same value as default, and jump to the step labeled end below.

    Otherwise, perform a resource sharing check, with the CORS credential flag set to true. If it returns fail, then set result to the same value as default; otherwise, set result to success. Then, jump to the step labeled end below.

  5. End: Jump to the appropriate step from the following list:

    If result is fail

    Discard all fetched data and prevent any tasks from the fetch algorithm from being queued. For the purposes of the calling algorithm, the user agent must act as if there was a fatal network error and no resource was obtained. The user agent may report a cross-origin resource access failure to the user (e.g. in a debugging console).

    If result is taint

    The tasks from the fetch algorithm are queued normally, but for the purposes of the calling algorithm, the obtained resource is CORS-cross-origin. The user agent may report a cross-origin resource access failure to the user (e.g. in a debugging console).

    If result is success

    The tasks from the fetch algorithm are queued normally, and for the purposes of the calling algorithm, the obtained resource is CORS-same-origin.

If mode is "Anonymous" or "Use Credentials"

Run these steps:

  1. Perform a cross-origin request with the request URL set to URL, the source origin set to origin, and the credentials flag set to true if mode is "Use Credentials" and set to false otherwise. [CORS]

  2. Wait for the CORS cross-origin request status to have a value.

  3. Jump to the appropriate step from the following list:

    If the CORS cross-origin request status is not success

    Discard all fetched data and prevent any tasks from the fetch algorithm from being queued. For the purposes of the calling algorithm, the user agent must act as if there was a fatal network error and no resource was obtained. If a CORS resource sharing check failed, the user agent may report a cross-origin resource access failure to the user (e.g. in a debugging console).

    If the CORS cross-origin request status is success

    The tasks from the fetch algorithm are queued normally, and for the purposes of the calling algorithm, the obtained resource is CORS-same-origin.