User:Eighty5cacao/misc/HTTPS Everywhere/feature

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

This page is not intended for the developers to read directly. It is a scratchpad for me to collect my ideas if/when I file bug reports for these features.

Allow user rulesets to be flagged as "superseding" a built-in ruleset

...so as to reduce noise in an automated problem-reporting system for users who are testing changes to built-in rulesets (and would currently need to disable the built-in ruleset in question).

Perhaps a first draft of this could be: "If a user ruleset has a name conflict with a built-in ruleset, (optionally) prefer the user ruleset and log the situation as a warning rather than an error." User-visible UI warnings MAY be provided, and any part of this behavior MAY be conditional upon a configuration setting and/or an attribute of the user ruleset's ruleset element. (NB: this GitHub ticket by a regular developer seems to describe this.)

A later implementation could cover the case where the user ruleset's name differs from that of the built-in ruleset it supersedes; the built-in ruleset's name would then be specified through an attribute like supersedes_builtin=.
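A minimal sketch of what such a user ruleset might look like, assuming the supersedes_builtin= attribute proposed above (the ruleset contents and hostnames are illustrative only):

<ruleset name="Example0 (my test fixes)" supersedes_builtin="Example0">
    <target host="bar.example.com" />

    <rule from="^http://bar\.example\.com/" to="https://bar.example.com/" />
</ruleset>

The extension would then apply this user ruleset in place of the built-in "Example0", logging the supersession as a warning rather than an error.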

User exclusion sets

Suppose there were a built-in ruleset named "Example0" that covered the domain bar.example.com, and badpage.htm on that domain suddenly broke (such as by redirecting to http). We should let users write something like:

<userexclusionset name="Example0 (emergency fix)" applies_to_ruleset="Example0">
    <target host="bar.example.com" />

    <exclusion pattern="^http://bar\.example\.com/badpage\.htm(\?|$)" />
</userexclusionset>

TODO: Consider the following:

  • whether the UI design demands a name attribute (should the user exclusion set be given its own line in the ruleset list, or should it silently apply without a checkbox choice?)
  • whether targets are needed
  • what syntax should be defined for "exclude from securecookie"
  • whether we should provide a mechanism to add coverage to an existing ruleset rather than merely exclusions (Tor Trac ticket 10033 and GitHub ticket 296 mention a work-in-progress Chrome implementation of the last)

Provide a means by which a rule in one ruleset can override a downgrade rule in another ruleset

The proposed attribute name is softdowngrade.

softdowngrade behaves like downgrade except that the code should check whether there is an enabled rule that would rewrite the opposite way (as there would be if both rulesets were enabled in the example below) and ignore the softdowngrade if so.

An example for a real site, specifically an xkcd comic that has unsecurable mixed scripts and fails to display any image when those scripts are blocked:
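A sketch of the intended interaction, assuming softdowngrade takes the same "1" value as the existing downgrade attribute (the hostnames and rules are illustrative, not copied from the real xkcd rulesets):

<!-- Enabled by default: softdowngrade the pages whose scripts
     are unsecurable mixed content -->
<ruleset name="XKCD (partial)">
    <target host="xkcd.com" />

    <rule from="^https://xkcd\.com/" to="http://xkcd.com/" softdowngrade="1" />
</ruleset>

<!-- default_off: for users who block mixed content anyway; when enabled,
     its rule rewrites the opposite way, so the softdowngrade above is ignored -->
<ruleset name="XKCD (mixed content)" default_off="mixed content">
    <target host="xkcd.com" />

    <rule from="^http://xkcd\.com/" to="https://xkcd.com/" />
</ruleset>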

As a further example, a similar situation has already been found in the search feature on Stack Exchange sites (the downgrade in Stack-Exchange.xml ought to be a softdowngrade). (Does the fighting between the current Stack-Exchange.xml and Stack-Exchange-mixedcontent.xml cause redirect loops if the latter is manually enabled? This needs testing.)

This feature should be used only for pages that have true (unsecurable) mixed content that breaks major functionality (including, but not limited to, layout breakage severe enough to make the site unusable by an experienced, normally-abled user).

To consider: Instead of defining softdowngrade, is it better simply to give the existing downgrade attribute such "soft" behavior (i.e., explicitly give normal rules precedence over downgrade rules)? IIRC, currently, no code in the HTTPS Everywhere browser extension actually reads the downgrade attribute; it is read only by validation scripts as part of the build process.

New values for platform attribute

  • aia - Currently, rulesets for sites with incomplete certificate chains are simply default_off'd. This is a potential alternative for browsers that support fetching intermediate certificates in accord with Authority Information Access fields; currently it seems equivalent to chrome according to this comment.
  • mixedpost - Really means false mixed POST: Firefox checks for these separately from mixed content, so a fix for bmo:878890 might not automatically address this. True mixed POSTs should(?) generally be handled by splitting coverage of the referring page into a default-off ruleset if major functionality is broken
  • mixedxhr - when enforcement of same-origin policies causes problems with XMLHttpRequest calls (see torbug:7851)
  • softmixed (or falsemcb?) - exact definition to be decided later; needed in order to distinguish "good" and "bad" MCB implementations
  • tor - needed if we ever want to allow clearnet domains to be rewritten to hidden services. Mailing list discussion exists on whether this is worth doing at all; newer tickets include GitHub #3798
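For illustration, a ruleset gated on one of the proposed values might look like the following (the hostname and rule are hypothetical; only the platform value is new):

<ruleset name="Example AIA (partial)" platform="aia">
    <target host="aia.example.com" />

    <rule from="^http://aia\.example\.com/" to="https://aia.example.com/" />
</ruleset>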

Pseudoplatforms

These MUST NOT disable any rulesets without explicitly warning the user first. Instead, they SHOULD clarify the wording of the browser's TLS error pages, specifically to explain that the needed TLS feature may be broken by an intercepting proxy or webmaster misconfiguration. An initial implementation MAY treat these as no-ops. That is, in order to enable the corresponding rulesets by default, the browser addon MAY choose to pretend that all supported browsers match these platform values.

The behavior described above deviates from that for the existing platform attribute; thus a new attribute needs to be defined, perhaps subplatform.

  • letsencrypt - The Let's Encrypt CA is often reported as problematic on Chrome for Windows XP, presumably due to the lack of a required intermediate certificate in the Microsoft-supplied certificate database (TODO: or is it the signature algorithm?). This is a subplatform due to the deprecation of Windows XP.
  • sni - for sites that require SNI in order for a matching certificate to be obtained, such as those that are hosted on WebFaction or that use Cloudflare's free service tier. Compare the snionly attribute in Chromium's HSTS preload list. (Dubious because: The non-SNI platform most likely to be encountered is Firefox with the Convergence addon [or its fork FreeSpeechMe, when configured to validate non-Namecoin sites?], but neither addon is still being maintained.)
  • tls13 - for sites that require TLS 1.3 or higher
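Again purely for illustration, assuming the proposed subplatform attribute (hostname and rule hypothetical):

<ruleset name="Example (SNI required)" subplatform="sni">
    <target host="sni.example.net" />

    <rule from="^http://sni\.example\.net/" to="https://sni.example.net/" />
</ruleset>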

Override DNS lookups

Suggested syntax: <dnsoverride host="sd.sharethis.com" ipv4_blacklist="184.72.49.139" /> (example is only illustrative; IP address no longer accurate)

To be used to work around broken load-balancing arrangements.

There should also be positive ipv4 and ipv6 attributes, to force the use of the specified IP(s) for the specified hostname(s), even if the browser receives a single A record for some other IP.

(of course, ipv6_blacklist should be available too)

We should probably have different attribute names to specify hosts via either simple matches (like target) or regexes (like securecookie).

For a dnsoverride element to be effective for a given host, that host MUST also be listed in the ruleset's targets.

An attempt to blacklist an IP address corresponding to the only available A or AAAA record for a domain SHOULD be treated as a no-op and MUST generate a log message (TODO: at what severity?).

TODO: Decide how multiple IPs or hostnames should be delimited (comma? pipe? ...)
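Putting the pieces together, a complete ruleset using this feature might look like the following sketch (the IP addresses are from the RFC 5737 documentation ranges, and the comma delimiter is merely one possible answer to the TODO above):

<ruleset name="ShareThis (illustrative)">
    <target host="sd.sharethis.com" />

    <!-- Force the working load-balancer IPs and blacklist the broken one;
         per the requirement above, the host must also match a target -->
    <dnsoverride host="sd.sharethis.com"
            ipv4="192.0.2.10,192.0.2.11"
            ipv4_blacklist="192.0.2.66" />

    <rule from="^http://sd\.sharethis\.com/" to="https://sd.sharethis.com/" />
</ruleset>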

A Firefox implementation might depend on bmo:652295, though that bug isn't quite about overriding the built-in DNS resolver...

Load balancing

That is, allow a rule to specify multiple rewrite destinations among which one will be chosen randomly, to be used in cases where equivalent content is available on multiple hostnames. A hypothetical example (just to show one possible syntax, not to demonstrate best practices):
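<ruleset name="Example CDN (illustrative)">
    <target host="images.example.com" />

    <!-- Hypothetical syntax: "to" lists several equivalent destinations,
         pipe-delimited; the extension picks one per URL -->
    <rule from="^http://images\.example\.com/"
            to="https://img1.example.net/|https://img2.example.net/" />
</ruleset>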

To be reevaluated: The rewriting of any given URL should be deterministic within a browser session and/or a given time interval; that is, the chosen rewrite should be memoized.

If it is considered undesirable to repeatedly consume entropy from the browser's PRNG, perhaps a suitable pseudorandom number might be some HMAC using the originally-requested URL as the message and a single CSPRNG output (generated once per session or time interval) as the key.
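For example, with a key k generated once per session, the chosen destination index could be HMAC-SHA256(k, originally-requested URL) mod n, where n is the number of listed destinations; the choice would then be stable for the whole session without consuming further entropy per request. (The exact hash function is an open choice; SHA-256 is merely an assumption here.)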

Advanced string operations

such as letter case transformations and percent (un)encoding. Perhaps the to field could contain something like $lc{1} to mean "lowercase version of the string matched by the first parens in the corresponding from"? This could be useful for dealing with redirection scripts:

<ruleset name="Example (partial)">
    <target host="foo.example.com" />
 
    <!--
        foo.example.com lacks HTTPS support of its own
    -->
    <rule from="^http://foo\.example\.com/redirect.php\?u=https(?::|%3[Aa])(?:/|%2[Ff]){2}(.+)&#38;bar=.*&#38;baz="
            to="https://$decode{1}" />
</ruleset>
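An illustrative sketch of the $lc{1} idea (the site and its lowercase-only path requirement are invented for this example):

<ruleset name="Example profiles (illustrative)">
    <target host="example.org" />

    <!-- the HTTPS site serves profile pages only at lowercase paths -->
    <rule from="^http://example\.org/Profile\?user=(\w+)$"
            to="https://example.org/~$lc{1}" />
</ruleset>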

TODO: explain other use cases

What an automated problem-reporting system should cover

The existing proposal(s) seem(s) only to cover rulesets manually disabled by the user. The problem is a limitation of the current UI: If a casual user doesn't bother to click on the icon or the Tools menu entry, they may not be aware that a redirect loop exists. If it is the top-level document that has experienced a redirect loop, they may think there is no rule coverage for that URL. Consequently, they might not disable the ruleset in question. Thus, a problem-reporting system should also handle redirect loops.

It's probably also a good idea to report SSL/TLS protocol errors for sites with active rulesets (certificate-related or not); among other reasons, such errors may not be noticed if they are triggered by third-party content. (Perhaps we should twiddle the pref on Mozilla's TLS error reporter to point at an EFF/Tor Project-owned server...)

(For everything between here and the top of the section, "ruleset" means built-in rulesets only.)

When reporting that a user has manually disabled a (built-in) ruleset, allow optionally reporting whether any user rulesets are active for the URLs for which the built-in ruleset was found disabled - but don't report on the contents of said user rulesets, of course.

TODO: Discussion exists at GitHub issue 1888 with a proposed implementation in pull request 2601, but that implementation of the reporting mechanism appears to need revision because it does not yet ignore user rulesets.

Warn the user more loudly when user rulesets fail to load

...or possibly also when they are being disabled by default.

Theoretically, anyone technically oriented enough to work with user rulesets should be smart enough to validate their XML and regexes by eye (or script). However, people like me sometimes make stupid typos and then (1) get too lazy to check the Error Console and/or (2) visit websites that spam the Error Console heavily for unrelated reasons.

Warn the user more loudly when redirect loops exist

  • Immediately after the redirect loop happens: infobar and/or change of menubar icon
    • Or replace the displayed count of active rulesets with an exclamation mark

Cert overrides

For logical consistency, the torbug:8958 proposal should probably be a new element name rather than an attribute of rule elements; say, <certoverride host="deliveryimages.acm.org" accept_hostname="*.akamaihd.net" /> (a real example adapted from bmo:644640#c127). We should probably also define an "accept_fingerprint" mechanism to override errors other than mismatches. (Observe that pinning the cert fingerprint via accept_fingerprint would be satisfactory for both expiration and chain problems; on the other hand, we MUST NOT define accept_time, as overriding the TLS stack's idea of the current time could cause it to send an OCSP request that is bound to fail because of the cert being expired.)

Should the host attribute be a simple match (as in target elements) or a regex (as in securecookie)? Or should we provide options for both?

TODO: Make sure any specific proposal can handle load-balancing arrangements such as the one used by (www.)frys.com (Fry's Electronics). We probably need plural names: accept_hostnames, accept_fingerprints. (In this specific case, a possible syntax would be <certoverride host="^(?:www\.)?frys\.com$" accept_hostnames="shop1.frys.com,shop2.frys.com,shop3.frys.com,shop4.frys.com,shop5.frys.com,shop6.frys.com" />.)

Friendly-name attribute

Implement some UI/preference for an alternate name attribute, for users who would prefer to avoid seeing TLDs in ruleset names and/or prefer to see official company names written in full; examples based on existing rulesets:

<ruleset name="Reddit.com" friendly_name="Reddit">

<ruleset name="OEIS.org" friendly_name="On-Line Encyclopedia of Integer Sequences">