The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
This page is not intended for developers to read directly. It is a scratchpad for me to collect my ideas if/when I file bug reports for these features.
...so as to reduce noise in an automated problem-reporting system for users who are testing changes to built-in rulesets (and would currently need to disable the built-in ruleset in question).
Perhaps a first draft of this could be, "If a user ruleset has a name conflict with a built-in ruleset, (optionally) prefer the user ruleset and log the situation as a warning rather than an error" (user-visible UI warnings MAY be provided, and any part of the behavior MAY be conditional upon a configuration setting and/or an attribute of the user ruleset's ruleset element). (NB: this GitHub ticket by a regular developer seems to be describing this)
A later implementation could cover the case where the user ruleset has a different name from a built-in ruleset, whose name would be specified through an attribute like supersedes_builtin.
Suppose there were a built-in ruleset named "Example0" that covered the domain bar.example.com, and badpage.htm on that domain suddenly broke (such as by redirecting to http). We should let users write something like:
<userexclusionset name="Example0 (emergency fix)" applies_to_ruleset="Example0">
<target host="bar.example.com" />
<exclusion pattern="^http://bar\.example\.com/badpage\.htm(\?|$)" />
</userexclusionset>
TODO: Consider whether the UI design demands a name attribute (should the user exclusion set be given its own line in the ruleset list, or should it silently apply without a checkbox choice?). Also: whether targets are needed, what syntax should be defined for "exclude from securecookie", and whether we should provide a mechanism to add coverage to an existing ruleset rather than merely exclusions (Tor Trac ticket 10033 and GitHub ticket 296 mention a work-in-progress Chrome implementation of the last).
The proposed attribute name is softdowngrade. It behaves like downgrade, except that the code should check whether there is an enabled rule that would rewrite the opposite way (as there would be if both rulesets were enabled in the example below) and ignore the softdowngrade if so.
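The intended precedence check could be sketched roughly as follows (a minimal sketch, not the extension's real data model; the dict-based rule representation and the function name are assumptions):

```python
import re

def apply_rules(url, rules):
    """Return the first applicable rewrite of url, skipping any
    softdowngrade rule whose output another enabled rule would
    rewrite back the other way. Each rule is modeled as a dict with
    'from', 'to', and an optional 'softdowngrade' flag."""
    for rule in rules:
        if not re.search(rule["from"], url):
            continue
        rewritten = re.sub(rule["from"], rule["to"], url)
        if rule.get("softdowngrade") and any(
                re.search(other["from"], rewritten)
                for other in rules if other is not rule):
            continue  # an opposing enabled rule exists; ignore the softdowngrade
        return rewritten
    return url
```

With only the "(partial)" ruleset enabled, the softdowngrade fires; with the mixed-content ruleset also enabled, its upward rule suppresses it, avoiding a redirect loop.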
<!--
For rules that cause mixed content, see Example1-mixedcontent.xml.
-->
<ruleset name="Example1 (partial)">
<target host="foo.example.com" />
<exclusion pattern="^http://foo\.example\.com/badpage\.htm(?:\?|$)" />
<rule from="^http:"
to="https:" />
<rule from="^https://foo\.example\.com/badpage\.htm(?=\?|$)"
to="http://foo.example.com/badpage.htm" softdowngrade="1" />
</ruleset>
<!--
For rules without mixed content, see Example1.xml.
-->
<ruleset name="Example1 (mixed content)" platform="mixedcontent">
<target host="foo.example.com" />
<rule from="^http://foo\.example\.com/badpage\.htm(?=\?|$)"
to="https://foo.example.com/badpage.htm" />
</ruleset>
An example for a real site, specifically an xkcd comic that has unsecurable mixed scripts and fails to display any image when those scripts are blocked:
<!--
For mixed-content rules, see xkcd-mixedcontent.xml.
[...]
Mixed content:
- Images, on:
- (www.) from imgs * [obsolete?]
- m from imgs *
- Scripts, on:
- (www.).../1037/ from umwelt **
* Secured by us, doesn't trip MCB
** Unsecurable [TODO: specify]
-->
<ruleset name="xkcd (partial)">
<!-- targets, unrelated tests, and unrelated exclusions omitted -->
<!-- Mixed content: -->
<exclusion pattern="^http://(?:www\.)?xkcd\.com/1037/" />
<!-- other rules omitted -->
<!-- Previous/Next links between comics are relative: -->
<rule from="^https://(www\.)?xkcd\.com/(?=1037/)"
to="http://$1xkcd.com/" softdowngrade="1" />
<test url="http://xkcd.com/1037/" />
</ruleset>
<!--
For rules without mixed content, see xkcd.xml.
[...]
Mixed content:
- Images, on:
- (www.) from imgs * [obsolete?]
- m from imgs *
- Scripts, on:
- (www.).../1037/ from umwelt **
* Secured by us, doesn't trip MCB
** Unsecurable [TODO: specify]
-->
<ruleset name="xkcd (mixed content)" platform="mixedcontent">
<!-- other targets omitted -->
<target host="xkcd.com" />
<target host="www.xkcd.com" />
<!-- other rules omitted -->
<rule from="^http://(www\.)?xkcd\.com/(?=1037/)"
to="https://$1xkcd.com/" />
<test url="http://xkcd.com/1037/" />
</ruleset>
As a further example, a similar situation has already been found in the search feature on Stack Exchange sites (the downgrade in Stack-Exchange.xml ought to be a softdowngrade). (Does the fighting between the current Stack-Exchange.xml and Stack-Exchange-mixedcontent.xml cause redirect loops if the latter is manually enabled? This needs testing.)
This feature should be used only for pages that have true (unsecurable) mixed content that breaks major functionality (including, but not limited to, layout breakage severe enough to make the site unusable by an experienced, normally-abled user).
To consider: Instead of defining softdowngrade, is it better simply to give the existing downgrade attribute such "soft" behavior (i.e., explicitly give normal rules precedence over downgrade rules)? IIRC, currently, no code in the HTTPS Everywhere browser extension actually reads the downgrade attribute; it is read only by validation scripts as part of the build process.
- aia - Currently, rulesets for sites with incomplete certificate chains are simply default_off'd. This is a potential alternative for browsers that support fetching intermediate certificates in accord with Authority Information Access fields; currently it seems equivalent to chrome according to this comment.
- mixedpost - Really means false mixed POST; Firefox checks for these separately from mixed content, so a fix for bmo:878890 might not automatically address this; true mixed POSTs should(?) generally be handled by splitting coverage of the referring page to a default-off ruleset if major functionality is broken.
- mixedxhr - when enforcement of same-origin policies causes problems with XMLHttpRequest calls (see torbug:7851).
- softmixed (or falsemcb?) - exact definition to be decided later; needed in order to distinguish "good" and "bad" MCB implementations.
- tor - needed if we ever want to allow clearnet domains to be rewritten to hidden services; mailing list discussion exists on whether this is worth doing at all; newer tickets include GitHub #3798.
These MUST NOT disable any rulesets without explicitly warning the user first. Instead, they SHOULD clarify the wording of the browser's TLS error pages, specifically to explain that the needed TLS feature may be broken by an intercepting proxy or webmaster misconfiguration. An initial implementation MAY treat these as no-ops. That is, in order to enable the corresponding rulesets by default, the browser addon MAY choose to pretend that all supported browsers match these platform values.
The behavior described above deviates from that for the existing platform attribute; thus a new attribute needs to be defined, perhaps subplatform.
- letsencrypt - The Let's Encrypt CA is often reported problematic on Chrome for Windows XP, presumably due to the lack of a required intermediate certificate in the Microsoft-supplied certificate database (TODO: or the signature algorithm?). This is a subplatform due to the deprecation of Windows XP.
- sni - for sites that require SNI in order for a matching certificate to be obtained, such as those that are hosted on WebFaction or that use Cloudflare's free service tier. Compare the snionly attribute in Chromium's HSTS preload list. (Dubious because: the non-SNI platform most likely to be encountered is Firefox with the Convergence addon [or its fork FreeSpeechMe, when configured to validate non-Namecoin sites?], but neither addon is still being maintained.)
- tls13 - for sites that require TLS 1.3 or higher.
Suggested syntax: <dnsoverride host="sd.sharethis.com" ipv4_blacklist="184.72.49.139" />
(The example is only illustrative; the IP address is no longer accurate.)
To be used to work around broken load-balancing arrangements.
There should also be positive ipv4 and ipv6 attributes, to force the use of the specified IP(s) for the specified hostname(s), even if the browser receives a single A record for some other IP.
(Of course, ipv6_blacklist should be available too.)
We should probably have different attribute names to specify hosts via either simple matches (like target) or regexes (like securecookie).
For a dnsoverride element to be effective for a given host, that host MUST also be listed in the ruleset's targets.
An attempt to blacklist an IP address corresponding to the only available A or AAAA record for a domain SHOULD be treated as a no-op and MUST generate a log message (TODO: at what severity?).
TODO: Decide how multiple IPs or hostnames should be delimited (comma? pipe? ...)
A Firefox implementation might depend on bmo:652295, though that bug isn't quite about overriding the built-in DNS resolver...
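The blacklist semantics above, including the SHOULD-be-a-no-op case, could be sketched like this (the function name and data shapes are hypothetical):

```python
import logging

def filter_addresses(resolved, blacklist):
    """Drop blacklisted addresses from a DNS answer, except that
    blacklisting every available address is treated as a no-op and
    logged (severity still TBD in the proposal).

    resolved  - list of A/AAAA addresses the resolver returned
    blacklist - addresses named in ipv4_blacklist / ipv6_blacklist
    """
    kept = [ip for ip in resolved if ip not in blacklist]
    if not kept:
        logging.warning("dnsoverride would blacklist every address; ignoring")
        return resolved
    return kept
```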
That is, allow a rule to specify multiple rewrite destinations among which one will be chosen randomly, to be used in cases where equivalent content is available on multiple hostnames. Some examples for real sites (just to show the syntax, not to demonstrate best practices):
Shareaholic:
<ruleset name="Shareaholic.com (partial)">
<!-- other targets omitted -->
<target host="cdn.shareaholic.com" />
<!-- other rules omitted -->
<!-- CNAME rotates between two buckets: -->
<lbrule from="^http://cdn\.shareaholic\.com/">
<lbentry to="https://dtym7iokkjlif.cloudfront.net/" />
<lbentry to="https://dsms0mj1bbhn4.cloudfront.net/" />
</lbrule>
</ruleset>
Speed Demos Archive:
<ruleset name="Speed Demos Archive.com (partial)">
<!-- other targets omitted -->
<target host="speeddemosarchive.com" />
<target host="www.speeddemosarchive.com" />
<!-- Other paths have no equivalent: -->
<exclusion pattern="^http://(www\.)?speeddemosarchive\.com/(?!favicon\.ico)" />
<test url="http://speeddemosarchive.com/favicon.ico" /> <!-- etc. -->
<!-- other rules omitted, specifically direct rewrites for forum and kb -->
<lbrule from="^http://(www\.)?speeddemosarchive\.com/">
<lbentry to="https://forum.speeddemosarchive.com/" />
<lbentry to="https://kb.speeddemosarchive.com/" />
</lbrule>
</ruleset>
Tumblr (unmaintained):
<!--
...
Problematic subdomains:
- media (cert only matches *.media)
NB: 25 and 37 now work as is; 34 and 35 no longer exist.
...
-->
<ruleset name="Tumblr.com (partial)">
<!-- other targets omitted -->
<target host="media.tumblr.com" />
<!-- other rules omitted -->
<lbrule from="^http://media\.tumblr\.com/">
<!--
These currently work but are no longer officially used for new images:
lbentry to="https://24.media.tumblr.com/" /
lbentry to="https://25.media.tumblr.com/" /
lbentry to="https://26.media.tumblr.com/" /
lbentry to="https://27.media.tumblr.com/" /
lbentry to="https://28.media.tumblr.com/" /
lbentry to="https://29.media.tumblr.com/" /
lbentry to="https://30.media.tumblr.com/" /
lbentry to="https://31.media.tumblr.com/" /
lbentry to="https://33.media.tumblr.com/" /
lbentry to="https://36.media.tumblr.com/" /
lbentry to="https://37.media.tumblr.com/" /
lbentry to="https://38.media.tumblr.com/" /
lbentry to="https://40.media.tumblr.com/" /
lbentry to="https://41.media.tumblr.com/" /
lbentry to="https://45.media.tumblr.com/" /
lbentry to="https://49.media.tumblr.com/" /
-->
<lbentry to="https://65.media.tumblr.com/" />
<lbentry to="https://66.media.tumblr.com/" />
<lbentry to="https://67.media.tumblr.com/" />
</lbrule>
</ruleset>
TODO: Is this example obsolete? (That is, does media.tumblr.com now have a valid cert to allow a direct rewrite?)
To be reevaluated: The rewriting of any given URL should be deterministic within a browser session and/or a given time interval; that is, the chosen rewrite should be memoized.
If it is considered undesirable to repeatedly consume entropy from the browser's PRNG, perhaps a suitable pseudorandom number might be some HMAC using the originally-requested URL as the message and a single CSPRNG output (generated once per session or time interval) as the key.
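The HMAC idea above might look like the following sketch (the function name is hypothetical, and the key lifetime and digest-to-index reduction are assumptions):

```python
import hashlib
import hmac
import os

# Regenerated once per session or time interval, per the proposal.
SESSION_KEY = os.urandom(32)

def choose_lbentry(url, entries, key=None):
    """Deterministically pick one <lbentry> destination for a URL:
    HMAC the originally-requested URL with the per-session key, then
    reduce the digest modulo the number of entries."""
    digest = hmac.new(key if key is not None else SESSION_KEY,
                      url.encode("utf-8"), hashlib.sha256).digest()
    return entries[int.from_bytes(digest, "big") % len(entries)]
```

Within one session the same URL always maps to the same entry, while different URLs spread roughly evenly across the entries.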
such as letter case transformations and percent (un)encoding. Perhaps the to field could contain something like $lc{1} to mean "lowercase version of the string matched by the first parens in the corresponding from"? This could be useful for dealing with redirection scripts:
<ruleset name="Example (partial)">
<target host="foo.example.com" />
<!--
foo.example.com lacks HTTPS support of its own
-->
<rule from="^http://foo\.example\.com/redirect\.php\?u=https(?::|%3[Aa])(?:/|%2[Ff]){2}(.+)&amp;bar=.*&amp;baz="
to="https://$decode{1}" />
</ruleset>
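One way the proposed $op{n} operators could be expanded alongside the existing $1-style backreferences (a sketch; the operator names lc and decode come from the text above, the expansion function itself is an assumption):

```python
import re
from urllib.parse import unquote

# Hypothetical transformation operators, keyed by the name in $op{n}.
OPS = {"lc": str.lower, "decode": unquote}

def expand_to(template, match):
    """Expand $1-style backreferences and $op{n} transformations in a
    rule's to= template, given a regex match of its from= pattern."""
    def repl(m):
        if m.group("op"):
            return OPS[m.group("op")](match.group(int(m.group("opn"))))
        return match.group(int(m.group("n")))
    return re.sub(r"\$(?:(?P<op>[a-z]+)\{(?P<opn>\d+)\}|(?P<n>\d+))",
                  repl, template)
```

For the redirection-script example, expand_to("https://$decode{1}", m) percent-decodes the captured target URL.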
TODO: explain other use cases
The existing proposal(s) seem(s) only to cover rulesets manually disabled by the user. The problem is a limitation of the current UI: If a casual user doesn't bother to click on the icon or the Tools menu entry, they may not be aware that a redirect loop exists. If it is the top-level document that has experienced a redirect loop, they may think there is no rule coverage for that URL. Consequently, they might not disable the ruleset in question. Thus, a problem-reporting system should also handle redirect loops.
It's probably also a good idea to report SSL/TLS protocol errors for sites with active rulesets (certificate-related or not); among other reasons, such errors may not be noticed if they are triggered by third-party content. (Perhaps we should twiddle the pref on Mozilla's TLS error reporter to point at an EFF/Tor Project-owned server...)
(For everything between here and the top of the section, "ruleset" means built-in rulesets only.)
If reporting that a user has manually disabled a (built-in) ruleset, allow optionally reporting whether there are any user rulesets that are active for the URLs for which the built-in rulesets were found disabled - but don't report on the contents of said user rulesets, of course.
TODO: Discussion exists at GitHub issue 1888 with a proposed implementation in pull request 2601, but that implementation of the reporting mechanism appears to need revision because it does not yet ignore user rulesets.
...or possibly also when they are being disabled by default.
Theoretically, anyone technically oriented enough to work with user rulesets should be smart enough to validate their XML and regexes by eye (or script). However, people like me sometimes make stupid typos and then (1) get too lazy to check the Error Console and/or (2) visit websites that spam the Error Console heavily for unrelated reasons.
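A trivial pre-flight linter along these lines (stdlib only; the function name and the exact set of regex-bearing attributes checked are assumptions) would catch most such typos before the extension ever loads the file:

```python
import re
import xml.etree.ElementTree as ET

# Which attributes hold regexes, per element type. target host= is a
# simple match, so it is deliberately not checked here.
REGEX_ATTRS = {"rule": ("from",), "exclusion": ("pattern",),
               "securecookie": ("host", "name")}

def lint_ruleset(xml_text):
    """Return a list of problems: an XML parse error, or any regex
    attribute that fails to compile."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"XML error: {e}"]
    problems = []
    for elem in root.iter():
        for attr in REGEX_ATTRS.get(elem.tag, ()):
            value = elem.get(attr)
            if value is not None:
                try:
                    re.compile(value)
                except re.error as e:
                    problems.append(f"<{elem.tag} {attr}=...>: {e}")
    return problems
```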
For logical consistency, the [[torbug:8958]] proposal should probably be a new element name rather than an attribute of rule elements; say, <certoverride host="deliveryimages.acm.org" accept_hostname="*.akamaihd.net" /> (a real example adapted from bmo:644640#c127). We should probably also define an "accept_fingerprint" mechanism to override errors other than mismatches. (Observe that pinning the cert fingerprint via accept_fingerprint would be satisfactory for both expiration and chain problems; on the other hand, we MUST NOT define accept_time, as overriding the TLS stack's idea of the current time could cause it to send an OCSP request that is bound to fail because of the cert being expired.)
Should the host attribute be a simple match (as in target elements) or a regex (as in securecookie)? Or should we provide options for both?
TODO: Make sure any specific proposal can handle load-balancing arrangements such as the one used by (www.)frys.com (Fry's Electronics). We probably need plural names: accept_hostnames, accept_fingerprints. (In this specific case, a possible syntax would be <certoverride host="^(?:www\.)?frys\.com$" accept_hostnames="shop1.frys.com,shop2.frys.com,shop3.frys.com,shop4.frys.com,shop5.frys.com,shop6.frys.com" />.)
Implement some UI/preference for an alternate name attribute, for users who would prefer to avoid seeing TLDs in ruleset names and/or prefer to see official company names written in full; examples based on existing rulesets:
<ruleset name="Reddit.com" friendly_name="Reddit">
<ruleset name="OEIS.org" friendly_name="On-Line Encyclopedia of Integer Sequences">