2019-04-14 22:23:52 +02:00
/*******************************************************************************

    uBlock Origin - a browser extension to block requests.
    Copyright (C) 2019-present Raymond Hill

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see {http://www.gnu.org/licenses/}.

    Home: https://github.com/gorhill/uBlock
*/

'use strict';

// *****************************************************************************
// start of local namespace

{

/*******************************************************************************

  A BidiTrieContainer is mostly a large buffer in which distinct but related
  tries are stored. The memory layout of the buffer is as follows:

    0-255: reserved
    256-259: offset to start of trie data section (=> trie0)
    260-263: offset to end of trie data section (=> trie1)
    264-267: offset to start of character data section (=> char0)
    268-271: offset to end of character data section (=> char1)
    272: start of trie data section
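
  As an illustration only (a minimal sketch, not part of uBlock Origin:
  `readHeader` is a hypothetical helper), the four header offsets listed above
  can be read through a 32-bit view of the buffer, since byte offset 256
  corresponds to 32-bit slot 64. The buffer size and initial values below are
  the defaults the constructor further down appears to use.

```javascript
// Hypothetical helper: read the four header offsets from the byte buffer.
const readHeader = function(buf8) {
    const buf32 = new Uint32Array(buf8.buffer);
    const slot = 256 >>> 2; // byte offset 256 is 32-bit slot 64
    return {
        trie0: buf32[slot + 0], // bytes 256-259
        trie1: buf32[slot + 1], // bytes 260-263
        char0: buf32[slot + 2], // bytes 264-267
        char1: buf32[slot + 3], // bytes 268-271
    };
};

// Build an empty 128 KiB buffer and fill in the header by hand.
const buf8 = new Uint8Array(131072);
const buf32 = new Uint32Array(buf8.buffer);
buf32[64] = 272;   // trie0: trie data starts right after the header
buf32[65] = 272;   // trie1: equal to trie0, no trie data yet
buf32[66] = 65536; // char0: assumed default start of character data
buf32[67] = 65536; // char1: equal to char0, no character data yet

const header = readHeader(buf8);
```

  In an empty container, trie1 equals trie0 and char1 equals char0; the gap
  between trie1 and char0 is the room left for new cells.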

                  +--------------+
    Normal cell:  |     And      |  If "Segment info" matches:
    (aka CELL)    +--------------+      Goto "And"
                  |      Or      |  Else
                  +--------------+      Goto "Or"
                  | Segment info |
                  +--------------+

                  +--------------+
    Boundary cell:|  Right And   |  "Right And" and/or "Left And"
    (aka BCELL)   +--------------+  can be 0 in last-segment condition.
                  |   Left And   |
                  +--------------+
                  |      0       |
                  +--------------+
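
  The "Segment info" field appears to pack two values into one 32-bit word:
  the matching code below reads the segment length from the high 8 bits
  (`v >>> 24`) and the offset into the character data section from the low
  24 bits (`v & 0x00FFFFFF`). A minimal sketch of that packing follows; the
  helper names are hypothetical and do not exist in the original source.

```javascript
// Hypothetical helpers mirroring the bit layout the matching code relies on:
// high 8 bits = segment length, low 24 bits = offset relative to char0.
const packSegmentInfo = (length, offset) =>
    ((length << 24) | (offset & 0x00FFFFFF)) >>> 0;
const segmentLength = v => v >>> 24;
const segmentOffset = v => v & 0x00FFFFFF;

// A "Segment info" of 0 is how a boundary cell is told apart from a normal cell.
const isBoundaryCell = v => v === 0;

// A 200-character segment stored 0xABCDEF bytes past char0.
const info = packSegmentInfo(200, 0xABCDEF);
const roundTrip = [ segmentLength(info), segmentOffset(info) ];
```

  With 8 bits for the length, a single segment can hold at most 255
  characters, and 24 bits of offset can address up to 16 MiB of character
  data.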

  Given the following filters and assuming token is "ad" for all of them:

    -images/ad-
    /google_ad.
    /images_ad.
    _images/ad.

  We get the following internal representation:

  +-----------+     +-----------+     +---+
  |           |---->|           |---->| 0 |
  +-----------+     +-----------+     +---+     +-----------+
  | 0         |  +--|           |     |   |---->| 0         |
  +-----------+  |  +-----------+     +---+     +-----------+
  | ad        |  |  | -         |     | 0 |     | 0         |
  +-----------+  |  +-----------+     +---+     +-----------+
                 |                              | -images/  |
                 |  +-----------+     +---+     +-----------+
                 +->|           |---->| 0 |
                    +-----------+     +---+     +-----------+     +-----------+
                    | 0         |     |   |---->|           |---->| 0         |
                    +-----------+     +---+     +-----------+     +-----------+
                    | .         |     | 0 |  +--|           |  +--|           |
                    +-----------+     +---+  |  +-----------+  |  +-----------+
                                             |  | _         |  |  | /google   |
                                             |  +-----------+  |  +-----------+
                                             |                 |
                                             |                 |  +-----------+
                                             |                 +->| 0         |
                                             |                    +-----------+
                                             |                    | 0         |
                                             |                    +-----------+
                                             |                    | /images   |
                                             |                    +-----------+
                                             |
                                             |  +-----------+
                                             +->| 0         |
                                                +-----------+
                                                | 0         |
                                                +-----------+
                                                | _images/  |
                                                +-----------+
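
  To relate the diagram to the filters, it may help to split each filter at
  the start of its token: the part from the token onward is stored left to
  right, while the part before the token is stored in a separate "left" trie
  that the matching code walks right to left. A minimal sketch, assuming the
  token position is simply the first occurrence of "ad" (`splitAtToken` is a
  hypothetical helper, not part of the original code):

```javascript
// Hypothetical helper: split a filter into the part before its token
// ("left") and the part starting at the token ("right").
const splitAtToken = (filter, token) => {
    const i = filter.indexOf(token);
    if ( i === -1 ) { return null; }
    return { left: filter.slice(0, i), right: filter.slice(i) };
};

const filters = [ '-images/ad-', '/google_ad.', '/images_ad.', '_images/ad.' ];
const parts = filters.map(f => splitAtToken(f, 'ad'));
// parts[0] is { left: '-images/', right: 'ad-' }
// parts[1] is { left: '/google_', right: 'ad.' }
// parts[2] is { left: '/images_', right: 'ad.' }
// parts[3] is { left: '_images/', right: 'ad.' }
```

  Read this way, all four right-hand parts share the leading "ad" cell, and
  the two left-hand parts ending in "_" share the "_" cell before branching
  into "/google" and "/images".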

*/

const PAGE_SIZE   = 65536;
                                                // i32 /  i8
const TRIE0_SLOT  = 256 >>> 2;                  //  64 / 256
const TRIE1_SLOT  = TRIE0_SLOT + 1;             //  65 / 260
const CHAR0_SLOT  = TRIE0_SLOT + 2;             //  66 / 264
const CHAR1_SLOT  = TRIE0_SLOT + 3;             //  67 / 268
const TRIE0_START = TRIE0_SLOT + 4 << 2;        //       272

const CELL_BYTE_LENGTH = 12;
const MIN_FREE_CELL_BYTE_LENGTH = CELL_BYTE_LENGTH * 4;

const CELL_AND = 0;
const CELL_OR = 1;
const BCELL_RIGHT_AND = 0;
const BCELL_LEFT_AND = 1;
const SEGMENT_INFO = 2;

µBlock.BidiTrieContainer = class {

Add HNTrie-based filter classes to store origin-only filters
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622
Following STrie-related work in the above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with over
a hundred entries each:
- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries
The filters in these buckets have to be matched against all
the network requests.
In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.
Now, any filter which fulfills ALL the following conditions will be
processed in a special manner internally:
- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have a `csp=` option; and
- Does not have a `redirect=` option
If a filter does not fulfill ALL the conditions above, there is no
change in behavior.
A filter which matches ALL of the above will be processed in a special
manner:
- The `domain=` option will be decomposed so as to create as many
distinct filters as there are distinct values in the `domain=` option
- This also applies to the `badfilter` version of the filter, which
means it now becomes possible to `badfilter` only one of the
distinct filters without having to `badfilter` all of them.
- The logger will always report these special filters with only a
single hostname in the `domain=` option.
***
[1] HNTrie is currently WASM-ed on Firefox.
2019-04-19 22:33:46 +02:00

    constructor(details) {
        if ( details instanceof Object === false ) { details = {}; }
        const len = (details.byteLength || 0) + PAGE_SIZE-1 & ~(PAGE_SIZE-1);
        this.buf = new Uint8Array(Math.max(len, 131072));
        this.buf32 = new Uint32Array(this.buf.buffer);
        this.buf32[TRIE0_SLOT] = TRIE0_START;
        this.buf32[TRIE1_SLOT] = this.buf32[TRIE0_SLOT];
        this.buf32[CHAR0_SLOT] = details.char0 || 65536;
        this.buf32[CHAR1_SLOT] = this.buf32[CHAR0_SLOT];
    }

    //--------------------------------------------------------------------------
    // Public methods
    //--------------------------------------------------------------------------

    reset() {
        this.buf32[TRIE1_SLOT] = this.buf32[TRIE0_SLOT];
        this.buf32[CHAR1_SLOT] = this.buf32[CHAR0_SLOT];
    }

    matches(iroot, a, i) {
        const buf32 = this.buf32;
        const buf8 = this.buf;
        const char0 = buf32[CHAR0_SLOT];
        const aR = a.length;
        let icell = iroot;
        let al = i;
        let c, v, bl, n;
        for (;;) {
            c = a.charCodeAt(al);
            al += 1;
            // find first segment with a first-character match
            for (;;) {
                v = buf32[icell+SEGMENT_INFO];
                bl = char0 + (v & 0x00FFFFFF);
                if ( buf8[bl] === c ) { break; }
                icell = buf32[icell+CELL_OR];
                if ( icell === 0 ) { return -1; }
            }
            // all characters in segment must match
            n = v >>> 24;
            if ( n > 1 ) {
                n -= 1;
                if ( (al + n) > aR ) { return -1; }
                bl += 1;
                for ( let i = 0; i < n; i++ ) {
                    if ( a.charCodeAt(al+i) !== buf8[bl+i] ) { return -1; }
                }
                al += n;
            }
            // next segment
            icell = buf32[icell+CELL_AND];
            if ( /* icell === 0 || */ buf32[icell+SEGMENT_INFO] === 0 ) {
                const inext = buf32[icell+BCELL_LEFT_AND];
                if ( inext === 0 ) { return (i << 16) | al; }
                const r = this.matchesLeft(inext, a, i);
                if ( r !== -1 ) { return (r << 16) | al; }
                icell = buf32[icell+CELL_AND];
                if ( icell === 0 ) { return -1; }
            }
            if ( al === aR ) { return -1; }
        }
    }

    matchesLeft(iroot, a, i) {
        const buf32 = this.buf32;
        const buf8 = this.buf;
        const char0 = buf32[CHAR0_SLOT];
        let icell = iroot;
        let ar = i;
        let c, v, br, n;
        for (;;) {
            ar -= 1;
            c = a.charCodeAt(ar);
            // find first segment with a first-character match
            for (;;) {
                v = buf32[icell+SEGMENT_INFO];
                n = v >>> 24;
                br = char0 + (v & 0x00FFFFFF) + n - 1;
                if ( buf8[br] === c ) { break; }
                icell = buf32[icell+CELL_OR];
                if ( icell === 0 ) { return -1; }
            }
            // all characters in segment must match
            if ( n > 1 ) {
                n -= 1;
                if ( n > ar ) { return -1; }
                for ( let i = 1; i <= n; i++ ) {
                    if ( a.charCodeAt(ar-i) !== buf8[br-i] ) { return -1; }
                }
                ar -= n;
            }
            // next segment
            icell = buf32[icell+CELL_AND];
            if ( icell === 0 || buf32[icell+SEGMENT_INFO] === 0 ) { return ar; }
            if ( ar === 0 ) { return -1; }
        }
    }

    createOne(args) {
        if ( Array.isArray(args) ) {
            return new this.STrieRef(this, args[0], args[1]);
        }
        // grow buffer if needed
        if ( (this.buf32[CHAR0_SLOT] - this.buf32[TRIE1_SLOT]) < CELL_BYTE_LENGTH ) {
            this.growBuf(CELL_BYTE_LENGTH, 0);
        }
        const iroot = this.buf32[TRIE1_SLOT] >>> 2;
        this.buf32[TRIE1_SLOT] += CELL_BYTE_LENGTH;
        this.buf32[iroot+CELL_OR] = 0;
        this.buf32[iroot+CELL_AND] = 0;
        this.buf32[iroot+SEGMENT_INFO] = 0;
        return new this.STrieRef(this, iroot, 0);
    }

    compileOne(trieRef) {
        return [ trieRef.iroot, trieRef.size ];
    }
|
2019-04-14 22:23:52 +02:00
|
|
|
|
2019-06-19 01:16:39 +02:00
|
|
|
add(iroot, a, i = 0) {
|
|
|
|
const aR = a.length;
|
|
|
|
if ( aR === 0 ) { return 0; }
|
2019-04-14 22:23:52 +02:00
|
|
|
// grow buffer if needed
|
|
|
|
if (
|
2019-06-19 01:16:39 +02:00
|
|
|
(this.buf32[CHAR0_SLOT] - this.buf32[TRIE1_SLOT]) < MIN_FREE_CELL_BYTE_LENGTH ||
|
2019-08-22 23:11:49 +02:00
|
|
|
(this.buf.length - this.buf32[CHAR1_SLOT]) < aR
|
2019-04-14 22:23:52 +02:00
|
|
|
) {
|
2019-08-22 23:11:49 +02:00
|
|
|
this.growBuf(MIN_FREE_CELL_BYTE_LENGTH, aR);
|
2019-04-14 22:23:52 +02:00
|
|
|
}
|
2019-06-19 01:16:39 +02:00
|
|
|
const buf32 = this.buf32;
|
2019-08-22 23:11:49 +02:00
|
|
|
let icell = iroot;
|
|
|
|
// special case: first node in trie
|
|
|
|
        if ( buf32[icell+SEGMENT_INFO] === 0 ) {
            buf32[icell+SEGMENT_INFO] = this.addSegment(a, i, aR);
            return this.addLeft(icell, a, i);
        }
        const buf8 = this.buf;
        const char0 = buf32[CHAR0_SLOT];
        let al = i;
        let inext;
        // find a matching cell: move down
        for (;;) {
            const binfo = buf32[icell+SEGMENT_INFO];
            // skip boundary cells
            if ( binfo === 0 ) {
                icell = buf32[icell+BCELL_RIGHT_AND];
                continue;
            }
            let bl = char0 + (binfo & 0x00FFFFFF);
            // if first character is no match, move to next descendant
            if ( buf8[bl] !== a.charCodeAt(al) ) {
                inext = buf32[icell+CELL_OR];
                if ( inext === 0 ) {
                    inext = this.addCell(0, 0, this.addSegment(a, al, aR));
                    buf32[icell+CELL_OR] = inext;
                    return this.addLeft(inext, a, i);
                }
                icell = inext;
                continue;
            }
            // 1st character was tested
            let bi = 1;
            al += 1;
            // find 1st mismatch in rest of segment
            const bR = binfo >>> 24;
            if ( bR !== 1 ) {
                for (;;) {
                    if ( bi === bR ) { break; }
                    if ( al === aR ) { break; }
                    if ( buf8[bl+bi] !== a.charCodeAt(al) ) { break; }
                    bi += 1;
                    al += 1;
                }
            }
            // all segment characters matched
            if ( bi === bR ) {
                // needle remainder: no
                if ( al === aR ) {
                    return this.addLeft(icell, a, i);
                }
                // needle remainder: yes
                inext = buf32[icell+CELL_AND];
                if ( buf32[inext+CELL_AND] !== 0 ) {
                    icell = inext;
                    continue;
                }
                // add needle remainder
                icell = this.addCell(0, 0, this.addSegment(a, al, aR));
                buf32[inext+CELL_AND] = icell;
                return this.addLeft(icell, a, i);
            }
            // some characters matched
            // split current segment
            bl -= char0;
            buf32[icell+SEGMENT_INFO] = bi << 24 | bl;
            inext = this.addCell(
                buf32[icell+CELL_AND],
                0,
                bR - bi << 24 | bl + bi
            );
            buf32[icell+CELL_AND] = inext;
            // needle remainder: no = need boundary cell
            if ( al === aR ) {
                return this.addLeft(icell, a, i);
            }
            // needle remainder: yes = need new cell for remaining characters
            icell = this.addCell(0, 0, this.addSegment(a, al, aR));
            buf32[inext+CELL_OR] = icell;
            return this.addLeft(icell, a, i);
        }
    }
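The "split current segment" step above can be sketched in isolation. This is a minimal illustration, assuming (as the masks and shifts in the code suggest) that a segment descriptor packs the segment length in the top 8 bits and the character-data offset in the low 24 bits. The helper names `packSegment`, `unpackSegment` and `splitSegment` are hypothetical and not part of BidiTrieContainer.

```js
// Hypothetical helpers, not part of the original class.
// Pack a segment length (top 8 bits) and offset (low 24 bits) into one
// unsigned 32-bit value, as it would read back from a Uint32Array.
const packSegment = (len, offset) => (len << 24 | offset) >>> 0;

const unpackSegment = info => ({
    len: info >>> 24,
    offset: info & 0x00FFFFFF,
});

// Split a segment after `bi` matched characters: the head keeps the first
// `bi` characters, the tail describes the remaining ones. No character
// data is copied, only the two descriptors change.
const splitSegment = (info, bi) => {
    const { len, offset } = unpackSegment(info);
    return {
        head: packSegment(bi, offset),
        tail: packSegment(len - bi, offset + bi),
    };
};

const { head, tail } = splitSegment(packSegment(7, 100), 3);
console.log(unpackSegment(head)); // { len: 3, offset: 100 }
console.log(unpackSegment(tail)); // { len: 4, offset: 103 }
```

Under this packing, a single segment cannot describe more than 255 characters, and offsets must stay below 2^24.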

    addLeft(icell, a, i) {
        const buf32 = this.buf32;
        // fetch boundary cell
        let inext = buf32[icell+CELL_AND];
        // add boundary cell if none exist
        if ( inext === 0 || buf32[inext+SEGMENT_INFO] !== 0 ) {
            const iboundary = this.allocateCell();
            buf32[icell+CELL_AND] = iboundary;
            buf32[iboundary+BCELL_RIGHT_AND] = inext;
            if ( i === 0 ) { return 1; }
            buf32[iboundary+BCELL_LEFT_AND] = this.allocateCell();
            inext = iboundary;
        }
        // shortest match is always first so no point storing whatever is left
        if ( buf32[inext+BCELL_LEFT_AND] === 0 ) {
            return i === 0 ? 0 : 1;
        }
        // bail out if no left segment
        if ( i === 0 ) {
            buf32[inext+BCELL_LEFT_AND] = 0;
            return 1;
        }
        // fetch root cell of left segment
        icell = buf32[inext+BCELL_LEFT_AND];
        // special case: first node in trie
        if ( buf32[icell+SEGMENT_INFO] === 0 ) {
            buf32[icell+SEGMENT_INFO] = this.addSegment(a, 0, i);
            return 1;
        }
        const buf8 = this.buf;
        const char0 = buf32[CHAR0_SLOT];
        let ar = i;
        // find a matching cell: move down
        for (;;) {
            const binfo = buf32[icell+SEGMENT_INFO];
            // skip boundary cells
            if ( binfo === 0 ) {
                icell = buf32[icell+CELL_AND];
                continue;
            }
            const bL = char0 + (binfo & 0x00FFFFFF);
            const bR = bL + (binfo >>> 24);
            let br = bR;
            // if first character is no match, move to next descendant
            if ( buf8[br-1] !== a.charCodeAt(ar-1) ) {
                inext = buf32[icell+CELL_OR];
                if ( inext === 0 ) {
                    inext = this.addCell(0, 0, this.addSegment(a, 0, ar));
                    buf32[icell+CELL_OR] = inext;
                    return 1;
                }
                icell = inext;
                continue;
            }
            // 1st character was tested
            br -= 1;
            ar -= 1;
            // find 1st mismatch in rest of segment
            if ( br !== bL ) {
                for (;;) {
                    if ( br === bL ) { break; }
                    if ( ar === 0 ) { break; }
                    if ( buf8[br-1] !== a.charCodeAt(ar-1) ) { break; }
                    br -= 1;
                    ar -= 1;
                }
            }
            // all segment characters matched
            if ( br === bL ) {
                inext = buf32[icell+CELL_AND];
                // needle remainder: no
                if ( ar === 0 ) {
                    // boundary cell already present
                    if ( inext === 0 || buf32[inext+SEGMENT_INFO] === 0 ) {
                        return 0;
                    }
                    // need boundary cell
                    buf32[icell+CELL_AND] = this.addCell(inext, 0, 0);
                }
                // needle remainder: yes
                else {
                    if ( inext !== 0 ) {
                        icell = inext;
                        continue;
                    }
                    // boundary cell + needle remainder
                    inext = this.addCell(0, 0, 0);
                    buf32[icell+CELL_AND] = inext;
                    buf32[inext+CELL_AND] =
                        this.addCell(0, 0, this.addSegment(a, 0, ar));
                }
            }
            // some segment characters matched
            else {
                // split current cell
                buf32[icell+SEGMENT_INFO] = (bR - br) << 24 | (br - char0);
                inext = this.addCell(
                    buf32[icell+CELL_AND],
                    0,
                    (br - bL) << 24 | (bL - char0)
                );
                buf32[icell+CELL_AND] = inext;
                // needle remainder: no = need boundary cell
                if ( ar === 0 ) {
                    buf32[icell+CELL_AND] = this.addCell(inext, 0, 0);
                }
                // needle remainder: yes = need new cell for remaining characters
                else {
                    buf32[inext+CELL_OR] =
                        this.addCell(0, 0, this.addSegment(a, 0, ar));
                }
            }
            return 1;
        }
|
Add HNTrie-based filter classes to store origin-only filters
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622
Following STrie-related work in the above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with over
a hundred entries each:
- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries
The filters in these buckets have to be matched against all
the network requests.
In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.
Now, any filter which fulfills ALL the following conditions will be
processed in a special manner internally:
- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have a `csp=` option; and
- Does not have a `redirect=` option
If a filter does not fulfill ALL the conditions above, there is no change
in behavior.
A filter which matches ALL of the above will be processed in a special
manner:
- The `domain=` option will be decomposed so as to create as many
distinct filters as there are distinct values in the `domain=` option
- This also applies to the `badfilter` version of the filter, which
means it now becomes possible to `badfilter` only one of the
distinct filters without having to `badfilter` all of them.
- The logger will always report these special filters with only a
single hostname in the `domain=` option.
***
[1] HNTrie is currently WASM-ed on Firefox.
2019-04-19 22:33:46 +02:00
    }
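addLeft() compares the left part of the needle (everything before index `i`) against a stored segment from right to left. A minimal sketch of that backward comparison, using plain strings instead of the shared byte buffer, is given here. The function name `commonSuffixLength` is hypothetical and only illustrates the loop shape; it is not how the trie cells are actually traversed.

```js
// Hypothetical helper: count how many characters of `segment` match the
// characters of `a` that end just before index `i`, scanning backward.
const commonSuffixLength = (segment, a, i) => {
    let br = segment.length;
    let ar = i;
    for (;;) {
        if ( br === 0 ) { break; }   // whole segment matched
        if ( ar === 0 ) { break; }   // needle exhausted on the left
        if ( segment.charCodeAt(br-1) !== a.charCodeAt(ar-1) ) { break; }
        br -= 1;
        ar -= 1;
    }
    return segment.length - br;
};

// The left part of the needle is 'https://' (i = 8).
console.log(commonSuffixLength('ftp://', 'https://example.com/', 8)); // 3
console.log(commonSuffixLength('://', 'https://example.com/', 8));    // 3
```

When the count equals the segment length, the whole segment matched; a smaller, non-zero count corresponds to the "some segment characters matched" case, where the cell is split.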

    optimize() {
        this.shrinkBuf();
        return {
            byteLength: this.buf.byteLength,
            char0: this.buf32[CHAR0_SLOT],
        };
    }

    serialize(encoder) {
        if ( encoder instanceof Object ) {
            return encoder.encode(
                this.buf32.buffer,
                this.buf32[CHAR1_SLOT]
            );
        }
        return Array.from(
            new Uint32Array(
                this.buf32.buffer,
                0,
                this.buf32[CHAR1_SLOT] + 3 >>> 2
            )
        );
    }
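The expression `this.buf32[CHAR1_SLOT] + 3 >>> 2` in serialize() converts a byte count into a count of 32-bit words, rounding up so that a partially used last word is still included. A minimal sketch of that arithmetic follows; `bytesToWords` is a hypothetical name introduced for illustration.

```js
// Hypothetical helper: number of 32-bit words needed to cover `byteLength`
// bytes. Addition binds tighter than >>>, so this is (byteLength + 3) >>> 2.
const bytesToWords = byteLength => byteLength + 3 >>> 2;

console.log(bytesToWords(272)); // 68
console.log(bytesToWords(273)); // 69

// A view limited to the used portion of a larger buffer, as in the
// non-encoder branch of serialize().
const buf32 = new Uint32Array(128);
const usedBytes = 273;
const snapshot = Array.from(
    new Uint32Array(buf32.buffer, 0, bytesToWords(usedBytes))
);
console.log(snapshot.length); // 69
```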

    unserialize(selfie, decoder) {
        const shouldDecode = typeof selfie === 'string';
        let byteLength = shouldDecode
            ? decoder.decodeSize(selfie)
            : selfie.length << 2;
        if ( byteLength === 0 ) { return false; }
        byteLength = byteLength + PAGE_SIZE-1 & ~(PAGE_SIZE-1);
        if ( byteLength > this.buf.length ) {
            this.buf = new Uint8Array(byteLength);
            this.buf32 = new Uint32Array(this.buf.buffer);
        }
        if ( shouldDecode ) {
            decoder.decode(selfie, this.buf.buffer);
        } else {
            this.buf32.set(selfie);
        }
        return true;
    }
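unserialize() rounds the required byte length up to a whole number of pages before deciding whether to reallocate. The sketch below isolates that rounding. It assumes `PAGE_SIZE` is a power of two; the value 65536 is used here for illustration only, and `roundUpToPage` is a hypothetical name.

```js
// Assumed value for illustration; the mask trick requires a power of two.
const PAGE_SIZE = 65536;

// Same expression as in unserialize(): + and - bind tighter than &, so this
// is (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1). Valid for n below 2^31.
const roundUpToPage = n => n + PAGE_SIZE-1 & ~(PAGE_SIZE-1);

console.log(roundUpToPage(1));     // 65536
console.log(roundUpToPage(65536)); // 65536
console.log(roundUpToPage(65537)); // 131072
```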

    //--------------------------------------------------------------------------
    // Private methods
    //--------------------------------------------------------------------------

    allocateCell() {
        let icell = this.buf32[TRIE1_SLOT];
        this.buf32[TRIE1_SLOT] = icell + CELL_BYTE_LENGTH;
        icell >>>= 2;
        this.buf32[icell+0] = 0;
        this.buf32[icell+1] = 0;
        this.buf32[icell+2] = 0;
        return icell;
    }
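allocateCell() is a bump allocator: the slot at `TRIE1_SLOT` holds the byte offset of the end of the trie data, and each allocation advances it by one cell and returns the cell's index in 32-bit words. The standalone sketch below assumes `CELL_BYTE_LENGTH` is 12 (three 32-bit fields, matching the three words zeroed above) and `TRIE1_SLOT` is 65 (byte offset 260 in the layout described at the top of the file); both values are assumptions made for illustration.

```js
// Assumed constants, inferred from the layout comment and the code above.
const TRIE1_SLOT = 260 >>> 2;   // 65
const CELL_BYTE_LENGTH = 12;    // three 32-bit fields per cell

const buf32 = new Uint32Array(256);
buf32[TRIE1_SLOT] = 272;        // trie data section starts at byte 272

const allocateCell = () => {
    let icell = buf32[TRIE1_SLOT];              // byte offset of new cell
    buf32[TRIE1_SLOT] = icell + CELL_BYTE_LENGTH;
    icell >>>= 2;                               // byte offset -> word index
    buf32[icell+0] = 0;
    buf32[icell+1] = 0;
    buf32[icell+2] = 0;
    return icell;
};

const first = allocateCell();
const second = allocateCell();
console.log(first, second);     // 68 71
console.log(buf32[TRIE1_SLOT]); // 296
```

Note that the sketch does not check for available room; in the class, growing the buffer is handled separately by growBuf().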

    addCell(iand, ior, v) {
        const icell = this.allocateCell();
        this.buf32[icell+CELL_AND] = iand;
        this.buf32[icell+CELL_OR] = ior;
        this.buf32[icell+SEGMENT_INFO] = v;
        return icell;
    }

    addSegment(s, l, r) {
        const n = r - l;
        if ( n === 0 ) { return 0; }
        const buf32 = this.buf32;
        const des = buf32[CHAR1_SLOT];
        buf32[CHAR1_SLOT] = des + n;
        const buf8 = this.buf;
        for ( let i = 0; i < n; i++ ) {
            buf8[des+i] = s.charCodeAt(l+i);
        }
        return (n << 24) | (des - buf32[CHAR0_SLOT]);
    }
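addSegment() appends the characters `s[l..r)` to the character data section and returns a descriptor combining the length and the offset relative to `char0`. The sketch below reproduces that behavior on a small standalone buffer and adds a hypothetical `readSegment` helper to show that the descriptor is enough to recover the text. The buffer size and the value of `char0` are arbitrary choices for illustration.

```js
const buf8 = new Uint8Array(64);
const char0 = 16;       // assumed start of the character data section
let char1 = char0;      // current end of the character data section

const addSegment = (s, l, r) => {
    const n = r - l;
    if ( n === 0 ) { return 0; }
    const des = char1;
    char1 += n;
    for ( let i = 0; i < n; i++ ) {
        buf8[des+i] = s.charCodeAt(l+i);    // stored as a single byte
    }
    // >>> 0 yields the unsigned value a Uint32Array slot would hold
    return (n << 24 | des - char0) >>> 0;
};

// Hypothetical helper: decode a descriptor back into a string.
const readSegment = info => {
    const offset = char0 + (info & 0x00FFFFFF);
    const length = info >>> 24;
    return String.fromCharCode(...buf8.subarray(offset, offset + length));
};

const info1 = addSegment('example.com', 0, 7);
const info2 = addSegment('example.com', 7, 11);
console.log(readSegment(info1)); // example
console.log(readSegment(info2)); // .com
```

Because each character is written into a `Uint8Array`, only the low 8 bits of each character code are kept.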

    growBuf(trieGrow, charGrow) {
        const char0 = Math.max(
            (this.buf32[TRIE1_SLOT] + trieGrow + PAGE_SIZE-1) & ~(PAGE_SIZE-1),
            this.buf32[CHAR0_SLOT]
        );
        const char1 = char0 + this.buf32[CHAR1_SLOT] - this.buf32[CHAR0_SLOT];
        const bufLen = Math.max(
            (char1 + charGrow + PAGE_SIZE-1) & ~(PAGE_SIZE-1),
            this.buf.length
        );
        this.resizeBuf(bufLen, char0);
    }
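growBuf() computes a new layout in which the character data section is pushed past the grown trie section, and the total length is rounded up to a page boundary without ever shrinking. The sketch below expresses the same arithmetic as a pure function over plain numbers. `planGrowth` is a hypothetical name, and `PAGE_SIZE` is assumed to be 65536 for illustration.

```js
const PAGE_SIZE = 65536; // assumed value, must be a power of two
const roundUp = n => (n + PAGE_SIZE-1) & ~(PAGE_SIZE-1);

// Hypothetical pure version of the layout computation in growBuf().
// trie1, char0, char1 are current byte offsets; bufLen is the current size.
const planGrowth = (trie1, char0, char1, bufLen, trieGrow, charGrow) => {
    // character data never moves backward
    const newChar0 = Math.max(roundUp(trie1 + trieGrow), char0);
    // existing character data keeps its length
    const newChar1 = newChar0 + char1 - char0;
    // buffer never shrinks here
    const newBufLen = Math.max(roundUp(newChar1 + charGrow), bufLen);
    return { char0: newChar0, char1: newChar1, bufLen: newBufLen };
};

console.log(planGrowth(65000, 65536, 65600, 131072, 1000, 256));
// { char0: 131072, char1: 131136, bufLen: 196608 }
```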
|
2019-04-14 22:23:52 +02:00
|
|
|
|
    shrinkBuf() {
        const char0 = this.buf32[TRIE1_SLOT] + MIN_FREE_CELL_BYTE_LENGTH;
        const char1 = char0 + this.buf32[CHAR1_SLOT] - this.buf32[CHAR0_SLOT];
        const bufLen = char1 + 256;
        this.resizeBuf(bufLen, char0);
    }

    resizeBuf(bufLen, char0) {
        bufLen = bufLen + PAGE_SIZE-1 & ~(PAGE_SIZE-1);
        if (
            bufLen === this.buf.length &&
            char0 === this.buf32[CHAR0_SLOT]
        ) {
            return;
        }
        const charDataLen = this.buf32[CHAR1_SLOT] - this.buf32[CHAR0_SLOT];
        if ( bufLen !== this.buf.length ) {
            const newBuf = new Uint8Array(bufLen);
            newBuf.set(
                new Uint8Array(
                    this.buf.buffer,
                    0,
                    this.buf32[TRIE1_SLOT]
                ),
                0
            );
            newBuf.set(
                new Uint8Array(
                    this.buf.buffer,
                    this.buf32[CHAR0_SLOT],
                    charDataLen
                ),
                char0
            );
            this.buf = newBuf;
            this.buf32 = new Uint32Array(this.buf.buffer);
            this.buf32[CHAR0_SLOT] = char0;
            this.buf32[CHAR1_SLOT] = char0 + charDataLen;
        }
        if ( char0 !== this.buf32[CHAR0_SLOT] ) {
            this.buf.set(
                new Uint8Array(
                    this.buf.buffer,
                    this.buf32[CHAR0_SLOT],
                    charDataLen
                ),
                char0
            );
            this.buf32[CHAR0_SLOT] = char0;
            this.buf32[CHAR1_SLOT] = char0 + charDataLen;
        }
    }
};

/*******************************************************************************

    Class to hold reference to a specific trie

*/

µBlock.BidiTrieContainer.prototype.STrieRef = class {
    constructor(container, iroot, size) {
        this.container = container;
        this.iroot = iroot;
        this.size = size;
    }

    add(s, i = 0) {
        if ( this.container.add(this.iroot, s, i) === 1 ) {
            this.size += 1;
            return true;
        }
        return false;
    }

    matches(a, i) {
        return this.container.matches(this.iroot, a, i);
    }

    dump() {
        for ( const s of this ) {
            console.log(s);
        }
    }

    [Symbol.iterator]() {
        return {
            value: undefined,
            done: false,
            next: function() {
                if ( this.icell === 0 ) {
                    if ( this.forks.length === 0 ) {
                        this.value = undefined;
                        this.done = true;
                        return this;
                    }
                    this.charPtr = this.forks.pop();
                    this.icell = this.forks.pop();
                }
                for (;;) {
                    const idown = this.container.buf32[this.icell+CELL_OR];
                    if ( idown !== 0 ) {
                        this.forks.push(idown, this.charPtr);
                    }
                    const v = this.container.buf32[this.icell+SEGMENT_INFO];
                    let i0 = this.container.buf32[CHAR0_SLOT] + (v & 0x00FFFFFF);
                    const i1 = i0 + (v >>> 24);
                    while ( i0 < i1 ) {
                        this.charBuf[this.charPtr] = this.container.buf[i0];
                        this.charPtr += 1;
                        i0 += 1;
                    }
                    this.icell = this.container.buf32[this.icell+CELL_AND];
                    if ( this.icell === 0 ) {
                        return this.toPattern();
                    }
                    if ( this.container.buf32[this.icell+SEGMENT_INFO] === 0 ) {
                        this.icell = this.container.buf32[this.icell+CELL_AND];
                        return this.toPattern();
                    }
                }
            },
            toPattern: function() {
                this.value = this.textDecoder.decode(
                    new Uint8Array(this.charBuf.buffer, 0, this.charPtr)
                );
                return this;
            },
            container: this.container,
            icell: this.iroot,
            charBuf: new Uint8Array(256),
            charPtr: 0,
            forks: [],
            textDecoder: new TextDecoder()
        };
    }
};

// end of local namespace
// *****************************************************************************

}
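Two bit manipulations in the code above may be easier to follow in isolation: rounding a buffer length up to a page boundary, as done in `resizeBuf()`, and unpacking a segment-info word into a character-data offset and a length, as done in the iterator. The sketch below is an illustration under stated assumptions: the `PAGE_SIZE` value and the helpers `roundUpToPage`, `packSegmentInfo` and `unpackSegmentInfo` are hypothetical names that do not appear in the file.

```javascript
// Assumed value, for illustration only. The mask trick below requires
// PAGE_SIZE to be a power of two.
const PAGE_SIZE = 65536;

// Round n up to the next multiple of PAGE_SIZE. The parentheses make
// explicit the grouping that operator precedence already gives the
// expression in resizeBuf().
const roundUpToPage = n => (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

// A segment-info word keeps the segment length in its upper 8 bits and
// the offset into the character data section in its lower 24 bits.
const packSegmentInfo = (offset, length) =>
    ((length << 24) | (offset & 0x00FFFFFF)) >>> 0;

const unpackSegmentInfo = v => ({
    offset: v & 0x00FFFFFF,
    length: v >>> 24,
});

const info = unpackSegmentInfo(packSegmentInfo(5, 3));
console.log(roundUpToPage(65537), info.offset, info.length);
```

Because the length occupies only 8 bits, a single segment can hold at most 255 bytes under this layout.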