/*******************************************************************************

    uBlock Origin - a browser extension to block requests.
    Copyright (C) 2017-present Raymond Hill

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see {http://www.gnu.org/licenses/}.

    Home: https://github.com/gorhill/uBlock
*/

/* globals WebAssembly */
/* exported HNTrieContainer */

'use strict';

/*******************************************************************************

    The original prototype was to develop an idea I had about using jump indices
    in a TypedArray for quickly matching hostnames (or more generally strings)[1].
    Once I had a working, un-optimized prototype, I realized I had ended up
    with something formally named a "trie": <https://en.wikipedia.org/wiki/Trie>,
    hence the name. I have no idea whether the implementation here or one
    resembling it has been done elsewhere.

    "HN" in HNTrieContainer stands for "HostName", because the trie is
    specialized to deal with matching hostnames -- which is a bit more
    complicated than matching plain strings.

    For example, `www.abc.com` is deemed matching `abc.com`, because the former
    is a subdomain of the latter. The opposite is of course not true.

    The tries created by HNTrieContainer are simply typed arrays filled with
    integers. The matching algorithm is just a matter of reading and comparing
    these integers, and of using them as indices into the array as a way to
    move around in the trie.
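
    As a rough illustration of "integers used as indices", the sketch below
    mirrors how addCell() and addSegment() further down in this file lay out
    their data: a cell is three consecutive uint32 values, and the third one
    packs a segment length and an offset into the character data. The helper
    names (writeCell, packSegment, unpackSegment) and the tiny 16-slot array
    are hypothetical and are not part of HNTrieContainer.

```javascript
// Sketch only: a stand-in for the container's uint32 view.
const buf32 = new Uint32Array(16);

// A cell is three consecutive uint32 values: [ down, right, segment ].
// "down" and "right" are themselves indices of other cells (0 = none).
function writeCell(icell, idown, iright, vseg) {
    buf32[icell+0] = idown;
    buf32[icell+1] = iright;
    buf32[icell+2] = vseg;
}

// A segment descriptor packs a length in the upper 8 bits and an offset
// (relative to the start of the character data) in the lower 24 bits.
function packSegment(length, offset) {
    return (length << 24 | offset) >>> 0;
}

function unpackSegment(vseg) {
    return { length: vseg >>> 24, offset: vseg & 0x00FFFFFF };
}

// Cell at index 4: no "down" sibling, "right" neighbour at index 7,
// and a 3-character segment stored at offset 42.
writeCell(4, 0, 7, packSegment(3, 42));
const seg = unpackSegment(buf32[4+2]);
```

    Moving around the trie then amounts to reading one of the first two
    slots of a cell and using the value as the index of the next cell.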

    [1] To solve <https://github.com/gorhill/uBlock/issues/3193>

    Since this trie is specialized for matching hostnames, the stored
    strings are reversed internally, because of hostname comparison logic:

    Correct matching:
      index 0123456
            abc.com
                  |
        www.abc.com
      index 01234567890

    Incorrect matching (typically used for plain strings):
      index 0123456
            abc.com
            |
        www.abc.com
      index 01234567890
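
    The right-aligned comparison pictured above can be sketched on plain
    strings as follows. This is only an illustration of the comparison
    order and of the label-boundary check, not the trie code itself;
    matchesFromEnd is a hypothetical name. Like matchesJS() in this file, it
    returns the position in the needle where the match starts, or -1.

```javascript
// Sketch only: compare two hostnames starting from their last character.
function matchesFromEnd(needle, entry) {
    let i = needle.length;
    let j = entry.length;
    // A needle shorter than the stored hostname can never match.
    if ( j > i ) { return -1; }
    while ( j !== 0 ) {
        i -= 1;
        j -= 1;
        if ( needle.charCodeAt(i) !== entry.charCodeAt(j) ) { return -1; }
    }
    // Accept an exact match, or a match which starts right after a dot
    // (0x2E), so that `xabc.com` is not mistaken for a subdomain.
    if ( i === 0 || needle.charCodeAt(i-1) === 0x2E ) { return i; }
    return -1;
}

const subdomainPos = matchesFromEnd('www.abc.com', 'abc.com');
const exactPos = matchesFromEnd('abc.com', 'abc.com');
const lookalikePos = matchesFromEnd('xabc.com', 'abc.com');
const reversedPos = matchesFromEnd('abc.com', 'www.abc.com');
```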

    ------------------------------------------------------------------------------

    1st iteration:
      - https://github.com/gorhill/uBlock/blob/ff58107dac3a32607f8113e39ed5015584506813/src/js/hntrie.js
      - Suitable for small to medium sets of hostnames
      - One buffer per trie

    2nd iteration: goal was to make matches() method wasm-able
      - https://github.com/gorhill/uBlock/blob/c3b0fd31f64bd7ffecdd282fb1208fe07aac3eb0/src/js/hntrie.js
      - Suitable for small to medium sets of hostnames
      - Distinct tries all share the same buffer:
        - Reduced memory footprint
          - https://stackoverflow.com/questions/45803829/memory-overhead-of-typed-arrays-vs-strings/45808835#45808835
        - Reusing needle character lookups for all tries
          - This significantly reduces the number of String.charCodeAt() calls
      - Slightly improved creation time

    This is the 3rd iteration: goal was to make add() method wasm-able and
    further improve memory/CPU efficiency.

    This 3rd iteration has the following new traits:
      - Suitable for small to large sets of hostnames
      - Supports multiple trie containers (instantiable)
      - Designed to hold a large number of hostnames
      - Hostnames can be added at any time (instead of all at once)
        - This means pre-sorting is no longer a requirement
      - The trie is always compact
        - There is no longer a need for a `vacuum` method
        - This makes the add() method wasm-able
      - It can return the exact hostname which caused the match
      - serialize()/unserialize() are available for fast loading
      - Each trie reference supports the iteration protocol, thus allowing
        extraction of all the hostnames in the trie

    Its primary purpose is to replace the use of Set() as a means to hold
    a large number of hostnames (ex. FilterHostnameDict in static filtering
    engine).
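
    For comparison, a Set-based lookup with the same subdomain semantics has
    to probe the Set once per label of the needle. The sketch below is an
    assumption about how such a lookup could be written, not the code used
    by FilterHostnameDict; setMatches is a hypothetical name.

```javascript
// Sketch only: subdomain-aware lookup on top of a plain Set.
function setMatches(hostnames, needle) {
    let pos = 0;
    for (;;) {
        // Probe the needle itself, then each parent domain in turn.
        if ( hostnames.has(needle.slice(pos)) ) { return pos; }
        pos = needle.indexOf('.', pos);
        if ( pos === -1 ) { return -1; }
        pos += 1;
    }
}

const hostnames = new Set([ 'abc.com', 'example.org' ]);
const viaSubdomain = setMatches(hostnames, 'www.abc.com');
const viaExact = setMatches(hostnames, 'example.org');
const viaLookalike = setMatches(hostnames, 'xabc.com');
```

    Each probe allocates a substring and hashes it, which is the kind of
    per-lookup work a single walk through a typed array avoids.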

    A HNTrieContainer is mostly a large buffer in which distinct but related
    tries are stored. The memory layout of the buffer is as follows:

      0-254: needle being processed
        255: length of needle
    256-259: offset to start of trie data section (=> trie0)
    260-263: offset to end of trie data section (=> trie1)
    264-267: offset to start of character data section (=> char0)
    268-271: offset to end of character data section (=> char1)
        272: start of trie data section
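
    The layout listed above can be exercised with two views over the same
    buffer, as in the sketch below. The 131072-byte buffer and the 65536
    offset for char0 are assumptions which mirror the constructor defaults
    in this file; the constant names and storeNeedle are hypothetical.

```javascript
// Sketch only: one byte view and one uint32 view over the same memory.
const buf = new Uint8Array(131072);
const buf32 = new Uint32Array(buf.buffer);

// A byte offset maps to a uint32 index by dividing it by four.
const TRIE0_SLOT = 256 >>> 2;
const TRIE1_SLOT = 260 >>> 2;
const CHAR0_SLOT = 264 >>> 2;
const CHAR1_SLOT = 268 >>> 2;

// Both sections start out empty: start offset equals end offset.
buf32[TRIE0_SLOT] = buf32[TRIE1_SLOT] = 272;
buf32[CHAR0_SLOT] = buf32[CHAR1_SLOT] = 65536;

// The needle occupies bytes 0-254 and its length is stored in byte 255.
function storeNeedle(needle) {
    const n = Math.min(needle.length, 254);
    buf[255] = n;
    for ( let i = 0; i < n; i++ ) {
        buf[i] = needle.charCodeAt(i);
    }
    return n;
}

const needleLength = storeNeedle('www.abc.com');
```

    Keeping the needle inside the same buffer is what lets a WASM
    implementation read it without any string handling.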

*/

const HNTRIE_PAGE_SIZE   = 65536;
                                                        // i32 /  i8
const HNTRIE_TRIE0_SLOT  = 256 >>> 2;                   //  64 / 256
const HNTRIE_TRIE1_SLOT  = HNTRIE_TRIE0_SLOT + 1;       //  65 / 260
const HNTRIE_CHAR0_SLOT  = HNTRIE_TRIE0_SLOT + 2;       //  66 / 264
const HNTRIE_CHAR1_SLOT  = HNTRIE_TRIE0_SLOT + 3;       //  67 / 268
const HNTRIE_TRIE0_START = HNTRIE_TRIE0_SLOT + 4 << 2;  //       272


const HNTrieContainer = function(details) {
    if ( details instanceof Object === false ) { details = {}; }
    // Round the requested size up to a multiple of the page size.
    const len = (details.byteLength || 0) + HNTRIE_PAGE_SIZE-1 & ~(HNTRIE_PAGE_SIZE-1);
    // Never allocate less than two pages (131072 bytes).
    this.buf = new Uint8Array(Math.max(len, 131072));
    this.buf32 = new Uint32Array(this.buf.buffer);
    this.needle = '';
    // Both data sections start out empty (start offset === end offset).
    this.buf32[HNTRIE_TRIE0_SLOT] = HNTRIE_TRIE0_START;
    this.buf32[HNTRIE_TRIE1_SLOT] = this.buf32[HNTRIE_TRIE0_SLOT];
    this.buf32[HNTRIE_CHAR0_SLOT] = details.char0 || 65536;
    this.buf32[HNTRIE_CHAR1_SLOT] = this.buf32[HNTRIE_CHAR0_SLOT];
    this.wasmInstancePromise = null;
    this.wasmMemory = null;
    this.readyToUse();
};

HNTrieContainer.prototype = {

    //--------------------------------------------------------------------------
    // Public methods
    //--------------------------------------------------------------------------

    reset: function() {
        this.buf32[HNTRIE_TRIE1_SLOT] = this.buf32[HNTRIE_TRIE0_SLOT];
        this.buf32[HNTRIE_CHAR1_SLOT] = this.buf32[HNTRIE_CHAR0_SLOT];
    },

    readyToUse: function() {
        if ( HNTrieContainer.wasmModulePromise instanceof Promise === false ) {
            return Promise.resolve();
        }
        return HNTrieContainer.wasmModulePromise.then(
            module => this.initWASM(module)
        );
    },

    setNeedle: function(needle) {
        if ( needle !== this.needle ) {
            const buf = this.buf;
            let i = needle.length;
            if ( i > 254 ) { i = 254; }
            buf[255] = i;
            while ( i-- ) {
                buf[i] = needle.charCodeAt(i);
            }
            this.needle = needle;
        }
        return this;
    },

    matchesJS: function(iroot) {
        const char0 = this.buf32[HNTRIE_CHAR0_SLOT];
        let ineedle = this.buf[255];
        let icell = iroot;
        for (;;) {
            if ( ineedle === 0 ) { return -1; }
            ineedle -= 1;
            let c = this.buf[ineedle];
            let v, i0;
            // find first segment with a first-character match
            for (;;) {
                v = this.buf32[icell+2];
                i0 = char0 + (v & 0x00FFFFFF);
                if ( this.buf[i0] === c ) { break; }
                icell = this.buf32[icell+0];
                if ( icell === 0 ) { return -1; }
            }
            // all characters in segment must match
            let n = v >>> 24;
            if ( n > 1 ) {
                n -= 1;
                if ( n > ineedle ) { return -1; }
                i0 += 1;
                const i1 = i0 + n;
                do {
                    ineedle -= 1;
                    if ( this.buf[i0] !== this.buf[ineedle] ) { return -1; }
                    i0 += 1;
                } while ( i0 < i1 );
            }
            // next segment
            icell = this.buf32[icell+1];
            if ( icell === 0 ) { break; }
            if ( this.buf32[icell+2] === 0 ) {
                if ( ineedle === 0 || this.buf[ineedle-1] === 0x2E ) {
                    return ineedle;
                }
                icell = this.buf32[icell+1];
            }
        }
        return ineedle === 0 || this.buf[ineedle-1] === 0x2E ? ineedle : -1;
    },
    matchesWASM: null,
    matches: null,

    createOne: function(args) {
        if ( Array.isArray(args) ) {
            return new this.HNTrieRef(this, args[0], args[1]);
        }
        // grow buffer if needed
        if ( (this.buf32[HNTRIE_CHAR0_SLOT] - this.buf32[HNTRIE_TRIE1_SLOT]) < 12 ) {
            this.growBuf(12, 0);
        }
        const iroot = this.buf32[HNTRIE_TRIE1_SLOT] >>> 2;
        this.buf32[HNTRIE_TRIE1_SLOT] += 12;
        this.buf32[iroot+0] = 0;
        this.buf32[iroot+1] = 0;
        this.buf32[iroot+2] = 0;
        return new this.HNTrieRef(this, iroot, 0);
    },

    compileOne: function(trieRef) {
        return [ trieRef.iroot, trieRef.size ];
    },

    addJS: function(iroot) {
        let lhnchar = this.buf[255];
        if ( lhnchar === 0 ) { return 0; }
        let icell = iroot;
        // special case: first node in trie
        if ( this.buf32[icell+2] === 0 ) {
            this.buf32[icell+2] = this.addSegment(lhnchar);
            return 1;
        }
        // grow buffer if needed
        if (
            (this.buf32[HNTRIE_CHAR0_SLOT] - this.buf32[HNTRIE_TRIE1_SLOT]) < 24 ||
            (this.buf.length - this.buf32[HNTRIE_CHAR1_SLOT]) < 256
        ) {
            this.growBuf(24, 256);
        }
        //
        const char0 = this.buf32[HNTRIE_CHAR0_SLOT];
        let inext;
        // find a matching cell: move down
        for (;;) {
            const vseg = this.buf32[icell+2];
            // skip boundary cells
            if ( vseg === 0 ) {
                icell = this.buf32[icell+1];
                continue;
            }
            let isegchar0 = char0 + (vseg & 0x00FFFFFF);
            // if first character is no match, move to next descendant
            if ( this.buf[isegchar0] !== this.buf[lhnchar-1] ) {
                inext = this.buf32[icell+0];
                if ( inext === 0 ) {
                    this.buf32[icell+0] = this.addCell(0, 0, this.addSegment(lhnchar));
                    return 1;
                }
                icell = inext;
                continue;
            }
            // 1st character was tested
            let isegchar = 1;
            lhnchar -= 1;
            // find 1st mismatch in rest of segment
            const lsegchar = vseg >>> 24;
            if ( lsegchar !== 1 ) {
                for (;;) {
                    if ( isegchar === lsegchar ) { break; }
                    if ( lhnchar === 0 ) { break; }
                    if ( this.buf[isegchar0+isegchar] !== this.buf[lhnchar-1] ) { break; }
                    isegchar += 1;
                    lhnchar -= 1;
                }
            }
            // all segment characters matched
            if ( isegchar === lsegchar ) {
                inext = this.buf32[icell+1];
                // needle remainder: no
                if ( lhnchar === 0 ) {
                    // boundary cell already present
                    if ( inext === 0 || this.buf32[inext+2] === 0 ) { return 0; }
                    // need boundary cell
                    this.buf32[icell+1] = this.addCell(0, inext, 0);
                }
                // needle remainder: yes
                else {
                    if ( inext !== 0 ) {
                        icell = inext;
                        continue;
                    }
                    // boundary cell + needle remainder
                    inext = this.addCell(0, 0, 0);
                    this.buf32[icell+1] = inext;
                    this.buf32[inext+1] = this.addCell(0, 0, this.addSegment(lhnchar));
                }
            }
            // some segment characters matched
            else {
                // split current cell
                isegchar0 -= char0;
                this.buf32[icell+2] = isegchar << 24 | isegchar0;
                inext = this.addCell(
                    0,
                    this.buf32[icell+1],
                    lsegchar - isegchar << 24 | isegchar0 + isegchar
                );
                this.buf32[icell+1] = inext;
                // needle remainder: no = need boundary cell
                if ( lhnchar === 0 ) {
                    this.buf32[icell+1] = this.addCell(0, inext, 0);
                }
                // needle remainder: yes = need new cell for remaining characters
                else {
                    this.buf32[inext+0] = this.addCell(0, 0, this.addSegment(lhnchar));
                }
            }
            return 1;
        }
    },
    addWASM: null,
    add: null,

    optimize: function() {
        this.shrinkBuf();
        return {
            byteLength: this.buf.byteLength,
            char0: this.buf32[HNTRIE_CHAR0_SLOT],
        };
    },

    fromIterable: function(hostnames, add) {
        if ( add === undefined ) { add = 'add'; }
        const trieRef = this.createOne();
        for ( const hn of hostnames ) {
            trieRef[add](hn);
        }
        return trieRef;
    },

    serialize: function(encoder) {
        if ( encoder instanceof Object ) {
            return encoder.encode(
                this.buf32.buffer,
                this.buf32[HNTRIE_CHAR1_SLOT]
            );
        }
        return Array.from(
            new Uint32Array(
                this.buf32.buffer,
                0,
                this.buf32[HNTRIE_CHAR1_SLOT] + 3 >>> 2
            )
        );
    },

    unserialize: function(selfie, decoder) {
        const shouldDecode = typeof selfie === 'string';
        let byteLength = shouldDecode
            ? decoder.decodeSize(selfie)
            : selfie.length << 2;
        byteLength = byteLength + HNTRIE_PAGE_SIZE-1 & ~(HNTRIE_PAGE_SIZE-1);
        if ( this.wasmMemory !== null ) {
            const pageCountBefore = this.buf.length >>> 16;
            const pageCountAfter = byteLength >>> 16;
            if ( pageCountAfter > pageCountBefore ) {
                this.wasmMemory.grow(pageCountAfter - pageCountBefore);
                this.buf = new Uint8Array(this.wasmMemory.buffer);
                this.buf32 = new Uint32Array(this.buf.buffer);
            }
        } else if ( byteLength > this.buf.length ) {
            this.buf = new Uint8Array(byteLength);
            this.buf32 = new Uint32Array(this.buf.buffer);
        }
        if ( shouldDecode ) {
            decoder.decode(selfie, this.buf.buffer);
        } else {
            this.buf32.set(selfie);
        }
        this.needle = '';
    },

    //--------------------------------------------------------------------------
    // Class to hold reference to a specific trie
    //--------------------------------------------------------------------------

    HNTrieRef: function(container, iroot, size) {
        this.container = container;
        this.iroot = iroot;
        this.size = size;
    },

    //--------------------------------------------------------------------------
    // Private methods
    //--------------------------------------------------------------------------

    addCell: function(idown, iright, v) {
        let icell = this.buf32[HNTRIE_TRIE1_SLOT];
        this.buf32[HNTRIE_TRIE1_SLOT] = icell + 12;
        icell >>>= 2;
        this.buf32[icell+0] = idown;
        this.buf32[icell+1] = iright;
        this.buf32[icell+2] = v;
        return icell;
    },

    addSegment: function(lsegchar) {
        if ( lsegchar === 0 ) { return 0; }
        let char1 = this.buf32[HNTRIE_CHAR1_SLOT];
        const isegchar = char1 - this.buf32[HNTRIE_CHAR0_SLOT];
        let i = lsegchar;
        do {
            this.buf[char1++] = this.buf[--i];
        } while ( i !== 0 );
        this.buf32[HNTRIE_CHAR1_SLOT] = char1;
        return (lsegchar << 24) | isegchar;
    },

    growBuf: function(trieGrow, charGrow) {
        const char0 = Math.max(
            (this.buf32[HNTRIE_TRIE1_SLOT] + trieGrow + HNTRIE_PAGE_SIZE-1) & ~(HNTRIE_PAGE_SIZE-1),
            this.buf32[HNTRIE_CHAR0_SLOT]
        );
        const char1 = char0 + this.buf32[HNTRIE_CHAR1_SLOT] - this.buf32[HNTRIE_CHAR0_SLOT];
        const bufLen = Math.max(
            (char1 + charGrow + HNTRIE_PAGE_SIZE-1) & ~(HNTRIE_PAGE_SIZE-1),
            this.buf.length
        );
        this.resizeBuf(bufLen, char0);
    },

    shrinkBuf: function() {
        // Can't shrink WebAssembly.Memory
        if ( this.wasmMemory !== null ) { return; }
        const char0 = this.buf32[HNTRIE_TRIE1_SLOT] + 24;
        const char1 = char0 + this.buf32[HNTRIE_CHAR1_SLOT] - this.buf32[HNTRIE_CHAR0_SLOT];
        const bufLen = char1 + 256;
        this.resizeBuf(bufLen, char0);
    },

    resizeBuf: function(bufLen, char0) {
        bufLen = bufLen + HNTRIE_PAGE_SIZE-1 & ~(HNTRIE_PAGE_SIZE-1);
        if (
            bufLen === this.buf.length &&
            char0 === this.buf32[HNTRIE_CHAR0_SLOT]
        ) {
            return;
        }
        const charDataLen = this.buf32[HNTRIE_CHAR1_SLOT] - this.buf32[HNTRIE_CHAR0_SLOT];
        if ( this.wasmMemory !== null ) {
            const pageCount = (bufLen >>> 16) - (this.buf.byteLength >>> 16);
            if ( pageCount > 0 ) {
                this.wasmMemory.grow(pageCount);
                this.buf = new Uint8Array(this.wasmMemory.buffer);
                this.buf32 = new Uint32Array(this.wasmMemory.buffer);
            }
        } else if ( bufLen !== this.buf.length ) {
            const newBuf = new Uint8Array(bufLen);
            newBuf.set(
                new Uint8Array(
                    this.buf.buffer,
                    0,
                    this.buf32[HNTRIE_TRIE1_SLOT]
                ),
                0
            );
            newBuf.set(
                new Uint8Array(
                    this.buf.buffer,
                    this.buf32[HNTRIE_CHAR0_SLOT],
                    charDataLen
                ),
                char0
            );
            this.buf = newBuf;
            this.buf32 = new Uint32Array(this.buf.buffer);
            this.buf32[HNTRIE_CHAR0_SLOT] = char0;
            this.buf32[HNTRIE_CHAR1_SLOT] = char0 + charDataLen;
        }
        if ( char0 !== this.buf32[HNTRIE_CHAR0_SLOT] ) {
            this.buf.set(
                new Uint8Array(
                    this.buf.buffer,
                    this.buf32[HNTRIE_CHAR0_SLOT],
                    charDataLen
                ),
                char0
            );
            this.buf32[HNTRIE_CHAR0_SLOT] = char0;
            this.buf32[HNTRIE_CHAR1_SLOT] = char0 + charDataLen;
        }
    },

    initWASM: function(module) {
        if ( module instanceof WebAssembly.Module === false ) {
            return Promise.resolve(null);
        }
        if ( this.wasmInstancePromise === null ) {
            const memory = new WebAssembly.Memory({ initial: 2 });
            this.wasmInstancePromise = WebAssembly.instantiate(
                module,
                {
                    imports: {
                        memory,
                        growBuf: this.growBuf.bind(this, 24, 256)
                    }
                }
            );
            this.wasmInstancePromise.then(instance => {
                this.wasmMemory = memory;
                const pageCount = this.buf.byteLength + HNTRIE_PAGE_SIZE-1 >>> 16;
                if ( pageCount > 1 ) {
                    memory.grow(pageCount - 1);
                }
                const buf = new Uint8Array(memory.buffer);
                buf.set(this.buf);
                this.buf = buf;
                this.buf32 = new Uint32Array(this.buf.buffer);
                this.matches = this.matchesWASM = instance.exports.matches;
                this.add = this.addWASM = instance.exports.add;
            });
        }
        return this.wasmInstancePromise;
    },
};

/******************************************************************************/

HNTrieContainer.prototype.HNTrieRef.prototype = {
    add: function(hn) {
        if ( this.container.setNeedle(hn).add(this.iroot) === 1 ) {
            this.size += 1;
            return true;
        }
        return false;
    },
    addJS: function(hn) {
        if ( this.container.setNeedle(hn).addJS(this.iroot) === 1 ) {
            this.size += 1;
            return true;
        }
        return false;
    },
    addWASM: function(hn) {
        if ( this.container.setNeedle(hn).addWASM(this.iroot) === 1 ) {
            this.size += 1;
            return true;
        }
        return false;
    },
    matches: function(needle) {
        return this.container.setNeedle(needle).matches(this.iroot);
    },
    matchesJS: function(needle) {
        return this.container.setNeedle(needle).matchesJS(this.iroot);
    },
    matchesWASM: function(needle) {
        return this.container.setNeedle(needle).matchesWASM(this.iroot);
    },
    [Symbol.iterator]: function() {
        return {
            value: undefined,
            done: false,
            next: function() {
                if ( this.icell === 0 ) {
                    if ( this.forks.length === 0 ) {
                        this.value = undefined;
                        this.done = true;
                        return this;
                    }
                    this.charPtr = this.forks.pop();
                    this.icell = this.forks.pop();
                }
                for (;;) {
                    const idown = this.container.buf32[this.icell+0];
                    if ( idown !== 0 ) {
                        this.forks.push(idown, this.charPtr);
                    }
                    const v = this.container.buf32[this.icell+2];
                    let i0 = this.container.buf32[HNTRIE_CHAR0_SLOT] + (v & 0x00FFFFFF);
                    const i1 = i0 + (v >>> 24);
                    while ( i0 < i1 ) {
                        this.charPtr -= 1;
                        this.charBuf[this.charPtr] = this.container.buf[i0];
                        i0 += 1;
                    }
                    this.icell = this.container.buf32[this.icell+1];
                    if ( this.icell === 0 ) {
                        return this.toHostname();
                    }
                    if ( this.container.buf32[this.icell+2] === 0 ) {
                        this.icell = this.container.buf32[this.icell+1];
                        return this.toHostname();
                    }
                }
            },
            toHostname: function() {
                this.value = this.textDecoder.decode(
                    new Uint8Array(this.charBuf.buffer, this.charPtr)
                );
                return this;
            },
            container: this.container,
            icell: this.iroot,
            charBuf: new Uint8Array(256),
            charPtr: 256,
            forks: [],
            textDecoder: new TextDecoder()
        };
    },
};

/******************************************************************************/

// Code below is to attempt to load a WASM module which implements:
//
// - HNTrieContainer.add()
// - HNTrieContainer.matches()
//
// The WASM module is entirely optional, the JS implementations will be
// used should the WASM module be unavailable for whatever reason.

(function() {
    HNTrieContainer.wasmModulePromise = null;

    // Default to javascript version.
    HNTrieContainer.prototype.matches = HNTrieContainer.prototype.matchesJS;
    HNTrieContainer.prototype.add = HNTrieContainer.prototype.addJS;

    if (
        typeof WebAssembly !== 'object' ||
        typeof WebAssembly.compileStreaming !== 'function'
    ) {
        return;
    }

    // Soft-dependency on vAPI so that the code here can be used outside of
    // uBO (e.g. tests, benchmarks)
    if (
        typeof vAPI === 'object' &&
        vAPI.webextFlavor.soup.has('firefox') === false
    ) {
        return;
    }

    // Soft-dependency on µBlock's advanced settings so that the code here can
    // be used outside of uBO (e.g. tests, benchmarks)
    if (
        typeof µBlock === 'object' &&
        µBlock.hiddenSettings.disableWebAssembly === true
    ) {
        return;
    }

    // The wasm module will work only if CPU is natively little-endian,
    // as we use native uint32 array in our js code.
    const uint32s = new Uint32Array(1);
    const uint8s = new Uint8Array(uint32s.buffer);
    uint32s[0] = 1;
    if ( uint8s[0] !== 1 ) { return; }

    // The directory from which the current script was fetched should also
    // contain the related WASM file. The script is fetched from a trusted
    // location, and consequently so will be the related WASM file.
    let workingDir;
    {
        const url = new URL(document.currentScript.src);
        const match = /[^\/]+$/.exec(url.pathname);
        if ( match !== null ) {
            url.pathname = url.pathname.slice(0, match.index);
        }
        workingDir = url.href;
    }

    HNTrieContainer.wasmModulePromise = fetch(
        workingDir + 'wasm/hntrie.wasm',
        { mode: 'same-origin' }
    ).then(
        WebAssembly.compileStreaming
    ).catch(reason => {
        HNTrieContainer.wasmModulePromise = null;
        log.info(reason);
    });
})();