external/uBlock - Forgejo: Beyond coding. We forge.


mirror of https://github.com/gorhill/uBlock.git synced 2024-11-11 17:41:03 +01:00

Author	SHA1	Message	Date
Raymond Hill	adabb56dc9	Do not store impossible to match filters in HNTrie Consider the two following filters: example.com www.example.com This commit makes it so that if the first filter is already present in a given HNTrie, the second filter will not be stored, since HNTrie will _always_ return the first filter as a match whenever the hostname to match is example.com or any subdomain of example.com. The detection of such pointless filters is virtually free when adding a hostname to an HNTrie instance (given how data is stored in the trie), so in practice no overhead is incurred to detect such pointless filters. The ability to ignore impossible to match filters in HNTrie instances will _especially_ benefit those using large hosts files. Examples of how this helps using real configurations: - Default lists: 444 filters out of 100,382 were ignored as a result of this commit. - Default lists + "Energized Ultimate Protection": 283,669 filters out of 903,235 were ignored as a result of this commit. Side note: There was no measurable difference between the two configurations above in the performance of the matching algorithm as reported by the built-in benchmark tool.	2019-04-29 13:15:16 -04:00
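The idea in the commit above can be illustrated with a minimal sketch. It assumes a plain label-based trie walked from the rightmost label toward the leftmost; the class name `HostnameTrieSketch` and its methods are hypothetical and do not reflect the actual HNTrie data layout or API.

```javascript
// Hypothetical, simplified stand-in for HNTrie (not uBO's implementation).
// Hostnames are stored label by label, from the rightmost label ("com")
// toward the leftmost ("www").
class HostnameTrieSketch {
    constructor() {
        this.root = { children: new Map(), terminal: false };
        this.ignored = 0; // count of hostnames skipped as impossible to match
    }

    // Returns true if the hostname was stored, false if an already stored
    // hostname (the same one or a parent domain) makes it pointless.
    add(hostname) {
        let node = this.root;
        for ( const label of hostname.split('.').reverse() ) {
            let next = node.children.get(label);
            if ( next === undefined ) {
                next = { children: new Map(), terminal: false };
                node.children.set(label, next);
            }
            node = next;
            // A stored hostname ends here: it will always match first.
            if ( node.terminal ) {
                this.ignored += 1;
                return false;
            }
        }
        node.terminal = true;
        return true;
    }

    // True if the hostname equals, or is a subdomain of, a stored hostname.
    matches(hostname) {
        let node = this.root;
        for ( const label of hostname.split('.').reverse() ) {
            node = node.children.get(label);
            if ( node === undefined ) { return false; }
            if ( node.terminal ) { return true; }
        }
        return false;
    }
}
```

In this sketch the detection costs nothing extra, because `add()` has to walk the same path anyway; adding `example.com` and then `www.example.com` stores only the first.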
Raymond Hill	ac58b8e688	Make token hashes fit within a 32-bit integer The staticNetFilteringEngine uses token hashes to store/lookup filters into Map objects. Before this commit, the tokens were encoded into token hashes as JS numbers (not exceeding MAX_SAFE_INTEGER) using at most the 8 first characters of the token. With this commit, token hashes are now restricted to fit into 32-bit integers, and are derived from at most the 7 first characters. This improves filter look-up performance as per built-in benchmark().	2019-04-28 10:15:15 -04:00
Raymond Hill	96dce22218	Increase resolution of known-token lookup table Related commit: - `69a43e07c4` Using 32 bits of token hash rather than just the 16 lower bits does help discard more unknown tokens. Using the default filter lists, the known-token lookup table is populated by 12,276 entries, out of 65,536, thus making the case that theoretically there is a lot of possible tokens which can be discarded. In practice, running the built-in staticNetFilteringEngine.benchmark() with default filter lists, I find that 1,518,929 tokens were skipped out of 4,441,891 extracted tokens, or 34%.	2019-04-27 08:18:01 -04:00
Raymond Hill	69a43e07c4	Ignore unknown tokens in urlTokenizer.getTokens() Given that all tokens extracted from one single URL are potentially iterated multiple times in a single URL-matching cycle, it pays to ignore extracted tokens which are known to not be used anywhere in the static filtering engine. The gain in processing a single network request in the static filtering engine can become especially high when dealing with long and random-looking URLs, which URLs have a high likelihood of containing a majority of tokens which are known to not be in use.	2019-04-26 17:14:00 -04:00
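A rough sketch of the known-token idea described in the two commits above, under stated assumptions: the hash function (FNV-1a), the tokenizing regex, and the names `hashToken`, `registerToken` and `getTokens` are hypothetical stand-ins, not uBO's actual `urlTokenizer` code. Only the 65,536-entry table size comes from the commit message.

```javascript
// Hypothetical 32-bit string hash (FNV-1a); uBO uses its own token hash.
const hashToken = token => {
    let h = 0x811c9dc5;
    for ( let i = 0; i < token.length; i++ ) {
        h ^= token.charCodeAt(i);
        h = Math.imul(h, 0x01000193);
    }
    return h >>> 0;
};

// Lookup table with 65,536 slots, one byte per slot.
const knownTokens = new Uint8Array(65536);

// Fold all 32 bits of the hash into a 16-bit slot index (an assumption
// about how the full hash could be used with a 65,536-entry table).
const slotOf = th => (th & 0xFFFF) ^ (th >>> 16);

// Called for every token used by at least one filter.
const registerToken = token => {
    knownTokens[slotOf(hashToken(token))] = 1;
};

// Extract tokens from a URL, dropping those known to be unused.
// False positives are possible (hash collisions), false negatives are not.
const getTokens = url => {
    const candidates = url.toLowerCase().match(/[0-9a-z%]+/g) || [];
    return candidates.filter(token =>
        knownTokens[slotOf(hashToken(token))] === 1
    );
};
```

Because unknown tokens are dropped once at extraction time, every later pass over the token list in the same matching cycle iterates fewer entries.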
Raymond Hill	19ece97b0c	Leverage compile-time token information in new filter classes Related commit: - `99390390fc` The token information available at compile time can be stored in the filter to be used at match() time. This allows the use of startsWith() rather than a more costly indexOf() call as a first quick test to detect mismatches.	2019-04-26 11:16:47 -04:00
Raymond Hill	99390390fc	Introduce three more specialized filter classes to avoid regexes Performance- and memory-related work. Three more classes have been created to avoid regex-based filters internally. Purpose is to enforce filters which have only one single wildcard in their pattern, a common occurrence. The filter pattern is split in two literal string segments. Similar as above, with the added condition that the filter is hostname-anchored (`||`). The "Wildcard2" variant is a further specialization to enforce filters where the only wildcard is immediately preceded by the `^` special character, again a very common occurrence. Using two literal string segments in lieu of regexes makes it possible to quickly detect a mismatch by just testing the first segment. Additionally, this reduces memory footprint as regexes are much more expensive memory-wise than plain strings. These three new filter classes make it possible to replace 5276 regex-based filters internally with plain string-based filters. Often-called isHnAnchored() has been further fine-tuned to avoid as much work as possible. I have also observed that using an arrow function for closure purposes measurably helps performance, as per built-in benchmark.	2019-04-25 17:48:08 -04:00
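The two-segment approach described above can be sketched as follows. This is a minimal illustration for the plain (non-anchored) single-wildcard case only; the class name `SingleWildcardFilterSketch` is hypothetical, and the hostname-anchored and `^` variants are not covered.

```javascript
// Hypothetical sketch: match a pattern containing exactly one `*` using
// two literal string segments instead of a regex.
class SingleWildcardFilterSketch {
    constructor(pattern) {
        const pos = pattern.indexOf('*');
        if ( pos === -1 || pattern.indexOf('*', pos + 1) !== -1 ) {
            throw new Error('pattern must contain exactly one wildcard');
        }
        this.left = pattern.slice(0, pos);
        this.right = pattern.slice(pos + 1);
    }

    match(url) {
        // Quick rejection: most URLs fail on the first segment alone.
        const i = url.indexOf(this.left);
        if ( i === -1 ) { return false; }
        // The second segment must appear after the end of the first one.
        // Testing against the first occurrence of `left` is sufficient,
        // since it leaves the largest remainder to search.
        return url.indexOf(this.right, i + this.left.length) !== -1;
    }
}
```

For example, `new SingleWildcardFilterSketch('/ads/*.gif').match(url)` behaves like testing `url` against the regex `/\/ads\/.*\.gif/`, while holding only two short strings in memory.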
Raymond Hill	fa83744b58	Use a sequence of base 64 numbers to encode array buffers The purpose of using a custom base128 encoder is to convert array buffers into strings, to allow a direct string-to-array buffer conversion at load time: string => array buffer Whereas a JSON array would require an extra step: JSON array as string => JS array => array buffer Turns out that the current use of a custom base128 encoding results in a significantly larger selfie storage usage when converting array buffers into strings. Speculation: possibly the browser converts the strings to save into JSON strings internally. Since the custom base128 encoder is likely to cause the resulting string to contain a lot of unprintable ASCII characters, these will need to be escaped when converted to JSON -- escaped characters occupy more space than non-escaped ones. Using a sequence of base 64 numbers means only printable characters will be present in the output string, hence no escaping necessary. I have observed significant reduction in storage usage for selfie purpose.	2019-04-20 09:06:54 -04:00
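To illustrate why printable-only output avoids JSON escaping overhead, here is a minimal sketch that encodes each 32-bit integer as a fixed run of six base 64 digits. The fixed width, the digit alphabet, and the function names `encodeUint32Array` and `decodeUint32Array` are assumptions made for illustration; the actual encoder in the commit may lay out the numbers differently.

```javascript
// Hypothetical digit alphabet: 64 printable characters, none of which
// need escaping inside a JSON string.
const DIGITS =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const DIGIT_VALUES = new Map(Array.from(DIGITS, (c, i) => [ c, i ]));

// Each 32-bit value becomes 6 digits (6 x 6 = 36 bits),
// least significant digit first.
const encodeUint32Array = u32 => {
    let out = '';
    for ( let v of u32 ) {
        for ( let k = 0; k < 6; k++ ) {
            out += DIGITS[v & 63];
            v >>>= 6;
        }
    }
    return out;
};

// Direct string => typed array conversion, no intermediate JS array.
const decodeUint32Array = str => {
    const u32 = new Uint32Array(str.length / 6);
    for ( let i = 0; i < u32.length; i++ ) {
        let v = 0;
        for ( let k = 5; k >= 0; k-- ) {
            v = v * 64 + DIGIT_VALUES.get(str[i * 6 + k]);
        }
        u32[i] = v;
    }
    return u32;
};
```

With this alphabet, `JSON.stringify()` of the encoded string adds only the two surrounding quotes, whereas control characters would each expand into multi-character escape sequences.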
Raymond Hill	3f3a1543ea	Add HNTrie-based filter classes to store origin-only filters Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622 Following STrie-related work in above issue, I noticed that a large number of filters in EasyList were filters which only had to match against the document origin. For instance, among just the top 10 most populous buckets, there were four such buckets with over a hundred entries each: - bits: 72, token: "http", 146 entries - bits: 72, token: "https", 139 entries - bits: 88, token: "http", 122 entries - bits: 88, token: "https", 118 entries The filters in these buckets have to be matched against all the network requests. In order to leverage HNTrie for these filters[1], they are now handled in a special way so as to ensure they all end up in a single HNTrie (per bucket), which means that instead of scanning hundreds of entries per URL, there is now a single scan per bucket per URL for these apply-everywhere filters. Now, any filter which fulfills ALL the following conditions will be processed in a special manner internally: - Is of the form `|https://` or `|http://` or ``; and - Does have a `domain=` option; and - Does not have a negated domain in its `domain=` option; and - Does not have `csp=` option; and - Does not have a `redirect=` option If a filter does not fulfill ALL the conditions above, no change in behavior. A filter which matches ALL of the above will be processed in a special manner: - The `domain=` option will be decomposed so as to create as many distinct filters as there are distinct values in the `domain=` option - This also applies to the `badfilter` version of the filter, which means it now becomes possible to `badfilter` only one of the distinct filters without having to `badfilter` all of them. - The logger will always report these special filters with only a single hostname in the `domain=` option. ** [1] HNTrie is currently WASM-ed on Firefox.	2019-04-19 16:33:46 -04:00
Raymond Hill	c229003d31	Performance + code maintenance work on static network filtering engine Implement a plain string trie container class: STrieContainer. Make use of STrieContainer where beneficial. Some filter buckets can grow quite large, and in such case coalescing "trieable" filter classes into a single trie improves lookup performance and reduces memory usage. For instance, at time of commit, the filter bucket for the `ad` keyword contains 919 entries[1]. Coalescing trieable filters of the same class into a single plain string trie reduced the size of the bucket to 50 entries + two tries which are scanned only once each whenever the bucket is visited. [1] Enter the following code at uBO's dev console: µBlock.staticNetFilteringEngine.categories.get(0).get(µBlock.urlTokenizer.tokenHashFromString('ad')) Refactor static network filtering engine code to make use of ES6's syntactic sugar `class`. Change first auto-update run from 7 to 5 minutes.	2019-04-14 16:45:20 -04:00
Raymond Hill	87feb47b51	Support disabling `suspendTabsUntilReady` in Firefox The value of `suspendTabsUntilReady` was disregarded in Firefox and uBO defaulted to always defer tab loading until it was ready. This commit makes it possible to disable the deferring of tab loading in Firefox. The new valid values for `suspendTabsUntilReady` are: - `unset`: leave it to the platform to pick the optimal behavior (default) - `no`: do not suspend tab loading at launch time - `yes`: suspend tab loading at launch time	2019-02-19 12:30:37 -05:00
Raymond Hill	426a6ea9a7	Fix spurious output at uBO's dev console Regression from https://github.com/gorhill/uBlock/commit/0d369cda21bb	2019-02-18 14:41:04 -05:00
Raymond Hill	e93062bcdf	Spin-off FilterOrigin flavors into standalone classes This removes the derivation of FilterOrigin flavors from FilterOrigin itself and simplify code paths. FilterOrigin flavors are small specialized classes, no need to overcomplicate with derivation. Specifically, this removes an indirect call to reach the match() method.	2019-02-16 12:16:30 -05:00
Raymond Hill	ed7e34fb07	Refactor selfie generation into a more flexible persistence mechanism The motivation is to address the higher peak memory usage at launch time with 3rd-gen HNTrie when a selfie was present. The selfie generation prior to this change was to collect all filtering data into a single data structure, and then to serialize that whole structure at once into storage (using JSON.stringify). However, HNTrie serialization requires that a large Uint32Array be converted into a plain JS array, which itself would be indirectly converted into a JSON string. This was the main reason why peak memory usage would be higher at launch from selfie, since the JSON string would need to be wholly unserialized into JS objects, which themselves would need to be converted into more specialized data structures (like that Uint32Array one). The solution to lower peak memory usage at launch is to refactor selfie generation to allow a more piecemeal approach: each filtering component is given the ability to serialize itself rather than to be forced to be embedded in the master selfie. With this approach, the HNTrie buffer can now serialize to its own storage by converting the buffer data directly into a string which can be directly sent to storage. This avoids expensive intermediate steps such as converting into a JS array and then to a JSON string. As part of the refactoring, there was also opportunistic code upgrade to ES6 and Promise (eventually all of uBO's code will be proper ES6). Additionally, the polyfill to bring getBytesInUse() to Firefox has been revisited to replace the rather expensive previous implementation with an implementation with virtually no overhead.	2019-02-14 13:33:55 -05:00
Raymond Hill	a026e9ae54	Fix reverting use of IndexedDB as default cache storage on Chromium Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/399 The advanced setting `cacheStorageAPI` has been added to allow a user to force the use of IndexedDB as cache storage. Set to `IndexedDB` to force use of IndexedDB. Default to `unset`.	2019-01-25 18:49:30 -05:00
Raymond Hill	64bea27881	Add ability to control auto-commenting at filter creation time Related issues: - https://github.com/uBlockOrigin/uBlock-issues/issues/372 - https://github.com/gorhill/uBlock/issues/93 A new advanced setting has been added: `autoCommentFilterTemplate`. Default value is `{{date}} {{origin}}`. Placeholders are identified by `{{...}}`. There are currently only three placeholders supported: - `{{date}}`: will be replaced with current date - `{{time}}`: will be replaced with current time - `{{origin}}`: will be replaced with site information on which the filter(s) was created If no placeholder is found in `autoCommentFilterTemplate`, this will disable auto-commenting. So one can use `-` to disable auto-commenting. Additionally, if auto-commenting is enabled, uBO will not emit a comment if an emitted comment would be a duplicate of the last one found in the user filter list.	2019-01-08 07:37:50 -05:00
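The placeholder behavior described above could be sketched like this. The function name `renderAutoComment`, the shape of the `values` argument, and the date and time formats are hypothetical; only the three placeholder names and the "no placeholder disables auto-commenting" rule come from the commit message.

```javascript
// Hypothetical sketch of template expansion for auto-comments.
// `values` supplies the replacement text for each supported placeholder.
// Returns null when the template contains no supported placeholder,
// which stands for "auto-commenting disabled".
const renderAutoComment = (template, values) => {
    let found = false;
    const out = template.replace(
        /\{\{(date|time|origin)\}\}/g,
        (match, name) => {
            found = true;
            return values[name];
        }
    );
    return found ? out : null;
};

// Example values; the formats shown here are assumptions.
const exampleValues = {
    date: '2019-01-08',
    time: '07:37:50',
    origin: 'example.com',
};
```

With `exampleValues`, the default template `{{date}} {{origin}}` expands to `2019-01-08 example.com`, while a template of `-` yields `null`.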
Raymond Hill	610ca2684b	Remove (broken) benchmark pane	2018-12-21 12:01:24 -05:00
Raymond Hill	1b6fea16da	3rd-gen hntrie, suitable for large set of hostnames	2018-12-04 13:02:09 -05:00
Raymond Hill	2189f020df	add new advanced setting to disable use of WASM for dev purpose	2018-11-16 10:19:06 -05:00
Raymond Hill	d7d544cda0	Squashed commit of the following: commit 7c6cacc59b27660fabacb55d668ef099b222a9e6 Author: Raymond Hill <rhill@raymondhill.net> Date: Sat Nov 3 08:52:51 2018 -0300 code review: finalize support for wasm-based hntrie commit 8596ed80e3bdac2c36e3c860b51e7189f6bc8487 Merge: cbe1f2e `000eb82` Author: Raymond Hill <rhill@raymondhill.net> Date: Sat Nov 3 08:41:40 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit cbe1f2e2f38484d42af3204ec7f1b5decd30f99e Merge: 270fc7f `dbb7e80` Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 17:43:20 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit 270fc7f9b3b73d79e6355522c1a42ce782fe7e5c Merge: d2a89cf `d693d4f` Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 16:21:08 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit d2a89cf28f0816ffd4617c2c7b4ccfcdcc30e1b4 Merge: d7afc78 `649f82f` Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 14:54:58 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit d7afc78b5f5675d7d34c5a1d0ec3099a77caef49 Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 13:56:11 2018 -0300 finalize wasm-based hntrie implementation commit e7b9e043cf36ad055791713e34eb0322dec84627 Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 08:14:02 2018 -0300 add first-pass implementation of wasm version of hntrie commit 1015cb34624f3ef73ace58b58fe4e03dfc59897f Author: Raymond Hill <rhill@raymondhill.net> Date: Wed Oct 31 17:16:47 2018 -0300 back up draft work toward experimenting with wasm hntries	2018-11-03 08:58:46 -03:00
Raymond Hill	d693d4fba3	add new "Benchmarks" pane in dashboard Purpose is strictly for development purpose. The new pane can be enabled by setting the advanced setting `benchmarkingPane` to `true`.	2018-11-02 16:18:50 -03:00
Raymond Hill	6d9382a501	fix https://github.com/uBlockOrigin/uBlock-issues/issues/77	2018-10-29 09:56:51 -03:00
Raymond Hill	9039874fc9	refactor some webRequest-related code (now that firefox legacy is out of the way)	2018-10-28 10:58:25 -03:00
Raymond Hill	cabb0d36b6	fix https://github.com/gorhill/uBlock/issues/3371	2018-10-23 14:01:08 -03:00
Raymond Hill	e107cbb370	revised fix for https://github.com/uBlockOrigin/uBlock-issues/issues/229	2018-09-21 09:16:46 -04:00
Raymond Hill	06fe7e6871	code review for static extended filtering, notably: - use domain-derived integer hash to store filters - remove code meant for firefox/legacy - properly handle subdomains of entity-based filters	2018-09-09 08:10:09 -04:00
Raymond Hill	89c073f3e9	fix https://github.com/uBlockOrigin/uBlock-issues/issues/209	2018-09-07 09:11:07 -04:00
Raymond Hill	3c85c03194	fix #308 , #3436 , https://github.com/uBlockOrigin/uBlock-issues/issues/155 <https://github.com/gorhill/uBlock/issues/3436>: a new per-site switch has been added, no-scripting, whose purpose is to wholly disable/enable javascript for a given site. This new switch has precedence over all other ways javascript can be disabled, including precedence over dynamic filtering rules. The popup panel will report the number of script resources which have been seen by uBO for the current page. There is a minor inaccuracy to be fixed regarding the count, and the fix requires extending request journaling. <https://github.com/gorhill/uBlock/issues/308>: the `noscript` tags will now be respected when the new no-scripting switch is in effect on a given site. A default setting has been added to the _Settings_ pane to disable/enable globally the new no-script switch, such that one can work in default-deny mode regarding javascript execution. <https://github.com/uBlockOrigin/uBlock-issues/issues/155>: a new hidden setting, `requestJournalProcessPeriod`, has been added to allow controlling the delay before uBO internally processes its network request journal queue. Default to 1000 (milliseconds).	2018-08-31 18:47:02 -04:00
Raymond Hill	b7c4ee0c45	enable cache storage compression by default	2018-08-21 12:59:35 -04:00
Raymond Hill	e163080518	added optional lz4 compression for cache storage (https://github.com/uBlockOrigin/uBlock-issues/issues/141 ) Squashed commit of the following: commit 6a8473822537636ac54d5dabdb14472114bb730b Author: Raymond Hill <rhill@raymondhill.net> Date: Mon Aug 6 10:56:44 2018 -0400 remove remnant of snappyjs and spurious instruction commit 9a4b709bee97d3cc2235fab602359fa5953bdb46 Author: Raymond Hill <rhill@raymondhill.net> Date: Mon Aug 6 09:48:58 2018 -0400 make cache storage compression optionally available on all platforms New advanced setting: `cacheStorageCompression`. Default is `false`. commit 22ee6547f2f7c9c5aefe25dea1262a1b31612155 Author: Raymond Hill <rhill@raymondhill.net> Date: Sun Aug 5 19:16:26 2018 -0400 remove Chromium from lz4 experiment commit ee3e201c45afe983508f70713a2d43af74737d8d Author: Raymond Hill <rhill@raymondhill.net> Date: Sun Aug 5 18:52:43 2018 -0400 import lz4-block-codec.wasm library commit 883a3118efcfd749c82356fde7134754d6ae371d Author: Raymond Hill <rhill@raymondhill.net> Date: Sun Aug 5 18:50:46 2018 -0400 implement storage compression through lz4-wasm [draft] commit 48d1ccaba407de447c2cd6747dc3a90839c260a7 Merge: 8ae77e6 `b34c897` Author: Raymond Hill <rhill@raymondhill.net> Date: Sat Aug 4 08:56:51 2018 -0400 Merge branch 'master' of github.com:gorhill/uBlock into lz4 commit 8ae77e6aeeaa85af335e664c2560d2afd37288c6 Author: Raymond Hill <rhill@raymondhill.net> Date: Wed Jul 25 18:17:45 2018 -0400 experiment with compression	2018-08-06 12:34:41 -04:00
Raymond Hill	ef455deb0a	fix https://github.com/uBlockOrigin/uBlock-issues/issues/106	2018-07-18 18:00:55 -04:00
Raymond Hill	798f8dab9d	reduce baseline memory at selfie-load time	2018-06-01 07:54:31 -04:00
Raymond Hill	ab867dedf5	improve in-memory storage of specific cosmetic filters + more ES6 - collate together specific filters with same base domain - replace string-based hash to integer-based hash - revisit code to benefit from ES6-specific syntax	2018-05-31 10:41:03 -04:00
Raymond Hill	c6cab02999	fine-tune logger-related code - Default to being detached - Default to "Current tab" - Append current tab title to "Current tab" entry - Avoid iterating through all tabs when no change	2018-05-27 08:31:17 -04:00
Raymond Hill	b4306e3297	code review of `c5d8588118`: immediate scriptlets injection works well only on Chromium-based browsers for now	2018-05-18 10:19:14 -04:00
Raymond Hill	c5d8588118	inject scriptlets earlier (experimental) (ex. https://github.com/uBlockOrigin/uAssets/issues/2300 )	2018-05-17 07:33:21 -04:00
Raymond Hill	3923520b87	remove no longer needed platform-dependent polyfill.js	2018-04-27 08:36:38 -04:00
Raymond Hill	427d0fd0ff	fix https://github.com/uBlockOrigin/uBlock-issues/issues/21	2018-04-24 17:12:41 -04:00
Raymond Hill	86e80d43d6	fix https://github.com/gorhill/uBlock/issues/3693#issuecomment-379782428	2018-04-20 11:26:11 -04:00
Raymond Hill	0036154d52	code review: be sure "ublock" flavor is always present	2018-04-18 07:11:13 -04:00
Raymond Hill	8071321e91	lower default value of manualUpdateAssetFetchPeriod	2018-04-09 08:26:14 -04:00
Raymond Hill	4d8974fe80	code review: avoid redundant PSL selfie	2018-04-06 16:02:35 -04:00
Raymond Hill	93f49a61d7	add pre-processor directives to filter list compiler (https://github.com/AdguardTeam/AdguardBrowserExtension/issues/917 )	2018-04-05 07:29:15 -04:00
Raymond Hill	0a879a816b	treat behind-the-scene network requests like all others	2018-03-30 08:55:51 -04:00
Raymond Hill	2c901588c7	fix #3546 , #3428	2018-02-26 13:59:16 -05:00
Raymond Hill	a81d2a759b	fix #3318 , #3387	2018-02-21 13:29:36 -05:00
Raymond Hill	6b7d8e75f4	bring out of band fixes (`c5cbf5db47`, `2999dbee5c`) for Firefox/webext into master	2018-02-21 08:19:43 -05:00
Raymond Hill	17930cc778	fix #3474 , #2823	2018-02-15 17:25:38 -05:00
Raymond Hill	a9f68fe02f	Fix #3069 , and consequently #3374 , #3378 . A new filtering class has been created: "static extended filtering". This new class is an umbrella class for more specialized filtering engines: - Cosmetic filtering - Scriptlet filtering - HTML filtering HTML filtering is available only on platforms which support modifying the response body on the fly, so only Firefox 57+ at the moment. With the ability to modify the response body, HTML filtering has been introduced: removing elements from the DOM before the source data has been parsed by the browser. A consequence of HTML filtering ability is to bring back script tag filtering feature.	2017-12-28 13:49:02 -05:00
Raymond Hill	b446f9f8bd	fix regression reported in `dec0b80a72 (commitcomment-26435928)` by partially reverting changes from `4a09c9f866`	2017-12-22 11:45:07 -05:00
Raymond Hill	4a09c9f866	improve slightly pre-parsing of `##script:...` filters	2017-12-17 10:28:12 -05:00
