external/uBlock - Forgejo: Beyond coding. We forge.

mirror of https://github.com/gorhill/uBlock.git synced 2024-11-11 17:41:03 +01:00

Author	SHA1	Message	Date
Raymond Hill	010635acd6	Add support for `ping` static filter option Related issue: - https://github.com/gorhill/uBlock/issues/1493 Documentation: - https://help.eyeo.com/adblockplus/how-to-write-filters#type-options Test page: - https://testpages.adblockplus.org/en/filters/ping Additionally, network requests of type `beacon` will be mapped to `ping` by the static filtering engine.	2019-09-22 09:11:55 -04:00
Raymond Hill	23c4c80136	Add support for `elemhide` (through `specifichide`) Related documentation: - https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding Related feedback/discussion: - https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/ The `elemhide` filter option as per ABP semantics is now supported. Previously uBO would consider `elemhide` to be an alias of `generichide`. The support of `elemhide` is through the convenient conversion of the `elemhide` option into the existing `generichide` option and the new `specifichide` option. The purpose of the new `specifichide` filter option is to disable all specific cosmetic filters, i.e. those which target a specific site. Additionally, for convenience, the filter options `generichide`, `specifichide` and `elemhide` can be aliased using the shorter forms `ghide`, `shide` and `ehide` respectively.	2019-09-21 11:30:38 -04:00
Raymond Hill	9f7e385a5c	Code review fix re. max string length in bidi-trie Related commit: - `fb4e94f92c` A bidi-trie can't store strings longer than 255 characters because the string segment lengths are encoded into a single byte. This commit ensures only strings smaller than 256 characters are stored in the bidi-tries.	2019-08-23 11:30:10 -04:00
Raymond Hill	68ae847ba3	Add support for AdGuard's `mp4` filter option Related discussion: - https://github.com/uBlockOrigin/uBlock-issues/issues/701#issuecomment-520884196 The `mp4` filter option will be converted to `redirect=noopmp4-1s` internally, and `media` type will be assumed.	2019-08-13 12:30:11 -04:00
Raymond Hill	3e5c9e00ab	Add support for AdGuard's `empty` option Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/701 The filter option `empty` is converted to `redirect=empty` by uBO internally; however unlike when the `redirect=` option is used expressly, the `empty` option does not require a resource type. When `empty` is used, only network requests which are meant to return a text response will be redirected to an empty response body by uBO -- so `empty` will not work for resources such as images, media, or other binary resources.	2019-08-13 08:16:21 -04:00
Raymond Hill	aa73f292ec	Add new static network filter option: `redirect-rule=` Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/310 The purpose of this new option is to add the ability to create standalone redirect rules without being forced to create a block filter (a corresponding block filter is always created when using the `redirect=` option). Additionally: The syntax `$redirect=token,...` is now supported; there is no need to "trick" the filter parser with `/$redirect=token,...` in order to create redirect rules which are meant to match all paths. Filters of the form `|http://` will be normalized into two corresponding filters `|https://` and `|http://` so as to reduce the number of filters in the buckets of untokenizable filters.	2019-08-03 10:18:47 -04:00
Raymond Hill	cf4345ffc4	Fix some element picker-related issues Related discussion: - https://www.reddit.com/r/uBlockOrigin/comments/c5do7w/ Make the element picker better reflect network filters as parsed by the static network filtering engine. Additionally, discard single alphanumeric character-based filters. Related discussion: - https://www.reddit.com/r/uBlockOrigin/comments/c62irc/ Inject newly created cosmetic filters into the DOM filterer, in order for these filters to be enforced by the DOM filterer in subsequent dynamic DOM changes.	2019-06-29 11:06:03 -04:00
Raymond Hill	be2a950541	Code review of HNTrie/staticNetFilteringEngine - Remove HNTrieContainer class from global context by storing it as a property of µBlock. - Use block scope to isolate HNTrie-related constants from global context. - Prevent filters which are pure IP addresses from being stored in an HNTrie instance -- as this could cause false positives.	2019-06-19 10:00:19 -04:00
Raymond Hill	cfc2ce333d	Implement bidirectional plain-string trie The bidirectional trie allows storing the right and left parts of a string into a trie given a pivot position. Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/528 Additionally, the mandatory token-at-index-0 rule for FilterPlainHnAnchored has been lifted, thus allowing the engine to pick a potentially better token at any position in the filter string. *** TODO: Eventually rename `strie.js` to `biditrie.js`. TODO: Fix dump() method, it currently only shows the right-hand side of a filter string.	2019-06-18 19:16:39 -04:00
Raymond Hill	fb6d69f543	Discard whole filter with bad `csp=` content Related discussion: - https://www.reddit.com/r/uBlockOrigin/comments/bshn7z/ uBO was just removing the bad option, while the whole filter needs to be discarded.	2019-05-24 15:41:37 -04:00
Raymond Hill	1e9528e2a6	Fix regression affecting `*$csp=`-like filters Related discussion: - https://www.reddit.com/r/uBlockOrigin/comments/bshn7z/filter_question/ Regression introduced in: - `3f3a1543ea`	2019-05-24 12:15:32 -04:00
Raymond Hill	1f398134f9	Minor code review of `4430ec11e2`	2019-05-23 08:15:26 -04:00
Raymond Hill	7b8c087fdd	Start using async/await where it makes sense	2019-05-22 19:23:04 -04:00
Raymond Hill	4430ec11e2	Rearrange inner loop of static network filtering engine The motivations for the re-arrangement: - Reducing the number of entry points: matchStringExactString() has been removed and matchString() is simply reused with a modifier parameter to enable matching variants. - Presumption that most matches, if any, occur early with the left-most tokens in a URL. This gives a very small marginal performance gain as per built-in benchmark.	2019-05-22 17:51:03 -04:00
Raymond Hill	32b04fa262	Re-arrange parsing of type options to be order-independent Related commit: - `1888033070` This removes the need to place `all` before any negated type in the list of options.	2019-05-21 14:04:21 -04:00
Raymond Hill	1888033070	Add support for `all` filter option Related discussion: - https://www.reddit.com/r/uBlockOrigin/comments/bqnsoa/ The `all` option is equivalent to specifying all network-based types + `popup`, `document`, `inline-font`, `inline-script`. Example from discussion: `||bet365.com^$all` The above will block all network requests, block all popups, and prevent inline fonts/scripts from `bet365.com`. EasyList-compatible syntax does not allow expressing these semantics when using only `||bet365.com^`. If using specific negated type options along with `all`, the order in which the options appear is important. In such case `all` should always be first, followed by the negated type option(s).	2019-05-20 13:46:36 -04:00
Raymond Hill	0ca44b847c	Avoid duplicated strings in filterOrigin w/ new approach The new approach is simpler and should benefit selfie serialization/unserialization. This renders stringDeduplicater obsolete -- it has been removed.	2019-05-17 10:13:58 -04:00
Raymond Hill	57890d60ff	Fix incorrect use of `this` in static method Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/568 Regression from: - `19ece97b0c`	2019-05-11 17:40:55 -04:00
Raymond Hill	3692bb4ada	Add HNTrieRef.dump() and STrieRef.dump() as dev tool To be used at the console, as an investigation tool for development purpose. Using it to verify the content of the largest FilterHostnameDict instance, I spotted an all-uppercase hostname in the HNTrieRef instance: µBlock.staticNetFilteringEngine.categories.get(0).get(0x10000000).dict.dump(); Thus the changes to static-net-filtering.js are to fix the erroneous insertion of filters with uppercase characters. The single instance found was a hostname entry in Malware Domain List (TRIANGLESERVICESLTD dot COM).	2019-05-06 11:12:39 -04:00
Raymond Hill	0e4fbefd07	Remove unnecessary `null` placeholders in FilterOriginHitSet et al. The `null` placeholders are not necessary; we can just use default arguments instead, and add the HNTrieContainer references if and only if they are instantiated.	2019-05-01 18:54:11 -04:00
Raymond Hill	96dce22218	Increase resolution of known-token lookup table Related commit: - `69a43e07c4` Using 32 bits of token hash rather than just the 16 lower bits does help discard more unknown tokens. Using the default filter lists, the known-token lookup table is populated by 12,276 entries, out of 65,536, thus making the case that theoretically there are a lot of possible tokens which can be discarded. In practice, running the built-in staticNetFilteringEngine.benchmark() with default filter lists, I find that 1,518,929 tokens were skipped out of 4,441,891 extracted tokens, or 34%.	2019-04-27 08:18:01 -04:00
Raymond Hill	a8946c8d73	Fix list lookup of multi-hostname `domain=` filters in logger Related commit: - `3f3a1543ea` The regression was preventing uBO from finding which list a filter originated from. This affected only filters for which the `domain=` option had multiple hostnames.	2019-04-27 07:04:43 -04:00
Raymond Hill	69a43e07c4	Ignore unknown tokens in urlTokenizer.getTokens() Given that all tokens extracted from one single URL are potentially iterated multiple times in a single URL-matching cycle, it pays to ignore extracted tokens which are known not to be used anywhere in the static filtering engine. The gain in processing a single network request in the static filtering engine can become especially high when dealing with long and random-looking URLs, since such URLs have a high likelihood of containing a majority of tokens which are known not to be in use.	2019-04-26 17:14:00 -04:00
Raymond Hill	19ece97b0c	Leverage compile-time token information in new filter classes Related commit: - `99390390fc` The token information available at compile time can be stored in the filter to be used at match() time. This allows the use of startsWith() rather than a more costly indexOf() call as a first quick test to detect mismatches.	2019-04-26 11:16:47 -04:00
Raymond Hill	99390390fc	Introduce three more specialized filter classes to avoid regexes Performance- and memory-related work. Three more classes have been created to avoid regex-based filters internally. Purpose is to enforce filters which have only one single wildcard in their pattern, a common occurrence. The filter pattern is split in two literal string segments. Similar to the above, with the added condition that the filter is hostname-anchored (`||`). The "Wildcard2" variant is a further specialization to enforce filters where the only wildcard is immediately preceded by the `^` special character, again a very common occurrence. Using two literal string segments in lieu of regexes allows a mismatch to be detected quickly by just testing the first segment. Additionally, this reduces memory footprint as regexes are much more expensive memory-wise than plain strings. These three new filter classes allow replacing 5276 regex-based filters internally with plain string-based filters. Often-called isHnAnchored() has been further fine-tuned to avoid as much work as possible. I have also observed that using an arrow function for closure purposes measurably helps performance, as per built-in benchmark.	2019-04-25 17:48:08 -04:00
Raymond Hill	43ecffc295	Fix overzealous strict blocking (regression) Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/536 Regression from: - `3f3a1543ea (diff-522a16ddeed280252d7c3a351261b441R2767)`	2019-04-21 09:17:31 -04:00
Raymond Hill	537271f26b	Fix how `*$`, `|https://`, `http://` filters are reported in logger This was a regression introduced in `3f3a1543ea` Reported in issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-485163348	2019-04-20 17:25:32 -04:00
Raymond Hill	fa83744b58	Use a sequence of base 64 numbers to encode array buffers The purpose of using a custom base128 encoder is to convert array buffers into strings, to allow a direct string-to-array buffer conversion at load time: string => array buffer Whereas a JSON array would require an extra step: JSON array as string => JS array => array buffer It turns out that the current use of a custom base128 encoding results in a significantly larger selfie storage usage when converting array buffers into strings. Speculation: possibly the browser converts the strings to save into JSON strings internally. Since the custom base128 encoder is likely to cause the resulting string to contain a lot of unprintable ASCII characters, these will need to be escaped when converted to JSON -- escaped characters occupy more space than non-escaped ones. Using a sequence of base 64 numbers means only printable characters will be present in the output string, hence no escaping is necessary. I have observed a significant reduction in storage usage for selfie purposes.	2019-04-20 09:06:54 -04:00
Raymond Hill	3f3a1543ea	Add HNTrie-based filter classes to store origin-only filters Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622 Following STrie-related work in above issue, I noticed that a large number of filters in EasyList were filters which only had to match against the document origin. For instance, among just the top 10 most populous buckets, there were four such buckets with over a hundred entries each: - bits: 72, token: "http", 146 entries - bits: 72, token: "https", 139 entries - bits: 88, token: "http", 122 entries - bits: 88, token: "https", 118 entries The filters in these buckets have to be matched against all network requests. In order to leverage HNTrie for these filters[1], they are now handled in a special way so as to ensure they all end up in a single HNTrie (per bucket), which means that instead of scanning hundreds of entries per URL, there is now a single scan per bucket per URL for these apply-everywhere filters. Now, any filter which fulfills ALL the following conditions will be processed in a special manner internally: - Is of the form `|https://` or `|http://` or ``; and - Does have a `domain=` option; and - Does not have a negated domain in its `domain=` option; and - Does not have a `csp=` option; and - Does not have a `redirect=` option If a filter does not fulfill ALL the conditions above, there is no change in behavior. A filter which matches ALL of the above will be processed in a special manner: - The `domain=` option will be decomposed so as to create as many distinct filters as there are distinct values in the `domain=` option - This also applies to the `badfilter` version of the filter, which means it now becomes possible to `badfilter` only one of the distinct filters without having to `badfilter` all of them. - The logger will always report these special filters with only a single hostname in the `domain=` option. ** [1] HNTrie is currently WASM-ed on Firefox.	2019-04-19 16:33:46 -04:00
Raymond Hill	b70302c0fc	Cleanup comments following changes in `34f3cfe5e7`	2019-04-16 19:20:56 -04:00
Raymond Hill	34f3cfe5e7	Add filterClassHistogram() method to µBlock.staticNetFilteringEngine As a development tool for investigation purpose. To use, enter the following at uBO's dev console: µBlock.staticNetFilteringEngine.filterClassHistogram()	2019-04-16 19:01:14 -04:00
Raymond Hill	4940cda154	Categorize `google` as a bad token for map key purpose In the static network filtering engine, the `google` token is too generic and probably leads to too many false positives, besides causing an overly large filter bucket.	2019-04-16 06:52:13 -04:00
Raymond Hill	a594b3f3d1	Add µBlock.staticNetFilteringEngine.bucketHistogram() as investigative dev tool Additionally, lower the threshold of trieability to 4 for FilterPlainPrefix1.	2019-04-15 11:45:33 -04:00
Raymond Hill	c229003d31	Performance + code maintenance work on static network filtering engine Implement a plain string trie container class: STrieContainer. Make use of STrieContainer where beneficial. Some filter buckets can grow quite large, and in such cases coalescing "trieable" filter classes into a single trie improves lookup performance and reduces memory usage. For instance, at time of commit, the filter bucket for the `ad` keyword contains 919 entries[1]. Coalescing trieable filters of the same class into a single plain string trie reduced the size of the bucket to 50 entries + two tries which are scanned only once each whenever the bucket is visited. [1] Enter the following code at uBO's dev console: µBlock.staticNetFilteringEngine.categories.get(0).get(µBlock.urlTokenizer.tokenHashFromString('ad')) Refactor static network filtering engine code to make use of ES6's syntactic sugar `class`. Change first auto-update run from 7 to 5 minutes.	2019-04-14 16:45:20 -04:00
Raymond Hill	92c5f17b78	Improve usefulness of FilterContainer.benchmark() Add ability to test/record results. This allows comparing against the output after code changes, to detect and more accurately report regressions.	2019-04-14 09:44:24 -04:00
Raymond Hill	d2cb0f17ea	Report block count in benchmark() The block count can be used for testing against regression after code changes.	2019-04-12 10:19:38 -04:00
Noelle Leigh	0bb7b76338	Fixed wrong method for number of elements in a Map (#3755)	2019-04-06 16:42:24 -03:00
Raymond Hill	928ab91ab8	Add support to benchmark the dynamic filtering pane From uBO's dev console, type: - `µBlock.sessionFirewall.benchmark();` Keep in mind that it's the temporary ruleset being benchmarked.	2019-02-19 10:46:33 -05:00
Raymond Hill	3b81841dc0	Properly set resource URL in benchmark loop	2019-02-17 07:45:05 -05:00
Raymond Hill	d63592b11e	Remove obsolete code to translate `|blob:` filters into CSP filters These filters are to be considered obsolete since they can't be matched against network requests in the webRequest API. They were probably meant to work when ABP was pre-webext, which means they are quite probably obsolete and there is no longer a point for uBO to conveniently translate them into CSP directives.	2019-02-16 19:25:15 -05:00
Raymond Hill	e93062bcdf	Spin off FilterOrigin flavors into standalone classes This removes the derivation of FilterOrigin flavors from FilterOrigin itself and simplifies code paths. FilterOrigin flavors are small specialized classes, no need to overcomplicate with derivation. Specifically, this removes an indirect call to reach the match() method.	2019-02-16 12:16:30 -05:00
Raymond Hill	5733439f62	Leverage whotracks.me's huge dataset of URLs for benchmark purpose As seen at: https://whotracks.me/blog/adblockers_performance_study.html The requests.json.gz file can be downloaded from: https://cdn.cliqz.com/adblocking/requests_top500.json.gz Copy the file into ./tmp/requests.json.gz If the file is present when you build uBO using `make-[target].sh` from the shell, the resulting package will contain `./assets/requests.json`, which will be looked up by the method below to launch a benchmark session. From uBO's dev console, launch the benchmark: µBlock.staticNetFilteringEngine.benchmark(); The usual browser dev tools can be used to obtain useful profiling data, i.e. start the profiler, call the benchmark method from the console, then stop the profiler when it completes. Keep in mind that the measurements at the blog post above were obtained with ONLY EasyList. The CPU reportedly used was: https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-6600U+%40+2.60GHz&id=2608 Rename ./tmp/requests.json.gz to something else if you no longer want ./assets/requests.json in the build.	2019-02-15 16:18:03 -05:00
Raymond Hill	ed7e34fb07	Refactor selfie generation into a more flexible persistence mechanism The motivation is to address the higher peak memory usage at launch time with 3rd-gen HNTrie when a selfie was present. The selfie generation prior to this change was to collect all filtering data into a single data structure, and then to serialize that whole structure at once into storage (using JSON.stringify). However, HNTrie serialization requires that a large Uint32Array be converted into a plain JS array, which itself would be indirectly converted into a JSON string. This was the main reason why peak memory usage would be higher at launch from selfie, since the JSON string would need to be wholly unserialized into JS objects, which themselves would need to be converted into more specialized data structures (like that Uint32Array one). The solution to lower peak memory usage at launch is to refactor selfie generation to allow a more piecemeal approach: each filtering component is given the ability to serialize itself rather than to be forced to be embedded in the master selfie. With this approach, the HNTrie buffer can now serialize to its own storage by converting the buffer data directly into a string which can be directly sent to storage. This avoids expensive intermediate steps such as converting into a JS array and then to a JSON string. As part of the refactoring, there was also an opportunistic code upgrade to ES6 and Promise (eventually all of uBO's code will be proper ES6). Additionally, the polyfill to bring getBytesInUse() to Firefox has been revisited to replace the rather expensive previous implementation with an implementation with virtually no overhead.	2019-02-14 13:33:55 -05:00
Raymond Hill	ed5d63df69	Grand refactoring of the logger Performance-related work: the logger data has been decoupled from the DOM -- inspired by CodeMirror's way of efficiently handling large amounts of text data. This decoupling now makes the logger highly efficient CPU- and memory-wise, and opens the way to more possibilities. Ability to configure some aspects of the logger behavior and visuals: - The hard-coded limit of 5000 entries has been removed and is now replaced with a variety of user-configurable settings to enforce the discarding of logger entries. - Some columns in the logger output can now be hidden. The filter list look-up feature has been merged into the existing overlay dialog used to create URL rules or static filters, as an entry in a new "Details" pane. Other issues addressed during refactoring: - https://github.com/uBlockOrigin/uBlock-issues/issues/280 - https://github.com/gorhill/uBlock/issues/1999 The minimum version supported on Firefox has been bumped up to 55.0.	2019-01-12 16:36:20 -05:00
Raymond Hill	dfcd23197d	Fix parsing of `redirect=` option as per `67e06f53b4 (commitcomment-27803901)`	2018-12-17 07:46:04 -05:00
Raymond Hill	261ef8c510	Add support for procedural :not to HTML filtering Related issue: <https://github.com/gorhill/uBlock/issues/3683> Additionally, improve compile-time error reporting in the logger	2018-12-15 10:46:17 -05:00
Raymond Hill	9b27a98f90	Fix https://github.com/gorhill/uBlock/issues/3654 Additionally, there has been refactoring work done regarding filtering context used throughout uBO, motivated by the fix here.	2018-12-13 12:30:54 -05:00
Raymond Hill	1b6fea16da	3rd-gen hntrie, suitable for large set of hostnames	2018-12-04 13:02:09 -05:00
Raymond Hill	9eba215961	fix missing trailing asterisk in filter representation in the logger Issue unearthed in https://github.com/uBlockOrigin/uAssets/issues/4083#issuecomment-436914727	2018-11-08 09:01:41 -02:00
Raymond Hill	d7d544cda0	Squashed commit of the following: commit 7c6cacc59b27660fabacb55d668ef099b222a9e6 Author: Raymond Hill <rhill@raymondhill.net> Date: Sat Nov 3 08:52:51 2018 -0300 code review: finalize support for wasm-based hntrie commit 8596ed80e3bdac2c36e3c860b51e7189f6bc8487 Merge: cbe1f2e `000eb82` Author: Raymond Hill <rhill@raymondhill.net> Date: Sat Nov 3 08:41:40 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit cbe1f2e2f38484d42af3204ec7f1b5decd30f99e Merge: 270fc7f `dbb7e80` Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 17:43:20 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit 270fc7f9b3b73d79e6355522c1a42ce782fe7e5c Merge: d2a89cf `d693d4f` Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 16:21:08 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit d2a89cf28f0816ffd4617c2c7b4ccfcdcc30e1b4 Merge: d7afc78 `649f82f` Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 14:54:58 2018 -0300 Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm commit d7afc78b5f5675d7d34c5a1d0ec3099a77caef49 Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 13:56:11 2018 -0300 finalize wasm-based hntrie implementation commit e7b9e043cf36ad055791713e34eb0322dec84627 Author: Raymond Hill <rhill@raymondhill.net> Date: Fri Nov 2 08:14:02 2018 -0300 add first-pass implementation of wasm version of hntrie commit 1015cb34624f3ef73ace58b58fe4e03dfc59897f Author: Raymond Hill <rhill@raymondhill.net> Date: Wed Oct 31 17:16:47 2018 -0300 back up draft work toward experimenting with wasm hntries	2018-11-03 08:58:46 -03:00
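
Commits `99390390fc` and `19ece97b0c` describe replacing single-wildcard regex filters with two literal string segments, and using the compile-time token position so that startsWith() can serve as the first quick mismatch test instead of indexOf(). The sketch below illustrates that idea only; the class name `TwoSegmentFilter` and its methods are hypothetical, not uBO's actual filter classes, and it ignores the `||` hostname-anchored and `^`-preceded variants mentioned in the commit.

```javascript
// Hypothetical sketch: a filter pattern with exactly one `*` is stored as two
// literal segments instead of a RegExp. Not uBO's real implementation.
class TwoSegmentFilter {
  constructor(pattern) {
    const star = pattern.indexOf('*');
    if (star === -1 || pattern.indexOf('*', star + 1) !== -1) {
      throw new Error('pattern must contain exactly one wildcard');
    }
    this.head = pattern.slice(0, star);  // literal text before `*`
    this.tail = pattern.slice(star + 1); // literal text after `*`
  }

  // `pos` is assumed to be where the tokenizer located the head segment in
  // `url`, so a cheap startsWith() check rejects most mismatches up front.
  matchAt(url, pos) {
    if (url.startsWith(this.head, pos) === false) { return false; }
    return url.indexOf(this.tail, pos + this.head.length) !== -1;
  }

  // Fallback without a position hint: find the head, then the tail after it.
  // Using the first occurrence of head leaves the most room for the tail.
  match(url) {
    const i = url.indexOf(this.head);
    if (i === -1) { return false; }
    return url.indexOf(this.tail, i + this.head.length) !== -1;
  }
}

const f = new TwoSegmentFilter('/ads/*/banner');
const u = 'https://example.com/ads/x/banner.js';
console.log(f.match(u), f.matchAt(u, u.indexOf('/ads/')));
```

Plain strings are cheaper to keep in memory than compiled regexes, which is the memory argument the commit makes; the speed argument rests on failing fast on the first segment.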

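Commits `69a43e07c4` and `96dce22218` describe skipping URL tokens that no filter uses, via a 65,536-entry known-token lookup table that uses 32 bits of the token hash rather than only the low 16 bits. The sketch below assumes several details not stated in the log: the FNV-1a hash, the XOR folding of the high and low 16 bits into a slot, and the tokenizing regex are all hypothetical choices, not uBO's actual code.

```javascript
// Hypothetical sketch of a known-token lookup table; not uBO's urlTokenizer.
const TABLE_SIZE = 65536;

// Hypothetical 32-bit string hash (FNV-1a).
function tokenHash(token) {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Assumed folding: mix all 32 hash bits into a 16-bit slot index.
function slotOf(hash) {
  return (hash & 0xffff) ^ (hash >>> 16);
}

class KnownTokens {
  constructor() { this.table = new Uint8Array(TABLE_SIZE); }
  // Called for each token used by some filter at compile time.
  add(token) { this.table[slotOf(tokenHash(token))] = 1; }
  // Collisions can yield false positives, never false negatives.
  mayBeKnown(hash) { return this.table[slotOf(hash)] === 1; }
}

// Extract tokens from a URL, keeping only those some filter may use.
function getTokens(url, known) {
  const out = [];
  const lower = url.toLowerCase();
  const re = /[0-9a-z%]+/g;
  let m;
  while ((m = re.exec(lower)) !== null) {
    if (known.mayBeKnown(tokenHash(m[0]))) {
      out.push({ token: m[0], pos: m.index });
    }
  }
  return out;
}

const known = new KnownTokens();
known.add('ads');
known.add('banner');
console.log(getTokens('https://example.com/ads/banner.js', known));
```

Because a slot collision only lets an unused token through, the table can never cause a filter to be skipped; it only trims the work done per URL.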
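
Commit `fa83744b58` replaces a custom base128 string encoding of array buffers with a sequence of base 64 numbers, so the stored string contains only printable characters that JSON does not need to escape. The sketch below is a minimal illustration under stated assumptions: the 64-character alphabet, the comma separator, and the function names are hypothetical, not uBO's actual selfie format.

```javascript
// Hypothetical encoding sketch; not uBO's actual selfie format.
const DIGITS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
const DIGIT_VALUE = new Map(Array.from(DIGITS, (c, i) => [c, i]));

// Write one unsigned 32-bit integer as base 64 digits, most significant first.
function u32ToBase64(n) {
  if (n === 0) { return DIGITS[0]; }
  let s = '';
  while (n !== 0) {
    s = DIGITS[n & 63] + s; // low 6 bits survive the int32 conversion of `&`
    n = n >>> 6;
  }
  return s;
}

function encodeUint32Array(arr) {
  const parts = new Array(arr.length);
  for (let i = 0; i < arr.length; i++) { parts[i] = u32ToBase64(arr[i]); }
  return parts.join(','); // ',' is not a digit, so it can act as separator
}

function decodeUint32Array(str) {
  if (str === '') { return new Uint32Array(0); }
  const parts = str.split(',');
  const out = new Uint32Array(parts.length);
  for (let i = 0; i < parts.length; i++) {
    let n = 0;
    for (const c of parts[i]) {
      const v = DIGIT_VALUE.get(c);
      if (v === undefined) { throw new Error(`invalid digit: ${c}`); }
      n = n * 64 + v; // stays below 2**32, so it is exact in a double
    }
    out[i] = n;
  }
  return out;
}

const buf = new Uint32Array([0, 1, 63, 64]);
console.log(encodeUint32Array(buf)); // A,B,/,BA
```

Every output character is printable ASCII, so JSON.stringify() adds only the surrounding quotes; this is the property the commit credits for the smaller selfie storage.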