uBlock

mirror of https://github.com/gorhill/uBlock.git synced 2024-09-22 21:57:47 +02:00

Author	SHA1	Message	Date
Raymond Hill	c4f9ae706a	Fix alternate code path introduced in `295f08da97` (oops)	2019-04-28 14:18:09 -04:00
Raymond Hill	295f08da97	Implement code path for when TextDecoder() is not available The primary purpose is to unbreak https://github.com/cliqz-oss/adblocker/tree/master/bench/comparison	2019-04-28 14:07:21 -04:00
Raymond Hill	ac58b8e688	Make token hashes fit within a 32-bit integer The staticNetFilteringEngine uses token hashes to store/lookup filters into Map objects. Before this commit, the tokens were encoded into token hashes as JS numbers (not exceeding MAX_SAFE_INTEGER) using at most the 8 first characters of the token. With this commit, token hashes are now restricted to fit into 32-bit integers, and are derived from at most the 7 first characters. This improves filter look-up performance as per built-in benchmark().	2019-04-28 10:15:15 -04:00
Raymond Hill	96dce22218	Increase resolution of known-token lookup table Related commit: - `69a43e07c4` Using 32 bits of token hash rather than just the 16 lower bits does help discard more unknown tokens. Using the default filter lists, the known-token lookup table is populated by 12,276 entries, out of 65,536, thus making the case that theoretically there is a lot of possible tokens which can be discarded. In practice, running the built-in staticNetFilteringEngine.benchmark() with default filter lists, I find that 1,518,929 tokens were skipped out of 4,441,891 extracted tokens, or 34%.	2019-04-27 08:18:01 -04:00
Raymond Hill	a8946c8d73	Fix list lookup of multi-hostname `domain=` filters in logger Related commit: - `3f3a1543ea` The regression was preventing uBO to find from which list a filter originated. This affected only filters for which the `domain=` option had multiple hostnames.	2019-04-27 07:04:43 -04:00
Raymond Hill	69a43e07c4	Ignore unknown tokens in urlTokenizer.getTokens() Given that all tokens extracted from one single URL are potentially iterated multiple times in a single URL-matching cycle, it pays to ignore extracted tokens which are known to not be used anywhere in the static filtering engine. The gain in processing a single network request in the static filtering engine can become especially high when dealing with long and random-looking URLs, which URLs have a high likelihood of containing a majority of tokens which are known to not be in use.	2019-04-26 17:14:00 -04:00
Raymond Hill	19ece97b0c	Leverage compile-time token information in new filter classes Related commit: - `99390390fc` The token information available at compile time can be stored in the filter to be used at match() time. This allows the use of startsWith() rather than a more costly indexOf() call as a first quick test to detect mismatches.	2019-04-26 11:16:47 -04:00
Raymond Hill	e0d2285da0	Convert HNTrie code to ES6 `class`	2019-04-25 19:38:07 -04:00
Raymond Hill	155abfba18	Cache and reuse result of HNTrieRef.matches() when possible Due to how web pages typically load secondary resources and due to how HNTrieContainer instances are used in uBO, there is a great likelihood that the result of a previous call to HNTrieRef.matches() can be reused in a subsequent call. This has been confirmed by instrumenting HNTrieRef.matches(). Since uBO uses distinct HNTrieContainer instances to either match against the request or the origin hostnames, this means a high likelihood of repeated calls to HNTrieRef.matches() with the same hostname as argument, hence a performance gain when caching the argument+result -- as despite that HNTrie.matches() is fast, comparing two short strings is even faster if this allows to skip HNTrie.matches() altogether.	2019-04-25 18:36:03 -04:00
Raymond Hill	99390390fc	Introduce three more specialized filter classes to avoid regexes Performance- and memory-related work. Three more classes have been created to avoid regex-based filters internally. Purpose is to enforce filters which have only one single wildcard in their pattern, a common occurrence. The filter pattern is split in two literal string segments. Similar as above, with the added condition that the filter is hostname-anchored (`||`). The "Wildcard2" variant is a further specialization to enforce filters where the only wildcard is immediately preceded by the `^` special character, again a very common occurrence. Using two literal string segments in lieu of regexes allows to quickly detect a mismatch by just testing the first segment. Additionally, this reduces memory footprint as regexes are much more expensive memory-wise than plain strings. These three new filter classes allow to replace the use of 5276 regex-based filters internally with plain string-based filters. Often-called isHnAnchored() has been further fine-tuned to avoid as much work as possible. I have also observed that using an arrow function for closure-purpose helps measurably performance, as per built-in benchmark.	2019-04-25 17:48:08 -04:00
Raymond Hill	fff2bb6290	Assume media elements with no Content-Length header to be of size 0 Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/543	2019-04-24 08:30:54 -04:00
Raymond Hill	72bbcdd93c	Prevent search expression in CodeMirror editor from crossing line boundaries Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/493	2019-04-23 19:26:02 -04:00
Raymond Hill	43ecffc295	Fix overzealous strict blocking (regression) Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/536 Regression from: - `3f3a1543ea (diff-522a16ddeed280252d7c3a351261b441R2767)`	2019-04-21 09:17:31 -04:00
Raymond Hill	f10b100379	Fix the handling of pseudoclass-based generic cosmetic filters Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/464 Regression from: `261ef8c510 (diff-3b15596213ed9ba37fb5b8bb1402a6c2R599)` Pseudoclass-based generic cosmetic filters were improperly seen as invalid following the regression.	2019-04-21 07:49:44 -04:00
Raymond Hill	7735b35e21	Fix uncaught rejected promise in assets.fetchText() Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/534 Regression from `a52b07ff6e`	2019-04-21 06:12:20 -04:00
Raymond Hill	97f91f8be9	Small code review of `a52b07ff6e`	2019-04-20 19:10:34 -04:00
Raymond Hill	f0d5205bd7	Discard existing lines when importing from file in "My filters" Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/519	2019-04-20 18:57:16 -04:00
Raymond Hill	537271f26b	Fix how `*$`, `|https://`, `http://` filters are reported in logger This was a regression introduced in `3f3a1543ea` Reported in issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-485163348	2019-04-20 17:25:32 -04:00
Raymond Hill	a52b07ff6e	Make `userResourcesLocation` able to support multiple URLs The URLs must be space-separated. Reminders: - The additional resources will be updated at the same time the built-in resource file is updated - Purging the cache of 'uBlock filters' will also purge the cache of the built-in resource file -- and hence force a reload of the user's custom resources if any Related issues: - https://github.com/gorhill/uBlock/issues/3307 - https://github.com/uBlockOrigin/uAssets/issues/5184#issuecomment-475875189 Additionally: - Opportunistically promisified assets.fetchText() - Fixed https://github.com/gorhill/uBlock/issues/3586	2019-04-20 17:16:49 -04:00
Raymond Hill	fa83744b58	Use a sequence of base 64 numbers to encode array buffers The purpose of using a custom base128 encoder is to convert array buffers into strings, to allow a direct string-to-array buffer conversion at load time: string => array buffer Whereas a JSON array would require an extra step: JSON array as string => JS array => array buffer Turns out that the current use of a custom base128 encoding results in a significantly larger selfie storage usage when converting array buffers into strings. Speculation: possibly the browser convert the strings to save into JSON strings internally. Since the custom base128 encoder is likely to cause the resulting string to contain a lot of unprintable ASCII characters, these will need to be escaped when converted to JSON -- escaped characters occupy more space than non-escaped ones. Using a sequence of base 64 numbers means only printable will be present in the output string, hence no escaping necessary. I have observed significant reduction in storage usage for selfie purpose.	2019-04-20 09:06:54 -04:00
Raymond Hill	3f3a1543ea	Add HNTrie-based filter classes to store origin-only filters Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622 Following STrie-related work in above issue, I noticed that a large number of filters in EasyList were filters which only had to match against the document origin. For instance, among just the top 10 most populous buckets, there were four such buckets with over hundreds of entries each: - bits: 72, token: "http", 146 entries - bits: 72, token: "https", 139 entries - bits: 88, token: "http", 122 entries - bits: 88, token: "https", 118 entries These filters in these buckets have to be matched against all the network requests. In order to leverage HNTrie for these filters[1], they are now handled in a special way so as to ensure they all end up in a single HNTrie (per bucket), which means that instead of scanning hundreds of entries per URL, there is now a single scan per bucket per URL for these apply-everywhere filters. Now, any filter which fulfill ALL the following condition will be processed in a special manner internally: - Is of the form `|https://` or `|http://` or ``; and - Does have a `domain=` option; and - Does not have a negated domain in its `domain=` option; and - Does not have `csp=` option; and - Does not have a `redirect=` option If a filter does not fulfill ALL the conditions above, no change in behavior. A filter which matches ALL of the above will be processed in a special manner: - The `domain=` option will be decomposed so as to create as many distinct filter as there is distinct value in the `domain=` option - This also apply to the `badfilter` version of the filter, which means it now become possible to `badfilter` only one of the distinct filter without having to `badfilter` all of them. - The logger will always report these special filters with only a single hostname in the `domain=` option. ** [1] HNTrie is currently WASM-ed on Firefox.	2019-04-19 16:33:46 -04:00
Raymond Hill	b70302c0fc	Cleanup comments following changes in `34f3cfe5e7`	2019-04-16 19:20:56 -04:00
Raymond Hill	34f3cfe5e7	Add filterClassHistogram() method to µBlock.staticNetFilteringEngine As a development tool for investigation purpose. To use, enter the following at uBO's dev console: µBlock.staticNetFilteringEngine.filterClassHistogram()	2019-04-16 19:01:14 -04:00
Raymond Hill	4940cda154	Categorize `google` as a bad token for map key purpose In the static network filtering engine, `google` token is too generic and probably leads to too many false positives, beside causing too large filter bucket.	2019-04-16 06:52:13 -04:00
Raymond Hill	60858b6719	Fix handling of backslashes in string expressions for `:has-text()`	2019-04-15 18:56:28 -04:00
Raymond Hill	a594b3f3d1	Add µBlock.staticNetFilteringEngine.bucketHistogram() as investigative dev tool Additionally, lower the threshold of trieability to 4 for FilterPlainPrefix1.	2019-04-15 11:45:33 -04:00
Raymond Hill	c9c21f9cbf	Add more languages for list selection at install/reset time Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/501 Also, the handling of 3-letter language codes has been fixed.	2019-04-14 18:20:57 -04:00
Raymond Hill	7652808806	Improve handling of srcset-based images in element picker Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/517	2019-04-14 17:37:48 -04:00
Raymond Hill	b73480b4c5	Update fix for https://github.com/uBlockOrigin/uBlock-issues/issues/468 As suggested by @jspenguin2017: https://github.com/uBlockOrigin/uBlock-issues/issues/468#issuecomment-482863195	2019-04-14 16:57:09 -04:00
Raymond Hill	c229003d31	Performance + code maintenance work on static network filtering engine Implement a plain string trie container class: STrieContainer. Make use of STrieContainer where beneficial Some filter buckets can grow quite large, and in such case coalescing "trieable" filter classes into a single trie reduces lookup performance and memory usage. For instance, at time of commit, the filter bucket for the `ad` keyword contains 919 entries[1]. Coalescing trieable filters of the same class into a single plain string trie reduced the size of the bucket into 50 entries + two tries which are scanned only once each whenever the bucket is visited. [1] Enter the following code at uBO's dev console: µBlock.staticNetFilteringEngine.categories.get(0).get(µBlock.urlTokenizer.tokenHashFromString('ad')) Refactor static network filtering engine code to make use of ES6's syntactic sugar `class`. Change first auto-update run from 7 to 5 minutes.	2019-04-14 16:45:20 -04:00
Raymond Hill	92c5f17b78	Improve usefulness of FilterContainer.benchmark() Add ability to test/record results. This allows to compare against output after code changes to detect and more accurately report regressions.	2019-04-14 09:44:24 -04:00
Raymond Hill	813d96175d	Fix https://github.com/uBlockOrigin/uBlock-issues/issues/468	2019-04-13 08:10:55 -04:00
Raymond Hill	d2cb0f17ea	Report block count in benchmark() The block count can be used for testing against regression after code changes.	2019-04-12 10:19:38 -04:00
Noelle Leigh	0bb7b76338	Fixed wrong method for number of elements in a Map (#3755)	2019-04-06 16:42:24 -03:00
Raymond Hill	1a7a3298e2	Be prepared to deal with failure to read user settings Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/507	2019-04-03 13:18:47 -03:00
Raymond Hill	f62d866b36	Code review implementation of cacheStorage.clear() Possibly related issue: - https://old.reddit.com/r/firefox/comments/b3u4nj/what_is_the_estimated_time_period_for_reviewing_a/ @gwarser has been able to reproduce at will, while I have been unable to reproduce at all. The change here is to clear the content of the database instead of outright deleting it before restoring backed up settings.	2019-03-28 10:17:47 -03:00
Raymond Hill	2fd587b7ae	Simplify code to gather storage used with StorageManager.estimate() Documentation: https://developer.mozilla.org/docs/Web/API/StorageManager	2019-03-22 22:09:27 -03:00
Raymond Hill	ac71d6577a	Visually emphasize directive syntax (`!#if`/`!#endif`) in list viewer/editor	2019-03-21 19:53:04 -03:00
Raymond Hill	26c57feee8	Code review of IndexedDB usage for cache storage purpose Use Promise.prototype.catch() to deal with potential exceptions. Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/416	2019-03-21 17:49:19 -03:00
Raymond Hill	34a138e3ef	Add `unlimitedStorage` to Firefox manifest; add timeout to IndexedDB access Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/416 The Chromium version of uBO has declared `unlimitedStorage` since the extension was first published in 2014. Declaring this permission in Firefox brings uBO inline with the Chromium version. I suspect some reported errors could be caused by IndexedDB eviction due to the lack of `unlimitedStorage` permission. Additionally, a timeout has been added when uBO tries to access its indexedDB storage. It's unclear whether this will help with the mentioned related issue though, the root cause is still to be identified.	2019-03-17 09:45:28 -04:00
Raymond Hill	008370e4b9	Fix https://github.com/uBlockOrigin/uBlock-issues/issues/461 uBO will fallback using a JSON string when trying to encode an array buffer in Chromium version 59 and earlier.	2019-03-16 09:00:31 -04:00
Raymond Hill	580c3885df	Fix typo which could lead to improper filtering context Related discussion: - `354ac4f57b (commitcomment-32715209)`	2019-03-15 07:47:36 -04:00
Raymond Hill	875542c964	Code review of fix for https://github.com/uBlockOrigin/uBlock-issues/issues/459 Relocate workaround to the code responsible to compute filtering context, such that the workaround will also be applied in other code paths, for example also for webRequest.onHeadersReceived.	2019-03-14 11:24:13 -04:00
Raymond Hill	9a7887eb39	Better English in comment	2019-03-13 17:21:30 -04:00
Raymond Hill	f5974a500b	Fix https://github.com/uBlockOrigin/uBlock-issues/issues/459	2019-03-13 17:17:37 -04:00
Raymond Hill	e49debd5dd	Properly report `:spath` operator of procedural cosmetic filters in logger Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/453	2019-03-08 07:26:55 -05:00
Raymond Hill	3a8b68ea76	Remove obsolete code related to assets storage refactoring in 1.11.0 The removed code was quite old, and was about how user filters were persisted before/after uBO version 1.11, related to the following issue: - https://github.com/gorhill/uBlock/pull/2314 The assets storage refactoring was released in: - https://github.com/gorhill/uBlock/releases/tag/1.11.0	2019-03-06 08:59:13 -05:00
Raymond Hill	67d143ec4e	Fix https://github.com/uBlockOrigin/uBlock-issues/issues/448	2019-03-05 12:42:59 -05:00
Raymond Hill	388c1c98ec	Fix parsing of AdGuard's `#$?#`-based cosmetic filters As reported in the following commit: - https://github.com/AdguardTeam/AdguardFilters/commit/4fe02d73cee6	2019-03-05 10:10:40 -05:00
Raymond Hill	337b1f81b6	Code review of indexedDB-based cache storage	2019-02-26 10:37:25 -05:00
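
Commits `99390390fc` and `19ece97b0c` above describe replacing regex-based filters that contain a single wildcard with two literal string segments, using startsWith() as a quick first test before a costlier indexOf(). A minimal sketch of that idea follows. The class name `SingleWildcardFilter`, its fields, and the `segBeg` parameter are hypothetical and not taken from uBO's source; the actual filter classes likely also deal with hostname anchoring and the `^` separator, which this sketch ignores.

```javascript
// Hypothetical illustration only, not uBO's actual filter class.
// Matches a pattern containing exactly one `*` wildcard by comparing
// two literal string segments instead of compiling a regex.
class SingleWildcardFilter {
    constructor(pattern) {
        const pos = pattern.indexOf('*');
        if ( pos === -1 || pattern.indexOf('*', pos + 1) !== -1 ) {
            throw new Error('pattern must contain exactly one wildcard');
        }
        this.s1 = pattern.slice(0, pos);   // literal segment before the wildcard
        this.s2 = pattern.slice(pos + 1);  // literal segment after the wildcard
    }

    // `segBeg` (assumed to be known from compile-time token information)
    // is the index in `url` where the first segment is expected to start.
    match(url, segBeg) {
        // Cheap first test: most mismatches are rejected here.
        if ( url.startsWith(this.s1, segBeg) === false ) { return false; }
        // The second segment may appear anywhere after the first one.
        return url.indexOf(this.s2, segBeg + this.s1.length) !== -1;
    }
}

const filter = new SingleWildcardFilter('/ads/*.js');
const url = 'https://example.com/ads/banner.js';
console.log(filter.match(url, url.indexOf('/ads/')));
```

Keeping only two plain strings per filter is what the commit message credits for the lower memory footprint compared with one regex object per filter.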

1673 commits