Commit graph

49 commits

Author SHA1 Message Date
Raymond Hill
0ca44b847c
Avoid duplicated strings in filterOrigin w/ new approach
The new approach is simpler and should benefit selfie
serialization/unserialization.

This renders stringDeduplicater obsolete -- it has been
removed.
2019-05-17 10:13:58 -04:00
Raymond Hill
c4f9ae706a
Fix alternate code path introduced in 295f08da97 (oops) 2019-04-28 14:18:09 -04:00
Raymond Hill
295f08da97
Implement code path for when TextDecoder() is not available
The primary purpose is to unbreak
https://github.com/cliqz-oss/adblocker/tree/master/bench/comparison
2019-04-28 14:07:21 -04:00
Raymond Hill
ac58b8e688
Make token hashes fit within a 32-bit integer
The staticNetFilteringEngine uses token hashes to store/lookup
filters into Map objects.

Before this commit, the tokens were encoded into token hashes
as JS numbers (not exceeding MAX_SAFE_INTEGER) using at most
the 8 first characters of the token.

With this commit, token hashes are now restricted to fit
into 32-bit integers, and are derived from at most the 7 first
characters. This improves filter look-up performance as per
built-in benchmark().
2019-04-28 10:15:15 -04:00
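The restriction described in ac58b8e688 can be illustrated with a minimal sketch. This is not uBO's actual hash function: the name `tokenHash` and the multiply-and-add scheme are assumptions chosen for illustration. It only shows that reading at most the first 7 characters and using 32-bit arithmetic yields a value that fits in a 32-bit integer.

```javascript
// Hypothetical sketch: hash at most the first 7 characters of a token
// into a value that fits in an unsigned 32-bit integer.
function tokenHash(token) {
    const n = Math.min(token.length, 7);
    let h = 0;
    for (let i = 0; i < n; i++) {
        // Math.imul performs a 32-bit integer multiplication; `| 0`
        // keeps the running value within 32 bits.
        h = (Math.imul(h, 31) + token.charCodeAt(i)) | 0;
    }
    // Reinterpret as an unsigned 32-bit value.
    return h >>> 0;
}
```

With this sketch, characters past the seventh do not affect the result, so `tokenHash('doubleclick')` and `tokenHash('doublec')` return the same value.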
Raymond Hill
96dce22218
Increase resolution of known-token lookup table
Related commit:
- 69a43e07c4

Using 32 bits of token hash rather than just the 16 lower
bits does help discard more unknown tokens.

Using the default filter lists, the known-token lookup
table is populated by 12,276 entries, out of 65,536, thus
making the case that theoretically there are a lot of
possible tokens which can be discarded.

In practice, running the built-in
staticNetFilteringEngine.benchmark() with default filter
lists, I find that 1,518,929 tokens were skipped out of
4,441,891 extracted tokens, or 34%.
2019-04-27 08:18:01 -04:00
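The difference between the two indexing schemes in 96dce22218 can be sketched as follows. The names `knownTokens`, `indexLow16` and `indexFold32` are hypothetical, and folding the upper 16 bits onto the lower 16 with XOR is an assumption about how all 32 bits might be made to contribute to a 65,536-entry table.

```javascript
// Hypothetical sketch: a 65,536-entry lookup table of known tokens.
const knownTokens = new Uint8Array(65536);

// Index using only the 16 lower bits of the token hash.
const indexLow16 = th => th & 0xFFFF;

// Index using all 32 bits, by folding the upper 16 bits onto the lower 16.
const indexFold32 = th => (th ^ (th >>> 16)) & 0xFFFF;

// Mark a token hash as known, and test whether a token hash may be known.
const addKnownToken = th => { knownTokens[indexFold32(th)] = 1; };
const isKnownToken = th => knownTokens[indexFold32(th)] === 1;
```

Two hashes which share their 16 lower bits, such as `0x00010005` and `0x00020005`, collide under `indexLow16` but land in different slots under `indexFold32`, so fewer unknown tokens are mistaken for known ones.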
Raymond Hill
69a43e07c4
Ignore unknown tokens in urlTokenizer.getTokens()
Given that all tokens extracted from one single URL are potentially
iterated multiple times in a single URL-matching cycle, it pays to
ignore extracted tokens which are known to not be used anywhere in
the static filtering engine.

The gain in processing a single network request in the static
filtering engine can become especially high when dealing with
long and random-looking URLs, which URLs have a high likelihood
of containing a majority of tokens which are known to not be in
use.
2019-04-26 17:14:00 -04:00
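A minimal sketch of the idea in 69a43e07c4, assuming a hypothetical `getTokens(url, knownTokens)` function and a plain `Set` of known tokens rather than the engine's own data structures: tokens which no filter uses are dropped at extraction time, so later passes never iterate over them.

```javascript
// Hypothetical sketch: extract alphanumeric tokens from a URL, keeping
// only those present in a set of tokens known to be used by filters.
function getTokens(url, knownTokens) {
    const tokens = [];
    const haystack = url.toLowerCase();
    const re = /[0-9a-z%]+/g;
    let match;
    while ((match = re.exec(haystack)) !== null) {
        // Skip tokens which cannot possibly match any filter.
        if (knownTokens.has(match[0]) === false) { continue; }
        tokens.push({ token: match[0], index: match.index });
    }
    return tokens;
}
```

For a URL such as `https://example.com/ads/x9f3k2q?id=77` and a known set containing only `example` and `ads`, this returns two tokens and discards the random-looking ones.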
Raymond Hill
a52b07ff6e
Make userResourcesLocation able to support multiple URLs
The URLs must be space-separated.

Reminders:
- The additional resources will be updated at the same time
  the built-in resource file is updated
- Purging the cache of 'uBlock filters' will also purge the
  cache of the built-in resource file -- and hence force a
  reload of the user's custom resources if any

Related issues:
- https://github.com/gorhill/uBlock/issues/3307
- https://github.com/uBlockOrigin/uAssets/issues/5184#issuecomment-475875189

Additionally:
- Opportunistically promisified assets.fetchText()
- Fixed https://github.com/gorhill/uBlock/issues/3586
2019-04-20 17:16:49 -04:00
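How a space-separated `userResourcesLocation` value from a52b07ff6e might be split into individual URLs can be sketched as below. The helper name `parseResourceLocations` and the de-duplication step are assumptions, not taken from the commit.

```javascript
// Hypothetical sketch: turn a space-separated setting value into a list
// of distinct URLs, preserving their order of appearance.
function parseResourceLocations(value) {
    const urls = value.trim().split(/\s+/).filter(s => s !== '');
    return Array.from(new Set(urls));
}
```

An empty or whitespace-only value yields an empty list in this sketch.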
Raymond Hill
fa83744b58
Use a sequence of base 64 numbers to encode array buffers
The purpose of using a custom base128 encoder is to
convert array buffers into strings, to allow a direct
string-to-array buffer conversion at load time:

  string => array buffer

Whereas a JSON array would require an extra step:

  JSON array as string => JS array => array buffer

Turns out that the current use of a custom base128 encoding
results in a significantly larger selfie storage usage when
converting array buffers into strings.

Speculation: possibly the browser converts the strings to
save into JSON strings internally. Since the custom base128
encoder is likely to cause the resulting string to contain
a lot of unprintable ASCII characters, these will need to
be escaped when converted to JSON -- escaped characters
occupy more space than non-escaped ones.

Using a sequence of base 64 numbers means only printable
characters will be present in the output string, hence no escaping
necessary. I have observed significant reduction in
storage usage for selfie purpose.
2019-04-20 09:06:54 -04:00
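The direct `string => array buffer` conversion described in fa83744b58 can be sketched as below. This is not uBO's encoder: the fixed width of 6 digits per 32-bit word, the digit alphabet, and the function names are all assumptions made for illustration. The point is that the output contains only printable characters, so nothing needs escaping if the string is later stored as JSON.

```javascript
// Hypothetical sketch: encode each 32-bit word as 6 printable base 64
// digits (6 digits x 6 bits = 36 bits, enough to hold 32 bits).
const DIGITS =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function encodeUint32Array(words) {
    const out = [];
    for (const word of words) {
        let v = word;
        let s = '';
        // Emit digits least significant first.
        for (let i = 0; i < 6; i++) {
            s += DIGITS[v % 64];
            v = Math.floor(v / 64);
        }
        out.push(s);
    }
    return out.join('');
}

function decodeUint32Array(str) {
    const words = new Uint32Array(str.length / 6);
    for (let i = 0; i < words.length; i++) {
        let v = 0;
        // Read the 6 digits back, most significant first.
        for (let j = 5; j >= 0; j--) {
            v = v * 64 + DIGITS.indexOf(str[i * 6 + j]);
        }
        words[i] = v;
    }
    return words;
}
```

A round trip through `encodeUint32Array()` and `decodeUint32Array()` restores the original words without any intermediate JS array of numbers or JSON parsing of the payload.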
Raymond Hill
3f3a1543ea
Add HNTrie-based filter classes to store origin-only filters
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622

Following STrie-related work in above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with over
a hundred entries each:

- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries

These filters in these buckets have to be matched against all
the network requests.

In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.

Now, any filter which fulfills ALL the following conditions will be
processed in a special manner internally:

- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have `csp=` option; and
- Does not have a `redirect=` option

If a filter does not fulfill ALL the conditions above, there is no
change in behavior.

A filter which matches ALL of the above will be processed in a special
manner:

- The `domain=` option will be decomposed so as to create as many
  distinct filters as there are distinct values in the `domain=` option
- This also applies to the `badfilter` version of the filter, which
  means it now becomes possible to `badfilter` only one of the
  distinct filters without having to `badfilter` all of them.
- The logger will always report these special filters with only a
  single hostname in the `domain=` option.

***

[1] HNTrie is currently WASM-ed on Firefox.
2019-04-19 16:33:46 -04:00
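The eligibility conditions and the `domain=` decomposition listed in 3f3a1543ea can be sketched as below. The parsed-filter shape `{ pattern, options }` and the function names are hypothetical; only the conditions themselves, the `|` separator and the `~` negation prefix of the `domain=` option come from the filter syntax.

```javascript
// Hypothetical sketch: decide whether a parsed filter qualifies for the
// special handling. The object shape { pattern, options } is assumed.
function isOriginOnlyFilter(filter) {
    const { pattern, options } = filter;
    if (pattern !== '|https://' && pattern !== '|http://' && pattern !== '*') {
        return false;
    }
    if (typeof options.domain !== 'string' || options.domain === '') {
        return false;
    }
    // A negated hostname is prefixed with `~`.
    if (options.domain.split('|').some(hn => hn.startsWith('~'))) {
        return false;
    }
    return !('csp' in options) && !('redirect' in options);
}

// Split a qualifying filter into one filter per distinct hostname;
// any other filter is returned unchanged.
function decomposeByDomain(filter) {
    if (isOriginOnlyFilter(filter) === false) { return [ filter ]; }
    const hostnames = new Set(filter.options.domain.split('|'));
    return Array.from(hostnames, hn => ({
        pattern: filter.pattern,
        options: { ...filter.options, domain: hn },
    }));
}
```

Under these assumptions, a filter with `domain=a.example|b.example` becomes two filters, each with a single hostname, which is what allows a `badfilter` to target just one of them.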
Raymond Hill
a594b3f3d1
Add µBlock.staticNetFilteringEngine.bucketHistogram() as investigative dev tool
Additionally, lower the threshold of trieability to 4 for FilterPlainPrefix1.
2019-04-15 11:45:33 -04:00
Raymond Hill
008370e4b9
Fix https://github.com/uBlockOrigin/uBlock-issues/issues/461
uBO will fall back to using a JSON string when trying to encode an array
buffer in Chromium version 59 and earlier.
2019-03-16 09:00:31 -04:00
Raymond Hill
928ab91ab8
Add support to benchmark the dynamic filtering pane
From uBO's dev console, type:
- `µBlock.sessionFirewall.benchmark();`

Keep in mind that it's the temporary ruleset being benchmarked.
2019-02-19 10:46:33 -05:00
Raymond Hill
ed7e34fb07
Refactor selfie generation into a more flexible persistence mechanism
The motivation is to address the higher peak memory usage at launch
time with 3rd-gen HNTrie when a selfie was present.

The selfie generation prior to this change was to collect all
filtering data into a single data structure, and then to serialize
that whole structure at once into storage (using JSON.stringify).

However, HNTrie serialization requires that a large Uint32Array be
converted into a plain JS array, which itself would be indirectly
converted into a JSON string. This was the main reason why peak
memory usage would be higher at launch from selfie, since the JSON
string would need to be wholly unserialized into JS objects, which
themselves would need to be converted into more specialized data
structures (like that Uint32Array one).

The solution to lower peak memory usage at launch is to refactor
selfie generation to allow a more piecemeal approach: each filtering
component is given the ability to serialize itself rather than to be
forced to be embedded in the master selfie. With this approach, the
HNTrie buffer can now serialize to its own storage by converting the
buffer data directly into a string which can be directly sent to
storage. This avoids expensive intermediate steps such as
converting into a JS array and then to a JSON string.

As part of the refactoring, there was also opportunistic code
upgrade to ES6 and Promise (eventually all of uBO's code will be
proper ES6).

Additionally, the polyfill to bring getBytesInUse() to Firefox has
been revisited to replace the rather expensive previous
implementation with an implementation with virtually no overhead.
2019-02-14 13:33:55 -05:00
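The piecemeal approach described in ed7e34fb07 can be sketched as below, under several assumptions: a `Map` stands in for extension storage, and the classes `TrieLike` and `PlainComponent`, the `toSelfie()`/`fromSelfie()` method names, and the `selfie/` key prefix are all hypothetical. Each component writes to its own storage key, and the buffer-backed one bypasses `JSON.stringify` entirely.

```javascript
// Hypothetical stand-in for the extension's storage.
const storage = new Map();

// A component backed by a typed array: it serializes its buffer
// straight to a string, without going through JSON.stringify.
class TrieLike {
    constructor(words = []) { this.buffer = Uint32Array.from(words); }
    toSelfie(key) {
        const parts = [];
        for (const word of this.buffer) { parts.push(word.toString(16)); }
        storage.set(key, parts.join(' '));
    }
    fromSelfie(key) {
        const s = storage.get(key);
        if (typeof s !== 'string') { return false; }
        this.buffer = s === ''
            ? new Uint32Array(0)
            : Uint32Array.from(s.split(' '), d => parseInt(d, 16));
        return true;
    }
}

// A component holding plain data: JSON is fine here.
class PlainComponent {
    constructor(data = {}) { this.data = data; }
    toSelfie(key) { storage.set(key, JSON.stringify(this.data)); }
    fromSelfie(key) {
        const s = storage.get(key);
        if (typeof s !== 'string') { return false; }
        this.data = JSON.parse(s);
        return true;
    }
}

// Each component serializes itself under its own key.
function saveSelfie(components) {
    for (const [ name, component ] of Object.entries(components)) {
        component.toSelfie(`selfie/${name}`);
    }
}

// Returns false as soon as any component fails to restore itself.
function loadSelfie(components) {
    return Object.entries(components).every(
        ([ name, component ]) => component.fromSelfie(`selfie/${name}`)
    );
}
```

In this sketch, a missing entry for any one component makes `loadSelfie()` return `false`, so the caller can fall back to rebuilding everything from the filter lists.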
Raymond Hill
261ef8c510
Add support for procedural :not to HTML filtering
Related issue: <https://github.com/gorhill/uBlock/issues/3683>

Additionally, improve compile-time error reporting in the logger
2018-12-15 10:46:17 -05:00
Raymond Hill
5b7a3c9983
fix https://github.com/uBlockOrigin/uBlock-issues/issues/256; add regex support in logger filter field 2018-12-14 11:01:21 -05:00
Raymond Hill
cabb0d36b6
fix https://github.com/gorhill/uBlock/issues/3371 2018-10-23 14:01:08 -03:00
Raymond Hill
777144b036
fix https://github.com/uBlockOrigin/uBlock-issues/issues/200 2018-09-03 16:15:51 -04:00
Raymond Hill
8f1b4b52fd
fix #3606 2018-08-09 11:31:25 -04:00
Raymond Hill
7766786b2c
code review: reuse last decomposed hostname (hit rate = 75%) 2018-06-03 13:27:42 -04:00
Raymond Hill
2c843f6e69
code review: chromium 45 supports arrow functions = start using them 2018-06-01 11:49:48 -04:00
Raymond Hill
798f8dab9d
reduce baseline memory at selfie-load time 2018-06-01 07:54:31 -04:00
Raymond Hill
a9f68fe02f
Fix #3069, and consequently #3374, #3378.
A new filtering class has been created: "static extended filtering".
This new class is an umbrella class for more specialized filtering
engines:
- Cosmetic filtering
- Scriptlet filtering
- HTML filtering

HTML filtering is available only on platforms which support modifying
the response body on the fly, so only Firefox 57+ at the moment.

With the ability to modify the response body, HTML filtering has
been introduced: removing elements from the DOM before the source
data has been parsed by the browser.

A consequence of HTML filtering ability is to bring back script tag
filtering feature.
2017-12-28 13:49:02 -05:00
Raymond Hill
4ab63e70fe
code review: avoid Array.splice/unshift
The array size stays the same, items are just moved around.
2017-12-22 09:37:26 -05:00
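The idea behind 4ab63e70fe can be illustrated with a generic move-to-front helper. This is a sketch, not the code from the commit, and the name `moveToFront` is hypothetical. Instead of removing an item with `Array.splice()` and reinserting it with `Array.unshift()`, the preceding items are shifted one slot to the right, so the array never changes size.

```javascript
// Hypothetical sketch: move the item at index i to the front of the array
// by shifting the preceding items one slot to the right, in place.
function moveToFront(arr, i) {
    const item = arr[i];
    for (let j = i; j > 0; j--) {
        arr[j] = arr[j - 1];
    }
    arr[0] = item;
    return arr;
}
```

Items after index `i` are left where they are, and the array length is unchanged.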
Raymond Hill
607968de7f
code review: cache most-recently-used pre-filled scriptlets 2017-12-21 17:05:25 -05:00
gorhill
386e8bee9c
fix #3210 2017-11-09 12:53:05 -05:00
gorhill
6112a68faf
fix #2984 2017-10-21 13:43:46 -04:00
gorhill
9a4681d4e1
fix #2656 2017-05-27 14:31:46 -04:00
gorhill
aae97b8535
fix badfilter option; performance work
- badfilter option was no longer working following last refactoring
  changes.
- performance work:
    - reduce duplication of large strings.
    - new lighter FilterBucket to use when only 2 filters: FilterPair.
2017-05-26 20:00:21 -04:00
gorhill
8d2319e011
fix "purge all" button not disabled when there is nothing left to purge 2017-05-26 08:31:19 -04:00
gorhill
f3e6057e07
fix #2598: refactor to address the cause rather than the symptoms 2017-05-25 17:46:59 -04:00
gorhill
fd03683045
minor code review: it makes no difference, I just prefer no indent there 2017-05-20 16:32:42 -04:00
gorhill
acf7562b0f
minor code review 2017-05-19 20:22:26 -04:00
gorhill
fcf43d972e
tentatively fix issue reported in #2612 re. FFox 24.8.1 2017-05-19 10:12:55 -04:00
gorhill
a222e23e49
fix #2630 2017-05-19 08:45:19 -04:00
gorhill
0232382695
refactor static network filtering, add support for csp injection 2017-05-12 10:35:11 -04:00
gorhill
a4e20ae3ad
new filter option: "badfilter" (see https://github.com/uBlockOrigin/uAssets/issues/192) 2017-03-11 13:55:47 -05:00
gorhill
0b4f31bd8a fix #2344 2017-01-27 13:44:52 -05:00
gorhill
da163bbe4b fix #1641 2016-10-13 13:25:57 -04:00
gorhill
b105010f34 minor code review 2016-10-11 11:53:28 -04:00
gorhill
ef0a7ed5cb code review re. #1997: be sure the setting is persisted 2016-09-16 19:12:16 -04:00
gorhill
269c35a04a fix #1997 2016-09-16 17:41:17 -04:00
gorhill
a7fe367eec refactor where appropriate to make use of ES6 Set/Map (#1070)
At the same time, the following issues were fixed:
- #1954: automatically lookup site-specific scriptlets
- https://github.com/uBlockOrigin/uAssets/issues/23
2016-09-12 10:22:25 -04:00
gorhill
e9157bafb7 fix #1892, #1891 2016-08-13 16:42:58 -04:00
gorhill
a944873b83 code review: convert static filtering's tokenizer to a global utility 2015-12-29 11:34:41 -05:00
Deathamns
95b778fbc7 Change extension description 2015-03-07 19:20:18 +01:00
gorhill
f60f149531 1000 is k, not K 2014-12-24 08:11:22 -05:00
Deathamns
30ef97a678 Fix messaging for Safari 2014-11-09 17:41:07 +01:00
Deathamns
0886f7e886 Add .jshintrc, and use the "use strict" directive
.jshintrc's option set is a personal choice, merely a suggestion.
Besides that, it includes some common globals for specific browsers, so
there's no need to set the globals in every .js file.

In order to force strict coding, "use strict" directive was added into
every .js file.
2014-11-09 17:39:17 +01:00
Deathamns
5b79bf3536 Work on vendor API abstraction, and near complete Safari support 2014-11-09 17:39:12 +01:00
Renamed from js/utils.js