245 points | 85 comments | github.io
I built a tool that reveals the data your browser exposes automatically every time you visit a website.
GitHub: https://github.com/neberej/exposedbydefault
Demo: https://neberej.github.io/exposedbydefault/
Note: No data is sent anywhere. Everything runs in your browser.
I’ve been experimenting with ways to reduce my browser fingerprint and exploring techniques to anonymize fingerprint data.
So I built this.
This is kind of like a lighter, more thorough version of CreepJS but entirely client side. I don’t maintain massive lists of time zones or do server-side comparisons to calculate uniqueness. Instead, it automatically surfaces everything a browser exposes, explaining each item in detail.
What I'd love for these sites to do is help me understand where I am distributionally. How unique am I? On what? Help me understand what needs to be fixed and what my threat vector is.
The problem with these is that I'm always unique, no matter what browser I'm on. If I'm unique on a clean Apple laptop in either Safari or Chrome, then the result is essentially meaningless. I've got controlled hardware and vanilla software; how else do you blend into the crowd?
But in the wild, sites aren't always implementing all these techniques. So I want to see if I'm unique to a standard site, or even one that's a bit heavier. Importantly, HOW unique am I? On which things am I not unique, how unique am I overall, and what are the most unique things about me?
Having that information gives me the ability to do something about it. Without it, this is just like any other website where the message is essentially "be scared! People can track you on the internet and there's nothing you can do about it!"
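A sketch of what that per-attribute breakdown could look like, assuming you had a sample of fingerprints to compare against (all counts and attribute names below are invented):

```javascript
// Hypothetical sample: counts of each observed value per attribute,
// e.g. from an opt-in dataset. All numbers here are made up.
const observed = {
  timezone:   { "America/Los_Angeles": 400, "Europe/Berlin": 350, "Asia/Tokyo": 250 },
  colorDepth: { "24": 900, "30": 100 },
};

// Surprisal in bits: how much identifying information one attribute
// value carries, given the sample. Rarer values carry more bits.
function surprisalBits(counts, value) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  return -Math.log2(counts[value] / total);
}

// My (hypothetical) browser:
const mine = { timezone: "America/Los_Angeles", colorDepth: "30" };

for (const [attr, value] of Object.entries(mine)) {
  console.log(attr, value, surprisalBits(observed[attr], value).toFixed(2), "bits");
}
```

Sorting attributes by surprisal would directly answer "what are the most unique things about me?" — the high-bit attributes are the ones worth fixing first.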
This EFF tool does this https://coveryourtracks.eff.org/
To critique that (and maybe suggest what OP can do to make theirs better): the visualization is poor. What's great is that it gives me the headline result right there in the center.
But give me some visualization. Prose summaries alone are not super helpful, though they should exist. Showing a density plot[0] is very useful[1]. It gives the user more information, telling them where they need to go. Even a simple visual replacement makes things easier to read. In an ideal setting, I think the site should suggest to users what they should change and show them where they could be with the new settings, letting them play around and adjust some settings.
I know I'm being nitpicky here and to be honest I think the EFF version is "good enough" but I still think adding such visualizations and letting users "see" the results makes things easier to understand and can help them know what to do.
[0] https://seaborn.pydata.org/generated/seaborn.kdeplot.html
[1] In this case it isn't going to be continuous, since I pulled from the user agent, so this will have more discrete bins. What would help inform the user is seeing the proportions of the other bins. That way they know what to change their user agent to!
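The discrete-bin idea from the footnote is easy to sketch: given bin proportions (invented numbers below), the most populated bin is the one to point users at.

```javascript
// Invented distribution of user-agent "bins" in some population.
// Real proportions would come from a collected dataset.
const uaBins = {
  "Chrome 126 / Windows": 0.38,
  "Safari 17 / macOS":    0.21,
  "Chrome 126 / Android": 0.19,
  "Firefox 128 / Linux":  0.02,
};

// The biggest bin is the best one to blend into: suggest it to the user.
function biggestBin(bins) {
  return Object.entries(bins).sort((a, b) => b[1] - a[1])[0][0];
}

console.log("Blend-in target:", biggestBin(uaBins));
```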
And creating that comparison is far harder than people think. To answer "How unique am I?" I need a large, representative dataset of fingerprints collected over time, ideally weighted by how often real websites use each feature. That would require running a backend and database.
It’s something I’d like to build eventually, but only in a privacy-preserving, opt-in way that aligns with the spirit of the project.
For privacy preservation, maybe you can help me understand something, then. I was under the impression that, for the most part, each fingerprinting technique on its own is not enough to identify someone; it's the collection of them that is. In that setting, wouldn't showing the distributions of the individual metrics likely preserve privacy? I can certainly see some subtle naive trap existing here that I'm not aware of, but do you know of one? I'd at least think things such as user agent, dark mode, and some others shouldn't risk deanonymization, though clearly things like coordinates, unique fingerprint IDs, and probably even the canvas fingerprint shouldn't be shared. As long as each data point isn't associated with the others and you have a decent sample size, it seems safe. But I'd love to learn if I'm missing something important.
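The "each attribute is common, the combination is not" point can be shown with a toy example (population and attribute values entirely invented):

```javascript
// Toy population of 8 users. Each attribute value alone is common,
// but the combination narrows the anonymity set fast.
const population = [
  { os: "macOS",   tz: "PST", lang: "en", dark: true  },
  { os: "macOS",   tz: "PST", lang: "en", dark: false },
  { os: "macOS",   tz: "EST", lang: "en", dark: true  },
  { os: "Windows", tz: "PST", lang: "en", dark: true  },
  { os: "Windows", tz: "PST", lang: "de", dark: false },
  { os: "Windows", tz: "EST", lang: "en", dark: true  },
  { os: "Linux",   tz: "PST", lang: "en", dark: true  },
  { os: "Linux",   tz: "EST", lang: "de", dark: false },
];

// How many people match me on the given set of attributes?
function anonymitySet(pop, probe, attrs) {
  return pop.filter(p => attrs.every(a => p[a] === probe[a])).length;
}

const me = { os: "macOS", tz: "PST", lang: "en", dark: true };
console.log(anonymitySet(population, me, ["os"]));                       // 3 of 8
console.log(anonymitySet(population, me, ["os", "tz"]));                 // 2 of 8
console.log(anonymitySet(population, me, ["os", "tz", "lang", "dark"])); // 1 — unique
```

This is also why publishing each attribute's marginal distribution leaks much less than publishing joint records: the marginals alone don't say which combinations exist.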
Here's a suggestion: it's important to show us that our browser footprint allows us to be positively identified and tracked, but that only alerts us to a problem. It would be very useful if the site also provided some tips to improve anonymity, particularly low-effort ones such as tweaking a couple of config options.
https://webgpureport.org/
But, they are bucketed
https://www.w3.org/TR/webgpu/#privacy-considerations
It's not zero pieces of info, but it's also not close to as bad as it looks. Effectively, everyone who has, say, an Nvidia GPU will likely have the same list of features and limits.
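The bucketing idea from the WebGPU privacy considerations can be sketched like this (tier values below are illustrative, not the spec's actual buckets):

```javascript
// Instead of exposing a device's exact limit, round it down to one of
// a few coarse tiers, so whole GPU families report the same value.
const tiers = [4096, 8192, 16384]; // illustrative tier boundaries

function bucketLimit(raw) {
  let out = tiers[0];
  for (const t of tiers) if (raw >= t) out = t;
  return out;
}

console.log(bucketLimit(16384)); // 16384
console.log(bucketLimit(12000)); // 8192 — many distinct GPUs collapse into one bucket
```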
As a more general example: the number is just flat-out wrong.
> Unique to 1 in 2,147,483,648+ devices.
No, I have an iPhone Pro and am in the PST time zone, set to English. It has the exact same fingerprint as millions of other devices among the 40 million people in the PST time zone. In general, the only things that differ between two iPhones of the same model are time zone, language setting, and font size.
Please STOP EXAGGERATING!
Your IP address, ASN, and location make this not true.
you walked right by the chance to call it WeirdoJS
Because let’s be honest - all of us know that a lot of data points are being collected about us, and countless articles have been written about the insanity of cookie and user-data monetization networks - yet it still appears to be the privilege of a few to tap into that data trove.
I personally haven’t seen an effort to try and make this transparent. Efforts like this page are commendable and informative, much like amiunique and other services - still, they lack tangible insight into what sharing this data with “the world” reveals about an affected individual.
Why hasn’t this been done yet? Why is this seemingly not trivial?
https://myadcenter.google.com/controls
I'm not sure how that would work from an ad-buying perspective. From what I understand, you essentially choose which buckets you'd like to show ads to? I don't think ad buyers get the whole dossier on the person they're showing ads to; the platform just decides, "from what you've told us, this person seems likely to like your ads."
Or more like "on ad network X you match for keywords A, B, F, G"?
So yes, your fingerprint is unique, but it's a different unique every time, making it pretty useless for anything.
Edit: Ah, turns out "Unique Fingerprint ID" is just the same fingerprint ID printed at the top. It isn't one of the attributes used for calculating the ID; it is the ID. Guess I got confused by its placement.
The fingerprint should really only use stable features that don’t fluctuate between reloads. That way it’s consistent for the same device.
No idea how representative either tool is.
For me it says 1 in 17,179,869,184+, but scrolling through all the variables, the vast majority should be the same for any MacBook Chrome user.
It would be great to see the stats of each individual characteristic.
There are ~40 million people in the PST time zone. Some percentage have smartphones (80%+), and ~50% of those are iPhones (16 million). Of those, the majority are set to English (80%+), and they're divided among a handful of screen sizes. Basically, if you have an iPhone, you have the same fingerprint as at least a million other people in the PST time zone. You are, at best, 1 of 100, not 1 of x,xxx,xxx,xxx.
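Running the same back-of-the-envelope numbers (the screen-size bin count is a rough guess, like the rest):

```javascript
// The cohort math from the comment above, with the same invented rates.
const pstPopulation  = 40_000_000;
const smartphoneRate = 0.8;  // "80%+ have smartphones"
const iphoneShare    = 0.5;  // "~50% of those are iPhones" -> 16 million
const englishRate    = 0.8;  // "majority are set to English"
const screenSizeBins = 5;    // rough guess at distinct iPhone screen sizes

const cohort = pstPopulation * smartphoneRate * iphoneShare * englishRate / screenSizeBins;
console.log(Math.round(cohort)); // roughly 2.5 million per screen-size bucket
```

Even with these crude assumptions, the smallest bucket is in the millions, which is the core of the objection to "1 in 17,179,869,184+".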
You might be 1 of x,xxx,xxx,xxx among the people who visited that unpopular site, but no one needs tracking on an unpopular site. On a popular site you will not have a unique fingerprint.
If the fingerprint ID is unique every time, there is zero possibility of using it for identification.
Am I missing something? That doesn’t math the way math should math.
Or did I misunderstand you?
The more important bit to see from this tool is probably "this is an example of how much information which can aid in identification your browser exposes".
I tried with Windows 7 (Firefox 115) and it reports Windows 7.
It seems though that it cannot distinguish between Windows 10 and Windows 11, so, without looking further, I suppose the detection is based on the User-Agent string? (The OS version browsers report on Windows is frozen, so Windows 10 and Windows 11 have the same version there.)
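A minimal sketch of that UA-based detection, and why Windows 10 and 11 collapse: Windows 7 reports "Windows NT 6.1" in the UA string, while both Windows 10 and Windows 11 report the frozen "Windows NT 10.0" token.

```javascript
// Map the NT version token in the UA string to a Windows release.
// "10.0" is ambiguous by design: both Windows 10 and 11 send it.
const ntVersions = { "6.1": "Windows 7", "10.0": "Windows 10/11" };

function windowsFromUA(ua) {
  const m = ua.match(/Windows NT ([\d.]+)/);
  if (!m) return "not Windows";
  return ntVersions[m[1]] ?? `Windows NT ${m[1]}`;
}

console.log(windowsFromUA(
  "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0"
)); // "Windows 7"
console.log(windowsFromUA(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
)); // "Windows 10/11" — same token for both OS versions
```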
But to what extent should we care for such a small website? The AI witch hunt won't get us too far, and this new way of producing is only getting started. The loss of control to a non-deterministic black box is worrisome, but at some point non-vibe-coded (hard coded? brain coded?) software might become less error-prone than vibe-coded.
Did you mean more instead of less?
My iPhone is allegedly unique to 1 in 2,147,483,648+ devices.
But I wonder how true that is, given how many people use the same model and iOS version as me.
And if every option cuts the user base in half, becoming unique is a matter of 33 such options.
Browser type and version
Screen resolution
Installed fonts
Browser plugins and extensions
Canvas fingerprinting data
WebGL (graphics hardware info)
Time zone
Language settings
IP address
HTTP headers
Touch support
Device type
AudioContext
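The halving arithmetic a few comments up checks out (world-population figure is approximate):

```javascript
// If each binary option splits the population roughly in half,
// n independent options distinguish about 2^n people.
const worldPopulation = 8_000_000_000; // approximate
const bitsNeeded = Math.ceil(Math.log2(worldPopulation));

console.log(bitsNeeded); // 33 — matches "33 such options"
console.log(2 ** 31);    // 2147483648 — the "1 in 2,147,483,648+" figure is only 31 bits
```

The catch, as other comments note, is that these options are far from independent: OS, browser, screen size, and fonts are heavily correlated, so each extra option yields well under one bit in practice.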
Also I think somebody on HN recently pointed out that the language accept header can be used to fingerprint chromium users.
The Canvas Deep Fingerprint Hash is higher entropy and includes baseline shapes, emoji rendering, winding rules etc. [2]. It’s meant to capture subtle rendering differences between systems.
1. https://github.com/neberej/exposedbydefault/blob/main/src/mo...
2. https://github.com/neberej/exposedbydefault/blob/main/src/mo...
I just get approximate location from your public IP address via an external IP geolocation API (ipapi.co), which usually gives city-level accuracy.
> Impossible to "expose"
The perks of disabling JS on every site!
For example, in the DRM section, they extract the Security Level, like L3 – Software Decode (SW_SECURE_DECODE).
Their WebRTC test is also unique: they utilize a TURN server as a feedback mechanism. That means even if you tamper with WebRTC JS in the browser (like some extensions do), it can still expose your real IP by leveraging UDP and bypassing the proxy altogether. https://scrapfly.io/web-scraping-tools/webrtc-leak
So instead I wonder if we could build an open database of “identities” that our browsers could clone.
That is, your browser deliberately reports whatever is currently the most popular of a set of general identities.
It's important to point out fingerprinting, yet no ordinary user cares.