When I got started with the web 15 years ago, it was advised everywhere not to rely on user agent strings and rely on feature detection instead, and that using the user agent string should be a last resort solution.

Today, we are still seeing issues "solved" by switching one's user agent. And here we are, reading Akamai whining about user agents becoming unreliable. And we are talking about unreliability at the minor-version and specific-platform-version level.

It's not like we weren't warned ahead of time.

I'm sure these problems will be sorted out by proper HTTP headers, data in handshakes, or other mechanisms. And they should be. Nobody should have to read user agent strings to optimize things, because things should also be optimized for a new, unknown user agent that supports these optimizations.

I'm not sure how you came to the conclusion that "Akamai is whining" about this. It's an informational blog post about what's happening and what's changing.

User Agent strings aren't used for feature detection; they're used for classification. As a developer, when you're trying to fix a bug reported by a customer, it helps to know exactly which browser, right down to the patch version, the bug shows up in, so that you can try to reproduce it in the same environment.

Odd response—especially the (perversely ironic!) dig in your first sentence. The blog post states:

> At Akamai, we use the User-Agent header at the edge and as part of many Akamai products for business logic

The post then goes on to describe several things that are expected to break (or would be breaking, if Akamai weren't taking steps on their end) since they rely on the value of the client's User-Agent header, which affects how they respond. It's definitely not just used at Akamai to help reproduce bugs in the same environment...

Then you can ask the customer. You have a relationship with them.

Akamai uses user-agent strings in its Bot Manager. They see which specific version of which browser you're running, then check certain characteristics of the request (e.g. header order) against a database. That isn't going to work anymore.

And good riddance. It makes the internet brittle and isn't especially hard to work around anyway.
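
For the curious, a minimal sketch (not Akamai's actual logic, and the fingerprint table is made up) of what header-order checking can look like in Node.js:

    // Hypothetical table: the header order a given browser family is known to send.
    const expectedOrder = {
      'Chrome/100': ['host', 'connection', 'user-agent', 'accept', 'accept-encoding'],
    };

    // req is a Node http.IncomingMessage; rawHeaders preserves arrival order
    // as [name, value, name, value, ...].
    function headerOrderMismatch(req) {
      const ua = req.headers['user-agent'] || '';
      const key = Object.keys(expectedOrder).find(k => ua.includes(k));
      if (!key) return false;                        // unknown browser: no opinion
      const seen = req.rawHeaders
        .filter((_, i) => i % 2 === 0)               // keep header names only
        .map(name => name.toLowerCase())
        .filter(name => expectedOrder[key].includes(name));
      const expected = expectedOrder[key].filter(name => seen.includes(name));
      return seen.join(',') !== expected.join(',');  // different order: suspicious
    }

Once the UA stops identifying the exact browser build, a lookup keyed on it loses precision, which is the breakage being described above.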

> I'm not sure how you came to the conclusion that "Akamai is whining" about this

I largely overstated that, of course. However, my feeling throughout the entire post was that they were announcing bad news they were not too happy about, because they based their optimization strategy on this. I was like, "the world told you so".

As the author of the post... :) We don't see this as bad news, quite the opposite in fact. Feature detection (if you're running JavaScript), Client Hints (if you're running code at the edge), etc are all a lot better to use for logic decisions. We've been making changes to rely less on the UA header directly for our Akamai products.

Our goal with the post was more around educating our customers, who may not be aware of these changes, which can affect their custom logic if it depends on the UA.

This is yet another example of good advice that over time gets oversimplified to an 'always' rule, and just becomes silly.

"Don't rely on User-Agent" was in response to things like:

  if (isIE())
      useIEThing()
  else if (isNetscape())
      useNetscapeThing()
  else
      alert('unsupported')
And that is usually a bad thing to do, since you can almost always replace it with a simple "if ('foo' in window) useFoo()" test or the like.

But there are also things that can't be done like this. What if I want to serve the best possible image or video format for a platform? The Accept header isn't really enough for this: aside from not really advertising all supported formats on most platforms, it also doesn't tell you things like "Firefox 86 enabled AVIF decoding support, and Firefox 100 enabled hardware decoding, but only on Windows". So if someone is using Firefox 101 on Windows, let's serve them an AVIF video; it will work great for them. If they're using Firefox 101 on macOS, maybe use another format, because AVIF will eat all their CPU.
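
A rough sketch of that kind of decision; the version and platform cut-offs are the ones claimed in this comment, not independently verified, and the paths are made up:

    function pickVideoSource(ua) {
      const firefox = ua.match(/Firefox\/(\d+)/);
      const onWindows = ua.includes('Windows NT');
      // Claim above: Firefox >= 100 on Windows has hardware decoding for this codec.
      if (firefox && Number(firefox[1]) >= 100 && onWindows) {
        return '/video/clip.av1.webm';
      }
      return '/video/clip.h264.mp4';   // safe default everywhere else
    }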

There are lots of little cases like this where you can't really rely on feature detection. There's a reason User-Agent got replaced by another system that gives the same information: it's because it's useful (IMHO Client Hints are worse, by the way, and not an improvement at all).

For this example I see two possible answers:

- The Accept header has a q-factor weight that can be attached to each type. Browsers should use this correctly to hint to the server their preferred format(s); it should be considered a bug if they don't. What if I actually tweaked and recompiled my Firefox with a better decoder? Or if the browser uses a framework like gstreamer / ffmpeg and I installed the right packages for this decoder? You can't know that from the UA. Accept cannot possibly list all the supported codecs when the browser accepts a large number of them, but it should at least list the most widespread ones with correct weights (a rough sketch of this negotiation follows the list). But this leads to my second answer (especially as a very reliable Accept header is bad for privacy):

- Actually, just use srcset and provide an entry for each format you support, and let the browser pick the right one. No need for Accept negotiation and CDN magic. It should be the browser's responsibility to know what's supported, what's not, and what's best. If not, again, it should be considered a browser bug; the web developer has done their work at this point.
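
A rough sketch of the negotiation from the first answer: the server picks among the formats it actually has using the client's declared q-weights (the function name and the default list are illustrative):

    function pickImageFormat(accept, available = ['image/avif', 'image/webp', 'image/jpeg']) {
      // Parse e.g. "image/avif,image/webp;q=0.9,*/*;q=0.5" into { type: weight }.
      const weights = Object.fromEntries(
        (accept || '').split(',').map(part => {
          const [type, ...params] = part.split(';').map(s => s.trim());
          const q = params.find(p => p.startsWith('q='));
          return [type, q ? parseFloat(q.slice(2)) : 1];
        })
      );
      const weightOf = t => weights[t] ?? weights['*/*'] ?? 0;
      return available
        .filter(t => weightOf(t) > 0)
        .sort((a, b) => weightOf(b) - weightOf(a))[0] || 'image/jpeg';
    }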

The server can't know the best format, only the browser can.

I understand that reality is different, but Akamai and YouTube both have leverage on browser vendors to make them fix their bugs / to build standards for this shit. Smaller developers can report bugs too, and I understand that using the UA string is a "worse is better" solution that works around the issues, but we've had years to fix this.

A large entity like Akamai should not have relied on UA without preparing cleaner solutions built with the browser vendors, and should not have been caught by surprise by such a change. Something is wrong in this story.

Note that I didn't outright reject using the UA string for workarounds, but that's still a last resort which is error prone. Building business logic on the UA string is asking for trouble, and we've known this for a long time now. The proof of this is in the very existence of this article from Akamai. They are "screwed" [1] because they can't really do what they are doing the way they are doing it.

[1] I'm sure they are smart and will find solutions. I hope they'll find that building standards with browser vendors is a good solution.

> A large entity like Akamai should not have relied on UA without preparing cleaner solutions built with the browser vendors

The current solution works; there is no problem here.

> The proof of this is in the very existence of this article from Akamai. They are "screwed" [1] because they can't really do what they are doing the way they are doing it.

This is a rather odd interpretation of the article; they just switched from User-Agent header to UA Client Hints. UA Client Hints are the same as the User-Agent string, except delivered through a different mechanism. It's little more than s/one-thing/other-thing-thats-basically-the-same-but-different/

I resent how the Chrome team is handling this, because it's forcibly creating work for a large number of developers for no good reason other than "this other interface is a little bit nicer".

Ah yes, you are right, I misread this a bit.

Well, then if the UA string is reduced for privacy reasons but the server can still ask for the information, the benefits are quite unclear.

UA strings need to be fixed, but I also agree that Chrome single-handedly deciding the standard is annoying.

> and that using the user agent string should be a last resort solution.

In fairness, “last resort solution” means sometimes it is your only solution, when a specific browser fucks up on specific content and you need to work around that specifically.

Sure, that belongs to the very few valid use cases.

I got an iPad 2 from a relative, I do detect its user agent on my private Invidious instance to send it transpiled/polyfilled JS instead of the original one.

Of course it would not be the correct solution if Apple did not forbid other browsers on its hardware, the correct solution would then be to install a recent Firefox version on it. It would also allow a shitload of other stuff to work, like subtitles on fullscreen videos and autoplay on the next video, playback of videos protected by HTTP basic auth, as well as Let's Encrypt SSL certificates.

The device's browser should send a "X-I-m-dumb-and-my-manufacturer-likes-to-piss-everybody-off: true" HTTP header to avoid relying on its user-agent though.
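
For what it's worth, the detection from the first paragraph can stay tiny. A minimal Express-style sketch, where the paths and the UA test are made up for illustration:

    const app = require('express')();

    app.get('/watch.js', (req, res) => {
      const ua = req.headers['user-agent'] || '';
      // The iPad 2 tops out at iOS 9.x, so its Safari UA contains "OS 9_".
      const legacy = /iPad;.*OS 9_/.test(ua);
      res.sendFile(legacy ? '/srv/js/watch.legacy.js' : '/srv/js/watch.js');
    });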

FWIW you can get Let's Encrypt working on outdated Apple hardware by manually loading the CA certificate.

I could import the root certificate, but the instructions for activating it didn't work; they seem to apply to more recent versions of iOS.

I've done it on iOS 9.3.5 and it worked in Safari at the very least. You need to import it first and then activate it from the settings afterwards, I believe.

iOS changed the exact procedure a few times so you may need to Google around for the exact steps you need to follow.

The level of detail that will be available to servers will not be reduced at all, but rather repackaged and split up into separate headers that the server can individually request. The information contained in these headers will likely be more accurate as well, since the claim is that exposing it this way is safer.

Whereas today your browser sends the messy but relatively detailed user agent string automatically with each request, after this change it will still send the messy user agent string with each request but with a tiny bit less detail.

Google's writers are pretty good at polishing turds, got to give them that!

My biggest issue with being a solo learner is that unless I specifically look for some information, I don't know it's there. Who is "we" for you? Who was warning everyone? Was I supposed to get a memo?

> When I got started with the web 15 years ago, it was advised everywhere not to rely on user agent strings and rely on feature detection instead,

Which is reasonable advice for code running in a browser, not for a proxy/CDN (and you don't want the proxy inserting its own JS).

UA detection in the backend has also been frowned upon; it's not limited to code running in the browser.

And a proxy/CDN should not be doing anything other than proxying requests and serving files.

Workarounds are fine, but that's what they are.

To get it working client-side you'd need to change the app, while the whole premise here is that you don't have to (and often can't). A single point that tracks which UA strings are out there and what they need is better than expecting every app author to handle this properly.

User agent strings are such a train wreck. I wish Chrome were braver and changed it to something like "Chromium (Blink, V8); Linux (Android)".

Or just "chromium 99"

Every once in a while I rebel and change my user agent to "firefox 103", but in the end I get sad about how much breaks when you do that, and come crawling back to the default user agent string.

I think the thing that bugs me the most is not the complexity of it, but how everybody is spoofing everybody else's user agent string. It is just this stupid circle jerk of spoofing.

How would CDNs cache both a mobile optimized and desktop optimized version of a site on the edge?

I suppose this can still (kind of) be done, but on the client-side using the viewport size (combined with javascript or CSS @media) rather than on the backend.
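
A small sketch of the client-side version (the breakpoint and class name are arbitrary): let the page adapt itself with matchMedia instead of asking the CDN to guess from the UA.

    const compact = window.matchMedia('(max-width: 600px)');

    function applyLayout(mq) {
      document.body.classList.toggle('compact-nav', mq.matches);
    }

    applyLayout(compact);
    compact.addEventListener('change', applyLayout);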

For a simple website, the user agent should be able to decide what to download and display. It should not be a backend application's job.

HTML is responsive by default, just don't break this, and yes, you can use media queries if needed.

For images, we have srcset to tell the browser what to download depending on the screen size [1]. You should not try to optimize the bandwidth just because I'm on mobile. I might be on a Wi-Fi connection on my mobile, or on a tethered mobile connection on my laptop. Just optimize for everything anyway.

The backend should not be involved in how the site is presented, and the CDN should be as dumb as possible, or should not be used at all.

For apps, you have Javascript to do whatever you want.

Mobile / desktop detection is yet another form of user agent detection in disguise anyway. Just detect my screen size, my DPI, my touchscreen, my mouse, and possibly my bandwidth if it's really necessary (videoconferencing, for instance). I could be using a mouse on a mobile device. Both the mouse and the touchscreen need to work. You might not even need feature detection: just bind these events unconditionally. I could plug in a secondary touchscreen and move my browser window to that screen.
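
A tiny sketch of that "bind them all" approach using Pointer Events, which cover mouse, touch and pen in one API (the element id is made up):

    const canvas = document.getElementById('drawing-area');

    canvas.addEventListener('pointerdown', e => {
      // e.pointerType is 'mouse', 'touch' or 'pen'; no isMobile() guessing needed.
      console.log(`${e.pointerType} down at ${e.clientX},${e.clientY}`);
    });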

Many devices are hybrid now. A tablet with a keyboard is not that weird today. What should isMobile return? "YesAndNo"?

I've not seen a really convincing use of isMobile yet. But I've seen harmful ones. They are full of assumptions that are correct most of the time, but still have exceptions.

[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/so...

> What should isMobile return?

There was a joke (real story, maybe) about soldiers being allowed to carry up to 25 kg of gear, and therefore a device weighing 104 kg that was supposed to be carried by 4 people was deemed not to be portable.

As the author of a popular User Agent parser: they are indeed a train wreck, but they were at least a largely solved, managed and contained train wreck. The average person could just grab a library, pass it a single string and know what browser someone was using.

UA hints, Sec- headers and all that stuff they're pushing to "replace" it really just complicate the problem. Getting accurate data server side has been made a total PITA.

Yeah, the problem with "just use feature detection" is that most of it only works on the frontend, or by having the frontend send additional data to the server after the initial page load.

Sometimes you need to optimize things for certain browsers or bots before a single byte of JS has been sent, relying only on the first few request headers. Akamai probably needs to.

Deleting the cruft but retaining the highest bits (product name and major version) like Chrome is doing seems like a reasonable compromise.

On the other hand, they are amazing at catching bots. Almost all bots (obviously excluding disguised ones, but those were never an issue for us) have identifiable user agents. By blocking bots via UAs, we became better than Google Ads at blocking them; Google is probably doing some kind of complicated ML thing that works far better for edge cases, but our simple solution works better for normal cases…

Even today? UA strings are easily fakeable (so fakeable that it surprises me that people still use them for anything).

If a bot still gets caught by UA strings then it's just a poorly written bot?

DoubleVerify is a Googley company that does bot detection. It uses the UA and IP address to find them.

I’m talking about actual, legit bots. Facebook, Instagram, all those search engine crawlers. Those follow all kinds of links, including ads, and then go and annoy us and the advertisers by counting as "fake traffic".

Google is/was (we wrote our own simplified ad server and only use AdManager for the agencies that require it, so I'm not sure how much has changed in the last two years) not only happy to let those through, they even send their own. It was so bad that we redirected all links through our site, where we filtered all the Google IP ranges we could find (because, of course, whatever they used did not have a proper bot UA), to block them and stop sending thousands of fake visits to the advertiser every day.

I wonder if these bots would respect robots.txt files for the ads?

Well, you'd have to get Google to host those robots.txt files, as the ads are running iframed on their servers ;)

Easily fakeable, and abusers still use bad ones. The bar is barely above the floor.

Right? The short UA string mentioned in the article is

    Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36
but as long as we're breaking backwards compatibility anyway, it seems to me it could just say Chrome/100.0.0.0 Android
Eh. I guess I'm happy with any reduction of entropy in the UA string (which today almost contains your family name and your dog's favorite meal in an unparseable string blob). But "client hints" seems like very little bang (device-specific CDN assets?) for the buck (an interactive, configurable and stateful protocol). And if this is for privacy reasons, but you can circumvent it by requesting client hints, then won't that end up an always-on default anyway that nobody benefits from in practice?

I guess it's fine if users and devs don't need to think about it. OTOH, this is yet another barrier to competition in the browser space, which we desperately need to curb.

It seems like someone should overhaul the whole privacy mess with cookies, fingerprinting, user agents and other remnants from the 90s and take a slightly more principled approach to making a sane single standard, instead of adding hundreds of highly specific single-purpose headers and JS APIs.

> if this is for privacy reasons, but you can circumvent it by requesting client hints, then won't that end up an always-on default anyway that nobody benefits from in practice?

The goal is to switch from sending high-entropy information by default to sending it only when explicitly requested by a site. This has several advantages as you try to reduce fingerprinting, but the big one is that it's visible which sites are using which information. Today any server could be using any part of the UA.
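
Roughly, in an Express-style sketch (the header names are the real Client Hints headers; the route and render() are made up):

    const app = require('express')();

    app.get('/', (req, res) => {
      // Low-entropy hints arrive by default on Chromium-based browsers:
      const brands = req.get('Sec-CH-UA');         // e.g. '"Chromium";v="110", ...'
      const mobile = req.get('Sec-CH-UA-Mobile');  // '?0' or '?1'
      // High-entropy hints must be requested explicitly, and only show up on
      // later requests, which is what makes the collection visible:
      res.set('Accept-CH', 'Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List');
      res.send(render(brands, mobile));            // render() is hypothetical
    });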

So, if this becomes a permission, it would be another annoyance like the cookie banners. Akamai is requesting client information, allow or deny? Who would know what that even means, outside of tech?

The gist is, privacy and security have to be in the defaults to be useful. The amount of stuff that can afford to have a human in the loop is, in practice, minor, and should be reserved for important things that people can actually understand.

I don't think there are any plans to make this a pop-up. But it's still useful to move this entropy to be something that has to be actively collected: people can see which sites are collecting what information, and eventually browsers can start enforcing something like privacy budgets (https://github.com/mikewest/privacy-budget). You can't do these with something always sent automatically.

I guess it could always be on by default, but the browser can offer a privacy-focused user experience. For example, the first time a website asks for these hints, the browser could prompt the user for permission to share that information. Similar to the iOS privacy information pills.

I've been blocked from an Akamai-fenced website because I used Firefox for Android. Not great when you have a plane ticket you need to change.

Your experience made me wonder how that blocking stands up against accessibility legal requirements in various countries.
While I am personally in favour of feature sniffing, rather than user agent sniffing, I think it's worth remembering the debacle about how the SameSite attribute on cookies was handled by the browsers a few years ago.

Several browsers shipped with an old implementation of the spec that is incompatible with the most recent, current version of the spec.

Setting the SameSite attribute to a specific value can result in the site working in newer browsers, but not in older browsers (or vice versa).

The only way to handle this is to sniff for a specific set of old browsers by user agent string, and to alter how cookies are set for those:

https://www.chromium.org/updates/same-site/incompatible-clie...

Due to the prevalence of old iOS devices that can't be updated with a more modern browser (especially iPads), the company I work for has to keep this user agent sniffing in our codebases going forward.

If the user agent string is going to be deprecated or significantly weakened, there needs to be effort among browser vendors to avoid something like this ever happening again.
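
For reference, a simplified sketch of that kind of sniff; the regexes only cover the old iOS/Safari cases, and the Chromium page above lists the full set of incompatible clients:

    function rejectsSameSiteNone(ua) {
      // iOS 12 (every browser there uses WebKit) and Safari on macOS 10.14
      // drop cookies that carry SameSite=None.
      const ios12 = /\(iP(hone|ad|od)[^)]*OS 12[_\d]*[^)]*\)/.test(ua);
      const macSafari1014 = /Mac OS X 10[._]14/.test(ua) &&
                            /Version\/[\d.]+ .*Safari\//.test(ua) &&
                            !/Chrom(e|ium)/.test(ua);
      return ios12 || macSafari1014;
    }

    // If it returns true, set the cookie without the SameSite attribute
    // (or send both variants) instead of SameSite=None; Secure.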

All the UA reduction proposals still include sending the browser major version, which is what you need to handle this kind of incompatibility.

I look forward to Google-Chrome-Web (GCW) fully separating into its AOL-ish self and leaving the Good Web alone for us geeks to revel in.

> The Chrome team expects the highlighted portions to be changed to:

I cannot see what they have highlighted.

Neither can I, but see https://www.chromium.org/updates/ua-reduction/ (which the article links to)
If only browsers had similar behavior across the platform and devices, user-agent wouldn’t be so useful to servers. They wouldn’t need to respond with customized content for each different user agent. As a developer, I’d prefer having to deal with at most a few dozen UAs instead of hundreds of specific ones.

If you're using a version of a Chrome-based browser or Firefox from the last few years, you don't need to worry about the UA.

At $work, once we dropped support for Internet Explorer, site development and maintenance became much easier.

Someone should also look into the "navigator" variable that websites can access. It provides a strangely open look into the user's machine.

For example, it allows websites to know about your OS, your CPU and your memory.

The "window" object also provides data that I would consider private. Like the screen size. Websites should only know the window size.

Demo: https://jsfiddle.net/uvtLc784/
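
The fiddle isn't reproduced here, but this is roughly the kind of thing such a demo reads (the property names are real; which ones are populated varies by browser, as the replies below show):

    console.log('CPU:    ' + navigator.oscpu);               // Firefox only, e.g. "Windows NT 10.0; Win64; x64"
    console.log('Memory: ' + navigator.deviceMemory + 'GB'); // Chromium only, capped at 8
    console.log('Cores:  ' + navigator.hardwareConcurrency);
    console.log('Screen: ' + screen.width + 'x' + screen.height);
    console.log('Window: ' + window.innerWidth + 'x' + window.innerHeight);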

>It provides a strangely open look into the user's machine.

"navigator" is just the tip of the iceberg: https://abrahamjuliot.github.io/creepjs/

> Someone should also look into the "navigator" variable that websites can access. It provides a strangely open look into the user's machine. For example, it allows websites to know about your OS, your CPU and your memory.

Looks like it's full of red herring values to me, at least on Chrome 107 on an (M1) MBP. That being said, it's always good to remove anything that can be used to fingerprint a user.

It reported the following...

- CPU as undefined

- Memory as 8GB (32GB in actuality)

- Platform as MacIntel (Should be Mac-AArch64)

For me on various browsers:

    Firefox:
        CPU: Windows NT 10
        Memory: undefinedGB
        Screen: 2560x1440

    Brave:
        CPU: undefined
        Memory: 0.5GB
        Screen: 2560x1440

    Edge:
        CPU: undefined
        Memory: 8GB
        Screen: 2560x1440

Actual values: Windows 10, 128GB, 3840x2160

That's a crapton of RAM, but are you running on a scaled display? Specifically at 150% scaling?

Ah, yes on the scaling.

Yeah, I'm pretty sure it's showing logical pixels, not physical pixels then. Mine was also showing the value you'd get after adjusting for scaling.
The combination of your "CPU as undefined, Memory as 8GB, Platform as MacIntel" can still be used to fingerprint you. Independent of whether the values represent your actual hardware or not.

And they are probably not even red herrings. An undefined CPU simply means you use a certain type of browser that does not provide this value. 8GB memory probably means "8GB or more". MacIntel might simply be interpreted as "some Mac".

Interesting. Running this in Firefox on my M1 MBP, I get the following results:

  CPU: "Intel Mac OS X 10.15"
  Memory: undefined  
  Screen: 1512x982
The CPU result is doubly incorrect, since a) it's not an Intel chip, and b) i'm running 12.5.1 instead of 10.15.

The screen value is correct if you double it; I guess this is measured in points instead of pixels...

> The CPU result is doubly incorrect, since a) it's not an Intel chip, and b) i'm running 12.5.1 instead of 10.15.

It looks like you've enabled resistfingerprinting?

https://searchfox.org/mozilla-central/rev/eddb810ffd5499f098...

Same for me. It only guessed the screen resolution correctly.

I can see reduced granularity, but that seems a touch extreme?

I'd rather Google work on all the other info that leaks.

Seems like a relevant time to bring up this old chestnut:

https://webaim.org/blog/user-agent-string-history/

Similarly related: no longer being able to tell iPadOS and macOS apart server side was a major blow for us.

We have, essentially, “continue this in the app” buttons whose existence, and how they passed state (iOS vs Android), was determined server side. We rewrote all of that to happen client side, because you can check “is it a Mac?” && “does it have multi-touch support?” and know it's an iPad - at least until they build a touchscreen Mac.
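
The client-side check reads as a one-liner; this is the commonly cited recipe, not necessarily their exact code:

    // iPadOS Safari reports itself as a Mac, but real Macs have no multi-touch.
    const isIpad = navigator.platform === 'MacIntel' && navigator.maxTouchPoints > 1;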

Open in app banners are butt, so good riddance. If I wanted an app, I would be using an app.

It's not a banner. It's a little button to take the current state into the app so you can go offline.

This would sort itself out if browser implementations followed the standard and allowed us to set the user-agent ourselves.

Personally I would then set it to 1, 2 or 3 and have my server handle those cases.

Right now the ONLY user-agent code I have running is this:

  navigator.userAgent.indexOf('Android') == -1 && navigator.userAgent.indexOf('Other') == -1 && navigator.userAgent.indexOf('SamsungBrowser') == -1
Good job Samsung/Google!
