> As Bitsight continues to investigate the traffic patterns exhibited by CrowdStrike machines across organizations globally, two distinct points emerge as “interesting” from a data perspective. Firstly, on July 16th at around 22:00 there was a huge traffic spike, followed by a clear and significant drop off in egress traffic from organizations to CrowdStrike. Second, there was a significant drop, between 15% and 20%, in the number of unique IPs and organizations connected to CrowdStrike Falcon servers, after the dawn of the 19th.

> While we can not infer what the root cause of the change in traffic patterns on the 16th can be attributed to, it does warrant the foundational question of “Is there any correlation between the observations on the 16th and the outage on the 19th?”. As more details from the event emerge, Bitsight will continue investigating the data.

I'd be interested to know how they're capturing sample data for IPs accessing CrowdStrike Falcon APIs, along with the corresponding packet data.

EDIT: Not to mention that they're able to distill their dataset to group IPs by their representative organizations. Since they have that info, I feel a proper analysis would look at which orgs (types, country of origin, etc.) began dropping off on the 16th. Alas, since this seems like just a marketing fluff piece, we'll never get anything substantial :(

I'm not sure what exactly they are trying to say. They saw some CrowdStrike traffic logs, saw a random spike a few days before the outage, and...that's it? Why is that "strange", and how does it relate to the incident timeline?

Just a random security company with a fluff piece with "CrowdStrike" in the title trying to get in the headlines.

It's interesting if the spike only happened on CrowdStrike computers but I'm not sure if this article has checked traffic logs on non-CrowdStrike computers and confirmed there was no spike. Even if they did, I agree there's little reason to believe it's related to the CrowdStrike incident and this is probably just a company trying to catch the CrowdStrike PR wave.
I feel very disappointed too after reading it; there's no "Mystery" in the behavior of those numbers. The only unexplained thing (as of now) is the huge spike; the subsequent reduction in traffic is explainable by "the client hosts were crashing", hence lower numbers.

I want my 10 minutes back.

I would be interested to know what the distribution of release times for these "channel files" is like. Dropping them at 8pm Eastern time is in line with some companies' idea of well-timed system maintenance windows, whereas others prefer to do things during the workday so that if they need all hands on deck, they can get them more easily.

The latter works better with organizations that release often and have reasonable surety that their updates are not going to cause disruption -- it becomes a normal part of the day, most commonly it causes no noticeable disruption at all, and thus it makes sense not to have eng / ops working late hours for the release. This surety can come in different ways, but the one I've seen is a very methodical rollout. It starts with at least a smoke test affecting a very small subset of "production" (not internal or lab machines, so in CRWD's case it would be customers' machines), then rolls out to a random percentage of machines starting with 1%, and then, depending on your level of confidence, follows some schedule that gets you to 100% before the end of business for your easternmost co-workers.

Some additional things to gain confidence can include a 1% rollout to a set of machines that is picked to ideally provide exposure to every type of machine in the fleet, and 100% rollout to customers who have agreed to be at the cutting edge (how you get them to accept that risk is an exercise for the reader, but maybe cut them a deal like 30% off their license).
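For what it's worth, the "random percentage of machines" part is often done with stable hash bucketing, so a machine that was in the 1% stage is still in the 10% stage. Here is a minimal sketch of that idea, not anything CrowdStrike is known to do; the function names, the `host-N` machine IDs, and the `channel-file-demo` release ID are all made up for illustration.

```python
import hashlib

def rollout_bucket(machine_id: str, release_id: str) -> float:
    """Map a machine to a stable pseudo-random point in [0, 100)."""
    digest = hashlib.sha256(f"{release_id}:{machine_id}".encode()).digest()
    # First 4 bytes as an integer, scaled to a percentage in [0, 100).
    return int.from_bytes(digest[:4], "big") / 2**32 * 100

def in_rollout(machine_id: str, release_id: str, percent: float) -> bool:
    """True if this machine falls inside the current rollout percentage."""
    return rollout_bucket(machine_id, release_id) < percent

# Hypothetical fleet and release ID, for illustration only.
fleet = [f"host-{i}" for i in range(10_000)]
stage_1 = {m for m in fleet if in_rollout(m, "channel-file-demo", 1)}
stage_10 = {m for m in fleet if in_rollout(m, "channel-file-demo", 10)}

# Raising the percentage only ever adds machines; nobody drops out.
print(stage_1 <= stage_10)
```

Including the release ID in the hash means a different set of machines goes first on each release, so the same customers are not always the guinea pigs. Picking a cohort that covers every machine type would need inventory data on top of this.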

The reason I'm curious about the distribution of channel file drops, for the case of Crowdstrike, is that if it's an atypically-timed release, that could indicate that it's a response to whatever caused the dip in traffic on the 16th mentioned in the Bitsight article.

Edit: From what I understand, CrowdStrike does have at least some segmentation of releases for the kernel extension, but the configuration file / channel file updates appear to be "Oh well, fire ze missiles".

How exactly is Bitsight collecting the data used in this analysis? I understand it’s just a sampling, but how are they sampling traffic between two arbitrary parties (Crowdstrike and customers in this case)?
Probably buying it from ISPs.
The obvious inference from this is that the bad update was trickled out to some customers on the 16th and it took them 2 days to report the issue because they were all busy figuring out why every machine was blue-screening. Alternatively it took CrowdStrike 2 days to notice that their traffic was disappearing and put 2 and 2 together as to why.
I infer that they pushed a (slightly) bad update on the 16th and tried to correct on the 19th, and that the correction was the update that hosed the world.
I feel like there's an army of sys admins who would be speaking up right now if things started going down on the 16th, but that doesn't seem to be the case.
I wonder if CrowdStrike did do a phased rollout (as they should), but didn't notice that the update was causing crashes?

Not sure if that would make them more or less incompetent...

I also believe this data is suggesting there was a phased roll out that was not noticed.
A lot of evidence, but no claim.
That is a good thing, isn't it?

Collect evidence first, draw conclusions when you have enough evidence.

Makes a refreshing change from deciding what happened then collecting only the evidence that supports it.

This image at https://www.bitsight.com/sites/default/files/2024/07/23/Uniq... makes a compelling argument that something happened concurrently with the third set of weekday peaks. However, considering how similar each peak was for the first two weeks, I would say the divergence began earlier than the dotted line where the short, sharp peak appeared.

Something happened, but the nature of that something might be unrelated to the BSOD crash. It could just be another piece of software doing an update at a different frequency that sometimes changes the timing of the CrowdStrike update.

You'd need a longer term view of data searching for beat patterns to detect that.

If the something was a one-off effect, like admins taking sick days to watch the Euro final, I'm not sure how you could positively identify the cause.
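To make the "how similar each peak was" comparison concrete, here is a rough sketch that flags points deviating from the same slot in earlier periods. It is not a beat-pattern search (that would need the longer-term data mentioned above), just a simple seasonal baseline. The numbers are synthetic and are not Bitsight's data; `seasonal_anomalies` and its 25% threshold are my own assumptions.

```python
from statistics import median

def seasonal_anomalies(series, period, threshold=0.25):
    """Return indices whose value differs from the median of the same
    slot in earlier periods by more than `threshold` (a fraction)."""
    flagged = []
    for i in range(period, len(series)):
        # Values at the same position in every earlier period.
        history = series[i % period : i : period]
        baseline = median(history)
        if baseline and abs(series[i] - baseline) / baseline > threshold:
            flagged.append(i)
    return flagged

# Synthetic daily counts: five weekdays at 10, weekend at 2, three weeks.
traffic = [10, 10, 10, 10, 10, 2, 2] * 3
traffic[16] = 20  # inject a spike on the third day of the third week

print(seasonal_anomalies(traffic, period=7))  # [16]
```

With only two prior weeks as a baseline, as in the article's chart, any result from this kind of check would be weak evidence at best.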

I don't know what the point of this article was, but at my work we push updates to millions of devices in people's homes, and we only "open the flood gates" briefly. It's a blip of time, so that 0.01% get the update, and we check that they all came back online and reported in healthy. Then 0.1%, next day 1%, next day 20%, next day 100%. There have been a couple of times where we had to refund the 0.01% of customers who got bricked as guinea pigs and called us angrily; luckily it's never been 1%. I get that security updates can't wait a whole day, but can't they at least wait until Windows reboots? I wonder why CrowdStrike pushed to all 8.5 million before checking if any came back online.
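The gating described in that comment can be sketched roughly like this. It is an illustration under assumptions, not the commenter's actual system: `push_update` and `is_healthy` are hypothetical callables standing in for the real delivery and check-in mechanisms, and a real rollout would wait (minutes to a day) between pushing a stage and checking it.

```python
from typing import Callable, Sequence

# Stage sizes taken from the comment: 0.01%, 0.1%, 1%, 20%, 100%.
STAGES = (0.0001, 0.001, 0.01, 0.20, 1.0)

def staged_rollout(
    devices: Sequence[str],
    push_update: Callable[[str], object],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = STAGES,
    max_failure_rate: float = 0.0,
) -> tuple[bool, int]:
    """Push in widening stages and halt as soon as a stage's failure
    rate exceeds the threshold. Returns (completed, devices_updated)."""
    updated = 0
    for fraction in stages:
        # Cumulative number of devices that should have the update.
        target = min(len(devices), max(1, round(len(devices) * fraction)))
        batch = devices[updated:target]
        for device in batch:
            push_update(device)
        updated = max(updated, target) if devices else 0
        failures = sum(1 for device in batch if not is_healthy(device))
        if batch and failures / len(batch) > max_failure_rate:
            return False, updated  # stop before widening the blast radius
    return True, updated

# Hypothetical fleet where every updated device fails to check in.
fleet = [f"dev-{i}" for i in range(10_000)]
pushed = []
print(staged_rollout(fleet, pushed.append, lambda d: False))  # (False, 1)
```

With a bad update, only the first stage (one device out of 10,000 here) is affected before the rollout halts.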
PR advertisement stunt disguised as an uninformative nothingburger blog post.
Anyone know what tool was used to produce the graphs?
I think it's seaborn, which uses matplotlib under the hood.

https://seaborn.pydata.org/index.html

It looks like matplotlib to me: https://matplotlib.org/
TL;DR: Strange, our deep packet inspector traffic data shows a drop in traffic from July 16th-18th!

It's incredibly creepy that they A) are collecting this much data from customers B) are comfy drilling into it by IP/organization and C) have enough spare time to do so for a marketing blog post.

Also, for god's sake, you're a company, you're supposed to look professional. If you're going to use AI art for your blog at least don't be lazy: load up Photopea and either fix the broken text or magic wand it out. It'll take you 5 minutes.

"there's a spike we don't know how to explain" saved you a click
I jumped to the conclusion after realizing this was largely an ad piece.

Also kind of funny for BitSight to be claiming how much data they have to improve security. Yet, a massive change in traffic volumes is only surfaced in a hindsight analysis.

They imply, without saying it outright, that someone should have already known the update would bork the world but, through some incompetence, failed to see it.
How are they capturing this data?
CrowdStrike is a rather interesting company, in that it is politically connected. Some additional background by Mike Benz:

https://x.com/mikebenzcyber/status/1816177071757893823

https://x.com/mikebenzcyber/status/1816196876686999962

This was discussed on…newsmax2? So it wasn’t even reputable enough for newsmax? You’ll have to excuse me if I’m extremely doubtful about anything from that source.
Mike Benz seems more politically connected than CrowdStrike...
