Comment by bfm - Hacker Neue

bfm May 2, 2022 parent

The OP details how poor software engineering practices brought down a 1.4B market maker with 1400 employees in 2012.

Some of the issues mentioned include:

  - Keeping synthetic test data generation as part of a production build.
  - Keeping dead code for years.
  - Re-purposing a feature flag.
  - Refactoring without regression tests.
  - Manual deployments without peer reviews. They forgot to update one of their servers with the new code.
  - Automated alerts sent via email were ignored.
  - Rolled back to a version of the code running on the server they forgot to update, making things worse.
  - Rushing out a release without proper software engineering hygiene.

The article suggests improvements that could have prevented the chain of events.

For those here who are in HFT circles, have things improved after the Knight Capital Group debacle?

edit: formatting

kevstev May 2, 2022

I worked in algo trading for years and eventually got out because, quite frankly, the level of risk I was carrying on my shoulders every day was way out of whack with what I was being paid. I, at least, never got the huge paydays that people talked about until after I left finance for more pure tech. Interestingly, I worked at Knight and my team pioneered trying to blow up the firm, but that was in 2004, and things were much friendlier: instead of front page news, it was a small blurb on page 3 of the markets section of the WSJ.

Anyway, I still have friends in that business. It hasn't really changed, they have too few people covering systems that are quite complex and while there are checks and such, no one really understands things entirely from end to end in detail that can prevent all problems.

I will never invest directly in an investment bank- either through carelessness or maliciousness I could have easily caused a 9 figure loss, if not more, and there were probably a thousand other people in the same position.

When I read the detailed writeup around this a few years back, I think by far the biggest issue was reusing a tag that had been previously used to denote which strategy to use. I understand why they may have chosen to do so, at the Big Bank I was working at, getting a new fix tag to be passed through all the layers properly would involve at least two other teams and coordinating releases and probably several weeks worth of meetings. If you just reuse an old value you can avoid all that since everything is already set up.
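To make the failure mode concrete, here is a minimal sketch of how a reused flag value can be read two different ways when one server still runs the old build. Every name and value below is invented for illustration; none of it is taken from the actual Knight systems.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_BIT 0x01u   /* hypothetical: the one bit that got reused */

enum action { ROUTE_NORMALLY, RUN_LEGACY_TEST_CODE, RUN_NEW_STRATEGY };

/* Old build: the bit still means "run the long-dead test path". */
static enum action handle_old_build(uint8_t flags) {
    return (flags & FLAG_BIT) ? RUN_LEGACY_TEST_CODE : ROUTE_NORMALLY;
}

/* New build: the same bit now selects the new strategy. */
static enum action handle_new_build(uint8_t flags) {
    return (flags & FLAG_BIT) ? RUN_NEW_STRATEGY : ROUTE_NORMALLY;
}
```

Both builds agree while the bit is clear. They only diverge once someone starts setting it, which is exactly when a server that missed the deployment begins doing something nobody intended.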

fullsend May 2, 2022

I appreciate your comment about pay. Recruiters will often tell me "it's finance so of course the pay will be substantial." Then when we get to talking numbers they're like "300k a year". Oh, you mean the going rate at a FAANG? And I have to move to New York or Chicago, work more hours, and actively work for people who I know are taking home paychecks with 7+ zeroes on them? Come on. Sometimes it's 400 plus bonus or whatever, which is based on fund performance and yada yada. But it feels way off. I had heard so much about the staggering paydays at these places but it seems you need an ML PhD or some trading chops to be part of that.

kevstev May 2, 2022

Yeah, pay at the big banks is shit really, especially when you consider the utter lack of work/life balance. I left in 2013 making 150k, which was supposed to be supplemented by a ~40% bonus for the level I was at, but each year was "well its been a tough year..." and after getting a token amount one year, and then zeroes the next 2, after working 50-60 hour weeks, I was like I am not only done with this place, but this industry, and left for a 50% pay raise, my TC is now 4x where it was in those days. A neighbor of mine is more or less sitting in my exact seat there, and is somewhere in the 200-250k range.

That said, I went back to finance to work at one of the premier hedge funds out there, and they actually lived up to their expectations in terms of comp, that place was more like a tech firm though than any other firm I have ever worked at aside for maybe Knight. 8% annual increases were normal there. You can look in my post history back to 2018 if you want the name, I recently left after 5 years there and just want to stay out of their crosshairs- they monitor social media aggressively and there is deferred comp at stake.

At big banks, there are really only a very small number of people who are in tech that are getting paid- you have to know which questions to ask- where is the bonus pool coming from- are you "in the business" or the tech pool, which is a second class of citizen. I would have to be in a pretty bad place to ever consider going back to a bank, it was borderline abusive... always dangling the prospect of that big check that would make it worth it

rosege May 2, 2022

I spent a few years at an investment bank, not in the US, and the only people on serious money were some of the top managers. But my overall opinion of these people was that they did very little, while the lower-downs I met were some of the most talented people I ever worked with.

The top ones would spend all their time traveling the world to the offices and meeting with staff in each location and then sending emails to the rest of the department about what the staff in that location were working on. They would harvest ideas from the staff as they went and then present that as their own or approve projects that staff have suggested to them. I really didn't see how they were worth the $5M they were earning since they didn't come up with the ideas for what would be done and didn't do any real work.

rvba May 3, 2022

Probably there were many ideas.

smcl May 3, 2022

>> Then when we get to talking numbers they're like "300k a year".

> Yeah, pay at the big banks is shit really

I would say that a $300,000/year salary is really quite far from "shit". Many jobs are stressful, demanding or unrewarding but rarely do they provide such a salary, and quite often they offer hourly wages hovering around minimum wage with poor benefits. It's worth taking a step back once in a while to think about this, because it sounds like some people are a little out of touch with how the majority of the world lives

kevstev May 3, 2022

I would agree, but I really don't think many people are making that kind of money at IBs. Things may have changed, but as I mentioned, a neighbor who is pretty much in my seat (well, technically one level up as a director, though title inflation has made pretty much all titles below MD worthless) was not pulling 300k. The new guys on the desk were not clearing 100k, and even "senior" guys like me were making a wage that just barely put us in the ownership class, and we were sacrificing a good chunk of our lives for it.

And again- these are the cream of the crop, this is the team most people wanted to work. I was one of the first to leave, but all those guys are now making multiples elsewhere that I am aware of. The point is that there are just better opportunities out there where you get to actually enjoy your life. I hear you on other jobs being stressful and demanding, but I have not really worked at a job where I was wearing that stress 24/7- well typically nothing traded on Saturday night, so nothing would ruin that, but outside of that, I could expect a call at literally any time of day- the markets were trading 23x6.

charcoalhobo May 3, 2022

This is a welcome bit of perspective, so thank you for that.

I often find myself complaining about things at work like competing priorities or wearing too many hats and it helps me to ground myself by thinking about past jobs at minimum wage or even watching what people making less than a fifth of my salary have to put up with on a daily basis.

sjtindell May 3, 2022

I believe that technology and other workers who are paid these salaries are actually just the only workers lucky enough to be getting a fraction of what is their fair share. You shouldn’t let a CEO walk with 300x their workers wage and say “well at least I’m not in poverty like those other workers”. You should almost always fight for more.

22SAS May 2, 2022

Most quantitative hedge funds and prop trading firms are now following a very tech like culture since they realize now that technology is just as important as the strategies. To get the best engineers, especially from FAANG, they need to have a similar culture otherwise they'll have a hard time getting new hires.

skippyboxedhero May 3, 2022

It depends what the strategy is.

Frankly, a lot of what people refer to as "prop trading firms" and "quant hedge funds" on here are just market-makers...they are taking very little to no risk, a lot are just riding the wave of ETF growth. Even the ones that are running alpha, I have heard of a few strategies, and they are largely what you would expect: low-edge, crowded trades (frankly, a lot of it is LTCM-style stuff).

That is why the business has become more tech-like, because actually taking risk is...quite risky. If you are going for alpha, you are trying to hire one guy out of thousands, that person knows what they are worth, etc. It is far easier to hire lots of devs on low wages and do the grunt work jobs that are less lucrative, but don't require being able to actually work out if someone is a decent earner.

Man Group has their own department at Oxford Uni, I think Winton hires people out of their department at Cambridge...Man Group's shaky record is legendary (they are doing everything now that blew them up in 2008), Winton was top tier...not anymore. AQR is another one, although a more traditionally finance quant approach...it all turned out to be pure beta. The hybrid approach of hiring devs or ML PhDs to generate alpha has only ever worked accidentally.

The firms that have been printing money from quant investing over the past five years have all been traditional hedge funds that incorporated quant methods into a fundamental process. And I expect that will continue because, bluntly, these firms know what the price of a security should be better than some busted factor model.

revel May 3, 2022

Sounds like I had a similar experience to you, except many years prior. I actually worked on a desk where there was a front page-headline scandal, partly caused by lack of investment in tech. I would never recommend anyone work in finance based on my experiences, but most people get very excited and assume it was fun and lucrative work. It was neither.

I heard, on multiple occasions from senior leadership, that industry practice was to categorize all dev work as IT so they could justify lower compensation. Pay was, just as you pointed out, not the main problem with this setup. There was no respect for other people and individuals were frequently berated or humiliated for no reason.

One interesting observation: despite the fact that I am very well qualified to work in the sector, it's very rare for fintech recruiters to reach out to me. They must know that I'm far out of their pay grade now and will never go back.

caffeine May 2, 2022

The attitude that finance pays more is a leftover from a previous era. 10-15 years ago it was true: the profits from HFT were also way, way bigger and split up amongst a much smaller group of firms.

Now those firms are all in a completely competitive industry squeezing each other for basis points.

Meanwhile the definition of a FAANG is that it has an effective monopoly, and these companies are taking in way more money than the HFT industry. (Netflix is losing its monopoly but we can’t really drop N from the acronym without a replacement..)

spacemanmatt May 2, 2022

> but we can’t really drop N from the acronym without a replacement

Huh, yeah. That would be quite a GAAF. Gotta come up with something before Netflix is forced out of the FAANG club.

snotrockets May 2, 2022

I’ve seen MAAM being used.

22SAS May 2, 2022

Tbf, most of us prefer to be called market makers rather than HFTs. Different name, but we still use the same ultra low latency techniques to get the job done.

seoaeu May 3, 2022

“The job” being trading securities at high frequencies?

kevstev May 3, 2022

I worked in option AMM when the term HFT started getting thrown around in some articles, and was discussed as this uber secret hush hush thing, and I was interested. I was then astonished when I kept reading and found out that it was what I had been doing- it was just another name for market making really...

isogon May 2, 2022

I cannot confirm this. ~300k is pay (excluding sign-on) fresh out of college at a big HFT -- sufficiently senior devs make 7 figures.

hatesinterviews May 2, 2022

At our firm, the numbers are similar: $600k TC for new grads ($200k base, $100k minimum first year bonus, $300k signing bonus)

22SAS May 2, 2022

WTF! I am at an HFT firm in Chicago, this is insane. This seems to be a lot like an offer from Radix, or Headlands, or maybe Algo Dev at HRT.

22SAS May 2, 2022

Honestly, that depends on the firm. There are some that do pay very well like this, e.g. HRT, Jane Street (they are not an HFT though), Headlands, Radix. At some others, like Jump and Optiver, the pay varies depending on whether it's front office or back office.

Where I work at, the new grad offers are slightly better than FAANG, but the growth is very good based on performance, we also pay very well to people coming in from a competitor.

FrenchTouch42 May 3, 2022

> sufficiently senior devs make 7 figures

I can confirm, known firm, manager role, offer was $1.3M full cash.

ewuhic May 3, 2022

Could anyone (successfully) apply for such a position with no referral but CV only?

FrenchTouch42 May 3, 2022

Not sure as the positions were not even public in my personal situation.

asjre34marakf May 2, 2022

Why pay more than market rate of a replaceable ML person?

Is there any realistic path for a demonstrably smart and hardworking person into that 7+zeros club? Evidence suggests no: leetcode grinders and FAANGers are not in that club, and most of them will never even make it into the 6+zeros club. Net wealth -- sure, but not income.

22SAS May 2, 2022

It's all about making $$ for the firm. If the strategies developed are very profitable then 7-figures is definitely reachable for the researchers at a prop trading firm.

renewiltord May 3, 2022

7 figs is 6 zeroes in his lingo, right? My man is looking for 8 figures.

munificent May 3, 2022

> I understand why they may have chosen to do so, at the Big Bank I was working at, getting a new fix tag to be passed through all the layers properly would involve at least two other teams and coordinating releases and probably several weeks worth of meetings. If you just reuse an old value you can avoid all that since everything is already set up.

There's a good meta-lesson here which is that smart people will do dumb things if you make the smart thing require too much red tape or process.

Processes can block stupidity, but they can also block intelligence if not well designed.

quickthrower2 May 3, 2022

Weirdly, a market maker near me paid less than competitive wages for a dev. I think they rest on their laurels, assuming people badly want to work in finance.

pclmulqdq May 2, 2022

I used to work in HFT. I have seen highly variable practices in this case, including a "mini-knight" incident in the single-digit millions due to tech debt and poor test coverage. However, the most useful change that has resulted from the KCG debacle was adding several layers of kill switches, a dedicated ops team to watch trading and flip the kill switches, and embracing devops automation.

There is a much more serious focus on having a defense in depth, and making sure that problems like this are noticed before they become an issue. Rollbacks are no longer the first action when something goes wrong: the kill switch comes first.

Dead code, tech debt, repurposed flags, and spotty test coverage are everywhere still.
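As a rough illustration of the "kill switch comes first" idea, here is a minimal sketch of a pre-trade gate that latches shut when a position limit is breached. The struct, the function names, and the limit are all hypothetical; real risk layers are far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* All names and limits are hypothetical, for illustration only. */
struct risk_gate {
    int64_t position;      /* signed net position, in shares */
    int64_t max_abs_pos;   /* hard limit on |position| */
    bool    killed;        /* latched: stays true until a human resets it */
};

/* Returns true if the order may be sent. A breach trips the switch and
   every later order is rejected, rather than attempting a rollback. */
static bool risk_gate_allow(struct risk_gate *g, int64_t signed_qty) {
    if (g->killed) return false;
    int64_t next = g->position + signed_qty;
    if (llabs(next) > g->max_abs_pos) {
        g->killed = true;
        return false;
    }
    g->position = next;
    return true;
}

/* Manual switch for the ops team watching trading. */
static void risk_gate_kill(struct risk_gate *g) { g->killed = true; }
```

The latch is the design choice that matters here: once tripped, the gate also refuses orders that would reduce the position, so nothing trades until a person has looked at it.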

aaronharnly May 2, 2022

I’m curious about the “repurposed flags” part.

I wouldn’t think of flags as expensive / effortful to make more of, but clearly they must be if people are tempted to reuse them. Can you help me understand what is meant by a flag in this context, and why it would be repurposed?

isogon May 2, 2022

Repurposing flags is not always well-motivated, but one legitimate reason to do this is the memory (and particularly cache) footprint.

Often flags are local to a particular object. If there are lots of such objects, you want each to take as little space as possible. You should check out the contortions linux devs go through to make struct page small [0]. This is important, because there is one such struct per page of physical memory. The memory use is a near-constant percentage of your total memory, and you wouldn't want it to be any larger than necessary.

Even when there are not a lot of these objects, in low-latency software it's important to hit the cache. Your program should always just be as compact in memory as possible.

Semantically flags are booleans (is proposition P true of this object). They are stored compactly as bitsets, often implicitly, say:

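    #include <stdint.h>

    typedef uint32_t u32;  /* kernel-style fixed-width aliases */
    typedef uint16_t u16;
    typedef uint8_t  u8;

    #define FLAG_1 0x01
    #define FLAG_2 0x02
    /* ... */
    #define FLAG_8 0x80

    struct order {
        u32 qty;
        u16 id;
        u8  type;
        u8  flags;  /* bitset: one bit per FLAG_n */
    };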

This struct will fit into 8 bytes. This is great, as you probably won't waste space to alignment in many cases -- 8 is a good multiple. But if you wanted to add FLAG_9 here, your flags would become a u16, and your struct would, frustratingly, stop fitting into 8 bytes. To avoid this, one might repurpose flags.
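The size argument can be checked mechanically. The sketch below assumes a mainstream ABI where a 32-bit integer is 4-byte aligned; the struct names `order8` and `order9` and the helper functions are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, following the FLAG_1..FLAG_8 naming above. */
#define FLAG_1 0x01u
#define FLAG_8 0x80u

/* Layout from the comment above: 4 + 2 + 1 + 1 = 8 bytes, no padding. */
struct order8 {
    uint32_t qty;
    uint16_t id;
    uint8_t  type;
    uint8_t  flags;
};

/* Same struct with room for a ninth flag. The wider field needs 2-byte
   alignment, and the whole struct is rounded up to a multiple of 4. */
struct order9 {
    uint32_t qty;
    uint16_t id;
    uint8_t  type;
    uint16_t flags;
};

static_assert(sizeof(struct order8) == 8, "order8 should pack into 8 bytes");
static_assert(sizeof(struct order9) > 8, "widening flags grows the struct");

static inline void order_set_flag(struct order8 *o, uint8_t flag)   { o->flags |= flag; }
static inline void order_clear_flag(struct order8 *o, uint8_t flag) { o->flags &= (uint8_t)~flag; }
static inline int  order_has_flag(const struct order8 *o, uint8_t flag) { return (o->flags & flag) != 0; }
```

On common 64-bit targets the second struct typically comes out at 12 bytes, so a single extra flag costs 50% more memory per order.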

Another example of this is intrusive flagging, using, for example, the high or low bits of a pointer aligned to 2^n bytes. If you run out of bits there, there's not much you can do.
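A minimal sketch of the low-bits variant, assuming pointers aligned to 8 bytes so that the bottom three bits are always zero. `TAG_MASK` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pointers are aligned to 8 bytes, so the low 3 bits are always
   zero and can carry up to three flags. */
#define TAG_MASK ((uintptr_t)0x7)

/* Pack up to three flag bits into the unused low bits of an aligned pointer. */
static inline void *tag_ptr(void *p, unsigned tags) {
    assert(((uintptr_t)p & TAG_MASK) == 0);     /* must really be 8-byte aligned */
    assert((tags & ~(unsigned)TAG_MASK) == 0);  /* only 3 bits are available */
    return (void *)((uintptr_t)p | tags);
}

/* Strip the flag bits to recover the original pointer before dereferencing. */
static inline void *untag_ptr(void *tagged) {
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}

/* Read the flag bits back out. */
static inline unsigned ptr_tags(void *tagged) {
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}
```

With only three bits to spend, a fourth flag has nowhere to go, which is the pressure toward reuse described above.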

[0] https://github.com/torvalds/linux/blob/master/include/linux/...

pclmulqdq May 2, 2022

This is pretty much why flags get repurposed. It's also important to mention that things like JSON and protobufs are too expensive for HFT, so you are likely going to be sending structs over the wire. Repurposing flags lets you change a wire format with a lot less friction than adding a byte to a struct. Essentially, it lets you change the minor version number on a protocol and only recompile the endpoints without changing the major version number and recompiling everything.

aaronharnly May 2, 2022

Thank you both!

lordnacho May 3, 2022

This is what I thought as well, and I'm in the field. It's legit to try to fit everything in cache.

However, you can still do something about making this safe. For instance the program could do some sort of version check on startup and panic if things weren't correct.

There's a bunch of stuff that needs to be done before the program reaches its low latency steady state, where speed doesn't matter. Might as well add checks there to make sure things are correct.
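One way such a startup check might look, as a sketch only: each endpoint compiles in a version number for the meaning of its flag bits, and refuses to start if a peer reports a different one. `FLAG_SCHEMA_VERSION` and both function names are assumptions, not something from an existing system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: bump this whenever the meaning of any flag bit changes. */
#define FLAG_SCHEMA_VERSION 2u

/* Returns 1 if the peer agrees on what the flag bits mean, 0 otherwise. */
static int flag_schema_compatible(uint32_t peer_version) {
    return peer_version == FLAG_SCHEMA_VERSION;
}

/* Called once during startup, before the latency-sensitive loop begins. */
static void check_peer_or_die(const char *peer, uint32_t peer_version) {
    if (!flag_schema_compatible(peer_version)) {
        fprintf(stderr, "%s: flag schema %u, expected %u; refusing to start\n",
                peer, (unsigned)peer_version, FLAG_SCHEMA_VERSION);
        abort();
    }
}
```

The check costs nothing on the hot path because it runs once, and it would turn a forgotten server into a loud startup failure rather than a silent misinterpretation.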

commandlinefan May 2, 2022

> poor test coverage

Yet you don't have to hang around here long to be told that "Unit Testing is Overrated": https://tyrrrz.me/blog/unit-testing-is-overrated

bnastic May 2, 2022

I remember the Knight Cap event, I was working on order routing at the time.

Things have changed a lot since 2012, and at the same time haven’t. Circuit breakers and position monitoring are priority no. 1 in any sane market making firm. What happened then I can’t imagine happening now (accumulating a huge position for, what was it, 30 minutes? With nobody killing the algos within a couple of minutes?). On the other hand, the perfect world of “code hygiene” and 100% test coverage will never exist; things slip, and they do so frequently. What’s better, externally, is the availability of good tools for development and change reviews (Bitbucket taking hold, for example), automated deployments, containers, testing frameworks and similar. This type of software, end to end, is incredibly complex and difficult to reason about when the unexpected happens (there was a TTL misconfig for multicast and we never got such and such update? Well, no one thought of that!), esp these days with the influx of ML algos for price generation.

22SAS May 2, 2022

Currently work at an HFT firm. Most firms invest heavily in good DevOps, Trading Systems and SRE teams, to ensure that everything from installing a trading server at the colocation facility, to CI/CD and making changes to the systems configs, is done well. There are also guards in place so that if the system starts making trades that look way too odd, it pulls the plug and goes down immediately.

Also, any code that does not need to be there is promptly removed.

Where I work, we have a few people from KCG, i.e., the firm formed when Knight Capital merged with GETCO after this incident. Sometimes the incident is brought up, although I don't think any of them worked for Knight Capital before it.

rebelos May 2, 2022

Some of this is unforgivable, but reflecting on it I also realized that software engineering at quant firms has an almost impossible mandate. You want something akin to the extreme rigor of mission critical software (airplanes, cars, NASA, etc), while also remaining nimble enough to modify strategies as market conditions rapidly evolve.

SilasX May 2, 2022

Same is true for blockchain smart contracts, which have similar catastrophic consequences.

ChrisClark May 2, 2022

That truly is scary to me. I can easily* write advanced Solidity and could try to make something big. But I won't, because I know I would not be able to handle the stress and responsibility. One tiny logic error and millions lost. Thanks but no thanks.

*The fact I believe I could easily do it is probably exactly why I'd end up making some huge mistake. ;)

wombatpm May 3, 2022

I’m sure that any big contracts are written only by CMM Level 5 organizations using formal methods and provably correct with certification that less than 1 line in a million contains an error. It’s all spelled out in the SOW attached to the RFP.

astrange May 3, 2022

I don't think I could, since the language was designed by amateurs after no research into safe programming, and is only being improved one gigantic loss at a time.

Maybe if you could translate from Coq.

bfm OP May 2, 2022

It is challenging, although, with financial markets, it seems like it would be simpler to have some automatic anomaly detection mechanism to unplug or slow things down to prevent further damage.

WJW May 2, 2022

There are a lot of preventative measures they could have taken, starting with just not leaving in dead code and paying attention to automated alerting. But the moral of the story is that they got away with it for so long that nobody cared about it anymore. After all, if it were truly a big deal why hadn't it broken years earlier. Then when the technical debt finally got called it bankrupted the entire firm in one go.

Most of us (hopefully) have less devastating technical debt to deal with, but it is still a cautionary tale about what could happen if you ignore it for too long.

posterboy May 2, 2022

That's a weird statement.

The extreme rigor on the one hand seems to require a value judgement of the real benefits to HFT that I'm not willing to make. The remaining nimble'ity, on the other hand, is an odd word to use over agility or old fashioned responsibility. The benefit is proportional to it, but not exclusively.

The rapidly evolving market conditions concern regular trade too. Swift reactions are expected in any other systems application. "almost impossible" is a weasel word. It's almost impossible to win except for the last man standing, is that it? And there's no practical upper limit to nimble'y, though conservative estimates indicate that less work is more.

What's missing is the perverse incentives, corrupt policies, sociopathic leadership, ...

nradov May 2, 2022

Why unforgivable? It's only numbers in an account. No one died.

idohft May 2, 2022

Hard to speak for HFT in general. Like in software, different firms have different levels of hygiene. About half of your bullet points were true of my previous employer, at my time of leaving.

bob1029 May 2, 2022

Repurposing feature flags is some kind of next dimension horror for me. We've got quite a few of these to deal with, and if someone started changing what they mean we'd be fucked super fast. Simply suggesting that we alter the meaning of an existing FF would result in the resignation of a non-zero number of project managers on my team.

Rolling back code is another thing I have no tolerance for anymore. The only option we entertain these days is a roll-forward. If your software takes so long to iterate/build that you need to go back to an old version in an emergency, you need to review your languages/tools/frameworks/processes. We maintain a contractual obligation to our customers for same-day code updates (in cases of production/regulatory emergencies) because we have enough confidence in our processes.

sokoloff May 2, 2022

You’re likely using human-readable names for the flags and shipping a multi-MB payload of JS and JSON. An HFT firm is likely bit-packing flags so they can send an 8 byte payload rather than 10 and might be using an FPGA hanging right off the PHY to figure out “is this message even interesting to me?”

Your feature flag might be “SHOW_STRIKETHROUGH_PRICING_ON_CROSSSELL_OFFERS”; theirs is a bit mask macro to pick off the 5th bit from the 7th byte. (Why do they care? Because if they allow themselves to get fat and slow, a competitor will take the money.)
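For a sense of what "a bit mask macro" means in practice, here is a small sketch. The message length, byte offset, bit position, and macro names are all invented; the only thing taken from the comment is the idea of picking the 5th bit out of the 7th byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the flag lives in bit 4 (the 5th bit, counting from the
   least significant) of byte 6 (the 7th byte) of an 8-byte message. */
#define MSG_LEN            8
#define INTERESTING_BYTE   6
#define INTERESTING_MASK   (1u << 4)

static_assert(INTERESTING_BYTE < MSG_LEN, "flag byte must lie inside the message");

/* Evaluates to 1 if the flag bit is set in the raw message buffer, else 0. */
#define MSG_IS_INTERESTING(buf) \
    ((((const uint8_t *)(buf))[INTERESTING_BYTE] & INTERESTING_MASK) != 0)
```

Nothing in the code says what the bit means, so the meaning lives entirely in convention and documentation, which is part of why quietly changing it is so dangerous.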

Roll-forward only, same day SLA is probably right for your business, but isn’t for a company that could have their systems dusting off $1M every handful of seconds that bad code is running.

Different business problems call for different technical approaches. You should no more adopt theirs than they should yours.

hn_go_brrrrr May 2, 2022

Rolling back isn't about avoiding rebuilds, it's about restoring to a known-good state. Making an emergency patch is typically far riskier than going back to last week's build. We always favor rollbacks, unless there were critical fixes in the current release we absolutely cannot afford to lose.

jll29 May 3, 2022

The only thing worse that comes to mind than repurposing flags is re-using UUIDs, which I have seen some production DBs do (no, not mine, thank you very much!).

Reusing the RAM occupied by flags might be do-able in a clean way using guards like the structured enum in Rust, which permits unions (objects that occupy the same space) that always have the right type (i.e., the compiler has knowledge what is in there at each point in time). This mechanism could in theory be extended beyond type-safety to accommodate other contexts in systems programming use cases where memory usage is extremely important.
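For comparison, the closest plain-C analogue is a hand-rolled tagged union. Unlike the Rust enum mentioned above, C cannot enforce at compile time that the right member is read, so this sketch checks the tag at run time through accessor functions. The type and the two "uses" of the shared byte are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum slot_kind { SLOT_STRATEGY_ID, SLOT_TEST_SEED };  /* hypothetical uses */

struct slot {
    uint8_t kind;          /* discriminant: says which member is live */
    union {
        uint8_t strategy_id;
        uint8_t test_seed;
    } u;                   /* both members share the same byte */
};

static inline struct slot slot_strategy(uint8_t id) {
    struct slot s = { .kind = SLOT_STRATEGY_ID, .u = { .strategy_id = id } };
    return s;
}

static inline struct slot slot_test_seed(uint8_t seed) {
    struct slot s = { .kind = SLOT_TEST_SEED, .u = { .test_seed = seed } };
    return s;
}

/* Returns 1 and writes *out only if the slot really holds a strategy id. */
static inline int slot_get_strategy(const struct slot *s, uint8_t *out) {
    if (s->kind != SLOT_STRATEGY_ID) return 0;
    *out = s->u.strategy_id;
    return 1;
}
```

The trade-off is the extra tag byte, which is precisely the space that bit-packed flags were trying to save in the first place.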

akhmatova May 3, 2022

> Rushing out a release without proper software engineering hygiene.

Sounds familiar. From everything listed above, it sounds like this must have been yet another one of those "Just F-ing push the change to the server now, dweeb, the traders are going crazy" environments. That is to say: the real problems were most likely cultural, and not about the sum of a certain set of bad practices.

benjaminwootton May 2, 2022

I worked in a lot of front office groups in investment banking. The short spell I did in HFT had great software development and DevOps practices.

aledalgrande May 2, 2022

This is all basic stuff I look to set up in every team, and it's crazy given how these firms work directly with tons of money that they don't have an even higher standard. Guess I wasn't wrong turning down these roles.
