I briefly looked over your stock exchange code:

- For memory management, consider switching to std::shared_ptr. It won't slow anything down and will put that concern to rest entirely.

- For sockets, there are FOSS libraries that will outperform your code and save you a ton of headaches dealing with caveats and annoyances. For example, looping over FD_ISSET is slower than e.g. epoll or kqueue.

- For dependencies, C++ is definitely wilder than other languages. Dependencies are even harder to find than they are to manage. There's a lot of prospective library code, some of it hidden in little forgotten folds of the Internet. Finding it is basically a skill unto itself, one that can pay off handsomely.


When I did low latency everyone was offloading TCP to dedicated hardware.

They would shut down every single process on the server and bind the trading app to specific CPUs during trading hours to ensure nothing interrupted it.

Signals in cables travel slower than light, so they would rent server space at the exchange; that gave them direct access to the exchange network, and their orders didn't have to traverse miles of cable.

They would multicast their traffic, and there were separate systems to receive the multicast, log packets, and write orders to databases. There were redundant trading servers that would monitor the multicast traffic so that if they had to take over they would know all of the open positions and orders.

They did all of their testing against simulators - never against live data or even the exchange test systems. They had a petabyte of exchange data they could play back to verify their code worked and to see if tweaks to the algorithm yielded better or worse trading decisions over time.

A solid understanding of the underlying hardware was required; you would make sure network interfaces were arranged so they wouldn't cause contention on the PCI bus. You usually had separate interfaces for market data and orders.

All changes were done after exchange hours once trades had been submitted to the back office. The IT department was responsible for reimbursing traders for any losses caused by IT activity - there were shady traders who would look for IT problems and bank them up so they could blame a bad trade on them at some future time.

You don't need to shut down processes on the server. All you have to do is isolate CPU cores and move your workloads onto those cores. That's been a common practice in low latency networking for decades.
I'm not in HFT, but I wouldn't expect that to be enough.

Not only do you want to isolate cores, you want to isolate any shared cache between cores. You do not want your critical data evicted from the cache because a different core sharing the cache has decided it needs the space. Which of course starts with knowing exactly which CPU you are using, since different ones have different cache layouts.

You also don't want those other cores using up precious main memory or IO bandwidth at the moment you need it.

Just to add to your good points: since there's always a faster cache for your working set to not fit in, you can use memory streaming instructions to reduce cache pollution. Depending on the algorithm, increasing cache hit rates can give ridiculous speed-ups.
Correct. I was just pointing out to OP that shutting down processes is not worthwhile and that isolation is how you'd do it.
I’ve worked at a few firms and never heard of an IT budget for f-ups. Sounds like a toxic work environment.
Same. That sounds like a way to make that relationship between front office and back office as toxic and unproductive as possible.
Depends on how it's set up. You take a chunk of profits as well if things go well.
It's just business, no? Would you rather trade with a service that's liable for their mistakes or one that isn't?
Any good books/resources you can recommend to learn about the above architectures/techniques?
Some years ago I wrote a gist about HFT/HPC systems patterns (versus OP's C++ patterns) applied to dockerized Redis. Might be dated, but it touches on core isolation/pinning, NUMA/cgroups, and kernel bypass, with some links to go deeper. Nowadays I do it with Kubernetes and Nomad facilities, but same basic ideas:

https://gist.github.com/neomantra/3c9b89887d19be6fa5708bf401...

Nice; reminds me of the Red Hat Performance Tuning and Real-Time Low Latency Optimization guides.
A few episodes of Signals and Threads, a podcast from Jane Street, go into parts of it.
Thank You.
A great insightful comment, thank you!
I did not know std::shared_ptr would not slow things down. I've learned something new today! :)

Yes, I agree, epoll is a lot better than FD_ISSET.

Maybe I can keep moving with my C++ code, but do people still trust C++ projects anymore? My ideal user is a hobbyist who wants a toy stock exchange to run directly in AWS. I felt that C++ gets a lot of bad publicity, and that if I want anyone to trust/try my code I would have to rebuild it in Rust.

Here's how to maximize shared_ptr performance:

- In function signatures, use const references: foo(const std::shared_ptr<bar> &p). This will prevent unnecessary bumps of the refcount.

- If you have an inner loop copying a lot of pointers around, you can drop down to the raw pointer via .get(). This is 100% safe provided that the shared_ptr continues to exist in the meantime. I would consider this an optimization and an edge case, though.

I would say people trust C++ projects at least as much as any other professional language - more so if you prove that you know what you're doing.

> In function signatures, use const references: foo(const std::shared_ptr<bar> &p). This will prevent unnecessary bumps of the refcount.

This advice doesn't seem quite right to me, and in my codebases I strictly forbid passing shared_ptr by const reference. If you don't need to share ownership of bar, then you do the following:

    foo(const bar&);
If you do need to share ownership of bar, then you do the following:

    foo(std::shared_ptr<bar>);
Why do we pass by value when sharing ownership? Because it allows for move semantics: you give the caller the option to make a copy, which bumps up the reference count, or to avoid any copy whatsoever, which transfers ownership without bumping the reference count.

Having said that, shared_ptrs do have their uses but they are very very rare and almost all of our use cases do not expose shared_ptr's in the public API but rather use them as an implementation detail. We use them almost exclusively for things like immutable data structures, or copy-on-write semantics, or as a part of a lock-free data structure.

> If you don't need to share ownership of bar, then you do the following:
>
>     foo(const bar&);

Exactly!

> This advice doesn't seem quite right to me, and in my codebases I strictly forbid passing shared_ptr by const reference

There is at least one use case I can think of: the function may copy the shared_ptr, but you want to avoid touching the reference count for the (frequent) case where it doesn't. This is an edge case, though, and personally I almost never do it.

Additionally: if you care about nullability semantics within your function, then you write foo(const bar*) and pass in bar_ptr.get(), and of course check that the value is != nullptr before dereferencing it.

Otherwise, I'm inclined to agree -- don't pass around smart pointers unless you're actually expressing ownership semantics. Atomics aren't free, ref-counting isn't free, but sometimes that genuinely is the correct abstraction for what you want to do.

One more point: shared ownership should not be used as a replacement for carefully considering your ownership model.

(For readers who might not be as familiar with ownership in the context of memory management: ownership is the notion that an object's lifetime is constrained to a given context (e.g. a scope or a different object -- for instance, a web server would typically own its listening sockets and any of its modules), and using that to provide guarantees that an object will be live in subcontexts. Exclusive ownership (often, in the form of unique_ptr) tends to make those guarantees easier to reason about, as shared ownership requires that you consider every live owning context in order to reason about when an object is destroyed. Circular reference? Congrats, you've introduced a memory leak; better break the cycle with weak_ptr.)

> This is an edge case, though, and personally I almost never do it.

My experience is the opposite. It has to do with the coarseness of the objects involved and the amount of inter-object links. We typically have a vast variety of classes. Many of them have shared_ptr members, resulting in rich graphs.

Many methods capture the shared_ptr parameters by copying them inside other objects. However, many methods just want to call a couple methods on the passed-in object, without capturing it. By standardizing on const shared_ptr &, all calls are alike, and callees can change over time (e.g. from not capturing to capturing.)

foo(const bar&) is ideal if you precisely wish to bar ownership. If (and in many kinds of projects, invariably it's more like when) you later decide to share ownership, or if nullptr is an option, then it's no good.

foo(std::shared_ptr<bar>) is copy-constructed as part of your function call (bumping the refcount) unless copy elision is both available and allowed. It's only ideal if you almost always pass newly instantiated objects.

Pass by const reference is the sweet spot. If you absolutely must minimize the refcount bumps, overload on const reference and on rvalue reference.

As for shared_ptrs being very rare, uh, no. We use them by the truckload. To each their own!

> foo(const bar&) is ideal if you precisely wish to bar ownership.

What?

> invariably it's more like when) you later decide to share ownership,

shared_ptr shouldn't even be necessary for keeping track of single threaded scope based ownership.

> As for shared_ptrs being very rare, uh, no. We use them by the truckload. To each their own!

You might want to look into that; you shouldn't need to count references in single-threaded scope-based ownership. If you need something to last longer, give it a higher-scoped owner.

If something already works, it works, but this is not necessary, and it sidesteps understanding the actual scope of your variables.

> Why do we pass by value when sharing ownership? Because it allows for move semantics, so that you give the caller to option to make a copy, which bumps up the reference count, or to entirely avoid any copy whatsoever, which allows transfering ownership without bumping the reference count.

What if the callee sometimes wants to take a reference and sometimes doesn't? In the latter case, your proposed signature forces an unnecessary pair of atomic reference count operations. But if you use

    foo(bar const&)
instead, then foo can't acquire a reference even when it wants to.

You could stick `std::enable_shared_from_this` under `bar`. But `std::enable_shared_from_this` adds a machine word of memory, so you might not want to do that.

If you pass

    foo(shared_ptr<bar> const&)
you incur an extra pointer chase in the callee. Sure, you could write

    foo(bar const&, shared_ptr<bar> const&)
but then you burn an extra argument register. You can't win, can you?

You can win actually. Just use https://www.boost.org/doc/libs/1_85_0/libs/smart_ptr/doc/htm... or your favorite intrusive reference-counted smart pointer, not `std::shared_ptr`. If you do, you get the same capabilities that `std::enable_shared_from_this` grants but without any of the downsides.

> If you pass

>     foo(shared_ptr<bar> const&)
> you incur an extra pointer chase in the callee.

Actually this is usually not the case (assuming of course that caller is holding the original pointer in a shared_ptr<bar> which is the use case we were discussing.)

That shared_ptr<bar> instance is held either on the stack (with address FP + offset or SP + offset) or inside another object (typically 'this' + offset.) To call foo(const shared_ptr<bar> &), the compiler adds the base pointer and offset together, then passes the result of that addition - without dereferencing it.

So as it turns out, you may actually have one fewer pointer chase in the const shared_ptr<bar> & case. For example, if foo() is a virtual method and a specific implementation happens to ignore the parameter, neither the caller nor the callee ever dereference the pointer.

The one exception is if you've already resolved the underlying bar& for an unrelated reason in the caller.

I do agree that intrusive_ptr is nice (and we actually have one codebase that uses something very similar.) However shared_ptr has become the standard idiom, and boost can be a hard sell engineering-wise.

If you _maybe_ need to share ownership, the second is a little pessimistic - you always increase the ref count.
That is correct, and I can see that being a justification for passing a const&. In fact, the C++ Core Guidelines agree with you that such a scenario is the only acceptable reason for passing a shared_ptr by const&, although they encourage passing by value, or just passing a const T&.
Reference counting definitely slows down tight loops if you are not careful.

The way to avoid that in low latency code is to break the abstraction and operate with the raw pointer in the few areas where this could be a bottleneck.

It is usually not a bottleneck if your code decently exploits instruction-level parallelism; an extra addition or subtraction easily gets executed while some other operation is waiting a cycle for some CPU resource.

That's not true. It does slow things down because it has an atomic access. How slow depends on the platform.

unique_ptr does not slow things down.

C++ might have a bad reputation, but in many fields the only alternative, in terms of ecosystem, tooling and tribal knowledge is C.

Between those two, I'd rather pick the "TypeScript for C" one.

> I felt that C++ has a lot of bad publicity and if I want anyone to trust/try my code I would have to rebuild it in rust.

C++ gets bad publicity only from evangelists of the flavour-of-the-month self-described "successor to C++". They don't have a sales pitch beyond "C++ bad", and that's what they try to milk.

And yet the world runs on C++.

std::shared_ptr definitely slows things down. It's non-intrusive therefore requires a memory indirection.
