markstock
Joined · 25 karma
HPC engineer, artist, designer

  1. Sure, I usually measure performance of methods like these in terms of FLOP/s; getting 50-65% of theoretical peak FLOP/s for any given CPU or GPU hardware is close to ideal.
  2. Quadtrees and octrees are themselves quite deep research areas. If the acceleration data structures interest you, I highly recommend Hanan Samet's book "Foundations of Multidimensional and Metric Data Structures". It's from 2006, but is basically the bible for the field.
  3. Note that even without an acceleration structure ("direct summation" in N-body research terminology), a CUDA program or GLSL shader program can exceed 60 fps with 10,000 to 20,000 particles, and a parallel, vectorized C/C++/Fortran CPU code can do the same with over 5,000 (see the direct-summation sketch after this list).
  4. The general algorithm used here (of computing attraction and repulsion forces between pairs of particles) is very similar to that used in simulations of many interesting phenomena in physics. Start with Smoothed Particle Hydrodynamics (https://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamic...) and then check out Lagrangian Vortex Particle Methods and other N-Body problems (https://en.wikipedia.org/wiki/N-body_problem).

    And the algorithms to solve these quickly are another deep area of research.

  5. Thank you - I was just about to point out some of that.

    The reason the flocks are so tight is that the separation "force" is normally computed as a repulsion between the target boid and each nearby boid individually, not against the center of mass of all nearby boids (both variants are sketched after this list).

  6. Just a few volumes from my bookshelf related to this:

    Network Analysis in Geography, Haggett and Chorley

    Cities and Complexity, Batty

    Urban Grids, Busquets et al

  7. Let's be a little clearer: these are not "laws" so much as scaling relationships, this is not "new math" (see Zipf and others), and central planning has always had an impact on city development. Nevertheless, I appreciate this line of inquiry.
  8. Something doesn't add up here. The listed peak fp64 performance assumes one fp64 operation per clock per thread, yet there's very little description of how each PE performs 8 flops per cycle; the only hint is "threads are paired up such that one can take over processing when another one stalls...", which is classic latency hiding. So the performance figures must assume that each PE has an 8-wide SIMD unit (16-wide for fp32), 8 separately schedulable execution units, or 4 FMA execution units, none of which seems likely given the supposed simplicity of the core. Am I missing something? (The arithmetic is spelled out after this list.)
  9. Exactly this. Whenever I talk about how I got started in computer art over 40 years ago, I always mention the fact that a screen back then was a one-way device: TV network to you. Basic home computers HAD to plug into the TV, and to a kid, this was magic and freedom.
  10. Yes, this appears to use Stam's Stable Fluids algorithm. Look for the phrases "semi-Lagrangian advection" and "pressure correction" to find the important functions. The 3D version seems to use trilinear interpolation, which is pretty diffusive. (A sketch of the advection step appears after this list.)
  11. Um, no?

    This is a fine collection of links - much to learn! - but the connection between flow and gravitation is (in my understanding) limited to both being Green's function solutions of a Poisson problem (the equations are written out after this list). https://en.wikipedia.org/wiki/Green%27s_function

    There are n-body methods for both (gravitation and Lagrangian vortex particle methods), and I find the similarities and differences of those algorithms quite interesting.

    But the Fedi paper misses that key connection: they're simply describing a source/sink in potential flow, not some newly discovered link.

  12. It is a fudge if you really are trying to simulate true point masses. Mathematically, it's solving for the force between fuzzy blobs of mass.
  13. Supercomputers will simulate trillions of masses. The HACC code, commonly used to verify the performance of these machines, uses a uniform grid (interpolation and a 3D FFT) and local corrections to compute the motion of ~8 trillion bodies.
  14. Yes, the author uses a globally-adaptive time stepper, which is only efficient for very small N. There are adaptive time step methods that are local, and those are used for large systems.

    If you see bodies flung out after close passes, three solutions are available: reduce the time step, use a higher-order time integrator, or (the most common method) add regularization. Regularization (often called "softening") removes the singularity by adding a constant to the squared distance, so one over zero becomes one over a smallish, finite number (see the softened-force expression after this list).

  15. I can't recommend cards, but you are absolutely correct about porting CUDA to HIP: there was (is?) a hipify program in ROCm that does most of the work.
  16. The US Treasury has one, though. Not sure if that satisfies the above criteria.
  17. Here's one that starts with the concept of a straight line and builds all the way to string theory. It's a monumental book, and it still challenges me. Roger Penrose's The Road To Reality.
  18. If you love this aesthetic and the concepts beneath it, I highly recommend Paolo Soleri's Arcology: The City in the Image of Man.
  19. I wasn't familiar with the "Wave32" term, but took "RDNA" to mean the smaller wavefront size. I've used both, and wave32 is still quite effective for CFD.
  20. Maybe never by the big players, but RDNA and even fp32 are perfectly fine for a number of CFD algorithms and uses; Stable Fluids-like algorithms and Lagrangian Vortex Particle Methods to name two.
  21. This has not been my experience on the academic/research side. Poisson-solver-based incompressible CFD regularly runs ~10x faster on equivalently priced GPU systems, and has been doing so for as long as I've been following it (since 2008). Some FFT-based solvers don't weak-scale ideally, but that would be even worse for CPU-based versions, which use similar algorithms and would be spread over many more nodes.
  22. I'm surprised no one has mentioned Vc. I found ispc clunky and not as performant, and std::simd didn't support some useful math ops like rsqrt. Vc has been around for years, I have no trouble including it in my codes, it has masking and many of the most useful math ops, and I can get over 1 TF/s on a consumer-grade Ryzen and at least 3 TF/s on the big Epyc CPUs. (A small sketch appears after this list.)

    https://github.com/VcDevel/Vc
    https://github.com/Applied-Scientific-Research/nvortexVc

  23. The Earth is a multi-physics complex system, and OP claiming to "Simulate the Earth" is misleading. Methods that work on the atmosphere may not work on other parts. There are numerous scientific projects working on simulating earthquakes, using both ML and more "traditional" physics.
  24. Each node has 4 GPUs, and each of those has a dedicated network interface card capable of 200 Gbps each way. Data can move right from one GPU's memory to another. But it's not just bandwidth that allows the machine to run so well, it's a very low-latency network as well. Many science codes require very frequent synchronizations, and low latency permits them to scale out to tens of thousands of endpoints.
  25. FYI: LUMI uses a nearly identical architecture to Frontier (AMD CPUs and GPUs), and was also made by HPE.
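
A minimal sketch of the direct summation mentioned in item 3, in plain C++. The struct, the function name, and the softening constant eps2 are my own for illustration, not from any particular code; a CUDA kernel or GLSL shader would simply parallelize the outer loop over i.

    #include <cmath>
    #include <vector>

    struct Particle { float x, y, z, m; float ax, ay, az; };

    // O(N^2) direct summation: every particle feels every other particle.
    void compute_accelerations(std::vector<Particle>& p, float eps2) {
        for (auto& pi : p) { pi.ax = pi.ay = pi.az = 0.0f; }
        for (size_t i = 0; i < p.size(); ++i) {
            for (size_t j = 0; j < p.size(); ++j) {
                if (i == j) continue;
                const float dx = p[j].x - p[i].x;
                const float dy = p[j].y - p[i].y;
                const float dz = p[j].z - p[i].z;
                const float r2 = dx*dx + dy*dy + dz*dz + eps2;   // softened distance^2
                const float inv_r = 1.0f / std::sqrt(r2);
                const float s = p[j].m * inv_r * inv_r * inv_r;  // m / r^3
                p[i].ax += s * dx;
                p[i].ay += s * dy;
                p[i].az += s * dz;
            }
        }
    }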
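
For item 5, the two separation schemes sketched in C++ (hypothetical names). In the per-neighbor version each repulsion is weighted by inverse squared distance, so very close neighbors dominate; in the center-of-mass version close neighbors are averaged together with distant ones, so the short-range push is much weaker.

    #include <vector>

    struct Vec2 { float x, y; };

    // Per-neighbor separation: sum of individual repulsions, each scaled by 1/d^2.
    Vec2 separation_per_neighbor(const Vec2& self, const std::vector<Vec2>& nbrs) {
        Vec2 steer{0.f, 0.f};
        for (const auto& n : nbrs) {
            const float dx = self.x - n.x, dy = self.y - n.y;
            const float d2 = dx*dx + dy*dy + 1e-6f;
            steer.x += dx / d2;
            steer.y += dy / d2;
        }
        return steer;
    }

    // Center-of-mass separation: a single repulsion from the averaged neighbor position.
    Vec2 separation_center_of_mass(const Vec2& self, const std::vector<Vec2>& nbrs) {
        if (nbrs.empty()) return Vec2{0.f, 0.f};
        Vec2 com{0.f, 0.f};
        for (const auto& n : nbrs) { com.x += n.x; com.y += n.y; }
        com.x /= nbrs.size();
        com.y /= nbrs.size();
        return Vec2{self.x - com.x, self.y - com.y};
    }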
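
For item 8, the back-of-the-envelope arithmetic behind the question, written generically (no numbers here come from the article):

    \text{peak fp64 FLOP/s} \;=\; N_{\text{PE}} \times f_{\text{clock}} \times
        \underbrace{w \times \big(2 \text{ if FMA, else } 1\big)}_{\text{flops per PE per cycle}}

    % To reach 8 fp64 flops per PE per cycle, the bracketed factor must equal 8:
    % an 8-wide SIMD unit without FMA, 8 scalar execution units, or 4 FMA units (4 x 2 = 8).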
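
A bare-bones 2D semi-Lagrangian advection step for item 10. This is a generic Stable Fluids-style sketch, not the linked project's code; velocities are assumed to be stored in grid cells per unit time. The 3D analogue replaces the bilinear interpolation with the trilinear interpolation mentioned above.

    #include <algorithm>
    #include <vector>

    // Trace each cell center backward through the velocity field (u, v) and
    // bilinearly interpolate the old quantity q_old at that point.
    void advect(int nx, int ny, float dt,
                const std::vector<float>& u, const std::vector<float>& v,
                const std::vector<float>& q_old, std::vector<float>& q_new) {
        auto idx = [nx](int i, int j) { return j * nx + i; };
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                float x = i - dt * u[idx(i, j)];
                float y = j - dt * v[idx(i, j)];
                x = std::clamp(x, 0.0f, nx - 1.001f);   // stay inside the grid
                y = std::clamp(y, 0.0f, ny - 1.001f);
                const int i0 = (int)x, j0 = (int)y;
                const float fx = x - i0, fy = y - j0;
                q_new[idx(i, j)] =
                    (1 - fx) * (1 - fy) * q_old[idx(i0,     j0    )] +
                         fx  * (1 - fy) * q_old[idx(i0 + 1, j0    )] +
                    (1 - fx) *      fy  * q_old[idx(i0,     j0 + 1)] +
                         fx  *      fy  * q_old[idx(i0 + 1, j0 + 1)];
            }
        }
    }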
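
For item 11, the Poisson connection written out. Both the gravitational potential and the streamfunction (vector potential) of an incompressible flow satisfy a Poisson equation, so both are convolutions with the same free-space Green's function of the Laplacian; that is the whole extent of the analogy.

    % gravity: potential from mass density
    \nabla^2 \phi = 4\pi G \rho,
    \qquad
    \phi(\mathbf{x}) = -G \int \frac{\rho(\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|}\,d^3x'

    % incompressible flow: streamfunction / vector potential from vorticity
    \nabla^2 \boldsymbol{\psi} = -\boldsymbol{\omega},
    \qquad
    \boldsymbol{\psi}(\mathbf{x}) = \frac{1}{4\pi} \int \frac{\boldsymbol{\omega}(\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|}\,d^3x',
    \qquad
    \mathbf{u} = \nabla \times \boldsymbol{\psi}

    % both use the 3D free-space Green's function of the Laplacian
    G(\mathbf{x},\mathbf{x}') = -\frac{1}{4\pi\,|\mathbf{x}-\mathbf{x}'|}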
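
For item 14, the softened-force expression. Adding a constant eps^2 to the squared distance (often called Plummer softening) is what keeps close passes finite:

    % unsoftened: singular as the separation goes to zero
    \mathbf{a}_i = \sum_{j \ne i} \frac{G m_j\,(\mathbf{x}_j - \mathbf{x}_i)}{|\mathbf{x}_j - \mathbf{x}_i|^{3}}

    % softened: the denominator is bounded below by eps^3
    \mathbf{a}_i = \sum_{j \ne i} \frac{G m_j\,(\mathbf{x}_j - \mathbf{x}_i)}{\left(|\mathbf{x}_j - \mathbf{x}_i|^{2} + \varepsilon^{2}\right)^{3/2}}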
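
For item 22, a small n-body inner loop using Vc, written from memory of the 1.x API (float_v, load with the Unaligned flag, Vc::rsqrt, sum); please check the Vc documentation before relying on exact signatures, and note this is not taken from nvortexVc. The source count n is assumed to be a multiple of the vector width just to keep the sketch short.

    #include <Vc/Vc>

    // Accumulate the 2D acceleration on one target from arrays of source
    // positions and masses, processing Vc::float_v::Size sources per iteration.
    void accel_on_target(float tx, float ty,
                         const float* sx, const float* sy, const float* sm,
                         int n, float eps2, float& ax_out, float& ay_out) {
        const Vc::float_v vtx(tx), vty(ty), veps2(eps2);
        Vc::float_v ax = Vc::float_v::Zero();
        Vc::float_v ay = Vc::float_v::Zero();
        for (int j = 0; j < n; j += (int)Vc::float_v::Size) {
            Vc::float_v x, y, m;
            x.load(sx + j, Vc::Unaligned);
            y.load(sy + j, Vc::Unaligned);
            m.load(sm + j, Vc::Unaligned);
            const Vc::float_v dx = x - vtx;
            const Vc::float_v dy = y - vty;
            const Vc::float_v r2 = dx*dx + dy*dy + veps2;
            const Vc::float_v ir = Vc::rsqrt(r2);     // the fast op std::simd lacked
            const Vc::float_v s  = m * ir * ir * ir;  // m / r^3
            ax += s * dx;
            ay += s * dy;
        }
        ax_out = ax.sum();   // horizontal reduction across lanes
        ay_out = ay.sum();
    }

Masking (for example, to skip a self-interaction) would use Vc's write-masked assignment, which is left out here for brevity.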
