
mkbosmans
Joined 92 karma

  1. I saw that you used `float z;` to later use `z` instead of the constant `0.`. You can also apply that to get a zero vector: `vec3 y;` and use `y` in place of `p-p`.

    It seems that leaving that obsession behind a bit more can save another byte.

  2. Another two bytes found (I think)

      (d==0.?K*.01*h:c-c)
    
    could become

      (d>0.?.0:.01)*K*h
  3. Especially in HPC there are lots of workloads that do not benefit from SMT. Such workloads are almost always bottlenecked on either memory bandwidth or vector execution ports. These are exactly the resources that are shared between the sibling threads.

    So now you have a choice: either disable SMT in the BIOS, or make sure the application correctly interprets the CPU topology and only spawns one thread per physical core. The former is often the easier option, from both a software development and a system administration perspective.

  4. Sort of niche indeed.

    In addition to needing SMT to get full performance, there were a lot of other small details you needed to get right on Xeon Phi to get close to the advertised performance. Think of AVX512 and the HBM.

    For practical applications, it never really delivered.

  5. I'm not sure what OS the X32 (or the Midas M32 sister model, for that matter) runs from the factory. The higher-end Midas Pro consoles do definitely run Linux though.
  6. It is the performance win for similar-looking results that I find improbable. For a box blur to look like a Gaussian blur, you would need multiple passes. Even though each pass is then O(1) instead of O(n) (with n the blur radius), due to caching effects I think a Gaussian kernel would still be faster, especially for the small blur radii described in the article.
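    A quick sketch of that claim (my own illustration, plain Python): a single box pass on an impulse gives a flat-topped kernel, while repeated passes converge toward a bell shape, per the central limit theorem.

```python
def box_blur(signal, radius):
    """One pass of a box filter: uniform average over a 2*radius+1 window."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out

impulse = [0.0] * 21
impulse[10] = 1.0

one_pass = box_blur(impulse, 3)                   # flat top: clearly not Gaussian
three_pass = box_blur(box_blur(one_pass, 3), 3)   # already bell-shaped
```

    Each extra pass costs another sweep over the image, which is where the claimed performance win evaporates.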
  7. That link is not a box filter, as it still uses weights to approximate a Gaussian convolution kernel. It just uses some special hardware to do fewer texture fetches. But that is a constant 2x improvement over the full 1D convolution, not the O(1) box filter approach that the article suggests browsers are using.
  8. Do browsers really use a box filter to approximate a Gaussian blur? That seems implausible to me, as the two produce pretty different-looking blurs.
  9. I noticed the blur only "sees" the underlying pixels directly below the glass surface. Any pixels outside that box, but within the blur radius, do not get used in the Gaussian filter. Probably the edge pixels are just repeated. You can see this when the light of the moon pops into view as the edge of the rectangle starts to touch the moon. It would look more real if the light pixels from the moon started to shine through even while the box itself is still just over the dark area.

    Would this be possible to achieve in CSS? I presume by having a larger box with the blur, but clipping it to a smaller box, or something like that.

  10. Why do you say a tritone substitution turns it into a II-bII-I? Can it just as easily be said to be a II-#I-I? In that case it would be C#.
  11. This looks to me like actually correct usage of the term exponential. Correct usage of that term is surprisingly rare, even in technical writing.

    Let's say each dimension added has a finite set of N possible values. Then for k dimensions there are a total of N^k possibilities.

    Combinatorial growth would actually be faster still, scaling like k!.
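    Those growth rates, checked in Python (my own sketch; N = 3 is an arbitrary example):

```python
from math import factorial

N = 3  # possible values per dimension (arbitrary example)

# Exponential growth in the number of dimensions k: N**k.
# Combinatorial growth: k! -- slower at first for a fixed base,
# but it always overtakes eventually.
assert N**5 > factorial(5)        # 243 > 120
assert factorial(10) > N**10      # 3628800 > 59049
```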

  12. Yes, I think it is valid usage.

    Why do you think usage of the term _curse of dimensionality_ is different in ML?

  13. Nothing requires an ESOP to skip a separate retirement fund for employees and expect them to retire on their share of the investment.

    On the contrary, I expect the owners of an ESOP to be very much in favor of having a well-managed separate employee retirement fund. More so than in a publicly owned company.

    But of course you are right that the risks of losing your income and losing your investment are pretty much 100% correlated in an ESOP. Some investment diversification is always a good idea.

  14. It is not an unconstrained optimization problem. The constraints in which the company operates might result in a feasible region where even the most profitable point is still a net loss. Those constraints may of course be self-imposed, e.g. by choosing a market to operate in.
  15. > get paid like you should for a job like that

    Does that happen very often though?

  16. I do agree about the virtues of a resume that shows that it has been put together with care and attention.

    But don't forget that while coding, the IDE, compiler, or whatever will catch your spelling mistakes in a very short feedback loop, with hardly any penalty to your output rate. That might mean that your dyslexic super programmer has never learned the value of carefully going over a text before submission.

  17. I think that is unique to GitLab though. Azure DevOps, for example, will rebase the branch for you when completing a pull request.
  18. The big difference between adding the first `== true` to `if (x)` and adding more is that for a bare `x` you need more context to know whether the expression inside the `if` is of boolean type or something that will implicitly be cast to boolean. With `x == true` you know, just looking at the `if` statement, that it is a boolean expression. Adding more `== true` does not make it any more explicit.
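    A Python analogue of that point (my own illustration; Python has the same implicit-truthiness behavior in `if x:`):

```python
# `if x:` uses truthiness; `x == True` only matches an actual boolean
# (or 1, since bool subclasses int).
def bare(x):
    return bool(x)         # what `if x:` evaluates

def explicit(x):
    return x == True       # noqa: E712 -- deliberate, for illustration

assert bare("non-empty") is True        # truthy, but...
assert explicit("non-empty") is False   # ...not equal to True
assert bare(1) is True and explicit(1) is True
```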
  19. That's doable with the Python interface to GnuCash. I hacked together a little script that pulls invoice data out of a register file and creates PDFs for new invoices using a LaTeX template. My wife uses it for her business, mostly without my involvement, so it seems robust enough.
  20. VORtech | Scientific Software Engineer | 24-40h | ONSITE Delft, NL

    We support both government institutions in modelling critical infrastructure (think rail transport, storm surge defense, power grid) and multinationals in bringing their research software to production quality standards. The ideal candidate combines elements of a mathematical consultant, software architect, HPC specialist and all-round coder.

    A bit of Fortran doesn't scare you? Come join us building computational software at VORtech. https://www.vortech.nl/en/about-us/vacancies/scientific-soft... (If you are not already proficient in Dutch, you'll welcome the opportunity to learn it quickly)

  21. > Unlike using either underflow exceptions or subnormals, the use of flush-to-zero and denormals-are-zero is acceptable only in the programs where computation errors do not matter at all, such as games.

    That is not true. FTZ and DAZ are perfectly reasonable in a lot of scientific computing scenarios, where computation errors in general are closely scrutinized.
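    Some background on why flushing can be acceptable (my own illustration): subnormals only cover the tiny gap between zero and the smallest normal double, so flushing them to zero introduces an absolute error below ~2.2e-308, usually far below any input uncertainty.

```python
import sys

smallest_normal = sys.float_info.min      # ~2.2250738585072014e-308
subnormal = smallest_normal / 2**10       # a subnormal double

assert 0.0 < subnormal < smallest_normal
# Flushing `subnormal` to zero changes the value by less than the
# smallest normal number.
assert abs(subnormal - 0.0) < smallest_normal
```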

  22. Reasoning from first principles:

    When a problem is linear, a small perturbation in your input will result in a similarly small difference in the results, bounded by some constant factor. When a problem is nonlinear, there is no such constant upper bound on the output error.

    There are differences in the amount of nonlinearity, however. It seems that your algorithm was nonlinear, e.g. using log and exp functions, but otherwise pretty well behaved. So while the factor between input and output error might not be constant, but rather dependent on the input values, it is still the case that in the limit of the input error going to zero, the output error will also vanish. (Obviously in the real domain, not considering floating point.)

    Contrast this with a problem that has discontinuities in it. In that case it can happen that however small you make your input error, any nonzero perturbation causes a significant change in the solution. The TeX layout problem is an example of this, but this also happens often enough in physical simulations.
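    The three regimes above, sketched with toy functions (my own numbers):

```python
import math

eps = 1e-10
x = 1.0

# Linear map: output error bounded by a constant factor (here 2).
assert abs(2 * (x + eps) - 2 * x) <= 2.01 * eps

# Smooth nonlinear map (exp): the factor depends on x, but the output
# error still vanishes as eps -> 0.
assert abs(math.exp(x + eps) - math.exp(x)) < 10 * eps

# Discontinuous map (step at 1.0): the output error does NOT vanish,
# no matter how small eps gets.
step = lambda v: 0.0 if v < 1.0 else 1.0
assert abs(step(x) - step(x - eps)) == 1.0
```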

  23. To be fair, most analytical models working with data already incorporate a lot of 'data science' techniques. That does not have to be ML or NN, but could be as trivial as a least squares regression to fit your model to a set of observations that overdetermine the system of equations.

    A more advanced example of a technique that was used before it was called data science is data assimilation (DA)[1]. Here you assume that you have observations (e.g. sensor data) that you want to use to inform the model, but they are noisy in some sense. With DA you take a set of observations at t=0 and fit a numerical model to that. Then you time-step the model to t=1 where you have new observations. The model and observations don't necessarily agree, but there is value in incorporating information from both. Based on e.g. your statistical description of the sensor noise, DA techniques give you the tools to combine data and models.

    A good example of DA is 4DCOOL[2], combining temperature sensors in a datacenter with a CFD model. Because the model is physics-based, after some time you get a good idea of the temperature distribution in the whole room, even if you only have pretty sparse sensor data. (disclosure: I work for the company)

    [1] https://en.wikipedia.org/wiki/Data_assimilation [2] https://www.vortech.nl/en/projects/a-digital-twin-for-the-in...
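    The core of a DA step fits in one scalar formula (my own toy sketch, not the 4DCOOL code): combine a model forecast and a noisy observation, weighted by their variances. This is the scalar form of the Kalman analysis update.

```python
def analysis(forecast, var_f, obs, var_o):
    """Variance-weighted combination of a model forecast and an observation."""
    gain = var_f / (var_f + var_o)        # Kalman gain
    mean = forecast + gain * (obs - forecast)
    var = (1 - gain) * var_f              # analysis variance shrinks
    return mean, var

# Model says 20 degC (variance 4), sensor says 22 degC (variance 1):
mean, var = analysis(20.0, 4.0, 22.0, 1.0)
# the analysis lands between the two, closer to the more certain source
```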

  24. Yes, I think ML could be useful at places where the current physics modelling falls short. The nowcasting rain example from DeepMind is in some respects pretty comparable with the Next Ocean wave prediction.

    In the wave prediction case, wave propagation and dispersion are pretty well understood. But one could add ML-based nonlinear terms to the equations to capture everything we don't know. That has the potential to give better predictions.

    In contrast with the rogue wave example, there is a lot of relevant input data (the radar backscatter) and the model output can be verified after the fact (the ship's 6-dof acceleration). What I was objecting to was the idea of: slap an ML model on the whole problem and call it a day.

  25. My guess would be: not that much. But I work in the field of numerical mathematics and computational physics, so I could have some bias. :-D

    The more nuanced answer would be that taking the raw radar data as input to e.g. a neural network and training it to output the predicted time series of future ship motion is not feasible. It would take a giant network and too much compute to train, for very unreliable results.

    This problem consists of a lot of subproblems, most of which are pretty well understood. For example, how to translate the 6-dof motion of a ship into the vertical displacement of a heli platform on that ship is just some simple coordinate transforms. You don't gain anything by including that in the neural net. Potentially some data science techniques could be useful to handle some of the less understood submodels. Sort of like it is done in CFD, with a NN as a turbulence model within an existing PDE solver.
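    That platform-displacement subproblem reduces, in a small-angle sketch (my own; the sign conventions here are illustrative, not any standard), to:

```python
import math

def platform_heave(heave, roll, pitch, r_x, r_y):
    """Vertical displacement of a deck point at (r_x fore, r_y port),
    given the ship's heave and small roll/pitch angles in radians.
    Sign conventions are arbitrary for this illustration."""
    return heave + r_x * math.sin(pitch) - r_y * math.sin(roll)

# A point 10 m forward of the rotation center rises when the bow pitches up:
dz = platform_heave(heave=0.5, roll=0.0, pitch=0.01, r_x=10.0, r_y=0.0)
```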

  26. Sure, but tide and wave models generally cover a much larger area, such as whole coastal areas, and span larger time scales. An example of such a model would be: https://www.youtube.com/watch?v=eN6CDaoMZ7U In the Netherlands this is used, e.g., to determine when to close the storm surge barriers to protect the river delta area from flooding due to a tide+storm combination.

    In contrast, the Next Ocean product is meant to model the direct area around a single ship, predicting minutes in advance with second-by-second granularity. They are able to reuse the raw data coming from the navigation radar that is already installed on every sea-going ship. It uses the backscatter from water ripples to determine the wave field around the ship. Interestingly, for navigational purposes exactly this part of the data is considered noise and is filtered out by the on-board navigation system. (https://nextocean.nl/technology.php)

    [I did some work on the software implementation for Next Ocean a couple of years back]

    Anyway, neither type of model has anything to do with rogue waves.

  27. This is a commercially available product for wave prediction on ships: https://nextocean.nl/wavepredictor.php

    Not specifically meant for rogue waves though, but rather to predict calm streaks in the waves affecting the ship's motion. This is useful, e.g., when landing a helicopter or transferring cargo or people to another ship while at sea.

    I can confirm that there is a lot of interesting math involved and, as noted in the article, a challenging amount of computation to do in a real-time prediction setting.

  28. Your life will be a lot easier if you decide not to chase bit-level reproducibility when doing floating point calculations. You can try setting CFLAGS to get the same result on all hardware, but that is a lot of effort, especially if you don't want to leave any performance on the table.

    And for what gain? Just to make it simpler for your tests to compare against a fixed reference? In practice you don't gain anything else by having reproducible calculations, because your inputs will have a lot more uncertainty in them than anything beyond the 16th digit.

    Better to embrace the difference in results between hardware architecture, parallel execution, compiler version, optimization level, etc. This means spending time making your test suite more robust to small variations, but it saves you time investigating failed test runs that are insignificant. And as a bonus, it forces you to understand the numerical properties of your code better.
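    In Python terms, the robust comparison looks like this (my own sketch): compare against the reference with a relative tolerance, so a change in summation order doesn't fail the test.

```python
import math

def assert_close(result, reference, rel_tol=1e-12):
    assert math.isclose(result, reference, rel_tol=rel_tol), \
        f"{result} vs {reference}"

xs = [0.1, 0.2, 0.3]
forward = sum(xs)             # 0.1 + 0.2 first
backward = sum(reversed(xs))  # 0.3 + 0.2 first

assert forward != backward          # bit-level reproducibility fails...
assert_close(forward, backward)     # ...but the results agree to ~1e-16
```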

  29. Sure, but that doesn't change the fact that you cannot simply plug that into a simple formula to get, e.g., the optimal block size for cache blocking.

    To be more precise: you can work out how cache size should affect the optimal block size in theory, and in practice this gets you pretty decent performance. But to get the last bit of performance out of the hardware, you need to benchmark several configurations on the hardware and select the best one based on run-time parameters like the u-arch and cache size.
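    The empirical selection can be as simple as this (my own sketch, with a stand-in kernel rather than a real blocked loop nest):

```python
import time

def run_blocked(n, block):
    """Stand-in kernel: touch an n x n grid in block x block tiles."""
    data = [0] * (n * n)
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            for i in range(bi, min(bi + block, n)):
                for j in range(bj, min(bj + block, n)):
                    data[i * n + j] += 1
    return data

def pick_block_size(n, candidates=(8, 16, 32, 64)):
    """Benchmark each candidate on the actual hardware and keep the fastest."""
    timings = {}
    for b in candidates:
        t0 = time.perf_counter()
        run_blocked(n, b)
        timings[b] = time.perf_counter() - t0
    return min(timings, key=timings.get)
```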

  30. It is an overly strict interpretation of that rule to assume it prohibits any randomized algorithm.

    Aren't these the same steps for any given input?

      1) Roll dice to get random number N
      2) Get the Nth permutation of the input sequence (for some ordering of permutations)
      3) Permute input sequence
      4) If sorted: done, else goto step 1
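    The four steps collapse naturally into a shuffle-and-check loop; in Python (my own sketch of the same randomized algorithm):

```python
import random

def random_permutation_sort(seq):
    """Steps 1-3: pick a uniformly random permutation (via shuffle);
    step 4: if sorted, done, else repeat."""
    seq = list(seq)
    while seq != sorted(seq):
        random.shuffle(seq)
    return seq
```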
