
Here's an example from this morning. At 10:00 am, a colleague created a ticket with an idea for the music plugin I'm working on: wouldn't it be cool if we could use nod detection (head tracking) to trigger recording? That way, musicians who use our app wouldn't need a foot switch (as a musician, you often have your hands occupied).

Yes, that would be cool. An hour later, I shipped a release build with that feature fully functional, including permission handling and a calibration UI that shows whether your face is detected, lets you adjust sensitivity, and visually indicates when a nod is detected. Most of that work got done while I was in the shower. That is the second feature in this app that got built today.

This morning I also created and deployed a bug-fix release for analytics on one platform, and a brand-new report (fairly easy to put together because it followed the pattern of other reports) for a different platform.

I also worked out, argued with random people on HN, and walked to work. Not bad for five hours! Do I know how long it would have taken to, for example, integrate face detection and tracking into a C++ audio plugin without assistance from AI? Especially given that I have never done that before? No, I do not. I am bad at estimating. Would it have been longer than 30 minutes? I mean... probably?


Just having a 'count-in' type feature for recording would be much, much more useful. Head nodding is something I do all the time anyway as a musician :).

I don't know what your user makeup is like, but shipping a CV feature same-day sounds potentially disastrous... There are so many things I would think you would at least want to test, or even just consider, with the kind of user empathy we all should practice.

I appreciate this example. This does seem like a pretty difficult feature to build de novo. Did you already have some machine vision work integrated into your app? How are you handling machine vision? Is it just a call to an LLM API? Or are you doing it with a local model?
There was no machine vision stuff in the app at that point. Claude suggested a couple of different ways of handling this and I went with the easiest way: piggybacking on the Apple Vision Framework (which means that this feature, as currently implemented, will only work on Macs - I'm actually not sure if I will attempt a Windows release of this app, and if I do, it won't be for a while).
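
For the curious, the Vision-side glue is pretty small. Roughly (and this is a from-memory sketch, not the exact shipped code - the method name, the `setFaceDetected` UI hook, and the radians-to-degrees conversion are illustrative assumptions), it looks something like this:

```
#import <Vision/Vision.h>

- (void)processFrame:(CVPixelBufferRef)pixelBuffer
{
    // Ask Vision for face rectangles; revision 3 of this request also reports
    // roll/yaw/pitch angles (macOS 12+), which is what the nod detector needs.
    VNDetectFaceRectanglesRequest *request = [[VNDetectFaceRectanglesRequest alloc] init];
    request.revision = VNDetectFaceRectanglesRequestRevision3;

    VNImageRequestHandler *handler =
        [[VNImageRequestHandler alloc] initWithCVPixelBuffer:pixelBuffer options:@{}];

    NSError *error = nil;
    if (![handler performRequests:@[ request ] error:&error])
        return; // nothing usable this frame; just wait for the next one

    VNFaceObservation *face = (VNFaceObservation *)request.results.firstObject;
    if (face == nil || face.pitch == nil)
    {
        _cppOwner->setFaceDetected(false); // hypothetical hook feeding the calibration UI
        return;
    }
    _cppOwner->setFaceDetected(true);

    // Vision reports the angle in radians; the detector below works in degrees
    // (this conversion is my assumption about the glue code).
    float pitchDegrees = face.pitch.floatValue * (180.0f / M_PI);
    [self detectNodWithPitch:pitchDegrees];
}
```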

Despite this being "easier" than some of the alternatives, it is nonetheless an API I have zero experience with, and the implementation was built with code that I would have no idea how to write, although once written, I can get the gist. Here is the "detectNodWithPitch" function as an example (that's how a "nod" is detected: the pitch of the face is determined, and the change in pitch is what counts as a nod - which, of course, is not entirely straightforward).

```

- (void)detectNodWithPitch:(float)pitch
{
    // Get sensitivity-adjusted threshold
    // At sensitivity 0: threshold = kMaxThreshold degrees (requires strong nod)
    // At sensitivity 1: threshold = kMaxThreshold - kThresholdRange degrees (very sensitive)
    float sens = _cppOwner->getSensitivity();
    float threshold = NodDetectionConstants::kMaxThreshold - (sens * NodDetectionConstants::kThresholdRange);

    // Debounce check
    NSTimeInterval now = [NSDate timeIntervalSinceReferenceDate];
    if (now - _lastNodTime < _debounceSeconds)
        return;

    // Initialize baseline if needed
    if (!_hasBaseline)
    {
        _baselinePitch = pitch;
        _hasBaseline = YES;
        return;
    }

    // Calculate delta: positive when head tilts down from baseline
    // (pitch increases when head tilts down, so delta = pitch - baseline)
    float delta = pitch - _baselinePitch;

    // Update nod progress for UI meter
    // Normalize against a fixed max (20 degrees) so the bar shows absolute head movement
    // This allows the threshold line to move with sensitivity
    constexpr float kMaxDisplayDelta = 20.0f;
    float progress = (delta > 0.0f) ? std::min(delta / kMaxDisplayDelta, 1.0f) : 0.0f;
    _cppOwner->setNodProgress(progress);

    if (!_nodStarted)
    {
        _cppOwner->setNodInProgress(false);

        // Check if nod is starting (head tilting down past nod start threshold)
        if (delta > threshold * NodDetectionConstants::kNodStartFactor)
        {
            _nodStarted = YES;
            _maxPitchDelta = delta;
            _cppOwner->setNodInProgress(true);
            DBG("HeadNodDetector: Nod started, delta=" << delta);
        }
        else
        {
            // Adapt baseline slowly when not nodding
            _baselinePitch = _baselinePitch * (1.0f - _baselineAdaptRate) + pitch * _baselineAdaptRate;
        }
    }
    else
    {
        // Track maximum delta during nod
        _maxPitchDelta = std::max(_maxPitchDelta, delta);

        // Check if head has returned (delta decreased below return threshold)
        if (delta < threshold * _returnFactor)
        {
            // Nod complete - check if it was strong enough
            if (_maxPitchDelta > threshold)
            {
                DBG("HeadNodDetector: Nod detected! maxDelta=" << _maxPitchDelta << " threshold=" << threshold);
                _lastNodTime = now;
                _cppOwner->handleNodDetected();
            }
            else
            {
                DBG("HeadNodDetector: Nod too weak, maxDelta=" << _maxPitchDelta << " < threshold=" << threshold);
            }

            // Reset nod state
            _nodStarted = NO;
            _maxPitchDelta = 0.0f;
            _baselinePitch = pitch;  // Reset baseline to current position
            _cppOwner->setNodInProgress(false);
            _cppOwner->setNodProgress(0.0f);
        }
    }
}

@end

```

> An hour later, I shipped a release build

I would love to see that pull request, and how readable and maintainable the code is. And do you understand the code yourself, since you've never done this before?
