tl;dr: unlike most encoder-only models, the base ModernBERT was trained with code in mind (so presumably also on JSON/YAML objects), and it ships with a custom tokenizer to support that. That's why I mention that indentation matters: different indentation levels map to different single tokens.
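A quick way to sanity-check the indentation claim is to tokenize runs of leading spaces directly. This is a sketch, not a confirmed result: it assumes the `transformers` library is installed, network access to the Hub, and that `answerdotai/ModernBERT-base` is the checkpoint in question.

```python
# Hedged sketch: inspect how ModernBERT's tokenizer handles indentation runs.
# Assumes `pip install transformers` and access to the Hugging Face Hub.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

for n_spaces in (2, 4, 6, 8):
    indent = " " * n_spaces
    ids = tok.encode(indent, add_special_tokens=False)
    # If indentation levels really get dedicated tokens, len(ids) should
    # stay small (often 1) even as n_spaces grows.
    print(f"{n_spaces} spaces -> {len(ids)} token(s): {tok.convert_ids_to_tokens(ids)}")
```

If the token count stays flat while the space count grows, that supports the "one token per indentation level" reading; if it grows linearly, the claim doesn't hold for this tokenizer.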
This is mostly theoretical and would require a deeper dive to confirm.