1. Extend Blender itself. This will net you the maximum performance, but you essentially need to maintain your own custom fork of Blender. Generally not recommended outside of large pipeline environments with dedicated support engineers.
2. Native Python addon. This is what 99% of addons are: they simply access scene data via Blender's Python interface. It has the drawbacks mentioned above, though there are helper utilities that batch-process data to claw back some performance (see the foreach_get sketch after this list).
3. Hybrid Python addon. You use the Python API as a glue layer to pass information between Blender and a natively compiled library, typically via Python's C Extension API. With the exception of extracting scene data, this gives you back the compute performance and host resource scalability you'd get from building on Blender directly. Being able to escape the GIL opens a lot of doors for parallel computation (see the second sketch after this list).
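As a concrete example of the batch-processing helpers in option 2, here is a minimal sketch using `foreach_get`, which copies an entire attribute into a flat buffer in one call instead of looping over elements in Python. It assumes a mesh object is active; the function name is illustrative.

```python
import bpy
import numpy as np

def get_vertex_positions(obj):
    """Copy all vertex coordinates of a mesh object into an (N, 3) numpy array."""
    mesh = obj.data
    count = len(mesh.vertices)
    # foreach_get fills a pre-allocated flat buffer: (x0, y0, z0, x1, y1, z1, ...)
    coords = np.empty(count * 3, dtype=np.float32)
    mesh.vertices.foreach_get("co", coords)
    return coords.reshape(count, 3)

positions = get_vertex_positions(bpy.context.active_object)
print(positions.shape)
```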
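For option 3, the same idea can be sketched with `ctypes` in place of a full CPython extension module: extract a flat buffer inside Blender, then hand it to a compiled library for the heavy lifting. The library name (`libdeform`) and its `smooth_positions()` signature are hypothetical.

```python
import ctypes
import bpy
import numpy as np

lib = ctypes.CDLL("./libdeform.so")  # hypothetical compiled library
lib.smooth_positions.argtypes = [
    ctypes.POINTER(ctypes.c_float),  # in/out: flat xyz buffer
    ctypes.c_size_t,                 # vertex count
]
lib.smooth_positions.restype = None

def smooth_active_mesh():
    mesh = bpy.context.active_object.data
    count = len(mesh.vertices)
    coords = np.empty(count * 3, dtype=np.float32)
    mesh.vertices.foreach_get("co", coords)
    # ctypes releases the GIL around the foreign call, so the native code
    # is free to use all host cores.
    lib.smooth_positions(
        coords.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), count
    )
    mesh.vertices.foreach_set("co", coords)
    mesh.update()
```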
For vendors, the first option is obviously a no-go. The second has the problem of being throttled by Python, so you effectively have to create a shim that communicates with an external library or application that actually performs the compute-intensive work.
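A minimal sketch of that shim pattern, assuming a hypothetical external executable (`vendor_solver`) and CLI: the addon only serializes scene data and hands it off, while the heavy computation happens in the external process, unconstrained by the GIL.

```python
import json
import subprocess
import tempfile
import bpy

def export_and_solve():
    # Gather a lightweight scene description on the Python side.
    scene_desc = {
        "objects": [
            {"name": obj.name, "matrix": [list(row) for row in obj.matrix_world]}
            for obj in bpy.context.scene.objects
        ]
    }
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(scene_desc, f)
        scene_path = f.name
    # The external process does the compute-intensive work.
    result = subprocess.run(
        ["vendor_solver", "--input", scene_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```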
Most (if not all) industry DCCs provide a dedicated C++ SDK with Python bindings available if desired.