Comment by remram - Hacker Neue

remram May 14, 2022 parent

cffi is probably the canonical way to do this on Python, I wonder what the performance is there.

edit: 30% improvement, still 100x slower than e.g. Rust.

kevin_thibedeau May 14, 2022

If you need a fast loop in Python then switch to Cython.

cycomanic May 14, 2022

Using a cython binding compared to the Ctypes one gives a speedup of a factor of 3. That's still not very fast, now putting the whole thing into a cython program. Like so:

    def extern from "newplus/plus.h":
        cpdef int plusone(int x)

    cdef extern from "newplus/plus.h":
        cpdef long long current_timestamp()


    def run(int count):
        cdef int start 
        cdef int out 
        cdef int x = 0
        start = current_timestamp()
        while x < count:
            x = plusone(x)
        out = current_timestamp() - start
        return out

Actually yields 597 compared to the pure c program yielding 838.

remram OP May 14, 2022

That's fine for a tight loop. Performance might still matter in a bigger application. This benchmark is measuring the overhead, which is relevant in all contexts; the fact that it does it with a loop is a statistical detail.

worik May 14, 2022

> If you need a fast loop in Python then switch to Cython.

If you need a fast loop do not use Python.

I am a Python hater, but this is unfair. Python is not designed to do fast loops. Crossing the FFI boundary happens very few times compared to iterations of tight loops.

(I have very little experience using FFI, but I am about to - hence keen interest)

kevin_thibedeau May 15, 2022

The point is Python's exception mechanism is a particularly heavyweight way to do loop control. This benchmark is heavily dominated by that overhead in a way other interpreted languages aren't.

This item has no comments currently.