You typically write a bash script with some metadata lines at the top that say how many nodes you want, how many cores on those nodes, and what accelerator hardware (if any) you need.
Then typically it’s just setting up the environment to run your software. On most supercomputers you need to use environment modules (`module load gcc/10.4`) to load up compilers, parallelism libraries, other software, and so on. You can sometimes set this stuff up on the login node to try things out and make sure they work, but you’ll generally get an angry email if you run processes for more than 10 minutes, because login nodes are a shared resource.
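A minimal sketch of what such a batch script might look like; the job name, module names/versions, and binary are made up and will vary from site to site:

```bash
#!/bin/bash
#SBATCH --job-name=my_sim        # hypothetical job name
#SBATCH --nodes=2                # how many nodes
#SBATCH --ntasks-per-node=32     # cores (MPI ranks) per node
#SBATCH --gres=gpu:2             # accelerators, if the partition has them
#SBATCH --time=04:00:00          # wall-clock limit

# Load the toolchain the code was built against (names are site-specific)
module load gcc/10.4 openmpi/4.1

srun ./my_simulation input.dat
```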
There’s a tension here because it’s often difficult to get this right. People often want to do things like `pip install <package>`, but that can leave a lot of performance on the table, because pre-compiled software usually targets lowest-common-denominator systems rather than high-end ones. On the other hand, cluster admins can’t install and precompile every Python package ever. EasyBuild and Spack aim to be package managers that make this easier.
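For instance, Spack can build a Python package from source tuned for the cluster’s actual CPUs rather than pulling a generic wheel; the package and target names below are just illustrative:

```bash
# Build NumPy (and its dependency chain) from source, optimized for a given CPU target
spack install py-numpy target=skylake_avx512   # target value is illustrative

# Make it available in the current shell
spack load py-numpy
```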
Source: worked in HPC in physics and then worked at a University cluster supporting users doing exactly this sort of thing.
90% of my interactions are ssh'ing into a login node and running code with SLURM, then downloading the data.
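Roughly that workflow, with a hypothetical hostname, job script, and paths:

```bash
ssh user@cluster.example.edu     # log in to the (shared) login node
sbatch job.sh                    # submit the batch script to SLURM
squeue -u $USER                  # check on the queued/running job

# later, from your workstation, pull the results back
rsync -av user@cluster.example.edu:~/results/ ./results/
```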
You typically develop programs with MPI/OpenMP to exploit multiple nodes and CPUs. In Fortran, this entails a few compiler directives and compiler flags.
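A sketch of the build-and-run side, assuming gfortran-based MPI wrappers; the flags, filenames, and core counts are illustrative:

```bash
# Compile a hybrid MPI + OpenMP Fortran code; -fopenmp enables the OpenMP directives
mpif90 -O3 -fopenmp solver.f90 -o solver

# In the batch script: one MPI rank per node, OpenMP threads filling the cores
export OMP_NUM_THREADS=32
srun --ntasks-per-node=1 --cpus-per-task=32 ./solver
```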
This will have much of what you need.
It's like Kubernetes, but invented long before Kubernetes existed.