High Performance Computing: Execution Time vs. Memory in Distributed Architectures

Why Does Your Code Scale (or Not)? 🚀 In this video, we dive deep into the heart of High-Performance Computing (HPC) optimization. Moving to a distributed-memory architecture, such as a cluster or a supercomputer, fundamentally changes how you optimize compared with programming for a single machine. We analyze the critical trade-off between execution time and memory consumption. Whether you are parallelizing a fluid simulation or training a Deep Learning model on a cluster, understanding this relationship is the key to avoiding wasted compute hours.

HIGH PERFORMANCE COMPUTING

Eric Kabe, Computational Geostatistician

4/24/2026 · 2 min read

Optimizing Variogram Modeling with Dynamic MPI Allocation in QtCreator

You didn't come this far to stop... Keep pushing the limits of your hardware; I've got you!


This article shows how to use lscpu to dynamically configure an MPI-based variogram modeling module in QtCreator. The key idea is to use lscpu to distinguish physical cores from hardware threads, so that the variogram calculation, which is computationally intensive, does not suffer from oversubscription.

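Variogram modeling is the backbone of geostatistical estimation. Computing an experimental variogram requires a squared-difference calculation for every pair of samples that falls within a lag class, and the number of candidate pairs grows quadratically with the number of samples, quickly reaching millions. When scaling this to large datasets, MPI (Message Passing Interface) becomes essential. However, hardcoding the number of processes (e.g., mpirun -n 4) leads to performance bottlenecks when it doesn't match the machine's architecture: too few processes leave cores idle, and too many oversubscribe them.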
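For reference, the classical (Matheron) experimental semivariogram averages squared differences over the set of pairs whose separation falls in the lag class around h. The notation here is introduced for illustration: z(x_i) is the value at location x_i, \mathcal{N}(h) is the set of pairs in lag class h, N(h) is its size, and n is the number of samples.

```latex
\hat{\gamma}(h) = \frac{1}{2\,N(h)} \sum_{(i,j) \in \mathcal{N}(h)} \bigl[ z(x_i) - z(x_j) \bigr]^2,
\qquad N(h) = \lvert \mathcal{N}(h) \rvert,
\qquad \text{candidate pairs} = \frac{n(n-1)}{2}
% Example: n = 10{,}000 samples gives 10{,}000 \times 9{,}999 / 2 = 49{,}995{,}000 candidate pairs.
```

The quadratic pair count, rather than the final model fit, is what makes the pair loop the natural part of the computation to distribute across MPI ranks.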

Step 1: Grabbing Machine Specs with lscpu

The lscpu command provides a clean summary of your hardware. For high-performance kriging or simulation, you typically want to pin processes to physical cores rather than logical threads to avoid resource contention.

You can extract the core count in your Qt/C++ code using QProcess or a standard pipe:

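// Example: extracting the physical core count in C++ (Linux, requires lscpu)
#include <cstdio>
#include <cstdlib>

// Runs a shell command and returns the integer it prints, or 0 on failure.
static int readIntFromCommand(const char* cmd) {
    FILE* pipe = popen(cmd, "r");
    if (!pipe) return 0;
    char buffer[128] = {0};
    const bool ok = fgets(buffer, sizeof buffer, pipe) != nullptr;
    pclose(pipe);
    return ok ? std::atoi(buffer) : 0;
}

// Physical cores = "Core(s) per socket" x "Socket(s)". LC_ALL=C keeps lscpu's
// labels in English; falls back to 1 if parsing fails.
int getPhysicalCores() {
    const int perSocket = readIntFromCommand("LC_ALL=C lscpu | grep '^Core(s) per socket:' | awk '{print $4}'");
    const int sockets = readIntFromCommand("LC_ALL=C lscpu | grep '^Socket(s):' | awk '{print $2}'");
    return (perSocket > 0 && sockets > 0) ? perSocket * sockets : 1;
}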
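A possibly more robust alternative, offered as an assumption about your setup rather than part of the original workflow, is lscpu's parseable mode: `lscpu -p=CORE,SOCKET` prints one CSV line per logical CPU, preceded by comment lines starting with `#`. Counting distinct (core, socket) pairs then gives the physical core count, even on multi-socket machines. The sketch below only parses that text, so you can feed it output captured with QProcess or popen; the function name countPhysicalCores is hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: counts distinct (core, socket) pairs in the CSV text
// printed by `lscpu -p=CORE,SOCKET`. Each logical CPU is one line, so hardware
// threads sharing a physical core produce duplicate pairs and are counted once.
int countPhysicalCores(const std::string& lscpuParseOutput) {
    std::set<std::pair<int, int>> cores;
    std::istringstream in(lscpuParseOutput);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip lscpu's header comments
        int core = -1, socket = -1;
        char comma = 0;
        std::istringstream fields(line);
        if (fields >> core >> comma >> socket && comma == ',') {
            cores.insert({core, socket});
        }
    }
    return static_cast<int>(cores.size());
}
```

For example, a single-socket machine with two cores and two threads per core would list each of the pairs 0,0 and 1,0 twice, and the function would return 2.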

Step 2: Configuring MPI in QtCreator

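To make QtCreator build against MPI, modify your .pro file (if you use qmake) so that it compiles and links with an MPI compiler wrapper such as mpicxx.

In your .pro file:

QMAKE_CXX = mpicxx
QMAKE_LINK = mpicxx

Setting the wrapper is enough, because mpicxx adds the MPI include and library flags itself. Alternatively, keep your default compiler and import the wrapper's flags instead. The --showme options below are specific to Open MPI (MPICH's wrapper uses -compile_info and -link_info), and the link flags go in LIBS so the MPI libraries appear after your object files on the link line:

QMAKE_CXXFLAGS += $$system(mpicxx --showme:compile)
LIBS += $$system(mpicxx --showme:link)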

Step 3: Running the Variogram Module Dynamically

Instead of running the binary directly, add a Custom Executable run configuration in QtCreator's Project Run settings:

Go to Projects > Run.
Add a Custom Executable.
Executable: /usr/bin/mpiexec (or mpirun).
Arguments: -n followed by the core count discovered by your lscpu logic, then the path to your built executable. A shell script wrapper or an environment variable is a convenient way to pass that count.
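One way to pass the discovered core count is a small launcher that QtCreator runs as the Custom Executable. The sketch below is hypothetical (buildMpiexecArgs and launchMpi are illustrative names). It assumes Open MPI, whose mpiexec accepts `--bind-to core` to pin each rank to a core, and uses POSIX execvp to replace the launcher process with mpiexec.

```cpp
#include <string>
#include <vector>
#include <unistd.h>  // execvp (POSIX)

// Builds the mpiexec command line for a given number of ranks.
// "--bind-to core" is Open MPI's spelling; other MPI implementations differ.
std::vector<std::string> buildMpiexecArgs(int ranks, const std::string& program) {
    if (ranks < 1) ranks = 1;  // never launch zero processes
    return {"mpiexec", "-n", std::to_string(ranks), "--bind-to", "core", program};
}

// Replaces the current process with mpiexec (looked up on PATH).
// Returns -1 only if execvp fails; on success it never returns.
int launchMpi(const std::vector<std::string>& args) {
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);  // execvp expects a null-terminated argument array
    execvp(argv[0], argv.data());
    return -1;
}
```

A launcher's main() could then call launchMpi(buildMpiexecArgs(getPhysicalCores(), "./variogram")), where ./variogram stands in for your built binary. If you use MPICH's Hydra launcher instead, replace the binding flags with its own syntax.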

Step 4: Implementation in Variogram Modeling

In your MPI-based module, use the core count to distribute your spatial data blocks. For example, if lscpu reports 8 physical cores, you can launch 8 ranks: the master (rank 0) handles orchestration, while ranks 1 to 7 split the search for pairs among themselves.
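A minimal sketch of the worker side follows, assuming 2-D samples, equal-width lag bins, and a full copy of the samples on every rank. The Sample and LagBins types and the partialVariogram function are hypothetical names. Rank 0 is reserved for orchestration, and rows of the pair loop are dealt round-robin to workers: row i contains n - 1 - i pairs, so handing out contiguous blocks of rows would give the first workers far more work.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Sample { double x, y, z; };  // hypothetical 2-D sample: coordinates and value

struct LagBins {
    std::vector<double> sumSq;  // sum of squared differences per lag bin
    std::vector<long> count;    // number of pairs per lag bin
};

// Partial experimental variogram sums for one MPI rank.
// Workers are ranks 1..worldSize-1; rank 0 (the master) gets no pairs.
LagBins partialVariogram(const std::vector<Sample>& s, double lagWidth, int nBins,
                         int rank, int worldSize) {
    LagBins out{std::vector<double>(nBins, 0.0), std::vector<long>(nBins, 0)};
    const int workers = worldSize - 1;
    if (rank < 1 || workers < 1) return out;  // master or no workers: nothing to do
    // Round-robin over rows: worker w takes rows w-1, w-1+workers, ...
    for (std::size_t i = static_cast<std::size_t>(rank - 1); i < s.size(); i += workers) {
        for (std::size_t j = i + 1; j < s.size(); ++j) {
            const double h = std::hypot(s[j].x - s[i].x, s[j].y - s[i].y);
            const int bin = static_cast<int>(h / lagWidth);
            if (bin >= nBins) continue;  // pair lies beyond the maximum lag
            const double d = s[j].z - s[i].z;
            out.sumSq[bin] += d * d;
            ++out.count[bin];
        }
    }
    return out;
}
```

Each worker would call this with the values returned by MPI_Comm_rank and MPI_Comm_size. Rank 0 could then combine the partial sums with MPI_Reduce (MPI_SUM, using MPI_DOUBLE for sumSq and MPI_LONG for count) and compute the semivariance of each non-empty bin as sumSq / (2 * count).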

Conclusion

By combining lscpu for hardware discovery and QtCreator’s flexible run configurations, you ensure your variogram modeling module remains "architecture-aware," maximizing throughput for your geostatistical workflows.