Centre de Conception du Logiciel
Workshop: Code, Performance, Hybrid Compilation and Debugging
On December 13 and 14, 2016, the Corse team is organizing a workshop dedicated to the compilation and optimization of software code, "Code characterization, performance, energy, hybrid-compilation, and debugging", at the Centre de Conception du Logiciel (CCL) in Grenoble. The workshop will be held in English.
- Date: December 13, 2016 to December 14, 2016
- Venue: Minatec Campus, 17 Rue des Martyrs, Grenoble - Building 50C - Room C203/C206
- Speakers: Alexandra Jimborean (Uppsala U.), Louis-Noël Pouchet (Colorado St. U.), Ayal Zaks (Intel), Kim-Anh Tran (Uppsala U.), Fabian Grüber (Inria)
- Organizers: CORSE team (Fabrice Rastello)
Program
Tuesday, December 13
14h - 14h45
Alexandra Jimborean
Automatic Detection of Extended Data-Race-Free Regions
Data-race-free (DRF) parallel programming is becoming a standard, as newly adopted memory models of mainstream programming languages such as C++ or Java impose data-race-freedom as a requirement. We propose compiler techniques that automatically delineate extended data-race-free regions (xDRF), namely regions of code which provide the same guarantees as the synchronization-free regions (in the context of DRF codes). xDRF regions stretch across synchronization boundaries, function calls and loop back-edges and preserve data-race-free semantics, thus increasing the optimization opportunities exposed to the compiler and to the underlying architecture.
Our compiler techniques precisely analyze the threads' memory access behavior and data sharing in shared-memory, general-purpose parallel applications and can therefore infer the limits of xDRF code regions. We evaluate the potential of our technique by employing the xDRF region classification in a state-of-the-art, dual-mode cache coherence protocol. Larger xDRF regions reduce the coherence bookkeeping and enable optimizations for performance (6.1%) and energy efficiency (12.7%) compared to a standard directory-based coherence protocol.
15h15 - 16h
Louis-Noël Pouchet
Source Code Analysis for Kernel Characterization and Categorization
Polyhedral program transformations can perform highly aggressive restructuring of programs with static control-flow. However, finding the best transformation to optimize for speed or for energy remains a daunting challenge: to date, the state of practice is to perform auto-tuning on the target device, running many different versions of the input program to observe which one actually performs best.
16h30 - 17h15
Ayal Zaks
Extending Loop Vectorizer towards supporting OpenMP 4.5 SIMD and outer loop auto-vectorization
Currently, LoopVectorizer in LLVM is specialized in
Wednesday, December 14
10h - 10h40
Kim-Anh Tran
Compiling for energy efficient architectures: Hiding long-latencies on limited, energy-efficient cores
Memory latency becomes a performance bottleneck if long latency loads cannot be overlapped with useful computation. While aggressive out-of-order processors are able to hide long latencies, limited out-of-order and in-order cores fail to find enough independent instructions to hide the delay. We propose software-only and software-hardware co-designs to overcome the performance degradation caused by long latency loads on small cores. Energy-efficient cores, when equipped with the appropriate compile-time support, can significantly improve their performance for memory-bound applications. We separate loads from their uses, and overlap their latencies with instructions from different blocks and loop iterations. Our techniques overcome restrictions which rendered conventional compile-time techniques impractical: (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure, and achieve an average run time improvement of 10%,
10h45 - 11h30
Fabian Gruber
Extending QEMU to Build a Bottleneck Model based Performance Debugging Tool
QEMU, short for Quick Emulator, is a CPU emulator that is able to run applications compiled for one architecture on another (such as running an ARM binary on an x86 CPU, or vice versa). QEMU is not based on an interpreter, but instead uses binary translation to allow efficient execution of foreign instructions. Performance debugging is the process of, first, finding performance problems, that is, pinpointing code regions with suboptimal resource utilization, and then diagnosing the causes for these problems. This talk presents ongoing work the CORSE team has done in collaboration with ST Microelectronics on extending QEMU to instrument executed programs in order to collect high-level performance metrics. The goal of this presentation is not only to present our work, but also to solicit feedback on our ideas from the audience.
14h - 14h45
Diogo Sampaio
Profile Guided Hybrid Compilation (PhD defense)
Heat dissipation limitations caused a paradigm change in how the computational capacity of chips is scaled, shifting from increasing the clock frequency to growing parallelism. To exploit this parallelism, computer applications must be made parallel, a hard job left to software developers. To aid in this process, many optimizing compilers and frameworks have been developed, such as polyhedral compilation.
This work advocates for the use of hybrid analyses when optimizing loops, the regions where the majority of programs spend most of their time.
Keywords: Centre de Conception du Logiciel, Performance, Debugging, Energy, Hybrid compilation
Practical information
- Location
Minatec Campus
17 Rue des Martyrs, Grenoble
Building 50C - Room C203/C206
- Contact
CORSE team
Fabrice Rastello
+33 6 43-98-34-57