Publications

Performance Characterization

 
Evaluating Architecture and Compiler Design through Static Loop Analysis. Yuriy Kashnikov, Pablo de Oliveira Castro, Emmanuel Oseret, and William Jalby. In High Performance Computing and Simulation (HPCS), 2013 International Conference on. IEEE Computer Society, (to appear) 2013. [ bib ]
 
ASK: Adaptive Sampling Kit for Performance Characterization. Pablo de Oliveira Castro, Eric Petit, Jean Christophe Beyler, and William Jalby. In Christos Kaklamanis, Theodore S. Papatheodorou, and Paul G. Spirakis, editors, Euro-Par 2012 Parallel Processing - 18th International Conference, volume 7484 of Lecture Notes in Computer Science, pages 89-101. Springer, 2012. [ bib | .pdf | .pdf ]
Characterizing performance is essential for optimizing programs and architectures. The open-source Adaptive Sampling Kit (ASK) measures performance trade-offs in large design spaces. Because exhaustively sampling every point is computationally intractable, ASK concentrates exploration on the most irregular regions of the design space using multiple adaptive sampling methods. The paper presents the ASK architecture and a set of adaptive sampling strategies, including a new approach: Hierarchical Variance Sampling. ASK’s usage is demonstrated on two performance characterization problems: memory stride accesses and stencil codes. ASK builds precise performance models from a small number of measurements, which considerably reduces the cost of performance exploration. For instance, the stencil code design space, which has more than 31 × 10^8 points, is accurately predicted using only 1500 points.
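The following toy one-dimensional sketch illustrates the general idea of concentrating measurements where the response is most irregular. It uses a generic greedy, variance-driven refinement, assumed here for illustration only. It is not ASK's Hierarchical Variance Sampling, and the cost function `toy_cost`, the bin layout and the sampling budget are all hypothetical.

```python
import random
import statistics

def toy_cost(x: float) -> float:
    """Hypothetical performance response: flat, then a sharp jump at x = 0.73."""
    return 1.0 if x < 0.73 else 5.0 + 3.0 * (x - 0.73)

def adaptive_sample(f, lo=0.0, hi=1.0, bins=10, initial_per_bin=3,
                    rounds=20, per_round=5, seed=0):
    """Greedy variance-driven refinement over equal-width bins (illustrative only)."""
    rng = random.Random(seed)
    width = (hi - lo) / bins
    samples = [[] for _ in range(bins)]  # samples[i]: list of (x, f(x)) pairs in bin i

    # Seed every bin with evenly spaced points so each bin has a variance estimate.
    for i in range(bins):
        for j in range(initial_per_bin):
            x = lo + (i + (j + 0.5) / initial_per_bin) * width
            samples[i].append((x, f(x)))

    # Each round, spend the whole budget inside the bin whose measurements vary most.
    for _ in range(rounds):
        variances = [statistics.pvariance([y for _, y in s]) for s in samples]
        worst = max(range(bins), key=lambda i: variances[i])
        for _ in range(per_round):
            x = lo + (worst + rng.random()) * width
            samples[worst].append((x, f(x)))
    return samples

counts = [len(s) for s in adaptive_sample(toy_cost)]
print(counts)  # [3, 3, 3, 3, 3, 3, 3, 103, 3, 3]
```

Almost all of the extra measurements land in the bin that contains the jump, while the flat bins keep only their three seed points. ASK's own strategies are more elaborate and also build a predictive model from the samples; this sketch only shows where the sampling effort goes.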

 
Computing-Kernels Performance Prediction Using DataFlow Analysis and Microbenchmarking. Eric Petit, Pablo de Oliveira Castro, Tarek Menour, Bettina Krammer, and William Jalby. In International Workshop on Compilers for Parallel Computers, 2012. [ bib ]

Dataflow Parallelism

 
DSL Stream Programming on Multicore Architectures. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In Sabri Pllana and Fatos Xhafa, editors, Programming Multi-core and Many-core Computing Systems, to appear. John Wiley and Sons, 2012. [ bib | .pdf ]
To program parallel architectures effectively, it is important to combine a simple expression of parallelism with efficient compiler optimizations. We propose a novel stream programming framework based on two domain-specific languages that separate these two concerns. A high-level declarative language describes the data dependencies between filters, while an intermediate language enables powerful optimizations through a set of stream graph transformations. This two-level approach cleanly separates programming complexity from target-specific optimization.
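The Python sketch below gives a rough illustration of what a stream graph transformation can look like. It models stateless filters with fixed pop/push rates and fuses two adjacent filters into one. Filter fusion is a common stream-compiler rewrite assumed here for illustration. It is not claimed to be one of the framework's transformations, and `Filter`, `run`, `fuse` and the example filters are hypothetical names.

```python
from math import lcm
from typing import Callable, List, NamedTuple

class Filter(NamedTuple):
    """A stateless filter: each firing pops `pop` tokens and pushes `push` tokens."""
    pop: int
    push: int
    work: Callable[[List[int]], List[int]]

def run(pipeline: List[Filter], stream: List[int]) -> List[int]:
    """Run filters in sequence; each fires as often as its input allows."""
    for f in pipeline:
        out: List[int] = []
        for k in range(0, len(stream) - f.pop + 1, f.pop):
            produced = f.work(stream[k:k + f.pop])
            assert len(produced) == f.push, "filter broke its declared push rate"
            out.extend(produced)
        stream = out  # leftover tokens that cannot fill a firing are dropped
    return stream

def fuse(f: Filter, g: Filter) -> Filter:
    """Merge two adjacent stateless filters into a single equivalent filter."""
    n = lcm(f.push, g.pop)            # tokens passed from f to g per fused firing
    rf, rg = n // f.push, n // g.pop  # firings of f and of g inside one fused firing
    return Filter(pop=rf * f.pop, push=rg * g.push,
                  work=lambda tokens: run([f, g], tokens))

# Hypothetical filters: sum adjacent pairs, then sum windows of three.
pair_sum = Filter(pop=2, push=1, work=lambda t: [t[0] + t[1]])
triple_sum = Filter(pop=3, push=1, work=lambda t: [sum(t)])

data = list(range(12))
print(run([pair_sum, triple_sum], data))        # [15, 51]
print(run([fuse(pair_sum, triple_sum)], data))  # [15, 51]
```

The fused filter performs three firings of `pair_sum` and one of `triple_sum` per firing. The two versions therefore agree on any input whose length is a multiple of the fused pop rate (6 here). The sketch drops trailing tokens that cannot fill a firing.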

 
Automatic mapping of stream programs on multicore architectures. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In International Workshop on Compilers for Parallel Computers, 2010. [ bib ]
 
A Multidimensional Array Slicing DSL for Stream Programming. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In Complex, Intelligent and Software Intensive Systems, International Conference, pages 913-918. IEEE Computer Society, 2010. [ bib | DOI | .pdf ]
Stream languages offer a simple multi-core programming model and achieve good performance. Yet expressing data rearrangement patterns (such as a matrix block decomposition) in these languages is verbose and error-prone. In this paper, we propose a high-level programming language to elegantly describe n-dimensional data reorganization patterns. We show how to compile it into stream languages.
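The abstract's example of a matrix block decomposition can be made concrete as a fixed permutation of a row-major stream. The Python sketch below does not use the DSL's syntax, which is not shown here. `block_decompose`, `to_stream` and `source_index` are hypothetical helpers illustrating the kind of n-dimensional reorganization the language is meant to express.

```python
from typing import List

Matrix = List[List[int]]

def block_decompose(m: Matrix, bh: int, bw: int) -> List[Matrix]:
    """Split a matrix into bh x bw tiles, listed in row-major tile order."""
    rows, cols = len(m), len(m[0])
    assert rows % bh == 0 and cols % bw == 0, "sketch assumes an exact tiling"
    return [[row[j:j + bw] for row in m[i:i + bh]]
            for i in range(0, rows, bh)
            for j in range(0, cols, bw)]

def to_stream(tiles: List[Matrix]) -> List[int]:
    """Flatten tiles into the 1-D token order a downstream filter would consume."""
    return [x for tile in tiles for row in tile for x in row]

def source_index(k: int, cols: int, bh: int, bw: int) -> int:
    """Row-major input index of the k-th output token (same reorganization, closed form)."""
    tile, within = divmod(k, bh * bw)
    ti, tj = divmod(tile, cols // bw)  # tile coordinates
    r, c = divmod(within, bw)          # position inside the tile
    return (ti * bh + r) * cols + tj * bw + c

m = [[4 * r + c for c in range(4)] for r in range(4)]  # entries equal their row-major index
stream = to_stream(block_decompose(m, 2, 2))
print(stream)  # [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
assert all(stream[k] == source_index(k, 4, 2, 2) for k in range(16))
```

Because `source_index` depends only on the token position and the tile shape, the reorganization is fully static. This is one plausible reason such patterns can be compiled into plain stream code.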

 
Reducing memory requirements of stream programs by graph transformations. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In High Performance Computing and Simulation (HPCS), 2010 International Conference on, pages 171-180. IEEE Computer Society, 2010. [ bib | DOI | .pdf ]
Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedded, resource-constrained system, adapting stream programs to fit memory requirements is particularly important. In this paper we present a new approach to reduce the memory footprint required to run stream programs on MPSoC. By exploring equivalent program variants, the method selects the parallel code that minimizes memory consumption. For large program instances, a heuristic that accelerates the exploration phase is proposed and evaluated. We demonstrate the benefits of our method on a panel of ten significant benchmarks. Using a multi-core modulo scheduling technique, our approach considerably lowers the minimal amount of memory required to run seven of these benchmarks while preserving throughput.
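The memory a stream program needs depends on the filters' production and consumption rates and on the order in which filters fire. The Python sketch below uses a standard synchronous-dataflow calculation, assumed here for illustration. It is neither the paper's transformation-based exploration nor its modulo scheduler. It computes the repetition vector of a three-filter pipeline with hypothetical rates and compares the peak buffer occupancy of two schedules.

```python
from fractions import Fraction
from math import gcd, lcm
from typing import List, Tuple

Rates = List[Tuple[int, int]]  # (pop, push) per filter; source pop and sink push are ignored

def repetition_vector(rates: Rates) -> List[int]:
    """Smallest positive firing counts with r[i] * push_i == r[i+1] * pop_(i+1)."""
    r = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        r.append(r[-1] * push / pop)
    scale = lcm(*(x.denominator for x in r))
    ints = [int(x * scale) for x in r]
    g = gcd(*ints)
    return [x // g for x in ints]

def peak_buffers(rates: Rates, schedule: List[int]) -> List[int]:
    """Replay a firing sequence and return the peak token count on each edge."""
    tokens = [0] * (len(rates) - 1)
    peak = tokens[:]
    for i in schedule:
        if i > 0:  # consume from the input edge
            tokens[i - 1] -= rates[i][0]
            assert tokens[i - 1] >= 0, "filter fired without enough input"
        if i < len(rates) - 1:  # produce onto the output edge
            tokens[i] += rates[i][1]
            peak[i] = max(peak[i], tokens[i])
    return peak

def single_appearance(reps: List[int]) -> List[int]:
    """Fire each filter for all of its repetitions before moving downstream."""
    return [i for i, n in enumerate(reps) for _ in range(n)]

def downstream_first(rates: Rates, reps: List[int]) -> List[int]:
    """Greedy schedule: always fire the most downstream filter that can fire."""
    left, tokens, order = reps[:], [0] * (len(rates) - 1), []
    while any(left):
        for i in reversed(range(len(rates))):
            if left[i] and (i == 0 or tokens[i - 1] >= rates[i][0]):
                break
        else:
            raise RuntimeError("no filter can fire")
        if i > 0:
            tokens[i - 1] -= rates[i][0]
        if i < len(rates) - 1:
            tokens[i] += rates[i][1]
        left[i] -= 1
        order.append(i)
    return order

rates = [(1, 2), (3, 1), (2, 1)]  # hypothetical filters A -> B -> C
reps = repetition_vector(rates)   # [3, 2, 1]
print(sum(peak_buffers(rates, single_appearance(reps))))        # 8
print(sum(peak_buffers(rates, downstream_first(rates, reps))))  # 6
```

In this toy pipeline, 8 tokens must be buffered when each filter runs all of its repetitions at once, but only 6 when downstream filters fire as soon as they can. The paper instead explores semantically equivalent program variants, which change the graph itself rather than only the firing order.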

 
Design-Space Exploration of Stream Programs through Semantic-Preserving Transformations. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. [ bib | .pdf ]
Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedded, resource-constrained system, adapting stream programs to fit memory requirements is particularly important. In this paper we present a design-space exploration technique to reduce the minimal memory required when running stream programs on MPSoC; this makes it possible to target memory-constrained systems and, in some cases, to obtain better performance. Using a set of semantics-preserving transformations, we explore a large number of equivalent program variants and select the variant that minimizes a buffer evaluation metric. To cope efficiently with large program instances, we propose and evaluate a heuristic for this method. We demonstrate the benefits of our method on a panel of ten significant benchmarks. As an illustration, we measure the minimal memory required using multi-core modulo scheduling. Our approach considerably lowers the minimal memory required for seven of the ten benchmarks.

 
Expression et optimisation des réorganisations de données dans du parallélisme de flots [Expression and optimization of data reorganizations in stream parallelism]. Pablo de Oliveira Castro. PhD thesis, Université de Versailles Saint-Quentin-en-Yvelines, 2010. [ bib | .pdf | .pdf ]

If you download IEEE publications, please consider the copyright notice.