Publications

Performance Characterization

 
Fine-grained Benchmark Subsetting for System Selection. Pablo de Oliveira Castro, Yuriy Kashnikov, Chadi Akel, Mihail Popov, and William Jalby. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pages 132:132-132:142, New York, NY, USA, 2014. ACM. [ bib | DOI | http | .pdf ]
System selection aims at finding the best architecture for a set of programs and workloads. It traditionally requires long-running benchmarks. We propose a method to reduce the cost of system selection. We break down benchmarks into elementary fragments of source code, called codelets. Then, we identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up. For each group of similar codelets, only one representative is kept. For codelets called repeatedly and for which the performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection.

We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite that runs on average 30 times faster than the original suite, and up to 44 times faster. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%.
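
As an illustration of the subsetting idea, the sketch below (plain Python over hypothetical codelet features and timings, not the paper's actual tool chain) clusters codelets by a feature vector, keeps the codelet closest to each cluster centre as representative, and extrapolates the full suite's execution time on a target machine from the representatives alone.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    # Hypothetical per-codelet feature vectors (e.g. vectorization ratio,
    # cache-miss ratio) and execution times measured on a reference machine.
    features = np.array([[0.90, 0.10], [0.85, 0.12], [0.20, 0.70], [0.25, 0.65]])
    ref_times = np.array([4.0, 6.0, 3.0, 5.0])

    # Group similar codelets and keep the member closest to each cluster centre.
    clusters = fcluster(linkage(features, method="ward"), t=2, criterion="maxclust")
    representatives, weights = {}, {}
    for cid in np.unique(clusters):
        members = np.where(clusters == cid)[0]
        centre = features[members].mean(axis=0)
        rep = members[np.argmin(np.linalg.norm(features[members] - centre, axis=1))]
        representatives[cid] = rep
        weights[cid] = ref_times[members].sum()   # time this cluster accounts for

    # Only the representatives are run on the target (hypothetical measurements);
    # the whole suite's time is extrapolated by scaling each representative's
    # speed-up to the cluster it stands for.
    target_times = dict(zip(representatives.values(), [3.5, 2.4]))
    predicted = sum(weights[cid] * target_times[rep] / ref_times[rep]
                    for cid, rep in representatives.items())
    print(f"predicted full-suite time on target: {predicted:.1f} s")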

 
Adaptive Sampling for Performance Characterization of Application Kernels. Pablo de Oliveira Castro, Eric Petit, Asma Farjallah, and William Jalby. Concurrency and Computation: Practice and Experience, 2013. [ bib | DOI | .pdf ]

Characterizing performance is essential to optimize programs and architectures. The open source Adaptive Sampling Kit (ASK) measures the performance trade-offs in large design spaces. Exhaustively sampling all sets of parameters is computationally intractable. Therefore, ASK concentrates exploration in the most irregular regions of the design space through multiple adaptive sampling strategies. The paper presents the ASK architecture and a set of adaptive sampling strategies, including a new approach called Hierarchical Variance Sampling. ASK's usage is demonstrated on three performance characterization problems: memory stride accesses, Jacobian stencil code, and an industrial seismic application using 3D stencils. ASK builds accurate models of performance with a small number of measurements. It considerably reduces the cost of performance exploration. For instance, the Jacobian stencil code design space, which has more than 31 × 10^8 combinations of parameters, is accurately predicted using only 1500 combinations.
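
As a rough illustration of the adaptive strategy, the sketch below (hypothetical cost function and a simplified width-weighted variance criterion, not ASK's actual implementation of Hierarchical Variance Sampling) repeatedly places the next measurement in the region of a one-dimensional design space whose responses vary the most.

    import random
    import statistics

    def cost(x):                                  # hypothetical performance response
        return 1.0 if x < 0.5 else 5.0 + 3.0 * x

    regions = [(0.0, 0.5), (0.5, 1.0)]            # fixed partition of the design space
    samples = {r: [random.uniform(*r) for _ in range(4)] for r in regions}

    def irregularity(r):
        # Width-weighted variance of the responses measured so far in the region.
        return (r[1] - r[0]) * statistics.pvariance([cost(x) for x in samples[r]])

    for _ in range(20):                           # adaptive sampling budget
        target = max(regions, key=irregularity)   # most irregular region wins
        samples[target].append(random.uniform(*target))

    for r in regions:
        print(r, "->", len(samples[r]), "samples")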

 
Is Source-code Isolation Viable for Performance Characterization? Chadi Akel, Yuriy Kashnikov, Pablo de Oliveira Castro, and William Jalby. In International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI). IEEE Computer Society, 2013. [ bib | .pdf ]

Source-code isolation finds and extracts the hotspots of an application as independent isolated fragments of code, called codelets. Codelets can be modified, compiled, run, and measured independently from the original application. Source-code isolation reduces benchmarking cost and allows piece-wise optimization of an application. Source-code isolation is faster than whole-program benchmarking and optimization since the user can concentrate only on the bottlenecks. This paper examines the viability of using isolated codelets in place of the original application for performance characterization and optimization. On the NAS benchmarks, we show that codelets capture 92.3% of the original execution time. We present a set of techniques for keeping codelets as faithful as possible to the original hotspots: 63.6% of the codelets have the same assembly as the original hotspots and 81.6% of the codelets have the same run-time performance as the original hotspots.
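
The fidelity criterion can be pictured with the toy sketch below (hypothetical kernels, not the paper's extraction tooling): a hotspot is timed inside its application context and again as a standalone codelet, and the codelet is considered faithful only if the two run times agree within a tolerance.

    import time

    def hotspot(data):          # the loop as it runs inside the application
        return sum(x * x for x in data)

    def codelet(data):          # the same loop, extracted as a standalone kernel
        return sum(x * x for x in data)

    def measure(fn, data, reps=5):
        best = float("inf")
        for _ in range(reps):
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        return best

    data = list(range(100_000))
    t_app, t_codelet = measure(hotspot, data), measure(codelet, data)
    faithful = abs(t_codelet - t_app) / t_app < 0.10      # 10% tolerance
    print(f"in-app: {t_app:.4f} s  codelet: {t_codelet:.4f} s  faithful: {faithful}")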

 
Evaluating Architecture and Compiler Design through Static Loop Analysis. Yuriy Kashnikov, Pablo de Oliveira Castro, Emmanuel Oseret, and William Jalby. In High Performance Computing and Simulation (HPCS), 2013 International Conference on, pages 535-544. IEEE Computer Society, 2013. [ bib | DOI ]
 
ASK: Adaptive Sampling Kit for Performance Characterization. Pablo de Oliveira Castro, Eric Petit, Jean Christophe Beyler, and William Jalby. In Christos Kaklamanis, Theodore S. Papatheodorou, and Paul G. Spirakis, editors, Euro-Par 2012 Parallel Processing - 18th International Conference, volume 7484 of Lecture Notes in Computer Science, pages 89-101. Springer, 2012. [ bib | .pdf ]
Characterizing performance is essential to optimize programs and architectures. The open source Adaptive Sampling Kit (ASK) measures the performance trade-offs in large design spaces. Exhaustively sampling all points is computationally intractable. Therefore, ASK concentrates exploration in the most irregular regions of the design space through multiple adaptive sampling methods. The paper presents the ASK architecture and a set of adaptive sampling strategies, including a new approach: Hierarchical Variance Sampling. ASK's usage is demonstrated on two performance characterization problems: memory stride accesses and stencil codes. ASK builds precise models of performance with a small number of measurements. It considerably reduces the cost of performance exploration. For instance, the stencil code design space, which has more than 31 × 10^8 points, is accurately predicted using only 1500 points.

 
Computing-Kernels Performance Prediction Using DataFlow Analysis and Microbenchmarking. Eric Petit, Pablo de Oliveira Castro, Tarek Menour, Bettina Krammer, and William Jalby. In International Workshop on Compilers for Parallel Computers, 2012. [ bib ]

Dataflow Parallelism

 
DSL Stream Programming on Multicore Architectures. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In Sabri Pllana and Fatos Xhafa, editors, Programming Multi-core and Many-core Computing Systems, to appear. John Wiley and Sons, 2012. [ bib | .pdf ]
To program parallel architectures effectively, it is important to combine a simple expression of parallelism with efficient compiler optimizations. We propose a novel stream programming framework based on two domain-specific languages that separate these two issues. A high-level declarative language describes data dependencies between filters, while an intermediate language enables powerful optimizations through a set of stream graph transformations. This two-level approach offers a clean separation between the issue of programming complexity and the issue of target-specific optimization.
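
As a toy illustration of the two-level idea (hypothetical API, not the framework described in the chapter), the sketch below models filters of a stream graph as composable objects and applies one graph transformation, fusing two pipelined filters, before running the result.

    class Filter:
        """A stream filter: a named, single-input single-output computation."""
        def __init__(self, name, fn):
            self.name, self.fn = name, fn

    def fuse(f, g):
        """Graph transformation: merge two pipelined filters and remove the
        intermediate stream between them."""
        return Filter(f"{f.name}+{g.name}", lambda x: g.fn(f.fn(x)))

    scale = Filter("scale", lambda x: 2 * x)
    shift = Filter("shift", lambda x: x + 1)

    pipeline = fuse(scale, shift)               # stream graph: scale -> shift
    print(pipeline.name, [pipeline.fn(x) for x in range(4)])   # scale+shift [1, 3, 5, 7]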

 
Automatic mapping of stream programs on multicore architectures. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In International Workshop on Compilers for Parallel Computers, 2010. [ bib ]
 
A Multidimensional Array Slicing DSL for Stream Programming. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In Complex, Intelligent and Software Intensive Systems, International Conference, pages 913-918. IEEE Computer Society, 2010. [ bib | DOI | .pdf ]
Stream languages offer a simple multi-core programming model and achieve good performance. Yet expressing data rearrangement patterns (like a matrix block decomposition) in these languages is verbose and error-prone. In this paper, we propose a high-level programming language to elegantly describe n-dimensional data reorganization patterns. We show how to compile it to stream languages.
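
For illustration, the block-decomposition pattern mentioned above looks as follows when written by hand in plain NumPy (the DSL itself is not reproduced here); the proposed language expresses such reorganizations declaratively instead of through index arithmetic.

    import numpy as np

    a = np.arange(16).reshape(4, 4)
    # Rearrange the 4x4 matrix into four contiguous 2x2 blocks.
    blocks = [a[i:i + 2, j:j + 2]
              for i in range(0, 4, 2)
              for j in range(0, 4, 2)]
    for block in blocks:
        print(block)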

 
Reducing memory requirements of stream programs by graph transformations. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. In High Performance Computing and Simulation (HPCS), 2010 International Conference on, pages 171-180. IEEE Computer Society, 2010. [ bib | DOI | .pdf ]
Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedded resource-constrained system, adapting stream programs to fit memory requirements is particularly important. In this paper we present a new approach to reduce the memory footprint required to run stream programs on MPSoC. Through an exploration of equivalent program variants, the method selects parallel code minimizing memory consumption. For large program instances, a heuristic accelerating the exploration phase is proposed and evaluated. We demonstrate the benefits of our method on a set of ten significant benchmarks. Using a multi-core modulo scheduling technique, our approach considerably lowers the minimum amount of memory required to run seven of these benchmarks while preserving throughput.
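
The exploration loop can be condensed into the sketch below (hypothetical transformation knobs and a toy buffer model, not the paper's modulo-scheduling evaluation): enumerate equivalent program variants, estimate the buffer memory each one needs, and keep the cheapest.

    from itertools import product

    def buffer_bytes(split_a, split_b):
        # Toy buffer model: memory grows with the mismatch between the production
        # and consumption rates of two neighbouring filters, plus duplicated state.
        return 1024 * (abs(3 * split_a - 2 * split_b) + split_a + split_b)

    # Hypothetical knobs: how many times each of two filters is duplicated.
    variants = list(product(range(1, 5), repeat=2))
    best = min(variants, key=lambda v: buffer_bytes(*v))
    print("selected variant:", best, "->", buffer_bytes(*best), "bytes")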

 
Design-Space Exploration of Stream Programs through Semantic-Preserving Transformations. Pablo de Oliveira Castro, Stéphane Louise, and Denis Barthou. [ bib | .pdf ]
Stream languages explicitly describe fork-join parallelism and pipelines, offering a powerful programming model for many-core Multi-Processor Systems on Chip (MPSoC). In an embedded resource-constrained system, adapting stream programs to fit memory requirements is particularly important. In this paper we present a design-space exploration technique to reduce the minimal memory required when running stream programs on MPSoC; this makes it possible to target memory-constrained systems and, in some cases, to obtain better performance. Using a set of semantics-preserving transformations, we explore a large number of equivalent program variants; we select the variant that minimizes a buffer evaluation metric. To cope efficiently with large program instances, we propose and evaluate a heuristic for this method. We demonstrate the benefits of our method on a set of ten significant benchmarks. As an illustration, we measure the minimal memory required using multi-core modulo scheduling. Our approach considerably lowers the minimum memory required for seven of the ten benchmarks.

 
Expression et optimisation des réorganisations de données dans du parallélisme de flots (Expression and optimization of data reorganizations in stream parallelism). Pablo de Oliveira Castro. PhD thesis, Université de Versailles Saint-Quentin-en-Yvelines, 2010. [ bib | .pdf ]

If you download IEEE publications, please take note of the IEEE copyright notice.