TUTORIALS

Tutorials provide information on recent
developments and issues in high performance
computing presented in a seminar-style format.
The following section provides abstracts and
information about the authors of tutorials
planned for Supercomputing '94. To register for
a tutorial, please use the Conference Registration
Form. Tutorial registration fees can be found on the
form.

Tutorials at a Glance

Tutorial levels = % Beginner / % Intermediate / % Advanced

Monday, November 14 - Half Day Afternoon

M1 Sorting Out Communication Libraries: A Comparison of NX, CMMD, PVM and MPI
William Saphir -15/60/25

Monday, November 14 - Full Day

M2 The Science of Benchmarking
Roger Hockney and David Bailey -25/50/25

M3 Introduction to Volume Visualization: Imaging Multidimensional Scientific Data
T. Todd Elvins -85/15/0

M4 PVM/HeNCE: Tools for Heterogeneous Network Computing
Adam Beguelin, Jack Dongarra, Al Geist, Robert Manchek, and Vaidy Sunderam -20/50/30

M5 Message-Passing Programming for Scientists and Engineers
Hugh Caffey and Cherri Pancake -80/20/0

M6 Compilers and Runtime Support for Distributed Memory Machines
J. Ramanujam and Alok Choudhary -20/50/30

M7 High Performance Computing for Scientific Applications: An Introduction
Horst D. Simon and Subhash Saini -50/50/0

M8 Interdisciplinary Adventures in Computational Biology
Paul Stolorz -0/100/0

M9 ATM in a Supercomputer Network Environment
James D. McCabe -40/40/20

M10 Clustered Workstation Environments
Dennis W. Duke, Doug Elias, Miron Livny, and Louis H. Turcotte -20/60/20

M11 Parallel I/O on Highly Parallel Systems
Bill Nitzberg and Samuel A. Fineberg -25/50/25

M12 Compilers: Source/Code-Level Optimizations and Resource Allocation
Constantine D. Polychronopoulos and Alex Nicolau -10/30/60

Friday, November 18 - Half Day Morning

F1 An Introduction to Virtual Reality
Henry Sowizral -90/10/0

Friday, November 18 - Half Day Afternoon

F2 Advanced Issues in Virtual Reality
Henry Sowizral -10/50/40

F3 Performance Tuning on RISC Systems
Ramesh Agarwal, David H. Bailey, Charles Grassl, Fred Gustavson, Dick Hessel, and Mohammad Zubair -10/80/10

F4 Parallel Programming Tools -- Status, Evaluation and Comparison
Doreen Y. Cheng -20/60/20

F5 Experiences with Scalable Parallel Architectures at Maui, Argonne and Cornell
Brian T. Smith, Pete Siegel, and Tom Morgan -25/50/25

F6 Methodologies and Tools for Tuning Parallel Programs -- Facts and Fantasies
Jerry Yan -25/50/25

Friday, November 18 - Full Day

F7 Scientific Visualization: From Data to Photons
Mike Bailey, Chuck Hansen, and Lloyd Treinish -60/30/10

F8 C++ for High Performance Computing
Ian G. Angus -75/25/0

F9 Linear Algebra Algorithms and Software for Large Scientific Problems
Jack Dongarra, Iain Duff, Danny Sorensen and Henk van der Vorst -20/50/30

F10 Computational Chemistry: Beyond the Black Box
Rozeanne Steckler, Peter Taylor, and Franklin Brown -75/25/0

Monday, November 14 - Half Day Afternoon

M1: Sorting Out Communication Libraries:
A Comparison of NX, CMMD, PVM and MPI

William Saphir
15/60/25

With the recent widespread acceptance of PVM and MPI, choosing a communication library for parallel message-passing applications has become difficult. Little information is available that describes the strengths and weaknesses of different libraries. This tutorial presents a detailed comparison of several widely used libraries, focusing on performance, functionality, and design. The comparison includes discussions of point-to-point communication semantics, performance tuning, porting, and the rationale for advanced new features in MPI. The goal of the tutorial is to provide applications programmers with enough information to make informed decisions about which communication library to use, and to enable them to write portable and efficient message-passing applications.

William Saphir is a senior staff member at the NAS supercomputing facility at NASA Ames Research Center, where he helps researchers optimize and port codes for the NAS parallel systems. He received a Ph.D. in Physics from The University of Texas at Austin in 1992.

Monday, November 14 - Full Day

M2: The Science of Benchmarking

Roger Hockney and David Bailey
25/50/25

This tutorial presents a scientific approach to benchmarking. It defines a clear set of units and symbols, followed by a carefully defined set of performance parameters and metrics, and finally a hierarchy of parallel benchmarks to measure them. This methodology follows that of the new PARKBENCH committee of users in their first report entitled "Public International Benchmarks for Parallel Computers." The first lecture presents the theory of the parametric representation of performance and the design of low-level benchmarks to measure these basic computer characteristics. The second lecture presents different metrics for judging the performance of complete algorithms and applications, and the new DUSD method for representing the scaling of an application for all computers and all problem sizes in a single dimensionless diagram. The third lecture describes some of the most common fallacies in benchmarking. The fourth lecture presents some recent benchmark results on currently available highly parallel systems.

Roger Hockney is Emeritus Professor of Computer Science at Reading University, visiting Professor at Southampton University, and consultant in parallel computing. He is co-author of two books: "Parallel Computers-2," and "Computer Simulation Using Particles." He is also currently chairman of the PARKBENCH committee, and co-founder of the Euroben benchmarking initiative. Professor Hockney obtained his B.Sc. at Cambridge University, and his Ph.D. at Stanford University.

David H. Bailey is with the NAS Applied Research Branch at NASA Ames Research Center. In 1976 Bailey received his Ph.D. from Stanford University in mathematics. Since then his research has ranged from computational number theory to numerical algorithms and supercomputer benchmarking. He is one of the authors of the widely cited NAS Parallel Benchmarks. Last year, Dr. Bailey received the Sid Fernbach award from the IEEE Computer Society for his contributions to the field of high performance computing.

M3: Introduction to Volume Visualization: Imaging Multidimensional Scientific Data

T. Todd Elvins
85/15/0

The emphasis of this tutorial will be on data-driven visualization techniques applicable to all disciplines; i.e., how a scientist can immediately get started exploring and imaging data using existing systems. The course will begin with an introduction to the exciting field of volume visualization, its foundations and terminology. Fields of applications, dozens of example images, data characteristics, and strategies for data reconstruction and classification will be shown. Fundamental concepts of volume visualization, including interactive methods, surface-fitting methods, ray-casting methods, and projection methods will be discussed in depth, followed by a rigorous introduction to algorithm enhancements and optimizations.

T. Todd Elvins is a staff visualization programmer at the San Diego Supercomputer Center. He leads a group of software engineers, animators, and media specialists who assist scientists in gaining insight into a wide variety of intellectual problems. Todd has been a computer graphics enthusiast for eight years and has been doing research in distributed and parallel volume rendering for the past four years.

M4: PVM/HeNCE: Tools for Heterogeneous Network Computing

Adam Beguelin, Jack Dongarra, Al Geist, Robert Manchek, and Vaidy Sunderam
20/50/30

Book provided: PVM: Parallel Virtual Machine -- A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994.

This tutorial will be devoted to understanding PVM and HeNCE -- software systems that enable concurrent computing on heterogeneous collections of multiprocessors, supercomputers, scalar machines, and workstations. PVM (Parallel Virtual Machine) is a software infrastructure that enables heterogeneous collections of machines to be used as a general purpose concurrent computing resource. The tutorial will focus on developing concurrent applications for PVM, using several models of parallelism. This will include fundamental concepts in data and function partitioning as they apply to networked environments, tree- and crowd-based control structures, adjusting granularity and tuning for optimal performance. The approach will be based on explaining the conceptual notions in concurrent program development for PVM, followed by detailed descriptions of example textbook applications. The tutorial will also include pragmatic aspects, including PVM installation and operation, debugging and profiling methods, common programming and operational errors, and performance analysis and tuning. In addition, some relevant issues regarding PVM internals, using PVM on MPPs, and ongoing and future work will be discussed.

HeNCE is a graphical toolkit and methodology that significantly eases the task of application development for PVM. HeNCE is based on the notion that concurrency can be expressed using a variant of directed acyclic graphs, where vertices represent computation and arcs represent data and control dependencies. Developers specify applications using such a graph, and associate predefined program modules with different points in the graph. The HeNCE toolkit then executes the graph on a user-configured virtual machine (under PVM). The latter part of the tutorial will describe HeNCE and illustrate its use in graphically assembling concurrent applications from simple (sequential) building blocks. HeNCE also possesses profiling and execution visualization capabilities that will be demonstrated.
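
As a rough illustration of the graph idea described above (vertices as computations, arcs as dependencies), the following Python sketch runs a small dependency graph in an order that respects its arcs. It is not HeNCE or PVM code: the function run_graph, the NODES and DEPS tables, and the four example vertices are all hypothetical names chosen for this sketch, and everything runs sequentially in one process rather than on a virtual machine.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(nodes, deps):
    """Run every vertex after all of its predecessors have finished.

    nodes: vertex name -> function taking a dict {predecessor: result}
    deps:  vertex name -> set of predecessor vertex names (the arcs)
    Returns (results by vertex name, order in which vertices ran).
    """
    results = {}
    order = []
    # static_order() yields each vertex only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        inputs = {p: results[p] for p in deps.get(name, ())}
        results[name] = nodes[name](inputs)
        order.append(name)
    return results, order


# Hypothetical four-vertex graph: split the data, sum two halves, join.
NODES = {
    "split": lambda inputs: [1, 2, 3, 4],
    "left": lambda inputs: sum(inputs["split"][:2]),
    "right": lambda inputs: sum(inputs["split"][2:]),
    "join": lambda inputs: inputs["left"] + inputs["right"],
}
DEPS = {
    "split": set(),
    "left": {"split"},
    "right": {"split"},
    "join": {"left", "right"},
}

if __name__ == "__main__":
    results, order = run_graph(NODES, DEPS)
    print(order)
    print(results["join"])
```

In this sketch the "left" and "right" vertices do not depend on each other, which is the kind of independence a graph-based tool could exploit to run them concurrently on different machines.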

Adam Beguelin joined the faculty of Carnegie Mellon University in June of 1992. He holds a joint appointment with the School of Computer Science and the Pittsburgh Supercomputing Center. He received his Ph.D. in Computer Science from the University of Colorado in 1990. His primary research interests are in the area of computer systems, specifically the design and development of programming tools and environments for high performance parallel and distributed computing. He is involved in the design and implementation of the PVM, HeNCE, and Xab software packages.

Jack Dongarra holds a joint appointment as Distinguished Professor of Computer Science in the Computer Science Department at the University of Tennessee and as Distinguished Scientist in the Mathematical Sciences Section at Oak Ridge National Laboratory. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers.

Al Geist is a computer scientist in the Mathematical Sciences Section of Oak Ridge National Laboratory. His research interests are in the areas of parallel and distributed processing, scientific computing, and high performance numerical software.

Robert Manchek is a Research Associate at the University of Tennessee, Knoxville. His research interests include Parallel Computing, Networking and Operating Systems. He received a B.S. in Electrical and Computer Engineering from the University of Colorado, Boulder in 1988 and is currently pursuing a Ph.D. in Computer Science at the University of Tennessee.

Vaidy Sunderam is a Professor in the Department of Mathematics and Computer Science at Emory University. He received a Ph.D. in Computer Science from Kent, England in 1986. His research interests are in parallel and distributed processing, particularly high-performance concurrent computing in heterogeneous networked environments.

M5: Message-Passing Programming for Scientists and Engineers

Hugh Caffey and Cherri Pancake
80/20/0

In this tutorial, the principles of parallel programming in a message-passing environment will be introduced in terms that make sense to non-computer scientists. Emphasis will be on practical information, with a series of example programs being used to guide newcomers through the important stages in writing and tuning message-passing codes. The tutorial will not address details of parallel architectures, algorithms or theoretical models. Instead, it will offer a minimum-trauma introduction to the issues at stake in deciding whether or not to parallelize an application, basic strategies for adding parallelism, and techniques for debugging/evaluating/tuning parallel programs.

Hugh Caffey is a senior scientific programmer at BioNumerik Pharmaceuticals, Inc. His work focuses on porting and parallelizing applications for clusters of RISC System/6000 workstations and multiprocessors using FORGE 90, PVM and other tools.

Cherri Pancake is an associate professor and Intel Faculty Fellow in the Department of Computer Science at Oregon State University. Her research area is software support for high performance computing, with emphasis on user interface design. She also serves as chair of the Parallel Tools Consortium and as HPC area editor for "Communications of the ACM" and "IEEE Computer."

M6: Compilers and Runtime Support for Distributed Memory Machines

J. Ramanujam and Alok Choudhary
20/50/30

For high performance computing to be successful, significant advances in programming tools, especially compilers and runtime systems, are critical. This tutorial is designed to cover issues that are central to compiler and runtime optimizations of scientific codes written in languages like HPF for distributed memory and massively parallel machines. Specifically, this tutorial will address three main issues: general principles, analysis and optimization for parallel machines; specific issues and optimizations related to compiling portable programs for distributed memory machines; and the required runtime support and optimizations for such compilers. This tutorial is intended for compiler writers for languages like High Performance Fortran (HPF) for high performance architectures, architects of parallel computers, researchers in high performance computing, graduate students, and application developers.

J. Ramanujam received his Ph.D. in Computer Science from The Ohio State University in 1990. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Louisiana State University. His research interests are in the area of parallelizing compilers, operating systems and programming environments for parallel computing, and computer architecture.

Alok Choudhary is an associate professor in the Departments of Electrical and Computer Engineering and Computer and Information Science at Syracuse University. He received his Ph.D. from University of Illinois at Urbana-Champaign in 1989. His main research interests are in parallel and distributed processing, software development environments for parallel computers including compilers and runtime support, parallel computer architectures, and parallel I/O systems. He has been one of the main architects of the Fortran 90D/HPF compiler for distributed memory machines.

M7: High Performance Computing for Scientific Applications: An Introduction

Horst D. Simon and Subhash Saini
50/50/0

This tutorial is intended as a practical guide for beginners to the main topics and themes of high performance computing (HPC). The intent is to provide some guidance and directions in the rapidly growing field of scientific computing using massively parallel supercomputers. In the last few years highly parallel systems have become the tool of choice for solving the grand challenge problems of science and engineering. Even though many research issues concerning their effective use and their integration into a large scale production facility are still unresolved, parallel supercomputers are now widely used for production computing. In this tutorial we will draw on our experience with massively parallel supercomputers such as the CM-5 from Thinking Machines Corporation, the Paragon from Intel SSD, the T3D from Cray Research, and the SP-2 from IBM. The CM-5 and Paragon are currently installed at NASA Ames Research Center, and a 160-node SP-2 system will be installed by summer of 1994. The lecturers have had direct experience in working with scientists on these machines.

Horst D. Simon is with Computer Sciences Corporation at NASA Ames Research Center in Moffett Field, California. He is CSC department manager and responsible for a group of researchers in the areas of parallel algorithm development and scientific visualization, who, as contractor personnel, collaborate with the NASA staff. His research interests are in the development of high performance algorithms for vector and parallel machines. Particular areas of interest are sparse matrix algorithms, algorithms for large scale eigenvalue problems, and domain decomposition algorithms for irregular domains for parallel processing. Dr. Simon's algorithm research efforts were honored with the 1988 Gordon Bell Award for parallel processing research.

Subhash Saini received a Ph.D. in Physics from the University of Southern California and has held positions at University of California at Los Angeles, University of California at Berkeley, and Lawrence Livermore National Laboratory. He is a senior computer scientist with Computer Sciences Corporation at Numerical Aerodynamic Simulation Systems Division, NASA Ames Research Center. His research interests include the development, testing and documentation of high-performance mathematical software on massively parallel machines such as Paragon, CM-5, Cray T3D and IBM SP-1. His duties over the years have included a number of training and educational seminars within and outside NASA.

M8: Interdisciplinary Adventures in Computational Biology

Paul Stolorz
0/100/0

This tutorial will provide a practical guide to the application of massively parallel computational techniques in biology and medicine, drawing upon ideas and experiences from a number of disciplines. Several recent developments which have emerged at the interface between computer science, biology, chemistry and physics will be presented. The emphasis will be on a self-contained presentation of each of the topics studied, enabling those who may be familiar with one discipline to incorporate promising new directions from other fields. Particular attention will be paid to the links between different approaches, to the practical application of these insights to important biological problems, and to their efficient implementation on high performance machines. Topics to be covered include applications of simulated annealing and other stochastic sampling techniques to the prediction of RNA, DNA and protein structure, sequence alignment algorithms, modern N-body methods for parallel molecular dynamics, machine learning methods for sequence design and motif-detection, and genetic algorithms.

Paul Stolorz is a staff member at Caltech's Jet Propulsion Laboratory. He specializes in high performance computing in science and engineering, including problems in geophysical modeling, structural biology, immunology and physics. His research interests include parallel and distributed computing, with an emphasis on high-speed networking, machine learning, and the analysis of large-scale scientific databases. He received his Ph.D. in theoretical physics from the California Institute of Technology in 1987.

M9: ATM in a Supercomputer Network Environment

James D. McCabe
40/40/20

Asynchronous Transfer Mode (ATM) network technology is gaining acceptance in both the telecommunications and data networking environments. As the model of networking changes from customer-built private networks to vendor-supplied services-based networks, ATM is expected to provide the infrastructure for higher level services. This tutorial presents an overview of ATM technology, including the factors leading to the development of ATM, existing and potential applications for ATM based supercomputer networks, and the evolution towards services-based networking.

James D. McCabe is manager of the Long-Haul Communications Group for the Numerical Aerodynamic Simulation Systems Division at NASA Ames Research Center in Moffett Field, CA. His interests include development and deployment of new high performance data communications technologies in the NASA computational fluid dynamics environment.

M10: Clustered Workstation Environments

Dennis W. Duke, Doug Elias, Miron Livny, and Louis H. Turcotte
20/60/20

The rapid growth of interconnected high performance workstations has produced a new computing paradigm called clustered workstation computing. This tutorial will present the factors which motivate the implementation of workstation clusters, the characteristics of workstation clusters, and the present software systems that are available to effectively exploit these resources. The tutorial will be divided into four major subject areas: dedicated clusters, enterprise clusters, software environments for using clusters for parallelization, and ancillary topics related to workstation clustering. Major emphasis will be placed on software systems to support each of these topics, system administration requirements, real world experiences, and the pros/cons of each.

Dennis W. Duke is the Director of the Supercomputer Computations Research Institute and Professor of Physics at Florida State University. Recently he has been active in establishing cluster computing as a third branch of high performance computing, complementing vector supercomputing and massively parallel computing.

Doug Elias is a member of the technical staff of the Scientific Computing Support Group within the Cornell Theory Center. His current responsibilities include parallel environment evaluation, parallel software enablement, user consulting, education and training, and parallel and system software development.

Miron Livny is an Associate Professor in the Computer Sciences Department at the University of Wisconsin-Madison. His research focuses on scheduling policies for processing and data management systems and on tools that can be used to evaluate such policies.

Louis Turcotte received a Ph.D. in Engineering Mechanics from the University of Alabama and is a licensed Professional Engineer. He presently holds a joint appointment as a Research Engineer with NSF Engineering Research Center for Computational Field Simulation at Mississippi State University and the U.S. Army Engineering Waterways Experiment Station.

M11: Parallel I/O on Highly Parallel Systems

Bill Nitzberg and Samuel A. Fineberg
25/50/25

Typical scientific applications require vast amounts of processing power coupled with significant I/O capacity. Highly parallel computer systems provide floating point processing power at low cost, but efficiently supporting a scientific workload also requires commensurate I/O performance. In order to achieve high I/O performance, these systems utilize parallelism in their I/O subsystems, supporting concurrent access to files by multiple nodes of a parallel application and striping files across multiple disks. However, obtaining maximum I/O performance can require significant programming effort. This tutorial presents a snapshot of the state of I/O on highly parallel systems by comparing the well-balanced I/O performance of a traditional vector supercomputer (the Cray Y-MP C90) with the I/O performance of various highly parallel systems (Cray T3D, IBM SP-2, Intel iPSC/860 and Paragon, Kendall Square KSR1, and Thinking Machines CM-5). In addition, the tutorial covers benchmarking techniques for evaluating current parallel I/O systems and techniques for improving parallel I/O performance. Finally, the tutorial presents several high level parallel I/O libraries and shows how they can help application programmers improve I/O performance.
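
To make the idea of striping a file across multiple disks concrete, the Python sketch below maps a byte offset in a file to a disk and an offset on that disk. It assumes a simple round-robin layout with fixed-size stripe units; this layout, the function name locate, and its parameters are assumptions made for illustration and are not taken from any of the systems named above.

```python
def locate(offset, stripe_size, num_disks):
    """Map a file byte offset to (disk index, byte offset on that disk).

    Assumes round-robin striping: stripe unit 0 goes to disk 0, unit 1 to
    disk 1, and so on, wrapping around after num_disks units.
    """
    if offset < 0 or stripe_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; sizes must be positive")
    stripe = offset // stripe_size       # which stripe unit holds the byte
    disk = stripe % num_disks            # disk that stores that unit
    row = stripe // num_disks            # full rounds completed before it
    local = row * stripe_size + offset % stripe_size
    return disk, local


if __name__ == "__main__":
    # Consecutive 4-byte stripe units rotate across 3 disks.
    for off in (0, 5, 13):
        print(off, locate(off, stripe_size=4, num_disks=3))
```

Under this assumed layout, a large sequential read touches all disks in turn, which is why striping can raise aggregate bandwidth when several nodes access one file at once.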

Bill Nitzberg leads the Parallel Systems Development group supporting the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center in Moffett Field, California. Chartered with solving the system software deficiencies of highly parallel machines, the group is focusing on reliability, scheduling, message passing, and parallel I/O.

Samuel A. Fineberg has been working at the NAS facility since 1992 where he has been concentrating on developing support for multidisciplinary and multizonal message passing applications. He co-designed the NAS Parallel I/O benchmark suite and is a member of the Message Passing Interface Forum. Dr. Fineberg received his Ph.D. from the Department of Electrical and Computer Engineering at the University of Iowa.

M12: Compilers: Source/Code-Level Optimizations and Resource Allocation

Constantine D. Polychronopoulos and Alex Nicolau
10/30/60

In this tutorial we shall address specific directions in the development of powerful compilers and operating systems for large-scale multiprocessors, undertaken by industry and universities during the last few years. The tutorial will cover major aspects of parallelizing compiler design and implementation, including data and control dependence analysis, dataflow analysis, source level optimizations and restructuring, backend optimizations, program partitioning and scheduling, code generation, granularity control, parallel thread management and operating system support for parallel threads. Emphasis will be placed on the tradeoffs and relative advantages of various approaches to parallelism exploitation. In the course of this tutorial we shall also discuss issues pertaining to the design of powerful intermediate program representation structures which capture both the hierarchy of computations and the parallelism in a program. Finally we shall review recent progress in control dependence analysis and optimization, partitioning a program into threads, packaging parallelism and carrying out static and dynamic scheduling coupled with dynamic tuning of the granularity of threads.

Constantine D. Polychronopoulos is an Associate Professor of Electrical and Computer Engineering at the University of Illinois and the Center for Supercomputing Research and Development. His research interests are in parallelizing compilers, code optimization, multithreading, and multiprocessor operating systems.

Alex Nicolau is a Professor of Computer Science and Electrical and Computer Engineering at the University of California, Irvine. His research interests are in the areas of fine-grain parallelizing compilers and environments, program transformations, and parallel architectures.

Friday, November 18 - Half Day Morning

F1: An Introduction to Virtual Reality

Henry Sowizral
90/10/0

Visualization technologies play an increasingly crucial role in our understanding as we seek to learn more from ever more complete and complex data. Virtual Reality (VR) is a new visualization technology that allows a scientist to surround herself or himself with data. The scientist can "tour" the data, insert and position telltales into dynamic data, and even manipulate the data. VR allows a scientist to build up an accurate 3D model of the data, just as she or he would examine a complex physical space in the real world, by turning the head left or right, or by moving the head forwards or backwards. This tutorial presents an introduction to VR technology. It examines why VR can be used in scientific visualization, how VR systems work, and how we can begin experimenting with visualization using VR.

Henry A. Sowizral is a computer scientist at Boeing Computer Services. He has a Ph.D. in Computer Science from Yale University. He is a software architect and technical lead on Boeing's immersive VR research effort, which is aimed at developing a VR-based visualization system capable of generating high frame rates (20-30 Hz) when viewing large (on the order of 10 Giga-polygons) CAD-based designs.

Friday, November 18 - Half Day Afternoon

F2: Advanced Issues in Virtual Reality

Henry Sowizral
10/50/40

This tutorial examines the detailed issues that can make or break the Virtual Reality (VR) experience. It examines the eye, how it operates, and what we need to do to present it with appropriate images. It examines methods for compensating for tracker inadequacies. It presents techniques for dealing with scene complexity. It discusses the problems of inserting the human into the synthetic environment. Finally, it concludes with a quick overview of augmented reality -- a mixing of the real world with the synthetic.

F3: Performance Tuning on RISC Systems

Ramesh Agarwal, David H. Bailey, Charles Grassl, Fred Gustavson, Dick Hessel, and Mohammad Zubair
10/80/10

With their burgeoning computational power, memory capacities and disk space, RISC workstations are rapidly becoming the computational platforms of choice for many scientists. Also, the latest generation of highly parallel computers employs RISC processors in their design. On the other hand, the new RISC processors are more complex than before, and obtaining optimal performance with them is not as straightforward as it once was. This tutorial is designed to assist programmers and others in understanding how to tune their applications on these systems. Individual presentations will focus on three state-of-the-art RISC processors: the DEC "Alpha" (which is used in the Cray T3D), the IBM "Power2" (which is used in the IBM RS6000/590 and the SP-2) and the MIPS/SGI R8000 (which also has been known as the "TFP").

Ramesh Agarwal is with the Mathematical Sciences Department of IBM T. J. Watson Research Center. He has done research in many areas of engineering, science, and mathematics and has published over 60 papers. For the last five years his primary research has been in algorithms and architecture for high performance computing on RISC workstations and scalable parallel machines. He has received several prizes and awards. He is a Fellow of IEEE, and a member of the IBM Academy of Technology.

David H. Bailey is with the NAS Applied Research Branch at NASA Ames Research Center. Since his Ph.D. from Stanford in 1976, Bailey's research has ranged from computational number theory to numerical algorithms and supercomputer performance analysis. He is one of the authors of the widely cited NAS Parallel Benchmarks. In 1993 he received the Sid Fernbach award from the IEEE Computer Society for his contributions to the field of high performance computing.

Charles Grassl is a Senior Benchmarking Analyst at Cray Research, Inc., where he has worked since 1984. He is currently involved with benchmarking and performance analysis for the Cray T3D. Familiar with all the standard supercomputer benchmarks, he has been involved in development and planning for the Perfect group, the Parkbench group and the HPC subgroup of SPEC.

Fred Gustavson is manager of Algorithms and Architectures in the Mathematical Sciences Department at IBM Research. He and his group are engaged in exploiting novel features of IBM RISC processors. These include hardware design, novel algorithms for the Power2, and parallel algorithms for the SP-2. Dr. Gustavson has received five IBM Outstanding Awards and two Corporate Technical Recognition Awards.

Dick Hessel is the manager of Performance Engineering at Silicon Graphics. Prior to joining SGI, he was Senior Scientist and Deputy Project Manager of the NASA Ames Central Computing Facility (with Sterling Software), a chief engineer at Supercomputer Systems, Inc., the manager of MCAD applications at Alliant, and a mechanical engineering professor at Clarkson University and the University of Cincinnati.

Mohammad Zubair is with the Mathematical Sciences Department of IBM T. J. Watson Research Center. He has also worked at the Center for Applied Research in Electronics (IIT-Delhi), ICASE, and Old Dominion University. His research interests are in the algorithm and architecture aspects of large scale scientific computing. He has published over 30 papers. He received his Ph.D. in 1987 from the Indian Institute of Technology, New Delhi.

F4: Parallel Programming Tools -- Status, Evaluation and Comparison

Doreen Y. Cheng
20/60/20

NASA Report provided: A Survey of Parallel Programming Languages and Tools

This tutorial will discuss the complexity of developing parallel programs and describe the characteristics of scientific applications and the computing environment in a typical high performance computing center. It will define the user requirements for tools that support application portability and present the difficulties in satisfying them. It will describe the tools for converting sequential programs to parallel programs, tools for developing portable new parallel programs, tools for debugging and performance tuning, tools for partitioning and mapping, and tools for managing networks of resources. It will compare tool usability for real-world application development and different technological approaches. Finally, it will outline the future directions of the tools in each category.

Doreen Y. Cheng received her Ph.D. in Electrical Engineering from Stanford University in 1988. She is currently at NASA Ames Research Center. Together with colleagues, she has evaluated and compared many tools. Currently, she is defining protocols that facilitate debugger portability and developing a prototype for a portable parallel/distributed debugger.

F5: Experiences with Scalable Parallel Architectures at Maui, Argonne and Cornell

Brian T. Smith, Peter M. Siegel, and Tom Morgan
25/50/25

Representatives from three national centers and laboratories -- the Maui High Performance Computing Center (managed by UNM), Cornell Theory Center and Argonne National Laboratory -- will discuss their early experiences with the IBM Scalable Power Parallel architectures. The presentations will cover the following topics:

1. Management experiences in operating high performance computing facilities to meet the research, technology transfer, economic development and education needs that address and support NSF, DOE and DoD HPC initiatives. Ken Cole, Jeffrey Silber, Rick Stevens

2. Overview of the IBM SP-1 and SP-2 architectures, configuration issues, installation, operation and system management experiences including standard benchmark performance characteristics and availability of software tools for system administration. John Sobolewski, Mark Henderson, Jeffrey Silber

3. Survey of available application software for these machines, detailed descriptions of several applications that have been ported, and experiences with porting, with the available parallel development and graphical performance tools, and with application scalability issues. Steve Lantz, Tom Morgan

4. Description of user services, help desk and consulting services provided remotely over the Internet using X-Mosaic and other support tools. Demonstrations of these tools will be provided using a portable PC running X-Mosaic. Margaret A. Williams, Tom Morgan

Rick Stevens is Director of the Mathematics and Computer Science Division at Argonne National Laboratory. He is responsible for the operation and research activities on the Argonne 128-way SP1 processor.

Jeffrey Silber is Director of Administration and Operational Support at the Center for Theory and Simulation in Science and Engineering (Cornell Theory Center) at Cornell University.

Ken Cole is Interim Director of the Maui High Performance Computing Center and formerly the Director of the supercomputer facility at Phillips Laboratory in Albuquerque.

John S. Sobolewski is currently the Associate Vice President for Computing at the University of New Mexico, where he also holds a faculty appointment in Electrical and Computer Engineering and is one of the Principal Investigators responsible for the implementation of the Maui High Performance Computing Center. His research interests include computer architectures, data communication networks, and the implementation and management of information service infrastructures.

Margaret A. Williams is currently the Associate Director for User Services/Research for the Maui High Performance Computing Center, and Senior Computational Mathematician for Numerical Algorithms Group, Inc. She was formerly a manager at IBM responsible for the Engineering and Scientific Subroutine Library. She and her staff recently designed the Mosaic server for the MHPCC.

Mark Henderson is the head of the systems group at the Mathematics and Computer Science Division at Argonne National Laboratory. He is a computer scientist responsible for the operation of the advanced computer research facility at Argonne.

Brian T. Smith is assistant dean of the College of Engineering at the University of New Mexico, Professor of Computer Science, Director of the Albuquerque Resource Center for the High Performance Computing and Education Research Center at UNM, and one of the principal investigators for the implementation of the MHPCC.

Tom Morgan is a computer scientist in the Mathematics and Computer Science Division of Argonne National Laboratory.

Peter M. Siegel is executive director of the Cornell Theory Center at Cornell University.

F6: Methodologies and Tools for Tuning Parallel Programs -- Facts and Fantasies

Jerry Yan
25/50/25

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. The recent introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance tool technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools really detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g., AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

Jerry Yan received his Ph.D. in Electrical Engineering at Stanford University. He currently works at NASA Ames Research Center (as a contractor with RECOM Technologies) and directs a research group working on advanced system software supporting the High Performance Computing and Communications Program. His research interests include parallel processing, performance evaluation and computer architecture.

Friday, November 18 - Full Day

F7: Scientific Visualization: From Data to Photons

Mike Bailey, Chuck Hansen, and Lloyd Treinish
60/30/10

Scientific visualization is an essential part of high performance computing, but it is worthwhile only if done well. The computational science landscape is littered with examples of visualizations gone awry, where the display of the data detracted from the insight instead of enhancing it. This tutorial is aimed at those who need to visualize complex scientific data and want to better understand the process from beginning to end. This tutorial will not focus on using specific software packages, but will instead discuss four key elements that are common to many scientific visualization procedures: representation and management of scientific data; 2D data visualization; 3D data visualization of scalar and vector fields; and visualization displays, colors, and hardcopy. Along the way, we will show many examples.

Mike Bailey is a Senior Staff Visualization Scientist at the San Diego Supercomputer Center. Mike has been involved in computer graphics and scientific visualization for 20 years. His research interests include color science, stereographics, geometric modeling, visualization hardcopy, and the use of the Internet for such innovative purposes as hardcopy, digital libraries, and tele-manufacturing.

Chuck Hansen is project leader for scientific visualization in the Advanced Computing Laboratory (ACL) at Los Alamos National Laboratory. He is responsible for the scientific visualization environment for the DOE High Performance Computing Research Center at the ACL. He has extensive experience in the field of scientific visualization, particularly as it applies to very large scale computational environments. His research interests include scientific visualization, parallel rendering, 3D geometric modeling, and computer vision.

Lloyd Treinish is a Research Staff Member in the Visualization Systems Group at the IBM Thomas J. Watson Research Center. Lloyd works on techniques, architectures and applications of data visualization for a wide variety of scientific disciplines within Visualization Systems, the group that developed the IBM POWER Visualization System and the IBM Data Explorer. His research interests include computer graphics, data storage structures, data representation methodologies, database management, computer user interfaces, data analysis algorithms, planetary astronomy, and climatology.

F8: C++ for High Performance Computing

Ian G. Angus
75/25/0

This tutorial will introduce the C++ programming language and object oriented programming and design techniques and will show how they can be applied to efficiently solve real applications on all classes of supercomputers. It will motivate and describe the features of C++ with examples that capture the essence of many applications of interest to the high performance computing community. It will also present a perspective of how object oriented methods are evolving and how these developments might influence the software environment surrounding supercomputers.

Ian Angus has been involved with the development of efficient object-oriented programming techniques for massively parallel computers since 1987. His research has analyzed the ramifications of these methods in several production applications, for the design of tools, operating systems, distributed object oriented databases, compilers, and the integration of graphic visualization into this framework.

F9: Linear Algebra Algorithms and Software for Large Scientific Problems

Jack Dongarra, Iain Duff, Danny Sorensen and Henk van der Vorst
20/50/30

Books provided: Solving Linear Systems on Vector and Shared Memory Computers, SIAM Publication, 1990; Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM Publication, Philadelphia, 1994.

Present computers, even workstations, allow the solution of very large scale problems in science and engineering. Most often a major part of the computational effort goes into solving linear algebra subproblems. This tutorial will discuss a variety of algorithms for these problems indicating where each is appropriate and emphasizing their efficient implementation. In particular, the development of vector and parallel computers in the late 1970s led to a critical review of mathematical software. Many of the sequential algorithms used satisfactorily on traditional machines fail to exploit the architecture of advanced computers. We briefly review some of the features of modern computer systems and illustrate how the architecture affects the potential performance of linear algebra algorithms. We will consider recent techniques devised to utilize advanced architectures more fully, especially the design of the Level 1, 2, and 3 BLAS. We will highlight the LAPACK package which provides a choice of algorithms mainly for dense matrix problems that are efficient and portable on a variety of high performance computers. For large sparse linear systems, the situation is more complicated and a wide range of algorithms is available. We will give an introduction to this field and guidelines on the selection of appropriate software. We will consider both direct methods and iterative methods of solution, including some recent work that can be viewed as a hybrid of the two. In the case of direct methods, we will emphasize frontal and multifrontal methods including variants performing well on parallel machines. For iterative methods, our discussion will include CG, BiCG, QMR, CGS, BiCGSTAB, GMRES, and LSQR. For large (sparse) eigenproblems we will discuss some of the most widely used methods such as Lanczos, Arnoldi, and Davidson. Efficient implementation of Arnoldi's method and the Implicitly Restarted Arnoldi Method will be discussed, along with guidelines for their usage, preconditioning, and hints for the selection of these algorithms. Finally, we address the challenge facing designers of mathematical software in view of the development of highly parallel computer systems. We shall discuss ScaLAPACK, a project to develop and provide high performance scalable algorithms suitable for highly parallel computers. We will also consider techniques for implementation of sparse matrix algorithms in a highly parallel environment, in particular the solution of sparse linear equations.

Iain S. Duff is currently Group Leader of Numerical Analysis in the Central Computing Department at the Rutherford Appleton Laboratory. He is also the Project Leader for the Parallel Algorithms Group at CERFACS in Toulouse and is a visiting professor at the University of Strathclyde. His main interests are sparse matrices, mathematical software, and parallel computation.

Danny C. Sorensen received a Ph.D. in 1977 from the Department of Mathematics, University of California, San Diego and is now a Professor in the Mathematical Sciences Department of Rice University. His research interests are in numerical analysis and parallel computation. His specialties include numerical linear algebra, use of advanced computer architectures, programming methodology and tools for programming parallel computers, and numerical methods for nonlinear optimization.

Henk A. van der Vorst is a professor in numerical analysis in the Mathematical Department of Utrecht University in the Netherlands. His current research interests include iterative solvers for linear systems, large sparse eigenproblems, overdetermined systems, and the design of algorithms for parallel and vector computers.

F10: Computational Chemistry: Beyond the Black Box

Rozeanne Steckler, Peter Taylor, and Franklin Brown
75/25/0

This tutorial is directed at chemists, biochemists, and other interested scientists who are currently using or interested in using ab initio computational chemistry techniques for modeling chemical systems and who want to learn more about the models and assumptions made in popular computational chemistry software packages. This tutorial is designed to enhance each participant's understanding of the underlying theoretical models, including their assumptions, approximations and applicability. This is a "how-to" tutorial and the emphasis will be on exploring enough of the theoretical background material to better understand the more popular software packages, interpret the results obtained with them more reliably, and guide the researcher in the efficient use of the software and the corresponding hardware. The tutorial will cover Hartree-Fock (HF), Multi-Configuration Self-Consistent Field (MCSCF), Moller-Plesset Perturbation (MP2 and MP4), Coupled Cluster (CC), and Configuration Interaction (CI) theories. The emphasis throughout these discussions will be on developing an understanding of the methods and their accuracy and applicability to problems of general interest. The course will also cover the extension of ab initio calculations to the determination of a potential energy surface and reaction rate constants.

Rozeanne Steckler is a principal scientist at the San Diego Supercomputer Center and an Adjunct Professor at San Diego State University. Her research interests are in the development of direct dynamics methods and their application to atmospheric chemistry. She is also active in the advancement of computational chemistry education at the undergraduate level.

Peter Taylor is the manager of the Science department at the San Diego Supercomputer Center. He has been involved in ab initio quantum chemistry for 17 years. His research specialty is the development of highly accurate quantum mechanical methods and their applications.

Franklin Brown is a chemistry professor at Tallahassee Community College and a faculty associate at the Supercomputer Computations Research Institute at Florida State. He specializes in highly accurate small molecule quantum chemistry and computational chemistry education.