August 23:
Simultaneous Multithreading and the Case for Chip Multiprocessing (John Mellor-Crummey)
Simultaneous multithreading: maximizing on-chip parallelism,
Dean Tullsen, Susan Eggers, and Henry Levy,
In 25 Years of the International Symposia on Computer Architecture
(Selected Papers) (Barcelona, Spain, June 27 - July 02, 1998).
G. S. Sohi, Ed. ISCA '98. ACM Press, New York, NY, 533-544.
(First published in ISCA '95.)
DOI= http://doi.acm.org/10.1145/285930.28601
The case for a single-chip multiprocessor,
Kunle Olukotun, Basem Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang.
In Proceedings of
the Seventh International Conference on Architectural Support for
Programming Languages and Operating Systems (Cambridge, Massachusetts,
United States, October 01 - 04, 1996). ASPLOS-VII. ACM Press, New
York, NY, 2-11. DOI=http://doi.acm.org/10.1145/237090.237140
A single-chip multiprocessor,
Lance Hammond, Basem Nayfeh, Kunle Olukotun.
Computer 30(9):79-85, September
1997. DOI=http://dx.doi.org/10.1109/2.612253
August 28: Fine-grain Multithreading (John Mellor-Crummey)
ELDORADO.
John Feo, David Harper, Simon Kahan, Petr Konecny.
In Proceedings of the 2nd Conference on Computing Frontiers (Ischia,
Italy, May 04 - 06, 2005). CF '05. ACM, New York, NY, 28-34.
IBM POWER7 multicore server processor. Sinharoy, B.; Kalla, R.; Starke, W. J.; Le, H. Q.; Cargnoni, R.; Van Norstrand, J. A.; Ronchetti, B. J.; Stuecheli, J.; Leenstra, J.; Guthrie, G. L.; Nguyen, D. Q.; Blaner, B.; Marino, C. F.; Retter, E.; Williams, P. IBM Journal of Research and Development 55(3), May-June 2011, 1:1-1:29. http://dx.doi.org/10.1147/JRD.2011.2127330
IBM POWER7 performance modeling, verification, and evaluation
Srinivas, M.; Sinharoy, B.; Eickemeyer, R. J.; Raghavan, R.; Kunkel, S.; Chen, T.; Maron, W.; Flemming, D.; Blanchard, A.; Seshadri, P.; Kellington, J. W.; Mericas, A.; Petruski, A. E.; Indukuru, V. R.; Reyes, S.
IBM Journal of Research and Development 55(3), May-June 2011, 4:1-4:19.
September 13: From Multicore to Multisocket (Kumud Bhandari)
Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing,
L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk,
S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese.
In Proceedings of the International Symposium on Computer
Architecture (ISCA),
pp. 282-293, June 2000.
The Implementation of the Cilk-5 Multithreaded Language
by Matteo Frigo, Charles E. Leiserson, and Keith H. Randall.
1998 ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI), Montreal, Canada, June 1998.
Reducers and other Cilk++ hyperobjects.
M. Frigo, P. Halpern, C.E. Leiserson, and S. Lewin-Berlin. In Proceedings
of the Twenty-First Annual Symposium on Parallelism in Algorithms and
Architectures (Calgary, AB, Canada, August 11 - 13, 2009). SPAA
'09. ACM, New York, NY, 79-90.
September 24: Intel's Thread Building Blocks (Ashrith Pillarisetti)
September 27: Implementing Nested Data Parallelism
Implementation of a Portable Nested Data-Parallel Language.
Guy E. Blelloch, Siddhartha Chatterjee,
Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha.
Technical Report CMU-CS-93-112, School of Computer Science,
Carnegie Mellon University, Pittsburgh, PA. 1993.
(An earlier version of this paper appeared in "Proceedings of the
4th ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming", San Diego, May 1993.)
October 2: Shared Memory Consistency Models (Rishi Surendran)
The Java Memory Model, J. Manson, W. Pugh, and S. V. Adve.
In Proceedings of the Symposium on Principles of Programming Languages (POPL), January 2005.
C++ Concurrency Memory Model
Foundations of the C++ Concurrency Memory Model,
H. Boehm and S. V. Adve.
In Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation (Tucson, AZ, USA, June 07 - 13, 2008). PLDI '08. ACM, New York, NY, 68-78. DOI= http://doi.acm.org/10.1145/1375581.1375591
Data Race Detection: Locksets and Happens-before (Shangyu Luo)
Efficient detection of determinacy races in Cilk programs.
M. Feng and C. Leiserson.
In Proceedings of the Ninth Annual ACM
Symposium on Parallel Algorithms and Architectures (Newport, Rhode
Island, United States, June 23 - 25, 1997). SPAA '97. ACM, New York,
NY, 1-11. DOI= http://doi.acm.org/10.1145/258492.258493
Data Race Detection: Integrated Approaches
Detecting data races in Cilk programs that use locks,
G. Cheng, M. Feng, C.E. Leiserson, K. Randall, and A.F. Stark.
In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms
and Architectures (Puerto Vallarta, Mexico, June 28 - July 02,
1998). SPAA '98. ACM, New York, NY,
298-309. DOI=http://doi.acm.org/10.1145/277651.277696
Scalable and precise dynamic datarace detection for
structured parallelism.
Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, and Eran Yahav.
In Proceedings of the 33rd ACM SIGPLAN
conference on Programming Language Design and Implementation (PLDI
'12). ACM, New York, NY, USA, 531-542. 10.1145/2254064.2254127
Provably efficient scheduling for languages with fine-grained parallelism.
Blelloch, G. E., Gibbons, P. B., and Matias, Y. 1995.
In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms
and Architectures (Santa Barbara, California, United States, June 24 -
26, 1995). SPAA '95. ACM Press, New York, NY, 1-12.
Effectively sharing a cache among threads,
Guy E. Blelloch and Phillip B. Gibbons.
In Proceedings of the 16th
Annual ACM Symposium on Parallelism in Algorithms and Architectures
(Barcelona, Spain, June 27 - 30, 2004). SPAA '04. ACM Press, New York,
NY, 235-244.
Transactional memory. J. Larus and C. Kozyrakis, Communications of
the ACM 51, 7 (Jul. 2008), 80-88.
Transactional
Memory: Architectural Support for Lock-free Data Structures,
Maurice Herlihy and J. Eliot B. Moss. In Proceedings of the 20th Annual
International Symposium on Computer Architecture, San Diego,
California, 1993, ACM Press, New York, NY, USA, 289-300.
ISCA most influential paper award, 2008.
Software Transactional Memory (Danny Abad)
Software transactional memory for dynamic-sized data
structures,
Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer,
III. In Proceedings of the Twenty-Second Annual Symposium on
Principles of Distributed Computing (Boston, Massachusetts, July 13 -
16, 2003). PODC '03. ACM Press, New York, NY, 92-101.
Understanding Tradeoffs in Software Transactional Memory, Dice, D. and Shavit, N. 2007.
In Proceedings of the International Symposium on
Code Generation and Optimization (March 11 - 14, 2007). Code
Generation and Optimization. IEEE Computer Society, Washington, DC,
21-33.
Early experience
with a commercial hardware transactional memory implementation.
D. Dice, Y. Lev, M. Moir, and D. Nussbaum.
Proceedings of the 14th International Conference on Architectural
Support For Programming Languages and Operating Systems (Washington,
DC, USA, March 07 - 11, 2009). ASPLOS '09. ACM, New York, NY,
157-168.
Blue Gene/Q Compute Chip
The IBM Blue Gene/Q Compute Chip.
Ruud Haring, Martin Ohmacht, Thomas Fox, Michael Gschwind, David
Satterfield, Krishnan Sugavanam, Paul Coteus, Philip Heidelberger,
Matthias Blumrich, Robert Wisniewski, Alan Gara, George Chiu, Peter
Boyle, Norman Christ, and Changhoan Kim.
IEEE Micro 32, 2 (March 2012), 48-60. 10.1109/MM.2011.108
Evaluation of Blue Gene/Q hardware support for transactional
memories.
Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht,
Christopher Barton, Raul Silvera, and Maged Michael.
In Proceedings of the 21st international conference on Parallel
architectures and compilation techniques (PACT '12). ACM, New York,
NY, USA, 127-136, 2012.
Effective performance measurement and analysis of multithreaded
applications.
Tallent, N. R. and Mellor-Crummey, J. M.
In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Raleigh, NC, USA, February 14 - 18, 2009). PPoPP '09. ACM, New York, NY, 229-240. DOI= http://doi.acm.org/10.1145/1504176.1504210
Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data Races. Brandon Lucia, Luis Ceze, Karin Strauss, Shaz Qadeer,
Hans-J. Boehm.
International Symposium on Computer Architecture, Saint-Malo, France, June 2010.
Single-chip Cloud Computer: An experimental many-core processor from
Intel Labs
Download
January slides,
Download March slides. (Note: There is
substantial overlap between the February and
March slide sets, but they are not identical.)