Data cache locking for tight timing calculations

Transactions on Embedded Computing Systems - Tập 7 Số 1 - Trang 1-38 - 2007
Xavier Vera1, Björn Lisper1, Jingling Xue2
1Mälardalens högskola, Västerås, Sweden
2University of New South Wales, Sydney, Australia

Tóm tắt

Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality. In addition, caches are a source of unpredictability, resulting in programs sometimes behaving in a different way than expected. Detailed information about the number of cache misses and their causes allows us to predict cache behavior and to detect bottlenecks. Small modifications in the source code may change memory patterns, thereby altering the cache behavior. Code transformations, which take the cache behavior into account, might result in a high cache performance improvement. However, cache memory behavior is very hard to predict, thus making the task of optimizing and timing cache behavior very difficult. This article proposes and evaluates a new compiler framework that times cache behavior for multitasking systems. Our method explores the use of cache partitioning and dynamic cache locking to provide worst-case performance estimates in a safe and tight way for multitasking systems. We use cache partitioning, which divides the cache among tasks to eliminate intertask cache interferences. We combine static cache analysis and cache-locking mechanisms to ensure that all intratask conflicts, and consequently, memory access times, are exactly predictable. The results of our experiments demonstrate the capability of our framework to describe cache behavior at compile time. We compare our timing approach with a system equipped with a nonpartitioned, but statically, locked data cache. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, demonstrating that our fully predictable scheme does not compromise the performance of the transformed programs.

Từ khóa


Tài liệu tham khảo

Alt , M. , Ferdinand , C. , Martin , F. , and Wilhelm , R . 1996. Cache behaviour prediction by abstract interpretation . In Proceedings of Static Analysis Symposium (SAS'96) . Lecture Notes in Computer Science (LNCS) 1145. Springer-Verlag, New York, 52--66. Alt, M., Ferdinand, C., Martin, F., and Wilhelm, R. 1996. Cache behaviour prediction by abstract interpretation. In Proceedings of Static Analysis Symposium (SAS'96). Lecture Notes in Computer Science (LNCS) 1145. Springer-Verlag, New York, 52--66.

Arnold , R. , Müeller , F. , Whalley , D. , and Harmon , M . 1994. Bounding worst-case instruction cache performance . In Proceedings of 15th Real-Time Systems Symposium (RTSS'94) . 172--181. Arnold, R., Müeller, F., Whalley, D., and Harmon, M. 1994. Bounding worst-case instruction cache performance. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 172--181.

Basumallick , S. and Nielsen , K . 1994. Cache issues in real-time systems . In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'94) . Basumallick, S. and Nielsen, K. 1994. Cache issues in real-time systems. In Proceedings ACM Workshop on Languages, Compilers and Tools for Real-Time Systems (LCTES'94).

Burns , A. and Wellings , A . 1993. The impact of an Ada run-time system's performance characteristics on scheduling models . In Proceedings of 12th Ada-Europe International Conference. 240--248 . Burns, A. and Wellings, A. 1993. The impact of an Ada run-time system's performance characteristics on scheduling models. In Proceedings of 12th Ada-Europe International Conference. 240--248.

10.1109/32.387477

Busquets-Mataix , J. V. , Serrano , J. J., . Ors , R. , Gil , P. , and Wellings , A . 1996. Adding instruction cache effect to schedulability analysis of preemptive real-time systems . In Proceedings of 2nd Real-Time Technology and Applications Symposium (RTAS'96) . Busquets-Mataix, J. V., Serrano, J. J., .Ors, R., Gil, P., and Wellings, A. 1996. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Proceedings of 2nd Real-Time Technology and Applications Symposium (RTAS'96).

Busquets-Mataix , J. V. , Serrano , J. J. , and Wellings , A . 1997. Hybrid instruction cache partitioning for preemptive real-time systems . In Proceedings of 9th Euromicro Workshop on Real-Time Systems (EUROMICRO-RTS'97) . Busquets-Mataix, J. V., Serrano, J. J., and Wellings, A. 1997. Hybrid instruction cache partitioning for preemptive real-time systems. In Proceedings of 9th Euromicro Workshop on Real-Time Systems (EUROMICRO-RTS'97).

Campoy , M. , Ivars , A. P. , and Busquets-Mataix , J. V . 2001. Static use of locking caches in multitask preemptive real-time systems . In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium). Campoy, M., Ivars, A. P., and Busquets-Mataix, J. V. 2001. Static use of locking caches in multitask preemptive real-time systems. In Proceedings of IEEE/IEE Real-Time Embedded Systems Workshop (Satellite of the IEEE Real-Time Systems Symposium).

Engblom , J. and Ermedahl , A . 1999. Pipeline timing analysis using a trace-driven simulator . In Proceedings of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99) . Engblom, J. and Ermedahl, A. 1999. Pipeline timing analysis using a trace-driven simulator. In Proceedings of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99).

10.5555/1890629.1890651

Ermedahl , A. and Gustafsson , J . 1997. Deriving annotations for tight calculation of execution time . In Proceedings of Euro-Par (EUROPAR'97) . 1298--1307. Ermedahl, A. and Gustafsson, J. 1997. Deriving annotations for tight calculation of execution time. In Proceedings of Euro-Par (EUROPAR'97). 1298--1307.

Feautrier , P. 1996. Automatic parallelization in the polytope model . In The Data Parallel Programming Model, G. R. Perrin and A. Darte, Eds. Lecture Notes in Computer Science 1132 . Springer Verlag , New York , 79--103. Feautrier, P. 1996. Automatic parallelization in the polytope model. In The Data Parallel Programming Model, G. R. Perrin and A. Darte, Eds. Lecture Notes in Computer Science 1132. Springer Verlag, New York, 79--103.

10.1023/A:1008186323068

10.1016/0743-7315(88)90014-7

10.1145/325478.325479

Healey , C. A. , Whalley , D. , and Harmon , M . 1995. Integrating the timing analysis of pipelining and instruction caching . In Proceedings of 16th Real-Time Systems Symposium (RTSS'95) . 288--297. Healey, C. A., Whalley, D., and Harmon, M. 1995. Integrating the timing analysis of pipelining and instruction caching. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 288--297.

Hennessy J. L. and Patterson D. A. 1996. Computer architecture: a quantitative approach. Morgan Kaufman Pub. San Francisco CA. Hennessy J. L. and Patterson D. A. 1996. Computer architecture: a quantitative approach. Morgan Kaufman Pub. San Francisco CA.

IBM Microelectronics Division. 1999. The PowerPC 440 core. IBM Microelectronics Division. 1999. The PowerPC 440 core.

Integrated Device Technologies. 2001. 79RC64574/RC64575 Data Sheet. Integrated Device Technologies. 2001. 79RC64574/RC64575 Data Sheet.

Jeffay , K. and Stone , D. L . 1993. Accounting for interrupt handling costs in dynamic priority task systems . In Proceedings of 14th Real-Time Systems Symposium (RTSS'93) . 212--221. Jeffay, K. and Stone, D. L. 1993. Accounting for interrupt handling costs in dynamic priority task systems. In Proceedings of 14th Real-Time Systems Symposium (RTSS'93). 212--221.

10.1093/comjnl/29.5.390

10.1109/32.241774

Kim , S. K. , Min , S. L. , and Ha , R . 1996. Efficient worst case timing analysis of data caching . In Proceedings of IEEE Real-Time Technology and Applications Symposium (RTAS'96) . Kim, S. K., Min, S. L., and Ha, R. 1996. Efficient worst case timing analysis of data caching. In Proceedings of IEEE Real-Time Technology and Applications Symposium (RTAS'96).

10.1109/REAL.1989.63574

10.1145/106972.106981

10.1109/12.689649

Li , Y. T. S. , Malik , S. , and Wolfe , A . 1995. Efficient microarchitecture modeling and path analysis for real-time software . In Proceedings of 16th Real-Time Systems Symposium (RTSS'95) . 298--307. Li, Y. T. S., Malik, S., and Wolfe, A. 1995. Efficient microarchitecture modeling and path analysis for real-time software. In Proceedings of 16th Real-Time Systems Symposium (RTSS'95). 298--307.

Li , Y. T. S. , Malik , S. , and Wolfe , A . 1996. Cache modeling and path analysis for real-time software . In Proceedings of 17th Real-Time Systems Symposium (RTSS'96) . Li, Y. T. S., Malik, S., and Wolfe, A. 1996. Cache modeling and path analysis for real-time software. In Proceedings of 17th Real-Time Systems Symposium (RTSS'96).

Liedtke , J. , Härtig , H. , and Hohmuth , M . 1997. OS-controlled cache predictability for real-time systems . In Proceedings of 3rd IEEE Real-Time Technology and Applications Symposium (RTAS'97) . Liedtke, J., Härtig, H., and Hohmuth, M. 1997. OS-controlled cache predictability for real-time systems. In Proceedings of 3rd IEEE Real-Time Technology and Applications Symposium (RTAS'97).

Lim , S. S. , Bae , Y. H. , Jang , G. T. , Rhee , B. D. , Min , S. L. , Park , C. Y. , Shin , H. , Park , K. , and Kim , C. S . 1994. An accurate worst case timing analysis technique for RISC processors . In Proceedings of 15th Real-Time Systems Symposium (RTSS'94) . 97--108. Lim, S. S., Bae, Y. H., Jang, G. T., Rhee, B. D., Min, S. L., Park, C. Y., Shin, H., Park, K., and Kim, C. S. 1994. An accurate worst case timing analysis technique for RISC processors. In Proceedings of 15th Real-Time Systems Symposium (RTSS'94). 97--108.

Lundqvist , T. and Stenström , P . 1998. Integrating path and timing analysis using instruction-level simulation techniqes . In Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98) . 1--15. Lundqvist, T. and Stenström, P. 1998. Integrating path and timing analysis using instruction-level simulation techniqes. In Proceedings of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98). 1--15.

Lundqvist , T. and Stenström , P . 1999a. A method to improve the estimated worst-case performance of data caching . In Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99) . 255--262. Lundqvist, T. and Stenström, P. 1999a. A method to improve the estimated worst-case performance of data caching. In Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99). 255--262.

Lundqvist , T. and Stenström , P . 1999b. Timing anomalies in dynamically scheduled microprocessors . In Proceedings of 20th Real-Time Systems Symposium (RTSS'99) . Lundqvist, T. and Stenström, P. 1999b. Timing anomalies in dynamically scheduled microprocessors. In Proceedings of 20th Real-Time Systems Symposium (RTSS'99).

MIPS Technologies. 2001. MIPS32 4Kp- Embedded MIPS Processor Core. MIPS Technologies. 2001. MIPS32 4Kp- Embedded MIPS Processor Core.

Motorola Inc. 1996. PowerPC 604e RISC Microprocessor Technical Summary. Motorola Inc. 1996. PowerPC 604e RISC Microprocessor Technical Summary.

10.1145/216636.216677

Puaut , I. and Decotigny , D . 2002. Low-complexity algorithms for static cache locking in multitasking hard real-time systems . In Proceedings of 23th Real-Time Systems Symposium (RTSS'02) . Puaut, I. and Decotigny, D. 2002. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In Proceedings of 23th Real-Time Systems Symposium (RTSS'02).

10.1145/277650.277661

Sánchez , F. , González , A. , and Valero , M . 1997. Static locality analysis for cache management . In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'97) . Sánchez, F., González, A., and Valero, M. 1997. Static locality analysis for cache management. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT'97).

Sun Microelectronics. 1997. microSPARC-IIep User's Manual. Sun Microelectronics. 1997. microSPARC-IIep User's Manual.

10.1007/BF01088593

Vera , X. and Xue , J . 2002. Let's study whole program cache behaviour analytically . In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8) . Cambridge. Vera, X. and Xue, J. 2002. Let's study whole program cache behaviour analytically. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA 8). Cambridge.

Vera , X. , Abella , J. , González , A. , and Llosa , J . 2003a. Optimizing program locality through CMEs and GAs . In Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03) . New Orleans. Vera, X., Abella, J., González, A., and Llosa, J. 2003a. Optimizing program locality through CMEs and GAs. In Proceedings of 12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03). New Orleans.

10.1145/781027.781062

Vera , X. , Lisper , B. , and Xue , J . 2003c. Data caches in multitasking hard real-time systems . In Proceedings of International Real-Time Systems Symposium (RTSS03) . Vera, X., Lisper, B., and Xue, J. 2003c. Data caches in multitasking hard real-time systems. In Proceedings of International Real-Time Systems Symposium (RTSS03).

10.1145/973097.973099

White , R. T. , Müeller , F. , Healy , C. , Whalley , D. , and Harmon , M . 1997. Timing analysis for data caches and set-associative caches . In Proceedings of Third IEEE Real-Time Technology and Applications Symposium (RTAS'97) . 192--202. White, R. T., Müeller, F., Healy, C., Whalley, D., and Harmon, M. 1997. Timing analysis for data caches and set-associative caches. In Proceedings of Third IEEE Real-Time Technology and Applications Symposium (RTAS'97). 192--202.

10.1145/113445.113449

Wolfe , A. 1993 . Software-based cache partitioning for real-time applications . In Proceedings of the 3rd International Workshop on Responsive Computer Systems. Wolfe, A. 1993. Software-based cache partitioning for real-time applications. In Proceedings of the 3rd International Workshop on Responsive Computer Systems.

Xue , J. 2000. Loop Tiling for Parallelism . Kluwer Academic Publ ., Boston, MA. Xue, J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publ., Boston, MA.

10.1023/A:1018734612524