KEYWORDS: Data processing, Data integration, Data communications, Signal processing, Switches, Energy efficiency, Rutherfordium, Telecommunications, Video coding, Computer architecture
Hardware accelerators are used to speed up the execution of specific tasks such as video coding. Often the purpose
of hardware acceleration is to allow a cheaper or, for example, more energy-efficient processor to
execute the majority of the application in software. However, hardware acceleration introduces new overheads,
mainly from transferring data to and from the accelerator and from signaling the completion
of the accelerator computation to the processor. We find the traditional mechanisms suboptimal for fine-grain
hardware acceleration, especially when energy efficiency is important.
This paper explores a technique unique to Transport Triggered Architectures for interfacing with hardware
accelerators. The proposed technique places hardware accelerators in the processor data path, making them
visible to the programmer as regular function units. This reduces communication costs, because data can
be transferred directly to the accelerator from other processor data path components, and synchronization can
be done by polling a simple ready flag in the accelerator function unit. Additionally, this setup enables the
instruction scheduler of the compiler to schedule the hardware accelerator like any other operation, thus partially
hiding its latency behind other program operations.
The paper presents a case study with an audio decoder application in which fine-grain and coarse-grain
hardware accelerators are integrated into the processor data path as function units. The case is used to study
several different synchronization, communication, and latency-hiding techniques enabled by this kind of setup.
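
The ready-flag polling and latency hiding described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the paper's implementation: the class `AcceleratorFU`, the helper `run`, the cycle-per-operation timing, and the squaring workload are all hypothetical names and simplifications introduced for illustration.

```python
class AcceleratorFU:
    """Hypothetical behavioural model of an accelerator exposed as a function unit."""

    def __init__(self, latency_cycles, func):
        self.latency_cycles = latency_cycles  # assumed fixed latency in cycles
        self.func = func                      # stand-in for the accelerated computation
        self.ready = True                     # the flag the program polls
        self._remaining = 0
        self._operand = None
        self.result = None

    def trigger(self, operand):
        """Moving an operand to the unit starts the computation."""
        self._operand = operand
        self._remaining = self.latency_cycles
        self.ready = False

    def tick(self):
        """Advance the model by one clock cycle."""
        if not self.ready:
            self._remaining -= 1
            if self._remaining <= 0:
                self.result = self.func(self._operand)
                self.ready = True


def run(fu, operand, independent_ops):
    """Trigger the accelerator, overlap independent work, then poll the flag.

    Assumes each independent operation takes exactly one cycle.
    Returns (result, cycles spent busy-waiting on the ready flag).
    """
    fu.trigger(operand)
    for op in independent_ops:  # work a scheduler could place in the latency window
        op()
        fu.tick()
    stall_cycles = 0
    while not fu.ready:         # synchronization by polling the ready flag
        fu.tick()
        stall_cycles += 1
    return fu.result, stall_cycles
```

In this model, the more independent operations are available to fill the accelerator's latency, the fewer cycles are spent polling; with at least as many independent operations as latency cycles, the polling loop exits immediately.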
KEYWORDS: Binary data, Multimedia, Video, Data modeling, Mobile devices, Digital signal processing, Video processing, Computer programming, Video coding, Multiplexers
Video coding standards, such as MPEG-4, H.264, and VC1, define hybrid transform-based, block motion-compensated techniques that employ almost the same coding tools. This observation has been the foundation for defining the MPEG Reconfigurable Multimedia Coding framework, which aims to facilitate multi-format codec design. The idea is to send a description of the codec with the bit stream and to reconfigure the coding tools accordingly on the fly. This kind of approach favors software solutions and is a substantial challenge for implementers of mobile multimedia devices that aim at high energy efficiency. In particular, as high-definition formats are about to be required from mobile multimedia devices, variable length decoders are becoming a serious bottleneck. Even at current moderate mobile video bitrates, software-based variable length decoders consume a major portion of the resources of a mobile processor. In this paper we present a Transport Triggered Architecture (TTA) based programmable implementation for Context Adaptive Binary Arithmetic de-Coding (CABAC) that is used e.g. in the main profile of H.264 and in JPEG2000. The solution can also be used for other variable length codes.
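
To give a feel for why adaptive binary arithmetic decoding is hard to speed up in software, the following toy sketch shows the serial dependency at its core: each decoded bit updates both the interval and the probability model that the next bit depends on. This is not the H.264 CABAC algorithm (which uses table-driven probability states, multiple contexts, and fixed-width renormalization) and not the paper's TTA implementation; it is a single-context adaptive coder using exact fractions, and the names `AdaptiveBitModel`, `encode`, and `decode` are hypothetical.

```python
import math
from fractions import Fraction


class AdaptiveBitModel:
    """Toy adaptive probability estimate for one context (simple bit counts)."""

    def __init__(self):
        self.counts = [1, 1]

    def p_zero(self):
        return Fraction(self.counts[0], self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1


def encode(bits):
    """Narrow [low, low + width) once per bit; return (k, n) with k / 2**n inside it."""
    low, width = Fraction(0), Fraction(1)
    model = AdaptiveBitModel()
    for bit in bits:
        split = width * model.p_zero()
        if bit == 0:
            width = split
        else:
            low += split
            width -= split
        model.update(bit)
    n = 0
    while True:  # shortest binary fraction that lies inside the final interval
        k = math.ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            return k, n
        n += 1


def decode(k, n, num_bits):
    """Recover the bits; every step depends on the state left by the previous one."""
    value = Fraction(k, 2**n)
    low, width = Fraction(0), Fraction(1)
    model = AdaptiveBitModel()
    out = []
    for _ in range(num_bits):
        split = width * model.p_zero()
        if value < low + split:
            bit = 0
            width = split
        else:
            bit = 1
            low += split
            width -= split
        model.update(bit)  # the next decision needs this updated model
        out.append(bit)
    return out
```

Because the comparison for bit i uses the interval and model produced by bit i-1, the loop in `decode` cannot be split across bits, which is one reason such decoders are a natural target for special function units.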
Application-specific programmable processors tailored for the requirements at hand are often at the center of
today's embedded systems. Therefore, it is not surprising that considerable effort has been spent on constructing
tools that assist in codesigning application-specific processors for embedded systems. It is desirable that such
design toolsets support an automated design flow from application source code down to synthesizable processor
description and optimized machine code. In this paper, such a toolset is described. The toolset is based on a
customizable processor architecture template, which is a VLIW-derived architecture paradigm called Transport
Triggered Architecture (TTA). The toolset addresses some of the pressing shortcomings found in existing toolsets,
such as the lack of automated exploration of the "design space", limited run-time retargetability of the design tools,
or restrictions on the customization of the target processors.
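
As a rough illustration of what automated exploration of the "design space" can mean, the sketch below enumerates candidate processor configurations, scores each with a cost model, and keeps the Pareto-optimal ones. Everything here is assumed for illustration: the function names `estimate`, `pareto_front`, and `explore`, the two parameters (bus count and ALU count), and the cost formulas are invented and do not come from the toolset described above.

```python
from itertools import product


def estimate(num_buses, num_alus, work_ops=1000):
    """Hypothetical cost model returning (cycles, area); not a real TTA estimator."""
    parallelism = min(num_buses, 2 * num_alus)  # assumed: transports limit throughput
    cycles = -(-work_ops // parallelism)        # ceiling division
    area = 3 * num_buses + 10 * num_alus        # assumed arbitrary area units
    return cycles, area


def pareto_front(points):
    """Return configurations not dominated in both cycle count and area."""
    front = []
    for cfg, (cycles, area) in points.items():
        dominated = any(
            c2 <= cycles and a2 <= area and (c2 < cycles or a2 < area)
            for c2, a2 in points.values()
        )
        if not dominated:
            front.append(cfg)
    return sorted(front)


def explore(bus_options, alu_options):
    """Evaluate every (buses, ALUs) combination and keep the Pareto-optimal ones."""
    points = {
        (buses, alus): estimate(buses, alus)
        for buses, alus in product(bus_options, alu_options)
    }
    return points, pareto_front(points)
```

A real explorer would replace `estimate` with compilation and simulation of the application on each candidate, which is where retargetable tools become necessary.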