Category Archives: News

Visitor from Japan

Last weekend we had a visitor from Gifu University, Dr. Satoshi Tamura.

Satoshi has been working in the following areas: speech signal processing, computer vision and image processing, music information processing, and natural language (text) processing. He has also been investigating multimodal information processing, such as audio-visual speech recognition, voice activity detection, speech conversion and model adaptation using speech signals as well as lip images, and he pursues cross-modal research such as applying speech technologies to other areas. Through these activities, he aims to improve the performance of individual pattern recognition tasks, e.g. speech recognition, and to explore a universal recognition algorithm that can be applied across many pattern recognition areas. In addition, he has been collaborating with doctors and researchers in the school of medicine.

Satoshi will work with us on our medical projects, such as live colonoscopy video analysis, and he will also support our collaboration and communication with the Hiroshima University Hospital.




Building a Supercomputer with Tegra K1s and Dolphin PCI-E Interconnects

The Tegra K1 is a highly advanced SoC featuring a dual-cluster CPU (a quad-core high-performance cluster plus one low-power core) as well as a 192-core, Kepler-based, CUDA-compatible NVIDIA GPU. For a mobile processor it is therefore impressive in terms of both performance and power usage, capable of delivering some 365 GFLOPS at below 10 W. As part of an ongoing project on saving energy in video filter processing, we have built a cluster of four Tegra K1s (on Jetson-TK1 development kits) connected with a Dolphin PCI Express interconnect. The goal is to save energy by distributing a workload among several heterogeneous cores, thereby avoiding unnecessary stress on hardware components at their peak utilisation levels.


The picture above shows the four Tegra K1s (Yin, Yang, Anakin and Darth) on handcrafted mounting boards made from acrylic plastic, an electric drill and a lot of patience. The acrylic boards are necessary because the PCI-E cards need to be stable during operation, and the Jetson-TK1 motherboards do not really have a solid connector for them. Note to NVIDIA: please ship hardware for this purpose with the Jetson-TK1 in the future 😉 In the background of the picture you can also see the Dolphin IXS600 switching box. The switch is necessary to connect more than two Tegra K1s; otherwise, only two IXH610 PCI-E cards and a direct cable connection between them are needed. The maximum raw throughput is 500 MB per second, as the Tegra K1 root complex provides a single lane to the PCI-E card, both of which are PCI-E gen 2.0 compatible.


In effect, the four Tegra K1s provide 16 high-performance CPU cores, 4 low-power cores, almost 800 GPU cores and a total of 8 GB of RAM, for a theoretical maximum of 1.46 TFLOPS at 25-30 W. Note that this power estimate includes many idle components on the Jetson-TK1 boards themselves, and that the actual power usage of the GPUs is smaller. Our estimates indicate that the GPUs draw only up to 12 W, but their actual power usage is hard to measure due to a lack of sensors on the Jetson-TK1.
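The aggregate figures above follow directly from the per-chip numbers; a quick sanity check (per-SoC numbers taken from this post):

```python
# Aggregate resources of the four-node Tegra K1 cluster (numbers from this post).
NODES = 4
CPU_FAST, CPU_LP = 4, 1      # per-SoC high-performance and low-power cores
GPU_CORES = 192              # Kepler CUDA cores per SoC
GFLOPS = 365                 # peak compute per SoC
RAM_GB = 2                   # per Jetson-TK1 board

print(NODES * CPU_FAST, "fast cores")    # 16 fast cores
print(NODES * CPU_LP, "low-power cores") # 4 low-power cores
print(NODES * GPU_CORES, "GPU cores")    # 768 GPU cores ("almost 800")
print(NODES * GFLOPS / 1000, "TFLOPS")   # 1.46 TFLOPS
print(NODES * RAM_GB, "GB RAM")          # 8 GB RAM
```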

From our research we have found that, in general, energy can be saved when processing video by reducing the performance of the Tegra K1 (for example by lowering clock frequencies) without compromising quality-of-service requirements such as a specific framerate. In our clustered Tegra experiments we will test the hypothesis that power can be saved by distributing the workload over several heterogeneous cores, thereby reducing the stress on individual hardware components by running them at lower utilisation levels.
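The frequency-scaling idea can be sketched in a few lines. This is a hypothetical illustration, not our actual tooling: it assumes, for simplicity, that achievable framerate scales linearly with clock frequency, and the frequency list is made up.

```python
def lowest_sufficient_freq(freqs_mhz, fps_at_max, target_fps=25):
    """Pick the lowest clock frequency that still meets the framerate target.

    Assumes (purely for illustration) that achievable framerate scales
    linearly with clock frequency, which only roughly holds in practice.
    """
    f_max = max(freqs_mhz)
    for f in sorted(freqs_mhz):
        if fps_at_max * f / f_max >= target_fps:
            return f
    return f_max

# Illustrative set of available CPU frequencies (MHz), not a real OPP table.
freqs = [204, 564, 960, 1428, 2320]
print(lowest_sufficient_freq(freqs, fps_at_max=60))  # 1428
```

Everything above 1428 MHz here is wasted headroom once the 25 FPS requirement is met, which is exactly the slack the approach exploits.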

Presentation Accepted for GTC 2016

Kristoffer Robin Stokke will give a 50-minute presentation on high-precision power modelling of the Tegra K1 SoC’s CPU, GPU and memory at NVIDIA’s GPU Technology Conference (GTC) in April this year. His talk will focus on how the power usage of heterogeneous processing elements can be modelled generically using information about operating frequencies, hardware utilisation, rail voltages and power optimisation techniques such as clock and core gating. A detailed description of the presentation, as well as the time, place and stream, can be found here. His 2015 poster on energy-efficient video encoding using various frequency scaling algorithms and naive and optimised software implementations can be found here.
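To give a flavour of the kind of model the talk covers: dynamic CMOS power in a clock domain is classically modelled as utilisation times switched capacitance times frequency times voltage squared. The sketch below shows only this generic textbook form; the coefficients are made up and are not measured Tegra K1 values.

```python
def dynamic_power_w(rails):
    """Dynamic (switching) power over a set of clock domains.

    Classic CMOS model: P = sum over domains of u * C * f * V^2, where
    u is utilisation [0..1], C an effective switched capacitance (F),
    f the clock frequency (Hz) and V the rail voltage (V).
    """
    return sum(u * c * f * v * v for (u, c, f, v) in rails)

# (utilisation, capacitance, frequency, voltage) per domain -- made-up numbers.
cpu = (0.80, 1.2e-9, 2.0e9, 1.1)
gpu = (0.60, 2.5e-9, 850e6, 1.0)
print(round(dynamic_power_w([cpu, gpu]), 2))  # 3.6
```

The model form makes the levers visible: lowering frequency also allows a lower rail voltage, and the quadratic voltage term is why frequency scaling pays off.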

Unboxing the Shield TV

We recently got our hands on a brand new NVIDIA Shield TV, the world’s first Android-driven 4K gaming and entertainment console for home and office use. The platform features the Tegra X1 System-on-Chip (SoC), the successor to the Tegra K1 SoC, which was especially popular with tinkerers and hobbyists due to the Jetson-TK1 development kit. The Tegra X1 is basically a heavily upgraded version of the Tegra K1 with roughly twice as much hardware power as its predecessor, featuring two quad-core CPU clusters (4x ARM Cortex-A53 + 4x ARM Cortex-A57) as well as a CUDA-capable, 256-core Maxwell GPU (GM20A). Of course, we’re all very excited to see what’s on the inside of this box, so we decided to strip it down.


The Shield doesn’t have any obvious screws, nuts or bolts. Instead, the top cover is mounted with plastic clips which must be opened using a small screwdriver. With a little care, it is possible to do so without damaging the plastic:


With the top cover removed we can see all the hardware components. The black cover is a placeholder for the hard drive, which is only available in the expensive version. There are three main circuit boards: the lower one is a peripheral board with expansion headers and various interfaces such as USB, LAN and main power. There is also an unsoldered SATA port; a challenge for the regular hacker would therefore be to solder on a port and connect their own hard drive. The smaller (middle) board is the main board with the Tegra X1, RAM and all core components. It is quite small, which is perhaps not that surprising given that the X1 is a mobile SoC. The last circuit board, on the “top” of the package, is a simple board integrating a WiFi antenna and a power-on button.


With fan, I/O and main circuit boards removed:


Close-up of I/O and main circuit boards:


The black wires are the main power connectors. Surprisingly, NVIDIA decided to include INA3221 rail power measurement sensors on the Shield TV. Unfortunately these are not configured for reading power usage directly, so some development in the kernel sources is needed if this functionality is desired. The Tegra X1 (silver) can be seen on the main board, connected to the I/O board over PCI Express using the flat connector. Here are some close-ups of the main board’s top and underside:


In the above photo, the components on the right are mainly related to power supply and management, the connector on the left is for the WiFi+BT module, and the two upper chips are the RAM chips (3 GB LPDDR). On the underside, the only major component is the 16 GB eMMC (NAND) used for storage:


All in all, the Shield is quite easy to take apart and put back together again in working state.



The problem of overbuffering in today’s Internet (termed bufferbloat) has recently drawn a great amount of attention from the research community. This has led to the development of various active queue management (AQM) schemes. The last years have seen a lot of effort to show the benefits of AQMs over simple tail-drop queuing and to encourage deployment. Yet it is still unknown to what extent AQMs are deployed in the Internet. We have developed an active measurement tool, called TADA (Tool for Automatic Detection of AQMs), that can detect if the bottleneck router on a particular communication path uses AQM. Our detection technique is based on analysing the patterns of queueing delays and packet losses. The tool is composed of a Sender process running at the sender machine and a Receiver process running at the receiver machine. It uses UDP for sending constant-bit-rate (probing) streams and TCP for a control channel between the endpoints.
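The core intuition behind the delay/loss analysis can be sketched as follows. This toy classifier is a hypothetical simplification in the spirit of TADA, not the tool’s actual algorithm: a tail-drop queue only drops packets when full, so losses cluster near the maximum observed queueing delay, while an AQM drops earlier, at lower delays.

```python
def classify_bottleneck(delay_ms, lost, max_delay_frac=0.9):
    """Toy bottleneck-queue classifier (illustrative, not TADA itself).

    delay_ms: per-packet queueing delays of a constant-bit-rate probe stream
    lost:     parallel list of booleans marking which probes were lost
    """
    peak = max(delay_ms)
    loss_delays = [d for d, was_lost in zip(delay_ms, lost) if was_lost]
    if not loss_delays:
        return "no-loss"
    # Tail-drop: every loss happens while the queue is (nearly) full.
    if all(d >= max_delay_frac * peak for d in loss_delays):
        return "tail-drop"
    # AQM: at least one loss occurred well below the maximum queue delay.
    return "aqm"

# A loss while the queue is only ~60% full suggests active queue management.
delays = [10, 40, 80, 120, 160, 200, 190, 180]
lost   = [False, False, False, True, False, False, True, False]
print(classify_bottleneck(delays, lost))  # aqm
```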

The source code can be found at:

DigSys Pillcam Medical Workshop

The media performance group hosted a workshop for the DigSys Pillcam pre-project. In this workshop we tried to bring the medical and the computer science worlds together. After two days of fruitful discussion and hard work we achieved this goal, and we even managed to define a catalogue of abnormalities in the digestive system, as required by the medical doctors.





RITE talk at LinuxCon Europe 2015

Per Hurtig and Andreas Petlund presented some of the RITE prototypes that have been implemented in the Linux kernel at LinuxCon Europe on 6 October 2015. LinuxCon Europe 2015 had 1484 registered attendees and gathers core competence from the open source community: developers, businesses and academics alike.

The slides from the presentation can be found here.

Linux networking, and especially Linux TCP, has seen a lot of development recently. In RITE, one of the goals is to develop networking technology that makes lower-latency transport available to the public and industry, and Linux is at the centre of this focus. Prototypes implemented in the Linux kernel include keeping the congestion window appropriately open for bursty traffic (newCWV), faster retransmissions for application-limited flows (RTO restart and TLP restart), redundant bundling to avoid retransmissions for thin streams, and hybrid delay-based congestion control for less queue build-up in bottlenecks.
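To illustrate why RTO restart helps application-limited flows, consider the timer arithmetic. Standard TCP rearms a full RTO when an ACK arrives, even if the oldest unacknowledged segment has already been in flight for a while; RTO restart instead arms the timer so it fires one RTO after that segment was sent. The sketch below is a simplified timing illustration, not the kernel implementation.

```python
def rto_expiry_standard(rto_ms, ack_time_ms):
    # Standard TCP: rearm the full RTO when an ACK arrives.
    return ack_time_ms + rto_ms

def rto_expiry_restart(rto_ms, ack_time_ms, oldest_unacked_sent_ms):
    # RTO restart: fire one RTO after the oldest outstanding segment
    # was sent, i.e. subtract the time it has already been in flight.
    return oldest_unacked_sent_ms + rto_ms

# Example: RTO = 200 ms; a segment sent at t=0 is still unacknowledged
# when an ACK for earlier data arrives at t=150 ms.
print(rto_expiry_standard(200, 150))      # fires at t=350 ms
print(rto_expiry_restart(200, 150, 0))    # fires at t=200 ms
```

For a thin stream with too few packets in flight to trigger fast retransmit, this 150 ms difference is recovered latency on every timeout-based retransmission.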


Third Life premieres at the WUK in Wien

Artists Otto Krause and Milan Loviška took to the stage in three public performances of the Third Life Project, while their research collaborators from Stellenbosch University in South Africa, the University of Duisburg-Essen in Germany and Simula Research Laboratory in Norway followed closely from around the stage, ready to step in and assist on either the virtual or real-world side of the performance.

WUK: Third Life (8.10. – 10.10.2015, Generalprobe) | Fotos:


High Performance Computing meets Performance Art

In their performative lecture the artists together with an international team of experts exploit technology and employ artistic vision to blur the lines between human beings and machines and between reality and imagination. They explore up-to-date possibilities of development of an avatar performance for a real life audience, which operates within mixed realities (real and “second life”) and coessentially aspires to open the door to the “third life”, where virtuality can transgress directly into reality.

With the use of a “smart stage” they address the new performative possibilities of virtual environments that aren’t limited or constrained by the local space that the physical bodies inhabit. This unique interface of a simulated virtual world, the Internet of Things and novel tracking technologies allows virtual characters to perform activities in the real world, while activities of performers in the real world trigger changes in the virtual world. The notion of third life is manifested here not only in the synchronous interconnection of the virtual and the real but also in their divergence, and brings up for question and re-examination what a body is, how a body operates, and whether that body is alive or dead, real or virtual.

Third Life Project, initiated in early 2014, implements artistic and scientific research and is devised in the ongoing, networked collaboration across national boundaries.

Premiere @ WUK Vienna, 08 October 2015.

Concept/Dramaturgy/Scenography/Performance: Otto Krause & Milan Loviška
Virtual environments of Minecraft: Otto Krause alias Aproktas
Minecraft expertise and gesture control: Herman Engelbrecht, Jason Bradley Nel (Stellenbosch University/MIH Medialab, South Africa)
Tracking: Carsten Griwodz, Lilian Calvet (Simula Research Lab & LABO Mixed Realities, Norway)
Cyberphysical devices and Non-Player-Characters: Gregor Schiele, Alwyn Burger, Stephan Schmeißer, Christopher Cichiwskyj (University of Duisburg-Essen, Germany)
Server: René Griessl (Bielefeld University, Germany)

A co-production of Territorium – Kunstverein and WUK Performing Arts in Vienna.

With the kind support of the City of Vienna’s Department of Cultural Affairs, the Arts Division; and the Arts and Culture Division of the Federal Chancellery of Austria. With contributions from the FiPS project, funded by the EU’s 7th Framework Programme for research, technological development and demonstration under grant agreement no 609757. Thanks to LABO Mixed Realities in Norway, the EU project POPART (Previz for On-set Production – Adaptive Realtime Tracking) funded under grant agreement no 644874, Bielefeld University in Germany and Stellenbosch University in South Africa.



Energy-Efficient Processing Myth Debunked at MCSoC 2015

Kristoffer Robin Stokke gave a presentation on energy-efficient video processing at the Multi-Core System-on-Chip (MCSoC) conference in Italy. The talk challenged the popular belief that energy can be saved in embedded systems by “racing to the finish”, i.e. maximising processor and memory speed until processing is done. For video post-processing, this was shown to be an inefficient strategy compared to an approach where processor and memory frequencies are reduced to the point where an application’s requirements (for example a framerate of 25 FPS) are just met. The slides from the talk can be downloaded from here.

MCSoC 2015 had 53 registered attendees and featured several talks from industry (where Oracle and LG Electronics were represented) and academia on a variety of topics in embedded systems, such as FPGA power estimation and automatic code generation for run-time power management.
