Author Archives: griff

RATS at the Festival of Lights 2019

RATS is short for Real-time Adaptive Three-sixty Streaming, software that was mainly developed by Trevor Ballardt during his time at TU Darmstadt, where he worked for the MAKI SFB. RATS was used in a test case of the H2020 project 5Genesis, which is preparing to showcase 5G during the Festival of Lights 2020 in Berlin. The test was conducted at Humboldt University during the Festival of Lights 2019.

Fraunhofer Fokus, one of the local partners in Berlin, wrote a piece to summarize the test from the 5Genesis perspective. We contributed video streaming and talk about it in this article.

FoL 2019 at the Humboldt University (© Magnus Klausen 2019)

The original idea of RATS was to use NVenc to convert an input stream from a 360 camera into a set of tiles in real time, which could be encoded at several qualities on the server before stitching them into a set of tiled H.265 videos. These H.265 videos would form a succession of qualities suitable for the orientation of the 360 camera. This idea was published in a demo paper at ACM MMSys 2019, and the intended application in 5Genesis as a short paper at ACM Mobicom’s S3. The code for RATS can be found on GitHub.

However, 5Genesis is also about dense user populations that access a live video feed, and such density can only be achieved if users can stream to their mobile phones without installing any additional software. The RATS idea would work perfectly for this if mobile phones’ browsers supported H.265. Unfortunately, Android phones do not.

So, instead of tiling in the sensible manner, we modified a clone of ffmpeg (a clone with minor modifications that is required for RATS). ffmpeg can be configured to use NVenc for encoding video streams in H.264, and it can also generate fragmented MPEG-4 with suitable HLS and DASH manifest files. In the case of DASH, MPD (manifest) files can take the form of templates, which removes the need for clients to download updates even for live streams, whereas HLS clients require updates. Instead of merging tiles after compressing them separately, we used Gaussian filtering on tile-shaped regions of the video to reduce the coding complexity. An arbitrary number of these versions can be generated in parallel, using our new ffmpeg CUDA module for partial blurring.
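To illustrate the partial blurring idea, here is a minimal CPU sketch in plain Python. It is not our ffmpeg CUDA module: the function names, the grayscale list-of-lists image format and the default sigma are assumptions made only for this example. It applies a separable Gaussian filter to one tile-shaped region and leaves every other pixel untouched.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_tile(image, x0, y0, tile_w, tile_h, sigma=2.0):
    """Return a copy of `image` in which only the tile at (x0, y0) is blurred.

    `image` is a list of rows of floats (one grayscale plane); this layout is
    an assumption of the sketch, not the format used inside ffmpeg.
    """
    height, width = len(image), len(image[0])
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    x1, y1 = min(x0 + tile_w, width), min(y0 + tile_h, height)

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    # Horizontal pass over every row that the vertical pass will read.
    first_row = clamp(y0 - radius, 0, height - 1)
    last_row = clamp(y1 - 1 + radius, 0, height - 1)
    horiz = {}
    for y in range(first_row, last_row + 1):
        for x in range(x0, x1):
            horiz[(x, y)] = sum(
                kernel[k + radius] * image[y][clamp(x + k, 0, width - 1)]
                for k in range(-radius, radius + 1))

    # Vertical pass, written only into the pixels of the tile.
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = sum(
                kernel[k + radius] * horiz[(x, clamp(y + k, 0, height - 1))]
                for k in range(-radius, radius + 1))
    return out


# Example: blur the top-left 4x4 tile of an 8x8 checkerboard.
checkerboard = [[255.0 if (x + y) % 2 else 0.0 for x in range(8)]
                for y in range(8)]
blurred = blur_tile(checkerboard, 0, 0, 4, 4, sigma=1.0)
```

The blurred tile contains less high-frequency detail, so an encoder spends fewer bits on it, while the tiles the viewer is assumed to look at stay sharp.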

The camera that we installed at the FoL 2019 was a surveillance camera with a fisheye lens (actually a panomorph lens, but close enough to fisheye to make our life easy), while we settled on VideoJS for panorama display in our cross-platform web pages, which should show the FoL videos on arbitrary browsers. It was a bit irritating that the current version of VideoJS has lost fisheye projection support while it has gained both DASH and HLS support.

Consequently, we had to project our panorama from fisheye to equirectangular projection. We followed two approaches. In one, we added a reprojection module to ffmpeg that uses CUDA to make the conversion before streaming, followed by a configuration of VideoJS that projects only one half of a sphere, since a fisheye camera records only a single hemisphere. In the other, we extended VideoJS to support fisheye lenses directly. While the first piece of code may be more generally useful, we found that the single conversion of the second approach (which will be published in a master thesis next year) provides better visual quality.
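The geometry behind the first approach can be sketched in a few lines of Python. This is not the CUDA module: it assumes an ideal equidistant fisheye (image radius proportional to the angle from the optical axis), which a panomorph lens only approximates, and all function names and conventions (camera looking along +z, latitude growing downward like image rows) are chosen for this example.

```python
import math


def equirect_to_fisheye(lon, lat, fov=math.pi):
    """Map a viewing direction to normalized fisheye coordinates.

    lon and lat are in radians, with (0, 0) on the optical axis. The result
    (u, v) lies in the unit circle for directions inside the field of view.
    Assumption: equidistant lens model, r = theta / (fov / 2).
    """
    # Direction vector for the given longitude and latitude.
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    theta = math.acos(max(-1.0, min(1.0, z)))  # angle from the optical axis
    alpha = math.atan2(y, x)                   # angle around the optical axis
    r = theta / (fov / 2.0)
    return r * math.cos(alpha), r * math.sin(alpha)


def build_lookup(out_w, out_h, fisheye_size):
    """For every pixel of a hemispherical equirectangular image, return the
    (x, y) position in a square fisheye image from which to sample."""
    center = (fisheye_size - 1) / 2.0
    radius = fisheye_size / 2.0
    table = []
    for row in range(out_h):
        lat = ((row + 0.5) / out_h - 0.5) * math.pi
        line = []
        for col in range(out_w):
            lon = ((col + 0.5) / out_w - 0.5) * math.pi
            u, v = equirect_to_fisheye(lon, lat)
            line.append((center + u * radius, center + v * radius))
        table.append(line)
    return table


lookup = build_lookup(3, 3, 101)
```

A real conversion would sample the fisheye frame at these positions with bilinear interpolation. Each such resampling step softens the picture slightly, which is consistent with the second approach, with its single conversion, looking better.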

Point Cloud Based Room Reconstruction

Download PDF on request.

Pipeline for Point Cloud Processing focused on Room Reconstruction

Scanning and modeling interior rooms and spaces are important topics in computer vision. The main idea is to be able to capture and analyze the geometry of these interior environments.

Despite the considerable amount of work put into indoor reconstruction over the past years, all implementations suffer from various limitations, and we have yet to find a method that works in every scenario. In our approach, we focus on encoding points to reduce the disk space taken by the point cloud, and on introducing 3D meshing techniques that add coherence and readability to the point cloud. This is why we suggest the following approach.

When scanning a closed room, the main features are the walls; it would be interesting to detect (and compress) the walls, floor and ceiling. Walls, floor and ceiling bound the room and are likely to contain a high density of points.

It is relevant to detect the main planar components in order to determine the boundaries of the point cloud. In a second step, we can replace them with a simpler model, such as a surface with only 4 vertices and 4 edges that potentially replaces thousands of points. This also allows us to filter out points that were scanned through windows and lie outside the room.

We can code this 3D mesh box as a graph, since the point cloud might be incomplete and suffer from occlusion. A graph-based architecture makes a strong starting point for developing other features. To do that, however, we need to detect all relevant planar components in our point cloud. This time, we initiate the segmentation with a region growing method in order to avoid the issue detected in Figure 7, ensuring that each plane resulting from the plane segmentation is linked to exactly one main planar component fed into the graph approach. Our graph structure is kept to the simplest entities so that it can be applied to a wide variety of scenarios: faces are the planes, edges are intersections of two planes, and corners/vertices are intersections of three planes.
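The corner entity of such a graph can be illustrated with a short Python sketch. The plane representation (nx, ny, nz, d) with n·x = d, the function names and the example room dimensions are assumptions for this illustration and are not taken from the thesis. A corner is obtained by solving the 3x3 linear system of three planes with Cramer's rule.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def intersect_three_planes(p1, p2, p3, eps=1e-9):
    """Return the common point of three planes, or None if they do not meet
    in a single point (for example because two of them are parallel)."""
    normals = [p[:3] for p in (p1, p2, p3)]
    offsets = [p[3] for p in (p1, p2, p3)]
    d = det3(normals)
    if abs(d) < eps:
        return None
    point = []
    for axis in range(3):
        # Cramer's rule: replace one column of the normal matrix by the offsets.
        m = [list(row) for row in normals]
        for i in range(3):
            m[i][axis] = offsets[i]
        point.append(det3(m) / d)
    return tuple(point)


def room_corners(planes):
    """Map every triple of plane indices that meets in one point to that point."""
    corners = {}
    for i, j, k in combinations(range(len(planes)), 3):
        point = intersect_three_planes(planes[i], planes[j], planes[k])
        if point is not None:
            corners[(i, j, k)] = point
    return corners


# Hypothetical box-shaped room of 4 m x 3 m x 2.5 m.
room = [(1, 0, 0, 0.0), (1, 0, 0, 4.0),
        (0, 1, 0, 0.0), (0, 1, 0, 3.0),
        (0, 0, 1, 0.0), (0, 0, 1, 2.5)]
corners = room_corners(room)
```

For this box, the six faces yield eight corners. In a room that is not a simple box, infinite planes also intersect at points that are not real corners, so candidates would still have to be checked against the point cloud.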

After detecting the walls and replacing them with simple faces, we restore the features lost by the plane approximation using height maps, which are textures that model the height of the wall at each coordinate (X,Y). This operation opens up the domain of image processing, and we are able to generate high-resolution as well as low-resolution versions of the room.
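A height map of this kind can be sketched as follows in Python. The parameter names, the averaging of several points per cell and the use of None for empty cells are assumptions of this example, not details from the thesis. Each point is expressed in the wall's own coordinate frame, and its signed distance from the plane is stored in the texture cell that it falls into.

```python
def dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def build_height_map(points, origin, u_axis, v_axis, normal,
                     cell_size, cols, rows):
    """Rasterize points near a wall into a rows x cols height map.

    origin is a corner of the wall face; u_axis and v_axis are unit vectors
    spanning the face and normal is its unit normal. Each cell holds the mean
    signed distance of its points from the plane, or None if it is empty.
    """
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for p in points:
        rel = (p[0] - origin[0], p[1] - origin[1], p[2] - origin[2])
        col = int(dot(rel, u_axis) // cell_size)
        row = int(dot(rel, v_axis) // cell_size)
        if 0 <= col < cols and 0 <= row < rows:
            sums[row][col] += dot(rel, normal)
            counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(cols)] for r in range(rows)]


# Hypothetical wall lying in the plane z = 0, sampled at two resolutions.
wall_points = [(0.1, 0.1, 0.02), (0.2, 0.3, 0.04), (1.5, 0.5, -0.1)]
frame = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
coarse = build_height_map(wall_points, *frame, cell_size=1.0, cols=2, rows=1)
fine = build_height_map(wall_points, *frame, cell_size=0.5, cols=4, rows=2)
```

Changing only cell_size, cols and rows gives the low-resolution and high-resolution variants from the same points.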

Reconstruction of Indoor Environments Using LiDAR and IMU

Today there is a trend towards reconstruction of 3D scenes with movement over time, in both image-based and point cloud based reconstruction systems. The main challenge in point cloud-based systems is the lack of data. Most of the existing data sets are made from 3D-reconstructed meshes, but the density of these constructions is unrealistic.

Point cloud from a fixed LIDAR scan (left) to a LIDAR sweep (right)

In order to do proper research into this field, it must be possible to generate real data sets of high-density point clouds. To deal with this challenge, we have been supplied with a VLP-16 laser scanner and a Tinkerforge IMU Brick 2.0. In our final setup, we position the IMU at the top center of the VLP-16 by utilizing a 3D printed mounting plate. This assembly is fastened to a tripod, in order to move the assembly about well-defined axes. Because most laser scanners acquire points sequentially, these devices do not have the same concept of frame as for images where all data are captured in the same instant. To deal with this issue we divide one scan, i.e., a 360° LiDAR sweep, into data packets and transform these data packets using the associated pose to global space. We compensate for mismatch in sampling frequency between the VLP-16 and the IMU by linear interpolation between the acquired orientations. We generate subsets of the environment by changing the laser scanner orientation in static positions and estimate the translation between static positions using point-to-plane ICP. The registration of these subsets is also done using point-to-plane ICP. We conclude that at subset level, our reconstruction system can reconstruct high-density point clouds of indoor environments with a precision that is mostly limited to the inherent uncertainties of the VLP-16. We also conclude that the registration of several subsets obtained from different positions is able to preserve both visual appearance and reflective intensity of objects in the scene. Our reconstruction system can thus be utilized to generate real data sets of high-density point clouds.
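The per-packet transformation can be sketched in Python as follows. The abstract only states that orientations are interpolated linearly; the sketch assumes that orientations are unit quaternions (w, x, y, z) and uses normalized linear interpolation, which may differ from what the thesis does. The function names and the sample values are hypothetical.

```python
import math
from bisect import bisect_right


def nlerp(q0, q1, alpha):
    """Normalized linear interpolation between two unit quaternions."""
    if sum(a * b for a, b in zip(q0, q1)) < 0.0:
        q1 = tuple(-c for c in q1)  # take the shorter arc
    q = tuple((1.0 - alpha) * a + alpha * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def rotate(q, v):
    """Rotate the 3-D vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # t = 2 * (q_vec x v), then v' = v + w * t + q_vec x t
    tx = 2.0 * (y * v[2] - z * v[1])
    ty = 2.0 * (z * v[0] - x * v[2])
    tz = 2.0 * (x * v[1] - y * v[0])
    return (v[0] + w * tx + (y * tz - z * ty),
            v[1] + w * ty + (z * tx - x * tz),
            v[2] + w * tz + (x * ty - y * tx))


def orientation_at(t, samples):
    """Interpolate the orientation at time t from sorted (time, quaternion)
    IMU samples with strictly increasing timestamps."""
    times = [s[0] for s in samples]
    i = bisect_right(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, q0), (t1, q1) = samples[i - 1], samples[i]
    return nlerp(q0, q1, (t - t0) / (t1 - t0))


def packet_to_global(points, t, samples, position=(0.0, 0.0, 0.0)):
    """Transform the points of one data packet, captured at time t, to global space."""
    q = orientation_at(t, samples)
    return [tuple(r + p for r, p in zip(rotate(q, pt), position))
            for pt in points]


# Two hypothetical IMU samples: identity, then 90 degrees about the z axis.
half = math.sqrt(0.5)
imu = [(0.0, (1.0, 0.0, 0.0, 0.0)), (1.0, (half, 0.0, 0.0, half))]
global_points = packet_to_global([(1.0, 0.0, 0.0)], 0.5, imu)
```

In this example, a packet captured halfway between the two samples is rotated by 45 degrees about the z axis.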

Latex writing tips

When writing papers or theses, Latex is the tool of choice for computer science students and researchers. In contrast to WYSIWYG word processors, whether they are locally installed or cloud-based, Latex emphasizes your ability to write text and deals with the layout on your behalf. Even if Latex’s decisions about placing figures and tables are not exactly controllable and not always to your liking, it is actually very hard to use Latex to create a document that is as visually unappealing as a letter written in Word.

Although an author who is new to Latex may get used to the inconvenience of visible tags to represent special text features instead of a direct visual feedback of a WYSIWYG word processor, there are some things that are harder in Latex.


The classical collaboration tool of WYSIWYG users, email, has been impractical compared to Latex’s ability to be used with a version control system such as Git. Cloud-based WYSIWYG processors, however, allow several users to collaborate interactively with each other. This remains hard in Latex, but a Cloud service called Overleaf exists that provides the same feature for Latex users.

It features complete Latex support as well as PDF previews and at the same time also live interaction and side-bar comments. Furthermore, it provides Git access to the underlying repository, which can be cloned into a local directory. Consequently, it is also possible to push local updates into the repository in Overleaf, which permits safe offline working on the same document. The local Git repository can of course also be pushed to a secondary remote node, allowing a user to keep a secondary safe copy.

Spell checking

Spell checking has been notoriously bad for essentially all word processors except Word. This problem has been alleviated for many of us by the arrival of Grammarly. Grammarly calls itself “the free writing assistant”, and it is well capable of protecting us from most spelling and grammatical mistakes that we usually make in English. It can be integrated into Chrome and is already quite effective in its free version.

Interestingly, Grammarly can be used when Chrome is used to write Latex in Overleaf. A discussion on stackexchange reveals how it can be used.


Many people are disappointed that Google Docs has no bibliography system, while Microsoft Word users have access to EndNote. Latex can use several bibliography systems, among them Bibtex, which needs much less attention than EndNote. There is an old tool named bibtex that extracts and converts Bibtex entries according to a format chosen inside the latex document, and there is a younger tool named biber that does the same job. Porting Latex documents (but not the Bibtex files) from bibtex to biber has a learning curve, and I haven’t done it yet.

Most publishers provide options for downloading appropriate Bibtex (as well as EndNote) citations for every book or paper, but in the case of Bibtex, it is possible to collect all of them in one single large collection and refer to its entries from every Latex document that you write.

Just collecting bibliography in such a large file has its disadvantages, however. It is particularly inconvenient when the collection has grown large and you are searching for a specific document whose details you cannot remember perfectly. Of course, a basic text editor can be used to search in that file, but also other features such as grouping by topic are desirable.

This is where Mendeley and Zotero come in. Both are Cloud-based systems that allow you to manage your bibliography and export it in a variety of formats including Bibtex and EndNote. Both of these systems have desktop clients to manage your online collection, and both are capable of storing PDF files of the managed content for you.

Mendeley is today owned by Elsevier, Zotero by the Corporation for Digital Scholarship; a user may prefer the one or the other for this reason. At UiO, the recommended tool is apparently Zotero.

Reducing Packet Loss in Real-Time Wireless Multicast Video Streams with Forward Error Correction

Download PDF.

Average packet loss without and with FEC

Wireless multicast suffers from severe packet loss due to interference and lack of link layer retransmission. In this work, we investigate whether the most recent Forward Error Correction (FEC) draft is suitable for real-time wireless multicast live streaming, with emphasis on three main points: effectiveness of packet loss reduction, latency impact, and overhead impact. We design and perform an experiment in which we simulate wireless packet loss in multicast streams with a Gilbert model pattern of ≈ 16% random packet loss. We check all FEC configurations (L and D values) within several constraints: a maximum repair window of 500 milliseconds (latency impact), a maximum overhead of 66.67%, and a maximum L value of 20. For all these L and D values we stream the tractor sample three times, to avoid possible outliers in the data. We show that packet loss reduction in the most recent FEC draft is effective, reducing packet loss from ≈ 16% down to ≈ 1.02% in the best case. We also show that low latency streaming can be conducted, but it requires a minimum of 160 milliseconds additional latency for our stream file. The overhead for such low latency can be as high as 66.67%.
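A Gilbert-style loss pattern can be generated with a two-state Markov chain, as in the following Python sketch. This is a simplified variant in which every packet sent in the bad state is lost. The transition probabilities 0.04 and 0.21 are hypothetical values chosen only so that the long-run loss rate p / (p + r) equals 16%; the abstract does not state the parameters used in the thesis.

```python
import random


def gilbert_loss_pattern(n_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Return a list of booleans, True where a packet is lost.

    The channel switches between a good state (no loss) and a bad state
    (every packet lost), which produces bursts of consecutive losses.
    """
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n_packets):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        lost.append(bad)
    return lost


def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of lost packets for the simplified model."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


pattern = gilbert_loss_pattern(200_000, 0.04, 0.21)
measured = sum(pattern) / len(pattern)
expected = stationary_loss_rate(0.04, 0.21)
```

Because losses arrive in bursts rather than independently, the choice of L and D matters: a repair window has to span a burst to be able to recover it.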

Implementation of a virtual reality design review application using vision-based gesture recognition technology

Download PDF.

Choosing annotation visibility in the virtual design review application.

Classification societies date back to the second half of the 18th century, when marine insurers developed a system for independent technical assessment of the ships presented to them for insurance cover. Today, a major part of a classification society’s responsibilities is to review the designs of enormous maritime vessels. This usually involves working with big and complex 3D models and 3D tools, but without support for many of the tasks required in a design review. As a consequence, the workflow is often just partially digital, and many important tasks, such as annotating or commenting on aspects of the models, are done on paper. DNV GL, the world’s largest maritime classification society, is interested in digitalizing this process further and making it more interactive and efficient by utilizing an application that allows for virtual design review meetings in the 3D models. In these virtual design review meetings, the designer and reviewer could remotely interact, survey the model together, and annotate it instead of model printouts. As the sense of scale is important in a 3D model review, virtual reality technology is deemed promising as it gives a unique sense of scale and depth, which is hard to match with regular 2D screens. DNV GL is also interested in alternate interaction methods, as mouse and keyboard can have some limitations when working in 3D environments. Gesture recognition technology has been of special interest as this can potentially offer unique approaches to working with 3D models. This thesis implements such a design review application using state-of-the-art virtual reality and vision-based gesture recognition technologies, coupled with the Unity game engine, a popular cross-platform game development platform and software framework.
After discussing these technologies’ theoretical foundations, the thesis reviews the requirements and design of the design review application, in addition to documenting its implementation and evaluating its performance by conducting user tests. In the implemented design review application the user is able to navigate 3D models, annotate them and perform various other actions, all performed by gestures.

Implementation and initial assessment of VR for scientific visualisation: Extending Unreal Engine 4 to visualise scientific data on the HTC Vive

Download PDF.

In the visualizer: outline (left), depth buffer (middle), stencil buffer (right)

Virtual Reality (VR) for scientific visualization has been researched since the 1990s, but there has been little research into the fundamental aspects of VR for scientific visualisation. “Is VR ready for adoption?” and “How does VR design differ from design for monocular systems?” are two examples of fundamental questions yet to be addressed. In this paper a scientific visualiser based on the game engine Unreal Engine 4 (UE4) was developed and tested by educators and researchers. A full ray marcher was successfully implemented and a near zero-cost cutting tool was developed. VR is found to have a lot of potential for improving visualisation of data sets with structural “interleaved complexity”. VR has also been deemed ready for limited mass adoption. Through field testing visualisations of volumetric and geometric models, three major issues are identified:

Current VR hardware lacks adequate input options. Menu and interaction design must be reinvented. Furthermore, 90 FPS is required for comfortable and extended VR use, which makes most current algorithms and data sets incompatible with VR. The conclusion reached through analysis of and feedback regarding the computational cost and design challenges of VR is that VR is best utilised as a tool in already existing monocular visualisation tool kits. By using a monocular system to perform most of the encoding and filtering and then using VR for inspecting the pre-processed model, it is possible to obtain the best of both worlds.
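For readers unfamiliar with ray marching, the following Python sketch shows the basic loop for a single ray through a volume. It is a generic illustration and not the UE4 implementation from the thesis; the density function, step size and absorption coefficient are hypothetical. The ray is sampled at fixed steps, light absorption is accumulated from front to back, and the loop stops early once the ray is nearly opaque.

```python
import math


def march_ray(origin, direction, density, step=0.05, max_dist=4.0,
              absorption=4.0):
    """Return the opacity accumulated along one ray through a density field.

    `density` is any function that maps a 3-D point to a non-negative value.
    """
    transmittance = 1.0
    t = 0.0
    while t < max_dist and transmittance > 0.01:  # early ray termination
        point = tuple(o + t * d for o, d in zip(origin, direction))
        transmittance *= math.exp(-absorption * density(point) * step)
        t += step
    return 1.0 - transmittance


def unit_sphere(point):
    """Hypothetical data set: uniform density inside a sphere of radius 1."""
    return 1.0 if sum(c * c for c in point) <= 1.0 else 0.0


hit = march_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), unit_sphere)
miss = march_ray((2.0, 0.0, -2.0), (0.0, 0.0, 1.0), unit_sphere)
```

At 90 FPS, a frame has a budget of about 11 milliseconds, and this loop runs once per pixel and per eye, which gives an impression of why many data sets are too expensive to render directly in VR.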

A 3D stateroom editor

Download PDF.

Today’s software systems easily reach hundreds of thousands of lines of code, and such systems frequently benefit from the use of state machines, which help in managing system complexity by guaranteeing completeness and consistency. To visualize such state machines, statecharts have been introduced, which also offer a formalism for orthogonal and hierarchical states. Many developers use state machines, but even with statecharts as a tool, it is a major challenge to keep an overview of the machine and its effects. Gaining an overview as a newcomer to a project is an even larger challenge. In this paper, we argue that a 3D statechart can alleviate these challenges somewhat, and present an editor for state machines that are given in SCXML, an XML notation for statecharts. This editor allows the user to explore the state machine by navigating freely in a 3D environment and make changes to the machine. The editor is a prototype and incomplete. It is an attempt to reflect the idea of having statecharts presented in 3D space.
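As an illustration of the input that such an editor works with, the following Python sketch reads a small SCXML document with the standard library and extracts the state hierarchy and the transitions. The example document and the function name are made up for this illustration and are not taken from the editor's code.

```python
import xml.etree.ElementTree as ET

SCXML_NS = "{http://www.w3.org/2005/07/scxml}"
STATE_TAGS = {SCXML_NS + "state", SCXML_NS + "parallel", SCXML_NS + "final"}

# Hypothetical statechart with one flat state and one hierarchical state.
EXAMPLE = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <state id="idle">
    <transition event="start" target="running"/>
  </state>
  <state id="running" initial="fast">
    <state id="fast">
      <transition event="slow_down" target="slow"/>
    </state>
    <state id="slow"/>
    <transition event="stop" target="idle"/>
  </state>
</scxml>"""


def load_statechart(scxml_text):
    """Return (states, transitions) for an SCXML document.

    states maps each state id to the id of its parent state (None at the top
    level); transitions is a list of (source, event, target) tuples.
    """
    states, transitions = {}, []

    def visit(element, parent_id):
        for child in element:
            if child.tag in STATE_TAGS:
                state_id = child.get("id")
                states[state_id] = parent_id
                visit(child, state_id)
            elif child.tag == SCXML_NS + "transition" and parent_id is not None:
                transitions.append(
                    (parent_id, child.get("event"), child.get("target")))

    visit(ET.fromstring(scxml_text), None)
    return states, transitions


states, transitions = load_statechart(EXAMPLE)
```

The parent links capture the hierarchy that statecharts add on top of flat state machines, which is the kind of structure a visualization has to convey.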

A testbed to compare Dynamic Adaptive Streaming over HTTP network configurations

Download PDF.

Streaming video over the internet has become vastly popular over the past decade. In recent years there has been a shift towards using the Hypertext Transfer Protocol (HTTP) for delivery of layered, segmented video content. These solutions go under the name HTTP Adaptive bit-rate Streaming (HAS). One of the streaming solutions using this is the recent international streaming standard Dynamic Adaptive Streaming over HTTP (DASH). The increased popularity of HAS has significantly increased the chance that multiple HAS clients will share the same network bottleneck. Studies show that this can introduce unwanted network characteristics, but as of today there is no good way of running realistic evaluations of how different network configurations will impact HAS players sharing the same bottleneck. To solve this, we have set up a testbed capable of running automated streaming sessions, and have performed three experiments using the DASH Industry Forum’s reference player, DASH.js, to present the capabilities of the testbed.

LADIO project will follow up on POPART

The Horizon 2020 project proposal LADIO: Live Action Data Input and Output has been accepted.

Like POPART before, LADIO is an innovation action of 18 months. This time, we focus on maximizing the collection of metadata on film sets in order to simplify the collaboration of post-production facilities.
The project will have a strong vision aspect, especially since the Technical University of Prague is joining the POPART team, but Media will concentrate on aspects of storage and transmission. Stay tuned for more news from LADIO.
