Two years ago I was approached by someone from a public TV broadcaster in Germany with the following problem: Given multiple video files with differently cut versions of the same production, is it possible to use the technology from AudioAlign/Aurio to automatically generate edit decision lists (EDL/XML) and use them to transfer subtitles from a reference version to the different cuts? The answer is “yes”, and that’s just one of many use-cases. This article describes the challenges and how the Aurio technology solves them almost magically in a successful prototype developed for the TV station.
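To make the subtitle-transfer idea concrete, here is a minimal sketch of how cues could be moved from a reference timeline onto a cut once an edit decision list is known. It is not the prototype's code: the `Segment` and `Cue` types and the `transfer_cues` function are hypothetical names chosen for illustration, and the EDL is assumed to be a plain list of reference ranges with their position in the cut.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One EDL event: reference range [ref_in, ref_out) placed at cut_in in the cut."""
    ref_in: float   # seconds on the reference timeline
    ref_out: float  # seconds on the reference timeline (exclusive)
    cut_in: float   # seconds on the cut timeline where the segment starts


@dataclass
class Cue:
    """A subtitle cue with start/end times in seconds."""
    start: float
    end: float
    text: str


def transfer_cues(cues: List[Cue], edl: List[Segment]) -> List[Cue]:
    """Map subtitle cues from the reference timeline onto the cut timeline.

    A cue is clipped to every segment it overlaps; cues that lie entirely
    in material missing from the cut are dropped.
    """
    result = []
    for seg in edl:
        offset = seg.cut_in - seg.ref_in
        for cue in cues:
            start = max(cue.start, seg.ref_in)
            end = min(cue.end, seg.ref_out)
            if start < end:  # the cue overlaps this segment
                result.append(Cue(start + offset, end + offset, cue.text))
    return sorted(result, key=lambda c: c.start)
```

In this sketch, a cue that straddles an edit point is simply clipped at the segment boundary. A production tool would probably need a policy for very short leftover cues.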
Aurio adds support for realtime live fingerprinting and cross-platform cloud deployments with .NET Core 2.0
A collaboration with eyecandylab, a company developing products for augmenting TV programs, recently gave me the opportunity to add major new features to Aurio. The most recent version, released today, extends the architecture to support processing of realtime audio streams of unbounded length, which means that live streams can now be fingerprinted on the fly with minimal latency. Additionally, the Aurio core library has been ported to .NET Standard 2.0 and runs with the .NET Core 2.0 framework on Windows, Linux, and macOS, which makes it possible to build microservices in containerized environments like Docker.
An example application named Aurio.Test.RealtimeFingerprinting has been added to demonstrate how realtime live fingerprinting can be implemented with only a few lines of code. As part of supporting .NET Core, resampler and FFT dependencies building on native code have been moved from the core into optional add-on packages, and the WDL resampler ported to purely managed .NET code by NAudio has been integrated as an alternative for deployments where native dependencies are undesired. This means that now there are purely managed implementations available for both FFT (Exocortex.DSP with Aurio.Exocortex) and resampling (NAudioWdlResampler within the
Of course Aurio continues to support the good old WPF GUI applications, but the framework requirement had to be increased from .NET Framework 4.0 to 4.6.2. AudioAlign has also been updated to the latest Aurio version.
This October I talked about spherical audio for 360°/VR videos at Demuxed, a conference for video engineers. In the talk, I explain the principles of the Ambisonics surround sound technique, from recording to playback through a speaker array or through headphones with binaural rendering, and how it can be used in web applications and video players through the Web Audio API.
The slides edited into the video are slightly incomplete and are missing a few important details, and I mistakenly said “headphones” instead of “microphones” on the HRTF slide. Other than that, I think the video can be helpful to everyone trying to understand what the Ambisonics technique is. There’s also a short recap of the talk, the conference, and the Ambisonics integration into the Bitmovin video player on the company blog, titled 360° VR Audio with the Bitmovin Player.
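As a small illustration of the encoding principle behind Ambisonics, the following sketch pans a mono sample into first-order B-format and rotates the resulting sound field around the vertical axis, which is the operation a player needs when the viewer turns their head. It is not taken from the talk or from any player. It assumes the traditional B-format (FuMa) weighting with W scaled by 1/√2 and angles in radians measured counterclockwise from the front. The ambiX format commonly used on the web differs in channel order and normalization. The function names are made up for this example.

```python
import math
from typing import Tuple

BFormat = Tuple[float, float, float, float]  # (W, X, Y, Z)


def encode_first_order(sample: float, azimuth: float, elevation: float) -> BFormat:
    """Encode a mono sample arriving from (azimuth, elevation) into B-format."""
    w = sample / math.sqrt(2.0)                           # omnidirectional component
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front/back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left/right
    z = sample * math.sin(elevation)                      # up/down
    return (w, x, y, z)


def rotate_yaw(bformat: BFormat, angle: float) -> BFormat:
    """Rotate the sound field by `angle` radians around the vertical axis.

    Only X and Y change; W and Z are invariant under a yaw rotation.
    """
    w, x, y, z = bformat
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (w, x * cos_a - y * sin_a, x * sin_a + y * cos_a, z)
```

Rotating a source encoded at the front by 90° gives the same channels as encoding it directly at the left. To compensate for head movement, the field would be rotated by the negative of the head's yaw.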
Native immersive 360° VR video playback on Android with Spectaculum
Playback of immersive 360° video on Android is usually done in a WebView with an HTML5 video player. This tutorial demonstrates how to display 360° video in a native view widget to save the overhead of a whole browser stack. This is done by using the versatile Spectaculum view widget for video rendering and the popular ExoPlayer for video decoding. Both of these libraries are open source under the Apache 2.0 license and available on GitHub and the JCenter repository.
Spectaculum is a view widget for Android to display visual content in a GLES accelerated context, providing zooming and panning functionality, parameterized shader effects, and frame grabbing. It comes with additional views that save developers a lot of time and implement all functionality for displaying bitmap images, camera preview, and videos through the Android MediaPlayer, MediaPlayer Extended, and ExoPlayer. The packaged shader effects range from simple color filters to immersive 360° VR video rendering.
The Spectaculum view can be used with all sources that can write to a surface or surface texture, which is essentially every visual content source, but I recommend using one of the many available modules if applicable. Example use-cases are photo galleries and picture viewers with zooming/panning support and optional picture effects through shaders (e.g. contrast adjustment, color correction), video players with live image adjustments through shaders and 3D/360°/immersive/VR playback, and camera previews with live effects. Extensive documentation on functionality, API, usage, and modules is available on GitHub. The library is also available from JCenter’s Maven repository, and a demo app that showcases various views and shader effects is available on the Play Store.
LAIS.Foto is a voluntary project that I developed a few months ago for a friend. It is basically a self-hosted online photo sharing platform with community features, similar to Flickr. Compared to other existing platforms, it features a unique upload and download workflow with an intermediate moderation step, tailored to the principle of donating and requesting pictures.
The story began when the client asked me for my opinion on how she could build and manage a photo sharing platform with one of these online website generators provided by webhosting companies. The idea was to create a picture sharing website called LAIS.Foto, where people could donate (upload) pictures for charity and nonprofit projects that cannot or do not want to afford paid pictures for their promotion work. Members of these projects would then request and acquire (download) pictures that they want to use.
A new website presenting my professional software development services is now online: Protyposis Multimedia Solutions.
Since the previous post from about a year ago, the ITEC MediaPlayer for Android has evolved to its second major version, receiving a lot of bug fixes, a rewritten playback core with huge performance improvements, and the ability to play back audio-only sources. Its new name now bumps it to version 3.0, and it is reaching a point where I feel confident that it can be used in production. In fact, people are already starting to use it.
Yesterday I discovered the free TVUnblock.com service and was surprised how easy it is to use. Configuring my OpenWRT router to get access to US Netflix content only took me about 5 minutes. Being a free service, TVUnblock requires users to register their IP with them to unlock the unblocking functionality. For internet connections with dynamic IP addresses, this means that you have to regularly re-register your IP address, or else you’re back to the local program, or none at all if you use it for services that aren’t available outside the US.
To automate this process, I have written a small IP update script:
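The script itself is not part of this excerpt. As a rough sketch of the idea only, the following shows one way such an updater could work: look up the current public IP, compare it with the last registered one, and call an update URL only when it changed. Both URLs are hypothetical placeholders and not TVUnblock's actual endpoints, the assumption that the service registers whatever IP the request comes from is mine, and Python is used for readability rather than because it is a natural fit for a router.

```python
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical placeholders: neither URL is taken from the original post.
IP_LOOKUP_URL = "https://ip.example.invalid/"
UPDATE_URL = "https://update.example.invalid/register"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as stripped text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8").strip()


def update_if_changed(cache_file: Path,
                      fetch: Callable[[str], str] = http_get) -> bool:
    """Re-register the public IP only if it differs from the cached one.

    Returns True if an update request was sent, False otherwise.
    """
    current_ip = fetch(IP_LOOKUP_URL)
    previous_ip = cache_file.read_text().strip() if cache_file.exists() else None
    if current_ip == previous_ip:
        return False  # nothing changed, no need to contact the service
    fetch(UPDATE_URL)  # assumed: the service registers the requesting IP
    cache_file.write_text(current_ip)
    return True
```

Calling `update_if_changed(Path("/tmp/last_ip"))` periodically, for example from a cron job, would keep the registration current while avoiding unnecessary requests.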
AudioAlign is a tool that I started developing in 2010 for my master’s thesis and have actively developed since, with the goal of creating software for the automatic synchronization of audio and video recordings. Although I never quite reached the point of a fully automatic synchronization system, it showed promising results compared to the few similar commercial applications available on the market, and it continues to be a helpful tool for my research purposes. I gave up on the plan to commercialize it due to patenting problems I didn’t know how to deal with, and instead decided to open source it so that others could still make use of it and hopefully even help me improve it. Aurio is a library extracted from AudioAlign that provides the underlying core functionality: an audio processing engine plus audio fingerprinting and time warping algorithms. Both Aurio and AudioAlign are now available as AGPL licensed open source software on GitHub.