Introduction

Speech technologies have met with enormous success and have become a central element of our society. With the ubiquity of mobile devices, technically enabled speech communication at any time and from any place has become a commodity.
Advanced applications, such as untethered teleconferencing systems, smart rooms, telepresence or audio-visual systems for ambient assisted living or for surveillance and monitoring, are gaining importance. But their performance still often falls short of user expectations: they require a significant amount of user cooperation and attention, are tailored to specific use cases, depend on close-talking microphones, and cannot tolerate highly variable acoustic environments. Furthermore, systems that incorporate a classification and recognition component rely on an a priori expert definition of the events to be recognized, and on the availability of labeled and application-specific training data. These requirements render such systems expensive and inflexible, and their performance is still limited.

On the other hand, acoustic sensors have become inexpensive and widespread, and wireless communication is ubiquitous. Together, these developments allow a cost-effective realization of the infrastructure for the aforementioned applications, based on wireless acoustic sensor networks (ASNs). Traditional microphone arrays sample a sound field only locally, often at a relatively large distance from the target source(s), resulting in poor signal quality. ASNs use many more microphones to cover an area of interest, making it likely that at least one microphone is close to every relevant sound source, capturing a signal with improved quality and saliency. But to leverage the enormous potential of ASNs for spatio-temporal signal processing and classification, significant challenges have to be overcome. These are related to the loose coupling of the sensors, the spatial extent of the network, the natural acoustic environment in which it is deployed, and the increased signal enhancement and analysis demands of novel applications.

This Research Unit (RU) represents a coordinated effort to push beyond the boundaries of existing speech and audio technology to enable flexible, hardware-, environment-, and usage-adaptive, high-quality speech communication and acoustic scene classification over acoustic sensor networks. We categorize the corresponding applications as small-space applications (SSA) and large-space applications (LSA) according to the geometric extent of the space covered by the sensor network. The RU will address the specific challenges of these application categories as identified below.

Small-space applications (SSA) will typically rely on a few acoustic sensor nodes distributed in a single room or in multiple rooms. Key applications include:

Ambient assisted living (AAL) and smart homes: Current technical solutions for AAL hardly ever consider acoustic sensors, owing to insufficient signal quality, other technical limitations, and privacy concerns. If these issues are solved, an advanced AAL system using audio will bring about many benefits: It will enhance the user's voice in the presence of arbitrary nonstationary distortions, glean context information by observing the acoustic scene, and detect abnormal and possibly hazardous events by their acoustic signature. It will hence become an important technology for an aging society to support a prolonged self-determined life at home.
Personal communication scenarios with a natural look & feel: This includes teleconferencing without headsets and the use of multiple sound capturing devices. While today's solutions rely on pre-installed audio equipment, a possible future solution would be to let the participants' smartphones spontaneously form a multichannel sound capturing and speech enhancement system.

Large-space applications (LSA) employ acoustic sensor networks that cover an extended geographical area. Example applications include:

Surveillance of public or private spaces: While traditional audio-visual systems conduct event detection based on energy thresholding and classify signals into one of a few pre-trained classes, an advanced system will build models of the typical sounds of the environment using unsupervised and semi-supervised learning techniques, thus being more adaptive to the task at hand and requiring less costly labeled training data.
Environmental and habitat monitoring: An acoustic sensor network can serve a multiplicity of purposes, from monitoring adherence to noise control regulations to monitoring endangered animal species in a wildlife refuge. By using appropriate signal representations and local processing at the sensor nodes, the amount of data the network must transmit and store will be greatly reduced.

This RU is dedicated to addressing the key scientific challenges that are common to these applications.

Principal Investigators

Dr.-Ing. habil. Gerald Enzner

Lehrgebiet Adaptive Systeme der Signalverarbeitung,
Lehrstuhl für Allg. Informationstechnik u. Kommunikationsakustik (AIKA),
Fakultät für Elektrotechnik und Informationstechnik (ETIT),
Ruhr-Universität Bochum (RUB)

Gebäude ID/2/227
Universitätsstraße 150
D-44780 Bochum (Germany)

Phone: +49-234-32-25392
Fax: +49-234-32-14165
Email: gerald.enzner@rub.de

Website: https://ruhr-uni-bochum.de/ika

Prof. Dr.-Ing. Reinhold Häb-Umbach

Communications Engineering Group
Department of Electrical Engineering, Computer Science and Mathematics
Faculty of Computer Science, Electrical Engineering and Mathematics
Paderborn University

Pohlweg 47-49
33098 Paderborn

Phone: +49 5251 60-3626
Fax: +49 5251 60-3627
E-Mail: haeb(at)nt.upb(dot)de
Website: http://ei.uni-paderborn.de/nt/

Prof. Dr. Holger Karl

Computer Networks
Department of Electrical Engineering, Computer Science and Mathematics
Faculty of Computer Science, Electrical Engineering and Mathematics
Paderborn University

Pohlweg 47-49
33098 Paderborn

Phone: +49 5251 60-5375
E-mail: holger.karl(at)uni-paderborn(dot)de
Website: https://cs.uni-paderborn.de/de/cn/

Prof. Dr.-Ing. Walter Kellermann

Friedrich-Alexander-Universität Erlangen-Nürnberg
Lehrstuhl für Multimediakommunikation und Signalverarbeitung
Wetterkreuz 15
91058 Erlangen

Mailing Address:
Cauerstraße 7
91058 Erlangen

Phone: +49 9131 85 27669

Fax: +49 9131 85 28849
E-Mail: Walter.Kellermann@FAU.de
Website: https://lms.lnt.de/en/

Prof. Dr.-Ing. Rainer Martin

Ruhr-Universität Bochum
Institut für Kommunikationsakustik
Fakultät für Elektrotechnik und Informationstechnik
Raum: ID/2/233
Universitätsstr. 150
D-44780 Bochum

Phone: +49 234 32 22495
E-Mail: Rainer.Martin@rub.de
Website: https://ruhr-uni-bochum.de/ika

Dr.-Ing. Jörg Schmalenströer

Communications Engineering Group
Department of Electrical Engineering, Computer Science and Mathematics
Faculty of Computer Science, Electrical Engineering and Mathematics
Paderborn University

Pohlweg 47-49
33098 Paderborn

Phone: +49 5251 60-3626
Fax: +49 5251 60-3627
E-Mail: schmalen(at)nt.upb(dot)de
Website: http://ei.uni-paderborn.de/nt/

Further information:

Research Unit Leader

Prof. Dr. Reinhold Häb-Umbach

Communications Engineering / Heinz Nixdorf Institute

Head of Department of Communications Engineering
