Certain objects and advantages include providing for a spatial audio processing system and method that is robust to changes in an acoustic environment and capable of providing undistorted human speech and other quasi-stationary signals. Certain objects and advantages include providing for a spatial audio processing system and method that requires limited audio learning data, for example, two seconds of cumulative audio.
In various embodiments, an exemplary system and method according to the principles herein may process audio input data to calculate or estimate, and/or use one or more machine learning techniques to learn, an acoustic propagation model between a target location of a sound source and one or more array elements within an acoustic space. In certain embodiments, the one or more array elements may be co-located and/or distributed transducer elements.
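By way of non-limiting illustration only, the following sketch shows one conventional way a propagation model between a source location and a set of array elements might be estimated from a short span of audio. It is not asserted to be the method of the present disclosure. The function name `estimate_propagation_vectors`, the input layout (short-time Fourier transform snapshots indexed by frame, frequency bin, and array element), and the use of the principal eigenvector of the spatial covariance are all assumptions made for this example.

```python
import numpy as np

def estimate_propagation_vectors(frames):
    """Estimate one relative propagation vector per frequency bin.

    frames: complex array of shape (n_frames, n_bins, n_mics) holding
            STFT snapshots recorded while the target source is active
            (hypothetical input layout, assumed for this sketch).
    Returns a complex array of shape (n_bins, n_mics), scaled so that
    the entry for reference element 0 equals 1.
    """
    n_frames, n_bins, n_mics = frames.shape
    vectors = np.zeros((n_bins, n_mics), dtype=complex)
    for k in range(n_bins):
        x = frames[:, k, :]                  # (n_frames, n_mics)
        # Sample spatial covariance: average of x_t x_t^H over frames.
        cov = x.T @ x.conj() / n_frames
        # eigh returns eigenvalues in ascending order, so the last
        # column is the principal eigenvector.
        _, eigvecs = np.linalg.eigh(cov)
        v = eigvecs[:, -1]
        # Remove the arbitrary complex scale by referencing element 0
        # (assumes the source is audible at element 0).
        vectors[k] = v / v[0]
    return vectors
```

In this sketch, a single dominant source at the target location makes the spatial covariance approximately rank one, so the principal eigenvector is proportional to the propagation vector for that location; a few seconds of frames is typically enough to form the average.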
Embodiments of the present disclosure are configured to accommodate suboptimal acoustic propagation environments (e.g., large reflective surfaces, objects located between the target acoustic location and the transducers that interfere with free-space propagation, and the like) by processing audio input data according to a data processing framework in which one or more boundary conditions are estimated within a Green's Function algorithm to derive an acoustic propagation model for a target acoustic location.
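By way of non-limiting illustration only, the following sketch shows how a boundary condition can alter a Green's function relative to the free-space case. It uses the textbook image-source construction for a single planar wall, which is a stand-in chosen for this example and is not asserted to be the Green's Function algorithm of the present disclosure. The wall position (the plane z = 0), the speed of sound, the sign convention of the complex exponential, and the scalar `reflection` parameter standing in for an estimated boundary condition are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature

def free_space_green(src, mic, freq_hz):
    """Free-space Green's function of the Helmholtz equation,
    exp(-j k r) / (4 pi r), for one source and one receiver point."""
    r = np.linalg.norm(np.asarray(mic, float) - np.asarray(src, float))
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def green_with_wall(src, mic, freq_hz, reflection=1.0):
    """Green's function with one planar wall at z = 0 (assumed geometry).

    reflection: hypothetical boundary parameter; +1 models a rigid wall,
                -1 a pressure-release wall, 0 recovers free space.
    """
    src = np.asarray(src, float)
    # Mirror the source across the wall to obtain the image source.
    image = src * np.array([1.0, 1.0, -1.0])
    direct = free_space_green(src, mic, freq_hz)
    reflected = free_space_green(image, mic, freq_hz)
    return direct + reflection * reflected
```

In this sketch, evaluating `green_with_wall` at each array element position for a fixed source location yields one candidate propagation vector per frequency, and a boundary parameter such as `reflection` could be adjusted until those vectors agree with the observed audio input data.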