The invention claimed is:

1. An apparatus for feature-based compression, the apparatus comprising:
a feature encoder configured to extract one or more features from a set of raw information, to generate a set of features that compresses the raw information by a compression ratio that satisfies a predetermined physical channel capacity limit for a transmission channel, each feature providing information about a respective probability distribution that each represents one or more aspects of the subject; and
a transmitter configured to transmit a reduced set of the features over the transmission channel.

2. The apparatus of claim 1, wherein the feature encoder implements a probabilistic encoder to generate the set of features and the probabilistic encoder is implemented using an encoder deep neural network (DNN), and wherein the encoder DNN is trained to satisfy: a first target of maximizing likelihood between a set of recovered information at a corresponding decoder DNN, and a second target of minimizing an upper boundary of mutual information to be within the predetermined physical channel capacity limit.

3. The apparatus of claim 2, wherein the encoder DNN and the decoder DNN are trained together.

4. The apparatus of claim 2, wherein the compression ratio provided by the trained encoder DNN and the decoder DNN has been determined by performing training on a plurality of candidate encoder and decoder DNN pairs, each candidate encoder and decoder DNN pair providing a respective different compression ratio, and selecting the candidate encoder and decoder DNN pair and associated compression ratio that minimizes the upper boundary of mutual information.

5. The apparatus of claim 1, wherein the reduced set of features omit a correlated feature of the set of features.

6. The apparatus of claim 5, further comprising:
a historical database storing at least one previously transmitted feature; and
wherein the correlated feature is any feature that is unchanged compared to the at least one previously transmitted feature.

7. The apparatus of claim 5, wherein the correlated feature is indicated by a control message.

8. The apparatus of claim 1, wherein the transmitter is configured to:
assign a sub-channel for transmission of each respective feature, the assigning being based on a relative importance of each feature; and
transmit the set of features over the sub-channels.

9. The apparatus of claim 8 wherein each feature indicates an expectation value of the respective probability distribution and a variance value of the respective probability distribution, and the relative importance of each feature is determined based on the variance value of each respective feature.

10. The apparatus of claim 9 wherein the transmitter is further configured to:
select a transmission scheme for each assigned sub-channel, the transmission scheme being selected to indicate the variance value of the feature assigned to each respective sub-channel; and
transmit the expectation value of each feature over the respective sub-channel in accordance with the respective transmission scheme.

11. The apparatus of claim 10 wherein the transmitter is further configured to:
generate a control message or header indicating the selected transmission scheme and assigned sub-channel for each feature; and
transmit the control message or header.

12. The apparatus of claim 9, wherein a first feature having a first variance value and a second feature having a second variance value similar to the first variance value are assigned to the same sub-channel for transmission.

13. A method for managing a plurality of sensors monitoring a common subject, each sensor generating and transmitting a respective set of features representing one or more aspects of the subject, the method comprising:
determining a correlated feature that is highly correlated between a first set of features generated by a first sensor and a second set of features generated by a second sensor;
generating a control message to the first sensor to cause the first sensor to omit the correlated feature from transmission; and
reconstructing the first set of features from a transmission from the first sensor by filling in the omitted correlated feature.

14. The method of claim 13, wherein the first set of features is reconstructed by copying the correlated feature from the second set of features received from the second sensor.

15. The method of claim 13, further comprising:
determining that the correlated feature is a background feature that is unchanged over a predetermined time period; and
wherein the first set of features is reconstructed by copying the correlated feature from a historical database containing a previously transmitted instance of the background feature.

16. The method of claim 15, wherein a same or different control message is generated to cause the first sensor and the second sensor to omit the background feature from transmission.

17. The method of claim 13, further comprising:
determining a requested set of features that is requested by an application, wherein the requested set of features is a subset of the first set of features;
generating the control message to the first sensor to cause the first sensor to transmit only the subset of features; and
reconstructing the first set of features from a transmission from the first sensor by filling in untransmitted features with random values.

18. The method of claim 13, wherein a same or different control message is generated to cause the first sensor and the second sensor to alternately transmit or omit the correlated feature.

19. The method of claim 13, wherein all features in the first set of features are highly correlated with the second set of features, and wherein the control message causes the first sensor to enter a sleep mode.

20. The method of claim 13, wherein the first sensor has a poorer physical layer transmission performance than the second sensor.
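
By way of illustration only, the steps recited in claims 13 and 14 may be sketched in Python as follows. The claims do not specify how correlation is measured or what a control message contains; this sketch assumes a Pearson correlation computed over a short history of feature vectors, an assumed threshold of 0.95, and a dictionary-based control message. It further assumes that highly correlated features are close enough in value for a direct copy to be an acceptable fill. All function names, field names, and data values are hypothetical and do not limit the claims.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def find_correlated(history_a, history_b, threshold=0.95):
    """Indices of features whose histories are highly correlated across two sensors.

    Each history is a list of feature vectors, one per time step.
    The threshold is an assumed value, not taken from the claims.
    """
    correlated = []
    for i in range(len(history_a[0])):
        col_a = [vec[i] for vec in history_a]
        col_b = [vec[i] for vec in history_b]
        if pearson(col_a, col_b) >= threshold:
            correlated.append(i)
    return correlated


def make_control_message(sensor_id, omit):
    """Hypothetical control message telling a sensor which feature indices to omit."""
    return {"sensor": sensor_id, "omit": sorted(omit)}


def sensor_transmit(features, control):
    """Sensor side: send only the features that the control message does not omit."""
    omit = set(control["omit"])
    return {i: f for i, f in enumerate(features) if i not in omit}


def reconstruct(received, reference, n_features):
    """Receiver side: fill each omitted feature by copying from the second sensor's set."""
    return [received[i] if i in received else reference[i] for i in range(n_features)]


# Made-up feature histories for two sensors observing the same subject.
history_1 = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0]]
history_2 = [[1.1, 0.0], [2.1, 1.0], [2.9, -1.0]]

correlated = find_correlated(history_1, history_2)
control = make_control_message("sensor_1", correlated)

# A later time step: the first sensor omits the correlated feature.
sent = sensor_transmit([4.0, 2.5], control)
rebuilt = reconstruct(sent, reference=[4.1, 0.5], n_features=2)
print(correlated, sent, rebuilt)
```

In this sketch only the first feature is found to be correlated, so the first sensor transmits the second feature alone and the receiver restores the first feature from the second sensor's transmission. The variant of claim 15 would differ only in the source of the copy, taking the value from a stored, previously transmitted instance rather than from the second sensor.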