where $\mathcal{L}_c$ denotes the overall optimization objective for both embedding and clustering, $\gamma$ denotes a hyperparameter that sets the weight coefficient of the cluster cost, and $\mu_c$ denotes the mean of the $c$-th cluster. Note that the cluster means $\mu_c$ are trainable parameters.
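To make the roles of these terms concrete, the following is a minimal sketch, not the authors' implementation. It assumes, hypothetically, that the cluster cost is a k-means-style sum of squared distances from each embedding to its nearest cluster mean, that the embedding loss is supplied as a precomputed number, and that the means are updated by plain gradient descent with assignments held fixed. The function names (`cluster_cost`, `overall_loss`, `step_means`) are illustrative only.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(embeddings: Sequence[Vector], means: Sequence[Vector]) -> List[int]:
    """Hard-assign each embedding to the index of its nearest cluster mean."""
    return [
        min(range(len(means)), key=lambda c: squared_distance(z, means[c]))
        for z in embeddings
    ]


def cluster_cost(embeddings: Sequence[Vector], means: Sequence[Vector]) -> float:
    """Assumed k-means-style cost: sum of squared distances to assigned means."""
    labels = assign(embeddings, means)
    return sum(squared_distance(z, means[c]) for z, c in zip(embeddings, labels))


def overall_loss(embedding_loss: float, embeddings: Sequence[Vector],
                 means: Sequence[Vector], gamma: float) -> float:
    """Hypothetical overall objective: embedding loss + gamma * cluster cost."""
    return embedding_loss + gamma * cluster_cost(embeddings, means)


def step_means(embeddings: Sequence[Vector], means: Sequence[Vector],
               gamma: float, lr: float) -> List[Vector]:
    """One gradient-descent step on gamma * cluster cost w.r.t. the means.

    For cluster c, the gradient of gamma * sum_{i in c} ||z_i - mu_c||^2
    with respect to mu_c is -2 * gamma * sum_{i in c} (z_i - mu_c),
    with the cluster assignments held fixed during the step.
    """
    labels = assign(embeddings, means)
    new_means: List[Vector] = []
    for c, mu in enumerate(means):
        grad = [0.0] * len(mu)
        for z, label in zip(embeddings, labels):
            if label == c:
                for d in range(len(mu)):
                    grad[d] += -2.0 * gamma * (z[d] - mu[d])
        new_means.append([m - lr * g for m, g in zip(mu, grad)])
    return new_means


if __name__ == "__main__":
    Z = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
    mu = [[0.0, 0.0], [10.0, 10.0]]
    print(overall_loss(0.5, Z, mu, gamma=0.5))       # 0.5 + 0.5 * 4.0 = 2.5
    print(step_means(Z, mu, gamma=0.5, lr=0.25))     # first mean moves toward [1.0, 0.0]
```

Because the means appear inside the cluster term, gradient steps move each mean toward the embeddings assigned to it, which is what it means for them to be trainable; a larger weight on the cluster term makes that pull stronger relative to the embedding loss. In a full model the embeddings would also receive gradients from both terms, which this sketch omits.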