In an example, a method for facilitating participation in a conference by a client device comprises: receiving a digital video captured by a camera; receiving a computer-readable audio signal captured by a microphone; operating a face identification machine to recognize a face of a local conference participant in the digital video; operating a speech recognition machine to translate the computer-readable audio signal to text; operating an attribution machine to attribute the text to the local conference participant recognized by the face identification machine; sending, to a conference server device, the text attributed to the local conference participant; receiving, from the conference server device, a running transcript of the conference including the text attributed to the local conference participant, and further including different text attributed to a remote conference participant; and displaying, in real time, new text added to the running transcript and attribution for the new text.
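
As a non-limiting illustration, the client-side flow of the above method may be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `Utterance`, `ConferenceServer`, `ClientDevice`, `attribute`, and `demo` are hypothetical, the face identification machine and speech recognition machine are represented by caller-supplied functions rather than real models, and the conference server device is replaced by an in-memory stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Utterance:
    """Text attributed to a conference participant (hypothetical type)."""
    speaker: str
    text: str


class ConferenceServer:
    """Hypothetical in-memory stand-in for the conference server device."""

    def __init__(self) -> None:
        self._transcript: List[Utterance] = []

    def submit(self, utterance: Utterance) -> None:
        # Append attributed text from any client, local or remote.
        self._transcript.append(utterance)

    def running_transcript(self) -> List[Utterance]:
        # Return a copy so clients cannot mutate server state.
        return list(self._transcript)


def attribute(speaker: str, text: str) -> Utterance:
    """Stand-in attribution machine: pair recognized text with a recognized face."""
    return Utterance(speaker=speaker, text=text)


class ClientDevice:
    def __init__(
        self,
        server: ConferenceServer,
        identify_face: Callable[[bytes], Optional[str]],  # assumed stub
        recognize_speech: Callable[[bytes], str],         # assumed stub
        display: Callable[[str], None],
    ) -> None:
        self.server = server
        self.identify_face = identify_face
        self.recognize_speech = recognize_speech
        self.display = display
        self._shown = 0  # number of transcript entries already displayed

    def process(self, video_frame: bytes, audio_signal: bytes) -> None:
        # Recognize the local participant and translate the audio to text.
        speaker = self.identify_face(video_frame)
        text = self.recognize_speech(audio_signal)
        if speaker is None or not text:
            return  # nothing to attribute
        # Send the attributed text to the server, then show any new entries.
        self.server.submit(attribute(speaker, text))
        self.refresh()

    def refresh(self) -> List[Utterance]:
        # Display only entries added since the last refresh.
        transcript = self.server.running_transcript()
        new_entries = transcript[self._shown:]
        for entry in new_entries:
            self.display(f"{entry.speaker}: {entry.text}")
        self._shown = len(transcript)
        return new_entries


def demo() -> List[str]:
    shown: List[str] = []
    server = ConferenceServer()
    client = ClientDevice(
        server,
        identify_face=lambda frame: "Alice" if frame == b"alice-frame" else None,
        recognize_speech=lambda audio: audio.decode("utf-8"),
        display=shown.append,
    )
    client.process(b"alice-frame", b"hello everyone")  # local participant
    server.submit(Utterance("Bob", "hi Alice"))        # remote participant
    client.refresh()
    return shown
```

In this sketch the client tracks how many transcript entries it has already displayed, so each refresh shows only the new text together with its attribution. An actual client device may instead receive incremental updates pushed by the conference server device.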