An interaction server 120 on the datacenter 116 side can translate the user interactions into input gestures compatible with the remote device 112 and provide the translated gestures to the remote device 112, which can in turn pass them to the software 103. A screen capture application 124 can capture a video feed of the display of the remote device 112 and transmit the video feed via the video communication channel to the ULB 105. The ULB 105, or an application running on the ULB 105, can use the video feed to generate the replica display. The user interacts with the replica display, and no parallel processes of the software 103 are run on the ULB 105 or on the local machine on which the ULB 105 is installed. The RDA 111 can include additional components, such as a media capture module 126, which can assist in capturing user input media, such as voice, in instances where the user may wish to inject or test media input to the software 103.
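
By way of non-limiting illustration, the translation performed by the interaction server 120 might be sketched as follows. The sketch assumes that user interactions arrive as pointer events in replica-display pixel coordinates and that the remote device 112 accepts touch gestures in its own native pixel coordinates; neither assumption is stated above. The names `Size`, `PointerEvent`, `Gesture`, and `translate` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: int
    height: int


@dataclass(frozen=True)
class PointerEvent:
    # Hypothetical event captured on the replica display (replica pixels).
    kind: str  # "down", "move", or "up"
    x: float
    y: float


@dataclass(frozen=True)
class Gesture:
    # Hypothetical gesture in the remote device's native pixel coordinates.
    action: str  # "touch_down", "touch_move", or "touch_up"
    x: int
    y: int


# Assumed mapping from replica-side event kinds to device-side actions.
_ACTIONS = {"down": "touch_down", "move": "touch_move", "up": "touch_up"}


def translate(event: PointerEvent, replica: Size, device: Size) -> Gesture:
    """Scale a replica-display event into remote-device coordinates."""
    if event.kind not in _ACTIONS:
        raise ValueError(f"unsupported event kind: {event.kind}")
    # Scale each axis independently by the device-to-replica size ratio.
    x = round(event.x * device.width / replica.width)
    y = round(event.y * device.height / replica.height)
    # Clamp so that events on the replica's edge stay on the device screen.
    x = min(max(x, 0), device.width - 1)
    y = min(max(y, 0), device.height - 1)
    return Gesture(_ACTIONS[event.kind], x, y)
```

In this sketch, a press at the center of a 360 by 640 replica display maps to the center of a 1080 by 1920 device screen. Only the translated gesture is forwarded to the remote device 112, consistent with the software 103 not running on the ULB 105 side.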