Analyze video to describe actions and transcribe audio
Interact with videos
Generate images from text descriptions