Learning From Rewards In Text Generation