Evaluating large language models trained on code