[LM] Challenging BIG-Bench tasks


Chain-of-thought (CoT) prompting was already available when BIG-Bench first came out, but it did not perform well on small-scale models (the emergent effect did not appear), so BIG-Bench did not mention using CoT. After that, larger-scale models such as PaLM, Davinci-002, and code-davinci-002 appeared, so there was a motivation to verify the effect of CoT on BIG-Bench with these models as the new baseline. Sure enough, CoT does perform better on many tasks.

More information here