Qi, Zhenting, Hong Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, and James Glass. "Quantifying Generalization Complexity for Large Language Models." Proceedings of ...
ARC AGI 3 shows the AGI gap clearly: humans reach 100% accuracy while models like CjatGPT 5.4 and Gemini 3.1 Pro score under ...