FRACTURED-SORRY-Bench is a framework for evaluating the safety of Large Language Models (LLMs) against multi-turn conversational attacks. Building upon the SORRY-Bench dataset, we propose a simple yet ...
Abstract: Mining source code for vulnerabilities is crucial to improving the security of software systems, but current research focuses mostly on the C language, with little ...
Can the U.S. men repeat what the U.S. women just accomplished? The U.S. men's hockey team is facing Canada in the gold medal game of the 2026 Winter Olympics on Sunday, Feb. 22, at Santagiulia Ice ...
Abstract: In this article, we present BenchING, a new benchmark for evaluating large language models (LLMs) on their ability to follow structured output format instructions in text-based procedural ...