Reflector: Internalizing Step-wise Reflection against Indirect Jailbreaks
Ma, J., Zhang, J., Li, X., Zou, B., Lu, C., & Yang, C. (2026). "REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak." ICML 2026.
Wang, Y., Liu, Z., Li, X., Lu, C., & Yang, C. (2026). "Native Reasoning Models: Training Language Models to Reason on Unverifiable Data." ICLR 2026 Poster.
Shanghai AI Lab et al. (2025). "SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law." arXiv preprint arXiv:2507.18576.
Conference proceedings talk at Testing Institute of America 2014 Annual Conference, Los Angeles, CA, USA
Talk at London School of Testing, London, UK
Tutorial at UC-Berkeley Institute for Testing Science, Berkeley, CA, USA
Talk at UC San Francisco, Department of Testing, San Francisco, CA, USA