Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Portfolio
AI-Powered Operator Network Operations Automation
Published:
Developed AI-based automation systems for operator network operations, improving efficiency and reducing manual intervention in network management.
AI-Based Base Station Error Handling and Diagnosis
Published:
Developed intelligent systems for automated base station error detection, diagnosis, and resolution, reducing downtime and maintenance costs.
Road Detection and Autonomous Driving
Published:
Research and development on road detection and autonomous driving systems, focusing on computer vision and perception for autonomous vehicles.
Robot Reinforcement Learning
Published:
Research on reinforcement learning for robotic control and manipulation, developing intelligent agents for robotic tasks.
Text-to-Scene Generation
Published:
Generating 3D scenes from natural language descriptions, bridging the gap between language understanding and 3D scene representation.
MLLM-MCTS-CoT: Monte Carlo Tree Search for Multimodal Reasoning
Published:
Combining Monte Carlo Tree Search (MCTS) with Chain-of-Thought reasoning for enhanced reasoning in multimodal large language models.
Multimodal Large Language Model Chain-of-Thought
Published:
Research on Chain-of-Thought (CoT) reasoning for multimodal large language models, improving reasoning capabilities in vision-language tasks.
Automated Evaluation Framework Improvements (lmms-eval)
Published:
Improving automated evaluation frameworks based on lmms-eval for comprehensive assessment of multimodal and language models.
OpenRLHF and lightRLHF Framework Improvements
Published:
Algorithm improvements for the OpenRLHF reinforcement learning framework, including lightRLHF, a lightweight variant built on those improvements.
veRL Framework Improvements
Published:
Framework integration and improvements for veRL (Volcano Engine Reinforcement Learning), a flexible, efficient and production-ready RL training library for large language models.
Research MCP Servers
Published:
A collection of Model Context Protocol (MCP) servers for research workflows, including arXiv integration and other research tools.
Multi-Agent Research Assistant
Published:
An intelligent multi-agent system for research assistance, enabling collaborative problem-solving among multiple AI agents.
Multi-round Reinforcement Learning and Self-Evolving Agents
Published:
Exploring multi-round reinforcement learning and self-evolving agent architectures for developing adaptive and continuously improving AI systems.
Agent Harness: Long-Term Memory for Adaptive AI Agents
Published:
A memory-augmented agent harness framework that enables AI agents to continuously learn and adapt through long-term episodic and semantic memory mechanisms.
Self-Evolving Multi-Agent Systems with Hierarchical Memory
Published:
A self-evolving multi-agent framework that enables agents to autonomously improve strategies through multi-round interactions and hierarchical memory structures.
Publications
Collaborative Multi-Agent Reinforcement Learning for Complex Task Solving
Under review, 2025
We propose a collaborative multi-agent reinforcement learning framework that enables agents to efficiently coordinate and solve complex tasks through emergent communication and adaptive role assignment.
Harnessing Long-Term Memory for Adaptive AI Agents
Under review, 2025
We present a memory-augmented agent architecture that harnesses long-term episodic and semantic memory to enable adaptive behavior and continual learning in dynamic environments.
Self-Evolving Multi-Agent Systems with Hierarchical Memory
Under review, 2025
We propose a self-evolving multi-agent framework with hierarchical memory structures that enables agents to continuously improve their strategies through multi-round interactions and experience replay.
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law
Published in arXiv, 2025
We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training, supported by a suite of multi-principled verifiers.
Recommended citation: Shanghai AI Lab et al. (2025). "SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law." arXiv preprint arXiv:2507.18576.
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Published in ICLR 2026 Poster, 2026
We propose a novel approach for training language models to reason on unverifiable data, enabling native reasoning capabilities without requiring ground-truth supervision.
Recommended citation: Wang, Y., Liu, Z., Li, X., Lu, C., & Yang, C. (2026). "Native Reasoning Models: Training Language Models to Reason on Unverifiable Data." ICLR 2026 Poster.
TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems
Published in arXiv, 2026
We present TrinityGuard, a unified safety framework for multi-agent systems that ensures robust and trustworthy coordination among AI agents through multi-layered safeguarding mechanisms.
Reflector: Internalizing Step-wise Reflection against Indirect Jailbreaks
Published in ICML 2026, 2026
We propose Reflector, a framework that internalizes step-wise reflection mechanisms to defend against indirect jailbreak attacks on large language models.
Recommended citation: Ma, J., Zhang, J., Li, X., Zou, B., Lu, C., & Yang, C. (2026). "REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak." ICML 2026.
Talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk. It is a markdown file, so it can be formatted with markdown like any other post.
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk. Note the different value in the type field; you can put anything in this field.
Teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.
