Posts by Collection

portfolio

AI-Powered Operator Network Operations Automation

Published: September 01, 2021

Developed AI-based automation systems for operator network operations, improving efficiency and reducing manual intervention in network management.

AI-Based Base Station Error Handling and Diagnosis

Published: March 01, 2022

Developed intelligent systems for automated base station error detection, diagnosis, and resolution, reducing downtime and maintenance costs.

Road Detection and Autonomous Driving

Published: June 01, 2023

Research and development on road detection and autonomous driving systems, focusing on computer vision and perception for autonomous vehicles.

Robot Reinforcement Learning

Published: September 01, 2023

Research on reinforcement learning for robotic control and manipulation, developing intelligent agents for robotic tasks.

Text-to-Scene Generation

Published: March 01, 2024

Generating 3D scenes from natural language descriptions, bridging the gap between language understanding and 3D scene representation.

MLLM-MCTS-CoT: Monte Carlo Tree Search for Multimodal Reasoning

Published: November 26, 2024

Combining Monte Carlo Tree Search (MCTS) with Chain-of-Thought (CoT) prompting to strengthen reasoning in multimodal large language models.

Multimodal Large Language Model Chain-of-Thought

Published: December 04, 2024

Research on Chain-of-Thought (CoT) reasoning for multimodal large language models, improving reasoning capabilities in vision-language tasks.

Automated Evaluation Framework Improvements (lmms-eval)

Published: January 01, 2025

Improving automated evaluation frameworks based on lmms-eval for comprehensive assessment of multimodal and language models.

openRLHF and lightRLHF Framework Improvements

Published: January 01, 2025

Algorithmic improvements to the openRLHF reinforcement learning framework, including lightRLHF, a lightweight variant built on those improvements.

veRL Framework Improvements

Published: March 01, 2025

Framework integration and improvements for veRL (Volcano Engine Reinforcement Learning), a flexible, efficient and production-ready RL training library for large language models.

Research MCP Servers

Published: August 01, 2025

A collection of Model Context Protocol (MCP) servers for research workflows, including arXiv integration and other research tools.

Multi-Agent Research Assistant

Published: August 14, 2025

An intelligent multi-agent system for research assistance, enabling collaborative problem-solving among multiple AI agents.

Multi-round Reinforcement Learning and Self-Evolving Agents

Published: November 01, 2025

Exploring multi-round reinforcement learning and self-evolving agent architectures for developing adaptive and continuously improving AI systems.

publications

Paper Title Number 1

Published in Journal 1, 2009

This paper is about the number 1. The number 2 is left for future work.

Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1).
Download Paper | Download Slides | Download BibTeX

Paper Title Number 2

Published in Journal 1, 2010

This paper is about the number 2. The number 3 is left for future work.

Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2).
Download Paper | Download Slides

Paper Title Number 3

Published in Journal 1, 2015

This paper is about the number 3. The number 4 is left for future work.

Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3).
Download Paper | Download Slides

Paper Title Number 4

Published in GitHub Journal of Bugs, 2024

This paper is about fixing template issue #693.

Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper

Paper Title Number 5, with math \(E=mc^2\)

Published in GitHub Journal of Bugs, 2024

This paper is about a famous math equation, \(E=mc^2\).

Recommended citation: Your Name, You. (2024). "Paper Title Number 5, with math \(E=mc^2\)." GitHub Journal of Bugs. 1(3).
Download Paper

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Published in arXiv, 2025

We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training, supported by a suite of multi-principled verifiers.

Recommended citation: Shanghai AI Lab et al. (2025). "SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law." arXiv preprint arXiv:2507.18576.
Download Paper

Large Language Model Reinforcement Learning Algorithm Optimization

Under review, 2025

A novel approach for optimizing reinforcement learning algorithms in large language model training, focusing on improving training efficiency and stability.

Download Paper

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015