Recently, Professor Hong Liang's research group from the Institute of Natural Sciences, School of Physics and Astronomy, Zhangjiang Advanced Research Institute, and School of Pharmacy at Shanghai Jiao Tong University, in collaboration with Tan Pan, a junior researcher at the Shanghai Artificial Intelligence Laboratory, unveiled VenusMine, a new addition to the Venus series of large protein models, designed for enzyme discovery. The model combines a large-scale protein language model with three-dimensional structural analysis. By exploiting the implicit relationships that link protein sequence, structure, and function, it efficiently identifies enzymes that share little homology with known ones yet perform better, even within vast protein databases. Using this model, the team identified a series of PET hydrolases. Among them, KbPETase from Kibdelosporangium banguiense showed exceptionally high catalytic efficiency and thermal stability, with an optimal enzyme activity 97 times that of its template, IsPETase. The findings, titled ‘Harnessing Protein Language Model for Structure-Based Discovery of Highly Efficient and Robust PET Hydrolases’, were published in Nature Communications, a Springer Nature journal.
Research Background
Plastic pollution has become a global environmental challenge, and polyethylene terephthalate (PET) has drawn particular attention because it is both widely used and highly resistant to degradation. Conventional mechanical and chemical recycling methods suffer from low efficiency and cause environmental contamination. Against this backdrop, enzymatic hydrolysis is regarded as the most promising route to PET degradation because it is green and efficient. However, known natural PET hydrolases (PETases) generally show low activity and poor thermal stability, which severely limits their industrial use and creates an urgent need for high-performance PETases.
Conventional enzyme discovery methods (such as BLAST) rely primarily on sequence similarity. This approach can only find enzymes that are highly homologous to known ones, so it systematically overlooks the many ‘hidden gems’ that have low sequence similarity but similar functions. Furthermore, 99% of microorganisms in nature remain uncultured, leaving a vast, untapped reservoir of potentially high-quality enzymes with unknown functions. More critically, sequence similarity does not imply functional similarity: enzymes whose sequences have diverged substantially may still share similar three-dimensional structures and catalytic mechanisms.
Recent breakthroughs in structural biology and artificial intelligence have made large-scale, high-accuracy structure prediction possible through tools such as AlphaFold, while structure alignment algorithms such as Foldseek enable rapid three-dimensional similarity searches. Emerging multimodal protein language-structure models (e.g., SaProt, ProSST, VenusREM) can capture deep correlations between sequence and structure. There is therefore a pressing need for new enzyme mining paradigms built on these cutting-edge AI methods, so that superior wild-type PETases can be discovered to tackle plastic pollution.
Research Methodology
To address this, the paper introduces VenusMine, an innovative enzyme mining approach built on large protein models. The method first uses these models to extract high-dimensional features that encode key biological information. It then carries out end-to-end screening through hierarchical clustering, sequence deduplication, physicochemical property prediction, and expressibility assessment. Following a ‘structure-first’ strategy, it pinpoints candidates within vast datasets, identifying high-performance proteins that have low sequence similarity to known enzymes but similar functions, so that only a small number need wet-lab validation. VenusMine offers broad coverage, high efficiency, and high-quality candidate proteins. Its use extends beyond PETases: the team has already applied it to discover nearly 10 novel enzymes efficiently, all validated by wet-lab experiments.
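The clustering and deduplication steps can be pictured with a minimal Python sketch. It is not VenusMine's code: it assumes each candidate already carries an embedding vector from some protein language model (tiny made-up vectors here), uses single-linkage clustering cut at a cosine-distance threshold as a simple stand-in for the hierarchical clustering described above, and keeps one representative per cluster. The names `Candidate`, `deduplicate`, `cluster_candidates`, `pick_representatives`, and the 0.2 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sequence: str
    embedding: list[float]  # hypothetical feature vector from a protein language model


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; a zero vector is treated as unrelated (1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def deduplicate(cands: list[Candidate]) -> list[Candidate]:
    """Drop candidates whose sequence exactly matches an earlier one."""
    seen: set[str] = set()
    unique = []
    for c in cands:
        if c.sequence not in seen:
            seen.add(c.sequence)
            unique.append(c)
    return unique


def cluster_candidates(cands: list[Candidate], threshold: float) -> list[list[Candidate]]:
    """Single-linkage clustering cut at `threshold`.

    Equivalent to the connected components of the graph that links every
    pair with cosine distance <= threshold, computed with union-find.
    """
    parent = list(range(len(cands)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(cands)):
        for j in range(i + 1, len(cands)):
            if cosine_distance(cands[i].embedding, cands[j].embedding) <= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, list[Candidate]] = {}
    for i, c in enumerate(cands):
        clusters.setdefault(find(i), []).append(c)
    return list(clusters.values())


def pick_representatives(cands: list[Candidate], threshold: float = 0.2) -> list[str]:
    """Deduplicate, cluster, and keep the first member of each cluster."""
    clusters = cluster_candidates(deduplicate(cands), threshold)
    return sorted(cluster[0].name for cluster in clusters)


if __name__ == "__main__":
    pool = [
        Candidate("A", "MKT", [1.0, 0.0]),
        Candidate("B", "MKS", [0.98, 0.05]),  # embedding close to A
        Candidate("C", "MKT", [1.0, 0.0]),    # exact sequence duplicate of A
        Candidate("D", "GGH", [0.0, 1.0]),    # embedding far from A and B
    ]
    print(pick_representatives(pool))
```

Single linkage cut at a fixed threshold reduces to finding connected components, which is why union-find is enough here; a search over millions of entries would need approximate nearest-neighbour indexing rather than this all-pairs loop.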

Figure 1: VenusMine workflow and clustering results.
Research Findings
Using the IsPETase crystal structure as the query template, VenusMine searched high-dimensional features across its own database to recover as many candidate PET hydrolases as possible (33 million entries). Structural feature embedding and clustering with a protein language model, followed by multi-tier screening (including thermal stability prediction, solubility assessment, and structural alignment), narrowed these down to 34 high-potential candidate proteins. Without any task-specific training, VenusMine captured all currently known PETase sequences, demonstrating the comprehensiveness and reliability of the approach. In experimental validation, 26 of the candidates (76.5%) were expressed in soluble form, and 14 of these (53.8% of the expressed proteins) showed significant ester bond cleavage and PET degradation activity. All of these active proteins are reported as PET hydrolases for the first time, and most show catalytic properties comparable to those of the template enzyme IsPETase. These results strongly support the advantages of structure-based enzyme mining for functional prediction.
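The multi-tier screen and the reported success rates can be illustrated with a short Python sketch. The filter is hypothetical: the fields `predicted_tm`, `solubility`, and `tm_score` stand in for whatever thermal-stability, solubility, and structural-alignment predictors the pipeline actually uses, and the default thresholds are placeholders, not values from the paper. Only the counts 34, 26, and 14 come from the text.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    name: str
    predicted_tm: float  # hypothetical predicted melting temperature, °C
    solubility: float    # hypothetical solubility score in [0, 1]
    tm_score: float      # hypothetical structural similarity to the IsPETase query


def multi_tier_screen(cands: list[Scored], min_tm: float = 60.0,
                      min_solubility: float = 0.5,
                      min_tm_score: float = 0.5) -> list[Scored]:
    """Apply each tier in turn; a candidate must pass every tier to survive."""
    tiers = [
        lambda c: c.predicted_tm >= min_tm,        # thermal stability tier
        lambda c: c.solubility >= min_solubility,  # solubility tier
        lambda c: c.tm_score >= min_tm_score,      # structural alignment tier
    ]
    survivors = list(cands)
    for passes in tiers:
        survivors = [c for c in survivors if passes(c)]
    return survivors


def rate(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Counts reported in the text.
    tested, expressed, active = 34, 26, 14
    print(rate(expressed, tested))  # 76.5
    print(rate(active, expressed))  # 53.8
    print(rate(active, tested))     # 41.2
```

The same counts imply an overall hit rate of 14 out of 34 tested candidates, roughly 41.2%.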

Figure 2: Performance characterisation of candidate proteins.
Among the proteins examined, KbPETase from Kibdelosporangium banguiense performed best on every metric. Its melting temperature (Tm) of 77.58 °C indicates high thermal stability. It also shows outstanding catalytic activity: its optimal PET film degradation activity (at 50 °C) is 97 times that of the template enzyme IsPETase at IsPETase's own optimal temperature (30 °C). Even compared with LCC, another high-performance wild-type PET hydrolase, its optimal enzyme activity is 1.5 times that of LCC. Moreover, the end product terephthalic acid (TPA) makes up a significantly larger share of the degradation products. This should simplify downstream separation and purification, reduce post-processing costs, and improve the enzyme's prospects for industrial application.
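Enzymatic PET hydrolysis typically releases a mixture of terephthalic acid (TPA), mono(2-hydroxyethyl) terephthalate (MHET), and bis(2-hydroxyethyl) terephthalate (BHET), so the ‘proportion of TPA’ can be read as TPA's molar share of the measured products. A minimal Python sketch, assuming product concentrations in µM from some product assay; the function name `tpa_fraction` and all concentrations are hypothetical, not data from the paper.

```python
def tpa_fraction(products_um: dict[str, float]) -> float:
    """Return TPA as a fraction of all measured aromatic hydrolysis products.

    `products_um` maps product name to concentration in µM; the usual keys
    are "TPA", "MHET", and "BHET". A missing "TPA" key counts as zero.
    """
    total = sum(products_um.values())
    if total <= 0:
        raise ValueError("no hydrolysis products measured")
    return products_um.get("TPA", 0.0) / total


if __name__ == "__main__":
    # Made-up concentrations for two hypothetical enzymes, not measured data.
    enzyme_a = {"TPA": 600.0, "MHET": 300.0, "BHET": 100.0}
    enzyme_b = {"TPA": 200.0, "MHET": 700.0, "BHET": 100.0}
    print(tpa_fraction(enzyme_a), tpa_fraction(enzyme_b))
```

A higher value means less MHET and BHET left to separate or convert further, which is the downstream benefit the paragraph describes.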

Figure 3: Performance characterisation of KbPETase.
Summary
VenusMine, the newest member of the Venus family of large protein models developed by Shanghai Jiao Tong University and the Shanghai Artificial Intelligence Laboratory, is an innovative protein discovery tool built on protein language models. It gives biologists and protein engineering researchers unprecedented opportunities to uncover functionally superior yet low-homology ‘hidden gems’ within vast databases of proteins of unknown function. Together with the team's earlier release of VenusPod, a database of 9 billion proteins, this work is poised to significantly advance research in synthetic biology and biopharmaceuticals.
The study's co-first authors are Wu Banghao, a doctoral candidate at Shanghai Jiao Tong University's School of Life Sciences and Technology, and Zhong Bozitao, a pre-doctoral researcher at the same institution. The corresponding authors include Professor Hong Liang of the university's Institute of Natural Sciences, School of Physics and Astronomy, Zhangjiang Advanced Research Institute, and School of Pharmacy; Dr Zheng Lirong, a postdoctoral researcher at the University of Michigan's Institute for Neuroscience and Institute for Cellular and Developmental Biology; and Tan Pan, a junior researcher at the Shanghai Artificial Intelligence Laboratory. The research was funded by the Shanghai Major Science and Technology Special Project, the National Key R&D Programme (2024YFA0917603), the Shanghai Municipal Science and Technology Commission Computational Biology Project (23JS1400600), the Shanghai Municipal Education Commission Scientific Research Programme (2024AIZD015), the Shanghai Jiao Tong University Science and Technology Innovation Fund (21X010200843), and the Chongqing Key R&D Programme for Technological Innovation and Application Development (CSTB2022TIAD-STX0017, CSTB2024TIAD-STX0032), and was supported by the Shanghai Jiao Tong University Student Innovation Centre, the Shanghai Artificial Intelligence Laboratory, the Shanghai Jiao Tong University High Performance Computing Centre, and the Shanghai Jiao Tong University Analytical Testing Centre.
Paper link: https://doi.org/10.1038/s41467-025-61599-z