I. Flocking Theory: Collective Behaviors in Active and Living Systems
It is well known in physics that collective behavior can emerge in systems with only local interactions, e.g., the long-range magnetic order described by the Heisenberg model. Such collective behaviors are also observed in living systems, such as fish schools, bird flocks, and bacterial swarms. The key difference between these biological (or active) systems and their physical counterparts is that each individual is endowed with its own energy source that allows it to move, i.e., the individuals are self-propelled. Motivated by results from Tamas Vicsek and his coworkers using a discrete model (the Vicsek model), John Toner (U. Oregon) and I developed a hydrodynamic model to describe the large-scale collective behaviors of systems of self-propelled individuals in a series of three papers published during 1995-1998. We discovered that, due to the non-equilibrium nature of self-propelled systems, long-range order is preserved even in two dimensions, evading the well-known Mermin-Wagner theorem, which is valid only for equilibrium systems. Other important collective behaviors, such as anomalous density fluctuations and anisotropic sound waves, were also studied using the hydrodynamic model, which is now called the Toner-Tu equation. For our work in developing the flocking theory, Tamas Vicsek, John Toner and I were awarded the 2020 Lars Onsager Prize from the American Physical Society (APS): "For seminal work on the theory of flocking that marked the birth and contributed greatly to the development of the field of active matter."
Our work on flocking theory helped open a new field, the physics of active matter, which has been applied to study living systems such as cytoskeletal and spindle dynamics, collective behaviors in bacterial swarming, and wound healing.
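For concreteness, the discrete Vicsek model referred to above can be sketched in a few lines of code. This is a minimal illustrative variant (the noise form and all parameter values are chosen for illustration, not taken from the original papers): each particle moves at constant speed and, at every step, adopts the average heading of its neighbors plus angular noise.

```python
import numpy as np

def vicsek_step(pos, theta, L, v=0.3, r=1.0, eta=0.2, rng=None):
    """One update of a 2D Vicsek-type model with periodic boundaries:
    each particle adopts the mean heading of all neighbors within
    radius r (itself included), plus uniform angular noise."""
    rng = rng or np.random.default_rng()
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)                      # minimum-image distances
    nb = ((d ** 2).sum(-1) < r ** 2).astype(float)
    theta = np.arctan2(nb @ np.sin(theta), nb @ np.cos(theta))
    theta += eta * np.pi * (2 * rng.random(len(theta)) - 1)
    vel = v * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return (pos + vel) % L, theta

rng = np.random.default_rng(0)
N, L = 300, 10.0
pos, theta = rng.random((N, 2)) * L, rng.random(N) * 2 * np.pi
for _ in range(400):
    pos, theta = vicsek_step(pos, theta, L, rng=rng)
# polar order parameter: near 1 for a coherent flock, near 0 for disorder
phi = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
print(f"polar order parameter = {phi:.2f}")
```

At low noise and moderate density the particles spontaneously pick a common direction, the ordered "flocking" phase whose large-scale fluctuations the hydrodynamic theory describes.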
• "Long Range Order in a 2-dimensional Dynamical XY Model: How Birds Fly Together", J. Toner and Yuhai Tu, Phys. Rev. Lett. (PRL), 75(23), 4326 (1995). Flocking Theory I
• “Flocks, herds, and schools: A quantitative theory of flocking”, J. Toner, Yuhai Tu, Phys. Rev. E, 58(4), 4828 (1998). Flocking Theory II
• “Sound Waves and the Absence of Galilean Invariance in Flocks”, Yuhai Tu, J. Toner and M. Ulm, Phys. Rev. Lett. (PRL), 80(21), 4819 (1998). Flocking Theory III
• “Hydrodynamics and phases of flocks”, John Toner, Yuhai Tu, Sriram Ramaswamy, Annals of Physics 318(1):170-244 (2005). Review
II. Quantitative Systems Biology: Bacterial Chemotaxis and Motility
Systems biology aims to understand biological systems in a holistic fashion based on the microscopic interactions of the individual molecules in the underlying system. So far, this goal has remained a dream in most cases. Since the early 2000s, in close collaboration with experimental biologists including the late Howard Berg and many of his former lab members, we have developed models to study all key aspects of the E. coli chemotaxis system, including sensing and signal processing, flagellar motor function, cellular motility, and population behaviors. Furthermore, we have combined these subsystems into a coherent multiscale modeling framework that allows us to gain a system-level understanding of bacterial chemotaxis and motility based on the relevant molecular mechanisms. Three aspects of our work are described below:
II.A. Chemotaxis signal transduction: sensory adaptation and signal amplification.
We developed a mechanistic model for robust adaptation in E. coli chemotaxis, which was verified by in vivo experiments with time-varying stimuli performed in the Berg lab. Our studies elucidated the molecular mechanism for the large signal gain in the chemoreceptor array. This work has led to a quantitative, predictive model of bacterial chemotaxis signaling based on molecular-level interactions.
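As a concrete illustration of this class of models, here is a minimal sketch combining an MWC-type activity function with a slow methylation variable that restores activity to its set point. The functional forms follow the standard two-state MWC picture described above; all parameter values are illustrative placeholders, not fitted numbers from the papers.

```python
import numpy as np

# Illustrative parameter values (orders of magnitude only, not fits)
N = 6                  # number of receptor dimers per MWC cluster
KI, KA = 18.0, 3000.0  # ligand dissociation constants, inactive/active (uM)
ALPHA, M0 = 1.7, 1.0   # methylation free-energy slope and offset (kT units)
A0 = 1.0 / 3.0         # adapted (steady-state) kinase activity
KM = 0.005             # slow methylation rate

def activity(m, ligand):
    """MWC kinase activity a = 1/(1 + exp(f)) with cluster free energy
    f = N * [alpha*(m0 - m) + ln((1 + L/KI)/(1 + L/KA))]."""
    f = N * (ALPHA * (M0 - m) + np.log((1 + ligand / KI) / (1 + ligand / KA)))
    return 1.0 / (1.0 + np.exp(f))

def simulate(ligand_series, m=1.0, dt=0.1):
    """Fast activity plus slow methylation dm/dt = KM*(A0 - a): after a
    stimulus, activity relaxes back to A0 (accurate adaptation)."""
    a_trace = np.empty(len(ligand_series))
    for i, ligand in enumerate(ligand_series):
        a = activity(m, ligand)
        m += dt * KM * (A0 - a)
        a_trace[i] = a
    return a_trace

# Pre-adapt at zero ligand, then apply a 10 uM attractant step
ligand = np.concatenate([np.zeros(20000), np.full(20000, 10.0)])
a = simulate(ligand)
print(a[19999], a[20000], a[-1])  # adapted, sharp drop, re-adapted
```

The cluster size N amplifies the response to the attractant step (signal gain), while the slow methylation feedback returns the activity to the same set point regardless of the background ligand level (robust adaptation).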
• “An allosteric model for heterogeneous receptor complexes: understanding bacterial chemotaxis responses to multiple stimuli”, B. Mello and Yuhai Tu, PNAS, 100(14), 8223-8228 (2003). Receptor Coupling: MWC Model
• “Modeling the chemotactic response of Escherichia coli to time-varying stimuli”, Yuhai Tu, T. S. Shimizu and Howard C. Berg, PNAS, 105(39), 14855-14860 (2008). The Standard Model for E. coli Chemotaxis
• “A modular gradient-sensing network for chemotaxis in E. coli revealed by responses to time-varying stimuli”, T. S. Shimizu, Yuhai Tu, and Howard C. Berg, Molecular Systems Biology 6:382 (2010). Experimental Verification of the Standard Model for E. coli Chemotaxis
• “Adapt locally and act globally: strategy to maintain high chemoreceptor sensitivity in complex environments”, G. Lan, S. Schulmeister, V. Sourjik, Yuhai Tu, Molecular Systems Biology 7:475 (2011). Ligand-Specific Adaptation in Mixed Receptor Arrays
• “Quantitative Modeling of Bacterial Chemotaxis: Signal Amplification and Accurate Adaptation”, Yuhai Tu, Ann. Rev. Biophys., 42:337-59 (2013). Review
II.B. Bacterial flagellar motor: motor mechanics, switching, and adaptation.
Based on cryo-EM data and single-molecule experiments, we developed a system-level model that revealed the molecular mechanisms underlying the mechanical properties of the bacterial flagellar motor (BFM), its ultrasensitive switching dynamics, and its mechano-adaptation. Our work provides a unified framework for studying the mechanical, chemical, and adaptive properties of the BFM complex.
• “Dynamics of the bacterial flagellar motor with multiple stators”, G. Meacci and Yuhai Tu, PNAS, 106(10), 3746-3751 (2009). BFM Dynamics with Multiple Stators
• “Design principles and thermodynamic limits for irreversible molecular motors”, Yuhai Tu and Yuansheng Cao, PRE 97 (2): 022403 (2018). BFM Efficiency
• “A multi-state dynamic process confers mechano-adaptation to a biological nanomachine”, Navish Wadhwa, Alberto Sassi, Howard C. Berg, and Yuhai Tu, Nature Communications, 13, 5327 (2022). BFM Adaptation
II.C. Population model of E. coli chemotaxis: from molecules to behaviors.
We have developed a population-level model of bacterial chemotaxis based on the molecular-level signaling pathway dynamics. The population model has been verified quantitatively by microfluidic experiments and can predict bacterial chemotaxis behaviors in natural environments that vary in space and time.
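A drastically simplified sketch of the population picture: one-dimensional run-and-tumble agents whose tumbling rate is modulated by the perceived rate of change of attractant concentration. This stand-in replaces the full pathway dynamics with a single response coefficient (chi below, a made-up parameter), but it reproduces the basic chemotactic drift up the gradient.

```python
import numpy as np

def run_and_tumble(n=500, steps=4000, dt=0.1, v=1.0, grad=0.05,
                   lam0=1.0, chi=5.0, seed=1):
    """1D run-and-tumble agents in a linear attractant gradient.
    The tumbling rate is lowered when an agent senses increasing
    concentration (dc/dt = v * direction * grad), a crude stand-in
    for the full chemotaxis pathway dynamics."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    direction = rng.choice([-1.0, 1.0], size=n)
    for _ in range(steps):
        dcdt = v * direction * grad
        lam = np.clip(lam0 * (1.0 - chi * dcdt), 0.05, 5.0)
        tumble = rng.random(n) < lam * dt
        direction = np.where(tumble, -direction, direction)  # 1D tumble = reversal
        x += v * direction * dt
    return x

x = run_and_tumble()
print(f"mean drift: {x.mean():.1f}")  # positive: net motion up the gradient
```

Because runs up the gradient last longer than runs down, the population drifts toward higher attractant even though each individual move is random.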
• “A Pathway-based Mean-field Model for Escherichia coli Chemotaxis”, G. Si, T. Wu, Q. Ouyang, Yuhai Tu, Phys. Rev. Lett. (PRL), 109, 048101 (2012). Pathway-based Mean-Field Theory for E. coli Chemotaxis
• “Behaviors and Strategies of Bacterial Navigation in Chemical and Nonchemical Gradients”, B. Hu, Yuhai Tu, PLoS Computational Biology 10 (6), e1003672 (2014). Precision Sensing vs Gradient Sensing
• “The escape band in Escherichia coli chemotaxis in opposing attractant and nutrient gradients”, X. Zhang, G. Si, Y. Dong, K. Chen, Q. Ouyang, C. Luo, Y. Tu, PNAS, 116(6):2253-2258 (2019). Escape Band in Opposing Gradients
III. Nonequilibrium Thermodynamics of Biological Networks
Most biological systems operate out of thermal equilibrium, and their functions (ultrasensitive switching, adaptation, accurate oscillation, etc.) are driven and maintained by continuous dissipation of chemical energy, e.g., by ATP hydrolysis and the proton motive force (PMF). Since the early 2000s, we have pioneered the study of cellular energetics in biochemical networks to elucidate the cost-performance relationship for different biological functions. In a seminal work published in 2008 (Tu, PNAS, 2008), we showed that the experimentally observed switching dynamics of the bacterial flagellar motor (BFM) indicate that the underlying switching mechanism must operate out of equilibrium, and that the ultrasensitivity, i.e., the high Hill coefficient of the response curve, is directly related to the amount of energy dissipated per switching event. Using the same approach, we studied the energy cost of sensory adaptation and discovered a universal “energy-speed-accuracy” (ESA) trade-off relation that is independent of the underlying biochemical network architecture. We have successfully applied this framework to study biochemical oscillations and the synchronization of molecular clocks (e.g., the KaiABC system in cyanobacteria), which provided unique insights into the energetics of these functions as well as their underlying mechanisms and efficient design principles. Recently, we have extended this nonequilibrium thermodynamics framework to study the energy cost of collective/emergent behaviors in spatially extended systems, such as pattern formation (e.g., Turing patterns) in reaction-diffusion systems and flocking systems with multiple interacting motile agents.
Overall, our work aims to lay the foundation for studying nonequilibrium thermodynamics of biochemical networks and provide a new perspective in systems biology research. The energy-performance trade-off relationship we discovered in various biochemical networks serves as a fundamental design principle for biological networks.
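The dissipation bookkeeping underlying this program can be illustrated on the simplest possible network, a three-state Markov cycle: the steady-state entropy production rate vanishes exactly when detailed balance holds and is positive when the cycle is driven. The rate values below are arbitrary illustrative numbers.

```python
import numpy as np

def steady_state(K):
    """Steady state of the master equation dp/dt = K p (columns of K sum to 0)."""
    w, v = np.linalg.eig(K)
    p = np.real(v[:, np.argmin(np.abs(w))])   # eigenvector of the zero eigenvalue
    return p / p.sum()

def entropy_production(rates):
    """Steady-state entropy production rate (units of k_B per unit time)
    of a 3-state network; rates[i][j] is the transition rate i -> j.
    sigma = (1/2) * sum_{i,j} (J_ij - J_ji) * ln(J_ij / J_ji)."""
    n = 3
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                K[j, i] += rates[i][j]   # gain of state j from state i
                K[i, i] -= rates[i][j]   # loss from state i
    p = steady_state(K)
    sigma = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                Jij = rates[i][j] * p[i]
                Jji = rates[j][i] * p[j]
                sigma += 0.5 * (Jij - Jji) * np.log(Jij / Jji)
    return sigma

# Detailed balance (all forward = backward rates): zero dissipation
eq = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Driven cycle (forward rates twice the backward rates): positive dissipation
neq = [[0, 2, 1], [1, 0, 2], [2, 1, 0]]
print(entropy_production(eq), entropy_production(neq))
```

For the driven cycle above the result can be checked by hand: the steady state is uniform by symmetry, each edge carries a net flux of 1/3, and the total entropy production rate is ln 2.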
• “The energy-speed-accuracy trade-off in sensory adaptation”, G. Lan, P. Sartori, S. Neumann, V. Sourjik, and Yuhai Tu, Nature Physics 8, 422–428 (2012). Energy-Speed-Accuracy (ESA) in Adaptation
• “The free-energy cost of accurate biochemical oscillations”, Yuansheng Cao, Hongli Wang, Qi Ouyang, and Yuhai Tu, Nature Physics, 11, 772 (2015). Energy Cost of Accurate Biochemical Oscillations
• “The energy cost and optimal design for synchronization of coupled molecular oscillators”, D. Zhang, Y. Cao, Q. Ouyang, Y. Tu, Nature Physics 16, 95-100 (2020). Energy Cost of Synchronization
• “A nonequilibrium allosteric model for receptor-kinase complexes: The role of energy dissipation in chemotaxis signaling”, David Hathcock, Qiwei Yu, Bernardo A. Mello, Divya N. Amind, Gerald L. Hazelbauer, and Yuhai Tu, PNAS, 120(42):e2303115120 (2023). Nonequilibrium Allosteric Model
• “Time-reversal symmetry breaking in the chemosensory array reveals a general mechanism for dissipation-enhanced cooperative sensing”, David Hathcock, Qiwei Yu, and Yuhai Tu, Nature Communications, 15(1), 8892 (2024). Time-reversal Symmetry Breaking in Receptor Arrays
• “Altruistic Resource-Sharing Mechanism for Synchronization: The Energy-Speed-Accuracy Trade-off”, D. Zhang, Y. Cao, Q. Ouyang, Y. Tu, PRL 135(3), 037401 (2025). Energy-Speed-Accuracy Tradeoff in Synchronization
IV. Computational Neuroscience
Broadly speaking, we are interested in studying information processing (sensing, representation, and learning) in realistic neural systems. Besides understanding the neural mechanisms underlying these brain behaviors, one of our focuses is to gain insight into the fundamental differences between brains and artificial neural networks, which may shed light on possible new paradigms for more efficient AI.
IV.A. Information processing in the olfactory system of the fly
We are interested in sensory information processing in the olfactory system of Drosophila (the fruit fly) at both the single-neuron and neural-network levels:
1) Adaptive sensing in olfactory sensory neurons (OSNs): In close collaboration with our experimental collaborator (Prof. D. Luo, PKU), whose lab developed techniques to obtain intracellular recordings from individual targeted Drosophila OSNs, we developed a theoretical model to explain the adaptive behaviors of OSNs. We found that the adaptation-induced changes in odor sensitivity obey the Weber-Fechner law, in agreement with experiments. Our model revealed two general effects of adaptation, desensitization and prevention of saturation, which dynamically adjust odor sensitivity and extend the sensory operating range.
2) Optimal encoding strategy for the OSN array: We have also studied the encoding problem in the olfactory system — how to encode sparse, high-dimensional odorant mixtures using a small number of OSNs — by mapping it onto a nonlinear compressed-sensing (NCS) problem in which each sensor (OSN) has a finite response range. Solving the NCS problem, we found that the optimal coding matrix is “sparse”: a finite fraction of the ligand-receptor sensitivities are zero, and the nonzero sensitivities follow a broad distribution matching the odor-mixture statistics. For neurons with a finite basal activity, our theory shows that introducing odor-evoked inhibition further enhances the coding capacity. Our theoretical findings are supported by existing experiments, and our optimal coding model provides a unifying framework for understanding peripheral olfactory systems.
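A toy numerical illustration of why sparse, broadly distributed sensitivities help when each sensor saturates (the response function and all numbers below are illustrative, not the NCS optimum from the paper): with uniform dense sensitivities every OSN reports the same saturated value, so the population response carries no information about which mixture is present, whereas a sparse random matrix spreads responses across the dynamic range.

```python
import numpy as np

rng = np.random.default_rng(2)
n_osn, n_odorant, k_mix = 50, 1000, 5   # sensors, odorant space, mixture sparsity

# A sparse odor mixture: k_mix odorants present at random concentrations
c = np.zeros(n_odorant)
c[rng.choice(n_odorant, k_mix, replace=False)] = rng.lognormal(0, 1, k_mix)

def osn_response(W, c):
    """Saturating (finite-range) response r = Wc / (1 + Wc), with r in [0, 1)."""
    s = W @ c
    return s / (1.0 + s)

# Dense uniform sensitivities: every OSN gives the identical response
W_dense = np.ones((n_osn, n_odorant))
# Sparse sensitivities: ~10% nonzero, broadly (lognormally) distributed
W_sparse = (rng.random((n_osn, n_odorant)) < 0.1) * \
           rng.lognormal(0, 2, (n_osn, n_odorant))

r_dense = osn_response(W_dense, c)
r_sparse = osn_response(W_sparse, c)
print("response spread, dense :", r_dense.std())   # 0: no information
print("response spread, sparse:", r_sparse.std())  # > 0: a distinguishable pattern
```

The spread across the OSN population (rather than the magnitude of any single response) is what lets a downstream decoder identify the mixture.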
• “Distinct Signaling of Drosophila Chemoreceptors in Olfactory Sensory Neurons”, Li-Hui Cao, Bi-Yang Jing, Xiankun Zeng, Dong Yang, Yuhai Tu, Dong-Gen Luo, PNAS, 113(7), E902-911, (2016). (doi: 10.1073/pnas.1518329113) Sensing and Adaptation of OSNs
• “Odor-evoked inhibition of olfactory sensory neurons drives olfactory perception in Drosophila”, Cao et al., Nature Communications 8:1357 (2017). Odor-evoked Inhibition in Fly Olfaction
• “Optimal compressed sensing strategies for an array of nonlinear olfactory receptor neurons with and without spontaneous activity”, S. Qin, Q. Li, C. Tang, Y. Tu, PNAS, 116(41):20286-20295 (2019). Compressed Sensing in Fly Olfaction
IV.B. Representational learning in piriform cortex
We are interested in representational learning in the piriform cortex, e.g., how alignment of representations is learned and how learning affects the dynamics of neural representations.
1) Alignment of representations: In collaboration with the Murthy lab (Harvard), we developed a neural network model to study the alignment of neural representations across the two brain hemispheres. We found that modifying a sparse network of inter-hemispheric synaptic connections following a local Hebbian rule is enough to achieve bilateral alignment. Our model revealed an inverse scaling law between the number of cortical neurons and the interhemispheric projection density (sparsity) required to achieve a desired alignment accuracy. We also compared the alignment performance of the local learning rule with that of the global stochastic gradient descent (SGD) learning rule: although SGD achieves the same alignment with a slightly reduced sparsity, the same inverse relation holds. Our analysis shows that the local Hebbian rule performs comparably to global SGD because the update vectors from the two rules remain highly aligned throughout learning, which may be a general feature of a large class of neural network architectures. The insights gained from comparing the two learning rules may inspire the development of more efficient local learning algorithms for more complex problems.
2) Representational drift: The neural response to a specific stimulus evolves over time, a phenomenon known as representational drift (RD). However, the mechanisms driving this widespread phenomenon remain poorly understood. By employing a realistic neural network model to investigate RD in the piriform cortex, we demonstrated that the experimentally observed RD can be quantitatively attributed to slow spontaneous fluctuations in synaptic weights. Furthermore, our model shows that a fast learning process drives neural activity towards a lower-dimensional representational manifold, effectively reducing the dimensionality of the diffusive RD and thus reducing RD, as seen in recent experiments. In addition to quantitatively explaining recent experiments in the piriform cortex from the Axel lab, our slow-fluctuation-fast-learning model offers a general framework for understanding representation dynamics in the brain. Representational drift also occurs in artificial neural networks in continual learning settings. If not regularized, such drift can lead to undesirable effects such as catastrophic forgetting (CF). We have been exploring ways to avoid CF based on insights from realistic neural networks, where RD does not lead to interference between the representations of different stimuli.
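The alignment between local Hebbian and gradient-based updates described in 1) can be illustrated in a linear toy model (all dimensions and rates below are illustrative): we learn interhemispheric weights W so that W x matches the target representation y. When the weights are small, the gradient (delta-rule) update (y − Wx)xᵀ is dominated by the purely local Hebbian term y xᵀ, so the two update directions are nearly parallel.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_out, n_samples = 100, 100, 20

# Paired representations of the same stimuli in the two hemispheres:
# y = A x with a fixed linear map A (unknown to the learner)
X = rng.standard_normal((n_in, n_samples))
A = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
Y = A @ X

def cosine(u, v):
    return float(u.ravel() @ v.ravel() /
                 (np.linalg.norm(u) * np.linalg.norm(v)))

W = 0.01 * rng.standard_normal((n_out, n_in))  # small initial weights
alignment = []
for _ in range(200):
    E = Y - W @ X                 # prediction error
    dW_grad = E @ X.T             # gradient (delta-rule) update
    dW_hebb = Y @ X.T             # purely local Hebbian (pre x post) update
    alignment.append(cosine(dW_grad, dW_hebb))
    W += 1e-3 * dW_grad           # train with the gradient update
print("update alignment, initial / final:", alignment[0], alignment[-1])
```

The cosine similarity between the two update vectors starts near 1, which is the toy-model analogue of the high update alignment found in our analysis.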
• “Representational Drift and Learning-Induced Stabilization in the Olfactory Cortex”, G. Morales, M. Muñoz, and Y. Tu, PNAS, 122 (29) e2501811122 (2025). Representational Drift.
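The slow-fluctuation-fast-learning mechanism in 2) can be caricatured with a diffusing representation vector (a toy abstraction; the dimensions and noise level below are arbitrary): unconstrained diffusion accumulates drift along all directions, while projecting back onto a low-dimensional manifold at every step, standing in for fast learning, confines the drift to the manifold dimensions.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, dim_manifold, steps, noise = 100, 5, 1000, 0.05

# A random low-dimensional "representational manifold": orthonormal basis U
U, _ = np.linalg.qr(rng.standard_normal((dim, dim_manifold)))
P = U @ U.T                       # projector onto the manifold

r_free = np.zeros(dim)            # drifts freely (no fast learning)
r_conf = np.zeros(dim)            # confined by fast learning (projection)
for _ in range(steps):
    xi = noise * rng.standard_normal(dim)   # slow synaptic fluctuation
    r_free = r_free + xi
    r_conf = P @ (r_conf + xi)    # fast learning pulls activity back to the manifold
msd_free = (r_free ** 2).sum()
msd_conf = (r_conf ** 2).sum()
print(f"squared drift: free = {msd_free:.1f}, confined = {msd_conf:.1f}")
```

The confined walker accumulates drift only along the manifold's few dimensions, so its mean squared displacement is smaller by roughly the ratio of the dimensionalities, the sense in which fast learning "reduces the dimensionality of the diffusive RD."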
V. Statistical Physics of Deep Learning
Recently, we have been developing a theoretical framework based on statistical physics and stochastic dynamical systems theory to study learning dynamics and generalization in artificial-neural-network-based learning systems.
V.A. Stochastic learning dynamics
We have developed a theoretical framework to study the learning dynamics in deep neural networks, in which the learning process is described by a stochastic dynamical system in a high-dimensional weight space. By simultaneously monitoring the weight fluctuations and the loss-function landscape, we discovered a general “inverse Einstein relation” between the fluctuations of the weights and the flatness of the loss landscape in the stochastic gradient descent (SGD) algorithm. This inverse Einstein relation is the key mechanism that drives SGD to find flatter solutions, which are known to be more generalizable. We have used this insight to develop a novel algorithm that avoids “catastrophic forgetting” by enforcing the representation of a new task in the flat directions of the previous task in the high-dimensional weight space.
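A two-parameter linear-regression toy (not the deep-network analysis of the papers above; all numbers are illustrative) shows the key ingredient: SGD's minibatch noise is landscape-dependent. In this toy the gradient-noise covariance is proportional to the curvature, so the stationary weight fluctuations come out nearly isotropic even though the curvatures differ tenfold, a strong deviation from the equilibrium Einstein relation, which would predict fluctuations inversely proportional to curvature.

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([1.0, 0.1])          # input variances -> anisotropic loss curvature
w_true = np.array([1.0, -1.0])
lr, sigma = 0.01, 0.5             # learning rate, label-noise std

w = w_true.copy()
samples = []
for t in range(120_000):
    x = rng.standard_normal(2) * np.sqrt(h)
    y = x @ w_true + sigma * rng.standard_normal()
    grad = 2.0 * (x @ w - y) * x  # per-sample gradient of (x.w - y)^2
    w -= lr * grad                # single-sample SGD step
    if t >= 20_000:               # discard burn-in before sampling
        samples.append(w.copy())
var = np.array(samples).var(axis=0)
print("curvature ratio (stiff/flat):", h[0] / h[1])
print("weight-variance ratio       :", var[0] / var[1])
# The equilibrium Einstein relation would give a variance ratio of 1/10;
# SGD's curvature-proportional noise makes it close to 1 instead.
```

For this quadratic loss one can solve the stationary fluctuations analytically (noise covariance proportional to the Hessian gives an isotropic weight variance of about lr*sigma^2 per direction), confirming the simulated deviation from the equilibrium prediction.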
V.B. Generalization
Given the high dimensionality of the weight space and the underconstrained nature of deep neural networks, deep learning can find many solutions that satisfy the training data equally well. A major challenge in deep learning is to determine which solutions generalize better, i.e., perform well on unseen test data. A related challenge is how to construct regularization schemes that find these more generalizable solutions. To answer the first question, we discovered an exact duality relation between the activity of a neuron and the weights connecting it to the neurons in the next layer. Using this “Activity-Weight” (AW) duality, we showed that the flatness of the loss landscape and the size of the solution act together to determine generalization, proving for the first time that flatter and smaller solutions are more generalizable. The AW duality also allows us to understand how (and why) various popular regularization schemes, such as weight decay, dropout, and SGD, help find more generalizable solutions.
• “Phases of learning dynamics in artificial neural networks in the absence or presence of mislabeled data”, Y. Feng and Y. Tu, Machine Learning Science and Technology, 2(4), 043001 (2021). Learning Dynamics with Mislabeled Data
• “Activity–weight duality in feed-forward neural networks reveals two co- determinants for generalization”, Yu Feng, Wei Zhang, Yuhai Tu, Nature Machine Intelligence, 5:908–918 (2023). The Activity-Weight Duality in ANN
• “Stochastic Gradient Descent Introduces an Effective Landscape-Dependent Regularization Favoring Flat Solutions”, Ning Yang, Chao Tang, and Y. Tu, Phys. Rev. Lett. (PRL) 130 (23), 237101 (2023). Landscape-Dependent Regularization in SGD
• “Physics Meets Machine Learning: A Two-Way Street”, Herbert Levine and Yuhai Tu, PNAS, 121 (27), e240358012 (2024). ML Meets Physics: A Two-Way Street
Publications
For selected publications relevant to each research area, check out Research Areas & Selected Publications. The full publication list can be found in Google Scholar.
Talks
- A recent talk on statistical physics for machine learning: Understanding Deep-Learning as a Physicist: What would Einstein do?, given at a KITP Workshop on Deep Learning from the Perspective of Physics and Neuroscience (2023).
- A recent talk on thermodynamics of information processing in living systems: Nonequilibrium Thermodynamics of Biological Circuits, given at the 16th Granada Seminar (2021).
- A recent colloquium on The Energy Cost of Cellular Biological Functions given at Arizona State University (2025).
- My popular talk on "Information Processing in a Single Cell" given at the Aspen Center for Physics is available online: What and How does E. coli Compute? (2019).
