publications
(*) denotes equal contribution.
For a complete list, visit my Google Scholar profile.
2025
- [Preprint] Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective. arXiv:2502.00281, 2025. Under review.
- [AISTATS] Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2025.
- [ICLR] Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts. In International Conference on Learning Representations (ICLR), 2025.
2024
- [Preprint] Quadratic Gating Functions in Mixture of Experts: A Statistical Insight. arXiv:2410.11222, 2024. Under review.
- [ICML] Improving Computational Complexity in Statistical Models with Local Curvature Information. In International Conference on Machine Learning (ICML), 2024.
- [ICML] Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts? In International Conference on Machine Learning (ICML), 2024.
- [ICML] A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts. In International Conference on Machine Learning (ICML), 2024.
- [ICLR] Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts. In International Conference on Learning Representations (ICLR), 2024.
2022
- [NeurIPS] Improving Counterfactual Explanations for Time Series Classification Models in Healthcare Settings. In NeurIPS 2022 Workshop on Learning from Time Series for Health, 2022.