Publications

2024

  • D. Xin, S. Takamichi, and H. Saruwatari, “JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions,” Speech Communication, vol. 156, p. 103004, 2024.
  • Y. Saito, K. Yatabe, and Shogun, “Does controller sound contain valuable information for video game scene analysis? Case study by character identification of Super Smash Bros. Ultimate,” Acoustical Science and Technology, vol. 45, no. 2, pp. 113–116, 2024.
  • T. Saeki, S. Maiti, X. Li, S. Watanabe, S. Takamichi, and H. Saruwatari, “Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1829–1844, 2024.
  • D. Xin, J. Jiang, S. Takamichi, Y. Saito, A. Aizawa, and H. Saruwatari, “JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions,” IEEE Access, vol. 12, pp. 19752–19764, 2024.
  • J. G. C. Ribeiro, S. Koyama, and H. Saruwatari, “Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach,” EURASIP Journal on Advances in Signal Processing, vol. 2024, no. 43, 2024.
  • J. G. C. Ribeiro, S. Koyama, R. Horiuchi, and H. Saruwatari, “Sound field estimation based on physics-constrained kernel interpolation adapted to environment,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
  • K. Imamura, T. Nakamura, K. Yatabe, and H. Saruwatari, “Neural Analog Filter for Sampling-Frequency-Independent Convolutional Layer,” APSIPA Transactions on Signal and Information Processing, 2024.

2023

  • T. Saeki, S. Takamichi, T. Nakamura, N. Tanji, and H. Saruwatari, “SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources,” IEEE Access, Dec. 2023.
  • T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
  • X. Luo, S. Takamichi, Y. Saito, T. Koriyama, and H. Saruwatari, “Emotion-controllable Speech Synthesis using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence,” APSIPA Transactions on Signal and Information Processing, vol. 13, no. 1, 2023.

2022

  • H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules,” Computer Speech & Language, vol. 72, p. 101315, 2022.
  • M. Takeuchi, J. Ahn, K. Lee, K. Takaki, T. Ifukube, K. Yabu, S. Takamichi, R. Ueha, and M. Sekino, “Hands-Free Wearable Electrolarynx using LPC Residual Waves and Listening Evaluation,” Advanced Biomedical Engineering, vol. 11, pp. 68–75, 2022.
  • Y. Okamoto, K. Imoto, S. Takamichi, R. Yamanishi, T. Fukumori, and Y. Yamashita, “Onoma-to-wave: Environmental sound synthesis from onomatopoeic words,” APSIPA Transactions on Signal and Information Processing, 2022.
  • K. Saito, T. Nakamura, K. Yatabe, and H. Saruwatari, “Sampling-Frequency-Independent Convolutional Layer and Its Application to Audio Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
  • J. G. C. Ribeiro, N. Ueno, S. Koyama, and H. Saruwatari, “Region-to-region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2944–2954, 2022.
  • Y. Kondo, Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction,” EURASIP Journal on Advances in Signal Processing, vol. 2022, no. 88, 2022.
  • M. Takeuchi, Y. Soejima, J. Ahn, K. Lee, K. Takaki, T. Ifukube, K. Yabu, S. Takamichi, and M. Sekino, “Development of a hands-free electrolarynx producing near-natural speech using linear predictive coding (LPC) residual waves,” IEEJ Transactions on Fundamentals and Materials (電気学会論文誌A), vol. 142, no. 9, pp. 390–396, 2022. (in Japanese)

2021

  • K. Kamo, Y. Mitsui, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Joint-diagonalizability-constrained multichannel nonnegative matrix factorization based on time-variant multivariate complex sub-Gaussian distribution,” Signal Processing, vol. 188, p. 108183, Jun. 2021.
  • T. Nakamura, S. Kozuka, and H. Saruwatari, “Time-Domain Audio Source Separation with Neural Networks Based on Multiresolution Analysis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1687–1701, 2021.
  • T. Nakamura and H. Kameoka, “Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 68–82, 2021.
  • Y. Saito, T. Nakamura, Y. Ijima, K. Nishida, and S. Takamichi, “Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification,” Acoustical Science and Technology, vol. 42, no. 1, pp. 1–11, 2021.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1033–1048, 2021.
  • T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time full-band voice conversion with sub-band modeling and data-driven phase estimation of spectral differentials,” IEICE Transactions on Information and Systems, vol. E104.D, no. 7, pp. 1002–1016, 2021.
  • A. Aiba, M. Yoshida, D. Kitamura, S. Takamichi, and H. Saruwatari, “Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution,” IEICE Transactions on Information and Systems, vol. E104.D, no. 3, pp. 441–449, 2021.
  • T. Saeki, S. Takamichi, and H. Saruwatari, “Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model,” IEEE Signal Processing Letters, vol. 28, pp. 857–861, 2021.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Directionally weighted wave field estimation exploiting prior information on source direction,” IEEE Transactions on Signal Processing, vol. 69, pp. 2383–2395, 2021.
  • Y. Mitsufuji, N. Takamune, S. Koyama, and H. Saruwatari, “Multichannel blind source separation based on evanescent-region-aware non-negative tensor factorization in spherical harmonic domain,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 607–617, 2021.
  • K. Mitsui, T. Koriyama, and H. Saruwatari, “Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation,” Speech Communication, vol. 132, pp. 132–145, 2021.
  • S. Mizoguchi, Y. Saito, S. Takamichi, and H. Saruwatari, “DNN-based low-musical-noise single-channel speech enhancement based on higher-order-moments matching,” IEICE Transactions on Information and Systems, vol. E104.D, no. 11, pp. 1971–1980, 2021.

2020

  • N. Makishima, Y. Mitsui, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent deeply learned matrix analysis with automatic selection of stable microphone-wise update and fast sourcewise update of demixing matrix,” Signal Processing, vol. 178, p. 107753, Sep. 2020.
  • M. Aso, S. Takamichi, N. Takamune, and H. Saruwatari, “Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis,” Speech Communication, vol. 125, pp. 53–60, Sep. 2020.
  • Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1948–1963, Jun. 2020.
  • H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Generative moment matching network-based neural double-tracking for synthesized and natural singing voices,” IEICE Transactions on Information and Systems, vol. E103-D, no. 3, pp. 639–647, May 2020.
  • J. Koguchi, S. Takamichi, M. Morise, H. Saruwatari, and S. Sagayama, “DNN-based full-band speech synthesis using GMM approximation of spectral envelope,” IEICE Transactions on Information and Systems, vol. E103.D, no. 12, pp. 2673–2681, 2020.
  • Y. Saito, K. Akuzawa, and K. Tachibana, “Joint adversarial training of speech recognition and synthesis models for many-to-one voice conversion using phonetic posteriorgrams,” IEICE Transactions on Information and Systems, vol. E103.D, no. 9, pp. 1978–1987, 2020.
  • H. Tamaru, S. Takamichi, and H. Saruwatari, “Perception analysis of inter-singer similarity in Japanese song,” Acoustical Science and Technology, vol. 41, no. 5, pp. 804–807, 2020.
  • S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase Reconstruction from Amplitude Spectrograms Based on Directional-Statistics Deep Neural Networks,” Signal Processing, vol. 169, 2020.
  • S. Takamichi, R. Sonobe, K. Mitsui, Y. Saito, T. Koriyama, N. Tanji, and H. Saruwatari, “JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research,” Acoustical Science and Technology, vol. 41, no. 5, pp. 761–768, 2020.
  • Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Reciprocity gap functional in spherical harmonic domain for gridless sound field decomposition,” Signal Processing, vol. 169, 2020.

2019

  • D. Sekizawa, S. Takamichi, and H. Saruwatari, “Prosody correction preserving speaker individuality for Chinese-accented Japanese HMM-based text-to-speech synthesis,” IEICE Transactions on Information and Systems, vol. E102.D, no. 6, pp. 1218–1221, Jun. 2019.
  • S. Takamichi and D. Morikawa, “Perceived azimuth-based creditability and self-reported confidence for sound localization experiments using crowdsourcing,” Acoustical Science and Technology, vol. 40, no. 2, pp. 142–143, Mar. 2019.
  • H. Nakajima, D. Kitamura, N. Takamune, H. Saruwatari, and N. Ono, “Bilevel optimization using stationary point of lower-level objective function for discriminative basis learning in nonnegative matrix factorization,” IEEE Signal Processing Letters, vol. 26, no. 6, pp. 818–822, 2019.
  • S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and N. Ono, “Independent low-rank matrix analysis based on time-variant sub-Gaussian source model for determined blind source separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 503–518, 2019.
  • N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1601–1615, 2019.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra,” Computer Speech & Language, vol. 58, pp. 347–363, 2019.
  • Y. Mitsufuji, S. Uhlich, N. Takamune, D. Kitamura, S. Koyama, and H. Saruwatari, “Multichannel non-negative matrix factorization using banded spatial covariance matrices in wavenumber domain,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 49–60, 2019.
  • H. Sawada, N. Ono, H. Kameoka, D. Kitamura, and H. Saruwatari, “A review of blind source separation methods: two converging routes to ILRMA originating from ICA and NMF,” APSIPA Transactions on Signal and Information Processing, vol. 8, no. E12, 2019.
  • S. Koyama and L. Daudet, “Sparse Representation of a Spatial Sound Field in a Reverberant Environment,” IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 1, pp. 172–184, 2019.
  • T. Koriyama and T. Kobayashi, “Statistical parametric speech synthesis using deep Gaussian processes,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 5, pp. 948–959, 2019.
  • N. Maikusa, R. Sonobe, S. Kinoshita, N. Kawada, S. Yagishi, T. Masuoka, T. Kinoshita, S. Takamichi, and A. Homma, “Automatic detection of Alzheimer’s dementia using speech features of the revised Hasegawa’s Dementia Scale,” Geriatric Medicine, vol. 57, no. 2, pp. 1117–1125, 2019.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Three-Dimensional Sound Field Reproduction Based on Weighted Mode-Matching Method,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 1852–1867, 2019.

2018

  • T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, and S. Nakamura, “An End-to-end Model for Cross-Lingual Transformation of Paralinguistic Information,” Machine Translation, pp. 1–16, Apr. 2018.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 1, pp. 84–96, Jan. 2018. [The 34th Telecommunications Advancement Foundation Telecom System Technology Student Award]
  • D. Kitamura, S. Mogami, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, and Y. Takahashi, “Generalized independent low-rank matrix analysis using heavy-tailed distributions for blind source separation,” EURASIP Journal on Advances in Signal Processing, vol. 2018, no. 1, 2018.
  • N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Sparse Representation Using Multidimensional Mixed-Norm Penalty With Application to Sound Field Decomposition,” IEEE Transactions on Signal Processing, vol. 66, no. 12, pp. 3327–3338, 2018.
  • S. Koyama, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition for Super-resolution in Recording and Reproduction,” Journal of the Acoustical Society of America, vol. 143, no. 6, pp. 3780–3895, 2018.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Sound Field Recording Using Distributed Microphones Based on Harmonic Analysis of Infinite Order,” IEEE Signal Processing Letters, vol. 25, no. 1, pp. 135–139, 2018.

2017

  • Y. Saito, S. Takamichi, and H. Saruwatari, “Voice Conversion Using Input-to-Output Highway Networks,” IEICE Transactions on Information and Systems, 2017.
  • Y. Bando, H. Saruwatari, N. Ono, S. Makino, K. Itoyama, D. Kitamura, M. Ishimura, M. Takakusaki, N. Mae, K. Yamaoka, Y. Matsui, Y. Ambe, M. Konyo, S. Tadokoro, K. Yoshii, and H. G. Okuno, “Low-latency and high-quality two-stage human-voice-enhancement system for a hose-shaped rescue robot,” Journal of Robotics and Mechatronics, vol. 29, no. 1, 2017.

2016

  • S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models,” IEICE Transactions on Information and Systems, vol. E99-D, no. 10, pp. 2490–2498, Oct. 2016.
  • D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1626–1641, Sep. 2016.
  • S. Takamichi, T. Toda, A. W. Black, G. Neubig, S. Sakti, and S. Nakamura, “Post-filters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 755–767, Apr. 2016. [Paper recognized by the Acoustical Society of Japan Itakura Prize Innovative Young Researcher Award]
  • S. Koyama, K. Furuya, K. Wakayama, S. Shimauchi, and H. Saruwatari, “Analytical approach to transforming filter design for sound field recording and reproduction using circular arrays with a spherical baffle,” Journal of the Acoustical Society of America, vol. 139, no. 3, pp. 1024–1036, Mar. 2016.
  • Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “Non-Native Text-To-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics,” IEICE Transactions on Information and Systems, vol. E99-D, no. 12, 2016.

2015

  • S. Koyama, K. Furuya, Y. Haneda, and H. Saruwatari, “Source-location-informed sound field recording and reproduction,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 881–894, Aug. 2015.
  • D. Kitamura, H. Saruwatari, H. Kameoka, Y. Takahashi, K. Kondo, and S. Nakamura, “Multichannel signal separation combining directional clustering and nonnegative matrix factorization with spectrogram restoration,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 4, pp. 654–669, Apr. 2015.
  • F. D. Aprilyanti, J. Even, H. Saruwatari, K. Shikano, S. Nakamura, and T. Takatani, “Suppression of noise and late reverberation based on blind signal extraction and Wiener filtering,” Acoustical Science and Technology, vol. 36, no. 6, pp. 302–313, Jan. 2015.

2014

  • S. Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda, and Y. Suzuki, “Wave Field Reconstruction Filtering in Cylindrical Harmonic Domain for With-Height Recording and Reproduction,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1546–1557, Oct. 2014.
  • R. Miyazaki, H. Saruwatari, S. Nakamura, K. Shikano, K. Kondo, J. Blanchette, and M. Bouchard, “Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction,” Signal Processing, vol. 102, pp. 226–239, Sep. 2014.
  • S. Koyama, K. Furuya, H. Uematsu, Y. Hiwasaki, and Y. Haneda, “Real-time Sound Field Transmission System by Using Wave Field Reconstruction Filter and Its Evaluation,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E97-A, no. 9, pp. 1840–1848, Sep. 2014.
  • T. Aketo, H. Saruwatari, and S. Nakamura, “Robust sound field reproduction against listener’s movement utilizing image sensor,” Journal of Signal Processing, vol. 18, no. 4, pp. 213–216, Jul. 2014.
  • T. Miyauchi, D. Kitamura, H. Saruwatari, and S. Nakamura, “Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization,” Journal of Signal Processing, vol. 18, no. 4, pp. 217–220, Jul. 2014.
  • D. Kitamura, H. Saruwatari, K. Yagi, K. Shikano, Y. Takahashi, and K. Kondo, “Music signal separation based on supervised nonnegative matrix factorization with orthogonality and maximum-divergence penalties,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E97-A, no. 5, pp. 1113–1118, May 2014.