
Comparative Analysis of Neural Network Architectures in Automatic Speech Recognition: A Focus on Single-Word Recognition

  • Writer: Arturo Arriaga
  • Dec 28, 2023
  • 1 min read

Updated: Dec 29, 2023

This project explores automatic speech recognition (ASR), focusing on the challenge of recognizing single words. The core objective is to evaluate and compare how effectively various neural network architectures identify spoken words from audio input. The project involves preprocessing audio files in WAV format and experimenting with different neural network models, each of which takes a different approach to the task.


This project uses the Speech Commands dataset, a collection of over 105,000 audio files containing utterances of 35 different words, most of them one second or shorter.


Our methodology converts the waveform audio files into spectrograms using the Short-Time Fourier Transform (STFT), which turns a time-domain signal into a time-frequency representation. The STFT segments the signal into short windows and runs a Fourier transform on each window, so the result shows how the frequency content changes over time instead of discarding the time information. The resulting spectrograms are then used as input for the neural network models.
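To make the windowing step concrete, here is a minimal, dependency-free sketch of an STFT in Python. It is not the project's code: the function names, the Hann window, the frame length of 64 samples, the step of 32 samples, and the synthetic test tone are all assumptions chosen for illustration, and a real pipeline would more likely rely on a library routine.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window; tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, keeping bins 0 .. N/2."""
    n_samples = len(frame)
    bins = []
    for k in range(n_samples // 2 + 1):
        total = 0j
        for n, x in enumerate(frame):
            total += x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
        bins.append(total)
    return bins


def stft_magnitude(signal, frame_length=64, frame_step=32):
    """Split the signal into overlapping windows and transform each one.

    Returns a list of frames (time axis); each frame is a list of
    magnitudes per frequency bin. The default sizes are assumptions.
    """
    window = hann_window(frame_length)
    spectrogram = []
    for start in range(0, len(signal) - frame_length + 1, frame_step):
        frame = [signal[start + n] * window[n] for n in range(frame_length)]
        spectrogram.append([abs(c) for c in dft(frame)])
    return spectrogram


# Synthetic one-second tone standing in for a WAV file (assumed values).
sample_rate = 1024  # samples per second
tone_hz = 128       # bin spacing is 1024 / 64 = 16 Hz, so 128 Hz falls on bin 8
signal = [math.sin(2.0 * math.pi * tone_hz * t / sample_rate) for t in range(sample_rate)]

spectrogram = stft_magnitude(signal)
peak_bins = [frame.index(max(frame)) for frame in spectrogram]
print(len(spectrogram), "frames x", len(spectrogram[0]), "frequency bins")
print("strongest bin per frame:", sorted(set(peak_bins)))
```

Each entry of `spectrogram` is one time frame and each position within it is one frequency bin, so a steady tone shows up at the same bin in every frame. A spoken word would instead produce a pattern that shifts across bins from frame to frame, which is the structure the models learn from.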


The project begins with a baseline model that uses a Dense Neural Network (DNN). Subsequent phases explore more advanced neural network architectures and refinements, including:

  • Recurrent Neural Network (RNN)

  • Convolutional Neural Network (CNN)

  • Gated Recurrent Unit (GRU)

  • Bidirectional RNN

  • Architectural Enhancements


The project concludes with a comparison of the different models and their results. Our aim is to identify the most effective model based on empirical, measurable evidence. The comparison also highlights each model's strengths and weaknesses, which guides our choice of the best approach for this task.
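As a rough illustration of what such a comparison can look like, the sketch below ranks models by accuracy on a shared set of labelled examples. Everything in it is hypothetical: the helper names, the model names, and the tiny label lists are placeholders, not results from this project, and accuracy is only one of several metrics that could be used.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of examples where the predicted word matches the true word."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def rank_models(true_labels, predictions_by_model):
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = {
        name: accuracy(true_labels, predictions)
        for name, predictions in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder data only; these are NOT measurements from the project.
true_labels = ["yes", "no", "up", "down", "yes", "no"]
predictions_by_model = {
    "model_a": ["yes", "no", "up", "up", "yes", "yes"],
    "model_b": ["yes", "no", "up", "down", "yes", "yes"],
}

ranking = rank_models(true_labels, predictions_by_model)
for name, score in ranking:
    print(f"{name}: {score:.2%}")
```

Scoring every model on the same held-out examples keeps the comparison fair, since differences in the ranking then reflect the models and not the data they happened to be tested on.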


The full project can be found at this link:




Screenshots showing elements of the project.


 
 
 
