
Hi! My name is Viet Tien Pham

🚀 AI Engineer

I'm an AI Engineer and Researcher passionate about building intelligent systems that understand and interact with humans naturally.

I have solid experience working on speech technologies, including:

  • Automatic Speech Recognition (ASR)
  • Speaker Verification (SV)
  • Text-to-Speech (TTS)
  • Audio Large Language Models (Audio LLMs)

I enjoy designing call-center voicebot (callbot) solutions, conversational AI, and human-robot interaction systems, where the ability to process and generate natural speech plays a critical role.

During my university years, I worked extensively on computer vision and deep learning, focusing on:

  • Image segmentation
  • Image classification
  • Few-shot segmentation
  • Object detection

This foundation helped me develop strong skills in designing and fine-tuning large-scale models, as well as integrating them into real-world applications.

On the research side, I'm interested in:

  • Reasoning with LLMs
  • Reinforcement Learning (RL) for optimizing conversational strategies
  • Retrieval-Augmented Generation (RAG) for enhancing knowledge-grounded dialogue systems

My technical background spans both model development and deployment, covering end-to-end speech pipelines, advanced audio feature engineering, and multi-modal reasoning capabilities.
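As an illustration of the audio feature engineering step in such speech pipelines, here is a minimal, NumPy-only log-mel filterbank extractor. This is a generic sketch only: the sample rate, FFT size, hop length, and mel count are illustrative defaults, not parameters from any project listed here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_features(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the magnitude STFT, apply a mel filterbank, log-compress."""
    # Slice the waveform into overlapping, Hann-windowed frames
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hanning(n_fft)
    # Magnitude spectrum per frame: (n_frames, n_fft // 2 + 1)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # Triangular mel filterbank with n_mels bands from 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    # Log-compress; the epsilon guards against log(0)
    return np.log(spec @ fbank.T + 1e-10)  # (n_frames, n_mels)

# Example: 1 s of a 440 Hz tone at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = log_mel_features(sig)
print(feats.shape)  # prints (97, 40)
```

In practice a library such as librosa or torchaudio would handle this, but the sketch shows the structure of the features that ASR and audio LLM front-ends typically consume.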

I always aim to bridge the gap between state-of-the-art research and impactful user-facing products — from scalable voicebots for customer service to advanced vision and speech-based interactive systems.


🎯 Focus Areas

  • ASR, SV, TTS, Audio LLMs
  • Callbot / contact center automation
  • Human-robot interaction
  • Computer vision (segmentation, classification, few-shot segmentation, object detection)
  • Reasoning and RAG with LLMs
  • Reinforcement Learning for dialogue optimization

📫 Contact


⭐ Feel free to check out my repositories and connect with me!

📌 Pinned Repositories

  1. Speech_project_Vin (Python): Multimodal Speech Emotion Recognition using a ViT (AST) audio encoder and a Multiscale Attention Net (MANet) visual encoder.
  2. project_NLP_final (Python): Group project in the Vin program: Modality Balance for Multimodal Conversational Emotion Recognition.
  3. HySonLab/LightMed (Python): Lightweight medical image segmentation.
  4. ichigo (Python, forked from menloresearch/ichigo): Llama3.1 learns to Listen.
  5. LLaMA-Omni (Python, forked from ictnlp/LLaMA-Omni): A low-latency, high-quality end-to-end speech interaction model built on Llama-3.1-8B-Instruct, aiming for speech capabilities at the GPT-4o level.
  6. mini-omni (Python, forked from gpt-omni/mini-omni): An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output.