cv


Basics

Name: Ziyue Yin
Label: Data Science Undergraduate | Research Assistant (Speech & Multimodal AI, Environmental Data Science)
Email: ziyue.yin@dukekunshan.edu.cn
Phone: +86 181 5120 0882
URL: https://www.linkedin.com/in/ziyue-yin
Summary: Dual-degree Data Science undergraduate (Class of 2026) at Duke Kunshan University & Duke University. Research experience spanning whisper-to-normal speech conversion with neural audio codecs and Transformers, multi-omics analysis of cyanobacterial bloom dynamics under heatwaves, and retrieval-augmented QA systems with LLMs.

Work

  • 2025.08 - 2025.09
    AI Agent Development Intern, Digital Intelligence Support Center
    China Mobile Communications Group Jiangsu Co., Ltd.
    Built an internal AI-agent workflow system to automate quarterly/annual operational report generation; integrated LLM-driven extraction, summarization, and templated drafting, working with cross-department stakeholders.
    • Delivered 3 reusable workflows covering 10 report sections to standardize formatting and speed iteration
    • Prototype adopted internally; recognized by the AI Group for independence, communication, and execution
  • 2025.06 - Present
    Research Assistant (Advisor: Ming Li)
    Speech and Multimodal Intelligent Information Processing (SMIIP) Lab, Duke Kunshan University
    Research on real-time whisper-to-normal speech conversion using neural audio codecs with Transformer-based mapping; focus on robust training under misalignment and evaluation on real whisper datasets.
    • Built a speaker-conditioned codec + Transformer mapping model with DTW-based latent alignment and multi-objective waveform/latent losses
    • Benchmarking on wTIMIT and AISHELL-6-Whisper; evaluating with WER/CER, DNSMOS, SI-SDR, and MCD
    • First-author manuscript in preparation for ICME 2026
  • 2024.06 - 2024.08
    Summer Research Scholar (Advisor: Paul Weng)
    Duke Kunshan University
    Built and evaluated a Retrieval-Augmented Generation (RAG) pipeline for question answering using Llama-3 8B; improved retrieval quality via indexing, embedding tuning, and filtering.
    • Implemented hierarchical indexing and embedding fine-tuning; ran 75 hyperparameter configurations
    • Added a relevancy check (cosine similarity threshold 0.6) to filter out irrelevant queries and reduce cost
    • Achieved +21.4% Hit Rate and +12.9% Context Recall; improved faithfulness (+8.3%) and context precision (+9.1%)
  • 2023.11 - 2025.08
    Research Assistant (Advisor: Huansheng Cao)
    Duke Kunshan University
    Multi-omics analysis of cyanobacterial bloom suppression under extreme summer heatwaves; ran large-scale pipelines on computing clusters and interpreted the biology of heat-driven dynamics.
    • Processed metagenomic, metatranscriptomic, and metabolomic datasets (QC, taxonomic profiling, functional annotation, integration)
    • Quantified species-level shifts and the thermal thresholds at which Microcystis blooms are suppressed
    • Co-first-author manuscript submitted to Nature Communications (under consideration)
  • 2023.01 - 2024.06
    Senior Program Assistant (Student Worker)
    Innovation and Entrepreneurship Initiative (InE), Duke Kunshan University
    Coordinated and executed entrepreneurship programming and incubator events; planned operations and on-site logistics to improve engagement and event quality.
    • Supported Entrepreneur Speaker Series (ESS) and U-Corp Lab, averaging ~60 attendees per event
    • Helped run Innovation Incubator (Dii) events, including the Innovation Challenge and the Chinese College Student Innovation & Entrepreneurship Program, serving 100+ incubator teams

Education

  • 2022.08 - 2026.05
    B.S. (Dual Degree), Data Science; Interdisciplinary Studies (Data Science)
    Duke Kunshan University (DKU) & Duke University
    Kunshan, China / Durham, U.S.

Skills

  • Programming: Python, Java, Shell (Bash), MATLAB, SQL, Git
  • Machine Learning & Deep Learning: PyTorch, scikit-learn, model training & evaluation, experiment tracking
  • Research & Technical Writing: LaTeX, technical reports, posters, manuscripts
  • HPC & Systems: Linux HPC clusters, Slurm, GPU workflows

Languages

  • Chinese: Native
  • English: Professional working proficiency
  • Spanish: Elementary

Interests

Photography
Badminton
Piano

Projects

  • 2025.06 - Present
    Whisper-to-Normal Speech Conversion (Real-time)
    A real-time whisper-to-normal speech conversion system using neural audio codecs and a Transformer-based, speaker-conditioned mapping with DTW-based latent alignment and multi-objective losses.
    • DTW-based latent alignment to handle misaligned real whisper datasets
    • Evaluation: WER/CER, DNSMOS, SI-SDR, MCD; ablations on alignment and multi-objective losses
    • Target datasets: wTIMIT, AISHELL-6-Whisper
  • 2023.11 - 2025.08
    Multi-omics Analysis of Bloom Suppression under Heatwaves
    End-to-end processing and integration of metagenomic, metatranscriptomic, and metabolomic data to study temperature-driven microbiome dynamics and thermal thresholds affecting Microcystis blooms.
    • Species-level dynamics + thermal threshold quantification
    • Strain-level genomic analysis of M. aeruginosa and associated phages
  • 2024.06 - 2024.08
    RAG Question Answering with Llama-3 8B
    A retrieval-augmented question answering pipeline with hierarchical indexing, embedding fine-tuning, and relevancy filtering to improve retrieval and reduce unnecessary compute.
    • 75-configuration hyperparameter sweep; best configuration improved Hit Rate (+21.4%) and Context Recall (+12.9%)
    • Relevancy check via cosine similarity thresholding (0.6)