Projects

Multilingual TextDetox 2025 2025
Built a text detoxification pipeline covering 15 languages by fine-tuning the transformer-based mT0-XL-detox-orpo and DistilBERT models with synthetic data, lexicon-guided tagging, ORPO, and FP16, improving accuracy by 0.6 points.

Gender Gap Corpus Annotation 2025
Developed and deployed a corpus annotation pipeline using FastAPI (backend), HTML/JavaScript (frontend), and Docker. Combined web scraping with NLP techniques such as dependency parsing and POS tagging, verified annotation quality through inter-annotator agreement (IAA) analysis, and optimized search with Whoosh.