As a Data Scientist Intern in the INDEPTH Deepfake Detection Platform team, you will work on evaluating and benchmarking state-of-the-art deepfake detection solutions to safeguard digital content authenticity across Singapore's public sector. You will contribute to analysing the evolving landscape of content generation and detection technologies, integrating best-in-class detection solutions, and building automated evaluation frameworks to assess their effectiveness against emerging synthetic media threats.
1. Conduct a comprehensive landscape analysis of content generation and detection technologies, mapping evolving threats and identifying cutting-edge detection solutions from academia and industry.
2. Design and implement automated benchmarking pipelines to systematically evaluate third-party detection solutions across multiple modalities and datasets, analysing performance trade-offs and identifying optimal solutions for different use cases.
3. Build automated evaluation frameworks that continuously test detection solutions against emerging generation techniques, implementing strategies to measure robustness and generalisation and to characterise failure modes.
4. Integrate and orchestrate multiple detection solutions into a unified evaluation platform, learning to design testing APIs, manage solution versioning, and create comparative analysis dashboards for stakeholder decision-making.
1. Able to commit full-time during the internship period without other curricular or co-curricular commitments.
2. Strong Python programming skills with experience in deep learning frameworks (e.g. PyTorch) and familiarity with content generation models (e.g. diffusion models).
3. Understanding of deepfake generation techniques and detection methodologies, including recent research in facial forensics, temporal consistency analysis, and multimodal approaches to detecting synthetic media.
4. Knowledge of evaluation pipelines and MLOps practices, including experience with automated testing frameworks, synthetic data generation for benchmarking, dataset curation tools, and reproducible experimentation.
5. Knowledge of AWS (e.g. SageMaker, Athena) is a plus.