Research
I am interested in Deep Learning in general and read literature from different fields. In particular, I try to follow research in the following areas:
- Improving Reasoning in Vision-Language Models
- Visual Question Answering (VQA)
How well can a model reason about an image or video to answer questions? How can we extend this to multicultural and multilingual settings while maintaining alignment?
- Image Generation
How well can a model reason about a textual caption to generate an image? How can we extend this to multicultural and multilingual settings while maintaining alignment?
- Self-Supervised Learning
How can we learn good representations from large amounts of data that transfer to downstream tasks with little additional effort?
- World Models
How can we build models that plan and reason within an environment?
News
[Dec, 2024] Our work on "Bridging the Data Provenance Gap Across Text, Speech and Video" has been accepted at ICLR 2025.
[Sep, 2024] Started as a Pre-Doctoral AI Research Fellow at the Fatima Fellowship, advised by Dr. Wei Peng, Research Scientist at Stanford University.
[July, 2024] Our work on "Consent in Crisis: The Rapid Decline of the AI Data Commons" has been accepted at NeurIPS 2024.
[June, 2024] Serving as NLP Lead for the Bytewise Fellowship.
[April, 2024] Accepted to the Oxford Machine Learning Summer School (OxML), MLX Fundamentals.
[April, 2024] Started working as a Deep Learning Computer Vision Engineer at Roll.ai.
[Dec, 2023] Accepted as an AI Fellow in the 14th batch of the PI School of AI, with a scholarship worth €12,500.
[Dec, 2023] Started as an ML-Maths Community Lead at Cohere.for.ai, led by Sara Hooker.
[June, 2023] Served as Urdu Language Ambassador for AYA; contributed 3 datasets and led data crowdsourcing.
[Mar, 2023] Serving as Data Science Lead for the Bytewise Fellowship.
[Feb, 2023] Started as an Asian Community Lead at Cohere.for.ai, led by Sara Hooker. See our sessions here.
[Aug, 2022] Graduated from IIUI with a Bachelor's in Computer Science.
[April, 2022] Started as a Machine Learning Engineer at Redbuffer.ai.
[Dec, 2021] Started as a Software Engineer (Deep Learning and Computer Vision) at Wortel.ai.
[July, 2021] Started as a Deep Learning and Computer Vision Intern at Wortel.ai.
[Sep, 2018] Started my undergraduate studies in CS at IIUI.
Publications
On the Limitations of Vision Language Models in Understanding Image Transforms
Ahmad Mustafa Anis, Hasnain Ali, Saquib Sarfraz
Preprint
This paper investigates how well VLMs, specifically OpenAI's CLIP and Google's SigLIP, understand image-level transforms. We find that these models fail to comprehend multiple image-level augmentations.
Bridging the Data Provenance Gap Across Text, Speech and Video
S Longpre, N Singh, [8 authors], Ahmad Mustafa Anis, et al.
ICLR 2025
Consent in Crisis: The Rapid Decline of the AI Data Commons
S Longpre, R Mahari, [13 authors], Ahmad Mustafa Anis, et al.
NeurIPS 2024
Favourite Papers
A list of papers I really admire (and that I hope to match with similarly impactful work).
CLIP
World Models
JEPA
SimCLR
Vision Transformers Need Registers
PaLI-3
Learning by Distilling Context
SigLIP
GILL: Generating Images with Multimodal Language Models