CV
Education
Ecole Polytechnique
Ph.D. in Natural Language Processing
2019-2022
- Title: Natural Language Generation: Systems and Evaluation
- Supervisor: Prof. Michalis Vazirgiannis
- Team: DaSciM
- Lab: Informatics Laboratory of École Polytechnique (LIX)
- Defended on: 16/12/2022
- Jury members:
  - Prof. Eduard Hovy
  - Prof. Eric Gaussier
  - Prof. Ioana Manolescu
  - Prof. Nizar Habash
  - Prof. Jie Tang
  - Mr. Alexandros Potamianos
Paris Saclay
Master's degree, Data Science
2018-2019
Télécom Paris
Engineering Degree, Data Science
2017-2019
Lebanese University Faculty of Engineering - Branch 1
Engineering degree - Software Engineering and Telecommunication
2013-2017
Lebanese University Faculty of Literature - Branch 3
Bachelor - Arabic Language and Literature
2013-2016
Professional Experience
Zaion
Research Scientist
2023-current
- Capitalizing on Large Language Models (LLMs) to solve problems related to customer service calls.
Ecole Polytechnique
Researcher (PhD Candidate)
2019-2023
- Pretraining and fine-tuning Transformer-based models for abstractive summarization.
- Collecting and creating French resources, including the pretraining of French language models.
- Proposing new metrics for automatic text generation evaluation.
Orange France (CCIC)
NLP Data Science Intern
2018-2019
- Design of a GAN for text analysis and generation; the generated text is used to challenge chatbots.
- Keyword Spotting for Medical Applications.
Beetech
Web Developer
2016-2017
- Creation of a website designed to enhance communication among administrators, teachers, and parents.
Teaching
Introduction to Text Mining and NLP
Ecole Polytechnique: 3rd year engineering cycle
2019-2022
Advanced Learning for Text and Graph Data (ALTEGRAD)
Master’s Degree, Data Science, MVA
2020-2022
Data Science Starter Program - DSSAP3
Ecole Polytechnique - Executive Education
2019-2022
Advanced AI for Data Analysis
Ecole Polytechnique - Executive Education
2020-2022
Publications
- BARThez: a Skilled Pretrained French Sequence-to-Sequence Model. EMNLP 2021
- AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization. WANLP 2022
- FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation. ACL 2022
- DATScore: Evaluating Translation with Data Augmented Translations. EACL 2023
- Evaluation Of Word Embeddings From Large-Scale French Web Content. arXiv
- Leveraging Third-Party LLMs’ Annotations for Sensitive Conversational Data Abstractive Summarization. Under review
- Attention-Based Summary-Worthy Utterances Identification for Lengthy Conversations Abstractive Summarization. Under review
Skills
Programming languages and Frameworks:
Python, C++, Java, PyTorch, Pandas, Fairseq, Hugging Face Transformers.
Languages:
- Arabic (Native Proficiency)
- English (Full Professional Proficiency)
- French (Full Professional Proficiency)