Education

École Polytechnique
Ph.D. in Natural Language Processing
2019-2022

  • Title: Natural Language Generation: Systems and Evaluation
  • Supervisor: Prof. Michalis Vazirgiannis
  • Team: DaSciM
  • Lab: Informatics laboratory of l’École polytechnique (LIX)
  • Defended on: 16/12/2022
  • Jury members:
    • Prof. Eduard Hovy
    • Prof. Eric Gaussier
    • Prof. Ioana Manolescu
    • Prof. Nizar Habash
    • Prof. Jie Tang
    • Mr. Alexandros Potamianos

Paris-Saclay
Master's degree, Data Science
2018-2019

Télécom Paris
Engineering degree, Data Science
2017-2019

Lebanese University Faculty of Engineering - Branch 1
Engineering degree - Software Engineering and Telecommunication
2013-2017

Lebanese University Faculty of Literature - Branch 3
Bachelor's degree - Arabic Language and Literature
2013-2016


Professional Experience

Zaion
Research Scientist
2023-present

  • Applying large language models (LLMs) to problems arising in customer service calls.

École Polytechnique
Researcher (PhD Candidate)
2019-2023

  • Pretraining and fine-tuning Transformer-based models for abstractive summarization.
  • Collecting and creating French resources, including the pretraining of French language models.
  • Proposing new metrics for automatic text generation evaluation.

Orange France (CCIC)
NLP Data Science Intern
2018-2019

  • Design of a GAN for text analysis and generation, with the generated text used to challenge chatbots.
  • Keyword spotting for medical applications.

Beetech
Web Developer
2016-2017

  • Creation of a website designed to enhance communication among administrators, teachers, and parents.

Teaching

Introduction to Text Mining and NLP
École Polytechnique: 3rd-year engineering cycle
2019-2022

Advanced Learning for Text and Graph Data (ALTEGRAD)
Master’s Degree, Data Science, MVA
2020-2022

Data Science Starter Program - DSSAP3
École Polytechnique - Executive Education
2019-2022

Advanced AI for Data Analysis
École Polytechnique - Executive Education
2020-2022


Publications

  • BARThez: a Skilled Pretrained French Sequence-to-Sequence Model. EMNLP 2021
  • AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization. WANLP 2022
  • FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation. ACL 2022
  • DATScore: Evaluating Translation with Data Augmented Translations. EACL 2023
  • Evaluation Of Word Embeddings From Large-Scale French Web Content. arXiv
  • Leveraging Third-Party LLMs’ Annotations for Sensitive Conversational Data Abstractive Summarization. Under review
  • Attention-Based Summary-Worthy Utterances Identification for Lengthy Conversations Abstractive Summarization. Under review

Skills

Programming languages and frameworks:
Python, C++, Java, PyTorch, Pandas, Fairseq, Hugging Face Transformers.

Languages:

  • Arabic (Native Proficiency)
  • English (Full Professional Proficiency)
  • French (Full Professional Proficiency)