Salma Ouardi

Paris, France · +33 7 58 22 41 69 · [email protected]

Data Scientist specializing in AI and ML, with strong problem-solving skills and a knack for bridging the gap between business and technical needs. Seeking a collaborative team where I can contribute to impactful projects and grow professionally. Passionate about using data science to improve lives.

2 Years of Work Experience

AI Engineer | Full-Time (1 year)

September 2023 - Present
Ryte AI - Healthcare Startup - Paris, France

Ryte AI leverages big data to drive insights into healthcare efficiency and patient care, with projects spanning entity resolution, advanced machine learning model development, and the integration of healthcare data sources. The company aims to transform healthcare delivery through advanced analytics solutions for both B2C and B2B markets.

Role Summary

As an AI/NLP Engineer at Ryte AI, I led the development and optimization of machine learning models and analytics pipelines, harnessing big data to drive healthcare insights and data-driven decision-making.

Key Tasks

  • Advanced Model Development and Optimization:
    • Developed extractive and abstractive summarization models using large language models, ensuring diverse, high-quality summaries across a large corpus of reviews.
    • Used LightGBM with Hyperopt for hyperparameter tuning, improving model accuracy over the baseline.
  • Entity Resolution and Data Processing:
    • Spearheaded the design and implementation of entity resolution pipelines on clinical datasets, reducing processing time from 3 days to 1 hour and raising the F1 score from 77% to 98%.
  • Scalable Data Transformation:
    • Engineered scalable Spark-based data transformation codebases optimized for datasets exceeding 10 TB, using Scala UDFs for complex data processing.
  • Azure and Cloud Integration:
    • Leveraged Azure products, including Azure Databricks, Azure VMs, and Azure Storage Manager, to deploy scalable data processing and machine learning solutions.
    • Streamlined workflows using Azure DevOps, enhancing collaboration and deployment efficiency.
  • Active Learning and Data Validation:
    • Incorporated active learning into the pipeline through a Streamlit web application backed by PostgreSQL, improving model adaptability and efficiency.
    • Applied test-driven development (TDD) with pytest and Pydantic, ensuring robust data validation and reliable model performance.

Data Scientist (GCP) | Internship (6 months)

March 2023 - August 2023
Orange - Paris, France

At Orange, we faced a complex challenge: our existing data ecosystem was costly and depended heavily on third-party providers for data engineering services. Google Cloud Platform (GCP) offered a way forward, with a comprehensive suite of services including cost-effective storage, serverless architectures, and robust data engineering tools.

Recognizing the potential of GCP, Orange brought me on board as a Data Scientist to lead this transition and to explore the full range of solutions that GCP could offer. Our initial focus was on classifying customer complaints.

Key Tasks

  • Led the transition of Orange's data ecosystem to Google Cloud Platform (GCP) to reduce costs and reliance on third-party data engineering services.
  • Developed a machine learning model on Vertex AI to classify customer complaints, giving clearer insight into customer feedback and surfacing areas for improvement.
  • Performed data engineering tasks, including designing the architecture for machine learning solutions on GCP's Vertex AI.
  • Conducted extensive data preprocessing and cleaning to prepare data for machine learning model training.
  • Selected and tuned machine learning algorithms to optimize model performance.
  • Implemented active learning techniques to handle a large amount of unlabeled data, iteratively improving the model's performance.
  • Achieved a model accuracy of 91%, demonstrating the effectiveness of the data science methodologies employed.
  • Tested and validated the new data architecture, confirming its efficiency and robustness.
  • Applied data science to inform business decisions and strategy, demonstrating the value of data-driven insights in business operations.

Manager Contact

  • Name: Benoit Eock Belinga
  • Role: Lead Data Scientist | Programme Data / IA
  • Email: [email protected]
  • Phone: +33 6 84 59 08 70

Recommendation Letter

For a detailed recommendation, please see the letter from my manager:

View Recommendation Letter

Data Engineer | Internship (6 months)

March 2022 - September 2022
Stellantis - Casablanca, Morocco

As a Data Engineering Intern at Stellantis, I played a key role in the Carflow MEA Dashboards project, an initiative that uses data from diverse sources to monitor supply chain operations in the Middle East and Africa (MEA) region. The existing ETL (Extract, Transform, Load) process was manual, which made it labor-intensive, prone to human error, and inefficient. My primary responsibility was to automate this ETL process and streamline data management for the project.

Key Tasks

  • Collaborated closely with a data architect to establish an effective working environment, gaining valuable insights into the data team's operations.
  • Conducted in-depth research into Stellantis's Supply Chain business, facilitated by the supply chain business team, to understand the business context and requirements.
  • Analyzed the existing ETL solution, identified business requirements, and mapped out a strategic plan for process improvement.
  • Designed and implemented an automated ETL solution using PySpark, Apache Airflow, and Oracle Exadata, the tool stack provided by the Stellantis Data department.
  • Conducted rigorous testing of the data pipelines and documented the end-to-end automation process to ensure knowledge transfer and future reference.
  • Improved the system's efficiency with the new solution, reducing latency by 46 ms and cutting the failure rate by 82%.

Education

Paris-Saclay University

September 2022 - September 2023
Paris, France
Master of Science, Artificial Intelligence
Main Courses: ML Algorithms, Deep Learning, Computer Vision, Large-Scale Distributed Data Processing, Probabilistic Generative Models, Applied Statistics, Advanced Optimization, Signal Processing, NLP, Information Retrieval, Reinforcement Learning.

Ecole des sciences de l'information

September 2018 - August 2022
Rabat, Morocco
Master of Engineering, Data and Knowledge

GPA: 3.25

Main Courses: Data Structures and Algorithms, Business Intelligence and Data Warehousing, Big Data, Artificial Intelligence, Expert Systems, Statistics, Machine Learning, Network Security, Operating Systems, Knowledge Management.

Classes Préparatoires aux Grandes Écoles

September 2016 - August 2018
Agadir, Morocco
MPSI, MP
Main Courses: Mathematics, Physics, Engineering Sciences, Chemistry, Computer Science.

Skills

Soft Skills
  • Communication: Clearly explain complex ideas.
  • Problem-solving: Find efficient solutions.
  • Collaboration: Work well with teams (Agile, Scrum).
  • Adaptability: Learn new tools quickly.
  • Attention to Detail: Ensure high data quality.
  • Critical Thinking: Extract insights from data.
Technical Skills
  • Programming: Python (NumPy, Pandas, scikit-learn, spaCy), Scala, SQL, NoSQL, Bash.
  • Machine Learning: Supervised & Unsupervised Learning, AutoML.
  • Deep Learning & AI: Transformers, LLMs, PyTorch, TensorFlow, Hugging Face.
  • NLP: Entity resolution, Summarization, Wav2Vec2 for ASR.
  • Data Analysis & Statistics: Data modeling, Predictive analytics.
  • Big Data: Apache Spark, Databricks, Oracle Exadata, Hadoop.
  • Cloud & DevOps: Microsoft Azure, GCP, Docker, CI/CD, Terraform.
  • Tools & Platforms: Git, GitHub, MLflow, DVC.
  • Data Visualization: PowerBI, Tableau, Matplotlib, Seaborn.
Languages
  • English: Bilingual Proficiency
  • French: Bilingual Proficiency
  • Arabic: Native


Projects

Wav2Vec2 Fine-Tuning for ASR

January 2023

Fine-tuned the Wav2Vec2 model for Automatic Speech Recognition (ASR) on Spanish and Finnish speech datasets to improve performance on low-resource languages.

Project Steps

  • Setting up APIs
  • Loading and preprocessing the CSS10 dataset
  • Configuring the Wav2Vec2CTCTokenizer and Wav2Vec2FeatureExtractor
  • Fine-tuning and training the model
  • Evaluating the model using the Word Error Rate (WER) metric

Results

Achieved WER of 0.165 for Spanish and 0.376 for Finnish, demonstrating effectiveness in ASR tasks.
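For reference, WER compares a model transcript with a reference transcript of N words and counts the word substitutions (S), deletions (D) and insertions (I) needed to turn one into the other: WER = (S + D + I) / N, so lower is better. The minimal Python sketch below computes it with a word-level edit distance. It is illustrative only and is not the evaluation code used in this project; it assumes plain whitespace tokenization with no text normalization (casing, punctuation), which many ASR evaluations apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits turning the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deletion ("negro") plus one insertion ("mucho") over 4 reference words
print(word_error_rate("el gato negro duerme", "el gato duerme mucho"))  # 0.5
```

Because insertions are counted, WER can exceed 1.0: `word_error_rate("a", "x y z")` returns 3.0.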

Future Work

Plan to fine-tune Wav2Vec2 on more low-resource languages and compare performance with other pretrained models.

  • Language: Python
  • Tools: Wav2Vec2 (Hugging Face Transformers), CSS10 dataset, WER metric, APIs, Google Drive.
  • GitHub: Wav2Vec2 Fine-Tuning for ASR

Bias Mitigation For Age Detection

October 2022

Developed models that estimate age from facial images while mitigating biases related to age, gender, ethnicity, and facial expression.

Techniques Used

  • Data Augmentation with Albumentations and OpenCV
  • Customized loss functions
  • Base models: NASNet, ResNet

Probabilistic Generative Models

October 2022

Implemented several probabilistic generative models from scratch using PyTorch, including Variational AutoEncoder, Restricted Boltzmann Machine, and Real-NVP Normalizing Flows.

Flower Recognition with Fine-tuned VGG16

March 2022

Performed multi-class image classification on a flower dataset using VGG16 with transfer learning, achieving 85% accuracy on the validation set.

Project Steps

  • Implemented image generators with data augmentation
  • Fine-tuned the VGG16 architecture
  • Evaluated model performance on the validation set

Dataset

Used a dataset with 4242 images across five classes: chamomile, tulip, rose, sunflower, and dandelion.

Recommendation System with PySpark

January 2022

Trained an ALS (Alternating Least Squares) recommendation model with PySpark on the MovieLens 100k dataset stored on HDFS, generating relevant movie recommendations for users.

Project Steps

  • Built recommendation model using user preferences
  • Computed recommendations and similar items
  • Applied evaluation metrics for model performance
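Alternating least squares factorizes the sparse user-item ratings matrix into user factors u and item factors v by minimizing the sum of (r_ui - u·v)² over observed ratings plus λ(Σ‖u‖² + Σ‖v‖²). It alternates between solving a small ridge regression for every user with the item factors fixed, then for every item with the user factors fixed. In PySpark this runs at scale through `pyspark.ml.recommendation.ALS`; the pure-Python sketch below only illustrates the alternating updates on a toy ratings dictionary. The ratings, the rank `k`, the regularization `reg` and all function names are illustrative assumptions, and unlike Spark, which scales the regularization by each user's or item's number of ratings, this sketch applies a single unweighted λ.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_update(neighbors, factors, k, reg):
    """Exact ridge solution for one user (or item) given the fixed factors of its neighbors."""
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]  # reg * I
    b = [0.0] * k
    for other, rating in neighbors:
        f = factors[other]
        for i in range(k):
            b[i] += rating * f[i]
            for j in range(k):
                a[i][j] += f[i] * f[j]  # accumulate F^T F
    return solve(a, b)


def als(ratings, k=2, reg=0.1, iterations=10, seed=0):
    """ratings maps (user, item) -> rating; returns user and item factor dicts."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    U = {u: [rng.uniform(0.1, 1.0) for _ in range(k)] for u in sorted(by_user)}
    V = {i: [rng.uniform(0.1, 1.0) for _ in range(k)] for i in sorted(by_item)}
    for _ in range(iterations):
        for u in U:  # user step: item factors held fixed
            U[u] = ridge_update(by_user[u], V, k, reg)
        for i in V:  # item step: user factors held fixed
            V[i] = ridge_update(by_item[i], U, k, reg)
    return U, V


def predict(U, V, user, item):
    return sum(x * y for x, y in zip(U[user], V[item]))


def rmse(ratings, U, V):
    sq = sum((predict(U, V, u, i) - r) ** 2 for (u, i), r in ratings.items())
    return math.sqrt(sq / len(ratings))


def objective(ratings, U, V, reg):
    """Regularized squared error; no ALS step can increase it."""
    fit = sum((predict(U, V, u, i) - r) ** 2 for (u, i), r in ratings.items())
    penalty = sum(x * x for f in list(U.values()) + list(V.values()) for x in f)
    return fit + reg * penalty


# Hypothetical toy ratings (not MovieLens data)
ratings = {
    ("u1", "i1"): 5, ("u1", "i2"): 4, ("u1", "i4"): 1,
    ("u2", "i1"): 4, ("u2", "i3"): 1, ("u2", "i4"): 1,
    ("u3", "i1"): 1, ("u3", "i2"): 1, ("u3", "i4"): 5,
    ("u4", "i2"): 1, ("u4", "i3"): 5, ("u4", "i4"): 4,
}
U, V = als(ratings)
print(round(rmse(ratings, U, V), 3), round(predict(U, V, "u2", "i2"), 2))
```

Each step solves only a k × k system per user or item, and those solves are independent of one another, which is what lets Spark distribute them across a cluster.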

Interests

Traveling
Music
Chess
Basketball
Hiking