Data Scientist specializing in AI and ML, with strong problem-solving skills and a knack for bridging the gap between business and technical needs. Seeking a collaborative team where I can contribute to impactful projects and grow professionally. Passionate about using data science to improve lives.
Ryte AI leverages big data to drive insights into healthcare efficiency and patient care, with projects spanning from entity resolution to advanced machine learning model development and integration of healthcare data sources. The company focuses on transforming healthcare delivery by developing advanced analytics solutions for both B2C and B2B markets.
As an AI/NLP Engineer at Ryte AI, I led the development and optimization of machine learning models and analytics pipelines, using big data to produce healthcare insights and support decision-making.
At Orange, we were faced with a complex challenge: our existing data ecosystem was costly and required extensive data engineering services from third-party providers. The solution came in the form of Google Cloud Platform (GCP), which offered a comprehensive suite of services including cost-effective storage, serverless architectures, and robust data engineering tools.
Recognizing the potential of GCP, Orange brought me on board as a Data Scientist to lead this transition and to explore the full range of solutions that GCP could offer. Our initial focus was on classifying customer complaints.
For a detailed recommendation, please see the letter from my manager:
As a Data Engineering Intern at Stellantis, I played a pivotal role in the Carflow MEA Dashboards project, a key initiative that draws on data from diverse sources to monitor supply chain operations in the MEA region. The existing manual ETL (Extract, Transform, Load) process was labor-intensive, error-prone, and inefficient. My primary responsibility was to automate this ETL process and streamline data management for the project.
GPA: 3.25
Fine-tuned the Wav2Vec2 model for Automatic Speech Recognition (ASR) on Spanish and Finnish speech datasets to improve performance on low-resource languages.
Achieved a word error rate (WER) of 0.165 for Spanish and 0.376 for Finnish.
Plan to fine-tune Wav2Vec2 on more low-resource languages and compare performance with other pretrained models.
Developed models estimating age from images, focusing on mitigating biases related to age, gender, ethnicity, and facial expression.
Implemented several probabilistic generative models from scratch using PyTorch, including Variational AutoEncoder, Restricted Boltzmann Machine, and Real-NVP Normalizing Flows.
Performed multi-class image classification on a flower dataset using transfer learning with VGG16, achieving 85% accuracy on the validation set.
Used a dataset with 4242 images across five classes: chamomile, tulip, rose, sunflower, and dandelion.
Trained an ALS (Alternating Least Squares) recommendation model with PySpark on the MovieLens 100k dataset, stored on HDFS, to generate movie recommendations.