Developed a GPU-accelerated data processing pipeline that categorizes 1.7 billion Reddit posts into topics, implementing BERTopic with BERT embeddings, UMAP dimensionality reduction, and HDBSCAN clustering. Analyzed approximately 12 billion Reddit posts to assign each post a topic and track how those topics changed over time. Optimized the large-scale NLP pipeline, filtering the dataset down to 43 million categorized posts for efficient analysis. Created and published an interactive website (redditopics.xyz) for exploring and downloading the processed dataset. I'll spend the summer continuing the project and will update the results later.
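In miniature, the pipeline's embed-then-cluster idea looks like the sketch below. It is purely illustrative: toy bag-of-words vectors and a greedy similarity threshold stand in for the actual BERT embeddings, UMAP reduction, and HDBSCAN clustering, and all names are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy stand-in for a BERT embedding: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(posts, threshold=0.4):
    # Greedy single-pass clustering (a crude stand-in for HDBSCAN):
    # assign each post to the first cluster whose seed is similar
    # enough, otherwise start a new cluster.
    clusters = []  # list of (seed_embedding, [member posts])
    for post in posts:
        vec = embed(post)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(post)
                break
        else:
            clusters.append((vec, [post]))
    return [members for _, members in clusters]

posts = [
    "cats are great pets",
    "my cats love pets and naps",
    "the stock market fell today",
    "market prices fell again",
]
groups = cluster(posts)
```

The real pipeline replaces each stage with its GPU-accelerated counterpart, but the topology is the same: embed every post once, then group posts by similarity and label each group as a topic.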
Developed an AI-powered personal learning assistant that helps users revise more efficiently by answering questions about the material they upload. Generated questions are tuned to the user's previous answers to reach the right level of difficulty. Skills: Retrieval-Augmented Generation, Python, Flask, React, AWS.
Analyzed cinema trends over decades using large-scale datasets, focusing on movie length, ratings, budgets, and plot complexity. Applied Latent Dirichlet Allocation to track changes in movie topics and originality, and presented the results as an interactive data story. Skills: Python, pandas, numpy.
Developed STEMERALD, a STEM course assistant built on the Gemma-2b language model, applying Supervised Fine-Tuning and Direct Preference Optimization. Achieved 70% accuracy across several multiple-choice question benchmark datasets, and reduced the model's memory footprint to 2GB, enabling deployment on consumer-grade hardware.
Applied reinforcement learning with large language models to Wordle and similar few-step environments, focusing on behavioral cloning. Developed and tested a novel reward-weighted behavioral cloning method, demonstrating that it generalizes traditional behavioral cloning and handles datasets of varying quality.
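The core of a reward-weighted objective can be sketched as follows. This is a hypothetical illustration, not the report's exact formulation: the function names and the exponential weighting with a temperature are assumptions. It shows the key property that with equal rewards the weighted loss reduces to plain behavioral cloning (a uniform average of negative log-likelihoods).

```python
import math

def rwbc_weights(rewards, temperature=1.0):
    # Exponentially weight trajectories by reward and normalize so the
    # weights sum to 1. With equal rewards this is uniform weighting,
    # i.e. ordinary behavioral cloning. (Assumed form, for illustration.)
    exps = [math.exp(r / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]

def rwbc_loss(log_likelihoods, rewards, temperature=1.0):
    # Reward-weighted negative log-likelihood over a batch of trajectories.
    weights = rwbc_weights(rewards, temperature)
    return -sum(w * ll for w, ll in zip(weights, log_likelihoods))

# Equal rewards: identical to plain BC (uniform average of -log-likelihood).
uniform = rwbc_loss([-1.0, -2.0, -3.0], [0.0, 0.0, 0.0])

# Unequal rewards: higher-reward trajectories get more weight.
weights = rwbc_weights([0.0, 1.0])
```

Because low-reward trajectories are down-weighted rather than discarded, an objective of this shape can make use of mixed-quality datasets, which is the motivation the report describes.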
Developed a book recommender system using Graph Convolutional Network architectures, evaluated on a dataset of 6 million book reviews. Given a user's purchase history, the trained model recommends the books that user is most likely to enjoy.
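A single graph-convolution step, the building block of such architectures, can be sketched in pure Python. This toy example is an assumption-laden illustration (mean aggregation with self-loops; no learned weight matrix or nonlinearity): each node's new representation is the average of its own and its neighbors' features, which is how user and book nodes come to share information.

```python
def gcn_layer(adj, features):
    # One graph-convolution step: each node's new feature vector is the
    # degree-normalized average of its neighbors' (and its own) features.
    # A full GCN layer would also multiply by a learned weight matrix
    # and apply a nonlinearity; both are omitted in this sketch.
    n = len(adj)
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if adj[i][j] or j == i]  # self-loop
        agg = [0.0] * len(features[0])
        for j in neighbors:
            for k, v in enumerate(features[j]):
                agg[k] += v / len(neighbors)
        out.append(agg)
    return out

# Tiny graph: nodes 0 and 1 (say, a user and a book they reviewed) are
# connected; node 2 is isolated and keeps its own features.
adj = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
features = [[1.0], [3.0], [5.0]]
smoothed = gcn_layer(adj, features)
```

Stacking several such layers lets information propagate multiple hops through the user-book graph, so a user's representation ends up reflecting books reviewed by similar users.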