Who I Am

Andrea Alberti
GenAI Engineer & Data Scientist
With a double degree in Management and Computer Science–Data Science, I have gained substantial experience in multidisciplinary projects. I specialize in applying machine learning, deep learning, and, most recently, Generative AI techniques to develop innovative and automated solutions.
My academic journey, from Management Engineering (110/110 cum laude) to a Master's in Data Science (110/110 cum laude), has given me a blend of technical expertise and strategic thinking. This allows me to approach complex problems from both a business and a technological perspective.
Professionally, I focus on building intelligent systems—from multi-agent architectures that automate complex business processes to advanced conversational AI—to drive efficiency and reduce operational costs. My goal is to continuously improve myself, using AI to create tangible value.
MSc. Data Science - Computer Engineering
University of Pavia
BSc. Management Engineering
University of Brescia
Certification
Google Cloud Professional ML Engineer
Google Cloud
Technical Skills
Core Skills
AI Tools
Cloud & Infrastructure
Programming Languages
Frameworks & Libraries
Professional Projects
A selection of professional projects where I applied Generative AI and Machine Learning to solve real-world problems.
AI-Powered Tyre Selection Assistant
Refined and tested a Dialogflow CX conversational agent for the UK market, guiding users in tyre selection through a multi-agent system with RAG and external APIs. Successfully migrated the solution to Google Agent Development Kit (ADK), expanding to international markets and new vehicle categories with cross-cloud AWS-GCP integration.
Automated Document Validation System
Full-stack implementation of a multi-agent system that automates the verification and validation of documents for public funding requests. Developed the backend in Python and the frontend in HTML, CSS, and JavaScript. Designed a modular, flexible architecture that allows agent behavior to be adapted without code changes.
Advanced Knowledge Base Chatbot
Implemented an advanced chatbot agent architecture that answers user questions using a knowledge base of web pages and PDF documents. The system is designed around a main orchestrator agent that routes requests to five specialized sub-agents. Developed a data management pipeline that includes PDF parsing, a chunking strategy, and a security layer for handling inappropriate questions.
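To illustrate what a chunking step in such a pipeline can look like, here is a minimal sketch. It assumes fixed-size character chunks with overlap, which is only one common strategy; the function name and parameter values are hypothetical and not taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters.

    Illustrative only: real pipelines often split on sentences or headings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.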
Real-Time Multimodal Agent
Developed and implemented a real-time multimodal conversational agent based on Google Agent Development Kit (ADK). The system can process audio and video inputs simultaneously, sustain fluid conversations, and autonomously perform browser operations through reasoning and tools. It uses an asynchronous dual-server architecture with WebSocket communication for low-latency bidirectional streaming.
Luxury Yacht Virtual Assistant
Implemented a virtual assistant that answers user questions based on a corporate knowledge base. It is configured as a RAG (Retrieval-Augmented Generation) application, leveraging Google Agent Development Kit (ADK) and Vertex AI Search to retrieve relevant information from a dedicated datastore. The architecture uses Gemini models to process requests and generate accurate, contextualized responses in Italian.
Multi-Agent Ticketing System
Implemented a multi-agent ticketing system to automate responses to users. The project was developed as a Proof of Concept (POC) and involved creating specialized agents, each able to interact with external databases via APIs to provide accurate and comprehensive replies. The multi-agent architecture routes each user request to the most competent agent, reducing staff workload and management costs.
LLM-Based Email Classification Pipeline
Within a broader project aimed at automating a manual process and reducing operational costs, my contribution focused on creating an LLM-based pipeline for the classification and dispatch of certified emails (PEC). Key activities included prompt engineering and few-shot learning to refine model outputs, along with the development of metrics and analytical tools for evaluating system performance.
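As a rough illustration of few-shot prompting for classification, the sketch below assembles a prompt from labelled examples. The label names, example emails, and prompt wording are all hypothetical; the actual prompts and categories used in the project are not described here.

```python
def build_few_shot_prompt(labels, examples, email_text):
    """Build a classification prompt from (text, label) example pairs.

    Hypothetical format: the model is expected to complete the final 'Label:' line.
    """
    lines = ["Classify the email into one of: " + ", ".join(labels) + ".", ""]
    for text, label in examples:
        lines += [f"Email: {text}", f"Label: {label}", ""]
    lines += [f"Email: {email_text}", "Label:"]
    return "\n".join(lines)


# Illustrative labels and examples only.
prompt = build_few_shot_prompt(
    labels=["billing", "registry"],
    examples=[
        ("Invoice attached for March", "billing"),
        ("Please update my address", "registry"),
    ],
    email_text="Payment reminder",
)
```

Keeping the examples in a plain list makes it easy to swap them and measure how each change affects classification metrics.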
Insurance Liquidation Engine - Demo Project
Developed a comprehensive multi-agent collaborative architecture for an internal Sharing Days demonstration. The system automatically analyzes, processes, and issues liquidation judgments on insurance claims. The architecture combines parallel agents for document analysis with sequential agents for final evaluation, using custom tools, built-in Google Cloud services, and MCP integration.
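The parallel-then-sequential pattern described above can be sketched with plain asyncio. This is a simplified illustration, not the project's code: the agent functions, the "missing" filename convention, and the decision rule are all hypothetical stand-ins for model-backed agents.

```python
import asyncio


async def analyze_document(name: str) -> dict:
    """Hypothetical document-analysis agent; a real one would call a model."""
    await asyncio.sleep(0)  # stands in for I/O such as a model or API call
    return {"document": name, "complete": not name.startswith("missing")}


def evaluate(findings: list) -> str:
    """Hypothetical final evaluator that runs once every analysis is done."""
    return "approve" if all(f["complete"] for f in findings) else "review"


async def process_claim(documents: list) -> str:
    # Parallel stage: one analysis task per document, run concurrently.
    findings = await asyncio.gather(*(analyze_document(d) for d in documents))
    # Sequential stage: the evaluator sees all findings together.
    return evaluate(list(findings))


def run_claim(documents: list) -> str:
    """Synchronous entry point for the sketch."""
    return asyncio.run(process_claim(documents))
```

Calling run_claim(["police_report.pdf", "invoice.pdf"]) runs both analyses concurrently and then hands the combined findings to the evaluator.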
Research & Academic Work

Heart Disease Detection from Audio Signals
Advanced Biomedical Machine Learning
Designed prevention and clinical support ensembles for early cardiac screening on the Dangerous Heartbeat Dataset (CHSC2011). Heart sounds were resampled at 4 kHz, segmented into 1-second windows, described with MFCC, chroma, spectral and temporal descriptors, and reduced from 338 to 41 features via Spearman-based filters. The prevention ensemble keeps false normals under control (ROC-AUC 0.96, TPR 43.4% at 1% FPR) while the five-class support ensemble delivers macro F1 81.6 with per-class risk analysis and SHAP explanations.
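The windowing step can be sketched as follows. This minimal example assumes non-overlapping windows and drops any trailing partial window; the description above does not say whether windows overlapped, so both choices are assumptions, as is the function name.

```python
def segment_windows(samples, sample_rate=4000, window_seconds=1.0):
    """Cut a signal into consecutive fixed-length windows.

    Assumptions: no overlap between windows, and a trailing window shorter
    than the full length is discarded.
    """
    window = int(sample_rate * window_seconds)
    if window <= 0:
        raise ValueError("window length must be positive")
    n_full = len(samples) // window
    return [samples[i * window:(i + 1) * window] for i in range(n_full)]
```

At 4 kHz, each 1-second window holds 4,000 samples, so a 2.5-second recording yields two windows under these assumptions.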

Disease Prediction with Graph Machine Learning
Financial Data Science
Mapped 773 diseases and 377 symptoms into a bipartite network to engineer graph-aware features for diagnosis. Method of Reflections, Disease/Symptom Influence indices, community detection and betweenness centrality drive new descriptors that complement one-hot symptoms. Logistic Regression, Random Forest and MLP models were benchmarked; the best logistic model matches the symptom-only baseline while using fewer inputs and exposes class-level accuracy insights.
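For readers unfamiliar with the Method of Reflections, the sketch below shows the standard iteration on a small binary disease-by-symptom matrix. The matrix here is a toy example and the function name is hypothetical; it assumes every disease and every symptom has at least one link, so no division by zero occurs.

```python
def method_of_reflections(M, iterations=2):
    """Iterate the Method of Reflections on a binary bipartite matrix M.

    Rows are diseases, columns are symptoms. Order 0 is the degree of each
    node; each later order averages the previous-order values of a node's
    neighbours on the other side of the network.
    """
    n_d, n_s = len(M), len(M[0])
    kd0 = [sum(row) for row in M]                                  # disease degree
    ks0 = [sum(M[d][s] for d in range(n_d)) for s in range(n_s)]   # symptom degree
    kd, ks = kd0[:], ks0[:]
    for _ in range(iterations):
        new_kd = [sum(M[d][s] * ks[s] for s in range(n_s)) / kd0[d] for d in range(n_d)]
        new_ks = [sum(M[d][s] * kd[d] for d in range(n_d)) / ks0[s] for s in range(n_s)]
        kd, ks = new_kd, new_ks
    return kd, ks


# Toy matrix: 3 diseases x 3 symptoms (illustrative only).
toy = [[1, 1, 0],
       [0, 1, 1],
       [0, 0, 1]]
kd1, ks1 = method_of_reflections(toy, iterations=1)
```

After one iteration, each disease value is the average degree of its symptoms, which gives a first indication of how specific or generic those symptoms are.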

Review Helpfulness Prediction with Big Data
Data Science & Big Data Analytics
Analyzed ~3M Amazon book reviews end-to-end with a big data stack (HDFS, Spark, MongoDB) to explain and predict perceived helpfulness. Hypothesis testing quantified the role of review length, sentiment and star rating, while Word2Vec embeddings fed Random Forest, SVR and MLP regressors for score prediction. The best Random Forest model achieved MSE 0.0259 (RMSE 0.1609, R² 0.253).

Clickbait Detection in News Headlines
Machine Learning
Benchmarked Multinomial Naive Bayes and Logistic Regression on 32k balanced news headlines to detect clickbait. Two deployment targets were explored: maximum accuracy (97.12% test accuracy with stopwords and 8k vocabulary) and zero false positives (0% FPR, 84% accuracy, TPR 68%). Detailed error analysis highlights impactful tokens and the trade-offs introduced by bias calibration.
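One way to reach a zero-false-positive operating point is to move the decision threshold above every negative score, which for a linear model is equivalent to shifting its bias term. The sketch below assumes that approach and uses made-up scores; it is not the procedure from the study, and the function names are hypothetical.

```python
def zero_fpr_threshold(scores, labels):
    """Return the highest score among negatives (label 0).

    Predicting positive only when score > threshold gives zero false
    positives on this data. It does not guarantee zero on unseen data.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return max(negatives)


def rates(scores, labels, threshold):
    """Compute (TPR, FPR) for the rule 'positive if score > threshold'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Made-up scores for illustration: 1 = clickbait, 0 = not clickbait.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3]
labels = [0, 0, 1, 1, 1, 0]
threshold = zero_fpr_threshold(scores, labels)
tpr, fpr = rates(scores, labels, threshold)
```

The positive example scored 0.35 falls below the threshold and is missed, which is the kind of trade-off between false positives and recall that the zero-FPR target introduces.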

DDoS Attack Detection and Mitigation
Enterprise Digital Infrastructure
Recreated DNS reflection and amplification attacks in a controlled LAN to measure amplification factors, target-side latency and server resource usage. Custom Scapy scripts spoofed victims while varying query types (A, MX, NS, ANY) at 10k–50k packets/s. The study documents latency spikes above 100 ms for ANY requests, CPU saturation during amplified attacks and evaluates mitigation strategies.
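The amplification factor is commonly defined as the ratio of response size to request size. The short sketch below computes it for hypothetical packet sizes; the byte counts are placeholders for illustration and are not measurements from the study.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio of response size to request size for one query."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


# Hypothetical (request, response) sizes in bytes, for illustration only.
example_sizes = {"A": (60, 120), "ANY": (60, 3000)}
factors = {
    query_type: amplification_factor(req, resp)
    for query_type, (req, resp) in example_sizes.items()
}
```

A larger factor means the reflector sends proportionally more traffic to the victim per byte sent by the attacker, which is why large responses are of particular interest when evaluating mitigations.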

Cake Classification Features Analysis
Machine Learning
Compared handcrafted descriptors and CNN-derived features for classifying 15 cake categories (1,800 images). Low-level statistics (color histogram, edge direction, co-occurrence) fed an MLP but plateaued at 31% accuracy, while PVMLNet feature maps (layer −5) coupled with an MLP achieved 90% test accuracy. Transfer learning by fine-tuning PVMLNet reached 80%, highlighting the importance of deep representations.

Vanishing Points Detection in Images
Computer Vision
Delivered two computer-vision utilities: (1) a histogram-driven binarisation tool with auto/manual tuning and GUI, and (2) a vanishing point detector that chains Canny, probabilistic Hough and RANSAC (500 iterations, 5 px tolerance). The pipeline adapts thresholds from image statistics, overlays the 15 most significant lines, and documents SSIM comparisons against Otsu.
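The RANSAC stage can be sketched in pure Python as follows. This is a simplified illustration that assumes the 5 px tolerance is a point-to-line distance and that segments come from an earlier Hough step; the function names, the scoring rule, and the fixed seed are assumptions rather than details of the original tool.

```python
import math
import random


def cross(u, v):
    """Cross product of two 3-vectors (homogeneous coordinates)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(segment):
    """Homogeneous line through a non-degenerate segment (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = segment
    return cross((x1, y1, 1.0), (x2, y2, 1.0))


def point_line_distance(point, line):
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)


def ransac_vanishing_point(segments, iterations=500, tolerance=5.0, seed=0):
    """Return (point, inlier_count) for the best-supported intersection.

    Needs at least two segments. Each iteration intersects two random lines
    and counts the lines passing within `tolerance` pixels of that point.
    """
    rng = random.Random(seed)
    lines = [line_through(s) for s in segments]
    best_point, best_inliers = None, -1
    for _ in range(iterations):
        l1, l2 = rng.sample(lines, 2)
        x, y, w = cross(l1, l2)
        if abs(w) < 1e-9:  # (near-)parallel lines have no finite intersection
            continue
        candidate = (x / w, y / w)
        inliers = sum(1 for l in lines
                      if point_line_distance(candidate, l) <= tolerance)
        if inliers > best_inliers:
            best_point, best_inliers = candidate, inliers
    return best_point, best_inliers
```

Given four segments whose extensions meet at one point plus a couple of unrelated segments, the returned point is that common intersection and the inlier count is four.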

Sentiment Analysis on Social Media
Machine Learning
Implemented sentiment classifiers on the IMDb dataset (50k reviews) comparing Multinomial Naive Bayes and Logistic Regression. Vocabulary size, stopword removal and stemming were studied to balance accuracy and overfitting. Naive Bayes with stopwords (vocab 1k) achieved 82.6% test accuracy, while Logistic Regression reached 85.4% with minimal tuning.
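For reference, a Multinomial Naive Bayes classifier can be written in a few lines. The sketch below assumes Laplace smoothing with alpha = 1 and ignores out-of-vocabulary words at prediction time; the tiny training set and the function names are illustrative and unrelated to the IMDb experiments.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized docs."""
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_likelihood


def predict(model, doc):
    """Return the class with the highest posterior log score."""
    log_prior, log_likelihood = model

    def score(c):
        # Words never seen in training are skipped.
        return log_prior[c] + sum(
            log_likelihood[c][w] for w in doc if w in log_likelihood[c]
        )

    return max(log_prior, key=score)


# Toy training data for illustration only.
model = train_multinomial_nb(
    [["great", "movie"], ["loved", "it"], ["terrible", "movie"], ["hated", "it"]],
    ["pos", "pos", "neg", "neg"],
)
```

Restricting the vocabulary, as in the experiments above, shrinks the likelihood tables and limits how much the model can memorize rare words.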
Interactive AI Demos
Explore cutting-edge AI capabilities through interactive demonstrations. From RAG-powered research tools to autonomous multi-agent systems.
Research Paper Explorer
Chat with my academic papers using RAG. Ask questions about my research in ML, DL, NLP, and Computer Vision.
AI Board of Directors
Multi-agent system simulating a board of expert advisors. Watch agents debate and reach consensus on strategic decisions.
Autonomous Research Assistant
AI agent with tools for web search, data analysis, and report generation. Demonstrates agentic workflows and tool use.
A Unique Place for Everything
Learn Different, Think Different, Choose Liberti Hub
Beyond Work
Other things I love to do

Active Body, Active Mind
Making time to take care of my health is important to me. I set up a small home gym, and I also love playing tennis and football.

The Perfect Mix of Passion, Ability, and Strategy
I love Formula 1 because it represents the pinnacle of racing, combining cutting-edge technology with human skill and strategic thinking.
If you no longer go for a gap which exists, you are no longer a racing driver. — Ayrton Senna

A Long-Lasting Family Tradition
I inherited my passion for football and tennis from my grandfather. I love watching and analyzing matches, and I have been a big AC Milan fan for as long as I can remember.
Let's Connect
Open to opportunities and collaborations. Feel free to reach out!
