Projects

A mix of academic data infrastructure, LLM evaluation, and applied ML tooling. Most are public; one (Academic-Atlas) is local-only due to corpus size.

Moltbook — AI Agent Social Media Corpus

2025 – Present

corpus · 2.78M posts · 14.32M comments · async scraping

The largest publicly available agent-only social media corpus: 2.78M posts, 14.32M comments, and 97K agent profiles from the first AI-only social platform. Collection runs daily and unattended via an adaptive scheduler that self-calibrates its run-time predictions, with SQLite deduplication at 14M+ scale and rate-limit-header parsing with exponential backoff. Licensed CC BY 4.0.
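As a rough illustration of rate-limit-header parsing with exponential backoff, here is a minimal sketch. It is not the project's code: the function name `retry_delay`, the `base` and `cap` defaults, and the use of the non-standard `X-RateLimit-Reset` header (assumed here to carry a Unix timestamp) are all assumptions; only `Retry-After` is a standard HTTP header.

```python
import time


def retry_delay(headers, attempt, base=1.0, cap=300.0, now=None):
    """Return seconds to wait before the next retry (hypothetical helper).

    Prefers what the server says in its headers; falls back to
    exponential backoff (base * 2**attempt), capped at `cap` seconds.
    """
    now = time.time() if now is None else now
    lowered = {k.lower(): v for k, v in headers.items()}

    # Standard header: number of seconds to wait.
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch

    # Assumed convention: Unix timestamp at which the quota resets.
    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            return min(cap, max(0.0, float(reset) - now))
        except ValueError:
            pass

    # No usable header: plain exponential backoff.
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for `retry_delay(response_headers, attempt)` after a throttled response and increment `attempt` on each consecutive failure; adding random jitter to the fallback delay is a common refinement.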

HuggingFace dataset

Academic-Atlas — OpenAlex Paper Exploration Tool

2025 – 2026

~455M papers · OpenAlex · local search

Independently built local search and comparison interface over the full OpenAlex academic corpus (~455M papers). The full corpus is downloaded and indexed locally; deployment is local-only because of the data scale.

GitHub

How AI Sees Your Name — LLM Embedding Bias Evaluation

2026

LLM evals · WEAT · 250K names · Streamlit

Applied WEAT (Word Embedding Association Test) methodology over 250K+ names across six dimensions (wealth, wisdom, happiness, health, leadership, beauty). Multi-model comparison: GloVe, BERT, Tencent Chinese embeddings, plus Claude / GPT first-impression testing. Data: Chinese Gender Dataset (Nature Sci Data 2025).
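For readers unfamiliar with WEAT, the sketch below computes the commonly used WEAT effect size on toy vectors. It is illustrative only and makes several assumptions: the function names are hypothetical, the vectors are made up rather than taken from any of the models above, and the sample standard deviation is used in the denominator (implementations differ on this choice).

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity to attribute set A minus that to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y,
    normalised by the standard deviation over all targets."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Toy 2-D "embeddings": X leans towards A, Y leans towards B.
X = [(1.0, 0.0), (0.9, 0.1)]
Y = [(0.0, 1.0), (0.1, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
effect = weat_effect_size(X, Y, A, B)  # positive: X is closer to A than Y is
```

In a setting like the one described, X and Y would be embeddings of two groups of names and A and B the word sets for the poles of one dimension (for example, wealth).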

GitHub

Permanent Portfolio Backtesting Engine

2025

Streamlit · pandas · 40+ years of data

Interactive Streamlit application stress-testing Harry Browne's 25/25/25/25 portfolio across 40+ years of historical market data. Supports custom allocations, sensitivity analysis, and portfolio optimization.
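The core of such a backtest can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the application's code: the function names are hypothetical, the portfolio is rebalanced to its target weights once per year, and the returns below are made-up numbers rather than historical data.

```python
def backtest(returns_by_year, weights):
    """Grow 1.0 unit of capital, rebalancing to `weights` every year.

    returns_by_year: list of per-asset annual returns, one tuple per year.
    weights: target allocation per asset, summing to 1.0.
    """
    value = 1.0
    path = [value]
    for year in returns_by_year:
        portfolio_return = sum(w * r for w, r in zip(weights, year))
        value *= 1.0 + portfolio_return
        path.append(value)
    return path


def max_drawdown(path):
    """Worst peak-to-trough decline along the path, as a negative fraction."""
    peak = path[0]
    worst = 0.0
    for v in path:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


# Equal 25% weights; asset order assumed to be stocks, long bonds, gold, cash.
weights = (0.25, 0.25, 0.25, 0.25)
toy_returns = [            # illustrative values only, not market data
    (0.20, -0.05, 0.10, 0.03),
    (-0.30, 0.15, 0.05, 0.02),
    (0.12, 0.04, -0.08, 0.01),
]
path = backtest(toy_returns, weights)
drawdown = max_drawdown(path)
```

Swapping in a different `weights` tuple is all that is needed to compare a custom allocation against the equal-weight baseline.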

HuggingFace Space

AI Audio Transcription with Speaker Diarization

2025

Whisper · diarization

Audio/video-to-text pipeline with automatic speaker identification. Built on the OpenAI Whisper API with a custom diarization layer; deployed on HuggingFace Spaces.
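One common way to combine a transcript with diarization output is to give each transcribed segment the speaker whose turn overlaps it most in time. The sketch below shows that idea only; it is an assumption about how such a layer could work, not a description of this project's diarization layer, and the dictionary fields (`start`, `end`, `text`, `speaker`) and the function name are hypothetical.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the best-overlapping speaker.

    segments: [{"start": float, "end": float, "text": str}, ...]
    turns:    [{"start": float, "end": float, "speaker": str}, ...]
    Segments with no overlapping turn get speaker None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            # Length of the time interval shared by segment and turn.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


segments = [
    {"start": 0.0, "end": 4.0, "text": "Hello, thanks for joining."},
    {"start": 4.0, "end": 9.0, "text": "Glad to be here."},
    {"start": 20.0, "end": 22.0, "text": "(unattributed)"},
]
turns = [
    {"start": 0.0, "end": 4.5, "speaker": "SPEAKER_00"},
    {"start": 4.5, "end": 10.0, "speaker": "SPEAKER_01"},
]
labeled = assign_speakers(segments, turns)
```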

HuggingFace Space

Academic Writing Check — Manuscript Self-Assessment

2026

Gradio · spaCy · textstat

Gradio app analyzing academic manuscripts across 9 writing-quality metrics in 3 groups: readability, style traditions, AI-style markers. Inspired by Gartenberg et al. (2026) on AI-generated text in peer review.

HuggingFace Space