Research — Throlson Labs

Focus Areas

Research Domains

🧠 Large Language Models

Architecture innovation, efficient training at scale, long-context modeling, and multi-modal integration for next-generation LLMs.

🛡️ AI Safety & Alignment

Constitutional AI, RLHF, interpretability, red-teaming, and developing robust frameworks to keep AI systems aligned with human intent.

👁️ Computer Vision

Object detection, scene understanding, generative imaging, video comprehension, and multi-modal visual reasoning systems.

🎯 Reinforcement Learning

Multi-agent systems, reward modeling, sim-to-real transfer, and training agents for complex real-world decision making.

💬 Natural Language Processing

Multilingual understanding, semantic parsing, dialogue systems, and information extraction across 100+ languages.

🤖 Autonomous Agents

Planning, tool use, world models, and building AI agents capable of sustained reasoning and action in open-ended environments.

Publications

Selected Papers

Feb 2026

LLM

Sparse Attention Architectures for Long-Context Language Modeling

A novel sparse attention mechanism enabling 128K token context windows with 3× lower memory overhead. Achieves new SOTA on SCROLLS, BookSum, and our internal LongEval benchmark.

Read Paper → arXiv Code

Jan 2026

Safety

Constitutional Alignment via Reward-Weighted Self-Reflection

A framework where language models evaluate and improve their own outputs through constitutionally grounded reward signals. Reduces harmful outputs by 94% on HarmBench while maintaining helpfulness scores.

Read Paper → arXiv

Dec 2025

Vision

Multi-Scale Feature Pyramids for Zero-Shot Object Detection

Hierarchical feature pyramid network that generalizes to unseen categories without fine-tuning. Outperforms GLIP and OWL-ViT on LVIS and Objects365 zero-shot benchmarks.

Read Paper → arXiv Code

Oct 2025

Cooperative Multi-Agent Learning with Emergent Communication

Demonstrating that RL agents develop structured communication protocols when incentivized to cooperate, with implications for scalable multi-agent AI systems.

Read Paper → arXiv

Aug 2025

LLM

Efficient Knowledge Distillation for On-Device Language Models

A novel distillation pipeline that compresses 70B parameter models to 3B with less than 5% quality degradation, enabling powerful on-device inference.

Read Paper → arXiv Code

Jun 2025

Safety

Red-Teaming at Scale: Automated Adversarial Evaluation of LLMs

An automated red-teaming framework that generates diverse adversarial probes and systematically evaluates model robustness across safety-critical domains.

Read Paper → arXiv Code

Mar 2025

Vision

Temporal Consistency in Video Generation via Latent Diffusion

Achieving state-of-the-art temporal coherence in AI-generated video through a novel latent diffusion architecture with temporal attention layers.

Read Paper → arXiv

Open Source

Open Source Contributions

throlson/safeguard

SafeGuard Framework

Production-ready constitutional alignment toolkit with plug-and-play safety layers for any LLM deployment.

⭐ 4.2k 🔀 890

throlson/sparseformer

SparseFormer

Efficient sparse attention implementation for PyTorch enabling 128K+ token context windows with minimal memory overhead.

⭐ 3.1k 🔀 620

throlson/redteam-auto

RedTeam Auto

Automated adversarial evaluation suite for LLMs with 1000+ curated attack vectors across safety-critical domains.

⭐ 2.8k 🔀 540

Pushing the Boundaries of AI Science

🧠 Large Language Models

🛡️ AI Safety & Alignment

👁️ Computer Vision

🎯 Reinforcement Learning

💬 Natural Language Processing

🤖 Autonomous Agents

Sparse Attention Architectures for Long-Context Language Modeling

Constitutional Alignment via Reward-Weighted Self-Reflection

Multi-Scale Feature Pyramids for Zero-Shot Object Detection

Cooperative Multi-Agent Learning with Emergent Communication

Efficient Knowledge Distillation for On-Device Language Models

Red-Teaming at Scale: Automated Adversarial Evaluation of LLMs

Temporal Consistency in Video Generation via Latent Diffusion

SafeGuard Framework

SparseFormer

RedTeam Auto