// about

aiexplorer.dev

No corporate framing. Just a builder testing things and publishing results honestly.

What I do

By day, I am an Associate Director and Principal Architect focused on building the Agentic Enterprise. I solve the problem of integrating modern AI into rigid, legacy core systems in highly regulated industries. I design enterprise-grade AI platforms, high-throughput data pipelines (specializing in Google Cloud's Vertex AI and AWS), and the deterministic API ecosystems required to make multi-agent systems scale securely in production.

On the weekends, I run empirical experiments on LLMs and SLMs — adversarial tests, benchmarks, and compliance experiments on open-source models to see where they actually break. I refer to published research papers and develop the code to replicate findings on smaller models, focusing on structured output failures, adversarial guardrails, context position bias, and compliance enforcement. Real benchmarks. Real limitations. No hype.

I also share the architectural reality of this work. I've spoken at the Kong API Summit (2024, 2025) about GenAI integration patterns, API-driven architectures, and what it takes to transition from legacy interfaces to agent-ready Tool-Use standards at scale.

Credentials

  • Post Graduate Program in AI & ML: Business Applications — McCombs School of Business, UT Austin (2024)
  • Google Cloud L400 Advanced
  • 15+ Google Cloud GenAI Certifications (including Vertex AI Search & RAG Framework)
  • Kong API Summit speaker — 2024, 2025

Focus areas

  • Agentic workflow orchestration and multi-agent systems
  • Enterprise RAG pipelines and hub-and-spoke data architectures
  • RAG pipeline testing & compliance enforcement
  • Structured output benchmarking (1,500+ tests across 7 models)
  • Context position bias in small LLMs
  • Adversarial scenario testing (17 scenarios, 490 test cases)
  • Enterprise benchmark design for small models (evaluator bias detection)
  • NeMo Guardrails & Llama Guard comparison
  • Prompt injection defense

Models I test

  • Gemma 2 2B / 3 4B / 3 12B / 4 E4B / 4 31B
  • Llama-3B / 7B / 8B
  • Latest Gemini models (Flash / Pro) — including video & audio via Gemini Live
  • Claude Opus / Sonnet / Haiku
  • Local inference on Apple Silicon via Ollama

Cloud & Architecture Stack

  • GCP Ecosystem: Vertex AI, Cloud Run, Cloud Functions, BigQuery, APIGEE
  • AWS Ecosystem: Bedrock, EventBridge, Lambda, MSK, OpenSearch, Textract
  • Architecture Patterns: Event-driven process orchestration, Hub-and-Spoke data pipelines, Agentic Tool-Use APIs

How I build & experiment

  • Applied research — translating academic arXiv papers into executable code and test harnesses
  • Agentic engineering — I use Claude Code and Gemini Code Assist to build these research pipelines and orchestrate complex testing workflows
  • Hypothesis-driven — statistical validation for every test
  • Open reality — writing up what doesn't work, not just what does