Unpitched was developed in around 5 weeks by Greg Kossakowski and Patryk Kabaj in June-July 2025, to see if we'd make good co-founders together. This is the story of how it all went and what we learned about building with AI.
The Work Trial: Testing Partnership Through Building
We met for the first time in early May 2025. Patryk sent an email to Greg, asking whether he was still searching for co-founders, as he'd heard about that from a mutual friend. Patryk had different plans—to apply for PM roles at companies he admired—but deep down, he's always been a founder and considered giving it another go. He saw Greg speak at a networking event a year before, where he remembered liking Greg's unique perspective on building technology companies.
After dinners, coffees, and sixty-three emails, we clicked. At some point, Greg suggested filling out a fairly long and exhaustive questionnaire titled 50 Questions to Explore with a Potential Co-Founder by First Round, and we did that. We highly encourage everyone to do that—think of it as a smoke detector, not a crystal ball. It won't predict success, but it's good at catching deal-breakers and exposing when founders haven't thought deeply about what they're getting into. We were aligned.
Here's the thing about co-founder relationships: your co-founder is the person you'll be spending the majority of your days with for the next couple of years. The best way to see if you work well with someone? Actually work with them.
Instead of endless coffee chats about "maybe we should build something," we gave ourselves a deadline and picked a real, tightly scoped project. We agreed to build a small product in less than a month, give it 100% of our focus, work in-person, and experiment with AI to make it happen.
The Idea: Solving a Real Problem We'd Witnessed
The idea came from genuine frustration. Greg had recently done a consulting gig with two founders who were technically strong but struggled with user research and honing their idea. Greg didn't enjoy giving the same slightly heartbreaking feedback repeatedly and thought an impartial AI would be a better fit for that kind of work. We thought: what if AI could help founders catch these mistakes and coach them on how to do better?
Fortunately, a book exists on this exact problem: The Mom Test by Rob Fitzpatrick. It's a short, easy-to-digest book that we recommend as a must-read for all current and future founders.
Unpitched became an AI-powered tool that helps startup founders conduct better customer interviews. Think of it as a digital twin of The Mom Test that operationalizes the book's advice. It analyzes interview transcripts to detect when founders slip into "pitch mode": explaining their solution instead of learning about customer problems. It also features speaking balance analysis and offers personalized coaching based on principles from The Mom Test.
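To make "speaking balance" concrete, here is a minimal sketch of one way such a metric could be computed. It is an illustration only, not Unpitched's actual implementation. The `Turn` shape, the `speakingBalance` function, and the choice of word count as the measure are all assumptions made for this example.

```typescript
// Hypothetical transcript shape: one entry per speaker turn.
interface Turn {
  speaker: string;
  text: string;
}

interface SpeakerShare {
  speaker: string;
  words: number;
  share: number; // fraction of all words spoken, between 0 and 1
}

// Count whitespace-separated words in a single turn.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Compute each speaker's share of the total words in the transcript.
function speakingBalance(turns: Turn[]): SpeakerShare[] {
  const totals: Record<string, number> = {};
  let allWords = 0;
  for (const turn of turns) {
    const words = countWords(turn.text);
    totals[turn.speaker] = (totals[turn.speaker] ?? 0) + words;
    allWords += words;
  }
  return Object.keys(totals).map((speaker) => ({
    speaker,
    words: totals[speaker],
    share: allWords === 0 ? 0 : totals[speaker] / allWords,
  }));
}

// Made-up example interview.
const transcript: Turn[] = [
  { speaker: "founder", text: "How do you handle invoicing today?" },
  { speaker: "customer", text: "Mostly spreadsheets, and it takes me about two hours every Friday." },
  { speaker: "founder", text: "What happened the last time it went wrong?" },
];

const balance = speakingBalance(transcript);
```

In this made-up interview the founder speaks 14 of 25 words, a share of 0.56. A real tool would more likely use speaking time from timestamps, but word count is a reasonable stand-in when only text is available.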
Unpitched is intentionally limited in its functionality. We needed an idea for a work trial, not for a business. We didn't research competitors extensively and the only "discovery session" was Greg's personal experience. We wanted to agree on reasonable scope and end up with a shippable product.
Our requirements were straightforward:
- It must solve a real user problem
- It must solve it well
- It should be tightly scoped but can't be half-baked
- It must be production-grade
- It must monetize its usage
We delivered on all requirements except the last one, which proved trickier than anticipated. We also agreed to keep it running for at least a year, then decide what's next.
The reality check: Unpitched isn't a traditional business, and we don't have immediate plans for further development. It's the tangible result of our work trial—a genuine experiment in both partnership and AI-accelerated development.
Development Approach: Walking Skeletons and Tracer Bullets
We knew we'd leverage AI heavily for development—what most people call "vibe-coding." The interesting challenge: most documentation about AI-assisted development focuses on solo projects. How do you coordinate this approach with someone you've never worked with before?
We had a crucial discussion early on and agreed to follow the tracer bullet and walking skeleton approaches. These methodologies guided us throughout the entire build, ensuring we had a working product from day one rather than assembling separate pieces later. In other words, these two approaches gave structure to iterative development.
Context for fellow builders: In software engineering, a walking skeleton is a minimal, end-to-end implementation that demonstrates core functionality and integrates all major architectural components. It validates architecture and overall system design early by ensuring everything connects and works together, even with basic functionality. A walking skeleton is primarily focused on product execution risk. A tracer bullet focuses on testing specific feature viability by implementing a thin slice that touches all critical components. Tracer bullets de-risk technical assumptions. Both techniques are invaluable for early risk reduction and faster feedback loops.
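As a rough illustration of the walking skeleton idea, the sketch below wires three trivial stages into one end-to-end call. Every name here (`ingest`, `analyze`, `render`, `runPipeline`, and the three interfaces) is hypothetical, and the phrase check is a placeholder standing in for real AI analysis. None of it is taken from the Unpitched codebase.

```typescript
// Hypothetical stage contracts; the names are illustrative only.
interface InterviewTranscript {
  id: string;
  text: string;
}

interface TranscriptAnalysis {
  transcriptId: string;
  pitchModeDetected: boolean;
}

interface InterviewReport {
  transcriptId: string;
  summary: string;
}

// Stage 1: accept raw input. Deliberately trivial.
function ingest(id: string, raw: string): InterviewTranscript {
  return { id, text: raw.trim() };
}

// Stage 2: stub standing in for the real analysis (a hard-coded phrase check).
function analyze(t: InterviewTranscript): TranscriptAnalysis {
  const detected = t.text.toLowerCase().indexOf("our product") !== -1;
  return { transcriptId: t.id, pitchModeDetected: detected };
}

// Stage 3: turn the analysis into something a user could read.
function render(a: TranscriptAnalysis): InterviewReport {
  const summary = a.pitchModeDetected
    ? "Possible pitch mode detected."
    : "No pitch mode detected.";
  return { transcriptId: a.transcriptId, summary };
}

// The skeleton "walks" once a single call exercises every stage end to end.
function runPipeline(id: string, raw: string): InterviewReport {
  return render(analyze(ingest(id, raw)));
}

const report = runPipeline("demo-1", "  Let me tell you about our product.  ");
```

A tracer bullet would then replace one stub, for example `analyze`, with a thin but real implementation while the rest of the pipeline keeps running unchanged.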
This approach shaped everything. Instead of working on separate slices and integrating them later, we instructed both AI and ourselves to maintain a working, evolving prototype while validating potential issues upfront. This guided our prioritization and kept the entire system functional from week one. We believe this is crucial for teams building fast, because connecting separate parts later can prove surprisingly tricky and time-consuming. This style of development helps you keep the scope tight and avoid implicit assumptions about how easy it will be to get the product over the finish line. The constraint of maintaining a working product naturally limits scope growth.
We documented our approach in a comprehensive PRD (Product Requirements Document), using Grok and Aqua (an excellent speech-to-text utility) to create an almost 1,000-line PRD based on a template we liked. We worked with various AI models to develop architectural decisions, data models, constraints, and a feasible delivery plan with tasks and sub-tasks.
Initially, we used the PRD to instruct LLMs to code on our behalf, prioritize tasks, and track progress. However, and this is a key learning, we outgrew the single-doc PRD in favor of task-oriented plans halfway through delivery. We haven't found a great way to mix a high-level PRD with area-focused plans and docs.
Tech Stack and Division of Labor
We selected what we believe was (and still is) the optimal tech stack for AI-accelerated development: TypeScript + Next.js + Supabase (Database & Auth) + Resend + Vercel + Vercel AI SDK + OpenAI. We chose trigger.dev for background jobs, mainly handling AI inference. We used V0 to design initial app versions, and then switched to Cursor/Claude Code combined with Patryk's eye for refinements.
We wanted to use polar.sh for payments but ran into issues with LLMs hallucinating Polar's APIs and struggling to implement basic subscription-based functionality. To be fair, we attempted this integration at the very end and underestimated the effort required.
We divided work according to our prior experience and strengths. Greg handled the database, AI analysis, and async tasks. Patryk managed initial repository and environment setup (Vercel, Supabase, auth, and Next.js boilerplate), then implemented most design and application logic. This division remained consistent throughout the project. Exploring vibe coding techniques and AI-accelerated development was a joint activity.
Plan vs. Reality: The Inevitable Slippage
Here's how we intended to structure our 5-week sprint:
- Week 1: Complete PRD, working MVP with simple AI analysis
- Week 2: MVP solidification & user-facing UI refinement
- Week 3: Feature enhancement & robustness
- Week 4: Advanced features, marketing pages, final polish
- Week 5: Pre-release polish & nice-to-haves
Here's what actually happened:
- ✅ Week 1: Complete PRD, working MVP with simple AI analysis (On track)
- ✅ Week 2: MVP solidification & UI refinement. Learned we needed third-party background jobs—first plan deviation
- ⚠️ Week 3: Feature enhancement & robustness. Mostly on track with some delays, partly because we decided to present at AI Tinkerers Warsaw
- ❌ Week 4: Further feature enhancement & robustness. No landing pages or advanced features materialized
- ❌ Week 5: Still working on Week 4's scope
- ❌ Week 6: Marketing pages & pre-release polishing (Where we were when writing this)
Both of us had personal responsibilities consume 20-30% of our time, a reality check on ambitious timelines. As with most software development, unforeseen challenges emerged. However, AI development tools made clearing these obstacles significantly more manageable.
The AI Development Experience: Living in the Future
Here's the headline: AI wrote approximately 90% of Unpitched's 24k lines of code. In Patryk's case, that number approaches 98%. As he puts it: "I lost too much time trying to get LLMs to fix CSS. It's simply not there yet. It's still way faster to fix styling yourself."
You can sense some frustration there, which is telling. Overall, we had an incredibly smooth experience using AI for various development tasks. When AI tools ultimately fail you, especially after consistent success, it comes as a genuine surprise rather than an expected limitation. This was not the case even half a year ago, which is a testament to how quickly these tools are maturing.
We developed Unpitched at a unique inflection point for software engineering. We're experiencing the emergence of truly agentic development workflows. Patryk regularly had Claude Code run for up to an hour without interruptions or human intervention, doing substantial work. Some features were nearly one-shotted based on prior planning, with only minor tweaks needed.
Important caveat: we worked with a fresh codebase where we could change literally everything at any moment. We're not claiming this approach applies universally to mature codebases—it likely doesn't. However, our development velocity remained remarkably consistent throughout the project.
Experience has become everything in AI-augmented development. Working effectively with AI requires new skills. Some are transferable from traditional development, but most stem from direct experience interacting with AI tools and understanding their capabilities and limitations. A good mental model is to think of AI tools as somewhat unreliable junior engineers that you have to guide and coach towards success. For a lot of engineers, this is a new skill that requires practice and, more importantly, buy-in.
We'll be sharing more insights about AI development workflows in the coming weeks. Follow us on X (Patryk, Greg) to tune in on those updates.
Key Learnings: What We'd Do Differently
Here are the most valuable insights from our 5-week journey:
- Having an exhaustive PRD doesn't scale beyond solo development. Maintaining all context and tasks in a single document seems logical initially, but as projects grow, keeping comprehensive documentation current becomes unwieldy. With multiple people working on the same file, you inevitably face merge conflicts and context overload. We abandoned our PRD for feature planning three weeks in, switching to dedicated planning files that referenced relevant documentation for context. However, PRDs are still excellent for getting the project off the ground.
- Thinking, brainstorming, and planning before writing code dramatically improves AI-accelerated development. For example, when implementing comprehensive error handling, you first plan the task with an LLM and save it as error-handling-plan.md. After implementation, ask the LLM to convert it into documentation with code snippets and real examples. Later, when providing this documentation as context for new work, you can be confident the AI will implement documented patterns correctly. This approach was crucial to our smooth development experience. We ended with 16 comprehensive documentation files.
- LLMs are living through their own Twitter Bootstrap moment. Most AI-generated apps look similar, and this isn't the bar for production applications. While it's great that LLMs understand Shadcn and Radix, the aesthetic of LLM-generated apps feels reminiscent of the early Twitter Bootstrap era. You can iterate with AI on CSS and styling, but it becomes guesswork and grows frustrating quickly. LLMs are not great visual or spatial thinkers, yet. We wish we'd allocated time to create a proper design system in Figma, port it to Storybook, and have Cursor/Claude Code work with perfectly styled components. Instead, we relied on V0, which was helpful but not yet remarkable.
- AI as Reliable Worker vs. Creative Partner. AI excels as a reliable executor for well-defined tasks but becomes unpredictable for ambiguous work. It can handle clearly scoped work with specific intent and get it done successfully. For example, we use AI to execute database-related runbooks with consistent results. However, for nebulous tasks, AI ranges from being counterproductively detailed (nitpicking non-essential aspects) to brilliantly insightful (suggesting innovative approaches to tricky problems).
- The 80% Rule in AI Development. AI excels at 0-to-80% work: prototyping, basic planning, asking the right UX questions, and similar foundational tasks. It will dramatically accelerate you to 80% completion on most tasks but won't produce fully finished results you'll be proud of. The remaining 20% requires either AI-assisted work where you guide and curate AI-generated ideas or a manual sprint to the finish line. The key insight: for many aspects of product development, 80% completion is absolutely sufficient. For an experienced engineering duo, we estimate our throughput of productive-engineering-work-done increased 4-5x using this approach.
- Aiming docs at AI-first tools like Cursor. There's an emerging adoption vector for developer tools: optimizing documentation for AI-first development environments like Cursor. Trigger.dev's Cursor rules are ahead of the pack in this approach: they made trigger.dev integration relatively painless, and we can't recall a single API hallucination when asking LLMs in Cursor to write trigger tasks. In contrast, Polar.sh docs aren't optimized for this style of building. We ended up adopting trigger.dev (and genuinely enjoying it) while dropping Polar.sh integration as a result.
- Instruction Following Improvements. LLMs have made substantial strides in instruction following. Unpitched's core functionality relies on a single, large prompt passed to GPT-4.1. This approach wouldn't have worked with GPT-4 or even Claude 3.5; we would have needed a multi-prompt setup with higher latency and more moving parts.
- AI evaluations are still hard. Evaluations for products like Unpitched are both essential and very difficult. Existing evaluation tools generally don't work well. We, as a software engineering profession, haven't discovered the right abstractions and coding patterns for this domain yet. We, as Greg and Patryk, had to build custom evaluations ourselves. Fortunately, AI makes building this kind of fit-for-purpose tooling much easier and cheaper.
- Hosting and Third-Party Configuration Overhead. AI can't help with the operational overhead that still consumes significant development time. We spent considerable time on tasks AI couldn't assist with: GitHub Actions, deployments, Vercel/Supabase/trigger.dev/Resend configurations, syncing environment variables, and domain setup. Since most of these involve UI-based interfaces tied to pricing tiers, you simply have to do them manually. This platform integration work remains a genuine time-sink that feels particularly tedious after experiencing AI acceleration elsewhere. Andrej Karpathy recently made a similar observation about the gluing work where LLMs don't help yet.
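To give a feel for what a small, fit-for-purpose evaluation (as mentioned in the learning on AI evaluations) might look like, here is a minimal sketch. It is not the evaluation tooling built for Unpitched. The `LabeledCase` shape, the `evaluate` function, the keyword-based `keywordDetector`, and the four labeled utterances are all invented for illustration. In practice the detector would wrap an LLM call, and the labeled set would come from real, human-reviewed transcripts.

```typescript
// Hypothetical labeled example: a founder utterance plus a human judgment.
interface LabeledCase {
  text: string;
  isPitch: boolean;
}

interface EvalResult {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
  precision: number;
  recall: number;
}

// Any function that classifies an utterance as pitching or not.
type Detector = (text: string) => boolean;

// Score a detector against human labels using precision and recall.
function evaluate(detector: Detector, cases: LabeledCase[]): EvalResult {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const c of cases) {
    const predicted = detector(c.text);
    if (predicted && c.isPitch) tp++;
    else if (predicted && !c.isPitch) fp++;
    else if (!predicted && c.isPitch) fn++;
  }
  return {
    truePositives: tp,
    falsePositives: fp,
    falseNegatives: fn,
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// Stand-in detector: a naive keyword baseline, not an LLM.
const keywordDetector: Detector = (text) => {
  const lowered = text.toLowerCase();
  return ["our product", "we built", "our solution"].some(
    (phrase) => lowered.indexOf(phrase) !== -1
  );
};

// Made-up labeled set, including one case the baseline misses and one it over-flags.
const cases: LabeledCase[] = [
  { text: "Our product automates all of that for you.", isPitch: true },
  { text: "Imagine a tool that did this automatically. Would you buy it?", isPitch: true },
  { text: "How did you solve this the last time it came up?", isPitch: false },
  { text: "We built something similar at my last job. How do you do it here?", isPitch: false },
];

const result = evaluate(keywordDetector, cases);
```

On this toy set the keyword baseline scores 0.5 precision and 0.5 recall: it misses the hypothetical-pitch question and wrongly flags a harmless "we built". Tracking numbers like these across prompt changes is the basic loop a custom evaluation supports.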
Closing Words
We built Unpitched during a unique moment in software development. AI-accelerated development is transitioning from experimental to genuinely practical for experienced teams working on greenfield projects. The productivity gains are real, but success requires understanding AI's strengths and limitations, developing new collaboration patterns, and building workflows that amplify rather than replace human judgment.
If there's surprising demand for Unpitched, we'll continue developing it. Otherwise, we're committed to keeping it running for some time with daily usage limits. If you've tried Unpitched and found it useful, we'd genuinely love to hear about it! Use the feedback form in the app or give us a shout on X.
Follow us to see what we build next.



