
Recommended by

石允豐, Executive Director at 5Y Capital
Ever since I started focusing on AI investing five years ago, I've hoped to find a genius product manager who could bridge "the limited capabilities of AI" and "an elegant, closed-loop user experience". Two years after ChatGPT's release, the genius product managers of the anticipated AGI era have not appeared; no Steve Jobs, Zhang Xiaolong, or Evan Spiegel has come out of nowhere to teach us how to build AI products.
Granola may not be a viable standalone business niche in China, but in an AI note-taking space that is extremely crowded even in the US, it has built the best-regarded product experience. I believe the product methodology Chris has worked out so far is especially instructive for all consumer (2C) and prosumer products. If the product you're building meets three of the four criteria below, feel free to reach out and discuss: [email protected]
How to Build a Truly Useful AI Product
Generative AI breaks the old startup playbook
Author: Chris Pedregal
Co-founder and CEO of Granola. He previously co-founded the education technology company Socratic, which was later acquired by Google.
First published: December 2024
If building a startup is like playing a brutally hard video game, building a startup in generative AI is like playing that game at 2x speed.
When you build at the application layer, meaning your startup relies on AI models provided by companies like OpenAI or Anthropic, you are building on a technology that is improving at an unprecedented and unpredictable rate: major model releases land at least twice a year, and the leaps are dramatic. If you're not careful, you can spend weeks building a feature only to see the next model release make it automatic. And because everyone has access to great APIs and frontier large language models (LLMs), the product idea you think is unique can be built by anyone.
AI does unlock many new possibilities, of course; product capabilities like code generation and research assistance were simply impossible before. But you need to make sure you're surfing the wave of AI progress rather than getting tumbled by it.
That's why we need a new playbook.
I've spent the past two years building Granola, a smart notepad that enhances meeting notes with transcription and AI. That experience has convinced me that generative AI is a genuinely different space, where the traditional laws of "startup physics" don't fully apply. Rules like "solve the users' biggest pain point first" or "supporting users gets cheaper at scale" don't necessarily hold in AI.
If your intuitions were trained on conventional startup experience, you'll need to develop a new set of intuitions for AI. Over two years of figuring this out, I've distilled four key principles that I believe every founder building at the application layer of AI should know.
1. Don't solve problems that won't be problems soon
LLMs are going through one of the fastest technical developments in human history. Two years ago, ChatGPT couldn't process images, handle complex math problems, or generate sophisticated code; all of that is easy today. Two years from now, the capability landscape will likely look completely different again.
For application-layer founders, it's easy to fall into the trap of solving the wrong problem: you grind away at something in front of you, and the moment the next version of GPT ships, the problem no longer exists. So don't spend time on problems that are about to disappear.
That's easier said than done, because it means predicting the future, which feels uncomfortable. You have to anticipate what GPT-X-plus-one will be capable of, and then build your product roadmap and strategy around those predictions.
An example: the first version of Granola couldn't handle meetings longer than 30 minutes. The best model at the time, OpenAI's DaVinci, had a context window of only 4,000 tokens, so it could only process short meetings. By conventional logic, fixing this should have been our top priority; how can you expect people to use a notetaker that only works for short meetings? But we had a hypothesis: LLMs would soon get smarter, faster, and cheaper, with longer context windows. So we decided not to spend any time working around the context window limit and focused instead on improving note quality.
For a while, we even had to deliberately ignore users who complained that the time limit was too short. But our hypothesis was right: a few months later, context windows were big enough to handle longer meetings. Had we spent that time solving the "meeting length" problem, all of that work would have been wasted. Meanwhile, the work we put into note quality is still one of the main reasons users love Granola today.
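To make the constraint concrete, here is a rough back-of-envelope estimate of why a 30-minute meeting overflows a 4,000-token window. The rates used (roughly 150 spoken words per minute and about 1.3 tokens per English word) are my own illustrative assumptions, not figures from the article.

```python
# Rough token budget for a meeting transcript.
# Assumed rates (illustrative): ~150 spoken words per minute,
# ~1.3 tokens per English word.
WORDS_PER_MINUTE = 150
TOKENS_PER_WORD = 1.3

def estimated_transcript_tokens(minutes: float) -> int:
    """Approximate number of tokens needed to hold a meeting transcript."""
    return round(minutes * WORDS_PER_MINUTE * TOKENS_PER_WORD)

print(estimated_transcript_tokens(30))  # ~5,850 tokens: already over a 4,000-token window
print(estimated_transcript_tokens(60))  # ~11,700 tokens
```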
2. Your marginal cost is my opportunity
Historically, one of software's defining characteristics was that the marginal cost of supporting an additional user was close to zero. If your product worked for 10,000 users, supporting 1 million wouldn't cost that much more.
That's no longer true in AI. Every additional user carries a real marginal cost, and cutting-edge AI models are expensive to run. For example, sending the audio of a 30-minute meeting to OpenAI's flagship GPT-4o audio model costs about $4. With thousands of users doing that every day, the cost becomes substantial.
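A quick back-of-envelope calculation shows how fast that adds up. The $4-per-meeting figure is from the article; the usage assumptions (two meetings per user per workday, 22 workdays a month) are mine.

```python
# Scale the article's ~$4 per 30-minute meeting across a user base.
# Usage assumptions (two meetings per user per workday, 22 workdays/month)
# are illustrative, not from the article.
COST_PER_MEETING_USD = 4.0
MEETINGS_PER_USER_PER_DAY = 2
WORKDAYS_PER_MONTH = 22

def monthly_inference_cost(users: int) -> float:
    """Rough monthly spend on model inference for a given number of users."""
    return users * MEETINGS_PER_USER_PER_DAY * WORKDAYS_PER_MONTH * COST_PER_MEETING_USD

print(f"1,000 users:   ${monthly_inference_cost(1_000):,.0f} per month")    # $176,000
print(f"100,000 users: ${monthly_inference_cost(100_000):,.0f} per month")  # $17,600,000
```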
On top of that, your startup can't onboard users without limit. Even with unlimited budget, OpenAI and Anthropic (the maker of Claude) simply don't have enough compute to serve frontier models to millions of users at once. So for the first time in history, it's possible to offer a better product experience to a small number of users than to millions.
That's not an obstacle, though; it's a huge opportunity for startups. Big companies literally can't compete with you, because there isn't enough compute in the world for them to give millions of users a cutting-edge AI experience.
As a startup, you can give every single user a "Ferrari-level" product experience:
Use the most expensive, most advanced models freely
Don't prioritize cost optimization
If five extra API calls (requests to your LLM provider) meaningfully improve the experience, make them
The per-user cost may be high, but you won't have many users at first. And remember: at best, a company like Google can only offer its users a "Honda-level" experience.
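As one illustration of spending extra model calls on quality, here is a minimal draft-then-refine sketch using the OpenAI Python SDK. The two-pass structure and the prompt wording are my own example under stated assumptions, not Granola's actual pipeline.

```python
# Draft-then-refine sketch: two model calls instead of one, trading cost for
# quality. Illustrative only; this is not Granola's actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_then_refine(transcript: str) -> str:
    # First pass: produce a draft of the notes.
    draft = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Write concise meeting notes for this transcript:\n"
                              + transcript}],
    ).choices[0].message.content

    # Second pass: spend another call to tighten the draft.
    refined = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Improve these notes: remove filler, keep every "
                              "decision and action item.\n" + draft}],
    ).choices[0].message.content
    return refined
```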
You might be wondering: what happens when users come flocking to your Ferrari experience? Won't you end up in the same position as today's big companies, unable to serve them well?
The answer is: don't worry. Even if your user base is growing exponentially, the cost of AI inference is falling exponentially.
Today's frontier models will be cheap commodities in a year or two. Today's Ferrari is tomorrow's Honda. Drive the Ferrari while you still can.
3. Context is king
When we first started writing prompts for Granola to generate meeting notes, we quickly realized that a list of step-by-step instructions doesn't work well in practice.
The real world is messy; you can't write a rule in advance for every situation. And even if you somehow covered every scenario, the rules would inevitably conflict with one another.
That's when we had a key insight: instead of treating the AI model as a tool that executes commands, treat it like an intern on their first day. An intern is smart but lacks context, and doesn't yet know what to do or how to do it. The key to an intern's success is giving them enough context to understand your situation.
That's the prompting strategy we use at Granola today: rather than just telling the model how to do the task, we give it carefully curated context so it can think like you.
For Granola, the task is producing high-quality meeting notes, and the context it needs is: Who is in the meeting? What's the background? What do you want to get out of this meeting? What are your long-term goals, and how does this meeting serve them?
Our job is to find that key information from the web and other sources, so the model grasps your intent and ends up writing something genuinely useful. The art of this process lies in choosing which context is most valuable and how to frame it.
No matter how capable the model becomes, the context you give it will always matter.
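As a concrete sketch of "curated context instead of step-by-step rules", here is a minimal example of assembling that context into a prompt. The data structure, field names, and prompt wording are my own illustration, not Granola's actual prompts.

```python
# Assemble curated context (who, why, goals) into one prompt, instead of a
# long list of step-by-step formatting rules. Field names and wording are
# illustrative only.
from dataclasses import dataclass

@dataclass
class MeetingContext:
    attendees: list[str]   # who is in the meeting
    background: str        # why the meeting is happening
    meeting_goal: str      # what the user wants out of this meeting
    long_term_goal: str    # the larger goal this meeting is in service of

def build_notes_prompt(ctx: MeetingContext, transcript: str) -> str:
    return (
        "You are helping me take notes on a meeting. Context:\n"
        f"Attendees: {', '.join(ctx.attendees)}\n"
        f"Background: {ctx.background}\n"
        f"My goal for this meeting: {ctx.meeting_goal}\n"
        f"My long-term goal: {ctx.long_term_goal}\n\n"
        "Write notes that capture only what matters for those goals.\n\n"
        f"Transcript:\n{transcript}"
    )
```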
I believe "context window selection" will become one of the defining ideas of our time, with implications that reach far beyond AI itself. During the Industrial Revolution, people described the brain in mechanical terms; "blowing off steam" is one such metaphor. In the computer era, we started describing the brain with words like "bandwidth" and "storage capacity". Next, we may well come to understand how the brain works in terms of "context windows". The idea will eventually permeate society far beyond tech.
4. Go narrow, go deep
One of the most fascinating challenges of building AI products today is that you're competing with general-purpose AI assistants like ChatGPT and Claude. They're already pretty good at most things. So how do you build something compelling enough that users will put down the Swiss Army knife and pick up your product?
There's only one answer: go narrow, really narrow. Pick a very specific use case and become exceptional at it.
The cardinal rule of startups, build something people want, still holds in the AI era, but the bar is higher.
Here's the interesting part: delivering an exceptional experience for a narrow use case often has little to do with the AI itself.
We've spent countless hours at Granola on note quality, but we've spent just as many polishing non-AI features: whether meeting notifications are seamless, whether echo cancellation is good enough that the experience is flawless with or without headphones. The "wrapper" around the AI is often what determines whether the product feels delightful or disappointing.
At the same time, the narrower your focus, the easier it is to improve the AI part of the product.
When the AI gets a response right, the experience feels magical. But when it gets it wrong, the failure is often jarring and even unsettling; you instantly realize you're talking to an algorithm, not a person. A product experience that falls into this uncanny valley can drive users away for good.
But if your focus is narrow enough, it's much easier to identify the AI's most common failure modes and either mitigate them or at least fail gracefully.
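A minimal sketch of what "failing gracefully" could look like for a narrow use case like meeting notes; the specific check and threshold are my own illustration, not Granola's actual safeguards.

```python
# Guardrail sketch for a narrow use case: catch a failure mode you can
# anticipate (an empty or near-empty transcript) and fail gracefully instead
# of letting the model invent notes. The threshold is illustrative.
from typing import Callable

MIN_TRANSCRIPT_WORDS = 50  # assumed minimum signal needed to summarize

def generate_notes_safely(transcript: str,
                          generate_notes: Callable[[str], str]) -> str:
    if len(transcript.split()) < MIN_TRANSCRIPT_WORDS:
        # Graceful failure: be honest rather than hallucinate content.
        return ("Not enough audio was captured to write useful notes. "
                "Here is the raw transcript instead:\n" + transcript)
    return generate_notes(transcript)
```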
The fundamentals haven't changed
Building in generative AI is like running on a treadmill while traditional tech strolls along. That difference in pace shapes everything: the technical problems you face, the way you polish the product, and your timeline for reaching scale.
While this acceleration does call for new strategies, one thing hasn't changed: you still have to build something people genuinely want. There are no shortcuts.
You still have to sweat the details and patiently refine the product experience. And the most clarifying question remains the deceptively simple one:
"How does this product make me feel when I use it?"



