Context: In May I left my full-time job to start a 3-month LLM learning cycle by building a few products of my own. I’m documenting my journey here. Every week I post a recap of how the week went, things I learned, etc.
Had a bit of a shorter week. I went on a camping trip to Yosemite and had no service for 3 days. Despite this, I still managed to pack in three 10-hour days.
Here's some of the progress and learnings:
Canopy AI: JSON vs XML and streaming
I'm still helping maintain the AI Assistant I built for Canopy. It has the usual chat experience: a user asks a question, we run a RAG search, and we layer a bunch of things on top.
One frustrating aspect has been GPT-4's unreliability in returning JSON. OpenAI's Chat Completions API has a JSON mode that guarantees valid JSON output, but this option isn't available on the Assistants API when using the File Retrieval tool (their built-in RAG).
Although OpenAI's built-in RAG gives me better results than Pinecone and pgvector did, the trade-off for Canopy's use case was losing JSON mode.
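For reference, JSON mode on the Chat Completions API is just a request parameter. A minimal sketch (the `answer` field is my own illustration):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// JSON mode guarantees syntactically valid JSON. Note that the prompt
// itself must also mention "JSON", or the API rejects the request.
const completion = await client.chat.completions.create({
  model: "gpt-4o",
  response_format: { type: "json_object" },
  messages: [
    { role: "system", content: "Reply as a JSON object with an `answer` field." },
    { role: "user", content: "What is your refund policy?" },
  ],
});
```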
Canopy's assistant does more than just RAG over the content. It runs categorization, sensitivity checks, confidence checks, and other tasks that might trigger subsequent LLM calls or be needed in the user interface. These constraints and extra pieces of information are metadata that I used to include in the JSON response (a single API call keeps both latency and cost down).
The problem with JSON is streaming. Ideally, you want to start rendering the message as you receive the response. One of the JSON values was the markdown answer itself, so I'd buffer the JSON until that value started streaming, relay it to the user interface, and stop when the value ended. A better solution would be to have the markdown first, followed by the metadata in a structured format (JSON, XML, etc.). After a full day of prompt engineering, I couldn't consistently get GPT-4 to return the metadata in this format after the markdown message – until GPT-4o.
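The buffering logic looked roughly like the sketch below. It assumes the key is literally `answer`, that the model emits it with this exact spacing, and that the value contains no escaped quotes; the real version has to handle all of that, plus unescaping sequences like `\n`.

```typescript
// Relay only the markdown `answer` value from a streamed JSON response.
// Sketch: assumes the closing quote arrives in a later token and the
// value contains no escaped quotes.
const MARKER = '"answer": "';

let buffer = "";
let state: "before" | "inside" | "after" = "before";

function onToken(token: string, emit: (text: string) => void) {
  if (state === "after") return; // past the value; ignore the tail here
  if (state === "before") {
    buffer += token;
    const start = buffer.indexOf(MARKER);
    if (start !== -1) {
      state = "inside";
      emit(buffer.slice(start + MARKER.length));
    }
    return;
  }
  const quote = token.indexOf('"');
  if (quote !== -1) {
    emit(token.slice(0, quote)); // closing quote: the value ends here
    state = "after";
  } else {
    emit(token);
  }
}
```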
Last week, I spent a day migrating from the JSON format to a structure with markdown first, followed by the metadata inside XML. Anthropic uses XML in their examples, so I've been applying the same tip with OpenAI (which makes sense if you think about it: it's easier for transformers to handle words than curly braces).
The end format looks like this:
Markdown answer…
```xml
<metadata>
…
</metadata>
```
Incorporating the ``` fence is another trick that took me a while to learn. When you can't use JSON mode, the model will often add wrappers like these. Instead of trying to suppress them in the prompt or handling failures in the app, include "shots" (examples) with the ``` as part of the answer; this guides the model to always return it.
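For example, a shot might look like this (the content is made up; the point is that the fence lives inside the example answer):

```typescript
// A few-shot example embedded in the prompt. Because the ```xml fence is
// part of the example answer, the model learns to always emit it.
const shot = [
  { role: "user" as const, content: "How do I reset my password?" },
  {
    role: "assistant" as const,
    content: [
      "Go to **Settings → Security** and click *Reset password*.",
      "",
      "```xml",
      "<metadata>",
      "  <category>account</category>",
      "  <confidence>high</confidence>",
      "</metadata>",
      "```",
    ].join("\n"),
  },
];
```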
This simplified the code on our app side and made the metadata correct nearly 100% of the time. The next step is to auto-retry in the rare cases where it fails, which I haven't implemented yet.
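Concretely, the app-side streaming handler becomes a scan for the fence. A rough sketch, with names of my own invention:

```typescript
// Split a streamed "markdown answer + xml metadata" response: relay the
// markdown as it arrives, collect the XML once the fence shows up.
const FENCE = "```xml";

let buffer = "";
let inMetadata = false;
let metadata = "";

function onToken(token: string, emitMarkdown: (text: string) => void) {
  if (inMetadata) {
    metadata += token;
    return;
  }
  buffer += token;
  const at = buffer.indexOf(FENCE);
  if (at !== -1) {
    emitMarkdown(buffer.slice(0, at)); // flush the final markdown chunk
    metadata = buffer.slice(at + FENCE.length);
    inMetadata = true;
  } else if (buffer.length > FENCE.length) {
    // Emit everything except a tail that might be the start of the fence.
    emitMarkdown(buffer.slice(0, buffer.length - FENCE.length));
    buffer = buffer.slice(buffer.length - FENCE.length);
  }
}

// After the stream ends: strip the trailing ``` and hand the
// <metadata>…</metadata> block to an XML parser.
function metadataXml(): string {
  return metadata.replace(/```\s*$/, "").trim();
}
```

Holding back a fence-sized tail means the UI lags by at most a few characters, which is invisible in practice.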
The New Product
So far, across these 3 weeks, I've spent around 40 hours on the new product. A significant portion of that went into boilerplate I won't need to repeat for future products, which is a plus.
While most of the AI work was already done, I couldn't resist adding some extra features at the last minute with GPT-4o. Scope creep? Why not!
I finally have a name, but I'll share it once the landing page is up. Until now the product lived under a code name; today I have the final name, the domain, and a marketing landing page that's 90% done.
Selecting the domain and name involved some initial SEO work. I think it's BS that domains no longer matter for SEO. If you're a nobody in page authority, it's a good idea to make the domain very literal, so that's how I'm starting: a short name followed by something literal.
In the 2 days I had after spending a day on Canopy, I worked on:
Feature work:
Refined the AI features and added some extra parallel LLM calls.
Finished an Insights page (I hate styling heatmaps).
Added dark mode support while stuck in traffic driving to Yosemite.
Started on a Telegram bot.
Marketing:
Finished a landing page and registered the domain.
SEO: researched keywords I could try to rank for and started figuring out a strategy for evergreen content (I'll need more time here).
Before sharing the product with everyone, I still have to polish some final details, like setting up the Stripe account, deploying to the final servers, etc.
Timeline-wise, I'm thinking I need:
1 more full week for polishing, details, and putting up the waitlist (including boring things like Terms of Service).
About 3 days for the initial marketing work, including backlink work, evergreen SEO-focused pages, and some community-based distribution.
Then, I'm switching attention to the second product that I want for myself (while keeping 25% of my time on this one).
Once the product is live with users, I plan to share some of the lessons more openly by writing specific posts about certain things.
For example, on this project, I've been using a mix of multiple parallel LLM calls and OpenAI batch processing. These two things could be interesting as standalone posts.
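As a small teaser, the parallel piece is plain fan-out; here's a minimal sketch with illustrative stand-in prompts:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Fan out independent LLM tasks in parallel instead of chaining them.
// The three prompts below are just illustrative stand-ins.
async function analyze(text: string) {
  const ask = async (instruction: string) => {
    const res = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [
        { role: "system", content: instruction },
        { role: "user", content: text },
      ],
    });
    return res.choices[0].message.content;
  };

  const [category, sensitivity, summary] = await Promise.all([
    ask("Classify this text into one category."),
    ask("Flag any sensitive content in this text."),
    ask("Summarize this text in one sentence."),
  ]);
  return { category, sensitivity, summary };
}
```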
I also hope to carve out time to write separate posts about some of the product and marketing aspects whenever I can.
Week 4 Plan
This week, I'll be splitting my work into three parts:
1 day for Canopy
2 days on New Product
3 days helping a friend with his startup (I usually work a 6-day week anyway)
I've been avoiding any kind of external work, but in this case I made an exception.
It's a fixed-scope project in an area I know well, and it's an opportunity to learn some new React on a real product designed by great programmers. On top of that, what this friend is building is an impressive product in an interesting domain, with a capable team, so I'll help whenever I can. I'm estimating about 14 days for it (let's see how off I am this time).
This will delay my LLM learning cycle plan a bit, but it's a good trade-off.
I'll continue to share my updates here regardless.