TL;DR
- It is advisable to go from adolescence to senility, bypassing maturity. That holds at least for humans; for language models, too much deliberation can cause timeouts.
- fastapi cloud works like a charm.
- Mixing local and remote models with ollama (cloud) is easy peasy, and the usage limits are currently generous: I did all my processing of 90+ PDFs with ollama cloud.
The objective
I always enjoy looking through Tom Lehrer songs, almost as much as I like listening to them. But it plagued me that I had to click, actually choose a song, and then click again. I’m sure you can relate. To alleviate this I wanted to create a new website, one that just shoves a different song per day down your something.
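The one-song-per-day mechanic can be sketched in a few lines. This is a minimal, hypothetical version: the `SONGS` list and the function name are made up for illustration; the real site reads its songs from the sanitized lyric files.

```python
from datetime import date

# Hypothetical song list; the real site builds this from the
# sanitized Markdown files.
SONGS = ["The Elements", "Poisoning Pigeons in the Park", "New Math"]

def song_of_the_day(today: date) -> str:
    """Pick one song deterministically per calendar day."""
    # toordinal() increases by one each day, so consecutive days
    # cycle through the song list; the same day always yields the
    # same song.
    return SONGS[today.toordinal() % len(SONGS)]
```

Because the pick depends only on the date, every visitor gets the same song on a given day, and no state needs to be stored.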
The result
So I did and created lehrer-lyrics, hosted by the friendly people of fastapi cloud.
The code behind all this can be found in this random person’s github repo.
It contains
- a typer CLI to
  - download PDFs with the lyrics and
  - sanitize them using a language model (`ministral-3:14b` worked great) via ollama local / cloud
- a fastapi web service rendering HTML stuff created with python-fasthtml
Learnings along the way
How to get a language model into an existential crisis
The fun part of working with smallish open-weights models is that, when a pipeline doesn’t behave as you’d expect, it is tricky to know where the problem actually is.
The core of the pipeline for this project is here:
```python
def pdf_to_markdown(...):
    ...
    raw_text = extract_text_from_pdf(pdf_path)
    return polish_lyrics_with_llm(
        raw_text,
        model,
        ...
        host=host,
        ...
    )
```
So first we yank the text (raw_text) out of the PDFs using pypdf, and then we throw it at a language model, which we politely ask:
```python
f"""Convert the following Tom Lehrer lyrics extracted from a PDF into clean Markdown.

Please:
1. Identify and bold the title (remove the — or header noise).
2. Extract the credits and place them below the title in bold italics.
3. Group lines into logical stanzas using double line breaks (verse-chorus structure).
4. Handle footnotes by moving the text associated with, e.g. `(*)`, to a blockquote at the end.

---

{raw_text}"""
```
In a first version I used qwen3.5:27b, which had been pretty helpful in other tasks. Quite amazing how capable local models have become. But then I started running into timeouts. And I had set them generously to 5 minutes.
At first I thought this was because my hardware is insufficient. A great chance to try out ollama cloud! Setting up the API key and inserting it into the code was easy enough. But then I hit the same wall!
Well, if more resources don’t help (and I really didn’t want to extend the timeout limit beyond 5 minutes!), maybe it helps to actually look at what’s going on.
Doing this is easy enough using the ollama GUI. So off I went, threw the PDF text together with my request into the GUI, and observed. What did I see? It turns out that qwen3.5 could not stop reasoning! What was it reasoning about? A significant part was about the Tom Lehrer lyrics themselves: wondering whether they are safe, whether they maybe need to be altered, or whether they are okay to be returned as is. Once it agreed with itself that the task was okay, it started figuring out what it needed to do to format the content following my request. And when it seemed done with that, it started tripping over the lyrics again! :-D
In a naive attempt I just switched to ministral-3:14b. And behold, it didn’t even flinch and returned what I requested.
So qwen3.5 really really really wanted to overthink things. But who can blame it with Tom Lehrer lyrics? :-)
It doesn’t work if you do it wrong
I got the chance to be a beta tester for fastapi cloud and had to seize it, having used a bunch of other providers before and also hosting my own stuff on some machine somewhere. I was curious how they’d compare, and I was very optimistic, being an enthusiastic user of fastapi and typer. If you know what I’m talking about, then you do.
fastapi cloud has some decent documentation. But apparently, if it exists, I overlooked the part that says: don’t use uv_build as your build-backend, and don’t store your package in src/. Anyway, I did both; the local deployment worked fine, but the cloud deployment got stuck verifying because it could not find the lehrer_lyrics package in src/.
The solution then was to switch the uv build-backend to hatchling, like so
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```
and move the lehrer_lyrics package to the root of the repo. Then things went smoothly.
But that was really the only obstacle. So high hopes met! :-)
The end
> Life is like a sewer - what you get out of it depends on what you put into it. It always seemed to me that this is precisely the sort of dynamic, positive thinking that we so desperately need today in these trying times of crisis and universal brouhaha.
>
> - Tom Lehrer, 1959