Category Inference Experiments With UCSINFER
I’ve had some success auto-categorizing sounds into their corresponding UCS categories by using an LLM and the technique of sentence embedding. I’m writing up my process here and hope people find it useful and can maybe take this forward to do other things.
This is also my first post to the “Notebook”, so I’m testing the posting system too.
How I Learned What ChatGPT Can’t Do
One of my original experiments with ChatGPT was to try to use it to automatically select a UCS category for a given text description. This seemed like exactly the kind of problem ChatGPT should be able to solve well: it knows that a Ford is a car and that explosions go bang and may involve gasoline or black powder, so inferring a category from a text description should be easy.
So one morning I took a few categories and their descriptions from Tim Nielsen’s Excel document, put them into a CSV, and uploaded them into a new chat. I prefaced it by saying that I wanted it to classify sounds based on their textual description, that I was giving it a list of categories, and that I’d like the best match and a corresponding score. It said that it understood and, after a few extra caveats and some coaxing, I asked it something like…
How would you classify “Crowd Reaction Sweetener: Boo; Large Crowd Ambience; Dull Roar To A Time-to-retire Boo, Everybody Is Upset, Medium Perspective.”?
And it would reply with…
CRWDReac
So far so good, but then you’d give it something similar like…
How would you classify “24 Crowd (50)_ Steady _ Sparse _ Subdued, Int _ Wide Pov. Laughs Crowd, Medium..aif”?
And it’d reply with something like
CRWDSparse
Which sounds right, until you check the UCS and note that there’s no category by that name. ChatGPT would invent new categories following the UCS CatID prototypes. It wasn’t really understanding that it had to choose from a fixed list of word options and this turns out to be something it can’t really do.
It seems strange that ChatGPT could invent new words, but it’s not too remarkable when you consider that LLMs generally parse natural language in tokens, short strings of letters typically a few characters long, and they understand words as chains or permutations of tokens. In fact, under certain circumstances you can get ChatGPT to invent all kinds of goofy Lewis Carroll-type neologisms. When it encounters UCS CatID literals, it doesn’t understand them as atomic values but as normal words that can be shaped by the natural English language model.
I was never particularly crazy about using ChatGPT for categorizing sounds, not least because ChatGPT costs money and requires you to send them all your data to crunch on. That’s expensive, not secure, and so on, so the fact that it didn’t work made me put the idea away for a while.
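One cheap guard against invented CatIDs is to check every reply against the fixed list before accepting it. The snippet below is a minimal sketch of that check and is not part of ucsinfer: `VALID_CAT_IDS` and `validate_cat_id` are hypothetical names, and apart from CRWDReac the entries are placeholders standing in for the real list loaded from the UCS spreadsheet.

```python
# Hypothetical stand-in for the full UCS CatID list. Only "CRWDReac" comes from
# the examples above; the other entries are placeholders, not real CatIDs.
VALID_CAT_IDS = {"CRWDReac", "PLACEHOLDER_A", "PLACEHOLDER_B"}


def validate_cat_id(reply, valid_ids=VALID_CAT_IDS):
    """Return the reply if it is a known CatID, otherwise None."""
    candidate = reply.strip()
    return candidate if candidate in valid_ids else None


for reply in ["CRWDReac", "CRWDSparse"]:
    result = validate_cat_id(reply)
    print(reply, "->", "accepted" if result else "rejected (not in the list)")
```

A rejected reply still has to be handled somehow (retrying, or flagging it for manual review), so this only detects the problem rather than fixing it.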
Working Smarter: Sentence Embedding
I had some time recently and got back into this, but this time I’d done a little more reading and had discovered the concept of Sentence Embedding, which is something LLM services do as a part of their inference process but can be used by itself for useful purposes. And even better, you can run it on your own machine using freely-available models and get very good results.
Sentence Embedding is the process of taking a word or sentence and calculating from it a “tensor” or vector that captures its semantic meaning. The vector captures the semantics of the sentence across anywhere from a few dozen to a few hundred dimensions. Even better, two vectors can be compared with each other mathematically in different ways, and measuring how “close” they are indicates how close in meaning the two sentences that produced them are. This embedding process still uses an LLM and can avail itself of all of that model’s language knowledge: it knows a magnum is a gun, that “bottle shatter” implies glass, and so on. It just requires a little more code.
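To make the “compare two vectors” step concrete, here is a rough sketch of picking the closest category by cosine similarity, one common way of measuring how close two vectors are. It is an illustration under stated assumptions, not the ucsinfer code: `toy_embed` is a bag-of-words stand-in for a real sentence-embedding model, `best_category` is a hypothetical helper, and the category table is made up for the example (only CRWDReac appears in this post; the other keys are placeholders).

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse word-count vector.

    A real sentence-embedding model returns a dense vector that captures
    meaning; this toy only captures which words appear.
    """
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return Counter(cleaned.split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_category(description, categories, embed=toy_embed):
    """Return (cat_id, score) for the category text closest to the description."""
    query = embed(description)
    scored = [
        (cosine_similarity(query, embed(text)), cat_id)
        for cat_id, text in categories.items()
    ]
    score, cat_id = max(scored)
    return cat_id, score


# Made-up category descriptions for illustration; the placeholder keys are
# not real UCS CatIDs.
CATEGORIES = {
    "CRWDReac": "crowd reaction boo cheer gasp audience",
    "PLACEHOLDER_GLASS": "glass break shatter bottle window",
    "PLACEHOLDER_GUNS": "gun pistol rifle shot firearm",
}

cat_id, score = best_category(
    "Crowd Reaction Sweetener: Boo; Large Crowd Ambience", CATEGORIES
)
print(cat_id, round(score, 3))
```

Because the answer is always one of the keys in the table, this approach cannot invent a category the way a chat reply can. The toy embedding only matches literal words, so it would not know that a magnum is a gun; passing a real model’s encoding function as `embed` is where that language knowledge would come from.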
ucsinfer Module and Project
ucsinfer is a work-in-progress, but you can see it working in principle in my git repository in the notebook.