Tuesday, January 28, 2025

ChatClauDeepGemGrokMeta

With the recent interest in artificial intelligence, and particularly in genAI and large language models, I have been running various tests.

One of the tests involves asking for suggestions to save a (real) charitable organization facing dire (and complex) financial straits. I submitted "prompts" containing details of the situation to ChatGPT, Claude, Deepseek, Gemini, Grok, and Meta AI. (If the systems had paid versions, I used only the free versions.)


None of the systems were outstandingly useful. For example, I noted that a single type of expense was the major issue, and that other expenses were not worth trying to avoid. All of the systems, nonetheless, suggested trying to reduce other expenses.

Deepseek, Grok, and Meta AI were the least useful. All of them produced text that could have come from a bank pamphlet on "how to address your money troubles." Deepseek was marginally the worst of the three, but neither Grok nor Meta AI produced anything worth pursuing.

ChatGPT, Claude, and Gemini also included similar fluff that would require extensive pruning. However, they did suggest some interesting lines of activity that might be worth pursuing.

I'm still trying to answer Neil Postman's query about "what is the problem to which this technology is the solution?" I was beginning to think that genAI, hallucinations and all, could be used for brainstorming, if you have no friends. (As long as you recall Scott Adams's assertion that "Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.") Plowing through the fluff to mine the occasional nuggets of utility could be worthwhile, as long as there *are* nuggets of utility. (That does seem to leave Deepseek, Grok, and Meta AI out in the cold.)

At the moment, these and other systems are vying in an AI arms race, trying to capture "first mover" market dominance (and are burning through thousands of GPUs and gigawatts of energy in order to do so). Every week someone claims to have beaten a benchmark or achieved some new kind of function, so this test is probably only valid for another eighty-three hours.

Postman's question remains.
