I knew that the Newton device from Apple would be a failure when it didn't have any communications connectivity. I also knew that the Newton device would fail when, in order to get communications connectivity, you had to buy a separate device for exactly the same amount of money as the base unit and exactly the same size as the base unit.
Then again I have never been able to type. I have always wanted something to do the typing for me, and I have always wanted something to take dictation to enable me to write down what I wanted to write. I do not know how to explain why I loathe and despise, to the very depths of my soul, soft keyboards on smartphones. I have hated them ever since actual physical keyboards disappeared from smartphones. So all I really wanted was something to take dictation for me.
On the other hand everybody else seems to have wanted something to turn on their lights, play their music, choose from a selection of playlists, add items to their shopping list, and buy items from that list, so that's what Siri and Alexa seem to have been built for. Of course all of these functions are fairly simple, and so they never needed much artificial intelligence to get them to work.
All of which is sort of circling the fact that what we really want on our smartphones is a kind of personal assistant. We want something to remember things for us. We want something to remind us of important events. We even want something to decide which events *are* important. We want something to decide which calls to us are important enough to bother us about. And this is what we want, what we really want, from artificial intelligence.
This has something to say about what we want from artificially intelligent assistants or devices. Do we want something that looks and acts like our current cell phones? Do we want something like the communicator on Star Trek, that's simply a microphone and a speaker and some kind of communications to a centralized computer system?
First, if we are going to simplify it down to that minimalistic communicator device, we are definitely going to have to do something about the reliability of artificial intelligence and that problem of hallucinations. (What is the difference between artificial intelligence and a used car salesman? Answer: The used car salesman knows when he is lying to you.)
We have gotten to the point where artificial intelligence is somewhat useful for producing programming code for us, and also to the point where it can be useful for various types of agentic operations. We still need to have, or possibly formalize, a syntax for the specifications that we accumulate and refine, possibly over three and a half days of thinking, before finally committing to an agreed-upon set of specifications. The goal is to have the artificial intelligence define those specifications for you and then commit to executing the action.
All of which is kind of background and explanation for why I am doing a review of Wispr Flow.
I have tried out and reviewed at least four different versions of dictation systems so far. The two that I use most frequently are Gboard, which I use on my Android phones for dictation to pretty much anything, and Live Transcribe, which I use because it has an independent unconnected mode. While problematic in terms of accuracy, at least it works when I don't have a connection to the Internet.
The reason to add Flow to the mix is that I took it to be an OpenAI product; the name is certainly close to Whisper, OpenAI's speech recognition model. OpenAI, of course, is the producer of ChatGPT and a number of the other major artificial intelligence tools that are available to the general public. As far as I can determine, though, Flow comes from a separate company called Wispr. Even so, it is the kind of tool that could grow into a local artificial intelligence tool, something along the lines of a personal assistant. It therefore makes sense to see how well Flow works and whether it is reliable enough and accurate enough to be used in this type of situation.
I am interested in the fact that Wispr Flow is available for multiple platforms. I am particularly interested in the fact that it is available for Windows. This gives me a dictation capability on my desktop machine, which I greatly appreciate.
Perhaps not as greatly as I might. In testing Wispr Flow for this review I have found that I would really prefer to dictate onto my phone and, as I will note, there is a problem with that.
Wispr Flow is available both for Windows and for Android, as well as a number of other platforms. This is handy for me since I can install it both on my desktop and on my cell phone. Presumably I can also install it on the laptop at some point and I might be getting around to that.
Anyway for the first test I tried it on the Android cell phone. That test was a complete and unmitigated disaster.
As I have mentioned I have experience with a number of other dictation applications. As far as I can recall, all of them will display to you, as the person dictating, the output and transcription of what you are dictating.
As noted I most frequently use Gboard and Live Transcribe. Both of these display, as you are dictating, what they are transcribing. Both of them (and this is only to be expected since both are made by Google) have an interesting property where if they haven't fully decided on what the final transcription will be, the text that they have transcribed so far and is still under consideration shows up as being underlined. When the underline disappears the system has decided what the final transcription will be. In any case the system displays to you in real time what it figures you have said.
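For the technically inclined, the underlining behaviour can be pictured with a small sketch. This is only an illustration of the general idea of interim versus final results, assuming a recognizer that keeps revising its guess for the current phrase until it commits. It is not Google's actual code, and the names `Hypothesis`, `render`, and `replay` are made up for the example. Underscores stand in for the underline.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str       # the recognizer's current best guess for this phrase
    is_final: bool  # True once the recognizer has committed to the text


def render(committed: str, hyp: Hypothesis) -> tuple[str, str]:
    """Return (new committed text, line to display) after one update."""
    if hyp.is_final:
        # The guess is settled: append it to the permanent text.
        committed = (committed + " " + hyp.text).strip()
        return committed, committed
    # Still under consideration: show it, but marked as provisional.
    pending = "_" + hyp.text + "_"
    return committed, (committed + " " + pending).strip()


def replay(updates):
    """Feed a sequence of updates through render and collect each display line."""
    committed = ""
    lines = []
    for hyp in updates:
        committed, line = render(committed, hyp)
        lines.append(line)
    return lines


# Hypothetical stream of updates for two dictated phrases.
updates = [
    Hypothesis("put in a karma", False),
    Hypothesis("put in a comma", False),
    Hypothesis("put in a comma", True),
    Hypothesis("after this", False),
    Hypothesis("after this word", True),
]

for line in replay(updates):
    print(line)
```

The point of the sketch is that the speaker sees every line, including the provisional ones, and so can tell right away when the system is heading in the wrong direction.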
That is not the case with Flow. Initially it *really* threw me. I dictated something and nothing appeared on the screen. Because I was using the Android version and possibly because of some weird issue with settings or formatting, even after I stopped dictating a test and hit the button indicating that I was finished dictating, nothing appeared.
I tried this multiple times and then I started looking into possible problems, shifting more or less immediately into systems analyst mode. I figured out that, yes, what I had dictated *had* been transcribed, but for some reason it showed up as white text on a white background. It was therefore not until I did some work to select the text in the area that I figured out that there was some text there, but that it was invisible. Once I could pull up that text I found that, yes, all three attempts had in fact been transcribed. However since I had been frantically trying to figure out where this text had been transcribed, the various attempts were embedded within each other and the total text was a horrendous mess.
Subsequent testing indicated that this was not specifically a problem with the Android version. This must have had to do with some kind of formatting issue, because I have tested it once again on the Android smartphone, in a very similar situation with the same application, and the results were pretty much okay.
I should note that in some early feedback to Wispr Flow I mentioned this problem and got a response from their technical support that I should look for settings dealing with fonts and font colours in the application. They weren't specific about whether it was the Wispr Flow application or the application that I had been using Wispr Flow to provide input to. In any case I couldn't find any settings on the phone, in either application, that dealt with fonts or font colours. Their technical support wasn't really very supportive.
(I've had subsequent contacts with Wispr Flow support. I suspect that "Tina" is a bot. Regardless, content that I send to them seems to get lost somewhere along the way. In addition, suggestions from support tend to include references to options that don't appear in either version of Flow that I am currently testing.)
Technical support did tell me that this issue of the text not appearing until you have finished dictating is a deliberate design choice in the case of Flow. Personally I think it's a pretty stupid choice.
I have been practicing, very extensively, with dictation software for the last four years. It is a non-trivial task until you start to get the hang of it and it is also extremely difficult when you have no feedback.
If you are thinking about what you want to say and you can't see what you have said, to determine whether or not you are using too much repetition of a given word, or if you have already dictated a specific piece of information that you want to include, it can be very difficult. I would definitely disagree with Flow's design choice in this regard.
As I have noted I have used both Gboard and Live Transcribe fairly extensively. As I have also noted I use Live Transcribe in the unconnected mode. Therefore it is completely unsurprising that Live Transcribe makes many more errors than Gboard does. Gboard does not have an unconnected mode and you can only use it if you are connected to the Internet. Therefore Google, and its massive data centres, are supporting the transcription of what you dictate to Gboard.
I have used Live Transcribe in situations where I can't be connected to the Internet and it's a bit of a pain to have to do all of the work necessary to edit the material that has been transcribed, at some later time, in order to get what you really want. But I still appreciate the fact that I can dictate something and edit it later.
However even Gboard is not perfect. That's actually putting it mildly. There are frequently some pretty major transcription errors. You have to say any punctuation that you want to have inserted in your text, with Gboard, and frequently when I want it to put in a comma, it instead inserts the word "karma".
So it is fairly easy to say that Flow is much more accurate than Gboard. Flow gets many more words down correctly than does Gboard. Flow doesn't make as many mistakes. Flow can handle punctuation even if you don't say it but it isn't as good with commas as it is with periods. Flow can handle certain levels of formatting, even if you don't ask for it. I was interested when it started to create bulleted lists for me even though I didn't want bulleted lists in that particular case.
The advertising for Wispr Flow seems to indicate that it can handle transcription even if it isn't connected to the Internet. However I have examined the settings for Wispr Flow, at least on my desktop machine, and I don't find any setting that indicates that I can turn on or off a connection to the Internet. I will probably have to do some more extensive work on my smartphone in order to test that out.
(I have also, in the course of doing some testing for the purposes of this review, found that occasionally Wispr will actually take down a transcription but not paste it into the application that you think you are working in. On the Windows desktop version you can call up the Wispr application itself and find that the transcription has been recorded in Wispr. You can then copy and paste it back into the application you thought you were using.)
I'm using the free version of Flow. At least I *think* I'm using the free version of Flow. The Wispr Flow application, itself, tells me that I have access to the Pro version for a couple of extra weeks. However it doesn't tell me whether I am actually using the Pro version right now. So while I appreciate the dictation capability that Flow is providing to me, it could tell me a bit more about itself. I think this is only fair. After all, I have not turned on the privacy setting and therefore Flow is using my attempts at dictation to tune and improve Flow. Regardless of whether it says so or not, I am quite sure that my transcriptions are also being fed into building the next round of somebody's large language model. Hey, fair's fair.
I like it. I'll probably continue to use it. But it definitely still has some bugs.
And I still think they should show you what you're transcribing in real time.
A few more bits.
Flow's ability to handle punctuation and formatting can be interesting at times. Flow will eliminate punctuation, if it feels like it, even if you have given it spoken commands to include punctuation. Flow is an American product, of course, and seems quite insistently determined to eliminate all possible commas.
As I have noted, Flow is able to handle stumbles over words and usually turns out a pretty good edit no matter how much of a fumble-tongue you have been in doing the dictation. However I am concerned that occasionally Flow may edit out stuff that it simply considers extraneous. And Flow is definitely not as good a copy editor as Gloria was.
I am getting used to Flow's lack of immediate display of what it is transcribing. However this is probably at the cost of some change in my writing style. I am probably moving more to an Ernest Hemingway style of writing, in contrast to my preferred Henry James.
I have noticed, although it may be due to other factors, that since I have started the trial of Flow my writing productivity has gone up considerably. You guys are *really* in trouble now.