GPT4All and Chat-with-MLX: LLM Chat Apps for Your Mac

Deploy open-source LLMs on your Mac in less than 10 minutes with these private Chat UIs.

Want to run LLMs locally on your Mac but don’t want to wait for Apple Intelligence? In this post I’ll compare two private LLM chat UIs for Mac (for M1 or later Apple Silicon devices): Chat-with-MLX vs GPT4All. I’ll show you how to get them up and running, compare their features, and explain why one of them is my clear favorite for running private, open-source LLM chats on a Mac.

With all the hype surrounding the forthcoming release of Apple Intelligence, it’s easy to forget there are other options for running LLMs locally on your Mac. Recent advances in model compression using quantization have made it possible to run open-source large language models like Meta’s Llama 3 8B and Mistral Instruct on devices with as little as 8GB of RAM.
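
The arithmetic behind that claim is simple: an 8-billion-parameter model stored at 16-bit precision needs roughly 16GB just for its weights, but quantized to 4 bits the same weights fit in roughly 4GB, leaving headroom on an 8GB machine. A quick back-of-the-envelope sketch (rough figures that ignore file-format overhead and runtime memory):

```python
# Back-of-the-envelope memory math for 4-bit quantization.
# Real GGUF/MLX files add some overhead, so treat these as estimates.
params = 8e9                      # Llama 3 8B parameter count
fp16_gb = params * 2.0 / 1e9      # 2 bytes per weight   -> ~16 GB
q4_gb = params * 0.5 / 1e9        # 0.5 bytes per weight -> ~4 GB
print(f"fp16: ~{fp16_gb:.0f} GB, 4-bit: ~{q4_gb:.0f} GB")
```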

This advance has fostered the emergence of new open-source chat UIs that make these models as accessible and easy to use as ChatGPT or Microsoft Copilot, but in a free and totally private local chat experience for Mac users. Two free, easy-to-install, open-source options for Mac are GPT4All and Chat-with-MLX.

GPT4All Community Edition

Strengths: Everything just works out of the box. Simple, intuitive UI with easy model discovery and download. RAG support for complex use cases with multiple files and folders.
Weaknesses: No multilingual support.

Chat-with-MLX

Strengths: Multilingual support (11 languages).
Weaknesses: Retrieval Augmented Generation (RAG) only indexes one file at a time (up to ten files total) and offers no document organization features for RAG. The YouTube feature was buggy and did not work at the time of writing. The UI is not responsive to the Dark Mode setting on Mac.

GPT4All vs Chat-with-MLX Comparison 

I installed both GPT4All and Chat-with-MLX on my M1 Max Mac to give them a test drive, and one was the clear standout. Let’s dive in: I’ll show you how to get both apps installed and do a quick walkthrough of their features.

GPT4All

The GPT4All Desktop Application allows you to discover, download, and run large language models (LLMs) locally & privately on your device. With GPT4All, you can chat with models and turn your local files into information sources for the models you’ve downloaded onto your device.

Installation – Installing GPT4All could not be easier. Go to the Nomic website and download the Mac installer. Install and then open the app as you would any other desktop application.

Choose a model – Once installed, you will need to choose your first LLM to download and install. This model will power your chats. There are over 1,000 models to choose from! I recommend starting with ‘Llama 3 8B Instruct’ from Meta. Llama is the most widely adopted open-source LLM for good reason: it’s good! Once you choose your model, click download.

Explore Models page in GPT4All

Chat – After your model finishes downloading, you’re ready to start chatting. Head over to the Chat UI by clicking ‘Chat’ in the left nav. Once in Chat, don’t forget to select your model from the Model dropdown in the top middle portion of your screen. Now you’re ready to enter your first prompt.

Chat with Llama 3 8B Instruct in GPT4All

LocalDocs – LocalDocs is the killer feature that really makes GPT4All a standout, not just versus Chat-with-MLX but versus proprietary chat apps like ChatGPT and Microsoft Copilot. LocalDocs allows you to chat with your own files, even folders full of files… heck, even multiple folders of files… using Retrieval Augmented Generation (RAG).

RAG takes your files and turns them into embeddings: numerical representations that act as a searchable index which Llama 3 (or any other LLM) can reference during your chats. It turns your general-purpose chat into a very personalized one, with the ability to query your documents, summarize information from them, perform comparative discovery and analysis across files, and assemble complex query responses that integrate information from multiple sources.
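
If you’re curious what that searchable index looks like in practice, here’s a minimal sketch of embedding-based retrieval using the sentence-transformers package. The package, model name, and sample text are my choices for illustration; GPT4All manages its own embedding model internally:

```python
# A minimal sketch of embedding-based retrieval, the core of RAG.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Q3 revenue grew 12% year over year.",
          "The hiking trip is planned for early June."]
query = "How did revenue change in Q3?"

# Embed the document chunks and the query into the same vector space.
doc_vecs = model.encode(chunks, normalize_embeddings=True)
q_vec = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity (a dot product, since the vectors are normalized)
# finds the most relevant chunk: the one handed to the LLM as context.
scores = doc_vecs @ q_vec
print(chunks[int(np.argmax(scores))])
```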

RAG is your new best friend. It makes your chats smarter, like really, really smart. The more documents you provide in your RAG-based sessions, the smarter your chats and output will get. Once you get used to doing this, you’ll never want to go back to plain old generic chats. Give it a try!

Click ‘LocalDocs’ in the left nav > Click ‘Add Collection’ > Name your Collection and browse your drive to select a folder containing the files (txt, PDF, or markdown) you want to add to your collection.

Create Document Collection in GPT4All

Technically speaking, what GPT4All has done here to abstract away much of the complexity of RAG is just elegant and wonderful. If you’ve ever tried to develop a RAG application using LangChain or other open-source tools, you know it’s definitely not the three-click affair it appears to be here in GPT4All. So, consider yourself lucky for discovering GPT4All and make sure you take advantage of this fantastic feature for your next research or writing project.
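
For a sense of the plumbing GPT4All hides, here’s a rough outline of the same pipeline hand-built with LangChain. Import paths and class names shift between LangChain releases, and the folder path is a placeholder, so treat the specifics below as assumptions to verify against your installed version:

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

docs = DirectoryLoader("./my_notes", glob="**/*.txt").load()   # 1. load files
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)  # 2. chunk them
db = FAISS.from_documents(chunks, HuggingFaceEmbeddings())     # 3. embed + index (needs faiss-cpu)
hits = db.as_retriever().invoke("summarize my Q3 notes")       # 4. retrieve relevant chunks
# 5. ...and you still have to stuff `hits` into a prompt template,
#    wire up the LLM, and manage the chat loop yourself.
```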

As a simple example of how RAG can make you more productive, I use it to help write excerpts, headlines, and SEO content for each of my blog posts. I do this by creating a document library with my draft blog and all my related research content. Then I start a chat session with this document library and ask for short summaries and headline options based on the underlying blog content.

This approach saves me at least an hour or two at the time of publication by giving me all the companion content I need to publish without having to do all the tedious extra copywriting work. I <3 RAG!

Settings - There are a variety of settings available in the GPT4All ‘Settings’ module. Refer to the docs on the Nomic site to learn more about all the options.

There are a few ‘Model’ settings worth noting which will impact how your LLM behaves during your chat sessions:

GPT4All Model Settings

Context length, max length - You can set the context length (how much text the model can consider at once, measured in tokens rather than characters, including your prompt and the conversation so far) and max length (the maximum length, in tokens, of the response from your LLM). If you plan to cut and paste long-form text into your prompts, you may need to increase the context length here. If you are prompting for long-form outputs like paragraphs or even pages of text, adjust the max length to a larger value.
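
A quick rule of thumb for deciding whether to raise the context length: English prose runs roughly four characters per token. A rough estimate (not an exact tokenizer count, and the file name is a placeholder):

```python
# Estimate the token count of text you plan to paste into a prompt.
# ~4 characters per token is a rough heuristic for English prose.
text = open("article.txt").read()   # placeholder file
approx_tokens = len(text) / 4
print(f"~{approx_tokens:.0f} tokens; raise the context length "
      "if this exceeds your current setting")
```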

Temperature, Top-P – In my evaluation of GPT4All, the default settings are in a goldilocks, or just right, range. I recommend leaving them at their defaults; however, if you want to be adventurous you can increase temperature to introduce more variation in model outputs. Raising Top-P widens the pool of candidate tokens the model samples from, making your model’s output less predictable.
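
To make those two knobs concrete, here’s a small, self-contained sketch of how temperature and top-p shape the next-token choice. This is generic sampling logic for illustration, not GPT4All’s actual implementation:

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_p=0.4,
                 rng=np.random.default_rng()):
    # Temperature rescales the logits: values below 1.0 sharpen the
    # distribution (more predictable), above 1.0 flatten it (more varied).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Top-p (nucleus) sampling keeps only the smallest set of tokens
    # whose cumulative probability reaches top_p, then renormalizes.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

# Example: token 2 dominates, so low temperature + low top-p almost
# always picks it; raise either knob and the other tokens start to appear.
print(sample_token(np.array([1.0, 2.0, 4.0, 0.5])))
```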

Chat-with-MLX

Chat-with-MLX is a private LLM chat UI app for Mac that uses the new Apple MLX framework for machine learning research on Apple silicon. Like GPT4All, Chat-with-MLX allows you to easily download and install a large library of open-source LLMs to power your chats.

Chat-with-MLX distinguishes itself from GPT4All with native support for multilingual chat, including English, Spanish, Chinese, Vietnamese, and Turkish. If you need multilingual support out of the box, then Chat-with-MLX is a great option, as this feature is not currently offered by GPT4All.

Installation – The simplest way to install Chat-with-MLX is to install the Pinokio Gen AI App Browser for Mac and then install Chat-with-MLX with Pinokio. You can use the following steps to install both apps on your device:

Download screen in Pinokio

1. Install Pinokio (see my recent post on Pinokio for a full look at its capabilities)

2. Install Chat-with-MLX

  • In the Pinokio UI, select ‘Discover’

  • In the search bar, type ‘MLX’

  • Select the ‘Chat-with-MLX’ App tile

  • Click ‘Download’

  • Install the Dependencies

  • Note: if you have not accepted the Apple Xcode license on your Mac, the dependency installation will not complete. Open Xcode and accept the license, then return to Pinokio and complete the installation process.

If you want to install Chat-with-MLX without Pinokio, you can find all the source files and install instructions here on GitHub.

Chat

Once installed, you’re ready to chat. Navigate to the ‘Chat’ tab and select a model under ‘Configuration’. Select your language preference and then click ‘Load Model’. Once again, I recommend selecting Llama 3 8B Instruct as your first model.

The first time you click ‘Load Model’, Chat-with-MLX will automatically download your selected model. Llama 3 8B Instruct is just under 4GB, so the download may take some time; after that, load times should only take a few seconds. Once your download is complete, the model status will change to ‘Model Loaded’ and you will be ready to start chatting.
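
Under the hood, Chat-with-MLX sits on Apple’s MLX stack, and the mlx-lm Python package exposes the same download-and-load step directly. The repo name and function signatures below are assumptions to check against your installed mlx-lm version, since its API has shifted across releases:

```python
# A sketch of what 'Load Model' does behind the scenes, via mlx-lm.
from mlx_lm import load, generate

# The first call downloads the 4-bit weights from Hugging Face;
# later calls load them from the local cache in seconds.
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
print(generate(model, tokenizer,
               prompt="Suggest three morning exercises to start my day.",
               max_tokens=256))
```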

Simply type your first prompt in the message box at the bottom of the Chatbot screen. In this example, I’ve asked Llama 3 to provide suggestions for morning exercises to help get my day started:

Chat-with-MLX Chat UI

Notice that I’m using “Light” mode in my Mac appearance settings. Normally I prefer “Dark” mode, but I found the Chat-with-MLX UI is not responsive to Dark mode, which made some of the chat responses unreadable. I did not experience this limitation at all in the GPT4All UI.

Advanced Settings

Advanced Settings in Chat-with-MLX

I recommend modifying some of the advanced settings for chat sessions in Chat-with-MLX. As mentioned previously, I’ve found the default settings in GPT4All to be in a goldilocks range, so reset Chat-with-MLX to match those found in GPT4All (see the sketch after this list for applying them in code):

  • Set Temperature to 0.7

  • Set Top-P to 0.4

  • Set Repetition Penalty to 1.18

  • Increase Max Tokens to 1024
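
If you script against mlx-lm directly, the same settings map onto its sampler helper. The parameter names (temp, top_p, the sampler keyword) are assumptions to verify against your mlx-lm version:

```python
# Applying the recommended settings through mlx-lm's sampler helper.
# (Repetition penalty lives in a separate logits-processor helper in
# recent mlx-lm releases, so it's omitted here.)
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
sampler = make_sampler(temp=0.7, top_p=0.4)
print(generate(model, tokenizer, prompt="Hello!",
               max_tokens=1024, sampler=sampler))
```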

RAG Settings

Like GPT4All, Chat-with-MLX provides a Retrieval-Augmented Generation (RAG) capability that allows you to chat with your data; however, the RAG features here are far less intuitive.

You can only add one file at a time via the RAG interface, up to a total of 10 files. There is no way to see which files are part of your current library of indexed files. To switch to a different file or set of files, you have to click “Stop Indexing”, but you lose the current library when you do so and must rebuild the index should you wish to interrogate that set of files again in a future chat.

Clearly more work needs to be done on the RAG UI feature set for Chat-with-MLX to reach the level of maturity found in the GPT4All UI. One advantage of Chat-with-MLX is that it works natively with MS Word .docx files, so you don’t have to convert these files to .txt or .pdf as is required with GPT4All.

Chat-with-MLX RAG File upload and Indexing Dialogue Box

Using the Retrieval Augmented Generation feature in Chat-with-MLX:

  • Open the ‘RAG Setting’ dialogue in the left part of the UI.

  • Choose ‘Files’ for the dataset type.

  • Click ‘Upload File’ to browse your machine for the file you want to upload. You can add up to ten .docx, .txt, or .pdf files to a document index.

  • Click ‘Start Indexing’ to embed the file so it’s available in your chat. Once your file has been indexed, the Index Status will change to ‘Indexing Done’. Repeat this for each file you wish to add to a document index. You’re now ready to chat with your file(s).

Now whenever you interact with Llama3 (or whatever model you have loaded) it will reference the file or files you’ve loaded and indexed here. If you want to revert to generic chat and no longer want the model to refer to these files, simply click ‘Stop Indexing’.

I found the RAG sessions in Chat-with-MLX to be hit or miss. When I ended a session with a document by clicking ‘Stop Indexing’, I assumed this meant a fresh start and that the document was cleared from the session, but I found that future chats were still referring to older content. It became hard to keep track of which content the chat was referencing. In another session, Chat-with-MLX seemed to lose all context for my documents and I had to restart the application to get RAG working again.

The Chat-with-MLX RAG tool does offer a unique Chat-with-YouTube RAG feature, which is designed to extract and embed a transcript of a YouTube video. As useful as that sounds, I was unable to get the feature working at the time of this writing.

The idea behind the YouTube RAG feature is intriguing nevertheless: you add a YouTube URL, Chat-with-MLX creates an embedding from the video transcript, and you can then interrogate it in a chat session. If you have troubleshooting tips for getting this feature to work, please leave them in the comments below!
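
My guess at what the feature does under the hood: fetch the transcript, then chunk and embed it like any other document. The youtube-transcript-api package below is my stand-in for illustration, not necessarily what Chat-with-MLX uses (and note its interface changed in its 1.0 release; this is the classic call):

```python
# Fetch a YouTube transcript and flatten it into plain text for RAG.
from youtube_transcript_api import YouTubeTranscriptApi

segments = YouTubeTranscriptApi.get_transcript("dQw4w9WgXcQ")  # any video ID
transcript = " ".join(seg["text"] for seg in segments)
# ...then chunk and embed `transcript` as in the earlier RAG sketches.
print(transcript[:200])
```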

Conclusion

While Chat-with-MLX provides a useful multilingual LLM chat app for Mac, its immature UI and limited, sometimes buggy RAG features make it feel more like a proof-of-concept demo than a fully realized private chat tool. If you need multilingual chat capabilities and are willing to work with its limitations, it might be worth installing and experimenting with Chat-with-MLX.

GPT4All is an excellent choice as a private LLM chat solution for Mac users. Its ability to easily set up and process complex RAG use cases, e.g. chats that rely on multiple documents and document folders, makes it an ideal tool for individuals who require a robust private LLM chat solution. GPT4All is easy to install on any M1 or later Mac device, is completely free, and its private nature ensures that all conversations remain confidential, making it an attractive option for professionals who need to chat with proprietary information sources.

Overall, GPT4All is a reliable and secure chat solution that can significantly enhance productivity and efficiency for Mac users. I enthusiastically recommend GPT4All as a private LLM chat solution for your Mac device.



