June 19, 2025
As a developer who loves experimenting with AI tools, I really wanted to run my own language models locally in Cursor. Why? For better privacy, custom tweaks, and offline coding sessions. But when I first tried, I hit a wall—Cursor seemed designed only for cloud services. After some tinkering, I found a solution that actually works. Let me show you how I did it.
The Core Problem: Why Local LLMs Aren’t Plug-and-Play
I was excited to run models like Llama on my own machine—no cloud costs, total control over sensitive code. But Cursor, my go-to IDE, focuses on cloud APIs like OpenAI. It doesn’t connect to local servers by default. That meant I couldn’t just point it to my LLM running on localhost. My plans for offline coding felt stuck.
The Solution I Discovered: Tunneling to Bridge the Gap
After testing different approaches, I landed on tunneling. Here's the trick: Cursor's API requests are routed through its own servers rather than made directly from your machine, so it can never reach an address like localhost:8000. A tunnel gives your local server a public URL, which makes it look like just another cloud endpoint to Cursor. You keep the benefits of custom models, though true offline use isn't possible yet.
My Step-by-Step Process to Make It Work
Here’s how I set everything up using Llama 3.1 70B. You’ll need your local LLM running and a tunneling tool—I chose ngrok for its simplicity.
- Get your local LLM running: First, install and run your model (I used Llama). I set up LiteLLM locally to handle API requests, making sure it matched OpenAI's format (a minimal launch command is sketched just after this list).
- Create your tunnel: Fire up ngrok to expose your server. If your LLM uses port 8000, run:
ngrok http 8000
You'll get a public URL like https://your-tunnel.ngrok.io—save this. (You can sanity-check the tunneled endpoint with the curl example after this list.)
- Update Cursor settings: In Cursor's settings, enable the custom OpenAI API option. Paste your ngrok URL into the base URL field. Triple-check your model's name—spelling and case matter here.
- Try it out: Once saved, use cmd+K to prompt the model. Basic tasks worked perfectly for me, though advanced features like Composer need Cursor's cloud services.
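To make the first step concrete, here's a minimal sketch of how the proxy side can be started. It assumes the model is already being served through Ollama and that you're using the LiteLLM proxy as I did; the exact model string and port will depend on your own setup.

```
# Install the LiteLLM proxy (assumption: LiteLLM is the bridge, as in my setup)
pip install 'litellm[proxy]'

# Start an OpenAI-compatible server on port 8000 that forwards requests
# to a local Llama model served by Ollama (the model string is an example)
litellm --model ollama/llama3.1:70b --port 8000
```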
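Before pointing Cursor at the tunnel, it's worth confirming the public URL actually answers in OpenAI's format. This is a hypothetical check; swap in your own ngrok URL and whatever model name your proxy expects, since that's the same name Cursor will need.

```
# Send an OpenAI-style chat completion request through the tunnel.
# The URL and model name below are placeholders for your own values.
curl https://your-tunnel.ngrok.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ollama/llama3.1:70b",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```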
What Worked and the Limitations I Faced
This setup let me use my custom Llama model for real coding tasks—saving money and adding flexibility. But I still needed internet for the tunnel, and Cursor’s cloud indexing meant my data wasn’t fully local. While not perfect for offline use, it’s great for privacy-focused work.
Final Thoughts: Why It’s Worth Trying
If you want to use local LLMs with Cursor, this approach moves you forward. It’s not flawless, but it lets you customize your AI coding setup while cutting costs. Give it a try—you might find it changes how you work.