Most brand monitoring workflows have a blind spot. You’re testing prompts on your own machine, logged into your own accounts, and after you’ve searched your brand a hundred times, the AI already knows what you want to see. Personalized results make it nearly impossible to know how your brand actually appears to real users. You could be invisible in AI search and not even know it.
This post walks through a manual testing workflow for LLM brand visibility that removes that bias, fans out your prompts the right way, and gives you something you can actually use.
We also built a free tool on Hugging Face Spaces so you can run this yourself without writing any API code.
In a previous article about AI visibility monitoring tools, we looked at SE Visible, Keyword.com, Scrunch, and a few others. This process complements them. Those tools are great at automated citation tracking at scale, while this article walks you through manual testing at the prompt level.
Why Manual Prompt Testing Still Matters
Automated tools are great for trend lines and share of voice over time. But they abstract away something important: the actual AI response.
When you want to know why your brand isn’t showing up, or how synthesized answers mention it, you need to go to the answer. You can’t get that from a dashboard alone.
The other reason manual testing matters is coverage. Your monitoring tool is only as good as the prompts you feed it. If your prompt set is shallow, you could be tracking the wrong things entirely. Before you scale anything, make sure you’re asking the right questions.
How To Test LLM Brand Visibility
Step 1: Start With a Core Query (Seed Intent)
Everything starts with one seed term, the core query that represents what you want to cover.
For an agency, it might be “AI search marketing” or “AI search monitoring tools.” For a SaaS company, it might be “project management software for remote teams.” One seed. That’s it. You’re going to generate everything from there.
Step 2: Fan Out Your Seed Across Multiple Queries
We built a query fan-out generator into the top of our Hugging Face Spaces tool.
Add your seed, select your axis, and it generates a set of variants for you. For this workflow, you don’t need 30 queries. A tight set of 10 to 15 well-constructed variants is more useful than 30 shallow ones.
Fanning out is where most people stop way too early. They run five or ten surface-level rewordings of the same question and call it done. The problem is that LLMs don’t just answer the literal query. They answer the intent behind it, and intent varies across your audience, journey stage, and framing.

You need to fan out your seed across multiple axes:
- Role and audience — “best project management tool for developers” will return a different answer than “best project management tool for CFOs.” It’s the same product category with different model behavior.
- Journey stage — Are people trying to learn, compare, choose, or implement? Your brand may show up in compare queries, but get dropped in choose queries. That gap tells you something.
- Verb patterns — “how to” versus “best alternative” versus “checklist” aren’t just stylistic differences. They change the structure of the AI’s response and which sources it pulls.
- Constraints — Price point, region, compliance requirements. “HIPAA-compliant project management software” is a completely different retrieval problem than the generic version.
Duane Forrester, who has been in the SEO world for many years, covers this framework in much more detail in his book The Machine Layer. It’s worth a read if you want to go deeper on prompt construction.
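If you want to see the fan-out idea outside the tool, here is a minimal Python sketch. It is not the code behind our Hugging Face generator. The axis values, the templates, and the fan_out function are all hypothetical, chosen only to show how one seed multiplies across role, journey stage, verb pattern, and constraint.

```python
from itertools import product

# Hypothetical axis values; replace them with ones that fit your audience.
AXES = {
    "role": ["developers", "CFOs"],
    "stage": ["learn", "compare", "choose", "implement"],
    "constraint": ["", "HIPAA-compliant"],
}

# One illustrative verb pattern per journey stage (an assumption, not a standard).
STAGE_TEMPLATES = {
    "learn": "how to evaluate {constraint}{seed} for {role}",
    "compare": "{constraint}{seed} comparison for {role}",
    "choose": "best {constraint}{seed} for {role}",
    "implement": "checklist for rolling out {constraint}{seed} for {role}",
}


def fan_out(seed, axes=AXES, limit=15):
    """Return up to `limit` query variants, each tagged with its axis values."""
    variants = []
    for role, stage, constraint in product(
        axes["role"], axes["stage"], axes["constraint"]
    ):
        prefix = f"{constraint} " if constraint else ""
        query = STAGE_TEMPLATES[stage].format(seed=seed, role=role, constraint=prefix)
        variants.append(
            {"query": query, "role": role, "stage": stage, "constraint": constraint}
        )
    # Simple truncation; in practice you would hand-pick the variants to keep.
    return variants[:limit]


for variant in fan_out("project management software")[:3]:
    print(variant["stage"], "|", variant["query"])
```

Keeping the role, stage, and constraint next to each query pays off later, because you can see which journey stage a gap belongs to when you classify results.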
Step 3: Manual Testing – Run Queries in a Clean Session
Before using the API or the Hugging Face tool, run your queries from step 2 manually.
Use an incognito browser window. This helps remove as much personalization as possible. You can run queries across ChatGPT, Perplexity, and Gemini without logging in.
Note: Claude will require you to log in, so it’s better to use the API workflow in step 4 to access the platform.
As you run each query on each platform, watch for four things:
- Cited — Is your brand referenced with a direct attribution and source link?
- Mentioned — Is your name dropped, but with no citation or link?
- Paraphrased — Does your content appear to influence the answer without being attributed?
- Omitted — Does your brand not appear at all?
Log what you see. You’ll almost always get different outcomes across platforms for the same query, and that’s not a bug. Each model has a different training set, different retrieval logic, and different recency weighting. The differences tell you where your content is well represented and where it isn’t.
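A spreadsheet works fine for the log. If you prefer a script, here is one possible sketch using Python’s standard csv module. The column names and the log_observation helper are our own illustration, not a required format. The only rule it enforces is that every row uses one of the four outcome labels above.

```python
import csv
import io

OUTCOMES = ("cited", "mentioned", "paraphrased", "omitted")
FIELDS = ["platform", "query", "outcome", "notes"]


def log_observation(writer, platform, query, outcome, notes=""):
    """Write one manually observed result, rejecting unknown outcome labels."""
    if outcome not in OUTCOMES:
        raise ValueError(f"outcome must be one of {OUTCOMES}, got {outcome!r}")
    writer.writerow(
        {"platform": platform, "query": query, "outcome": outcome, "notes": notes}
    )


# In-memory buffer for the example. For a real log, swap in
# open("visibility_log.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()

query = "best project management tool for CFOs"
log_observation(writer, "ChatGPT", query, "mentioned", "named, no link")
log_observation(writer, "Perplexity", query, "cited")

print(buffer.getvalue())
```

Using the same four labels for every row is what makes the cross-platform comparison possible later.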
Step 4: Using the API With the Hugging Face Spaces Tool
Once you know what you’re looking for, that’s when you scale. The tool we built on Hugging Face Spaces runs your query set directly against the LLM APIs, not through a browser interface.

The reason this matters is the same reason we use clean sessions. API calls are sent to the model without user context. The model isn’t seeing your data or your history. It’s the closest thing we have to what a random user would actually see.
Here’s how it works:
- Use your fan-out queries from step 2
- Send them to the tester within the app
- Select which platforms you want to test
- Add your brand name and entity
Adding your brand name matters because brands often get mentioned without being cited, and a mention still counts as visibility.

After you run it, the tool logs each response and returns results you can review and classify. Each result includes the platform, the query, the outcome, supporting evidence, and a diagnostic with suggestions.
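To make the shape of that loop concrete, here is a rough Python sketch. It is not the tool’s actual code. The Result fields, first_pass, and run_queries are hypothetical names, and each platform is represented by a plain function you would replace with a real API client. The point it illustrates is that every query goes out as a single prompt with no conversation history attached.

```python
from dataclasses import dataclass


@dataclass
class Result:
    platform: str
    query: str
    outcome: str   # first-pass label: cited, mentioned, or omitted
    evidence: str  # the sentence that triggered the label, if any


def first_pass(answer, brand, domain):
    """Return (outcome, evidence) for one answer using plain string checks.

    Paraphrased content cannot be caught this way, so "omitted" results
    still deserve a human read.
    """
    sentences = answer.split(". ")
    for sentence in sentences:
        if domain.lower() in sentence.lower():
            return "cited", sentence.strip()
    for sentence in sentences:
        if brand.lower() in sentence.lower():
            return "mentioned", sentence.strip()
    return "omitted", ""


def run_queries(queries, platforms, brand, domain):
    """Send every query to every platform as a fresh, history-free request.

    `platforms` maps a platform name to a callable that takes one prompt
    string and returns the answer text (a stand-in for a real API client).
    """
    results = []
    for name, ask in platforms.items():
        for query in queries:
            answer = ask(query)  # one prompt per call, no prior conversation
            outcome, evidence = first_pass(answer, brand, domain)
            results.append(Result(name, query, outcome, evidence))
    return results


# Fake platforms so the sketch runs without API keys.
fake_platforms = {
    "platform_a": lambda q: "Acme PM is often recommended. Others exist.",
    "platform_b": lambda q: "Several tools fit this need.",
}
for result in run_queries(
    ["best pm tool"], fake_platforms, "Acme PM", "acmepm.example"
):
    print(result)
```

Passing the platform clients in as functions keeps the loop testable offline, and it makes it obvious that nothing from a previous query leaks into the next one.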
Try it out! SMA Marketing’s Query Fan-Out Generator
The tool is free, but you do need your own API keys. They’re easy to get. You generate them in the developer console of each platform you want to test, such as OpenAI, Anthropic, Google, or Perplexity. At 10 to 15 queries per run, the API cost is minimal.
Step 5: Classify Your Results and Act
Once you have your results, group them into the four outcome types from step 3: cited, mentioned, paraphrased, or omitted.
Here’s what to do with the results that need action:
Omitted is your highest priority, especially in the compare and choose stages. Those are the highest-intent moments. If you’re not showing up there, you’re losing people who are already ready to make a decision. The fix is not keyword stuffing. Create substantive content built around that specific question, so the page answers the query directly.
Paraphrased content without attribution means your content is being used but not credited. Go back to that content and add clear entity markers. A strong H2, an explicit brand mention in the opening paragraph, and structured data on the page give the model something unambiguous to cite.
Inconsistency across platforms is a useful signal. If you’re being cited on Perplexity but omitted on ChatGPT, that points to a retrieval difference, often training data recency or source authority on that specific platform.
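One way to turn those rules into a worklist is sketched below in Python. The rank order and the helper names are assumptions on our part. We put omitted first and break ties in favor of the compare and choose stages, which follows the guidance above, but you may weigh mentioned and paraphrased differently.

```python
HIGH_INTENT_STAGES = {"compare", "choose"}

# Lower number = act on it sooner. This ordering is an assumption.
OUTCOME_RANK = {"omitted": 0, "paraphrased": 1, "mentioned": 2, "cited": 3}


def prioritize(results):
    """Sort results so omitted outcomes in high-intent stages come first."""
    def sort_key(result):
        high_intent = result["stage"] in HIGH_INTENT_STAGES
        return (OUTCOME_RANK[result["outcome"]], 0 if high_intent else 1)
    return sorted(results, key=sort_key)


def platform_gaps(results):
    """Return the queries whose outcome differs from platform to platform."""
    by_query = {}
    for result in results:
        by_query.setdefault(result["query"], {})[result["platform"]] = result["outcome"]
    return {
        query: outcomes
        for query, outcomes in by_query.items()
        if len(set(outcomes.values())) > 1
    }


# Made-up results for illustration only.
sample = [
    {"platform": "Perplexity", "query": "q1", "stage": "choose", "outcome": "cited"},
    {"platform": "ChatGPT", "query": "q1", "stage": "choose", "outcome": "omitted"},
    {"platform": "ChatGPT", "query": "q2", "stage": "learn", "outcome": "omitted"},
    {"platform": "Gemini", "query": "q3", "stage": "compare", "outcome": "paraphrased"},
]

for row in prioritize(sample):
    print(row["outcome"], row["stage"], row["platform"], row["query"])
print(platform_gaps(sample))
```

The gap report is the quick way to spot the “cited on one platform, omitted on another” pattern without rereading every answer.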
Key Takeaways and Action Plan
Think of this as a loop, not a one-time audit. Run it, fix it, run it again. Rerun the same queries two to four weeks after making changes and look for what shifted.
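If you keep each run as a simple mapping, spotting what shifted takes only a few lines. This Python sketch assumes you store outcomes keyed by platform and query. The diff_runs name and the sample data are hypothetical.

```python
def diff_runs(before, after):
    """Compare two runs keyed by (platform, query) and report outcome changes."""
    changes = {}
    for key, old_outcome in before.items():
        new_outcome = after.get(key)
        # Skip queries that were not rerun, and outcomes that did not move.
        if new_outcome is not None and new_outcome != old_outcome:
            changes[key] = (old_outcome, new_outcome)
    return changes


# Made-up outcomes from two runs a few weeks apart.
first_run = {
    ("ChatGPT", "best pm tool for CFOs"): "omitted",
    ("Perplexity", "best pm tool for CFOs"): "cited",
}
second_run = {
    ("ChatGPT", "best pm tool for CFOs"): "mentioned",
    ("Perplexity", "best pm tool for CFOs"): "cited",
}

for (platform, query), (old, new) in diff_runs(first_run, second_run).items():
    print(f"{platform} | {query}: {old} -> {new}")
```

The comparison only means something if the rerun uses the exact same query wording as the first run.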
Here’s where to start:
- Pick one seed intent that matters to your business.
- Run it through the fan-out generator and get 10 to 15 variants.
- Run one query manually across ChatGPT, Perplexity, and Gemini in a clean session.
- Log what you see.
- Load the rest into the Hugging Face Spaces tool and let the API do the work.
- Classify your outcomes and fix your content gaps.
- Track your improvements with automated monitoring tools like SE Visible and Keyword.com, covered in our previous article.
Manual testing tells you what’s happening right now. Those monitoring tools will show you what’s improving over time. You want both running together.
AI visibility testing is still new territory for most marketers, and most teams are flying blind. Getting this workflow in place now puts you ahead of the curve. If you’re looking for a partner to help you build it out, we’d love to help. Contact us to get started.
Until next time, happy marketing.