Building Anna: A ComfyUI Pipeline for Virtual Influencers

1 July 2025 | By Leonardo Bruni | Key Collaborator: Giulia Salerno | 12 min read

Generative AI, Virtual Influencer, ComfyUI, Flux

After spending months testing different image and video models, we finally cracked the code for a virtual influencer pipeline that actually works. No manual uploads, no weird hacks – just ComfyUI orchestrating everything while Flux handles the pretty pictures and Wan makes them move. The best part? It posts straight to Instagram and TikTok using their official APIs.

I'm sharing the high-level approach here because honestly, the concept matters more than the specific tools. This stuff changes so fast that whatever we're using today might be different next month, but the workflow principles stay solid.

How We Think About It

The whole system follows one simple rule: create once, distribute everywhere. Here's the breakdown:

Persona Development

This is like creating a brand bible for a person who doesn't exist. We define everything – personality, style, what she wears, what colors work, what she'd never say. Think of it as character development for a digital human.
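If it helps to picture it, a brand bible like this can live as structured data that prompt and caption templates read from. This is only a minimal sketch: the `Persona` class, its field names, and the example values are made up for illustration and are not Anna's actual definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical brand-bible record for a virtual character."""
    name: str
    personality: tuple[str, ...]
    wardrobe: tuple[str, ...]
    palette: tuple[str, ...]
    banned_phrases: tuple[str, ...] = ()

    def style_fragment(self) -> str:
        """Render the visual traits as a reusable prompt fragment."""
        outfit = ", ".join(self.wardrobe)
        colors = ", ".join(self.palette)
        return f"wearing {outfit}; color palette: {colors}"

    def violates_voice(self, text: str) -> bool:
        """True if the text contains something the character would never say."""
        lowered = text.lower()
        return any(phrase.lower() in lowered for phrase in self.banned_phrases)


# Illustrative values only.
anna = Persona(
    name="Anna",
    personality=("warm", "curious"),
    wardrobe=("linen blazer", "white sneakers"),
    palette=("sage green", "cream"),
    banned_phrases=("buy now",),
)
```

Freezing the dataclass is a deliberate choice here: the persona should change through an explicit edit to the brand bible, not by accident somewhere in the pipeline.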

Image Generation with Flux

Flux cranks out the hero shots – thumbnails, carousel images, and clean keyframes that lock in composition and styling. It's remarkably consistent once you dial in the prompts.

Video Creation with Wan

Take those keyframes, add some direction, and Wan turns them into smooth vertical videos. We keep everything short, punchy, and story-driven because that's what works on social.

Automated Publishing

ComfyUI triggers a small service that talks directly to Instagram and TikTok's official APIs. No sketchy browser automation or manual posting – everything's above board and scalable.

Performance Tracking

We pull metrics from both platforms, analyze what's working, and feed that back into the system. The prompts get better, the content gets more targeted, and the whole thing keeps improving.
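One simple way to close that loop is to rank prompt templates by engagement rate and favor the winners next week. The sketch below assumes each post record carries a `template_id` plus `views`, `likes`, `comments`, and `shares`; those key names are hypothetical, and the real fields depend on what each platform's API returns.

```python
from collections import defaultdict


def rank_templates(posts):
    """Rank prompt templates by aggregate engagement rate (best first).

    Each post is a dict with hypothetical keys:
    template_id, views, likes, comments, shares.
    """
    totals = defaultdict(lambda: {"views": 0, "interactions": 0})
    for post in posts:
        bucket = totals[post["template_id"]]
        bucket["views"] += post["views"]
        bucket["interactions"] += post["likes"] + post["comments"] + post["shares"]

    # Engagement rate = interactions per view; guard against zero views.
    rates = {
        template_id: (t["interactions"] / t["views"] if t["views"] else 0.0)
        for template_id, t in totals.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Aggregating before dividing keeps one low-view post from skewing a template's score.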

Why These Tools Work Together

We tried a bunch of different combinations before settling on this stack. Here's why these specific tools won out:

Flux for Static Content

Flux is strong at controlled, consistent imagery. When you need the same character to look the same across hundreds of posts, consistency beats creativity every time. It also handles composition well, which matters for thumbnails that need to grab attention.

Wan for Motion

Wan keeps the character looking like herself across video frames, which is harder than it sounds. Combined with solid keyframes from Flux, you get smooth motion that doesn't drift into weird territory halfway through a clip.

Together, they eliminate most of the guesswork. Flux establishes the visual language, Wan brings it to life.

The Technical Setup (Bird's Eye View)

I'm not going to get into specific node configurations because they change constantly, but here's how the pipeline flows:

Image Production Line

Prompt templates with personality and style locked in
Reference controls to keep the character consistent
Flux generation with automatic cropping for different platform formats
Output: thumbnails, carousel content, and video keyframes
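The automatic cropping step can be sketched as a centered crop to each target aspect ratio. This is an illustration, not our actual node setup: the format names and ratios in `FORMATS` are assumptions, so check each platform's current specs before relying on them.

```python
def center_crop_box(width, height, ratio_w, ratio_h):
    """Return (left, top, right, bottom) for the largest centered crop
    with aspect ratio ratio_w:ratio_h inside a width x height image."""
    # Integer cross-multiplication avoids float rounding when comparing ratios.
    if width * ratio_h > height * ratio_w:
        # Image is too wide: keep full height, trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Image is too tall (or already matches): keep full width, trim top/bottom.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# Hypothetical target formats.
FORMATS = {"vertical": (9, 16), "square": (1, 1), "portrait": (4, 5)}


def crop_boxes(width, height):
    """Compute one crop box per target format for a single source image."""
    return {
        name: center_crop_box(width, height, rw, rh)
        for name, (rw, rh) in FORMATS.items()
    }
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the boxes can be passed straight through if you crop with Pillow.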

Video Production Line

Hero keyframes from the image line plus scene direction
Wan generates 9:16 vertical videos (short clips work best)
Basic post-processing: stabilization, cleanup, watermarks if needed

Content Packaging

Caption templates that match the character's voice
Safety checks for brand compliance and content guidelines
Audio beds and sound effects where appropriate
Final output: ready-to-post MP4s with cover images
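The caption and safety steps above can be sketched with nothing more than the standard library. The template texts, the blocked terms, and the `max_length` default below are all placeholders for illustration; your own brand rules and each platform's current limits should replace them.

```python
from string import Template

# Hypothetical caption templates keyed by post type.
CAPTION_TEMPLATES = {
    "routine": Template("$greeting Here's my $topic for today. $cta"),
    "tip": Template("Quick tip: $topic. $cta"),
}

# Hypothetical brand-compliance blocklist.
BLOCKED_TERMS = ("guaranteed", "miracle")


def build_caption(post_type, **fields):
    """Fill a template; raises KeyError if a placeholder is left unfilled."""
    return CAPTION_TEMPLATES[post_type].substitute(**fields)


def safety_issues(caption, max_length=2200):
    """Return a list of problems; an empty list means the caption passes."""
    issues = []
    if len(caption) > max_length:
        issues.append(f"caption too long: {len(caption)} > {max_length}")
    lowered = caption.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            issues.append(f"blocked term: {term}")
    return issues
```

For example, `build_caption("tip", topic="steam your blazer", cta="Save this for later!")` produces a caption that `safety_issues` then either clears or flags before packaging continues.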

Publishing & Analytics

ComfyUI makes HTTP calls to our posting service
Service handles Instagram Graph API and TikTok API calls
Post IDs get logged for performance tracking
Weekly dashboards for content optimization
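The hand-off from ComfyUI to the posting service is just an HTTP POST with a JSON body. Here is a rough sketch of what that request could look like. The endpoint URL and every payload field name are invented for this example; our real service's contract is not shown here.

```python
import json
import urllib.request

# Hypothetical endpoint for the posting service.
POSTING_SERVICE_URL = "http://localhost:8000/publish"


def build_publish_request(video_url, cover_url, caption, platforms):
    """Build (but do not send) a 'ready to publish' request."""
    payload = {
        "video_url": video_url,      # packaged MP4
        "cover_url": cover_url,      # cover image
        "caption": caption,
        "platforms": list(platforms),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        POSTING_SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is one more call, for example `urllib.request.urlopen(request, timeout=10)`. Keeping the build step separate makes the payload easy to inspect and test without touching the network.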

Keeping It Legal and Clean

We use a simple Python microservice with proper OAuth tokens to handle all API calls. ComfyUI sends "ready to publish" payloads, and the service handles rate limits, retries, and error handling. Everything goes through official channels – no gray area automation or ToS violations.
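The retry handling can be sketched as exponential backoff with a little jitter. This is a generic pattern, not a copy of our service: `RetryableError` is a hypothetical exception standing in for whatever rate-limit or transient errors your API client raises, and the default delays are arbitrary.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for rate limits and transient failures."""


def call_with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RetryableError with exponential backoff.

    The sleep function is injectable so tests do not have to wait.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            # Base delays double each time: 1s, 2s, 4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Non-retryable errors (bad tokens, rejected content) deliberately fall straight through, since retrying those only burns rate limit.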

The Rules That Keep Us Sane

After making every mistake in the book, here are the guidelines that actually matter:

Visual Consistency

Keep a reference set of her best looks. Don't let the character drift – hair color, eye shape, and facial structure should stay locked in. Audiences notice when their favorite influencer suddenly looks different.
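One way to catch drift automatically is to compare each new image against the reference set using embeddings. The sketch below assumes you already have some face or image embedding model that turns an image into a list of floats; that model, and the 0.8 threshold, are assumptions for illustration and would need tuning on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def has_drifted(candidate, references, threshold=0.8):
    """True if the candidate is not close enough to any reference embedding."""
    best = max(cosine_similarity(candidate, ref) for ref in references)
    return best < threshold
```

Comparing against the best match rather than the average lets the reference set cover several legitimate looks without penalizing an image for matching only one of them.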

Content Strategy

Short beats work better than complex narratives. Wan handles 6-15 second clips beautifully, but longer content tends to get wobbly unless you're really specific about scene direction.
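If a concept needs more screen time than one clip comfortably holds, one possible approach is to plan it as several equal beats and generate each separately. This helper is a hypothetical illustration of that arithmetic; the 15-second default simply mirrors the upper bound mentioned above.

```python
import math


def split_into_clips(total_seconds, max_clip=15.0):
    """Split a target duration into equal-length clips no longer than max_clip."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_clip)
    return [total_seconds / count] * count
```

A 20-second idea becomes two 10-second beats, each of which gets its own keyframe and scene direction.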

Voice and Tone

Define how she talks and stick to it. We have templates for different post types, but the underlying personality stays consistent. Audiences follow people, not random content generators.

Quality Gates

Auto-flag weird hands, text artifacts, or anything that doesn't match brand guidelines before it goes live. It's way easier to catch problems in the pipeline than to deal with them after posting.
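A quality gate can be as simple as a list of check functions that each return a flag or nothing. In this sketch the detector scores (`hand_anomaly`, `text_artifact`), the `on_brand` field, and the 0.5 thresholds are all hypothetical; they stand in for whatever upstream detectors you actually run.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    flags: list = field(default_factory=list)


def check_hands(asset):
    # Assumes an upstream detector wrote a 0-1 "hand_anomaly" score.
    return "weird hands" if asset.get("hand_anomaly", 0.0) > 0.5 else None


def check_text_artifacts(asset):
    # Assumes an upstream detector wrote a 0-1 "text_artifact" score.
    return "text artifacts" if asset.get("text_artifact", 0.0) > 0.5 else None


def check_brand(asset):
    # Assumes a boolean "on_brand" field; missing means not yet judged off-brand.
    return None if asset.get("on_brand", True) else "off-brand styling"


CHECKS = (check_hands, check_text_artifacts, check_brand)


def run_quality_gates(asset, checks=CHECKS):
    """Run every check and collect all flags instead of stopping at the first."""
    flags = [flag for check in checks if (flag := check(asset)) is not None]
    return GateResult(passed=not flags, flags=flags)
```

Collecting every flag, rather than failing fast, gives the human reviewer the full picture in one pass.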

Content That Actually Performs

Through trial and error, we've found these formats work consistently:

Micro-stories: 10-20 seconds with a setup, reveal, and call-to-action
Routine content: Morning routines, outfit checks, quick tutorials – stuff people can relate to
Carousel tips: Clean, simple slides with useful information (Flux is perfect for this)
Response videos: Clean reaction shots that work for duets and replies

What We Learned the Hard Way

Here's what nobody tells you about virtual influencers:

Consistency beats novelty. Followers want to see the same person every time. That amazing new style you generated? Save it for a special occasion, not Tuesday's post.

Shorter is almost always better. Wan works great on brief clips. Stretch a single clip beyond 15 seconds and you'll spend more time fixing problems than creating content.

Cover images drive everything. A great Flux thumbnail can make or break your reach. It's worth spending time on getting those first frames perfect.

You still need a human. Automation handles the grunt work, but someone needs to review content, choose prompts, and shape the overall narrative. The best virtual influencers feel authentic because real people are making creative decisions.

Minimal Starting Stack

ComfyUI with identity control nodes
Flux for image generation
Wan for video generation
Simple API bridge service (Python/Node.js)
Object storage for assets and logs
Basic dashboard for performance metrics

The Ethics Side

Look, virtual influencers are weird territory. We try to keep things transparent and ethical:

Use original or properly licensed music and voices
Disclose that the influencer is AI-generated where appropriate
Have human moderation for all content before it goes live
Respect platform guidelines and API terms of service
Avoid misleading claims or impersonation of real people

The technology is powerful, but it comes with responsibility. Use it thoughtfully.

What You Get

When everything's working, you have a content engine that can produce daily posts across multiple platforms without burning out your creative team. The character stays consistent, the quality stays high, and you can scale up or down based on what's actually performing.

It's not magic – it's just good engineering applied to social media. But sometimes that's exactly what you need to cut through all the noise and build something that lasts.

Want help setting up something similar for your project? The basic framework is adaptable to different characters, brands, and use cases. Just keep the core principles intact: consistency, quality gates, and official APIs only.
