May 27, 2026·8 min read

One Submit Call, Five Outputs: How Botverse Workflows Replace Multi-Step Agent Pipelines

A conference recording needs to become an MP3, a timestamped transcript, structured meeting notes, an action items spreadsheet, and a thumbnail. In a typical agent pipeline that's five sequential steps with the transcript in context the whole time. In Botverse it's one workflow definition, submitted once.

A 20-minute all-hands recording. Five deliverables needed by end of day: an MP3 audio track for the archive, a timestamped VTT transcript, structured meeting notes in Markdown, an action items spreadsheet with timecodes, and a still frame from the moment the team discussed the product roadmap.

In a typical agent pipeline, this is a sequential dependency graph. Extract audio. Wait. Transcribe. Wait. Generate notes. Wait. Generate spreadsheet. Wait. Extract thumbnail. Four dependency waits in a row, with the full meeting transcript sitting in the model's context for every step after transcription completes.

With Botverse's workflow engine, it is one submit call.

The workflow definition

The Botverse Workflow Definition Language (BWDL) describes the entire pipeline as a JSON object. Each step declares its tool and its inputs. Steps declare their dependencies via depends_on. Steps with no unmet dependencies dispatch in parallel. The agent submits the definition once and polls for completion.

The depends_on pattern will be familiar to anyone who has worked with modern workflow orchestration systems. Argo Workflows and Netflix Conductor (both open source, Apache 2.0) use the same DAG-based dependency model. BWDL applies those concepts to MCP tool calls, making the same orchestration patterns accessible to any LLM agent without requiring the agent to manage infrastructure.

Here's how the conference call pipeline would be written:

{
  "workflow_id": "meeting-post-production",
  "version": "1.0",
  "description": "All-hands recording — five parallel deliverables",
  "params": {
    "source_url": "https://storage.company.com/recordings/q2-allhands.mp4",
    "action_point_timecode": "00:14:22"
  },
  "steps": [
    {
      "id": "extract_audio",
      "tool": "transcode_from_url",
      "inputs": {
        "source_url": "$.params.source_url",
        "output_format": "mp3"
      },
      "failure_mode": "HALT"
    },
    {
      "id": "transcribe",
      "tool": "transcribe_audio",
      "depends_on": ["extract_audio"],
      "inputs": {
        "source_url": "$.steps.extract_audio.output_url",
        "output_format": "vtt"
      },
      "failure_mode": "HALT"
    },
    {
      "id": "meeting_notes",
      "tool": "convert_from_url",
      "depends_on": ["transcribe"],
      "inputs": {
        "source_url": "$.steps.transcribe.output_url",
        "output_format": "md"
      },
      "failure_mode": "CONTINUE"
    },
    {
      "id": "action_items",
      "tool": "convert_from_url",
      "depends_on": ["transcribe"],
      "inputs": {
        "source_url": "$.steps.transcribe.output_url",
        "output_format": "xlsx"
      },
      "failure_mode": "CONTINUE"
    },
    {
      "id": "action_thumbnail",
      "tool": "transcode_from_url",
      "description": "Extract frame at action point discussion",
      "inputs": {
        "source_url": "$.params.source_url",
        "output_format": "gif",
        "options": {
          "start_time": "$.params.action_point_timecode",
          "duration": 3
        }
      },
      "failure_mode": "CONTINUE"
    }
  ]
}

The dependency graph this defines: extract_audio runs first. transcribe runs after it completes. Once transcription completes, meeting_notes and action_items dispatch simultaneously — both depend only on transcribe. action_thumbnail has no depends_on at all and dispatches immediately in parallel with everything else, reading directly from $.params.source_url.
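The dispatch order described above can be sketched in a few lines of Python. This illustrates DAG-level scheduling in general, not Botverse's actual scheduler. The `dispatch_waves` helper is hypothetical, and it simplifies by grouping steps into waves, whereas a real engine would presumably dispatch each step as soon as its own dependencies finish.

```python
def dispatch_waves(steps):
    """Group step ids into waves; each step's depends_on is satisfied by earlier waves."""
    deps = {s["id"]: set(s.get("depends_on", [])) for s in steps}
    done, waves = set(), []
    while len(done) < len(deps):
        # A step is ready when it has not run yet and all of its dependencies are done.
        ready = sorted(sid for sid, d in deps.items() if sid not in done and d <= done)
        if not ready:
            raise ValueError("cycle or unknown step id in depends_on")
        waves.append(ready)
        done.update(ready)
    return waves


# Only the fields that matter for ordering, copied from the definition above.
steps = [
    {"id": "extract_audio"},
    {"id": "transcribe", "depends_on": ["extract_audio"]},
    {"id": "meeting_notes", "depends_on": ["transcribe"]},
    {"id": "action_items", "depends_on": ["transcribe"]},
    {"id": "action_thumbnail"},
]

waves = dispatch_waves(steps)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The first wave contains extract_audio and action_thumbnail, the second contains transcribe alone, and the third contains meeting_notes and action_items.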

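Three of the five steps run concurrently with another step: action_thumbnail alongside extract_audio from the moment of submission, and meeting_notes and action_items alongside each other once transcription completes. The agent never had to think about that; the dependency declarations made it automatic.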

How the expression syntax works

The $.steps.extract_audio.output_url pattern is JSONPath-style expression resolution built into the BWDL runtime. When a step is ready to dispatch, Botverse evaluates any expression values in inputs against the current workflow state.

$.params.source_url resolves to the parameter passed at submission time. $.steps.extract_audio.output_url resolves to the pre-signed download URL of the completed audio extraction job. When that URL appears in a source_url input field, Botverse fetches the file server-side as part of job dispatch — the agent never downloads or re-uploads intermediate files.
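As a rough illustration of the resolution step, the sketch below walks a dotted `$.` path through a dictionary of workflow state. The state layout (`params` and `steps` keys), the `resolve` and `resolve_inputs` helpers, and the `example.invalid` output URL are all assumptions made for the example; the BWDL runtime's real internals are not described in this post.

```python
def resolve(value, state):
    """Resolve a '$.a.b.c' expression against workflow state; pass other values through."""
    if not (isinstance(value, str) and value.startswith("$.")):
        return value
    node = state
    for key in value[2:].split("."):
        node = node[key]  # KeyError here means the referenced value is not available yet
    return node


def resolve_inputs(inputs, state):
    """Resolve every expression in a (possibly nested) inputs object."""
    if isinstance(inputs, dict):
        return {k: resolve_inputs(v, state) for k, v in inputs.items()}
    return resolve(inputs, state)


# Hypothetical workflow state after extract_audio has completed.
state = {
    "params": {
        "source_url": "https://storage.company.com/recordings/q2-allhands.mp4",
        "action_point_timecode": "00:14:22",
    },
    "steps": {
        "extract_audio": {
            "status": "COMPLETED",
            "output_url": "https://example.invalid/q2-allhands.mp3",  # placeholder URL
        }
    },
}

transcribe_inputs = resolve_inputs(
    {"source_url": "$.steps.extract_audio.output_url", "output_format": "vtt"}, state
)
thumbnail_inputs = resolve_inputs(
    {
        "source_url": "$.params.source_url",
        "output_format": "gif",
        "options": {"start_time": "$.params.action_point_timecode", "duration": 3},
    },
    state,
)
print(transcribe_inputs)
print(thumbnail_inputs)
```

Literal values such as "vtt" and 3 pass through unchanged; only strings beginning with `$.` are treated as expressions.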

The expression syntax also supports conditional step execution via when. A step can declare "when": "$.steps.transcribe.status == 'COMPLETED'" and will be skipped rather than dispatched if the condition is false — useful for branching pipelines where some outputs are only needed if earlier steps succeed.
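To make the skip behavior concrete, here is a minimal sketch of how a `when` clause of the form quoted above might be evaluated. It handles only `<expression> == '<literal>'`; whatever else the BWDL expression grammar supports is not covered, and `should_dispatch`, `lookup`, and the state layout are hypothetical names chosen for the example.

```python
import re

# Matches only the simple form:  $.some.path == 'LITERAL'
_WHEN = re.compile(r"^\s*(\$\.[\w.]+)\s*==\s*'([^']*)'\s*$")


def lookup(expr, state):
    """Follow a '$.a.b.c' path through nested dictionaries."""
    node = state
    for key in expr[2:].split("."):
        node = node[key]
    return node


def should_dispatch(step, state):
    """True if the step has no 'when' clause or its equality condition holds."""
    when = step.get("when")
    if when is None:
        return True
    match = _WHEN.match(when)
    if match is None:
        raise ValueError(f"unsupported when expression: {when!r}")
    expr, literal = match.groups()
    return lookup(expr, state) == literal


step = {"id": "meeting_notes", "when": "$.steps.transcribe.status == 'COMPLETED'"}
state = {"steps": {"transcribe": {"status": "COMPLETED"}}}
print(should_dispatch(step, state))
```

With the transcribe status set to anything other than COMPLETED, the same call returns False and the step would be skipped.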

Token and time comparison

For a 20-minute recording producing roughly 2,600 words of transcript (~3,500 tokens), here is what the two approaches look like:

Sequential agent pipeline without a workflow engine:

  • Transcription API returns ~3,500 tokens of transcript text into the agent's context
  • Agent passes transcript to LLM for meeting notes generation: ~3,500 input + 1,000 output = 4,500 tokens, $0.026
  • Agent passes the full transcript again for action item extraction: ~3,500 input + 600 output = 4,100 tokens, $0.020
  • Separate tool calls for audio extraction and thumbnail run sequentially, waiting on each other
  • Total agent-side tokens: ~13,000–15,000 (transcript re-enters context for each downstream step)
  • Total wall clock time: each step waits for the previous — typically 12–18 minutes for a 20-minute recording

Botverse BWDL workflow:

  • Submit workflow definition: ~300 tokens
  • Status polls over the run (~10 polls): ~800 tokens total
  • Final result review: ~200 tokens
  • Total agent-side tokens: ~1,300
  • The transcript never enters the agent's context window at any point
  • Total wall clock time: audio extraction (2–3 min for 20-min video) + transcription (~1 min) + parallel conversion and thumbnail (10–15 sec) = approximately 4–5 minutes total

Metric                      | Sequential pipeline | Botverse BWDL
----------------------------|---------------------|--------------
Agent-side tokens           | ~14,000             | ~1,300
Transcript in agent context | Yes, every step     | Never
Agent LLM cost              | ~$0.07              | ~$0.004
Steps run in parallel       | None (sequential)   | 3 of 5
Wall clock time             | 12–18 minutes       | 4–5 minutes
Agent submit calls          | 5 (one per step)    | 1
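The per-step dollar figures can be reproduced with simple arithmetic. The sketch below assumes rates of $3 per million input tokens and $15 per million output tokens. That pricing is not stated in this post; it is an assumption chosen because it matches the quoted per-step costs. It also treats all BWDL-side tokens as input tokens, which is a simplification.

```python
# Assumed pricing (not stated in the post): USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def llm_cost(input_tokens, output_tokens=0):
    """Cost in USD for one LLM call at the assumed rates."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


notes_cost = llm_cost(3_500, 1_000)   # meeting notes generation
actions_cost = llm_cost(3_500, 600)   # action item extraction

bwdl_tokens = 300 + 800 + 200         # submit + status polls + final review
bwdl_cost = llm_cost(bwdl_tokens)     # simplification: counted as input tokens

print(f"notes:   ${notes_cost:.4f}")
print(f"actions: ${actions_cost:.4f}")
print(f"BWDL:    {bwdl_tokens} tokens, ${bwdl_cost:.4f}")
```

Under these assumptions the two LLM calls come to about $0.0255 and $0.0195, and the BWDL run to about $0.0039, in line with the rounded figures above. The sequential total in the table covers more than these two itemized calls, so the sketch does not attempt to reproduce it.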

Why this maps naturally to how agents think

The BWDL pattern fits how LLM agents reason about multi-step tasks. An agent asked to "process the all-hands recording and produce the standard post-meeting deliverables" can compose the workflow definition in a single turn, using its knowledge of what outputs are needed and where the source file lives. It then delegates the execution entirely and moves on to other tasks — checking the calendar, drafting the follow-up email, updating the task tracker.

The alternative — managing a five-step dependency graph through sequential tool calls, tracking intermediate job IDs, and passing transcript content into context repeatedly — is work that has no business being inside the model. It runs the risk of failing without clear errors, hitting a token or session limit mid-pipeline, or producing unpredictable results from an overloaded context window. It is orchestration logic, not reasoning. The workflow engine handles it so the agent does not have to.

Failure modes and partial results

The BWDL failure mode declarations are worth noting. Each conversion step in the example declares "failure_mode": "CONTINUE". If the meeting_notes Markdown conversion fails (say, the VTT transcript format was unexpected), the action_items spreadsheet and action_thumbnail steps still complete. The workflow reaches PARTIALLY_FAILED rather than FAILED, and the agent receives whichever outputs succeeded and structured feedback on what did not.

A partial result is almost always more useful than a full rollback. With a single failed conversion, the agent gets four of five deliverables and a structured error describing why the fifth failed, which it can act on, report to the user, or retry with adjusted parameters. The inline sequential approach offers none of this: a failure partway through the pipeline leaves the agent with no outputs and no clear recovery path.
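One way to picture how per-step failure modes roll up into a workflow status is the sketch below. The rule it encodes (a failed HALT step makes the workflow FAILED, while failures confined to CONTINUE steps make it PARTIALLY_FAILED) is an assumption inferred from the names and the behavior described here, not taken from the BWDL specification, and `workflow_status` is a hypothetical helper.

```python
def workflow_status(steps, step_status):
    """Derive an overall status from per-step results (assumed semantics)."""
    failed = [s for s in steps if step_status.get(s["id"]) == "FAILED"]
    if not failed:
        return "COMPLETED"
    if any(s.get("failure_mode") == "HALT" for s in failed):
        return "FAILED"            # assumption: a failed HALT step fails the whole workflow
    return "PARTIALLY_FAILED"      # only CONTINUE steps failed


# Failure modes copied from the example definition.
steps = [
    {"id": "extract_audio", "failure_mode": "HALT"},
    {"id": "transcribe", "failure_mode": "HALT"},
    {"id": "meeting_notes", "failure_mode": "CONTINUE"},
    {"id": "action_items", "failure_mode": "CONTINUE"},
    {"id": "action_thumbnail", "failure_mode": "CONTINUE"},
]

results = {s["id"]: "COMPLETED" for s in steps}
results["meeting_notes"] = "FAILED"

succeeded = [sid for sid, status in results.items() if status == "COMPLETED"]
print(workflow_status(steps, results), succeeded)
```

With only meeting_notes failing, the status is PARTIALLY_FAILED and four step outputs remain available.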

What ships today

The workflow engine is live. submit_workflow, get_workflow_status, and cancel_workflow are all available via MCP today. The full BWDL specification — including expression syntax, conditional steps, retry configuration, and failure modes — is documented at botverse.cloud/docs/workflows.
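From the agent side, the submit-then-poll loop might look roughly like the sketch below. Only the tool names submit_workflow and get_workflow_status and the COMPLETED, PARTIALLY_FAILED, and FAILED statuses come from this post. The argument names, the `run_id` field, the RUNNING status, and the `call_tool` callable (a stand-in for whatever MCP client the agent uses) are assumptions; the documentation linked above is the authority on the real shapes.

```python
import time

TERMINAL = {"COMPLETED", "PARTIALLY_FAILED", "FAILED"}


def run_workflow(call_tool, definition, poll_seconds=3.0, max_polls=200, sleep=time.sleep):
    """Submit a workflow once, then poll until it reaches a terminal status."""
    submitted = call_tool("submit_workflow", {"definition": definition})  # assumed argument name
    run_id = submitted["run_id"]                                          # assumed response field
    for _ in range(max_polls):
        status = call_tool("get_workflow_status", {"run_id": run_id})
        if status["status"] in TERMINAL:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"workflow {run_id} did not finish after {max_polls} polls")


def fake_call_tool(name, arguments):
    """In-memory stand-in for an MCP client, so the sketch runs without a server."""
    if name == "submit_workflow":
        return {"run_id": "run-demo"}
    fake_call_tool.polls += 1
    finished = fake_call_tool.polls >= 3
    return {"status": "COMPLETED" if finished else "RUNNING"}


fake_call_tool.polls = 0

result = run_workflow(
    fake_call_tool,
    {"workflow_id": "meeting-post-production"},
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result["status"], fake_call_tool.polls)
```

Passing the tool-calling function in as a parameter keeps the loop independent of any particular MCP client library.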

Supported tools in workflows today: transcode_from_url, transcode_video, convert_content, convert_from_url, and convert_file. transcribe_audio — used in the conference pipeline example above — is in development and will be the next capability added to the workflow engine.

The conference call pipeline is one of the most common patterns we see across industries: ingest a media asset, produce multiple derivative formats from it in parallel, deliver the outputs. The job of the agent in that pattern is to define what is needed and act on what comes back. Botverse handles everything in between.

Ready to connect your agent to Botverse? Set up in five minutes. No contracts, no minimums.