FinAgent and Beyond: Multimodal Foundation Agents for Trading
Financial markets generate data in forms that most AI systems only partially understand. Prices come as time series. Sentiment lives in earnings calls, SEC filings, and Twitter threads. Chart patterns exist as images. A human trader synthesizes all of these modalities simultaneously, and until recently, no single AI system could do the same.
FinAgent changed that. Introduced by Zhang et al. in early 2024, FinAgent is a multimodal foundation agent that integrates numerical, textual, and visual market data into a unified reasoning loop, reporting an improvement of over 36% in cumulative profit against prior state-of-the-art methods on standard benchmarks. What makes it interesting is not just the performance but the architectural patterns it introduces: dual-level reflection, structured memory retrieval, and modality-specific processing chains that generalize well beyond trading.
The Multimodal Input Pipeline
Most quantitative trading systems focus on one modality. Statistical models work with price and volume data. NLP systems analyze news and filings. FinAgent processes three modalities in parallel and fuses them before decision-making.
Numerical Data
Price, volume, and technical indicators (RSI, MACD, Bollinger Bands) are encoded into structured representations. FinAgent does not simply dump raw numbers into the LLM context. Instead, it pre-processes them into natural language summaries augmented with statistical signals.
Textual Data
News articles, analyst reports, social media sentiment, and earnings call transcripts are processed through a text encoder. Key entities, sentiment scores, and temporal relevance are extracted before being passed to the reasoning module.
Visual Data
This is the distinctive part. Candlestick charts, technical analysis patterns, and market heatmaps are processed through a vision encoder. The model learns to recognize patterns like head-and-shoulders, double bottoms, and support/resistance levels directly from chart images, the same way a technical analyst would.
```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class MarketObservation:
    """Unified market observation combining all modalities."""
    timestamp: str
    numerical: dict               # price, volume, indicators
    textual: List[str]            # news headlines, sentiment snippets
    visual: Optional[np.ndarray]  # chart image as array
    metadata: dict                # ticker, timeframe, source info


@dataclass
class ModalityEncoding:
    """Encoded representation from each modality processor."""
    numerical_summary: str  # LLM-friendly text summary of indicators
    text_analysis: str      # sentiment and entity extraction results
    visual_patterns: str    # detected chart patterns and confidence scores
    combined_signal: str    # fused reasoning prompt


def encode_observation(obs: MarketObservation) -> ModalityEncoding:
    num_summary = summarize_indicators(obs.numerical)
    text_analysis = analyze_sentiment(obs.textual)
    visual_patterns = detect_chart_patterns(obs.visual)
    combined = (
        f"Market state for {obs.metadata['ticker']} at {obs.timestamp}:\n"
        f"Technical: {num_summary}\n"
        f"Sentiment: {text_analysis}\n"
        f"Chart patterns: {visual_patterns}"
    )
    return ModalityEncoding(
        numerical_summary=num_summary,
        text_analysis=text_analysis,
        visual_patterns=visual_patterns,
        combined_signal=combined,
    )
```
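The helpers above (`summarize_indicators`, `analyze_sentiment`, `detect_chart_patterns`) stand in for the modality encoders; the paper does not prescribe their implementations. As one illustration, `summarize_indicators` could render raw indicator values as the natural-language summary the reasoner consumes. A minimal sketch, assuming a dict with keys like `close`, `rsi`, and `macd` (an illustrative schema, not FinAgent's):

```python
def summarize_indicators(numerical: dict) -> str:
    """Render raw indicator values as an LLM-friendly text summary.

    Assumes keys like 'close', 'rsi', 'macd' -- an illustrative schema,
    not one prescribed by the FinAgent paper.
    """
    parts = [f"close={numerical.get('close')}"]
    rsi = numerical.get("rsi")
    if rsi is not None:
        if rsi > 70:
            parts.append(f"RSI {rsi:.0f} (overbought)")
        elif rsi < 30:
            parts.append(f"RSI {rsi:.0f} (oversold)")
        else:
            parts.append(f"RSI {rsi:.0f} (neutral)")
    macd = numerical.get("macd")
    if macd is not None:
        parts.append("MACD bullish" if macd > 0 else "MACD bearish")
    return ", ".join(parts)

summary = summarize_indicators({"close": 101.2, "rsi": 75, "macd": 0.4})
```

The point is not the thresholds, which any real system would tune, but the shape of the output: a compact sentence the LLM can reason over alongside text and chart signals.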
The key insight is that each modality is processed by a specialist encoder, then fused into a unified context for the LLM reasoner. This follows a retrieval-then-reasoning pattern where signals from different sources are orchestrated before the final decision step.
Dual-Level Reflection Mechanism
Reflection (the ability of an agent to evaluate and correct its own reasoning) is where FinAgent's architecture stands out. Most agent systems either have no reflection or use a simple self-critique prompt. FinAgent implements two distinct reflection levels.
Low-Level Reflection
After each trading action, the agent reviews the immediate outcome. Did the trade execute at the expected price? Was the position size appropriate? Were there signals the agent missed? This is a fast, tactical feedback loop that adjusts behavior within a trading session.
High-Level Reflection
Periodically (e.g., at the end of each trading day or week), the agent conducts a strategic review. It examines patterns across multiple trades: which modalities contributed the most to profitable decisions, which market regimes caused losses, and whether the overall strategy needs adjustment.
```python
from enum import Enum


class ReflectionLevel(Enum):
    TACTICAL = "tactical"
    STRATEGIC = "strategic"


@dataclass
class ReflectionResult:
    level: ReflectionLevel
    analysis: str
    adjustments: List[dict]
    confidence_delta: float


def tactical_reflect(action: dict, outcome: dict, context: str) -> ReflectionResult:
    """Low-level reflection after each trade."""
    prompt = f"""Review this trading action and its outcome.
Action: {action}
Outcome: {outcome}
Market context: {context}

Analyze:
1. Was the entry/exit timing appropriate?
2. Which modality signals were most predictive?
3. What would you do differently?
"""
    analysis = llm_call(prompt)
    adjustments = extract_adjustments(analysis)
    return ReflectionResult(
        level=ReflectionLevel.TACTICAL,
        analysis=analysis,
        adjustments=adjustments,
        confidence_delta=compute_confidence_shift(outcome),
    )


def strategic_reflect(trade_history: List[dict], period: str) -> ReflectionResult:
    """High-level reflection over a trading period."""
    summary = summarize_trades(trade_history)
    prompt = f"""Strategic review for {period}.
Trade summary: {summary}

Evaluate:
1. Overall strategy effectiveness
2. Market regime detection accuracy
3. Modality weighting - should we trust charts more or less?
4. Risk management adherence
"""
    analysis = llm_call(prompt)
    adjustments = extract_strategy_updates(analysis)
    return ReflectionResult(
        level=ReflectionLevel.STRATEGIC,
        analysis=analysis,
        adjustments=adjustments,
        confidence_delta=compute_strategy_shift(trade_history),
    )
```
This dual-level design mirrors how experienced traders operate. You make quick adjustments during the day and longer-term strategy shifts over weeks and months. The +36% profit improvement over prior methods comes partly from this: the agent gets better within a session and across sessions.
Memory Retrieval System
FinAgent maintains an episodic memory of past trading decisions and their outcomes. When facing a new market situation, the agent retrieves similar past episodes to inform its reasoning. This is essentially a RAG system specialized for financial decision-making.
The memory retrieval works across modalities. A current chart pattern can trigger retrieval of past situations with similar visual signatures, even if the numerical indicators were different. The similarity search relies on the same embedding and cosine-distance principles that power vector databases in other domains, but here applied to a decision-making context rather than document search.
```python
from typing import Tuple


@dataclass
class TradingEpisode:
    observation: ModalityEncoding
    action: dict
    outcome: dict
    reflection: ReflectionResult
    embedding: np.ndarray


class TradingMemory:
    def __init__(self, capacity: int = 10000):
        self.episodes: List[TradingEpisode] = []
        self.capacity = capacity

    def store(self, episode: TradingEpisode):
        self.episodes.append(episode)
        if len(self.episodes) > self.capacity:
            # Evict oldest episodes once capacity is exceeded
            self.episodes = self.episodes[-self.capacity:]

    def retrieve(
        self, query_embedding: np.ndarray, top_k: int = 5
    ) -> List[Tuple[TradingEpisode, float]]:
        """Retrieve most similar past trading episodes."""
        similarities = [
            (ep, cosine_similarity(query_embedding, ep.embedding))
            for ep in self.episodes
        ]
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]
```
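The `cosine_similarity` helper is left unspecified above. It is a few lines of NumPy, and the top-k retrieval it drives can be shown end to end on toy embeddings (the episode names and vectors here are purely illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy episode embeddings; a real system would embed the full
# ModalityEncoding of each past trade.
episodes = {
    "breakout_2023": np.array([1.0, 0.0, 0.0]),
    "reversal_2022": np.array([0.0, 1.0, 0.0]),
    "chop_2021":     np.array([0.6, 0.8, 0.0]),
}
query = np.array([0.9, 0.1, 0.0])

ranked = sorted(
    episodes.items(),
    key=lambda kv: cosine_similarity(query, kv[1]),
    reverse=True,
)
top_k = [name for name, _ in ranked[:2]]  # -> ["breakout_2023", "chop_2021"]
```

At FinAgent's memory sizes a brute-force scan like this is fine; beyond a few hundred thousand episodes you would swap in an approximate nearest-neighbor index, exactly as vector databases do.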
Extensions: Agent Swarms for HFT and DeFi
The FinAgent architecture is designed as a single-agent system, but the patterns decompose naturally into multi-agent swarms. This matters for production trading systems that need to scale across instruments and strategies.
Specialist Agent Teams
Instead of one agent processing all modalities, you can deploy specialist agents that each own one modality and a coordinator agent that fuses their signals. This maps directly to the orchestrator-plus-specialists pattern common in multi-agent systems.
A numerical analyst agent monitors technical indicators and price action. A news agent processes real-time text feeds and extracts actionable signals. A chart pattern agent runs visual analysis on candlestick data. A risk management agent enforces position sizing rules and drawdown limits. The coordinator weighs their signals and executes trades.
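A minimal sketch of that coordinator, assuming each specialist emits a directional signal in [-1, 1] plus a self-assessed confidence, and the coordinator takes a confidence-weighted average (the names and weighting scheme are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class SpecialistSignal:
    signal: float      # -1 (strong sell) .. +1 (strong buy)
    confidence: float  # 0 .. 1, how much the specialist trusts its own read

def fuse_signals(signals: Dict[str, SpecialistSignal]) -> float:
    """Confidence-weighted average of specialist signals."""
    total_weight = sum(s.confidence for s in signals.values())
    if total_weight == 0:
        return 0.0  # no specialist is confident: stay flat
    weighted = sum(s.signal * s.confidence for s in signals.values())
    return weighted / total_weight

signals = {
    "numerical": SpecialistSignal(signal=0.6, confidence=0.9),
    "news":      SpecialistSignal(signal=-0.2, confidence=0.5),
    "chart":     SpecialistSignal(signal=0.8, confidence=0.7),
}
decision = fuse_signals(signals)  # positive -> net long bias
```

In practice the weights themselves would be updated by strategic reflection, so that a modality that has been predictive lately earns more say in the fused decision; a separate risk agent would still veto position sizes regardless of the fused signal.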
Reinforcement Learning Integration
FinAgent's reflection mechanism is a form of self-supervised improvement, but you can go further by integrating explicit RL. The agent's trading actions become an RL policy, the market provides the reward signal, and the reflection mechanism becomes part of the value estimation.
The challenge, as always with RL in finance, is non-stationarity. Markets change regimes, and a policy learned in a bull market can be catastrophic in a crash. The dual-level reflection helps here by detecting regime changes at the strategic level and triggering policy adaptation. Hardware approaches like neuromorphic computing explore similar ideas of continuous adaptation at the architectural level, though applied to chip design rather than trading.
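One simple proxy for strategic-level regime detection is a volatility break: when short-window realized volatility jumps well above its longer-window level, flag a regime change and shrink or pause the learned policy. A minimal sketch under that assumption (the windows and threshold are illustrative, not from the paper):

```python
import numpy as np

def regime_changed(returns: np.ndarray, short: int = 10, long: int = 60,
                   ratio_threshold: float = 2.0) -> bool:
    """Flag a regime change when short-window volatility exceeds
    long-window volatility by more than ratio_threshold."""
    if len(returns) < long:
        return False  # not enough history to compare windows
    short_vol = np.std(returns[-short:])
    long_vol = np.std(returns[-long:])
    return long_vol > 0 and short_vol / long_vol > ratio_threshold

# Calm, low-amplitude returns followed by a volatility spike.
calm = np.tile([0.01, -0.01], 30)   # 60 returns, std 0.01
spike = np.tile([0.05, -0.05], 5)   # 10 returns, std 0.05
```

A detector like this is deliberately crude; the point is that a cheap, always-on statistical tripwire can decide *when* to invoke the expensive LLM-based strategic reflection, rather than running it on a fixed schedule alone.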
DeFi and On-Chain Applications
In decentralized finance, agent swarms can provide liquidity, execute arbitrage, and manage yield farming strategies across multiple protocols simultaneously. The multimodal aspect extends to on-chain data (transaction graphs, liquidity pool states) plus off-chain signals (governance proposals, social sentiment).
The architectural patterns are the same: modality-specific processing, memory retrieval for similar past market conditions, and dual-level reflection for continuous improvement. The differences lie in execution: on-chain actions must be atomic and gas-efficient, and the adversarial environment is far more hostile than in traditional markets.
Key Takeaways
- FinAgent demonstrates that multimodal foundation agents can outperform specialized single-modality trading systems by fusing numerical, textual, and visual data in a unified reasoning loop.
- The dual-level reflection mechanism, with tactical post-trade review and strategic periodic assessment, is a general-purpose pattern applicable to any agent that needs to improve over time.
- Memory retrieval for decision-making (episodic RAG) enables agents to learn from past experiences without full retraining, bridging the gap between static models and adaptive systems.
- Single-agent architectures decompose naturally into multi-agent swarms by assigning modality specialists and a coordinator, following the orchestrator pattern familiar from distributed AI systems.
- Reinforcement learning integration is the logical next step, but requires careful handling of non-stationarity and regime changes in financial markets.
- DeFi and on-chain applications extend the same patterns to adversarial, high-speed environments where execution atomicity and gas efficiency become critical constraints.