Prediction markets are commonly treated as mechanisms for information aggregation. Under this view, market prices represent continuously updated estimates of event likelihood, formed through the interaction of dispersed participants processing public and private signals. In the strongest interpretation, deviations between price and “true” probability should be rapidly arbitraged away. In practice, however, publicly available information does not reach participants in a uniform, atomized, and simultaneously interpretable form. Rather, it emerges through bursts of coverage, repeated framing, semantic drift, selective amplification, and attention bottlenecks.
This paper begins from a simple observation: information reaches markets not only as facts, but as narratives. These narratives possess measurable structure. They have onset, acceleration, saturation, and decay. They cluster around entities, events, and outcomes. They may intensify faster than market prices respond, particularly in markets characterized by thin liquidity, fragmented participation, semantic mismatch between contract wording and real-world events, or asynchronous trader attention. The result is a class of opportunities best described not as purely probabilistic disagreements, but as instances of information lag.
The framework proposed here treats prediction markets as dynamic response systems subject to latency. Instead of asking only whether a quoted price is “correct,” the system asks whether the informational environment relevant to that contract is evolving faster than the contract is repricing. The proposed construct for measuring this informational environment is called narrative velocity.
Let \( p_m(t) \) denote the observed market price of contract \( m \) at time \( t \). The standard interpretation treats \( p_m(t) \) as a probability estimate, or at minimum as a sufficient statistic for the current state of consensus belief. We instead decompose the price more carefully, first writing it as a function of an underlying belief state \( B_m(t) \),
where \( B_m(t) \) is the aggregate belief state relevant to market \( m \). We then model belief itself as a function of information flow:
so that:
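Written out, the decomposition described here can take the following form. The map names \( f \) (belief to price) and \( g \) (information flow to belief) are notation assumed for this sketch; the text itself fixes only \( p_m(t) \), \( B_m(t) \), and \( I_m(t) \).

\[
p_m(t) = f\big(B_m(t)\big), \qquad
B_m(t) = g\big(I_m(t)\big), \qquad \text{so that} \qquad
p_m(t) = f\big(g(I_m(t))\big).
\]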
The critical claim is that \( I_m(t) \) is not directly equivalent to “the existence of public information.” It is instead the result of how information is produced, repeated, interpreted, and linked to a market question. Public information may exist without yet becoming high-velocity narrative information. Likewise, narrative pressure may increase before its full implications are transmitted into price.
Narrative velocity is intended to capture the effective rate at which market-relevant information is forming in the surrounding news field. It is not merely article count, sentiment, or topic frequency. It is a weighted aggregate sensitive to time, recurrence, and semantic linkage.
where:
This specification attempts to preserve three intuitively important properties. First, a signal should matter less as it becomes stale. Second, repeated mention across independent or semi-independent sources should increase significance, although not necessarily linearly. Third, a signal should matter only to the extent that it is genuinely linked to the market under evaluation. The semantic linkage term \( r_{im} \) is therefore crucial; it guards against noisy topical overlap and rewards high-specificity alignment between event language and contract language.
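The three properties above can be sketched in code. The functional form used here is an assumption, not the paper's stated specification: \( V_m(t) = \sum_i e^{-\lambda (t - t_i)} \, \log(1 + c_i) \, r_{im} \), where \( V_m \), \( \lambda \), \( t_i \) (arrival time), and \( c_i \) (repetition count) are hypothetical symbols and only \( r_{im} \) comes from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Signal:
    """One news item linked to a market (all field names are illustrative)."""
    timestamp: float   # arrival time t_i, in hours
    mentions: int      # c_i: number of sources repeating the item
    relevance: float   # r_im in [0, 1]: semantic linkage to the contract


def narrative_velocity(signals, now, decay_rate=0.1):
    """Assumed form: sum_i exp(-decay_rate * age_i) * log(1 + c_i) * r_im."""
    total = 0.0
    for s in signals:
        age = now - s.timestamp
        if age < 0:
            continue  # skip items stamped after the evaluation time
        recency = math.exp(-decay_rate * age)   # stale signals count less
        repetition = math.log1p(s.mentions)     # grows sublinearly with repeats
        total += recency * repetition * s.relevance
    return total


signals = [
    Signal(timestamp=0.0, mentions=1, relevance=0.9),    # old, on-topic
    Signal(timestamp=20.0, mentions=5, relevance=0.9),   # recent, repeated, on-topic
    Signal(timestamp=23.0, mentions=8, relevance=0.1),   # very recent, weakly linked
]
score = narrative_velocity(signals, now=24.0)
```

Because the three factors multiply, a heavily repeated but weakly linked item contributes little, which is the guard against noisy topical overlap that the relevance term is meant to provide.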
Narrative information alone is insufficient for ranking markets. A contract may be tightly connected to a rapidly forming narrative and yet remain impractical to trade due to weak liquidity, negligible volume, or already efficient pricing. For this reason, we define a separate market-structure term:
where:
This term is not intended as a full microstructure model. Rather, it is a tractable approximation of market responsiveness, tradability, and current attention. Logarithmic scaling of volume and liquidity prevents very large markets from dominating the score while preserving their relative ordering.
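A minimal sketch of such a term follows, assuming it is a weighted sum of log-scaled volume and liquidity. The function name, the weights, and the use of `log1p` are hypothetical choices; the text fixes only that volume and liquidity enter logarithmically.

```python
import math


def market_structure_score(volume, liquidity, w_volume=0.5, w_liquidity=0.5):
    """Assumed form: w_volume * log(1 + volume) + w_liquidity * log(1 + liquidity)."""
    if volume < 0 or liquidity < 0:
        raise ValueError("volume and liquidity must be non-negative")
    return w_volume * math.log1p(volume) + w_liquidity * math.log1p(liquidity)


# Two hypothetical markets that differ by a factor of 1000 in size.
thin = market_structure_score(volume=1_000, liquidity=500)
deep = market_structure_score(volume=1_000_000, liquidity=500_000)
ratio = deep / thin
```

Under these assumptions the thousandfold larger market scores only about twice as high, which illustrates the dampening effect of the logarithm.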
The system then defines a composite signal:
for tunable weights \( \alpha \) and \( \beta \). The central hypothesis is not simply that higher \( S_m(t) \) implies better trading opportunities, but that a particular dynamic condition is especially informative:
When the derivative of the narrative field significantly exceeds the derivative of market price, the market may be in a transient lag regime. Put differently, the surrounding information environment is accelerating faster than the market’s estimate is updating. This is the core condition under which the system expects informational edge to emerge.
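In symbols, and under the assumption that the composite is a linear combination, the two statements above can be written as follows. The labels \( V_m(t) \) for narrative velocity and \( M_m(t) \) for the market-structure term are assumed here; \( S_m(t) \), \( \alpha \), \( \beta \), and \( p_m(t) \) come from the text.

\[
S_m(t) = \alpha \, V_m(t) + \beta \, M_m(t),
\qquad
\frac{dV_m}{dt} \gg \frac{dp_m}{dt}.
\]

Since \( V_m \) and \( p_m \) live on different scales, any implementation of the second condition would presumably need to standardize both series before comparing their rates of change.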
In practical terms, the framework distinguishes between at least three cases. First, there are markets with low narrative signal and weak market structure; these should generally be ignored. Second, there are markets with strong narrative signal but already strong repricing; these may be informationally interesting but no longer attractive. Third, there are markets in which narrative support is increasing while price remains comparatively muted. These are the principal candidates for ranking as high-conviction opportunities.
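The three cases can be sketched as a small classifier over finite differences. Everything below is illustrative: the function names, the labels, the thresholds, and the assumption that the narrative and price series have already been standardized to comparable units.

```python
def rate_of_change(series, dt=1.0):
    """Backward finite difference over the last two observations."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return (series[-1] - series[-2]) / dt


def classify_market(narrative, price, structure,
                    min_narrative_rate=0.5, min_structure=1.0, lag_margin=0.25):
    """Return 'ignore', 'repriced', or 'lag_candidate' (hypothetical labels).

    `narrative` and `price` are assumed to be standardized time series;
    `structure` is a scalar market-structure score.
    """
    narrative_rate = rate_of_change(narrative)
    # Repricing in either direction counts as the market responding.
    price_rate = abs(rate_of_change(price))
    if narrative_rate < min_narrative_rate or structure < min_structure:
        # Case 1, read strictly: either weakness alone disqualifies the market.
        return "ignore"
    if narrative_rate - price_rate > lag_margin:
        # Case 3: narrative is accelerating while price stays muted.
        return "lag_candidate"
    # Case 2: price is already moving roughly as fast as the narrative.
    return "repriced"


label = classify_market(narrative=[0.2, 1.4], price=[0.1, 0.2], structure=6.5)
```

Treating either a weak narrative or weak market structure as disqualifying is a stricter reading than the first case as worded; it reflects the earlier point that a strongly linked contract can still be impractical to trade.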
This differs from conventional browsing of prediction market boards, where users often scan by volume, popularity, or raw probability. Here, the target is not popularity but lag-adjusted responsiveness. The system is therefore less a screener than a ranking engine over a coupled information-price surface.
The semantic relevance term may be implemented in a lightweight way using weighted keyword matching, phrase overlap, and stopword suppression, or in a richer way using embeddings or transformer-based similarity. The minimal version already provides useful structure when contract wording is sufficiently explicit and narratives remain lexically anchored.
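A minimal sketch of the lightweight variant follows, assuming relevance is the weighted share of contract keywords found in a headline, blended with the share of contract two-word phrases that also appear. The stopword list, the per-keyword weights, and the 0.7/0.3 blend are hypothetical placeholders.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "by",
             "will", "be", "before", "and", "or", "is"}


def tokenize(text):
    """Lowercase, keep alphanumeric tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def bigrams(tokens):
    """Adjacent token pairs, formed after stopword removal."""
    return set(zip(tokens, tokens[1:]))


def semantic_relevance(contract_text, headline, keyword_weights=None):
    """Return a score in [0, 1]; higher means tighter lexical linkage."""
    keyword_weights = keyword_weights or {}
    contract_tokens = tokenize(contract_text)
    headline_tokens = tokenize(headline)
    if not contract_tokens or not headline_tokens:
        return 0.0
    contract_vocab = set(contract_tokens)
    shared = contract_vocab & set(headline_tokens)
    total = sum(keyword_weights.get(t, 1.0) for t in contract_vocab)
    if total <= 0:
        return 0.0
    keyword_score = sum(keyword_weights.get(t, 1.0) for t in shared) / total
    contract_phrases = bigrams(contract_tokens)
    if contract_phrases:
        phrase_score = len(contract_phrases & bigrams(headline_tokens)) / len(contract_phrases)
    else:
        phrase_score = 0.0
    return 0.7 * keyword_score + 0.3 * phrase_score


contract = "Will the Fed cut rates before July?"
on_topic = semantic_relevance(contract, "Fed signals it may cut rates as inflation cools")
off_topic = semantic_relevance(contract, "Local team wins championship")
```

Here the on-topic headline matches three of four contract keywords and one of three contract phrases, while the off-topic headline scores zero. An embedding-based variant would replace the body of `semantic_relevance` but could keep the same interface.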
Exponential decay is adopted for simplicity and interpretability, although alternative kernels may be appropriate. A sharper decay function emphasizes breaking developments; a flatter one captures slow-moving structural narratives. Parameter selection is ultimately an empirical question that should be calibrated to the cadence of the target market universe.
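As a small illustration of how the kernel setting shifts emphasis, the sketch below parameterizes exponential decay by half-life, an assumed convenience with \( \lambda = \ln 2 / h \), and compares a sharp and a flat setting. The specific half-lives are arbitrary.

```python
import math


def decay_weight(age_hours, half_life_hours):
    """Exponential kernel exp(-lambda * age) with lambda = ln(2) / half_life."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * max(age_hours, 0.0))


# Weight of a 24-hour-old signal under two hypothetical settings.
sharp = decay_weight(24, half_life_hours=6)    # four half-lives: 2 ** -4 = 0.0625
flat = decay_weight(24, half_life_hours=72)    # a third of a half-life: about 0.794
```

With the sharp kernel a day-old item has almost no influence, so only breaking developments register; with the flat kernel the same item keeps most of its weight.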
Repetition across titles, sources, or paraphrased variants is treated as a reinforcement term rather than as pure duplication. This is important because narratives are often socially amplified through repeated framing. However, naïvely counting repeated text risks overweighting syndication. A more mature version of the system would therefore cluster near-duplicate events before applying frequency weights.
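The clustering step suggested above can be sketched as follows, assuming near-duplicates are detected by Jaccard similarity over word bigrams and grouped greedily. The threshold, the shingle size, and the logarithmic frequency weight are hypothetical choices, not the system's actual method.

```python
import math
import re


def shingles(text, size=2):
    """Set of word n-grams used as a cheap fingerprint of a headline."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Intersection over union of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(headlines, threshold=0.5):
    """Greedy single pass: join the first cluster whose representative is similar enough."""
    clusters = []  # each entry: (representative fingerprint, member headlines)
    for headline in headlines:
        fingerprint = shingles(headline)
        for representative, members in clusters:
            if jaccard(fingerprint, representative) >= threshold:
                members.append(headline)
                break
        else:
            clusters.append((fingerprint, [headline]))
    return [members for _, members in clusters]


def repetition_weight(headlines, threshold=0.5):
    """Sublinear weight on the number of distinct clusters, not raw article count."""
    return math.log1p(len(cluster_near_duplicates(headlines, threshold)))


headlines = [
    "Senate passes budget bill after late vote",
    "Senate passes budget bill after late-night vote",      # syndicated variant
    "Analysts expect budget bill to face court challenge",  # distinct framing
]
clusters = cluster_near_duplicates(headlines)
weight = repetition_weight(headlines)
```

In this example the first two headlines collapse into one cluster, so the frequency weight reflects two distinct framings instead of three articles.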
Several limitations should be stated clearly. First, not all narrative acceleration is informative; some is purely reflexive media amplification. Second, semantic matching may fail when contracts are written in unusual terms or when relevant information is highly indirect. Third, thin markets may appear attractive precisely because they reprice slowly, but this same slowness may impair execution. Fourth, article-based systems can mistake salience for direction unless directional language is modeled explicitly.
More generally, this framework does not assume that all price lag is exploitable after costs or that all narrative divergence resolves in the predicted direction. It asserts only that the lag itself is measurable and that ranking markets by this lag may offer a useful research and trading workflow.
The broader implication is that prediction markets should not be treated exclusively as final-form probability objects. They should also be modeled as temporally extended response systems embedded in a larger informational ecology. In such a setting, news is not merely background context. It is a state variable. Narrative formation is not epiphenomenal. It is one of the mechanisms by which price becomes what it is.
This perspective aligns naturally with a hybrid research program drawing from market microstructure, information theory, NLP, event studies, and ranking systems. It suggests that “edge” is often not a matter of private information versus public information, but of structured public information moving faster than collective repricing.
This paper has proposed a formal and operational account of narrative velocity as a measurable signal for prediction-market analysis. By distinguishing between raw information presence and dynamic narrative formation, and by combining that distinction with market-structure variables, the framework shifts analysis away from static probability interpretation and toward dynamic information-flow modeling.
The central proposition can be stated compactly: markets do not merely reflect beliefs; beliefs themselves are shaped by the speed, recurrence, and semantic organization of incoming information. When those forces move faster than price, information lag may become visible. The objective of the system is to detect and rank precisely those moments.