What’s interesting to me about social simulation as a kind of science–if we want to call it science–is that it’s fundamentally and unmistakably antirealist in its ontological commitments. What you do is you essentially create a world and then subject that world to close analysis. Modeling is literary; it’s a poetic genre of thought that creates analogues of historical social worlds.
My talk will cover three points: first, to introduce agent-based modeling as a computational technique; second, to discuss my own use of modeling to study the history of literary publishing; and third, most importantly, to sketch out a theory and method for incorporating social simulation into an ongoing program of historical research.
So what is agent-based modeling? Wikipedia, as it often does, offers a serviceable definition: agent-based models “simulate the simultaneous operations and interactions of multiple agents, in an attempt to re-create and predict the appearance of complex phenomena.”
The software program I’ve used is called NetLogo. As their website says, NetLogo “makes it possible to explore the connection between the micro-level behavior of individuals and the macro-level patterns that emerge from their interaction. … NetLogo lets students open simulations and “play” with them, exploring their behavior under various conditions.” It’s also an “authoring environment” for researchers.
So what does NetLogo look like? This model represents a bookseller’s business during the hand-press era. It’s designed around the research question: what were the main factors that drove the success or failure of bookselling businesses, looking in particular at the cost structure of financing new editions? The monitors on the right track various statistics about how many editions and reprints are produced, inventory levels, and the like. On the left you see slider bars that control variables, such as per unit cost of printing, daily overhead, etc.
If I were to turn the costs way up, the seller would quickly go bankrupt. If I turned them down, the opposite would happen. And this is what it means to “play” with the models. You run it over and over again, testing it across as many variables as you can think of, and see what the simulation spits out. Sometimes you know for sure what will happen. If the costs are higher than the potential revenue, the business won’t last. But which factors are most important? What settings will enable the system to reach equilibrium? By what conditions will that equilibrium be disrupted?
Here’s a screen capture of NetLogo’s back end, where you write the code that creates the system. Each agent executes a series of commands called “procedures” that direct their behavior.
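To give a concrete flavor of what such procedures do, here is a rough Python analogue (the names and numbers below are my own illustration, not the model's actual code): each tick, every reader agent may buy a book, and the seller pays the day's overhead.

```python
import random

def reader_procedure(reader, shop):
    """Analogue of a NetLogo procedure for a reader agent: with some
    probability, buy one book if the shop has stock."""
    if shop["inventory"] > 0 and random.random() < reader["buy_chance"]:
        shop["inventory"] -= 1
        shop["cash"] += shop["unit_price"]
        reader["books_owned"] += 1

def seller_procedure(shop):
    """Analogue of a seller procedure: pay the day's overhead."""
    shop["cash"] -= shop["overhead"]

# One simulated tick: every agent runs its procedure once.
shop = {"inventory": 100, "cash": 50.0, "unit_price": 1.0, "overhead": 2.0}
readers = [{"buy_chance": 0.3, "books_owned": 0} for _ in range(20)]
for r in readers:
    reader_procedure(r, shop)
seller_procedure(shop)
```

In NetLogo the same idea is expressed as commands that each agent in an agentset executes; the point is just that macro-level patterns (inventory, cash flow) fall out of many small per-agent rules.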
When I think of what agent-based modeling is, it’s first and foremost a genre of “expressive processing,” and if it has most in common with anything in the DH landscape, it’s not topic modeling or social network analysis, so much as game design, and in particular what’s called “serious” gaming. The designers of NetLogo are right to call it play. I sometimes describe it like this: “Imagine a Sims game where you get to write all the behaviors, control all the variables, and then tell the computer to run it on autopilot a thousand times, keeping stats of everything it does.” Call it a tool. Call it a toy. With this system, you get to play with your own thought experiments.
As with building a model airplane, modeling complex systems involves building as a form of play and play as a form of knowing.
But let me back up a minute. Every paper on modeling has to define the term at some point. So, what’s a model? What do they do? Well, models represent things. But what? And how?
If we think of fashion models, we can see they don’t actually represent people; they represent normative ideas about the human body.
In a very different but weirdly similar way, model organisms like laboratory mice don’t represent human bodies either, but in the field of medical research they serve as analogues, representatives of mammalian systems in general, including humans.
Theoretical models of the kind used in particle physics represent objects that are not directly observable and that might not exist in any case.
Physical scale models, like the papier-mâché volcano, serve to illustrate and visualize ideas about causal forces in geological systems.
What these all have in common is their condition of exemplarity. They don’t stand in for reality, exactly. Models don’t represent things; they exemplify things. They describe generic types, categories, theories, structures of relationality. Models represent ideas.
In their introduction to a recent book, Science without Laws, the editors write: “model systems do not directly represent [phenomena] as models of them. Rather, they serve as exemplars or analogues that are probed and manipulated in the search for generic (and genetic) relationships.” As the editors make clear, those generic and genetic relationships–those kinds and causes–are not really intrinsic to any phenomena; rather, they are the concepts and theories that researchers bring to bear, subject to inquiry, and describe as “conclusions.”
So I use the word “model” in two senses. The first is just any interpretive framework we use to categorize the world. Model in this sense implies a cluster of ideas about what exists: what entities there are, how they behave, and how they interrelate. Such ideas might be specified to the point of being a theory, an interpretation, or they might be something vaguer, like a “frame of reference.” A model in the second sense, on the other hand, is an object either constructed or chosen to represent that framework: a narration, a simulation, a case study, or simply an example.
What does this mean for historical simulation in particular? In this context, we might see that simulations don’t represent the past–they’re neither factual nor counterfactual. Instead, they represent our ideas about the past. They function as analogues by teasing out our concepts analytically and subjecting them to new frames of comparison.
Models are, of course, not at all alien to historical inquiry. Robert Darnton’s model of the communications circuit tried to tease out relationships within the book trade. Ideas and books move through the nodes of a network.
In such studies, what might be called ‘raw data’ is disconnected and often resistant to statistical analysis. However, lots of information is out there. Here is a list of edition sizes from a bookseller’s ledger.
Here are estimates from a period of intense controversy. How much did it cost to finance these editions?
Average costs can’t be calculated statistically, but they can be demonstrated through individual examples. Here, for example, is a detailed breakdown of the cost structure of Mary Shelley’s Frankenstein. Detailed studies by D. F. McKenzie, Richard Sher, James Raven, John Feather, and others have shown how stationers dealt with changing business and intellectual environments. We don’t have sufficient data to generate statistics of the kind now frequently published in trade journals, but we have lots of data points that exemplify how the book trade functioned.
Historians like William St. Clair, from whom I’ve been cribbing the last few slides, gather this data together and stylize it into qualitative prose models: general explanations of how they think historical processes operated. St. Clair describes a “reader-led model” in which people have unfulfilled desires, and publishers look to satisfy those desires by reprinting old texts or getting new ones from authors. Like Darnton’s, St. Clair’s is an agent-based model, rooted in the behaviors of individuals and institutions.
In the context of humanistic explanation, computational modeling is perhaps best understood as an exercise in paraphrase from a prose model to a procedural one. And in my simulation of a bookshop, I really try to target point three here. What I want to do is isolate out the publisher’s role in the process; to see how these different activities, reprinting the old and disseminating the new, affect business in general. Before I shift into discussion of my model, it’s important to emphasize that an agent-based model won’t be a model of St. Clair’s data; instead, it’s a model of his model, a paraphrase of his description of the underlying causal forces that shape intellectual history.
Now with this theoretical background in mind, let me return to my initial model of the bookshop. Working from St. Clair’s compilation of the bookseller’s expenses and outputs, I generated a simple schematic of costs and inventory management. As you can see on the left, a number of variables are built into the system: the size of the book-buying public, the initial capital, daily overhead, fixed costs per new edition, unit cost per book, pulp rate for destroyed paper stock, print run size, and inventory size and diversity. A deliberately simple set of algorithms determines the readers’ book buying and the seller’s text acquisition–oversimplified, in fact, because this model is designed to focus on the comparative importance of categories of publishing costs. The money units have all been abstracted, and the unit price is 1, which is also the average price of any one book.
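The logic of that schematic can be sketched in a few lines of Python. To be clear, this is a loose paraphrase, not the NetLogo source, and every default value below is illustrative rather than one of St. Clair's figures: readers buy probabilistically, and the seller pays overhead and finances a reprint whenever stock runs out.

```python
import random

def run_bookshop(capital=100.0, overhead=1.0, fixed_cost=20.0,
                 unit_cost=0.4, print_run=50, population=500,
                 buy_chance=0.01, unit_price=1.0, max_ticks=10_000):
    """Minimal paraphrase of the bookshop model: each tick readers may
    buy a book at price 1; when stock runs out the seller finances a
    new edition (fixed cost plus unit cost per copy). Returns the tick
    at which capital hits zero ('time to bankruptcy'), or max_ticks."""
    cash = capital - (fixed_cost + unit_cost * print_run)  # first edition
    inventory = print_run
    if cash <= 0:
        return 0
    for tick in range(1, max_ticks + 1):
        # readers' buying behavior
        buyers = sum(1 for _ in range(population) if random.random() < buy_chance)
        sold = min(buyers, inventory)
        inventory -= sold
        cash += sold * unit_price
        # seller's behavior: overhead, and reprint when stock is gone
        cash -= overhead
        if inventory == 0:
            cash -= fixed_cost + unit_cost * print_run
            inventory = print_run
        if cash <= 0:
            return tick
    return max_ticks
```

Each keyword argument corresponds roughly to one of the slider bars, and “playing” with the model means re-running this function over and over across ranges of those arguments.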
The tests that I devised all centered on what I call “time to bankruptcy.” How do changes in cost structure affect the average career life-span of a publisher? In some cases, as with initial capital, we can see the model functioning in a very straightforward way, reproducing the assumptions built in. Here we can see that, all else being equal, the initial funding, the money I grant my publisher at setup, has a direct relationship to life span, measured in ticks that correspond loosely to days. If the seller starts with twice as much money, it takes twice as long, on average, for his account to hit zero. Such results express simply and straightforwardly the assumptions built into the model. Even if everything else in the model were completely wrong, this result would likely still hold, and so it tells us very little.
This one, though, is a little more interesting. Just as before, the y-axis represents the bookseller’s career life span. However, rather than compare life span to initial capital, this experiment looked at the unit cost of production per book: the paper and the printing. On the left, the blue chart shows the effect when the unit cost was turned way down. At a mere ten and twenty percent per unit, our seller had large margins and very quickly made its fortune. However, when the unit cost was in the range of forty to fifty percent—what St. Clair estimates to be a likely range—we see that the bookseller’s business existed in a fairly long state of equilibrium, earning a steady income but accumulating or losing capital only gradually.
This next chart shows a more complex experiment. It correlates the effect of rising fixed costs–costs independent of the number of books actually produced in an edition–with the size of the edition itself. These would be costs like advertising: with bigger print runs, such costs are a smaller percentage of the total expenses, giving the seller larger margins. Would raising the fixed cost per edition create an incentive to produce larger editions? If so, the humps should get flatter and peak further to the right as costs increase. And that’s sort of what happened. I set the system to run about 1,250 times, gradually increasing the fixed cost, to see how long the career lasts. Each colored line represents how well the bookseller performed at a different population level. At each step we see a parabolic shape. If the edition size is too low, per-copy margins are negative, and the bookseller goes bankrupt no matter how big the market is. (At each of these stars, the edition size was set to 10, the lowest setting, and in each case the lines converge at that point. The bookseller goes bankrupt at the speed of the business itself, no matter how many customers it has.) On the other side of the hump, if the print runs get too big, catastrophic losses quickly destroy the business. In this model, the threshold for an edition size being too big is anything more than about 10% of the buying public.
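The trade-off can be sketched with a deterministic toy version of the same bookkeeping (again my own Python illustration with made-up numbers, not the model itself): demand is fixed per tick, the seller reprints the moment stock runs out, and each edition's up-front cost comes out of capital.

```python
def lifespan(print_run, fixed_cost=20.0, capital=100.0, overhead=1.0,
             unit_cost=0.4, demand_per_tick=5, unit_price=1.0,
             max_ticks=2000):
    """Ticks until capital is exhausted, with fixed daily demand.
    Small editions pay the fixed cost too often; huge editions cost
    more up front than the seller can finance."""
    cash = capital - (fixed_cost + unit_cost * print_run)  # first edition
    inventory = print_run
    if cash <= 0:
        return 0  # couldn't even finance the first edition
    for tick in range(1, max_ticks + 1):
        sold = min(demand_per_tick, inventory)
        inventory -= sold
        cash += sold * unit_price - overhead
        if inventory == 0:  # reprint the moment stock runs out
            cash -= fixed_cost + unit_cost * print_run
            inventory = print_run
        if cash <= 0:
            return tick
    return max_ticks

# Sweep edition sizes: the resulting curve is hump-shaped.
results = {run: lifespan(run) for run in (20, 50, 100, 150, 250)}
```

This toy reproduces only the crude shape of the result, since editions too small to cover their fixed cost fail and editions too large cannot be financed at all; the stochastic demand in the actual model is what produces the finer, left-skewed humps discussed next.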
However, what I notice about this chart is how skewed to the left these curves are. I expected that with bigger, denser readerships, more and more customers, there’d be a stronger incentive for the larger margins made possible by bigger print runs. I was expecting a more normally distributed parabolic structure to these humps, but they’re all skewed left. With only a couple of exceptions, the peak performance tends to hit right around the threshold of profitability. This means that in this model, a bookseller who has to choose between razor-thin margins or the risk of large losses is on average much better off going with small margins. I might have suspected this if I’d thought it through, but I didn’t know it would come out this way.
A few initial observations. In the world of this model, avoiding catastrophic loss is very important, even if it means lower profit margins. Inventory management is crucial. This means the number one problem for publishers in this model is uncertainty and risk management. What I notice, then, as the designer of the model, is that nothing in the model accounts for the risk-management strategies that booksellers actually engaged in: cost sharing through joint publication, subscription publishing, author financing. None of these are possible in a single-publisher model. How can we fix this problem?
The next step in the research project, looking forward, will involve stylizing historical data to conform to our needs. Our tests all involve publisher life-span as a key metric, so, for example, how long were the typical gaps between first imprint and last? We’ll want to look at publishers’ checklists, advertisements, records of their inventories, and the timelines of text acquisition. The initial model has shown that we need to see how booksellers dealt with the problem of risk: by looking at imprints we’ll be able to see, for example, that the single-publisher model doesn’t account for the many books that were jointly financed in various ways, and we’ll get a sense of which kinds of books stayed on the shelves–or at least were thought to be worth advertising–and which didn’t. The point of reconfiguring historical data in this way won’t be to “validate” the model, but rather to expose its weaknesses and identify further areas of research.
What changes can we anticipate? I’m sure, for example, that looking at booksellers’ catalogues will require a new model for books themselves. In the initial model, books were treated as more or less homogeneous objects marked by price points distributed on a normal curve, but in fact there were different kinds of books (folios, quartos, newspapers, pamphlets), categories that often corresponded to price points. We’ll also find, I’m sure, that many books carried the names of multiple booksellers or none at all. This will mean paying attention to the interactions among publishers, tracing commercial networks of partnership, piracy, import and export, as well as systems of intellectual property and censorship.
As we tackle each of these cases, we may decide to go back and revise the original model–and perhaps we’ll find something that suggests the price controls in it were off in a way that can be profitably corrected. More likely, though, what we’ll need are a number of different, small, targeted models, each suited to the historical question that motivates its creation.
Modeling in the humanities should thus be understood as an ongoing, recursive process of continual correction. First you develop a working understanding of the historical process; then you create a model that simulates some aspect of it you initially take to be important. Then you “play” with the model, testing its operations and constraints to identify key functions and meaningful aporia. (This is where I’m at now with my own work.) The next phase is to generate new historical data as commensurate as possible with what you’ve produced, expecting that no fit will be perfect, and that you’ll have to revise the model and create new ones to simulate new things. Then you play some more, and the process begins again.
Although its outputs are often expressed numerically, ABM remains at its heart a qualitative, even creative, genre of scholarly expression. As I said at the start, it’s a poetic genre that creates digital analogues of social worlds.
— Michael Gavin, “Agent-Based Modeling and Historical Simulation,” Digital Humanities, Lincoln, NE, July 19, 2013.