% Source file: testdoc2.tex (committed by Eckhart Arnold, Feb 23, 2017)
\documentclass[12pt, english, a4paper]{article}

\begin{document}

\title{What's wrong with social simulations?}
\date{September 2013; last revision: March 2016}
\maketitle

\begin{abstract}
This paper tries to answer the question why the epistemic value of so many social simulations
is questionable. I consider the epistemic value of a social simulation as questionable if it contributes neither directly nor indirectly to the understanding of empirical reality. In order to justify this allegation I rely mostly but not entirely on the survey by \citet{heath-et-al:2009}, according to which two thirds of all agent-based simulations are not properly empirically validated. In order to understand the reasons why so many social simulations are of questionable epistemic value, two classical social simulations are analyzed with respect to their possible epistemic justification: Schelling’s neighborhood segregation model \citep{schelling:1971} and Axelrod’s reiterated Prisoner’s Dilemma simulations of the evolution of cooperation \citep{axelrod:1984}. It is argued that Schelling’s simulation is useful, because it can be related to empirical reality, while Axelrod’s simulations and those of his followers cannot be related to empirical reality and therefore their scientific value remains doubtful. Finally, I critically discuss some of the typical epistemological background beliefs of modelers as expressed in Joshua Epstein’s keynote address ``Why model?'' \citep{epstein:2008}. Underestimating the importance of empirical validation is identified as one major cause of failure for social simulations.
\end{abstract}

\newpage
\tableofcontents

\section{Introduction}

In this paper I will try to answer the question: Why is the epistemic value of so many social simulations questionable? By social simulations I mean computer simulations of human interaction as it is studied in the social sciences.
The reason why I consider the epistemic value of many social simulations as questionable is that many simulation studies cannot give an answer to the most salient question that any scientific study should be ready to answer: “How do we know it’s true?” or, if specifically directed to simulation studies: “How do we know that the simulation simulates the phenomenon correctly that it simulates?” Answering this question requires some kind of empirical validation of the simulation. The requirement of empirical validation is in line with the widely accepted notion that science is demarcated from non-science by its empirical testability or falsifiability. Many simulation studies, however, do not offer any suggestion how they could possibly be validated empirically. A frequent reply by simulation scientists is that no simulation of empirical phenomena was intended, but that the simulation only serves a “theoretical” purpose. Then, however, another equally salient question should be answered: “Why should we care about the results?” It is my strong impression that many social simulation studies cannot answer either this or the first question. This is not to say that the use of computer programs for answering purely theoretical questions is generally or necessarily devoid of value. The computer assisted proofs of the four color theorem \citep{wilson:2002} are an important counterexample. But in the social sciences it is hard to find similarly useful examples of the use of computers for purely theoretical purposes. In any case, the social sciences are empirical sciences. Therefore, social simulations should contribute either directly or indirectly to our understanding of social phenomena in the empirical world. There exist many different types of simulations but I will restrict myself to agent-based and game theoretical simulations. I do not make a sharp difference between models and simulations. 
For the purpose of this paper I identify computer simulations just with programmed models. Most of my criticism of the practice of these simulation types can probably be generalized to other types of simulations or models in the social sciences and maybe also to some instances of the simulation practice in the natural sciences. It would lead too far afield to examine these connections here, but it should be easy to determine in other cases whether the particulars of bad simulation practice against which my criticism is directed are present or not. In order to bring my point home, I rely on the survey by \citet{heath-et-al:2009} on agent-based modeling practice for a general overview and on two example cases that I examine in detail. I start by discussing the survey which reveals that in an important sub-field of social simulations, namely, agent based simulations, empirical validation is commonly lacking. After that I first discuss Thomas Schelling’s well-known neighborhood segregation model. This is a model that I do not consider as being devoid of epistemic value. For, unlike most social simulations, it can be empirically falsified. The discussion of the particular features that make this model scientifically valuable will help us to understand why the simulation models discussed in the following fail to be so. The simulation models that I discuss in the following are simulations in the tradition of Robert Axelrod’s “Evolution of Cooperation” \citep{axelrod:1984}. Although the modeling tradition initiated by Axelrod has delivered hardly any tenable and empirically applicable results, it still continues to thrive today. By some, Axelrod’s approach is still taken as a role model \citep[208-209]{rendell-et-al:2010a}, although there has been severe criticism by others \citep{arnold:2008, binmore:1994, binmore:1998}. Finally, the question remains why scientists continue to produce such an abundance of simulation studies that fail to be empirically applicable. 
Leaving aside possible sociological explanations such as the momentum of scientific traditions, the cohesion of peer groups, and the necessity of justifying the investment in acquiring particular skills (e.g. math and programming), I confine myself to the ideological background of simulation scientists. In my opinion the failure to produce useful results has a lot to do with the positivist attitude prevailing in this field of the social sciences. This attitude includes the dogmatic belief in the superiority of the methods of natural sciences like physics in any area of science. Therefore, despite frequent failure, many scientists continue to believe that formal modeling is just the right method for the social sciences. The attitude is well described in \citet{shapiro:2005}. Such attitudes are less often expressed explicitly in scientific papers. Rather they form a background of shared convictions that, if not simply taken for granted as “unspoken assumptions”, find their expression in informal texts, conversations, blogs and keynote speeches. I discuss Joshua Epstein’s keynote lecture “Why Model?” \citep{epstein:2008} as an example.

\section{Simulation without validation in agent-based models}

In this section I give my interpretation of a survey by \citet{heath-et-al:2009} on agent-based simulations. I do so with the intention of substantiating my claim that many social simulations are indeed useless. This is neither the aim nor the precise conclusion that \citet{heath-et-al:2009} draw, but their study does reveal that two thirds of the surveyed simulation studies are not completely validated, and the authors of the study consider this state of affairs as ``not acceptable'' \citep[4.11]{heath-et-al:2009}. Thus my reading does not run counter to the results of the survey.
And it follows as a natural conclusion, if one accepts that a) an unvalidated simulation is, in most cases, a useless one and b) agent-based simulations make up a substantial part of social simulations.

The survey by \citet{heath-et-al:2009} examines agent-based modeling practices between 1998 and 2008. It encompasses “279 articles from 92 unique publication outlets in which the authors had constructed and analyzed an agent-based model” \citep[abstract]{heath-et-al:2009}. The articles stem from different fields of the social sciences, including business, economics, public policy, social science, traffic, military and also biology. The authors are not only interested in verification and validation practices, but the results concerning these are the results that I am interested in here.

Verification and validation concern two separate aspects of securing the correctness of a simulation model. Verification, as the term is used in the social simulations community, roughly concerns the question whether the simulation software is bug-free and correctly implements the intended simulation model. Validation concerns the question whether the simulation model represents the simulated empirical target system adequately (for the intended purpose).

Regarding verification, Heath, Hill and Ciarallo note that ``Only 44 (15.8\%) of the articles surveyed gave a reference for the reader to access or replicate the model. This indicates that the majority of the authors, publication outlets and reviewers did not deem it necessary to allow independent access to the models. This trend appears consistently over the last 10 years'' \citep[3.6]{heath-et-al:2009}. This astonishingly low figure can in part be explained by the fact that as long as the model is described with sufficient detail in the paper, it can also be replicated by re-programming it from the model description.
It must not be forgotten that the replication of computer simulation results does not have the same epistemological importance as the replication of experimental results. While the replication of experiments adds additional inductive support to the experimental results, the replication of simulation results is merely a means for checking the simulation software for programming errors (“bugs”). Hence the possibility of precise replication is not an advantage that simulations enjoy over material experiments, as for example \citet[248]{reiss:2011} argues. Obviously, if the same simulation software is run in the same system environment, the same results will be produced, no matter whether this is done by a different team of researchers at a different time and place with different computers. Even if the model is re-implemented, the results must necessarily be the same, provided that both the model and the system environment are fully specified and no programming errors have been made in the original implementation or the re-implementation.\footnote{A possible exception concerns the frequent use of random numbers. As long as only pseudo random numbers with the same random number generator and the same “seed” are used, the simulation is still completely deterministic. This is not to say that sticking to the same “seeds” is good practice other than for debugging.} Replication or re-implementation can, however, help to reveal such errors.\footnote{I am indebted to Paul Humphreys for pointing this out to me.} It can therefore be considered as one of several possible means for the verification (but not validation) of a computer simulation. Error detection becomes much more laborious if no reference to the source code is provided. And it does happen that simulation models are not specified with sufficient detail to replicate them \citep{will-hegselmann:2008}. Therefore, the rather low proportion of articles that provide a reference to access or replicate the simulation is worrisome.
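
The remark on pseudo random numbers can be made concrete with a small sketch. It is not taken from any of the surveyed studies: the function name, the parameters and the random-walk ``model'' are purely illustrative assumptions. It merely shows that a stochastic simulation driven by the same generator and the same seed reproduces its own output exactly, which is why such a replication checks the software rather than adding empirical support.

```python
import random


def run_toy_simulation(seed, steps=1000):
    """Hypothetical stand-in for a stochastic simulation: a seeded random walk."""
    rng = random.Random(seed)  # private generator, independent of global state
    position = 0
    trajectory = []
    for _ in range(steps):
        position += rng.choice((-1, 1))  # one pseudo-random step left or right
        trajectory.append(position)
    return trajectory


if __name__ == "__main__":
    original = run_toy_simulation(seed=42)
    replication = run_toy_simulation(seed=42)
    # Same generator and same seed: the "replication" is a deterministic rerun.
    print(original == replication)
```

Changing the seed would in general yield a different trajectory, so conclusions drawn from a stochastic simulation should presumably rest on many seeds rather than on one reproducible run.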
More important than the results concerning verification is what Heath, Hill and Ciarallo find out about validation or, rather, the lack of validation:

\begin{quote}
Without validation a model cannot be said to be representative of anything real. However, 65\% of the surveyed articles were not completely validated. This is a practice that is not acceptable in other sciences and should no longer be acceptable in ABM practice and in publications associated with ABM. \citep[4.11]{heath-et-al:2009}
\end{quote}

This conclusion needs a little further commentary. The figure of 65\% of not completely validated simulations is an average value over the whole period of study. In the earlier years that are covered by the survey hardly any simulation was completely validated. Later this figure decreases, but a ratio of less than 45\% of completely validated simulation studies remains constant during the last four years of the period covered \citep[3.10]{heath-et-al:2009}.

Furthermore, it needs to be clarified what Heath, Hill and Ciarallo mean when they speak of complete validation. The authors make a distinction between conceptual validation and operational validation. Conceptual validation concerns the question whether the mechanisms built into the model represent the mechanisms that drive the modeled real system. An “invalid conceptual model indicates the model may not be an appropriate representation of reality.” Operational validation then “validates results of the simulation against results from the real system.” \citep[2.13]{heath-et-al:2009}. The demand for complete validation is well motivated: “If a model is only conceptually validated, then it [is] unknown if that model will produce correct output results.” \citep[4.12]{heath-et-al:2009}. For even if the driving mechanisms of the real system are represented in the model, it remains unclear, without operational validation, whether the representation is good enough to produce correct output results.
On the other hand, a model that has been operationally validated only, may be based on a false or unrealistic mechanism and thus fail to explain the simulated phenomenon, even if the data matches. Heath, Hill and Ciarallo do not go into much detail concerning how exactly conceptual and operational validation are done in practice and under what conditions a validation attempt is to be considered as successful or as a failure. But do really all simulations need to be validated both conceptually and operationally as Heath, Hill and Ciarallo demand? After all, some simulations may – just like thought experiments – have been intended to merely prove conceptual possibilities. One would usually not demand an empirical (i.e. operational) validation from a thought experiment. Heath, Hill and Ciarallo themselves make a distinction between the generator, mediator and predictor role of a simulation \citep[2.16]{heath-et-al:2009}. In the generator role simulations are merely meant to generate hypotheses. Simulations in the mediator role “capture certain behaviors of the system and [..] characterize how the system may behave under certain scenarios” (3.4) and only simulations in the predictor role are actually calculating a real system. All of the surveyed studies fall into the first two categories. Obviously, the authors require complete validation even from these types of simulations. This can be disputed. As stated in the introduction, in order to be useful, a simulation study should make a contribution to answering some relevant question of empirical science. This contribution can be direct or indirect. The contribution is direct if the model can be applied to some empirical process and if it can be tested empirically whether the model is correct. The model’s contribution is indirect, if the model cannot be applied empirically, but if we can learn something from the model which helps us to answer an empirical question, the answer to which we would not have known otherwise. 
The latter kind of simulations can be said to function as thought experiments. It would be asking too much to demand complete empirical validation from a thought experiment. But does this mean that the figures from Heath, Hill and Ciarallo concerning the validation of simulations need to be interpreted differently by taking into account that some simulations may not require complete validation in the first place? This objection would miss the point, because the scenario just discussed is the exception rather than the rule. Classical thought experiments like Schrödinger’s cat usually touch upon important theoretical disputes. However, as will become apparent from the discussion of simulations of the evolution of cooperation, below, computer simulation studies all too easily lose the contact to relevant scientific questions. We just do not need all those digital thought experiments on conceivable variants of one and the same game theoretical model of cooperation. And the same surely applies to many other traditions of social modeling as well. But if this is true, then the figure of 65\% of not completely validated simulation studies in the field of agent-based simulations is alarming indeed.\footnote{For a detailed discussion of the cases in which even unvalidated simulations can be considered as useful, see \citet{arnold:2013}. There are such cases, but the conditions under which this is possible appear to be quite restrictive.} Given how important empirical validation is, “because it is the only means that provides some evidence that a model can be used for a particular purpose.” \citep[4.11]{heath-et-al:2009}, it is surprising how little discussion this important topic finds in the textbook literature on social simulations. \citet{gilbert-troitzsch:2005} mention validation as an important part of the activity of conducting computer simulations in the social sciences, but then they dedicate only a few pages to it (22-25). 
\citet[98]{salamon:2011} also mentions it as an important question without giving any satisfactory answer to this question and without providing readers with so much as a hint concerning how simulations must be constructed so that their validity can be empirically tested. \citet{railsback-grimm:2011} dedicate many pages to describing the ODD-protocol, a protocol that is meant to standardize agent-based simulations and thus to facilitate the construction, comparison and evaluation of agent-based simulations. Arguably the most important topic, empirical validation of agent-based simulations, is not an explicit part of this protocol. One could argue that this is simply a different matter, but then, given the importance of this topic it is slightly disappointing that Railsback and Grimm do not treat it more explicitly in their book. Summing it up, the survey by Heath, Hill and Ciarallo shows that an increasingly important sub-discipline of social simulations, namely the field of agent-based simulations faces the serious problem that a large part of its scientific literature consists of unvalidated and therefore most probably useless computer simulations. Moreover, considering the textbook literature on agent-based simulations one can get the impression that the scientific community is not at all sufficiently aware of this problem. \section{How a model works that works: Schelling’s neighborhood segregation model} Moving from the general finding to particular examples, I now turn to the discussion of Thomas Schelling’s neighborhood segregation model. Schelling’s neighborhood segregation model \citep{schelling:1971} is widely known and has been amply discussed not only among economists but also among philosophers of science as a role model for linking micro-motifs with macro-outcomes. I will therefore say little about the model itself, but concentrate on the questions if and, if so, how it fulfills my criteria for epistemically valuable simulations. 
Schelling’s model was meant to investigate the role of individual choice in bringing about the segregation of neighborhoods that are either predominantly inhabited by blacks or by whites. Schelling considered the role of preference based individual choice as one of many possible causes of this phenomenon – and probably not even the most important, at least not in comparison to organized action and economic factors as two other possible causes \citep[144]{schelling:1971}. In order to investigate the phenomenon, Schelling used a checkerboard model where the fields of the checkerboard would represent houses. The skin color of the inhabitants can be represented for example by pennies that are turned either heads or tails.\footnote{Schelling’s article was published before personal computers existed. Today one would of course use a computer. A simple version of Schelling’s model can be found in the netlogo models library \citep{Wilensky1999}.} Schelling assumed a certain tolerance threshold concerning the number of differently colored inhabitants in the neighborhood, before a household would move to another place. A result that was relatively stable among the different variants of the model he examined was that segregated neighborhoods would emerge – even if the threshold preference for equally colored neighbors was far below 50\%, which means that segregation emerged even if the inhabitants would have been perfectly happy to live in an integrated environment with a mixed population. As \citet{aydinonat:2007} reports, the robustness of this result has been confirmed by many subsequent studies that employed variants of Schelling’s model. At the end of his paper Schelling discusses “tipping” that occurs when the entrance of a new minority starts to cause the evacuation of an area by its former inhabitants. 
In this connection Schelling also mentions an alternative hypothesis according to which inhabitants do not react to the frequency of similar or differently colored neighbors but to their own expectation about the future ratio of differently colored inhabitants. He assumes that this would aggravate the segregation process, but he does not investigate this hypothesis further \citep[185-186]{schelling:1971}, and his model is built on the assumption that individuals react to the actual and not the future ratio of skin colors.

Is this model scientifically valuable? Can we draw conclusions from this model with respect to empirical reality and can we check whether these conclusions are true? Concerning these questions the following features of this model are important:

\begin{enumerate}
\item The assumptions on which the model rests can be tested empirically. The most important assumption is that individuals have a threshold for how many neighbors of a different color they tolerate and that they move to another neighborhood if this threshold is passed. This assumption can be tested empirically with the usual methods of empirical social research (and, of course, within the limits of these methods). Also, the question whether people base their decision to move on the frequency of differently colored neighbors or on their own expectation concerning future changes of the neighborhood can be tested empirically.
\item The model is highly robust. Changes of the basic setting and even fairly large variations of its input parameters, e.g. tolerance threshold or population size, do not lead to a significantly different outcome. Therefore, even if the empirical measurement of, say, the tolerance threshold is inaccurate, the model can still be applied. Robustness in this sense is directly linked to empirical testability.
It is best understood as a relational property between the measurement (in-)accuracy of the input parameters and the stability of the output values of a simulation.\footnote{There are of course different concepts of robustness. I consider this relational concept of robustness as the most important concept. An important non-relational concept of robustness is that of derivational robustness analysis \citep{kuorikoski-lehtinen:2009}. See below.}
\item The model captures only one of many possible causes of neighborhood segregation. Before one can claim that the model explains or, rather, contributes to an explanation of neighborhood segregation, it is necessary to identify the modeled mechanism empirically and to estimate its relative weight in comparison with other actual causes. While the model shows that even a preference for integrated neighborhoods (if still combined with a tolerance limit) can lead to segregation, it may in reality still be the case that latent or manifest racism causes segregation. The model alone is not an explanation. (Schelling was aware of this.)
\item Besides empirical explanation another possible use of the model would be policy advice. In this respect the model could be useful even if it does not capture an actual cause. For public policy must also be concerned about possible future causes. Assume, for example, that manifest racism was a cause of neighborhood segregation, but that due to increasing public awareness racism is on the decline. Then the model can demonstrate that even if all further possible causes, e.g. economic causes, are removed as well, this might still not result in desegregated neighborhoods\footnote{But then, would we really worry about segregated neighborhoods, if the issue wasn't tied to racial discrimination and social injustice? After all, ethnic or religious groups in Canada also often live in segregated areas (``Canadian mosaic''). But unlike in the U.S., this is hardly an issue.
Therefore, Schelling's model -- for all its epistemological merits that are discussed here -- really seems to miss the point in terms of scientific relevance. Discrimination is the important point here, not segregation. But Schelling's model induces us to frame the question in a way that makes us miss the point. ({\em This comment has been added later as the result of some discussions I had on this point. E.A., March 25th 2016.})} - provided, of course, that the basic assumption about a tolerance threshold is true. Thus, for the purpose of policy advice a model does not need to capture actual causes. It can be counter-factual, but it must still be realistic in the sense that its basic assumptions can be empirically validated. Therefore, while the purpose of policy advice justifies certain counter-factual assumptions in a model, it cannot justify unrealistic and unvalidated models. This generally holds for models that are meant to describe possible instead of actual scenarios. \end{enumerate} Schelling did not validate his model empirically. But for classifying the model as useful it is sufficient that it can be validated. Now, the interesting question is: Can the model be validated and is it valid? Recent empirical research on the topic of neighborhood segregation suggests that inhabitants react to anticipated future changes in the frequency of differently colored neighbors rather than the frequency itself \citep[124-125]{ellen:2000}. An important role is played by the fear of whites that they might end up in an all-black neighborhood. 
Thus, the basic assumption of the model that individuals react upon the ratio of differently colored inhabitants in their neighborhood is wrong and one can say that the model is in this sense falsified.\footnote{There are two senses in which a model (or more precisely: a model-based explanation) can be falsified: a) if the model’s assumptions are empirically not valid as in this case and b) if the causes the model captures are (i) either blocked by factors not taken into account in the model or (ii) cannot be disentangled from other possible causes or (iii) turn out to be irrelevant in comparison with other, stronger or otherwise more important causes for the same phenomenon. The connection between the model’s assumptions and its output, being a logical one, can, of course, not be empirically falsified.} The strong emphasis that is placed on empirical validation here stands in contrast to some of the epistemological literature on simulations and models. Robert Sugden, noticing that “authors typically say very little about how their models relate to the real world”, treats models like that of Schelling (which is one of his examples \citep[6-8]{sugden:2000}) as “credible counterfactual worlds” \citep[3]{sugden:2009} which are not intended to raise any particular empirical claims. Even though the particular relation to the real world is not clear, Sugden believes that such models can inform us about the real world. His account suffers from the fact that he remains unclear about how we can tell a counter-factual world that is credible from one that is incredible, if there is no empirical validation. A possible candidate for stepping in this gap of Sugden’s account is Kuorikoski’s and Lehtinen’s concept of “derivational robustness analysis” \citep{kuorikoski-lehtinen:2009}. According to this concept conclusions from unrealistic models to reality might be vindicated if the model remains robust under variations of its unrealistic assumptions. 
For example, in Schelling’s model the checkerboard topography could be replaced by other different topographies \citep[441]{aydinonat:2007}. If the model still yields the same results about segregation, we are – if we follow the idea of “derivational robustness analysis” – entitled to draw the inductive conclusion that the model’s results would still be the same if the unrealistic topographies were exchanged by the topography of some real city, even though we have not tested it with a real topography. A problem with this account is that it requires an inductive leap of a potentially dangerous kind: How can we be sure that the inductive conclusion derived from varying unrealistic assumptions holds for the conditions in reality which differ from any of these assumptions? Some philosophers also dwell on the analogy between simulations and experiments and consider simulations as “isolating devices” similar to experiments \citep{maeki:2009}. But the analogy between simulations and experiments is rather fragile, because other than experiments simulations are not empirical and do not allow us to learn anything about the world apart from what is implied in the premises of the simulation. In particular, we can – without some kind of empirical validation – never be sure whether the causal mechanism modeled in the simulation represents a real cause isolated in the model or does not exist in reality at all. Summing it up, it is difficult, if not impossible, to claim that models can inform us about reality without any kind of empirical validation. Schelling’s model, however, appears to be a scientifically useful model, at least in the sense that it can be validated (or falsified for that matter). The most decisive features of the model in this respect are its robustness and the practical feasibility of identifying the modeled cause in empirical reality. Next we will see how models fare when these features are not present. 
\section{How models fail: The Reiterated Prisoner’s Dilemma model}

Robert Axelrod’s computer simulations of the Reiterated Prisoner’s Dilemma (RPD) \citep{axelrod:1984} are well known and still considered by some as a role model for successful simulation research \citep[408-409]{rendell-et-al:2010a}. What is not so widely known is that the simulation research tradition initiated by Axelrod has remained entirely unsuccessful in terms of generating explanations for empirical instances of cooperation. What are the reasons for this lack of explanatory success? And why is Axelrod’s research design nonetheless considered a role model today?

Axelrod had the ingenious idea to advertise a public computer tournament where participation was open to everybody. Participants were asked to hand in their guess at a best strategy in the reiterated two person Prisoner’s Dilemma in the form of an algorithmic description or computer program. This provided Axelrod with a rich, though naturally very contingent set of diverse strategies and it had the, surely welcome, side-effect of generating attention for Axelrod’s research project. Axelrod ran a sequence of two tournaments. As is well known, the rather simplistic strategy {\em Tit For Tat} won both tournaments.

In the Prisoner’s Dilemma Game the players can decide whether to cooperate or not to cooperate. Mutual cooperation yields a higher payoff than mutual non-cooperation, but it is best to cheat by letting the other player cooperate while not cooperating oneself. And it is worst to be cheated, i.e. to cooperate while the other player does not. {\em Tit For Tat} cooperates in the first round of the Reiterated Prisoner’s Dilemma, but if the other player cheats, then {\em Tit For Tat} will punish the other player by not cooperating in the following round.\footnote{For a detailed description of the RPD-model and the tournament see \citet{axelrod:1984}.
An open-source implementation is available from: \url{www.eckhartarnold.de/apppages/coopsim}.} Axelrod analyzed the course of the tournament in order to understand just why {\em Tit For Tat} was such a successful strategy. He concluded that it is a number of characteristics that determine the success of a strategy in the Reiterated Prisoner’s Dilemma \citep[chapter 6]{axelrod:1984}: Successful strategies are (1) “friendly”, i.e. they start with cooperative moves, (2) envy-free, (3) punishing, but also (4) forgiving. Axelrod furthermore believed that repeated interaction is a necessary requirement for cooperation to evolve and that, of course, {\em Tit For Tat} is generally quite a good strategy in Reiterated Prisoner’s Dilemma situations. Unfortunately for Axelrod, the Reiterated Prisoner’s Dilemma model is anything but robust. For each of his conclusions, variations of the RPD-model can be constructed where the conclusion becomes invalid \citep[107]{arnold:2013}. It is even possible to construct a variant that allows strategies to break off the repeated interaction at will and that does not lead to the breakdown of cooperation \citep{schuessler:1990}. The failure to derive any robust results highlights the danger of drawing generalizing conclusions from models and of relying on models as a tool of theoretical investigation. This point has most strongly been emphasized by Ken Binmore, who describes the popularity that Axelrod’s model enjoyed derogatorily as the “The Tit-For-Tat Bubble” \citep[194]{binmore:1994}. Because the folk theorem from game theory implies that there are infinitely many equilibria in the Reiterated Prisoner’s Dilemma, there is not much reason to assign of all things the {\em Tit For Tat}-equilibrium a special place \citep[313-317]{binmore:1994}. 
If one follows Binmore’s criticism then it is not the reiterated Prisoner’s Dilemma that explains why {\em Tit For Tat} is such a good strategy, but rather the fact that {\em Tit For Tat} is a very salient and easily understood mode of behavior in many areas of life that explains why people so easily believed in the superiority of the {\em Tit For Tat} strategy in the RPD game.
%(See \citet[198]{binmore:1994} and \citet[317-319]{binmore:1998}.)

It is not only its lack of robustness that troubles Axelrod’s model. It is also the difficulty of relating it to any concrete empirical subject matter – a problem that Axelrod shares with many game theoretical explanations.\footnote{This is very frankly admitted by the leading game theorist \citet{rubinstein:2013} in a newspaper article. Rubinstein resorts to an aesthetic vindication of game theory (“flowers in the garden of God”).} Axelrod himself had offered a very impressive example of empirical application by relating the RPD model to the silent “Live and Let Live” agreement that emerged between enemy soldiers on some of the quieter stretches of the western front in the First World War. However, as critics were quick to point out \citep{battermann-et-al:1998, schuessler:1990}, it is not at all clear whether this situation really is a Prisoner’s Dilemma situation, let alone how the numerical values of the payoff parameters could be assessed. But precise numerical payoff values would be necessary since Axelrod’s model is not robust against changes of the numerical values of the payoff parameters within the boundaries that the Prisoner’s Dilemma game allows \citep[80]{arnold:2008}. Also, Axelrod’s model could not explain why “Live and Let Live” occurred only on some stretches of the front line \citep[180]{arnold:2008}. Therefore, Axelrod’s theory of the evolution of cooperation could not really add anything substantial to the historical explanation of the “Live and Let Live” by Tony \citet{ashworth:1980}.
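
The boundaries that the Prisoner’s Dilemma game allows can be stated explicitly. The following is only a brief sketch in the notation customary in the literature; the symbols are not introduced in the text above and are an assumption made here for illustration. Let $T$ denote the payoff for cheating a cooperating partner, $R$ the payoff for mutual cooperation, $P$ the payoff for mutual non-cooperation and $S$ the payoff for being cheated. A two person game then counts as a Prisoner’s Dilemma in the sense relevant for the reiterated game if
\[
  T > R > P > S \qquad \mbox{and} \qquad 2R > T + S .
\]
The chain of inequalities restates the ordering of outcomes given verbally above. The second condition ensures that taking turns at cheating each other does not pay better than continued mutual cooperation. Axelrod’s tournaments used the values $T=5$, $R=3$, $P=1$ and $S=0$, which is just one of infinitely many admissible choices. If the results of a model depend on the particular values rather than on the inequalities alone, as is argued above for Axelrod’s model, then an empirical application presupposes that these values can actually be measured.
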
The chapter from Axelrod’s book on the “Live and Let Live”-system shows that he did not understand his model only as a normative model, but at least also as an explanatory model. And the model was certainly understood as potentially explanatory by the biologists who were trying to apply it to cooperative behavior among animals (see below). The distinction is important, because the validation requirements for normative models are somewhat relaxed in comparison to explanatory models. After all, we would not expect from a model that is meant to generate advice for rationally adequate behavior to correctly predict the behavior of unadvised and potentially irrational agents. Still, even normative models must capture the essentials of the empirical situations to which they are meant to be applied well enough to generate credible advice. Here, too, robustness is an important issue. For similar reasons as in the descriptive case it would be dangerous to trust the advice given on the basis of a non-robust model. Thus, in contrast to Schelling’s model Axelrod’s model is neither robust nor can the postulated driving factors of the emergent phenomenon (stable cooperation) easily be identified empirically. In Schelling’s case the driving factor was the assumed tolerance threshold, in Axelrod’s case it is the payoff parameters of the Prisoner’s Dilemma. Therefore, two important prerequisites (robustness and empirical identifiability) for the application of a formal model to a social process appear to be absent in Axelrod’s case. The popularity of Axelrod’s computer tournaments had the consequence that it became a role model for much of the subsequent simulation research on the evolution of cooperation. It spawned myriads of similar simulation studies on the evolution of cooperation \citep{dugatkin:1997, hoffmann:2000}. Unfortunately, most of these simulation studies remained unconnected to empirical research. 
Axelrod had – most probably without intending it – initiated a self-sustaining modeling tradition where modelers would orientate their next research project on the models that they or others had published before without paying much attention to what kind of models might be useful from an empirical perspective. Instead it was more or less silently assumed that because of the generality of the model investigations of the reiterated Prisoner’s Dilemma model would surely be useful. How little contact the modeling tradition initiated by Axelrod had to empirical research becomes very obvious in a survey of empirical research on the evolution of cooperation in biology by \citet{dugatkin:1997}. In the beginning, Dugatkin lists several dozens of game theoretical simulation models of the evolution of cooperation, an approach to which Dugatkin himself is very favorable. However, none of the models can be related to particular instances of cooperation in animal wildlife. A seemingly insurmountable obstacle in this respect is that payoff parameters usually cannot be measured. It is just very difficult to measure precisely the increased reproductive success, say, that apes that reciprocate grooming enjoy over apes that don’t. The most serious attempt to apply Axelrod’s model was undertaken by \citet{milinski:1987} in a study on predator inspection behavior in shoal fishes like sticklebacks. When a predator approaches, it happens that one or two sticklebacks leave the shoal and carefully swim closer to the predator. The hypothesis was that if two sticklebacks approach the predator they play a Reiterated Prisoner’s Dilemma and make the decision to turn back based on a {\em Tit For Tat} strategy taking into account whether the partner fish stays back or not. This was tested experimentally by \citet{milinski:1987} as well as others \citep[59-69]{dugatkin:1998}. 
While in his 1987-paper Milinski himself believed that the hypothesis could be confirmed, it was after a long controversy ultimately abandoned. In a joint paper on the same topic that appeared ten years later \citet{milinski-parker:1997} do not draw on the RPD model any more. In fact they treat it as an unresolved question whether the observed behavior is cooperative at all. In a later discussion, Dugatkin explained the problem when linking the model research about cooperation to the empirical research in biology by the difficulty of establishing a feedback-loop between model research and empirical research \citep[57-58]{dugatkin:1998a}. The empirical results were never fed back into the model building process and the obstacles when trying to apply the models were never considered by the modelers. Without a feedback-loop between theoretical and empirical research, however, the model-building process soon reaches a stalemate where models remain detached from reality. The frustration about this kind of pure model research is well expressed in a polemical article by Peter \citet{hammerstein:2003}. “Why is there such a discrepancy between theory and facts?” asks \citet[83]{hammerstein:2003} and continues: “A look at the best known examples of reciprocity shows that simple models of repeated games do not properly reflect the natural circumstances under which evolution takes place. Most repeated animal interactions do not even correspond to repeated games.” In saying so, Hammerstein is by no means opposed to employing game theory in biology. It’s just that in the aftermath of Axelrod most simulation studies on the evolution of cooperation focused on the Reiterated Prisoner’s Dilemma or similar repeated games. 
This shows that the demand for empirical validation has an important side effect besides allowing to judge the truth and falsehood of the models themselves: It forces the modelers to concern themselves seriously with the empirical literature and the empirical phenomena that their models address. If they do so, there is hope that this will lead quite naturally to the choice of simulation models that address relevant questions of empirical research. Or, as \citet[92]{hammerstein:2003} nicely puts it: “Most certainly, if we invested the same amount of energy in the resolution of all problems raised in this discourse, as we do in the publishing of toy models with limited applicability, we would be further along in our understanding of cooperation.” Just how little model researchers care for the empirical content of their research is inadvertently demonstrated by a research report on the evolution of cooperation that appeared roughly 20 years after the publication of Axelrod’s first paper about his computer tournament \citep{hoffmann:2000}. There is only one brief passage where the author of this research report talks about empirical applications of the theory of the evolution of cooperation. And in this passage there is but one piece of empirical literature that the author quotes, the study on predator inspection in sticklebacks by \citet{milinski:1987}! Nevertheless, Hoffmann believes that the “general framework is applicable to a host of realistic scenarios both in the social and natural worlds” \citep[4.3]{hoffmann:2000}. Much more believable is Dugatkin’s summary of the situation: “Despite the fact that game theory has a long standing tradition in the social sciences, and was incorporated in behavioral ecology 20 years ago, controlled tests of game theory models of cooperation are still relatively rare. 
It might be argued that this is not the fault of the empiricists, but rather due to the fact that much of the theory developed is unconnected to natural systems and thus may be mathematically intriguing but biologically meaningless” \citep[57]{dugatkin:1998a}. That this fact could escape the attention of the modelers tells a lot about the prevailing attitude of modelers towards empirical research.

\section{An ideology of modeling}

The examples discussed previously indicate that simulation models can be a valuable tool to study some of the possible causes of some social phenomena. However, the examples also show that a) modeling approaches in the social sciences can easily fail to deliver resilient results, that b) social simulations are not yet generally embedded in a research culture where the critical assessment of the (empirical) validity of the simulation models is a salient part of the research process and that c) the significance of pure simulation results is likely to be overrated.

Unsurprisingly, simulation models in the social sciences excel when studying those causes that can be represented by a mathematical model as in the case of Schelling’s neighborhood segregation model. Part of the secret of Schelling’s success is surely that he had a good intuition for picking those example cases where mathematical models really work. But many of the causal connections that are of interest in the social sciences cannot be described mathematically. For example, the question of how the proliferation and easy accessibility of adult content on the internet shapes the attitude of youngsters towards love, sex and relationships is hardly a question that could be answered with mathematical models.
Or, if we want to understand what makes people follow orders to slaughter other people even in contradiction to their acquired moral codes \citep{Browning:1992}, then any reasonable answer to this question will hardly have the form of a mathematical model.\footnote{A good discussion of the respective merits and limitations of different research paradigms in the social sciences can be found in \citet{moses-knutsen:2012}.}

Unfortunately, the field of social simulations has by now become so specialized that modelers are hardly aware of the strong limitations of their approach in comparison with conventional, model-free methods in the social sciences. There is a widespread, though not always openly stated, belief that more or less everything can -- somehow -- be cast into a simulation model. Part of the reason for this belief may be the fact that with computers the power of modeling techniques has indeed greatly increased. This belief has found explicit expression in Joshua Epstein’s keynote address to the Second World Congress of Social Simulation under the title “Why model?” \citep{epstein:2008}. In the following I am going to discuss Epstein’s arguments and point out the misconceptions underlying this belief. In my opinion these misconceptions are to no small degree responsible for the misguided practices in the field of social simulations.

Epstein sets out by arguing that it is never wrong to model, because – as he believes – there exists only the choice between explicit and implicit models, anyway:
\begin{quote}
The first question that arises frequently -- sometimes innocently and sometimes not -- is simply, "Why model?" Imagining a rhetorical (non-innocent) inquisitor, my favorite retort is, "You are a modeler." Anyone who ventures a projection, or imagines how a social dynamic -- an epidemic, war, or migration -- would unfold is running some model.
But typically, it is an implicit model in which the assumptions are hidden, their internal consistency is untested, their logical consequences are unknown, and their relation to data is unknown. But, when you close your eyes and imagine an epidemic spreading, or any other social dynamic, you are running some model or other. It is just an implicit model that you haven’t written down (see Epstein 2007). ... The choice, then, is not whether to build models; it’s whether to build explicit ones. In explicit models, assumptions are laid out in detail, so we can study exactly what they entail. On these assumptions, this sort of thing happens. When you alter the assumptions that is what happens. By writing explicit models, you let others replicate your results. \citep[1.2-1.5]{epstein:2008}
\end{quote}

It is not entirely clear whether Epstein restricts his argument to projections, but even in this restricted form it is most likely false. It is simply not possible to cast everything that can be described in natural language into the form of a mathematical or computer model. Nor can we assume that this must be possible where projections into the future are concerned. It is of course always commendable to make one’s own assumptions explicit. But this does not require modeling. In addition, there are certain dangers associated with mathematical and computational modeling:
\begin{enumerate}
\item the danger of underrating or ignoring those causal connections that do not lend themselves to formal descriptions.
\item the danger of arbitrary ad hoc decisions when modeling causes of which we only have a vague empirical understanding. The necessity to specify everything precisely easily leads to the sin of false precision, which consists in assuming detailed knowledge where in fact there is none.
\item the danger of conferring a deceptive impression of understanding even if the model is not validated.
\item the shaping and selection of scientific questions by the requirements of modeling, rather than by other, arguably more important, criteria of relevance such as social impact or relevance for public policy.
\end{enumerate}

That Epstein mentions replicability as another advantage of explicit modeling is ironic given that it is still quite uncommon in published simulation studies to give a reference for the reader to access and replicate the model (as described further above). More worrisome, however, is Epstein’s attitude towards validation:
\begin{quote}
... I am always amused when these same people challenge me with the question, "Can you validate your model?" The appropriate retort, of course, is, "Can you validate yours?" At least I can write mine down so that it can, in principle, be calibrated to data, if that is what you mean by "validate," a term I assiduously avoid (good Popperian that I am). \citep[1.4]{epstein:2008}
\end{quote}
Calibration (i.e. fitting a model to data) is of course neither the same as nor a proper substitute for validation (testing a model against data), as Epstein knows. Validation in the sense of empirical testing of a model, hypothesis or theory is a common standard in almost all sciences, including those sciences mentioned earlier that usually do not rely on formal models, such as history, ethnology, sociology and political science. It is obviously not the case that validation presupposes explicit modeling, for otherwise history as an empirical science would be impossible.

Epstein furthermore advances 16 reasons for building models other than prediction \citep[1.9-1.17]{epstein:2008}. None of these reasons is exclusively a reason for employing models, though. The functions, for example, of guiding data collection or discovering new questions can be fulfilled by models and also by any other kind of theoretical reasoning.
Nor is it an exclusive virtue of the modeling approach “that it enforces a scientific habit of mind” \citep[1.6]{epstein:2008}. Here Epstein is merely articulating the positivistic stock prejudice of the superiority, if only of a didactic kind, of formal methods. Given what \citet{heath-et-al:2009} have found out about the lack of proper validation of many agent-based simulations one might even be inclined to believe the opposite about the simulation method’s aptitude to encourage a scientific habit of mind. It fits into the picture of a somewhat dogmatic belief in the power of modeling approaches that modelers consider the lack of acceptance of their method often as more of a psychological problem on the side of the recipients to be addressed by better propaganda \citep[2.11-2.12, 3.22-3.26]{barth-et-al:2012}, rather than a consequence of the still immature methodological basis of many agent-based simulation studies. This attitude runs the risk of self-deception, because one of the major reasons why non-modelers tend to be skeptical of agent-based simulations is that they perceive such simulations as highly speculative. As we have seen, the skeptics have good reason to do so. \section{Conclusions} It is in my opinion not least because of the abundance of simulations with low empirical impact that “social simulation is not yet recognized in the social science mainstream” \citep[abstract]{squazzoni-casnici:2013}. Why should a mainstream social scientist take simulation studies seriously, if he or she cannot be sure about the reliability of the results, because the simulations have never been validated? If modelers started to take the requirement of empirical validation more seriously, I expect two changes to occur – both of them beneficial: 1) Social simulations will become more focused in scope. 
Scientists will not attempt to cast everything, from classical social contract philosophy \citep{skyrms:1996, skyrms:2004} to, well, the whole world \citep{futureict:2013, livingearth:2013}, into the form of a computer simulation, but they will develop a better feeling for when simulations can be empirically validated and when not, and they will mostly leave out those problems where computer simulations cannot be applied with some hope of producing empirically applicable results. 2) Yet, while the simulation method will become more focused in scope, it will at the same time become much more useful in practice, because simulations will more frequently yield results that other scientists can rely on without needing to worry about their speculative character and potential lack of reliability.

\singlespacing
%\bibliographystyle{plainnat}
\bibliographystyle{apsr}
\bibliography{bibliography}

\end{document}