A few days after I put up my little post on a bit of V&V history, Judith Curry had a post about VV&UQ that generated a lot of discussion (she has a high-traffic site, so this is not unusual) [1].
A significant part of the discussion was about a post by Steve Easterbrook on the (in)appropriateness of IV&V (“I” stands for independent) or commercial / industrial flavored V&V for climate models. As George Crews points out in the discussion on Climate Etc., there’s a subtle rhetorical sleight of hand that Easterbrook uses (which works for winning web-points with the uncritical, because different communities use the V-words differently; see the introduction to this post of mine). He claims it would be inappropriate to uncritically apply the IV&V processes and formal methods he’s familiar with from developing flight control software for NASA to climate model development. Of course, he’s probably right (though we should always be on the look-out to steal good tricks from wherever we can find them). This basically correct argument gets turned into “Easterbrook says V&V is inappropriate for climate models” (by all sorts of folks with various motivations; see the discussion thread on Climate Etc.). The obvious question for anyone who’s familiar with even my little Separation of V&V post is, “which definition of V or V?”
Easterbrook is now on record agreeing that the sort of V&V done by the computational physics community is appropriate. This is good. Easterbrook seemed to be arguing for a definition of “valid” that meant “implements the theory faithfully” [2]. This is what I’d call “verification” (are you solving the governing equations correctly?). The problem with the argument built on that definition is the conflation I pointed out at the end of this comment, which is kind of similar to the rhetorical leap mentioned in the previous paragraph and displayed on the thread at Climate Etc.
Now, his disagreement with Dan Hughes (who recommends an approach that I find makes a great deal of sense, and that I've used in anger to commit arithmurgical damage on various and sundry PDEs) is that Dan thinks we should have independent V&V, and Steve thinks not. If all you care about is posing complex hypotheses, then IV&V seems a waste. If you care about decision support, then it would probably be wise to invest in it (and in our networked age much of the effort could probably be crowd-sourced, so the investment can conceivably be quite small). This actually has some parallels with the decision support dynamics I highlighted in No Fluid Dynamicist Kings in Flight Test.
One of the other themes in the discussion is voiced by Nick Stokes. He argues that all this V&V stuff doesn’t result in successful software, or that people calling for developing IV&V’d code should shut up and do it themselves. One of the funny things is that if the code is general enough that the user can specify arbitrary boundary and interior forcings, then any user can apply the method of manufactured solutions (MMS) to verify the code. What makes Nick’s comments look even sillier is that nearly all available commercial CFD codes are capable of being verified in this way, thanks in part to the efforts of folks like Patrick Roache [3], and it is now a significant marketing point, as Dan Hughes and Steven Mosher point out. Nick goes on to say that he’d be glad to hear of someone trying all this crazy V&V stuff. The fellows doing ice sheet modeling that I pointed out on the thread at Easterbrook’s are doing exactly that. This is because the activities referred to by the term “verification” are a useful set of tools for the scientific simulation developer, in addition to being a credibility building exercise (as I mentioned in the previous post; see the report by Salari and Knupp [4]).
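To make the MMS idea concrete, here is a minimal sketch in Python for a problem small enough to fit in a blog post: a 1D steady diffusion equation with a manufactured solution. The solver, grid sizes, and function names are hypothetical illustrations standing in for the code being verified, not anything from a production code; the point is just that once you can specify an arbitrary interior forcing, verification against a known solution is mechanical.

import numpy as np

# Method of manufactured solutions (MMS) sketch for -u''(x) = f(x) on [0,1]
# with u(0) = u(1) = 0. Manufacture u(x) = sin(pi x); plugging it into the
# governing equation gives the required interior forcing f(x) = pi^2 sin(pi x).

def solve_diffusion(forcing, n):
    """Second-order central-difference solve of -u'' = f with homogeneous
    Dirichlet boundaries on an n-interval uniform grid (hypothetical solver
    standing in for the code under test)."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    main = 2.0 * np.ones(n - 1) / h**2
    off = -1.0 * np.ones(n - 2) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, forcing(x[1:-1]))
    return x, u

u_exact = lambda x: np.sin(np.pi * x)            # manufactured solution
f_mms = lambda x: np.pi**2 * np.sin(np.pi * x)   # forcing it implies

errors = []
for n in (16, 32, 64, 128):
    x, u = solve_diffusion(f_mms, n)
    errors.append(np.max(np.abs(u - u_exact(x))))

# Observed order of convergence between successive grid halvings; it should
# approach the design order (2) if the implementation is correct.
for e_coarse, e_fine in zip(errors, errors[1:]):
    print("observed order:", np.log(e_coarse / e_fine) / np.log(2.0))

The same mechanics are what make commercial codes verifiable: user-defined source terms and boundary conditions are exactly the hooks MMS needs.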
Of course, doing calculation verification is the responsibility of the person doing the analysis (in sane fields of science and engineering anyway). So the “shut up and do it yourself” response only serves to undermine the credibility of people presenting results bereft of such analysis to decision makers. On the other hand, the analyst properly has little to say on the validation question. That part of the process is owned by the decision maker.
References
[1] Curry, J., Verification, Validation and Uncertainty Quantification in Scientific Computing, http://judithcurry.com/2011/09/25/verification-validation-and-uncertainty-quantification-in-scientific-computing/, Climate Etc., Sunday 25th September, 2011.
[2] Easterbrook, S., Validating Climate Models, http://www.easterbrook.ca/steve/?p=2032, Serendipity, Tuesday 30th November, 2010.
[3] Roache, P., Building PDE Codes to be Verifiable and Validatable, Computing in Science and Engineering, IEEE, Sept-Oct, 2004.
[4] Knupp, P. and Salari, K., Code Verification by the Method of Manufactured Solutions, Tech. Rep. SAND2000-1444, Sandia National Labs, June 2000.
I meant to link the Models as Ink Blots post too.
The cost of IV&V must be weighed against its benefits (largely risk reduction/confidence building). Therefore, it is always necessary, based on the nature and usage of the software, to tailor the SQA requirements and processes (of which IV&V is an important component).
For the GCMs, you clearly point out that we run the full gamut of suggested tailoring -- from "any IV&V is impossible" at one extreme to "formally prove the software correctly executes" at the other.
What are the reasons for this? Well, tailoring is subjective. So we can encounter issues if 1) there is incomplete understanding of the nature of IV&V and its role in SQA, or 2) there is ethically/politically/financially motivated obfuscation.
So a consensus tailoring will be difficult. But, IMHO, a big benefit of tackling the problem directly would be to draw the subjectivity away from the climate science itself, where it is having a corrosive effect. And I am confident this is the thing to do. I have had some experience with IV&V tailoring consensus building in a contentious area (nuclear) and know it can be done.
George, I think I'm familiar with the sort of consensus process you are talking about. I think consensus on verification is achievable. One of the significant barriers to this working for validation is the lack of shared purpose and trust among the various stakeholders, along with their diverse risk appetites. As Jaynes points out, all of us good Bayesians shouldn't necessarily expect convergence of views when political motivations become significant.
I am interested in hearing more about your experience in consensus building though.
Hi Josh,
I've used this SCOTUS-style technique when presenting the "consensus" IV&V process/requirements tailoring to stakeholders for their acceptance. Obfuscation is more difficult using this approach.
Quality hinges on its corrective action processes. By putting forward a diversity of opinion, later necessary corrective actions become easier. Yet the documented majority tailoring opinion provides the justification for moving forward in the face of risk.
Minor nit: crow-sourced is probably crowd-sourced unless there is some kind of corn crop involved. (I'm just catching up on some blogs after spending two weeks in conferences myself.)
Haha, maybe it could be crow-sourced.
Interesting: How I Learned to Stop Worrying and Love [the DOE's approach to verifying and validating models of] the Bomb.
George, I agree with you for the most part. VV&UQ, properly understood, is about risk management (or credibility building).
For certain codes (numerical PDE solvers for instance), we can say that we've established the correctness of the implementation by demonstrating the design order of convergence. Just because it's intractable to prove the correctness of software in general doesn't mean we can't prove some useful limited or special cases.
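For what it's worth, one compact way to write down that check (my shorthand, not a quote from anyone above): with error norms e_h and e_rh measured against an exact or manufactured solution on two grids related by a refinement ratio r, the observed order of convergence is

\[
p_{\text{obs}} = \frac{\ln\left( e_{rh} / e_{h} \right)}{\ln r}
\]

and demonstrating correctness in this limited sense just means showing that p_obs approaches the scheme's design order as the grids are refined.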