This is the seventh in a series of editorials covering all aspects of good science writing. The great statistician and graphical expert John Tukey said, “The greatest value of a picture is when it forces us to notice what we never expected to see.”[1] While many graphic forms can help us accomplish this goal, the most useful for science has proven to be the x-y scatterplot. In 2012, about one-third of all figures in JM3, and about 70% of all data plots, were x-y scatterplots.[2]

The first modern scatterplot is attributed to John Herschel (1792–1871), son of William Herschel, the discoverer of Uranus and infrared light.[3] In 1833, John Herschel used a scatterplot of noisy binary star measurements to extract a trend “by bringing in the aid of the eye and hand to guide the judgment,”[4] thus fulfilling Tukey’s goal. The scatterplot allows the viewer to visualize the important trends the data suggests, and possibly offer a theory to explain them, by imagining a line that passes “not through, but among them,” as Herschel so aptly said.[4] By 1920, the scatterplot had come into widespread use as the tool of science we know it now to be.

The x-y scatterplot is “a diagram having two variates plotted along its two axes and in which points are placed to show the values of these variates for each of a number of subjects, so that the form of the association between the variates can be seen.”[5] If the x-axis plots time, we generally call the graph a time-series plot and often use unique analysis or interpretive frameworks for the data due to the unique role of time in causality. Here I’ll address only the more general x-y scatterplot and not time-series plots specifically. I’ll also (mostly) ignore the role of x-y scatterplots as a projection of multivariate data (three or more variables), as interesting and important as that role is, and instead concentrate on the basics of this most popular of science graphs.

What makes for a good x-y scatterplot?
As for all graphs, the goal should be to allow the data to tell its story efficiently and effectively. The first rule of a graph is that it must help to reveal the truth.[2] The design and execution of an x-y scatterplot can either help or hinder this goal. And while graphs can aid both in data exploration and data presentation, I’ll focus only on the latter here. Since I gave general advice on good graphics in part 1 of this editorial, here I will strive to be more specific through the use of examples.

Though I have only anecdotal evidence, I am quite confident that most JM3 authors use Microsoft Excel to create their x-y plots (as well as most other graphs in their papers). Thus, my first example will explain how to turn the seriously awful default scatterplot of Excel into an acceptable graph for submission to JM3, or any other scientific journal. My example will be simple: a plot of (made-up) experimental data along with an equation that models that data. The before and after plots are shown in Fig. 1.

Here is the sequence of steps I went through in Excel to move from the default to the final graph. I’m assuming that the final graph will fit within a single column in a two-column-per-page format. For journals with other page formats, some adjustments to these directions may be required.
That’s a lot of steps. But every step left out produces a less adequate graph. Note that some of these steps can be described as aesthetic, though making a graph more pleasing to the eye is generally synonymous with making it more readable. For example, the open circle data symbols enable one to see behind the symbol to the line and to other data points. In the original graph with the solid square symbols, can you tell how many data points there are where the symbols overlap? When using more than one symbol, be sure to consider the symbols’ size and shape for maximum visibility when there is overlap.

The next example (Fig. 2) shows how labels can sometimes be fit into the graph to avoid the need to refer back and forth to a legend.

A regular problem I encounter is a graph with data that fails to use up the space in the plot area. In Fig. 3, the authors wish to show how stable their laser is, so they stretch the y-axis range to be ten times the data range. As a result, we can’t see the variation in the data. So why bother showing the graph? A similar effect can be obtained by including zero on the y-axis scale even though no data are near zero (imagine a plot of Earth’s global surface temperature in Kelvin, then starting the y-axis at zero; global warming would disappear). This is an example of advocating rather than informing: using graphs to hide rather than reveal the truth. If there is nothing in the data worth seeing, the graph should be replaced with simple statistics: mean, standard deviation, min/max of the output, and maybe a statement that a linear regression gave a slope that was not statistically different from zero. If there is something worth seeing in the data, then adjust the y-axis scale so that it can be seen.

There are other ways to mislead with an x-y scatterplot, some not as subtle as the previous example. Unitless axes are a favorite of those who, at a minimum, do not wish to reveal the whole truth. An axis without unambiguous labeling should never be allowed.
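Returning to the advice on data that fail to fill the plot area, the following Python sketch illustrates both options: choosing axis limits that hug the data, and computing the simple statistics that could replace an uninformative graph. It is only a sketch under stated assumptions. The function names `tight_axis_limits` and `summary_statistics`, the 5% padding, and the sample laser readings are all made up for illustration and are not taken from Fig. 3. The code reports the slope and its standard error but leaves the judgment of statistical significance to the reader.

```python
import math
import statistics


def tight_axis_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits so the data nearly fill the plot area.

    The 5% default padding is an arbitrary choice made for this sketch.
    """
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All values identical: fall back to a window of width 1 (an assumption).
        span = 1.0
    pad = pad_fraction * span
    return low - pad, high + pad


def summary_statistics(x, y):
    """Statistics that could be reported in place of a featureless graph."""
    if len(x) != len(y) or len(y) < 3:
        raise ValueError("need at least three paired (x, y) points")
    n = len(y)
    x_mean = statistics.fmean(x)
    y_mean = statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    # Ordinary least-squares fit of y = intercept + slope * x.
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Standard error of the slope from the residual sum of squares.
    residual_ss = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)

    return {
        "mean": y_mean,
        "stdev": statistics.stdev(y),
        "min": min(y),
        "max": max(y),
        "slope": slope,
        "slope_se": slope_se,
    }


if __name__ == "__main__":
    # Made-up laser output readings (arbitrary illustration data).
    hours = [0, 1, 2, 3, 4, 5]
    power = [10.02, 9.98, 10.01, 9.99, 10.03, 9.97]
    print("y-axis limits:", tight_axis_limits(power))
    print("statistics:", summary_statistics(hours, power))
```

A slope that is small compared with its standard error is consistent with the "not statistically different from zero" statement suggested above, though a formal test would compare their ratio against a t-distribution with n − 2 degrees of freedom.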
Using “arbitrary units” for a y-axis is a bit trickier, since there are some cases where such a label is appropriate (a relative measure, based on a local uncalibrated standard that can be used to compare similar measurements). A common example is the relative intensity used in spectral analysis. Arbitrary units are never preferred, but are sometimes necessary. Arbitrary units should never be used to hide known units that the author does not want to reveal. Additionally, arbitrary units have an arbitrary scale, but not an arbitrary zero point. Thus, when arbitrary units are used the graph must mark the zero point on the scale.

One common and important application of the x-y scatterplot is to compare different graphs (thus adding a third variable, sometimes more). Figure 4 shows an array of graph multiples, matching x-axis and y-axis scales to allow easy comparison. With small multiples, many more graphs can be compared.

Figure Quality from a Production Standpoint

The final step in ensuring a good quality figure in your published paper is to make sure the submitted figure matches the production requirements of the journal. I’ll talk here specifically about JM3 requirements, but I don’t think they are much different from most other journals. A few of the largest publications, such as Nature or Science, employ professional editors who can reset a graph to the standards of the journal. For most publications, however, it is up to the author to get the graph right. Below are some hints, given to me by the SPIE publications staff, that will make the production process go more smoothly, and the resulting graph higher quality.
Conclusions

When presenting results, a good graph is like a good scientific theory: once you see it, everything just makes sense. But arriving at such a point takes care and consideration. In part 1 of this series on figures, I talked at a high level about what makes for good graphics. Here, I provided more pragmatic advice geared toward a specific type of graph, the ubiquitous x-y scatterplot. Keeping in mind the advice from both parts of this pair of editorials will, I hope, lead to graphs that help you, the author, achieve your goal of effective and efficient communication.

References

1. John W. Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA (1977).
2. Chris A. Mack, “How to write a good scientific paper: Figures, part 1,” J. Micro/Nanolith. MEMS MOEMS, 12(4), 040101 (2013). http://dx.doi.org/10.1117/1.JMM.12.4.040101
3. Michael Friendly and Daniel Denis, “The Early Origins and Development of the Scatterplot,” Journal of the History of the Behavioral Sciences, 41(2), 103–130 (2005). http://dx.doi.org/10.1002/(ISSN)1520-6696
4. John F. W. Herschel, “On the investigation of the orbits of revolving double stars,” Memoirs of the Royal Astronomical Society, 5, 171–222 (1833).
5. The Oxford English Dictionary, 2nd ed., Oxford Univ. Press (1989).
6. Chris A. Mack, “Systematic Errors in the Measurement of Power Spectral Density,” Journal of Micro/Nanolithography, MEMS, and MOEMS, 12(3), 033016 (2013). http://dx.doi.org/10.1117/1.JMM.12.3.033016