Meteorological shenanigans

Last updated 2013-06-21 16:29:58 SGT

The NEA issues hourly updates of the PSI. Ordinarily this happens from 7 AM to 7 PM, but as a result of the haze situation intensifying lately, these operating hours have been extended to 6 AM - midnight. Also as a result of that haze situation, the NEA server has come under significantly heavier load than usual.

Today, however, there was a great deal of excitement because by around 8 PM, the PSI had swollen to 190; at 9 PM the NEA announced a reading of 290.

A little background on the NEA's readings: these readings are given in terms of 3-hour averages, for reasons arcane (probably PR and panic control, just like the delay in election result announcements last GE). Ordinarily, an averaged value is given for time-series data in order to smooth out microscopic perturbations, revealing macroscopic statistical trends in the process. Actually, this should begin raising some alarm bells. Since, prima facie, the function of such a time average is merely to force stochasticity by arbitrarily increasing sample size, it doesn't really make sense to issue 3-hour averages at 1-hour intervals. But I digress, for now.

The NEA issues PSI updates on the hour, every hour. Or, well, it tries to. Usually, there is a delay of around 7 minutes, give or take another 7 minutes. However, at 10 PM, the NEA took longer than usual. Much, much, much longer.

twitter feed

At the same time, random Singaporeans were assaulting the NEA PSI information page with the refresh button, and saw this:

Screenshot of alleged tampering

So if you've been reading random online forums (I'm looking at you, EDMW) there's been considerable suspicion that the numbers released now are inaccurate, probably in an attempt to soothe the general populace. Let's take a look at that.

Instability

Purportedly, numerical propaganda (such as it is) is meant to prevent instability, but that's not what I'm talking about. A time-series averaged over more than one time step has a quirky behaviour of cyclic instability. To see how it works, perhaps it'd be illustrative to show a few pictures of an example scenario. Suppose that the PSI were, in some universe, constant.

Let's say that we have a game of information asymmetry between a bureaucrat, who reports 3-hour PSI values, and a citizen, who reconstructs 1-hour averages from that data. The air is somewhat clean and full of auspiciousness, so the PSI hovers at around 77. In these plots, the [x]-axis represents time progression, the [y]-axis represents the PSI, the blue line represents the values reported by the bureaucrat, and the red line represents the value reconstructed by the citizen. Ideally, a plot of PSI over time would look something like this:

flat graph

Nothing too surprising so far. Now suppose that the bureaucrat somehow decides that underreporting the PSI for one hour will increase his performance bonus. It's only for an hour and will have no long-lasting ramifications! Right? Right?

Cyclic pattern

In this scenario the bureaucrat has underreported the 3-hour PSI at hour 7 by 10 points. This causes a cyclical pattern to emerge from the retrodicted data: a swing down by 30 points, followed by a swing up by 30 points, followed by no net change. This instability is cyclical, with a period of 3 time steps. In general, a time-series averaged over [n] time steps, underreported by [m] points for one time step, will exhibit such cyclic behaviour with a period of [n] time steps and an amplitude of [n m] in terms of deviation from the original data.
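For anyone who wants to poke at this without opening a spreadsheet, here is a minimal Python sketch of the toy scenario. It assumes the reported figure is a trailing 3-hour mean and that the citizen somehow knows the first two hourly values; the function names and the 12-hour window are mine, purely for illustration.

```python
def three_hour_averages(hourly):
    """Trailing 3-hour mean for every hour from index 2 onward."""
    return [sum(hourly[i - 2:i + 1]) / 3 for i in range(2, len(hourly))]


def reconstruct(averages, first_two):
    """Rebuild hourly values from trailing 3-hour means.

    Assumes the first two hourly values are known (in real life they are not).
    """
    hourly = list(first_two)
    for avg in averages:
        # avg = (x_t + x_{t-1} + x_{t-2}) / 3, solved for x_t
        hourly.append(3 * avg - hourly[-1] - hourly[-2])
    return hourly


true_psi = [77.0] * 12                 # constant PSI, hours 0..11
reported = three_hour_averages(true_psi)

# reported[k] is the average published at hour k + 2, so hour 7 is index 5.
reported[5] -= 10                      # the bureaucrat shaves 10 points once

rebuilt = reconstruct(reported, true_psi[:2])
errors = [r - t for r, t in zip(rebuilt, true_psi)]
print(errors)
```

The deviation from the true values comes out as 0 up to hour 6, then -30, +30, 0, -30, +30 from hour 7 onward: one 10-point fib in the average never dies out, it just keeps cycling with period 3.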

So with that out of the way, let's look at Singapore's PSI-time graph. Owing to the cyclical, unstable nature of error propagation, it's difficult to introduce fixed upper and lower bounds; the faint lines represent different initial conditions (which were unknown) while the red line represents initial conditions that made for the smoothest curve (i.e. reduced period-3 instability, and therefore most probably correct).

Suspicious graph
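The "smoothest curve" selection can be sketched roughly as follows. This is not necessarily what my spreadsheet does; it is one plausible way to do it, assuming smoothness is scored by the sum of squared second differences and that the two unknown starting values are searched over a hypothetical grid of candidates.

```python
def reconstruct(averages, first_two):
    """Rebuild hourly values from trailing 3-hour means and two starting values."""
    hourly = list(first_two)
    for avg in averages:
        hourly.append(3 * avg - hourly[-1] - hourly[-2])
    return hourly


def roughness(series):
    """Sum of squared second differences; zero for a perfectly straight line."""
    return sum(
        (series[i + 1] - 2 * series[i] + series[i - 1]) ** 2
        for i in range(1, len(series) - 1)
    )


def smoothest_reconstruction(averages, candidates):
    """Try every pair of starting values and keep the least jagged result."""
    best_score, best_series = None, None
    for a in candidates:
        for b in candidates:
            series = reconstruct(averages, (a, b))
            score = roughness(series)
            if best_score is None or score < best_score:
                best_score, best_series = score, series
    return best_series


# Made-up test data: a steady ramp of 2 points per hour, averaged over 3 hours.
ramp = [100 + 2 * i for i in range(10)]
averages = [sum(ramp[i - 2:i + 1]) / 3 for i in range(2, len(ramp))]
recovered = smoothest_reconstruction(averages, range(90, 111))
```

A wrong guess for the starting values shows up as a period-3 wobble superimposed on the whole curve, which is exactly what the roughness score penalises; that is why the least wobbly reconstruction is the most believable one.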

It is immediately apparent that this graph exhibits features identical to the ideally tampered graph above. What are the odds that this was indeed a natural occurrence? We might construct a rather compelling scenario that the detector was set upon by a malicious smog cloud that beat it up for an hour and left suddenly, which would explain the spike and the regression to the mean. Unfortunately, smog clouds do not leave behind regions of (relatively) pollutant-free air. That is to say, it is highly probable that this graph indicates the data has been tampered with.

Counterintuitively, though, this graph does not suggest that the PSI was underreported at 10 PM. It suggests, instead, that the PSI was heavily underreported at 8 PM, by about 30 points (give or take 10 or so). Adjusting for this error gives a corrected graph that looks like this:

less suspicious graph

I do not claim that this is what the actual PSI data was. This is, after all, circumstantial evidence at best, like Benford's Law tests, except this is much, much weaker. However, what I do claim is that cyclic instabilities of period 3 indicate tampering. Conveniently enough, the NEA ceased issuing PSI updates after 1 AM this morning, and so we have yet to observe more than two periods' worth of any such cycle after 10 PM. This is compounded by the fact that it is entirely possible for the PSI to have gone up and down in the manner shown in our reconstructed graph; however, such large variations obscure small instabilities. In the absence of further evidence, this is pretty inconclusive.

Caveats

But wait, Joel! What happened to that screenshot? Shouldn't there have been some doctoring at around 10 PM as well?

Well, yes, that would make for quite a good story; however, the mathematics doesn't back it up. If the PSI at 2200h were underreported by [393 - 321 = 72] points, that would lead to a cyclical instability of amplitude [3 \times 72 = 216] points. The cyclical nature of error propagation requires such lies to beget more lies, because then our retrodicted 1-hour PSI for 2300h would have to be negative, in order to corroborate with the official data. We are forced to conclude that, if data manipulation of such magnitude ever occurred for a single data point, then subsequent data points must also have been falsified in order to maintain internal consistency.
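Spelled out, the argument above goes roughly like this. The notation is mine and assumes a trailing 3-hour window: [\bar{x}_t] is the official average at hour [t], [x_t] is the 1-hour value retrodicted from the official data, and primed quantities are what we would get if the 2200h average had really been 72 points higher, with everything else left alone.

```latex
\begin{align*}
\bar{x}_t &= \tfrac{1}{3}\,(x_t + x_{t-1} + x_{t-2})
  \quad\Longrightarrow\quad
  x_t = 3\bar{x}_t - x_{t-1} - x_{t-2} \\
x'_{22} &= 3(\bar{x}_{22} + 72) - x_{21} - x_{20} = x_{22} + 216 \\
x'_{23} &= 3\bar{x}_{23} - x'_{22} - x_{21} = x_{23} - 216 \\
x'_{24} &= 3\bar{x}_{24} - x'_{23} - x'_{22} = x_{24}
\end{align*}
```

So the 2300h value comes out negative whenever the retrodicted [x_{23}] is below 216, unless the 2300h average was adjusted as well.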

Moreover, note that this is not a rigorous analysis, nor is it to be taken as such. There are too many unknowns here for concrete certainty. For one, we have assumed that, aside from a single data point, all NEA data was accurate. However, if, as we assert, they have fabricated at least one data point, then the validity of this assumption is questionable, because they could always fabricate more. Furthermore, any evidence I have is questionable at best. Since the incident in question happened at night, independent verification based on visibility distance was difficult. Had it happened in the day, I suspect the NEA might have been forced to be more transparent.

Finally, the most important issue is one of trust. Consider the following: as things stand, the visibility right now (at 9 AM, 20130620) is (by my reckoning) on the order of 300 m - qualitatively worse than yesterday, when the PSI was 170 at 1400h; and yet the official 3-hour PSI right now is much lower, at 130 points (EDIT: like so). One might attribute this to classical smog, with moisture persisting for longer than usual as a result of diminished solar flux. However, without reliable official corroboration or proper independent verification, it is difficult to be sure; conversely, if official numbers are known, or even suspected, to have previously been fabricated, such a history can and will cast aspersions on present figures, however accurate. This cannot but hurt the NEA in the long run.

Ultimately, the NEA cannot benefit from avoiding questions from the public. It should directly address questions about its honesty and competence (like "the PSI hit 450 why neh say properly") with detailed responses, and be ready to dispel any misconceptions about its transparency and methodology that the public may have, including any that I may have mistakenly fallen victim to here. I suppose that's more or less the only thing it can reasonably expect to do at this point to stop rumour mills.

come at me bro

For purposes of independent verification, I have uploaded my calculations in ODS format.

EDIT: Piecewise linearity

Redditor starfall-invoker points out that the PSI is a piecewise-linear function of the PM10 metric (among others) with slopes differing across different regimes. However, that does not detract from the validity of this analysis. The existence of such cyclic instabilities is a consequence of internal inconsistency in linear time-averaged data, and does not depend on the conversion between PM10 and PSI.

Actually, that also suggests a reason for the screenshot that's been floating around. Given that one of the breakpoints in the PSI spec is at 300 points, it seems likely that, having never encountered PM10 values above that breakpoint, they simply used the wrong conversion function (human error), resulting in an erroneous value that was quickly fixed.

EDIT2: update

