I might just not like press releases (or press write-ups)

My LinkedIn feed presented me with a write-up of work done at the US Department of Energy’s Pacific Northwest National Laboratory by a website called Verdict, titled A deep neural network is being harnessed to analyse nuclear events:

[T]he data is shrouded in external noise, which can hinder the discovery of more uncommon signals. Even a light switch being turned on in a building can produce noise and subsequently affect the data.

The write-up isn’t actually very informative, giving no information about how “deep” the network is, but since it is

running on a standard desktop computer

I suspect not too deep. Although it is of course entirely possible that it “runs on a desktop computer” for the (much cheaper) classification task but needs to be trained on something more powerful.

This doesn’t stop the write-up from proclaiming that

Deep learning is likely to become the AI technology that allows cognitive systems to surpass human intelligence for specific applications.

And I have to admit that I really don’t understand the described training procedure:

A sample of 32,000 pulses was used to adapt the network, programming it to learn the changing features the pulses exhibited that would be critical when interpreting the data. Jesse Ward then sent over thousands of additional pulses so that the network could begin to deduce what signals were good and which were bad; as time progressed, the more complex the pulses became.

What exactly is the difference between the first 32k pulses and how they were used, and the ones afterwards? It sounds to me a bit as if the first step is a feature construction one – maybe using an auto-encoder network – and the second one of discriminative learning. But yeah, far from clear.
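Reading between the lines, one plausible interpretation of the two phases is unsupervised feature learning (e.g. an auto-encoder) on the first 32k pulses, followed by supervised discriminative training afterwards. A minimal numpy sketch of that interpretation, on made-up “pulse” waveforms (the data, the network sizes and the learning rates are all my assumptions, not PNNL’s setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pulses": 200 noisy waveforms of length 32; the two classes differ
# only in pulse width (purely illustrative, not the PNNL detector data).
n, d, h = 200, 32, 8
t = np.linspace(0, 1, d)
labels = rng.integers(0, 2, n)
widths = np.where(labels == 0, 0.05, 0.15)
X = np.exp(-((t[None, :] - 0.5) ** 2) / widths[:, None] ** 2)
X += 0.1 * rng.standard_normal((n, d))

# Phase 1 -- unsupervised "adaptation": a one-hidden-layer auto-encoder
# learns a compact representation of the pulses; labels are never used.
W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((h, d)); b2 = np.zeros(d)
for _ in range(500):
    Z = np.tanh(X @ W1 + b1)            # encoder
    err = (Z @ W2 + b2) - X             # linear decoder, reconstruction error
    dZ = (err @ W2.T) * (1 - Z ** 2)    # backprop through tanh
    W2 -= 0.01 * (Z.T @ err / n); b2 -= 0.01 * err.mean(0)
    W1 -= 0.01 * (X.T @ dZ / n);  b1 -= 0.01 * dZ.mean(0)

# Phase 2 -- discriminative learning: logistic regression on the learned
# features, now using the labels.
F = np.tanh(X @ W1 + b1)
w, b = np.zeros(h), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))
    g = p - labels
    w -= 0.1 * (F.T @ g / n); b -= 0.1 * g.mean()

acc = ((p > 0.5) == labels).mean()
```

The point is only the structure: phase 1 never touches the labels, phase 2 trains a classifier on whatever representation phase 1 learned.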

There’s unfortunately no link to any scientific publication in the write-up, so I’ll add one from 2015: Bockermann et al. Online Analysis of High-Volume Data Streams in Astroparticle Physics (pdf) which tackles the problem of

A central problem in all these experiments is the distinction of the crucial gamma events from the background noise that is produced by hadronic rays and is inevitably recorded. This task is widely known as the gamma-hadron separation problem and is an essential step in the analysis chain.

i.e. something very similar to the problem described above. They did it by introducing

the fact-tools – our high-level framework to model the data flow, which integrates state of the art tools such as WEKA and MOA to incorporate machine learning for various tasks.

Enjoy the read!

“Fake” news can indeed fool this new algorithm, “fake” news are in the eye of the beholder, and why all of this is a problem

“Fake” news detection is a big topic ever since the 2016 US presidential elections and the Brexit vote, and the claims of the respective losing sides that people had been tricked by “fake” news to vote for the winner.

Now the University of California, Riverside put out a rather strongly worded press release, titled Fake News Can’t Fool New Algorithm. I am not going to comment on the method but I have serious misgivings about the evaluation – misgivings that are by far not limited to this paper (pdf) but that apply to the entire “fake” news detection setting.

My first problem is with this statement from the press release:

The team members put three sets of articles— two public datasets and their own collection of 63,000 news articles— through their algorithm and found that it accurately sorted articles into fake news categories 75 percent of the time.

I know that this is a press release and that those have a problem with representing research correctly but still:

  1. The data set the authors created, and on which they perform most of their experiments, is highly imbalanced: 31,739 “fake” and 409,076 “real” news articles. To get around this, they down-sample the majority class, which can be defended when it comes to training data (since one could know the articles’ labels at the time of model building) but not for test data (when labels are unknown)1.
  2. Given this ideal setting, in which both categories are balanced, they then achieved a precision of 73%, i.e. 27% of news classified as “fake” were actually “real” because they looked too similar – that’s a big problem because it risks censoring a lot of legitimate information that looks dissimilar to the mainstream.
  3. Furthermore, recall was only at 74%, i.e. 26% of “fake” news escaped detection.
  4. So an arguably better way of summarizing the performance of the method is: under ideal conditions, the method gets more than one in four stories wrong. Or, in other words, “fake” news fool this new algorithm.
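To put a number on how much the balanced test set flatters the method: if the per-class error rates measured there (73% precision and 74% recall imply a false-positive rate of about 27%) carried over to the collection’s true class proportions, precision would collapse. A back-of-the-envelope sketch (the carry-over of error rates to real-world prevalence is my assumption):

```python
# Figures reported at the balanced (down-sampled) evaluation.
precision_bal = 0.73
recall = 0.74  # recall does not depend on class balance

# At balance, positives == negatives, so per one article of each class:
tp = recall                                     # true positives per fake article
fp = tp * (1 - precision_bal) / precision_bal   # false positives per real article
fpr = fp                                        # negatives == positives at balance

# Re-apply the same per-class error rates at the data set's true imbalance.
n_fake, n_real = 31_739, 409_076
tp_rw = recall * n_fake
fp_rw = fpr * n_real
precision_rw = tp_rw / (tp_rw + fp_rw)
# precision_rw comes out around 0.17: at realistic prevalence, the large
# majority of articles flagged as "fake" would actually be "real".
```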

Continue reading

The conceptual underpinnings of machine intelligence

I’ve just discovered a series of very interesting posts by Peter Sweeney on Medium, in which he interrogates the conceptual and arguably philosophical underpinnings of machine intelligence (or a bit more narrowly, the current research in machine learning).

Especially the last post got me thinking quite a bit: he juxtaposes ML predictions and the generated “knowledge” with the scientific approach to knowledge generation, and I’m a bit surprised that he doesn’t mention active learning in this discussion, which to me seems relatively close to the Popperian scientific approach: have a hypothesis, test an example for which the result is not clear (or that you expect to violate the hypothesis), adjust the hypothesis if necessary.
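A minimal uncertainty-sampling loop makes the parallel concrete: fit a hypothesis, query the instance whose outcome the hypothesis is least sure about, refit. (The toy data and the scikit-learn model choice are mine, not Sweeney’s.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy pool: two overlapping 2-D Gaussian blobs (purely illustrative data).
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

labeled = [0, 1, 200, 201]                     # tiny seed set, both classes
pool = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression()
for _ in range(20):
    clf.fit(X[labeled], y[labeled])            # current "hypothesis"
    probs = clf.predict_proba(X[pool])[:, 1]
    # Query the instance whose outcome the hypothesis is least sure about,
    # i.e. the example most likely to force a revision.
    i = pool[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(i)
    pool.remove(i)

acc = clf.score(X, y)
```

With only two dozen labels the classifier's accuracy on the whole pool is already close to what full supervision would give, because the queried examples concentrate around the decision boundary.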

And I wouldn’t be me if I didn’t think that the goal of knowledge generation could be helped by a) using learned models to generate data (if possible) and sanity-checking them, and b) generating artificial data that should give certain results and seeing what the model/approach makes of them.

In addition: I don’t work on goal-oriented ML (predictive learning or reinforcement learning) very much, even though this blog might lead one to believe otherwise, but instead on unsupervised data mining.
We like to flatter ourselves that the results of our techniques are hypothesis-generating in that they basically just point out: “this relationship is unexpected” or “there is indeed structure in the data that had not been defined before” and leave it to (in fact, require from) humans to interpret and derive the knowledge. As a precondition for this to work, our results have to be symbolic (as in pattern mining) or at least more-or-less interpretable (as in cluster memberships of data instances).
So I wonder where this would enter into his thoughts about explanations and creativity.
The other thing is bisociation – there was an (honestly, largely stillborn) EU FP7 project a couple of years ago, the stated purpose of which was developing methods for mapping vocabulary/concepts in different research domains to each other, and performing pattern mining over this space.
Today’s research on heterogeneous networks (pdf) goes in a similar direction but requires already-predefined connections between concepts/entities/data sources, so any results are arguably not creative leaps.

Problems with Machine Learning: we’ve been here before

A friend of mine shared a blog post about data privacy issues in machine learning.

While it seems that the paper they talk about is pretty neat, this is still an immensely frustrating post to me. Membership inference seems to me to be closely related to k-anonymity, a problem that was extensively studied at least 10 years ago. Yet that term doesn’t even appear in their paper on membership inference. And adversarial learning has been a concern for machine learning researchers long before the Deep Learning hype (and, yes, also before 2011).
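For readers who haven’t met it: k-anonymity requires that every combination of quasi-identifier values in a released table match at least k records, so that no individual can be singled out. A minimal check (the records below are hypothetical):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) >= k

# Hypothetical released records; (zip, age) are the quasi-identifiers.
records = [
    {"zip": "130**", "age": "20-30", "diagnosis": "A"},
    {"zip": "130**", "age": "20-30", "diagnosis": "B"},
    {"zip": "148**", "age": "30-40", "diagnosis": "A"},
]
# The third record is unique on (zip, age): anyone who knows a person with
# those attributes learns their diagnosis, so the release is not 2-anonymous.
```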

A few days ago, that same friend had linked to an ars technica article about how Amazon’s face recognition matches lawmakers who are people of color to mugshots of people of color. This, again, is a problem that had already been explored ten years ago.

What’s truly remarkable to me is that such well-funded organizations as Google and Amazon have apparently ignored much of that work and gone ahead to repeat old mistakes. It feels a bit as if the advent of Deep Learning has been used to wipe the slate clean and rediscover a lot of known insights, retarding development.

IJCAI bidding is upon me

and as last year, it promises to be annoying.

  1. The two highest-ranked papers for me were on Deep Learning. Ranking is supposed to help us bid and supposedly based on a combination of keywords selected by PC members and analysis of papers that they upload.

    I have no knowledge of Deep Learning to speak of, I have never written a Deep Learning paper, and I didn’t select “Deep Learning” as a keyword!

  2. After having done my bids on the first 25 papers (which took time because the titles are not super-informative, so I read the abstracts), I wanted to move on to page 2, only to find out that the system had logged me out, losing all but three of my bids in the process.

But at least I’ve encountered a strong contender for the most buzz-wordy title of the year: Improved Kernel Density Estimation Self-organizing Incremental Neural Network to Perform Big Data Analysis!

I’m a (we’re) gatekeeper(s)

I’ve just finished reviewing and discussing for SDM 2018. It’s a good conference, and they give a smaller reviewing load than the dysfunctional ICDM, for instance. I had eight papers to review and it turned out that they were all rejects, either because the ideas were a bit half-baked, or (in the majority of cases) because they were unaware of important related work and therefore didn’t discuss/compare.1

SDM doesn’t blind the reviewers to each other, and I noticed that for the seven papers where I could see others’ reviews, I knew between one and three (out of three) of my coreviewers personally. In some cases, I knew the meta-reviewer as well. Now, I feel that our reviews and decisions were justified2 but if we (as a group of researchers knowing each other) simply didn’t like the direction of the work, for instance, we would have been in the position to block it.

In a sense, it’s unavoidable that a situation like this occurs in peer review, but it becomes more likely if the (sub)field is somewhat specialized and there are only so many researchers working in it at a high-enough level to be invited as reviewers.

1 This is a side-effect of the publish-or-perish mechanisms: we publish way too much in our field, which makes it often very hard to know all the relevant related work in the first place – especially when one is a PhD student. But letting such papers get published would only worsen the problem.

2 Although this introduces a chicken-egg problem: one of the reasons that I trust the others’ reviews is because I know and respect them and their knowledge.

Great accuracy + forgetting to bet = slight losses

| Week | Naive Bayes (Avg+OAvg) | Naive Bayes (Avg) | ANN (Avg + OAvg) | ANN (Avg) | Neural Network (Adj) |
| --- | --- | --- | --- | --- | --- |
| Week 13 | 12/16 | 12/16 | 12/16 | 9/16 | 13/16 |
| Through week 13 | 112/175 | 115/175 | 107/175 | 100/175 | 105/175 |

Look at those accuracies! 75% for the classifier I use to bet, the same for two others, and even 81.25% for the ANN with adjusted statistics. Yet I still lost ~50 euros, but this time that’s mainly on me – I forgot to bet both the (correctly predicted) Seattle-over-Philadelphia upset and the MNF match (also correctly predicted). The biggest payout was Minnesota-over-Atlanta, btw, a match that the latter two ANN classifiers got wrong.
I never forgot to bet last year, probably because there was actually a chance of winning – this year, I am just trying to claw some money back, and feel stymied at every turn. 🙂
Finally, purely in accuracy terms, past trends show up again – the Naive Bayes classifiers stand head-and-shoulders above the rest – 64%/65.71% vs 61.14%/57.14%/60%.

The end’s getting closer

| Week | Naive Bayes (Avg+OAvg) | Naive Bayes (Avg) | ANN (Avg + OAvg) | ANN (Avg) | Neural Network (Adj) |
| --- | --- | --- | --- | --- | --- |
| Week 11 | 10/14 | 10/14 | 10/14 | 11/14 | 8/14 |
| Week 12 | 13/16 | 13/16 | 9/16 | 12/16 | 13/16 |
| Through week 12 | 100/159 | 103/159 | 95/159 | 91/159 | 92/159 |

Will you look at those accuracies for the NB models: 71% and 81.25%! And the latter won me exactly 19 euros! 😦 There were three matches I couldn’t bet on because the favorite’s odds were too low, I missed two upsets, and incorrectly predicted another one.

Apart from that, I tallied up theoretical winnings through week 11 last week, i.e. if I’d bet US$ 100 per match, for four models:

| | Naive Bayes (Avg+OAvg) | Naive Bayes (Avg) | ANN (Avg + OAvg) | ANN (Avg) |
| --- | --- | --- | --- | --- |
| Accuracy through week 11 (%) | 60.84 | 62.94 | 60.14 | 55.24 |
| Winnings through week 11 (US$) | 28.65 | 203.02 | 1018.23 | -1484.21 |
| Underdogs correct through week 11 | 13 | 12 | 22 | 15 |

I am not surprised by this, it’s absolutely in line with what I observed in the past two seasons. But damn, it hurts, and I have still no idea how to decide which model to pick at the beginning of the season.
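Part of why the model choice hurts so much is the asymmetry of money-line payouts: a correct bet on a heavy favorite earns little, while a single miss erases several wins, so two models with near-identical accuracy can land on opposite sides of zero. A sketch of the arithmetic (the -300 lines and the US$ 100 stake are hypothetical, not my actual bets):

```python
def payout(moneyline, stake=100.0, won=True):
    """Net winnings of a bet of `stake` dollars at American money-line odds."""
    if not won:
        return -stake
    if moneyline < 0:                        # favorite: bet |ml| to win 100
        return stake * 100.0 / abs(moneyline)
    return stake * moneyline / 100.0         # underdog: bet 100 to win ml

# Hypothetical week: five heavy favorites at -300, four win, one loses.
results = [(-300, True), (-300, True), (-300, True), (-300, True), (-300, False)]
net = sum(payout(ml, 100.0, won) for ml, won in results)
# 80% accuracy, yet net = 4 * 33.33 - 100, barely above zero.
```

The same arithmetic is why a model that nails a few underdogs (like the ANN (Avg+OAvg) column above) can out-earn a more accurate one.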

Week 9, and this season is really off

| Week | Naive Bayes (Avg+OAvg) | Naive Bayes (Avg) | ANN (Avg + OAvg) | ANN (Avg) | Neural Network (Adj) |
| --- | --- | --- | --- | --- | --- |
| Week 9 | 10/13 | 9/13 | 8/13 | 10/13 | 9/13 |
| Through week 9 | 67/115 | 69/115 | 64/115 | 60/115 | 61/115 |

Again, decent accuracy, and the season-long accuracy is actually at the same level as during the last two seasons at this point, yet the payout is very much not. To give you an impression of how unusual this season is: in my betting paper, I used a baseline of Vegas money line predictions, assuming that one would always bet on the money line favorite and get all pick ’ems right. This ideal outcome led to high payouts over the course of the season for 2015/2016 and 2016/2017.

This season, however, the cumulative payout so far would be US$ -28.95, compared to US$ 854.21 in 2015.