Amazon, spam and the biggest slushpile in history

You enter a perfectly ordinary search term into Amazon – a term that seeks a quality answer to a logical question – and what do you get? Do you get a carefully built list of results that offers a strong answer to the question you arrived with? Or do you find the results a bafflingly hard-to-read collection of titles, only some of which look like books of real quality?

If you don’t know what I mean, try it. Enter “crime thriller” or “suspense” or any other broadly thematic search term into Amazon’s search bar. (I’ve mostly used examples from Amazon.co.uk in this piece, but the issue is the same on either side of the Atlantic.)

What you’ll find is a set of search results that may be dominated by weirdly titled books, over-adorned with so many subtitles and series titles that the text becomes almost impossible to read. And while I’m fine with indie authors being well represented in Amazon’s searches, it can sometimes be hard to spot any traditionally published title amidst the results, and it’s just not plausible that excluding all traditional books is the best way of serving the curious browser.

What’s more, I suspect this problem is getting worse not just because Amazon permits the junk but – much worse – because it encourages it.

The seven stats all indies need to know

Simplify your thinking: find out what matters, and forget the rest.

Now I know that, by law, every second post on the Internet these days must bash Amazon in some way, and plenty of those Amazon-bashers would criticise the firm even if it secured world peace, a cure for malaria and an answer to The Donald all in one fell swoop.

So let me be clear.

I like Amazon.

I think it does good things for authors, and I think it’s done some terrific things for readers, too. I happily use the firm to self-publish my work. (Though, by the way, I don’t hate traditional publishers, either. They give me a good living, too.)

So, all that said, let’s get stuck in.

The problem we’re dealing with is simple. There’s too much junk on Amazon. WAY too much. And it’s not just one issue; it’s a whole constellation of them. But let’s see if we can at least pick out some of the major sub-types.

Problem 1: Books miscategorised in error

Let’s take The Holy Roman Empire, a masterpiece of historical writing produced by the Chichele Professor of History at Oxford University. It’s a fine book, by any standards, and Allen Lane have (typically) done extraordinarily well to get such a beefy book so high on the overall bestseller lists, despite its challenging 1000 pages and £30 cover price.

When I took a closer look at the bestseller lists it appeared on in 2016, the historical one I got, but “Fitness & Exercise > Yoga”?

The book has, I’m sure, a thousand virtues, but telling you about yoga is perhaps not one of them.

To be clear, these errors aren’t Amazon’s. They come from publishers who should know better, but that’s to pass the buck. This is Amazon’s store, and these are Amazon’s webpages and bestseller lists. It’s fine to make use of what is effectively user-content, but user-content will inevitably have errors in – and it’s down to Amazon to eliminate, or at least vastly reduce, those things.

Nor would it even be hard to do. Instead of just accepting whatever BISAC codes (basically the book trade’s standard subject categorisations) the user supplies, Amazon could run them through an automated checker that would query any combination of terms that seemed implausible. (Medieval history and yoga, are you sure, etc.?)

In most cases, simply asking the user to verify those oddities would be enough but Amazon doesn’t ask, doesn’t check. The mistakes go through. The bestseller lists get muddled with nonsense.
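
To show how little machinery such a check needs, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the category paths, the `RELATED_AREAS` table and the function name are hypothetical, and nothing here reflects how Amazon’s systems actually work.

```python
from itertools import combinations

# Hypothetical table of top-level subject areas that plausibly go together.
# A real checker would presumably build this from catalogue or sales data.
RELATED_AREAS = {
    frozenset({"History", "Politics"}),
    frozenset({"History", "Biography"}),
    frozenset({"Fitness & Exercise", "Health"}),
}


def implausible_pairs(categories):
    """Return pairs of chosen categories whose top-level areas look unrelated.

    Each category is a path such as "History > Medieval".
    """
    flagged = []
    for a, b in combinations(categories, 2):
        top_a = a.split(">")[0].strip()
        top_b = b.split(">")[0].strip()
        if top_a != top_b and frozenset({top_a, top_b}) not in RELATED_AREAS:
            flagged.append((a, b))
    return flagged


chosen = ["History > Medieval", "Fitness & Exercise > Yoga"]
for a, b in implausible_pairs(chosen):
    # Query the combination rather than rejecting it outright.
    print(f"You chose both '{a}' and '{b}'. Are you sure?")
```

The point of the design is that the checker only asks a question; the publisher can still confirm an unusual but genuine combination.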

Problem 2: Books miscategorised on purpose

In the examples just above, I presume that the publisher simply made a mistake, but there are occasions where I suspect something more deliberate is going on.

Amazon’s 2016 bestsellers in the “Crime, Thriller & Mystery Series” category, for instance, included Kill me Again, which is not a series novel. The Girl You Lost is not a series novel. No Coming Back is not a series novel. Emma Donoghue’s Room is not a series novel.

That’s a shocking fact.

And it’s not just the bestseller list itself. The “most wished for” list includes Clare Mackintosh’s fine debut (and definitely-not-a-series) novel, I Let You Go. The “most gifted” section includes the equally-definitely-not-in-a-series book, The Lie.

Oh, and since we’re at it, number nine on the actual bestseller list was the same as number one. Number seventeen was the same as number six. Yes, the entries relate to different formats, but readers know that Amazon offers books in every format. They don’t need to clog up a bestseller list to tell us that.

In short, we have a whole torrent of errors and inaccuracies on a single page, all of which could be rejected by a simple automated test, e.g. “Is this book part of a multi-book series, Yes or No?”

And note this: although one or two of the books on this page could be there because of publisher error, the errors are so widespread that you have to suspect another, more concerning, reason: that publishers chose to miscategorise their books, presumably in the belief that it was easier to get to the top of this bestseller list than some others.

That’s speculation but, if I’m right, it implies two things. First, Amazon is allowing itself to be gamed. Second, the way Amazon structures its search results encourages that gaming, by pushing sales towards those who would cheat. And this, remember, takes place in a category where a simple, automated, yes or no test would eliminate such cheating.
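
As a rough illustration of how simple that test could be, here is a sketch in Python that applies the yes-or-no series question to a chart and, while it is at it, collapses the same title in different formats into one entry. The `Entry` fields, the function name and the sample titles are all hypothetical placeholders, not real chart data.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    fmt: str          # e.g. "Kindle" or "Paperback"
    series_size: int  # number of books in the series; 1 means a standalone


def clean_series_chart(entries):
    """Keep only genuine series titles, and list each title once."""
    seen = set()
    cleaned = []
    for entry in entries:
        if entry.series_size < 2:
            continue  # fails "Is this book part of a multi-book series?"
        key = entry.title.casefold()
        if key in seen:
            continue  # same book in another format, already listed
        seen.add(key)
        cleaned.append(entry)
    return cleaned


# Placeholder chart, in ranked order.
chart = [
    Entry("Book A", "Kindle", 3),
    Entry("Standalone B", "Kindle", 1),
    Entry("Book A", "Paperback", 3),
    Entry("Book C", "Kindle", 2),
]
print([entry.title for entry in clean_series_chart(chart)])
# prints ['Book A', 'Book C']
```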

Problem 3: Spammification of subtitles

The issues we’ve looked at so far are relatively mild compared with what’s about to come.

You come across it all the time: ever-growing thickets of keywords jammed into subtitles and series titles.

In 2016, I came across a book with the one-word title “Ruthless”, a perfectly fine title for a thriller. The book had no subtitle, which is hardly surprising because, in the real world, thrillers don’t have subtitles. Academic books often do. Plenty of serious non-fiction does. Thrillers essentially never do.

By 2017, the listing had been tidied up to ‘Ruthless: A Chilling Psychological Thriller’. But just look at how Amazon described this book in 2016: ‘Psychological Suspense Horror thriller: Ruthless: (Psychological thriller, suspense, jealousy, mystery) (… SPECIAL FREE BOOK INCLUDED)’.

That’s right.

It looks far better tidied up, and if you dig around amidst the junk verbiage there, you can find the one word, Ruthless, which we took to be the title of the book. In Amazon, though, anything goes. If you want a title and subtitle combination that is nothing more than a mish-mash of keywords, bundled together without regard for sense, punctuation, or any desire to convey meaning, then Amazon isn’t going to stop you. If you want ludicrous spelling mistakes in your title, Amazon still isn’t going to stop you. If you want random repetitions, capitalisations, parentheses, Amazon won’t stop those, either.

The weirder thing is that this kind of game is against Amazon’s own rules.

Those rules tell you: (1) the title must appear on the cover, (2) the subtitle has to appear on the cover.

And again, Amazon could check. Computers can scan images for text with ease. Titles and subtitles are meant to be legible. They should not be able to evade machine-capture. So if Amazon’s computers come across an image that does not appear to contain the text, they could simply reject the submission – or at least force the question over to human adjudication.
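
A minimal sketch of that comparison, in Python. It assumes some separate OCR step has already turned the cover image into plain text; that step, the function names and the sample cover text are all assumptions for illustration, not a description of any real Amazon check.

```python
import re


def _words(text):
    """Lower-case a string and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def missing_from_cover(listing_text, cover_text):
    """Return the words in the listed title/subtitle that the cover lacks."""
    cover_words = set(_words(cover_text))
    return [word for word in _words(listing_text) if word not in cover_words]


# Hypothetical OCR output for a cover that shows only the one-word title.
cover_text = "RUTHLESS"
listing = "Psychological Suspense Horror thriller: Ruthless"

missing = missing_from_cover(listing, cover_text)
if missing:
    # Reject, or pass the submission over to a human to adjudicate.
    print("Not found on cover:", ", ".join(missing))
```

A real check would need to tolerate OCR mistakes and stylised lettering, which is one reason to route doubtful cases to a human rather than reject them automatically.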

Yet Amazon chooses not to act.

Problem 4: Spammification of titles

In Amazon’s world, your book’s title doesn’t even have to come first.

Take one Kindle book whose listing, in 2017, read ‘The Grave Man – A Sam Prichard Mystery’. In 2016, the same listing began ‘Mystery: The Grave Man …’ instead. The title wasn’t first; the genre was.

What, dear reader, is the title of this book? I don’t know about you, but to my mind the title of that book is The Grave Man. The implication of the earlier listing was that the book’s title was “Mystery”, and that word doesn’t even appear.

Amazon’s algorithms, of course, favour a title that features the common search term (“Mystery”). In theory, then, you could plug a dummy title like this into Amazon’s KDP system and you’d be off and away.

I only found this book, originally, because I entered “Mystery” as a search term on Amazon.co.uk, and Archer’s book was the first result to appear.

I don’t really blame authors generally for pulling these sorts of stunts, but I do blame Amazon. It’s okay, is it, for people to purloin obviously popular search terms as titles? I think it damages and belittles the reader-experience, and it damages and belittles Amazon’s awesome brand. The firm is, or aims to be, better than that.

Problem 5: Let’s all be bestsellers

In an online world, it makes a ton of sense to have multiple bestseller lists, categorised by theme and subject. Why not?

It’s not often that a history book will top a regular bestseller list, or a genre romance novel, or a travel book. But readers might well want to browse recent and successful titles in any of those areas, so it makes good sense to supply them. And of course, though “travel” is a relatively niche genre, there are still lots of good travel books being written, being sold and being read. A travel-only bestseller list makes perfect sense. And good for Amazon for creating and maintaining such a list. It was a great and welcome innovation.

Except that its lists are so very fine-grained that plenty of them make no sense at all.

Brent Underwood recently proved the point by uploading a photo of his foot as a book and getting three friends to buy it (at $0.99). He triumphantly found himself at the top of not one but two bestseller lists (“Psychology / Transpersonal” and “Social Sciences / Freemasonry”).

Now there are multiple problems here.

There’s the first issue we mentioned: when two wholly disparate book categorisations are chosen, the chances are it’s an error (as with The Holy Roman Empire) or a ploy to game the system. Either way, Amazon should at least query the selection, but it doesn’t.

A second issue is that of thin content. Why did Amazon not challenge the almost total absence of content in Underwood’s book? Why did he not get a message saying, “Sorry, we’ve checked your content and it appears thin and of little value to readers. Please reconsider your material and resubmit when ready”?

Clearly, he should have done. Readers would have benefitted.

But the biggest issue is the one Underwood centres on. With bestseller lists as absurdly specific as “Social Sciences / Freemasonry”, any damn book can become a bestseller.

Pretty clearly, Social Sciences does deserve a bestseller list of its own. Equally clearly the section of that list devoted to Freemasonry (and in fairness to other secret societies too) is too abstruse for any bestseller list to be meaningful or helpful.

Why have those lists? Why allow people to claim, with perfect truth, that they are #1 bestsellers on the world’s largest and most popular bookstore, when you know perfectly well that the number of sales needed to get them there is pitifully small?

The current system just devalues what should be a hard-to-achieve accolade.

Problem 6: Even the good turn spammy

This final problem is almost the worst of the lot. Because of the abuses proliferating in so many parts of Amazon’s search system, even genuinely good outfits feel compelled to play the I-can-spam-more-than-you game.

Take The Girl in the Ice, published by Bookouture, a wonderful digital-only British publisher.

The book has been as high as #2 in the overall Amazon.co.uk charts and may even have been to #1. The book cover is classy. So far, fair enough, but how does Amazon describe this book?

Including its title, subtitle, and series title, the book’s description becomes (in 2017): ‘The Girl in the Ice: A gripping serial killer thriller (Detective Erika Foster Book 1) Kindle Edition’.

And already, that’s teetering on more verbiage than the eye can easily take in. A bundle of keywords mashed repetitiously together. Let’s also leave aside the little question of why this serial killer novel was flagged as a #1 bestseller in Sociology. Let’s leave all this aside, and just focus on this one fact.

Bookouture is a fantastic publisher, with strong editorial standards, beautiful cover design, proper copyediting skills, excellent commercial success, a strong following on social media and on email. They’re not spammers. They’re not fly-by-nights.

Yet if Amazon sets the rules so that piling the keywords in works, then staying out of that game will cost a lot of money.

For a digital-only firm like Bookouture, it could be the dividing line between success and failure, all because of Amazon’s broken rules and poor enforcement of them.

So much for the problems. (I’m not suggesting this is a comprehensive list, by the way, just that it does indicate the scale of the issues.)

I want to turn now to two questions. First, what would Amazon do? Next, what would Google do?

What would Amazon do?

That sounds like a strange question, doesn’t it? Haven’t we just looked at what Amazon does?

Yet it’s worth noting that Amazon Publishing itself behaves as primly as any fustily traditional print publisher. Take, for example, a typical APub book listing: The Magpies by Mark Edwards.

No subtitle.

No series title.

The book description is terse and accurate. Beyond the (accurate, helpful) use of the term “thriller” in the first line, there’s no attempt to pack the book description with search terms.

In short, the listing is completely clean. Completely spam free. Entirely helpful to the reader. The listing rejects the tools that countless publishers – indies, digital only, and big traditional firms alike – use to game the system.

And that, I think, tells you more about the core values of the firm than the profusion of spam does. The fact is that Amazon Publishing loathes spam. It maintains a set of values like those of print publishers. By not playing the spam-you-more game, it shows it would rather make less money than sacrifice its principles.

What would Google do?

So much for Amazon. Let’s shift focus a moment and turn to Google.

If that switch seems jarring, just bear in mind that at the heart of Amazon’s store is a search-engine – one whose job it is to respond to your query with a page full of relevant search results. Google is quite good at that search-engine game, so it’s worth considering how that firm might approach the same problem, if it were placed in charge.

Google’s approach to search has three critical foundations:

  1. It seeks quality and authority above all
  2. It is relentlessly innovative
  3. It is human-led

That last point may raise eyebrows, but it’s true all the same. Yes, Google’s search algorithms are automated, of course, but Google constantly monitors the quality of its search results against careful human evaluation of the authority and quality of the webpages that are turned up.

So human evaluators, carefully trained in their role, are asked to evaluate a whole set of webpages in a niche for such things as authority, clarity, easy availability of information, quality of user experience, and much else. If Google’s automated search results aren’t ranking the websites for a given search the way those evaluators would, Google will try to improve its algorithms to improve the match. The automated algorithms are constantly striving to meet goals set by intelligent, trained, well-resourced human judgement.

Because technology changes, because user-requirements change, and probably because Google’s own understanding of site quality evolves, the search algorithm is never still. The firm is unrivalled in web search and is still streets ahead of its competition.

Supposing you applied the same kind of thinking to Amazon’s search results, what would you get? I’d suggest that you’d get a search engine that would think hard about three factors.

  • Relevance. Of course it matters that search results are ‘relevant’ to a particular query. So if you ask for ‘women’s fiction’, you want to find women’s fiction. But since subtitles and the rest are so easy to fill with junk, you will have to capture your signals of relevance from elsewhere. Hard-to-game relevance signals would include (a) formal reviews of the book, (b) user reviews of the book, (c) BISAC categorisation of the book, and (d) the ‘Customers Also Bought’ metric. An intelligent combination of those signals would give you a very-hard-to-game method of determining relevance.
  • Quality. Even though Amazon’s default ranking for search is ‘relevance’, that doesn’t – and shouldn’t – mean quite what it says. Supposing that you wanted ‘Cold War Spy Fiction’, there might be, say, five thousand titles that offered you precisely that. Given that relevance signals might, for those five thousand titles, be so tightly bunched that minor differences meant essentially nothing, you’d want some other way to push the ‘right’ solutions to the top. Since the ‘right’ solution here would certainly include John le Carré, you’d have to find some way to generate signals of quality. Those signals might include sales success, formal reviews, informal user-reviews, quality of publisher, prizes won or shortlists achieved, and so forth. Again, you can’t game those things, or not really, not easily.
  • Sales. I’ve not forgotten that Amazon is a retailer, not a library service. It does and should think about sales, and so it’ll want to pop newer titles and more strongly selling titles up to the top of the lists too. And that’s fine. If books rank similarly for relevance and quality, then Amazon should certainly serve its own interests – and, indeed, the reader’s interests – by promoting the books that readers are more likely to buy.
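
To make the ordering of those three factors concrete, here is a minimal sketch in Python. It assumes each book already has a relevance score and a quality score out of 100; how those scores would be computed from reviews, categories and the rest is the hard part and is not shown. The field names, the bucket size and the sample books are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    relevance: int  # assumed score, 0 to 100
    quality: int    # assumed score, 0 to 100
    sales: int      # recent units sold


def rank(books, bucket=10):
    """Order by relevance, then quality, then sales.

    Scores are grouped into buckets so that tiny differences count as
    ties and the next factor gets to decide.
    """
    def key(book):
        return (-(book.relevance // bucket), -(book.quality // bucket), -book.sales)

    return sorted(books, key=key)


# Placeholder results for a single query.
results = [
    Book("Book A", relevance=92, quality=85, sales=500),
    Book("Book B", relevance=95, quality=40, sales=9000),
    Book("Book C", relevance=91, quality=88, sales=1200),
    Book("Book D", relevance=60, quality=99, sales=50000),
]
print([book.title for book in rank(results)])
# prints ['Book C', 'Book A', 'Book B', 'Book D']
```

In this toy example, Books A, B and C are all about equally relevant, so quality pushes B down; A and C are about equal on quality too, so sales breaks the tie. Book D sells the most but is much less relevant, so it comes last.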

And that’s not hard, is it? None of that is particularly hard to achieve.

What baffles me is why Amazon has allowed its search results to clog up with as many problems as have been listed in this post. Some possible answers are:

A. It knows about the issue and is working on it.

B. It’s taken its eye off the ball.

C. It’s found that its sales benefit from its current approach and that’s all it cares about.

D. It knows that indie-publishers are more ready to game the system than traditional publishers and Amazon actually likes the way its current system disadvantages traditional publishers.

Because I think the firm is hellishly smart, I tend to discount the first two of those explanations.

On the sales issue, there’s a question about Amazon’s longer-term profile and success. Does the firm want to be known as a place that promotes junk over quality? I doubt it. I just can’t see that a reputation of that sort could be to the firm’s long-term advantage, and no firm has longer-term horizons than Amazon itself. (And of course, though Amazon’s market share implies monopoly, its most important ebook competitors are Apple and Google, who are currently battling it out for the title of world’s most valuable firm. I don’t think Amazon is complacent, but if it is, it shouldn’t be.)

So that leaves the fourth explanation. I’m not sure it’s the correct answer, but what if it were? What if we’re seeing a deliberate attempt to enfeeble the traditional industry?

I think that would be depressing. It would feel like a teenager kicking out at parental authority. And yes, Amazon, we know you’re innovative. We know you’re disruptive. We know about your huge list of achievements that go far beyond remaking old-fashioned bookselling and include such things as AWS cloud computing, the e-reader, the invention and regularisation of the ebook market, and so much else.

Whatever else, don’t clog your search results, Amazon. Respect the product. Respect the reader.

Let’s de-spam Amazon search. Please.
