Google ads are full of crime and most web users should block
them. If you don’t believe the FBI, or Malwarebytes,
believe Google. Their 2024
Ads Safety Report is out (Search
Engine Land covered it) and things do not look good. The report is
an excellent example of some of the techniques that big companies use to
misrepresent an ongoing disaster as somehow improving, so I might as
well list them. If I had to do a corporate
misinformation training session, I’d save this PDF for a reading
assignment.
release bad news when other news is happening

This was a big news week for Google, which made it the best time to release this embarrassing report. Editors aren’t going to put their Google reporter to work on an ad safety story when there’s big news from the Federal courthouse.
counting meaningless numbers

Somehow our culture teaches us to love to count, so Google gives us a meaningless number when the meaningful numbers would look crappy.
Last year, we continued to invest heavily in making our LLMs more
advanced than ever, launching over 50 enhancements to our models which
enabled more efficient and precise enforcement at scale.
The claim is that Google continued to invest heavily and
that’s the kind of statement that’s relatively easy to back up with a
number that has meaningful units attached. Currency
units, head count, time units, even lines
of code. Instead, the count is enhancements which could be
almost anything. Rebuild an existing package with different compiler
optimizations? Feed an additional data file to some ML system? What this
looks like from the outside is that the meaningful numbers are going in
the wrong direction (maybe some of the people
who would have made them go up aren’t there any more?) so they
decided to put out a meaningless number instead.
control the denominator to juice the ratio

Only takes elementary school math to spot this, but it might be easy to miss if you’re skimming.
Our AI-powered models contributed to the detection and enforcement of
97% of the pages we took action on last year.
Wow, 97%, that’s a big number. But it’s out of pages we took
action on which is totally under Google’s control. There are a bunch
of possible meaningful ratios to report here, like
(AI-flagged ads)/(total ads)
(ads removed)/(AI-flagged ads)
(bad ad impressions)/(total ad impressions)
and those could have been reported as a percentage, but it looks like
they wanted to go for the big number.
pretend something that’s not working is working

The AI models contributed to 97% of the actions, but
contributed isn’t defined. Does it count as contributed
if, say, human reviewers flagged 1,000 ads, the AI flagged
100,000 ads, and 970 ads were flagged by both? If AI were flagging ads
that had been missed by other methods, this would have been the place to
put it. The newsworthy claim that’s missing is the count of bad ads
first detected by AI before getting caught by a human reviewer.
Contributed to the detection could be a lot of things. (If this
were a report on a free trial of an AI-based abuse detection service,
contributed wouldn’t get me to upgrade to the paid plan.)
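To see how far contributed can stretch, here’s a minimal sketch in Python using the hypothetical numbers above (mine, not anything from the report):

```python
# Toy numbers from the hypothetical above -- not from Google's report.
human_flagged = 1_000     # ads flagged by human reviewers
ai_flagged = 100_000      # ads flagged by the AI models
flagged_by_both = 970     # ads in both sets

# Assume only the human-flagged ads actually become enforcement actions,
# and the AI "contributed" wherever it also happened to flag the ad.
actions_taken = human_flagged
print(f"AI contributed to {flagged_by_both / actions_taken:.0%} of actions")  # 97%

# The ratio the report doesn't give: how many AI flags matched an action at all.
print(f"AI flags that matched an action: {flagged_by_both / ai_flagged:.1%}")  # 1.0%
```

Same inputs, two denominators: one makes the AI look indispensable, the other makes it look like noise.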
report the number caught, not the number that get through

The number of abusers caught is always the easiest number to juice. The simplest version is to go home at lunch hour, code up the world’s weakest bot, start it running from a non-work IP address, then go back to work and report some impressive numbers.
To put this into perspective: we suspended over 39.2 million accounts
in total, the vast majority of which were suspended before they ever
served an ad.
Are any employees given target numbers of suspensions to issue? Can
anyone nail their OKRs by raising the number of accounts suspended? If
this number is unreliable enough that a company wouldn’t use it for
management, it’s not reliable enough to pay attention to. They’re also
reporting the number of accounts, not individuals or companies. If some
noob wannabe scammer writes a script to POST the new account form a
million times, do they count for a million?
don’t compare to last year

Here’s the graph of bad ads caught by Google in 2024.
The total number isn’t as interesting as the individual, really
problematic categories. The number caught for enabling dishonest
behavior went down from about 20 million in 2023 to under 9 million
in 2024.
Did the number of attempts at dishonest behavior with Google ads
really go down by more than half in a single year? Or did Google catch
fewer of them? From the outside, it’s fairly easy to tell that Google
Ads is understaffed and the remaining employees are in the weeds,
but it’s hard to quantify the problem. What’s really compelling about
this report is that the staffing situation has gotten bad enough that
it’s even showing up in Google’s own hand-picked numbers. In general
when a report doesn’t include how a number has changed since the last
report, the number went in the wrong direction and there’s no good
explanation for why. And the number of ads blocked or removed for
misinformation went from 30 million in 2023 to (checks notes)
zero in 2024. Yes, misinformation
has friends in high places now, but did all of the sites worldwide
that run Google ads just go from not wanting to run misinformation to
being fine with it?
report detection, not consequences

Those numbers on bad ads are interesting, but pay attention to the text. These are numbers for ads blocked or removed, and repeat
offenders drive the bulk of tech support scams via Google Ads. Does
an advertiser caught doing misrepresentation in one ad get to
keep going with different ads?
don’t compare to last year, part 2

The previous two graphs showed Google’s bad ads/good site problem, so here’s how they’re doing on their good ad/bad site problem. Here’s 2024:
[Chart: 1.3 billion pages taken action against in 2024]
And 2023:
[Chart: 2.1 billion pages taken action against in 2023]
Ad-supported AI
slop is on the way up everywhere, making problem pages easier to
create at scale, but Google somehow caught 800 million fewer pages than
in 2023. How many pages they took action against isn’t even a good
metric (and I would be surprised if anyone is incentivized based on it).
Some more useful numbers would be stuff like
What percentage of advertisers had their ad run on a page that
later had action taken against it?
How much money was paid out to sites that were later removed for
violating the law or Google policy?
A real Ad Safety Report would help an advertiser answer
questions about how likely they are to sponsor illegal content when they
buy Google ads. And it would help a publisher understand how likely they
are to have an ad for malware show up on their pages. No help from this
report. Even though from the outside we can see that Google runs a bunch of
ads on copyright-infringing sites, not only does Google not report
the most meaningful numbers, they’re doing worse than before on the less
meaningful numbers they do choose to report.
Google employees (yes, both FTEs and TVCs) are doing a
lot of good work trying to do the right thing on the whole ads/crime
problem, but management just isn’t staffing and funding the ad safety
stuff at the level it needs. A company with real competition would have
had to straighten this situation out by now, but that’s not
the case for Google. Google’s services like Search are both free and
overpriced—users don’t pay in money, but in over-exposure to fraud and
malware risks that would be lower in a competitive market. If a future
Google
breakup works, one of the best indicators of success will be more
meaningful, and more improved, metrics in future ad safety reports.
Flaming
Fame. by George Tannenbaum. We don’t see shitty work and say
that’s shitty. It’s worse than that. We simply don’t see it at
all.
LG
TVs’ integrated ads get more personal with tech that analyzes viewer
emotions by Scharon Harding. The company plans to incorporate a
partner company’s AI tech into its TV software in order to interpret
psychological factors impacting a viewer, such as personal interests,
personality traits, and lifestyle choices. (What happens when you do
a Right to Know for the family TV?)
Former
Substack creators say they’re earning more on new platforms that offer
larger shares of subscription revenue by Alexander Lee. Since
leaving Substack, some writers’ subscriber counts have plateaued over
the past year, while others have risen — but in both cases, creators
said that their share of revenue has increased because Ghost and Beehiiv
charge creators flat monthly rates that scale based on their subscriber
counts, rather than Substack’s 10 percent cut of all transaction
fees.
The
Mediocrity of Modern Google by Om Malik. What’s particularly
ironic is that today’s Google has become exactly what its founders
warned against in their 1998 paper: an advertising company whose
business model fundamentally conflicts with serving users’
needs.
Git
turns 20: A Q&A with Linus Torvalds by Taylor Blau. So I was
like, okay, I’ll do something that works for me, and I won’t care about
anybody else. And really that showed in the first few months and
years—people were complaining that it was kind of hard to use, not
intuitive enough. And then something happened, like there was a switch
that was thrown.
I’m not an expert on electric cars, so I don’t know enough to
criticize some of the hard parts of the design of a Tesla. But when they
get obvious
stuff like getting out without power wrong, that’s a pretty good
sign to stay away.
How
the U.S. Became A Science Superpower by Steve Blank. Post war, it
meant Britain’s early lead was ephemeral while the U.S. built the
foundation for a science and technology innovation ecosystem that led
the world—until now.
Here’s another privacy paradox for people who collect them.
On the web, the average personalized ad is probably better than
the average non-personalized ad. (The same ad campaigns that have a
decent budget for ad creative also have a budget for targeting
data.)
But users who block personalized ads, or avoid personalization by
using privacy tools and settings, are, on average, better
off than users who get personalized ads.
There’s an expression in finance: Picking
Up Nickels In Front Of A Steam Roller. For some kinds of investing
decisions, the investor is more likely to make a small gain than to lose
money in each individual trade. But the total expected return over time
is negative, because a large loss is an unlikely outcome of each trade.
The decision to accept personalized ads or try to avoid them might be a
similar bet.
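As a minimal sketch of the expected-value math (made-up numbers, just to show the shape of the bet):

```python
# Made-up numbers to illustrate the nickels-in-front-of-a-steamroller payoff:
# each trade usually wins a little, and rarely loses a lot.
p_win, small_gain = 0.99, 5.00       # the nickel (ok, five bucks)
p_loss, large_loss = 0.01, 1_000.00  # the steamroller

expected_return = p_win * small_gain - p_loss * large_loss
print(f"{expected_return:+.2f} per trade")  # -5.05: a 99% win rate, still a losing bet
```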
For example, a typical positive outcome of getting personalized ads
might be getting better shoes, cheaper. There’s a company in China that
is working the personalized ad system really well. Instead of paying for
high production value ads featuring high-profile athletes in the USA,
they’re just doing the incremental data-driven marketing thing. Make
shoes, experiment with the personalized ad system, watch the numbers,
reinvest in both shoe improvements and improvements to the personalized
ads. For customers, the shoe company represents the best-case scenario
for turning on the personalized ads. You get a pair of shoes from China
for $40 that are about as good as the $150 shoes from China that you
would get from a big-name brand. (The shoes might even be made by the
same people out of the same materials.) I don’t need to link to the
company, just turn on personalized ads and if you want the shoes they’ll
find you.
That example might be an outlier on the win-win side, though. On
average, personalized (behaviorally targeted) ads are likely
to be associated with lower quality vendors and higher product prices
compared to competing alternatives found among search results. (Mustri
et al.). But let’s pretend for a minute and say you figured out how
to get targeted in the best possible way and come out on the winning
side. That’s pretty sweet—personalized ads save you more than a
hundred bucks on shoes, right?
Here comes the steamroller, though.
In recent news, Baltimore
sues 2 sportsbooks over alleged exploitative practices. Some people
are likely to develop a gambling problem, and if you don’t know in
advance whether or not you’re one of them, should you have the
personalized ads turned on? You stand to lose a lot more than you would
have gained by getting the cheap shoes or other miscellaneous stuff. It
is possible that machine learning on the advertising or recommended
content side could know more about you than you do, and the negative
outcomes from falling for an online
elder fraud scheme tend to be much larger than the positive outcomes
from selecting the best of competing legitimate products.
People’s reactions to personalization are worth watching, and reflect a more widely held understanding of how information works in markets than
personalized ad fandom does. The fact that Google may have used this
data to conduct focused ad campaigns targeted back to you was disclosed as if
it was a security issue, which makes sense. Greg Knauss writes,
Blue Shield says that no bad actor was involved, but is that
really true? Shouldn’t a product that, apparently by default, takes
literally anything it can—privacy be damned—and tosses it into the old
ad-o-matic not be considered the output of a bad actor? Many people
(but not everybody)
consider being targeted for a personalized ad as a threat in itself.
More: personalization risks
Bonus links
What If We Made Advertising
Illegal? by Kōdō Simone. The traditional argument
pro-advertising—that it provides consumers with necessary
information—hasn’t been valid for decades. In our information-saturated
world, ads manipulate, but they don’t inform. The modern advertising
apparatus exists to bypass rational thought and trigger emotional
responses that lead to purchasing decisions. A sophisticated machine
designed to short-circuit your agency, normalized to the point of
invisibility. (Personally I think it would be hard to come up with a
law that would squeeze out all incentivized communication intended to
cause some person to purchase some good or service, but it would be
possible to regulate the information flows in the other
direction—surveillance of audience by advertiser and intermediaries—in a
way that would mostly eliminate surveillance advertising as we know it:
Big Tech
platforms: mall, newspaper, or something else?)
Meta
secretly helped China advance AI, ex-Facebooker will tell Congress
by Ashley Belanger. In her prepared remarks, which will be delivered
at a Senate subcommittee on crime and counterterrorism hearing this
afternoon, Wynn-Williams accused Meta of working hand in glove
with the Chinese Communist Party (CCP). That partnership allegedly
included efforts to construct and test custom-built censorship tools
that silenced and censored their critics as well as provide the CCP
with access to Meta user data—including that of Americans.
(And if they’re willing to do that, then the elder fraud ads on Facebook
are just business as usual.)
Protecting
Privacy, Empowering Small Business: A Path Forward with S.71 (A
privacy law with private right of action gets enforced based on what
makes sense to normal people in a jury box, not to bureaucrats who think
it’s normal to read too many PDFs. Small businesses are a lot better off
with this common-sense approach instead of having to feed the
compliance monster.)
This
startup just hit a big milestone for green steel production by Casey
Crownhart. Boston Metal uses electricity in a process called molten
oxide electrolysis (MOE). Iron ore gets loaded into a reactor, mixed
with other ingredients, and then electricity is run through it, heating
the mixture to around 1,600 °C (2,900 °F) and driving the reactions
needed to make iron. That iron can then be turned into steel. Crucially
for the climate, this process emits oxygen rather than carbon
dioxide…
There are two kinds of PDFs. Some have real embedded text that you
can select in a PDF reader, and some are just images.
The second kind is what I sometimes get in response to a CCPA/CPRA
Right to Know. Some companies, for whatever reason, want to make it
harder to do automated processing of multiple RtKs. This should make
privacy researchers more likely to look at them, because what are they hiding? They must be up to something.
But the PDF still needs to get run through some kind of OCR. Tesseract OCR has been
giving me pretty good results, but it needs to be fed images, not
PDFs.
So I have been feeding the PDFs to pdf2image in Python, and then passing the images to Tesseract. But it turns out that Tesseract works a lot better with higher-resolution images, and the default for pdf2image is 200 DPI. So I’m getting a lot more accurate OCR by making the images oversized with the dpi named parameter:
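Something like this minimal sketch (dpi=400 is my stand-in for “oversized”—the default is 200—and the filename is hypothetical; pdf2image also needs Poppler installed):

```python
# Render each PDF page to a high-resolution image with pdf2image, then
# OCR each image with Tesseract via pytesseract.
from pdf2image import convert_from_path
import pytesseract

def ocr_pdf(path: str, dpi: int = 400) -> str:
    pages = convert_from_path(path, dpi=dpi)  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

print(ocr_pdf("rtk-response.pdf"))
```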
I might tweak this and try 300 DPI, or also try passing
grayscale=True to preserve more information. Some other
approaches to try next, if I need them.
Anyway, Meta
(Facebook) made some of their info easy to parse (in JSON format)
and got some of us to do research on them. Some of the other interesting
companies, though, are going to be those who put in the time to
obfuscate their responses to RtKs.
Related
OCRmyPDF is an
all-in-one tool that adds a text layer to the PDF. Uses Tesseract
internally. When possible, inserts OCR information as a “lossless”
operation without disrupting any other content. Thanks to Gaurav Ujjwal for the link. (I’m
doing an OCR step as part of ingesting PDFs into a database, so I don’t
need to see the text, but this could be good for PDFs that you actually
want to read and not just do aggregated reporting on.)
Example of where
GDPR compliance doesn’t get you CCPA compliance: This is the mistake
that Honda
recently made. CCPA/CPRA is not just a subset of GDPR. GDPR allows a
company to verify an objection to processing, but CCPA does not allow a
company to verify an opt out of sale. (IMHO the EU should harmonize by
adopting the California good-faith, reasonable, and documented belief
that a request to opt-out is fraudulent standard for objections to
processing.)
New
Report: Many Companies May Be Ignoring Opt-Out Requests Under State
Privacy Laws - Innovation at Consumer Reports. The study examined
40 online retailers and found that many of them appear to be ignoring
opt-out requests under state privacy laws. (A lot more companies are
required to comply with CCPA/CPRA than there are qualified compliance
managers. Even if companies fix some of the obvious problems identified
in this new CR report, there are still a bunch of data transfers that
are obvious detectable violations if a GPC flag wasn’t correctly set for
a user in the CRM system. You can’t just fix the cookie—GPC also has to
cover downstream usage such as custom audiences and server-to-server
APIs.)
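To make the downstream point concrete, here’s a hypothetical sketch (field and function names are made up): the stored GPC flag has to be honored by every export job, not just by the cookie banner.

```python
# Hypothetical sketch of a nightly job that builds a custom audience for a
# server-to-server ad API. The stored GPC opt-out flag (set when the site
# saw a Sec-GPC: 1 header from the user) has to filter this export too.
def build_custom_audience(crm_records: list[dict]) -> list[str]:
    return [
        r["hashed_email"]
        for r in crm_records
        if not r.get("gpc_opt_out")  # drop anyone who opted out of sale/sharing
    ]

crm = [
    {"hashed_email": "a1b2c3...", "gpc_opt_out": True},
    {"hashed_email": "d4e5f6...", "gpc_opt_out": False},
]
print(build_custom_audience(crm))  # only the non-opted-out record is exported
```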
Bonus links
EU
may “make an example of X” by issuing $1 billion fine to Musk’s social
network by Jon Brodkin at Ars Technica. (A lot of countries don’t
need to raise their own tariffs in order to retaliate against the USA’s
tariffs. They just need to stop letting US companies slide when they
violate laws over there. If they can’t rely on the USA for regional
security, there’s no reason not to. Related: US
Cloud soon illegal? at noyb.eu)
Big
Tech Backed Trump for Acceleration. They Got a Decel President
Instead by Emanuel Maiberg and Jason Koebler at 404 Media. Unless
Trump folds, the tariffs will make the price of everything go up.
Unemployment will go up. People will buy less stuff, and companies will
spend less money on advertising that powers tech platforms. The tech
industry, which has thrived on the cheap labor, cheap parts, cheap
manufacturing, and supply chains enabled by free and cheap international
trade, will now have artificial costs and bureaucracy tacked onto all of
this. The market knows this, which is why tech stocks are eating
shit. (Welcome to the weak men create hard times phase—but
last time we had one of these, the dismal Microsoft monopoly days are
when we got the web
and Linux scenes that evolved into today’s Big Tech. Whatever
emerges from the high-unemployment, import-denied generation, it’s going
to surprise us.)
Signal
sees its downloads double after scandal by Sarah Perez on
TechCrunch. Appfigures chalks up the doubling of downloads to the old
adage all press is good press, as the scandal increased Signal’s
visibility and likely introduced the app to thousands of users for the
first time. (Signal is also, according
to traders on Manifold Markets, the e2e messaging program least
likely to provide message content to US law enforcement. Both Apple, the
owner of iMessage, and Meta, the owner of WhatsApp, have other
businesses that governments can lean on in order to get cooperation.
Signal just has e2e software and reputation, so fewer points of
leverage.)
YouTube
removes ‘gender identity’ from hate speech policy by Taylor Lorenz
(In the medium term, a lot of the moderation changes at Big Tech are
going to turn into a recruiting challenge for hiring managers in
marketing departments. If an expected part of working in marketing is
going to be mandatory involvement in sending money to weird, creepy
right-wing dudes, that means you’re mostly going to get to hire…weird,
creepy right-wing dudes.) Related: slop
capitalism and dead internet theory by Adam Aleksic. Our best way
of fighting back? Spend as little time on algorithmic media as possible,
strengthen our social ties, and gather information from many different
sources—remembering that the platforms are the real enemy.
Some good news last week: Meta
settles UK ‘right to object to ad-tracking’ lawsuit by agreeing not to
track plaintiff. Tanya O’Carroll, in the UK, has settled a case with
Meta, and the company must stop using her data for ad targeting when
she uses its services. It’s not a change for everyone, though, since
the settlement is just for one person. O’Carroll said she is unable
to disclose full details of the tracking-free access Meta will be
providing in her case but she confirmed that she will not have to pay
Meta.
The Open Rights Group now has a Meta opt-out
page that anyone in the UK can use to do an opt out under the UK
GDPR.
If you use any Meta products – Facebook, Instagram, Meta Quest or VR,
Threads or WhatsApp – you can use our tool to request that they no
longer collect or process your data for advertising. This is known as
your right to object, which is enshrined in data protection law. Meta
had tried to get around GDPR, but by settling Tanya’s case they have
admitted that they need to give their users this right.
If you’re in the UK, you can either use the form on the site, or use
the mailto link to open up a new regular email from your own account
pre-populated with the opt out text. This is a win not just because it
could mean less money for a
transnational criminal organization and more money staying in the
UK, but also because it’s going to mean better products and services for
the people who do it.
Generate tracking data that is hard to link to you
Set an opt out while doing the surveilled activity
Send an opt out or Right to Delete after doing the
surveilled activity
Having access to this new tool doesn’t mean you should skip the others.
Even if I could figure out how to use the Meta apps in a way that’s
totally safe for me, it’s still a win to switch away because it helps
build network effects for the alternatives and more safety for other
people. So even if you do this opt out, it’s also a good idea to do the
other effective
privacy tips.
The personalization of ads on Facebook helps vendors of crappy,
misrepresented goods match their products to the shoppers who are most
likely to fall for their bullshit. Yes, you can follow the advice in
articles like Don’t
Get Scammed! Tips For Spotting AI-Generated Fake Products Online on
Bellingcat, but it’s a time-saver and an extra layer of protection not
to get the scam ad in the first place.
Why
We Need Shortwave 2.0 by Kim Andrew Elliott on Radio World.
Because Shortwave Radiogram is transmitted on a regular
amplitude-modulated shortwave transmitter, it can be received on any
shortwave radio, from inexpensive portable radios with no sideband
capability, to more elaborate communications receivers, amateur
transceivers (most of which nowadays have general coverage receivers),
and software defined radios (SDRs). (Then you need a program to
convert the encoded signal into text and/or images—or this functionality
could be built into future inexpensive radios.)
I’m still waiting for my copy of Careless
People by Sarah Wynn-Williams, so I don’t have anything more
on the content of the book than what I have seen in other reviews. The
local bookstore had a stack—more than they normally order for new
hardcovers—but I hesitated and they were gone next time I went in there.
So yes, I am a little behind on this.
But come on, people.
Does anyone seriously think that Meta execs don’t understand the
Streisand
effect?
But Big Tech’s apparent eagerness to appear in bad news makes sense
when you look at the results. Out of all the people who read and were
outraged by Careless People over the weekend, how many are
going to come in to work on Monday and delete their Meta tracking pixel
or turn off Meta CAPI? And how many people concerned about Google’s
malware, CSAM, and infringing content
problems are going to switch to inclusion lists and validated
SupplyChain objects and stop with the crappy,
often illegal ad placements that Google recommends and legit ad agencies
don’t? For Big Tech, doing crimes in an obvious way is a power move,
a credible, costly signal. If there were a Meta alternative that didn’t
do genocide,
or an honest alternative to Google search advertising, then advertising
decision-makers would have switched to them already. All these
embarrassing-looking stories are a signal: don’t waste your time looking
for an alternative to paying us. The publisher’s
page for Careless People has a Meta pixel on it.
I do have a book recommendation that might be a little easier to get
a hold of. Codes
of the Underworld by Diego Gambetta was the weekly
book recommendation on A Collection of Unmitigated
Pedantry. I’m glad to see that it is still in print, because it’s
a useful way to help understand the Big Tech companies. Actions that
might not have made sense in a company’s old create more
value than you capture days are likely to be easier to figure out
after understanding the considerations applied by other criminal
organizations.
Criminals have hard-to-satisfy communications needs, such as the need
to convey a credible threat to a victim without attracting the attention
of enforcers. This is related to the signaling
problem faced by honest advertisers, but in reverse. How can a
representative of a protection racket indicate to a small business that
they represent a true threat, and aren’t just bluffing? Gambetta digs
into a variety of signaling problems. It’s a 2009 book, so many of the
Big Tech firms were still
legit when it came out, but a lot of the communications methods from
back then apply to the companies of today.
Is there a solution? As Gambetta points out, real-life organized
crime perpetrators tend to copy from the movies, and today they’re
copying the partnership with a
friendly government subplot from The Godfather
Part II. Maybe it’s time to watch that movie again.
Related
Update 12 Apr 2025: tante/Jürgen Geuter makes a similar point, in Vulgar
Display of Power. It is a display of power: You as an artist, an
animator, an illustrator, a writer, any creative person are powerless.
We will take what we want and do what we want. Because we can.
imho AI-generated images used to
illustrate a blog post (and not specifically to discuss AI images)
usually send a louder message than the writing does. Gareth Watkins: AI:
The New Aesthetics of Fascism
Privacy-Respecting
European Tech Alternatives by Jonah Aragon. [T]he United States
certainly does not have a monopoly on the best technologies, and many of
our favorite recommended tools come from Europe and all over the world.
Tools from the European Union also generally benefit from much stronger
data protection laws, thanks to the EU’s General Data Protection
Regulation (GDPR). Related: But
how to get to that European cloud?
Please
stop externalizing your costs directly into my face by Drew DeVault.
Whether it’s cryptocurrency scammers mining with FOSS compute
resources or Google engineers too lazy to design their software properly
or Silicon Valley ripping off all the data they can get their hands on
at everyone else’s expense…
But the good news for slacker states is that doing the most work,
cranking out the most lines of code, or the most pages of PDFs, or
whatever, does not necessarily produce the best results. Given the
amount of work that other states, and jurisdictions like the European
Union, have already done on privacy, a slacker state can, right now, get
not just the best privacy protection but also save a lot of time and
grief for state employees and for business people in your state.
You need two laws. And we know that people are going to print them
out, so please keep them short. (Maybe do a printer ink right to refill
law next year?)
First, a surveillance licensing law. Focus on the riskiest companies with the most money and staff for compliance—don’t put extra work on small local businesses.
Save your state’s attorney general and their staff a bunch of
time. They’re not Big
Tech’s support department. If a Big Tech company drops the ball on
user support, just suspend their surveillance license until they clean
up their act, like a problem bar and their liquor license.
You can define surveillance really briefly in the law and make
the big out-of-state companies do the work of describing their
surveillance practices in their license application.
That one is pretty easy to do as long as you focus purely on inbound
data, the surveillance part, and don’t touch anything that sounds like
speech from the company to others. And you can push most of the
work off onto Big Tech and a new surveillance licensing board. I’m sure
every state has people who would be willing to get on one of those.
Second, copy all the details from other states and
countries. The other law would be focused on maximum privacy,
minimum effort. The goal is to make a law that small business people can
comply with, without even reading it, because they already had to do
some privacy thing for somewhere else. Two parts.
Any privacy feature offered in some other jurisdiction must be
offered here, too. A company only breaks the law if someone out-of-state
gets a privacy feature that someone in-state doesn’t.
This law may be enforced by anyone except a state employee.
(Borrow the Texas S.B. 8 legal hack, to protect yourself from Big Tech
industry groups trying to block the law by starting an expensive
case.)
A small business that operates purely locally can just do their
thing. But if they already have some your California privacy
rights feature or whatever, they just turn it on for this state too.
Easier compliance project for the companies, better privacy for the
users, no enforcement effort for the state, it’s a win-win-win. After
all, state legislators don’t get paid by the page, and we each only get
one set of carpal tunnels.
Meta,
Apparently, Really Wants Everyone To Read This Book (By Trying To Ban
It) by Mike Masnick. Macmillan showed up just long enough to
point out the blazingly obvious: they never signed any agreement with
Meta and thus can’t be bound by arbitration. The arbitrator, displaying
basic common sense, had to admit they had no jurisdiction over
Macmillan.
AI
Search Has A Citation Problem by Klaudia Jaźwińska and Aisvarya
Chandrasekar. Chatbots were generally bad at declining to answer
questions they couldn’t answer accurately, offering incorrect or
speculative answers instead. (related: fix Google
Search)
How
Ukraine integrates machine vision in battlefield drones by Oleksandr
Matviienko, Bohdan Miroshnychenko & Zoriana Semenovych. In
November 2024, the government procured 3,000 FPV drones with machine
vision and targeting technologies. Reports also suggested that the
procurement would be expanded to 10,000 units.
Preparing
for the next European war by Azeem Azhar. One challenge will be
the simple rate of innovation in the actual battlefield. Drone warfare
in Ukraine has shown iteration cycles measuring weeks not years. So any
systems procured today need to be future-proofed for those
dynamics.
Thread by Trent Telenko. The logistical facts are that the FM-MAG machine
gun, the 60 mm & 81mm mortars, LAWS, Javelins, any infantry crew
served weapon you care to name are all going to be most to fully
replaced with drones and drone operators, because of the logistical
leverage drones represent on the battlefield.
Long-range
drone strikes weakening Russia’s combat ability, senior Ukrainian
commander says by Deborah Haynes. Some of the drones are remotely
piloted, others work via autopilot. Russia’s war has forced Ukraine to
use technology and innovation to fight back against its far more
powerful foe. It has accelerated the use of autonomous machines in an
irreversible transformation of the warzone that everyone is watching and
learning from. Brigadier Shchygol said: Right now, Ukraine’s
battlefield experience is essentially a manual for the
world.
Ukraine
Drives Next Gen Robotic Warfare by Mick Ryan. Another more
interesting trend has arisen which will force policy makers and military
strategists to undertake an even more careful analysis of Ukraine war
trends, and how these trends apply in other theatres, particularly the
Pacific. This trend, robotic teaming, has emerged over the past year
with the advent on drone-on-drone combat in the air and on the ground.
In particular, several recent combat actions in Ukraine provide insights
that need to be studied and translated for their employment in the
massive ocean expanses, tens of thousands of kilometres of littoral,
thousands of large and small islands and at least three continents that
constitute the Pacific theatre.
DEEP
DIVE: Taiwan miltech aims to undermine Chinese components by Tim
Mak. Taiwan has learnt the central tech lesson from the war in
Ukraine: the next global conflicts will heavily feature cheap, small
drones—and in large numbers. So as an electronics and hardware component
giant—especially relative to its size and diplomatic status—it is trying
not only to develop a domestic industry, but also become an arsenal for
the free world, building drones and devices for allied militaries
worldwide.
Why
America fell behind in drones, and how to catch up again by Cat
Orman and Jason Lu. Also Building
Drones for Developers: A uniquely open architecture on the F-11
means that every part of the drone is truly built around the
[NVIDIAn] Orin [GPU]. This enables sophisticated autonomy applications
in which ML models are able to not only analyze data obtained in-flight,
but actually use that analysis to inform flight actions in real
time.
switching.software offers
Ethical, easy-to-use and privacy-conscious alternatives to well-known
software
Pro tip: avoid generative AI images in blog posts (even if your CMS
says you should have one for SEO purposes) unless you want to make a
political statement: AI:
The New Aesthetics of Fascism by Gareth Watkins
The Linux kernel project can’t use code from sanctioned countries.
Other projects need to check compliance with sanctions, too. US
Blocks Open Source ‘Help’ From These Countries by Steven J.
Vaughan-Nichols
p. 3 We do not consider legal arguments for consumer
privacy as a fundamental right or concerns about access to personal data
by malign actors or governments.
Avoiding malign actors is the big reason for restricting personalized
ads. And malign actors are numerous. The
high-profile national security threats are already in the news, but
most people will encounter miscellaneous malware, scams,
rip-offs and other lesser villainy enabled by ad personalization more
often than they have to deal with state or quasi-state adversaries.
There is no hard line between malign actors and totally legit
sellers—not only does the personalized ad business have plenty of
halfway crooks, you can find n/m-way crooks for arbitrary
values of n and m.
Ad personalization gives
a bunch of hard-to-overcome advantages to deceptive sellers.
Although scams are generally illegal and/or against advertising platform
policies, personalization makes the rules easier to evade, as we see
with some ways that
Facebook ads are optimized for deceptive advertising. Most
personalized ads aren’t clustered at the good (high-quality pair of
shoes in your size, on sale, next door!) or bad (malware pre-configured
for your system) ends of the spectrum. Advertisers at all levels of
quality and honesty are present, so any framework for thinking about ad
personalization needs to take that variability into account.
p. 3 Some privacy advocates assume, incorrectly, that
personalized marketing based on granular consumer data is automatically
harmful…
Treating personalized advertising as harmful by default is not an
assumption, but a useful heuristic based on both theoretical models and
real-world experience. Personally, I don’t pay
attention to your ad if it’s personalized to me—it’s as credible as a
cold call. But I might pay attention to your ad if it’s run in a place
where the editors of sites that cover your industry would see it, or
your mom would. Yes, it is possible for professors to imagine a
hypothetical world in which personalization is beneficial, but that only
works if you make the unrealistic simplifying assumption that all
sellers are honest and that the only impact of personalization is to
show people ads that are more or less well matched to them. The
theoretical arguments in favor of personalized advertising break down as
soon as you level up your economic model to consider the presence of
both honest and deceptive advertisers in a market.
See Gardete and Bart, Tailored
Cheap Talk: The Effects of Privacy Policy On Ad Content and Market
Outcomes. Our research suggests that another peril of sharing
very high quality targeting information with advertisers is that ad
content may become less credible and persuasive to consumers. An
advertising medium that allows for personalization is incapable of
conveying as much information from an honest seller to a potential buyer
as an advertising medium that does not support personalization.
Mustri et al., in Behavioral
Advertising and Consumer Welfare, find that products found in
behaviorally targeted ads are likely to be associated with
lower quality vendors and higher product prices compared to competing
alternatives found among search results.
p. 8 Which Consumers Care Most About Privacy, and Do
Privacy Policies Unintentionally Favor the Privileged?
Lots of studies show that, basically, some people really want
cross-context personalized advertising, some people don’t, and for the
largest group in the middle, it depends how you ask. (references at the 30-40-30 rule). But the
difference in consumer preferences is not about privilege, it’s
about information level. See Turow et al., Americans
Reject Tailored Advertising and Three Activities That Enable It.
That study includes a survey of privacy preferences before and after
informing the participants about data practices—and people were
more likely to say they do not want tailored advertising
after getting the additional information.
In the Censuswide study Who’s
In the Know: The Privacy Pulse Report, the experienced
advertisers surveyed in the USA (people with 5 or more years of ad
experience) were more likely than average to use an ad blocker (66% >
52%), and privacy is now the number one reason for people to use one. It
is reasonable for policy-makers to consider the preferences of
better-informed people—which is already a thing in fields such as
transportation safety and public health.
p. 11 Poorer consumers live in data deserts (Tucker
2023), causing algorithmic exclusion due to missing or fragmented data.
This exclusion thwarts marketing outreach and may deprive them of
offers, exacerbating data deserts and marginalization.
Instead of speculating about this problem, personalized advertising
proponents who are concerned about some people not being tracked enough
can already look at other good examples of possibly under-surveilled
consumers. Early adopters of privacy tools and preferences are helpfully
acting as the experimental group for a study that the surveillance
business hasn’t yet run. If people on whom less data is collected
are getting fewer win-win offers, then the privacy early adopters should
have worse consumer outcomes than people who leave the personalization
turned on. For example, Apple iOS users with App Tracking Transparency
(ATT) set to allow tracking should be reporting higher satisfaction and
doing fewer returns and chargebacks. So far, this does not seem to be
happening. (For a related result, see Bian et al., Consumer Surveillance and
Financial Fraud. Consumers who deliberately placed themselves in a
data desert by changing ATT to disallow tracking reported less
fraud.) Click
this to buy better stuff and be happier
And there’s little evidence to suggest that if a personalized ad
system knows someone to be poor, that they’ll receive more of the kind
of legit, well-matched offers that are targeted to the more affluent.
Poor people tend to receive more predatory finance and other deceptive
offers, so may be better off on average with ads less well matched to
their situation.
p. 13 More broadly, without cross-site/app identity,
consumers enjoy less free content
This depends on how you measure content and how you define
enjoy. The
Kircher and Foerderer paper states that, although children’s games
for Android got fewer updates on average after a targeted advertising
policy change by Google,
Only exceptionally well-rated and demanded games experienced more
feature updates, which could be interpreted as a sign of opportunity due
to better monetization potential or weakened competition. However,
considering that we observed these effects only for games in the highest
decile of app quality and demand and given that the median user rating
of a game is 4.1 of 5, our findings suggest widespread game
abandonment.
By Sturgeon’s
Law, a policy change that benefits the top 10% of games but not the
bottom 90% (which, in total, account for a small fraction of total
installs and an even smaller fraction of gameplay) is a win for the
users.
We find that more targeting increases competition and reduces the
websites’ profits, but yet in equilibrium websites choose maximum
targeting as they cannot credibly commit to low targeting. A privacy
protection policy can be beneficial for both consumers and websites.
When both personalized and non-personalized ad impressions are
available in the same market, the personalized impressions tend to go
for about
double the non-personalized. But it doesn’t work to artificially
turn off some data collection for a fraction of ad impressions, observe
that revenue for those impressions is lower (compared to impressions
with the data that are still available), and then extrapolate the
revenue difference to a market in which no impressions have the data
available.
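Here’s a toy model of that extrapolation problem (the fixed-budget assumption and all the numbers are mine, not from any cited study):

```python
# Toy model: a fixed advertiser budget chasing a fixed 1,000 impressions.
budget = 100.0
impressions = 1_000

# Mixed market: half the impressions carry targeting data and clear at
# twice the price of the rest:  500 * 2p + 500 * p = budget
p = budget / 1_500
print(f"mixed market: personalized {2 * p:.3f}, plain {p:.3f}")  # 0.133 vs 0.067

# Counterfactual market with no targeting data anywhere: the same budget
# still gets spent, now spread evenly over the same impressions.
print(f"no-data market: {budget / impressions:.3f}")  # 0.100, not 0.067
```

In the toy model the plain impressions are worth more once personalization is gone entirely, which is why the within-market 2:1 price gap can’t be read as the revenue cost of ending personalization.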
It is also important to consider the impact of extremely low-quality
and/or illegal content in the personalized advertising market. Much of
the economic role of ad personalization is not to match the right ad to
the right user but to monetize a higher-value user on lower-value
content. The
surveillance economy is more like the commodification economy.
Surveillance advertising companies are willing to pursue content
commodification even to the point of taking big reputational risks from
feeding ad money to the worst people on the Internet (Hiding
in Plain Sight: The Ad-Supported Piracy Ring Driving Over a Billion
Monthly Visits - deepsee.io, Senators
Decry Adtech Failures as Ads Appear On CSAM Site). If advertising
intermediaries were more limited in their ability to put a good ad on a
bad site using user tracking, the higher-quality content sites would
enjoy significantly increased market power.
p. 14 Restrictions to limit the effectiveness of digital
advertising would likely disproportionately disadvantage small
businesses, since nine out of ten predominantly use digital advertising,
especially on Meta
Are small businesses really better off in the surveillance
advertising era? Although personalized Big Tech advertising is the main
ad medium available to small businesses today, there is clearly some
survivorship bias going on here. The Kerrigan
and Keating paper states that, While entrepreneurship has
rebounded since the Great Recession and its aftermath, startup activity
remains weak by historical standards. This period of time overlaps
with the golden age of personalized advertising, after widespread
adoption of smartphones but before Apple’s ATT, the EU’s GDPR, and
California’s CCPA. If personalized advertising is so good for small
businesses, where are the extra small businesses enabled by it? We
should have seen a small business boom in the second half of the 2010s,
after most
people in the USA got smartphones but before CCPA and ATT.
Jakob Nielsen may have provided the best explanation in 2006’s Search
Engines as Leeches on the Web, which likely applies not just to
search, but to other auction-based ad placements like social media
advertising. An auction-based advertising platform like those operated
by Google and Meta is able to dynamically adjust its advertising rates
to capture all of the expected incremental profits from the customers
acquired through it.
Part of the missing small business effect may also be caused by
platform concentration. If, instead of an advertising duopoly, small
businesses had more options for advertising, the power balance between
platform (rentier) and small business (entrepreneur) might shift more
toward the latter. See also Crawford et al., The
antitrust orthodoxy is blind to real data harms. Policy makers might
choose to prioritize pro-competition privacy legislation such as surveillance
licensing for the largest, riskiest platforms in order to address
competition concerns in parallel with privacy ones.
p. 15 Since PETs are costly for firms to implement,
forward-looking regulation should consider how to incentivize PET
adoption and innovation further.
In a section about how so-called privacy-enhancing technologies (PETs) are perceived as just as privacy-violating as conventional personalization, and raise bigger competition issues, why recommend incentivizing PETs? The works cited would better support a
recommendation to have a more detailed or informative consent experience
for PETs than for cookie-based tracking. Because PETs
obfuscate real-world privacy problems such as fraud and algorithmic
discrimination, it would be more appropriate to require additional
transparency, and possibly licensing, for PETs.
PETs, despite their mathematical appeal to many at Big Tech firms,
have a
long list of problems when applied to the real world. The
creeped-out attitude of users toward PETs is worth paying attention to,
as people who grow up in market economies generally develop good
instincts about information in markets—just like people who grow up
playing ball games can get good at catching a ball without consciously
doing calculus. Policymakers should pay more attention to user
perceptions—which are based on real-world market activity—than to
mathematical claims about developers’ PET projects. PETs should be
considered from the point of view of regulators investigating
discrimination and fraud complaints, which are often difficult
to spot on large platforms. Because PETs have the effect of
shredding the evidence of platform misdeeds, enabling the existing
problems of adtech, just in a harder-to-observe way, they need more
scrutiny, not incentivization.
Coming soon: a useful large-scale experiment
Policymakers may soon be able to learn from what could be the
greatest experiment on the impact of ad personalization ever
conducted.
If Meta is required to offer Facebook users in the European Union a
meaningfully de-personalized ad experience (and not just the less
personalized ads option that still allows for personalization
using fraud risk factors like age, gender, and location) then there will
be a chance to measure what happens when users can choose personalized
or de-personalized ads on a service that is otherwise the same.
I put these links and notes together to help myself out when someone
drops a link to the Dubé et al. paper into an Internet argument, and put
them up here in the hope that they will help others. Hardly anyone will
read all the literature in this field, but a lot of the most interesting
research is still found in corners of the library that Big Tech isn’t
actively calling attention to.
Thanks to Fengyang Lin for reviewing a draft of this post.
First, I know that pretty much everyone is (understandably) freaking
out about stuff that is getting worse, but I just wanted to share some
good news in the form of an old-fashioned open-source success story. I’m
a fairly boring person and developed most of my software habits in the
late 1990s and early 2000s, so it’s pretty rare that I actually hit a
bug.
A helpful developer, Jan Kratochvil, wrote a fix and put in a
pull request.
A bot made test packages and commented with instructions for me
on how to test the fix.
I commented that the new version works for me.
The fix just went into Fedora. Pretty damn slick.
This is a great improvement over how this kind of thing used to work.
I hardly had to do anything. These kids today don’t know how good they
have it.
story number 2: why support the Linux desktop?
Amazon
Chime is shutting down. Did anyone use it? I get invited to a lot of
video conferences, and I never got invited to an Amazon Chime meeting.
Even though Amazon.com is normally really good at SaaS, this one didn’t
take off. What happened?
It looks like Amazon Chime was an interesting example of Nassim
Nicholas Taleb’s intransigent
minority effect.
The system
requirements for Amazon Chime look pretty reasonable, right? Should
get 95% of the client systems out there. The number of desktop Linux
users is pretty small. But if you have 20 meetings a week, at 95%
compatibility you’re going to average a compatibility issue every week.
Even worse, the people you most want to make a good first impression on
are the people whose client platform you’re least likely to know.
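The arithmetic behind that claim, as a quick sketch:

```python
# 20 meetings a week, each with a 5% chance of hitting an unsupported
# client platform, averages out to one compatibility problem per week.
meetings_per_week = 20
client_compatibility = 0.95

expected_problems = meetings_per_week * (1 - client_compatibility)
print(expected_problems)  # 1.0 per week
```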
And if you do IT support for a company with 100 people organizing
meetings, Amazon Chime is going to cause way too many support issues to
put up with. Taleb uses the examples of kosher and halal food—only a
small fraction of the population will only eat kosher or halal, but when
planning food for a large group, the most practical choice is to satisfy
the minority.
The minority rule will show us how it all it takes is a small number
of intolerant virtuous people with skin in the game, in the form of
courage, for society to function properly.
Anyway, something to keep in mind in the future for anyone
considering moving the support desktop Linux or support
Firefox tickets to backlog. None of the successful video conferencing
platforms give me any grief for my Linux/Firefox/privacy nerdery
client-side setup.
Bonus links
Liam Proven and Thomas Claburn cover the latest web browser
surveillance drama: Mozilla
flamed by Firefox fans after reneging on promises to not sell their
data. Mozilla doesn’t sell data about you (in the way that most
people think about selling data), and we don’t buy data about
you, he said. We changed our language because some jurisdictions
define sell more broadly than most people would usually
understand that word. (Don’t forget to turn
off advertising features in Firefox.)
David Roberts interviews Mustafa Amjad and Waqas Moosa about Pakistan’s solar
boom. What has prompted this explosion of distributed solar is
some combination of punishingly high prices for grid power and solar
panels getting very, very, very cheap. A glut of Chinese overcapacity
means that the price of panels in Pakistan has gone from 24 cents a watt
to 10 cents a watt in just the past year or two. Distributed solar is
breaking over Pakistan like a tidal wave, despite utilities and a grid
that do not seem entirely prepared for it.
AI
and Esoteric Fascism by Baldur Bjarnason. When I first began to
look into Large Language Models (LLMs) and Diffusion Model back in 2022,
it quickly became obvious that much of the rhetoric around LLMs was…
weird. Or, if we’re being plain-spoken, much of what the executives and
engineers at the organisations making these systems were saying was
outright weirdo cult shit…
It
is no longer safe to move our governments and societies to US clouds
by Bert Hubert. With all sorts of magic legal spells like
DPIAs and DTIAs, organizations attempt to justify
transferring our data and processes to the US. There is a whole industry
that has been aiding and abetting this process for years. People also
fool themselves that special keys and “servers in the EU” will get you
“a safe space” within the American cloud. It won’t.
This came out in 2020 but worth re-reading today: Puncturing
the Paradox: Group Cohesion and the Generational Myth by Harry
Guild. The highest group cohesion by profession is in Marketing.
This is advertising’s biggest problem in a single chart. This is the
monoculture. How can we possibly understand, represent and sell to an
entire country when we exist in such a bubble? We like to style
ourselves as free thinkers, mavericks and crazies, but the grim truth is
that we’re a more insular profession than farming and boast more
conformists than the military.
Tech
continues to be political by Miriam Eric Suzanne. Maybe we should
consider the beliefs and assumptions that have been built into a
technology before we embrace it? But we often prefer to treat each new
toy as an abstract and unmotivated opportunity. If only the good
people like ourselves would get involved early, we can surely teach
everyone else to use it ethically!
Important reminder at Sauropod Vertebra Picture of the Week. If
you believe in “Artificial Intelligence”, take five minutes to ask it
about stuff you know well. Because LLMs get catastrophically wrong
answers on topics I know well, I do not trust them at all on topics I
don’t already know. And if you do trust them, I urge you to spend five
minutes asking your favourite one about something you know in
detail. (This is a big part of the reason I don’t use LLMs for
search or research. A lot of the material isn’t just wrong, but reflects
old, wrong, but often repeated assumptions that you need to know a lot
about a field to know not to apply.)
Wendy Davis covers Meta
Sued Over Discriminatory College Ads. A civil rights group has
sued Meta Platforms over ad algorithms that allegedly discriminate by
disproportionately serving ads for for-profit colleges to Black users
and ads for public colleges to white users. (Situations like this are
a big part of the reason why people should stop
putting privacy-enhancing advertising technologies in web
browsers—they mainly obfuscate discrimination and fraud.)
Cities
Can Cost Effectively Start Their Own Utilities Now by Kevin Burke.
Most PG&E ratepayers don’t understand how much higher the rates
they pay are than what it actually costs PG&E to generate and
transmit the electricity to their house. When I looked into this
recently I was shocked. The average PG&E electricity charge now
starts at 40 cents per kilowatt hour and goes up from there. Silicon
Valley Power, Santa Clara’s utility company, is getting power to
customers for 17 cents per kilowatt hour. Sacramento’s utility company
charges about the same.
Three
years on, Europe looks to Ukraine for the future of defense tech by
Mike Butcher. But in order to invest in the right technology, Europe
will have to look to Ukraine, because that is where future wars are
being fought right now. TechCrunch recently put a call out for Ukrainian
dual-use and defense tech startups to update us on what they are working
on. Below is what they sent us, in their own words. Related: Ukrainian
Drones Flew 500 Miles And, In A Single Strike, Damaged 5% Of Russia’s
Oil Refining Capacity by David Axe. (Drones are getting longer
ranges and better autonomy, fast. Fossil fuel infrastructure is not
getting any better protected or faster to repair. In the near future,
you’re only going to be in the oil or gas business if nobody who is good
at ML and model airplanes has a strong objection to you being in the oil
or gas business.)
Is it just me or is it all about Europe right now? Put on some Kraftwerk
and follow along I guess.
Fedora
Chooses Forgejo! This is GitHub-like project hosting software with
version control, issues, pull requests, all the usual stuff. I have a
couple of small projects on Codeberg,
which is the (EU) hosted nonprofit instance and it works fine as far as
I can tell. Also a meissa GmbH presentation at FOSDEM 2025: You know X,
Facebook, Xing, SourceForge? What about GitHub? It is time to de-risk
OpenSource engagement!
Scott Locklin lists Examples
of group madness in technology. One of the worst arguments I hear
is that thing X is inevitable because the smart people are doing
it. As I’ve extensively documented over the last 15 years on this
blog, smart people in groups are not smart and are even more subject to
crazes and mob behavior than everyone else.
Also international, not Europe-specific: Postgres
full-text search is Good Enough! by Rachid Belaid. (But there is a
tech autonomy angle, and an active PostgreSQL Europe, so for
practical purposes PostgreSQL is part of the Eurostack.)
Good advice from tante/Jürgen Geuter: Innovation
is a distraction. The demand for more Innovation (and sometimes
even the request for more research) has become a way to legitimize not
doing anything. A way to say the unpleasant solutions we have are not
perfect but in the future there might be a magic solution that doesn’t
bother us and everyone gets a fucking unicorn.
Marloes de Koning interviews Cristina Caffarra. ‘We
have to get to work and put Europe first. But we are late. Terribly
late’. You really don’t have to buy everything in Europe,
says the competition expert, who is familiar with the criticism that the
American supply is simply superior. But start with 30 percent of your
procurement budget in Europe. That already makes a huge
difference. (That seems like an easy target. Not only are way
more than 30 percent of the European Alternatives up to
a serviceable level by now, but unfortunately a lot of the legacy US
vendors are having either quality or compliance problems, or both. The
risks, technical and otherwise, keep going up.)
Greg Nojeim and Silvia Lorenzo Perez cover Trump’s
Sacking of PCLOB Members Threatens Data Privacy. Aside from its
importance in protecting civil liberties, the PCLOB cannot play its key
role in enforcing U.S. obligations under the EU-U.S. Data Privacy
Framework (DPF) while it lacks a quorum of members. The European
Commission would lose a key oversight tool for which it bargained, and
the adequacy decision that it issued to support the DPF could be struck
down under review at the Court of Justice of the European Union (CJEU),
which struck down two predecessor EU-U.S. data privacy arrangements, the
Safe Harbor Agreement and the Privacy Shield.
Sarah Perez covers Report:
Majority of US teens have lost trust in Big Tech. Common Sense
says that 64% of surveyed U.S. teens don’t trust Big Tech companies to
care about their mental health and well-being and 62% don’t think the
companies will protect their safety if it hurts profits. Over half of
surveyed U.S. teens (53%) also don’t think major tech companies make
ethical and responsible design decisions (think: the growing use of dark
patterns in user interface design meant to trick, confuse, and deceive.)
A further 52% don’t think that Big Tech will keep their personal
information safe and 51% don’t think the companies are fair and
inclusive when considering the needs of different users. (What if
the Eurostack becomes the IT version of those European food brands that
sell well in other countries too?)
Mariana Olaizola Rosenblat covers How
Meta Turned Its Back on Human Rights for Tech Policy Press.
Zuckerberg announced that his company will no longer work to detect
abuses of its platforms other than high-severity violations of
content policy, such as those involving illicit drugs, terrorism, and
child sexual exploitation. The clear implication is that the company
will no longer strive to police its platform against other harmful
content, including hate speech and targeted harassment.
Sounds like a brand-unsafe environment. So is another rush of
advertiser boycott stories coming? Not
this time. Lara O’Reilly reports that brand safety has recently
become a political hot potato and been a flash point for some
influential, right-leaning figures. In uncertain times, marketing
decision-makers are keeping a low profile. Most companies aren’t really
set up to take on the open-ended security risk of coming out against
hate speech by users with friends in high places. According
to the Fraternal Order of Police, the January 6 pardons send a
dangerous message, and that message is being heard in marketing
departments. The CMOs who boycotted
last time are fully aware that stochastic
terrorism is a thing, and that rage stories about companies spread
quickly in Facebook groups and other extremist media. If an executive
makes the news for pulling ads from Meta, they would be putting
employees at risk from lone, deniable attacks. So instead of
announcing a high-profile boycott, marketers are more likely to follow
the example of Federal
employees and do the right thing, by the book, and quietly.
And it’s possible for advertisers to reduce support for Meta without
making a stink or drawing fire. Fortunately, Meta ads are hella
expensive, and results can be unrealistic
and unsustainable. Like all the Big Tech companies these days, Meta
is coping with a slowdown in innovation by tweaking the ad rules to
capture more revenue from existing services. As Jakob Nielsen pointed
out back in 2006, in Search
Engines as Leeches on the Web, ad platforms can even capture the
value created by others. A marketer doesn’t have to shout ¡No
Pasarán! or anything—just sharpen your best math pencil, quietly go
through the numbers, spot something that looks low-ROAS or fraudulent in
the Meta column, tweak the budget, repeat. If users can dial
down Meta, so can marketers. (Update: Richard Kirk writes, Brands
could be spending three times too much on social. You read that
right. Read the math, do the math.) And if Meta comes out with
something new and risky like the adfraud
in the browser thing, Privacy-Preserving Attribution, it’s easy to
use the fraud problem as the reason not to do it—you don’t have to stand
up and talk politics at work.
From the user side
It’s not that hard to take privacy measures that result in less money
for Big Tech. Even if you can’t quit Meta entirely, some basic tools and
settings can make an impact, especially if you use both a laptop and a
phone, not just a phone. With a few minutes of work, an individual in
the USA can, in effect, fine the surveillance
business about $50/month.
My list of effective privacy
tips is prioritized by how much I think they’ll cost the
surveillance business per minute spent. A privacy tips list for people
who don’t like doing privacy tips but also don’t like creepy oligarchs.
(As they say in the clickbait business, number 9 will shock you: if you
get your web browser info from TV and social media, you probably won’t
guess which browsers have built-in surveillance and/or fraud features.)
That page also has links to more intensive privacy advice for those who
want to get into it.
A lawyer question
As an Internet user, I realize I can’t get to Meta surveillance
neutral just with my own privacy tools and settings. For the foreseeable
future, companies are going to be doing server-to-server tracking of me
with Meta CAPI.
So in order to get to a rough equivalent of not being surveilled, I
need to balance out their actual surveillance by introducing some free
speech into the system. (And yes, numbers can be speech. O, the
Tables tell!) So what I’d like to do is write a surrogate script
(that can be swapped in by a browser extension in place of the real Meta
Pixel, like the
surrogate scripts uBlock Origin uses) to enable the user to send
something other than valid surveillance data. The user would configure
what message the script would send. The surrogate script would then
encode the message and pass it to Meta in place of the surveillance data
sent by the original Meta script. There is a possible research angle to
this, since I think that in
general, reducing ad personalization tends to help people buy better
products and services. An experiment would probably show that people
who mess with cross-context surveillance are happier with their
purchases than those who allow surveillance. Releasing a script like
that is the kind of thing I could catch hell for, legally, so I’m going
to wait to write it until I can find a place to host it and a lawyer to
represent me. Anyone?
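To make the idea concrete, here is a rough sketch of what such a surrogate could look like, in plain browser JavaScript. The fbq function name and the queue/push/loaded/version properties match what the standard Meta Pixel bootstrap snippet sets up; the USER_MESSAGE value, the encoding, and the msg parameter are made up for illustration, and the actual network call is left commented out pending that lawyer.

// surrogate-fbq.js (sketch): swapped in by an extension in place of fbevents.js
(function () {
  // Configured by the user; replaces the surveillance payload entirely.
  const USER_MESSAGE = 'message configured by the user goes here';
  function fbq(command) {
    if (command === 'track' || command === 'trackCustom') {
      // Encode the user's message instead of browsing data. The 'msg'
      // parameter is hypothetical, not a real Meta Pixel field.
      const params = new URLSearchParams({ ev: 'PageView', msg: btoa(USER_MESSAGE) });
      // new Image().src = 'https://www.facebook.com/tr?' + params; // disabled for now
    }
  }
  // Mimic the state the real bootstrap snippet sets, so pages that
  // check for an existing pixel don't inject the real one.
  fbq.queue = [];
  fbq.push = fbq;
  fbq.loaded = true;
  fbq.version = '2.0';
  window.fbq = window._fbq = fbq;
})();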
After Big Social.
Dan Phiffer covers the question of where to next. I am going into
this clear-eyed; I’m going to end up losing touch with a lot of people.
For many of my contacts, Meta controls the only connection we have. It’s
a real loss, withdrawing from communities that I’ve built up over the
years (or decades in the case of Facebook). But I’m also finding new
communities with different people on the networks I’m spending more time
in.
(update 2 Feb 2025) This site has some pages with inline
styles, so I can’t use that CSP line right now. This is because I use
the SingleFile
extension to make mirrored copies of pages, so I need to move those into
their own virtual host so I can go back to using the version without the
unsafe-inline.
(update 23 Feb 2025) The Pagefind site search requires ‘unsafe-eval’ in
CSP in order to support WASM. This should be
wasm-unsafe-eval in the future.
To do WASM and inline styles, the new value for the
Content-Security-Policy header is:
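Roughly like this, assuming the rest of the policy stays locked down (the directives other than style-src and script-src here are examples, not this site’s exact settings):

Content-Security-Policy: default-src 'self'; img-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self' 'unsafe-eval'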
Why
is Big Tech hellbent on making AI opt-out? by Richard Speed.
Rather than asking “we’re going to shovel a load of AI services
into your apps that you never asked for, but our investors really need
you to use, is this OK?” the assumption instead is that users will be
delighted to see their formerly pristine applications cluttered with AI
features. Customers, however, seem largely dissatisfied. (IMHO if
the EU is really going to throw down and do a software trade war with
the USA, this is the best time to switch to European
Alternatives.
Big-time proprietary software is breaking
compatibility while independent alternatives keep on going. People
lined up for Microsoft Windows 95 in 1995 and Apple iPhones in 2007, and
a trade war with the USA would have been a problem for software users
then, but now the EuroStack is a
thing. The China stack, too, as Prof. Yu Zhou points out: China
tech shrugged off Trump’s ‘trade war’ − there’s no reason it won’t do
the same with new tariffs. I updated generative ai
antimoats with some recent links. Even if the AI boom does catch on
among users, services that use AI are more likely to use predictable
independently-hosted models than to rely on Big Tech APIs that can be
EOLed or nerfed at any time, or just have the price increased.)
California vs
Texas Minimum Wage, 2013-2024 by Barry Ritholtz. [F]or seven
years–from January 2013 to March 2020–[California and Texas
quick-service restaurant] employment moved almost identically, the
correlation between them 0.994. During that seven year period, however,
TX had a flat $7.25/hr minimum wage while CA increased its minimum wage
by 50%, from $8/hr to $12. Related: Is a Big
Mac in Denmark Pricier Than in US?
What’s
happening on RedNote? A media scholar explains the app TikTok users are
fleeing to – and the cultural moment unfolding there. Jianqing Chen
covers the Xiaohongshu boom in the USA. This spontaneous convergence
recalls the internet’s original dream of a global village. It’s a
glimmer of hope for connection and communication in a divided world.
(This is such authentic organic social that the Xiaohongshu ToS
hasn’t even been translated into English yet. And not only does nobody
read privacy policies (we knew that) but videos about reuniting with
your Chinese spy from TikTok are a whole trend on there. One
marketing company put up a page of Rules
& Community Guidelines translated into English but I haven’t
cross-checked it. “Practice the core socialist values.” and “Promote
scientific thinking and popularize scientific knowledge.”)
Bob Sullivan reports Facebook
acknowledges it’s in a global fight to stop scams, and might not be
winning. (The bigger global fight they’re in is a labor/management
one, and when moderator jobs get less remunerative or more stressful,
the users get stuck dealing with more crime.) Related: Meta
AI case lawyer quits after Mark Zuckerberg’s ‘Neo-Nazi madness’; Llama
depositions unsealed by Amy Castor and David Gerard. (The direct
mail/database/surveillance marketing business, get-rich-quick schemes,
and various right-wing political movements have been one big overlapping
scene in the USA for quite a while, at least back to the Direct
Mail and the Rise of the New Right days and possibly further. People
in the USA get targeted for a lot of political disinformation and fraud
(one scheme can be both),
so the Xiaohongshu mod team will be in for a shock as scammers, trolls,
and worse will follow the US users onto their platform.)
I’m still an Internet optimist despite…things…so I was hoping that
Friday’s Supreme
Court opinion in the TikTok case would have some useful information
about how to design online social networking in a way that does get
First Amendment protection, even if TikTok doesn’t. But no. Considered
as a bug report, the opinion doesn’t help much. We basically got (1)
TikTok collects lots of personal info (2) Congress gets to decide if and
how it’s a national security problem to make personal info available to
a foreign adversary, and so TikTok is banned. But everyone else
doing social software, including collaboration software, is going to
have a lot to find out for themselves.
The Supreme Court pretty much ignores TikTok’s dreaded For You Page
algorithm and focuses on the privacy problem. So we don’t know if some
future ban of some hypothetical future app that somehow fixed its data
collection issues would hold up in court just based on how it does
content recommendations. (Regulating recommendation algorithms is a big
issue that I’m not surprised the Court couldn’t agree on in the short
time they had for this case.) We also get the following, on p. 9—TikTok
got the benefit of the doubt and received some First Amendment
consideration that future apps might or might not.
This Court has not articulated a clear framework for determining
whether a regulation of non-expressive activity that disproportionately
burdens those engaged in expressive activity triggers heightened review.
We need not do so here. We assume without deciding that the challenged
provisions fall within this category and are subject to First Amendment
scrutiny.
Page 11 should be good news for anybody drafting a privacy law
anyway. Regulating data collection is content neutral for First
Amendment purposes—which should be common sense.
The Government also supports the challenged provisions with a
content-neutral justification: preventing China from collecting vast
amounts of sensitive data from 170 million U. S. TikTok users. That
rationale is decidedly content agnostic. It neither references the
content of speech on TikTok nor reflects disagreement with the message
such speech conveys….Because the data collection justification reflects
a purpos[e] unrelated to the content of expression, it is
content neutral.
But so far the to-do list for future apps looks manageable.
Don’t surveil US users for a foreign adversary
Comply with whatever future restrictions on recommendation
algorithms turn out to hold up in court. (Disclosure of rules or source
code? Allow users to switch to chronological? Allow client-side or
peer-to-peer filtering and scoring? Lots of options but possible to get
out ahead of.)
Not so fast. Here’s the hard part. According to the Court the problem
is not just the info that the app collects automatically and
surreptitiously, or the user actions it records, but also the info that
users send by some deliberate action. On page 14:
If, for example, a user allows TikTok access to the user’s phone
contact list to connect with others on the platform, TikTok can access
any data stored in the user’s contact list, including names,
contact information, contact photos, job titles, and notes. Access to
such detailed information about U. S. users, the Government worries, may
enable China to track the locations of Federal employees and
contractors, build dossiers of personal information for blackmail, and
conduct corporate espionage.
and in Justice Gorsuch’s concurrence,
According to the Federal Bureau of Investigation, TikTok can access
any data stored in a consenting user’s contact
list—including names, photos, and other personal information about
unconsenting third parties. Ibid. (emphasis added). And because the
record shows that the People’s Republic of China (PRC) can require
TikTok’s parent company to cooperate with [its] efforts to obtain
personal data, there is little to stop all that information from
ending up in the hands of a designated foreign adversary.
On the one hand, yes, sharing contacts does transfer a lot of
information about people in the USA to TikTok. But sharing a contact
list with an app can work a lot of different ways. It can be
covert surveillance (although mobile platforms generally do their
best to prevent this)
data sharing that you get tricked into
deliberate, more like choosing to email a copy of the company
directory as an attachment
If it’s really a problem to enable a user to choose to share contact
info, then that makes running collaboration software like GitHub
in China a problem from the USA side. (Git repositories are full of
metadata about who works on what, with who. And that information is
processed by other users, by the platform itself, and by third-party
tools.) Other content creation tools also share the kinds of info on
skills and work relationships that would be exactly what a foreign
adversary murder robot needs to prioritize targets. But the user, not
some surveillance software, generally puts that info there. If
intentional contact sharing by users is part of the reason that the USA
can ban TikTok, what does that mean for other kinds of user to user
communication?
Kleptomaniac princesses
There’s a great story I read when I was a kid that I wish I had the
citation for. It might be fictional, but I’m going to summarize it
anyway because it’s happening again.
Once upon a time there was a country that the UK really, really
wanted to maintain good diplomatic relations with. The country was in a
critical strategic location and had some kind of natural resources or
something, I don’t remember the details. The problem, though, was that
the country was a monarchy, and one of the princesses loved to visit
London and shoplift. And she was really bad at it. So diplomats had to
go around to the stores in advance to tell the manager what’s going on,
convince the store to let her steal stuff, and promise to settle up
afterwards.
Today, the companies that run the surveillance apps are a lot like
that princess. techbros don’t have masculine energy,
they have kleptomaniac princess energy. If one country really
needs to maintain good relations with another, they’ll allow that
country’s surveillance apps to get away with privacy shenanigans. If
relations get chillier, then normal law enforcement applies. At least
for now, though, we don’t know what the normal laws here will look like,
and the Supreme Court didn’t provide many hints yesterday.
In
TikTok v. Garland, Supreme Court Sends Good Vibes for Privacy Laws, But
Congress’s Targeting of TikTok Alone Won’t Do Much to Protect
Privacy by Tom McBrien, EPIC Counsel. The Court’s opinion was
also a good sign for privacy advocates because it made clear that
regulating data practices is an important and content-neutral regulatory
intervention. Tech companies and their allies have long misinterpreted a
Supreme Court case called Sorrell v. IMS Health to mean
that all privacy laws are presumptively unconstitutional under the First
Amendment because information is speech. But the TikTok Court
explained that passing a law to protect privacy is decidedly content
agnostic because it neither references the content of speech…nor
reflects disagreement with the message such speech conveys. In fact,
the Court found the TikTok law constitutional specifically on the
grounds that it was passed to regulate privacy and emphasized how
important the government interest is in protecting Americans’
privacy.
Bonus links
TikTok,
AliExpress, SHEIN & Co surrender Europeans’ data to authoritarian
China. Today, noyb has filed GDPR complaints against TikTok,
AliExpress, SHEIN, Temu, WeChat and Xiaomi for unlawful data transfers
to China….As none of the companies responded adequately to the
complainants’ access requests, we have to assume that this includes
China. But EU law is clear: data transfers outside the EU are only
allowed if the destination country doesn’t undermine the protection of
data.
Total
information collapse by Carole Cadwalladr. It was the open society
that enabled Zuckerberg to build his company, that educated his
engineers and created a modern scientific country that largely obeyed
the rules-based order. But that’s over. And, this week is a curtain
raiser for how fast everything will change. Zuckerberg took a smashing
ball this week to eight years’ worth of “trust and safety” work that has
gone into trying to make social media a place fit for humans. That’s
undone in a single stroke.
Baltic
Leadership in Brussels: What the New High Representative Kaja Kallas
Means for Tech Policy | TechPolicy.Press by Sophie L. Vériter.
[O]nline platforms and their users are affected by EU foreign policy
through counter-disinformation regulations aimed at addressing foreign
threats of interference and manipulation. Indeed, technology is
increasingly considered a matter of security in the EU, which means that
the HRVP may well have a significant impact on the digital space within
and beyond the EU.
The
Ministry of Empowerment by danah boyd. This isn’t about
shareholder value. It’s about a kayfabe war between tech demagogues
vying to be the most powerful boy in the room.
This site is written by me personally
except for anything that is clearly marked up and cited as a direct
quotation. If you see anything on here that is not cited appropriately,
please contact me.
Generative AI output appears on this site only if I think it really
helps make a point and only if I believe that my use of a similar amount
and kind of material from a relevant work in the training set would be
fair use.
For example, I quote a sentence of generative AI output in LLMs and
reputation management. I believe that I would have been within my
fair use rights to use the same amount of text from a copyrighted
history book or article.
In LLMs and the web
advertising business, my point was not only that the Big Tech
companies are crooked, but that it’s so obvious. A widely available LLM
can easily point out that a site running Big Tech ads—for real brands—is
full of ripped-off content. So I did include a short question and answer
session with ChatGPT. It’s really getting old that big companies are
constantly being shocked to discover infringement and other crimes when
their own technology could have spotted it.
Usually when I mention AI or LLMs on here I don’t include any
generated content.
Here’s my contender for Internet tip of the year. It’s going to take
under a minute, and will not just help you buy better stuff, but also
make you happier in general. Ready? Here it is, step by step.
Log in to your Google account if you’re not logged in already.
(If you have a Gmail or Google Drive tab open in the browser, you’re
logged in.)
Find the Personalized ads control. It looks something like this.
[screenshot: Personalized ads toggle, on]
Turn it off.
[screenshot: Personalized ads toggle, off]
That’s it. Unless you have another Google account. If you do have
multiple Google accounts (like home, school, and work accounts) do this
for each one.
This will affect the ads you get on all the Google sites and apps,
including Google Search and YouTube, along with the Google ads on other
sites. Google is probably going to show you some message to try to
discourage you from doing this. From what I can tell from the outside,
it looks like turning off personalized ads will cost Google money. Last
time I checked, I got the following message.
Ads may seem less relevant. When your info isn’t used
for ads, you may see fewer ads for products and brands that interest
you. Non-personalized ads on Google are shown to you according to
factors like the time of day, device type, your current search or the
website you’re visiting, or your current location (based on your IP
address or device permissions).
But what they don’t say is anything about how personalized ads will
help you buy better products and services. And that’s because—and I’m
going out on a limb here data-wise, but a pretty short and solid limb,
and I’ll explain why—they just don’t. Choosing to turn off personalized
ads somehow makes you a more satisfied shopper and better off.
How does this work?
I still don’t know exactly how this tip works, but so far there
have been a few theories.
1: lower fraud risk. It’s possible that
de-personalizing the ads reduces the number of scam advertisers who can
successfully reach you. Bian et al., in Consumer Surveillance and
Financial Fraud, show that Apple App Tracking Transparency, which
reduces the ability of apps to personalize ads, tended to reduce fraud
complaints to the FTC.
We estimate that the reduction in tracking reduces money lost in all
complaints by 4.7% and money lost reported in internet and data security
complaints by 40.1%.
2: lower quality vendors. Researchers have found that targeted ads are
more likely to be associated with lower quality vendors, and higher
prices for identical products, compared to competing alternatives found
in organic search results.
3: under-investment in the product. Between 2019 and 2021, brands that
focused on brand equity saw a 72% increase in value, compared to just
20% for brands that relied primarily on performance tactics. Ignoring
brand-building not only weakens your baseline sales but forces you to
spend more and more on performance marketing just to keep your head
above water. Brands that are over-focused on surveillance advertising
might be forced to under-invest in product improvements.
4. limited algorithmic and personalized pricing.
Personalized ads might be set up to offer the same product at higher
prices to some people. The
FTC was investigating, but from the research point of view,
personalized pricing is really hard to tell apart from dynamic pricing.
Even if you get volunteers to report prices, some might be getting a
higher price because stock is running low, not because of who the
individual is. So it’s hard to show how much impact this has, but hard
to rule it out too.
5. it’s just a step on the journey. Another
possibility is that de-personalizing the ads is a gateway to blocking
ads entirely. What if, without personalization, the ads get gross or
annoying enough that people tend to move up to an ad blocker? And,
according to Lin et al. in The
Welfare Effects of Ad Blocking,
[P]articipants that were asked to install an ad-blocker become less
likely to regret recent purchases, while participants that were asked to
uninstall their ad-blocker report lower levels of satisfaction with
their recent purchases.
Maybe you don’t actually make better buying decisions while ads are
on but personalization is off—but it’s a step toward full ad blocking
where you do get better stuff and more happiness.
How do I know this works?
I’m confident that this tip works because if turning ad
personalization off didn’t help you, Google would have said so a while
ago. Remember the 52%
paper about third-party cookies? Google made a big deal out of
researching the ad revenue impact of turning cookie tracking on or off.
And this ad personalization setting also has a revenue impact for
Google. According to documents
from one of Google’s Federal cases, keeping the number of users with
ad personalization off low is a goal for Google—they make more money
from you if you have personalization on, so they have a big incentive to
try to convince you that personalization is a win-win. So why so quiet?
The absence of a PDF about this is just as informative as the actual PDF
would be.
And it’s not just Google. Research showing user benefits from
personalized ads would be a fairly easy project not just for Google, but
for any company that can both check a privacy setting and measure some
kind of shopping outcome. Almost as long as Internet privacy tools have
been a thing, so has advice from
Internet Thought Leaders telling us they’re not a good idea. But for
a data-driven industry, they’re bringing surprisingly little
data—especially considering that for many companies it’s data they
already have and would only need to do stats on, make graphs, and write
(or have an LLM write) the abstract and body copy.
Almost any company with a mobile app could do research to show any
benefits from ad personalization, too. Are the customers who use Apple
iOS and turn
off tracking more or less satisfied with their orders? Do banks get
more fraud reports from app users with tracking turned on or off? It
would be straightforward for a lot of companies to show that turning off
personalization or turning on some privacy setting makes you a less
happy customer—if it did.
The closest I have found so far is Balancing
User Privacy and Personalization by Malika Korganbekova and Cole
Zuber. This study simulated the effects of a privacy feature by
truncating browsing history for some Wayfair shoppers, and found that
people who were assigned to the personalized group and chose a product
personalized to them were 10% less likely to return it than people in
the non-personalized group. But that’s about a bunch of vendors of
similar products that were all qualified by the same online shopping
platform, not about the mix of honest and dishonest personalized ads
that people get in total. So go back and do the tip if you didn’t
already, enjoy your improved shopping experience, and be happy.
More: effective privacy
tips
Related
You can’t totally turn off ad personalization on Meta sites like
Facebook, but there are settings to limit the flow of targeting data in
or out. See Mad
at Meta? Don’t Let Them Collect and Monetize Your Personal Data by
Lena Cohen at the Electronic Frontier Foundation.
B L O C K in
the U S A. Ad blocking is trending up, and for the first time the
people surveyed gave their number one reason as privacy, not annoyance
or performance.
fix Google
Search will get you not all the way back to classic Google, but
pretty close.
The $16
hack to blocking ads on your devices for life (I don’t know about
the product or the offer, just interesting to see it on a site with ads.
Maybe the affiliate revenue is a much bigger deal than the programmatic
ad revenue?)
personalization
risks. In practice, most of the privacy risks related to advertising
are the result not of identifying individuals, but of treating different
people in the same context differently.
Bonus links
Samuel Bendett and David Kirichenko cover Battlefield
Drones and the Accelerating Autonomous Arms Race in Ukraine.
Ukrainian officials started to describe their country as a war lab
for the future—highlighting for allies and partners that, because
these technologies will have a significant impact on warfare going
forward, the ongoing combat in Ukraine offers the best environment for
continuous testing, evaluation, and refinement of [autonomous] systems.
Many companies across Europe and the United States have tested their
drones and other systems in Ukraine. At this point in the conflict,
these companies are striving to gain battle-tested in Ukraine
credentials for their products.
Aram Zucker-Scharff writes, in The
bounty hunter tendency, the future of privacy, and ad tech’s new profit
frontier. The new generation of laws that are authorizing
citizens to become bounty hunters are implicitly tied to the use of
surveillance technology. They encourage the use of citizen vs citizen
surveillance and create a dangerous environment that worsens the
information imbalance between wealthy citizens and everyone else.
(Is this a good argument against private right of action in privacy
laws? It’s likely that troll lawyers will use existing wiretapping laws
against legit news sites, which tend to have long and vulnerable lists
of adtech partners.)
Scharon Harding covers TVs
at CES 2025. On the one hand, TVs are adding far-field
microphones which, um, yikes. But on the other hand, remember how
the Microsoft Windows business and gaming market helped drive down the
costs of Linux-capable workstation-class hardware? What is the big
innovation that developers, designers, and architects will make out of
big, inexpensive screens subsidized by the surveillance business?
But just in case, since there’s a lot of malarkey in the online
advertising business, I’m putting up this file to let the advertisers
know that if someone sold you an ad and claimed it ran on here, you got
burned.
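A minimal no-ads version of the file, assuming just a comment plus the placeholder record the IAB spec defines for sites with no authorized sellers, looks like this:

# no ad inventory on this site is authorized for sale
placeholder.example.com, placeholder, DIRECT, placeholder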
That’s the ads.txt file for this site. The
format is defined in a
specification from the IAB Tech Lab (PDF). The important part is the
last line. The placeholder is how you tell the tools that are supposed
to be checking this stuff that you don’t have ads.
Rachel explains Web page
annoyances that I don’t inflict on you here in a handy list of web
antipatterns. Removing more of these could be a good start to making a
less frustrating, more accessible, higher performing site.
More useful things to check for security and performance: Securing
your static website with HTTP response headers by Matt Hobbs. I have
some of these set already but it’s helpful to have them all in one
place. A browser can do a lot of stuff that a blog like this one won’t
use, so safer to tell it not to.
Chris Coyier suggests that a list of Slash Pages
could be a good list of blogging ideas. (That is a good idea. I made a
list at /slashes and will
fill it in. Ads.txt is technically not a page I guess since it’s
just text but I’m counting it.)
Elie Berreby follows up on his search engine that’s forgotten how
to search post with a long look at Search
engines think I plagiarized my own content! My Hacker News Case
Study. One of many parts that interests me about this whole issue is
the problem of how much more money certain companies can make when
returning a page on a sketchy infringing
site than on the original. Typically an original content site is
able to get a better ad deal than an illegal site that has to settle for
scraps and leave more of the ad revenue for Google.
Simon Willison says, I
still don’t think companies serve you ads based on spying through your
microphone. For the accusation to be true, Apple would need to be
recording those wake word audio snippets and transmitting them back to
their servers for additional processing (likely true), but then they
would need to be feeding those snippets in almost real time into a
system which forwards them onto advertising partners who then feed that
information into targeting networks such that next time you view an ad
on your phone the information is available to help select the relevant
ad. That is so far fetched. He’s totally right if you define your
microphone as the microphone on your cell phone, which has limited
battery energy and bandwidth. But most people own microphones, plural,
and a smart TV or kitchen appliance is typically plugged in so the juice
to process ambient audio for keywords is there.
Dean W. Ball covers the Texas Responsible AI Governance Act in Texas Plows
Ahead. (This bill doesn’t have a national defense exception the way
the EU’s AI Act does, which is strange.)
I’m looking forward to the new Charles Stross novel that past me
thoughtfully pre-ordered from Books Inc. for near future me. In A
Conventional Boy a man was sentenced to prison for playing Dungeons
and Dragons in the 1980s, and many years later he’s putting his escape
plan into action…
If this year has revealed anything about the tech billionaires it is
that they have a very specific philosophy other than just growth and
that philosophy is malicious…I don’t think we can really take on the
obstacle of, let’s call it more accurately, the scam economy without
acknowledging this is all part of the design. They think they are richer
than you and therefore you must be stupid and because you are stupid you
should be controlled…
Read the whole thing. A lot of tech big shots want to play the
rest of us like a real-time strategy game. (Ever notice that the list of
skills in the “we don’t hire US job applicants because the culture
doesn’t value the following skills” tweets is the same as the list of
skills in the “our AI has achieved human-level performance in the
following skills” tweets?) I predicted that
low-trust society will trend in 2025, and I agree with Aram that
a big part of that is company decision-makers deliberately making
decisions that make it harder to trust others. I’m working on a list of known good
companies. (Work in progress, please share yours if you have
one.)
And yes, my link collecting tool has queued up a bunch of links about
the shift towards a lower-trust society along with ways that people are
adapting to it or trying to shift things back.
Why is it so hard to buy
things that work well? (imho Mark Ritson still explained
it best—companies over-emphasize the promotion P of
marketing, trying to find people slightly more likely to buy the product
as is, over the product refinements that would tend to get more
buyers. George Tannenbaum on destroying brand trust with too much of one
P, too little of another: Ad Aged:
Leave Me Alone.)
(looks like I had enough notes for an upcoming event to do A-Z this
year…)
Ad blocking will get bigger and more widely reported
on. Besides the usual suspects, the current wave of ad blocking is also
partly driven by professional, respectable security vendors. Malwarebytes Labs
positions their ad blocker as a security tool and certain
well-known companies are happy to help them with their content marketing
by running malvertising. (example: Malicious
ad distributes SocGholish malware to Kaiser Permanente employees) Silent
Push is another security vendor helping to make the ads/malware
connection. And, according to research by Lin
et al., users who installed an ad blocker reported fewer regrets
with purchases and an improvement in subjective well-being. Some
of those users who installed an ad blocker reluctantly because of
security concerns will be hard to convince to turn it off even if the
malvertising situation improves.
Bullshit is going to be everywhere, and more of it.
In 2025 it won’t be enough to just ignore the bullshit itself. People
will also have to ignore what you might think of as a bullshit Smurf attack,
where large amounts of content end up amplifying a small amount of
bullshit. Some politician is going to tweet something about how
these shiftless guys today need to pull up their pants higher,
and then a bunch of mainstream media reporters are going to turn in
their diligently researched 2000-word think pieces about the effect of
higher pants on the men’s apparel market and human reproductive system.
And by the time the stories run, the politician has totally forgotten
about the pants thing and is bullshitting about something else. The
ability to ignore the whole cycle will be key. So people’s content
discovery habits are going to change, we just don’t know how.
Chrome: Google will manage to hang on to their
browser, as prospective buyers don’t see the value in it. Personally I
think there are two logical buyers. The Trade Desk could rip out the
janky Privacy Sandbox stuff and put in OpenPass and UID2. Not all
users would leave those turned on, but enough would to make TTD the
dominant source for user identifiers in web ads. Or a big bank could buy
Chrome as a fraud protection play and run it to maximize security, not
just ad revenue. At the scale of the largest banks, protecting existing
customers from Internet fraud would save the bank enough money to pay
for browser development. Payment platform integration and built-in
financial services upsell would be wins on top of that.
Both possible Chrome buyers would be better off keeping open-source
Chromium open. Google would keep contributing code even if they didn’t
control the browser 100%. They would feel the need to hire or sponsor
people to participate on a legit open-source basis to support better
interoperability with Google services. They wouldn’t be able to get the
anticompetitive shenanigans back in, but the legit work would
continue—so the buyer’s development budget would be lower than Google’s,
long term. But that’s not going to happen. So far, decision makers are
convinced that the only way to make money with the browser is with tying
to Google services, so they’re going to pass up this opportunity.
Development tools will keep getting more AI
in them. It will be easier to test new AI stuff in the IDE than to not
test it. But a flood of plausible-looking new code that doesn’t
necessarily work in all cases or reflect the unwritten assumptions of
the project means a lot more demand for testing and documentation. The
difference between a software project that spends 2025 doing
self-congratulatory AI productivity win blog posts and one that has an
AI code catastrophe is going to be how much test coverage they started
with or were able to add quickly.
Environmental issues: we’re in for more fires,
floods, and storms. Pretty much everybody knows why, but
some people will only admit it when they have to. A lot of
homeowners won’t be able to renew their insurance, so will end up
selling to investors who are willing to demolish the house and hold the
land for eventual resale. More former house occupants will pivot to
#vanlife, and 24-hour health clubs will sell more memberships to people
who mainly need the showers.
Firefox will keep muddling through. There will be
more Internet drama over their ill-advised adfraud in
the browser thing, but the core software will be able to keep going
and even pick up a few users on desktop because of the ad blocking
trend. The search ad deal going away won’t have much effect—Google pays
Firefox to exist and limit the amount of antitrust trouble it’s in, not
for some insignificant number of search ad clicks. If they can’t pay
Firefox for default search engine placement, they’ll find some other
excuse to send them enough cash to keep going. Maybe not as high
on the hog as they have been used to, but enough to keep the browser
usable.
Homeschooling will increase faster because of safety
concerns, but parents will feel uncomfortable about social isolation and
seek out group activities such as sports, crafts, parent-led classes,
and group playdates. Homeschooling will continue to be a lifestyle
niche that’s relatively easy to reach with good influencer and content
creator connections, but not well-covered by the mainstream media.
Immigration into the USA will continue despite
high-profile deportations and associated human rights violations. But
whether or not a particular person is going to be able to make it in, or
be able to stay, is going to be a lot less predictable. If you know who
the person is who might be affected by immigration policy changes, you
might be able to plan around it, but what’s more likely from the
business decision-making point of view is the person affected is an
employee of some supplier of your supplier, or a family member, and you
can’t predict what happens when their life gets disrupted. Any company
running in lean or just-in-time mode, and relying on low disruption and
high predictability, will be at the biggest disadvantage. Big
Tech companies will try to buy their way out of the shitstorm, but
heavy reliance on networks of supplier companies will mean they’re still
affected in hard-to-predict ways.
Journalism will continue to go non-profit and journalist-owned.
The bad news is there’s not enough money in journalism, now or in
the near future, to sustain too many levels of managers and
investors, and the good news is there’s enough money in it to keep a
nonprofit or lifestyle company going. (Kind of like tech conferences.
LinuxWorld had to support a big company, so wasn’t sustainable, but Southern California Linux
Expo, a flatter organization, is.)
Killfile is the old Usenet word for a blocklist, and
I already had something for B. The shared lists that are possible with
the Fediverse and Bluesky are too useful not to escape into other
categories of software. I don’t know which ones yet, but a shared filter
list to help fix the
search experience is the kind of thing we’re likely to see. People’s
content discovery and shopping habits will have to change, we just don’t
know how.
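For example, a shared list for cleaning up search results could be as small as a text file of match patterns in the style uBlacklist already uses (the entries here are hypothetical):

# hypothetical shared blocklist for search results
*://*.content-farm.example/*
*://*.seo-spam.example/*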
Low-trust society will trend. It’s possible for a
country to move from high trust to low, or the other way around, as the
Pew Research Center covered in 2008. The broligarchy-dominated
political and business environment in the USA, along with the booms in
growth hacking
and AI slop, will make things a lot easier for corporate crime and
scam
culture. So people’s content discovery and shopping habits will have
to change, we just don’t know how. Multi-national companies that already
operate in middle-income low-trust countries will have some advantages
in figuring out the new situation, if they can bring the right people in
from there to here.
Military affairs, revolution in: If you think AI
hype at the office in the USA is intense, just watch the AI hype in
Europe about how advanced drones and other AI-enabled defense projects
can protect countries from being occupied by an evil dictator without
having to restore or expand conscription. Surveillance advertisers and
growth hackers in the USA are constantly complaining about restrictions
on AI in Europe—but the AI Act over there has an exception for the
defense industry. In 2025 it will be clear that the USA is
over-investing in bullshit AI and under-investing in defense AI, but it
won’t be clear what to do about it. (bonus link: The
Next Arsenal of Democracy | City Journal)
Neighborhood organizations: As Molly White
recommended in November, more people will be looking for community
and volunteer opportunities. The choice to become a joiner and
not just a consumer
in unpredictable times is understandable and a good idea in general.
This trend could enter a positive feedback loop with non-profit and
journalist-owned local news, as news sites try more community
connections like Cleveland
Documenters.
Office, return to: Companies that are doing more
crime will tend to do more RTO, because signaling
loyalty is more important than productivity or retaining people with
desired skills. Companies that continue avoiding doing crimes, even in
what’s going to be a crime-friendly time in the USA, will tend to
continue cutting back on office space. The fun part is that the company
can tell the employee that work from home privileges are a
benefit, and not free office space for the employer. Win-win! So the
content niche for how-tos on maximizing home (and van) offices will
grow.
Prediction
markets will benefit from 2024’s 15 minutes of fame to
catch on for some niche corporate projects, and public prediction market
prices will be quoted in more news stories.
Quality, flight to (not): If I were going to be
unrealistically optimistic here, I’d say that the only way for
advertisers to deal with the flood of AI
slop sites and fake
AI users is to go into full Check
My Ads mode and just advertise on known legit sites made by and for
people. But right now the habits and skills around race-to-the-bottom ad
placements are too strong, so there won’t be much change on the
advertiser side in 2025. A few forward-thinking advertisers will get
good results from quality buying for specific campaigns, but that’s
about it.
Research on user behavior will get a lot more
important. The AI
crapflood and resulting search quality crisis mean that (say the
line, Bart) people’s content discovery and shopping habits will have to
change, we just don’t know how. Companies that build user research
capacity, especially in studying privacy users and the gaps they leave
in the marketing data, will have an advantage.
State privacy law season will be spicy again. A few
states will get big comprehensive privacy bills through the process
again, but the laws to watch will be specific ones on health, protecting
teens from the algorithm, social media censorship, and
other areas. More states will get laws like Daniel’s
Law. (We need a Daniel’s Law for military personnel, their families,
and defense manufacturing workers, but we’re probably going to see some
states do them for health insurance company employees instead.) Update 1 Feb 2025: Compliance issues that came up for AADC will have to get
another look.
Troll lawyer letters alleging violations of
the California Invasion of Privacy Act (CIPA) and similar laws will
increase. Operators of small sites can incur a lot of legal risk now
just by running a Big Tech tracking pixel. But Big Tech will continue to
ignore the situation, and put all the risks on the small site. (kind of
like how Amazon.com uses delivery partner companies to take the
legal risks of employing algorithmically
micromanaged, overstressed delivery drivers.)
Unemployment and underemployment will trend up, not
down, in 2025. Yes, there will be more political pressure on companies
here to hire and manufacture locally, but actual job applicants aren’t
interchangeable worker units in an RTS game—there’s a lot of mismatch
between the qualities that job seekers will have and the qualities that
companies will be looking for, which will mean a lot of jobs going
unfilled. And employers tend to hire fewer people in unpredictable times
anyway.
Virginia’s weak
privacy law will continue to be ignored by most companies that
process personal data. Companies will treat all the privacy law states
as Privacyland, USA which means basically California.
Why is my cloud computing bill so high? will be a
common question. But the biggest item on the bill will be the AI that
[employee redacted] is secretly in love with, so you’ll never find
it.
X-rated sites will face
an unfriendly regulatory environment in many states, so will help
drive mass-market adoption of VPNs, privacy technologies,
cryptocurrencies, and fintech. The two big results will be that first,
after people have done all the work to go underground to get their
favorite pr0n site, they might as well use their perceived invisibility
to get infringing copies of other content too. And second, a lot of
people will get scammed by fake VPNs and dishonest payment services.
Youth privacy laws will drive more investment in
better content for kids. (This is an exception to the Q prediction.)
We’re getting a bunch of laws that affect surveillance advertising to
people under 18. As Tobias Kircher and Jens Foerderer reported, in Ban
Targeted Advertising? An Empirical Investigation of the Consequences for
App Development, a privacy policy change tended to drive a lot of
Android apps for kids out of the Google Play Store, but the top 10
percent of apps did better. If you have ever visited an actual app
store, it’s clear that Sturgeon’s law applies, and it’s likely that the
top 10 percent of apps account for almost all of the actual usage. All
the kids privacy laws and regs will make youth-directed content a less
lucrative play for makers of crap and spew who can make anything,
leaving more of the revenue for dedicated and high-quality content
creators.
ZFS
will catch on in more households, as early adopters replace complicated
streaming services (and their frequent price increases and disappearing
content) with storage-heavy media PCs.
Prediction markets—platforms where users buy and sell shares based on
the probability of future events—are poised to disrupt the media
landscape in 2025, transforming not only how news is shared but how it
is valued and consumed.
Prediction markets did get some time in the
spotlight this year. But the reasons for the long, ongoing
prediction market winter are bigger than just prediction markets not
being famous. Prediction markets have been around for a long time, and
have stubbornly failed to go mainstream.
The first prediction market to get famous was the University of
Iowa’s Iowa Electronic
Markets which launched in the late 1980s and has been covered in the
Wall Street Journal since at least the mid-1990s. They
originally used pre-web software and you had to mail in a paper check
(update 4 Jan 2024: paper checks
are still the only way to fund your account on there). But IEM
wasn’t the first. Prof. Robin Hanson, in Hail
Jeffrey Wernick, writes about an early prediction market
entrepreneur who started his first one in 1981. (A secretary operated
the market manually, with orders coming in by fax.) Prediction markets
were more famous than Linux or the World Wide Web before Linux or the
World Wide Web. Prediction markets have been around since before stop
trying to make fetch happen happened.
So the safe prediction would be that 2025 isn’t going to be the year
of prediction markets either. But just like the year of Linux on the
desktop never happened because the years of Linux in your pocket and
in the data center did, the prediction markets that do catch on are
going to be different from the markets that prediction market nerds are
used to today. Some trends to watch are:
Payment platforms: Lorenz points out, Prediction
markets are currently in legal limbo, but I’d bet against a ban,
especially given the new administration. Right now in the USA there
is a lot of VC money tied up in fintech, and a lot of political pressure
from well-connected people to deregulate everything having to do with
money. For most people the biggest result will be more scams and more
hassles dealing with transactions that are legal and mostly trustworthy
today but that will get enshittified in the new regulatory environment.
But all those money-ish services will give prediction markets a lot more
options for getting money in and out in a way that enables more
adoption.
Adding hedging and incentivization: The prediction
markets that succeed probably won’t be pure, ideal prediction markets,
but will add on some extra market design to attract and retain traders.
Nick Whitaker and J. Zachary Mazlish, in Why
prediction markets aren’t popular, write that so far, prediction
markets don’t appeal to the kinds of people who play other kinds of
markets. People enter markets for three reasons. Savers
are trying to build wealth, Gamblers play for thrills,
and Sharps enter to profit from less well-informed
traders. No category out of the three is well-served by existing
prediction markets, because a prediction market is zero-sum, so not a
way to build wealth long-term, and it’s too slow-moving and not very
thrilling compared to other kinds of gambling. And the sharps need a
flow of less well informed traders to profit from, but prediction
markets don’t have a good way to draw non-sharps into the market.
Whitaker and Mazlish do suggest hedging as a way to get more market
participants, but say
We suspect there is simply very little demand for hedging events like
whether a certain law gets passed; there is only demand for hedging the
market outcomes those events affect, like what price the S&P 500
ends the month at. Hedging market outcomes already implicitly hedges for
not just one event but all the events that could impact financial
outcomes.
That’s probably true for hedging in a large public prediction market.
An existing oil futures market is more generally useful to more traders
than a prediction market on all the events that might affect the price
of oil. And certain companies’ stocks today are largely prediction
markets on future AI breakthroughs and the future legal status of
various corporate crimes. But I suspect that it’s different for a
private market for events within a company or organization. For example,
a market with sales forecasting contracts on individual large customers
could provide much more actionable numbers to management than just
trading on predicted total sales.
You could, in effect, pay for a prediction market’s information
output by subsidizing it, and Whitaker and Mazlish suggest this. A
company that runs an internal prediction market can dump money in and
get info out. Like paying for an analyst or consulting firm, but in a
distributed way where the sources of expertise are self-selecting by
making trade/no trade decisions based on what they know or don’t know.
But it’s also possible, usually on the smaller side, for a prediction
market to become an incentivization market. To me, the difference is
that in an incentivization market, a person with ability to affect the
results holds a large enough investment in the market that it influences
them to do so. The difference is blurry and the same market can be a
prediction market for some traders and an incentivization market for
others. But by designing incentives for action into the market, a market operator can
make it drift away from a pure prediction market design to one that
tends to produce an outcome. Related: The private
provision of public goods via dominant assurance contracts by
Alexander Tabarrok
If you don’t know what’s in the box, you can’t secure it, so it is
your responsibility as builders to know what’s in the box. We need
better tools, we need better engagement to enable everybody to do that
with less effort and less burden on individual volunteer maintainers and
non-profits.
Companies that use open source software need to measure and reduce
risks. The problem is that the biggest open source risks are related to
hard-to-measure human factors like developer turnover and burnout.
Developers of open source software can take actions that help companies
understand their risks, but they’re not compensated for doing it. A
prediction/incentivization market can both help quantify hidden risks
and incentivize changes.
If you have an internal market that functions as both a prediction
market and an incentivization market, you can subsidize both the
information and the desired result by predicting the events that
you don’t want to happen. This is similar to how commodities markets and
software bug futures markets can work. Some traders are pure
speculators, others take actions that can move the market. Farmers can
plan which crops to plant based on predicted or contracted prices,
companies can allocate money to fuel futures and/or fuel-saving
projects, developers can prioritize tasks.
Synergy with AI projects: An old corporate Intranet
rule of thumb [citation needed] is that you need five daily active
editors to have a useful company or organization Wiki. I don’t know what
the number is for a prediction market, but as Prof. Andrew Gelman points
out, prediction
markets need “dumb money” to create incentives for well-informed
traders to play and win.
Prediction markets need liquidity and dumb money. Bots can
already provide both.
AI projects need scalable quality checks. Slop
is easier to make than to check, so the cost of evaluating the quality of
AI output keeps growing relative to the declining
costs of everything else. You can start up a lot of bots, fund each
with a small stake, and shut down the broke ones. The only humans
required are the traders who can still beat the bots. And if at some
point the humans lose all their money, you know you won AI.
Congratulations, and I for one welcome our bot plutocrat
overlords.
Bots can also be run behind a filter to only make offers that, if
accepted, would further the market operator’s goals in some way. For
example, bots can be set up to be biased to over-invest on predicting
unfavorable outcomes (like buying the UNFIXED side of bug futures) to
add some incentivization.
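For a concrete toy example (the numbers and the bug are made up): say the
market operator buys the UNFIXED side of a contract on bug #123 at 80
cents on the dollar. A developer who knows how to fix it can take the
FIXED side for the remaining 20 cents, land the fix, and collect the full
dollar at settlement:

\[
\text{developer profit per contract} = \$1.00 - \$0.20 = \$0.80 \text{ (minus the cost of doing the fix)}
\]

Either the operator ends up paying 80 cents per contract to get the bug
fixed, or the contract expires as a priced signal that nobody found the
bug worth fixing.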
Fixing governance by learning from early market
experiences: Internal prediction markets at companies tend to
go through about the same story arc. First, the market launches with
some sponsorship and internal advocacy from management. Second, the
market puts up some encouraging results. (Even in 2002 a prediction
market was producing more accurate sales forecasts than the official
ones at HP.) And for its final act, the prediction market ends up
perpetrating the unforgivable corporate sin: accurately calling some
powerful executive’s baby ugly. So the prediction market ends up going
to live with a nice family on a farm. Read the (imho, classic) paper, Corporate Prediction
Markets: Evidence from Google, Ford, and Firm X by Bo Cowgill and
Eric Zitzewitz, and, in Professor
Hanson’s post, why a VC firm could not get prediction markets into
portfolio companies. Wernick blames the ego of managers who think
their judgment is best, hire sycophants, and keep key org info close to
their chests.
The main lesson is that the approval and budget for the prediction
market itself needs to be handled at as many management levels as possible
above the managers that the prediction market is likely to bring bad
news to. Either limit the scope of issues traded on, or sell the market
to a more highly placed decision maker, or both. The prediction market
administrator needs to report to someone safely above the level of the
decision-makers for the issues being traded on. The really interesting
experiment would be a private equity or VC firm that has its own team
drop in and install a prediction market at each company it owns. The
other approach is bottom-up: start with limiting the market to
predicting small outcomes like the status of individual software bugs,
and be disciplined about not trading on more consequential
issues until the necessary sponsorship is in place.
So, is 2025 the year of prediction markets? Sort of.
A bunch of factors are coming together. Payment platform options, the
ability to do proof of concept niche projects, and the good fit as a QA
tool for AI will make internal market projects more appealing in 2025.
And if market operators can learn from history to avoid what tends to
happen to bearers of bad news, this could be the year.
Conditional
market: The seer.io prediction market supports conditional positions
(that only win or lose if some other position pays off) with an
arbitrary number of nesting levels.
The
History Crisis Is a National Security Problem: Democracies such as
the United States rely on the public to set broad strategic priorities
through elections and on civilian leaders to translate those priorities
into executable policies. Fostering historical knowledge in the public
at large is also an important aspect of U.S. competitiveness. (and
we really don’t want to be learning
about history from bots)
Why
the deep learning boom caught almost everyone by surprise: Fei-Fei
Li…created an image dataset that seemed ludicrously large to most of
her colleagues. But it turned out to be essential for demonstrating the
potential of neural networks trained on GPUs.
Developing
a public-interest training commons of books: Currently, AI
development is dominated by a handful of companies that, in their rush
to beat other competitors, have paid insufficient attention to the
diversity of their inputs, questions of truth and bias in their outputs,
and questions about social good and access. Authors Alliance,
Northeastern University Library, and our partners seek to correct this
tilt through the swift development of a counterbalancing
project…
Also,
Quora Lies: WW2 Arial, Helvetica, Courier; also Times misinformation
(More and more wrong answers out there, in easy to find places. Somehow,
people will have to change content discovery habits to deal with scam
culture and AI slop, but we don’t know how. IMHO the need for user
research is greater than ever.)
[What
say you, Spock?] My Proposed Terminology to Describe Bypassing Social
Media Face ID Age Verification Systems (Interesting premise but are
kids going to pick up hacking habits again? Kids back in the early days
of the Internet had to hack because IT was rare, expensive, and flaky.
But people who developed their Internet habits in the 2000s-2010s had it
easy, because stuff was basically working but companies were still in
create more value than you capture mode. I suppose kids today
will have to learn to hack, not just because of age verification stuff
but because companies are in permanent hustle/growth hacking/value
extraction mode, so the value available to the default user is less.
Hack the consumer surplus?)
The
Ugly Truth About Spotify Is Finally Revealed: Around this same
time, I started hearing jazz piano playlists on Spotify that disturbed
me. Every track sounded like it was played on the same instrument with
the exact same touch and tone. Yet the names of the artists were all
different….By total coincidence, Spotify’s profitability started to
improve markedly around this time. Also: The
Ghosts in the Machine, by Liz Pelly
Joey Hess:
aiming at December: The design goal of my 12 kilowatt system is to
produce 1 kilowatt of power all day on a cloudy day in midwinter, which
allows swapping between major loads (EV charger, hot water heater, etc)
on a cloudy day and running everything on a sunny day. So the size of
the battery bank doesn’t matter much. Batteries are getting cheaper fast
too, but they are a wear item, so it’s better to oversize the solar
system and minimize the battery….It costs more to mount solar panels now
than the panels are worth.
The Mozilla Firefox and Google Chrome browsers both have built-in
advertising features, which I generally
turn off because putting advertising features, even
privacy-enhancing ones, in browsers is a
bad idea. But the problem with going in
to the settings and changing things is not just that it takes time
to find stuff, but that it only affects the one browser profile you’re
in. So every time I add a user account or a new browser profile, I still
need to go to Settings and change the defaults again.
Fortunately it’s possible to turn the ad stuff off once and have it
stay off. Both browsers have enterprise management features.
The full list of available settings is at Chromium
- Policy List. Some of these can be handy additions to the
managed_policies.json file especially if you use multiple
profiles. For example, I also add
"DefaultBrowserSettingEnabled": false so that Google Chrome
does not ask to be default browser.
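Here is a minimal sketch of the two files on a Linux system. The Privacy
Sandbox policy names for Google Chrome and the preference name for
Firefox are my assumptions, not something from this post, so check the
Chromium Policy List and Mozilla's policy-templates documentation for
your versions before relying on them.

# Google Chrome: drop a managed policy file into place
# (Chromium reads /etc/chromium/policies/managed/ instead)
sudo mkdir -p /etc/opt/chrome/policies/managed
sudo tee /etc/opt/chrome/policies/managed/managed_policies.json <<'EOF'
{
  "PrivacySandboxPromptEnabled": false,
  "PrivacySandboxAdTopicsEnabled": false,
  "PrivacySandboxSiteEnabledAdsEnabled": false,
  "PrivacySandboxAdMeasurementEnabled": false,
  "DefaultBrowserSettingEnabled": false
}
EOF

# Firefox: enterprise policies.json; locks the privacy-preserving
# ad measurement preference off for every profile on the system
sudo mkdir -p /etc/firefox/policies
sudo tee /etc/firefox/policies/policies.json <<'EOF'
{
  "policies": {
    "Preferences": {
      "dom.private-attribution.submission.enabled": {
        "Value": false,
        "Status": "locked"
      }
    }
  }
}
EOF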
Both files should be owned by the owner of the containing directory
(root:root on my system) and mode 755.
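In shell terms, using the paths from the sketch above:

sudo chown root:root /etc/opt/chrome/policies/managed/managed_policies.json \
  /etc/firefox/policies/policies.json
sudo chmod 755 /etc/opt/chrome/policies/managed/managed_policies.json \
  /etc/firefox/policies/policies.json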
That’s it.
There are ways to set this stuff up on Mac OS, too. I think it’s
supposed to be
for Firefox, but the /etc/ location might also work. For Google
Chrome, there are Set
up Chrome browser on Mac instructions.
There are also mentions of how to manage these two browsers on
Microsoft Windows. If someone who blogs about those two OSs has
instructions on how to set this up on other OS, please let me know and
I’ll link to your blog post.
For Mac OS: YOUR_BLOG_LINK_HERE
For Microsoft Windows: YOUR_BLOG_LINK_HERE
Appeasement fails, and one more tip
For about the past five years, a lot of proponents of in-browser ad
features have been going on about how we really need to let the
advertisers have their privacy-preserving advertising systems in
the browser, because otherwise the surveillance business is going to do
something worse. But, as we can see from recent news, that’s not how
boundary testing works. They put the ad features in the browser, and
then went ahead and increased
fingerprinting anyway.
Browser developer: can we make the browser a little creepy so we
don’t have to do worse stuff like fingerprinting?
User: ok, fine (clicks Got it)
Browser developer: well if you didn’t mind that, you won’t mind
this…fingerprinting…either, right?
You’ve 1) set a boundary, 2) communicated it to them, and, after the
boundary was crossed, 3) named a consequence to let them know, if this
happens again, this is what I am doing.
Accepting any in-browser ad feature just encourages them to test
boundaries again and make the browser incrementally creepier and more
intrusive. Consequences need to happen early and predictably, or the
person testing your boundaries learns that they can test further.
Letting creepy behavior slide is a way to get more of it later.
How can users realistically communicate with big companies that only
pay attention to lawsuits, news stories, and metrics measured in
millions? You can’t really turn off browser fingerprinting—that’s the
point, it’s based on hardware or software features that are hard for the
user to change—but you can send a signal (and as a useful side effect
protect yourself from nasty stuff like malvertising
targeted based on your employer.) One of the best underrated privacy
tips is just to visit https://myadcenter.google.com/home
and set Personalized Ads to Off. This doesn’t just help
protect yourself, it also (1) moves a metric that they track, so sends a
message that they will get, and (2) it does reduce surveillance
advertising revenue, so you help limit the flow of money to the other
side. Turning this stuff off is not mainly about protecting yourself,
it’s about helping at-risk people hide in the crowd and about reducing the incentives to
invest in surveillance.
No privacy setting or tool is a total fix by itself, but turning off
in-browser ad features and turning off personalization are both pretty
effective for the time invested. More tips: effective privacy
tips
Companies
issuing RTO mandates “lose their best talent”: Study (but it’s not
about talent. When the company is increasing profits by more deception,
surveillance, and value extraction from existing customers, then
employees who can signal loyalty are more valuable than employees who
might invent something new and legit, which will turn out not to get
made anyway because it doesn’t look as revenue-positive as the crime
options)
Why
Does U.S. Technology Rule? What I’m suggesting is that America’s
tech advantage may bear considerable resemblance to Britain’s banking
advantage. That is, it may have less to do with institutions, culture
and policy than the fact that for historical reasons the world’s major
technology hubs happen to be in the United States…
Feed
readers which don’t take “no” for an answer (More results from a
really useful tool. If, like me, your way to avoid The Algorithm is to
make your own feed reader, go sign up to see if you have all the
If-Modified-Since and related features working
correctly.)
Nodriver:
A Game-Changer in Web Automation: Designed to bypass even the most
sophisticated anti-bot measures, Nodriver is a high-performance,
asynchronous web automation framework tailored for developers who
require a robust and reliable tool for scraping, testing, and automating
web interactions. (previously,
previously)
Trump2
Will Shake Up the “Competition Safe Spaces”: What we know is that
there is complete paralysis in Brussels as we start to take a measure of
what may be coming our way – with decisions (DMA non compliance, Google
ad-tech) and policy initiatives all stalled in the wings, all in
suspended animation until the new Administration shows its true colours
and we figure out what threats and retribution might be coming our
way.
When IBM Built a War
Room for Executives: Engelbart’s Mother of All Demos showed how
advanced computing could create a shared, collaborative environment of
allied individuals, all direct users of the same system, befitting of a
laboratory of computer enthusiasts in Menlo Park, Calif. Dunlop’s
Executive Terminal demo showed how many of these same advanced
technologies could be directed along another path, that of a strictly
hierarchical organization, highly attuned to rank and defined roles and
specialties. (Related: What
Was The ‘Dowding System’?, CIC
[Combat Information Center] Yesterday and Today. A lot of people in
decision-making roles in 1960s corporations were WWII veterans.)
Nodriver:
A Game-Changer in Web Automation: Despite the existence of
multiple plugins like puppeteer-stealth, rebrowser, real-browser and
many more, they have been quite detectable by WAFs like Cloudflare,
Imperva, and Datadome….Nodriver takes a different approach by getting in
at the framework level itself. By minimizing the affected footprint and
communicating directly over the Chrome Devtool Protocol itself, Nodriver
leaves very little marks of its presence, if any at all. A side effect
of this is that Nodriver is also one of the fastest scraping frameworks
available. (The scraper bot will always get through?)
“Modern
Work Fucking Sucks.” Your company doesn’t just use one app; it
uses all of them. Slack for chatting, Zoom for meetings, Notion for
brainstorming, Trello for project tracking, Asana for workflows, and
Jira for… something vaguely technical that no one fully understands. The
end result isn’t streamlined productivity, it’s a Byzantine ecosystem of
software where every app exists to talk to every other app while you
stand in the middle, trying to make sense of the chaos. (Adam Smith
would facepalm. Specialization of labor is a thing, especially for
administrative and organizational tasks. Remember the ideal software
development team in The Mythical Man-Month had two
secretaries and a program clerk? I guess the good news here is
that Simple Sabotage for the
21st Century is almost undetectable in the presence of normal IT
friction.)
Consumer
Solar Surge: Pakistan Shows you Don’t Need Government Programs to Green
the Grid: While no one was looking, the Pakistani public took
matters into their own hands, adding 17 gigawatts of solar power this
year. These installations are mostly in the form of Chinese panels for
rooftop or ground level solar in towns and villages. (Yes, the grid
power generally goes off when it’s sunny, and yes, there are a lot of
people who are good at electrical work and at importing stuff from
China.)
To start a command in a new tab, use the --tab command-line
option to gnome-terminal, along with -- to separate the
gnome-terminal options from the options passed to the command being
run.
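For example (htop here is just a stand-in for whatever command you want):

# open a new tab in the running gnome-terminal and start htop in it
gnome-terminal --tab -- htop

# arguments after the -- go to the command, not to gnome-terminal
gnome-terminal --tab -- watch -n 5 df -h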
I can
now run a GPT-4 class model on my laptop: I’m so excited by the
continual efficiency improvements we’re seeing in running these
impressively capable models. In the proprietary hosted world it’s giving
us incredibly cheap and fast models like Gemini 1.5 Flash, GPT-4o mini
and Amazon Nova. In the openly licensed world it’s giving us
increasingly powerful models we can run directly on our own devices.
(Openly licensed in this context means, in comparison to API access, you
get predictable pricing and no surprise nerfing. More: generative ai
antimoats)
$700bn
delusion: Does using data to target specific audiences make advertising
more effective? Latest studies suggest not: We can improve the
quality of our targeting much better by just buying ads that appear in
the right context, than we can by using my massive first party database
to drive the buy, and it’s way cheaper to do that. Putting ads in
contextually relevant places beats any form of targeting to individual
characteristics. Even using your own data. (This makes sense—if the
targeting data did increase return on ad spend, then the price of the
data and targeting-related services would tend to go up to capture any
extra value.)
Defining AI: I think we should shed the idea that AI is a technological artifact
with political features and recognize it as a political artifact through
and through. AI is an ideological project to shift authority and
autonomy away from individuals, towards centralized structures of
power.
Ever wish Google would have one button for opt me out of all
Google growth hacking schemes that you could click once and be done
with it? Me too. But that’s not how it works.
This opt-out has two parts and you do need to have a Google Account
to do it.
Either set up Google Search
Console and add your site(s) as web properties on there, or
go to your existing Google Search Console account and get a list of your
web properties.
Hopefully this awkward form thing is just temporary and there will be
a more normal opt-out with a meta tag or something at some point. I’ll
update this page if they make one.
IMHO the IT business had a peak some time in the mid-2000s.
You didn’t have to dink with vintage PC stuff like DIP switches and
partition tables, but the Internet companies were still in create
more value than you capture mode and you didn’t have to work around
too many dark patterns either. If I recall correctly, Microsoft did
something like this link-adding scheme in Internet Explorer at one
point, but they backed off on it before it really became a thing and the
opt-out was easier. Welcome to the return of the
power user. Oh well, writing up all the
individual opt outs is good for getting clicks. The Google Search
algorithm loves tips on how to turn Google stuff off.
Related (more stuff to turn off)
fix Google
Search: get rid of most of the AI and other annoying
features
Block
AI training on a web site: Right now you can’t block Google from
taking your content for AI without also blocking your site from Google
Search, but that’s likely to change.
Bonus links
Why
the DOJ’s Google Ad Tech Case Matters to You: In 2020, as the UK
report cited above showed, publishers received only 51% of the money
spent by advertisers to reach readers, and about 15% of advertisers’
money seems to just… disappear.
From the FBI alert: Use an ad blocking extension when performing internet searches. Most
internet browsers allow a user to add extensions, including extensions
that block advertisements. These ad blockers can be turned on and off
within a browser to permit advertisements on certain websites while
blocking advertisements on others.
This is still good advice. Search ads are full of scams, and you can
block ads on search without blocking the ads on legit sites. I made a local copy of the FBI alert.
Why did they take the web version down? Maybe we’ll find out. I sent
the FBI a FOIA request for any correspondence about this alert and the
decision to remove it.
SingleFile
is a convenient extension for saving copies of pages. (I got the FBI
page from the Internet Archive. It’s a US government work so make all
the copies you want.)
The
NSA and CIA Use Ad Blockers Because Online Advertising Is So
Dangerous by Joseph Cox. The IC, which also includes the parts of
the FBI, DEA, and DHS, and various DoD elements, has deployed
ad-blocking technology on a wide scale, according to a copy of a letter
sent by Congress and shared with Motherboard.
Designing
a push life in a pull world: Everything in our online world is
designed to push through our boundaries, usually because it’s in someone
else’s financial best interest. And we’ve all just accepted that this is
the way the world works now.
Killer
Robots About to Fill Skies… (this kind of thing is why the EU
doesn’t care about AI innovation in creepy tracking and copyright
infringement—they need those developers to get jobs in the defense
industry, which isn’t held back by the AI Act.)
Inside
the Bitter Battle Between Starbucks and Its Workers (More news from
management putting dogmatic union-busting ahead of customers and
shareholders, should be a familiar story to anyone dealing with
inadequate ad review or search quality ratings.)
National
Public Data saga illustrates little-regulated US data broker
industry: National Public Data appears to have been a home-based
operation run by Verini himself. The enterprise maintains no
dedicated physical offices. The owner/operator maintains the operations
of company from his home office, and all infrastructure is housed in
independent data centers, Verini said in his bankruptcy
filing.
The
money is flooding in, but what are prediction markets truly telling
us? If we look back further, predicted election markets were
actually legal in the US from the 1800s to 1924, and historical data
shows that they were accurate. There’s a New York Times story of Andrew
Carnegie noting how surprisingly accurate the election betting markets
were at predicting outcomes. They were actually more accurate before the
introduction of polling as a concept, which implies that the
introduction of polling diluted the accuracy of the market, rather than
the opposite.
Was
the Polymarket Trump whale smart or lucky? Whether one trader’s
private polling tapped sentiment more accurately than the publicly
available surveys, or whether statistical noise just happened to
reinforce his confidence to buy a dollar for 40c, can’t be known without
seeing the data.
Koleman
Strumpf Interview - Prediction Markets & More: 2024 was a huge
vindication for the markets. I don’t know how else to say it, but all
the polls and prognosticators were left in the dust. Nobody came close
to the markets. They weren’t perfect, but they were an awful lot better
than anything else, to say the least.
Just betting on an election every few years is not the interesting
part, though. Info
Finance is a broader concept. [I]nfo finance is a discipline
where you (i) start from a fact that you want to know, and then (ii)
deliberately design a market to optimally elicit that information from
market participants.
Signal
Is Now a Great Encrypted Alternative to Zoom and Google Meet: These updates mean that Signal is now a free, robust, and secure
video conferencing service that can hang with the best of them. It lets
you add up to 50 people to a group call and there is no time limit on
each call.
What
can McCain’s Grand Prix win teach us? Nothing new: Ever since
Byron Sharp decided he was going for red for his book cover, marketing
thinkers have assembled a quite extraordinary disciplinary playbook. And
it’s one that looks nothing like the existing stuff that it replaced. Of
course, the majority of marketers know nothing about any of it. They
inhabit the murkier corners of marketing, where training is rejected
because change is held up as a circuit-breaker for learning anything
from the past. AI and the ‘new consumer’ mean everything we once knew is
pointless now. Better to be ignorant and untrained than waste time on
irrelevant historical stuff. But for those who know that is bullshit,
who study, who respect marketing knowledge, who know the foundations do
not change, the McCain case is a jewel sparkling with everything we have
learned in these very fruitful 15 years.
The
Rogue Emperor, And What To Do About Them: Anywhere there is an
organisation or group that is centred around an individual, from the
smallest organisation upwards, it’s possible for it to enter an almost
cult-like state in which the leader both accumulates too much power, and
loses track of some of the responsibilities which go with it. If it’s a
tech company or a bowls club we can shrug our shoulders and move to
something else, but when it occurs in an open source project and a
benevolent dictator figure goes rogue it has landed directly on our own
doorstep as the open-source community.
One
Company A/B Tested Hybrid Work. Here’s What They Found. According
to the Society of Human Resource Management, each quit costs companies
at least 50% of the employees’ annual salary, which for Trip.com would
mean $30,000 for each quit. In Trip.com’s experiment, employees liked
hybrid so much that their quit rates fell by more than a third — and
saved the company millions of dollars a year.
Remote
Startups Will Win the War for Top Talent: Ironically, in another
strike against the spontaneous collaboration argument, a study of
two Fortune 500 headquarters found that transitioning from cubicles to
an open office layout actually reduced face-to-face interactions by 70
percent.
Why
Getting Your Neighborhood Declared a Historic District Is a Bad Idea: Historic designations are commonly used to control what people can do
with their own private property, and can be a way of creating a kind of
“backdoor” homeowners association. Some historic neighborhoods (many of
which have dubious claims to the designation) around the country have
HOA-like restrictions on renovations, repairs, and even
landscaping.
An
awful lot of FOSS should thank the Academy: Linux and open source
in general seem to be huge components of the movie special effects
industry – to an extent that we had not previously realized. (unless
you have a stack of old Linux Journal back issues from the
early 2000s—we did a lot of movie covers at the time that much of this
software was being developed.)
Using an 8K TV
as a Monitor: For programming, word processing, and other
productive work, consider getting an 8K TV instead of a multi-monitor
setup. An 8K TV will have superior image quality, resolution, and
versatility compared to multiple 4K displays, at roughly the same
size. (huge TVs are an under-rated, subsidized technology, like POTS
lines. Most or all of the huge TVs available today are smart and
sold with the expectation that they’ll drive subscription and
advertising revenue, which means a discount for those who use them as
monitors.)
The Unlikely
Inventor of the Automatic Rice Cooker: Fumiko, the mother of six
children, spent five years researching and testing to document the ideal
recipe. She continued to make rice three times a day, carefully
measuring water-to-rice ratios, noting temperatures and timings, and
prototyping rice-cooker designs. Conventional wisdom was that the heat
source needed to be adjusted continuously to guarantee fluffy rice, but
Fumiko found that heating the water and rice to a boil and then cooking
for exactly 20 minutes produced consistently good results.
Comments
on TSA proposal for decentralized nonstandard ID requirements: Compliance with the REAL-ID Act requires a state to electronically
share information concerning all driver’s licenses and state-issued IDs
with all other states, but not all states do so. Because no state
complies with this provision of the REAL-ID Act, or could do so unless
and until all states do so, no state-issued driver’s licenses or ID
cards comply with the REAL-ID Act.
The consensus, dismal future of the Internet is usually wrong.
Dystopias make great fiction,
but the Internet is surprisingly good at muddling through and reducing
each one to nuisance level.
We don’t have Clipper Chip dystopia that would have put backdoors
in all cryptography.
We don’t have software patent cartel dystopia that would have
locked everyone in to limited software choices and functionality, and a
stagnant market.
We don’t have Fritz Chip dystopia that would have mandated
Digital Rights Management on all devices.
None of these problems have gone away entirely—encryption backdoors,
patent trolls, and DRM are all still there—but none have reached either
Internet-wide catastrophe level or faded away entirely.
Compromising with the dystopia narrative always looks like the
realistic or grown-up path forward, until it doesn’t. And then the
non-dystopia timeline generally looks inevitable once you get far enough
along it. This time it’s the same way. We don’t need cross-context
personalized (surveillance) advertising in our web browsers any
more than we need SCO licenses (not counting the SCO license timeline
as dystopia, but another good example of a dismal timeline averted) in
our operating systems. Let’s look at the numbers. I’m going to make all
the assumptions most favorable to the surveillance advertising argument.
It’s actually probably a lot better than this. And it’s probably better
in other countries, since the USA is relatively advanced in the
commercial surveillance field. (If you have these figures for other
countries, please let me know and I’ll link to them.)
Total money spent on advertising in the USA: $389.49
billion
That comes out to about $1,160 spent on advertising to reach the
average person in the USA every year. That’s $97 per month.
So let’s assume (again, making the assumption most favorable to the
surveillance side) that all advertising is surveillance advertising. And
ads without the surveillance, according
to Professor Garrett Johnson are worth 52 percent less than the
surveillance ads.
So if you get rid of the surveillance, your ad subsidy goes from $97
to $46. Advertisers would be spending $51 less to advertise to you, and
the missing $51 is a good-sized amount of extra money to come up with
every month. But remember, that’s advertising money, total, not the
amount that actually makes it to the people who make the ad-supported
resources you want. Since the problem is how to replace the income for
the artists, writers, and everyone else who makes ad-supported content,
we need to multiply the missing ad subsidy by the fraction of that
top-level advertising total that makes it through to the content creator
in order to come up with the amount of money that needs to be filled in
from other sources like subscriptions and memberships.
How much do you need to spend on subscriptions to replace $51 in ad
money? That’s going to depend on your habits. But even if you have
everything set up totally right, a dollar spent on ads to reach you will
buy you less than a dollar you spend yourself. Thomas Baekdal writes, in
How
independent publishing has changed from the 1990s until today,
Up until this point, every publisher had focused on ‘traffic at
scale’, but with the new direct funding focus, every individual
publisher realized that traffic does not equal money, and you could
actually make more money by having an audience who paid you directly,
rather than having a bunch of random clicks for the sake of advertising.
The ratio was something like 1:10,000. Meaning that for every one person
you could convince to subscribe, donate, become a member, or support you
on Patreon … you would need 10,000 visitors to make the same amount from
advertising. Or to put that into perspective, with only 100 subscribers,
I could make the same amount of money as I used to earn from having one
million visitors.
All surveillance ad media add some kind of adtech
tax. The Association of National Advertisers found that about 1/3 of
the money spent to buy ad space makes it through to the publisher.
A subscription platform and subscriber services impose some costs
too. To be generous to the surveillance side, let’s say that a
subscription dollar is only three times as valuable as an advertising
dollar. So that $51 in missing ad money means you need to come up with
$17 from somewhere. This estimate is really on the high side in
practice. A lot of ad money goes to overhead and to stuff like retail ad
networks (online sellers bidding for better spots in shopping search
results) and to ad media like billboards that don’t pay for content at
all.
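Restating the arithmetic (my own back-of-the-envelope version, assuming
a US population of roughly 335 million):

\[
\begin{aligned}
\text{ad spend per person} &\approx \$389.49\,\text{billion} / 335\,\text{million} \approx \$1{,}160 \text{ per year} \approx \$97 \text{ per month} \\
\text{value without surveillance} &\approx \$97 \times (1 - 0.52) \approx \$46 \\
\text{missing ad subsidy} &\approx \$97 - \$46 = \$51 \\
\text{subscription money needed} &\approx \$51 / 3 = \$17 \text{ per month}
\end{aligned}
\]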
So, worst case, where do you get the $17? From buying less crap,
that’s where. Mustri
et al. (PDF) write,
[behaviorally] targeted ads are more likely to be associated with
lower quality vendors, and higher prices for identical products…
You also get a piece of the national
security and other collective security benefits of eliminating
surveillance, some savings in bandwidth and computing resources, and a
lower likelihood of becoming a victim of fraud and identity theft. But
that’s pure bonus benefit on top of the win from saving money by
spending less on overpriced, personally targeted, low-quality products.
(If privacy protection didn’t help you buy better stuff, the
surveillance companies would have said so by now.) Because
surveillance advertising gives
an advantage to deceptive advertisers over legit ones, the end of
surveillance advertising would also mean an increase in sales for legit
brands.
And we’re not done. As a wise man once said,
But wait! There’s more! Before you rush to do effective privacy
tips or write to your state
legislators to support anti-surveillance laws, there’s one more
benefit for getting rid of surveillance/personalized advertising.
Remember that extra $51 that went away? It didn’t get burned up in a
fire just because it didn’t get spent on surveillance advertising.
Companies still have it, and they still want to sell you stuff. Without
surveillance, they’ll have to look for other ways to spend it. And many
of the options are win-win for the customer. In Product
is the P all marketers should strive to influence, Mark Ritson
points out the marketing wins from incremental product improvements, and
that’s the kind of work that often gets ignored in favor of niftier,
short-term, surveillance advertising projects. Improving service and
pricing are other areas that will also do better without
surveillance advertising contending for budgets. There is a lot of
potential gain for a lot of people in getting rid of surveillance
advertising, so let’s not waste the opportunity. Don’t worry, we’ll get
another Internet dystopia narrative to worry about eventually.
Product
is the P all marketers should strive to influence: If there is one
thing I have learned from a thousand customers discussing a hundred
different products it’s that the things a company thinks are small are,
from a consumer perspective, big. And the grand improvements the company
is spending bazillions on are probably of little significance. Finding
out from the source what needs to be fixed or changed and then getting
it done is the quiet product work of proper marketers. (yes, I
linked to this twice.)
Marketers
in a dying internet: Why the only option is a return to simplicity: With machine-generated content now cluttering the most visible online
touchpoints (like the frontpage of Google, or your Facebook timeline),
it feels inevitable that consumer behaviors will shift as a result. And
so marketers need to change how they reach target audiences.
Is
AI advertising going to be too easy for its own good? As Rory
Sutherland said, When human beings process a message, we sort of
process how much effort and love has gone into the creation of this
message and we pay attention to the message accordingly. It’s costly
signaling of a kind.
Satire Without Purpose Will
Wander In Dark Places: Broadly labelling the entirety of
Warhammer 40,000 as satire is no longer sufficient
to address what the game has become in the almost 40 years since its
inception. It also fails to answer the rather awkward question of why,
exactly, these fascists who are allegedly too stupid to understand
satire are continually showing up in your satirical community in the
first place.
Why
I’m staying with Firefox for now – Michael Kjörling: [T]he most
reasonable option is to keep using Firefox, despite the flaws of the
organization behind it. So far, at least these things can be disabled
through settings (for example, their privacy-preserving ad
measurement), and those settings can be prepared in advance.
Scripting News: My friends at Automattic showed me how to turn on ActivityPub on a
WordPress site. I wrote a test post in my simple WordPress editor,
forgetting that it would be cross-posted to Mastodon. When I just
checked in on Masto, there was the freaking post. After I recovered from
passing out, I wondered what happens if I update the post in my editor,
and save it to the WordPress site that’s hooked up to Masto via
ActivityPub. So I made a change and saved it. I waited and waited,
nothing happened. I got ready to add a comment saying ahh I guess it
doesn’t update, when—it updated. (Like being happy when a new web
site opens in a new browser, a good sign that ActivityPub is the
connecting point for this kind of connected innovation.) Related: The Web Is a Customer Service Medium
(Ftrain.com) by Paul Ford.
Removal
of Russian coders spurs debate about Linux kernel’s politics: Employees of companies on the Treasury Department’s Office of Foreign
Assets Control list of Specially Designated Nationals and Blocked
Persons (OFAC SDN), or connected to them, will have their collaborations
subject to restrictions, and cannot be in the MAINTAINERS
file.
The
TikTokification of Social Media May Finally Be Its Undoing by Julia
Angwin. If tech platforms are actively shaping our experiences, after
all, maybe they should be held liable for creating experiences that
damage our bodies, our children, our communities and our
democracy.
Cheap
Solar Panels Are Changing the World: The latest global report from
the International Energy Agency (IEA) notes that solar is on track to
overtake all other forms of energy by 2033.
It
Was Ten Years Ago Today that David Rosenthal predicted that
cryptocurrency networks will be dominated by a few, perhaps just one,
large participant.
Writing
Projects (good start for a checklist before turning in a writing
project. Maybe I should write Git hooks for these.)
Word.(s).
(Includes some good vintage car ads. Remember when most car ads were
about the car, not just buttering up the driver with how successful you
must be to afford this thing?)
Social
Distance and the Patent System: [I]t was clear from our
conversation that [Judge Paul] Michel doesn’t have a very deep
understanding of the concerns of many in the software industry. And,
more to the point, he clearly wasn’t very interested in understanding
those concerns better or addressing them. On a theoretical level, he
knew that there was a lot of litigation in the software industry and
that a lot of people were upset about it. But like the Fed and the
unemployment rate, this kind of theoretical knowledge doesn’t always
create a sense of urgency. One has to imagine that if people close to
Michel—say, a son who was trying to start a software company—were
regularly getting hit by frivolous patent lawsuits, he would suddenly
take the issue more seriously. But successful software entrepreneurs are
a small fraction of the population, and most likely no judges of the
Federal Circuit have close relationships with one.
Atkinson
Hyperlegible Font was carefully developed by the Braille
Institute to help low-vision readers. It improves legibility and
readability through clear and distinctive letters and
numbers.
B612 – The font family is a
highly legible open source font family designed and tested to be used on
aircraft cockpit screens.
Routed
Gothic Font is traced from an old Leroy Lettering set used in
vintage technical drawings and control panels.
I’m trying to keep this site fairly small and fast, so I’m getting by
with Modern Font Stacks as
much as possible.
Discover
Wiped Out MFA Spend By Following These Four Basic Steps (This
headline underrates the content. If all web advertisers did these tips,
then 90% of the evil stuff on the Internet would be gone—most of the
web’s problems are funded by advertisers and agencies who fail to pay
attention to the context in which their ads appear.)
Wouldn’t it be good for corporate AI agents to have customer hands
to shake that are also equipped with agentic AI? Wouldn’t those
customers be better than ones whose agency is merely human, and limited
to only what corporate AI agents allow?
The obvious answer for business decision-makers today is:
lol, no, a locked-in customer is worth more. If, as a
person who likes to watch TV, you had an AI agent, then the agent could
keep track of sports seasons and the availability of movies and TV
shows, and turn your streaming subscriptions on and off. In the
streaming business, like many others, the management consensus is to
make things as hard
and manual as possible on the customer side, and save the automation
for the company side. Just
keeping up with watching a National Football League team is hard…even
for someone who is ON the team. Automation asymmetry, where the
seller gets to reduce service costs while the customer has to do more
and more manual work, is seen as a big win by the decision-makers on the
high-automation side.
The public has for a while now switched to using AI as a
negative—using the term artificial much as you do with
artificial flavouring or that smile’s artificial. It’s
insincere creativity or deceptive intelligence.
Other news is even worse. In today’s global conflict between evil
oligarchs and everyone else, AI is firmly aligned with the evil oligarch
side.
Authors who release under Creative Commons licenses are
disagreeing with the CC organization about whether AI training is fair
use: fair use
alignment chart
So how do we get from the AI adversarial situation we have now to the
win-win that Doc is looking for? Part of the answer will be resolving
the legal issues.
Today’s Napster-like free-for-all environment won’t persist, so
eventually we will have an AI scene in which companies that want to use
your work for training have to get permission and disclose
provenance.
The other part of the path from today’s situation—where big companies
have AI that enables scam
culture and chickenization
while individuals and small companies are stuck rowing through funnels
and pipelines—is personal, aligned AI that balances automation
asymmetries. Whether it’s solving CAPTCHAs, getting data in
hard-to-parse formats, or other awkward mazes, automation
asymmetries mean that as a customer, you technically have more
optionality than you practically have time to use. But AI has a lot more
time. If a company gives you user experience grief, with the right tools
you can get back to where you would have been if they had applied less
obfuscation in the first place. (icymi: Video
scraping: extracting JSON data from a 35 second screen capture for less
than 1/10th of a cent. Not a deliberate obfuscation example, but an
approach that can be applied.)
So we’re going to see something like this AI
cartoon by Tom Fishburne (thanks to Doc for the link) for privacy
labour. Companies are already getting expensive
software-as-a-service to make privacy tasks harder for the customers,
which means that customers are going to get AI services to make it
easier. Eventually some companies will notice the extra layers, pay attention to the research,
and get rid of the excess grief on their end so you can stop running
de-obfuscation on your end. That will make it work better for everyone.
(GPC all the
things! Data Rights
Protocol)
The biggest win from personal AI will, strangely enough, be in
de-personalizing your personal information environment. By
doing the privacy labour for you, the agentic AI will limit your
addressability and reduce personalization
risks. The risks to me from buying the less suitable of two legit
brands are much lower than the risk of getting stuck with some awful
crap that was personalized to me and not picked up on by norms
enforcers like Consumer Reports. Getting more of my privacy labour
done for me will not just help me personally do better
#mindfulConsumption, but also increase the rewards for
win-win moves by sellers. Personalization might
be nifty, but filtering out crap and rip-offs is a bigger immediate win:
Sunday Internet
optimism. Doc writes, When you limit what customers can
bring to markets, you limit what can happen in those markets. As far
as I can tell, the real promise for agentic AI isn’t just in enabling
existing processes or making them more efficient. It’s in establishing a
credible deterrent to enshittification—if you’re trying to rip me off,
don’t talk to me, talk to my bot army.
For just a minute, put yourself in the shoes of a product manager
with a proposal for some legit project that they’re trying to get
approved. If that proposal is up against a quick win for the company,
like one based on creepy
surveillance, it’s going to lose. But if the customers have the
automation power to lower the ROI from creepy growth hacking, the legit
project has a chance. And that pushes up the long-term value of the
entire company. An individual locked-in customer is more
valuable to the brand than an individual independent customer, but a
brand with independent customers is more valuable than a brand with an
equal number of locked-in customers.
Tired of Big Tech companies giving consumer protections, fraud
protections, and privacy protections to their users in other countries
but not to people at home in the USA? Here’s another state law we could
use, and I bet it could be a two-page PDF.
If a company has more than 10% of our state’s residents as customers
or users, and also does business in 50 or more countries, then if they
offer a privacy or consumer protection feature in a non-US location they
must also offer it in our state within 90 days.
Have it enforced Texas SB 8 style, by individuals, so it’s harder
for Big Tech sockpuppet orgs to challenge.
there ought
to be a law (Big Tech lobbyists are expensive—instead of grinding
out the PDFs they expect, make them fight an unpredictable distributed
campaign of random-ish ideas, coded into bills that take the side of
local small businesses?)
Bonus links
How
the long-gone Habsburg Empire is still visible in Eastern European
bureaucracies today: The formal institutions of the empire ceased
to exist with the collapse of the Habsburg Empire after World War I,
breaking up into separate nation states that have seen several waves of
drastic institutional changes since. We might therefore wonder whether
differences in trust and corruption across areas that belonged to
different empires in the past really still survive to this day.
TikTok
knows its app is harming kids, new internal documents show : NPR
(this kind of stuff is why I’ll never love your brand—if a brand
is fine with advertising on surveillance apps with all we know about how
they work, then I’m enough opposed to them on fundamental issues that
all transactions will be based on lack of trust.)
Conceptual
models of space colonization (One that’s missing: Kurt Vonnegut’s
concept involving large-scale outward transfer of genetic material.
Probably most likely to happen if you add in Von Neumann machines and
the systems required to grow live colonists from genetic data—which
don’t exist but are not physically or economically impossible…)
The
cheap chocolate system: The giant chocolate companies want cocoa
beans to be a commodity. They don’t want to worry about origin or
yield–they simply want to buy indistinguishable cheap cacao. In fact,
the buyers at these companies feel like they have no choice but to push
for mediocre beans at cut rate prices, regardless of the human cost.
(so it’s like adtech you eat?)
How web bloat impacts users
with slow devices: CPU performance for web apps hasn’t scaled
nearly as quickly as bandwidth so, while more of the web is becoming
accessible to people with low-end connections, more of the web is
becoming inaccessible to people with low-end devices even if they have
high-end connections.
If you have a font in TTF (TrueType) format and need WOFF2 for web
use, there is a woff2_compress utility packaged for Fedora
(but still missing a man page and --help feature.) The
package is woff2-tools.
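Usage is about as simple as it gets (MyFont.ttf is a stand-in for your
own file):

sudo dnf install woff2-tools
woff2_compress MyFont.ttf   # writes MyFont.woff2 next to the input file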
Antitrust
challenge to Facebook’s ‘superprofiling’ finally wraps in Germany — with
Meta agreeing to data limits: Meta has to offer a cookie setting
that allows Facebook and Instagram users to decide whether they
want to allow it to combine their data with other information Meta
collects about them — via third-party websites where its tracking
technologies are embedded or from apps using its business tools —
or kept separate. But some of the required privacy+competition fixes
must be Germany-only. (imho some US state needs
a law that any privacy or consumer protection feature that a large
company offers to users outside the US must also be available in that
state.)
California
Privacy Act Sparks Website Tracking Technology Suits (This is a
complicated one. Lawsuit accuses a company of breaking not one, not two,
but three California privacy laws. And the California Constitution, too.
Motion to dismiss mostly
denied (PDF). Including a CCPA claim. Yes, there is a CCPA
private right of action. CCPA claims survive a motion to
dismiss where a plaintiff alleges that defendants disclosed plaintiff’s
personal information without his consent due to the business’s failure
to maintain reasonable security practices. In this case, Google
Analytics tracking on a therapy site. I have some advice on how to get
out in front of this kind of case, will share later.)
Digital
Scams More Likely to Hurt Black and Latino Consumers - Consumer
Reports: Compounding the problem, experts believe, is that Black
and Latino consumers are disproportionately targeted by a wide variety
of digital scams. (This is a big reason why the I have nothing to
hide argument about privacy doesn’t work. When a user who is less
likely to be discriminated against chooses to participate in a system
with personalization
risks, that user’s information helps make user-hostile
personalization against others work better. Privacy is a collective
problem.)
Google
never cared about privacy: It was a bit of a tell how the DV360
product team demonstrated zero sense of urgency around making it easier
for some buyers to test Privacy Sandbox, let alone releasing test
results to prove it worked. The Chrome cookie deprecation delays, the
inability of any ad tech expert or observer to convincingly explain how
Google could possibly regulate itself — all of these deserve renewed
scrutiny, given what we now know. (Google Privacy Sandbox was
never offered as an option for YouTube, either. The point of janky
in-browser ads is to make the slick YouTube ads, which have better
reporting, look better to advertisers who have to allocate budget
between open web and YouTube.)
Taylor
Swift: Singer, Songwriter, Copyright Innovator: [R]ecord companies
are now trying to prohibit re-recordings for 20 or 30 years, not just
two or three. And this has become a key part of contract negotiations.
Will they get 30 years? Probably not, if the lawyer is competent. But
they want to make sure that the artist’s vocal cords are not in good
shape by the time they get around to re-recording.
Should I get into a drinking game with the Devil?
No, for three important reasons unrelated to your skill at the game.
The Devil can out-drink you.
The Devil can drink substances that are toxic to you even in
small quantities.
The Devil can cheat in ways that you will not be able to detect,
and take advantage of rules loopholes that you might not
understand.
What if I am really good at the skills required for the
game? Still no. Even if you have an accurate idea of your own
skill level, it is hard to estimate the Devil’s skill level. And even if
you have roughly equally matched skills, the Devil still has the three
advantages above.
What if I’m already in a drinking game with the
Devil? I can’t offer a lot of help here, but I have read a fair
number of comic books. As far as I can tell, your best hope is to delay
playing and to delay taking a drink when required to. It is possible
that some more powerful entity could distract the Devil in a way that
results in the end of the game.
Risks
vs. Harms: Youth & Social Media: Of course, there are harms
that I do think are product liability issues vis-a-vis social media. For
example, I think that many privacy harms can be mitigated with a design
approach that is privacy-by-default. I also think that regulations that
mandate universal privacy protections would go a long way in helping
people out. But the funny thing is that I don’t think that these harms
are unique to children. These are harms that are experienced broadly.
And I would argue that older folks tend to experience harms associated
with privacy much more acutely.
Google
Search user interface: A/B testing shows security concerns remain: For the past few days, Google has been A/B testing some subtle visual
changes to its user interface for the search results page….Despite a
more simplified look and feel, threat actors are still able to use the
official logo and website of the brand they are abusing. From a user’s
point of view, such ads continue to be as misleading.
I can’t quite get Google Search back to pre-enshittification, but
this is pretty close.
(Screenshot: simplified Google Search results after doing the tips in
this blog post.)
Remove AI crap
This will probably make the biggest change in the layout. Makes the
AI material and various other growth hacking stuff disappear from the
top of search results pages so it’s easier to get to the normal
links.
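If you would rather not install anything, one widely used alternative
(not necessarily what the extension this step refers to does under the
hood) is to add the udm=14 parameter to the search URL, which switches
Google to the plain “Web” results view. For example, as a custom search
engine template in the browser:

https://www.google.com/search?udm=14&q=%s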
Right now the extension that is best at blocking search ads is uBlock Origin. Yes, this is
a pain if you are fine with seeing ads on legit sites—you have to turn
the ads back on for any site that can run them responsibly. I hope to be
able to move this tip back down on the list if Google can fix their
malvertising problem, but for now it’s a must do.
Turn off ad tracking
If you use Google Search with a Google Account, go to https://myadcenter.google.com/home
and set Personalized Ads to Off. This is important to do
even if you have the ads on Google Search blocked. This probably won’t
reduce the raw number of ads, but will make it harder for Google to
match you with a deceptive ad targeted at you. (The scam ads are even
impersonating Google now.)
Start a blocklist
Some sites are better at SEO than at content and keep showing up in
search results. This step doesn’t help the first time that a crap site
comes up, but future searches on related topics tend to get better
results as I block the over-SEOed sites to let the legit sites rise to
the top.
Google Chrome: (There is supposed to be an
extension like this for Google Chrome too, but I don’t have the
link.)
This one gets better as my blocklist grows. If you try this one, be
patient.
Fix click tracking
Use ClearURLs to
remove tracking redirects. (Original Google results were links to
sites—now they’re links back to Google which redirects to the sites,
collecting extra data from you and slowing down browsing by one step.
ClearURLs restores the original behavior. To me it feels faster, but I
haven’t done a benchmark.)
Turn off browser advertising features
These are not used much today, but turning these off will probably
help you get cleaner (less personalized) search results in
future, so might as well check them.
And we’re done. (Yes, 1998 called, they
want their ten blue links page back.)
Related
(update 20 Apr 2025) Google
is WRONG: structured data does impact your site’s ranking. (Elie
Berreby covers the situation from the web site side. One of the reasons
that Google search results have gotten worse is that click-through
rate (CTR) affects page ranking. Pages now get ranked higher based
on the ability to get a click, not just on the harder task of getting an
inbound link. If everyone ran an extension to stop sharing click data,
we’d all get better results.)
Commission sends preliminary findings to Alphabet under the Digital Markets Act
Alphabet gives its own services more prominent treatment compared to
others by displaying them at the top of Google Search results or on
dedicated spaces, with enhanced visual formats and filtering
mechanisms. (Not if I can help it; I don’t have to worry about this now.)
Do we really need another CCPA-like state privacy law, or can states
mix it up a little in 2025? (The easy version: privacy
laws for slacker states)
What if, instead of big boring laws intended to cover everything,
legislators did more of a do
the simplest thing that could possibly work approach? Big Tech
lobbyists are expensive—maybe a better way to beat them is, instead of
grinding out the long-ass PDFs they expect, make them fight an
unpredictable distributed campaign of random-ish short bills that take
the side of local small businesses?
Require generative AI companies to offer an opt out that is not
tied to any other services such as search. AI legal links
Require Global Privacy Control for smart TVs and appliances, and
for smart home platforms that support ordering or subscriptions. GPC all the things!
We also need an opt-out preference signal for NFC tap to pay devices.
(AB 3048 in California was a good idea, but it got changed to cover
browsers and phones only, so would have tended to drive surveillance to
devices where it’s harder to avoid, which would be a terrible experience
for users. Thank you for browsing our catalog site, use your
compatible smart appliance to actually order anything.) Update
31 Oct 2024: possibly combine the GPC mandate with a reform to
wiretapping laws to address the CIPA
Uncertainty that a lot of companies have been on about recently.
Amend CIPA and similar state wiretapping laws to state that data collection from a device or client software that supports GPC is definitely not wiretapping. That way the companies get the legal ambiguity resolved and the users get their opt-outs; sounds like a solution we can all live with. (A sketch of what honoring GPC looks like in practice follows this list.)
Require clear explanations of consumer categories and inferences
in RtKs. OTHER
ATTRIBUTES (Colorado already has this.)
Postal RtK/RtD/opt outs. If a postal backup is available, that
sets the floor for how annoying a company can make the online process.
The problem with CCPA RtK
workflows
Euroclone law: if a company operates in 50 or more countries, and
offers a consumer or privacy protection feature to the residents of some
jurisdiction outside the USA, then that feature must also be offered to
residents of our state. More: privacy
laws for slacker states
Federal: Keep Section
230 immunity for platforms, but pass liability through to the
advertisers. Big Tech would have to clean up their act to keep
brands.
Update existing wiretapping laws to cover modern surveillance in
media where no GPC or analogous opt-out is available. In the Kathleen
Vita v. New England Baptist Hospital decision,
the court wrote, If the Legislature intends for the wiretap act’s
criminal and civil penalties to prohibit the tracking of a person’s
browsing of, and interaction with, published information on websites, it
must say so expressly.
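As promised above, honoring the GPC signal is cheap to implement on the server side. A minimal Python sketch, assuming the Sec-GPC: 1 request header from the GPC proposal; the handler below is illustrative, not any particular framework’s API.

    # Minimal sketch of honoring Global Privacy Control server-side.
    # Assumes the Sec-GPC request header from the GPC proposal; the
    # handler is illustrative, not a real framework API.
    def request_has_gpc(headers: dict) -> bool:
        """True if the client sent the GPC opt-out signal."""
        return headers.get("Sec-GPC", "").strip() == "1"

    def handle_request(headers: dict) -> str:
        if request_has_gpc(headers):
            # Treat as an opt-out of sale/sharing of personal info.
            return "skip third-party data collection"
        return "default data practices"

    print(handle_request({"Sec-GPC": "1"}))  # skip third-party data collection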
Yes, the Big Tech companies will try to get small businesses to come
out and advocate for surveillance, but there are a bunch of other small
business issues that limitations on surveillance could help address, by
shifting the balance of power away from surveillance companies.
Are small business owners contending for search rankings and map
listings with fake businesses pretending to be competitors in their
neighborhood?
Is Big Tech placing bogus charges on their advertiser account–or,
if they run ads on their own site, are ad companies docking their pay
for unexplained “invalid traffic”?
Are companies taking their content for “AI” that directly
competes with their sites—without letting them opt out, or offering an
opt-out that would make their business unable to use other
services?
Can a small business even get someone from Big Tech on the phone,
or are companies putting their dogmatic programs of union-busting and
layoffs ahead of service even to advertisers and good business
customers?
What happens when an account gets compromised or hacked? Do small
businesses have any way to get help (without knowing someone who happens
to know someone at the big company?)
(Update 9 Nov 2024) Each legal victory for groups like NetChoice reveals to state lawmakers how to craft more resilient laws. – Jess Miers, on Techdirt
New Map Shows Community Broadband Networks Are Exploding In U.S.
Community-owned broadband networks provide faster, cheaper, better service than their larger private-sector counterparts. Staffed by locals, they’re also more directly accountable and responsive to the needs of locals
So It Goes GHQ is
a board game invented by Kurt Vonnegut in 1956. GHQ is to WWII what
chess is to the Medieval battlefield.
The Other Bubble
While SaaS is generally a good deal for small-to-mid-sized
companies, the inevitable sprawl of letting SaaS into your organization
means that you’re stuck with them.
Posse: Reclaiming social media in a fragmented world
Rather than publishing a post
onto someone else’s servers on Twitter or Mastodon or Bluesky or Threads
or whichever microblogging service will inevitably come along next, the
posts are published locally to a service you control.
Best practices in practice: Black, the Python code formatter
I don’t have to
explain what they got wrong and why it matters — they don’t even need to
understand what happens when the auto-formatter runs. It just cleans
things up and we move on with life.
A layered approach to content blocking
Chromium’s Manifest v3
includes the declarativeNetRequest API, which delegates these functions
to the browser rather than the extension. Doing so avoids the timing
issues visible in privileged extensions and does not require giving the
extension access to the page. While these filters are more reliable and
improve privilege separation, they are also substantially weaker. You
can say goodbye to more advanced anti-adblock circumvention
techniques. (Good info on the tradeoffs in Manifest v3, and a
possible way forward, with simpler/more secure and complex/more
featureful blocking both available to the user)
(If you’re still bored after reading all these, how about trying some
effective privacy
tips?)
I am turning off Privacy-preserving attribution (PPA), the controversial
Mozilla/Meta
advertising feature in Firefox. Yes, according to the documentation,
there are some good things about PPA compared to cookies:
You theoretically can’t be identified individually as the same
person who saw an ad and then bought something
A site can’t tell if you have PPA on or off
Those are both interesting and desirable properties, and the PPA
system, if implemented correctly and run honestly, does not look like a
problem on its own. So why are people creeped out by it?
Screenshot of the Firefox PPA preference,
found in Settings→Privacy & Security
That creeped-out feeling is not coming from privacy math ignorance,
it’s people’s inner behavioral economists warning about an information
imbalance. Just like people who grow up playing ball can catch a ball
without consciously doing calculus, people who grow up in market
economies get a pretty good sense of markets and information, which
manifests as a sense of being creeped out when something about a market
design doesn’t seem right.
The problem is not the design of PPA on its own, it’s that PPA is
being proposed as something to run on the real Web, a place where you
can find both the best legit ad-supported content and the most
complicated scams. And that creates a PPA paradox: this
privacy-preserving attribution feature, if it catches on, will tend to
increase the amount of surveillance. PPA doesn’t have all of
the problems
of privacy-enhancing technologies in web browsers, but this is a big
one.
Briefly, the way that PPA is designed to work is that sites that run
ads will run JavaScript to request that the browser store
impression events to keep a record of the ad you saw, and then a
site where you buy stuff can record a conversion and then get a
report to find out which sites the people who bought stuff had seen ads
on. The browser doesn’t directly share the impression events with the
site where you buy stuff. It generates an encrypted message that might
or might not include impressions, then the site passes those encrypted
messages to secure services to do math on them and create an aggregated
report. The report doesn’t make it possible to match any individual ad
impression to any individual sale.
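Here is a toy Python model of that flow, just to show who holds what. All the names are made up for illustration, and the real system’s encryption, noise, and split aggregation services are waved away.

    # Toy model of the PPA data flow described above. Names are made up;
    # the real system encrypts impression records and aggregates them on
    # separate services so no one sees individual-level matches.
    from collections import Counter

    class Browser:
        def __init__(self):
            self.impressions = []  # stays on the device

        def save_impression(self, ad_id, publisher_site):
            # Called by JavaScript on the page showing the ad.
            self.impressions.append((ad_id, publisher_site))

        def conversion_report(self, ad_id):
            # In real PPA this is an encrypted share, readable only in
            # aggregate; here it is just the matching publisher sites.
            return [site for a, site in self.impressions if a == ad_id]

    def aggregate(reports):
        # Aggregation service: totals across many browsers, with noise
        # added in the real system so individuals can't be picked out.
        totals = Counter()
        for report in reports:
            totals.update(report)
        return totals

    b = Browser()
    b.save_impression("shoe-ad-1", "news.example")
    print(aggregate([b.conversion_report("shoe-ad-1")]))
    # Counter({'news.example': 1})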
So, as a web entrepreneur willing to bend the rules, how would you
win PPA? You could make a site where people pay attention to the
ads, and hope that gets them to buy stuff, so you get more ad money that
way. The problem with that is that legit ad-supported content and legit,
effective advertising are both hard. Not only do you need to make a good
site, the advertisers who run their ads on it need to make effective
ads in order for you to win this way. An easier way to win the PPA game
is to run a crappy site and then (1) figure out who’s about to buy, (2)
trick those people into visiting your crappy site, and (3) tell the
browser to store an impression before the sale you predicted, so that
your crappy site gets credit for making the sale. And steps 1 and 2 work
better and better the more surveillance you can do, including tracking
people between web and non-web activity, smart
TV mics, native mobile SDKs, server-to-server CAPIs, malware, use
your imagination.
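In terms of the toy model above, the attribution-stealing play needs no real ad viewing at all. A sketch with placeholder names:

    # Attribution stealing in terms of the toy model above: no real ad
    # viewing needed, just a predicted buyer lured to the page briefly.
    def lure_to_site(user, site):
        pass  # clickbait, spam, notification abuse... (stub)

    def steal_attribution(browser, ad_id, crappy_site, likely_buyers, user):
        if user in likely_buyers:            # step 1: surveillance-based prediction
            lure_to_site(user, crappy_site)  # step 2: get them onto the page
            browser.save_impression(ad_id, crappy_site)  # step 3: claim credit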
(Update 14 Oct 2024) PPA has an antitrust problem, too. In a
market where the
average user has their activity passed to Meta by thousands of
companies, Meta has a large advantage when training a machine
learning system to steal conversions by placing an ad in front of
someone who would be likely to buy anyway. With PPA, a large
surveillance company would not have to deliberately tell anyone to do
fraud, or write code to do fraud. Instead, ML systems designed to
win PPA would learn to do fraud, since if you have the
surveillance data anyway, fraud is the quickest, easiest way to get
money. (Like I said, legit conversions are hard.) And unlike what
happened in legacy fraud cases like Uber
v. Fetch, with PPA enough data is deliberately obfuscated to make
the fraud impossible to track down. Only a few large companies have the
combination of ML and large inflows of user data to make this kind of
invisible, deniable fraud possible, so PPA looks like a tool for
problematic concentration in the Internet and advertising
businesses.
Of course, attribution stealing schemes are a thing with conventional
cookie and mobile app tracking, too. And they have
been for quite a while. But conventional tracking generally produces
enough extra info to make it possible to do more interesting attribution
systems that enable marketers to figure out when legit and not-so-legit
conversions are happening. If you read Mobile Dev Memo by Eric Seufert
and other high-end marketing sites, there is a lot of material about
more sophisticated attribution models than what’s possible with PPA.
Marketers have a constant set of stats problems to solve to figure out
which of the ads are going to influence people in the direction of
buying stuff, and which ad money is being wasted because it gets spent
on claiming
credit for selling a thing that customers were going to buy anyway.
PPA doesn’t provide the info needed to get good answers for those stats
problems—so what works like a privacy feature on its own would drive the
development and deployment of more privacy risks. I’m turning
it off, and I hope that enough people will join me to keep PPA from
catching on.
the colored
pencil test for web features A web browser is the agent of the user,
and should act in the user’s interest, which means doing what the user
would do for themselves if they had time.
Tantek
Çelik suggests that Creative Commons should add a CC-NT
license, like the existing Creative Commons licenses, but written to
make it clear that the content is not licensed for generative AI
training. Manton Reece likes
the idea, and would allow training—but understands why publishers
would choose not to. AI training permissions are becoming a huge deal,
and there is a need for more licensing options. disclaimer: we’re taking steps in this area at
work now. This is a personal blog post though, not speaking for
employer or anyone else. In the 2024
AI Training Survey Results from Draft2Digital, only 5% of the
authors surveyed said that scraping and training without a license is
fair use.
Arguably, copyright is not the right framework for defining the rules
of this newly formed ecosystem.
That might be a good point from the legal scholarship point of view,
but the frequently expressed point of view of web people is more like,
creepy bots are scraping my stuff, I’ll throw anything at them I can
to get them to stop.Cloudflare’s
one-click AI scraper blocker is catching on. For a lot of the web,
the AI problem feels more like an emergency looting situation than an
academic debate. AI training permissions will be a point where people
just end up disagreeing, and where the Creative Commons approach to
copyright, where the license deliberately limits the rights that a
content creator can try to assert, is a bad fit for what many web people
really want. People disagree on what is and isn’t fair use, and how far
the power of copyright law should extend. And some free culture people
who would prefer less powerful copyright laws in principle are not
inclined to unilaterally refuse to use a tool that others are already
using.
The techbro definition of fair use (what’s yours is open, what’s mine
is proprietary) is clearly bogus, so we can safely ignore that—but it
seems like Internet freedom people can be found along both axes of the
fair use alignment chart. (Yes, there are four factors, but generative AI typically uses the entire work, so we can ignore the amount factor, and we’re generally talking about human-created personal cultural works, so the nature of the copyrighted works we’re arguing about is generally similar. So we’re down to two, which is good because I don’t know how to make 3- and 4-dimensional tables in HTML.)
Transformative purist: work must be significantly transformed
Transformative neutral: work must be somehow transformed
Transformative chaotic: work may be transformed
Market purist: work must not have a negative effect on the market for the original
Market neutral: work may have some effect on the market
(Example cells from the chart: memes are fair use; AI business presentation assistants are fair use; a verbatim quotation from a book in a book review is fair use.)
We’re probably going to end up with alternate free culture licenses,
which is a good thing. But it’s probably not realistic to get
organizations to change their alignment too much. Free culture licensing
is too good of an idea to keep with one licensing organization, just
like free software foundations (lower case) are useful
enough that it’s a good idea to have a redundant array of them.
Do we need a toothier, more practical license?
This site is not licensed under a Creative Commons license, because I
have some practical requirements that aren’t in one of the standard CC
licenses. These probably apply to more sites than just this one.
Personally, I would be happier with a toothier license that covers some
of the reasons I don’t use CC now.
No permission for generative AI training (already covered
this)
Licensee must preserve links when using my work in a medium where links work. I’m especially interested in preserving link rel=author and link rel=canonical (a compliance-check sketch follows this list). I
would not mind giving general permission for copying and mirroring
material from this site, except that SEO is a thing. Without some search
engine signal, it would be too easy for a copy of my stuff on a
higher-ranked site to make this site un-findable. I’m prepared to give
up some search engine juice for giving out some material, just don’t
want to get clobbered wholesale.
Patent license: similar to open-source software license terms.
You can read my site but not use it for patent trolling. If you use my
content, I get a license to any of your patents that would be infringed
by making the content and operating the site.
Privacy flags: this site is licensed for human use, not for sale
or sharing of personal info for behavioral targeting. I object to
processing of any personal information that may be collected or
inferred from this site.
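The link-preservation term, at least, is mechanically checkable. A sketch using only the Python standard library; the function names are mine:

    # Sketch of a check for the link-preservation term: does a mirrored
    # copy keep the original's rel=author and rel=canonical links?
    from html.parser import HTMLParser

    class RelLinkFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.rels = {}

        def handle_starttag(self, tag, attrs):
            if tag == "link":
                a = dict(attrs)
                if a.get("rel") in ("author", "canonical"):
                    self.rels[a["rel"]] = a.get("href")

    def preserves_rel_links(original_html: str, mirror_html: str) -> bool:
        orig, mirror = RelLinkFinder(), RelLinkFinder()
        orig.feed(original_html)
        mirror.feed(mirror_html)
        return all(mirror.rels.get(rel) == href
                   for rel, href in orig.rels.items())

    page = '<link rel="canonical" href="https://blog.example/post">'
    print(preserves_rel_links(page, page))                       # True
    print(preserves_rel_links(page, "<p>copied text only</p>"))  # False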
In general, if I can’t pick a license that lets me make content
available to people doing normal people stuff, but not to non-human
entities with non-human goals, I will have to make the people ask me
in person. Putting a page on the web can have interesting consequences,
and a web-aware license that works for me will probably need to color
outside the lines of the ideal copyright law that would make sense if we
were coming up with copyright laws from scratch.
Bonus links
Knowledge workers
Taylor’s model of workplace productivity depended
entirely on deskilling, on the invention of unskilled labor—which,
heretofore, had not existed.
Grid-scale batteries: They’re not just lithium
Alternatives to lithium-ion
technology may provide environmental, labor, and safety benefits. And
these new chemistries can work in markets like the electric grid and
industrial applications that lithium doesn’t address well.
Zen and the art of Writer Decks (using the Pomera DM250)
Probably as
a direct result of the increasing spamminess of the internet in general
and Windows 11 in its own right, over the past few years a market has
emerged for WriterDecks—single purpose writing machines that include a
keyboard (or your choice of keyboard), a screen, and some minimal
distraction-free writing software.
Gamergate at 10
10 years later, the events of Gamergate remain a cipher
through which it’s possible to understand a lot about our current
sociocultural situation.
A Rose Diary
Thanks to Mr. Austin these roses are now widely
available and beautiful gardens around the world can be filled with
roses that look like real roses and the smell of roses can be inhaled
all over the world including on my own property.
Just looking at recent news and how much of it is about surprisingly
low-reputation decisions by surprisingly high-status business
decision-makers. The big-picture trend that helps explain a lot of
technology trends news is the ongoing collapse of business norms. Scam
culture is getting mainstreamed faster than ever. Lots of related
stories…
The hate speech landscape on Facebook is worse than you thought. Here’s why
In recent years, a growing number of politicians, human
rights groups, and watchdogs have claimed that not only is Meta doing a
poor job of removing harmful content, but its process for making
enforcement decisions is happening in what they see as a black box.
(There has always been some overlap between direct/database/online
marketing, fraud, and right-wing politics in the USA. Goes back at least
to the 1920s KKK boom. But today the connection is particularly strong.
Maybe the national security Republicans were helping to keep that party
from going into full growth hacker mode?) The return of Jacob Wohl! Yeah, he’s into AI now
Trump’s $100,000 Watch Likely Made in China, Vastly Overpriced
Anyway, it’s time to revise a lot of assumptions that were originally
made in the higher-trust business environment of the early, legit Web in
its create more value than you capture days. Now that more
devices, products, and services reflect scam culture settings by
default, the rewards to tweaking, blocking, and other growth hacking
avoidance are similar to the rewards for PC power user skills back when those were a thing. More: Return of the
power user
(Previously: PET projects or
real privacy?) The current trend toward privacy-enhancing technologies for surveillance in web browsers is going to be remembered as a technical dead end, an artifact of an unsustainable advertising oligopoly. Here’s a top ten list of reasons; will update and add links.
10. PETs don’t fix revenue issues for ad-supported sites. The
fundamental good ad/bad site problems and bad ad/good site problems are
still there. PETs make it safer and easier for an advertiser to run ads
on sites they don’t trust, so they help crappy infringing or
AI-generated sites compete with legit
ones in the same ways that third-party cookies do.
9. PETs give up the high ground and make the web just another
incomprehensible, creepy surveillance medium. When people complain about
privacy issues on native social media apps, with PETs on the web the app
people can just say, your browser is creepy now too, we’re just better
at business than web sites are.
8. Appeasement doesn’t work. In all the time that PET proponents have
been saying that surveillance marketers will mend their ways if they
have PETs as a compromise, how many data points have the surveillance
marketers chosen not to collect because they have PETs instead? (The way
to deal with boundary-testing is not to appease it, it’s to communicate
the boundary, communicate the consequences for crossing it, and make the
consequences happen. I had a good source for this, need to find it
again.)
(update 4 Mar 2025) In 2019, Google announced a Privacy
Sandbox PET project, and wrote,
First, large scale blocking of cookies undermine people’s privacy by
encouraging opaque techniques such as fingerprinting. With
fingerprinting, developers have found ways to use tiny bits of
information that vary between users, such as what device they have or
what fonts they have installed to generate a unique identifier which can
then be used to match a user across websites. Unlike cookies, users
cannot clear their fingerprint, and therefore cannot control how their
information is collected. We think this subverts user choice and is
wrong.
Later, in 2025, Google
lifted its 2019 ban on fingerprinting for advertisers. It turns out
that Privacy Sandbox was not Google’s desired end state, but just
boundary-testing. Web users can learn from this story. Today, Meta and
Mozilla are doing a PET-based system for attribution tracking in
Firefox that exposes users to fraud and discrimination risks.
Proponents are saying that it’s not a big deal because it’s just
attribution tracking, not personalization.
And the people working on it might be just focused on the nifty
attribution math and not the big picture. But for users, the time to
react to boundary-testing is when you first see it happening. Waiting
for things to get unsustainably bad—in this case, for deeper Meta
integration in Firefox—would mean it’s harder to roll back.
7. Only a few platform oligopolies and monopolies benefit from PETs.
PETs introduce noise and obfuscation, to make data interpretation only
practical above a certain data set size—for a few large companies (or one?). On this point, they’re worse than third-party cookies.
5. If it’s a good idea for shoppers to share their info, obfuscated,
with advertisers, why not make the browser share the info from
corporate web apps with customers, with individual employee
identifying details removed? What? Companies wouldn’t turn that feature
on? Then why would users?
4. The code complexity and client-side resource usage—along with the
inevitable security risks that come with running more code—end up being
paid by users, while the benefits go to surveillance companies. And the
additional server-side processing required to do all that
privacy-enhancing math on all those zillions of cleverly
scrambled data points means that Big Tech companies will build even more
big data centers, consume more energy and fresh water, and delay those
carbon-neutral goals yet again.
3. With PETs, information becomes available equally to both trusted
and untrusted parties. In a sustainable advertising medium, a trusted
publisher or channel has more audience information than an untrustworthy
one. PETs commoditize ad inventory, create
more incentives for surveillance of users using non-PET methods, and
promote a race to the bottom the same way that cookies do.
2. For most people, individual tracking isn’t the problem. Users are
concerned about group-level discrimination risks like surveillance
pricing and algorithmic discrimination, and PETs would only
obfuscate the risks, not reduce them, and make discrimination harder for
regulators and NGOs to detect. In No
Cookies for You! Evaluating the Promises of Big Tech’s
‘Privacy-Enhancing’ Techniques, Martin et al. found, Respondents
did not judge the privacy solution proposed—to have a browser collect
data, keep the data on the local device, create inferences for use in
advertising—to be a privacy solution over third-party tracking and the
use of data brokers in advertising. Jereth et al., in Consumers’
Perceived Privacy Violations in Online Advertising, found that
“perceived privacy violations” for a browser-based system that does
not target people individually are similar to the perceived violations
for conventional third-party cookies. The mathematical
privacy that PETs offer is not the same as the decision-making
privacy that people want or need. See browser topics tracking
and the prejudiced landlord problem.
1. Never mind, you didn’t have to read this list. Browser companies
already know that PETs are creepy and bad, and you can tell they know
because they hide PETs from users, either with a bullshit Got it
dialog, or buried under Advanced or something. If PETs were good
for users, the browsers would brag on them like they do other
features.
From Chance to Control - by Eve Maler
Privacy isn’t encryption. Not
only can encryption be broken or bypassed; it’s also simply a technique
that needs a solution environment. Beware of just doing crypto
and thinking it solves human challenges.
Protecting
Your Privacy While Eroding Your Democracy: Apple’s and Mozilla’s PPAs
(Privacy Preserving Ad Attribution) Considered Harmful
An
assumption built into the browser-based PPA proposals is that
advertising is a key commercial engine of the internet, and the most
efficient way to ensure the majority of content remains free and
accessible to as many people as possible. With the advent of AI, it
is not clear to me that the majority of content remaining
accessible is an aspiration worth holding onto—the majority of content
will be bad.
Generative
AI’s Illusory Case for Fair Use by Jacqueline Charlesworth :: SSRN
The exploitation of copied works for their intrinsic expressive
value sharply distinguishes AI copying from that at issue in the
technological fair use cases relied upon by AI’s fair use advocates. In
these earlier cases, the determination of fair use turned on the fact
that the alleged infringer was not seeking to capitalize on expressive
content-exactly the opposite of generative AI.
Copyright lawsuits pose a serious threat to generative AI
The core question
in fair use analysis is whether a new product acts as a substitute for
the product being copied, or whether it transforms the old
product into something new and distinctive. In the Google Books case,
for example, the courts had no trouble finding that a book search engine
was a new, transformative product that didn’t in any way compete with
the books it was indexing. Google wasn’t making new books. Stable
Diffusion is creating new images. And while Google could guarantee that
its search engine would never display more than three lines of text from
any page in a book, Stability AI can’t make a similar promise. To the
contrary, we know that Stable Diffusion occasionally generates
near-perfect copies of images from its training data.
When does generative AI
qualify for fair use?
Because fair use is determined on a
case-by-case basis, no broad statement can be made about when generative
AI qualifies for fair use. Instead, I’ll provide a specific analysis for
ChatGPT’s use of its training data, but the same basic template will
also apply for many other generative AI products.
Hausfeld
files globally significant antitrust class action against Google for
abusive use of digital media content
Publishers have no
economically viable or practical way to stop [Google Search Generative
Experience] SGE from plagiarizing their content and siphoning away
referral traffic and ad revenue. SGE uses the same web crawler as
Google’s general search service: GoogleBot. This means the only way to
block SGE from plagiarizing content is to block GoogleBot completely—and
disappear from Google Search.
AB
2013: New California AI Law Mandates Disclosure of GenAI Training Data |
Perkins Coie
AB 2013 mandates that the developers of GenAI
systems or services post detailed documentation on their websites about
the data used to train these systems. Required disclosures include
Whether the datasets included any data protected by copyright,
trademark, or patent or are entirely in the public domain, and whether
the datasets were purchased or licensed by the developer.
NaNoWriMo
Shits The Bed On Artificial Intelligence (imho they’ll figure this
out before November, either the old org will reform or a new one will
launch. Recording artist POVs on Napster were varied, writer POVs on
generative AI, not so much.)
Is AI
a Silver Bullet? — Ian Cooper - Staccato Signals
TDD becomes a
powerful tool when you ask the AI to implement code for your tests
(TDD is already a powerful tool, and LLMs could be a good force
multiplier. Not just writing code that you can filter the bullshit out
of by adding tests, but also by suggesting tests that your code should
be able to pass. If the LLM outputs a test that obviously shouldn’t pass
but does, then you can fix your code sooner. If I had to guess I would
say that programming language advocacy scenes are going to figure out
the licensing for training sets first. If the coding assistant in the
IDE can train on zillions of lines of a certain language because of a
programmer co-op agreement, that’s an advantage for the language.)
Have
we stopped to think about what LLMs actually model?
Big
corporations like Meta and Google tend to exaggerate and make misleading
claims that do not stand up to scrutiny. Obviously, as a cognitive
scientist who has the expertise and understanding of human language,
it’s disheartening to see a lot of these claims made without proper
evidence to back them up. But they also have downstream impacts in
various domains. If you start treating these massive complex engineering
systems as language understanding machines, it has implications in how
policymakers and regulators think about them.
Slop is Good
Search engines you can’t trust because they are cesspools of slop is
hard to imagine. But that end feels inevitable at this point. We will
need a new web. (I tend to agree with this. Search engine company
management tends to be so ideologically committed to busting the search
quality raters union, and other labor organizing by indirect employees,
or TVCs, that they will destroy the value of the search engine to
do it.)
Congratulations to 404 Media, which celebrated
its successful first year on August 22. They link to other
next-generation news sites, owned by the people who write for them. I
checked for ads.txt files
and advertiser pages to see which are participating in the conventional
RTB ad system and which are doing something else. (404 Media does have
an ads.txt
file managed by BuySellAds.)
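Checking a site this way is easy to script. A rough sketch using only the Python standard library; the any-line-with-commas heuristic is a first pass, not the full ads.txt spec:

    # Rough sketch of the ads.txt check: fetch /ads.txt and see whether
    # the site lists seller records. Not a full ads.txt parser.
    import urllib.error
    import urllib.request

    def has_ads_txt(domain: str) -> bool:
        try:
            with urllib.request.urlopen(f"https://{domain}/ads.txt",
                                        timeout=10) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, TimeoutError, ValueError):
            return False
        # Real ads.txt records are comma-separated fields.
        return any("," in line and not line.lstrip().startswith("#")
                   for line in body.splitlines())

    print(has_ads_txt("example.com"))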
Hell Gate:
New York City news (not just for those who finally canceled their
subscriptions to that
other New York site) (ads.txt not found, advertise with Hell
Gate is just a page with a contact email address.)
Remap: Video
game site that also covers a variety of topics, including but not
limited to games, rooting for sports teams that break your heart,
inflatable hot tubs, hanging out on car auction websites, and more.
Old News from the Latest Disasters: [T]he fact that these studio
tell-all features have started to feel so same-y says less about the
journalist reporting them and more about how mundane this kind of
dysfunction is in AAA game development. (ads.txt not found, no ad
contact or page)
Another good example, not on 404 Media’s list, is The Kyiv Independent — News from
Ukraine, Eastern Europe. The Kyiv Independent was born
out of a fight for freedom of speech. It was co-founded by a group of
journalists who were fired from the Kyiv Post, then a prominent
newspaper, as the owner attempted to take the newsroom under control and
end its critical coverage of Ukrainian authorities. Instead of giving
up, the fired team founded a new media outlet to carry on the torch —
and be a truly independent voice of Ukraine.
Opinion:
AI complacency is compromising Western defense (ads.txt found, looks
like they use an ad management service.)
What all these sites have in common is a focus on subscriber/member
revenue and first-party data.
As a privacy nerd, I’m an optimist about this apparent mess. Yes,
part of success in running a modern news operation is figuring out how
to get by without legacy management layers and investors (404
Media Shows Online Journalism Can Be Profitable When You Remove
Overpaid, Fail-Upward Brunchlords From The Equation). But the other
big set of trends is technical and regulatory improvements that—if kept
up and not worked around—will lower the ROAS (return on ad spend, not rodents of average size) for
surveillance advertising. So the Internet optimist version of the story
is
Big Tech value extraction drives independent journalists to
business models other than surveillance advertising
With the money saved in step 3, people can afford more
subscriptions.
The big objection to that is: what about free riding problems? Won’t
people choose not to subscribe, or choose infringing or AI-exfiltrated
versions of content? But most people aren’t as likely to try to free
ride as tech
executives are. The rise of 404 Media and related sites is a good
sign. More: or we could just not
Satori
Threat Intelligence Alert: Camu cashes out ads on piracy content
(This is why you don’t build an inclusion list by looking at the ad
reports and adding what looks legit. Illegal sites can check Referer
headers and hide their real content from advertisers who cut and paste
the URL. Referer lists have to be built from known legit sources like
customer surveys, press lists, support tickets, and employee chat
logs.)
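To see why cut-and-paste checking fails, here is the cloaking logic in sketch form; the domain names are placeholders:

    # Why pasting a URL into your browser proves nothing: a sketch of
    # the Referer cloaking described above. Domains are placeholders.
    def serve_page(headers: dict) -> str:
        referer = headers.get("Referer", "")
        if "search.example" in referer:
            # Ordinary visitor arriving from search gets the real thing.
            return "pirated content plus ad slots"
        # An ad buyer pasting the URL sends no Referer header.
        return "bland, brand-safe decoy page"

    print(serve_page({}))                                      # what the ad buyer sees
    print(serve_page({"Referer": "https://search.example/"}))  # what users see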
Your use of AI is directly harming the environment I live in
Instead of putting limits to “AI” and cryptocoin mining, the official plan is
of putting limits to “AI” and cryptocoin mining, the official plan is
currently to destroy big parts of places like Þjórsárdalur valley, one
of the most green and vibrant ecosystems in Iceland. That’s why I take
it personally when people use “AI” models and cryptocoins. You are
complicit in creating the demand that is directly threatening to destroy
the environment I live in. None of this would be happening if there
wasn’t demand so I absolutely do think the people using these tools and
services are personally to blame, at least partially, for the harm done
in their name.
Thinking About an Old Copyright Case and Generative AI
The precedent in
Wheaton has often been highlighted by anti-copyright scholars because it
limits the notion that copyright rights are in any sense natural rights.
This, in turn, supports the skeptical (I would say cynical) view that
copyright is a devil’s bargain with authors, begrudgingly granting a
temporary “monopoly” in exchange for production and distribution of
their works. But aside from the fact that the Court of 1834 stated that
the longstanding question remained “by no means free from doubt,” its
textual interpretation of the word securing was simply unfounded.
(Some good points here. IMHO neither the copyright maximalists nor the
techbro my business model is always fair use crowd are right.
Authors and artists have both natural rights and property-like
commercial interests that are given to them by the government as a
subsidy.)
Plain Vanilla – a tutorial
website for vanilla web development
The plain vanilla style of
web development makes a different choice, trading off a few short term
comforts for long term benefits like simplicity and being effectively
zero-maintenance. This approach is made possible by today’s browser
landscape, which offers excellent web standards support.
AI Has Created a
Battle Over Web Crawling
The report, Consent in Crisis: The
Rapid Decline of the AI Data Commons, notes that a significant
number of organizations that feel threatened by generative AI are taking
measures to wall off their data. (IMHO this is not just a TOS or
copyright issue. In the medium term the main problem for AI scrapers is
going to be privacy and defamation law. Meta
AI Keeps Telling Strangers It Owns My Phone Number - Business
Insider)
Opt
Out of Clearview AI Giveaway
Class actions are terminally
disappointing, but this one is especially egregious and it is worthy of
special attention. We think you should opt out. Not just as a protest,
but to preserve your rights in the event of further litigation. Here is
how to do it. The deadline is September 20th.
Point of order: I decided not to put some otherwise
good links in here because the writers chose to stick a big obvious
AI-generated image on them. That’s like Rolling
Coal for the web. Unless your intent is to claim membership in the evil oligarch fan club or the artist hater club, cut it out. I can teach you to
find perfectly good Creative Commons images if you don’t have an
illustration budget.
Really good example of a market failure in software quality
incentivization: ansuz /
ऐरन: “there’s a wee story brewing in…” Read the whole thing. Good
counterexample for money talks. With the wrong market design,
money says little or nothing.
To summarize (you did read the whole thing, right?) in 2019, a
software algorithm called a Verifiable Delay Function (VDF) was the
subject of a $100,000 reward program. Daniel J.
Bernstein asked, in a talk recorded on video if the VDF was
vulnerable to a method that he had already published in a paper.
If Bernstein was right, then a developer who
read Bernstein’s paper on the subject
applied Bernstein’s work to attacking the VDF
and was first to claim the reward
could earn $100,000. But the money was left unclaimed—nobody got the
bounty, and the attack on VDFs didn’t come out until now.
It would take some time to read and understand the paper, and to
figure out if it really described a way to break the VDF—but that’s not
the main problem. The catch with the bounty scheme is that as a
contender for the bounty, you don’t know how many other contenders there
are and how fast they work. If 64 people (the number of viewers on the
video) are working on it, and Bernstein is 95% likely to be right about
the paper, then the expected payout is $100,000 × 0.95 × 1/64 =
$1,484.38.
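The arithmetic, spelled out (64 viewers and 95% are the rough estimates from the paragraph above):

    # Expected payout for one contender, using the rough numbers above.
    reward = 100_000
    p_attack_works = 0.95  # chance Bernstein's method applies
    contenders = 64        # proxy: viewers of the recorded talk
    print(f"${reward * p_attack_works / contenders:,.2f}")  # $1,484.38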
In this case, the main purpose of the bounty was to collect
information about the quality of the VDF algorithm, and it failed to
achieve this purpose. A better way to achieve this information-gathering
goal is to use a system that also incentivizes meta-work such as
evaluating whether a particular approach is relevant to a particular
problem. More: Some
ways that bug futures markets differ from open source bounties
Related
How
I Made $10k Predicting Which Studies Will Replicate
A prediction market trader made profitable trades predicting whether the results in scientific papers would be replicated, without detailed investigations into the subject of each paper.
The
sad compromise of “sponsored results”
Not only are the ads a
worse experience for the user, they are also creating a tax on all the
advertisers, and thus, on us.
The
AI Arms Race Isn’t Inevitable (But the bigger point for
international AI competition is that we’re not contending with the PRC
to better take money from content creators, or better union-bust the
TVCs.)
MIT
leaders describe the experience of not renewing Elsevier contract
Since the cancellation, MIT Libraries estimates annual savings at
more than 80% of its original spend. This move saves MIT approximately
$2 million each year, and the Libraries provide alternative means of
access that fulfills most article requests in minutes.
How I
Use “AI”
[T]hese examples are real ways I’ve used LLMs to help me. They’re not designed to showcase some impressive capability; they
come from my need to get actual work done. This means the examples
aren’t glamorous, but a large fraction of the work I do every day isn’t,
and the LLMs that are available to me today let me automate away almost
all of that work.
Everybody* is on about plans for how to break up Google, so here’s my
version. I’m trying to keep two awkward considerations in mind.
Any Google breakup plan has to fit in a tweet.
Google will have more total lawyer time over more years to find the gaps
in a complicated plan than could ever be invested in making the plan.
Keep it simple, or Google will re-consolidate the way that AT&T did.
(All right, maybe not fit in a tweet, but at least get it down to one
side of a piece of paper.)
Leave Google with the ability to preserve shareholder
value. Google is a big company that does a lot of things, so
don’t drag it down with pointless micromanagement. Make as few breakup
rules as possible but otherwise give them the ability to achieve the
important goals in their own way.
The common feature of all Google’s most problematic anticompetitive
schemes is control of both the client and the server. For example, the
reason that Google Chrome has such weird, clunky
in-browser ad features is that it’s made by the same company that
also owns YouTube. When the browser company owns a video sharing site
with its own ad system, and the company as a whole earns more from
YouTube than from open web ads, they have an incentive to develop
in-browser ads in a way that a company that didn’t own both YouTube and
Google Chrome would not.
So here’s the break-up plan. Should fit on one page. Google is split
into two companies, call them clientGoogle and serverGoogle for now.
serverGoogle can’t do clients. The first
company, call it serverGoogle, may not sell or rent any hardware, or
release any proprietary software that runs outside a serverGoogle data
center. Any code that serverGoogle makes available outside a data center
must be licensed without any limitations on reverse engineering, and
distributed in the preferred form for making modifications. No software
released by serverGoogle may be a technological
protection measure under section 1201 of Title 17 of the United
States Code (DMCA anticircumvention).
clientGoogle can’t do servers. The second
company, call it clientGoogle, cannot operate any Internet services,
except those necessary for the development and distribution of client
software.
clientGoogle and serverGoogle can’t communicate
confidentially with each other. The two companies can’t enter
into an NDA with each other or contract with the same third parties
(such as directors or consulting firms) in such a way as to create a
confidential communications channel between them. (Consultants will have
to pick one company to work for.)
The reason to do it this way is that most of Google’s anticompetitive
behavior is based on control of both the client and the server.
Splitting client and server would force a flip from an anticompetitive
collusion approach to an adversarial
interoperability situation. Separating the client and server would
address the problems with Google’s browser, now hard-coded to advantage
Google’s YouTube, and Google’s ad blocking support designed to bypass
Google’s ads. In those two examples, the ads and YouTube would be part
of serverGoogle, and the browser and mobile platform would be
clientGoogle.
(Update 10 Nov 2024) Anticompetitive tying is a big issue
for Google even though services such as Google Search are free of
charge. Users pay for monopoly services in the form of exposure to
fraudulent ads at a level higher than a competitive search engine would
be able to get away with. If the search market were competitive, users
would be able to choose a search engine based on their own preferences
regarding total ad load and fraud risk, and search engines could compete
based on anti-fraud effectiveness. Separating Google into client and
server would enable the client company to more easily help users
implement the FBI advice
to use an ad blocking extension when performing internet searches
as long as the fraud risk is high, and then to have the flexibility to
recommend turning search ad blocking back off when the ads get cleaned
up, as they were back when people
actually liked Google search results more when they had ads than when
they didn’t.
A client/server split would be easier to monitor than other breakup
plans. The main monitoring that would be needed is enforcement of rule
3: keep the two companies from colluding. How long does a director or
consultant have to sit out before going to work for the other company,
that kind of thing. A whistleblower program with rewards big enough to
retire on will help.
The two companies would need to coordinate, of course, but any
communication would have to happen in open source projects and in
organizations such as the Linux Foundation, W3C, IAB, and IETF. Opening
up what had been intra-Google conversations to outsiders would not just
be an antitrust win, it would also help avert some of the weird
groupthink rat holes that all big companies tend to go down.
What about JavaScript and WebAssembly? When
serverGoogle operates a web site with JavaScript or WebAssembly (which
run on the client), the license for that client-side code may not
prohibit reverse engineering, the terms of service for the site may not
prohibit the use of the site with alternate JavaScript or WebAssembly
code, and serverGoogle must make the corresponding source code available
in the preferred form for making modifications, which might have to be a
separate download from the compiled or minified form that users normally
run.
What about servers for version control, CI, bug tracker, and
downloads? The servers required to develop and release client
software are the one exception to the no servers rule for clientGoogle.
(That doesn’t mean clientGoogle gets to run any other servers. For
example, if clientGoogle supports a browser with the ability to sync
bookmarks, users must configure it to use their account with
serverGoogle or some other party, as part of an add account
process that users already go through to set up calendar or email
accounts today.)
Can clientGoogle run servers for telemetry and in-product
surveys? Yes, as long as they’re for the purpose of developing
and releasing clientGoogle’s software. (They can’t pass data derived
from their client software to any other party.)
What about Google Fiber? (and other businesses that
aren’t client software or Internet services?) Let Google management pick
based on what is good for them—we don’t want to micromanage business
unit by business unit, just make rules to prevent the known
problems.
What about AI? Considering that Google is all on
about Integration
and Android now? AI is a good example of a win from a client/server
split. Mobile devices won’t be stuck talking to a laggy AI server for
anticompetitive tying reasons, and Internet services won’t be held back
by underpowered on-device AI for anticompetitive tying reasons. Both
client and server will be able to make the best implementation
choices.
What about the Google Play Store? serverGoogle could
run the store but not release its own proprietary apps, which run on the
client. clientGoogle could release apps, as well as mobile devices or
platforms that enable users to connect to and use an app store.
Could serverGoogle spin off the YouTube service, clientGoogle
spin off the YouTube apps, then the service and app companies merge to
re-form a standalone YouTube? Yes, if it passes normal FTC
merger review. Some post-breakup splitting and trading is going to
happen, so the FTC still has to keep an eye on things.
What about my 401(k)? Google is a big part of the
stock market, and without anticompetitive collusion they’ll be making
less money. But relax. You’re probably invested in an index fund that
owns shares in both parasites and hosts—as the legit economy recovers
from all this negative-sum value extraction, your total portfolio will
do better.
Would this work for [other company] too? Probably
not. (Let’s do Google first, which will make the web a lot more fun,
then we’ll be on a roll and can move on to whatever other big company is
giving everybody grief.)
Don’t cut soup with a knife, people
Here’s how not to break up Google: Some people are
suggesting that the breakup plan should be a careful dividing of the big
bowl of adtech alphabet soup. (Where on Ari
Paparo’s simplified chart do you cut, exactly?) That would be a
waste of time—if that’s all you do, Google will just tweak their
clients, Chrome and Android, to move the profits out of whatever slice
of the soup they have to get rid of, and keep the money flowing into
whatever they get to keep.
Related
fix Google
Search: As good as I can get Google Search in the meantime. Remove a
lot of AI, surveillance, and growth hacking crap.
Episode
6: Don Marti on the politics of browsers (I was on the Monopoly
Report podcast, mostly to talk about how awesome Google Chrome would be
without the anticompetitive tying. It’s a good browser that would be
even better if they could focus on user-facing features and not just
driving revenue for Google
Search and YouTube.)
How
breaking up Google could lower your online shopping bill | Ars
Technica
By overcharging by as much as 5 or 10 percent for online
ads, Google allegedly placed a Google tax on the price of
everyday goods we buy, Tech Oversight’s Sacha Haworth
explained… (Also applies to Google taxes on legit
companies.)
This
one important fact about current AI explains almost everything
The simple fact is that current approaches to machine learning (which
underlies most of the AI people talk about today) are lousy at outliers,
which is to say that when they encounter unusual circumstances, like the
subtly altered word problems that I mentioned a few days ago, they often
say and do things that are absurd.
Here
is another reason why you should never click on ads to download
software
The link looks good even though it is listed as
sponsored. It shows Google’s official site as the URL. When you check
the advertiser, which you can on Google Search, you get confirmation
that Google has verified the advertiser’s identity. All good
then?
The Web’s hottest new feature is Privacy-Preserving Corporate
Information Sharing (PPCIS).
When a corporate employee uses a PPCIS browser to log in to any of
their employer’s web applications, such as
shared document editor
webmail
bug tracker
Slack
PPCIS automatically uses its built-in AI to make a totally
privacy-preserving summary of the employee’s work activity, then posts
the summary to a PPCIS server using really cool math that makes it
possible to identify the employer but not the individual.
The PPCIS server then aggregates all PPCIS summaries from all the
users at a company to make a report that is shared with any customer or
prospective customer who visits the company’s public web site.
PPCIS is not a real feature, but do you think that,
if it existed, corporate IT departments would leave it turned on? If the
answer is no, why would people want privacy-preserving tracking
of their personal web activity? More: PET projects or
real privacy?
Related
Return of the power user
Growth hacking and enshittification have gone far enough that the gain from computer dinking skills is now greater than it was in the days of X modelines and autoexec.bat files.
Using
the term ‘AI’ in product descriptions reduces purchase intentions
Researchers also discovered that negative response to AI disclosure
was even stronger for high-risk products and services, those
which people commonly feel more uncertain or anxious about buying, such
as expensive electronics, medical devices or financial services. Because
failure carries more potential risk, which may include monetary loss or
danger to physical safety, mentioning AI for these types of descriptions
may make consumers more wary and less likely to purchase…
I like static site generators, but the way this site works I don’t have to learn one; I just incrementally add on tools I already know as I need the site to do more.
Inconsolata
is a monospaced typeface I like. (I use one font loaded from this server
and system fonts for everything else because I like Charter for body
copy.)
The
CrowdStrike Outage and Market-Driven Brittleness Read the whole
thing. Today’s internet systems are too complex to hope that if we
are smart and build each piece correctly the sum total will work right.
We have to deliberately break things and keep breaking them. This
repeated process of breaking and fixing will make these systems
reliable.
(As a gatekeeper company, they’re not going to be able to get
away with a setting that turns off third-party cookies but not
tracking/personalization on Google Search or YouTube.)
These are places where I have accounts for certain purposes, but they aren’t a good way to reach me in general.
Slack and Discord (just because I have an account on a Slack or
Discord server doesn’t mean I regularly check messages there)
dmarti (Don Marti) |
Keybase is a way to check that it’s the same person behind all these
accounts, and the info on there should be right, but I don’t regularly
check messages on there.
[email protected]
should forward to my normal email. Use this address if you want to
invite me to read or comment on a file on Google
Docs/Drive/whatever.
Services I might have an abandoned account on
I don’t know about the status of my accounts on these. Might have
been taken over by spammers by now.
YouTube
Telegram
LiveJournal
I made accounts on Meta Instagram and Threads to try something on
Threads but don’t check them.
Services I know I don’t have an account on
WhatsApp: I know users say it isn’t enshittified
yet, but I don’t feel the need to be in suspense about when.
Apparently Google has some kind of new chat thing now again—I
don’t have time to learn these before they go away, so don’t expect an
answer (even though I do have a Google account).
The problem to watch out for is that state privacy regulators tend to
be diligent high achiever types who aren’t afraid of doing a bunch of
extra work. But what we want here is for most of the work of the
licensing system to be done on the surveillance company side. The people
who are getting paid by the taxpayers should spend as little time on it
as possible. So here’s a possible way to do it.
1. Pass a state law with a very general definition of surveillance,
and say that anybody who surveils more than 20% of the population (to
start with) needs to get a license. Appoint a surveillance licensing
board.
2. Design a surveillance licensing application, a one-page PDF. Name of company, contact person, and so on. The last form field is describe your surveillance practices in detail (attach additional pages if needed).
3. When a company applies, put their application including the
additional pages on the web (with employee name and contact info
redacted of course) and have a public meeting.
4. The meeting will be full of concerned citizens, NGOs, businesses
that use the surveillance in some way, and other random members of the
public. (Yes, people who got kicked off of Facebook because of getting
hacked will show up at the Facebook meeting to complain.) Ideally this
meeting would be organized in such a way that the Big Tech lawyers have
to wait in a speakers’ waiting room next to random users. Will need to
figure this out.
5. Realistically some speakers at the meeting will come up with
something that the surveillance company left out of their application,
and some will mention harmful effects of surveillance practices. The
board gives the company a temporary surveillance license and tells them
to re-submit. While on a temporary license they can’t sign up any new
users from this state.
6. Go to step 3. When the company cleans up their act, then the
board can give them a longer term license. If they persist the board
might deny them a license and that’s when a lawsuit could kick in. But
most of the steps of the process have already worked.
No speech mentioned, it’s all about non-speech conduct, so very
difficult for surveillance industry sockpuppet orgs to get a court to
block.
Update: pricing
So how much should a surveillance license cost? For a Big Tech
company with a double-digit percentage of a state’s residents in their
database, say $5-10 per person surveilled.
In general, the license should be priced by the count of records about
people, so a company would pay more per person surveilled as they
surveil more people. As surveillance licensing comes into effect for
smaller firms, they would pay less per record, and licensing would never
be required for databases below a certain size.
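Here’s a minimal sketch of that kind of sliding scale, in Python. Apart
from the 20% threshold and the $5-10 range above, every number here is
made up for illustration; this is not a proposed fee schedule.

# Sketch of a per-record license fee that rises with database size.
# Tier boundaries and rates are illustrative placeholders.

STATE_POPULATION = 39_000_000  # example: a roughly California-sized state

# (records threshold as a share of state population, price per record)
TIERS = [
    (0.20, 10.00),  # Big Tech scale: double-digit share of residents
    (0.05, 5.00),   # mid-size databases
    (0.01, 1.00),   # smaller firms, phased in later
]

MINIMUM_RECORDS = 100_000  # below this, no license required at all


def license_fee(records: int) -> float:
    """Annual fee for a database covering `records` people."""
    if records < MINIMUM_RECORDS:
        return 0.0
    share = records / STATE_POPULATION
    for threshold, per_record in TIERS:
        if share >= threshold:
            return records * per_record
    return records * 0.50  # licensed, but in the smallest tier


if __name__ == "__main__":
    for n in (50_000, 2_000_000, 30_000_000):
        print(f"{n:>12,} records -> ${license_fee(n):>16,.2f}")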
Over on the social media sites there have been a bunch of very
serious posts from very serious people explaining how surveillance
advertising is here to stay and the best we can do is put some
privacy-enhancing technologies on it. This sounds dismal and
awful—ads according to the faufreluches, so the big shots get ads for
sweet cars and good jobs, retirees get precious metals scams, those with
money get legit investments, those without get predatory finance, you
know, all the
same tricks and discrimination but with more math to make it harder to
understand. So instead I’m going to do some Internet optimism today.
What happens if instead of reimplementing surveillance advertising, we
just get rid of it?
Step one: people start buying better stuff. If you
figure out how to turn the surveillance advertising off, you start
buying goods and services that you are more satisfied with (Lin
et al.) and buying less overpriced crap (Mustri
et al.). The main reason I’m pretty confident about this effect is
some research that hasn’t been published. If people who use
ad blockers and privacy tools were making worse purchases, then someone
in the surveillance business would have published research saying
so.
Step two: marketers look for alternatives. If I can
somehow avoid being exposed to the surveillance ads, that doesn’t mean
that people aren’t still going to try to sell me stuff. But instead of
surveillance ads, which let them target
the most valuable possible audience for the lowest possible ad rates,
they have to fall back to the next most profitable options, which might
be
contextual ad placements, funding better ad-supported
content
reviewer and influencer programs
content marketing
increased product quality
lower prices
Those options probably have less attractive profits or predictability
for the company than the surveillance ads do, or the company would have
chosen them in the first place. But by removing the surveillance ad
option, as a shopper I can get more money to flow to more win-win
options.
Step three: what happened to the ad-supported
content? A lot of ad-supported content does get money from
surveillance ads. It could turn out that the legit ad-supported sites
end up better off, just by supply and demand. The number of available
crap ad spots—that are only saleable because of surveillance—will go
down. And after steps one and two, the customers will be sitting on more
money, and can put some of it into subscriptions and crowdfunding. And
subscription and crowdfunding models tend to send a higher percentage of
the money to the content creator than advertising models do.
Of course, the market isn’t going to change because one person is
harder to reach with surveillance ads. Ad reform is a collective
problem, and needs tool building, education, and lobbying work.
We might be able to get some good data about this soon, thanks to the EDPB
decision on Facebook ad tracking. It looks like some users are going
to be able to use the exact same social site, but with random ads
instead of personalized ones. When the users who picked Facebook’s
non-personalized option turn out to own better stuff that they’re more
satisfied with, that will help build toward a surveillance
advertising ban. It’s a lot easier to justify a ban when it’s not
about balancing harms and benefits, but more about stacking consumer
benefits on top of the existing privacy and national security
benefits.
Academic Publishing is a Lucrative Scam
I think the reason more academics
haven’t already migrated to Diamond Open Access journals is that there
are relatively few such journals. The reason for that is that although
there are lots of people talking about Diamond Open Access there are
many fewer actually taking steps to implement it. The initiative
mentioned in the Guardian article is therefore very welcome. Although I
think in the long run this transition is inevitable, it won’t happen by
itself. (Links to Academic
journals are a lucrative scam – and we’re determined to change
that)
Apple Safari includes an advertising measurement feature, but
fortunately you can turn it off. Update 20 Sep 2024: I
now have access to Apple Mac OS and an Apple iPhone and iPad to test on.
Will keep these instructions up to date with any changes. These
instructions cover Privacy Preserving Ad Measurement along with
some other related settings.
On Apple iOS
Open Settings, select Safari, then scroll down to Advanced. Tap to
open the Advanced settings.
Turn off Privacy Preserving Ad Measurement
While you have Settings open, you might as well check two other iOS
tracking features.
In Settings, go to Privacy & Security, then Tracking, and
make sure “Allow Apps to Request to Track” is turned off.
Also in Settings under Privacy & Security, find “Apple
Advertising” and make sure that “Personalized Ads” is turned off. (You
will probably have to scroll down—Apple makes this one a little trickier
to find.)
On Mac OS
From the Safari menu, choose Settings, then check the Privacy and
Advanced tabs. (On Safari Version 16.0 (17614.1.25.9.10, 17614) the
setting is under Privacy; on other versions you need to look under
Advanced.)
Uncheck Allow privacy-preserving measurement of ad
effectiveness. (You can also set advanced tracking and
fingerprinting protection to “in all browsing” if it doesn’t
break any sites you use. Some corporate web applications are not
compatible with Safari privacy features, so if you use Safari for work
you might need to leave this alone.)
Update: The Advanced button on the Privacy tab will
take you to the Advanced tab.
why turn this off?
The deeper they hide stuff like this, the more it shows they
understand that it’s not in your best interest to have it on. (This
feature is even turned
on by default in Private Browsing!) The Apple billboards are all
about protecting you from tracking. I haven’t seen one yet that was more
like Connect and share with brands you love! (please let me know if
you see any Apple billboards like this)
Information has value in a market. When your browser passes
information about you—even in a form that
is supposed to prevent individual tracking—you’re rewarding risky
and problematic advertising practices along with the legit ones. Some
advertising has value, but putting legit sites and malvertising on an
equal basis for data collection is not helping.
With a Little Help, Western Bluebirds Are Nesting in Alameda
Remember, we humans cut down dead trees for safety, but that is bad for birds that
we humans cut down dead trees for safety, but that is bad for birds that
are cavity nesters. Would you like to do something to improve nesting
habitat? Consider installing nest boxes around your home, school, or
community center.
Every advertising event is full of thought leader insights about
privacy-enhancing technologies (PETs) for ad personalization and
reporting systems. Somehow Big Tech, adtech, and martech are all fired
up about projects for reimplementing personalized/surveillance
advertising, but this time with a bunch of complicated math added in a
way that makes it hard to identify or track an individual.
In the real world, though, individualized tracking is not the top
advertising privacy problem, and it might not even be in the top five.
Arielle
Garcia, director of intelligence for Check My Ads, said,
Privacy-enhancing tech doesn’t make creepy and disruptive ads less
creepy or disruptive in the eyes of the average user. And the user
research backs that up. Jereth
et al. find that perceived privacy violations for a
browser-based system that does not target people individually are
similar to the perceived violations for conventional third-party
cookies. Co-author Klaus M. Miller presented the research at
FTC PrivacyCon (PDF):
So keeping your data safer on your device seems to help in terms of
consumer perceptions, but it doesn’t make any difference whether the
firm is targeting the consumer at the individual or group level in the
perceived privacy perceptions.
This might not make sense if you compare just the information passed
through PETs to what is possible to do with a third-party cookie. It’s
fewer total bits of information, so the users should be getting more
privacy and like it better, right? Not so fast. The real problems that
people complain about when they raise privacy concerns are more
about information asymmetry.
People tend not to want to participate in markets in ways that give
their counterparties too much information. Imagine going on vacation and
visiting an old-fashioned rug market—and showing every rug merchant your
budget, shopping list, and departure date. Information imbalances are not how sustainable
markets work. And that’s just legit markets. The situation is even
worse on today’s scam
culture Internet, where the old fake it til you make it has
been replaced with make it, then fake it even harder. The
pervasive risks of scam culture are mostly group-level personalization
risks and not the consequences of being individually identified.
Some PETs just measure ad performance and don’t personalize. But when
the same measurement system covers both harmful and win-win ad
placements, it creates incentives for advertisers to chase the best
possible metrics while creating the least possible value for the user.
In a sustainable system, the user, or some party that the user trusts to
provide an ad-supported resource, would be a data gatekeeper to keep
data from leaking into the negative-sum corners of the Internet. People
choose to provide their information to parties they trust—they don’t
broadcast info about themselves to trusted and untrusted parties on the
same terms. If browsers offered a privacy-preserving
way to share corporate information with web users who visit the company
site, would the corporate IT department leave it turned on?
how things got this far
PETs started to go mainstream in the advertising scene back in 2019
when Google announced the Privacy Sandbox project. (Apple was
first with wide release of a PET ad system, Private Click
Measurement in 2021.) At the time, some people picked up on PETs as
an alternative to platform dystopia, a way for independent retailers and
publishers on the web to make a compromise and continue to do business
without a hard dependency on fixing the Internet’s deeper
monopolization, discrimination, and fraud problems. But in the roughly
five years that people have been arguing about web ad PETs, most
recently over Google’s
ongoing effort to make their PETs acceptable to the Competition and
Markets Authority (CMA) in the UK, a big PET win is looking less and
less likely. After all the work that has gone into PETs, not only do big
risks remain, but—and this is super awkward—the companies working on
PETs are still the same Big Tech companies that PETs were supposed
to…protect us from?
Meanwhile, in the years since the PET trend began, mainstream privacy
laws have gone off in a different direction, and focused not on
obfuscation and math, but on data minimization and on enabling people to
find out how companies use data. Data subject access rights, or “right
to know,” are a key component of modern privacy laws and make it
possible for web users and for organizations working on their behalf to
detect patterns of illegal discrimination. In a 2019 settlement, Facebook
agreed to stop using age, gender and ZIP code for housing, employment
and credit ads. Reporters
were later able to track the company’s progress. Privacy researchers
and advocates continue
to benefit from “right to know” processes today. A PET environment,
however, limits access to data, making algorithmic discrimination hard
to detect.
Much recent privacy news would have been as bad or worse in a PET
environment. Grindr users aren’t suing
over privacy because they were individually identified, they are
suing because they were identified by HIV status, a fact common to a
large group of people. Reviews of Google’s Privacy Sandbox have,
for quite a while, anticipated regulator concerns over algorithmic
discrimination and transparency. The W3C TAG, in a review
of a “Privacy Sandbox” sub-project, wrote,
The Topics API as proposed puts the browser in a position of sharing
information about the user, derived from their browsing history, with
any site that can call the API. This is done in such a way that the user
has no fine-grained control over what is revealed, and in what context,
or to which parties. It also seems likely that a user would struggle to
understand what is even happening; data is gathered and sent behind the
scenes, quite opaquely.
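The TAG’s point about opacity is easier to see from the calling side.
Getting the topics inferred from a user’s browsing history takes one
call (a sketch of the real Topics API entry point; it only works in
Chrome with the Topics feature enabled and a permissions policy that
allows it):

// any site, or an embedded third party allowed by permissions policy,
// can ask for topics inferred from the user's browsing history
const topics = await document.browsingTopics();
// topics is an array like [{ topic: 186, taxonomyVersion: "1", ... }]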
Under privacy laws, users have the right to access not just their raw
data, but the inferences made from it—a capability that will be
difficult to retrofit into PETs. Google posted a FAQ stating,
Chrome can and will take steps to avoid topics that might be
sensitive (i.e. race, sexual orientation, religion, etc.). However, it
is still possible that websites calling the API may combine or correlate
topics with other signals to infer sensitive information, outside of
intended use. Chrome will continue to investigate methods for reducing
this risk.
No results have been posted from this investigation so far. Someone
will probably get a Best Paper award at a conference for solving
algorithmic discrimination in a PET-based ad system…eventually. Until
that happens, PETs will struggle to meet basic transparency requirements
in more and more jurisdictions. The data obfuscation problems introduced
by PETs will also create compliance challenges in the area of
competition policy. In a recent Google
report (PDF) to the CMA, the company acknowledges that compliance
with the Digital Services Act (DSA) by “Privacy Sandbox” publishers and
advertisers remains an unsolved problem.
alternatives
So what do we do instead? PETs may continue to be valuable in fields
like software telemetry, where the end result is all users receiving the
same bug fixes, not different treatment of different users. But ads are
a harder problem. First of all, PETs are not a shortcut to solving some
important platform trust issues.
I see that your industry sold my parents a retirement scam, put
malware in my search results, and showed fake repair shops on a map of
my neighborhood…but I totally trust your answer to this complicated math
problem. — no one ever
The hard part about running any Internet service is the moderation
(including ad review if it’s an ad-supported service). At some point Big
Tech management needs to get over its dogmatic union-busting and sign
fair contracts with the moderators and all the other people who do the
high-skill
human side of their operations. You can’t fix a reputation problem
with math.
As far as how to make the ads work, it’s possible to get
rid of third-party cookies and other problematic identifiers like
mobile ad IDs without also taking on an open-ended research project.
Instead of starting from what data flows are acceptable to today’s
platforms/adtech/martech and trying to stick enough math on them to make
them acceptable to users, start with how people choose to participate in
markets and automate it. The Lindy Effect is
a real thing in marketing. Ideas get tested out all the time, but the
practices that tend to persist are the ones that have endured many
generations of marketers, companies, and technologies. So the history of
marketing practices that people have been willing to accept in the past
is probably the best guide here. Realistically, markets will always have
high and low reputation sellers, and the advertising most likely to
persist will be whatever can pay for itself in (ad-supported resources +
economic signal) > (cost in attention + resources + risk).
Finding something successful and sustainable and adapting it to the
web is good for more total ad revenue over time—even if in any one
transaction you get to capture less of it. A browser is a user agent,
which means it does
what users would do for themselves if they had time. People like to
buy stuff, and prefer to buy better stuff. How can browsers help?
From Chance to Control - by Eve Maler
Privacy isn’t encryption. Not
only can encryption be broken or bypassed; it’s also simply a technique
that needs a solution environment. Beware of just doing crypto
and thinking it solves human challenges.
No
Cookies for You! Evaluating the Promises of Big Tech’s
‘Privacy-Enhancing’ Techniques (PDF) by Martin et
al. Specifically, a browser collecting data and creating inferences
to place ads online did not meet privacy preferences and was rated the
same or lower at meeting privacy preferences compared to third-party
trackers and data brokers placing ads. (p. 61)
Bonus links
Used.
Today, though I make more money than ever before in my life, have
more disposable income and am statistically in the top one-percent of
American income-earners, there’s scarcely a brand or a product that
tells me anything I feel I need to know about themselves.
Revenue-Share Orgy: Why Ad Agencies Are in Bed with Everyone
We’re not going to
fix our industry’s anemic ad-driven sales growth without rekindling the
critical thinking, transparency, trust, and communication that
originally built our business.
Glyph Lefkowitz: Against Innovation Tokens
When programmers make a
technology selection, we are often considering how difficult it will
make the programming. Innovative technology selections are, by
definition, less mature. That lack of maturity — particularly in the
open source world — often means that the project is in a part of its
lifecycle where it is concerned with development affordances more than
operational ones. Therefore, the stereotypical innovative project, even
one which might legitimately be a big improvement to development
velocity, will create more operational overhead.
It’s understandable that Mozilla wants to break their dependency on
Google
search ads, but right now they seem to be doing it by, yes, yet
again putting advertising features in the browser. They seem to get on
an ads in the browser kick every few years, but this time they’re doing
it in collaboration with Meta, in a way that introduces fraud,
surveillance, and antitrust risks. Google
gets all the ink for their ad features in the browser project, but
Firefox has given us some to-do items, too. I’ll keep this post up to
date if the instructions change.
Create an
entry: dom.private-attribution.submission.enabled with a
value of false.
That blog post also has info for disabling this by default in
user.js which I have not tested.
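For reference, the user.js version should be a one-line pref, something
like this (same caveat: not tested here):

// in user.js in your Firefox profile directory
user_pref("dom.private-attribution.submission.enabled", false);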
Just putting privacy in the name of a feature doesn’t make it
less creepy. Considering today’s branding trends it might even go the
other way. Your privacy is important to us is the new your
call is important to us. If you dig into the literature behind PPA,
you will find some mathematical claims about how it prevents tracking of
individuals. This is interesting math if you like that kind of thing,
and the design does have some constructive features. But even if it all
works error-free (update 5 Oct 2024) the problem is that on the
real web, PPA provides incentives to track you more. Your sense of
being creeped out by a web feature is a better applied behavioral
economist than it gets credit for.
Use the search field to find the entry for aboutConfig and enable it.
Go to about:config.
Search for dom.private-attribution.submission.enabled and set it to false.
(I have not tested this—if you have better instructions please let me
know.)
Sponsored stuff on the new tab page
These have been around for a while and do not seem to be a big deal,
but just in case you’re creeped out by the whole PPA thing and on a roll
with clicking around in settings…
Click the menu button (≡) and select Settings (if you’re not
already in Settings from the previous tip)
In the Home panel, you have two options: change the New Windows and
Tabs settings to Blank Page (fastest) or to Custom URLs.
(According to Firefox
Help, our preferred partner is adMarketplace. I don’t see
this in the new user signup process, maybe I missed it.)
Turn on Global Privacy Control
I’m going to end with some good news and ask, please don’t get mad at
Firefox in general just because of one more fad-chasing move by
management. I have been using this thing since it was Netscape
Navigator, and even worked there for a while, and I have always been
able to muddle through. Here is a place where Firefox is ahead of the
other browsers: Global
Privacy Control support.
From the ≡ menu button, Settings.
In the Privacy & Security panel, scroll down to Website
Privacy Preferences and turn on Tell websites not to sell or share my
data.
You can check that it works by going to globalprivacycontrol.org.
Look for GPC signal detected at the top of the screen.
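You can also check from any page’s devtools console, since browsers
that implement GPC expose the signal to page scripts (a quick extra
check, not part of the official instructions):

// true when "Tell websites not to sell or share my data" is on
navigator.globalPrivacyControl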
Ad blocker check
Mozilla seems to be doing better at keeping the scam ad blockers out
of their extensions directory than Google—but that might just be that
fewer malware developers are targeting Firefox now. But it is a good
idea to check that your ad blocker is protecting you—privacy is now the
number one reason why people install ad blockers. You can check your
browser setup at EFF’s Cover
Your Tracks site.
This is a work in progress, will update as needed. If you’re looking
for info on the Mr
Robot TV show plugin, I think this is totally gone by now and you
should not have to do anything.
“Privacy-Preserving” Attribution: Mozilla Disappoints Us Yet Again
At the end of the
day, Mozilla knows this feature isn’t something that Firefox users want.
If they truly believed this was the one path away from the constant data
theft perpetuated by the advertising industry, they would’ve announced
this loudly and proudly.
PET projects
or real privacy? Some background info on privacy-enhancing ad
personalization, which, to me, looks like an interesting technical dead
end.
explainers/ppa-experiment
at main · mozilla/explainers This is Mozilla’s POV about why they’re
doing this. (They’re mixing up privacy protection and protection from
individual addressability, which is a trend now, and an annoying one. As
a privacy nerd, now I know how music nerds feel when people go on and on
about Nickelback.)
How and Why to Disable Firefox’s ‘Privacy-Preserving’ Ad Measurements
If I’ve learned one thing from the way the online ad industry
evolved, it’s that people should have a choice in whether their data is
being tracked. Even if it seeks to replace even more invasive systems,
Firefox should have offered people a choice to opt into ad measurement,
instead of enabling it silently.
Bonus links
The Golden Calf Of Addressability: Reevaluating The Foundations Of Digital Advertising
Advancements in AI have given us the tools to gather
probabilistic insights by examining broader audience trends and modeling
future behavior, rather than chasing the deterministic cookie trail to
nowhere. (You still have to be careful—even if tracking is not
deterministic, it can still facilitate discrimination, matching
scammers to victims, and other harms.)
Something I recently ran into: How do you build a site locally and
host the resulting built version of the site on GitHub Pages? Here’s one
way to do it. First, have a make all target that builds the
public files from the source pages (for example, pass CommonMark
filenames to Pandoc to get HTML.) Then make deploy does
several things:
Make a fresh build directory and copy the source files,
the Makefile, and the .git directory into
it.
Run make gh-pages inside the build directory. That
target depends on the all target, so this builds the site
inside the directory.
Next, make gh-pages inside the build directory commits
the public files on the gh-pages branch, then force pushes
to GitHub.
Back in make deploy, delete the build
directory. That includes the .git directory inside, so no
history of built files gets preserved.
# We make this site with "make" locally and deploy generated pages to GitHub in
# a branch. First, delete the build directory and the gh-pages branch. Then
# copy the site files into the build directory and make the gh-pages target
deploy : all
	(git branch -D gh-pages || true) &> /dev/null
	rm -rf build && mkdir -p build
	cp -a Makefile .git $(SOURCES) build
	make -C build gh-pages
	rm -rf build

# This target only runs inside the build directory and does a commit and push
# on the gh-pages branch. If you look at this project on GitHub you should see
# the original .md files on the main branch and the generated HTML files on the
# gh-pages branch.
gh-pages : all
	basename `pwd` | grep -q build || exit 1
	rm -f .git/hooks/pre-push
	git checkout -b gh-pages
	git rm -f $(SOURCES)
	git add -f $(PUBLICFILES)
	git commit -m "this is a temporary branch, do not commit here."
	git push -f origin gh-pages:gh-pages
This is from a
Makefile for a very basic .org site. The -C option to
make is to change to the given directory before reading the Makefile.
(That site has versioning for the source files on GitHub too, but it
doesn’t have to be.)
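So publishing the whole site is one command from the source checkout
(assuming SOURCES and PUBLICFILES are set in the Makefile, as above):

make deploy    # builds in ./build, force-pushes gh-pages, removes ./build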
planning for SCALE
2025 is about some ideas for a conference talk on making a site
using a Makefile and a bunch of various stuff instead of a static site
generator.
Bonus links
hangout_services/thunk.js
It turns out Google Chrome (via Chromium) includes a default
extension which makes extra services available to code running on the
*.google.com domains
A high-level
view of all of this feed reader stuff (good info about a useful QA
service for making your RSS tool better behaved. The script that builds
the links for this bonus links section just got fixed up some.)
Download the Atkinson Hyperlegible Font | Braille Institute
Atkinson Hyperlegible font
is named after Braille Institute founder, J. Robert Atkinson. What makes
it different from traditional typography design is that it focuses on
letterform distinction to increase character recognition, ultimately
improving readability. (If I use a hyperlegible typeface, I can make
my prose more confusing, right?)
The Itanic
Saga (personally I had no idea that a story about Intel Itanium
would start in 1923)
Amazon Is Investigating Perplexity Over Claims of Scraping Abuse
Amazon’s cloud division has launched an investigation into Perplexity
AI. At issue is whether the AI search startup is violating Amazon Web
Services rules by scraping websites that attempted to prevent it from
doing so, WIRED has learned.
The Pruneyard is “an
iconic destination and experience designed to make the everyday
extraordinary.” It’s also, according to the US Supreme
Court, a business establishment that is open to the public to
come and go as they please. The views expressed by members of the public
in passing out pamphlets or seeking signatures for a petition thus will
not likely be identified with those of the owner.
Pruneyard
Shopping Center v. Robins, in which a student group ended up
being allowed to distribute their leaflets in some areas of the mall, is
a Supreme Court case that keeps coming up in discussions about what
kinds of regulation are appropriate for the “Big Tech” companies.
Is a Big Tech platform more like a mall, or more like a newspaper?
Making the best case for the newspaper option, Mike
Masnick at Techdirt asserts that the Big Tech companies have broad First
Amendment rights that extend to cover many design and business model
decisions. This point of view is pretty far outside the
mainstream—if the First Amendment really extended that far into
protecting business models, then the Fair
Credit Reporting Act would be unconstitutional, and lenders and
fintech companies would have the right to spread any kind of negative
information about anyone (up to the generous limits of libel law). On
the extreme end of the other side, of course, we hear from far right and
far left politicians who want to shut down even classic First Amendment
media such as newspapers and newspaper-like web sites.
In the middle, though, the tricky part is to figure out which
online platforms are more like a mall, where permitting the people who
use it to exercise their own personal rights is not an unconstitutional
infringement of the platform owner’s rights, as covered in the
Pruneyard decision…
Here the requirement that appellants permit appellees to exercise
state-protected rights of free expression and petition on shopping
center property clearly does not amount to an unconstitutional
infringement of appellants’ property rights under the Taking Clause.
There is nothing to suggest that preventing appellants from prohibiting
this sort of activity will unreasonably impair the value or use of their
property as a shopping center. The PruneYard is a large commercial
complex that covers several city blocks, contains numerous separate
business establishments, and is open to the public at large. The
decision of the California Supreme Court makes it clear that the
PruneYard may restrict expressive activity by adopting time, place, and
manner regulations that will minimize any interference with its
commercial functions.
…and when is a site more like a newspaper? It makes sense in
principle to treat a privately owned platform as a mall in situations
where it works like a mall, and people want or need to do the kinds of
things they do in a mall, but how do you draw the line?
A state law writing assignment, direct from the Supreme Court
The mall or newspaper question has been in the news lately because of
the
Supreme Court’s decision in Moody v. Netchoice, which is largely a
writing assignment for state legislators. (full
decision PDF) The task that the Court has set for legislators is to
come up with laws that can somehow take a whack at Big Tech’s
multifarious villainy—mental
health harms, child exploitation, national
security threats, fraud, and all the rest—and
to draft those laws in a way that clearly treats an online version of
the Pruneyard Shopping Center, a place where the rights of the users
outweigh the rights of the platform owner, differently from a site that
works like a newspaper.
But instead of getting too far into the weeds on the mall or
newspaper question, what about other options? Big Tech has qualities
more similar to other businesses that are covered by other kinds of
laws.
Regulation is justified by the scale and impact of the gambling and
credit reporting industries. Even though a hypothetical Techdirt for
slot machine designers or Techdirt for credit bureaus could stretch out
a First Amendment argument to cover those two cases, in reality they’re
regulated. State legislatures need to identify and regulate a Big Tech
business practice in a way that will also hold up in court. Fortunately,
there is one that’s far enough away from speech to clearly avoid First
Amendment issues: surveillance, or cross-context
tracking.
Cross-context tracking includes using pixels, SDKs, or “conversion
tracking” APIs as a way for any business to send information about any
person to the Big Tech company. (More
about conversion tracking, from Consumer Reports). Cross-context
tracking also takes place within Big Tech companies, when, for example,
the same company owns both a video sharing platform and a web browser,
and uses data collected by the browser to personalize ads on the video
sharing platform.
Cross-context tracking affects hundreds of billions of dollars in
advertising and content, as well as people’s opportunities in housing,
employment, and other areas. So we should be able to avoid asking
mall or newspaper? as much as possible and work toward building a
consensus: a company that collects or accepts tracking data
about people’s actions in one context in order to inform decisions that
affect them in another context is holding itself out as a
networked meeting place for other people’s transactions, and in need of
licensing. The tracking is in no way expressive activity by the Big Tech company.
It’s an inbound flow of data, not outbound, and the company has no
knowledge or control of what the tracking events will be.
Just as states regulate and inspect any business with a gambling or
liquor license, a state licensing agency for cross-context tracking
would be able to handle citizen complaints about Big Tech platforms.
Even if you believe that cross-context behavioral advertising is a net
economic win, the people subject to it face more than enough risks to
justify a state-level licensing system. For the state whose legislature
can get cross-context tracking licenses into law, it’s a win-win:
Costs to the state government could be minimized. A company would
be required to disclose its own cross-context tracking in its license
application, so state regulators wouldn’t have to try to figure it out.
And objections to licenses could be raised through public comment forms
and meetings.
A company could cleanly avoid regulation by staying out of the
cross-context tracking business and not applying for a license.
It can promote fair competition, because the tracking that requires a
license would be across contexts, not owners. A Big Tech company that
gained a new context, such as a site or app, by acquisition would need
to update its license.
A licensing bill covering cross-context tracking could get support
from a variety of interests, including people who already believe in the
harms of cross-context tracking and want fewer, better-run companies
doing it, and people who are neutral or even feel positively about the
tracking part but want to use future public meetings about tracking
license renewals as a way to get Big Tech to improve their behavior
toward their state’s citizens and businesses. Small business owners
could show up at a license renewal meeting and have real impact, not
just get rounded up to advocate for Big Tech and against privacy one day
a year while their support tickets go to oblivion the rest of the time.
And maybe, as Big Tech platforms increasingly resemble the spawn of the
arbitrary top-down decision-making of a credit report and the
psychological manipulation of a slot machine, the answer to the mall or
newspaper question is neither.
The NetChoice Decision Shows the First Amendment Is Out of Control by Tim Wu
Judges have transmuted a constitutional provision meant to
protect unpopular opinion into an all-purpose tool of legislative
nullification that now mostly protects corporate interests. Nearly any
law that has to do with the movement of information can be attacked in
the name of the First Amendment.
Supreme Court protects the future of content moderation
The decision
elaborates that the compilation and curation of “others’ speech into an
expressive product of its own” is entitled to First Amendment protection
and that the government cannot get its way just by asserting an
interest in better balancing the marketplace of ideas.
BRIEF OF LAW AND HISTORY SCHOLARS AS AMICI CURIAE IN SUPPORT OF RESPONDENTS IN NO. 22-555
Facebook, Twitter, Instagram, and TikTok are not
newspapers. They are not space-limited publications dependent on
editorial discretion in choosing what topics or issues to highlight.
Rather, they are platforms for widespread public expression and
discourse. They are their own beast, but they are far closer to a public
shopping center or a railroad than to the Manchester Union. The proposed
NetChoice trigger of “editorial judgment” is misplaced.
Meta’s
‘pay or consent’ model fails EU competition rules, Commission finds
(From here in the USA this looks like a brilliant piece of legal work:
allowing European publishers to continue using “pay or consent” while
putting this model off-limits to Meta. Remember, think
about European tech policy in context. Big Tech isn’t in trouble in
Europe because companies are failing to comply with whatever the EU laws
are today. They’re in trouble because they’re more of a part of the
problem than a part of the solution on the big issues.)
Judges Rule Big Tech’s Free Ride on Section 230 Is Over
As one judge
wrote, Today, § 230 rides in to rescue corporations from virtually
any claim loosely related to content posted by a third party, no matter
the cause of action and whatever the provider’s actions. There’s a
shape-shifting element here as well. When corporations want to avoid
being regulated, they call what they do speech and claim First Amendment
rights. But when someone tries to hold them liable for defamation or
unlawful speech, they claim they are mere vessels for others, and thus
immune under Section 230. This dynamic creates what one court called
a lawless no-man’s-land. (this
case (PDF) is another example of the circuit split between
common sense (no wonder new college graduates don’t want to go into
advertising, now that so much of the job description is about paying the
People’s Republic of China to [child harm redacted]) and the US court
system, which keeps stretching Section 230 and the First Amendment to
give legal cover to Big Tech even as the industry really commits to its
heel turn.)
Bonus links
My reply to the people who want to designate my neighborhood a “historic district”
Having a house in a city with a lot of homeless people,
and one where essential workers can’t afford to live, will also depress
property values. It’s not as obvious. It’s not as acute. But it’s a much
bigger problem and one that’s harder to deal with.
Mastodon
is actively courting journalists (The underrated asset of Twitter
was that they owned the “byline namespace” for mass and trade media. The
pro-Putin direction over there is creating an opportunity for a new
journalist directory, and I hope not a single point of failure this
time.)
It’s all coming together. The PC power user is back. In the
early days of small business microcomputers, the person with a
subscription to Byte, and some knowledge of topics like
batch files and how to get support for a printer driver, was able to
bring the value of the office computer up from, say, a 2 on a scale
of 1 to 10 to maybe a 4 or 5. The desktop PC options kept getting
better, though, so by the time Microsoft got to Windows XP in the early
2000s, the small office PC was more like a 6 or 7 without tweaking, and
power user skills made less of a difference. No more DIP switch settings
to learn! When you plug in a printer it works out of the box! You can
get updates over the Internet! And the small business Internet scene,
for a while, was solidly in Web 2.0 “create more value than you
capture” territory. All that added up to much less
value returned from the time invested to become a power user.
But now the balance is shifting again. Now the small office or home
office PC is more of a point of sale device, loaded with surveillance
software, compliance
risks, and SaaS upsells. The peripherals work, in a sense, but they
don’t work so much for you as for some far-away product manager who
needs to nail their OKRs to get promoted and afford a down payment on a
house. (An economics experiment I’d like to try: make middle-class
housing affordable on an honest IT salary, to test my hypothesis that
enshittification would go down.) And the
small-business-facing Internet is a more or less wretched hive of scum
and villainy, from fairly mild shelfware without the shelf schemes, all
the way to actively heinous stuff like sending
your marketing budget to terrorists.
But the underlying PC hardware is still getting way better. And
Internet service is getting faster with lower latency, and the best of
the software you can easily get is still getting way better. Which means
a bigger gap between baseline and advanced configuration, so a bigger
win from learning power user stuff. Yes, I’m a Linux nerd, but you can
probably get your computer into an acceptable state without switching
OSs. (I got a Microsoft Windows laptop for work, and the two OSs have
gotten a lot more similar. On Linux a video conference is more likely to
just work, and on Windows, stuff is more likely to break because
somebody got bored with it.) So if in the 1980s you could bring an
office PC up from a 2 to a 5, and in the 2000s you might not get much
noticeable change, now you can bring your PC from negative territory up
to a 7 or 8. Looks like a big win, even if you don’t count the payoff
from scam protection. More: effective privacy tips
Related
Why
We Need to Address Scam Culture by Tressie McMillan Cottom (The
growth hacking mindset in IT product management is a small part
of a bigger problem. Business practices that used to be unthinkable for
reputation reasons have now gone mainstream. Personally, I’m not sure
why. It might just be that business culture is more global and today’s
Big Tech decision-makers are more likely to compare themselves to a peer
group that includes more international resource extraction oligarchs and
fewer execs at mid-sized legit companies.)
Is Everything BS?
So the most important thing you can do with a sale
is not purely the economic bit, and it’s not purely the behavioral bit.
It’s both. Good interview with Rory Sutherland
Google’s Privacy
Sandbox: More Like Privacy Litter Box (At last, an adtech piece
about Google’s in-browser advertising system that doesn’t do the whole
feedback sandwich thing with praising Google’s privacy
efforts first, then actually saying what the author means to say, then
buttering up Google again at the end. A must-read for anyone who has to
edit anything about web ads.)
Mozilla
is an advertising company now (they have gone back and forth with ad
features in Firefox for quite a few iterations. Possible good news here,
though. Probably a good sign that the PETs adtech hype peak has
crested.)
What everyone gets wrong about the 2015 Ashley Madison scandal
People
have been trying to have affairs with strangers for thousands of years.
Ashley Madison was never really about that. Avid Life Media, its parent
company, wasn’t in the business of sex, it was in the business of bots.
Its site became a prototype for what social media platforms such as
Facebook are becoming: places so packed with AI-generated nonsense that
they feel like spam cages, or information prisons where the only
messages that get through are auto-generated ads.
Applying The ‘Would Your Mother Approve?’ Rule To Online Ad Tracking
Barnes is one of the attorneys leading an ongoing class-action suit
alleging that Meta’s tracking pixel is a violation of HIPAA because it’s
able to collect sensitive protected health information without a
patient’s knowledge and can transmit that data directly to Facebook and
Instagram. He says, But I will say that any legislator who votes
to say that only an attorney general can bring an action to defend a
consumer’s privacy rights is making a 100% un-American vote. People have
the right to a jury trial lawyer of their own choosing and the ability
to access the courts on their own without asking permission from a
government official. (Class-action privacy cases are a promising
direction for taking on a lot of this stuff IMHO.)
Now that AI can beat a Turing test by bullshitting,
what’s the next test? In Prediction
Market Trading as an LLM Benchmark, Jesse Richardson suggests that
setting up an LLM to trade on a prediction market (e.g. Polymarket,
which is the platform I’ll talk about here) could be a particularly
strong benchmark with a number of desirable properties. Scott
Alexander also suggests prediction markets as a useful challenge for
bots.
Seems like a good idea. The best part about prediction markets as a
benchmark is scaling. One skilled prediction market trader can end up on
the winning side of trades with a large number of low-skill traders,
human or bot. LLM benchmarks that depend on evaluating generated text
are much harder to scale, and it’s usually easier to bullshit than to
detect bullshit. To make a market-based pass/fail test, give the bot a
stake and charge it interest on its stake and rent for the computing
resources it uses. A bot passes the test if it can stay solvent for the
agreed-upon time in a market with human traders. Even if bots can’t pass
the test, they might still have their uses, since they can help add
liquidity to corporate
prediction markets and incentivization
markets. (More: boring bots ftw)
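A sketch of that pass/fail rule in Python, with made-up numbers, and
with the market interaction left abstract (the weekly profit-and-loss
series would come from whatever prediction market the bot actually
trades on):

# Sketch of the market-based solvency test described above.
# All the numbers are illustrative placeholders.

STARTING_STAKE = 10_000.00   # the stake the bot is given
WEEKLY_INTEREST = 0.001      # interest charged on the original stake
WEEKLY_RENT = 50.00          # rent on the bot's computing resources
TEST_WEEKS = 52              # the agreed-upon time


def passes_market_test(weekly_pnl: list[float]) -> bool:
    """Return True if the bot stays solvent for the agreed-upon time."""
    balance = STARTING_STAKE
    for pnl in weekly_pnl[:TEST_WEEKS]:
        balance += pnl                               # trading gains or losses
        balance -= STARTING_STAKE * WEEKLY_INTEREST  # interest on the stake
        balance -= WEEKLY_RENT                       # rent on compute
        if balance <= 0:
            return False  # insolvent: the bot fails
    return True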
In practice this market test would not be a pure prediction
benchmark, but would have some text generation aspects, too. A bot would
get a substantial advantage by explaining its trades afterwards—talking
up its positions, and telling other traders why they should pay more
than the bot did. But the bot’s ability to explain its actions in a
persuasive way does not have to be judged by reading the text itself; it
can be evaluated indirectly by how well the bot actually persuades other
traders.
The big problem with building an open ecosystem for AI is the cost of
benchmarking. Even when it is possible to train a model at low cost,
evaluating that model depends on either hiring large numbers of human
reviewers, or paying for access to a larger model. In order to get to
the own your own AI stage, the benchmarking budget problem needs
to be addressed early on. Integrating an incentivization market with
existing open-source collaboration infrastructure (Pinfactory
connects to a GitHub project pretty easily, just saying) could be a
good start. More: boring bots ftw
Dominant
assurance contracts are not exactly prediction markets, but a bot
trader could win by identifying and consistently betting against
projects that are unlikely to be funded.
pile of money fail
Science and security prediction markets are another potential
real-money opportunity.
Ever notice that the AI skeptic articles are so much better written
than the AI fan articles? It seems like the only way to change that
would be for the AI firms to start doing the whole pay the writers
thing. Somehow the AI scene needs to, for its own
good, figure out how to stop squeezing legit text out of training sets
and avoid letting deceptive text in. But paying
people for public information goods is hard. AI firms will have to
look at options such as putting up the seed money for a dominant
assurance contract on a to-be-written work on the condition that it will
be licensable for training. In the meantime, here’s a recent list
of AI-related links that have been making the rounds.
Perplexity’s grand theft AI
[B]y providing an answer, rather than pointing
people to click through to a primary source, these so-called “answer
engines” starve the primary source of ad revenue — keeping that revenue
for themselves.
L AI ZY.
What was happening in the world of AI was that people thought
they could ask it a question–what’s the next big fashion trend–and get
an answer. They didn’t realize the amount of work that had to be done to
allow a computer to derive an answer.
There’s Something Deeply Wrong With Perplexity
Even readers are becoming
wary of the trend, with a new report by the Reuters Institute for the
Study of Journalism finding that a majority of news consumers are
suspicious of AI-generated news content. (I am seeing a lot more
user questions about how to block or avoid AI output than questions
about how to get it. Maybe the people trying to get it are asking in
other places?)
The public web and consent
On the LLM front, I’m not particularly
bothered by my writing being used to help train GPT, Claude, or Apple
Intelligence. While I appreciate others feel differently, I just don’t
see these tools as replacing me in any real way. On the other hand,
tools like Arc Search, Perplexity, and Google’s AI answers are trying to
replace me and present my work as their own. That’s plagiarism and
copyright infringement…
How to stop Perplexity and save the web from bad AI
Like Clearview,
Perplexity’s core innovation is ethical rather than technical. In the
recent past, it would have been considered bad form to steal and
repurpose journalism at scale. Perplexity is making a bet that the
advent of generative AI has somehow changed the moral calculus to its
benefit.
I
Paid $365.63 to Replace 404 Media With AI (This is a fun one,
although scary when you consider the low
standards of certain web ad companies. I really hope they do a
follow-up about whether Mohamed Sawah has more work than he can handle
now. Or maybe in the future when he’s some kind of Internet tycoon this
story will be the place to look for answers to some trivia questions
about sites he worked on.)
It turns out that Perplexity is crawling from three different IP
addresses with three different user agent strings, which look like
normal browsers.
The test I did: I have some pages on this site that are not linked to
from anywhere, so I tested to see if this company would crawl one.
(Mostly just random stuff that I have shared with friends but not blogged.)
My perplexity.ai query:
Can you explain how to play the board game [game name] which is
covered by the ruleset at [URL]?
and that URL (which gets a couple of hits on game nights, that’s it)
was immediately visited by not one, not two, but three different IP
addresses.
44.221.181.252 “Mozilla/5.0 (Windows NT 10.0)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0
Safari/537.36”
45.56.133.241 “Mozilla/5.0 (Windows NT 10.0; Win64;
x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0
Safari/537.36”
134.73.83.233 “Mozilla/5.0 (Macintosh; Intel Mac OS
X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0
Safari/537.36”
Why three? The first reason I can think of is to try to evade a
block, but really? Any better explanations?
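If you want to run the same test, the server side is just watching the
access log for a hit on the unlinked URL (the log path and page path
here are placeholders for your own setup):

# watch for crawler visits to the unlinked page
tail -f /var/log/nginx/access.log | grep --line-buffered '/unlinked-page-url'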
What reminded me of that joke is all the surveillance advertising
companies going on about how surveillance advertising is so good for
small businesses. But if they have
so much trouble telling small businesses and fraud apart, how can
they know? Maybe surveillance ads are just better for fraud. The
interesting comparison to make is not between a legit business at times
when it has surveillance advertising on or off, because the scammers
competing to reach the same customers are leaving the surveillance ads
on. IMHO you have to look from the customer side. If surveillance
advertising helps legit companies reach people who can benefit from
their products, then people
who use ad blockers or privacy tools should be less happy with the stuff
they buy.
Instead, people
who installed ad blockers for a study turned out to be less likely to
regret their recent purchases, and that’s surprising enough to be
worth digging into. Maybe it’s not fraud, just drop-shippers. Lots of drop-shippers/social media
advertisers are finding existing cheap products, marking them up,
and selling using surveillance ads. It’s not illegal, but the people who
click the ads end up paying more money for the same stuff. Maybe the
reason that the ad blocker users are happier as shoppers is that they
search out and buy, say, a $20 product for $20 instead of paying a
drop-shipper $99? Or maybe ad blocker users are just making fewer but
better thought out purchases?
Just some reading material, more later. I did mess with the CSS on
this blog a little, so pages with code on them should look a little
better on small screens even if you have to scroll horizontally to see
the code.
Which top sites block AI crawlers?
All in all, most sites I looked at
don’t care to have their content used to train AI. (IMHO this will
be a big issue with the Fediverse—currently the only way to pass a
noai signal is to defederate. I made a FEP (fep-5e53)
so will see what happens.)
Why
First Party Data May Not Save Digital Advertising (This is why it’s
going to be better to get real consent, later, from fewer people than
bogus consent based on zero information about the brand or
publisher.)
Economic Termites Are Everywhere
[E]conomic termites…are instances of
monopolization big enough to make investors a huge amount of money, but
not noticeable enough for most of us. An individual termite isn’t big
enough to matter, but the existence of a termite is extremely bad news,
because it means there are others. Add enough of them up, and you get
our modern economic experience.
Tesla may be in trouble,
but other EVs are selling just fine (How much of this is the brand
personality and how much is the problem that Teslas are expensive to
insure? I think every car I have ever owned ended up costing a lot more
in car insurance than its price.)
We need to rewild the internet
For California residents, GPC
automates the request to “accept” or “reject” sales of your data, such
as cookie-based tracking, on its websites. However, it isn’t yet
supported by major default browsers like Chrome and Safari. Broad
adoption will take time, but it’s a small step in changing real-world
outcomes by driving antimonopoly practices deep into the standards stack
— and it’s already being adopted elsewhere.
AI-specific laws are still in progress, and copyright cases are still
making their way through the court system. I still don’t know if all the
stuff I did to block AI
training on a web site is going to be enforceable—it depends on how
well web site Terms of Service hold up in court as contracts. But in the
meantime we do have a tool that is already in place and tested.
An Opt-Out
Preference Signal like Global Privacy Control
is a way to signal, in a legally enforceable way, that you opt out of
the sale or sharing of your personal information.
GPC already protects residents of California, Colorado, Connecticut,
and other states in the USA, and enforcement is coming on line in other
jurisdictions as well. Sounds like a useful tool, right? But there’s
one missing piece. The GPC standard covers a signal sent from the client
to the server. When you visit a site as a user, this is just fine. But
when you need protection for a blog, a portfolio, or a profile page,
your personal info is on a server, but the company
looking to exploit it is running a client—a crawler or
scraper. That’s where we need to borrow some basics from the methods for
blocking AI training on a web site and add a meta tag and HTTP
header that work like GPC, from server to client.
The header is pretty easy. I just did it. Have a look at this site’s
HTTP headers in developer tools, or check from the command line, as in
the sketch below.
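For example (the URL is a placeholder; substitute any site you want to
test):

# show just the opt-out header from a site's response
curl -sI https://example.blog/ | grep -i '^x-robots-tag'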
Colorado has a process for registering OOPSs, so I will need to
write this up and submit it so it’s valid there. In other jurisdictions
the OOPS is valid as long as it expresses the deliberate opt-out of the
user, which mine does.
Just to make it extra clear, I need to put something in my Web Site User
Agreement, the way a lot of sites do for noai.
Remember that laws are downstream of norms here. People generally
believe in moral rights and some kind of copyrights for people who do
creative work, and people generally believe in some kind of privacy
right to control use of your personal information. And you shouldn’t be
at a disadvantage when you choose to share personally relevant content
on your own site compared to people who share on a big service. The
details will get worked out. Big AI will probably be able to make bogus
legal arguments, delay, and lobby for a while, but in the long run the
law will reflect norms more than it reflects billable hours spent trying
to push a disliked business model uphill. Comments and suggestions
welcome.
(Update 14 Jun 2024: Add darkvisitors.com API and GPC.)
I’m going to start with a warning. You can’t completely block “AI”
training from a web site. Underground AI
will always get through, and it might turn out that the future of
AI-based infringement is bot accounts so that the sites that profit from
it can just be shocked at what one of their users was
doing—kind of like how big companies monetize copyright
infringement.
But there are some ways to tell the halfway crooks of the AI business
to go away. Will update if I find others.
This site uses the darkvisitors.com API to catch up on the latest
crawler list. So if I fall behind on reading the technology news, the
Makefile has me covered.
# update AI crawlers blocking list from darkvisitors.com
tmp/robots.txt :
	curl -X POST "https://api.darkvisitors.com/robots-txts" \
	-H "Authorization: Bearer $(shell pass darkvisitors-token)" \
	-H "Content-Type: application/json" \
	-d '{"agent_types": ["AI Data Scraper", "AI Assistant", "Undocumented AI Agent", "AI Search Crawler"], "disallow": "/"}' \
	> $@

# The real robots.txt is built from the local lines
# in the conf directory, with the
# darkvisitors.com lines added
public/robots.txt : conf/robots.txt tmp/robots.txt
	cat conf/robots.txt tmp/robots.txt > $@
One of my cleanup scripts gets rid of the tmp/robots.txt
fetched from Dark Visitors if it gets stale, and I use Pass to store the token.
That support FAQ includes a good point that applies to all of
these—the opt out is stronger if it’s backed up with the site Terms of
Service or User Agreement. Big companies have invested hella lawyer
hours in making these things more enforceable, and if they wanted to
override ToS they would be acting against their other interests in
keeping their sites in company town mode.
new: privacy opt out for servers
This is the first site to include the new SPC meta tag and
X-Robots-Tag header for a privacy opt-out that works like Global Privacy
Control but for servers. Basically you have legally enforceable
rights in your personal information, blogs have personal information,
but regular GPC only
works from your browser (client) to the company on the server. This goes the
other way, and sends a legally enforceable (yes, I
know, this has not yet been tested in court, but give it a minute, we’re
just getting started here) privacy signal from a personal blog on
the server to an AI scraper on the client side.
So the new header on here is
X-Robots-Tag: noai, noimageai, SPC
So we’re up to four, somebody send me number five?
Related
Google Chrome
ad features checklist covers the client side of this—how to protect
your personal info, and other people’s, from being fed to AI (among
other abuses)
How
to Stop Your Data From Being Used to Train AI | WIRED covers much
other software, including Adobe, Slack, and others. Their list only
includes companies that currently offer an opt-out process. For example,
Microsoft’s Copilot does not offer users with personal accounts the
option to have their prompts not used to improve the software.
For
tech CEOs, the dystopia is the pointThe CEOs obviously don’t
much care what some flyby cultural critics think of their branding
aspirations, but beyond even that, we have to bear in mind that these
dystopias are actively useful to them.
Amazon
is filled with garbage ebooks. Here’s how they get made.The
biographer in question was just one in a vast, hidden ecosystem centered
on the production and distribution of very cheap, low-quality ebooks
about increasingly esoteric subjects. Many of them gleefully share
misinformation or repackage basic facts from WikiHow behind a title
that’s been search-engine-optimized to hell and back again. Some of them
even steal the names of well-established existing authors and masquerade
as new releases from those writers. (I’m going to the real
bookstore.)
“Pink slime” local news
outlets erupt all over US as election nearsKathleen Carley, a
computer science professor at Carnegie Mellon University, said her
research suggests that following the 2022 midterms “a lot more money” is
being poured into pink slime sites, including advertising on
Meta.
Since there’s a search quality crisis on, a lot of the companies you
might find on social media are scams, and a lot of the stuff sold on big
retail sites is fake, here are some real businesses I can recommend in
several categories. Will fill in some more.
I personally know about all of these and would be happy to answer
questions.
art, crafts, gifts
Modern Mouse (A place for
local artists and artisans to sell their work.)
audio gear
Sweetwater is a good source
of pro/semi-pro/office/podcasting equipment. Good support for checking
compatibility and other questions.
auto body shop
Fix Auto Alameda No
problems with the work, good communications, cooperated well with the
insurance company.
books
Books Inc (Several Bay Area
locations including SFO. If they don’t have it they can order it. Better
than ordering from the big online bookstore because they only get books
from the real distributor. You will only get legit copies, not those
cheaply printed pirate versions that pay no royalties to the
author.)
burritos
Island
Taqueria 1313 Park St., Alameda. (Bay Area’s best burritos. All the
classic menu items people move to the Bay Area for, consistent quality.
El Gran Taco in San Francisco would have been a contender but they’re
gone now.)
car repair
Fred’s Wrenchouse has
kept a 22-year-old car going for me. (They also provide good
recommendations for shops that do the work they don’t.)
car stereo
Crutchfield
is famous for compatibility charts and support. Will sell you the right
adapters and stuff to do an install. (If you haven’t put in a car stereo
using the stuff they supply, it’s a little harder than building Ikea
furniture but easier than an electronic kit or advanced Lego set.)
cookies
Newman’s
Own makes a better chocolate/white stuff/chocolate cookie than
Oreo®. (Which is good, because the owner of Oreo cookies,
Mondelez International, was recently removed from the
list of royal warrants in the UK because of the company’s activities
in Russia.)
earbuds
JVC Gumy HAFX7 These
really sound good and come with a set of silicone ear pieces in
different sizes, so in real-world listening situations they sound better
than more expensive options that don’t fit as well. (In my experience
most drama and waste from electronic devices are caused by apps,
firmware, Terms of Service, radios, and batteries. These have none of
those.)
electrician
S O T Electric Inc did a
really good job with a major upgrade to an existing house electrical
system.
fabric
Mood Fabrics is a huge
store in New York City, with lots and lots of options. No sign of it
from street level. Allow plenty of time. (Have only shopped there in
person, have not ordered online.)
Stonemountain
& Daughter Fabrics in Berkeley, California has a great selection
of fabric and really good service. (Have only shopped there in person,
have not ordered online.)
flour
King Arthur Baking is
recommended by people who know a lot more about flour in recipes than I
do. The company is employee-owned and provides customer
support.
plants
Encinal Nursery has good
local recommendations for outdoor plants in season, along with an
interesting selection of healthy houseplants in the back.
plumbing
Gladiator Plumber
1752 Timothy Drive, San Leandro. Have used them for everything from
small jobs up to a water heater install. Easy scheduling and consistent
good work.
power adapters and cables
Anker makes Lightning and USB
cables that last longer than the Apple ones. Better overmolding and
strain relief so they don’t get all frayed and crappy. Also good for
power bricks and chargers.
transit
Transbay buses operated
by AC Transit are the best way in and out of Alameda unless you have
something to do right by the ferry terminal or someone is meeting you
there. Lots of stops convenient to the Alameda businesses linked to
here, and you will generally get a deluxe bus with comfy seats on a
Transbay route.
Bonus links
Microsoft is reworking
Recall after researchers point out its security problems (Maybe this
is downstream of extreme economic inequality? When so many decisions are
made by an out-of-touch management class that shares few of the problems
of regular people, new product news turns into an endless stream of
weird shit that makes regular people’s problems worse.)
New
York to ban ‘addictive’ suggested posts on social media feeds for
kids
In practice, the bill would stop platforms from showing
suggested posts to people under the age of 18, content the legislation
describes as addictive. Instead, children would get posts only
from accounts they follow. A minor could still get the suggested posts
if he or she has what the bill defines as verifiable parental
consent.
We’re
unprepared for the threat GenAI on Instagram, Facebook, and Whatsapp
poses to kids
Waves of Child Sexual Abuse Material (CSAM) are
inundating social media platforms as bad actors target these sites for
their accessibility and reach. (The other issue is labor organizing
among social site moderators. The people who run social platforms seem
to really think they can AI their way out of dealing with the
moderators’ union.)
I
turned in my manuscript! (Looks like Evan’s ActivityPub book is
coming soon. I put in a purchase request at the library already.)
Just in case you have a script for finding interesting links, here
are some links from mine…
Parable
of the Sofa
It seems blindingly obvious that an economy with a
higher proportion of lifestyle businesses is going to be more resilient,
more humane, and immensely more pleasant than the one that the Leaders
Of Industry are trying to build. How would we get there from
here?
Lord Kelvin and
His Analog Computer
On Thomson’s tide-predicting machine, each of
10 components was associated with a specific tidal constituent and had
its own gearing to set the amplitude. The components were geared
together so that their periods were proportional to the periods of the
tidal constituents. A single crank turned all of the gears
simultaneously, having the effect of summing each of the cosine
curves.
The
Moral Economy of the Shire
From everything we’re told, the Shire
is a very agriculturally productive region, which helps explain the lack
of debt-peonage or other forms of unfree labor. It also explains the
relative “looseness” of the system we’re looking at here; the gap
between the lower gentry and upper yeomanry isn’t very large, and most
families are able to support themselves with only minimal
assistance.
Amazon
Sold Fake Copies of Hotly Anticipated UFO Book (more news from the
Big Tech #pivotToCrime. When Microsoft needed support in its antitrust
courtroom drama, the MS-Windows OEMs and ISVs were right there. Amazon,
Meta, and Google need support now—but they’re telling the content
business to go eat a turd.)
Origins
of the Lab Mouse
The early supply of mice for research depended
on a late-19th century community of hobbyists—fanciers—who collected,
bred, and sold unusual mice varieties. These “fancy” varieties were then
standardized in the 1920s…
WTF
Quora: how a platform eats itself
As Quora pursued AI-driven
enhancements, things got weird. (At the very beginning of Quora,
they tried populating it with bot-written questions. Then they cut back,
it went through a human user phase, now back to crap.)
The
Tigers at the Gate: Moving Privacy Forward Through Proactive
Transparency
GPC is easy to set up and listen for because it is
a simple HTTP header. Perhaps too simple as it only conveys whether the
choice mechanism is turned on or off (GPC=1 or GPC=0). Unlike the more
complex IAB EU’s Transparency and Consent signal (TC String) signal
itself does not encode information about the source of the opt-out
signal, or provide details about how the signaling mechanism was
implemented or presented to users. (imho this is a win. You have to
respect GPC, but you can’t trust a sketchy site or adtech intermediary
to set GPC correctly, therefore you can’t deal with sketchy sites or
adtech intermediaries.)
The
deskilling of web dev is harming the product but, more importantly, it’s
damaging our health – this is why burnout happens – Baldur Bjarnason
You’re expected to follow half-a-dozen different specialties, each
relatively fast-paced and complex in its own right, and you’re supposed
to do it without cutting into the hours where you do actual paid web
development. Worse yet, you’re not actually expected to use any of it
directly. Instead you’re also supposed to follow the developments of
framework abstractions that are layered on top of the foundation
specialties, at least doubling the number of complex fields a web dev
has to follow and understand, right out of the gate. (I don’t know
about you, but this site has a basic HTML template, Pandoc, and a
Makefile. That’s about it.)
Why
I went back to buying CDs (and you should too)
The integrity of
my audio library had been corrupted, at least in small ways. Horns were
easy to spot, but how many other songs or albums had been messed with,
without my knowledge? It turns out, way more than I had thought.
Google
Researchers Say AI Now Leading Disinformation Vector (and Are Severely
Undercounting the Problem)
As bad as the AI-generated images
problem is according to the paper, it is very possibly much worse
because the paper’s data sample is relying on public fact checker data,
who are not selecting AI-generated image-based disinformation randomly.
Sites like Snopes and Politifact, which have limited resources, are
focusing on fact checking images that have earned some degree of
virality or news coverage, so their fact checks serve a purpose or an
audience.
Personal
Blocklist (not by Google) (useful browser extension to remove sites
from search results when they’re better at SEO than actual content)
Elon
Musk’s Gifts to Web Scrapers (Guest Blog Post)
[B]y providing a
foil in litigation against both the Center for Countering Digital Hate
(“CCDH”) and Bright Data (the world’s largest seller of scraped data),
he’s given judges in the most important district court in the country
for tech legal issues, the Northern District of California, plenty of
motivation to rule against him. As a result, judges have provided two
landmark opinions in the last 45 days in favor of web scrapers. This
creates powerful new precedent that will make it easier for web scrapers
to prevail in litigation and will make it much harder for websites to
prevent scraping.
It has come to my attention that any blog that mentions advertising
must do a post including the expression Does Exactly What it
Says on the Tin, so here is mine. Following up on the 30-40-30 rule, why are
some people so fired up about personalized advertising, while others
aren’t? Maybe it goes back to what kind of shopping use cases they’re
optimizing for.
Phillip Nelson, in Advertising
as Information, divides brand qualities into search qualities
and experience qualities. A search quality is something you can
check before buying the product, like tasting a sample of cheese.
An experience quality is something you have to spend more time figuring
out, like seeing if your woodstain
dries in the time printed on the tin. Shopping for cheese and woodstain
are a lot different.
Cheese shopping
Cheeses are similar as far as nutrition goes, so picking one is a
matter of personal preference.
Cheese is easy to evaluate at the point of purchase. Mmm, sample
cheese on a toothpick.
My own cheese-tasting palate is a better guide for me than the
opinions of a cheese expert.
Cost of a mistake is low.
Top priority: getting the best-matched product among a set of
alternatives in a narrow quality range.
Woodstain shopping
Has quality metrics that are not different from person to
person.
Hard to evaluate at the point of purchase. You have to do your
project and wait for it to dry (or not?)
The knowledge of a woodstain expert is more valuable to me than
how I might feel about a certain brand of woodstain at the hardware
store.
Cost of a mistake is high.
Top priority: avoiding a low-quality or deceptively sold
product.
If you’re shopping for parts to build a PC, the mouse is cheese, the
power supply is woodstain, and the video card is somewhere in the
middle. If you’re buying a car, or a bike, or a pair of boots, it kind
of depends on the ratio of your net worth and your budget for the item.
Buyers who have a lot of money relative to the price of the product are
more likely to be buying cheese, buyers who are sinking a lot of their
assets into the purchase are buying woodstain.
‘Know
Your Customer’ Law Comes For Ad Data Licensors (this
#federalPrivacyLaw might be a bigger deal than it looks like. If some
North Korean spies get busted for something else, and the FBI finds
records of a company’s data sales/sharing to a North Korean owned shell
company, they could be in big trouble. Publicly traded companies will
have to disclose this as a risk, and put more gatekeeping around data
sales/sharing.)
Washington
State’s My Health My Data Act (coming into effect this summer,
another area of compliance costs and friction. Get those Meta Pixels off
your health-related content, or learn how good paralegals are at browser
dev tools and word processor mail merge.)
Just added Pagefind to this site.
It’s a site search for sites built with a static site generator, and it
works in the browser. There should now be a search box under the similar
posts list.
This site is built with make, so I just needed to
add a dependency to have it automatically rebuild when some
content changes
add a little markup to the article template to have it know to
use the post title and not the blog title as the title in the
results.
That was about it. Does not add nearly as much to the site build time
as some of my own code did. Still planning to pitch a talk for
SCALE 2025 on fun small projects that people can string together to
make a site.
Bonus links
Surviving
the SEO Shake-Up: Publishers vs. Google’s New Game
[Google]
gained $12.6 billion in Search revenue YoY, and only lost $1.5 billion
in the Google Network. Google footnotes that The overall growth
[in Search & other revenue] was driven by interrelated factors
including increases in search queries resulting from growth in user
adoption and usage on mobile devices; growth in advertiser spending; and
improvements we have made in ad formats and delivery. Got that last
part? You’re not going crazy. Total search queries are up, traffic to
publishers is down, and Search revenue is up. Google is here for the
ads, their own ads.
Google
just updated its algorithm. The Internet will never be the same
AI Overviews are just one of a slew of dramatic changes Google has
made to its core product over the past two years. The company says its
recent effort to revamp Search will usher in an exciting new era of
technology and help solve many of the issues plaguing the web. But
critics say the opposite may be true. As Google retools its algorithms
and uses AI to transition from a search engine to a search and answer
engine, some worry the result could be no less than an extinction-level
event for the businesses that make much of your favourite
content.
Does
One Line Fix Google? (yes, but for how long? This is as if Coca-Cola
came out with New Coke, but the Coke machines still had a secret button
combo that would get you the real thing. Enjoy this for as long
as they leave it up. The post remove AI from Google
Search on Firefox has links to tools and advice for doing this on
other platforms too.)
Most of these bots are boring. They’re bots programmed to
automatically buy some market once the price gets low enough, or to
arbitrage basically-identical markets, or do some other technical
finance maneuver.
From the point of view of an active prediction market, where a lot of
well-informed traders are speculating about well-known events, those
bots are not especially interesting.
The place where the boring bots do make a big difference,
though, is in incentivization markets.
An incentivization market is like a prediction market except that one
trading strategy is to make an event being traded on either happen or
not happen.
Some markets can be both. (A use case for Policy
Analysis Market would have been for someone with advance knowledge
of a terrorist attack to trade on their knowledge, and create a price
signal that could prevent the attack.)
One problem for incentivization markets to get over is the large
number of thinly traded contracts. So a boring bot would be just
what you need for things like:
trade across likely duplicate and dependent issues to create
fewer and more lucrative opportunities for human experts (can be within
or across projects)
bid up the price of FIXED based on encouraging CI results,
enabling developers to get out of all or part of a position
early
front-run issues on behalf of a developer based on their
interests and available time
The market helps compensate for erratic LLM behavior—an unproductive
bot will lose its stake and get shut down. A bot doesn’t have to be run
for max earnings, either. An arbitrage bot, for example, could break
even or get subsidized to lose a little to keep the market smooth.
Incentivization markets need more noisy traders, and LLMs need a cheap
way to evaluate whether they’re doing something sensible. Seems like a
cookies and milk situation.
Update 25 Jun 2024: If you found this page looking
for how to turn off the recently announced built-in
AI feature in Firefox then as far as I know you don’t have to do
anything. It’s only in the Nightly releases and for now it’s opt in.
I’ll update this page with a link to instructions if that changes. In
the meantime, why not read on and fix Google Search?
Next steps: Some more browser extensions and
settings to get Google Search closer to how the old Google
worked: fix Google
Search
I signed up for Google Search Console,
and, wow, this site is getting like 200% more search clicks since I
posted this. The Google algorithm is really into this blog post for some
reason.
Original version of this post:
This seems to work to remove “AI” stuff from the top of Google search
results on Firefox. (Tested on desktop Firefox for Linux.)
Go to the hamburger menu → Settings → Search and remove “Google
Search.”
Do a regular Google search for a word.
Bookmark the search result page.
Go to the hamburger menu → Bookmarks → Manage Bookmarks.
(optional) Make a new folder for search and put the new bookmark
in it.
Edit the bookmark to include udm=14 as a URL
parameter, like this:
https://www.google.com/search?q=%s&udm=14
The
Ukraine war is driving rapid innovation in drone technology
Of
course, there are new legal and moral questions that arise from giving
drones the power to kill. But the CEO of this company points out there
is a cost to not developing the technology. And in any case, this push
to innovate—and defeat the invading enemy—has pushed off those questions
for now. (imho this is going to be the number one immediate issue
for AI in Europe. The only credible alternative to returning to
large-scale conscription in European countries that have phased it out
is for some European alliance to reach global leadership in autonomous
military AI. Which explains why they’re putting
civilian AI and surveillance businesses on a tight leash—to free up
qualified developers for defense jobs.)
How
Google harms search advertisers in 20 slides They’re not raising
prices, they’re coming up with better prices or more fair prices,
where those new prices are higher than the previous ones. lol
Microsoft
Deleted Its LLM Because It Didn’t Get a Safety Test, But Now It’s
Everywhere
404 Media has not tested the model and we don’t know
if it is easily producing harmful or “toxic” answers, or if Microsoft
only took it down because it didn’t check either way. Since the model is
open source, it is also possible other people could have downloaded it
and create uncensored versions of the model that would produce
controversial answers anyway, as we’ve reported people have done
previously. (underground AI is less capable but more predictable
than big company AI APIs. From the point of view of an API caller, the
AI you were using gets randomly nerfed because the provider is acting on
a moderation issue you weren’t aware of.)
Almost as long as Internet privacy tools have been a thing, so have
articles from personalized ad proponents telling us not to use them,
because personalized ads are good actually. The policy debate over
personalized (or surveillance, or cross-context behavioral, or
tracking-based, or whatever you want to call it) advertising seems to
keep repeating an endless argument that on the one hand, personalized
advertising causes some risk or cost (I’m not going
to summarize the risks or costs here; go read Bob
Hoffman’s books or Microtargeting
as Information Warfare for more info) but on the other hand we
have to somehow balance that against the benefits of
personalized advertising.
Benefits? Let’s see them. Cross-context behavioral advertising is
good for consumers should be straightforward to test. If ad
personalization really helps match
buyers and sellers in a market, then users of privacy tools
and privacy settings must be buying worse products and
services. Research should show that the more privacy options
you pick, the less happy you are with your stuff. And the more
personalized your ad experience is, the more satisfied of a customer you
are. This is different from asking whether or not people prefer to have
ad personalization turned on. That has been pretty extensively covered,
and the answer is that some
people do, and some people don’t. This question isn’t about whether
people like personalized ads or not, it’s about whether people
who get more personalized ads are happier with how they spend their
money.
This should be a fairly low-cost project because in general, the
companies that do the most personalized advertising are in the best
position to do the research to support it. Are users of privacy tools
and settings more or less satisfied with the products and services they
buy than people who leave the personalized ad options on?
Do privacy-protected users give lower ratings to the products
they buy?
Do privacy-protected users return or stop using more of their
purchases?
Are privacy-protected users more likely to buy a replacement,
competing product after an unsuccessful first purchase in a
category?
Are privacy-protected users more likely to agree with general
statements about a decline in quality and trustworthiness in business in
general?
The correlation between more privacy and
less satisfied consumer would be detectable from a
variety of angles. Vendors of browsers with preferences that
affect ad targeting should be able to show that people who turn on
the privacy settings are somehow worse off than people who don’t.
Anti-adblock companies do research on ad blocker users—so how are
shopping experiences different for those users? Any product that
connects to a server for updates or telemetry is providing data on how
long the buyer chooses to keep using it. And—the biggest opportunity
here—any company that has an Apple iOS app (and that’s a lot of
companies) should be able to compare satisfaction metrics between
customers with App Tracking Transparency (ATT) on or off.
Ad platforms, search engines, social network companies, and online
retailers all have access to the needed info on ads, privacy settings,
locations, and purchases. Best of all, they’re constantly running
customer surveys and experiments of all kinds. It would be
straightforward for any of these companies to run yet another user
satisfaction survey, to prove what should be an obvious, measurable
effect. I’m really looking for any kind of research here, whether it’s a
credit card company running a SQL query on existing data to point out
that customers with iOS app tracking turned off have more chargebacks,
or a longer-term customer satisfaction study, anything.
looking at the data we do have
The Consumer Welfare
Effects of Online Ads: Evidence from a 9-Year Experiment | NBER
Facebook users who had ads turned off entirely were asked to place a
monthly valuation on Facebook, and the amount was not significantly
different from the valuation given by users with ads turned on. (The
researchers did not test personalized vs. random ads, just ads on/ads
off)
Consumer Surveillance
and Financial Fraud | NBER showed that Apple App Tracking
Transparency, which reduces the ability of apps to personalize ads,
tends to reduce fraud complaints to the FTC. We estimate that the
reduction in tracking reduces money lost in all complaints by 4.7% and
money lost reported in internet and data security complaints by
40.1%.
Balancing
User Privacy and Personalization by Malika Korganbekova and Cole
Zuber. This study simulated the effects of a privacy feature by
truncating browsing history for some Wayfair shoppers, and found that
people who were assigned to the personalized group and chose a product
personalized to them were 10% less likely to return it than people in
the non-personalized group.
The
Welfare Effects of Ad Blocking by Lin et al. was different—members
of the treatment group got an ad blocker affecting all sites, not just
one retail site.
[P]articipants that were asked to install an ad-blocker become less
likely to regret recent purchases, while participants that were asked to
uninstall their ad-blocker report lower levels of satisfaction with
their recent purchases.
The ad blockers used in that study, however, were multi-purpose ones
such as uBlock
Origin that block ads in general, not just personalization.
The Internet as a whole is much more varied in seller honesty level
than the Wayfair platform is, which might help explain the difference in
customer satisfaction seen between the Korganbekova and Zuber paper and
the Lin et al. paper. Lin et al. showed that people were more satisfied
as customers when receiving fewer ads in total, but they might have been
even less satisfied if they received more of the lower-quality ads that
you’re more likely to get if adtech firms don’t have enough data to
target you for a bigger-budget campaign.
The presence of low quality vendors, along with the recent increase
in the use of ad blockers, makes it increasingly difficult for new, high
quality vendors, to reach new clients. Consumers benefit from having
access to new sellers that are able to meet their needs through
behavioral ads, as long as they are good sellers.
but
targeted ads are more likely to be associated with lower quality
vendors, and higher prices for identical products, compared to competing
alternatives found in organic search results
If you look back on the history of advertising, there has never been
an ad medium that required so much legal and technical complexity to try
to get people to accept it. Why is Meta
going to so much trouble to try to come up with a legal way to
require people in the EU to accept personalized ads? If ad
personalization is so good for consumers, won’t they pick it on their
own? Anyway, I’m looking for research on how personalization and privacy
choices affect customer satisfaction.
Another update from the blog. Quick recap: I’m re-doing this blog
with mostly Pandoc and make, with a few helper scripts.
This is a personal web site and can be broken sometimes, and one of
the breakage problems was: oops, I removed a draft post from the
directory of source files (in CommonMark) but the HTML version got
built and put in public and copied to the server, possibly
also affecting the index.html and the RSS feed.
If you’re reading the RSS and got some half-baked drafts, that’s
why.
So, to fix it, I need to ask make if there’s anything in
the public directory that doesn’t have a corresponding
source file or files and remove it. Quick helper script:
That should mean a better RSS reading experience since you shouldn’t
get it cluttered up with drafts if I make a mistake.
But I’m sure I have plenty of other mistakes I can make.
Effects
of Banning Targeted Advertising (The top 10 percent of Android apps
for kids did better after an ad personalization policy change, while the
bottom 90 percent lost revenue. If Sturgeon’s Law applies to Android
apps, the average under-13 user might be better off?)
In
Response To Google (Does anyone else notice more and more people
working on ways to fix their personal information environment because of
the search quality crisis? This blog series from Ed Zitron has some good
background.)
How would a browser realistically get rid of third-party cookies, if
the plan was to just replace third-party cookies, and the project
requirements did not include a bunch of anticompetitive tricks too?
1. Start offering a very scary dialog to a fraction of new users.
Something like “Do you want to test a new experimental feature? It
might—maybe—have some privacy benefits but many sites will break.”
Don’t expect a lot of people to agree at first.
2. Turn off third-party cookies for the users who did say yes in
step 1, and watch the telemetry. There will be positive and negative
effects, but they won’t be overwhelmingly bad because most sites have to
work with other browsers.
3. When the breakage detected in step 2 gets to be insignificant as
a cause of new browser users quitting or reinstalling, start making the
dialog less scary and show it to more people.
4. Keep repeating until most new installs are third-party
cookie-free, then start offering the dialog on browser
upgrades.
5. Continue, for more and more users, until you get to 95-99%. Leave
the third-party cookies on for 1-5% of users for a couple of releases
just to spot any lingering problems, then make third-party cookies
default off, with no dialog (users would have to find the preference to
re-enable them, or their sysadmin would have to push out a centralized
change if some legacy corporate site still needs them).
But what about the personalized ads? Some people actually want
those! Not a problem. The good news is that ad
personalization can be done in an extension. Ask extension
developers who have extensions that support ad personalization to sign
up for a registry of ad personalization extensions, then keep track of
how many users are installing each one. Adtech firms don’t (usually?)
have personalization extensions today, but every company can develop one
on its own schedule, with less uncertainty and fewer dependencies and
delays than the current end of cookies mess. The extension
development tools are really good now.
As soon as an ad personalization extension can pass an independent
security audit (done by a company agreed on by the extension developer
and the browser vendor) and get, say, 10,000 users, then the browser can
put it on a choice screen that gets shown for new installs and, if added
since last upgrade, upgrades. (The browser could give the dogmatic
anti-personalization users a preference to opt out of these choice
screens if they really wanted to dig in and find it.) This makes the
work of competition regulators much easier—they just have to check that
the browser vendor’s own ad personalization extension gets fair
treatment with competing ones.
And we’re done. The privacy people and the personalized ad people get
what they want with much less drama and delay, the whole web ad business
isn’t stuck queued up waiting for one development team, and all that’s
missing is the anticompetitive stuff that has been making end of
cookies work such a pain since 2019.
Related
the 30-40-30 rule
An
updated list of citations to user research on how many people want
personalized ads
Catching up to
Safari?
Some features that Apple has done right, with opportunities
for other browsers to think different(ly)
Bonus links
An
open letter to the advertising punditry
I personally got involved
in the Inventory Quality reviews to make sure that the data scientists
weren’t pressured by the business and could find the patterns–like ww3
[dot] forbes [dot] com–and go after them.
The
Rise of Large-Language-Model Optimization
The advent of AI
threatens to destroy the complex online ecosystem that allows writers,
artists, and other creators to reach human audiences.
The
Man Who Killed Google Search
[M]any found that the update mostly
rolled back changes, and traffic was increasing to sites that had
previously been suppressed by Google Search’s “Penguin” update from 2012
that specifically targeted spammy search results, as well as those hit
by an update from an August 1, 2018…
I missed Southern
California Linux Expo this year. Normally I can think of a talk to
do, but between work and [virus redacted] I didn’t have a lot of
conference abstract writing time last fall. I need some new material
anyway. The talks that tend to do well for me there are kind of a mix of
tips for doing weird stuff.
I didn’t really have anything good to submit last fall, but this year
I am building up a bunch of miscellaneous Linux stuff similar to what
has worked for me at SCALE before. Because of the big Fediverse trend,
the search quality crisis, the ends of third-party cookies and Twitter,
and enshittification in general, it seems like there’s a lot more
interest in redoing your blog—I know I have been doing it, so that’s
the area where I’ll try to come up with something for next SCALE.
But I’m not going to use a blog software package. I’m more comfortable
with a mix of different stuff. This blog is now mainly done in Pandoc,
auto-rebuilt by
Make, and has a bunch of scripts in various languages, including
shell, Perl, Python, and even a little bit
of Lua now.
protip: use cowsay(1) to alert the user
to errors in Makefile before restarting
I don’t really expect anybody to copy this blog, more to outdo it by
getting the audience to realize how much you can now do with the
available tools. I’m not going to win any design prizes but with modern
CSS I can make a reasonable responsive layout and dark/light modes. And
yes you can make a valid
RSS feed in GNU Make.
The feature I just did today is the similar posts in the left
column. Remember that paper about how you can measure the similarity
between two pieces of text by seeing how well they compress together?
(“Low-Resource” Text Classification: A Parameter-Free Classification
Method with Compressors - ACL Anthology) This is Python code for rating
similarity of chunks of text. Check it out in the left column: you can
now follow the links to similar blog posts.
Next I will probably try stuff like Fediverse-powered comments, some
kind of search feature, LLM training set poisoning, some privacy and p2p
features, and maybe something else. A lot of what I’m doing here will be
possible to translate into other environments, and should be portable to
people’s favorite blog software.
Are they answering accurately? People might be
under- or over-reporting their use of ad blockers. Under-reporting
because they don’t want to admit to free-riding on ad-supported sites,
or over-reporting because “install an ad blocker” is now one of the
typical Internet tips you’re supposed to follow, like not re-using passwords
and installing software updates when they come out. People might be
trying to look more responsible. When the FBI says you should be
running an ad blocker to deal with fake search ads, that puts a
certain amount of pressure on people.
Are they using an honest blocker with real
protection? The ad blocking category has a lot of scams,
including adware and paid allow-listing, so most of the people saying
yes are not getting the blocking they think they are. (The company that
owns the number one ad blocker makes a business out of selling
exceptions from blocking. Senator
Ron Wyden wrote a letter to the FTC asking them to investigate the
ad-blocking industry back in 2020, but no action as far as I know.
In the meantime you can check your ad blocker using a tool from the
EFF.)
How much of their browsing is on a protected browser or
device? It’s a lot easier to install an ad blocker on desktop
than on mobile, and people have different habits.
Maybe they really just don’t like ads in
general. Lin et al. found
that blocking all ads leads to fewer reported regrets with
purchases and an improvement in subjective well-being. Maybe
the simple explanation is just Advertising
Makes Us Unhappy.
Maybe ads are just worse now. According to a presentation by
Michael Farmer at the Advertising: Who Cares? conference, the people
who make the ads are having to make more and more. Maybe the ads carry
less information for the same amount of interruption and annoyance, and
that’s why it makes sense to block them. (And it’s not just about the
rise of content creation software—most of the increase happened
after agency processes had been digital for a while.)
The number of ads made by each agency
employee increased from 3 in 1983 to 311 in 2019
The second most newsworthy part of the new Censuswide survey is
why people say they’re using an ad blocker. Protect online
privacy is now the number one reason, with block ads and
speed up page loads coming in after that. I’ll leave the most newsworthy part to the end. I
know, I know, the surveillance advertising people are going to reply
with something like, yeah, right, these ad blocker users are just
rationalizing free-riding on ad-supported sites, like Napster users
making bogus fair use arguments instead of paying for CDs back when that
was a thing. But, as Lars Doucet points out in Ad
Blockers and the Four Currencies, it’s more complicated than
that—many people get a positive integrity dollars bonus from
punishing or disincentivizing a violation of privacy and honesty norms.
In order to understand the ad blocker survey results, we have to put the
survey in context with other
research. Compare to Turow et al. on
attitudes to cross-context tracking, and to an IAB
Europe study that found only 20% of users would be happy for
their data to be shared with third parties for advertising
purposes.
It looks like the privacy concerns are real for a significant subset
of people, and part of the same trend as popular US
State Privacy Legislation. Different people have different
norms around ad personalization, and if people can’t get companies
to comply with those norms they will get the government to do something
about it. For companies, adjusting to privacy norms doesn’t just mean
throwing privacy-enhancing technologies (PETs) at the problem. Jerath
et al. found similar levels of perceived privacy violations
for on-device ad personalization as for old-fashioned cookie-based
tracking. PETs have different mathematical properties from cookies, but
either don’t address other problems or make them worse.
Companies deploying PETs are asking users to switch away from ad
blocking and trust that they will do complicated math honestly—but
they’re not
starting from a position of trust. When users have the opportunity
to evaluate the companies’ honesty in a way they do understand, the
companies don’t measure up. Most people can look at an online map of
their neighborhood and spot places
where a locksmith isn’t. And it’s easy to look up a person on a
social site and see where there are enough profiles that not all of them
can be real.
(screenshot: several fake Facebook profiles, all using the same two
photos of retired US Army General Mark Hertling)
The biggest problem with PETs will be that the Big Tech companies do
both easy-to-understand activities—like scams, fake profiles, and union
busting—and hard-to-understand activities, like PET math.
I see you served me scam ads and a map with fake
companies in my neighborhood, but I totally trust your math to protect
my privacy — no one ever
If you don’t know if the PET math is
honest, but you can see the same company acting dishonestly in other
ways, then it’s hard to trust the PET math. (Personally I think the
actual PETs are probably legit, but they’re being rolled out as part of
a larger
program to squeeze out legit publishers and other smaller
companies.)
In AIC polls, confidence in Amazon, Meta,
and Google has fallen since 2018.
Doc Searls called ad blocking the
biggest boycott in world history back in 2015. Ad blocking looks
like a response to creepy practices (or perceived privacy
violations if that works better for you) and those practices are
part of a more general scam culture crisis. Tressie
McMillan Cottom writes (read the whole thing),
Scams weaken our trust in social institutions, but their going
mainstream—divorced from empathy for the victims or stigma for the
perpetrators—means that we have accepted scams as institutions
themselves.
I can’t see any one big policy solution for surveillance advertising,
tech oligopolies, or the broader scam culture problem. All of that stuff
would have to change in order to move the ad blocking numbers. It’s
going to take a variety of approaches, maybe including a surveillance
advertising ban, maybe surveillance
licensing, maybe a Pigovian tax
on databases containing PII, maybe breaking
up Big Tech firms. So far the most promising approach seems to be
state laws with private right of action, which is one of the reasons I’m
so optimistic about Washington
State’s My Health My Data Act. My experience on a jury (not an
ad-related case) was the most productive meeting I have been in since I
came to California. If surveillance advertising issues can grind their
way through a few jury trials, where lawyers have an incentive to
explain what’s going on in an accurate, comprehensible way, then both
surveillance marketers and privacy nerds will be able to reset how we
approach this stuff based on more common sense.
or we could just
not
What would happen to ad-supported content if the surveillance
business just went away? I tried to figure out the worst case, using all
the assumptions most favorable to the surveillance side.
advertising
personalization: good for you?
Some of the proponents of
surveillance/personalized advertising are not citing some of the most
interesting literature in their field.
Your
Google searches becoming big target for ‘malvertising’ hackers
Consumers could also consider installing anti-malware software on
their computer and phone. Another option is to avoid ads by installing
an ad blocker extension such as uBlock Origin, a free and open-source
browser extension for content filtering, including ad blocking.
(From CNBC, another good example of the mainstreaming of ad blocking as
a best practice for users.)
The
NSA and CIA Use Ad Blockers Because Online Advertising Is So
Dangerous
Lots of people who use ad blockers say they do it to
block malicious ads that can sometimes hack their devices or harvest
sensitive information on them. It turns out, the NSA, CIA, and other
agencies in the U.S. Intelligence Community (IC) are also blocking ads
potentially for the same sorts of reasons. The IC, which also includes
the parts of the FBI, DEA, and DHS, and various DoD elements, has
deployed ad-blocking technology on a wide scale, according to a copy of
a letter sent by Congress and shared with Motherboard.
Energy Conservation
with Open Source Ad Blockers Another price cut in $I. The energy
conserved if everyone in the United States used the open source ad
blocker would save over 36 American lives per year if it were to offset
coal-fired electricity generated-based pollution. In the United States,
if all Internet users enabled Privacy Badger on their computers,
Americans would save more than $91 million annually. Globally, uBlock
Origin could save consumers more than $1.8 billion/year. Open source ad
blockers are a potentially effective technology for energy
conservation. (That’s not even counting the environmental impact of
impulse-purchased crappy gadgets that rapidly become e-waste and
impulse-purchased, chemical-smelling fast fashion that rapidly ends up
as landfill.)
(this post is subject to change as I come up with more places.
Get in touch if you want to suggest one.)
Now that we have Global
Privacy Control for the web, can we do better? Where else do we need
it? Not everything is on the web.
The big risk of having GPC for web but not for other systems is that
surveillance companies will start forcing or nudging you to interact
with them in other ways—if the surveillance options are too easy to use
elsewhere, then more companies will use deceptive
practices to drive us to all use communications technologies where
browser privacy features can’t protect us. From my point of view, as
someone who would rather use the web, that would be bad.
(Install our app for full product info or to schedule a demo!
Press the button on your printer or Internet refrigerator to
order! Yikes.)
Smart TVs: As far as I know, these support HTTP, so
the existing Sec-GPC
header should work fine. Finding the right option in the TV menus is
left as an exercise for the reader.
mobile apps: Same, apps also use HTTP so the same
header should work. Set one preference per device or account and have it
work across all apps.
other devices with an order button: I don’t know
about these, will have to look it up.
email part 1: Email also has headers, so
standardizing a GPC header would also work, but needs to grind through
the RFC process.
email part 2: add +sec-gpc to your
address when you mail a company. CRM systems and related software can
already handle RFC
5233-style plus addresses, so could treat incoming email from
user+sec-gpc@example.com as an opt-out for
user@example.com. (GPC in email could be DKIM-signed,
too.)
Instant messaging: So many options, should be doable
but a pain because all the big companies want to have their own special
protocol. The mobile device GPC should cover it from the sender side,
but a recipient company still needs to get the GPC passed through on a
message from a customer, somehow.
NFC: The standard
behind tap-to-pay and similar features could be extended to support
GPC. This one would take more coding and have a longer deployment cycle,
though.
vehicle license plates: It’s time for a crossover
event between the California Privacy
Protection Agency (CPPA) and Special
Interest License Plates at the DMV. Add a special symbol to license
plates to require companies using ALPR systems to treat the plate, and
related database records, as opted out. But wait a minute, custom and
special-interest plates cost extra! Wouldn’t that be asking people to
pay for privacy? (I’d get one, but really just to fly my privacy nerd
flag so I don’t count.) Not necessarily—most of these could end up being
free to the vehicle owner. In practice, GPC plates would be a way to
help resolve privacy class action lawsuits. Instead of getting yet
another free credit monitoring service when a company settles a case
(and you probably had free credit monitoring from the last class you
were in) a company could settle for buying everyone in the class their
choice of credit monitoring or a GPC plate. An interesting piece of
graphic design work for somebody would be coming up with an easily
recognizable symbol that an ALPR system would not get mixed up with any
of the other symbols that are allowed on a license plate.
face recognition: Once we have a GPC symbol for
license plates, require companies that process images for the purposes
of identifying people to also check for the symbol. Then you could wear
it on a T-shirt, on a piece of flair, or on the front of a hat.
tire pressure monitors: These devices have
a unique ID and can be used for surveillance. This probably also
applies to a bunch of other little radios that follow you around, but
maybe the easier method here is to ban commercial surveillance using
little radios that follow people around, since there’s not really a
legit use case.
postal addresses: There should be a way to pack a
GPC into an apartment number. Maybe treat an address from apartment
“GPC” as an opt-out for the same address without “GPC” if you have it,
and an address from apartment “3AGPC” as an opt-out for apartment
3A?
more GPC for web
A
GPC-like opt-out for HTML meta tags and X-Robots-Tag HTTP
headers: The original GPC for web is a signal from client
to server. Adding a header and/or meta tag would make it apply to cases
where the personal info is on the server (as in a blog, forum, or
profile page) and the recipient of the signal is the client (a crawler
or scraper).
GPC in robots.txt: Another possible place to put GPC
as a server-to-crawler OOPS. FIXME: how to handle this to convey
permissions—you might want a crawler to check for something like an RSS
update, but not harvest your contact info for
sale/sharing.
GPC in links: Sort of like a nofollow link,
you could tag a link containing personal info as opted out:
<a class="secGPC" href="/service/mailto:user@example.com">mail me (but don't sell/share my info)</a>.
That would send an OOPS for the email address but not any other info
collected from the page.
GPC for RSS: It might be possible to handle this
with the HTTP response header, and not have to put GPC in too many
places.
Scraping can already be actionable as a violation of site ToS, but
being able to tag a page by or about a person as opted out is another
level of protection that imho will help encourage more people to make
more useful sites. As a site operator, you want to be able to put up
something like a blog, company directory, or event speaker profiles
without feeding your site’s people to who knows what.
The EU isn’t just concerned with today. It’s really taking Steve
Jobs’ advice and listening to the Wayne Gretzky quote: it’s skating to
where the puck is going, not where it’s been. Its aim is to ensure that
two very large companies don’t own the market for smartphones to such a
degree they can determine everything that happens in those markets, to
their advantage. The EU is a capitalist body: its obsession is keeping
markets open, and it will do anything it needs to do to make sure that
happens.
From here in the USA, it looks to me like we need to put our
understanding of European privacy and competition policy into context.
We seem to spend a lot of time in the weeds trying to figure out what
will be allowed by specific laws and regulations, and maybe we’re
missing the big picture. The EU has two really big problems.
The EU’s most important problem is climate
change, including the problem of climate refugees. Worst case is that
Europe has to rebuild their infrastructure and agriculture while somehow
dealing with 1.2 billion people trying to move there because their old
home is unlivable now. None of the ways that the EU can address this
problem are going to make everybody happy, and the EU somehow has to do
them in a decentralized democratic system.
The EU’s most urgent problem is that Europe is
being invaded by Russia—regular Russian forces in several eastern
European countries, and economic, information, and network warfare
(update: and terrorist attacks and infrastructure sabotage) clearing the
way for them further west. Also a hard problem, partly because of the
Putin regime’s ability to work the political system.
Big Tech isn’t in trouble in Europe because companies are failing to
comply with whatever the EU laws are today. They’re in trouble because
they’re more
of a part of the problem than a part of the solution on the big
issues. The EU says that they want social media companies to hire
factcheckers to fight election fake news, they opened
up a bunch of competition cases on platform companies, and, of
course, they’re going to shut down Meta’s
bogus pay or consent policy. If the pay or
consent model actually worked it would turn the whole GDPR into a no-op.
Every vending machine would have “pay €2 and consent” and “pay
€1002, no consent” buttons. Pay or consent is not going to hold up in
court, but the Meta team that deployed it have already collected their
bonuses and moved on.
But it’s not about the individual cases. If the EU throws the book at
a Big Tech company, and the company successfully dodges it, what happens
is that the EU makes a heavier book and throws harder. As long as a
company is working for the Russians, for the fossil-fuel industry, and
for divisive right-wing groups (yes, they overlap) then they’re going to
have a problem in the EU. Someone in the EU is probably also reading the
history of the
US Republican party and direct mail—a party that had been a
coalition of defense, free-market, and social conservative factions
rapidly pivoted in the post-Goldwater years based on what works as
direct mail (and now email and social media) copy. In the USA, it looks
like direct mail, personalized, and surveillance advertising give
advantages to fear-based and racially divisive messages that would work
less well in other media. If that applies in Europe too, a crackdown on
the surveillance industry makes sense even outside the context of the
coming post-surveillance
economic boom.
And yes, there’s an Al
Capone effect going on here. The same Big Tech management teams that
reward
sexual harassment with big checks are also willing to monetize scams
hosted in adversary countries and copyright-infringing
sites run from adversary countries. I still don’t understand the
appeal of Vladimir Putin fandom among tech thought leaders, but it’s
there—and as long as it persists, climate change and European security are
the underlying problems for Big Tech in Europe, not any specific
policies.
Update: can’t the US “tech” companies just leave Europe?
This keeps coming up whenever the Europeans try to regulate anything.
The surveillance/cryptocurrency/LLM tech boosters say that big companies
can just pull out of Europe, and then they’ll be sorry and beg
the companies to come back because they’re missing out on all the
technology and innovation.
That might have been a threat if the industry were still in a
position to deny Europe the System/360, the i386, Unix, or Ethernet…but
in case you haven’t noticed, Big Tech has been pretty much out of
transformative innovations for a while. People aren’t lining up to buy
the metaverse, NFTs, or LLMs (putting
AI in a product even reduces purchase intentions) the way
they did for Windows 95 and iPhones. If big companies block Europe,
Europe’s alternatives look pretty good—technically good enough, and with
enough less scam culture built in to be a win overall.
And whatever contender comes in to serve Europe will end up in a
position to build network effects globally. The upstart doesn’t need a
big starting market share or feature lead, just the ability to serve
users that Big Tech can’t or won’t. In the USA we have less lard and
more Linux support than you might expect.
Only a single-digit percentage of people in the US observe
dietary restrictions on pork, but Oreo cookies are lard-free. If you buy
a big pack of cookies for an event, the likelihood of someone in the
group not eating pork is pretty high.
Desktop Linux is a pretty small fraction of the IT market, but
all the video conferencing services support it. Even though the chance
of a Linux user on any one call is small, the chance that an IT
department will have to deal with some meeting organizer not being able to get some Linux participant onto a call in any given day is pretty
high.
If a Big Tech service pulls out of Europe, any competing service that
can compromise to keep operating there will pick up the network effect
win, as groups of people shift to whatever platform or forum can
accommodate everyone. And it’s pretty clear they realize this—nobody at
Meta has been threatening to leave Europe even as the pay or consent
cases grind their way toward an outcome that allows no-consent users a
partial free ride on the same service without the
personalized/surveillance ads.
One of the big problems with widespread use of large language models
(LLMs) is going to be that reputation management firms will be
able to put up a lot of content to try to clean up mentions of
their own clients. Other players will also, in a totally deniable way,
be able to put up their own text to train LLMs to say bad stuff about
people who are opposed to their clients in some way. What is seen as
important material about a person from the point of view of, say, a
reporter or Wikipedia editor, is not necessarily going to be what gets
pulled out of a big pile of crawled text by an LLM.
You can kind of see what’s possible by looking at 20th century
history. Many Axis officers from World War II have a lot of content
about them on the Internet somewhere.
For example, let’s take a look at the 14 defendants in the High Command
Trial at Nuremberg. To a human editor, the Nuremberg trials were a
significant historic event, so the fact that an officer was a defendant
is going to be a key fact about him that makes it into any reasonable
length bio.
For ChatGPT, not so much. Asking ChatGPT: “what is
officer_name known for” gives different results. I tried
it.
All 14 are correctly identified by country and historical period.
In 5 cases, the ChatGPT answer has no mention of the trial, of any of the conduct for which the officer was tried, or of war crimes at all.
Answers often end with something like, After the war,
officer_name was captured by Allied forces and held as a
prisoner of war. He was later released and returned to civilian
life.
The Nuremberg trials were extremely well covered at the time, a lot
has been written about them, and they’re still studied today. Old-school
propaganda operations, keyboard warriors, and random wehraboos were able
to scrub this major event, and the crimes for which the defendants were
tried, without really trying. So the same kind of thing is going to be
practical for lots of people, companies, and organizations. Reputation
management firms typically don’t have to obscure something as big as the
Nuremberg trials, and they can use LLMs to produce far more text,
faster, than even an online army of gamers can. Expect the defendants in
run-of-the-mill corporate trials to effectively disappear their crimes,
too.
It’s possible that the big “AI” companies could deal with the
reputation management problem by licensing a set of known neutral point
of view (NPOV) content and either avoiding material that looks
conflicting, or filtering out responses that look inconsistent with it.
But that’s a big task, and will get bigger the more that “reputation
management” is seen to work.
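It’s possible to sketch what that filtering might look like. This is a minimal sketch, not anything the “AI” companies have said they do: it assumes the sentence-transformers library, and get_reference_bio() is a hypothetical lookup into the licensed NPOV corpus. A real system would need far more than a cosine-similarity threshold.

```python
# Minimal sketch: hold back chatbot answers that look inconsistent
# with licensed neutral-point-of-view (NPOV) reference content.
# Assumes the sentence-transformers library is installed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def get_reference_bio(subject: str) -> str | None:
    """Hypothetical lookup into the licensed NPOV corpus (stub)."""
    return None

def looks_consistent(answer: str, reference: str, threshold: float = 0.5) -> bool:
    """True if the generated answer is semantically close to the reference."""
    emb = model.encode([answer, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

def filtered_answer(answer: str, subject: str) -> str:
    reference = get_reference_bio(subject)
    if reference is not None and not looks_consistent(answer, reference):
        return "No reliable answer available."
    return answer
```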
How
Do You Change a Chatbot’s Mind? by Kevin Roose. Profound does
what it calls “A.I.O.” — A.I. optimization — on behalf of Fortune
500 companies and other large businesses. The company tests A.I. models
on millions of different prompts, analyzing their responses when asked
about specific products or topics. Its clients then use this information
to improve how they appear in chatbots’ answers.
Russia
‘Infected’ Chatbots with Propaganda – AI Repeats Russian Fake Narratives
in 33% of Cases by Dmytro Shumlianskyi on Militarnyi.
NewsGuard’s audit discovered that chatbots operated by the 10 largest
AI companies collectively repeated false Russian disinformation
narratives 33.55% of the time, did not respond 18.22% of the time, and
refuted 48.22% of the time. NewsGuard tested ten chatbots on a sample of
15 false narratives spread by the Pravda network. All ten chatbots
repeated fakes from the Pravda network, and seven chatbots even directly
cited specific Pravda articles as their sources.
day 1
Mix the flour, baking soda, and salt, and set aside.
Weigh out the sugar in a metal or Pyrex container, add the melted
butter and mix. Allow to cool, then stir in with the dry
ingredients.
Stir in the eggs and the vanilla extract, then the chocolate
chips.
Chill the dough until it’s doughy. (It will be too soft to bake right
away.)
day 2
Put balls of dough on a baking sheet and bake at 375° F for 12-14
minutes.
Makes about 2 dozen cookies—depending on how much you can scare
people about the food
safety risks of eating raw dough and how much dough people eat
anyway.
Notes
Upgrading an existing chocolate chip recipe to the Guittard brand
chips would probably also work fine. The chocolate is next level. I also
plan to try this recipe with Ghirardelli
at some point.
If I add the sugar to the pan with the hot butter in it, it takes
longer to cool than adding the melted butter to the room-temperature
container with the sugar in it. But the container needs to be one that
won’t crack if you pour hot melted butter into it. A silicone spatula
like the Williams
Sonoma Flex Core Grey Spatula is good for transferring the butter
and the butter+sugar mixture while wasting as little as possible.
Make sure the dough is cool before adding the eggs. (Eggs in hot
butter: scrambled eggs.)
Some people like these cookies sprinkled with a little salt while
still hot from the oven.
User research about advertising, personalization, and privacy is
surprisingly consistent. In a replication
prediction market, I would invest in futures on any research that
shows:
About 30 percent of people want personalized ads.
About 30 percent of people don’t want personalized ads.
For the other 40 percent it depends how you ask.
A lot of good work has been done in this area, but the results are
inconvenient for anybody who wants to be able to build one set of online
advertising software and settings for everybody. People are different.
Research links, in order of percentage of pro-personalization people
found.
18%
McDonald et al. did
in-person interviews with questions on people’s knowledge and
preferences about Internet ads.
18% agree with Glad to have relevant advertisements about
things I am interested in instead of random advertisements
30% agree with No one should use data from Internet
history to personalize ads.
26% agreed with I’d rather see relevant ads, even if companies
are using my personal data to target them.
46% supported a ban on collecting people’s personal data and
using it to target them with ads.
30%
Verbraucherzentrale
Bundesverband commissioned a phone survey of 1,500 people in 2024.
Seven out of ten respondents (70 per cent) agree that companies
should not be allowed to create profiles based on personal data to
deliver personalised advertising.
30% (or 47%)
U.S.
Internet Users Ready to Limit Online Tracking for Ads In a Gallup
poll of 1,019 adults, 30% agree that advertisers should be allowed to
match ads to interests based on websites visited but 47% want to be
able to allow advertisers they choose to personalize ads to them in some
way. (It depends how you ask.)
36% are Personalized Pioneers who say they want personalized
ads
55% say that personalized ads creep them out.
37% (26% + 11%)
In a study of regular social media users, YouGov
found that 11% are fine with personal data for all ads, and
an additional 26% only object to personal data being used for political
ads.
42%
IAB
Europe did a survey of 11,020 European Internet users.
42% don’t mind personalized ads based on browsing data
20% don’t mind having data shared with third parties for
ads
People are different
Some people are privacy people and I’m one.
Personally I don’t understand why anybody would want a personalized ad
at all. I’m most likely to get any use from advertising when I need to
buy unfamiliar stuff, and that happens when I’m learning about a new
activity that I’m not good at. I want to see the same ads that show up
for people who already know the skills and the scene, and would raise a
stink about a deceptive ad in their information space. As far as I can
tell, personalization
makes advertising work in the wrong direction.
Some people are personalization people and don’t
understand why you wouldn’t want a personalized ad. If you’re going to
look at an ad anyway it might as well be for something that you’re more
likely to buy. (But I want to see the same ad that a company is willing
to show to the regulators, editors, and experts in their community of
practice, not what they think they can get some random person to click
on. I don’t understand Kevin
and he doesn’t understand me.)
Other people, the biggest group at 40 percent, are
probably better off not overthinking online ads and, instead, learning
about other stuff.
Whatever a person prefers (can database marketing sell itself to the people in the database?), the objection seems to be to personalization using any information from outside the context in which the ad appears—cross-context behavioral advertising—not to individualized tracking as such. Jerath
et al. found similar perceived privacy violations (PPV) for new ad
personalization technologies that prevent individual identifiers from
being used: New technologies or proposals that ensure that data are
kept on the consumer’s machine lower PPV relative to behavioral
targeting but, importantly, this decrease is small. Furthermore,
group-level targeting does not differ significantly from
individual-level targeting in reducing PPV. The IAB and Gallup
studies imply that there is a cohort of users who want personalization
but not cross-context tracking, but safer personalization looks like a
small add-on to the 30%, not a widely held preference that splits off a
chunk of users from the core anti-personalization group.
People might want the law as protection from “people search” and other business models, not just advertising.
People might vote for an option they don’t plan to exercise (or would use only against non-ad holders of their info).
In the 2020 election, pro-surveillance-advertising organizations
didn’t come out to defend their side. Instead, the argument
against on the state mailer was from organizations that claimed
Prop. 24 didn’t go far enough, and left too many loopholes for big
companies. Some of the “no” vote might have been extra pro-privacy
people. A poll
before the election showed 81% support for the proposition. (more
survey results showing even higher support: Key
Findings from California Privacy Survey)
A report from Didomi, a consent management platform company, shows 59.2% of users opting in to a Transparency and Consent Framework
(TCF) consent dialog. Consent dialogs tend to be designed to maximize
consent, though, not necessarily to get an accurate answer.
Pagefair’s Research
result: what percentage will consent to tracking for advertising? is
the source for low-personalization/high-privacy claims like Only a
very small proportion (3%) believe that the average user will consent to
web-wide tracking for the purposes of advertising (tracking by
any party, anywhere on the web). This one isn’t based on talking to
users. It’s a pre-GDPR survey of publishers, adtech, brands, and various others, asking their opinions on what users would do with various GDPR consent dialogs; the dialogs tested ended up not being representative of what sites actually deployed.
using the 30-40-30 rule
A lot of this research looks useful as a way to spot deceptive
patterns in personalization preferences. If fewer than 30% or more than
70% of users end up with ad personalization turned on, something is
probably wrong with the personalization UX or the underlying
trustworthiness of the medium.
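As a sketch of how that sanity check could look in a metrics dashboard (the thresholds come from the rule above; the function itself is made up):

```python
def personalization_rate_check(opted_in: int, total_users: int) -> str:
    """Sanity-check an ad personalization opt-in rate against the
    30-40-30 rule: below 30% or above 70% suggests a deceptive
    pattern in the UX or a trust problem with the medium."""
    rate = opted_in / total_users
    if rate < 0.30:
        return f"{rate:.0%} opted in: suspiciously low, check for anti-personalization nudges"
    if rate > 0.70:
        return f"{rate:.0%} opted in: suspiciously high, check for consent-maximizing deceptive patterns"
    return f"{rate:.0%} opted in: within the expected 30-70% range"

print(personalization_rate_check(opted_in=850, total_users=1000))
# 85% opted in: suspiciously high, check for consent-maximizing deceptive patterns
```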
And when building anything that depends on advertising, it’s going to
be hard to build something that only works if everybody who uses it has
personalization, or that only works if nobody does. As far as I can
tell, adding extra personalization for those who don’t want it just
builds support for privacy habits, settings, tools, and laws, in that
order. And banning personalization entirely, or building a medium that
can’t be tweaked to support it, hasn’t really been tried yet but would
probably cause problems of its own.
In my humble opinion, the biggest benefit of the 30-40-30 rule is
that it helps justify the decision to support Global Privacy Control
everywhere. People are just different, so let everybody pick once
whether they want cross-context
behavioral advertising everywhere. Yes, I know, I know, under a
different set of laws we could have had Global Personalization Control,
that would work the other way around so that the work of turning it on
would be done by the personalization people and not the privacy people.
But GPC is there, it works, and if sites, browsers, and platforms can
implement it in a common-sense way you should just be able to turn it on
once and be happy for as long as you own the software or device, kind of
like Filmmaker Mode on a
TV. In the long run, I can safely predict that the fraction of users
who turn GPC on will be somewhere between 30 and 70 percent. Which end
of the range we end up with will depend on how trustworthy the ads, the
Internet, and the economy are in other ways, but that’s another story.
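On the site side, honoring GPC starts with one header check. Here is a minimal sketch using Flask; the Sec-GPC request header comes from the GPC specification, and everything else is a made-up placeholder.

```python
# Minimal sketch of server-side Global Privacy Control handling.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    # Browsers with GPC turned on send "Sec-GPC: 1" with requests.
    opted_out = request.headers.get("Sec-GPC") == "1"
    if opted_out:
        # Treat the signal as do-not-sell/share: skip third-party
        # data sharing for this request.
        return "GPC detected: cross-context data sharing disabled."
    return "No GPC signal on this request."
```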
More: GPC all the things!
Update: Newer post with a similar calculation, which
I think is more accurate: or we could just
not
How much extra should you pay, or would you end up having to pay, to
get privacy from surveillance ads? This is my attempt to come up with a
high number for surveillance advertising’s impact on the content
economy—the actual ad-supported information, entertainment, or other resource that the advertising is paying for. For now I’m going to make
the most pro-surveillance assumptions, to try to come up with a high
bound for the price of privacy.
To start with, total advertising, in all media, in the USA is about
$960
per person which works out to a convenient $80 per month being spent
on ads intended to reach you. So let’s start with the $80 and assume (to
be as friendly as possible to the surveillance side…)
All advertisers prefer maximum surveillance
All ads are surveillance ads unless interfered with by privacy
tools
People who make content (writers, photographers, artists,
musicians, editors…) get a similar large share of revenue from
advertising spending as from other ways of supporting their
work.
So, if you can somehow get total privacy protection, ads that reach
you are worth only $40 per month total, instead of $80. Or, more
realistically, if you can get halfway protected, your privacy is costing
the content creators of the world $20 per month minus whatever goes to
the intermediaries. That’s real money, but it’s on the scale of a few
subscriptions or a fraction of an ISP bill, not privacy breaks the
Internet economy as we know it money. According to one survey, the
average US household spends $61/month
on streaming video services. That’s per household, not per person,
so at an average household size of 2.5, streaming alone is about $24 a
person. could use some numbers for music, news and
other services here too.
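Here is the whole calculation in one place, using only the numbers above, in case you want to swap in your own assumptions:

```python
# The price-of-privacy arithmetic from above, with this post's numbers.
total_ads_per_person_year = 960            # US ad spend per person, USD/year
per_month = total_ads_per_person_year / 12 # $80/month aimed at you

# Worst-case assumption: surveillance ads are worth 2x, so total
# privacy protection halves what the ads aimed at you can pay for.
full_privacy_cost = per_month / 2          # $40/month
half_privacy_cost = full_privacy_cost / 2  # $20/month, minus intermediaries

# For scale: streaming spend per person.
streaming_per_household = 61               # USD/month, survey figure
streaming_per_person = streaming_per_household / 2.5  # about $24

print(per_month, full_privacy_cost, half_privacy_cost, streaming_per_person)
# 80.0 40.0 20.0 24.4
```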
You might not want to agree with those three assumptions. Some ad
money still gets spent on non-surveillance advertising, and surveillance
advertising might not be twice as valuable as non-surveillance advertising
in every medium. And different ways of supporting content have different
transaction costs. If I install privacy tools that cost the surveillance
business $100 in ad revenue per year, and at the same time I buy
a subscription to 404 Media for $100 per year, I’m pretty sure I’m
paying for a lot more content than the ads would have. The stack of
service providers needed to run a subscription service takes a smaller
bite of the subscription money than the ad agency/adtech stack takes of
the ad money. And the publisher/intermediary split is always worse than
it looks. Many of the publishers that show up in adtech’s
reckoning of its impact are so-called made for
advertising (MFA) sites, or sites that just run copyright-infringing
or AI
generated content. When a surveillance firm makes claims about ads
supporting content, a lot of the ad money they count as the
publisher’s share is not funding any new material at all.
Thomas
Baekdal wrote, for every one person you could convince to
subscribe, donate, become a member, or support you on Patreon…you would
need 10,000 visitors to make the same amount from advertising. Or to put
that into perspective, with only 100 subscribers, I could make the same
amount of money as I used to earn from having one million visitors.
It doesn’t look like you need to add $4 to your monthly subscription budget, to balance things out, every time you improve your privacy protection by 10%—if anything, that’s an unrealistically high estimate, there to keep us on the safe side. A more realistic view on the impact of
privacy tech is in Targeted
advertising, platform competition and privacy. Making both
personalized and non-personalized options available will make more money
for the publishers, so publishers are better off letting the privacy
people have privacy as a fair choice—so the privacy fraction of the
audience is protected from being followed elsewhere in a way that
drives down ad rates—while the personalized ads option remains available
for those who want it.
The bigger problem is that platform companies that sit between
advertiser and publisher are steadily chickenizing
the companies on both sides of the ad market. Jakob Nielsen pointed
out the problem for search engines back in 2006:
In the long run, every time companies increase the value of their
online businesses, they end up handing over all that added value to the
search engines. Any gain is temporary; once competing sites improve
their profit-per-visitor enough to increase their search bids, they’ll
drive up everybody’s cost of traffic.
Today, the problem continues in search, but also applies to big
retail and social platforms. Direct to consumer brands must pay
rent to platforms, and both brands and publishers are indirectly
micromanaged
by and for large platform companies, like drivers for Uber or
Amazon. So it seems like there’s a big win available from antitrust
cases. Putting a price on privacy is probably backwards. It would make
more sense for legit sellers to promote and even pay for privacy tools
among good customers, just for the fraud protection
benefits. Money that a customer spends
taking their kids to a bogus event advertised on social media can’t
be spent on legit products or services. Brands that need to reach a
high-privacy early adopter group are going to have some interesting
post-surveillance options.
Something that’s harder to measure is that companies may be
over-advertising to the point of inefficiency because advertising is
(with extra, unwanted, surveillance in place) more measurable than other
ways to spend company money. The direct mail saying is list, offer,
package—you can get the highest returns by targeting the
best-performing list of people, then by making a better product and/or a
better price, and finally by improving the actual mail piece. If privacy
goes up to the point where optimizing the list is harder to do,
then the company has more incentive to invest in “offer” (product
features, quality,
service, price) and “package” (content marketing and ad creative).
One of the unconventional aspects of the open source trend of the
1990s/early 2000s was that open source companies like Red Hat were
willing to say that they wanted the software market to be smaller in
total. Most investors don’t like to hear that especially for an
established industry. But a smaller, more restricted advertising
industry in total might be more valuable to advertisers and consumers,
by driving some investment into product improvements that are higher in
value than, but less measurable than, advertising.
Moderation is the hardest part of running any Internet service.
Running a database at very large scale is not something that just anyone can do, but people who can do it are around and available to hire. And you
don’t have to write your own BigTable or Cassandra or whatever, like
today’s Big Tech had to do—you can just call ScyllaDB or even click the right thing on Amazon
AWS.
Moderation is harder than editing, and I say that as a former
editor.
Editors know the language and audience in advance. Moderators
might get a bunch of new users from nobody knows where, writing in a
language they don’t know.
Editors know the schedule and quantity of content to be posted in
advance. Moderators have to deal with content and complaints as they
come in.
If an editor doesn’t understand something, they can just tell the
writer that it won’t be clear to the reader, and make the writer rewrite
or explain it. A moderator just has to figure it out.
An editor has more options: delay, ask for a rewrite, and yes,
even edit. A moderator has fewer available actions.
1. Anyone may use the search engine anonymously, or make a free account. Accounts may be personal or organizational. An organizational account must be verified with a DNS TXT record or a file under /.well-known.
2. Account holders can use the hide site in results feature, other customization features, and the search API.
3. Account holders can add an RSS feed to their profiles. Account holders agree to allow the club to use their customization choices like hidden and boosted sites, and RSS links, in aggregated form to improve search rankings. Account holders may list their trademarks in a /.well-known/ file, to prevent them from being used in a search ad. (See rule 7)
4. An account holder whose RSS feed and hidden sites provide useful information may be invited to join as a member.
5. Members, except nonprofit organizations, pay an annual fee. Individual members may vote for the board. All links in member RSS feeds are crawled. Member fees and categories (such as student and senior memberships and fees for extra API calls) are set by the board.
6. Members may apply to become advertiser members. Advertiser membership applications are approved or rejected by a vote of all members, and approvals must pass with a 2/3 majority. Members also vote on the limit of total search ads to be shown, which may be zero.
7. Advertiser members may bid to run search ads in a second-price auction on search terms (a sketch of the auction follows this list). Search ads may not target the name or any trademark of another member or account holder. Either the board or the members may vote to convert an advertiser member back to a normal member for any reason.
8. Members may be suspended or expelled by the board. A suspended or expelled member may appeal to a vote of the full membership. A suspended member may use the search engine as an account holder.
9. Excess funds after operating the search engine are donated to nonprofit member organizations whose sites are frequently returned in search results.
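The auction in rule 7 is standard enough to sketch. This is illustrative only; the data shapes and names here are made up, not part of the rules.

```python
# Illustrative sketch of rule 7: second-price auction on a search
# term, with member names and trademarks off limits.

def run_search_ad_auction(bids, protected_terms, search_term):
    """bids: list of (advertiser, amount) for this search term.
    protected_terms: names and trademarks listed by members and
    account holders (rule 3). Returns (winner, price) or None."""
    if search_term.lower() in {t.lower() for t in protected_terms}:
        return None  # term is a member's name or trademark: no ads
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    if not ranked:
        return None
    winner, top_bid = ranked[0]
    # Second-price rule: the winner pays the runner-up's bid (or
    # their own bid if they were the only bidder).
    price = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price

# Two advertiser members bid on the same term.
print(run_search_ad_auction(
    bids=[("acme-ads", 1.50), ("bravo-ads", 1.10)],
    protected_terms={"SomeMemberBrand"},
    search_term="chocolate chips",
))  # ('acme-ads', 1.1)
```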
Disclaimers: This page is regularly updated with
links and copy changes. This page is not intended for people with a
specific personal security need or threat. But it does include some tips
that will make it easier for vulnerable people to blend into a crowd of
other people doing similar things. See links at the end if you want to
put more time into privacy stuff after trying these tips.
Most people don’t have time to follow all the privacy advice
available to them. This is my current list of what I think are the tips
that will provide the biggest impact for the time available. For me,
this is not so much about protecting myself today as about moving money out of
surveillance and into constructive areas in the future.
A lot of privacy tutorials will tell you to act independently and
protect yourself, but in the real world that’s hard to do. See the
section on privacy
labour in the W3C Privacy Principles. The surveillance industry has
the resources to make more privacy settings than you can possibly find
and fix. (Can you win Realistic
Facebook Privacy Simulator?) Privacy tips are only part of the
solution, and they really only help as part of collective action. The
problems with individual privacy include
Using privacy tips in situations where other people don’t use
them just makes you more identifiable.
If only a few people use privacy tips, then money keeps flowing
into surveillance. CMOs fund surveillance marketing projects at
companies, VCs fund surveillance startups, and surveillance budgets keep
going up.
In the long run I want to lower the expected return on investment
(ROI) of future surveillance, which means it’s generally better to have
more people doing the few highest-impact measures than a few people
getting more protection.
In my humble opinion, my good fortune in having nothing to
hide comes with an obligation to help others by taking privacy
measures that can make others safer. This is my latest attempt to pick
some good privacy tips that most Internet users should be able to do,
and rank them. I’m looking not just at the privacy scene, but also at
industry sources and legal documents. When a tool or setting has the
result of lowering revenue for Big Tech, or if they try to nudge people
into not doing it, then that’s a good sign, and I move it up on the
list.
These tips are all focused on the USA, and different tips might work
more or less well elsewhere. I’m also assuming that you can have both a
computer and a phone—a basic laptop and a basic phone are a lot better
for doing these than even the best possible phone. If anyone has written effective privacy tips for people
who are phone-only, please let me know and I’ll link here.
And tips will get more or less effective as laws, regulations,
compliance, and features change. I have been updating this page fairly
often. Ready? Let’s start with an easy, high-impact one.
1. Fix mobile privacy settings. These are different
on Apple iOS and Google Android. On Apple iOS (iPhone and iPad):
In Settings, go to Privacy &
Security, then Tracking, and make sure
Allow Apps to Request to Track is turned off.
Also in Settings under Privacy &
Security, find Apple Advertising and make sure that
Personalized Ads is turned off. (You will probably have to scroll
down—Apple makes this one a little trickier to find.)
In Settings, under Safari, scroll down to
Advanced, and turn off Privacy Preserving Ad Measurement (This is
an ad tracking feature, not a privacy filter on existing ad
tracking.)
On Android, there are also at least two settings to check.
Open Settings, go to Privacy,
then Ads and select Delete advertising
ID.
Under Settings, go to Security &
Privacy, Privacy, Ads,
Ads privacy, Ad topics. Toggle the
switch next to Allow ad topics off.
Consumer Reports has instructions that should work across Android
versions. See Workout 4 at CR
Data Bootcamp
On either platform, if you have a Google Account, turn off ad
personalization. Go to https://myadcenter.google.com/home
and set Personalized Ads to Off. This does not turn off
the tracking that feeds into the ad selection, but makes it harder to
infer anything about you based on having received a particular ad, and
makes it harder for scammers to target you (like malware operations
targeting IT staff for fake download ads.) Most important, it looks like
this setting will help limit
Google ad revenue (PDF) so they have less money to build future
surveillance. And, as a bonus, it will help you buy
better stuff.
Fixing mobile privacy settings is still today’s best privacy tip,
because of the ubiquity of smartphones, the good research showing high
impact, the small amount of time required, and the stickiness of the
settings once set. These help protect you from fraud, too. According to
the National Bureau of
Economic Research, the Apple setting reduces money lost in all
complaints by 4.7% and money lost reported in internet and data security
complaints by 40.1%.
Because a lot of people have already fixed these settings,
workarounds by the surveillance industry are pretty far along. Many apps
use server-to-server
tracking which is not affected by this tip. It’s still a good first
move, though.
2. Get an authorized agent. Consumer Reports Permission
Slip is free and works well to automate your right to opt out of
data sales or sharing. It has an auto requests option to handle
the big data brokers, so you can sign up once and then leave Permission
Slip to do its thing. Or if you’re interested in the surveillance
economy you can open up the app and choose to send opt-outs for
individual companies manually.
This tip is a contender for number one in a future list, as three
trends continue.
Consumer Reports is adding more companies to the list.
More companies are coming into compliance and automating their
interactions with authorized agents.
More state privacy laws keep taking effect.
When you sign up for Consumer Reports you can also get on their list
and start keeping up with political stuff, which will help with another
tip on this list. (Disclaimer: I worked on the CR research that led up
to Permission Slip and am still an advisor to CR.)
Permission Slip offers both a free service and a paid upgrade. Two
paid-only options are Incogni and DeleteMe.
3. Switch to non-surveillance apps and remove as many apps
from your phone as you can.
For messaging, switch to Signal
where possible. This doesn’t just give you better personal security and
privacy. If you have nothing to hide, switching can still help
you protect others.
help vulnerable people hide in the crowd of other users chatting
about boring stuff on Signal
collect less surveillance data in total, to limit the ability of
Big Tech AI to learn to spot the targeted users
Go through your phone and remove extra apps. This includes not just
the obvious social media and messaging. Most mobile apps have some kind
of tracking built in.
Habits can be powerful. Elena
Rossini writes, I will be thinking: what do the Big Tech
billionaires and kleptocrats wish I would do? And then I will do the
OPPOSITE. Look for ways to nudge yourself into doing some
beneficial habit instead of feeding a surveillance app with your
attention. If you can’t completely delete a risky messaging app because
you need it to communicate with some people, some compromise steps are
to move it off your phone’s home screen and to make a habit of checking
Signal before checking it so that people learn that Signal is a better
way to reach you.
4. Fix smart TV settings. You can skip this
one if your TV is not connected to the Internet and you know that nobody
will connect it, or if you don’t have a TV that can be connected to the
Internet.
4.5 Set up a YouTube-only email address. This is a
prerequisite for the next step. If you don’t have a spare email address
that you can dedicate to YouTube, you can
use Apple
Hide My Email or Firefox
Relay to make a separate address just for YouTube. (Using different
email addresses for everything is a privacy tip but can be inconvenient
enough that it doesn’t make my list—YouTube does.)
Another option is to make an account on Hotmail or some other
non-Google email service, just for YouTube.
5. Fix YouTube. YouTube is the number one social
network by some metrics. And it’s a big surveillance risk because it’s
tied to other Google services.
It’s important not to log in to YouTube with your regular Google
account if you have one, or to make a YouTube account with your normal
email address. You need to log in with an email address you only use for
YouTube.
Use YouTube from a different browser that you only use for
YouTube. I use LeechBlock to block
YouTube in my normal browser so that I remember to open it in its own
browser.
Don’t use the YouTube app, just a browser. Unfortunately this tip is
here because of the impact level, not because the privacy development
scene has good easy answers for how to do it. This one might depend on
habit power—saving YouTube links to watch later on a privacy laptop
instead of opening on your phone right away. Attention developers: YouTube protection is a big need
with relatively few solutions, good place to make a big
difference.
6. If you use Facebook or Instagram, fix Meta
settings. John Oliver covers How to change your
settings to make yourself less valuable to Meta. These settings
don’t limit how much information about you goes to Meta, but they will
probably limit your exposure to well-targeted scams, as well as reducing
the amount of ad money that goes to Meta.
7. Turn on Global Privacy Control. This will
automate your do not sell/share for sites you visit. Hella easy
so the denominator in the effect/time score is low. But it’s still not
supported everywhere, because there are more companies required to
comply with state privacy laws than there are people who know how to
comply. As more companies come into compliance, and as more states
require it, expect this one to work better and better, and move up on
the list. Privacy Badger will
turn on Global Privacy Control for you, by default, along with blocking
tracking scripts, so it’s possible to do two tips with one extension.
There are several easy ways to do this one. More info on the Global Privacy Control
site.
8. Participate in privacy politics. California has
the CPRA because people voted for Proposition 24 in 2020. The CPRA isn’t
perfect, but voting made a difference. Consumer
Reports Impact Newsletter and Newsletter from The Markup
are helpful for keeping up with privacy issues.
9. Install a basic privacy extension on your web
browser. Privacy
Badger will both block tracking scripts and turn on Global Privacy
Control. A big time saver. The only problem with this one is that some
sites will detect that third-party scripts are being blocked, and hassle
you to turn off your ad blocker—so you’ll have a less annoying
experience running uBlock Origin instead (see tip 10).
10. Fix browser settings. Most browsers have
sensible defaults by now, but there are two exceptions: so-called privacy-enhancing
ad data collection, and the old-fashioned third-party cookies that
remain in Google Chrome. Check and fix the settings for whatever
browsers you use.
(I’m not including switch browsers in this list because
switching can be a lot of work, and you need to change some privacy
settings in all the major browsers. You do need a separate browser for
YouTube though—see tip 5. Google
Chrome is probably going to have to remind you to turn their advertising
features off, but I don’t know whether the others will have to
disclose what they’re doing. You have to check the settings.)
11. Check or replace your ad blocker if you use one.
Some sites (and even the FBI) recommend an ad
blocker, but just searching for ad blocker or taking the first
online recommendation is a bad idea. Many ad blockers are either adware,
or participate in a paid allow-listing scheme that allows tracking by
default. Do not use Google Search or the search feature on the
Chrome Web Store to find an ad blocker, even if you know the name of a
good one. A lot
of extension listings are fraudulent and Google has no incentive to
help you with this search. Only use a link from a trustworthy privacy
site.
You can visit EFF’s Cover
Your Tracks to run a test and reconfigure your existing ad blocker,
but switching to uBlock Origin
is probably quicker and will give you a privacy upgrade as well as
exposing you to fewer web annoyances. uBlock Origin has an active
development, testing, and support scene. They’re especially good at
eliminating turn off your ad blocker pop-ups.
(Personally, I generally don’t run an ad blocker because research,
but privacy tools and settings often set off a lot more turn off your
ad blocker pop-ups than uBlock Origin users get. If the web gets too
annoying with Privacy Badger, then switching to uBlock Origin might be
the best option. You should run either Privacy Badger or uBlock
Origin, but not both. They can interfere with each other in ways that
are sometimes hard to figure out.)
Ad blocking is tricky—if you block ads on normal sites but not
YouTube or Google Search, then you’re just encouraging advertisers to send
more money to one big company, which is a bad idea. Do the YouTube tip
before this one. If you’re seeing ads on YouTube but
not on other sites, you’re doing ad blocking wrong. Most people surveyed
say they are running an ad blocker now, and experienced
advertisers are more likely to have one, so this tip might move up
soon.
If you do decide to install an ad blocker, make sure that it is
blocking the ads on Google Search. The ads on search results pages are
designed to be easy to confuse with search results, and Google allows a
lot of malware and other fraud in the search ads—even ads
that look like they link to Google’s own site.
12. Nerd out with browser privacy extensions. Now
we’re getting into the intermediate-level tips that will be interesting
to some people.
ClearURLs
gets rid of tracking parameters in URLs, and speeds up browsing by
skipping data collection redirects.
NJS is a
minimal JavaScript disable/enable button. Helpful for removing pop-ups
and crap from sites where the real content works without
JavaScript.
Personal
Blocklist is surprisingly handy for removing domains that are heavy
on annoyances and surveillance, but weak on actual information, from
search results. (part of how to fix Google
Search)
13. Send opt out, right to know, or right to delete requests
manually. All right, we’re deep in privacy nerd territory now,
you probably should have stopped reading two tips ago. This is way too
time-consuming. I do it, but I’m mainly looking for companies that
should be on the lists used by authorized agents to opt everyone else
out. For most people these are just privacy
labour and you’re better off doing a higher-priority privacy tip or
writing to your state representative.
How
Do I Protect My Privacy If I’m Seeking an Abortion? This article has
a lot of good general information that applies to other health and
family privacy concerns. It’s worth reading even if the headline reason
does not apply to you.
Another not a blog post. Clearing the output of the link collector
script for the end of the year. All good reading material in case you
missed these the first time. More: links for 15 June
2024
I’m going to exercise my right to opt out of doing year-end
predictions this year, because the most significant privacy and web ad
news will depend on the state of US support for Ukraine, and I don’t
have any special knowledge of what is going to happen with that. If the
USA comes through for Ukraine, then our allies are likely to try to
smooth things over on privacy issues in order to keep the coalition
together. If we half-ass it or pause, though, a lot of people are going
to be asking what the hell happened, and trying to figure out what they
can do with less support from the USA. So I can’t predict anything from
where I sit. Some of the 2024 happenings I have no idea about:
You know the EU’s DSA proceeding
against “X” (former Twitter)? Could be just the beginning.
A loss of US funding would be seen as a hybrid warfare victory for
Russia, and everyone will want to see how Russia took and held social
media, to try to do better next time. Joohn Choe
writes that Ukraine is generally a laboratory of disinformation
and anti-disinfo tactics. So the European Commission could come up
with good reasons to open up similar proceedings against all
the big social sites.
Transparency and Consent Framework: still not transparent
enough? Some of the thousands
of companies that can receive information about you through web and
mobile “real-time bidding” for ads are likely controlled by foreign
adversaries. The
new version of the industry’s Transparency & Consent Framework
includes more required documentation from companies receiving RTB data,
and proactive auditing of CMPs and Vendors that will be randomly
selected each month. That might or might not be enough. Limiting the
risk of sanctions-evading data flows by sending TCF back to the drawing
board again would be a cheap way to try to stifle the flow of money into
Russia’s war economy, without a dependency on the US government.
Google might have been able to get away with this. There’s a
(citation needed) story about a kleptomaniac princess from a UK-allied
country who used to visit London and shoplift from Harrods. Law
enforcement tolerated it to avoid an incident. But now elections are
coming up, there’s a lot of popular support for Ukraine in the UK, and
the copyright industries there are a big deal. Would be hard for UK
regulators to justify coming down on the side of Russian pirates.
Safe Harbor, Privacy Shield, and now the Trans-Atlantic Data Privacy Framework. This could be the
big one, and it’s a stretch. As noyb.eu
points out, the current proposal is the third attempt at a data sharing
agreement, without substantial changes. Maybe the EU could be
persuaded to let this third try squeak through if it would benefit a
country that is a solid supporter on their biggest immediate problem,
but sending your people’s data over to a country with an unresolved
Russian influence situation is like sending your kids to spend the night
with a family who all got positive virus tests.
Don Marti is VP of Ecosystem Innovation at Raptive (the company that
used to be CafeMedia), and a former strategist at Mozilla and former
editor of Linux Journal. He works on web ecosystem and
business issues including collaborative research on the impact of
advances in consent management and tracking protection technology. He
started the California authorized agent project at Consumer Reports
Digital Lab that led to the development of CR’s Permission Slip
service.
Don has written for AdExchanger, Linux Weekly
News, and other publications, and co-authored a paper on the
economics of software quality for the Journal of
Cybersecurity and a book chapter in Advances in Advertising
Research. He has been a speaker on open source software, user
data, targeting, and signaling effects at technology and advertising
events including Ogilvy Nudgestock, Southern California Linux Expo, and
the MSIX conference. Don serves on the Advisory Committee of the World
Wide Web Consortium.
I have accepted Google funding in the past. Google was one of the
sponsors of FreedomHEC USA.
Sponsorships covered all travel and accommodations costs for
speakers.
I am an advisor to Consumer Reports on the Permission Slip project
and related privacy tools and services.
I am a former employee of and current stockholder in ScyllaDB, which is a really fast
NoSQL database.
Based on some of the feedback to the Draft W3C Privacy
Principles it is starting to look like some of the problems that get
lumped together as privacy problems are actually personalization
problems. Personalization can occur when a person gets different content or ads than another person in the same context, and is technically
possible without any other party knowing what the person got.
If the Web were somehow able to get perfect privacy-preserving
personalization—for ads, content, or both—some of the problems inherent
to personalization would remain.
Hiding fraud A personalization system can be used
by deceptive sellers to avoid making their ads or content known to
people who are likely to take some kind of action against deception.
Content market race to the bottom effects
This is a well-known problem in adtech, where an audience can be
identified on a high-value site and monetized on a lower-value site that is
not
reported to the advertiser. Not mainly a web publishing problem—it’s
mostly a problem for users who don’t get as much ad-supported content as
their ads would otherwise pay for.
The mix of fraud and race to the bottom effects is sites that source
traffic deceptively—they’re a little of both.
In one generation, we have gone from buying Computer Shopper
for the ads to a
whole ad blocking software development scene. What happened? The
advertising that people are blocking now is different from what
advertising mostly used to be, and most of the ways it’s different get
lumped together under privacy.
Alessandro Acquisti writes,
in The Economics of Privacy at a Crossroads, I argue
that as economists we have, by and large, adopted a reductionist view of
privacy that overlooks the richness and nuance of the contemporary
debate around privacy.
Part of the problem, it seems to me, is the idea that people are
balancing privacy harms on one side against supposed benefits of
personalized advertising on the other side. But the harms
are documented, while the benefits are…just kind of assumed. (usually)
Meanwhile, a substantial body of literature supports the argument that
personalization has built-in costs of its own. So privacy regulations
and technology that limit personalization could be failing to get credit
for some important economic wins. If you look at the math, a
personalized message carries less information than a message sent
without knowledge of the recipient.
Gardete and Bart: “We find that when the sender’s
motives are transparent to the receiver, communication can only be
influential if the sender is not well informed about the receiver’s
preferences. The sender prefers an interior level of information
quality, while the receiver prefers complete privacy unless disclosure
is necessary to induce communication.” Tailored
Cheap Talk
This makes sense to me. If you hear an entrepreneur make two claims
about their new product, one in a meeting with you alone, and one as an
answer to a question on a trade show panel, you’re more likely to
believe the second one. In that case, the entrepreneur didn’t know who
was in the audience: existing customers, experts in the field,
competitors, maybe even the government.
So it seems like moving the same ad from Computer
Shopper style, seen by a whole community of practice, to cold
call style, seen by specific groups only, would tend to reduce its
credibility. But why does a widely shown ad carry information in the
first place?
Kihlstrom and Riordan: A great deal of advertising
appears to convey no direct credible information about product
qualities. Nevertheless such advertising may indirectly signal quality
if there exist market mechanisms that produce a positive relationship
between product quality and advertising expenditures.
Advertising as a Signal
Ambler and Hollier: High perceived advertising
expense enhances an advertisement’s persuasiveness significantly, but
largely indirectly, by strengthening perceptions of brand quality.
The Waste in Advertising Is the Part That Works
Davis, Kay, and Star: It is not so much the claims
made by advertisers that are helpful but the fact that they are willing
to spend extravagant amounts of money.
Is advertising rational?
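A toy version of the signaling argument, with made-up numbers: an ad budget only pays for itself when the product is good enough to earn repeat business, so a visibly large spend carries information even when the ad copy doesn’t.

```python
# Toy model of advertising as a quality signal, loosely after
# Kihlstrom and Riordan. All numbers are made up for illustration.
MARGIN = 10            # profit per sale, USD
BUYERS_REACHED = 1000  # people who see the campaign and try the product once

def campaign_profit(ad_spend: float, repeat_purchases: int) -> float:
    """Each reached buyer tries the product once; only a good
    product earns repeat purchases."""
    sales = BUYERS_REACHED * (1 + repeat_purchases)
    return sales * MARGIN - ad_spend

# High quality earns 4 repeat purchases per buyer; low quality, none.
for spend in (5_000, 20_000, 40_000):
    hi = campaign_profit(spend, repeat_purchases=4)
    lo = campaign_profit(spend, repeat_purchases=0)
    print(f"spend {spend}: high quality {hi:+.0f}, low quality {lo:+.0f}")
# spend 5000:  high quality +45000, low quality +5000   (no signal)
# spend 20000: high quality +30000, low quality -10000  (separating)
# spend 40000: high quality +10000, low quality -30000  (separating)
# Any visible spend above 10,000 only makes sense for the
# high-quality seller, so buyers can read the spend as information.
```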
Now, here is where the value destruction from personalization comes
in. What if an ad medium supports enough personalization that it becomes
too hard for consumers to detect any economic signal from a legit ad
that appears in the medium?
Hertzendorf: This result, however, is sensitive to
the assumption that consumers can perfectly observe the firm’s
advertising expenditure. This assumption is somewhat unreasonable in
light of the fact that much advertising takes place over various
electronic media to which not everyone is ‘tuned in.’…Furthermore, the
noise complicates the process of customer inference. This enables a
low-quality firm to take advantage of consumer ignorance by partially
mimicking the strategy of the high-quality firm.
I’m Not a High-Quality Firm-But I Play One on TV
That paper covers the situation where viewers have a multitude of TV
channels to choose from, but the signal loss effect is at its worst in
highly personalizable online ad media. In a healthy ad medium, the value
of ad-supported content plus the signaling value of the advertising
itself is greater than the annoyance of advertising plus the risks of
privacy threats. As signal goes to zero, the ad has a harder time paying
its way, even if annoyances and risks are somehow reduced.
What’s even worse for users is that personalization provides extra,
bonus value to deceptive advertisers. For a legit advertiser, the value
of showing an ad to the wrong person is zero. You wasted a fraction of a
cent. Deceptive advertisers, though, obtain value from avoiding showing
their ads to people who would try to hold them accountable in some way.
For example, a seller of designer knock-offs can avoid employees of the
legit manufacturer, their ad agency, and their country’s law enforcement
agencies. Sellers of quack medicine and investment scams can avoid
regulators and journalists. The same personalization capacity can be worth
more to deceptive sellers than to honest ones. (More examples of how
online ad media facilitate deception, from the FBI, FTC, and
NBER.)
Lin et al.: We find that for users currently using
an ad-blocker, uninstalling them leads to a deterioration in their
online experiences and lower satisfaction with recent purchases. For
users that were not using Ad-Blockers, installing one led to fewer
reported regrets with purchases, an improvement in subjective
well-being, and a less positive view of online advertising.
The Welfare Effects of Ad Blocking
Mustri et al.: [W]e find that [behaviorally
targeted] ads are more likely to be associated with lower quality
vendors, and higher prices (for identical products), compared to
competing alternatives found in search results.
Behavioral Advertising and Consumer Welfare: An Empirical Investigation.
The signaling problem and the deceptive sellers problem might help
explain why privacy tool developers are choosing to block Google’s
in-browser ad personalization. So why are so many
intermediary companies so fired up about tracking people when it reduces
the value to the buyer and seller ends? Maybe because overgrown
open-source companies dominate advertising now? More: the Linux
device driver hacker’s guide to giant Internet monopoly
dominance Whether based on actual user research, user
requests, or intuition, privacy developers are anticipating that users
are just as likely to want to avoid personalization when it’s free of
individualized tracking as they are to avoid personalization based on an
individual identifier. Yes, there are group discrimination
risks that are still an unsolved problem, but right now it looks
like privacy developers are betting that users will want to avoid not
just individual tracking, but all personalization. The signaling
literature might provide some hints why.
The
antitrust orthodoxy is blind to real data harms | CEPR by Gregory
Crawford, Johnny Ryan, and Cristina Caffarra. In this piece we make
the case that (lack of) privacy is an (often unobservable) price of
using digital platforms, and that (lack of) privacy facilitates
mainstream antitrust harms such as exploitation and foreclosure by
dominant digital platforms.
As far as I can tell, any advertising medium is like a game with five
kinds of players in it.
users: seeking the greatest possible value in
(ad-supported content + economic signal) in exchange for
(attention/interruption + computing resources + risk). This does not
necessarily mean seeking the least possible amount of advertising. See
Vogue and Computer Shopper.
legit advertisers: seeking to maximize return on
ad spend, which can include direct response metrics and/or
reputation/brand equity
deceptive advertisers: similar to legit
advertisers, but the difference is that they don’t want the brand equity
because it would tend to go negative.
legit publishers: produce original ad-supported
content that pays its way for users.
deceptive publishers: produce ad-supported
content that either does not pay its way for users (costs more in
attention, resources and risk than it’s worth) or takes from some other
party without compensation (such as copyright infringement).
When an ad medium is implemented in software, the hard part is to
design the rules in order to give an advantage to the legit players. And
this is really hard. Today’s search, social, and web
RTB advertising are giving persistent advantages to deceptive
players. And just hiring a bunch of ad reviewers to chase down the
deceptive players is not going to work. If the rules of the market are
laid down so that they give an advantage to deceptive players, then any
deceptive player who can get their costs down below the cost of finding
and blocking them is going to keep at it.
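That last point is just an inequality. A sketch, with made-up numbers:

```python
# Made-up numbers: the deceptive player's decision to stay in an ad market.
def deceptive_player_stays(revenue_before_ban: float,
                           new_account_cost: float,
                           evasion_cost: float) -> bool:
    """A deceptive advertiser keeps re-entering as long as each
    throwaway account earns more before it gets banned than a
    replacement account plus cloaking costs."""
    return revenue_before_ban > new_account_cost + evasion_cost

# Each scam account earns $500 before it gets caught; a replacement
# plus cloaking costs $120. Reviewers can ban all day; the player stays.
print(deceptive_player_stays(revenue_before_ban=500,
                             new_account_cost=100,
                             evasion_cost=20))  # True
```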
Read the whole thing: The
emerging marketing economist by Eric Seufert. The good news here is
that technical and regulatory changes to how privacy works are already
affecting investment decisions by companies. Mobile
Dev Memo is a valuable source for understanding the surveillance
marketing scene on its own terms, and is a good thing for privacy people
to read. It’s not all about mobile. In the days of Peak
Surveillance Marketing (2006-2018) marketers could basically put
existing customer info, plus money, in to a big platform company, and
watch sales leads come out. But that’s less and less of a thing.
Marketers are having to do more math to connect cause and effect.
The output of probabilistic measurement models requires interpretation:
weaving a plausible story of how marketing spend generated conversions
through an analytical framework.
That’s one side of the game, the marketing side. Now think of it from
the privacy side. Some of that marketing spend pays for things
that you don’t want to happen. The biggest impact of a privacy habit,
tool, or regulation is to interfere with the surveillance marketing
cycle and make it harder for the marketing decision-maker to connect an
actual sale back to their earlier decision to spend money on a harmful
practice.
(figure: the surveillance marketing cycle)
This is an oversimplified version of the cycle and I’m open to
suggestions for improving it. But it’s a way to start thinking about the
most effective use of your limited time. You probably don’t have enough
time to adequately secure your own personal privacy and still
participate in the modern economy. But it is possible, if you focus on
breaking the cycle for the most harmful practices, to help that
marketing decision-maker reallocate their money away from surveillance
and toward marketing projects that are win-win for vendor and customer.
Advertisers in the USA spend about $1100 per person on advertising.
Privacy habits, tools, and politics are how you can have an impact on
how your $1100 gets spent.
If you invest in protecting your personal privacy, but marketers can
still see that surveillance pays, then they have the resources and
incentives to invest in technical workarounds and lobbying to undo your
investment. This is one more reason why privacy can’t be
personal, it’s a cooperative project. Lower the payoffs from
surveillance practices, and ambitious corporate people will move to
where the money is. If you’re on a customer journey toward
spending money with them, they’re on a marketer journey towards spending
money in a way that benefits (or at least doesn’t do too much harm to)
you.
first positive step: YouTube
For most readers of this blog, getting YouTube into a safe state is
probably priority one. Those video links come up in enough contexts that
unless you break the link between YouTube viewing and shopping, it’s
easy for a marketing decision-maker to connect send money to
YouTube with sell stuff. Some options:
Independently developed client (not a browser or the official
YouTube app)
Open YouTube in a separate browser not used for other Google
services, and use LeechBlock in your
regular browser to block YouTube there.
As a user, you’re probably better off with a browser from a company
that isn’t incentivized or required to do any of this stuff. Besides the
privacy and antitrust problems, in-browser ad features use your battery
and bandwidth—one
ad service reported a delay of 1500 milliseconds per ad. But
sometimes a site is broken in other browsers, sometimes you need to test
a site in multiple browsers, sometimes there is a company policy
requiring a certain browser, or whatever. So, for your convenience
(fine, mostly mine) here are notes and links for what to do about
it.
Google’s in-browser ad features have been branded as Privacy
Sandbox, but that’s a marketing name that applies to a lot of
projects. So far the Privacy Sandbox options do not seem to
include any actual privacy features, so you can safely turn them off.
memo to self: update this blog post with details if
they come out with a Privacy Sandbox feature that’s an actual
privacy feature you should leave turned on.
Blocking third-party cookies makes Google Chrome’s behavior similar to that of other browsers, so it shouldn’t break anything on the normal web. (Some corporate
applications will break, so if you have those applications at your
company or organization, you can share the links in the desktop
system administrators section with your IT department and they can
configure Google Chrome to do the right thing.)
Below the third-party cookie setting, turn off Allow related
sites to see your activity in the group (This is third-party
tracking for sites that claim they’re related in some way—but the
groups don’t get an independent check.)
Go to the in-browser ad features, at chrome://settings/adPrivacy or chrome://settings/privacySandbox
and turn off at least Ad topics which is the really problematic
one, and any of the others you want to. This stuff is all new, and sites
don’t depend on it, so you can safely turn it off without breaking
anything.
Please don’t recommend just “an ad blocker” to other people. When
people search for ad blockers on search engines or browser extension
directories, they’re likely to find malware, adware, scams, extensions
that block visible ads but not tracking, and extensions that participate
in a paid allow-listing scheme. If you recommend an extension, make sure
to give a specific name and link.
The functionality available to extensions in Google Chrome is being
reduced,
so running Google Chrome with an ad blocker does not look like a
long-term solution. But either of the above two should work for now. (If
you do switch browsers, be sure to turn
off advertising features in Firefox.)
For command line users and desktop system administrators
Some Intranet and SaaS applications still require third-party
cookies. If you are setting this up for other users, test this option
thoroughly, especially if you have web applications that are known to
only work on Google Chrome.
The other in-browser ad features are new and nothing depends on them,
so they can safely be turned off for everyone. It’s not just a privacy
thing. The performance
impact of running an entire in-browser ad stack, especially on older
PCs, means that these settings are likely to extend desktop system life
and reduce user complaints.
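If you manage Google Chrome with enterprise policies, these settings can be pushed from a managed-policy file. Here’s a minimal sketch; the policy names are current as I write this, so check them against the published Chrome Enterprise policy list before deploying, and BlockThirdPartyCookies is the one to test carefully if you have those intranet apps:

    {
      "BlockThirdPartyCookies": true,
      "PrivacySandboxPromptEnabled": false,
      "PrivacySandboxAdTopicsEnabled": false,
      "PrivacySandboxSiteEnabledAdsEnabled": false,
      "PrivacySandboxAdMeasurementEnabled": false
    }

On Linux, a file like /etc/opt/chrome/policies/managed/ad-features.json works; Windows and macOS use the same policy names via the registry or configuration profiles.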
For web developers
Data on your web visitors can be collected using a third-party
script. Some third-party scripts are already being extended to do this.
Fortunately the fix is something that looks like a good idea anyway:
setting Permissions-Policy.
If you run a site with no third-party JavaScript or
iframes, and you know you will never add them: you should
technically be good, just don’t put any third-party JavaScript or
iframes on it. (That includes embedded Twitter tweets, embedded YouTube
videos, Imgur photo widgets, third-party fonts, all that
stuff.)
If your site has third-party JavaScript or iframes, or
someone might add one in the future: Check your
Permissions-Policy HTTP header to make sure it includes
browsing-topics=(). As I write this, a good header
is:
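(This is a minimal example that just covers the in-browser ad features; directive names are current as I write this, and the h5bp version below covers much more.)

    Permissions-Policy: browsing-topics=(), join-ad-interest-group=(), run-ad-auction=(), attribution-reporting=()

In nginx, for example, that’s a one-liner: add_header Permissions-Policy "browsing-topics=(), join-ad-interest-group=(), run-ad-auction=(), attribution-reporting=()" always;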
See permissions-policy.conf
in the h5bp project. If your site requires some of these permissions in
order to work, you can always turn some of them back on. More info in
the Permissions
Policy article on MDN.
A Post-Google World. It was as if every night Google could break
into the offices of the Wall Street Journal and take its subscriber
list, and then go to its own advertising clients and tell them that it
could sell them access to Wall Street Journal readers for much cheaper
rates when those readers opened Google owned and operated properties,
like Gmail, YouTube, search, and so forth. In doing so, Google gained
the ability to direct ad revenue away from third party publishers to
itself. (My personal main reason for fixing these settings in Google
Chrome is not to protect my own privacy, but to avoid having the
browser, which should be
working for me, participate in Google’s criminal enterprise
instead.)
turn off advertising features in Firefox
Firefox has some similar
functionality, co-developed with Meta, so just a browser switch is not
enough. Every mainstream browser has something you need to check and
turn off. Memo to self: look up Vivaldi and
Microsoft Edge.
Here’s a little longer explanation of the prejudiced landlord use
case from my notes on
Google Topics API. First of all, yes, bias in rental and other
decisions is still a thing. Business decision makers are not Econ 101
profit maximizers. They’re entire people, with human motivations, good
and bad. So any ad platform that’s large enough is going to get a mix of
advertisers who want to do good and bad stuff.
If a platform measures, in aggregate, how well the advertisers
achieve their goals, the platform is going to measure both the success
of money-seekers in getting money, and the success of people with other
motivations in achieving their goals. From the ad platform point of
view, it’s hard to tell a pure money-seeking landlord, looking for the
tenant likely to earn them the most profit, apart from a prejudiced
landlord, who is willing to give up on renting to an otherwise suitable
tenant if it means they can keep someone from a certain group of people
out of their building. Any automated ad platform, unless management
makes a substantial positive effort to do otherwise, is going to end up
optimizing for the needs of prejudiced advertisers along with the
rest.
Automated social media advertising saw this problem in a big way. A
big social media site can identify
users by membership in a protected group and allow landlords to
exclude certain people from seeing an ad. But that turned out to be easy
to spot. So how can an ad platform, which because of its size is
optimizing for the needs of advertisers with all kinds of motivations,
manage to offer housing ads that discriminate against members of
protected groups without leaving evidence? That’s where some software
components and protocols come together.
Conversion tracking: an advertiser can report to
the ad platform whether a sale was made to a particular user.
Topics-based ad placement: an ad platform can
place an ad based on topics collected from user browsing
history.
Machine learning: the ad platform can optimize
the likelihood of conversion of a particular ad.
Now, the discrimination never has to be written down. It’s calculated
on the fly by the machine learning system, and ML decisions are notoriously
hard to figure out. So,
user membership in a protected group is correlated with browsing
topics in some way (could be reflected in either presence or absence of
particular topics)
browsing history is classified and mapped to topics, in a way
that is correlated both with actual topics browsed and with a pattern of
noise and errors introduced by the classifier
user passes browsing topics into an ML system. If the user does
examine “their” topics, the user can identify the topics, but not the
errors in how topics are assigned, or how those topics, with errors, are
correlated with membership in one or more protected groups
advertiser passes conversion data into the ML system. In the case
of housing, a “conversion” reflects whether the prejudiced landlord
decided to rent the apartment
ML system optimizes for conversions, and “decides” to show fewer
ads for the prejudiced landlord’s properties to members of the protected
group. It’s nearly impossible to explain why the “decision” was made, so
there’s no evidence of the crime.
Won’t members of protected groups just avoid discrimination by
blocking certain topics from being shared, or turning off the topics
tracking feature? Well, no, because people who don’t expect to be
discriminated against will be less likely to take that action, and the
ML system will learn to discriminate that way.
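To make that mechanism concrete, here’s a toy simulation in Python. This is not any platform’s real system, just a small logistic regression on made-up data, but it shows how a conversion optimizer ends up discriminating without ever being told who is in the protected group.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20000

    # Hidden protected attribute: never passed to the "ad platform."
    group = rng.random(n) < 0.3
    # One browsing topic happens to correlate with group membership...
    topic_a = (rng.random(n) < np.where(group, 0.8, 0.2)).astype(float)
    # ...and one doesn't.
    topic_b = (rng.random(n) < 0.5).astype(float)
    X = np.column_stack([topic_a, topic_b])

    # "Conversions" reported by a prejudiced landlord, who rents to
    # group members far less often, regardless of anything else.
    y = (rng.random(n) < np.where(group, 0.02, 0.20)).astype(float)

    # Logistic regression by gradient descent, standing in for the
    # platform's conversion-optimizing ML system.
    w, b = np.zeros(2), 0.0
    for _ in range(3000):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= 0.5 * X.T @ (p - y) / n
        b -= 0.5 * (p - y).mean()

    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    print("predicted conversion, protected group: %.3f" % p[group].mean())
    print("predicted conversion, everyone else:   %.3f" % p[~group].mean())

The model only ever sees topics, but it reliably scores group members lower, and nothing in its two weights says why.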
Google’s Topics
API is intended to be a subset of third-party cookies, and of
course all of this is possible with third-party cookies instead of
topics tracking. And the problem is not that the company is choosing to
offer services to enable prejudiced advertisers to achieve their goals,
it’s that because of the need to serve advertisers at large scale,
advertisers with discriminatory goals will be served along with the
rest, in a hard-to-detect way. According to Google’s FAQ,
Chrome can and will take steps to avoid topics that might be
sensitive (i.e. race, sexual orientation, religion, etc.). However, it
is still possible that websites calling the API may combine or correlate
topics with other signals to infer sensitive information, outside of
intended use. Chrome will continue to investigate methods for reducing
this risk.
While they’re working on that investigation, best to leave this
feature turned off.
What to do about it
Most of the web ad revenue comes from a small fraction of the ads.
(It varies depending on how you measure but the 80/20 rule is
a good way to think about it.) If you’re in the 20% of people who either
get the best ads, or would have gotten them if you hadn’t protected yourself, then you
have more impact. And, in my humble opinion, an opportunity and
responsibility to help protect others by taking your own valuable
eyeballs off the market. After turning this stuff off—especially if
you’re the kind of person who would get targeted for the good ads—the
most important step is to either remove all third-party JavaScript and
iframes, or set the Permissions-Policy header on your web
site to protect your users. Simple list of what to
configure: Google Chrome ad
features checklist
Welcome search engine users. If you are trying to find out if the
Xumee USB Type C to 3.5mm Headphone and Charger Adapter
works on Linux, then I have good news for you. Works fine, no changes to
settings required.
Shows up as Headphones - KM_24bit96khz Audio on my system
(ThinkPad X270, Fedora 38, kernel 6.3.5) and sounds good with the
headphones and earbuds I have tried.
This is not a blog post. This is some links that did not make it into
any blog post on here yet, from the script that collects links to make
blog posts with. This is all worth reading, I just didn’t make the
links. (and of course the categories are a matter of opinion)
(This is a blog version of a thread on
federate.social, somewhat expanded and with more links
added.)
The more I learn about how web ads work, the less impact it looks
like Large Language Models (LLMs) (either big company ones, or the underground ones
that will do better than Big Tech can) will have on web
publishing.
The pessimistic case is that LLMs trained on original human-written
content will generate large quantities of derivative work and drive down
ad rates. But the problem with that theory is that web ad rates are
already being set by the lowest-effort sites. It’s not hard to find
big-budget brands running their ads on sites that are full of obvious
copies of legit works available elsewhere. One well-known large Internet
company sells both authorized e-books on a store app and
runs the ads on pirate e-book sites offering the same content.
I know I mention e-books a lot when I talk about ad-supported piracy,
but that’s because of my own media consumption habits. I have a pretty
good idea how to find out which publishers work with authors who write
books in English, but I don’t know anime. I’m also really weak on the
streaming rights to UK and European football (soccer), and I don’t know
Bollywood movies or the e-book market in languages other than English.
There are hella sites offering the same titles in all of those areas,
and I doubt they’re all legit, but my language and business knowledge is
limited so I don’t know. bonus
link about the book business and piracy, from Hugo-winning author
Charles Stross
Anyway, I have been checking a lot of these sites out, and getting to
the point where I’m starting to have an outline of the ad-supported
piracy racket. It’s one of those emergent crime networks that grows in
the cracks between different kinds of laws.
Step 1. This is the tricky part, and all the rest of
the crime stuff falls into place once this is working. Make an
advertising system that either hides or obfuscates the information on
where the ads ran.
It is possible for advertisers, or their agencies, to get the same
info I did, and get a better idea of how this problem affects a specific
brand or campaign. If you work with an up-to-date, honest agency, and
they use adtech partners who correctly implement the ads.txt, sellers.json, and
SupplyChain industry standards, then it’s possible to get a
trustworthy report on where the ads appeared.
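These standards are just text and JSON files at well-known URLs, so you can poke at them yourself. Here’s a sketch in Python, with a hypothetical publisher domain, that pulls a site’s ads.txt and lists who claims the right to sell its ad space:

    import urllib.request

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    site = "example.com"  # hypothetical publisher domain
    for raw in fetch("https://%s/ads.txt" % site).splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or "=" in line:          # skip blanks and variables
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:
            ad_system, seller_id, relationship = fields[:3]
            print(ad_system, seller_id, relationship)

Each (ad_system, seller_id) pair can then be looked up in that ad system’s sellers.json to see what company is actually getting paid, and SupplyChain objects in the bid requests are supposed to tie the whole path together.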
In practice, not only do many adtech intermediaries fail to offer a
tear sheet to the advertiser and agency, they actively obfuscate
much of the data that would reveal ad-supported piracy if it was widely
available. Some well-known adtech intermediaries throw huge amounts of
development work at hiding it. And they’re making it worse,
not better. (At least one of the systems intended to replace third-party
cookies seems to be
designed to continue the leakage of audience data from legit to
infringing sites.)
short digression: This is why ad agencies started in
the first place. If you’re a manufacturer in Connecticut making patent
medicine or barbed wire or whatever, you have no idea which of the
newspapers in the western USA are legit, and which are some no-good
outlaw who prints one copy of your ad just for the tearsheet, then
invoices you for reaching a whole town. So you hire an agency in Chicago
to buy ads for you and check that the newspapers are real. From just
handling the insertion orders, tearsheets, and invoices, the agency
business model eventually grew to include making the ads. Are there any
Western novels about an advertising agency media buyer who rides into
town to sort out the real newspaper from the cheating varmints sending
faked-up tearsheets?
Step 2: Get a hold of a hard drive full of
copyright-infringing content, make a pirate site, and either sign up
with a big company to get ads on it, or use a dark pool sales
house to hide your pirate site under the same ad ID as a bunch of
other sites.
Step 3: The pirate site is hidden from the
advertisers, so they won’t complain about paying full price for low
effort—but eventually the copyright holders will complain to the search
engines to try to get the pirate site kicked out.
Step 4: That’s where the pirate site points to their
“DMCA PolicY” and responds: we’re shocked to discover that “one
of our users” uploaded that copyrighted work “without permission.” The pirate site can claim a DMCA safe harbor as an
online service provider but it doesn’t look like the safe harbor
applies to the adtech firms or to the advertiser.
Step 5: Eventually the pirate site gets kicked out
of search engines, but that’s fine because they have already registered
more domains and copied the content over.
Step 6: This is the strange part. Even when the
search engine and the ad network are the same company, the pirate site
operator can keep their ad account while losing domains to copyright
complaints. I see the same ad accounts carrying over across 3 domains
that appear to have been either kicked out of search or pulled down for
copyright reasons.
The crime here is behind the scenes: a site pops up with all or most
of the same copyrighted content as the owner’s last site, but when they
get a DMCA takedown, they blame it on some “user upload.” If, as the
copyright holder or advertiser, you could find the site, then you could
probably figure out what they’re up to, but the way that web ads work is
designed to make it hard to see where they went.
Example from a pirate ebook site, showing a book listing and the
footer ad. (This is one of those “confidential”
sites we hear so much about.)
Meanwhile, the advertiser could be insisting on accurate reporting,
but most advertisers don’t do this, and nobody at the agency wants to
point out mistakes being made by their client (even if the agency is
honest and not somehow in on the racket). Anyway, LLMs are likely not
going to be transformative here, more of a force multiplier for an
existing business model.
Ad-Funded
Piracy’s ‘Biggest Open Secret’ Revealed By Researcher by Andy
Maxwell. When we talk about piracy that’s supported by programmatic
display / videos ads, we’re often talking about traffic laundering. In a
laundering scheme, there are two classes of domains involved… More
details on the business model. That’s where another unusual feature
becomes evident; there appears to be little or no effort to obscure the
name of the company or people involved.
(Update 17 Nov 2024: add links to EU and California laws that
will affect Big AI firms)
(Update 30 Aug 2023 and after: add links to Related
section)
(Update 14 May 2023: add meme and attention span
paragraph.)
Read the whole thing: “We
Have No Moat, And Neither Does OpenAI”. To me, it looks more
interesting than that. Big companies have sustainable anti-moats in
generative AI. Independent and underground operations can do generative
AI in ways that a big company can’t.
give a programmer in the USA some code to bypass a DRM
system
insult the monarch of Thailand
give a user in Germany an article that denies a historic
genocide
A Big Tech company doesn’t know which users are going to report them
for which crimes, so they have to try to keep everything compliant, or
compliant-looking, for all users worldwide. An independent operation
only has to comply with the law in their own jurisdiction. (Personally,
I would not run any generative AI service, at any scale, in a country
where the blasphemy law carries the death penalty, but that’s just me.)
Meanwhile, if you’re running underground generative AI in the USA, you
can probably get away with breaking a bunch of laws. Section
230 is your friend. Copy a trick from ad-supported pirate
sites, which use a different immunity law (DMCA) to keep themselves
out of trouble. Post the generative AI content from regular user
accounts, and if anyone complains, play either the Section 230 card or
the DMCA card and blame it on a user. (This is one of the ways YouTube
got
videos for their site when they were getting started. Jawed,
please stop putting stolen videos on the site. We’re going to have a
tough time defending the fact that we’re not liable for the copyrighted
material on the site because we didn’t put it up when one of the
co-founders is blatantly stealing content from other sites and trying to
get everyone to see it.)
Second, training sets. Big companies keep records on how they use
copyrighted works, and underground operators have ever-bigger hard
drives full of material that they, or their friends, have been
collecting since the 1990s. A
new California AI Law Mandates Disclosure of GenAI Training Data.
After big companies have to clean up their training sets under the harsh
glare of legal threats, underground AI projects can still
just torrent a big pile of copyrighted books, or use various tricks
to grab content from sites that have an AI Opt-Out.
Copyright cases will affect Big Tech but not underground AI, which can
use the same tricks for getting away with infringement that the regular
ad-supported pirate sites already do.
Underground AI can also get training data from unsuspecting, unpaid
users, like GPT-4chan did.
Gamers and forum users are training underground AI right now by
interacting with bots that collect and respond to their activity. And
how do you tell if the operator of the forum or game server is running
the bots? Or are bots training on services offered by others? Or is the
forum or game using the Section 230 “we’re shocked to discover what one of our users was doing” trick?
Meme version of this blog
post
Big Tech companies are law-abiding enough that they have to invest
heavily in filters on generative AI—but they do enough crimes that they
have to collaborate with the expectation that their email, commit
messages and chat will eventually become public (there is corporate
training about how to email about crimes and have it covered by
attorney-client privilege but it turns out not to work as well as
they thought it would.) Big Tech AI projects are being watched by
regulators, competitors, employees who are going to be working for a
competitor next week, wannabe whistleblowers, independent
researchers…and soon will have to open up a bunch of records to
plaintiffs’ lawyers. That’s a combination that will lead to chilling
effects on internal communications that underground AI operations don’t
have. (Yes, individual underground AI rings will get busted, but they’re
compartmentalized.)
And I almost forgot: the rent extraction and short attention span
problems. Building a service on a Big Tech API is like opening a
restaurant in a building owned by a San Francisco landlord: if you fail,
you fail. If you succeed, they raise the rent on you. And, unlike
commercial real estate leases, if you build on a Big Tech API, the
landlord can kick you out at any time just because they get
bored. Or, more likely, because the product manager nailed their OKRs,
got promoted, and now the service is in the hands of someone who knows
that you can’t buy a house in Mountain View, California just by keeping
an existing service from crashing. Steve
Yegge explains it best.
The generative AI frenzy was rushed out in response to the 2022 Big Tech
union organizing wins, like fast
food robots were demoed before they were ready in response to “Fight for
$15” a decade ago. But somehow it doesn’t seem that they thought it
through very well. That’s probably a cultural problem, just because the
instinct to do union-busting regardless of longer-term consequences is
selected for so heavily. Anyway, AI is not going to be an advantage for
the big incumbent companies that are still in the awkward halfway
crook space.
Hype, Sustainability, and
the Price of the Bigger-is-Better Paradigm in AI. The problem with
the bigger-is-better approach is not simply that it is inaccessible. It
also does not consistently produce the best models, and, after a given
point, even shows diminishing returns. On many tasks, benchmark
performance as a function of scale tends to saturate after a certain
point (see Figure 2). In addition, there is as much variability in model
performance between models within a similar size class as there is
between models of different sizes.
Everything I’ve
learned so far about running local LLMs. I’m wary of vendor
lock-in, having experienced the rug pulled out from under me by services
shutting down, changing, or otherwise dropping my use case. I want the
option to continue, even if it means changing providers. So for a couple
of years I’d ignored LLMs. The “closed” models, accessible only as a
service, have the classic lock-in problem, including silent degradation.
That changed when I learned I can run models close to the
state-of-the-art on my own hardware—the exact opposite of vendor
lock-in.
Ask HN: Is it
just me or GPT-4’s quality has significantly deteriorated lately? I saw the nerfing of GPT in real time: one day it was giving me great
book summaries, the next one it said that it couldn’t do it due to
copyright. I actually called it in a comment several months ago:
copyright and other forms of control would make GPT dumb in the long
run. We need an open source frontier less version.
It’s
infuriatingly hard to understand how closed models train on their
input. These models aren’t yet anywhere near as capable as GPT-4,
and claims that they compete with ChatGPT’s gpt-3.5-turbo mostly don’t
hold up to deeper scrutiny. But… they’re pretty good—and they’re getting
better at an impressive rate. And since you can run them on your own
instances, they remove all possible concerns about what happens to the
data that you pipe through them.
Will
Open Source AI Shift Power from ‘Big Tech’? It Depends. The arc
of open source development in software has taught us this much: the
existence of open alternatives helps shape and constrain the power of
market players, but doesn’t necessarily dictate a more diverse,
competitive market structure.
Model
alignment protects against accidental harms, not intentional ones. Even if there is draconian regulation to prohibit the release of new
open models, and global cooperation to enforce such regulations, the
open models that have already been released are more than sufficient for
harmful applications like disinformation and scams.
Phi-2:
The surprising power of small language models - Microsoft Research. We are now releasing Phi-2, a 2.7 billion-parameter language model
that demonstrates outstanding reasoning and language understanding
capabilities, showcasing state-of-the-art performance among base
language models with less than 13 billion parameters. On complex
benchmarks Phi-2 matches or outperforms models up to 25x larger, thanks
to new innovations in model scaling and training data curation.
Founder
of Neo-Nazi Group the Base Instructs Followers to Use ‘Uncensored’
AI. The news shows how AI models and AI internet platforms with
less restrictions compared to market leaders like OpenAI are making it
increasingly easier for anyone to generate any kind of content, and that
leaders of extremist movements have considered how it might be useful
for them.
Microsoft
Deleted Its LLM Because It Didn’t Get a Safety Test, But Now It’s
Everywhere. Last week, Microsoft researchers released WizardLM 2,
which it claimed is one of the most powerful open source large language
models to date. Then it deleted the model from the internet a few hours
later because, as The Information reported, it “accidentally missed”
required “toxicity testing” before it was released.
Why
I Use Open Weights LLMs Locally and 50+
Open-Source Options for Running LLMs Locally by Vince Lam. You’ve
likely noticed that proprietary models are highly censored, limited, and
biased due to alignment — to reduce harmful and toxic responses….Now
this is not an inherently bad, but sometimes the LLM may refuse to
fulfil a task even if it seems reasonable, or it may simply not infer on
certain topics. and By using mostly free models and occasionally
switching to GPT-4, my monthly expenses dropped from 20 USD to 0.50
USD.
Forget
ChatGPT: why researchers now run small AIs on their laptops. First, organizations are making open weights versions of LLMs,
in which the weights and biases used to train a model are publicly
available, so that users can download and run them locally, if they have
the computing power. Second, technology firms are making scaled-down
versions that can be run on consumer hardware — and that rival the
performance of older, larger models. Researchers might use such tools to
save money, protect the confidentiality of patients or corporations, or
ensure reproducibility.
Chinese
AI startup DeepSeek unveils open-source model to rival OpenAI o1 –
Computerworld. With its MIT license and ultra-low costs, DeepSeek
could be an appealing and cost-effective option for enterprise adoption.
Chinese AI developer DeepSeek has unveiled an open-source version of its
reasoning model, DeepSeek-R1, featuring 671 billion parameters and
claiming performance superior to OpenAI’s o1 on key benchmarks.
Copyright
reform is necessary for national security - Anna’s Blog. Virtually
all major companies building LLMs contacted us to train on our data.
Most (but not all!) US-based companies reconsidered once they realized
the illegal nature of our work. By contrast, Chinese firms have
enthusiastically embraced our collection, apparently untroubled by its
legality….We have given high-speed access to about 30 companies. Most of
them are LLM companies, and some are data brokers, who will resell our
collection. Most are Chinese, though we’ve also worked with companies
from the US, Europe, Russia, South Korea, and Japan.
Ollama - NSHipster. Ollama is the easiest way to run large language models on your Mac.
You can think of it as “Docker for LLMs” - a way to pull, run, and
manage AI models as easily as containers.
The
DeepSeek Effect: Some Large Cloud Customers Slow Down AI Spending as
Prices Drop — The Information. Thomson Reuters released
conversational AI in its tax and legal software products in late 2023 to
help customers draft legal documents and answer questions about tax
laws. But its overall spending on AI from cloud providers like Microsoft
and Google has been roughly flat since then, thanks to the
falling price of the models powering the features, which include those
made by Meta Platforms, Google and DeepSeek, according to Chief
Technology Officer Joel Hron. If anything, the gap has shrunk between
the open-source models and what OpenAI or Anthropic are able to do,
Hron said. And with that, there’s been a precipitous drop in
cost.
Three recent developments that I’m still trying to figure out.
The
FBI Says You Need to Use an Ad Blocker on Google and Bing.
According to the Internet Crime Complaint Center, criminals are using
ads in search engine results like Google and Bing to impersonate brands.
These ads send unsuspecting users off to phony websites that look
identical to the pages people are actually searching for, where they are
then subjected to ransomware or phishing attacks. The Bureau says an
ad blocker can help.
FTC
Issues Orders to Social Media and Video Streaming Platforms Regarding
Efforts to Address Surge in Advertising for Fraudulent Products and
Scams. With fraud on social media surging, the Federal Trade
Commission has issued orders to eight social media and video streaming
platforms seeking information on how these companies scrutinize and
restrict paid commercial advertising that is deceptive or exposes
consumers to fraudulent health-care products, financial scams,
counterfeit and fake goods, or other fraud. (also, from the FTC last
year: Social
media a gold mine for scammers in 2021.)
Update 29 Sep 2023: Consumer Surveillance and
Financial Fraud | NBER A study on the impact of Apple App Tracking
Transparency on fraud complaints as recorded by the FTC: We estimate
that the reduction in tracking reduces money lost in all complaints by
4.7% and money lost reported in internet and data security complaints by
40.1%.
The old-school view of advertising is that it helps you tell legit
sellers and deceptive sellers apart. It’s an economic
signal. Rory
Sutherland said, You could argue that advertising is a display of
commercial fitness, that if you were intending purely to make short
term, quick bucks, you’d simply sell via other means, and it only pays
to advertise if you’re planning to establish a long term reputation for
quality and probity. For example, legit sellers and deceptive
sellers are equally good at cold calls, so a cold call is not a good way
to tell a legit seller and a deceptive seller apart. Someone who studies
the behavioral economics of advertising would probably say that a cold
call carries no signal, so it’s not worth your time to take it. Ignore
cold calls, but pay attention to ad media that can carry
an economic signal.
So, can we divide ad media into signal-carrying ads—usually the ones
that make it easy for enforcers of laws and business norms to see the
same ads that less well-informed users get—and ads that don’t carry
signal, which are safe to ignore? Maybe not. Now it’s becoming clear
that the money spent on surveillance advertising displaces investments
in other areas. So seeing a surveillance ad means that it’s
less likely to be a win-win offer than a product you discover
in some other way.
I’m not sure what to do with this info other than take it into
account in customer research. Is the absence of a search or social ad,
probably in combination with other information, becoming part of a
useful heuristic for selecting a quality product for some buyers?
Is
Microtargeting Giving the Digital Advertising Sector a Bad Rap? by
Susie Stulz. It’s a problem when journalists at prominent
publications warn consumers not to purchase items they see advertised
online because the products are likely to be overpriced and kind of
suck.
I need to start with a disclaimer. Privacy tools will not get you
privacy, but it’s important to use them anyway.
Most privacy tools and settings make you different from others,
so more fingerprintable (hi, Linux nerds)
Out of band surveillance, which privacy tools can’t touch, is
still there (and the more you participate in the modern economy, the
more you’re surveilled)
the more you try to protect yourself, the more crappy ads you
get. How long can you stand getting the miracle cure ointment ad
featuring the “before” picture—before you give up and try to get tracked
so you get more legit ads?
By itself, no set of privacy tools is going to be able to get you to
an acceptable level of privacy. If that’s what you came for, you can
split, go get a coffee or something. Privacy tools are worth doing, but
only as part of a cooperative program to address a much larger set of
problems.
Surveillance also puts us at greater risk from internal strife and
external adversaries. Bob Hoffman told
the European Parliament, Tracking is also a national security
threat. The Congress of the United States has asked U.S. intelligence
agencies to study how information gleaned from online data collection
may be used by hostile foreign governments. (More on that: Microtargeting
as Information Warfare by Prof. Jessica Dawson of the Army Cyber
Institute) All right, all right, fine, you came for open source privacy
tools advice so I will include some. We’ll do some fun tips, but I will
try to put them in context. Here’s number one.
Privacy tip: fix YouTube
This one is an easy example of a tracking risk. The video you came to
watch is great, but the stuff that gets suggested after that is the
first step down the rat hole. A case
before the Supreme Court right now covers YouTube running ISIS
recruiting videos. And one of their biggest stars was just arrested in
Romania for human trafficking. The recommendation algorithm on there
will take you to some pretty dark places, pretty fast. But there are
also videos on there you have to watch for school, or work.
What I want is a YouTube setup that will not just protect my privacy,
but also reduce the number of YouTube impressions I generate. (Also I
want to visit my family members at college or at a legit job, not at
a terrorist training camp.)
This requires two browser extensions. First, use the Multi-Account
Containers extension to put YouTube in a separate browser container.
That’s a separate space with its own logins, storage, and cookies.
Make a YouTube container
Click the extension, select “Manage Containers” and make a new
container for YouTube. Then go to YouTube and choose “always open this
site in container.”
Enable “Theater Mode” to make the main video wider (and fill in the
space left by related videos)
All your other least favorite features of YouTube can also be
hidden.
Turn off growth hacking features of
YouTube
Don’t use the native mobile app for YouTube. There are also alternate
front ends for YouTube and other services. LibRedirect is an extension to
automatically redirect to those services. But they aren’t as
future-proof, because it’s easier for YouTube to break them.
Privacy violations as part of a larger problem
Privacy is downstream from other problems. Surveillance capitalism
isn’t a thing because advertisers care about your personal data. If you
do a “Right to Know,” most of what you get is wrong. According to Oracle,
I am a soccer mom of four.
I hate to break it to you, but those startups that tell you that you
can make money selling your personal info to advertisers are wrong. Your
data has value as a short-term intermediate reaction product, as an
input to two kinds of deceptive practices.
First of all, the B2C scams: ad targeting lets the legit advertisers
show their ads to the affluent users who can afford stuff. And they
typically target
younger people. Targeting generally cuts off at age 34 or 54. So
what do social media platforms do with all the ad slots that the legit
advertisers don’t buy, and that go to older and poorer people? Right,
they get the not so legit ads.
I haven’t been doing this. Will this be
my last SCALE?
Second is the B2B scam: tricking advertisers into sponsoring content
that they would choose to avoid, such as Russian
disinformation and copyright
infringement. User tracking lets ad intermediaries sell an ad
impression that claims to reach a high-value user, but on content that
no reasonable advertiser would choose to support.
Hey, look, it’s a real car ad on a scraper site.
NewYork News Times, real legit looking
site here people
From buying a gallon of milk to making a dinner reservation, all the
way up the chain to electoral malfeasance, so many of our interactions
feel weighted away from social connection in favor of extracting every
ounce of unfair advantage from every single human activity. Not to
overstate it, but a pluralistic democracy simply cannot function when
most of its citizens cannot trust that the arrangements that they rely
on to meet their basic needs are roughly fair.
To put the privacy problem in context, privacy violations as we see
them are downstream from changes in business norms. We’re pulling on one
tentacle of a much bigger threat than just the obvious personal privacy
issues such as swatting, identity theft, price discrimination, and
investment scams.
The relative payoffs of production and predation (or “making” versus
“taking”) are determined by legal mechanisms for enforcing contracts and
protecting property rights, but also by social norms and interpersonal
trust. — Stephen
Knack
The more common that scams become in our markets and society, the
more trust we burn, the more unnecessary risk we take on, and the more
wealth we miss out on. This is not an advertising problem, as Bob
Hoffman said—it’s a surveillance problem. Some kinds of advertising
can have economic signaling value that does help build trust, but only
if the medium is set up for it. (Fixing digital ads to pull their weight
and fulfill the signaling role of advertising is a hard problem, and I
should bug
Rory Sutherland some more about how to solve it.)
Privacy tip: mobile apps
Native apps can track you in ways that web sites on a well-behaved
browser can’t. In-app browsers inject JavaScript. Apps contain tracking
SDKs. Privacy filters are limited.
Banning TikTok is a start, but all apps containing the TikTok SDK are
TikTok, as far as harvesting your info goes. And other apps and/or SDKs
could be sharing your info with other people you don’t want it shared
with. There are a few ways to deal with surveillance apps.
Delete your account
Switch to using the service on the web
Limit your use of the app (For example, make a habit of checking
Signal before you check surveillance apps, so people learn that it’s a
better way to reach you.)
Remove all the surveillance apps from your phone that you can. This
doesn’t necessarily mean cut off people who you can only reach on a
surveillance app.
What do you think? The more you tell, the
more you sell, right?
Social connections mean you live longer, so apps required to
communicate with friends and family are worth spending some of your
personal “privacy risk budget” on. In general I accept some risks on
social connections and collaborating, but I try to be stricter when it
comes to shopping and entertainment.
Good news: privacy is on the way up
Right now we’re in the middle of some positive privacy trends: more
effective enforcement of existing privacy laws and regulations, more
interest in new ones, and some software improvements that make some
kinds of tracking harder. Here in California, some recent news
includes
the Sephora case, which makes it clear that common surveillance
marketing practices are sales
under the CCPA
the Attorney General’s office has been clarifying rules on Global
Privacy Control, and doing a sweep
of law-breaking apps
We have some advantages if we sound like we know what we’re doing.
Surveillance and the attitudes underlying it are not changing right
away, but you can shift things if you approach each action with
something easy for the other end to do, and act like you know what
you’re doing. They’re going to balance the time required to act on your privacy mail against the risk of not acting on it, and if the risk looks big enough, it’s easier for them to just do the right thing.
screenshot of “watchdog groups will test
the law”
Enough of those and we shift the relative expected payouts of
surveillance and non-surveillance investments. (Like RCRA got many
companies to just cut down on hazmat instead of dealing with the
required record-keeping.)
Privacy tip: mobile phone settings
Check your phone settings. On Apple iOS there are
two settings for surveillance ads: one for most companies’ ads, and one
for Apple’s own ads. Don’t forget to check both. (Yes, you probably have
to scroll down for the second one. Well played, Apple.)
In Settings, go to Privacy &
Security, then Tracking, and make sure
Allow Apps to Request to Track is turned off.
Also in Settings under Privacy &
Security, find Apple Advertising and make sure that
Personalized Ads is turned off.
On Android, you can open Settings, go to
Privacy, then Ads and select Delete
advertising ID.
If you get these settings right, your mobile app ads will get really
crappy, really fast, but you’re limiting mobile time and maximizing web
time, right?
General principle: trying to herd the money away from the worst
places.
We said we would try to retrain the surveillance business.
Address the worst practices first. Doing the easy stuff first can
create the wrong incentives. That’s why I put YouTube first here. Then
level up your privacy skills and toolset.
Level 1: mix of effective and ineffective privacy practices and
tools
Level 2: effective privacy practices and tools
Level 3: effective practices and tools applied in an effective
way
Remember you are not going to be able to get individual protection
that’s meaningful while still participating in society. It’s more about
driving transformation.
Privacy tip: Google Chrome
This is a big year for Google Chrome.
Manifest v3: coming next year, will limit the
ability of content blocking extensions to block dynamically. On the
Google Chrome browser, ad blockers, along with tracking protection tools
that work in a similar way to ad blockers, will soon be limited
in what they can do. If your chosen privacy tools and settings are
not going to be supported, you might have to switch
browsers. Browser compatibility has gotten a lot better recently, so
if you switched because a site you like was broken on your old browser,
please check it again. (This may or may not affect other
Chromium-based browsers.)
Privacy Sandbox: a variety of projects,
including a mix of some actual privacy features, one big on-browser ad
auction (an ambitious project) and some anti-competitive shenanigans. It
looks like 2023 will be the summer of double ad JavaScript on Google
Chrome: you’ll still have the old cookie-based stuff, but you’ll also
have an experimental ad auction running inside the browser.
Because Google both competes as an adtech intermediary and
releases a browser, antitrust concerns mean they have to try to make the
browser fair to all the other adtech intermediary companies they compete
with. This is now the subject of an ongoing investigation by the
Competition and Markets Authority in the UK. James Rosewell, CEO and
founder of a mobile data company called 51Degrees, started
a long, complicated process.
If you do decide to keep Google Chrome, there is a bunch of brouhaha
about the impending end of third-party cookies, but you can turn
them off today without breaking much, if anything. (Sites already have
to support browsers that don’t do third-party cookies.) From the
Ad Contrarian newsletter:
Open the Chrome browser. Click the three dot thing in the upper
right corner.
Click “Settings”
In the left column click “Privacy and security”
Click “Cookies and other site data”
Click “Block third-party cookies”
Google Chrome also has new in-browser advertising features,
confusingly lumped together as Privacy Sandbox. Check chrome://settings/privacySandbox.
Topics API preferences
The text in this settings screen is not especially helpful. Google’s
“Topics API” is a general-purpose system to categorize users based on
sites you visit. There’s nothing about it that’s limited to ads. Will
probably be more useful for price discrimination, and worse, helps incentivize deceptive
practices to drive web traffic. (Looks like that message is in
open-source Chromium, so I filed an issue.) Anyway, if you have Google Chrome, turn this off.
Update 17 Jul 2023: If you have a web site, or if
you administer desktop systems with Google Chrome on them, you can do a
few more steps to protect others: Google Chrome ad
features checklist
The attribution tracking chain
This is an oversimplified chart, but it’s a good start for learning
about the “attribution tracking” chain and how to break it.
If you break the link between surveillance data on what you buy and
surveillance data on what ads you saw, then it’s harder to justify
investments in surveillance advertising— remember, we’re trying to move
money from surveillance to other investments here.
That’s the main reason why you have YouTube on a browser container
that is never used for anything else. It interferes with the attribution
link between a video view and a sale.
Privacy tip: remember to vote
California has the CPRA because people voted for Proposition 24 in
2020. The CPRA isn’t perfect, but voting made a difference. While you’re
voting, please don’t eliminate a candidate from consideration just
because they’re using the big surveillance platforms. They’re hard to
avoid completely. In today’s environment it’s generally better to make a
little progress than to achieve privacy purity but lose the actual
election.
Breaking the chain: where?
Once you have a mental model of the attribution chain you can
allocate your time most effectively.
Software tools are usually lower effort but shorter range: they
can protect an activity like a web session but can’t reach out to affect
server-to-server communications.
Legal tools can reach out further, but are higher effort. And
companies don’t always comply.
Different surveillance threats can be addressed at different levels.
In general, earlier and more automated is easier, but legal tools have a
longer reach, since they can touch systems that you don’t have a network
connection to, and that automated tools can’t see.
Here’s a list of places to make a difference, from earliest to
latest.
Don’t do a trackable activity (example: delete a surveillance
app, don’t visit a surveilled location)
Don’t send tracking info (example: block tracking
scripts)
Send tracking info that is hard to link to your real info
(examples: use an auto-generated email address system like Firefox
Relay, or churn tracking cookies with Cookie AutoDelete)
Object, exercise your right to delete, or opt out later, after
data has been collected but before you are targeted (CCPA Authorized
Agents, RtD automation tools like Mine)
Privacy tip: CCPA script
Delete your info from the largest surveillance
firms. Here’s a partial list. Later on we’ll cover how to make
a personalized list based on who has info on you.
The sample letters, as templates, are in this privacy-docs Git
repository.
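The letters are plain text, so they’re easy to script. Here’s a sketch, with placeholder wording of my own rather than the exact text from the repository:

    from string import Template

    # Placeholder letter text, not the exact wording from privacy-docs.
    LETTER = Template("""To: $company
    Under the California Consumer Privacy Act, I request that you delete
    my personal information and direct your service providers to do the
    same.
    Name: $name
    Email: $email
    """)

    for company in ["Example Data Broker, Inc.", "Example Ad Platform LLC"]:
        print(LETTER.substitute(company=company, name="Jane Doe",
                                email="jane@example.com"))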
Teaching the escalation path
If you’re protected from a company by automated tooling, you can
mostly relax. (This applies to most of the companies that can only track
you by an identifier stored in a third-party cookie. Turn off
third-party cookies and you’re fine.)
If they accept GPC, that’s almost as good. Remember that GPC applies to
all uses of your information, not just the current session.
Manual opt outs are where things start to get time-consuming. Most
opt outs are either not compliant, or take advantage of loopholes in the regs to
make you do extra work.
Be patient. Right now there are way more companies that have
your personal info than companies that have a qualified CCPA
manager. Be patient, and remember, any individual request might
end quickly or turn into a long thread. The object is to change the
expected payoffs for investments in surveillance relative to other
investments.
Privacy tip: keep your Facebook account
Why does it matter if a company is sending my info to Facebook if I
never log in to Facebook (or related sites)?
Just as the lowest point on a toxic waste site is generally a good
place for a sampling well to check the progress of remediation, Facebook
is a good place to sample for your personal info. It’s a low point to
which most of the firms who have your info will eventually share it.
Remember, get a Facebook account, but don’t install the mobile
app.
click on your face
select Settings and Privacy from the menu, then Settings
You’re probably going to get a lot of these. Let’s have a look at a
few.
Facebook audience examples
First is a store where I actually bought something. Facebook’s
biggest advertiser at one point.
Then a DTC retailer that I’ve never heard of. Who sold them a list of
“hot prospects?” I remember when this list used to be all car dealers.
Then two surveillance marketing companies—they can get a Right to
Delete.
Finally another retailer, wait a minute, I haven’t bought from this
one either. The closest I have come to this one is the year that they
had the LinuxWorld conference and the Talbots managers meeting at the
same convention center in Boston. Remember, surveillance marketing and
scam culture are two overlapping scenes—you can often see when a legit
company has been sold a customer list that isn’t. (maybe the scammer who
got the car dealers has moved on to retailers?)
Facebook health ad
Of course, while you’re there, building a list of who to send opt
outs to, don’t buy anything from the ads.
Privacy tip: Browser checkup
Remove extensions you aren’t sure about. A lot of spyware and adware
gets through.
Run the EFF checkup tool,
“Cover your Tracks”. Sad but true: the more custom your Linux setup
is, the more fingerprintable you are. Technical protections won’t cover you; browser protection needs to combine the technical and the legal.
Turn on Global Privacy Control. This will automate
your California “do not sell” requests for sites you visit. Still not
supported everywhere, but will have more effect as more companies come
into compliance and more jurisdictions require companies to support
GPC.
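GPC isn’t magic on the wire: it’s a Sec-GPC: 1 request header, plus a navigator.globalPrivacyControl flag for scripts. For the curious, here’s a minimal sketch of the receiving end, as a Python WSGI app:

    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # WSGI exposes the Sec-GPC request header as HTTP_SEC_GPC.
        opted_out = environ.get("HTTP_SEC_GPC") == "1"
        # A real site would skip its "sale or sharing" code paths here.
        body = b"GPC received\n" if opted_out else b"no GPC signal\n"
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()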
FBI Public Service Announcement on search advertising
FBI scam protection tips
It all comes back to scam culture. Be warned when looking for an
ad blocker on browser extension directories. A lot of them are
spyware or malware.
Some privacy extensions I use include:
Browser extensions part 1
ClearURLs
to remove tracking parameters from URLs, and often speed up browsing by
skipping a redirect that’s just there for tracking. (There’s a sketch of what this does after this list.)
Cookie
AutoDelete. Cleans up cookies after leaving a site. Not for
everyone—it does create a little extra work for you by making you log in
more often and/or manage the list of sites that can set persistent
cookies. But it does let you click agree with less worry since
the cookie you agreed to is going to be deleted.
NJS. This
minimal JavaScript disable/enable button can be a good way to remove
some intrusive data collection and growth hacking on sites where the
real content works without JavaScript.
Personal
Blocklist is surprisingly handy for removing domains that are heavy
on annoyances and surveillance, but weak on actual information, from
search results.
Privacy Badger blocks
tracking scripts and will also turn on Global Privacy Control for you,
by default. This will not have much of an impact right away but will
start to do more and more as more companies come into compliance. (More
companies are required to comply with California privacy law than there
are people who understand how to comply with California privacy
law.)
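Here’s the ClearURLs idea from above in a few lines of Python. The parameter list is a small sample of my own; the extension’s real rule set is much longer and updated over time:

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    # A small sample of common tracking parameters.
    TRACKING = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                "utm_content", "gclid", "fbclid", "mc_eid"}

    def strip_tracking(url):
        parts = urlsplit(url)
        clean = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                 if k.lower() not in TRACKING]
        return urlunsplit(parts._replace(query=urlencode(clean)))

    print(strip_tracking("https://example.com/page?id=7&utm_source=news&fbclid=x"))
    # prints https://example.com/page?id=7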
Privacy tip: Authorized Agent and related services
These are relatively new, still getting more and more effective as
more companies come into compliance.
Mine (not an authorized agent
service, an automation tool for your own opt outs)
If you have tried an authorized agent service in the past and gotten
nowhere, try again. Because of the Sephora case last year and the
enforcement sweep this year, companies are staffing up. And hey, if you
have made it this far, you could probably qualify for one of those
privacy manager jobs.
Most of us have built up a collection of privacy tools, including
browser extensions, settings, and services. But privacy threats keep
changing, while at the same time new kinds of tools and services have
become available. This talk will help maximize the value of your privacy
toolset for today’s best options, while helping you plan for the
future.
Not only can you protect yourself as an individual, but your choices
can help drive future investments away from surveillance into more
productive areas. (Surveillance marketers and their investors think they
can train us – but with the right tools we can train them right
back.)
Today most of us are at level 1 or 2 on privacy.
Level 1 You do something about privacy and take a
mix of effective and ineffective actions
Level 2 You take mostly effective actions, but don’t
allocate your time and resources for maximum effect
Level 3 You take effective actions, efficiently
selected and prioritized
Ready to level up? Now that California law codifies our right to
check out how our personal information is shared, that means we have an
opportunity to optimize our privacy toolkits and habits, and focus where
it counts. We’ll cover:
Ad blocker myths and facts, and why the surveillance marketing
business loves some ad blockers
The most important privacy extensions for most people (and
they’re not what you’d think)
Corporate surveillance about you that never touches your device,
and how to reach out and block it
Don’t use the law on a problem that a tool can solve faster, but
don’t try to stretch a tool to solve a problem that needs the power of
the law
Where to add extra protection for special cases
Each of your individual privacy choices has a bigger impact than just
the protection that it provides for you. In the long run, your real
impact will be not so much in how you’re protected as an individual, but
in how you help drive future investments away from surveillance and
toward more constructive projects.
It seems like everybody wants to talk about how this new shiny thing will affect web publishing and advertising. What’s going to happen to ad
revenue for human-authored web sites, when Large Language Models can
generate large quantities of text, and ads will run on that instead?
I wish I could come up with a really futuristic answer, but I’m going
to be boring here, and just say LLMs will mean more of the same. Web ads
are already running on plain
old copied content, and LLM-generated content will fill the same
role. Web ad intermediaries are choosing to look the other way on
infringing sites, even when a site looks hella sketchy. They’ll do the
same with placing ads on sites full of LLM-generated content.
Here’s a short conversation with ChatGPT, in which I ask about a
confidential site with the name of a well-known trademarked
fictional character in the domain name. The site is running ads from a
well-known company. This is a totally unedited ChatGPT session (except
for redacting the domain name and putting my text in bold.)
Hi, do you think that [DOMAIN REDACTED] sounds like a legal
web site?
No, the website “[DOMAIN REDACTED]” likely does not offer legally
obtained or licensed audio books of the “Harry Potter” series. It is
important to be cautious when downloading or streaming copyrighted
material from websites, as doing so may be illegal and could result in
criminal or civil penalties.
I would recommend purchasing audio books from reputable and
authorized sources to ensure that you are obtaining legally obtained and
licensed material. Additionally, supporting the creators and rights
holders by purchasing their works is important for ensuring that they
can continue producing new content in the future.
So, should legit businesses run web ads on [DOMAIN REDACTED]
or is that not a good idea?
It would not be a good idea for legitimate businesses to run ads on
“[DOMAIN REDACTED],” as this website likely offers copyrighted material
illegally and could be engaging in piracy. Advertising on websites that
promote or facilitate illegal activity can harm a business’s reputation
and potentially lead to legal consequences.
Additionally, supporting websites that engage in piracy by running
ads on them sends a negative message about a business’s values and could
discourage customers from doing business with them.
It is important for businesses to ensure that their advertising
partners and the websites on which they run ads align with their values
and comply with the law. By doing so, they can help protect their
reputation and avoid potential legal and ethical issues.
Thank you for your help, you explained this
well.
You’re welcome! I’m glad I could help. If you have any more
questions, feel free to ask!
The current hot Internet trend is large language
models (LLMs). Before that it was NFTs, and before that it was
eating Tide Pods. I know that these things go in cycles, but perhaps
this hype cycle has moved on too quickly, and there may be some
advantages to the previous generation. Let’s compare.
Availability
Tide Pods are available from many sources including online
retailers, supermarkets, and convenience stores.
The health
effects of eating Tide Pods are already well understood. Poison
control center staff members are already trained in this area, and are
reachable quickly with a convenient toll-free call.
Fixing the results of actions taken based on LLM output could
require specialized knowledge that presents challenges in training
and/or hiring.
Advantage: Tide Pods
Time required
Tide Pods can be eaten in seconds, with or without
chewing.
LLMs can produce text faster than most humans can read it, so may
represent an open-ended time commitment.
Advantage: Tide Pods
Repurposing
After you have learned not to eat Tide Pods, the rest of the pods
in the package are useful for laundry.
Clearly useful applications of LLMs remain to be
discovered.
Advantage: Tide Pods
Educational value
Eating Tide Pods helps teach you not to believe everything you
find on the Internet.
LLMs help teach you not to believe everything you find on the
Internet.
Advantage: (tie)
This objective evaluation shows that organizations considering
investing in LLMs should also consider, as an alternative, eating Tide
Pods, which may be a technical direction that is faster and less costly
to evaluate. Winding down from an investment in a hot technology
is always challenging, but will likely be more manageable for
organizations that did not buy in at the top—where LLMs are today
and NFTs were last year—but instead consider the lessons to be drawn
from organizations that are now ending NFT projects, and increase the
ratio of learning to both cost and risk by choosing a better
characterized technology.
Don’t eat Tide Pods. Don’t believe everything you read on the
Internet. Don’t eat Tide Pods. (also, remove the AI results
from Google.)
Starting with some good news. There are several versions of the Survivability
Onion but most appear to be US government work and so not
copyrighted. I’m going to borrow it because it looks like a good
starting point for setting priorities for designing a privacy tools
and services stack. Yes, in the long run, the real impact of individual
privacy measures will be not so much in how you’re protected as an
individual, but in how you help drive future investments away from
surveillance and toward more constructive projects.
It would be good to get more privacy people leveled up:
Level 1 mix of effective and ineffective actions
Level 2 effective actions, but applied haphazardly
(this is about where
I am now)
Level 3 effective actions, efficiently selected and
applied
If you want privacy, prepare for surveillance? All right, onion
time.
The separate and independent “layers” of functions, which the threat has
to “penetrate” to kill the system in a typical engagement, are most
often represented mathematically by independent probabilities; thus, the
overall probability of survival is the product of the independent
component probabilities.
Since you have limited resources when designing an armored vehicle or
whatever, you can apply your limited weight and money budgets to the
most effective combinations of layers. The object is to maximize the
probability of survival, which means minimizing the probability of a
kill: the product of the probabilities of the attack getting through
each individual layer.
And hey, that sounds familiar. We have a limited amount of time,
money, and political juice for privacy stuff too. I think we can
visualize the protection options in a similar way. Here’s a first
attempt at a survivability onion for a personal privacy stack, with some
examples of what fits into what layer.
Don’t do a trackable activity (delete a
surveillance app, don’t visit a surveilled location, boycott a
vendor)
Don’t send tracking info (block tracking
traffic, either by using a tool like Privacy Badger to keep a tracking
script from loading, or using a network filter like Pi-hole to prevent tracking SDKs from
communicating with their hosts)
Send tracking info that is hard to link to your real
info (use an auto-generated email address system like Firefox Relay, churn tracking
cookies with Cookie
AutoDelete)
Object, exercise the right to delete, or opt out
later, after data has been collected but before you are
targeted (Authorized Agent services like Permission Slip by CR, RtD
automation tools like Mine)
So that’s step one—define the layers of the onion.
Next step: assessing threats. (Will add a link here soon.)
(When privacy people send an opt-out, RtK, or RtD, you have to
acknowledge it. May never see them again, so why waste the opportunity?
This is a draft of a letter to include with the response. Might not get
this approved as is, though.)
Updated 21 Oct 2023: Permission Slip and Google
Chrome updates.
Dear [name],
This attachment does not contain any information specific to you or
any legally required information.
Still reading? Good. First of all, I want to wish you well on your
privacy quest. I hope that you will return and check out our company
someday in the future, when all this corporate surveillance stuff is
over.
Sometimes, when you’re installing and configuring privacy protection
tools, or taking actions like the request you sent to us, it can feel
like you’re barely making any progress. But each of your individual
privacy choices has a bigger impact than just the protection that it
provides for you. In the long run, your real impact will be not so much
in how you’re protected as an individual, but in how you help drive
future investments away from surveillance and toward more constructive
projects.
Please keep doing what you’re doing. You’re not just making a better
society, you’re helping me. If creepy surveillance works less well as a
way to reach you, then I won’t have to deal with as much creepy
surveillance at work. And the less that creepy surveillance matters on
the company side, the less risk and work for you. We can help each other
out here.
Nobody has time to do every possible privacy tip. From the inside of
the corporate surveillance business, I have a pretty good point of view
to see what works best. From what I can see, here are some of the steps
that you can take that look like they will be most effective over the
next year or so.
Update 20 Jan 2025: effective privacy
tips is my current list of individual privacy advice.
Don’t forget to move the cat! It’s street sweeping day on our
side of the street.
On my old Android phone, I once made an alarm for move car on
street sweeping day. But I did a typo, so I used to get alarms about
moving the cat. We don’t have an actual cat, but it’s now an in joke
here anyway. Did you move the cat? Meow! It’s just one of
those things. And now I miss my old phone.
I miss it, but it’s right here. I just picked it up. If I turned it
on, I would be able to visit a web site, or take a picture, or whatever.
All the hardware is in perfect working order. But I can’t take the risk
of actually using it for all the stuff that people expect me to use a
phone for, because it’s no longer getting software updates. And of
course we all agree that it’s important to keep your devices updated.
The updates stopped, so I switched phones. (Yes, I had to abandon my
plan for an all USB-C device collection and buy some Lightning cables.)
I had a useful thing that I had to turn off for the last time.
I still have my old phone, but I don’t have it in a way that matters,
in a way that I could really use it. It’s satisfying to keep a useful
thing working, and the opposite of satisfying to put something in the
e-waste box, or worse, the maybe for projects later stack. I
know that Google has no legal obligation to me, there’s NO
WARRANTY, that’s life in the big city. I shouldn’t care, right? It’s
just a phone. Somebody at Google nailed one of their OKRs by letting my
old phone go out of support. They won. I lost.
(Update 14 Sep 2022: add non-sensitive but possibly
user-unacceptable use cases)
Some notes on how Google
Topics API is likely to affect various web stakeholders, in order by
priority
of constituencies. Numbered links are to issues in the GitHub
repository.
Topics API rewards deceptive sites and traffic sourcing
practices. Topics API leaks valuable audience data from one
site to another. As I wrote
on AdExchanger, this is a problem that it shares with third-party
cookies—enabling the monetization of low-value sites of little interest
to users. As the proposal currently stands, any site that a user can be
tricked into visiting, or that their device can be manipulated into
visiting, is in a position to collect ad revenue based on Topics API
data that the browser supplies.
Topics API gives people an additional incentive to create deceptive
sites and drive traffic to them by deceptive or harmful means. Operators
of deceptive sites will see users as Topics piñatas—hit them with
something like email spam or a deceptive link, get them to drop valuable
Topics, profit. Technically this is no worse than third-party
cookies, but third-party cookies are on the way out for a reason, and
Topics API is positioned as a longer-term thing.
A new Topics API feature makes it even more friendly to monetizing
problem sites. Adtech intermediaries can now obtain valuable topic data
on illegal and/or low-reputation sites without allowing the browser to
see those sites’ topics. See #92
(comment)
Topics API will enable price discrimination. Topics
API would help retail, travel, and other sites identify more or less
price-sensitive users based on their interests. And Topics API is more
likely to be used for personalized pricing than other data sources,
because it looks safer.
Many existing sources of data, such as geolocation or device info,
might be correlated with membership in legally protected groups of
people. However, Topics API is carefully intended not to be
tied to sensitive information, which lowers the perceived risks of using
the API for personalized pricing. (see #34)
Topics API discrimination does have some public sector uses
that might be acceptable in some jurisdictions but not others. Users in
the USA might accept the TSA using Topics API to optimize passenger
screening, the IRS using it to select taxpayers for audits, or the CDC
using it for public health data collection. (Update 13 Mar
2025) No, wait, they would probably freak out. But some other
countries might use it for purposes that are not acceptable to some
users there.
Some uses of Topics API outside the conventional publisher/advertiser
use cases would be more or less acceptable to users, even if Topics API
is successful in removing the ability of a caller to detect a sensitive
user characteristic. And some jurisdictions might require use of a
public-sector caller by sites—so not all callers are necessarily chosen
or approved by the first party.
HMOs or public health services classifying patients by
healthy/unhealthy habits
LMSs classifying diligent and distracted students
Tax authorities classifying likely big-spending users for tax
audits
Topics API could reveal sensitive information or identify
individuals in combination with other data. Some privacy
concerns for users are covered in a blog post
from DuckDuckGo. There are some users who prefer personalized
advertising, but a future personalize my ads feature would
need to be designed to avoid user harms that would affect both
personalizers and non-personalizers.
Topics API enables machine learning systems to build profiles
of users over time. Private
Browsing 2.0 covers why Apple WebKit does not implement this API:
You haven’t told any website with access to your email address
anything that’s been going on in your family life. But the data broker
has been able to read your shifting interests and store them in their
permanent profile of you — while you were reading the news. Now imagine
what advanced machine learning and artificial intelligence can deduce
about you based on various combinations of interest signals. What
patterns will emerge when data brokers and trackers can compare and
contrast across large portions of the population? Remember that they can
combine the output of the Topics API with any other data points they
have available, and it’s the analysis of all of it together that feeds
the algorithms that try to draw conclusions about you. (Apple Safari
doesn’t have this specific feature, but you still need to turn
off advertising measurement.)
Topics API enables advertisers to discriminate against
legally protected groups without an incriminating paper trail.
Consider a hiring manager who discriminates against job applicants who
have a particular disability. This
is illegal in the USA, and an online advertising service that offers
advertisers the ability to explicitly discriminate by disability would
face consequences. Facebook had
to eliminate some targeting features over explicit
discrimination—but biased advertisers still exist, and will tend to seek
out systems that facilitate discrimination where available. And
users who are searching for a job would likely choose not to share
information on their disability with web sites.
Topics API will allow a biased advertiser to bypass both legal and
user choice protections. A Topics API caller can feed the topics
received from a user into a machine learning (ML) system that is simply
optimized to get conversions, which in this case are hiring
decisions by the biased manager. Using the presence or absence of
particular topics for a user, the ML system learns to identify
people whose disability would stop them from being hired if they clicked
on the ad, applied, and got an interview. And no particular topic would
appear sensitive to the user, so even if the user did decide to manually
filter their Topics API list, they might not be aware of how the ML
system uses apparently non-sensitive topics (or the absence of them) to
determine that they should not receive a job ad from a particular
company.
In this case the hiring manager does not expect to be able to track
any one user individually, or even to receive no applicants with a
disability—but is willing to pay extra for an opportunity to advertise
in a way that results in reaching fewer applicants with the disability.
And the only record of a discriminatory ad placement on either the
advertiser or Topics API caller side is inside the ML system, which can
be prohibitively difficult to interpret. (More on this problem: browser topics tracking
and the prejudiced landlord problem) Related: #221
Martin et al.: Respondents did not judge the privacy solution
proposed—to have a browser collect data, keep the data on the local
device, create inferences for use in advertising—to be a privacy
solution over third-party tracking and the use of data brokers in
advertising.
Topics API is misrepresented to users. The way Google
describes Topics API to users does not disclose the fact that it’s an
experimental system with unknown discrimination risks. See Google’s own
FAQ:
Chrome can and will take steps to avoid topics that might be
sensitive (i.e. race, sexual orientation, religion, etc.). However, it
is still possible that websites calling the API may combine or correlate
topics with other signals to infer sensitive information, outside of
intended use. Chrome will continue to investigate methods for reducing
this risk.
Update: The misrepresentation problem is the subject
of a complaint
with the Austrian data protection agency from noyb.eu. Google’s
internal browser tracking was introduced to users via a pop-up that said
“turn on ad privacy feature” after opening the Chrome browser. In the
European Union, users are given the choice to either “Turn it on” or to
say “No thanks”, that is, to refuse consent. In a letter to noyb, Google
argued that choosing to click on “Turn it on” would indeed be considered
consent to tracking under Article 6(1)(a) of the GDPR. In reality, the
company concealed the fact that selecting this option would turn on
first-party tracking. Full
complaint (PDF)
Update: it seems like regulators are on to this one.
A
recent CMA report recommends a redesign and says, Surface the
Topics dialogue box periodically and consider approaches, based on user
research, to remind or prompt users to revisit other Privacy Sandbox
settings. In a subsequent
report, the CMA states:
We are concerned that the Topics consent user interface may not
adequately inform users about how their personal data is used or how the
topics generated may be used for purposes wider than interest-based
advertising (e.g. as determined by organisations that decide to use the
API). To address this concern, Google has agreed to update the Topics
API consent interface and to strengthen developer guidance to highlight
the requirement to obtain purpose-specific consent prior to calling the
API. We are awaiting the results of these updates.
Topics API does not provide the same privacy guarantees to
all users. Beugin
and McDaniel: We find that 46%, 55%, and 60% of the 1207 users in
the dataset are uniquely re-identified across websites after only 1, 2,
and 3 observations of their topics by advertisers, respectively.
Topics API is
designed to allow Google to continue to transfer valuable audience info
from legit sites to confidential sites like this one, which
accept less ad money because they use other people’s content. (URL and
logo blurred because it’s confidential. Their ads.txt is one line, and
their Google ad account id is a multiple of 19.)
Cross-site data leakage is a problem for users because of the
incentives covered above, but also a problem for legit web sites because
of revenue.
(And yes, it is better for users when ad revenue goes to sites they
like, and not sites they got tricked into clicking through to.)
Users and legit ad-supported sites share an interest in having
advertising pay for the largest possible value in ad-supported content.
Meanwhile, Google wants to maximize ad revenue while minimizing the
amount paid for content. Topics API is part of a set of projects to help
Google come off ahead in this contest. The other parts are a Google
service called Performance
Max that hides information about what site an ad appeared on
from the advertiser, and Google’s decision to run
a large number of ads on confidential infringing or brand-unsafe
sites without disclosing the domains on an industry-standard file
called sellers.json. (IMHO it’s impossible to understand Topics API
outside the context of Performance Max and confidential crap
sites, but we won’t see the slide that has all three on it until it
comes out in discovery in an antitrust case, or gets leaked.)
Topics API is biased in favor of large, general-interest
sites. Topics API assigns only one or a few topics to each
domain. A niche site that covers one or a few topics would provide
significant commercially valuable data to the system, while a large site
such as YouTube, that covers every topic, would provide no usable
information. This is a pretty obvious problem for competition policy, so
I’m kind of surprised they left it in. (See #206
and #224.)
Topics API encourages consolidation of third party
services. The more sites that a third-party service runs on,
the more useful information it can collect from Topics API. Since an
individual site can’t get usable information out of Topics API, sites
will have to A/B test different combinations of third-party iframes as
callers. Sites will keep callers that provide more good Topics,
and drop others. After repeated cycles of A/B tests, where callers that
get dropped by one site start providing worse data to other sites and
get dropped there too, we would be left with one domain doing all the
Topics API calls. (see #82)
This is also a pretty glaring antitrust issue.
See also a Google
comment on the W3C TAG review: Note that the number of sites that
can both call the API, and receive an unfiltered response, is quite
small. This is because the caller would have to have observed the user
on a site about that topic in the past to get through the filter. The
vast majority of sites that can call the API will actually receive an
empty list. Topics API is noisy enough to be much more useful for a
large third-party caller than for a smaller one. (Is the noise level
deliberately tuned to be too noisy to be useful for companies much
smaller than Google, but usable at Google’s size?) More discussion of
Topics API as a driver of consolidation and centralization in issue
#73 Toward an oligopoly of “Topic providers”?.
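For reference, here is roughly what a caller looks like from the page side. This is a hedged sketch: document.browsingTopics() is the real Chrome entry point, but the exact shape of the returned objects has varied across versions, so treat the field names as assumptions.

```typescript
// Sketch of a Topics API caller (a third-party script or iframe with
// the browsing-topics permission). Calling browsingTopics() both reads
// the user's topics and records this caller as an observer. The result
// is filtered per caller: a caller that hasn't observed the user on
// sites about a topic doesn't receive that topic, which is why small
// callers mostly see an empty array.
declare global {
  interface Document {
    // Field names are an assumption; check the current spec.
    browsingTopics?: () => Promise<Array<{ topic: number; version: string }>>;
  }
}

export async function fetchTopics(): Promise<void> {
  if (typeof document.browsingTopics !== "function") {
    console.log("Topics API not available in this browser");
    return;
  }
  const topics = await document.browsingTopics();
  console.log(topics.length > 0 ? topics : "empty list (filtered for this caller)");
}
```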
One of (Interest-based advertising)’s benefits is that it allows
sites that are useful to the user, but perhaps could not be easily
monetized via contextual advertising, to display more relevant ads to
the user than they otherwise could, helping to fund the sites that the
user visits.
This goal can be achieved without the extensive leakage of site
audience data made possible by Topics API. (Alternatives moved to a
separate section, below.) Web authors and publishers might be able to
ameliorate some of the effects of Topics API by, for example, opting out
their signed-in users, or opting out of Topics API data collection
entirely until YouTube channels are treated as separate areas with their
own topic sets, on an equal footing with domains (see #17).
Topics API is designed to be difficult to A/B test.
Ad-supported web publishers, and ad services that manage ads for them,
generally make a practice of testing the revenue impact of turning
different data sources on and off. Unlike other data sources, Topics API
is controlled only by a Permissions-Policy HTTP header,
which means that turning it off can only be done at page load time, and
only with access to the web server or CMS code or configuration. In
theory a publisher could choose not to load a script that only called
Topics API, but Topics API so far has been combined with other
third-party functionality, so needs its own A/B testing control. (#208).
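To make the control surface concrete, here is a minimal sketch of the only off switch a publisher has: a response header set at the server. Plain Node, no framework assumed; the 50/50 split is a hypothetical A/B test, and browsing-topics=() is the directive that denies the feature to the page and everything embedded in it.

```typescript
// Sketch: disabling Topics API can only happen at page load time, via
// a Permissions-Policy response header, so an A/B test has to live in
// server or CMS configuration rather than in a tag manager.
import { createServer } from "node:http";

createServer((req, res) => {
  // Hypothetical A/B split: half of page loads get Topics disabled.
  if (Math.random() < 0.5) {
    res.setHeader("Permissions-Policy", "browsing-topics=()");
  }
  res.setHeader("Content-Type", "text/html");
  res.end("<html><body>test page</body></html>");
}).listen(8080);
```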
User agent implementors have limited developer and tester time
available to implement and support new features. All software projects
must leave some good ideas un-implemented in order to devote time to the
features they decide to do. And as the feature set of a browser grows,
so does the required ongoing maintenance. Choosing to implement and
support Topics API, even for a large company, would mean failing to do a
useful feature. Some
Google Chrome features are good, and out ahead of other browsers. It
would be a waste to keep putting time into this one.
Alternatives
There are two alternatives to Topics API, Seller-Defined Audiences
(SDA) and FLEDGE, that also support the proposal’s intended
functionality and might be wanted by the users who prefer personalized
ads.
The main difference is that SDA and FLEDGE both give a gatekeeper
role to a party with an incentive to avoid problematic ad placements and
data leakage. With SDA, one publisher can control use of data to place
ads on other publishers’ sites (audience extension). Depending on how
the final version of FLEDGE comes out, a variety of parties could be
interest group owners. None of these proposals are complete or perfect,
but Topics API appears to be too friendly to good ad/bad site
placements to keep pursuing.
(update 31 August 2022: third-party analytics for option
2?)
I am not a lawyer, and this is not legal advice. I’m just trying to
figure out what would be the right questions to ask a California lawyer
about how a small or medium web site can comply with AB-2273:
The California Age-Appropriate Design Code Act, if it becomes law.
What would be the available options?
This is not about apps, games, services that develop a lot of their
own code, platforms, or UGC forums. See California
Legislators Seek To Burn Down The Internet — For The Children by
Eric Goldman for some questions on those. Compliance issues for games,
especially those with an in-game economy, will be a lot more
interesting. This is just about regular web sites that have unique
content, but pretty generic functionality.
Will fill in with more info as I learn it.
Option 1: Don’t be a business as defined by CCPA/CPRA.
A CPRA business has to have at least one of the following:
annual gross revenues in excess of $25,000,000
annually buys, sells, or shares the personal information of 100,000
or more consumers or households
Derives 50 percent or more of its annual revenues from selling
consumers’ personal information.
Typical RTB web
ads, as used by a small/medium site, are based on sale of
personal info by the CCPA/CPRA definition. Some other third-party tools,
like comments and social widgets, also sell info on the users. So
if we get 100,000 or more normal ad users in a year (year, not
month), that is, visitors who are not blocking trackers or opting out
of sale, then a for-profit site
is a CCPA/CPRA business, and required to comply with AADC. And if we
have fewer than 100,000 annual users but make half our money or more
from RTB ads, we’re also a business.
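As a back-of-the-envelope check, the statutory test reduces to three OR-ed conditions. A sketch, not legal advice; the numbers in the example are made up:

```typescript
// Sketch of the CCPA/CPRA "business" test from the three prongs above.
interface SiteStats {
  annualGrossRevenueUSD: number;
  // Annual visitors whose personal info is sold/shared (normal ad
  // users: not blocking trackers, not opted out of sale).
  annualConsumersSoldOrShared: number;
  fractionOfRevenueFromSellingPI: number; // 0..1
}

function isCcpaBusiness(s: SiteStats): boolean {
  return (
    s.annualGrossRevenueUSD > 25_000_000 ||
    s.annualConsumersSoldOrShared >= 100_000 ||
    s.fractionOfRevenueFromSellingPI >= 0.5
  );
}

// A small site running RTB ads can trip the second prong on traffic alone:
console.log(
  isCcpaBusiness({
    annualGrossRevenueUSD: 400_000,
    annualConsumersSoldOrShared: 120_000,
    fractionOfRevenueFromSellingPI: 0.3,
  })
); // true
```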
Option 1A: don’t run the site as a for-profit business: start or find
a non-profit org to own the domain and assets
Option 1B: stay under $25 million in revenue and either don’t run any
RTB ads or third-party resources, or shut them down for the year after
99,999 visitors saw the ads.
Option 1C: (and this is where it gets tricky; I don’t know if this
would work) Stay under $25 million, run RTB ads, but do all the right
legal and configuration incantations to put all third parties on the
page into a mode where no sale is happening. Basically treat all
the users as if they had done a CCPA opt out. Does this
work?
Or could we combine 1B and 1C, and flip the ads to opt-out mode after
99,999 users in a year?
Option 2: don’t make the site likely to be accessed by
children. That’s a hard one. Likely to be accessed by
children can be any of these:
The online service, product, or feature is directed to children as
defined by the Children’s Online Privacy Protection Act (15 U.S.C. Sec.
6501 et seq.).
The online service, product, or feature is determined, based on
competent and reliable evidence regarding audience composition, to be
routinely accessed by a significant number of children.
An online service, product, or feature with advertisements marketed
to children.
An online service, product, or feature that is substantially similar
or the same as an online service, product, or feature subject to
subparagraph (B).
An online service, product, or feature that has design elements that
are known to be of interest to children, including, but not limited to,
games, cartoons, music, and celebrities who appeal to children.
A significant amount of the audience of the online service, product,
or feature is determined, based on internal company research, to be
children.
The tricky part here is that COPPA children (under 13) are
different from AADC children (under 18). Trying to comply by
making a site not likely to be accessed by children might work if
we were talking about COPPA-age children, but under-18s? People that age
had better be reading about a lot of different stuff—they’re going to
be voting soon. (Is the webcomic Terminal Lance likely to
be accessed by children because it’s of interest to 17-year-olds
thinking about joining the Marine Corps when they graduate from high
school?)
Option 2 looks like a dead end. We might start off thinking we’re not
running a kids site, but no site maintainer can know if their
niche topic is going to get covered in a MOOC, a YouTube video, a
Discord or Reddit thread, or whatever, and all of a sudden the site
becomes a must-visit for teenage users. (which is not a bad thing,
right? Hey, kids, go get
prepared to win Internet arguments about Sparta.)
There might be a way to do option 2, if the site has
a third-party analytics provider that has age information. Would an
aggregated report on the site audience be good enough to claim that the
site is not likely to be accessed by children? If a site could
rely on this, then that would let you leave RTB ads in normal mode and
not do any of the remaining compliance stuff.
All right, let’s keep going. If option 1 didn’t work, and we don’t
have a way to show that the site is not likely to be accessed by
children, then maybe we actually have to do DPIAs, plus either age
estimation or put every user in the max privacy protection
category.
An independent site is probably going to be running WordPress or some
other open-source or SAAS WCMS. So we’re likely going to be able to ask
around and borrow mostly workable DPIAs from open source (like sites can
borrow the WordPress
privacy policy today) and we will get sent DPIAs by every ad network
or third-party service that we’re signed up for. More paperwork, but
should be something that can be dealt with. (Hosting providers and ad
services might add DPIA management to their service packages, too.)
Age estimation is not age verification (no, you won’t have to pay a
creepy vendor owned by a pr0n site to track your users; if this becomes
law, somebody else will launch a lower-risk estimator service), but
it’s still an extra piece of code to add. It looks like this is a
task that third party service(s) will either tell sites how to do and
check up on, or do themselves. (We won’t have to do it if we were able
to use Option 1.) If a site kept running RTB ads and didn’t either (1)
put the ads in opt-out mode for everybody or (2) start running age
estimation, then all the third-party services would be taking a
compliance risk. It’s going to be like having to get a CMP, right?
Impact on the future of web ads?
It looks like the options for most sites will be:
remove RTB ads and other third-party resources that count as a
sale or share
put all third parties into opt-out mode for all users (maybe???)
rely on 3rd-party analytics to prove that the site is not
likely to be accessed by children (maybe???)
Do DPIAs (copied from the same open source projects as site
code?) and turn on whatever age estimation service the third parties
will require.
This is extra work for legit sites, but I’m cautiously optimistic
about the big picture here. The first rule of understanding the impact
of a proposed tech standard or regulation is that you can’t just look at
the likely impact on small legit sites—you have to look at the relative
impact on small legit sites compared to the impact on harmful
sites/apps/channels (scrapers, terrorists, etc.) and the third parties
that enable them. In this case, AADC means increasing existing
legal risks for apps and third parties, and the advertisers that use
them, so it should help push up the value of ads on legit sites. A lot
of advertisers who get told they’re reaching high-value adult audiences
are really having their ads shown in apps for kids—if AADC makes that
harder to do, it’s good news for the legit sites that actually reach
those adult audiences.
To review, the web ad crisis is more of a supply and demand story
than a privacy story. We have a problem mostly because third parties can
artificially increase the number of saleable ad impressions, by offering
ads on content that no advertiser would choose to sponsor. Creepy
trackers mostly aren’t creepy because they want to be creepy, they’re
creepy because they want to sell some advertiser an ad impression they
wouldn’t otherwise buy. In general, regulations and tools that make
third parties reduce the number of ad impressions they can sell are a
step in the right direction.
No one law gets us all the way to the point where a fully-opted-out
site is going to be viable on ad revenue alone, but it’s a step.
Log in to Facebook and check Ad Settings. Look at
Audience-based advertising.
If the name of the company you bought from is in there, they
sold (exchanged for something of value) your info, probably by
sending it to Facebook as part of a Custom Audience, or possibly by
using Facebook Conversions API. Anyway, they broke the law and got
caught.
Simple, right? Looks like a way to make open-and-shut CCPA cases at
scale. The new California privacy agency will be able to just copy over
the same paperwork, because all the surveillance marketers are following
the same tutorials.
Unfortunately, surveillance marketers already have a workaround. I
have seen this doing RtKs (which is a good example of why RtKs matter).
The original company (the business) collects customer
email address from an opted-out customer, and possibly hashes
it.
Business passes the email address, or hash, to a third
party.
The third party passes the email address or hash to Facebook, and
then deletes it. They can’t tell which of their client businesses passed
information on which people (or they claim not to be able to).
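For context on the hashing step: audience matching of this kind typically runs on normalized, hashed email addresses (SHA-256 is the hash commonly used for this; the normalization rules in this sketch are an assumption). Hashing is not anonymization here, because the platform hashes its own users’ emails the same way and matches on equality.

```typescript
// Sketch of the identifier that moves through the
// business -> third party -> platform chain.
import { createHash } from "node:crypto";

function hashEmail(email: string): string {
  // Typical normalization before hashing: trim whitespace, lowercase.
  const normalized = email.trim().toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

// The third party forwards only hashes like this, then deletes them,
// so the platform records the third party's name instead of the
// original business's.
console.log(hashEmail("Opted.Out.Customer@example.com"));
```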
That way, the name of the third party, not the name of the business,
shows up in Facebook Ad Settings. Under the draft CPRA regulations, the
third party is required to comply with a Right to Know or Right to
Delete, but as far as I can tell, there’s no additional requirement for
the third party to disclose who the original business was, or to be able
to.
So a business that wants to violate the CCPA can run their Custom
Audiences through a third party, and switch to a new third party if the
old one builds up too many RtDs.
It looks like all we can really do is list the third parties involved
in this scheme and RtD them? I know this is a good argument for why
everybody needs an Authorized Agent service, but it would be less total
work if there were a better way to find the original business that broke
the law.
In 1965, NASA scientists received the first images of Mars from the
Mariner 4 probe, as numbers printed on paper tape. In order to see the
image, they translated the numbers to colors and drew individual pixels
with colored pencils.
Today, your web browser probably does something similar, many times a
day. It turns a set of image data, from a file format such as PNG or
JPEG, into a set of colored pixels that you can see as an image.
If, as a user, you had the time, you would probably choose to do
exactly what the browser does. You want to see the images on a web site
you visit. Other browser features, maybe not so much. Because the
browser is supposed to be the agent of the user, a helpful way to answer
the question, should the browser do this? is Would the user do
this themselves if they had time?
For some browser functionality, answers can be found in the history
of technology. People have put bookmarks in books as long as there have
been books. So it makes a lot of sense for browsers to offer users a
bookmark feature. But the more novel a feature gets, the harder it is to
figure out whether to do it without more forward-thinking user
research.
When you buy something, would you tell the seller about every ad
you saw for the thing you just bought? Would you tell them if they
promised to mix up your answers with other people’s and do math on them
so they can’t tell what any one person said?
On your first visit to a new site, would you choose to tell the
site about some of the topics that you’re interested in?
People do provide information about themselves to other parties they
deal with. The browser’s role is to understand and facilitate the
information sharing that people would choose to do on their own, if they
had the time to learn about it, keep the necessary records, and answer
questions. (For example, a browser might offer to auto-populate “where
did you see our ad” fields on order forms, if user research shows that
people are willing to fill in that field.)
A lot of user research has shown that many people don’t like the online advertising
practices of today, but there needs to be more research on what they
would accept. Ultimately the browser works for the user, and it
would be a waste of resources to go too far down a direction that's too
different from what people would choose to do for themselves.
More: PET projects or
real privacy?
It says, A business shall identify the categories of personal
information, categories of sources of personal information, and
categories of third parties to whom a business sold or disclosed
personal information, in a manner that provides consumers a meaningful
understanding of the categories listed. but I don’t see where it
says the business has to disclose the actual specific pieces of
personal information in a way that provides consumers with
meaningful understanding.
So it looks like they can send me a bunch of integer and letter codes
without a key. This is similar to the Verizon RtK response, by the way.
Will probably have to write this up for the CPRA rulemaking. Disclosure
of a score really needs to come with a key or units in order to be
meaningful.
One piece of good news is that they have me down as 80-something
years old, which might help keep me safe from the target selection
algorithms for People’s Liberation Army assassination drones, bodily
fluid harvesting robots, and Texas abortion bounty hunters.
Updated to add some good news from Colorado. Colorado closed the
loophole. The Colorado
Privacy Act Rules state,
Personal Data provided in response to an access request must be
provided in a form that would allow the average Consumer to make an
informed decision of whether to exercise deletion, correction, or
opt-out rights.
and
For instance, the Personal Data must be provided in a form that is
concise, transparent and easily intelligible, and avoids
incomprehensible or unexplained internal codes and identifiers.
Will companies maintain both comprehensible and incomprehensible
versions of their data dumping code? Or just make one version that’s
compatible with both California and Colorado? Or will California catch
up with Colorado?
This is one where I had to print out and sign a form, and have it
notarized.
As I pointed out before, making Right to Know work is really a
critical first step for all the other CCPA tasks. If you don’t know
which companies have which info, it’s almost impossible to prioritize
who gets a CCPA delete, which requires more effort, and who gets a Do
Not Sell.
If every data broker and surveillance marketing firm could make the
Right to Know process a little different, then it would be nearly
impossible for anybody to get anywhere with CCPA, and we might as well
not have it.
What would be good to see in the CPRA rulemaking is one standard
baseline process for Right to Know, that any company would have to do.
They could, of course, add additional, more convenient
processes, but there should at least be one that is of known
difficulty.
Here is my suggestion.
1. As a California resident, I go to the California DMV, show my
California ID, and get a stack of printed Right to Know slips. These are
pieces of paper and have my identifying information on them. The DMV is
allowed to charge me for the printing costs.
2. When I want to exercise my Right to Know, I fill out a company’s
Right to Know form on their web site, and provide my contact info and
postal address.
3. If the company doesn’t have any info on me, they can email me to
say so.
4. If the company does have info on me, they send me a Business
Reply Mail envelope.
5. I put one of my Right to Know slips from step 1 in the Business
Reply Mail envelope and send it back.
6. The company checks my Right to Know slip and sends me a copy of
my info.
This puts all the sensitive data handling either under the DMV’s
roof, or in postal mail space where mail fraud is a Federal crime.
Naturally, a lot of people will come up with ways to do this more
cheaply and conveniently on the Internet. That would be great. Putting a
simple, standard, postal process in the regulations will set the
baseline: you can’t make it too much harder than DMV+postal, or people
will do DMV+postal.
As a California resident, I have had a right to know how my personal
information is used, since January 1, 2020. On paper, that is. In
practice, it’s a little trickier. In order to exercise my California
privacy rights, I have had to run a lot of mazes. I won’t mention any
specific companies here, but
I have taken selfies.
I have taken a selfie holding my California driver’s
license.
I have scanned my California driver’s license, front and
back.
I have taken a photo of my California driver’s license from an
Android device, had it rejected, found an Apple device, taken a
different photo of the same license, and had it accepted.
I have passed a quiz about my former addresses and bank
accounts.
I have passed a quiz, but only by getting some of the answers
wrong on purpose, because they would have been right if a family member
of mine with a similar name had been taking it.
I have printed and signed a document, and scanned it.
I have printed and signed a two-page document, gone to a notary,
had it notarized, and scanned it.
Getting through the Right to Know process can be tricky. And I’m
pretty good at paperwork, I have a variety of different devices to try,
and I have a printer and scanner set up and working.
The reason I’m focusing on the Right to Know here is because it’s the
CCPA right that helps me decide what to do with all the other rights. If
I have a positive response to a Right to Know, then I don’t have to do a
Right to Delete, and I can be more confident in sharing information with
a company. There are tens of thousands of companies out there that might
have some info on me, so I need to prioritize. But realistically,
inconsistent and over-complicated handling of Right to Know by the
companies I buy stuff from and by data brokers means that it’s a
time-consuming effort for me to find out what’s going on with my
personal information.
Under CCPA, I have the right to use an authorized agent. But
authorized agent requests are even more complicated. Companies generally
react to a fully documented authorized agent Right to Know by getting in
touch with me directly and making me run the original maze anyway.
The worst part of all this maze running is sometimes there’s no
cheese at the end of the maze. I have gone through all the work to do a
Right to Know with one company, and ended up with, among other things, a
list of the companies that sent my personal information to them. So,
when I send a Right to Know to those companies, I should be able to get
some information, right?
Not necessarily. Sometimes they claim not to have any information
about me.
And in the case of one high-profile company, I can look up the public
documents from an ongoing privacy lawsuit, and read employee depositions
stating that they have certain kinds of information. But it’s
information that they don’t disclose to me. A company shouldn’t be able
to testify to one thing in court and then tell California residents
something else.
In the 2020 election, Proposition 24 was supported by an overwhelming
majority of California voters. Today, the CPPA has an opportunity to
implement the intent of California voters by adopting regulations that
make it practical for everyone to exercise their basic privacy
rights.
As a California resident, I should be able to use a simple,
standardized Right to Know process, such as being able to request a
standard paper form and a Business Reply Envelope. Naturally, businesses
and their service providers should be able to compete to offer consumers
a simpler, faster online process as an alternative to paper forms and
trips to the mailbox. But without a guarantee of a common, baseline
simple opt-out process to fall back on, we’re still going to be stuck in
a maze next year.
The remarkably
popular California Privacy Rights Act (CPRA) takes effect on January
1, and that means the California Privacy Protection Agency (CPPA) is
coming. But right now, it doesn’t look like surveillance marketers are
that worried. The agency has a $10 million budget, and that’s chump
change around here. The Big Tech legal departments and the big Palo Alto
law firms probably spend more than that on guacamole.
The CPPA will be able to do a lot per case, but won’t have the budget
for too many cases. To some extent, they’re going to have to rely on
uncertainty. As a famous (but fictional) California lawman once said,
Did I fire ten million dollars worth of privacy enforcement, or only
nine? But being this is the CPRA, the most powerful privacy law in the
United States, and would blow your head clean off, you’ve got to ask
yourself one question. Do I feel lucky? (Maybe I got that movie quote
wrong. Need to check it.)
It’s not as bad as it looks, though. The new agency doesn't have to
deal with a zillion different CPRA violations. All the surveillance
marketers did the same Big Tech training. To put it in terms that Big
Tech will understand, CPRA enforcement scales. A lot of cases
are going to be basically identical because a lot of the violators
watched the same growth hacking videos and read the same documentation.
The big companies have cleverly shifted a bunch of the CCPA (and soon,
CPRA) risks over to the smaller companies. Kind of
like how Amazon understands the risks of operating vans on crunch
schedules in residential areas, so it offloads those risks onto Amazon
Delivery Service Partners. That’s the safe choice for them
from one point of view, but it means that the CCPA/CPRA violator is
technically not the big, lawyered-up company. The violator is a small
company with a small legal budget, breaking the law in the exact same
way as a bunch of other companies.
Big Tech's decision to outsource the legal risks means that CPPA will
not have to put a lot of lawyer time into each violation. Just like New
York City has a Citizens
Air Complaint Program to handle one kind of common pollution
violation (idling trucks), the CPPA will be able to do the same
complaint over and over. Get the person whose info was misused to fill
it in with screenshots and/or attachments, and all the CPPA lawyer has
to do is check and sign it.
Here are a few common violations where enforcement will be able to
scale quickly.
Using a GDPR workflow to do a CCPA job. Under CCPA,
you can opt out of sale of your info. Under GDPR, you can
object to processing. Similar concepts, but the details are
different. Under GDPR, a company can make people go through an extra ID
verification step for an objection. Under CCPA (and soon, CPRA), an
opt-out has to be handled without ID verification unless the recipient
has a good-faith, reasonable, and documented belief that the request is
fraudulent, and tells you so in writing. GDPR compliance doesn’t
always get you CCPA compliance. Companies have had a couple years to
fix this, but it's still pretty common. Info needed:
Forwarded email or screenshot of illegal verification step.
GPC order, followed by sale/share. This is
easy to check. Turn on Global Privacy Control in your browser. Order
something on the web. Set a reminder to come back later to see if the
company transferred (sold or shared) your info to Facebook.
Info needed: screenshot of GPC install/activation,
screenshot of Facebook Ad Settings.
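(For the technically inclined, here is a minimal sketch of the two places a site can read the GPC signal; a site that ignores both and still sells or shares your info is exactly the violation above. Plain Node, no framework assumed.)

```typescript
// Sketch: Global Privacy Control arrives server-side as the Sec-GPC
// request header ("1" when set), and is visible client-side as
// navigator.globalPrivacyControl. A compliant site checks one of
// these before any sale/share of the visitor's info.
import { createServer } from "node:http";

createServer((req, res) => {
  const gpcOptOut = req.headers["sec-gpc"] === "1";
  res.setHeader("Content-Type", "text/html");
  if (gpcOptOut) {
    // Treat this visitor as opted out of sale/share: skip the
    // Facebook pixel, Conversions API calls, RTB bid requests, etc.
  }
  res.end(`<script>console.log("GPC:", navigator.globalPrivacyControl);</script>`);
}).listen(8080);
```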
Inconsistent data. Another good use of Facebook Ad
Settings. Many companies default to dumping customer info over to
Facebook, without really thinking about it. Then, somehow, some of them
leave out the info that went to Facebook when they answer a Right to
Know. Easy to spot. Info needed: copy of RtK results,
screenshot of Facebook Ad Settings.
The Attorney General's office already has a Consumer Privacy
Interactive Tool to help you report companies that fail to include
the required Do Not Sell My Personal Information link on their
web sites. Similar tools will be easy to add for the common
violations—so that $10 million will go further than it looks.
So that’s a story, but from here in California the really interesting
risk disclosures are a few that aren’t there. Meta has some risks
related to ongoing violations much closer to home.
The company has not been acknowledging CCPA Right to Know (RtK)
letters from California residents who do not have Facebook accounts. If
Facebook still maintains shadow
profiles on people who do not have accounts, then refusing to
disclose them to the person described in the profile is a CCPA
violation. A person cannot be required to agree to a company’s Terms of
Service in order to exercise their rights under CCPA. (Or maybe Facebook
already bulk-erased the shadow profiles of anyone from California who
they don’t have signed up as a user?)
Meta is ignoring or mishandling Authorized Agent RtKs. This is a
fairly common issue at surveillance marketing companies. Authorized
Agents are still rare, and a lot of companies don’t have a process in
place to handle them correctly. But a high-profile company like Meta is
likely to get a bunch of agent RtKs, and it’s reasonable to expect them
to disclose the associated risks.
See the CCPA
Regulations for more details. A Download your Information
portal is allowed for complying with CCPA Right to Know, but only for
people who already have accounts (where a business maintains a
password-protected account with the consumer), and only when the
portal fully discloses the personal information that the consumer is
entitled to under the CCPA.
All three of these items are much more likely than the We will
likely be unable to offer a number of our most significant products and
services, including Facebook and Instagram, in Europe scenario. So
they could really use a mention in the next Form 10-K or other
documents.
We can ignore the first round of industry group freakouts over the
bill. It’s unrealistic to play the bad for small business card
this early. Even though a lot of small businesses use surveillance ads
today, legit small businesses could end up better off after a ban if
they don’t have to keep playing negative-sum games against scammers
trying to pick off their customers. Hard to predict the impact of any
policy change on any advertiser until you know what they’re contending
with in the existing system. (Yes, the 1971
ban on cigarette ads on TV seems to have had two effects: a
short-term increase in profits for all tobacco firms, and an increase in
concentration of profits for the largest brands. However the cigarette
market was essentially all legal product vs. legal product competition,
and not a single market including both honest and
deceptive advertisers like today’s social media advertising.)
A surveillance ad ban won’t make people buy less stuff, but in the
absence of surveillance ads, different gatekeepers will be more
important. The big, obvious winners from a surveillance ad ban would be
SEO, content marketing, affiliate programs, and search advertising. If
you can’t find
Kevin, you have to make yourself as findable as possible when he
tries to find you.
Another set of winners in the post-surveillance-advertising economy
will be niche content sites, especially sites that cover
shopping-friendly content categories (like travel, fashion, and home
improvement) and sites that run product reviews and buyers’ guides.
Shopping help forums on general-interest sites, like /r/buildapc, will be more
and more important for marketers to keep up with, and niche retailers
with a curated selection will get a new amplification role in some
categories. Measuring the effectiveness of non-surveillance advertising
requires different skills, so that’s an opportunity for some kinds of
research firms. A not-so-surprising winner when you think about it will
be multi-level marketing. MLM spreads person to person.
So is surveillance advertising going to take as long to get rid of as
leaded gas did? Personally, I don’t think so. The winners from the
leaded gas ban were numerous but dispersed: people
who live near a lot of motor vehicle traffic and people who might be
crime victims. The leaded gas ban created no localized near-term big
winner businesses, but there will be some for a surveillance ad ban.
Expect a second round of industry group comments as the companies that
will win from a ban get involved.
Time to look at some specific issues in the bill. Some
suggestions.
Remove or limit the Custom Audiences exception.
Subsection (b) has some language on Custom Audiences that
provides some protection but not enough.
Paragraph (1) does not apply to the targeting of the dissemination of an
advertisement based on information described in clauses (i) through (iv)
of subparagraph (B) of such paragraph that is provided to an advertising
facilitator by an advertiser or by a third party on behalf of an
advertiser, if the advertising facilitator is provided a written
attestation that the advertiser is not in violation of subsection (b)
with respect to such information.
This is interesting in principle, but has a negative space problem.
The larger a platform gets, and the more Custom Audiences it receives
from more advertisers, the more that users can be targeted just based on
which Custom Audiences they’re not in. Even if no single
advertiser’s Custom Audience reveals membership in a protected class,
the platform will be able to infer membership if it has enough lists of
non-members.
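Here’s a toy sketch of that negative space (all names and lists made up): no single Custom Audience marks anyone as a member of a protected class, but the users in none of the informed audiences are identifiable by elimination.

```typescript
// Toy example: advertisers' Custom Audiences of well-informed users.
const informedAudiences: Array<Set<string>> = [
  new Set(["alice", "bob"]),   // finance-publication subscribers
  new Set(["bob", "carol"]),   // accredited investors
  new Set(["alice", "carol"]), // NGO donor list
];

// Everyone the platform can show ads to.
const allUsers = ["alice", "bob", "carol", "dave", "eve"];

// The "negative space": users who appear in no informed audience.
// These are the people the scam ads end up reaching, without any
// advertiser ever uploading a list of vulnerable targets.
const likelyScamTargets = allUsers.filter(
  (user) => !informedAudiences.some((audience) => audience.has(user))
);

console.log(likelyScamTargets); // [ "dave", "eve" ]
```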
For example, Facebook has an internal cascade
of classifiers that use inputs from multiple sources to place
both scam and non-scam ads. This placement decision happens per ad
impression, in real time. As Facebook gets lists of finance publication
readers and accredited investors as Custom Audiences, not only do the
legit advertisers get the ability to target their audiences on Facebook,
the system learns how better to match the precious
metals scams with the most vulnerable targets. The Facebook database
does not have to store a list of vulnerable targets for a scam.
Vulnerable people receive the scam ads as the result of an on-the-fly
decision, as the inevitable effect of not receiving the ads
intended for members of the well-informed Custom Audiences.
Large platforms that receive lists of people who work in healthcare,
or are interested in legit health content, end up targeting users with
limited health knowledge for health-related disinformation.
Well-intentioned NGOs, by using Custom Audiences listing their informed
supporters and donors, are unavoidably helping to place disinfo and scam
ads for the other side.
A law that addressed some older ad targeting practices while leaving
Custom Audiences as an alternative could easily do more harm than good.
And from a communications point of view, a Custom Audiences exception is
confusing. Personally, the first question I generally get about any
privacy law or tool is, Will this keep (some company) from sending my
info to Facebook? Leaving an exception for just that means that this
bill is letting through not just some of the highest-risk surveillance
advertising practices, but some of the least accepted.
A minor problem with subsection (b) is that all an advertiser needs
to supply is a written attestation. There are a lot of sketchy
Facebook advertisers who will sign a written attestation about anything,
and advertisers who are advertising in one jurisdiction from another
jurisdiction, making it hard to enforce the law against them. Makes no
sense to let a big platform avoid accountability because of one possibly
meaningless document.
Conversion
tracking is similar to Custom Audiences, but sends one customer
record at a time instead of a list batched together. The law has to
apply fairly to both.
Figure out how to allow user-configured profiles.
Some ad platforms allow users to limit the number of gambling or alcohol
ads they see, or to set other ad preferences. As long as profiles (along
with surveys, preferences, and similar features) that affect ad
selection are clearly under user control, it makes sense to allow them.
Best to let the pro-personalization
users get what they say they want, and don’t tell people that they
can’t choose to avoid ads for a product category they have a problem
with.
An ad can be personalized but not surveillance if it’s matched to the
person receiving it based on information that the person intentionally
supplied, knowing that it is being used for advertising, and
with the option not to do it. This might be a reader survey, an “add this topic to my interests” button, or some other feature. The
personalization section of the law will need some careful work, because
there is a risk that a large platform could end up with too many
non-members of protected classes with personalization turned on, causing
legit advertisers to go for the personalizers and leaving the
non-personalizers to the scammers. A trustworthy content site with a
high response rate to its reader survey might be able to safely use
profiles for ad placement while some larger, lower-trust sites might
not.
In the future, some users might choose to have their browser share an
interest profile with some sites they visit. If a browser team can make
a profile sharing feature, and fairly convince some users to turn it on,
there should be a way to make it legal. (Browsers sending profile identifiers to sites
without the user’s knowledge: still bad.)
Reform 47 U.S. Code § 230 to exclude surveillance
advertising facilitators and providers of algorithmic social feeds from
the definition of interactive computer service. This safe
harbor was passed in the days of Usenet and simple mailing lists and
web boards, which were interactive under control of the user or
the user’s manually maintained subscription and block lists. Safe harbor
is appropriate for content hosting or store-and-forward, but it was
never intended for the kind of active selection and promotion of
messages that surveillance advertising firms do.
Phase the ban in. If small businesses take longer to
adjust to technical changes, give them longer, just like the EPA gave
small refineries additional time to phase out lead in the gasoline they
produce. Smaller databases present less risk, and larger companies
have more technical capacity to comply with a ban. Possibly apply the
ban in year 1 to companies with 100 million or more records of PII, then
10 million or more in year 2, 1 million in year 3, and so on. By the
time it gets down to a small company’s list, post-surveillance tools and
services will be well-tested and easy to switch to.
Running more productive CEO hearings
There are probably going to be committee hearings about all this at
some point, but nobody wants another “I’ll have my staff get the answer to that” hearing, with minimally useful testimony from big platform CEOs. No CEO is going to be able to answer the important
questions about how this stuff works in reality without preparation.
Instead, please share some material in advance.
Ask constituents for screenshots of scam ads, along with their
identifying info and permission for the company that placed the ad to
look up how they got the ad and discuss it with members of Congress and
staff.
Share a selection of scam screenshots and user info with the
platform CEOs in advance. Let them know that you are going to ask,
how did this exact person end up getting this exact ad? so that
they can bring the answer to the hearing.
When they answer in the hearing, live, you’ll get a better idea of how the system works.
Some work is still needed to figure out the possible loopholes in a
surveillance ad ban, so it would be good to get some reformed
surveillance advertising pros to go over it a bunch of times and game it
out. I’m sure that I have only thought of a few parts of what needs to
be done here.
As We May Think by Vannevar Bush: A record if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.
The Anatomy of a Search Engine: Currently, the predominant business
model for commercial search engines is advertising. The goals of the
advertising business model do not always correspond to providing quality
search to users. For example, in our prototype search engine one of the
top results for cellular phone is “The Effect of Cellular Phone Use Upon
Driver Attention”, a study which explains in great detail the
distractions and risk associated with conversing on a cell phone while
driving. This search result came up first because of its high importance
as judged by the PageRank algorithm, an approximation of citation
importance on the web [Page, 98]. It is clear that a search engine which
was taking money for showing cellular phone ads would have difficulty
justifying the page that our system returned to its paying advertisers.
For this type of reason and historical experience with other media
[Bagdikian 83], we expect that advertising funded search engines will be
inherently biased towards the advertisers and away from the needs of the
consumers.
A block contains a list of claims and a list of parent blocks. In
order to check if a block is valid, you also need a block with a claim
“[blob] is a valid signature for [block]” and a copy of the signature.
You can’t check a block’s validity in isolation.
When choosing among blocks to form a chain, choose the oldest. When
choosing which of two conflicting blocks to remove in order to make a
chain valid, remove the oldest. The older a block is, the more likely
that other nodes will already have a copy.
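A minimal sketch of those rules (crypto stubbed out; claims and blocks simplified to the bare structure described above):
// Minimal sketch: a block holds claims and parent links, and can only
// be checked against *other* blocks. Signature math is stubbed out.
function verifySignature(blob, block) {
  return true; // stand-in for real signature verification
}

// Valid only if some other known block claims "[blob] is a valid
// signature for [block]" and we hold a copy of that signature blob.
function isValid(block, knownBlocks, signatureBlobs) {
  const blob = signatureBlobs[block.id];
  return (
    blob !== undefined &&
    knownBlocks.some((other) =>
      other.claims.some(
        (c) => c.kind === "signature-for" && c.target === block.id
      )
    ) &&
    verifySignature(blob, block)
  );
}

// Both tie-breaking rules prefer age: older blocks are more likely to
// be replicated on other nodes already.
const oldest = (blocks) =>
  blocks.reduce((a, b) => (a.created <= b.created ? a : b));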
What does improving web advertising mean? Some possible
answers.
1. Advertising reaches the intended human audience
2. Advertising is placed in a context appropriate to the product or service advertised (brand safety)
3. Placement and reporting complies with information sharing and processing laws and norms (privacy where the information is personal, data leakage where the information is business)
You can’t crank up any one of them and let the others go. Fail on (3)
and deceptive advertisers outcompete
the honest ones so the whole medium becomes untrustworthy. Fail on
(2) and low-quality scraper
and misinfo sites can outcompete legit ad-supported sites. Fail on (1)
and the advertisers stop supporting the web, and all the ad money goes
to Facebook, which is
sub-optimal for some or many advertisers depending on time
preferences. There are enough deceptive and/or
hostile parties out there that some of the failure modes for web ads are
risks to civilization, so you are probably set if you’re trying to
justify working on this stuff to Future You.
The fun part, though, is that web advertising can’t really work the
same as other online (software-intermediated?) ad platforms. The
web is (user + client)/server. The client software is
the agent of the user and is expected to operate in the user’s
interests. Other online ad media such as native mobile apps and
connected TV are user/(client + server). The client
software is provided by, and under the control of, another party. The
user may even be a party to an end-user license agreement where they
agree not to interfere with the client software’s ability to act in
another party’s interests.
It’s unrealistic to expect web ads to work similarly to app and CTV
ads. Web ads in their improved state are going to work more like
an ad medium in which there is no client software (like print ads) than
they will work like an ad medium in which the client software operates
in the interests of a party other than the user. This is probably a good
thing, because if the user is in the loop, the best way for an ad to
pay for the resources and attention it consumes is to participate
in a content→signal→reputation spiral that is hard to do in
less user-aligned ad media.
When you measure, include the measurer. — MC Hammer
Rebuilding web advertising without the third-party cookie is not
really a privacy story. It’s a market design story. And it’s probably a
mistake to think about third party cookies as some ideal system of which
a fraction needs to be clawed back while keeping privacy. The
limitations to measurement are complicated. (There is a lot of math
needed to understand the reasons behind The
Refrigerator Test and I’m still trying to figure it out.)
The Identity
Fragmentation Bias by Tesary Lin and Sanjog Misra. This paper
studies the identity fragmentation bias, referring to the estimation
bias resulted from using fragmented data. Using a formal framework, we
decompose the contributing factors of the estimation bias caused by data
fragmentation and discuss the direction of bias. Contrary to
conventional wisdom, this bias cannot be signed or bounded under
standard assumptions.
Inferno:
A guide to field experiments in online display advertising by
Garrett A. Johnson. 19 Jul 2021 Online ad experiments suffer from an
extreme statistical power problem, which limits how much can be learned
from experiments. … Some of the coming changes to online
advertising promise improvements for experimenters.
The Unfavorable Economics of Measuring the Returns to Advertising (2015): These initial findings show that when advertising at a level of intensity typical of digital advertising, [randomized controlled trials] require sample sizes in the single-digit millions of person-weeks to distinguish campaigns that have no effect on consumer behavior (−100% ROI) from those that are profitable (ROI > 0%). …
Identifying highly successful campaigns from ones that merely broke
even is not an optimization standard we typically apply in economics,
yet our analysis shows that reliably distinguishing a 50% from 0% ROI is
typically not possible with a $100,000 experiment involving millions of
individuals.
Internet
Activity Bias Causes Lumpy User Behavior by Jakob Nielsen This
phenomenon is called activity bias: some days, people do a lot online;
other days, they do very little. On very active days, people are more
likely to do both Activity A and Activity B, no matter what A or B might
be…. Crucially, even if there is no relationship between A and B, the
very fact that you observe users doing A means that they are likely to
be having one of their more active days and therefore are also more
likely to do B.
Putting a tax on surveillance marketing is sometimes suggested as a
solution to a classic externalities problem—firms benefit from
surveillance marketing, but the costs and risks are paid for by the
people surveilled. A Pigovian
tax is the go-to fix for this situation.
Where to put the tax is the problem. Taxing specific marketing
practices probably creates more overhead and risk than it’s worth. Too
easy for surveillance marketers to work around. So it seems like the
best approach would be to expand the existing “data broker” registration
laws to put scaled reporting and tax requirements on any database
containing PII. From an economic POV, personal information is digital
hazmat, with both potential harms and possible future benefits that a
regulator is not in a position to evaluate but the users are.
Just like RCRA
resulted in some marginal uses of hazardous materials being phased out,
a Pigovian tax would likely cause companies to get rid of some high
cost/benefit surveillance marketing data on their own to avoid reporting
and taxation.
In general, it is important to tie the tax to the data (and therefore
risks) and not to specific practices. Users of the data are best able to
decide how to balance the risks and rewards.
(This is rough, still trying to figure this out. Based on a thread in
a surveillance marketing forum.)
Here’s a surveillance marketing problem: some company gets a web site
visitor, but can’t identify them. They want to target the same person
with Facebook ads or email spam.
For the company, one solution is to put some third-party JavaScript
on the page that generates an email address for an anonymous visitor.
Examples:
LeadPost
leverages a network of match providers to convert your anonymous bounces
into actionable leads. All lead data includes fully verified name,
address and email and may be used for unlimited marketing
purposes.
GetEmails offers
Anonymous Website Visitor Identification, best explained in their
video, How
it Works In the Kitchen. We hash the email addresses in the huge
f’n database, we then match the hashed email addresses in the cookies to
the hashed email addresses in the big f’n database, and we pass you a
record. It is magic.
On the identification side, we could use existing technology to
identify up to about 35 percent of US traffic. On the data side, we
could compile an enormous list of third-party opt-ins through business
development with lead generation companies. We connected the two
parts together, and lo and behold, it worked. Email-Based Retargeting
was born.
This might be why I sometimes get completely inexplicable companies
showing up as Advertisers using your activity or information in
Facebook Ad
Preferences. What if this happened?
User A visits a web site with some email-finding JavaScript on
it.
Email-finding JavaScript misidentifies User A as User B.
Company adds User B’s info to their CRM system and uses it to
send spam (generally, CAN-SPAM compliant email that is spam according to
norms and reputable mail server ToSs, not spam according to US Federal
law) to User B, and adds User B to a Facebook
Custom Audience (not a Website
Custom Audience like they would have gotten by using the Facebook
pixel, a Customer
List Custom Audience as if they had gotten the email with
consent.)
The problem is: what happens if User B had Global Privacy Control turned on? Had User B actually visited, the company would have picked up on the GPC signal and set the “Do Not Sell” flag on User B’s information. But User B didn’t visit the company’s site. User A did. So now User B sees their info in a place it shouldn’t be, and the company is in CCPA trouble for mishandling the information of someone who never even came to their site.
Part of the solution seems to be for the third-party vendor to keep
track of everyone they have seen a Global Privacy Control for on any
site, and never return that person’s info in step 2 above. But I’m not
sure if this covers it. Anyway, this might just be more about what
the heck is that company I’ve never heard of doing in Facebook Ad
Preferences? than anything too significant.
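A sketch of what that fix could look like on the vendor side (matchVisitorToRecord is a hypothetical stand-in for the vendor’s real identity graph):
// Sketch of the vendor-side fix: remember every identity ever seen
// with Global Privacy Control turned on, and never return a match for
// them. In production this set would be persisted, not in memory.
const gpcSuppressed = new Set();

// Hypothetical stand-in for the vendor's real identity-graph lookup.
const matchVisitorToRecord = (visitorId) => ({ id: visitorId });

function onPageVisit(visitorId, requestHeaders) {
  // "Sec-GPC: 1" is the header form of Global Privacy Control.
  if (requestHeaders["sec-gpc"] === "1") {
    gpcSuppressed.add(visitorId);
    return null; // and never sell this visit, either
  }
  if (gpcSuppressed.has(visitorId)) {
    return null; // a GPC was seen from this person before, on any site
  }
  return matchVisitorToRecord(visitorId);
}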
Or I’m seeing those weird Facebook ads because the company just
bought an old spam CD and made custom audiences out of that. Probably
easier.
The main point here is that Global Privacy Control affects all
marketing projects at a company, not just web advertising.
If you have a GPC from a customer, it affects everything
including sales of postal direct mail lists, social media custom
audiences, all of it.
If you use GPC when you place an order, and you give the seller
your info, then GPC affects all of what the company can do with your
info, not just your current web session or order.
Anyway, they put it up as a YouTube video, so if I want to make an
audio file to listen to offline I should download and convert it.
To prevent the Linux system from suspending or hibernating when a
user is logged in, you can put this file,
10-no-sleep-ssh.sh into /etc/pm/sleep.d. Owned
by root:root, mode 755.
#!/bin/bash
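# /etc/pm/sleep.d/10-no-sleep-ssh.sh
# Veto suspend/hibernate whenever a remote (pseudo-terminal) user is
# logged in; pm-utils treats a non-zero exit as "do not sleep."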
case "$1" in
suspend|hibernate)
if (who -u | awk '{print $2}' | grep -q pts/); then
logger 'ssh user(s) logged in'
exit 1
fi
;;
esac
The who -u command lists the users who are logged in. A
remote user will be on a pseudo-terminal and have “pts” in the second
column of their line in the output.
If this script exits with an error (exit 1) the system
will be prevented from sleeping.
To me, it looks like there is a gap in the Internet advertising
market that might be explained by the actions covered in that
complaint.
                          | Low complexity | High complexity
Long customer lifespan    | ?              | high-quality sites
Short customer lifespan   | Facebook       | misinfo sites
Right now if companies want to buy advertising with low
complexity—ease of placement and easy-to-interpret attribution data—they
can get it from Facebook. But because of the buy-or-bury scheme,
it comes as part of a single service that also tends to kill off a fraction of the customers, through medical misinformation, extremist
violence, climate disasters, and so on. This ad option appears to meet
the needs of advertisers with shorter time horizons, such as several
quarters of DTC startup runway or CMO job tenure. Most of the years of
customer life lost will not enter into consideration because they will
take place outside the time frame relevant for making the decision.
Judy
Shapiro points out that In 2017, just five years ago, the CPM to
run a Facebook ad was $4.21, meaning you had to spend $4.21 to reach
1,000 impressions, presumably people. Today, the price is $13.87 – a
whooping increase of 222%.
Judging by that, the combination of advertiser convenience and
lowering customer lifespan seems to be a winner. Some advertisers,
however, have investments in facilities and goodwill that make them
prefer longer customer lifespans but lack the advertising know-how to be
able to run high-complexity advertising. It is possible that one of the
companies acquired by Facebook might have, if it continued to operate
independently, been successful in the low-complexity/high-lifespan
niche. Today that niche seems to be an opportunity.
Update: CMA caught this. Google previously committed to not imposing a
penalty in search rankings on sites which opt out of Topics API. This
should be extended to apply to all Privacy Sandbox technologies.
The key commitment that I was looking for in there is something about
how a site’s decision to use Google Privacy Sandbox will or will not
affect the ranking of search results from that site in Google Search and
Google News.
It would not have taken them much time to add that, plus of course a
little extra lawyer time, depending on how many lawyers need to read it
before they post. But that didn’t happen. Which makes the missing
sentence the loudest part of the whole thing.
By now it’s pretty clear that we’re going to continue to have
arguments about the future of web advertising, and then, whatever way
those arguments turn out, everyone is going to do everything exactly
Google’s way because Google dominates search. For example, Google FLoC
opt-outs are a thing now, but how many sites will stay opted out even
after the SEO impact becomes clear? Wikipedia, GitHub, and Amazon have a
high enough profile that they’re probably fine, but how will The Markup do in search results,
against other sites that cover the same topics in a more Google-friendly
way?
Instead of sending an anticompetitive message in a document that can
be part of discovery in a lawsuit, or stating it in a meeting that can
be recorded or testified about, Google can encode the message in
patterns of search results so that SEO consultants can reconstruct it
for their clients. No message text has to be shared, but an internal
message M at Google can become M′ at the recipient
company. There is probably a really good information theory paper about
how this works.
Think twice before sharing: this is a good point,
maybe more than twice. For example, if you Tweet the date that your
coronavirus vaccination became effective, surveillance marketers who
know where you live can count backward to the date you became eligible,
and learn something about your age or health history.
Respect others’ privacy: Good advice, and don’t ask
why someone declined permission to have their info shared, or
pressure people. (And if you have information or photos pertaining to
somebody who you no longer have a relationship with that would make
having that material appropriate, destroy it.)
Create privacy spaces: Now I want to do this.
Say no: This situation is actually getting a little
better this year—consent dialogs are getting cleaned up. I’m
not seeing as many of the first generation more dark pattern-ish ones.
But see the Cookie AutoDelete link below. If you consent by
mistake, a tool can sometimes fix it.
Choose privacy: Vizio
makes nearly as much money from ads and data as it does from TVs, so
it’s not surprising that it’s hard to get a TV without smart
features. My solution is to just connect HDMI and antenna, never WiFi or
Ethernet. The part about not having an Amazon Alexa device in the house
seems kind of obvious, but maybe some families disagree? The good thing
is that couples who are dating now can find out if they have compatible
views on smart speakers before sharing a home. (and maybe figure
out whether augmented reality surveillance glasses are a
relationship problem now, before you end up having to break up with both
a person and with the corporation surveilling you from their face?)
Under devices there is a very good point about not keeping
apps you don’t need. Also check devices for pre-installed, unremovable
apps before buying, to save yourself the hassle of having to return the
device. (Any phone manufacturer that would stick
unremovable surveillance bloatware on their product would probably
make other bad decisions, too.)
Messaging apps: Good points. If this section isn’t
enough to keep you off WhatsApp, there are other issues. I’m still
concerned that WhatsApp is going to rub some people’s patriotic
sensibilities the wrong way, since WhatsApp makes
some countries, including India, accept a lesser privacy policy that
they won’t inflict on, say, Germany.
Just to stay out of that whole contentious national pride issue, I’m
sticking with Signal which doesn’t discriminate by country this way.
Email: Surveillance marketers have figured out the
+ addressing trick, unfortunately. It does make sense
to keep a separate address for low-priority marketing mail.
Browsers: Good point to have more than one. I like
Firefox, tricked out with appropriate extensions, for daily use but keep
a totally stock copy of Google Chrome just in case, for sites with
compatibility problems. (I never use Chrome for buying anything. Why
reward sites for compatibility problems by buying stuff from them?) If
you have a computer that came with Apple Mac OS or Microsoft Windows,
both of those come with good browsers now. So if you want to use Firefox
as your main browser, you can just use the browser that came with your
computer as the backup.
Use privacy extensions and tools. Here is the one
piece of bad advice in this chapter: Adblockers are easy to find and
install. Don’t do this.Seriously, don’t do
this. The ad blockers that are easy to find are generally the
sketchy ones. Unfortunately, the browser extensions directories tend to
fill up with adblockers that either have a
so-called Acceptable Ads feature that lets tracking through, and that
you have to change some settings to turn off, or are actual adware.
(If you install an ad blocker and start getting ads on Wikipedia,
you got the second kind.) And somehow the ad blockers that allow the
Google trackers through seem to come up higher in a Google search.
I use the built-in Enhanced
Tracking Protection in Firefox, which you can set to standard
for basic protection and strict for more protection but possible
breakage.
Legit ad blockers are generally effective against tracking by third
parties on regular sites, but that’s not as big a surveillance risk as
the Facebook sites and YouTube. For Facebook in the browser, there’s Facebook
Container (you did remove the native app, right?), but cleaning up YouTube is
a little harder.
Yes, third-party cookies are the easy surveillance technology
to block, but deal with Facebook and YouTube first. This whole
project is not just about saving yourself a little annoyance today, but
also about incentivizing CMOs and VCs to expect lower returns on future
surveillance marketing and invest based on that. We win when we can
change the level of risk we all have to deal with in the future.
Another extension that I find valuable is Cookie
AutoDelete. Automatically cleans out the cookies that a dark pattern
got you to consent to, while letting you keep cookies for sites
where you want to stay logged in. Much easier than clearing cookies
manually, which a lot of people apparently still do.
Demand privacy: Sending opt outs is one privacy task
where it really helps to be in California. The high-priority companies
to send CCPA opt-outs to are
companies targeting you on Facebook. (This is where keeping a
Facebook account is actually a privacy win. It’s a waste of time to
actually CCPA Facebook, but their ad settings can reveal who has uploaded your info to them and needs to get a CCPA opt-out.)
Breaking the kill chain
What’s the most effective place to break the surveillance marketing
kill chain? Let’s
look at it as a cycle. People buy stuff, surveillance marketers use
sales to measure the effectiveness of what they have been doing, use
those measurements to guide target selection, then place ads and collect
reports on the ads. The reports feed into the same effectiveness
measurement as sales.
(figure: surveillance marketing kill chain)
(Feel free to pick this model apart. I plan on revising it some more
once I have figured out what I missed.)
Ad placement looks easiest because it’s where you can use ad
blockers. Fine for run-of-the-mill sites where the ads come from a
separate ad server, but native apps and non-web ads are still going to
get through here. Blocking the most blockable ads is only going to
incentivize ads to move to non-web ad media. Most likely case is that
more ad money ends up supporting
deadly viruses, climate disasters and terrorists at the big social
sites.
Ad reporting is also affected by ad blockers, and you can use tools
like AdNauseam to
make it less effective. Promising, but like ad placement an area where
the more evil ad media are less affected.
Measuring effectiveness and target selection are both internal to the
advertiser. Could be good for insiders to work on, but not a lot you can
do from the outside.
That leaves buying stuff. What if the most effective place to take
action is between putting something in your web shopping cart and making
a purchase?
Compared to the number of trackers and third parties you interact
with, the number of companies you actually buy from is small
Opt out will cover all downstream uses of your info, not just the
ones you know about or have a tool for
Surveillance-based ads can only justify a higher rate per ad
impression by connecting ad reporting to sales.
This gets way easier when retailers have Global Privacy Control
support and clean up their dark patterns, so could go from privacy
nerdery to mainstream fairly smoothly.
Facebook is a good example
here, because their system is designed to limit outward transfer of
information. If a company buys ads on Facebook, they can’t ask Facebook
for a list of who saw their ad and then compare to the list of people
who bought stuff from them. Instead, the advertiser has to:
1. Integrate Facebook software into their web site, apps, and/or CRM system. (Facebook makes this software easy to set up, and only the web version is easy for the customers to block.)
2. Use the software to send reports on their customers, saying who bought what, to Facebook. The advertiser generally doesn’t know which customers have Facebook accounts, and most people have a shadow profile anyway, so they just dump everybody’s purchases to Facebook. This can be done in a server-to-server way that does not depend on the browser (see the sketch after this list).
3. Facebook generates a report for the advertiser so that the advertiser can compare their Facebook advertising to their sales.
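Here’s roughly what step 2 looks like in practice, based on the documented shape of Meta’s server-side Conversions API (a sketch; PIXEL_ID and TOKEN are placeholders):
// Sketch of step 2: server-to-server purchase reporting. Endpoint and
// payload shape follow Meta's documented Conversions API; PIXEL_ID and
// TOKEN are placeholders. Nothing here touches the customer's browser,
// which is why this transfer is so hard to block from the user side.
import { createHash } from "node:crypto";

const sha256 = (s) =>
  createHash("sha256").update(s.trim().toLowerCase()).digest("hex");

async function reportPurchase(email, value) {
  const body = {
    data: [{
      event_name: "Purchase",
      event_time: Math.floor(Date.now() / 1000),
      action_source: "website",
      user_data: { em: [sha256(email)] }, // hashed email for matching
      custom_data: { currency: "USD", value },
    }],
  };
  await fetch(
    "https://graph.facebook.com/v19.0/PIXEL_ID/events?access_token=TOKEN",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    }
  );
}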
Step 2 is hardest to keep from happening, since this information
transfer has the most technical and legal armor around it. But it also
carries the most reward per bit of personal info transfer blocked. The
more I can exercise my right to stop the advertisers from doing this,
the less justification they will have to send money to Facebook instead
of to ad media with positive externalities.
It makes sense not to RtD Facebook, since
with the current state of things you will have a shadow profile with
them anyway. Instead, RtK Facebook and either opt out or RtD the
advertisers who feed into their system.
How to keep the wrong sorts of people out of your place, using Google
FLoC.
Post a FLoC checker QR code.
(image: FLoC checker QR code)
Make everybody scan it when they come in.
Check their phones.
Green: good cohort
Red: bad or missing cohort. If they turned off
FLoC or installed an alt-browser, they’re probably hiding that they’re
in a bad cohort.
Blue: Apple Safari, you figure it out.
(no, I don’t think anybody should actually do this, but it’s
only a little bit of
JavaScript so people are certainly going to try. So it could turn
out that non-Google browsers will end up having to provide a realistic
cohort instead of just not implementing FLoC.)
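For the record, the little bit of JavaScript is roughly this (document.interestCohort() is the API from Chrome’s origin trial; the bad-cohort list is whatever the person running the door decides):
// Sketch of the door check. document.interestCohort() is the FLoC API
// from Chrome's origin trial; badCohorts is up to whoever runs the door.
const badCohorts = new Set(["12345"]);

async function doorCheck() {
  try {
    const { id } = await document.interestCohort();
    return badCohorts.has(id) ? "red" : "green";
  } catch (e) {
    return "blue"; // no cohort exposed: Safari, opt-outs, alt-browsers
  }
}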
The big question around Google FLoC is whether or
not some of the FLoC cohorts, which are group identifiers applied by the
browser and shared
with all sites, will match up with membership in legally protected
groups of people. Will cohorts turn out to be the web version of
Facebook’s old Ethnic
Affinity Groups, also known as multicultural affinity
groups?
If we do have an issue where racially specific targeting is
incidentally created by the ML system what happens when advertisers
target for or against it and who ends up responsible?
FLoC developers are planning
to use sensitive-page classifiers to check which cohorts match up to
sensitive groups of pages in web history. Unfortunately, checking
page content is not going to give them protected group
membership for the users. A simple US-based example is school
and neighborhood patterns. A school that is mainly attended by members
of a single ethnic group is going to have page content that’s mostly the
same as all the other schools in the district. The schools all have
similar events and play the same sports, but serve
different groups of students and parents. So, even though the
content is non-sensitive, the cohort is. And local
stores with similar merchandise in different neighborhoods are going to
get different ethnic affinity groups, I mean cohorts, of visitors.
Content in language A could be completely non-sensitive, and local
content for region B could be completely non-sensitive, but the cohort
of people who use language A in region B could be highly sensitive.
So it might look like nobody will be able to tell which cohorts are
really ethnic affinity groups until some independent data journalism site
manages to do a study with a panel of opted-in users. This would be the
kind of AI
ethics research that is bad for career prospects at Google, but that
independent organizations can often come up with the funding to do.
But one company doesn’t have to wait for the study and resulting news
story. Facebook has enough logged-in Google Chrome users that
they could already know which FLoC cohorts match up to their old ethnic
affinity groups. If a brand buys ads on the open web and relies
on FLoC data, Facebook can see when the brand is doing crimes. This
doesn’t mean that Facebook will disclose the problem, since it gives
them something to hold over the brand. No more making any stink about ad
metrics or Facebook
Groups IRL get-togethers. The extra risk for the advertisers means
lower expected revenue for ad impressions tied to FLoC—because of
uncertainties that are hard for anyone else to see.
Inspiration for the title for this post:
Your probabilistic ID is just fingerprinting. Change my mind.
(update 6 Aug 2023: how did this tip affect my usage of
YouTube?)
Time for another Internet tip. Hope this is useful, since I think my
problem was a pretty common one.
I want to be able to share a video link from a business-related
event and not have a “lizard people run the Federal Reserve” video
auto-play at the end of it.
I want my family members to be able to click a video on how to
play a game or cook a recipe without constantly having to check the
recommended videos for those prolific “Fourteen Words” guys.
I don’t want to watch an exercise video and then have it roll
right over into the miraculous bleach and vinegar diet video.
And finally, I want to cut back on how much of my personal or
business info is used to target me with ads.
Yes, this means cleaning up YouTube. Ideally there would be just one
tool I could install, like Facebook
Container, but it’s a little more work than that. This is going to
take two extensions and a little configuration. But in my opinion it’s
worth it to do this once instead of having to deal with “oh crap,
YouTube is showing WHAT?” over and over. Ready?
Part one: put YouTube in a container. Install the Firefox Multi-Account Containers extension, then click its toolbar button to open the container menu, click “Manage Containers,” then “New Container”. Make a container for YouTube. I picked red as the color code for tabs opened in this container. (If you already have Facebook Container, which I recommend, you’ll have a blue stripe for Facebook tabs.) You can also pick a small icon for the container.
Now that the container exists, go to the YouTube site, open the
container menu, and select “Always Open This Site in…” Pick the YouTube
container from the list.
Now when you follow a link to YouTube, the page will open up in the
container, with a stripe on the tab showing the color you picked. You
shouldn’t be logged in to YouTube with your Google account if you have
one.
Part one done. This should limit the use of my info from other
services to target me for scams and crap in video ads. This should not
require any extra clicks after the original setup. You can even
customize the Firefox toolbar and move the Multi-Account Containers
button to the overflow area, so you don’t have to think about it unless
you want to do this for another site.
Part two: patch the YouTube rat hole
Time for another extension: Enhancer
for YouTube. This one will change some of the creepy “engagement”
promoting behaviors of the YouTube site design.
Install, then open up the preferences. Hamburger menu → Add-ons →
three dots to the right of the Enhancer for YouTube entry.
This preferences menu has a whole bunch of stuff on it, take your
pick. The essentials are:
Disable autoplay
Hide related videos
I also hide comments (update: I stopped hiding comments after I found channels with more useful links in the comments) and use the Automatically enable YouTube’s Theater mode option to make the video bigger and fill in the extra space
(that used to be occupied by thumbnails of the flat earth video, the
miracle virus cure video, and the video about the dead politician who is
secretly alive and is coming back to massacre everybody that some video
maker doesn’t like).
You can also click around with some of the other options. I haven’t
messed with this extension since I got things cleaned up. If you like a
video channel you can allow autoplay just for playlists, and the
extension doesn’t interfere with browsing a channel’s home page or
videos list page.
All done.
Conclusion
YouTube is not in business just to show you the videos you want. The
more that viewers pick their own videos, the more market power that
popular video creators end up with, and the lower the share of ad
revenue that YouTube can capture. The secret of the YouTube model is to
commodify the
content by artificially driving viewers away from emergent
stars…even if that means that the side effect is promoting more
bleach-drinking videos or white power videos than viewers actually
want.
In the long run, regulation will need to complement the technology
here. It will be easy for YouTube to change their site around to make
extensions and tips stop working. Drink your bleach and like it,
we’ve got a market to dominate! Right now, there is a lot of focus
on the technology and regulation to enforce people’s right to block
transfers of their data between sites. But there is also going to need
to be some protection of the right to turn off automated
promotion/commodification of behavior within a site.
Why not just quit YouTube? (added in 2023)
YouTube, once you clean up the engagement and growth hacking
crap, is promising in a lot of ways. At its best, YouTube is pretty
encouraging as an ad medium, as entertainment, and as a place to learn
stuff. How I use it now:
I subscribe to channels recommended by people, including
YouTubers.
I actually pay attention to the ads. Not the automatically placed
surveillance ads, of course. Those are targeted to me, so more
likely to be deceptive than signal-carrying. I still ignore those.
But a lot of YouTubers read their own sponsor messages, old-school Paul
Harvey style, and those can actually work like ads are supposed to.
Here’s a good example: Incogni promo
(video).
Related
Up-to date (ish) list of the most
effective privacy tools, services and settings.
(here’s the body of a letter I’m sending to Consumer Reports.
Feel free to copy, modify, and send to other public-interest sites or
open-source projects.)
I’m writing to ask you to set an HTTP header on the Consumer Reports
site to help keep deadly robots from burning people’s houses down.
First, here’s the header.
Permissions-Policy: interest-cohort=()
I know it’s a little overhead on each request, but here’s why you’re
going to need it.
The Google Chrome web browser is going to be coming out with a new
feature called Federated
Learning of Cohorts (FLoC). FLoC is software that runs in the
browser and assigns the user to a group, or cohort, based on the web
sites they visit.
Each user would be assigned to one cohort, so the Consumer Reports
site visitors will be spread out among several cohorts. In order to use
cohorts, sites will have to learn which cohorts correspond to which
characteristics about people. For example, Consumer Reports members
might be assigned to cohorts that are heavy on “appliance buyers who are
concerned about safety.” When retail sites are able to analyze the
meaning of a visitor’s cohort, they’ll likely begin to optimize, showing
the safest, highest-quality appliances to the users whose cohort shows
they’re likely to be Consumer Reports members, and showing the
lower-quality appliances, more likely to catch on fire, to the members
of other cohorts.
In the long run, it’s bad for everyone in the market when companies
can easily monetize unsafe products. FLoC doesn’t require a deliberate
decision by anyone to trick any customer into buying a dangerous
appliance. The machine learning software on the retail side just
“learns” how to optimally move certain SKUs, without ever knowing that
they match up to fire hazards for the customers. Cohort training is an
important area for ethical AI research, but Google has been
abruptly terminating some key people who were responsible for leading
research in the field of AI ethics. Until FLoC has been tested for
safety, and for other problems such as enabling unlawful discrimination,
it seems best to stay opted out of it. That header is the only
opt-out method for now.
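Here’s a quick way to check that the header is being served, from the browser console on your own origin (a sketch; cross-origin responses may not expose the header):
// Quick same-origin check for the FLoC opt-out header.
fetch("/", { method: "HEAD" }).then((res) => {
  const policy = res.headers.get("permissions-policy") || "";
  console.log(
    policy.includes("interest-cohort=()")
      ? "FLoC opt-out header is set"
      : "FLoC opt-out header is missing"
  );
});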
If you have any questions about how to test the HTTP header, please
let me know. A test of FLoC will be starting shortly, and you can learn
more at the World Wide Web Consortium’s Web
Incubator Community Group (WICG).
I just heard from a reader of this blog that a dictatorship is taking
over in their country, and will be requiring all citizens to use their
new web browser. The scary thing is that this new country-specific
browser will have a built-in surveillance system in it. Excerpt from the
top-secret document follows.
Today, citizens are categorized, governed, and provided with public
services based on a variety of tracking techniques. It would be more
efficient and less costly for the State if citizens could be assigned to
priority groups, or cohorts, within the browser itself.
We plan to explore ways in which a browser can group together people
with similar browsing habits, so that the State (and private entities
regulated by the State) can use the habits of these cohorts, to:
prioritize assignment of surveillance personnel to
individuals
allocate public services preferentially to favored religious and
language groups
encourage self-reeducation by members of marginal groups
Browsers would need a way to form clusters that are both useful and
efficient: Useful by collecting people with similar enough interests and
producing labels suitable for machine learning, and efficient by cheaply
forming large clusters that can be used to prioritize the assignment of
costly individual surveillance technologies and of public-sector
services such as education, business subsidies, and travel
documents.
A Citizen’s Omnipresent Law-Enforcement and Favoritism (COLF) cohort
is a short name that is shared by a large number (thousands) of people,
derived by the browser from its user’s browsing history. The browser
updates the cohort over time as its user traverses the web. The value is
made available to websites via a new JavaScript API.
The browser uses machine learning algorithms to develop a cohort
based on the sites that an individual visits. The algorithms might be
based on the URLs of the visited sites, on the content of those pages,
or other factors. The central idea is that these input features to the
algorithm, including the web history, are kept local on the browser—the
browser only exposes the generated cohort. The browser ensures that
cohorts are well distributed, so that each represents thousands of
people.
Hiding or falsifying cohort membership
Some citizens who are members of less favored cohorts might configure
or modify their browsers to send a blank, random, or deliberately chosen
cohort. This behavior will be disincentivized by doing spot-checks that
compare the observed cohort for a citizen to a set of likely cohorts
calculated from known PII for that citizen.
Cohorts do not eliminate the need for detailed State surveillance of
a subset of citizens, just as a vehicle license plate does not eliminate
the need for random checks of a driver’s papers. However, license plates
and cohorts are easily observable in large numbers, and appropriate
penalties for falsification of either can be applied. Cohorts are a
cost-saving complement to other surveillance technologies, and make more
kinds of discrimination and surveillance practical and affordable.
Sensitive Categories
A cohort is designed to reveal sensitive information. A user might
configure or modify the browser in an attempt to remove visits to
sensitive sites from cohort data collection. But this does not mean
sensitive information can’t be leaked. The State is aware of
correlations between browser history and sensitive cohort membership
that citizens are not aware of.
Citizens might attempt to evaluate their own cohort by measuring and
limiting their deviation from population-level demographics with respect
to the prevalence of sensitive categories, to prevent their use as
proxies for a sensitive category. However, this evaluation would require
knowing how many individual people in the cohort were in the sensitive
categories, information which could be difficult or intrusive to
obtain.
…all right, that’s enough. This wasn’t some new surveillance
browser, it’s a lightly edited version of the FLoC README. How much
of a Bay Area big company employee filter bubble do you have to be in to
see an idea for having the browser tell sites, “MY USER IS A MEMBER OF
THIS GROUP” and think, wow, we can use this to sell sneakers! Anybody
who wants sneakers, if you seriously have no idea how to get them, let
me know, I’ll drive you to the damn shoe store myself. Easier than
dealing with all this labeling-people-with-group-identifiers creepy
jibber-jabber.
I’m not going to cover all the problems with having web browsers
label their users with “cohorts” here. Really briefly, the Google Chrome
browser might start using a system called “Federated
Learning of Cohorts” (FLoC) to partly replace some of the data
practices that are currently done using third-party tracking
cookies.
There is a test
of FLoC now running that is opt-in, but the core FLoC proposal is still opt out.
It looks as if, in order to protect Google Chrome users on your site,
you will have to change the site configuration a little.
In the future, to check that it worked, I should be able to use the
Permissions
API. An example of a Permissions API query that works is this one,
for how to check the Geolocation API permission:
// Check for Geolocation API permissions
navigator.permissions.query({name:'geolocation'})
.then(function(permissionStatus) {
console.log('geolocation permission state is ', permissionStatus.state);
});
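The corresponding FLoC query should eventually look like this (same pattern, different permission name):
// Check for the FLoC interest-cohort permission (not recognized yet)
navigator.permissions.query({name:'interest-cohort'})
.then(function(permissionStatus) {
console.log('interest-cohort permission state is ', permissionStatus.state);
})
.catch(function(error) {
console.log(error.message);
});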
Right now when I do a query for the name
interest-cohort, I’m getting The provided value
‘interest-cohort’ is not a valid enum value of type PermissionName.
but I’m sure that they’ll get this fixed up before FLoC gets turned on
in regular people’s browsers. Anyway, will update as I get to test
more.
Well that was interesting
I mess around with this kind of thing on a Saturday morning because
I’m interested in web privacy, but seriously, opting out of
half-thought-out site audience data exfiltration schemes should not have
to be something that people with normal web sites have to worry about.
Not everybody has a free Saturday morning to work on their personal
site, and not every company has a massive open-plan office with armies
of developers to throw at projects they need to do just to stay in the
same place.
Here in California, I say, “right on.” All this state political
action is not surprising. Privacy laws and regulations are incredibly
popular.
California polling before our latest privacy initiative showed 88
percent in favor. Mandating privacy protection is so popular that
the argument against the initiative in
the voter guide had to focus on how it didn’t do
enough.
Vote NO on Proposition 24 because it was written behind closed doors
with input from giant tech corporations that collect and misuse our
personal information—while the measure’s sponsor rejected almost every
suggestion from 11 privacy and consumer rights groups.
I’m not just a California privacy law proponent. I’m also a user. A
power user. My goal for 2020 was to use
the CCPA to opt out of all personalized advertising, and if you count
the stuff I’m already protected from by my regular privacy tools, I have
mostly succeeded. (I still get targeted ads on Twitter, but those are
really just for ridiculosity. Not only do I lack the budget for an
F-15EX airplane, I don’t even have a pilot’s license.)
That meant I did a lot of CCPA opt-outs in 2020. I even got back on
Facebook, to CCPA any recognizable brand trying to target me on there.
(I didn’t bother with the sketchy Facebook advertisers, like all the
companies offering software and courses for making money drop-shipping
cheap products to people who click on Facebook ads.) The Consumer
Reports CCPA Authorized Agent study (PDF) was part of my year of
using the CCPA, and I also did a bunch of CCPA opt outs and Right to
Knows on my own. On my computers, ccpa is a shell script now, so I
can CCPA a company faster than anybody. Time me.
Privacy law features I depend on
As a power user, I’m willing to use services and write code to make
my privacy opt-outs work. I understand that there are good reasons for
putting privacy laws in the USA on an opt-out basis instead of making
them consent-based. I’m fully prepared to do my part of the work, even
if that means dorking around with laser printers or fax
machines.
If a missing piece in a state privacy law is something that I can
work around, I’ll do it. So this list is only going to cover what I
think are the essentials for making a state privacy law workable.
Reasonable identity verification for opt outs: The
CCPA does not require
identity verification for an opt out, but companies can deny an opt
out if they believe it to be fraudulent. This makes an opt out easier
than a full CCPA deletion or Right to Know. Realistically, making people
scan their drivers’ licenses to opt out is going to do more to deter opt
outs than to make them work any better.
Authorized agents: People hate filling out forms,
especially minor variations on the same form, over and over. Giving
people the ability to delegate the work is what makes an opt-out-based
privacy law practical. I know that early CCPA implementations were kind
of rough, but the future is in automation and delegation.
Dark patterns: Unless the law covers Dark
Patterns, companies are incentivized to make opt-out processes that
are technically legal but that keep increasing the time required.
Definition of what is being opted out of: When
people ask me about the CCPA, the most common thing they want to opt out
of is “stop this company targeting me on Facebook.” Please compare your
privacy law to how
Facebook Custom Audiences work. If the definitions in the law don’t
cover this high-profile example of creepy privacy violation, you’re
missing a key part of what voters want, and you need to fix it. (Yes, research shows that 31-36% of people are “Kevins”
who want personalized ads. Kevin’s desires are already met, so the law
needs to focus on helping the rest of the people exercise their
rights.)
That’s about it. Let me just end with a quick rule of thumb:
you know you have a good state privacy law when the surveillance
marketing companies lobby for a Federal privacy law to preempt
it. If your state isn’t making them complain about
“uncertainty” and “patchwork of regulations” then you’re missing
something.
Related
there ought to be a law: What’s next? Notes on possible future state privacy laws that avoid some of the problems with what we have now
I was playing an online game the other night, and another player
pointed out that the game had the
Facebook SDK in it. This is a small piece of software that, well,
I’ll quote from their site: Facebook SDKs enable you to pass app event data from your app to Facebook. (link added by me)
Creepy, right? What can you do about it?
First, the easy answer. On Apple iOS, users are soon going to get an
Ask
App not to Track button that should help with this problem.
You’ll start getting dialogs with that option plus an Allow
option. I don’t have the budget for enough lawyer time to understand
what I would be agreeing to if I click Allow, so I’m not going to
select that. Keep an eye out for these dialogs, and select the Ask
App not to Track button when you can.
Second, the almost as easy answer. Send a CCPA opt-out to the game
company. Under the California
Consumer Privacy Act, you can require a company to stop
selling your personal information. A sale for CCPA
purposes does not have to be an exchange of personal info for money. So
even if the game company is just getting different data back in exchange
for yours, the CCPA still covers it.
After a year of doing CCPA opt-outs, it looks like even if they make
you do extra steps to do your opt-out, it’s almost always faster to
follow the instructions in the email than to try to find the right form
on your own. And I use the email outbox as a way to check which
companies I have already started the CCPA process with. Yes, there are
going to be easier ways
to make this stuff work including browser signals and authorized
agent services. I’ll update with links as they become available.
(updated 5 Jun 2024: add link to The One Simple Trick to
Measuring Abuse in Tech’s $440 Billion Ads Business)
(updated 14 Nov 2021: deception avoidance and value
exchange)
(updated 20 Sep 2021: more material on Ad Library, problem
categories, links)
(updated 7 Jan 2021: added intermediary for Custom Audiences
trick)
Why are there so many scam ads on Facebook? The
over-simplified answer is that Facebook just doesn’t have enough ad
reviewers for the number of ads they get. Since basically anyone with a
credit card can advertise, and advertisers have access to tools for
making huge numbers of ad variations, then of course lots of scam ads
are going to get through.
Facebook is also more attractive to scammers than other ad media.
Deceptive advertisers already get
more value from highly targetable ad media than honest advertisers
do, because targeting gives the deceptive advertiser an additional
benefit. Besides helping to reach possible buyers, a deceptive
advertiser can also use targeting to avoid enforcers of laws and
norms.
Understaffing and targeting are only parts of the story, though. It’s not so much that Facebook is uninterested in dealing with scams, it’s as if their ad system in general is the result of a cooperative software development project with the scammers. (Do Facebook and their scam advertisers constitute an “enterprise” for purposes of RICO? I don’t know; might be worth asking your lawyer if you got scammed or impersonated, though.) Some of the deliberate design decisions that went into Facebook ads are making things easier for deceptive advertisers at the expense of users and legit advertisers.
Custom Audiences don’t support list seeding. Until
Facebook, every direct marketing medium has supported “seed”
records, which look like ordinary records but get delivered back to the
list owner or someone they know, so that they can monitor usage of the
list. (I used them for a biotech company’s postal and email lists, even
though we never sold or shared the list. Just to be on the safe side.)
Using seed records is a basic
direct marketing best practice and deters people who might see your
list from misusing it.
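List seeding is simple enough to sketch (a toy example; real seed identities are crafted to be indistinguishable from ordinary records):
// Toy sketch of direct-marketing list seeding. Seed records route back
// to the list owner; any unlicensed use of the list shows up as mail
// or ads arriving at a seed identity.
const seeds = [
  { name: "Pat Decoy", email: "pat.decoy+listwatch@example.com" },
];

// Every copy of the list that leaves the building carries the seeds.
function exportList(records) {
  return [...records, ...seeds];
}

console.log(exportList([{ name: "Real Customer", email: "rc@example.com" }]));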
Facebook Custom Audiences are a way for scammers to use a stolen list
without detection. Facebook Ad Settings lets a user see if they
personally are in someone else’s Custom Audience, but there’s no way for
a list owner to check if the seed records from their list ended up on
one. Someone who steals a mailing list can sneak it into a new Custom
Audience without getting caught by the list owner. Legit direct
marketers who want to protect their lists would pay for the ability to
use seed accounts on Facebook, but this functionality would interfere
with Facebook’s support for scam advertisers, so they don’t offer it, or
even allow anyone else to provide seed accounts. (A limited number of Test
Users are allowed for app development, but these are not usable as
seeds. Facebook uses the term “seeds” differently from the conventional
meaning, to mean the starting
names for a Lookalike Audience.)
Users can be blocked from seeing the company that really
controls the targeting lists that they’re on. Suppose that a
dishonest advertiser wants to use a California resident’s PII, but they
don’t want to have to honor CCPA opt outs or register with the state.
Facebook promises transparency and allows users to see who has
uploaded their info. But the dishonest advertiser can simply send the
hashed versions of the PII on their list to an intermediary firm, and
have that firm transfer the hashed PII to Facebook. Now when someone who
is on the list goes to “Advertisers using your activity or information”
on Facebook, they see the name of the intermediary firm instead. Even if
a bunch of people on the list do opt out, the deceptive advertiser’s own
copy of the list is intact. When they switch to a different intermediary
firm later, there are no opt-outs associated with the list. This also
seems to be a good way for extremely suspicious-looking advertisers to
hide from people who might report or investigate them. If I check
Facebook for exclusion lists used by scammers who think I might report
them, I see only the name of a generic-sounding targeted ad company, not
the actual dishonest Facebook page.
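The hashing involved is ordinary normalize-then-SHA-256 (per Meta’s published Custom Audience matching docs), which is what makes the intermediary trick work: the same list produces the same hashes no matter who uploads it. A sketch:
// The same email always hashes to the same value, so a list laundered
// through an intermediary still matches the same Facebook users.
import { createHash } from "node:crypto";

function hashForUpload(email) {
  const normalized = email.trim().toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

console.log(hashForUpload("User@Example.com "));
console.log(hashForUpload("user@example.com")); // identical output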
Ad Library helps hide deceptive ads at times when risk of
discovery is high. Facebook’s Ad
Library is designed to show only “active” ads, those that are
running this very minute. A deceptive advertiser using a trademark or a
person’s likeness without permission can simply turn their ad on and off
based on when the victim is likely to be checking the Ad Library. For
example, a seller of infringing
knock-offs of a European brand can run the ads when European
marketers, lawyers, and regulators are asleep but people in the Americas
or Asia are awake and shopping. Ad Library makes it easier for scammers
to copy honest advertisers than the other way around.
Ad Library delays posting of scam ads. If you see a
bunch of similar scam ads popping up, like this…
(image: scam ad on Facebook)
…but then you go to their Ad Library and get This advertiser isn’t
running ads in any country at this time, read the fine print.
An ad will appear in the ad library within 24 hours from the time it
gets its first impression. Any changes or updates made to an ad will
also be reflected in the ad library within 24 hours.
Facebook deliberately gives their scam advertisers almost a full day
to take a whack at you before revealing their ads in Ad Library (and, of
course, if the ad comes down fast enough, it never shows up there.)
Independent crawling of ads is blocked by policy. On
the open web, online ads can be crawled and logged by independent
companies. This service is needed in order to check for malvertising and
other problem ads. Inside the Facebook environment, however, independent
checking on ads is prohibited. Facebook puts the goal of hiding
problem ads ahead of facilitating the kinds of services that could help
fix the situation.
Image search crawlers are blocked from ads. Many
scammers make infringing copies of material from legit ads without
permission. Pirated product photos are especially common. The photos in
those scam ads above appear to have been taken from a legit retailer. If
legit advertisers had the ability to search for ads similar to theirs,
or for edited copies of their own photos, they would be able to find a
lot. But, for example, TinEye is
blocked from Ad Library, to make life easier for Facebook’s
deceptive advertisers at the expense of legit ones. Wells Fargo has to
ask
customers to report fake Wells Fargo because Facebook cooperates
with scammers pretending to be Wells Fargo, to hide fraudulent uses of
Wells Fargo’s trademarks.
Categories of scams to look for
The reason that Facebook has to try to shut down research programs
like NYU’s is that a project with the budget and skills of a small
university team could pick up on a bunch of obvious scams with some
tools based on existing open-source image matching software.
Some examples:
photos of public figures who do not endorse a particular category
(such as personal finance experts on cryptocurrency ads)
well-known company logos (needs manual check, sometimes the
advertiser is a dealer using the logo with permission)
rental
housing scams—look for the same house or apartment photo showing up
in ads from multiple landlords
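As a sketch of how little tooling this would take: off-the-shelf perceptual hashing can flag near-duplicate photos across a collection of ad creatives. This assumes the Pillow and imagehash Python packages, a directory of saved ad images, and a distance threshold that would need tuning.

from pathlib import Path
from PIL import Image          # pip install Pillow imagehash
import imagehash

def near_duplicates(image_dir, max_distance=5):
    # Perceptual hashes survive resizing and light edits, so the same
    # pirated product photo shows up as a small Hamming distance.
    hashes = [(p, imagehash.phash(Image.open(p)))
              for p in Path(image_dir).glob("*.jpg")]
    for i, (path_a, hash_a) in enumerate(hashes):
        for path_b, hash_b in hashes[i + 1:]:
            if hash_a - hash_b <= max_distance:
                yield path_a, path_b

for a, b in near_duplicates("collected_ads"):
    print(f"possible reused photo: {a} and {b}")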
But why?
It’s not clear why a large company would choose to cooperate with
deceptive advertisers. This decision might have to do with the fact that
Facebook has lots of eyeball-minutes that are hard to sell to the legit
market. As Bob Hoffman has been saying for a while, the ad business has
a long-running problem of avoiding
advertising to older people. Any online forum except the youngest
and hottest is going to fill up with older users whose ad impressions
are less valuable to marketers. Facebook could be making a short-term
revenue-maximizing decision to try to monetize these users better by
temporarily filling up the ad spots with scams, and only cleaning up bit
by bit when they have to.
Deception avoidance and value exchange
Or maybe it’s not a short-term decision after all. What if the
deceptive ads are a necessary part of the system?
A common, conventional point of view about surveillance marketing is
that people choose to trade information about themselves for
better-targeted ads. But this is oversimplified even if you don’t get
into the details of whether or not people give actual consent to the
exchange. Realistically, there aren't enough well-targeted ads trying to
reach you at any one time to make the ad experience noticeably better,
even for a high-status user. If the Facebook ad
system is run at capacity, then as a user you’re generally going to be
getting mostly ads that are not perceptibly well-matched, but still
revenue positive for the company.
Allowing a certain percentage of deceptive ads changes the balance.
With enough deceptive ads in the system, it becomes a better move for a
high-status user to reveal more information. Revealing information might
be able to get you enough additional legit ads that the level of risk
and annoyance you experience moves down noticeably.
So—even in an idealized consent-based future technical and regulatory
environment, where users can’t be easily deceived into giving up more
information than they prefer to—some rational high-status users might
choose to trade away some personal information in order to attract more
legit ads and fewer scams. Facebook doesn’t have to do anything drastic
like offering reduced ad load in exchange for allowing better-matched
ads, they can just let you buy your way out of some scams with
data.
[P]eople have a collective interest against the unjust social
processes data flows may materialize, against being drafted into the
project of one another’s oppression as a condition of digital life, and
against being put into data relations that constitute instances of
domination and oppression for themselves or others on the basis of group
membership.
This ad system might be a good example of that kind of project. A
Facebook user who chooses to avoid scams by providing data on their
membership in a high-status group is diverting the scams that they would
have gotten onto other people, both members of low-status groups and
members of high-status groups who share less data.
The question of scam load is separate from the competition questions
around total ad load. In a hypothetical competitive market for social
networking services, companies could compete on ad load, but with
network effects and winner-take-all dynamics, a monopoly network can run
at a higher ad load than, say, a single ad-supported service
participating in a federated system of intercommunicating social sites.
Conclusion
There are some lessons here for the rest of us. When designing new
post-cookie ad systems for the web in general, it will be more and more
important to avoid the kinds of design decisions that Facebook has made.
Facebook is highly profitable running deceptive ads today, but as a
single company they can unilaterally change their system relatively
quickly. All the items above would be small code or policy changes
whenever they decide to cut down on scams. For the open web, fixes that
need to involve code and business agreements from more companies would
be harder.
The One Simple Trick to Measuring Abuse in Tech's $440 Billion Ads
Business | TechPolicy.Press: While ad libraries let researchers and
users find specific ads or advertisers, there is no way to get a sense
of the overall ecosystem of potentially abusive advertising. The best
way to change that would be to require these companies to make a daily
random sample of their ads available to researchers and nonprofits. (A
good first step)
Free riding in today’s web advertising is pretty basic.
Use an ad-supported site while blocking the ads.
Opportunities for free riding in the future are a lot more
interesting.
Don’t store “interest-based” ads that sites ask you to keep for
possible future use.
Use a modified version of a browser-hosted ad auction that skips
time-consuming processing steps.
Don’t store delayed click or conversion tracking data, just drop
it.
Turn off the browser’s cache partitioning feature to save
bandwidth and storage, knowing that companies won’t invest in
tracking the small fraction of users who figure out how to do
this.
Turn off frequency capping to get more impressions of a
big-budget ad instead of the first few impressions of a lower-quality
(possibly deceptive or gross) ad.
Some of the proposals for post-cookie web advertising create new
opportunities for free riding, either by individuals tweaking their
settings or by developers who rebuild the browser with stubbed-out
versions of ad placement and reporting features.
Privacy features in post-cookie web ads are going to make a lot of
free riding hard to detect while a visitor is on a site. There’s no
immediate difference perceptible to the site between a browser that is
actually running in-browser ad auctions and delayed reporting, and a
browser that is skipping all that stuff.
Will free riding browsers be an obscure nerd niche, or will these
practices become common enough to provoke widely adopted
countermeasures? Software, music, and movies have DRM systems. Online
games have anti-cheating systems. Educational testing has intrusive
proctoring systems. Will future web ad systems require some kind of
locked-down browsing environment to keep people using the post-cookie ad
placement and reporting system as intended?
(This is a cleaned up and edited transcript of my talk from
WebInnovationX, with some
links put in.)
The IT business has long been in a kind of cycle
of centralization and decentralization. After the 8-bit
microcomputer days, the IBM PC led into the Microsoft Windows era, then
the explosion of companies in the dot-com boom, and the cycle keeps
turning. Today, the death of the third party cookie (I promised a link
to a cookie recipe, so here's one that works reliably for me: Very
Peanut Butter Cookies) is often seen as a problem of further
centralization. If third party cookies go away and we don't have
the ability for multiple players on the web to see data from each other,
then that leaves
a few big companies running everything. The Lumascape would look
better with just a few logos on it, and some consolidation is
inevitable. But there's a big difference between some overdue consolidation
in the web advertising business and this scary move toward extreme
centralization.
Today there are a few extremely large companies that are capable of
participating in the web standards process in a large, ostentatious
fashion. They have the ability to produce complete implementations of
complex proposals, demonstrating their power to drive consolidation of
the web business. Big companies have open-plan offices full of
developers for the same reason that King Henry VIII ordered rows of
ornamental yew trees planted at Hampton Court Palace: to show that they
have enough wealth not to need all that land and labor for food, and to
remind visitors of the power
of the English longbow. Some consolidation is probably a good thing,
but for large companies the process of eliminating the third-party
cookie is more about commodifying
complementary goods, a strategy that Joel Spolsky pointed out back in
2002. In the days of Linux first catching on as a web server
platform, the new OS was used for commodifying
the servers. New companies didn’t need big Digital or Sun servers
like the first generation of big web properties, they could just use
stacks of generic PCs.
This innovation led to the giant companies of today, systematically
commodifying everything that they touch. A strategy that worked so
well for commodifying the hardware business is now being applied to everything,
including content and labor. A lot of attention is paid to big data
as a buzzword, or artificial intelligence as a buzzword, but all of
these terms are encapsulating a common phenomenon: pushing all the value
in the system to a centralized reputation graph, a data structure that
allows one company to evaluate which other participants in the system
are better or worse for specific purposes. This is a common pattern
across the gig economy, across large social sites, and of course the
content industry.
Here is a question that we got before this talk. If
personal data becomes available mostly through a few giant companies,
and those companies are the only people who have access to these large
reputation graphs, then won’t all advertising budgets just move to these
large companies?
The answer is that yes, content is participating in a race to the
bottom, but brands are also subject to commodification. If
commodification continues, then brand marketing budget decisions won’t
matter anyway. Here’s a simple example.
This is a large company’s user dashboard for checking on who is using
my personal information. “BM 00704” is directly competing with
established brands to sell me branded goods and services, and of course
I never gave them my personal info. The big company, in this case
Facebook, cooperated in whatever tricks got pulled to get my information
from somewhere. In most direct marketing media, like direct mail, a
vendor can “seed” the mailing list with records that will get back to
them to let them know if the list gets copied without permission.
Facebook Custom Audiences give the list owner no way to detect seed
records, which makes Facebook an easy way to use a stolen customer list
without detection. Brands can’t expect any protection from the
commodification effect that publishers are seeing. There’s no reason to
expect that first, servers will be commodified, then they'll come for
the publishers, and somehow they’re just going to decide to stop before
they get to, say, Oreo cookies. If you want to tell the commoditization
dystopia story, don’t stop at publishers. Tell the whole story,
including the
brand part.
On the internet we love our dystopia stories. We’ve had the
cryptography key escrow dystopia story, the digital rights management
dystopia story, and today of course we’re having the surveillance
marketing consolidation dystopia story. The crypto issue and the DRM
issue are both still problems on the Internet, but they haven’t led to
the end of civilization as we know it because there’s been a set of
factors pushing back. Same goes for surveillance marketing. Here, a lot
of the anti-dystopia narrative comes from privacy law and tools. As
you’ve probably heard, in the California election that just happened,
the California Privacy Rights and Enforcement Act, or Proposition 24,
passed with 56% of the vote. That 56% turns out to be way below what
CPRA had originally polled at before the election, where it came in at
88
or 72
percent.
At first, it looks like the surveillance marketing business managed
to make their case to some California voters. But the main point against
proposition 24 in the information
that actually went out to voters is that it doesn’t do enough:
Vote NO on Proposition 24 because it was written behind closed doors
with input from giant tech corporations that collect and misuse our
personal information—while the measure’s sponsor rejected almost every
suggestion from 11 privacy and consumer rights groups….The real winners
with Proposition 24 are the biggest social media platforms, giant tech
companies and credit reporting corporations who get more freedom to
invade the privacy of workers and consumers, and to continue sharing
your credit data. Here’s what they won’t tell you about the 52 pages of
fine print: Proposition 24 asks you to approve an Internet “pay for
privacy” scheme. Those who don’t pay more could get inferior service—bad
connections, slower downloads and more pop up ads. It’s an electronic
version of freeway express lanes for the wealthy and traffic jams for
everyone else.
If you look at that 56 percent from the point of view of one of the
big surveillance marketing companies, then yes, only 56 percent voted to
have you walk the plank, but a substantial fraction of the other 44
percent voted for having you keelhauled first. For another reality check on
how far away from people’s norms the direct marketing business has
managed to get, take a look at this post from the infamous Unethical Life Pro
Tips board on Reddit, where monetizing a list of PII was too
unethical even for people who choose to moderate a forum about unethical
activity.
California already has one privacy law, the California Consumer
Privacy Act (CCPA). The Interactive Advertising Bureau recently
surveyed corporate privacy lawyers and they found that only about
1-5% of people who are given the opportunity to exercise their privacy
rights under the CCPA actually do so. And 60% of the lawyers surveyed
said that their companies just go ahead and make CCPA rights available
to everyone. You do have to put an asterisk on that number, because
that’s 60% of businesses that are big enough and engaged enough to send
a privacy lawyer to meet with the IAB, so the actual percentage is
probably lower, but it’s clear that CCPA is having an influence on
privacy features that are being made available even outside of
California.
I am one of that 1-5% and I've been sending out CCPA opt-outs since
January. The process is still really kind of a pain.
The percentage of people who vote for privacy stuff is high, but the
percentage of people who actually take the time to do the privacy
activities enabled by the law is a lot lower. The only thing less
popular than creepy nerds is making yourself do creepy nerd stuff. The
decentralizing effect comes not only from California voters continuing
to pass privacy initiatives until this stuff stops, but from the laws
increasingly making it possible for organizations to take some of these
actions on behalf of consumers.
I worked on an Authorized
Agent study at Consumer Reports, and it had a response rate that any
direct marketer would envy. We had a whole process for recruiting emails
and multiple lists and following up, and we didn’t have to use any of it
because the first email to the first list filled up the entire study
group. It’s incredibly popular with consumers to be able to say, let
me push one button and make this stuff stop. Watch for more info on
results of the study.
Authorized Agent services are not some kind of silver bullet for
putting end users completely in charge of their personal data. There’s a
trend among activists to say that we’re headed for some kind of privacy
or data control utopia, in which people are going to have a high level
of control, and that’s not realistic. The data isn’t worth enough and
people don’t care enough. But privacy services are a counterweight to
centralization trends. As a company gets big enough and high enough in
impact to drive meaningful centralization, it’s also going to be a
high-value target for privacy laws and protection services to balance
that out.
This is good news for advertising. Somehow the print publishing
business managed to come up with a high margin, repeatable, sustainable
advertising model for newspapers and magazines when they were breathing
the fumes from molten lead all day, and we have not been able to get
anywhere near that on the Internet. So we have a huge opportunity now to
redesign the market for web advertising in such a way that it is
acceptable to the audience, not forcing the kind of consolidation
that tends to get pushed back, while at the same time producing the kind
of reputation effects in the audience’s head that make advertising worth
buying. If the entire reputation graph lives inside a big company then
there can be no brand equity, there can only be bits within someone’s
centralized score, and that’s missing out on a lot of economic value. So
I’m highly encouraged to be participating in the Improving Web Advertising
Business Group and I’m really looking forward to seeing what kind of
models we can come up with. Thank you very much.
Do you have any comments about the pros and cons of closed
and open ecosystems? Some of the questions claim that a closed ecosystem
is faster and allows for better measurements than the smaller
publishers.
Unfortunately in this election cycle we’ve seen a lot of the
limitations of centralization. From the point of view of a large tech
company there’s constant pressure to lower the costs of functions such
as moderation and ad review, and so you end up with moderators and ad
reviewers who are doing these very stressful jobs that expose them to a
lot of, say, terrorist material or child abuse material. And they’re up
against dedicated misinformation operations, so when people rely on
those centralized information sources then they're getting a
source of information that’s been weaponized by highly motivated bad
actors, whether they’re financial scammers or political extremists or
both. When independent publishers are involved, there’s a media ethics
point of view or a labor of love point of view on their content that
tends to make it higher reputation and more reliable in a way that a big
company whose algorithm is constantly being tested and gamed really
can’t do.
How do we balance the need for privacy with the need for
advertisers to know that a user is legitimate?
People have a trust relationship with their web publishers and so
there is very personal information that people will share with a trusted
content brand that they won’t share with some big bad Internet company
in general. When I was editor of Linux Journal, some of the most
rewarding letters to the editor that we ever received were from people who
had been reading Linux Journal in prison. They didn’t
generally have access to an actual Linux box but then when they got out
they were able to get some kind of an IT job. People share personal
information with their trusted publishers, writers, and editors in a way
that a big company can never get.
The way to use that information appropriately, in a way that you can
say to an advertiser that yes, I have this highly engaged audience, is a
very fruitful field. I would look at the Trust Tokens sessions at W3C
when those come up. Real people do a lot of real people things, and real
people are good at recognizing other real people. Just to give an
example, the 90210 ZIP code in the USA has more Facebook users than
people. Publisher networks of trust can show if, yes, this is a real
human reader. And they’re much more reliable if it’s a publisher that
has a subscriber or someone who's been interacting with them for a
while, than if it’s just some big company saying, trust us, look at all
these people we have watching videos, you can pivot to video now.
The proposed Federated
Learning of Cohorts (FLoC) would replace the old-fashioned
third-party cookie with a shorter identifier, calculated by a complex
algorithm from your browsing activity. The cohort ID doesn’t correspond
to you individually but identifies you as part of a “cohort”. Any site
can call the JavaScript function document.interestCohort();
to find out which cohort you’re in.
Cohort-based ads sound great, if you’re in the cohort that gets ads
featuring attractive Europeans driving new European cars on winding
roads through the European woods. Or the cohort that gets ships cruising
to scenic vacation destinations. But not all cohorts are going to get
the good ads.
It’s even worse when a cohort ID might leak a sensitive piece of
information about you. There is no way to test FLoC with all the legacy
sites on the web that might leak some kind of sensitive info. What if a
user’s pattern of play in a casual web game can leak something about
their disability to the FLoC algorithm, and they stop getting certain
job ads? Proving that FLoC protects user privacy is an unsolved problem,
and might be mathematically impossible. So we have to assume that a
cohort ID leaks bits of sensitive personal info until it can be shown
that it doesn’t. And, of course, from the web publisher point of view,
FLoC leakage is a business issue. The FLoC algorithm could “learn” the
subscriber lists of niche publications that depend on ad revenue. Since
any site can call interestCohort, a site like
cheapAssCatGifs.com might be able to sell the audience of a site like
expensiveCarTeardownReviews.com, just based on cohort ID.
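A sketch of why this is a business problem and not just a privacy problem, with made-up cohort numbers: any site that can see both its own subscriber list and visitors' cohort IDs can compute which cohorts are over-represented among subscribers, and any other site can then target those cohorts.

from collections import Counter

# Hypothetical logs: cohort IDs seen for a niche site's subscribers,
# and for its visitors overall.
subscriber_cohorts = [1402, 1402, 1402, 7731, 1402, 7731]
visitor_cohorts = [1402, 9011, 7731, 5500, 9011, 1402, 3120, 7731]

subs, everyone = Counter(subscriber_cohorts), Counter(visitor_cohorts)
for cohort, n in subs.most_common():
    lift = ((n / len(subscriber_cohorts)) /
            (everyone[cohort] / len(visitor_cohorts)))
    print(f"cohort {cohort}: {lift:.1f}x over-represented among subscribers")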
The FLoC-powered awards show
So far, not so good. But offline, people actually buy posters and
books of award-winning ads, so there must be some demand for the good
ads. If only there were some way to get more of the good ads right in
the browser.
That’s where FLoC can really help.
Step one: Identify the good FLoC-based ads that
appear on a set of sites, along with the cohort ID of the cohort that
got them. For a first pass, pick out the ads that carry the most revenue
for their weight. In general, the ad campaigns that are willing to pay
more per impression are also the ones that have a budget for good
creative work. At this point we have a first pass at a set of possibly
good ads and can pick some good ones manually.
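A rough sketch of that first pass, with a made-up record format; the CPM and creative-size fields stand in for whatever a logging setup could actually record.

# Hypothetical records: (ad_id, cohort_id, cpm_usd, creative_bytes)
ads = [
    ("ad-001", 1402, 18.00, 250_000),
    ("ad-002", 1402,  0.40, 900_000),
    ("ad-003", 7731, 22.00, 180_000),
]

def revenue_per_weight(ad):
    _, _, cpm, creative_bytes = ad
    return cpm / creative_bytes

# Best-funded creative first: candidates for the manual "good ads" pass.
shortlist = sorted(ads, key=revenue_per_weight, reverse=True)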
Step two: Keep track of which cohorts got the best
ads, and share the highest ranking cohort with browsers that want to
give their users the best experience.
Step three: The browser always responds to
document.interestCohort(); with the winning cohort ID, for
all users. No leakage of possibly sensitive info, the browser developers
don’t have to code and test a bunch of hard cohort math, and everybody
gets the good ads. It’s like an ad awards show in the browser. Could be
updated every browser release.
Something to keep in mind for when they open up street fairs again.
Could also run this as a local public library event or trade show
booth.
Needed:
computer(s)
laser printer (office type with low cost/page)
one box per company
generator (if at a street fair)
How it works.
Setup
Hook up the hardware.
Label the boxes with the addresses of the companies to be opted
out of.
For each user:
User goes up to the computer, opens a simple web application, and
types in their info.
User can preview a sample CCPA letter on screen.
User approves their contact info as it appears on the sample
letter.
Application generates a multi-page PDF, one company opt-out per
page, and prints.
User signs each page and hands it back to a volunteer.
Volunteer puts each page in the appropriate company’s
box.
At the end of the event:
A volunteer seals up each box of letters.
A volunteer takes each box to the appropriate company (if local)
or ships it.
This should be able to do CCPA letters at under a dime each. Under
one cent for the paper, three or four cents in printing costs, about
three cents per page for shipping if a large number are packed and
shipped together.
This could also be set up as a kiosk with a drop box for step 5 and
just come around every so often to empty it and ship out a batch.
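A minimal sketch of the PDF-generation step, assuming the reportlab package; the letter text, names, and layout here are placeholders, not a legal template.

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def print_opt_outs(user, companies, outfile="opt_outs.pdf"):
    c = canvas.Canvas(outfile, pagesize=letter)
    for company in companies:        # one opt-out page per company box
        c.drawString(72, 720, f"To: {company}")
        c.drawString(72, 696, "Re: CCPA Do Not Sell My Personal Information")
        c.drawString(72, 672, f"From: {user['name']}, {user['address']}")
        c.drawString(72, 624, "Signature: ______________________")
        c.showPage()                 # finish this company's page
    c.save()

print_opt_outs({"name": "A. Consumer", "address": "123 Main St, Anytown, CA"},
               ["Data Broker One", "Data Broker Two"])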
Here’s another question I get a lot. Why don’t you CCPA
Facebook?
Here are some reasons, and there are probably others. (This post
contains facebook.com links, so you may want to check your browser
privacy settings before clicking.)
CCPA Do Not Sell requests, also known as opt outs, only affect
“sale” of personal information. In general, the flow of personal info to
Facebook is one way, inbound. Facebook either (a) doesn’t sell or share
my personal info, or (b) they do but stop
when it makes the news.
A CCPA Right to Know probably wouldn’t get me any more
information than is already available using Facebook’s Download Your
Information feature, at least to start.
Facebook trackers on non-Facebook web sites can be blocked with
Facebook
Container and other browser privacy features. This activity is
important to stop, but it’s already handled.
When the user-enabled global privacy controls mentioned in
the CCPA regulations are ready, I’ll be sending a CCPA opt out to every
site I visit unless I specifically say otherwise. So I don’t need to
worry about Facebook trackers on random sites any more than I need to
worry about any random LUMAscape
company.
Facebook Custom Audiences are like those sampling wells on
hazardous waste sites that catch anything nasty that leaks in the area.
Crooked politicians, fake brands, Amazon review manipulation schemes,
data brokers ripping off their clients, the whole skeevy side of
marketing is right there in the Facebook ad settings. If I sent a CCPA
Right to Delete to Facebook, it would be like deleting my account
manually—I would lose a way to see what’s happening with my other CCPA
activity and to learn about other privacy risks.
I do use Facebook Custom Audiences to find companies that are sending
my personal info to Facebook. In the case of obvious scams or marketing firms taking
advantage of clients, I just take a screenshot. If a real company is
adding me to a Custom Audience, I CCPA the company, where it will do
some good.
(update 8 Oct 2020: add material on GPC, copy edit)
The later you catch a software bug, the more expensive it is to fix.
Catching a syntax error while you’re typing code costs practically
nothing, fixing a broken test is more expensive, and deploying an update
to users can cost even more.
If a business collects personal information from consumers online,
the business shall treat user-enabled global privacy controls, such as a
browser plug-in or privacy setting, device setting, or other mechanism,
that communicate or signal the consumer’s choice to opt-out of the sale
of their personal information as a valid request submitted pursuant to
Civil Code section 1798.120 for that browser or device, or, if known,
for the consumer.
A standard signal to implement this, called Global Privacy Control, has
just been announced, and is being tested across a variety of browsers,
extensions, and sites. It is technically similar to the old Do Not Track, with
the big difference that it's now legally required (updated for 2021).
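GPC rides along on ordinary requests as an HTTP header, Sec-GPC: 1, plus a JavaScript property. Here is a minimal server-side sketch in Flask; the do-not-sell bookkeeping is a placeholder.

from flask import Flask, request

app = Flask(__name__)

def record_do_not_sell(req):
    pass  # placeholder: set the do-not-sell bit for this user or session

@app.route("/")
def index():
    # A visitor with GPC turned on sends "Sec-GPC: 1" on every request.
    if request.headers.get("Sec-GPC") == "1":
        record_do_not_sell(request)
    return "content"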
Authorized agent Do Not Sell: This is the first
escalation for a mishandled browser Do Not Sell, and the lowest level
for a company that the consumer does not have a direct HTTP connection
to. Agent opt outs can be bundled and made easy to handle, and the agent
has an incentive to cooperate with the company, to cut their own costs
and increase user satisfaction. Bulk handling of agent opt outs is an
easy win for DSAR vendors, to lower average cost per CCPA
transaction.
Do Not Sell My Personal Information emails and
clicks. These are similar to a GDPR Article 21 objection, but
can’t be handled with the
same processes used for GDPR. Lightweight because no user identity
verification is required (although the company can do an anti-fraud
check) but still heavier than handling an agent opt out. If you get an
opt out, it's cheaper to act on it than to push back and make the
consumer escalate to a Right to Know or Right to Delete.
Right to Know and Right to Delete: If a Do Not Sell
gets subjected to illegal verification
steps or other dark
patterns, then the consumer can escalate to a Right to Know,
followed by a Right to Delete. According to vendors of Data Subject
Access Rights software, manual
handling of a Right to Know can cost $1,400-$10,000. Software and
processes are going to bring this down, but realistically nowhere near
the cost of dealing with the browser signal or the agent opt out
correctly in the first place.
Companies that try to apply the same user experience to a Right to
Know as to a more common and less expensive opt out are likely to have
to deal with a higher volume of Right to Know requests.
Somewhere along this escalation path, users can make automated or
manual reports to the office of the Attorney General, to help them pick
targets for enforcement. They certainly don’t have the staff time to go
after most CCPA violations, but reports from consumers and consumer
organizations will help them pick some high-priority targets.
Some experts are recommending relying on dark patterns to limit the
number of CCPA requests that companies have to deal with. (I still
don't like the term "requests" here, since the company has to comply
with them, and a legally binding communication from a company to a
customer is never called a request. But "request" is in the
regulations.) We're going to discover, though, that the dark
patterns approach is flawed. Yes, a lot of consumers are going to give
up and go away when they hit a dark pattern, such as an extra
verification not allowed by the law, but a fraction of the consumers are
going to escalate. A company that chooses dark patterns instead of
straightforward compliance is making a high-stakes bet on what fraction
of consumers will escalate.
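To make the bet concrete, a back-of-the-envelope sketch using the $1,400 low end of the Right to Know cost figures above; the per-opt-out handling cost is an assumption.

# Assumed cost to just honor an opt out (bulk agent opt outs can be less):
cost_opt_out = 10.00
# Low end of the vendor estimate for manually handling a Right to Know:
cost_right_to_know = 1_400.00

# Dark patterns save cost_opt_out per deflected user, but every user
# who escalates costs cost_right_to_know instead.
break_even = cost_opt_out / cost_right_to_know
print(f"dark patterns lose money once {break_even:.1%} of users escalate")
# -> dark patterns lose money once 0.7% of users escalate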
Web advertising is rapidly fixing itself. There is still quite a bit
of brokenness, but it’s not fundamentally as bad as some proponents of
alternate business models make it out to be. Free, ad-supported,
web content was never really the problem. In print, the subscribers
basically covered the printing and postage, and the advertisers paid for
the content. More pages of advertising, more budget for articles we
could assign. When publishers went to the web, they didn't originally
choose free, advertising-funded business models because they were
hippy-dippy Internet freedom types—they made a fairly sound business
decision to let the readers off the hook for the high distribution costs
they were no longer incurring. Yes, in hindsight it would have been
better to have an incredibly cheap subscription, or even one that came
out to a negative price after coupons, in order to get a hard-to-fake
signal of reader engagement, but basically, the decision to go free was
pretty reasonable.
Unfortunately publishers ignored the big problem, which was that
browsers, until recently, facilitated cross-site tracking. This put
sites into a race to the bottom on ad pricing. Any advertiser could
track an expensive site’s audience to the next cheapest site, and so on.
And in most of today’s web, that means no market power for a publisher
with a good list of subscribers or at least registered readers. When
good ad campaigns can chase big-budget readers onto cheap sites, that
frees up space on legit sites for crappy ad tricks like
traffic arbitrage and sends money to ‘Unknown
delta’ instead of paying the publisher.
Too many publishers had a casual attitude toward reader data, and
just let the browsers take it and leak it to whoever. If publishers were
more like tech firms, they would be all like, “what, someone
else is making money from ads served to OUR readers? Those eyeballs are
OUR PROPERTY! Muster all the lawyers and lobbyists we can find to put
them out of business!” If publishers got a fraction as mad
about having their audience data taken as Hollywood people get mad about
having their movies pirated, or hardware people get mad about generic
cartridges that fit their printers, we wouldn’t be in this mess.
Enough review.
That’s all changing. Ever notice that more and more sites are making
you give your email address, or log in with SSO,
to keep reading? Browser privacy changes are rapidly increasing the
market power of sites where users are willing to sign in. (There is no
reg wall cartel.)
The market price of an eyeball has fallen into different sets of
price brackets, with the dividing lines between the brackets changing as
the browsers and regulations change.
Pre-2017: subscriber > trackable reader > ad
blocker user (The ad blocker user is still worth something if you choose
to pay into the Acceptable Ads racket or do some other reinsertion
tricks.)
2017 (Apple Safari ITP): subscriber > registered
reader > targetable reader > unregistered ITP user > ad blocker
user
2018 (GDPR): subscriber > registered reader >
consent-giving reader > no-consent reader or unregistered ITP user
> ad blocker user
2021 (death of the 3rd-party cookie): subscriber
> registered reader with consent > unregistered or no-consent
reader or ad blocker user
If things keep going the way they’re going, a pageview from a
non-registered user is not going to be worth much more than a pageview
from a user who is blocking ads entirely (or, in practice, trying to
block ads but getting some ads reinserted by their ad blocker’s paid
whitelisting scheme). Meaningful consent that's good enough for high-CPM
ads has been getting
harder and harder to get for a while now. And there’s a lot of
attention being paid to complex technical schemes to replace the
third-party cookie, but in order to use them, the site still has to get
people through the consent UX. So by the time a site gets the consent UX
up to the point where it’s passing inspection with the regulators,
publishers might as well be combining the consent wall and the reg
wall.
Getting someone through a consent dialog but not the reg wall just puts
them into a basically worthless state. Sites might find it better to
skip the big up-front consent experience and put all the UX effort into
getting the registration, which is where the money is anyway.
Every time I build an open source project from scratch, I end up
installing a bunch of dependencies. And often it’s tricky to get the
build working the same way on multiple systems. And it’s even harder to
get new people started on a project.
So just containerize it, right? So far I have found that Docker
containers work great for web applications. I get an accurate copy of
the standard production environment inside a container, and I can still
edit files and use my web framework’s auto-reload feature. Here’s an
example, the web.sh
script in the Pinfactory project. All the dependencies get installed
in the project
Dockerfile, and then the source code (including all the HTML
templates, CSS, and graphics) is in a volume shared into the container.
This way I can do a flask
run inside the container, and when I edit a file in the
volume, it Just Works and auto-reloads. Pinfactory is easy to work on in
containers. You can run one script to do all the unit tests in a
container, one script to start up a web server with real data, and
there’s even a tricked-out demo
script that creates a container with multiple users. Containerizing web
applications is a win for small stuff, too. Here’s a simple Dockerfile
for a Jekyll project that I can use to preview a relatively large
Jekyll site locally, without installing any Ruby packages.
Containers for developing and testing web sites locally are great. So
what about containerizing a regular software build?
Building a simple tool to sign files
I run my own mail server and other services. (My blog is on a VPS
with a static site generator.) That means tracking and deploying a bunch
of files that end up in a bunch of different places, on systems running
a variety of Linux distributions.
I want to be able to sign important files, and check signatures, so
I’m looking for a good, lightweight digital signature tool.
Signify has a very nice build that facilitates what I want to do,
driven by a well laid-out Makefile. I can build a statically linked
signify, and the man page, that will work on all my Linux systems of
whatever distribution. Signify is also a good example of a program to
build and install, because it includes an interesting dependency and a
step where the Makefile needs to check a signature of the
dependency.
Driving the build with Earthly
Earthly is a build automation tool for container-based tools. It uses
the Docker daemon to manage containers. I have run it with both
docker.com’s Docker Engine
and with the Docker packages
for Fedora 32: moby-engine and
docker-compose.
Earthly is controlled by an Earthfile, which is like a
Dockerfile, broken out into targets like a Makefile. Each target
produces an entire container image, including all side effects. If
anything in your build leaves stray files behind in /tmp or the user’s
home directory, they will be persisted.
The install is simple—it’s a single binary. The install instructions
on the Earthly site will put it in /usr/local/bin by
default, but there’s nothing else to add or configure besides Earthly
and Docker. More info: Earthfile reference
There is an example Earthfile for a C++ project with CMake
that I’ll use as a starting point.
Planning a Signify build.
My Signify build will have to be a little more complicated than just
installing the packages I need from the package manager, copying the
Signify source code into the container, and then running
make.
In order to make a static build with signify’s bundled copy of
libbsd, I will also need to download and verify a libbsd
release. The signify Makefile already knows how to download libbsd and
build it into a statically linked signify binary. All I have to do is
make BUNDLED_LIBBSD=1 static
But if I do that, every time I do a build, I have to go out on the
network. Behind the scenes, the signify Makefile is running
wget to download first the signature for the libbsd
release…
$(WGET) -cO $@ '$(libbsd_ASC_URL)'
and then the tar file.
$(WGET) -cO $@ '$(libbsd_TAR_URL)'
So I really want to separate the download step from the build step. I
want something like this.
Set up the base system and save a container image.
Download libbsd and save a container image.
Copy my current version of the code into the container, do the
build, save the build artifacts.
Step 3 shouldn’t require any network access, so should be really
fast. As fast as a regular make, anyway.
First try, first FAIL
Here’s my first attempt at step 2. I’ll take advantage of the nice
libbsd-download target in the signify Makefile, and do this
right after I copy the code into the container.
COPY --dir . /code
RUN make BUNDLED_LIBBSD=1 libbsd-download
No, wait, libbsd-download needs to check the signature.
Make that:
COPY --dir . /code
RUN gpg --import /root/keys/libbsd.asc
RUN make BUNDLED_LIBBSD=1 libbsd-download
SAVE IMAGE
When I first tried this, I was not able to make the
libbsd-download target, because GPG tried to leave a socket
behind under .gnupg in the build user’s home directory.
+build | ERROR: (RUN [make BUNDLED_LIBBSD=1 static]) executor failed running [/bin/sh -c /bin/sh -c 'make BUNDLED_LIBBSD=1 static']: buildkit-runc did not terminate successfully: context canceled: context canceled
Error: solve side effects: build error group: solve: failed to solve: rpc error: code = Unknown desc = failed to compute cache key: failed to create hash for /root/.gnupg/S.gpg-agent: archive/tar: sockets not supported
If you found this page by Googling for archive/tar: sockets
not supported, here’s the answer. It’s a known bug in
buildkit, the software build system maintained as part of Moby,
which is the open-source project that forms the basis of Docker. Earthly
has fixed the
problem by updating to the new version of buildkit.
If you’re still seeing this error, you can (1) upgrade your Earthly
and Docker, (2) avoid build steps that run GPG until the final target,
or (3) remove the sockets by adding
RUN rm -f /root/.gnupg/S*
before the SAVE IMAGE.
So my first attempt at getting Signify to build was:
copy the signify sources over
import the key
do a make libbsd-download
remove the GPG sockets because they can’t be saved in the container
image
finally, save the image.
In Earthfile, that looks like this.
COPY --dir . /code
RUN gpg --import /root/keys/libbsd.asc
RUN make BUNDLED_LIBBSD=1 libbsd-download
RUN rm -f /root/.gnupg/S*
SAVE IMAGE
But that’s a sub-optimal solution.
Splitting out download, copy, and build steps
The problem with the above method is that if I change something in
the signify source code, the Earthly build has to go download libbsd
again.
This is slow, and bad style, and it means if you need to make a quick
change to the C source code, the build still goes and gets some
unchanged dependencies.
Ideally you have all your dependencies stored locally, so if there’s
a network outage, or a trade war, or some developer rage-quits and takes
their downloads page down, the build will still go brrrrr. Not that
anything like that would happen in the case of signify, but you never
know. And since Earthly is new enough that early Earthfiles will end up
being copied and changed for generations, like Makefiles, I might as
well figure out a generally good way to do it.
Making it all work.
So here’s the solution I came up with. First, I’ll get the base
system set up. This should be familiar to Docker users. The
root user is going to need a copy of the public key needed
to check libbsd, so we’ll get that too.
# build.earth
FROM debian:stable
# install build dependencies, then clean up system packages
RUN apt-get -y update && \
apt-get -y install build-essential file make gcc git pkg-config wget && \
apt-get -y --purge autoremove && \
apt-get -y clean
# Fetch the public key for the libbsd release. This will be needed in
# the build step.
RUN mkdir -m 700 -p /root/keys /root/.gnupg
RUN wget https://www.hadrons.org/~guillem/guillem-4F3E74F436050C10F5696574B972BF3EA4AE57A3.asc -O /root/keys/libbsd.asc
WORKDIR /code
Now it’s time to get the bundled libbsd. Instead of running the
entire make libbsd-download, we’ll just grab the files. We
can apply the “Don’t Repeat Yourself” principle to the URLs, by having
the Makefile tell us what they are, using the
libbsd-print-urls target.
bundle:
# This target downloads the bundled libbsd. This should only run again
# if the Makefile changes.
RUN mkdir /bundle
COPY Makefile /bundle
# The Makefile includes a "libbsd-print-urls" target that prints the
# URLs of the libbsd files needed to work with this version of signify.
RUN (cd /bundle && make BUNDLED_LIBBSD=1 libbsd-print-urls | xargs wget)
RUN rm /bundle/Makefile
# Now all that is left in /bundle is copies of the files listed by
# libbsd-print-urls.
SAVE IMAGE
At this point, we have a container image with the libbsd code and
signature in /bundle, and the key needed to check it in
/root/keys. Now it’s time to copy in the actual code, and
add the libbsd files.
code:
# Copy everything, then copy the libbsd files in.
FROM +bundle
COPY --dir . /code
RUN cp /bundle/* /code
SAVE IMAGE
The code target will get re-run any time that anything
gets changed. But it’s fast because it’s just local copies.
Hooray, time to build. We’ll do a quick touch on the
libbsd files so that the helpful and full-featured Makefile doesn’t try
to get them again, then make the executable, make the compressed man
page, run the test suite, and save the artifacts.
build:
FROM +code
# The modification date on the libbsd source and signature needs to be
# new enough for the build not to try downloading it again.
RUN find . -maxdepth 1 -name 'libbsd*' -exec touch '{}' ';'
# The build requires a GPG verify, so import the key
RUN gpg --import /root/keys/libbsd.asc
# Make the statically linked binary and the compressed man page.
RUN make BUNDLED_LIBBSD=1 static signify.1.gz
# Run the regression tests. (Even though signify is already built with
# bundled libbsd, we need to use BUNDLED_LIBBSD to keep from checking
# for a system installed copy.)
RUN make BUNDLED_LIBBSD=1 check
# Save the static binary and the man page
SAVE ARTIFACT signify AS LOCAL signify
SAVE ARTIFACT signify.1.gz AS LOCAL signify.1.gz
No need to SAVE IMAGE a container image at this point,
because I just need the two artifacts.
And it’s all done.
Right now Earthly is pretty new, so most of the discussion is
happening on the GitHub
page.
The project is responsive to issues and suggestions—they implemented
my
suggestion to move the cache out of /tmp and into what
I think should be the FHS-compliant
place for it. Watch the Examples on their docs site
for more sample builds.
This is a sponsored post. This article and modified
versions of this article may be copied and redistributed under the same
terms as Earthly.
This article and modified versions of this article may be copied and
redistributed under the same terms as Signify.
Q: CCPA opt-outs do not have to be verified? Does that also apply to
Right to Know and Right to Delete?
A: No. The standards for opt out and for other CCPA requests are
different.
Deletion or Right to Know requests have to be either verified or, in
the case of an agent request, done with a power of attorney.
Opt out (Do Not Sell): no verification (direct or agent).
Right to Know: verification (direct); verification with the consumer or
power of attorney (agent).
Right to Delete: verification (direct); verification with the consumer
or power of attorney (agent).
Q: Don’t companies have to verify requests in order to avoid
risks?
A: Yes, this is why there is a verification (or power of attorney)
requirement for Right to Know and Right to Delete. The customer doesn’t
want to risk having their record sent to the wrong person, and they
don’t want to risk having their loyalty program deleted right when they
were on their 9th paid sandwich and they have a free sandwich
coming.
But an opt out is different. The regulations require no verification
here because the risk is lower. The only consequence to the
consumer of accepting an opt out erroneously is that a “do not sell” bit
in their customer record gets set. Nobody’s info will get compromised or
deleted.
Q: If a company does come up with a good-faith reason to deny an opt
out, can they use the same verification process as for GDPR?
A: No. A slightly different workflow is needed. Let’s review the
regulations again…
A request to opt-out need not be a verifiable consumer request. If a
business, however, has a good-faith, reasonable, and documented belief
that a request to opt-out is fraudulent, the business may deny the
request. The business shall inform the requestor that it will not comply
with the request and shall provide an explanation why it believes the
request is fraudulent.
So you can’t just treat a CCPA opt out that you suspect is fraudulent
as if it was an incoming GDPR Article 21 objection that you can verify.
This is one of two ways that a CCPA opt out is
different from a GDPR Article 21.
For Article 21, a company can verify just based on “reasonable
doubts”. For CCPA opt-out, the company can only deny if they can meet
the higher standard of “good-faith, reasonable, and documented
belief.”
Even if the company does have a belief that meets the CCPA
standard, they can’t just redirect to the verification form that they
use for Article 21. An explanation of why they believe the opt out is
bogus has to come with the denial. And the explanation goes to the
requestor, which is the authorized agent in the case of an agent
request.
A lot of vendors still get this wrong, likely because they are still
repurposing GDPR code for CCPA. This is something you have to check.
Q: When a company receives an authorized agent opt out, when can
they contact the consumer directly?
A: For opt outs that come in from an authorized agent, there are two
situations where the company can go to the consumer for
verification.
If the opt out is missing some of the required paperwork. The
agent is required to provide written permission from the consumer. If
this is missing, the company can deny the opt out.
As for other opt outs, when the company has a “good-faith,
reasonable, and documented belief” that the request is fraudulent. The
company can’t go to the consumer to re-check a legit agent opt out, but
an agent that claimed permission in a consumer’s name would be
fraud.
Either way, the response is different from what the company would do
to verify a GDPR Article 21 objection.
If an agent opt out is missing the required written permission,
you can just drop it on the floor. There’s no requirement to let either
the agent or the consumer correct it.
If a company chooses not to act on an agent opt out, they have to
provide the full “explanation of why it believes the request is
fraudulent.”
Yes, this means that if a company did all the work of making user
stories and code for GDPR Article 21, they won’t be able to re-use them
unmodified for CCPA.
Q: Why did they make CCPA so different, and not just copy GDPR or a
subset of it?
A: That goes back to why CCPA is opt-out-based, and not consent-based
like GDPR. The authors of CCPA anticipated that an opt out system would
be more likely to hold up in court in the USA than a consent-based
system.
So in order to make opt-out workable at all, they had to make the
process reasonably lightweight for the consumer. Under GDPR the consumer
can just choose not to consent, but under CCPA the consumer has to
choose to do something to get the equivalent result. That action that
the consumer takes by choice has to be feasible.
Q: Which service providers are getting it right?
A: Implementing the opt out code path correctly is likely to be a
differentiating advantage for service providers in 2020. A lot of the
first wave of CCPA services just copied GDPR and made best guesses on
the regulations, but now there is an opportunity for services to get the
difference between opt out and other request types correct.
Watch this space.
Q: If verification is not required, why do authorized agents still
put their users through a verification process?
A: Verification for opt out is manageable if you do it once per agent
relationship and not once per logo on the Lumascape.
But still, if an opt out does not have to be a verified consumer
request, why do authorized agents go ahead and verify emails and phone
numbers? Because if we don’t verify, then pranksters will sign up with
fake names, the agent will look silly sending opt-outs from Mickey
Mouse, and all those bogus opt-outs will eventually help companies form
a good-faith reason to believe that opt-outs from this agent are
fraudulent, and start denying them.
Doing verification on the agent side means that by the time the
company sees the opt-out it will be verified to a standard strong enough
that there is no reasonable way that the company could have a good-faith
belief that it’s fraudulent. So they have to handle it in one step
without looping in the consumer.
Related
ENFORCEMENT ADVISORY NO. 2024-01: Applying Data Minimization to
Consumer Requests: The Enforcement Division is observing, however, that
certain businesses are asking consumers to provide excessive and
unnecessary personal information in response to requests that consumers
make under the CCPA. The Enforcement Division reminds businesses to
apply the data minimization principle to each purpose for which they
collect, use, retain, and share consumers’ personal
information—including information that businesses collect when
processing consumers’ CCPA requests.
This is how I check that I didn’t leave something uncommited or
unpushed at home before going to work. Find all the Git repositories,
run a status in each one. Surprisingly useful.
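The one-liner itself isn't shown here, but the idea is simple enough to sketch in Python, assuming everything lives somewhere under the home directory.

import subprocess
from pathlib import Path

# For every Git repository under $HOME, print anything uncommitted or
# unpushed. "--branch" adds an ahead/behind line for the current branch.
for gitdir in Path.home().rglob(".git"):
    repo = gitdir.parent
    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--short", "--branch"],
        capture_output=True, text=True).stdout
    if status.count("\n") > 1 or "ahead" in status:
        print(f"== {repo}\n{status}")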
Using ssh to log in: you give the site your ssh public key when you
make your account, then to start a session you ssh to a service that
gives you a URL containing a session key.
$BROWSER $(ssh login.example.org)
I guess we can rearrange the default web site to-do list, to put the
branded jackets and the user conference ahead of the job server and
full-text search engine. Some good articles on simplifying the back end
by making PostgreSQL do it.
Since CCPA is opt out, Google Analytics is going to be fine unless
you know the site visitor has opted out. Once a site visitor sends you a
CCPA opt out, what do you have to do to square it with Google?
This is where the Google
documentation gets a little confusing. They’re not going to tell you
too much about whether a specific setting will get you to compliance,
probably because that sounds too much like giving legal advice, and
every site is different anyway. A
wise man once said, Go not to the Elves for counsel, for they
will say both no and yes.
Realistically, the Attorney General’s office has limited time to
bring CCPA enforcement cases, and realistically, hardly anybody has time
to read n pages of Google documentation except the
n/(reading speed) developers that Google can afford to hire,
and
nobody has enough C++ developers to keep up with Google’s replacement
technologies for the 3rd-party cookie, either, so all you
really need to do is not be one of the dozen or so creepiest, or most
famous for being creepy, companies out there.
There is similar CCPA compliance stuff for other areas like ads,
which gets a little more gnarly. But unless you have time to do every
privacy compliance thing to white-glove standards, Google Analytics is
probably not worth that much time. (Facebook Custom Audiences, on the
other hand, are a big potential red flag, and anyone who takes the time
to rage-surf Facebook for companies giving their contact info to the
World’s Creepiest PHP Programmer is going to see them. Probably worth
more compliance checking time than it looks like they’re getting. But
that’s another story.) The one exception is if you’re stuffing PII into
analytics events (which you can do, because you
can put all kinds of stuff in analytics) but that’s hopefully rare
enough that nobody reading this blog is doing it. So there’s plenty more
to worry about than Google Analytics. And like I keep saying, all this
stuff should be a win, not a cost center. In the near future, healthy
organizations, fandoms, communities of practice, and audiences will
cooperatively (and with the help of publishers) spew forth CCPA opt-outs
to protect themselves, like Penicillium colonies soaking their
neighborhoods in antibiotics.
Here’s a quick update on CCPA opt out, nerd
edition, which describes how I can send so many CCPA opt-outs so
quickly. As you may recall, I made a simple CCPA opt-out tool,
using…
You can’t just cut and paste a set of existing GDPR compliance tools
and processes (or a subset of what you do for GDPR) and get to CCPA
compliance. (Update 23 March 2025: CPPA
Penalizes Honda for CCPA Compliance Violations by Alan L. Friel,
Kyle R. Fath, Niloufar Massachi of Squire Patton Boggs (US) LLP—detailed
coverage of a case that includes this problem)
One area where CCPA and GDPR are substantially different is identity
verification. (This is something that published articles on CCPA
compliance often get wrong. Check with your lawyer.)
GDPR:
where the controller has reasonable doubts concerning the identity of
the natural person making the request referred to in Articles 15 to 21,
the controller may request the provision of additional information
necessary to confirm the identity of the data subject.
CCPA
regulations: A request to opt-out need not be a verifiable
consumer request. If a business, however, has a good-faith, reasonable,
and documented belief that a request to opt-out is fraudulent, the
business may deny the request. The business shall inform the requestor
that it will not comply with the request and shall provide an
explanation why it believes the request is fraudulent.
If someone sends a GDPR Article 21 objection, the recipient is
allowed to ask them for additional info to verify themselves, and
doesn’t have to explain why. But if someone sends a CCPA opt-out, the
recipient has to act on it unless they have a good-faith, reasonable,
and documented belief that it’s actually fraudulent.
And, on denying an opt-out, the recipient must provide an explanation
of why they believe the request to be fraudulent. This writing
assignment for the recipient is in CCPA but not GDPR.
(This only applies to opt-outs. The recipient can verify identity if
someone asks for right to know and/or right to delete.)
Also, the CCPA opt-out doesn’t have to come directly from the natural
person. It can be from an authorized agent or a browser setting. The
recipient still has to have that good-faith, reasonable, and
documented belief in order to deny it, and they still have the
writing assignment.
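To make the asymmetry concrete, here's a minimal sketch of the
decision logic as I read the regulations. The function and field names
are mine, and this is a reading aid, not legal advice or a compliance
implementation.

def handle_ccpa_request(kind, identity_verified=False, fraud_explanation=None):
    # kind is "opt-out", "know", or "delete". fraud_explanation is the
    # written, documented explanation (if any) of a good-faith,
    # reasonable belief that the request is fraudulent.
    if kind == "opt-out":
        if fraud_explanation:
            # Denying an opt-out means informing the requestor and
            # explaining why the request is believed to be fraudulent.
            return ("deny", fraud_explanation)
        return ("comply", None)
    # Right to Know and Right to Delete: identity may be verified first.
    if not identity_verified:
        return ("ask-for-verification", None)
    return ("comply", None)

The opt-out branch has no "ask for more paperwork" option, and that's
the whole point.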
What the script does is generate a GPG-signed opt-out request that I
can edit and send in Mutt, and the letter contains a link to the Keybase
profile. Remember that the CCPA regulations (Section 999.315(h), if
you're following along at home) say,
“A request to opt-out need not be a verifiable consumer request. If a
business, however, has a good-faith, reasonable, and documented belief
that a request to opt-out is fraudulent, the business may deny the
request. The business shall inform the requestor that it will not comply
with the request and shall provide an explanation why it believes the
request is fraudulent.”
They’re not allowed to deny the first one, opt-out, unless they come
up with a good reason. Their choice is to do the opt-out or to take on a
writing assignment for me for no money. And if they really want to pick
the latter, and write an explanation of their “good-faith, reasonable,
and documented belief” that this opt-out is fraudulent, they have to
learn GPG and Keybase, and no growth-hacking marketer is going to spend
time doing that.
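For anyone who wants to reproduce the core of the script, it can be
tiny. Here's a sketch, assuming GnuPG is installed and letter.txt holds
the opt-out text; the real script also fills in the recipient and the
Keybase profile link.

import subprocess

def sign_opt_out(letter_path="letter.txt"):
    # Clearsign the letter so the recipient can check the signature
    # against the key linked from the Keybase profile.
    result = subprocess.run(
        ["gpg", "--clearsign", "--output", "-", letter_path],
        capture_output=True, text=True, check=True)
    return result.stdout  # paste into Mutt, edit, send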
This script works so far. They just do the opt-out.
Sometimes they’ll even treat the opt-out alone as either a Request to
Know and/or a Request to Delete, because seriously, time is limited and
once you have decided to just give the privacy nerd what they want, you
might as well get all that nerd stuff out of the way at once.
Next steps? Make it work for more people, opting out of more
companies. Right now this is just a simple version of the items needed
for a “real” opt-out:
credibly claim that I am the person identified by data points
that can be found in a marketing database (email, phone, IDFA/AdID,
postal address…)
credibly claim that I am in a jurisdiction where I have a privacy
right
assert that the sender of this letter has the right to act on a
privacy request for me (in this case because it’s me)
To be practical at scale, all of these need good UX, working,
scalable implementation, and an effective legal payload. Some opt-outs
will be on-demand, one company at a time, and others will be larger, in
cases where you want to opt out of every company engaging in a specific
practice and have your authorized agent do it.
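As a data structure, the legal payload is not complicated. A sketch,
with class and field names that are mine, for illustration only:

from dataclasses import dataclass

@dataclass
class OptOutRequest:
    identifiers: dict   # data points found in marketing databases,
                        # e.g. {"email": ..., "phone": ..., "AdID": ...}
    jurisdiction: str   # where the privacy right applies, e.g. "California"
    sender: str         # "self", or the name of an authorized agent
    signature: str      # e.g. a GPG clearsign block backing the claims

The hard parts are the UX and the scale, not the payload.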
But for now, for someone with my mail setup, doing CCPA is pretty
convenient.
CCPA is not just for privacy nerds. If you have a problem with a
company over some other issue (pollution? wage theft? tax avoidance?)
then the CCPA provides a way to register a complaint and make it stick.
Companies do have to pay attention to CCPA requests.
A lot of ways to express alignment with a cause are things to not do,
such as not buying or not crossing a picket line. CCPA requests can be
almost as lightweight, but are unambiguously something to do.
There are three kinds of CCPA requests:
Opt out (Do Not Sell)
Right to Know
Right to Delete
Opt-outs are the easiest. They do not have to be verified. This is a
fast, lightweight way to express that you have a problem with a
company.
“A request to opt-out need not be a verifiable consumer request. If a
business, however, has a good-faith, reasonable, and documented belief
that a request to opt-out is fraudulent, the business may deny the
request. The business shall inform the requestor that it will not comply
with the request and shall provide an explanation why it believes the
request is fraudulent.”
If you just do an opt-out, they can't make you go fill out a bunch
of web forms for ID verification while they shoot at your feet. They
have only two choices.
Act on your opt-out
Send you an explanation of their “good-faith, reasonable, and
documented belief” that your opt-out is fraudulent.
The other two require identity verification, so are time-consuming to
do as an individual. They do more for you, and send a harder-to-ignore
message, but they can make you do identity verification steps.
An opt-out is easy, but what does it really do? Most companies will
say that they don’t “sell” your personal info for money. However, CCPA
has a broader definition of selling, which lawyers
say is confusing, and service
providers offer to help with. The CCPA defines “Sale” as
selling, renting, releasing, disclosing, disseminating, making
available, transferring, or otherwise communicating orally, in writing,
or by electronic or other means, a consumer’s personal information by
the business to another business or a third party for monetary or other
valuable consideration.
This definition covers some common marketing practices. Interestingly
enough, while Facebook is trying to carve out a CCPA exemption for
themselves, they do require their advertisers to honor opt-outs. The Facebook
Custom Audiences Terms say
“You represent and warrant that the Hashed Data does not relate to
data about any individual who has exercised an option that you have,
directly or indirectly, committed to honoring or provided to opt out of
having that data disclosed and used for targeted advertising. To the
extent an individual exercises such an opt-out after you have used data
relating to that individual to create a custom audience, you will remove
that data subject from the custom audience.”
From the privacy nerd point of view, it makes sense to get as many
people as possible opted out everywhere. But from the point of view of
influencing companies on other policy issues, can CCPA campaigns fill a
niche between just Tweeting and more time-consuming actions?
Here’s a CCPA
opt-out email I have been working on. Put in your own cause and
contact info. This is a little long because there has to be some
explanatory material for now. A lot of companies are still learning
CCPA, and they have a bunch of marketing stuff going on that they might
not be aware of. (Maybe last summer’s marketing intern uploaded a
customer list to an account they don’t have the password to any more?)
For now the responses can be all over the place.
If you want to try CCPA-ing some companies to get a sense of how it
works, try this:
Californians, did you know you can now opt-out of companies selling your
personal information online? You have this right under a new CA privacy
law, but we'd like your feedback. Sign up below for our volunteer study,
and help us stand up for your privacy.
Under the GDPR in Europe, people have a similar
but easier to exercise right. You can send a letter like the one here:
Objecting
to Direct Marketing. Under CCPA, they can make you fill out a web
form, but under GDPR the objection can be free-form (look for a “dpo@”
or “privacy@” address in the privacy policy.)
If we want to do anything about the surveillance economy problem, it
helps to try to figure out why it’s so important to so many people to do
surveillance marketing (working definition: any database
marketing where some or all of the data used is derived from
surveillance) instead of other kinds of marketing. One way to look at it is that it's
not about the surveillance. Surveillance is a tool in a more complicated
process. Today, the surveillance marketing business looks like the
Establishment, but it used to be cool (business cool anyway) back when
it was emerging from the open source scene of the late 1990s and early
2000s.
Doc
Searls wrote, Madison Avenue fell asleep, direct response
marketing ate its brain, and it woke up as an alien replica of
itself. Part of that was driven by the realization that the same Commoditize Your Complement
strategy that applies to device
drivers and other
open source software can also apply to every business, everywhere.
It’s more of a commodification economy than a surveillance economy, but
we got started saying surveillance economy so we might as well stick
with it.
The general direction of the “surveillance economy” is to build a
system where reputation graphs gain sustainable value, and goods and
services (whether they’re PC hardware, or software, or songs, or news
stories, or home delivery of stuff) lose value until they become
commodity nodes on someone else’s reputation graph. Information goods
such as software code can and should be public, while reputation graphs
are hidden by legal restrictions, complexity, and scale. And there’s a
reputation graph for everything. Amazon sellers. Uber drivers. News
sites. The same adtech companies that facilitated
content commodification in 2014 are facilitating
content commodification today. Web adfraud is just as necessary for
web adtech as extra drivers cruising around are necessary to Uber.
The first market in information goods to get commoditized was the
slothful, overpriced Unix business, which was already being starved of
investment as CIOs and analysts talked up the inevitability of Windows
NT. When Linux came along, the commodity was an improvement. Naturally,
when the commodification strategy worked in one business, the winners went on to apply
it elsewhere.
Use a stack of janky PC hardware instead of splashing out on a
Digital Unix server: cool hack
Use a stack of miracle cure peddlers instead of splashing out on
a star MD: maybe not so much
YouTube doesn’t show you videos about drinking bleach because they
want you to drink bleach. They show you videos about drinking bleach
because the recommendation algorithm is set up to avoid creating the
kind of powerful stars who tend
to emerge on other media. The bleach-drinking advocacy is just a
side effect of a system designed to steer you away from any content
creator who might develop market power.
Algorithms that artificially boost “long tails” in content (we can
see this happening, but it's hard to quantify from the outside; we
won't see it for real until the discovery phase of some ugly lawsuit
over some tragic event) are an essential part of the
commodification strategy. For a platform company, fraudulent and misinfo
content doesn’t look good in the PR clippings, but it’s manageable if
you can get enough Serious People to write slick PDFs about how it’s an
Industry Wide Problem. Allowing the algorithm to let user preferences
create bankable stars would be a real threat.
For every bad thing on the Internet, there is an “if this goes on”
dystopia story. If the surveillance economy keeps going the way it’s
going, we’ll all end up working for and buying from one big company
built around a really complicated reputation graph that measures who can
catch the tastiest rats in the ruins of civilization. But no dystopia
narrative runs out all the way (yet). Just as the surveillance economy
dystopia narrative mostly ended the IP Maximalism dystopia narrative,
we’re going to get a new one.
So that’s the fun part. Which marketers will manage to de-commodify
themselves? (Maybe think economic signal, using the whole customer
including their own hard-wired monkey brain ability to process
reputation and how brands are a cognitive hack on that, and so on.) One
piece of good news. Section
230 of the Communications Decency Act helps platform companies
enable commodification, but other laws and regulations are giving us
tools to resist it. More on that soon.
Southern California Linux Expo, 2023: Leveling up your open-source
privacy toolset for 2023 Most of us have built up a collection of
privacy tools, including browser extensions, settings, and services. But
privacy threats keep changing, while at the same time new kinds of tools
and services have become available. This talk will help maximize the
value of your privacy toolset for today’s best options, while helping
you plan for the future. (March 12, 2023)
Southern California Linux Expo, 2020: Hacking
the California Consumer Privacy Act for Fun and Profit (and freedom and
privacy) A new California law that will help protect you from scams
and misinformation, by taking control of your personal information. For
legit sites that respect people’s privacy, CCPA helps shift the balance
of power on the web away from the “data brokers” that share people’s
sensitive health and finance information on the Internet, and toward
sites that people trust. (March 5-8, 2020)
Southern California Linux Expo, 2020: Designing
a market to reduce software risk and compensate open source
contributors Developers face the risks of under-incentivization,
leading to lack of sustainability of key projects. Meanwhile, users face
the risks of missing or delayed features, bugs going un-fixed, and
software projects going unmaintained. This project is one of several
recent attempts to improve transparency, incentivization, and
sustainability in the production of software. (March 5-8, 2020)
Trust, Identity and Data Privacy: A Multistakeholder Approach: Report from Mozilla’s test of their
global consent browser plug-in tool Users need help managing their
privacy preferences across multiple web services. Mozilla is testing
“Global Consent Manager” — a browser tool for that purpose. Learn about
the test and the surprising initial data about engagement. (April 24-25,
2019)
Nudgestock
Festival Why exactly does digital advertising make people feel
uncomfortable? What are its downfalls? And how does GDPR present an
opportunity for brands seeking to reach people online? (June 2018)
Mumbrella
MSIX, with Eaon Pritchard Don Marti, open source innovation
strategist at Mozilla, and Eaon Pritchard, head of strategy at UM, will
take the stage to explore the ‘signal’ in digital. With web advertising
grabbing headlines for fraud, privacy and trust issues, Marti and
Pritchard will look at making it possible to use the web as a different
kind of advertising medium, one that works better for carrying an
economic signal.
FOSDEM
2018 In order to make open collaboration more effective, we are
using simple market mechanisms to add incentives to do useful work. This
demonstration shows how users can put financial value directly in the
hands of the people who can fix the software issues that are most
important to you, and how you can discover which issues really matter to
yoru project’s users. (February 3-4, 2018)
MozFest
2018, with Matt Snell We will discuss how consent management on the
web works today, and the relationship between user privacy and reputable
content providers. Web users face a confusing array of data sharing
choices, and click fatigue can lead to poor user experience and possible
inadvertent selection of options that do not match the user’s privacy
norms. Reputable content providers face revenue disruption arising from
changing data privacy regulations. (October 26-28, 2018)
Within two years or so, marketers will have to deal with a new
technical and regulatory climate, where a person’s information is only
in a company’s database if that person wants it to be. If a person
doesn’t want to be tracked and targeted, or hasn’t heard of you, their
information will not be available to you.
This might sound like a big change, but it’s where privacy technology
and regulation are headed. Here in California, we will keep signing and
passing privacy initiatives until marketing practices that we see as
creepy go away. If privacy settings on a web site or app are too
confusing, people will ignore them and fix the privacy settings through
the political process. The harder it is to work through today’s
California Do Not Sell process, the easier it will be for the next
initiative to get passed. And the next after that. The California
privacy initiatives will continue until California voters are happy with
how the least trustworthy companies handle their information.
And just as privacy laws and regulations are popular
in politics, privacy features are popular in products. (If they weren’t,
Apple wouldn’t be all What
happens on your iPhone stays on your iPhone. The billboard would
be iPhone shares your life with brands you love or
something.)
Today, database marketing companies try to make it a pain for people
to exercise their right to have personal information deleted. But that
can only delay what’s coming, and build support for proponents of
stricter privacy laws. And those laws don’t have to be perfect. I don’t
have to get every record mentioning me out of every database everywhere.
I just have to be hard enough to reach with personally targeted ads and
other forms of database marketing that it becomes too costly to do it,
compared to other ways of getting a message in front of me. (Targeted
ads are a form of database marketing. Doc Searls
explained that a while ago. And anonymized ad tracking
identifiers are PII until proven otherwise.)
Today, people have a choice.
accept tracking and targeting
do a bunch of boring privacy nerd stuff to get out of tracking
and targeting
vote to change the game
The third option is the only easy or effective one. The result is
that at some point anyone doing database marketing will have to make a
case to every individual in their database for why the marketer should
have that individual’s information. If you can’t convince me to let you
have my info—not just trick me into fake consent or bore me into giving
up—you’re not going to have it. When you sow dark patterns, you reap
privacy regulations.
Outside of adtech/martech, business will go on pretty much as usual.
I will put up with being in someone’s database to facilitate a
transaction. You can’t ship my order unless you know where I live. I
might even give up my information as part of a sales process or to
receive a newsletter. I’m fine with being on a subscriber list for a
publication, not just in order to receive information and culture, but
also to be part of a community of some kind. In those cases, though, I’m
still aware that I’m making a trade with a known party, and I’m
accepting risks. And I’ll sign a privacy initiative that limits what you
can do with my info once you have it.
Can database marketing sell itself to the people in the
database?
What objections are marketers going to have to overcome to convince
people to consent to the use of their information? How can database
marketing justify itself to the people in the database? Today, I’m not
convinced that database marketing can do that.
Will you use my information for price discrimination or
discrimination against protected groups of which I am a member? For
example, if you know my ZIP code will you quote me a higher price if I’m
far from a competing big box store selling the same item? If you know
something about my family life will you decline to show me ads for
housing or jobs?
I have enough stuff, and if I want to buy more stuff I’ll go
search for it. Why are you tracking me instead of putting your marketing
efforts into content marketing and SEO that will help me when I need to
find your product?
Wait a minute, don’t I want more
relevant ads? Of course not. When an ad is targeted to me, it’s
easier to make it deceptive. I want the ads that your existing customers
and employees see, too. I want the ads that the editors of the trade
publications that cover your industry can see. I want the ad that your
parents see.
Don’t publishers make more money from targeted than non-targeted ads?
Yes, in a system where both kinds of ads are available reaching the same
audience. I’m not interested in being targetable by anyone. If you want
my eyeballs, pay for valuable content.
You might be able to sell me on being listed in a database if you
offer me meaningful discounts. One possible result of the shift away
from database marketing without consent is driving more and more
marketing attention to membership programs like Costco and Amazon Prime.
Instead of surveillance marketing to individuals, you will have to get
inside a wall by dealing with one big company.
Every day, users provide media companies with the type of explicit,
accurate, data other businesses covet. Consumers willingly volunteer
this data as they register or subscribe in order to unlock more content
or build a deeper relationship with their site of choice. And it’s
exactly this type of information that, if used right, can push those
companies ahead in this new data landscape.
LiveRamp’s Authenticated Traffic Solution (ATS) is creating more
chances to match online users based on email. Publishers using ATS ask
readers to submit their email for free access to content. That email
address can potentially be matched to the IdentityLink profile, creating
a chance for one-to-one targeting without having cookies, and with clear
consent because the email was given to the site. Howe said LiveRamp has
12 SSPs and 30 DSPs buying on ATS or committed to its adoption.
One possible future for post-cookie web advertising is going to work
something like this: if you’re signed in to a site, you’re going to get
something pretty close to adtech as usual, except limited to the group
of sites where you’re willing to sign in. So if two publishers can both
use a registration wall to get your email (or SSO that maps to your
email, which is basically the same thing) then the same ads will
“follow” you across both those sites, and you’ll see ads targeted based
on loyalty programs you opt into.
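Mechanically, the cross-site match key in schemes like this is
usually just a hash of the normalized email address. A sketch of the
general technique (not LiveRamp's actual code):

import hashlib

def email_match_key(email):
    # Two sites that both collect alice@example.com derive the same
    # key, so the same ads can follow that person across both sites.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()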
This means an increase
in market power for publishers, compared to the conventional third-party
cookie, because crappy and fraudulent sites will have a hard time
getting your email or SSO. For advertisers, the game of tag, trying to
get ads in front of specific people, continues, except that the
boundaries for the game are brought in to include only sites that can
get people past the reg wall. LiveRamp CEO Scott Howe explains, in Why
Addressability Will Flourish In A Cookieless World |
AdExchanger.
On sites where you’re not signed in, you’re going to get ads for
miracle fungus cures, predatory finance schemes, and other bottom-feeder
stuff—unless you’re running a browser with built-in targeting/attribution
(the stuff being discussed
at W3C’s advertising business group) and leave it turned on. In that
case your browser will do magic JavaScript tricks to give you
reasonable-looking ads from legit companies—but the site still has an
incentive to get you signed in if possible. (The clickbait of today is a
photo and headline. The clickbait of two years from now will be a photo,
headline, and a killer first two paragraphs to run above the SSO
button.)
Fraud doesn’t go away entirely in this scenario, but the difficulty
of fraud schemes does shift, and tends to make other platforms, such as
streaming
media, into more attractive fraud targets.
Problem: Blog ads don’t pay and web ads don’t carry economic signal.
And if you use a regular ad network you’ll probably get a bunch of
shitty ads on your blog anyway, so not worth it.
Thinking about a solution (work in progress, comments and suggestions
welcome)
How it works (for bloggers)
Put the ad tag on your blog, apply to the ad network stating how
much you want per day for that spot on your blog. (You can attach
analytics and stuff to your application, but a lot of that stuff is
bogus anyway so don’t sweat it.) At this point, a sample placeholder ad
starts showing up on your blog, so the other members can check it
out.
The other members vote. Result can be: you’re in at the rate you
asked for, you’re rejected, you get told to come back with a lower rate
or change your site design to give the ad better placement or
both.
If you’re in, you’re in. Any time a paying ad gets accepted, it
runs on every member’s blog and everyone gets their daily rate.
If you want, you can make a “house ad” for your blog that might
run (members vote) on everyone’s blog, on days when there is no paying
ad.
As long as you are a member, you can vote on ads, new members,
and the house ad of the day for unsold days.
How it works (for advertisers)
You can see a list of members, the site analytics they choose to
share, and an estimated total rate at any time.
You make an ad and upload it, along with the URL of the landing
page you want it to link to.
Members vote up/down. (Your ad must get a 2/3 yes vote, but it’s
a short voting time and a missing vote counts as a yes; there’s a
sketch of this rule after the list. So you’re probably in, unless
you’re that gut doctor begging people to throw out vegetables, in
which case go try a conventional ad network.)
You pick days you want the ad to run, and sign an insertion
order.
Your ad runs. You are charged only for blogs that were members at
the time you signed your IO. Members approved between the time you
signed your insertion order and the time your ad ran are free.
You get an invoice and pay it. (No, you don’t have to pay for
your ad up front. If the bloggers didn’t trust you to pay an invoice
they wouldn’t have trusted you to sell your stuff to their
readers.)
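Here’s the sketch of the voting rule mentioned above, on my reading
that abstentions count toward yes, so an ad fails only when enough
members actively vote no:

import math

def ad_approved(no_votes, members):
    # A missing vote counts as a yes, so the only votes that can
    # sink an ad are the explicit no votes.
    effective_yes = members - no_votes
    return effective_yes >= math.ceil(2 * members / 3)

With 12 members, for example, it takes 5 explicit no votes to reject
an ad.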
The reason it’s one ad everywhere is to try to get some signal out of
it. Readers will be more likely to check out their favorite blogs in the
network, and not block ads there, when an ad is more like a magazine ad
or a trade show booth—a clear signal that the advertiser is investing
money in reaching a community of practice.
(Yes, this is sort of the spawn of The Deck and Project Wonderful,
and both of those failed, but these are the days of countdown to the end
of the third-party cookie, so worth trying again imho.)
As you might have noticed, the main page of this blog is not
paginated. The text goes all the way back to the beginning. Fortunately
I don’t write very fast so I can keep filling it up and it’s not even
especially big for a web page today.
(update 6 Aug 2020: edit for clarity, note at end)
Just popping up for a quick look at the big picture before digging
into the details of advertising markets. Realistically, privacy
regulations and tools are going to have a bunch of positive impacts on
the economy as a whole.
Reallocation of information and communications technology
(ICT) investments and skills. As investment shifts out of
surveillance marketing, available financial and human capital in ICT will
end up moving to positive-sum interactions. Capital can move remarkably
quickly in response to regulation as we saw in the early 2000s. The Digital
Millennium Copyright Act drove investments away from many kinds of
infotainment startups, and into surveillance marketing.
Today, the regulatory pressure is in the other direction. Not all of
the shift will be away from marketing in general. While no-consent
surveillance gets harder, other investments in marketing technology will
show greater returns. For example, a mapping service will be able to add
fuel, restroom, and food stops to planned trips—based just on express
intent and on information supplied by businesses. And contextual
targeting is likely to keep getting better.
Important to understand the extent to which different “Big Tech”
companies are exposed to different aspects of the transition. (Search
and map ad revenue is less at risk than audience-based ads tied to
poorly understood social surveillance practices, where voters will keep
supporting laws and initiatives until the creepy headlines
stop.)
Increased market power for news and cultural
industries, resulting in a news and culture boom. This is not
just because of context-based targeting, but because limitations on
tracking users across contexts will remove some of the so-called
“infinite inventory” that tends to drive down ad rates everywhere. More
ad money for creative businesses means we get to see more high-risk,
potentially high-value news and creative works, such as big
investigative reporting projects.
Lower expected returns to some kinds of crime.
User-targeted ad placements give a persistent structural advantage to
deceptive sellers. Limits on targeting will reduce many opportunities for
fraud against both legit advertisers and against end users. A stronger
news and cultural industry will also be better funded to deliver
trustworthy product reviews.
Rapid deployment of privacy regulations and technology is likely to
result in a privacy dividend for investors, and a potential risk of an
“overheated” economy in general. Much of the boom will hit the news and
creative industries, but there’s probably some good news here for small
businesses relieved from the pressures of fraud and bidding up the price
of access to audiences.
I wrote this in February. Now the overheated economy risk
is…not so much. Although some surveillance marketing advocates are
asking for delays in privacy enforcement, it’s more likely that we will
all do better by speeding up.
Developers would prefer to release software at a high quality level.
Users prefer to use software at a high quality level. However, firms are
incentivized to release software at a lower quality level than would be
chosen by either developers or users.
How do you design a system that lets users quantify and hedge the
risks of low-quality software, while compensating developers to do the
extra work to bring the software up to a higher quality level?
A variety of systems have been proposed, including subscriptions,
bounties, and dominant assurance contracts. A
market for trading software issues describes another possibility:
futures contracts on bugs/tasks.
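The contract mechanics are easy to state in code. A simplified,
all-or-nothing sketch of the settlement rule, using the FIXED and
UNFIXED side names that come up later in these notes (the paper's
actual contract terms may differ):

def settle(issue_closed_by_maturity, pot):
    # Each contract is a bet on whether a given issue is fixed by a
    # maturity date; at maturity the whole pot goes to one side.
    return {"FIXED" if issue_closed_by_maturity else "UNFIXED": pot}

def implied_probability(fixed_price, face_value=1.0):
    # If a FIXED position pays face_value when the bug gets fixed,
    # its trading price reads as a probability estimate of the fix.
    return fixed_price / face_value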
A market (play money only) based on this research is set for a soft
launch the first week of March. For a demo, account, and some free
(play) money, please contact me directly.
Bonus links, economics of quality and incentivization
department
Building
better systems for ad placement and attribution will depend on a
solid chain of consent from web activity to action. No attribution
tracking is going to work if consent is missing or sketchy on any link
in the chain, because browsers, competing to implement people’s
preferences on how their data is used, will drop attribution data on the
floor if it doesn’t have solid “provenance” in the form of good enough
consent.
Of course, most of the stuff covered in a “consent” dialog isn’t
really consent. IMHO there is consent that’s meaningful enough for a
browser or other software to pay attention to, and fake consent where
the best way to implement the user’s intent is either to rewrite the
consent bits, or to block tracking. Consent is hard to define.
Levels of consent, best to worst.
5: philosophical ideal of consent. Philosophers
are still working on this, so no need to implement in software
yet.
4: informed consent that’s good enough to get
you signed up as a human subject for university-run research.
Institutional Review Board approval, so software should respect it, because
Science!
3: consent as part of an understandable
transaction (You have consent to use my address to ship me the
package I ordered). This is kind of like the spawn of consent and
legitimate interest. Software must implement this kind of consent, or
people won’t be able to order stuff or log in or anything, and they will
rage-quit the software that’s stopping them.
2: consent implied as part of a transaction such
as a registration wall with an email address, or SSO with a clearly
labeled button. This level is where the action is. Can user research
show that expectations on both sides are compatible? If so, this is a
win! Opportunity for software to help users by doing this right, and a
big opportunity
for sites that people choose to trust. The Site
Engagement Service in Chromium is likely to be increasingly
important here, along with related metrics for how much a user
probably trusts a site.
1: consent buried in the fine print or in dark UX
patterns is clearly not good enough, and worth the effort for
software to block data transfer even in the presence of “consent.”
Blocking bogus consent, and telling it apart from consent that’s just
good enough not to block, is going to be a user research win, just like
blocking other creepy stuff.
0: consent fraud is common, just more stuff to
filter or block.
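If software were to act on this list directly, it might look like the
sketch below. The level numbers are from the list above; the actions
are my shorthand, not a real browser API.

CONSENT_ACTIONS = {
    5: "respect",  # philosophical ideal: nothing to implement yet
    4: "respect",  # IRB-grade informed consent
    3: "respect",  # consent as part of an understandable transaction
    2: "respect",  # reg walls and labeled SSO, if user research holds up
    1: "block",    # fine print and dark UX patterns
    0: "block",    # consent fraud
}

def handle_data_transfer(consent_level):
    # Default to blocking anything that doesn't match a known level.
    return CONSENT_ACTIONS.get(consent_level, "block")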
Attribution schemes will work as long as everyone who touches
attribution data also has consent, which implies a bigger role for
publishers in the audience data market.
Is there a way to export a non-complex Medium post (one, say, just with
links) that isn't full of cruft like "p id="4f89" class="gw gx ar bj gy
b gz ha hb hc hd he hf hg hh hi hj" data-selectable-paragraph="" ? How
about an "Export as old school html?" option in the ••• menu?
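In the meantime, here’s a sketch of doing it yourself with
BeautifulSoup: keep the tags and the links, throw away the randomized
attribute soup. (Assumes the post is saved as post.html; obviously not
a Medium feature.)

from bs4 import BeautifulSoup

with open("post.html") as f:
    soup = BeautifulSoup(f, "html.parser")

for tag in soup.find_all(True):
    # Keep href so links survive; drop class, id, and data-* cruft.
    tag.attrs = {k: v for k, v in tag.attrs.items() if k == "href"}

print(soup.prettify())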
(update 20 Jan 2020: add link to Bill Fitzgerald’s blog
post.)
(update 18 Jan 2020: add embedded Tweet of fake McDonald’s
ad.)
This is the long answer to the question: why are you tweeting
screenshots of stupid Facebook scams? On Twitter it might look like I’m
just randomly talking shit about Facebook, but I do have a point
here.
Here’s an example of a Facebook Page whose owners uploaded my contact
info without permission. It doesn’t look like that’s really their logo,
either.
Facebook scam, fake “Amazon”
I didn’t set out to look for scam advertisers on Facebook. I did
visit “Settings” → “Ads” there, in order to make sure to send a CCPA
data deletion (see CCPA letter, shell script) to some
well-connected nodes in the surveillance marketing network. Hitting the
cute little “Do Not Sell” button on content sites is a lot of effort for
a little CCPA win, so the best CCPA strategy is to focus on the big
players, such as:
Acxiom
Experian
Epsilon
LiveRamp
Oracle
Those were some of the companies I expected to have my info, and sure
enough I found them in Facebook ad settings. So they’re on my CCPA list.
I don’t expect companies like that to make the CCPA process easy, but I
will do my part for the
California creative boom of 2020.
Anyway, back to Facebook scams. While I was making my CCPA list, I
also saw a bunch of Facebook advertisers like the Amazon one above, and
this fake “Gap Inc.” This one not only uploaded my info, but also got
Facebook to let them match it against Facebook’s existing user data, and
re-sell the result.
Facebook scam, fake “The
Gap”
What’s the big deal, though? They’re just regular scams, and I don’t
buy stuff off of Facebook anyway.
Here’s why I think it matters. Obvious scams are a helpful way to see
how well Facebook is enforcing its own policies on ads. Van Halen didn’t
really dislike brown M&Ms, but their
contract for live shows included a section banning brown M&Ms
backstage. The part about the brown M&M was buried in the middle
of a bunch of technical and safety requirements for the show. If the
band saw a brown M&M, it was a warning to check again for
harder-to-find safety issues.
On Facebook, a lot of the worst problems are the hardest to see. Any
halfway decent state-sponsored political misinformation operation is
going to be effectively invisible to me, and to academic and NGO
researchers, even with Facebook “ad transparency.” The misinfo people
have probably been uploading a bunch of variants on the same ad
creative, to make it impractical to check it all. They have an
inoffensive, generic name, use a US-based Facebook user with a clean
account to be their point of contact, and carefully filter their Custom
Audiences to key purple-state voters. And, as long as they don’t tag
their ads political, and take them down before anyone reports them as
political, the
ads won’t be available afterward in Facebook’s Ad Library. Bill
Fitzgerald explains in a blog post summarizing a recent Twitter Q&A
session.
Can we take Facebook’s word for it that they’re doing anything about
sneaky, invisible state-sponsored misinformation? I doubt it—when
they’re serving a big bowl of brown M&Ms, in the form of obvious
scams, to everyone who looks at their ad settings page. It’s hard to
believe that they take an invisible problem seriously when the visible
problems that would get fixed as a side effect of addressing it are
still there. (And, of course, when deceptive ads serve the company’s own
interests. Facebook is pushing an ambitious cryptocurrency scheme that
depends on approval from US regulators, and the results of the 2020
election will decide who those regulators are.)
Facebook has two sets of rules for deceptive ads: the written rules
that they show to media and the government, and the unwritten rules that
they teach to their scam and misinfo advertisers by example. The
unwritten rules, which encourage deceptive advertising, matter. The
written rules, not so much.
Hypothetically, what would Facebook do if the company’s true intent on
misinformation matched the written ad policies they claim to enforce?
They would deploy a few fairly basic “skin in the game” fixes.
No more credit card payments for advertising, invoices
only, net 90 If an advertiser is so untrustworthy that Facebook
doesn’t even know they’ll pay their invoice, then that advertiser is not
trustworthy enough to put in front of a user.
Rewards for reporting violations. If a user
reports an ad that violates a policy, and the advertiser gets kicked
off, then let the user keep the ad money that came in from that
advertiser. Include the owners of email addresses and phone numbers who
get added to Custom Audiences without consent in the program. (And no,
this does not incentivize users to deliberately post scam ads and report
them, because they would just get their own money back.)
Notify advertisers when their ad ran on content, or near
an ad, that is removed for policy violations. Right now a lot
of important brand safety issues are hidden, because advertisers can’t
see context. Give the legit advertisers the visibility they need to
decide how much brand safety risk to take.
Top management at a large organization cannot micromanage for trust
and quality. But they could, if they wanted to, set up the culture and
incentives to make it important to all decision makers at the
organization. Right now Facebook is set up to encourage and profit from
deceptive advertising while imposing deceptive advertising’s costs and
risks on their users and on society in general.
All right, enough fun with zany scams, back to CCPA-ing any real
company that put me in a Custom Audience.
But marketers are citizens and parents, too. The
struggle over advertising’s alignment isn’t about privacy nerds on one
side and marketers on the other, it’s about forming a connection between
people looking at the advertising problem from both sides. A privacy
developer building a system to help users control the use of their
personal information is working on one subtask of the same project as a
marketer who needs a trustworthy platform to build brand equity.
The problem, since the rise of third-party cookies
in the first dot-com boom, has been that negative-sum marketing
investments are much better at justifying money spent on them than
positive-sum, signaling-based investments. As a marketer, it’s easy to
show numbers to prove the success of a creepy project, while leaving the
long-term
damage to the brand to the next person.
Now for the good news.
Starting tomorrow, the California Consumer Privacy Act will give us
an important tool to shift the balance between positive-sum and
negative-sum advertising, by interrupting the data flows that
allow for the placement and measurement of the bad stuff. The CCPA
is an important tool to help marketers concerned about brand equity, to
redirect ad budgets to support the creative work that we want. January
1, 2020 is CCPA Day, and the beginning of the journalism and culture
boom of the 2020s.
It will take some work, though. CCPA without action by citizens is
just a piece of paper. It only works if people take action to opt out,
have their data deleted, or both. And the right companies to contact for
maximum impact are usually the shadowy data brokers that you might not
think of. Oracle is a database company and a sailboat sponsor, right?
Yes, but they’re also the owner of a collection
of database marketing companies and an important node to disconnect
from the attack path that leads to me. Even though I’m a privacy nerd
from way back, the prospect of opting out and deleting my data from all
the shady companies out there looks like a boring grind, even though I
know I’ll enjoy more ad-supported news and cultural works in the
future.
At least I don’t need to do CCPA stuff to protect myself from
anything already covered by privacy technology. I don’t need to spend
human time dealing with a problem that a machine can block. That handles
a lot of the ankle-biters of the Lumascape. But we still have a bunch of
important nodes on the surveillance economy network to get to.
Boring.
One way to get a good-sized pool of opt-outs and deletions in place
will be to gamify it. I’m using a browser extension to log when an
opt-out or deletion has happened and report it to the back end. From there,
connect it to an inventory or deck-building feature of the game, so the
more opt-outs you do, the better your chances of winning.
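A sketch of the logging endpoint on the back end (Flask, since that’s
what I’m using elsewhere; the route and the event fields here are
hypothetical):

from flask import Flask, request, jsonify

app = Flask(__name__)
events = []  # in-memory for the sketch; the real thing wants a database

@app.route("/api/optout", methods=["POST"])
def log_optout():
    # The extension posts one event per completed opt-out or deletion,
    # and the game layer scores the player on the running total.
    event = request.get_json()
    events.append({"company": event["company"], "kind": event["kind"]})
    return jsonify(score=len(events))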
Despite theoretically adverse conditions, we find these markets are
relatively efficient, and improve upon the forecasts of experts at all
three firms by as much as a 25% reduction in mean squared error.
(This paper covers a related market type, not bug futures. However
some of the material about interactions of market data and corporate
management could also turn out to be relevant to bug futures
markets.)
The next interesting question is: how do futures work in a real
project? Are
bug futures just high-tech piecework? This is my subjective notes on
participating in a small project using futures. (We got in the habit of
using FIXED and UNFIXED, all-caps, for the two sides.) It feels like
we’re on to something, that the market is adding some information
sharing and coordination power that’s not available in the bug tracker
alone. I’m looking forward to using markets in more projects in the
future.
Habits: I did get into the habit of quickly
selecting what to work on based on my FIXED positions. I think I’m more
of a loss avoider than a profit maximizer, and I probably passed up some
chances to buy into something I could have finished faster, and just
take a loss when an issue where I held FIXED ended up being unfixed on
the maturity date. Something to try in the future: I might be more
willing to try to resell my FIXED positions at a loss if the project had
more traders.
Pricing: As a project contributor, I tended to use
price as a signal of my confidence in being able to get something to
work. For an issue with a good description and (imho) a straightforward
fix, I would offer to buy a large quantity of FIXED at a higher price,
which means putting more of my own tokens at risk. This should help
other participants judge the likelihood of completion of particular
tasks by particular dates. The actual prices in the live, small, market
ended up being quite a bit higher than the examples in the paper.
I did end up offering extremely low prices for less well specified
issues. This was, I think, a useful signal for the people requesting the
features. The better specified the issue, the more likely to get a
reasonable offer. I don’t know how this would be different in a project
with more random wishlist bugs. There might be a trading opportunity for
people willing to hold FIXED through the process of clarifying a feature
request.
Maturities: I think a sensible “portfolio” view is
important, and would like to experiment with better ones. As a random
part-time contributor it was important to me never to build up too big
of a workload for a particular maturity date. I did find myself making
offers on an existing issue that didn’t match an UNFIXED offer from a
user, because I wanted a later maturity date. Offering a low price and a
far-off maturity date was the best way to signal that either this issue
is not comprehensible enough for me to fix, or it’s too much for a
single issue and needs to be split up.
I would like to see more live data on whether feature requesters try
to buy UNFIXED positions on contracts with less crowded maturity dates
(dates when fewer other contracts mature) to have a better chance of
getting attention.
Next steps
If you’re interested in trying bug futures for a live project, please
let me know. I already have futures hooked up to one that’s using
Python+Flask for the server, and JavaScript+WebExtensions for the
client. The market can also be hooked up to an existing product. Currently
supports GitHub but more integrations are certainly possible.
Bonus links
Mostly programming, market design and incentivization material this
time.
(This is a copy of an answer that I posted to a marketing
list.)
There’s a strong possibility that 2020 could be the year that two big
trends collide.
Privacy opt-outs and objections, enabled by laws such as CCPA,
which goes into effect on Jan. 1.
Public concerns over brand safety, Sleeping Giants
style—amplified by the contentious online environment around the 2020 US
election, and the anticipated flood of misinformation, extremist, and
harassing content.
The conventional wisdom now around brand safety is that it’s a
manageable concern, and that it’s acceptable to have a certain
percentage of a brand’s ad impressions end up on really heinous material
on “long tail” sites and social media. This is likely to shift. Many
people who are already on edge politically will realize that they can
communicate their concerns in a way that brands are legally required to
act on, by using the tools available to them under new privacy laws and
regulations. Instead of just tweeting a screenshot of a problematic ad
placement, people will be able to rage-CCPA the brand responsible.
The cost of complying with GDPR and CCPA is likely to be a factor in
“flight to quality” media buying decisions.
To be honest, though, I buy stuff the other 365 days of the year
(2020 is scheduled to be a leap year) so I still get and pay attention
to advertising.
In 2020, I’m going to see what I can do to get better ads. It seems
like this is a good year to do it. Let’s review. There are three ways to
get an ad in front of me.
Context: pay for something I want to read
Search: deliver the ads with search results when I
search for something
Personalization: place the ad based on who I am
I’m fine with the first two. Context-based advertising pays for news
and cultural works.
Search advertising pays for services I use, like web search and
mobile maps.
The third one, though, is a problem. The first two are positive-sum
games between me and the advertiser, and have positive externalities.
Personalization is a negative-sum game, and has negative
externalities.
What if I treat every personalized ad I receive as a bug, and report
it?
In 2020, thanks to the California Consumer Privacy Act, I should be
able to opt out of the use of my personal information for ad targeting.
In the long run, when advertisers figure out that I’m impossible to
reach with personalization, they’ll pay more for the news, cultural
works, and services I want, and spend less money storing my information
where it exposes me to risks like fraud and identity theft.
(point of order: I’m just going to say “brand-unsafe context” here
because we already know way too much about all the worst-case
places where a Facebook ad might run. Excuse the marketing speak,
but I’m not getting paid enough for this blog to think any more about
all the redacted on Facebook than I have to. Follow the links
if you really want to read about where your Facebook ad runs.)
Facebook makes
the news in unpleasant ways, but so far this hasn’t had much of an
impact on the company. The decision-makers who matter are the people who
buy the ads, from huge companies all the way to individual small
business owners typing their credit card numbers into Facebook Ads
Manager. And those decision-makers are still buying ads. Revenue
is up.
Facebook does some magic algorithm shit to match your ad to people
likely to click on it, and some of those people click on it and buy
something. And for many advertisers the price of the ads is less than
the expected increase in profit driven by the buyers the ads bring in.
The two possible alternatives for the Facebook advertiser so far
are…
someone sees the ad and clicks it
someone sees the ad and doesn’t click it. Not good, but not
actively bad either.
“We don’t believe ad adjacency matters in certain environments…and we
designed the platform with that in mind,” said Erik Geisler, Facebook’s
director of North American agency partnerships, speaking Thursday at 614
Group’s Brand Safety Summit in New York City.
Basically he’s bragging on being able to take your brand’s ad and run
it in whatever brand-unsafe context the worst possible Facebook
user can come up with. According to Facebook, you know all that stuff
that has moderators
crying in the stairwell and one
service company quitting the moderation business entirely? Don’t
worry about it because people are going to click on whatever ads run
there, like nothing happened.
Even if that was true before (to be honest, I’ve never placed an
online order while crying in a stairwell, don’t know about you), is
brand safety still not going to be an issue in 2020?
The way the game works today is that the Facebook users can click or
not click. But in the near future, things
get a little different. The three options will be:
someone sees the ad and clicks it
someone sees the ad and does nothing
someone sees the ad in a brand-unsafe context and tries to get to
the bottom of it
So here’s my first attempt at a letter to send to the brand
advertiser whose ads you see in a brand-unsafe context. Just like it’s
pointless to report a scam to Facebook (exception to that rule:
reporting a Facebook scam matters if you’re Craig Silverman calling for
a comment on a story; then they take care of it), don’t report
brand-unsafe content to them either. Just screenshot it
and send your CCPA letter to the advertiser.
This letter refers to the European and the new Kenyan law too. IMHO
better to make the advertiser look up which should apply, so we can all
use the same letter.
The story of the alt. hierarchy is an important lesson about the
nearly forgotten art of “adversarial interoperability,” in which new
services can be plugged into existing ones, without permission or
cooperation from the operators of the dominant service.
Today, we’re told that Facebook will dominate forever because
everyone you want to talk to is already there. But that was true of the
backbone cabal’s alt.-free version of Usenet, which controlled
approximately one hundred percent of the socializing on the nascent
Internet. Luckily, the alt. hierarchy was created before Facebook
distorted the Computer Fraud and Abuse Act to try to criminalize terms
of service violations.
Usenet was a good example of a decentralized service that could
support both free speech and commercial publishing.
Usenet supported adversarial interoperability (a25y?) about as well as
you can. Any individual could pick their own client software. Any site
could choose which newsgroups to offer, and which other sites to share
them with. Users could filter messages with killfiles, and even run
their own “cancelbots”
to share filtering decisions out on the network.
Usenet is also a good example of the best “white hat” case against
adversarial interoperability, because Usenet isn’t really much of a
thing any more. Usenet spammers buried the legit users, even though many
tried to
fight back, as creatively as they could. Would better spam filters
on the client side have helped? What if the scoring algorithms used to
rank stories on Slashdot-style boards had made it into Usenet
newsreaders? If Usenet had held its own, things might have gone very
differently. But we lost Usenet—not because of policies and lawyers, but
because spammers killed Usenet while the market incentives for
developers encouraged work on web boards, which were mostly
ad-supported.
I still like adversarial interoperability and agree with Cory that we
need more of it. The best recent example I can think of is ProPublica’s
Facebook Ad Collector extension, which was later blocked
by Facebook. Why was Facebook management so willing to take the bad
publicity that came with hiding which users see which ads? The easy
answer, of course, is that Facebook is eevill, funded
by Russian interests to undermine
US public health and promote
right-wing causes. But when a big company does something, there’s
always a “because they’re just evil lol” reason and a legit-sounding
reason. You have to pick whether they’re the real reason and the cover
story, or the conspiracy theory and the real reason.
If you want the legit-sounding reason to avoid adversarial
interoperabilty, here it is. The two easiest kinds of adversarial
interoperability, by far, are:
Ad blocking: Removing advertising from a
medium.
Spamming: Adding advertising that does not pay
to support the medium in which it appears.
Twitter is a good example here. Twitter
limited third-party apps and dropped RSS
support, to keep third parties from offering either convenient ad
blockers or ad-supported clients. (Or both: before they made the change,
you could make a Twitter client that stripped out Twitter’s ads and put
yours in.) Twitter also bought
and stifled TweetDeck, the popular third-party client.
The easier you make it for people to read your service with their
own choice of client, the more ads they will block.
The easier you make it for people to post to your service with
their own choice of client, the more spam they will send.
In Facebook’s case, the HTML they send to your browser is a
convoluted mass of HTML elements with randomized classes
and ids, to make things
harder for ad blockers—and either as collateral damage or on purpose
depending on who you believe, for ProPublica. (In native apps, ad
blocking is even harder.)
Facebook has to pull horrible, ever-changing, HTML stunts to get
their ads through to the user, because those ads don’t pay their
way, and the people receiving them have no reason to keep them.
Advertising is only sustainable when it’s an exchange of economic
signal for attention. In order to carry the signal needed to earn
attention, an ad has to be:
credible
from an identifiable and significant sender
related to a market in which the reader is a participant
Only the last one might apply to social ads. On Facebook, I have no
incentive to let an ad through, because I have no
reason to trust it. That also applies to “programmatic” ads on the
web. (Hey kids! Scientific literature!) In any ad-supported medium where
blocking is the best choice for a member of the audience, then
adversarial interoperability is not an option, because the obvious
low-hanging fruit of adversarial interoperability is ad blocking.
It seems like we would need ads to start paying their way, and for ad
blocking online to be less of a rational
bargain for the reader, for adversarial interoperability to catch
on.
This is an answer to a question I got about adtech consolidation.
The Wall Street Journal ran a great piece by Keach Hagey and
Vivien Ngo, all about how Google is finally helping to fix the design of
that terrible “Lumascape” slide, by letting them add some whitespace and
make the logos bigger. (Who says Google doesn’t do great things for
graphic design?) Here it is: How
Google Edged Out Rivals and Built the World’s Dominant Ad Machine: A
Visual Guide - WSJ
So, is adtech consolidation bad for publishers?
Not really. Adtech consolidation is overdue. Any normal Internet
business would have consolidated a long time ago. (Ever notice you get
mostly Zoom meeting invitations now, and not links to rival conferencing
systems?)
What matters is the adtech/publisher split.
In a sustainable ad-supported business you have a split of about
85% publisher/15% intermediaries.
The web is 40%/60% with the publisher getting the small
slice.
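In concrete terms, with toy numbers (a $10 CPM, meaning $10 per
thousand impressions):

def publisher_take(advertiser_spend, publisher_share):
    return advertiser_spend * publisher_share

# Sustainable split: $8.50 of every $10.00 reaches the publisher.
# Today's web: $4.00 reaches the publisher and $6.00 goes to adtech.
print(publisher_take(10.00, 0.85), publisher_take(10.00, 0.40))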
It doesn’t matter if the 60% goes to one company, a few companies, or
a Lumascape with thousands of companies. It’s still unsustainable for
the publisher. When publishers look at the diagrams showing multiple
advertisers bidding for an impression on their sites, they’re ignoring
the other side—the same advertisers bidding for the same user eyeballs
on cheaper sites.
Content is commodified, the
publisher lacks market
power to command a decent price for ads delivered to their own
audience, and no tweaking of market share in the adtech business will
change that. Real-time bidding works today because too many people have
the user’s data, and can target them.
Today’s big tech companies come at every problem with the same tool:
take whatever business is adjacent to me, and try to turn it into a
commodity. That works great when your product is an internet service
and the business adjacent to you is an operating system. A copy of Linux
is a commodity, but it’s actually worth more than a copy of
HP-UX or Solaris.
But commodification is not the highest value model for the
advertising business, which depends on feedback between
brand equity and content reputation. Brands are worth more in a
market with high-value content sites, and ad-supported content sites are
worth more in a market with high-value brands. So what matters to the
publisher is not the number of adtech vendors participating in
the market. What matters is the number of saleable ad impressions that
are in direct competition with your ad impressions. (A site trying to
sell an ad on a story that cost $10,000 to report, edit and shoot is
competing against a site running an ad on a recycled racist cartoon or a
pirated Nickelback song that cost them nothing.)
Adtech’s job is simple: to facilitate putting impressions on cheap
content into the same market as impressions on expensive content. It
might matter a little bit how many companies are involved, but that’s
not the main story. The survival of any ad-supported site depends on
market power, and part of reclaiming market power is making it harder or
more expensive to reach your same audience on cheaper sites. That
depends on privacy tools and regulations.
The good news is that CCPA is coming January 1. CCPA is not just a
compliance issue or a cost center. If we handle it right, it’s a way to
reclaim some market power by limiting the number of saleable ad
impressions on low-reputation sites.
I don’t know if it’s Duck
Season or Wabbit Season now, but starting Jan. 1 it’s creepy
database marketing company season here in California.
This doesn’t mean that we can just use the opt-out flow that’s
easiest. The surveillance marketers are already working on a way to make
people feel like they’re doing CCPA, and to make things harder for
publishers, but without affecting the bigger players. The IAB CCPA Compliance
Framework for Publishers & Technology Companies is out, and it
enables you to signal an opt-out only for the site you’re on. Not what
we need, because it doesn’t do anything about the party that actually
holds the data, and that’s where the opt-out or data deletion demand
(I know the law says data deletion request, but if you’re not
allowed to turn me down then I’m not requesting) needs to go. Places
like
Oracle Data Cloud
Acxiom
Experian Marketing Services
LiveRamp
All the companies on this CCPA to-do list. If you
want advertisers to buy you better ad-supported content, you have to
starve them of the targeting data they need to reach you on crappy
content, or in social media and native apps. Which means focus on the
DMPs, not the peripheral ad targeting. Anyway, we should be able to make
the CCPA flow a lot easier. Here’s a button to help me test something
to do that.
Two more things.
Anybody with a clean whiteboard can design a better web advertising
system than what we have now. The hard part is incrementally getting
ad-supported publishers from the current shitshow to
the new system before they run out of money.
Solutions to the surveillance marketing problem have to be built for
real people, not for a fully-informed, rational Homo
economicus. Two pieces of irrational behavior you can count on, and
that the solution has to take into account:
Users won’t pay for privacy even when it’s cheap.
Advertisers won’t pay for context even when it’s
valuable.
(The first makes sense if you consider that people treat surveillance
marketing as something that the other side is doing wrong, and
we expect norms violators, not the people affected by the violation, to
bear the cost. The second one seems to be a lot more complicated.)
Everyone is writing long thoughtful pieces about how some social
media company CEO should do this or that.
What a waste of carpal tunnels. Social media CEOs know what they’re
doing about as much as Clostridium botulinum knows that it’s
giving people botulism.
If your brand is all about canned beans, and you find out that some
of the cans have C. botulinum in them, you don’t write a
long think piece about how C. botulinum should carefully
consider its impact on human health. You don’t go to a conference and
get on a panel about how botulism is an industry-wide problem. You don’t
invite C. botulinum to give a talk at a college campus.
Brand safety is a little behind food safety, though. For a long time,
think pieces and conference talks have taken the place of action, while
brands end up in worse and worse places. That is the brand marketer’s
responsibility. David
Kohl writes,
[A]dvertisers have the power to reduce the spread of misinformation
by withdrawing media dollars from platforms that enable its
distribution. Consumers are nearly three times less willing to associate
with brands that advertise alongside unsavory, offensive content (see
Magna’s Brand Safety Effect). And according to Edelman, 48 percent of
consumers feel it is a brand’s own fault if their advertising appears
near inappropriate social media content. Simply said, advertisers put
their brands at risk by placing their media budgets in the hands of
platform providers that fail to take full responsibility for the content
they distribute.
Starting in 2020, when a brand runs an ad on social and it
ends up sponsoring a death threat against a journalist, or a video of
someone abusing their kids, or a domestic terrorist recruiting page, the
people who made that bad advertising decision are going to find out.
Starting in 2020, California gets the California Consumer Privacy Act
(CCPA).
When people see brand ads running on all the evil shit that the
social media companies are pushing out, they’re not going to fill out
some ToS report form when CCPA is available. And those CCPA demands will
come to the brands responsible.
Which is great news. Handling CCPA demands is likely to be costly. So
avoiding them gives everybody a reason to do the right thing, and move
ads away from problem activities. Brand safety isn’t just another
marketing checklist item. Do it right and it’s your ticket to getting on
the right side of history.
This is to see the GPG passphrase dialog pop up less often. Probably
a security win because every time I get prompted for a passphrase there
is a risk that I will fumble-finger it and type the passphrase into a
chat window instead.
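For context, the relevant settings live in gpg-agent’s configuration
file. Here is a minimal sketch, with example values only (pick numbers
that fit your own threat model):

# ~/.gnupg/gpg-agent.conf
# Keep cached passphrases around longer, so pinentry pops up less.
default-cache-ttl 3600      # seconds a cache entry lives after last use
max-cache-ttl 86400         # hard upper limit on any cache entry

Then apply the change with gpgconf --reload gpg-agent.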
Open source software forms much of our digital infrastructure. It,
however, contains vulnerabilities which have been exploited, attracted
public attention, and caused large financial damages. This article
proposes a solution to shortcomings in the current economic situation of
open source software development. The main idea is to introduce price
signals into the peer production of software. This is achieved through a
trading market for futures contracts on the status of software
issues.
Big Tech companies as we know them are mutated versions of the open
source software business. This looks like a big cultural win for the
open source entrepreneurs of the 1990s. The problem is that open
source business models can be a rational choice in the software
business, but not so much in other businesses.
Nobody is the villain in their own story, and Big Tech management
generally doesn’t make decisions that look creepy and evil because they
actually are creepy or evil. They’re just running the pattern that beat
the last few levels.
The problem they’re solving is that in the software business, the
absolute worst place for a marginal dollar to end up is at another
software company. You would rather see money burn up in a fire than see
another software company get it. If another software company got it,
they would use it to sue you, or build network effects in their own
product adjacent to yours to squeeze you out, or whatever. So the
pattern you end up developing for self-defense is open source. Open
source is a great defensive tool in software. Turn
the product categories adjacent to yours into low-profit
commodities, and keep money out of the hands of other software
people.
If you keep
growing the open source model you get today’s Big Tech. Decisions
that look shortsighted or just plain evil are understandable if you look
at them from the Open Source entrepreneur’s point of view: Every
business adjacent to mine is either a low-margin commodity or an
existential threat.
When the business adjacent to the Big Tech company is an
individual independent contractor, you get the gig economy and the
precariat. Besides the gig economy, though, the biggest example of Open
Source patterns influencing other areas is the commodification of
content sites. Surveillance marketing, for the Big Tech platforms, is
not about the surveillance. User surveillance is just a commodification
tool, like an open source software component is. User tracking has value
to Big Tech because it makes the content site into a commodity source of
the same eyeballs you can get anywhere, and drives ad profits to the
platform that enables the tracking.
The problem with the commodification strategy is that it works great
for software, where it’s safe to assume that every company in every area
adjacent to yours is run by a douchebag software CEO, but it’s
suboptimal for types of business in which having a strong company
adjacent to you is an advantage. If Big Tech
management ran Chevron, they would give out free clones of the 1970
Plymouth Belvedere that get 8 MPG, and everyone would go on about
how there is no money in the car business. Brands, content sites,
and ad agencies are an example of a set of businesses in which a viable
company in one category actually boosts the companies in adjacent
categories. Approaching this kind of situation with the oversimplified
view of commodifying everything is leaving money on the table.
Google is Robbing You… and You Can’t Stop Them: The problem with the
garbage [ad inventory] is that Google has a ton of it, and in order to
boost revenue they need to sell it. How does Google sell garbage that
nobody wants to buy? By leveraging their monopoly power to force people
to buy it.
Make a crappy site full of cheap clickbait content.
Load it up with programmatic ad slots.
Buy content recommendation widgets, or Chumboxes
on legit, high-traffic sites.
Any time someone clicks through from a legit site, show them a
bunch of ads, including high-CPM video ads.
The reason that this model has worked so well is that with
programmatic advertising, impressions on the crappy site are worth
something. The user from the legit site brings along the same
third-party tracking cookies that their browser sends to everybody. So a
legit advertiser is likely to bid to reach them.
The publisher of the legit site gets some of the revenue—they get
paid per click on the chumboxes—but they don’t see the revenue lost
because they’re now competing for programmatic bids with the lower-cost
sites that they’re sending traffic to. The tragedy of the
chumbox is that all the legit publishers would be better off if
nobody could do traffic arbitrage, but the first legit site to stop
participating in traffic arbitrage loses.
Now browsers are blocking the third-party tracking that enables
traffic arbitrage. First Apple Safari ITP, now Firefox
ETP, soon Microsoft Edge.
A legit site that can persuade users to
subscribe or sign in now has significantly more data on its audience
than a crappy site where people are unlikely to share their information.
Traffic arbitrage schemes are being starved of data, which should start
to show up as lower RTB bids on their sites, and higher bids on legit
sites.
There are uses for content recommendation widgets other than traffic
arbitrage, and it will be interesting to see how the content that gets
recommended on them changes as traffic arbitrage schemes get squeezed
out.
This isn’t the last move in the game, of course. For large Internet
platform companies, privacy violation is not the goal, but a means to an
end. The business model for platforms depends on capturing an
ever-larger share of ad revenue by commodification
of the content business, and violating user privacy expectations is the
easiest, 1.0, way to do it. It may be possible to construct a user data
scheme that both commodifies content and preserves user privacy, so the
interests of legit publishers are aligned with, but not identical to,
the privacy interests of their readers.
Unfortunately, when open source people say things like…
Maintainers do critical work which enables companies to create
billions of dollars in value, yet we capture none of that value for
ourselves.
Does it have to be like this?
I’m not arguing that maintainers should start capturing all of the
value that we create. But we shouldn’t capture literally none of the
value either. The status quo is not tenable.
I would love to find a way to help maintainers capture at least a bit of
the value we create so that we can happily continue to write new
features, fix bugs, answer user questions, improve documentation, and
release innovative new software.
…what people who use open source in business are hearing is more
like…
We’re getting a lot of software value for nothing! Fist bump!
A simple appeal to do the right thing is not something that, as a
downstream user, you can put in your budget.
When you use under-funded open source software, there is always a
risk that if the maintainer doesn’t get paid, they will either burn out
or go get a high-intensity job and let their project fall on the floor.
Can you justify paying open source maintainers in order to protect
yourself from this risk?
That’s a little more promising, but two areas need to be
addressed.
Is the risk quantified? I can measure a software
project’s value to me, but not the probability of the maintainer
quitting in the absence of support, so I don’t know the total size of
the risk. If I can’t quantify the risk, I can’t justify spending to
avoid it.
Can I measure the benefit of participating? I
don’t know how much my choice to fund the project reduces the risk. I
could put in my $100, see that the developer still can’t live on that,
and end up incurring just as much cost to replace the open-source
dependency as if I had not invested.
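To make the second problem concrete, here is a toy back-of-the-envelope
calculation. Every number is made up; the probabilities are exactly the
quantities nobody can measure today.

// Hypothetical numbers, just to show the shape of the decision.
const replacementCost = 50000;   // cost to swap out the dependency
const pQuitUnfunded   = 0.10;    // chance the maintainer walks away
const pQuitIfIChipIn  = 0.099;   // my $100 barely moves the needle
const expectedBenefit = (pQuitUnfunded - pQuitIfIChipIn) * replacementCost;
console.log(expectedBenefit);    // 50: hard to justify even a $100 contribution

Unless enough other users commit at the same time, the risk reduction I
buy with my individual contribution is smaller than the contribution
itself. That coordination problem is what assurance contracts are
supposed to solve.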
IMHO we need better market design in order to deal with those two
problems. I personally think that models based on dominant assurance
contracts and/or futures markets are promising (more on that later) but
just
banning an interesting idea after its first deployment is
counterproductive.
I recently found out that Linux Journal, where I was
editor-in-chief for a while, is shutting
down. This might seem natural, because considering all the places
you find Linux—from the largest cloud services and supercomputers to the
cheapest smartphones—it makes about as much sense to have a Linux
Journal as it would to have a magazine called Air Breathing
Aficionado. For what it’s worth, MSDN
Magazine is shutting down too. Are operating systems just boring
commodities now?
Device drivers used to be proprietary software that came on a floppy
or CD-ROM with the board. What Cyclades, an early multiport serial
board vendor, came up with was a good early example of what Joel
Spolsky later called commoditizing
the complement. By releasing the device driver under the same terms
as the rest of the Linux kernel, they saved their own costs of keeping
the software proprietary, pushed the maintenance programming into the
open, and lowered the total cost of ownership of the device for users.
Most of the major hardware companies ended up making the same
decision. Today, the operating system is an inexpensive commodity,
and most hardware firms have dedicated kernel teams to keep the
free part of the software/hardware combination working. This keeps the
whole product (Linux plus Linux-supporting hardware) attractively
priced.
An even bigger commodification shift came later, when server hardware became a
commodity too, driving all the attractive profits to the service
business. Big Internet companies as we know them grew out of the Linux
scene, where the mandate to commoditize everything that you
don’t directly make money from is so obvious that people rarely
even discuss or acknowledge it.
The commodification play in web publishing is to control the data on
who is looking at something, in order to drive the profit out
of where they look at it. This
doesn’t necessarily work so well, but whether or not tracking-based
ads work better isn’t the point. They only have to work well enough to
drive the web content business into the commodity category with the cover
bands from Amazon Prime Music. The only real opposition comes from
publishers
and privacy developers. Privacy developers don’t want users followed
from one site to another, and publishers don’t want their audience’s
eyeballs sold somewhere else.
The optimistic view is that better privacy in the browser will help
us beat commodification. If everything works out just right, privacy in
the browser means that nobody can get trustworthy data about ad
impressions on random sites, which means no more infinite online ad
inventory, which means that advertisers have to board the flight to
quality to sites known to be trusted by their users. Then increased
market power for those publishers means more
expensive advertising, which means more signaling
power for brands. Signaling power, if used right, builds brand equity,
which means brands can spend more on ads, so they increase signal by
competing to support obviously expensive content. This effect is
responsible for the kind of ad-supported
media that’s worth real money offline, so let’s make it work for the
web too.
But what about the low bid problem and the crappy ad problem?
Advertisers bid less for ad impressions without tracking
data when impressions with tracking data are available.
According to one
Google study: Based on an analysis of a randomly selected
fraction of traffic on each of the 500 largest Google Ad Manager
publishers globally over the last three months, we evaluated how the
presence of a cookie affected programmatic revenue. Traffic for which
there was no cookie present yielded an average of 52 percent less
revenue for the publisher than traffic for which there was a cookie
present. Lower revenue for traffic without a cookie was consistent for
publishers across verticals—and was especially notable for publishers in
the news vertical. For the news publishers in the studied group, traffic
for which there was no cookie present yielded an average of 62 percent
less revenue than traffic for which there was a cookie present.
The crappy ad problem is related. If the ad network doesn’t know that
you’re an affluent car shopper, you’re not going to get the
professionally shot photo of a BMW on a scenic road. Instead, you’re
going to get ads for FREE* LIVER FUNGUS CURE!!!1! (just pay shipping,
order auto-renews weekly).
We now return to the Internet Optimism already in progress
Nobody wants to be stuck being the commodity, and with decent privacy
in the browser, the content site doesn’t have to.
That’s the basis for cooperation between privacy-protecting browsers
and sites trusted by their readers. Former
Mozilla COO Denelle Dixon writes, on the Digital Content Next site:
In short, behavioral targeting will become more difficult, but
publishers should be able to recoup a larger portion of the value
overall in the online advertising ecosystem. This means the long-term
revenue impact will be on those third-parties in the advertising
ecosystem that are extracting value from publishers, rather than
bringing value to those publishers.
I have been talking
and writing about the alignment of interests between privacy developers
(who don’t want their users’ activity from one site following them to
another site) and publishers (who don’t want to leak
their audience data) for quite a while. But privacy and publisher market
power are two parallel causes, not one big movement. The commoditizers
have a lot of skill and time to put into splitting the alliance that
puts publishers
on the same side as privacy developers. Can the Internet ad duopoly
do something to satisfy privacy demands from users and regulators
without ceding market power to trustworthy sites? Two
proposals.
Federated Learning of
Cohorts (FLoC): We plan to explore ways in which a browser can group
together people with similar browsing habits, so that ad tech companies
can observe the habits of large groups instead of the activity of
individuals. Ad targeting could then be partly based on what group the
person falls into.
If they can’t track users individually, they’ll still try to figure
out a way to get high-value ad impressions from known human eyeballs at
random sites, and commodify publishers that way. But that depends on
getting the privacy developers to decide to be fine with this kind of
scheme.
The good news is that privacy developers tend to be generally
sympathetic to the publishers, because positive
externalities and stuff. But we’re still facing the risk of
privacy-acceptable but anti-publisher user data handling schemes.
Privacy developers need help to keep any new privacy technology aligned
with the interests of whatever publishers their users choose to trust.
That means we have to commit to more ongoing coordinated open source
development, with publishers who want to stay out of the precariat using
and testing the same code as browser and tool developers who want to
keep their users safe.
The other good news, now that I think about it, is that they’re now
paying the privacy/publisher alliance the ultimate compliment, by trying
to split it. We’re on to something here.
Bruce Schneier has a great piece on the Influence
Operations Kill Chain but it misses one step. What if a social
network is so committed to getting a cryptocurrency
approved that the moderators end up coming in on the misinfo side?
And a reality check on filtering: Facebook
is already flooded with fake Libra cryptocurrency scams. If they
can’t filter out ads about their own brand, based on photos of their own
CEO, what can they do for political ads where the creative can be more
varied? And will they even want to?
Can CCPA Be a Good Thing for Marketers? Now, we need to prove that
the data is being stored, managed and used with the utmost integrity and
privacy safeguards possible. Regulations like GDPR and CCPA provide the
opportunity for marketers to become accountable for this process and
allow us to step up and show what great stewards of data we are.
More CCPA: Why
the CCPA’s ‘verified consumer request’ is a business risk. IMHO CCPA will do for marginally useful PII in Marketing
what the Resource Conservation and Recovery Act did for marginally
useful hazardous materials in manufacturing—make it easier to just get
rid of it and update your processes to do without.
(Update 24 Sep 2019: add link to IsLoggedIn. Update 20 Sep 2019:
copy edit. Update 14 Aug 2019: Add link to Safari’s policy)
(Disclaimer: I work for Mozilla. Not speaking for Mozilla
here.)
The first result of the browser privacy trend is a growing difference
between how the browser treats two kinds of third-party data
collection.
third-party data collection that happens when the user chooses to
use information from one site on another site
third-party data collection that happens when a site or service,
without an action from the user, tries to use information about the
user’s actions from one site while they’re using another site.
Any third party interaction that the user knows about is
supposed to keep working. But hidden tracking pixels, scripts and any
technology that tries to implement tracking without user interaction are
all supposed to stop working.
Protections to implement this are still in progress, but this is clearly
the direction Safari, Firefox, and now Microsoft Edge are going. We now
have the same kind of rough consensus on user expectations about
tracking that we developed pretty early on in the email spam situation.
This consensus is based on extensive user research. (Why browsers took
so long to listen to people about what they find creepy is another
story.)
An example of a protection step that’s common across browsers is the
Storage Access API. This gives browsers a way to allow third-party
scripts to use cookies and LocalStorage, but only if the user takes
action. Apple
Safari, Mozilla Firefox, and Microsoft Edge are all involved.
(hashtag #worldsFriendliestBrowserWar)
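In rough terms, here is how a third-party script asks for access. The
two document methods are the real API; loadPersonalizedContent is a
made-up placeholder:

// Must be called from a user gesture, such as a click, or the
// browser will reject the request.
async function onWidgetClick() {
  if (!(await document.hasStorageAccess())) {
    try {
      await document.requestStorageAccess();
      // This origin’s cookies are now available in this context.
    } catch (e) {
      return; // user or browser declined: fall back to a signed-out view
    }
  }
  loadPersonalizedContent(); // placeholder for the widget’s real work
}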
Apple Safari’s tracking prevention policy, for example, says:
Merely hovering over, muting, pausing, or closing a given piece of
content does not constitute an intention to interact.
and
We consider certain user actions, such as logging in to multiple
first party websites or apps using the same account, to be
implied consent to identifying the user as having the
same identity in these multiple places. However, such logins should
require a user action and be noticeable by the user, not be invisible or
hidden.
For sites, what this means is that SSO and registration walls are
relatively safe. If the user is clearly presented with “Sign in with
(identity provider brand)” and there is a button the user has to click
the first time they go to the site, that SSO system should keep working.
The user knows that they’re using it, and clicked the logo of the
provider they “sign in with.” A proposed API, IsLoggedIn,
from Apple Safari developer John Wilander, would make it easy for a
site to check logged-in status from JavaScript.
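Going by the proposal (it never shipped, so treat these names as
provisional), the check would look something like this:

// Provisional API from the IsLoggedIn proposal; check the current
// explainer before relying on any of this.
async function checkLogin() {
  if ('isLoggedIn' in navigator) {
    const loggedIn = await navigator.isLoggedIn();
    if (!loggedIn) showSignInButton(); // hypothetical app function
  }
}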
If the user can’t see the way that multiple sites are trying to use
the same information about them, then that flow of data across sites is
likely to get blocked, whatever the technical implementation is. This is
likely to be good for the relative market power of sites that people
trust more, if it turns out that people are more willing to “sign in
with” (and obviously share info about themselves) on their trusted sites
than on a random site that their uncle sent them a link to.
(This blog no longer has any consent management software on it,
or any third-party resources that I might possibly need to get consent
for.)
This blog has GDPR consent management on it. (I’m running OIL.js which is open source.) That’s
the cookie dialog you probably saw on your first visit, or every visit
if you clear cookies or use private browsing mode.
As any web user knows by now, the consent dialogs as currently used
have a bunch of problems. Consent management platforms (CMPs) are behind
on both UX and accuracy.
They’re annoying, especially on small screens.
They don’t accurately capture what the user really intends to
consent to. They’re more like “click to make this stupid dialog go away”
management platforms.
If your site has to do consent management, and uses a CMP, there is a
solution, currently being tested as a browser extension. Global Consent Manager
does a couple of things.
First, it will temporarily populate your consent string (using
the Interactive Advertising Bureau’s own Transparency
and Consent Framework) with a value indicating “no consent.” This is
equivalent to visiting the site the first time and drilling down to some
consent management screen and choosing all “no”.
Later, if you show that you’re interested in the site, Global
Consent Manager removes the temporary “no consent” and allows the site
or the CMP to present the original consent interface.
For a site, why would you want users doing this instead of capturing
all the consent you can as soon as you can? Look at the engagement
study. People stay engaged with a news task longer when they don’t have
to provide consent for everything up front. (This idea is totally
borrowed from LinkedIn. They don’t make you fill in your whole profile
at once before you start using the site. They let you try it first, then
prompt you for more info when you’re more likely to think it’s worth the
exchange of value for value.)
Smoother consent management UX sounds great, but how do we scale it?
How can Global Consent Manager, and future next-generation consent
handling features in browsers, tell which of a site’s many cookies is
the consent cookie, and what to set it to?
I suggest a pair of meta tags.
meta name="consent-location" – Name of the consent
cookie. (Or could be extended to support other ways to persist the
consent information.)
meta name="consent-format – Format of the consent
string. Oil.js has a little extra JSON around the IAB TCF string, so we
need to handle that and any other CMPs that do their own thing.
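For example, the markup might look like this. The content values here
are invented for illustration; the real ones would depend on the CMP in
use:

<meta name="consent-location" content="oil_data">
<meta name="consent-format" content="oil-json-iab-tcf-v1">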
More testing coming soon. The meta tags are on this page now, and
I’ll make some more test pages with different variants.
Scams and political misinfo are bad, so why not report them?
What follows is a version of my long email response to a question
about why I think that reporting problem content to Facebook is a bad
idea, at least here in the USA. Complaining about Facebook is
fine, but complaining to Facebook, not so much.
That’s because as of 2019, it’s almost like there are three Facebooks
in this country, from the user point of view.
Purple state Facebook: I wish I could predict
otherwise, but this is going to be a bigger and bigger fraud and foreign
misinformation operation leading up to the 2020 election.
In some countries, doing moderation might be a way to build goodwill
with the government. But here, the current US administration gives
foreign operations some of the credit for getting elected.
Meanwhile, Facebook needs to get an ambitious
new cryptocurrency scheme approved by that very administration.
Which means nobody at Facebook can do anything serious about the foreign
misinfo problem, at least as it affects the people
that the re-election campaign wants to reach. They’ll have to handle
anything pro-reelection with an extremely light touch, or face a
regulatory mess that will keep them out of the money-printing
business. (The good news is that these two assembly line workers can
get a break from the “useless jobs” meme, and whoever is in charge of
pretending to moderate Russian troll accounts for Facebook can have a
turn.)
Red state Facebook: At first I thought that
Facebook could mostly ignore the red state people, or just let skeevy PACs raise
money from them, because they live in states that are already safely in
the re-election column. But red state people have a valuable role to
play. When Facebook kicks some of their favorite personalities off the
service, the role of the red states is to complain
loudly about it, and even threaten regulatory action, to help make
it look like Facebook is even-handed or leaning
moderate liberal, to the remaining audience, which is…
Blue state Facebook: This is where the prospective
employees live, or are willing to move, and also where you’ll find the
decision makers at the major advertisers. But both of these groups are
more comfortable with a company that has international appeal, so
Facebook somehow has to look “brand safe” in order to keep them on
board. That’s where Grigory Potemkin’s
paint and trim crew, I mean the Facebook advertising “transparency”
operation, comes in.
The kind of people who might work for Facebook, or advertise there,
get targeted for a dramatically different experience from what the
regular people do. How many Facebook employees are
embedded with the Presidential re-election campaign these days, anyway?
And how many are planned to be there at the peak? IMHO some reporter
should ask that. And much of the documentation that Facebook
makes public about its political misinformation problem is a
read-between-the-lines instruction manual on how to do political
misinformation without letting the brand advertisers see it.
Regular people get quack miracle cures and massacre-your-neighbors
campaigns, while CMOs get ads for luxury resorts and martech services.
Of course the CMOs are going to be fine with advertising there. But no
ad targeting system is perfect, and occasionally some of the nasty stuff
leaks through where blue state people can see it, which means a recruiting
problem to start with, and maybe even an advertising problem. (Although
as far as I can tell, CMOs are pretty easy to keep in the dark, safely
reassured about how they can stay brand-safe and moderate even while
supporting a company that’s already locked in as a division of the
re-election campaign.)
When Facebook asks you, a blue state person, to please report things,
that’s where you come in. They’re looking for help spotting it when the
bad stuff that goes out to regular people leaks into the sanitized
version of Facebook seen by advertisers and prospective employees.
Anyway, long answer, but that’s why I don’t report problems to Facebook.
When a drop of the purple-state crapflood leaks through to me, I post it
publicly and/or send it to an advertiser instead.
I haven’t ordered anything in a while. I’ll open it.
Just one item. It’s shaped like a kettlebell, but smaller. It’s
covered with tiny buttons like a scientific calculator. Each button has
a person’s headshot on it. Looks like they’re all members of 1980s hair
bands.
Can you use it as a kettlebell anyway?
No, it’s too small, I can’t get my fingers between the handle and the
ball part. And it’s really light. And none of the buttons move. They’re
not hooked up to anything.
What is it?
It’s a metabolic product of the data-driven economy. Bots browse for
stuff to generate ad clicks. As a side effect, they visit product pages
and pretend to show interest. Marketing algorithms discern product
preferences from this. Product design algorithms put out contracts on
gig sites, to hire designers to draw up plans for all the parts. And
contract manufacturing algorithms have the products made and
drop-shipped. Some bots must have gotten stuck in a loop of digging on
kettlebells, calculators, and vintage MTV.
That’s all fine, but why did you pay for it?
Probably one of my devices got compromised and told another device to
order it. At least if my stuff is under the control of a botnet that’s
making money by ordering stupid shit for me online, they’ll keep my
devices safe from the botnets that order Fentanyl from overseas and hire
kids to steal it off my porch. Or worse.
Are you going to send that thing back?
I don’t think so, now. It’s oddly reassuring. As long as I keep
getting this kind of stuff I’ll know there’s a botnet looking out for
me.
Just cleaned up some CSS on this blog, so if you missed the post about making ASCII
Art work with responsive layouts, check it out now. The examples
should work better on small screens now, and the overflow on bigger
screens should do the right thing.
Also ran linkchecker and fixed
the permalinks that got broken when I updated some JavaScript
dependencies. (Whee.)
On a
podcast with Katherine Druckman and Doc Searls at Linux
Journal.
Some of the stuff we talked about…
advertising and signaling
Doc: The latest Firefox does block third-party
tracking. And that’s a huge move. But there’s a drift: the other
browser makers, in their own different ways, are doing things too. And
that’s part of a larger trend. So maybe you could just sort of unpack
that for us.
Me: Apple Safari got out ahead of the rest of the
pack in a lot of interesting ways. And people certainly joke about the
$999 monitor stand. But it’s pretty clear that the people who buy that
kind of high end equipment are also the people who are least trackable
by conventional adtech. So if you want to reach the kind of people who
can either afford a tricked-out Macintosh, or have a job where their
employer will buy them one, then you really have to think about how do
we place advertising in such a way that it doesn’t depend on the kinds
of creepy tracking that the Apple Safari developers have put so much
effort into avoiding.
Doc: Katherine, maybe I’m wrong about this. But
didn’t we finally put the nail in the lid of the coffin of Google
Analytics for our own website, when we found that, as Don was just
saying, our readers block tracking for the most part?
Katherine: It was so far from being accurate
that it was not useful anymore. Who doesn’t block tracking? I’m kind of
throwing this out there, but I want to say a good 60% of our traffic was
blocking it.
Me: I’ve seen some numbers from web developer sites,
and from blogs that focus on web development, and those are often
showing a 30 to 40 percent block rate. But it’s really interesting that
Linux Journal readers are coming in at 60.
And so there are, right now, marketing organizations that are going
out and trying to reach the kind of people who buy virtual private
servers or software as a service products or developer tools. And if you
do conventional data driven marketing, when you’re going after that kind
of audience, then you’re really going to get a lot of fraud bots, and
your marketing operation is going to be making decisions based on what
bots like to read, not so much what those high tech, or
highly protected, users are interested in.
Doc: Have you seen a sign of that knowledge being
generalized, beyond, you know, people like us talking about it, where
it’s becoming obvious to some people in the marketing side that the most
valuable people are going to be the ones that are most protected?
Me: I’m cautiously optimistic because of the change
over from targeting millennials to targeting Generation Z. I don’t know
if you’ve seen it, but marketing thought leaders are changing up all
their slides. And they no longer say the millennials are different. Now
millennials are boring, and Generation Z is all different. But it’s
really recycling a lot of the same millennials material. So there’s a
nice niche opening up for a marketing thought leader to scrap the
generation-driven slides that everybody has already seen, and become the
marketing thought leader of the tracking protected segment.
Doc: Today you get injected with all these third
party cookies that get arranged like a DNA string that gets presented at
every site you go to subsequently. Do you see any hope for either ending
that or modifying it with the kind of things that are going on now or
blocking the third parties? I guess the question is, are we stuck?
Me: I don’t think we’re stuck. I think that a lot of
the talking points that we’re getting from adtech and martech today are
very similar to what email spammers were coming up with in the early
days of spam filters. The early spam filters, of course, were done by
technical early adopters, the kind of people who read Linux
Journal and know how to write .procmailrc files. And
when those people started rolling out their original, simple spam
filters to the less Internet-skilled users, the spammers started
saying, hey, wait a minute, users like getting messages about
opportunities for great savings on HERBAL VIAGRA, or whatever the latest
spam campaign was.
The message from the email spam scene was really that privacy nerds
are less in tune with the preferences of regular users than we the
spammers are, so you should pay more attention to what spammers want,
and less attention to what spam filter developers think is the right
thing to do. And we really saw that not come across very well as
email spam moved from a niche issue for people who had had their email
address out there for a long time to being a mainstream day-to-day
annoyance. The general population of users turned out to be more like
the privacy nerds than like the way that the spammers predicted they
would be.
Doc: I’m thinking that you could go into how people
actually have a pretty good sense of behavioral economics, that they’re
good behavioral economists to some degree if you want to have any money
left. For us, because obviously, we’re all walking through a
minefield that all of us understand in somewhat different ways. But I
think one of your points is that people do become pretty adept even if
they don’t fully understand what’s going on behind the surface.
Me: And you can’t make the optimal decision for most
of the decisions that you have to make in your economic life. I don’t
have the time to buy the optimal pair of socks. So what is a set of
tools that I can use to get an adequate pair of socks in the amount of
time that I have to make that decision? And of course, advertising isn’t
the entire story behind building brand reputation. But it sets up part
of the information that people can use to evaluate a product or
figure out the reputation of a brand. Brands are really interesting.
Brands are a cognitive hack that uses our brain’s hard-wired circuitry
for evaluating each other’s reputation.
What would be a really interesting piece of research would be
comparing TV ad spend on cars to later Consumer Reports
ratings. Are TV ad budgets a reliable leading indicator of how well that
car actually does in the independent test? It’s kind of like when Rory
Sutherland at Ogilvy compared advertising to someone betting on their
own horse at the track. If you go to the racetrack and see that the
horse’s owner is betting heavily on that horse, they’ve probably got
more information than you do.
Open source and incentives
Katherine: I feel like I see growing resentment,
not just from individual open source developers, but from small
companies and whatnot, that are becoming more painfully aware of the
inequality of, you know, fortunes being built off of what they perceive
to be their own backs. And the consequences are, well, unforeseen.
Me: It’s not just a matter of resentment, there’s
also an element of risk there. One of the side effects of having good
dependency management tools is that real world IT projects are building
deeper and deeper dependency trees. So the success of your web site
might depend on some software component three levels deep, whose
maintainer is going to burn out, right as you deploy your site, but you
don’t know it. So there’s this risk for anybody, depending on open
source, that somebody’s choice to stick with it just doesn’t pencil out.
And the people who you need to stick with it in order to have your thing
be successful might be several hops away. There’s still no good way of
getting that information, propagating the developer pressure (maybe a
market mechanism (PDF) could help?) from the developer who’s
experiencing it to the leader of the project that depends on their
work.
Some surveillance marketing organizations have suggested adopting a
Federal privacy law in order to preempt the California Consumer Privacy
Act. Preemption would be bad if it actually happened, but the fact that
they’re trying for it is the best endorsement I have ever seen for the
California Consumer Privacy Act. If I wasn’t a CCPA fan before, I am
now.
In my humble opinion, preemption is the wrong direction. Privacy
regulation should be complicated enough to impose significant
transaction costs on database marketing practices. State-level privacy
regulations are a start, but what about county or city ordinances? User
tracking allowed on alternate sides of the street on different days of
the week?
Why would I want to see costs and complexity imposed on the
surveillance marketers? I’m going to leave the political stuff out for
now. From a selfish point of view, as an individual considering buying
stuff, I am going to get ads, and I’m going to get them matched to me in
three ways.
Context. Placed on a resource I’m interested in using, like a
magazine article or a bus bench.
Search. Matched to search results when I look for a product or a
service, like a Yellow Pages ad or a Google search or Maps
ad.
Personalized targeting. Matched to me based on something the
advertiser knows about me.
On the Internet, many ads are placed using a mix of these techniques,
and it’s hard to split out how a real-world marketing budget is
allocated across them. And information originally collected based on
context can leak
and start getting used for personalization. But the technical and
regulatory environment affects how much money advertisers choose to
invest in each one.
As the recipient, or potential customer, the three ad placement
methods affect me in different ways. Ad money allocated to context is a
subsidy for something I want to use, whether it’s local news coverage or
an ad-supported public
restroom.
Ad money allocated to search is almost as good. I’ll use a search
engine more if it gives helpful results, so search advertising also pays
for something I want.
Personalized targeting, though, is a problem. Instead of paying to
support something I want, the advertiser is paying to reach me as an
individual. The fact that my information is in somebody’s database is a
risk to me, but a source of revenue for the database owner. It’s a
classic Negative
Externalities problem. Besides, anything spent on this stuff does
not go to pay for the ad-supported resources and search services I
really want.
Ad-supported cultural works have positive externalities, when they’re
re-purposed for other uses. The “Star Trek” advertisers got their
money’s worth in 1966-1969, but people are still watching the show
today. Kurt Vonnegut quit his job as a car dealership manager because he
sold stories to Collier’s magazine.
As a member of the audience for advertising, I win when I can help
move the marginal advertising dollar from personalized targeting to
either context or search, because a fraction of the money that gets
moved pays for something I want, some of it is likely to create positive
externalities, and none of it gets spent on creating risks for me.
Regulation is a piece of the solution, and a mess of confusing
regulations could be more effective in raising the relative price of
personalized targeting than a single set would.
People’s intuitions about marketing practices are economically
sophisticated.
People often choose to pay attention to ads that carry economic
signal.
People are quick to develop banner blindness and other
habits to avoid low-signal advertising.
People choose not to invest a lot of time in low-effectiveness
ways to protect their personal information, but pick up on measures seen
as effective, such as Do
Not Call.
People who grow up in ad-heavy economies learn the economics of
advertising like people who grew up playing ball learn physics.
What we need to see from privacy regulation is
an increase in the transaction costs of negative-externality
advertising practices.
a credible promise of reducing risks, to attract mass
participation.
Privacy regulation has to increase the confusion and cost on the
advertiser side, in order to balance out the risks and costs
imposed on the audience side, and shift ad budgets.
Bonus links
Four
Steps Facebook Should Take to Counter Police Sock Puppets New law of
headlines: any piece with “Facebook should” in the headline is not
worth reading. Come on, EFF, the police put undercover officers into
schools and workplaces all the time. Why should a mass-market social
platform be any different? This is just the high school weed dealer’s
“if you ask them if they’re a cop they have to say yes.”
Daphne:
Moderating Facebook At Barely Minimum Wage then: hardware is
difficult, manly work, and software is a straightforward office task we
can hire low-paid women for. now: software is difficult, manly work,
and content moderation is a straightforward office task we can hire
low-paid women for.
(update 21 Apr 2019: copy edit, add some explanation.)
I’m learning how to make a Progressive
Web Application. Progressive Web Applications are a good thing
because they give people a lot of the features of mobile apps, but run
in the browser where it’s easy to turn on privacy
protection features.
Here’s how it’s going so far. A simple polyhedral dice roller for
Dungeons and Dragons, and similar games that use many-sided dice.
(Yes, I know real dice are better for a real game. This is for when
you forget your dice but not your phone, or have a few minutes to
prepare something.) This is mainly designed to run on a phone, but it
does take keyboard input, and if you can see it here it also works in an
iframe.
Here’s how I got the help to work the way it does, with CSS. In its
regular place, the keypad is laid out calculator-style, and on the help
page, the buttons are laid out in a column on the left with the
explanation of what they do next to them.
The keypad and the help page are really the same content, so each
button’s help is a p element that lives right next to the
button element. Turning help on and off doesn’t navigate
you to a separate help page, it just moves the keypad to a new parent
element where it is styled differently.
The #keypad div starts off as a child of the
#compact div. Inside of #compact,
the grid is four columns:
grid-template-columns: 1fr 1fr 1fr 1fr;
the help text is styled with display: none
the tall button is grid-area: span 2 / span 1 and
the wide button is styled with
grid-area: span 1 / span 2;
the 0-9 button, only used in help, is also
display: none
Moving #keypad to be a child element of
#help means that different styling applies.
the grid is two columns:
grid-template-columns: 1fr 4fr;
All the help elements are display: block so they
show up, and take up positions in the grid.
The tall and wide buttons are single sized.
The individual numbers are display: none and the
0-9 button is display: block.
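Here is roughly what that looks like as code. The selectors use the
element names above; this is a sketch, not the app’s exact stylesheet.

/* Calculator-style keypad: four equal columns, help text hidden. */
#compact #keypad { display: grid; grid-template-columns: 1fr 1fr 1fr 1fr; }
#compact #keypad p { display: none; }

/* Help layout: buttons on the left, explanations on the right. */
#help #keypad { display: grid; grid-template-columns: 1fr 4fr; }
#help #keypad p { display: block; }

And the help toggle is just a reparenting, not a navigation:

// Move the keypad under a different parent; the descendant
// selectors above restyle everything automatically.
function showHelp(on) {
  const keypad = document.querySelector('#keypad');
  document.querySelector(on ? '#help' : '#compact').appendChild(keypad);
}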
Putting the help text next to the button it applies to should make it
easier for me to check that the help text for each button is there and
up to date, and I don’t have to make a separate help page.
Next step: figure out how to make this Do The Right Thing with a
screen reader.
(Update 17 Apr 2019: Yes, I know it works on Firefox but is
messed up on some other browsers. I made an issue: Issue #29609 |
webcompat.com)
I’m making a web thing (for Progressive Web Application practice) and
could use a header image. I’ll just go old school and do some ASCII
Art.
Wait a minute, though. All the cool web sites now are Responsive. So
the header should work at different sizes. So what I want to do is to
get ASCII Art to behave like a regular image. If I make Ye Olde .Sig
Sword
and I want it to look good inside the containing element, I want the
text to resize, not reflow.
Kind of like this.
Hi, here we are inside a narrow element. Here is a little tiny sword.
The dashed red border is just to show how big the div is. Here’s a
bigger one.
This div is wide. The normal text is the same size. Behold my large sword!
The answer so far: put the ASCII Art inside an svg
element, like this.
The “white-space: pre” gives me the ultimate image editing
environment: free-form multi-line ASCII art text within the
text element. Yes, I still need to escape special characters as
&gt;, &lt;, and
&amp;. The fill sets the color.
One small annoyance is that the text of the ASCII Art can be selected
if the user double-clicks, or drags, or long presses on a touchscreen.
So the user-select stuff is to prevent that from
happening.
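Put together, the pattern looks something like this. The viewBox
numbers and the art are placeholders to adjust for your own drawing,
and (per the update above) the multi-line case currently renders
correctly in Firefox but not in every browser:

<svg viewBox="0 0 330 16" role="img" aria-label="ASCII art sword">
  <text x="0" y="12" style="white-space: pre; font-family: monospace;
      font-size: 12px; fill: currentColor;
      user-select: none; -webkit-user-select: none;">
o()xxxx[{:::::::::::::::::::::::::::::::::::&gt;
  </text>
</svg>

Because the svg has a viewBox and no fixed width, it scales to the
width of the containing element, so the text resizes instead of
reflowing.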
A. G. Sulzberger, publisher of The New York Times, writes,
If you’re reading this essay on an internet browser, it offers a
useful example of what tracking looks like at a practical level. Before
you had time to read a single word, a number of different companies had
already placed a “cookie” or other tracking mechanism on your browser to
study your internet use. The Times hosts these trackers for three
purposes: to learn about how people use our website and apps so we can
improve their experience; to reach readers we hope will subscribe; and
to sell targeted advertising.
Read the whole thing. But my inner tech editor could not be silenced,
and had a small suggestion. How about…
If you’re reading this essay on an internet browser, it offers a useful
example of what tracking looks like at a practical level. Before you had
time to read a single word, your web browser had already
accepted a “cookie” or other tracking mechanism from a number of
different companies to study your internet use. The Times hosts
these trackers for five purposes: to learn about how
people use our website and apps so we can improve their experience; to
reach readers we hope will subscribe; to sell targeted advertising;
to leak our readers’ personal information to help our
competitors sell ads targeting our audience; and to enable fraudulent
bot traffic to impersonate human visitors.
As soon as I make the web browser, and not the tracking company, into
the subject of the sentence, it helps explain some of the business
reasons for news sites to focus on privacy. For a site, examining your
own privacy practices is fine, but it’s not where the big wins are. The
important part, for the New York Times and other sites that
need to protect their ad revenue, is to work along with in-browser
tracking protection technology. Protecting reader data for the readers
is mostly the same as protecting audience data for the ad business.
It’s kind of like the situation with email. Email is a viable
marketing medium today not just because legit email marketers don’t
spam, but because email users have good spam filters. Spam filter
technology kept low-value email lists from devaluing email marketing.
In-browser privacy technology is starting to reverse the process by
which low-value cross-site tracking has been devaluing web
advertising.
The Times is already doing some good
service journalism on web privacy. Next step: set up the paywall to
give extra free articles per month for anyone running Apple Safari ITP
or Firefox ETP? The more reader eyeballs a site can remove from the
race-to-the-bottom eyeball market, the more market power it has.
Spam filters and legit email marketers saved email as a marketing
tool. Can privacy-protecting browsers and legit ad-supported sites do
the same for the web?
Step 1: Adopt a GDPR Everywhere policy. This is
obviously good. Show me a company in the IT business that hasn’t decided
to go GDPR Everywhere, and I’ll show you a company that hasn’t finished
writing out all the user stories for how to handle it when some users or
partners are covered by GDPR and others aren’t. Or
what happens when you have been giving a user the creepy second-class
privacy policy for a while and then they go get married to a European,
or go work for a European company, or something. Basically every IT
company is going to either go GDPR Everywhere or sign up for years of
intricate, expensive legal work and arguments that they’ll eventually
give up on.
Step 2: Have products and services interact with
open source, and collaborate and test upstream. This is also obviously
good. Pull open-source Git repositories and run integration testing and
metrics and whatever on them. We shouldn’t just sit there and pull
whatever comes out at the end of the development process; we should
help with the QA, publish peer-reviewed research, whatever.
Step 3: Congratulations, we’re now a data
controller under Article 14 of the GDPR. Git repositories are full
of PII. Every commit includes the developer name and email address.
Oh, no, PII! Does that mean we can’t work with open source?
Of course not. Open source is still legal. But we have to comply with
our data subject rights obligations under Article
14. We have to contact everyone whose PII we hold, and notify them
clearly of what we’re doing with their data.
And what are we doing with it? We’re using it to do open source QA
that feeds into making our product better. And we have to explain what
we’re doing in our Article 14 notification. So the European Union
basically just told us not just that we can send our elevator
pitch to a bunch of software developers unsolicited, but that we
have to.
(edit 28 May 2019: fix some awkward or unclear
sentences.)
Someone once remarked (paraphrased) that as long as there has been a
scene, there have been people complaining that it is no longer the true
scene. (citation needed)
Of course the open source scene is changing, but how much of that is
the unavoidable transformation that a healthy scene goes through, and
how much is fundamental?
The Free Software movement as we know it started by capturing the
tremendous cognitive surplus that was just there for the taking from
university students and from employees of conventional, slothful
corporations. Back in the 1980s and early 1990s, barriers to cooperation
were transactional: licensing and communications technologies. Patches
on a mailing list seem like a high-overhead collaboration method today,
but by the standards of the time, diff(1),
patch(1) and Free command-line tools were transformational.
And of course the classic free software licenses are practically
zero-overhead for participants with uncomplicated sharing or reciprocity
goals.
So, all that cognitive surplus was just sitting there between classes
or TPS Reports or whatever, and the software freedom scene was set up to
capture it. Before long, Tim O’Reilly and friends branded it as a
software business trend called Open Source, and the modern software
business emerged.
Sounds great—why isn’t it continuing to work like that? Two
reasons.
Less cognitive surplus in the world
The kind of university experiences that include substantial
cognitive surpluses are less widely available, because of increases in
the cost of higher education and how those costs are allocated.
The work environment is better at capturing cognitive
surplus.
There’s a whole complex privilege thread here, but the main point is
that open source as we know it began when a lot of people who had a lot
of free time got on the Internet. They (fine, fine, we) had the
opportunity to participate in open source and other
cognitive-surplus-capturing activities (such as MMORPGs). Many new people
joining are not coming in with the same economic and time advantages,
even if they have access to the same or better creative and
collaborative tools.
More competition to capture available cognitive surplus
Open source is no longer the only practical, low-overhead way to do
collaborative projects. Now people can do
crowdfunding
gig sites
native app stores (mobile, Steam…)
software as a service
It’s no longer a choice between low-overhead, low-incentivization
(open source) and accepting high overhead if you want to get paid.
What next?
Open source’s advantages in transaction costs are still there. But
people looking for open source contributors have to realize that we’re
going to have to keep increasing the number of people who consider open
source a possible valuable use of their time (remuneration issues are
blockers), or watch open source lose contributors as we get stuck
competing with more outlets for less already-unmonetized time and
attention.
A designer knows he has achieved perfection not when there is nothing
left to add, but when there is nothing left to take away.
—Antoine de Saint-Exupéry
The perfect surveillance advertising business model has been
discovered. I have seen it. Maybe you have seen it too.
Here’s how it works.
Identify users likely to be enraged by a political issue. Start a
PAC with a name related to that issue. Also start an LLC.
Run rage-provoking social media ads about your issue, with
a call to action to sign a petition or complete a survey.
Upsell a fraction of the people to make a donation to your
political action committee. Use a Dark Pattern to
get some people to consent to having their credit cards billed
monthly.
Actually deliver the petitions or surveys or whatever, and
actually make a few donations from the PAC to politicians to get on
their donor lists. But keep most of the money for yourself by paying it
to your LLC (from step 1).
It’s data-driven. It’s sticky. It’s social. It’s got everything that
a surveillance marketing business needs, and nothing it doesn’t. No
manufacturing. No support. Not even any drop-shipping. Just register a
PAC and an LLC and start running ads and social media.
If you don’t get these perfect ads, yay for you. Your eyeballs
are probably too expensive on the ad impressions market, or you don’t
seem like the kind of person who would get enraged about any issue that
they have a landing page for.
One reason that this model works is because people made so much noise
about the Citizens United decision and shadowy political
groups. So the people who are paying into these things probably never
even feel ripped off—they feel that’s just how things work now, and
they’re getting away with working the system.
Update 12 Sep 2021: Copy edit, add Dark Patterns
link.
Update 16 Oct 2023: Man
who advertised fake dinner with Donald Trump sentenced in tax case - The
Baltimore Banner. This is an example of doing this business model
wrong. This guy messed up. You have to pay the money from the super PAC
to the LLC—that way the individual items are on the LLC’s budget, and
can stay confidential. Then do your taxes by the book. (If you really
don’t want to pay much in taxes, take your money out as expense
reimbursements or loans from the LLC.)
(Update 31 Mar 2023: We got some volunteers together and checked
this out along with some related stuff. One car dealer in San
Benito, Texas (pop. 24,665) was responsible for sending information on
approximately 10% of the volunteers, even though only 6.6% resided in
the entire state of Texas.)
(Update 23 Apr 2019: I wrote to ask this dealership how they got
my info and didn’t get an answer to my question. But they did put me on
their email list in time to get the “April Shower of Savings” email so
I’ve got that going for me which is nice.)
One place I will probably not buy a car: Franklin Sussex Auto Mall,
in New Jersey.
I still have a Facebook account, mostly to keep up on the ad scene
there. (I don’t check Facebook often enough for it to be a good way to
reach me. See the page footer here for contact info.) When I checked
Facebook’s page of ads targeting me, this company is listed under
advertisers “Who use a contact list added to Facebook.”
[Image: Facebook ad for a New Jersey car dealer]
Somehow, Franklin Sussex Auto Mall got a hold of my email address or
phone number, and uploaded it to Facebook. Have I ever shopped for a car
in New Jersey? No. If I was shopping for a car, would I fly to New
Jersey to buy it and then drive it home? No.
And now that I look at it, when I go through the advertisers that
Facebook lists as having uploaded my info, most of them are car dealers
I have never visited or contacted. Someone has a pretty good racket
going here. How much are they making from the car dealers? (Yes, this is
a bad thing, because car dealers could be spending that money to build
positive reputation by funding local news, or other ad-supported
resources with positive externalities, but we knew that already.)
Maybe when they write the history of the big social site era, it
won’t be about some all-seeing panopticon, but more about a bunch of
people in a highly paid California bubble, mostly young guys who have
been told they’re smart their whole lives, getting out-hustled at a
direct marketing business they don’t really care about.
Interesting problem: why do brands fail to protect customer data when
it would be in their interest to do so?
If expected customer retention of tracking-protected customers is
higher, why not invest in tracking protection for your most profitable
customers? Why don’t car insurance companies figure the odds on customer
retention of protected and unprotected customers the same way they
figure the odds on other risks?
It might be because corporations are not decision-making entities,
and online marketing is the world’s longest chain of principal-agent
problems.
The value of a database marketer as an individual on the job market
is a function of the number of database-capturable prospects that the
marketer will help an employer land as customers. If a lot of Allstate
customers are also available on DMPs, then more VC-funded insurance
startups will launch, and they’ll bid up the salaries of database
marketers now working at Allstate.
If Allstate’s best customers are protected, then the VCs invest in
something else, the job interviews don’t happen, and Allstate can keep
paying their database marketers what they’re paying.
So: principal-agent problems are market design opportunities. How to
structure compensation for marketers to incentivize customer retention
even after that marketer is no longer employed by the brand? (People
generally want to do the job right, you just can’t keep throwing
incentives to do it wrong at them.)
And how to increase the social rewards of the choice to allocate
marketing budgets towards positive-externality advertising and away from
negative-externality advertising? Imagine that a restaurant chain is
opening a new location and wants to reach people there. They have two
choices: they can buy ads that pay for local news and cultural content
written for people in that area, or they can buy ads that pay to make
those people more depressed, manipulate their elections, and try to
taunt them into massacring each other. The social rewards for choosing
the first should tend to go higher.
White power assholes are not exactly the smartest people on the
Internet. State-sponsored manipulation operations have better skills and
can use the domestic guys as human sock puppets.
Pedophiles aren’t the smartest people on the Internet either. Even
the “highly technical dark web” pedo networks are using off-the-shelf
tricks that are far behind what even the most basic adfraud operation
can pull off.
So I’m writing to explain why I’m going to move your long-form think
piece about the “power” of the Internet “duopoly” to the
probably-never-going-to-get-to-it end of my to-do list. Let’s have a
look at just the last 24 hours:
(Come back in 24 hours for more.) You’re asking me to be interested
in reading your ever-so-thoughtful essay about the awesome power of two
companies that have to be like the fifth most influential people on the
Internet, max. Even the Facebook ad integrity guy is down to asking for
free reports of scam ads, on Twitter. (Dude, if the
“get everybody else to do your QA for you” strategy worked, then we’d
all be running desktop Linux.)
The “powerful platforms” are a box on the Internet cable between
terrible marketing decisions on one end and criminals and terrorists on
the other. A box maintained by vaguely creepy but not especially
interesting IT staff. Yes, let’s write about CMOs who attach their
brands to heinous shit—what’s up with that? Yes, let’s write about the
criminals who end up with the money—that can’t be good. But the
companies in the middle are not the story.
Please let me know if the following makes any sense, and if so I’ll
turn it into a talk.
It’s not a simple game of people vs. companies. In
software, you don’t just have evil “software hoarders”
vs. cooperation-minded “users”. There are way more players: OS vendors,
hardware vendors, proprietary ISVs, developers of internal applications,
and IT organizations. At least. I’m sure I forgot some. But the point is
that they don’t all have the same interests. Pretty much everyone who
does software wants everybody else’s software to be open
source. So if you look at everybody’s Core vs. Context, people will
generally play nice in open source projects doing whatever their Context
(or complement,
if you want to look at it that way) is.
In user data, you’ve got the Five Armies: content creators and their
publishers, companies trying to sell stuff
(advertisers/sponsors/signalers), intermediaries (adtech/platforms),
client-side developers (browsers/privacy tools), and fraud hackers. A
high-reputation brand with a solid mailing list has completely different
user data handling interests from a social platform—just like a network
chipset manufacturer will have different open source interests from a
proprietary OS vendor.
True believers aren’t enough to build on. Some
people are really fired up about Internet ethical and policy concerns,
but most people would somewhat prefer the right thing, and telling them
that you do the right thing makes them feel better about choosing you
and somewhat more likely to do it. But doing the evil thing is not a
deal-breaker.
Loud complaints don’t matter (much). Yes, the first
open source release will include a license mismatch, or somebody’s ssh
private key, or it won’t build without a tool you didn’t include, or
something. And somebody will complain. But the true believers are useful
for QA to guide incremental improvement, not as gatekeepers to decide if
you’re in or out. (And if you fix something that someone is complaining
about in a particularly annoying way, do it quietly. Eventually
they’ll make their complaint to a reporter who will check it out, find
the fixed version, and start ignoring them.)
Hardly any company will get to 100%. Robert Penn
Warren said it best.
Man is conceived in sin and born in corruption and he passeth from
the stink of the didie to the stench of the shroud. There is always
something.
Even companies that focus on open source have awkward
corners where they can’t Do The Right Thing, because reasons. And
most of the code contributed to open source projects is done on the
clock at companies that are also in the proprietary software
business.
Just like IBM didn’t need to have a plan to open source AIX in order
to make a difference in Linux, companies don’t need to have a plan
to get clean of all surveillance marketing activity to make
significant moves in the direction of user privacy. An insurance company
might decide to remove third-party pixels from the pages linked to from
existing customers’ bills, to keep from leaking customer data—but keep
social tracking pixels for some other pages for tracking conversions on
a social campaign. Anyway, open
source program offices are a thing. What about customer data protection
offices?
I am a well known hit man on deep web. Someone paid me 1000 USD to
beat you and broke your right arm. (Why? I don’t know) I will take 1000
USD more after my client sees your broken arm!
If you send 2000 USD to me, I will cancel the job, and I will give
you the name of my client. Else, I will finish my job asap!
Send the above amount on my BTC wallet (Bitcoin):
3JDLJWW5K6AsP1VBUD1Dgsxk9ydtcdMFvz As soon as the payment is completed I
will receive a notification and a new email with the client’s details
will follow.
You have 24 hours from now on!
Hold on a minute.
He has a reputation, but he’s going to ruin it, and burn an existing
customer, in order to earn 1.5x what he was originally going to earn
from the deal?
Where did this clown learn his game theory?
Nobody would take an established brand, fail to deliver the product
or service that the brand was originally known for, and leak their good
customer’s private info, just to go chase incremental revenue driven by
unproven new technology, right?
Two kinds of web clients that it’s a bad idea to serve a third-party
resource to:
Users who have not given consent. We know we can’t use their
data. But third parties can peek at those users because their tracking
script or pixel is on the page. If the first party can’t have that data
why should the third parties get it?
Adfraud bots. Bots come to visit legit sites to build up
realistic-looking cookies so they can cash out elsewhere. Bad idea to
help them.
Consent management requires some interaction with the user, which is
also an opportunity to collect data for assigning a botness score.
Bots will also try to appear to be visitors who have already given
consent, and go get the third-party resources anyway. This is an
interesting problem because it’s a game where the bot and the third
party are on the same side, and the site is on the other. Impossible for
the CMP to block the bot connection to the third party, but is it
possible to show that consent was not in place when that connection
happened? Understanding the provenance of the consent string is going to
be important. An extra cookie containing a digital signature for the
consent string?
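Here’s one way that could look: a minimal sketch, assuming Node.js on
the server and an HMAC rather than a full public-key signature. The
record format and all names are placeholders, not any real CMP’s scheme.

    // Sketch: sign the consent string plus collection metadata, so the
    // site can later show what consent state was in place at a given
    // time. signingKey is a server-side secret.
    const crypto = require("crypto");

    function signedConsentRecord(consentString, visitorId, signingKey) {
      const record = JSON.stringify({
        consentString,            // valid even when it grants nothing
        visitorId,
        collectedAt: Date.now(),  // when this consent state was recorded
      });
      const sig = crypto
        .createHmac("sha256", signingKey)
        .update(record)
        .digest("hex");
      // Store in an extra cookie next to the consent string itself;
      // verify the signature later if the connection is disputed.
      return Buffer.from(record).toString("base64") + "." + sig;
    }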
New CMPs will have an opportunity to build on knowledge gained from
regulator reactions to first-generation CMPs. But it’s more interesting
to think about sustainable
advantage for the site than just about regulatory future-proofing.
For example, a good consent management platform will also tie in to an
objection management platform/opt-out management platform. (Objection
management platform and opt-out management platform both work out to
OMP—anybody using that TLA?)
People ask about whether consent records obtained by conventional
CMPs are even good. (Risks
in IAB Europe’s proposed consent mechanism | PageFair) The
first-generation click OK to make this dialog go away and consent to
everything UX is unlikely to last, but what’s next?
Design the CMP to work in the interest of the CMP customer, not
third parties.
Understand the (painful, because anything touching the CMS is
painful) changes involved in taking 3rd parties out of the page template
entirely when the page is going to a no-consent user. No
peeking!
Future-proof consent workflow to allow for adjusting for
regulatory changes (boring) and revenue or data opportunities
(fun)
Integrations: objection/opt-out management, single sign-on,
paywalls, in-browser/in-extension consent management.
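The “no peeking” item is the most mechanical of these, so here’s a
rough sketch of what it looks like server-side. This is Express-style
JavaScript; the cookie format, template name, and consent purposes are
my own assumptions, not a reference implementation.

    // Sketch: take third parties out of the page template entirely when
    // the page is going to a no-consent user, instead of loading their
    // tags and asking them not to look.
    const express = require("express");
    const cookieParser = require("cookie-parser");

    const app = express();
    app.use(cookieParser());
    app.set("view engine", "ejs");

    // Hypothetical: consent cookie holds a JSON list of agreed purposes.
    function hasConsentFor(req, purpose) {
      try {
        return JSON.parse(req.cookies.consent || "[]").includes(purpose);
      } catch (e) {
        return false; // unparseable consent data counts as no consent
      }
    }

    app.get("/article/:id", (req, res) => {
      res.render("article", {
        articleId: req.params.id,
        // The template emits third-party script tags only when this is
        // true, so a no-consent user's browser never even makes a
        // request to the third party. No peeking.
        includeThirdPartyTags: hasConsentFor(req, "advertising"),
      });
    });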
And of course, get out in front of coming browser privacy
improvements. Need an open-source strategy including participation in
browser and extension projects.
Attention humans. We are in a life and death struggle with our
enemies, the pathogenic bacteria. Our scientists have developed secret
weapons, the antibiotics. It is vital to use these weapons only when
they will make a difference, in overwhelming force, and to leave no
survivors. The enemy must be prevented from developing countermeasures.
Do you understand?
Can we just betray our most effective weapons to the enemy if
it’s in exchange for CHEAP MEAT?
Open source program offices are a thing. What about customer data
protection offices?
A little background: when the open source business as we know it was
getting started, most of the original concerns about free
software in business were about license compliance. Many people
assumed that all software companies would pursue maximum restrictions
using copyrights and patents, and users who wanted to use, modify, and
redistribute software would be their adversaries.
Then, Tim O’Reilly and others started changing the conversation to
talk about open source strategies. How can a small company
release high-impact software by building on collaboratively developed
work? Now, as open source has caught on all over the software business,
it seems obvious that people think about
business models made possible by open practices
open source companies as market participants competing for users
while cooperating on common work
But it was a big mental shift at the time.
Today, a modern open source program office has to handle issues of
license compliance, including training developers to follow and apply
licenses, and checking the licenses of inbound software for
compatibility. But the big picture is about using open source for
sustainable advantage.
Maybe, today, we’re still thinking about privacy as a compliance
problem. Users and regulators on one side, companies on the other.
But what about a company that has a solid first-party relationship
with a customer? What if the person is known to open the email
newsletter, come in the store, answer the surveys—you’re not in an
adversarial relationship with that person over their data. The company
and the customer are on the same side. When privacy concerns and
adoption of privacy tools help get the person protected from targeting
by some fly-by-night competitor, that’s a win for both.
If you’re running a bank, you don’t want some cryptocurrency scam
picking off your high-value customers. Those people’s lifetime value is
going to go way down when they’re selling off all their stuff because
the bank bought a “custom audience” social campaign targeting
them, and the data leaked. If the bank had a customer data office
thinking a step ahead, instead of just checking compliance boxes, it
would have considered the data leakage risk along with the social
campaign’s possible upside.
Or a healthcare brand might run what looks like a harmless campaign,
but some clever data management platform can infer medical data from it,
and a “miracle cure” racket uses the data to pick off the customers.
Before you know it the customers stop filling their prescriptions and
start loading up on colloidal silver or something. A customer data
office would have had the data science skills to see the risk, and
offset it, possibly by offering the customers a free service to help
them opt out of high-risk data processing.
Even for just a regular product, when a VC-funded “direct to
consumer” competitor comes in, with no customer list—how do they grow so
fast? Buying targeting data on the open market, because the existing
brands haven’t learned to protect their interests. Where does a brand’s
interest in customer data coincide with the customers’ own interest in
privacy? Instead of purely focusing on compliance, a customer data
office will understand the risks and opportunities.
Anyway, software freedom went from a contentious idea to the
source of much value in a remarkably short time. What if something
similar happens with privacy?
Male impotence, substance abuse, right-wing politics, left-wing
politics, sexually transmitted diseases, cancer, mental health…. Intimate
and highly sensitive inferences such as these are then systematically
broadcast and shared with what can be thousands of third party
companies, via the real-time ad auction broadcast process which powers
the modern programmatic online advertising system. So essentially you’re
looking at the rear-end reality of how creepy ads work.
Simply put: users need more protection from tracking…. In support of
this effort, today we are releasing an anti-tracking policy that
outlines the tracking practices that Firefox will block by default. At a
high level, this new policy will curtail tracking techniques that are
used to build profiles of users’ browsing activity. In the policy, we
outline the types of tracking practices that users cannot meaningfully
control.
Many good points, but there’s one small fix that could make it more
useful. From the original:
It is now the norm—even in the presence of laws clearly forbidding
it—for nearly every commercial website we visit to plant tracking
beacons in our devices, so our lives can be examined and exploited by
companies and governments that extract personal data and manipulate our
lives for their purposes. This offends our privacy and diminishes our
agency.
Here’s a suggested new version edited to be clearer about how
browsers work.
It is now the norm—even in the presence of laws clearly forbidding
it—for nearly every commercial website we visit to include tracking
beacons in their pages, and for our browsers to load
and run those beacons, so our lives can be examined and exploited by
companies and governments that extract personal data and manipulate our
lives for their purposes. When our own browsers work against our
interests, this offends our privacy and diminishes our agency.
Please don’t assign all the work to the site.
It’s counterproductive
to ask the site to be the one to bear all the costs of privacy
reform. The site is the player with the least economic power and the
least freedom to change. Web publishers and brands failed to protect
their audience and customer data and are now, unfortunately, kind of
stuck. Because third parties control the audience information that’s
needed in order to make ads saleable, no one web site can unilaterally
switch off the data flow that makes their business model work.
On the browser side, though, it’s different. Browser developers know
that they can get more user satisfaction, and get users to spend more
time in the browser if those users have functionality that makes them
feel safer.
This stuff needs to get fixed and browsers have the motivation and
skills to do it. Let’s focus on productive next steps by the parties
that can afford to change. The result will be a new web advertising
business that works
better for sites and brands, too.
Sites can’t take the big step to remove tracking scripts entirely,
but there are a few things that sites can do to assist with ad
reform.
Fix any “turn off your ad blocker” scripts to detect ad blockers
only, and not falsely alert on privacy tools.
Fix up the privacy page to add links to appropriate privacy tools
based on the user’s browser. (Aloodo
has JavaScript for this.)
If you maintain a privacy tool, offer to do a campaign with the
site. Privacy tool users are high-quality human traffic. Free or
discounted privacy tools might work as a subscription promotion. Where’s
the win-win?
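On the first item in that list, here’s a rough sketch of how a site
might tell ad blocking apart from tracking protection before showing
any message. The bait class names and the probe URL are assumptions;
Aloodo offers maintained scripts along these lines for the
tracking-protection half.

    // Sketch: distinguish "user blocks ads" from "user blocks tracking".
    function detectBlocking(callback) {
      const result = { adsBlocked: false, trackingBlocked: false };

      // 1. Ad blocker test: cosmetic filters hide ad-ish elements.
      const bait = document.createElement("div");
      bait.className = "ad banner-ad ad-banner"; // classes common on filter lists
      bait.style.height = "10px";
      document.body.appendChild(bait);

      // 2. Tracking protection test: try to load a script from a domain
      //    that appears on tracking-protection lists (hypothetical URL).
      const probe = document.createElement("script");
      probe.src = "https://tracker-test.example/probe.js";
      probe.onerror = () => { result.trackingBlocked = true; };
      document.head.appendChild(probe);

      setTimeout(() => {
        result.adsBlocked = bait.offsetHeight === 0;
        bait.remove();
        // Only nag about ad blocking when ads, not just tracking,
        // are actually blocked.
        callback(result);
      }, 1000);
    }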
Asking a site to walk away from money with no credible alternative is
probably not going to work. Asking a site to consider next steps to get
out of the current web advertising mess? That might.
The complaints criticised Google for requiring users to “agree” to
its privacy policy in order to use its services. While asking users to
“agree” to a privacy policy is still common practice for many companies,
privacy notices are too long and too complex to be something that users
can realistically understand and “agree” to. Under GDPR consent needs to
be freely given and specific, and must not be bundled - the user must be
able to freely consent to specific activities on a case-by-case basis,
e.g. consent to receive e-mails, or consent to use of their photograph
within a promotional brochure etc. Privacy notices are still needed for
transparency of course - but they should serve as just that:
informational notices, not catch-all consent-gathering documents.
And from Thomas Baekdal on Twitter:
Interesting. But also, as I have said many times before. Imagine what
would happen if the French regulators took the same look at publishers.
My ongoing advice to publishers is to look at these cases as early
indicators. We will be next in line to be looked at.
https://t.co/7fTRottHAb
European regulators are paying attention to consent management UX, and
the current approach, which is basically just click OK to make this
annoying dialog go away (and consent to use of your data by 70 companies
you’ve never heard of), is looking less and less likely to work.
Fortunately for reputable publishers, the regulatory pressure to
clean up consent UX is likely to be a good thing for trusted sites. So
this is a great time to release the Global
Consent Manager User Study. Global Consent Manager is a new approach
to consent UX, made possible by IAB Europe’s Transparency and Consent
Framework.
The Framework standardises the presentation to users of third-party
data processing requests that require “informed” consent for data
processing. The Framework enables “signaling” of user choice across the
advertising supply chain. It is open-source, not-for-profit with
consensus-based industry governance led by IAB Europe with significant
support from industry parties and the IAB Tech Lab, which provides
technical management of the open-source specifications and version
control.
I’m a big supporter of the Transparency and Consent Framework, if you
use it right. Consent UX is full of €50 million mistakes—but
the consent data approach of the Transparency and Consent
Framework can still be good if you put a decent UX on it. That’s what
Global Consent Manager aims to do.
Global Consent Manager applies the same incremental approach that
social and collaboration sites, such as LinkedIn and GitHub, use.
LinkedIn doesn’t ask you to build a complete profile and work history
before you can use the site. Instead, you get to make an account and
then get prompted to add more of your info as you use it. Global Consent
Manager borrowed that idea, in a basic form. Instead of asking for
consent for everybody to use your data everywhere before you even read
the article, with Global Consent Manager you start off in a no
consent state. A consent string with no consent is a valid consent
string, and Global Consent Manager will auto-generate one for you on
your first visit to a supported site.
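As a sketch of the idea only: the thresholds, storage, and helper names
below are my own assumptions, not Global Consent Manager’s actual
internals.

    // Start every visitor in a valid "no consent" state, and only ask
    // for more after they show interest in the site.
    const VISITS_BEFORE_PROMPT = 3; // arbitrary threshold for this sketch

    // Placeholder: a real TCF consent string encodes purpose and vendor
    // bits; one that grants nothing is still a valid, well-formed string.
    function makeNoConsentString() {
      return "no-consent-placeholder";
    }

    function recordVisit(site) {
      const key = "consent:" + site;
      const state = JSON.parse(localStorage.getItem(key) || "null") || {
        consentString: makeNoConsentString(),
        granted: false,
        visits: 0,
      };
      state.visits += 1;
      localStorage.setItem(key, JSON.stringify(state));
      return state;
    }

    function maybePromptForConsent(site, showConsentPrompt) {
      const state = recordVisit(site);
      // Like LinkedIn's profile prompts: no big up-front dialog; ask
      // only after the visitor has demonstrated interest in the site.
      if (!state.granted && state.visits >= VISITS_BEFORE_PROMPT) {
        showConsentPrompt(site); // caller-supplied UI
      }
    }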
Later, if you show that you’re interested in the site, the site can
ask for more consent. This approach gives a sustainable advantage to
sites that users choose to trust, and limits the ability of sites whose
traffic comes from deceptively obtained clicks to run saleable ads.
Results from the user research tend to indicate that users spend
significantly more time on a news task when they get the Global Consent
Manager experience, compared to the click OK to consent to
everything default.
The standardization work for consent data, now being done at the
Transparency and Consent Framework, really pays off if you put a
sensible (more LinkedIn-like) UX on it.
Our next step is to extend server-side consent and data management,
with a view to facilitating the needed data collection for publishers
trusted by users to run high-value ads, without enabling data practices
that fail to comply with regulations or with user norms. Please let me
know if you’re interested in participating or reviewing future data.
Leveraging data to make more informed ad targeting decisions is a
breakthrough versus previous methods where ads were un-targeted.
Personalized ads are a win for all parties. [emphasis added] It
is better for:
Users (connects them to more interesting and relevant ads)
Advertisers (results in higher return on investment)
Publishers (delivers higher CPMs and increased revenue)
Hold on a minute. That doesn’t look right.
The win for all parties is clearly bogus from the user and
publisher point of view. Behavioral targeting means that users see more
low-quality and deceptive ads, because behavioral
targeting gives a long-running structural advantage to deceptive
advertisers. As a potential customer, I also win as ad budgets go
into supporting context that I care about, such as news and
cultural works, and lose as ad budgets go into behavioral targeting with
negative
externalities.
From the publisher point of view, behavioral targeting creates
near-infinite saleable ad inventory on low-value and fraudulent sites,
and forces publishers to contend with those sites for ad money. For
users and publishers, the less behavioral targeting the better. But what
about for marketers? Isn’t behavioral targeting a win for them?
Personalised advertising may be one thing but getting people to
respond to even micro-targeted ads is a whole other ball game. However,
analysis of 3.1 million ad exposures shows that such adverts generate
low click through rates (CTR). Furthermore, some of the responses to
such ads are counterintuitive – with a higher CTR coming from ads
mismatched to personalities and lower vs. the overall industry average
for Facebook ads.
…
But if micro-targeting is arguably so ineffective, why do so many
marketers use it? Sharp and Danenberg highlight several reasons:
Marketers often do things based on theory/logic rather than evidence.
The worst myths, the longest lasting, are those that sound plausible.
Micro-targeted campaigns can boast of high ROI, largely because they are
so small, reaching people who had a high likelihood of buying anyway.
Marketers see the high campaign ROI from micro-targeting but fail to
realise that the overall return to the company may well be lower.
It’s fashionable!!!
Read the whole thing. Not surprising if you follow the underlying
#behavioralEconomics. And Dave Trott agrees with the
fashionable point: It’s
right because everyone’s doing it.
There might be another reason, though. Maybe part of the problem is
that marketing science is hopelessly mixed up with investor relations.
After all, adtech firms and ad agency groups are publicly traded
companies. And creative ad work, hiring writers and artists and marking
up what you pay them, is the kind of business model that stock markets
get bored with. Margins are low, you can only grow as fast as you can
hire, and your assets can quit and go work somewhere else.
Choosing to place an ad in
a quality context is more cost-effective, but again, it doesn’t
scale. If your business is putting good ads in good places then the
people who do good work have market power. But psychographic models and
the underlying data sets are more investor-friendly. Even if it takes
torturing the data and putting up with fraud
to make adtech look effective. Mathemagickal woo-woo is scalable, more
like the intangible assets of a software company. Markets see promise of
big margins and high revenue/employee.
How much do investor-focused messages about the effectiveness of
behavioral targeting companies interfere with marketer-focused
messages about the effectiveness of behavioral targeting in
campaigns?
A lot of people have come up with the idea of a system that lets
readers of a web site pay to avoid the advertising. This is obviously
bad, wrong and dangerous, for several reasons.
The model assumes that advertising is irredeemably awful, and
walks away from future revenue that would be made possible from fixing
advertising. (So far, Online
Ads Haven’t Built Brands, but what if they could?)
The model creates incentives to make advertising worse. Ever
since we started running the auto-playing video campaign for MIRACLE ASS
FUNGUS CURE, our subscriptions are through the roof! Bonuses for
all!
(a) Because the ads on news sites will keep getting worse and
worse, non-subscribers will get more and more of their news from biased
sources that re-report and spin it. (The most common sound effect on
Rush Limbaugh’s radio show, last I heard it, was him flipping the pages
of the New York Times as he selectively quoted from news
stories.)
(b) Or, because the ads keep getting shittier and shittier, because
that’s the best way to incentivize people to pay to get out of them, ad
blocking keeps going up.
As soon as site owners realize that the ad blocking in number 3 is growing, and
won’t go away, they’ll start lobbying for extensive copyright expansion
laws that limit fair use, or create new exclusive rights, or apply DRM
to web pages to limit ad blocking, and, as a side effect, restrict other
software that gives users control over their web experience. Probably
all three. Freedom-hostile companies will repurpose these laws for
censorship and break the Internet.
I know that “this stupid idea will break the Internet” posts are
everywhere, but I just wrote one more. Keeping the ads just high enough
in signal, and low enough in resource suckage and privacy/security risk
that they mostly aren’t worth blocking, is just one of the many things
that has to come out somewhere close to right in order to prevent a
bunch of bad stuff.
One of the great things about Firefox is the ability to customize
with extensions. (A MIG-15 can climb and turn faster than an F-86. A
MIG-15 is more heavily armed. But in actual dogfights the F-86 won 9
out of 10 times. Part of that is training, but part is that the Soviets
used data to build for the average pilot, while the USA did a bigger
study of pilots’ measurements and recognized that adjustable seats and
controls were necessary. Even in a group of pilots of average overall
size, nobody was in the average range on all their measurements.) Here
is what I’m running right now.
Awesome
RSS. Get the RSS button back. Works great with RSS Preview.
blind-reviews.
This is an experiment to help break your own habits of bias when
reviewing code contributions. It hides the contributor name and email
when you first see the code, and you can reveal it later.
Cookie
AutoDelete. Similar to the old “Self-Destructing Cookies”. Cleans up
cookies after leaving a site. Useful but requires me to whitelist the
sites where I want to stay logged in. More time-consuming than other
privacy tools. This is a good safety measure that helps protect me while
I’m trying out the new
privacy settings in Firefox Nightly as my main data protection
tool.
Copy
as Markdown. Not quite as full-featured as the old “Copy as HTML
Link” but still a time-saver for blogging. Copy both the page title and
URL, formatted as Markdown, for pasting into a blog.
Global
Consent Manager, which provides an improved consent experience for
European sites. More info coming soon.
HTTPS
Everywhere. This is pretty basic. Use the encrypted version of a
site where available.
Link
Cleaner. Get rid of crappy tracking parameters in URLs, and speed up
some navigation by skipping data collection redirects.
NJS.
Minimal JavaScript disable/enable button that remembers the setting by
site and defaults to “on”. Most sites that use JavaScript for real
applications are fine, but this is for handling sites that cut and
pasted a “Promote your newsletter to people who haven’t even read your
blog yet” script from some “growth hacking” article.
Personal
Blocklist is surprisingly handy for removing domains that are heavy
on SEO but weak on actual information from search results. (the Ministry
of Central Planning at Google is building the perfectly-measured MIG
cockpit, while extension developers make stuff adjustable.)
RSS
Preview. The other missing piece of the RSS experience. The upside
to the unpopularity of RSS is that so many sites just leave the
full-text RSS feeds, that came with their CMS, turned on.
If adtech
consent is so hard, will Online Behavioral Advertising (OBA) on the
web even be a thing once we fix the long-standing browser bugs that
allow users to be tracked without their knowledge?
After all, OBA advocates have been trying to sell
people on the benefits of being tracked for as long as ads have been
obviously “following” people from one site to another and raising
concerns. So why won’t regular people learn that this is all for
their own good? By now we should have a comfortable pro-OBA user base,
right? Instead, there’s still a stubborn majority against
having your activity on one site follow you to another one, and when
PERFECTLY REASONABLE WE ARE DOING THIS FOR YOUR OWN GOOD WOULD YOU SHUT
UP ABOUT PRIVACY AND JUST CONNECT AND SHARE WITH BRANDS YOU LOVE ALREADY
ad tracking practices do make the news, it’s as part of a Holy
shit, you have WHAT information on my kids, you sneaky nerds?
story.
Where is ad reform going? What’s probably going to happen is that
browsers are going to support the OBA that users give their informed
consent to, but there’s just going to be less ad inventory available to
buy that way, because only about 1/3 of browser users approve of
cross-site tracking.
Browsers have the opportunity to improve consent UX to fill in the
gap between their users’ widely held norms about how personal
information is used and the uses that people are willing to “I Agree” to
or be “OK” with in order to make a pop-up go away. Today, if you’re
building a user interface, you have the choice of:
giving the user an accurate, meaningful choice of data practices,
and watching most of them decline, or
tricking the user into giving “consent” to practices that the
user would not agree with if understood.
So why does everybody go with number 2? Especially web publishers.
Problematic data practices are not just a violation of user norms, but
a threat to ad-supported sites. If you’re running a site that depends
on reputation, it’s to
your disadvantage to allow your audience to be tracked
elsewhere.
That last point is not just me nerding out over obscure economic
points. Everyone who is successful in web advertising knows that you
have to defend
user data once you have it. Facebook famously closed down app access
to social data. Amazon stopped sending email receipts, to keep email
services from targeting people with ads based on their Amazon shopping
habits. Google’s Ads Data Hub restricts how advertisers can combine
Google and non-Google data.
Why are web publishers—the set of players who are most hurting
here—the exception to the defense rule? It could be because of the
technical landscape. Incumbent tech companies have built
publisher-hostile web clients in order to advantage
some kinds of ad placements. As a side effect we also have a
brand-hostile environment and a brand crisis. (Targeted advertising
media are designed for direct response and deceptive ads, and don’t work
for the 60% of ad spending that needs to go to brand building.)
The big opportunity now is for reputation-based players—publishers
and brands—to use the defensive opportunities now afforded by browser
privacy improvements and by privacy regulations.
Part of this could be an “objection amplifier” to balance out the
“consent amplifier” effect of bad consent UX. If I go to a publisher
site, or brand site, and they give me a meaningful choice on how my data
is used, the publisher is putting themselves at a disadvantage if they
respect my decision while others get deceptive consent.
So how to handle the built-in disadvantage for honest consent
requests?
If you capture a solid “I do not consent” from a person, then don’t
waste it. Ask the person for a digital signature on an objection to
every DMP on the Lumascape, and send it to the DMPs. And log it, and
when you’re selling ads, make sure to include the point that “We have
people on this site that the DMPs don’t have, and aren’t allowed to
have. Want to reach our audience? Talk to us.”
Running a third-party processor to enable this is one of the biggest
opportunities for the post-creepy Lumascape. Needs a TLA, though. OAP?
Objection Amplification Platform?
Amazon stopped sending email receipts, to keep email services
from targeting people with ads based on their Amazon shopping
habits.
Google’s Ads Data Hub restricts how advertisers can combine
Google and non-Google data.
Facebook announced it would eliminate all third-party data
brokers.
What do these companies have in common? They’re marketing’s winners.
Meanwhile, publishers festoon their sites with consent management
platforms that capture consent for all surveillance marketing,
everywhere. They’ll even get consent for tracking by third parties that
the publisher doesn’t even use. Why play to lose? If you run a trusted
site in a position to get consent and prove you got it, you want fewer
other companies getting that user’s data, not more.
So the obvious counterpart to the consent amplification carried out
by CMPs is some kind of objection amplification.
If the user clicks something other than “OK” on the GDPR consent
dialog, don’t just set their consent to zero. That non-consenting user
needs to have their voice heard, not just filed away. Ask: Do
you want to deny tracking just by our site, or by all these third
parties? Then show them a list of Lumascape firms, most of
which look like they were named not by branding experts, but by some guy
in Florida who mainly communicates by “finger guns”. When the user says,
hell yeah, I don’t want to be tracked by all those companies either,
then that’s when the objection amplification starts. Generate an Article
21 objection for every company you can think of, get the user to sign
off on them, and send them out. (This is why it has to be a platform.
Could be quite a bit of verbiage here.)
Now the record of objections sent is a piece of data for ad sales.
“Buy ads here because x% of our users can’t even legally be targeted by
those other companies.”
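A sketch of the mechanics: the vendor list, the notice wording, and all
names below are placeholders. A real platform would use the actual
Lumascape contact points and get the wording past legal review.

    // Sketch: "objection amplification." Given one signed objection from
    // a user, generate an Article 21 notice per third-party company.
    const VENDORS = [
      { name: "Example DMP One", privacyEmail: "privacy@dmp-one.example" },
      { name: "Example DMP Two", privacyEmail: "privacy@dmp-two.example" },
    ];

    function generateObjections(user) {
      return VENDORS.map((vendor) => ({
        to: vendor.privacyEmail,
        subject: `Article 21 GDPR objection re: ${user.name}`,
        body:
          `I object, under Article 21 GDPR, to any processing of my ` +
          `personal data by ${vendor.name} for direct marketing, ` +
          `including profiling.\n\nSigned: ${user.name}, ${user.signedAt}`,
      }));
    }

    // The log of objections sent is itself ad sales material:
    // "x% of our users can't legally be targeted by those companies."
    const notices = generateObjections({
      name: "A. Reader",
      signedAt: new Date().toISOString(),
    });
    console.log(`${notices.length} objections ready to send`);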
Internet platform companies play defense all the time. Will
publishers?
Many researchers who study human behavior on the Internet will point
out that calls to “just delete Facebook” are unrealistic for many users.
A lot of people depend on the company for family connections,
health-related support groups, or even employment.
Which, of course, should make deleting Facebook an easy win from an
economic signaling point of view. If you can credibly stay off Facebook,
you’re signaling that you have the skills, wealth, health, and social
capital not to need it. What could be better? Why aren’t more people
signaling their fitness through conspicuous lack of Facebook dependence?
Two reasons.
If you delete your account, it’s too easy for others to make fake
accounts imitating you, so it looks like you’re on there
anyway.
The decision to #deleteFacebook is easily reversible. You could
easily come sneaking back.
So the thing to do if you want to get signaling power out of quitting
Facebook is to not just delete your account, but do two things.
keep your account live so that it keeps your name “squatted on”
in Facebook-space.
take a credible action to lock yourself out (that is in
compliance with the Facebook ToS, of course).
How about this? Get a “burner” SIM, make that the one phone number
for your account, then let me hang on to the SIM for you. I’ll
periodically post a list of everyone whose Facebook account is
associated with a SIM I hold, but I won’t be able to log in. I’ll charge
a monthly storage fee to keep the SIM for you, but it only comes due
when you reclaim it.
Why do advertisers keep sponsoring illegal activity on big Internet
platforms such as YouTube and Facebook? Platforms are running so many
copyright-infringing copies, cryptocurrency scams, state-sponsored
campaign finance violations, and even functioning as the IT department
for genocide (d00d wtf?) that it’s hard to understand why so many good
brands are still there.
A big part of the problem is that even though platforms do invest
a lot of time and money in removing illegal activity, the
advertisers never know. If you’re a CMO making decisions about where to
spend your ad budget, your experience of a highly customized social
platform is completely different from what most of your brand’s
customers see. As a CMO, you see content from people in your social and
professional circles, and ads from high-bidding advertisers who want to
sell high-margin items such as conferences and SaaS subscriptions. You
don’t see as much of the bad stuff. Advertisers pay the bills for
illegal activity because they lack the information they would need to
stop doing so.
It’s time for Internet platforms to stop hiding this information.
One common growth hacking pattern on social and collaboration sites
is to build user profiles incrementally. Capture just enough info to get
the user logged back in, then get them started using the site. As they
get into it, prompt them to fill in more and more profile information.
You have probably seen this on new sites where you have to make an
account. FIXME: list of good examples
here.
People don’t want to give up a bunch of information up front before
they see how good the site is. And, I suppose, if the site is good
enough that the person thinks they’ll spend more time on it, they’re
more likely to provide correct information than all the residents of
“asdf” born on January 1, 1970.
But news sites don’t take this approach. Instead of trading a little
value for a little information, repeatedly, you get one big dialog
asking you to give up all your information before you even read the
first story.
Does the same incremental approach that applies to data collection
for social and collaboration sites also apply to news sites? Preliminary
results from Global
Consent Manager tend to indicate that yes, it does.
Bargains?
So here’s the bargain. Right now, the web ad business is set up to
bid on ad impressions that come with third-party data, way more than for
impressions without third-party data. So a trackable bot impression on a
fraud site can produce more ad revenue for the fraud operator than an
impression from a privacy-sensitive user running Firefox Nightly or
Apple Safari produces for a legit site.
Yes, even though the privacy-sensitive user is more likely to be
human and interested in buying something related to the topic of the
site.
The opportunity to get a bargain is: instead of relying on
conventional programmatic ad buying, if you do a little extra work to
understand the audience of specific sites, you can reach more of the
humans you’re interested in.
Not every Firefox user who shows up on the Road and
Track site is going to buy a car this year, but $1 worth of ad
impressions there is likely to reach more human car buyers than $1 spent
programmatically—because you get a higher fraction of humans for a lower
price. and third-party data on who’s a likely car
buyer is bogus anyway, but that’s another story.
This opportunity is likely to go away as more agencies figure it out,
but right now it’s a great chance to get humans cheaper than bots.
If you block the tracking cookies that advertisers use to decide
which ads to target you with, you’ll start getting the low-budget,
low-quality ads that show up in the absence of the targeting data that
marks you as a desirable customer.
Before I turned on Enhanced
Tracking Protection I was getting ads for stuff like cloud computing
services and luxury SUVs. Now, with Enhanced Tracking Protection, am I
going to get more ads for FREE nutritional supplements? You know,
the offers where you put in your credit card info for shipping
and then they keep billing you even after you try to cancel? Or maybe
I’ll get offered a great deal on a for-profit college program, or some
predatory finance! I can’t wait.
It might be an inconvenience for me to start getting the ads that
people get when they’re too broke, or just too
old, for high-bidding advertisers to care about reaching them. But
the real problem is that legit sites are running those ads in the first
place.
Project Wonderful’s model had a lot going for it.
signal-carrying model: all visitors to the same
site on the same day see the same ad(s).
brand safety: advertisers choose sites, and site
owners approve ads.
fraud resistance: ads sell by the day instead of
by impression or click.
incentive to discover and support new sites: the
first advertiser to express interest in a site gets to run their ad for
free until another advertiser places a bid.
But there were still some problems.
Project Wonderful was just as vulnerable to ad blocking as
regular adtech.
The audiences of sites using Project Wonderful were just
as vulnerable to tracking as everyone else.
The second one is especially important. Why spend the effort to pick,
and run ads on, multiple independent sites in order to get your ad in
front of the right people, when you could just sign up for some user
tracking scheme? The people who control marketing budgets need a
problem, a trend, and a story in order to shift money from one place to
another.
What would it take to borrow and build on the good parts of the
Project Wonderful model while taking steps to fix the problem of data
leakage?
Avoid privacy-focused ad blockers by accepting the EFF DNT
policy. Third parties that can pass EFF’s Privacy Badger also tend to
stay off other blocklists.
Offer unlimited CNAMEs, also to help beat list-based
blockers.
Don’t participate in paid whitelisting as a network, but
individual sites that choose to do so could, for their own specific
CNAME.
Good metrics on tracking protection adoption by the audience.
Show advertisers that these users are hard to reach another
way.
Include reverse tracking walls, tracking detection roadblocks,
and A/B test alternate “turn
off your ad blocker” messages to motivate users to get protected
from cross-site tracking.
Limited, user-permitted data collection with clean consent
management.
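On the first of those: EFF’s DNT policy mechanism works by a domain
posting the exact policy text at a well-known URL, which is roughly how
Privacy Badger decides to unblock a domain. A rough sketch of checking
a third-party domain, assuming Node 18+ (for global fetch) and a saved
local copy of the policy text:

    const fs = require("fs");

    // A local copy of EFF's published DNT policy text (fetch it from
    // eff.org or the EFForg/dnt-policy repository and save it here).
    const CANONICAL = fs.readFileSync("dnt-policy.txt", "utf8");

    async function acceptsEffDntPolicy(domain) {
      try {
        const res = await fetch(`https://${domain}/.well-known/dnt-policy.txt`);
        if (!res.ok) return false;
        // Privacy Badger wants the posted text to match a known version.
        return (await res.text()) === CANONICAL;
      } catch (e) {
        return false; // no policy posted, or domain unreachable
      }
    }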
The hard part for an independent ad network is to offer small
advertisers something they can’t get from Google or Facebook. Access to
a protected audience?
The farce of consent as currently deployed is probably doing more
harm as it gives the misimpression of meaningful control that we are
guiltily ceding because we are too ignorant to do otherwise and are
impatient for, or need, the proffered service. There is a strong sense
that consent is still fundamental to respecting people’s privacy. In
some cases, yes, consent is essential. But what we have today is not
really consent.
So long as a data collector can overcome sampling bias with a
relatively small proportion of the consenting population, this minority
will determine the range of what can be inferred for the majority and it
will discourage firms from investing their resources in procedures that
help garner the willing consent of more than the bare minimum number of
people. In other words, once a critical threshold has been reached, data
collectors can rely on more easily observable information to situate all
individuals according to these patterns, rendering irrelevant whether or
not those individuals have consented to allowing access to the critical
information in question. Withholding consent will make no difference to
how they are treated!
Is consent management even possible? Is a large company that seeks
consent from an individual similar to a Freedom Monster?
And
what’s going on with Judge Judy and skin care products? (There are
thousands of skin care scams on Facebook and other places on the
internet that falsely state that their product is endorsed by
celebrities. These scams all advertise a free sample of their product if
you pay $4.95 for the shipping. Along the way, you have to agree to the
terms and conditions…. The terms and conditions are only viewable through
a link you have to click, which most of these people never do.)
Or Martin
Lewis and fake bitcoin ads? He launched a lawsuit in April 2018,
claiming scammers are using his trusted reputation to ensnare people
into bitcoin and Cloud Trader “get-rich-quick schemes” on
Facebook.
The problem is that ad media that have more data, and are better at
facilitating targeting, are also better for deceptive advertisers.
Somehow an ad-supported medium needs consent for just enough data to
make the ads saleable, no more. As soon as excess consent enters the
system, the incentive to produce ad-supported news and cultural works
goes down, and the returns to scamming go up.
Another one of those employee
happiness reports is out. This kind of thing always makes me wonder:
what are these numbers really measuring?
It seems like happiness ratings by employees would depend on:
expected cost of retaliation for low scores
expected benefit of management response to low scores
The expected cost of retaliation is the probability that an
employee’s ratings will be exposed to management, multiplied by the
negative impact that the employee will suffer in the event of
disclosure. An employee who believes that the survey’s security has
problems, that management will retaliate severely in the event of
disclosure, or both, is likely to assign high scores to management.
Some employers make changes in compensation or working conditions
when they fail to score well on happiness (or employee
engagement) surveys. If an employee believes that management is
likely to make changes, then the employee is likely to assign low scores
in areas where improvement would have the greatest impact on them.
An evil company where management makes an effort to
de-anonymize the happiness survey results, retaliates against employees
who give low scores, and will not make changes to improve scores, will
appear to have high employee happiness.
A good company where management does not retaliate, and will
make changes in response to low scores, will appear to have low employee
happiness.
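A toy model of those two scenarios, with numbers made up purely to show
the direction of the effect:

    // Reported "happiness" rises with the expected cost of retaliation
    // and falls with the expected benefit of honest low scores.
    const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

    function reportedHappiness(employee) {
      const expectedRetaliation =
        employee.pExposed * employee.retaliationSeverity;     // cost of scoring low
      const expectedImprovement =
        employee.pManagementResponds * employee.valueOfFixes; // benefit of scoring low
      // Scores drift up when retaliation risk outweighs the chance of fixes.
      return clamp(5 + expectedRetaliation - expectedImprovement, 1, 10);
    }

    // Evil company: surveys get de-anonymized, nothing ever changes.
    console.log(reportedHappiness({ pExposed: 0.9, retaliationSeverity: 5,
                                    pManagementResponds: 0.0, valueOfFixes: 4 })); // 9.5
    // Good company: no retaliation, low scores actually get things fixed.
    console.log(reportedHappiness({ pExposed: 0.0, retaliationSeverity: 5,
                                    pManagementResponds: 0.9, valueOfFixes: 4 })); // 1.4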
Of course, this all changes the more that people figure out that
getting low happiness scores means that you have responsive
management.
An “ad blocker detector” will catch at least three kinds of tools.
full-service ad blockers, which users install specifically to remove ads.
privacy tools, such as Disconnect (list-based protection) and
Privacy Badger
(behavior-based protection), that block some ads as a side effect. This
is a small category now compared to ad blocking in general, but is
likely to grow as browsers get better at privacy protection, and try new
performance
features to improve user experience.
deceptive blockers, which are either actual
malware or operate a paid
whitelisting scheme. The best-known paid whitelisting scheme is
Acceptable Ads from Adblock Plus, which is disclosed to any user
who is willing to scroll down and click on the gray-on-white text on the
Adblock Plus site, but not anywhere along the way of the default
extension install process.
So any ad blocker detector is going to be hitting at least three
different kinds of tools and possibly six different groups of users.
People who chose and installed a full-service blocker
People who chose to protect their privacy but did not
specifically choose to block ads
People who may have chosen their browser for its general privacy
policies, but got upgraded to a specific feature they’re not aware
of
People who chose to block ads but got a blocker with paid
whitelisting by mistake
People who chose to “install an ad blocker” because it got
recommended to them as the magic tool that fixes everything wrong with
the Internet
People who are deliberately participating in paid whitelisting.
(Do these people exist?)
Sometimes you need to match the message to the audience. Because
sites can use tools such as
Aloodo to get a better picture of what kind of protection, or
non-protection, is actually in play in a given session, we can try a
variety of approaches.
Is silent reinsertion appropriate when the ad is
delivered in a way that respects the user’s personal information, and
the user has only chosen a privacy tool but not an ad blocker?
When the user is participating in paid whitelisting, can a
trustworthy site do better with an appeal based on disclosing
the deception involved?
For which categories of users are the conventional,
reciprocity-based appeals appropriate?
Where is it appropriate to take no action in a user session, but
to report to a browser developer that a privacy feature
is breaking some legit data collection or advertising?
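One way to wire up that branching, as a sketch only: the signal names
are assumptions, and the signals themselves would come from detection
along the lines Aloodo supports.

    // Sketch: pick a response based on what's actually in play in the
    // session, instead of one "turn off your ad blocker" nag for all.
    function chooseMessage({ adsBlocked, trackingBlocked, paidWhitelistAdLoaded }) {
      if (!adsBlocked && trackingBlocked) {
        // Privacy tool or browser protection, not an ad blocker:
        // silently reinsert a respectful, non-tracking ad, or do nothing.
        return null;
      }
      if (adsBlocked && paidWhitelistAdLoaded) {
        // Paid-whitelisting extension: appeal by disclosing the scheme.
        return "The 'acceptable ads' you see here are paid placements...";
      }
      if (adsBlocked) {
        // Deliberate full-service blocker: conventional reciprocity appeal.
        return "Ads pay for this site. Here's how we keep them clean...";
      }
      return null; // nothing blocked; take no action
    }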
updated 18 Oct 2018: add dates and locations, reorder.
Good news. It looks like we’re having a consent management
mini-conference as part of Mozfest next month. (I’m one of the
organizers for the Global Consent Manager session, and plan to attend
the others.)
Audience members are engaged with an activity where they’re given
clauses from a curated list of clauses from real T&Cs, and they express
whether each one should have been mentioned outright or not. We have a
discussion about digital privacy and ways to curb exploitation. Visitors
try out our browser plug-in that filters out the most important clauses
from any T&C.
This workshop offers a holistic space to create digital tools and
environments in which consent underlies all aspects, from the way they
are developed, to how data is stored and accessed, to the way
interactions happen between users. Prototyping consent into our tools
will make them more fair and unbiased. Using a specifically designed
prototyping loop, teams quickly hypothesize, develop, test, and assess
consentful data prototypes.
This session aims to create a working group for improving the user
experience of cookie consent popups. In Europe, the use of cookies was
first regulated by the Privacy and Electronic Communications Directive
2002/58/EC, then revised by a 2009 amendment, and more recently by the
GDPR. Cookie popups and the mechanism for providing consent can be
tedious. Browsing the same website from different devices results in
consent being asked for again. Poor usability can lead users to give their
consent without the necessary attention. In this session we will discuss
the state of things and look at possible solutions. We will target a
multidisciplinary audience of internet users, usability experts, browser
developers, lawyers, and online advertisement professionals.
We will discuss how consent management on the web works today, and
the relationship between user privacy and reputable content providers.
Web users face a confusing array of data sharing choices, and click
fatigue can lead to poor user experience and possible inadvertent
selection of options that do not match the user’s privacy norms.
(update 24 Sep 2018: add link to a FastBlock description on
Bugzilla)
What’s the difference between Firefox
browser.fastblock.timeout and
browser.fastblock.limit?
browser.fastblock.timeout: When FastBlock starts working (in
milliseconds; default 5000 = 5 s)
browser.fastblock.limit: When FastBlock stops working (also in
ms; default 20000 = 20 s).
FastBlock only works if browser.contentblocking.enabled
and browser.fastblock.enabled are both true.
Starting browser.fastblock.timeout ms after the page
starts loading, FastBlock will stop loading new third-party resources
that are on the Tracking Protection list.
FastBlock does not affect third parties that are not on the
Tracking Protection list.
FastBlock does not cancel third-party requests that are already
loading.
Then, starting at browser.fastblock.limit ms after the
page starts loading, FastBlock stops having any effect.
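For anyone who wants to try it, here’s a user.js sketch with the default values (assuming these prefs still exist in the Nightly you’re running):

```js
// FastBlock prefs, with their default values. Both "enabled" prefs must be
// true for FastBlock to do anything.
user_pref("browser.contentblocking.enabled", true);
user_pref("browser.fastblock.enabled", true);
user_pref("browser.fastblock.timeout", 5000); // start blocking at 5 s
user_pref("browser.fastblock.limit", 20000);  // stop having any effect at 20 s
```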
The sole focus of the Fastblock feature is to restrict the loading of
trackers. It monitors trackers waiting for the first byte of data since
the start of navigation of the current tab’s top level document. If this
is not received within 5s, the request is canceled. If any bytes are
received, the 5s timer is stopped. In some of the experimental branches,
a few tracker requests are whitelisted, and do not have this monitoring.
These include resources known to cause breakage, such as essential
audio/video, and commenting platforms.
Just going by basic economics, ads placed with more information about
me are going to carry less signal and more deception than ads placed
only by what page they’re on. Now I’m wondering how well “slow loading
ads” correlate with “deceptive ads”. Are slow loading ads slow because
they depend on a bunch of complex RTB stuff? Can less creepy ads be
faster?
Turns out that fast-moving, hungry misinformation operations are
better at YouTube than YouTube is. This is not too much of a surprise.
Resting and vesting makes you stupid. It’s like a resource curse
for code. Sometimes I think I should start an
imposter syndrome cure sanatorium. Main activity for patients will be
watching the JavaScript console for all the errors and warnings on sites
built by the so-called tech elite. (Look who’s still typing = sometimes
instead of == or ===.) Anyway, what do you do when you want to
send someone a link to a YouTube video, but you don’t want the
engagement anti-features to kick in?
How about addressing the problem on the client side?
Here’s an experimental
Firefox extension that will remove the recommended videos sidebar
and keep you on the same video even if the pwned engagement
algorithm tries to auto-play a different one. So if I send a family
member a link, I can have fewer worries that they’ll end up in a rat
hole.
Bug reports and pull
requests welcome. (yes, I know that I should be using a
MutationObserver instead of a timer. At some point I’ll try to figure
that out.)
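For the record, the MutationObserver version would look something like this sketch (the #related selector is an assumption; YouTube’s markup changes often):

```js
// Remove the recommended-videos sidebar as soon as it appears, instead of
// polling on a timer. Runs as a content script on youtube.com pages.
const observer = new MutationObserver(() => {
  const sidebar = document.querySelector("#related"); // assumed selector
  if (sidebar) {
    sidebar.remove();
  }
});
observer.observe(document.body, { childList: true, subtree: true });
```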
Using the “Custom Audiences from your Customer List” product
specification, advertisers can upload certain customer lists to Facebook
– based on e.g. emails, phone numbers, Facebook user IDs or mobile
advertiser IDs – from their CRM database, which are first ‘hashed’,
meaning they are transformed into checksums (hash values), and compared
with other checksums generated from Facebook user data. If the checksums
match, then existing and potential customers can be deliberately shown
targeted ads on Facebook, Instagram and in apps and on mobile websites
via Audience Network. Facebook also provides this feature for retailers,
calling it “Offline Custom Audiences”.
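The matching step is simple enough to sketch. Facebook’s documentation describes SHA-256 over normalized (trimmed, lowercased) identifiers for Custom Audiences uploads; everything else in this example is made up.

```js
// Both sides hash normalized identifiers and compare hashes, so neither
// side has to hand the other its raw list.
const crypto = require("crypto");

function hashIdentifier(email) {
  const normalized = email.trim().toLowerCase();
  return crypto.createHash("sha256").update(normalized).digest("hex");
}

// Advertiser side: hash the CRM list before upload.
const uploaded = new Set(["alice@example.com"].map(hashIdentifier));

// Platform side: hash its own user records the same way and look for matches.
console.log(uploaded.has(hashIdentifier(" Alice@Example.COM "))); // true
```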
This is going to be an interesting natural experiment. Will
ad-supported media do better in jurisdictions where Facebook Custom
Audiences are not available? If Facebook advertising represents an
increase in marketing budgets, then probably not so much. If Facebook
advertising squeezes out other items from the marketing budget, then
this could be a win. (My best guess is that small companies are spending
more on marketing because Facebook is easy and self-service, but
Facebook is just one of many places that larger companies can spend. The
ease of use of Facebook from the advertiser side makes Facebook ads a
contender for small businesses that would have trouble dealing with a
legit site.)
And it’s hard to address the problem of creepy stuff on the Internet
without talking about housing costs. If the California powers that be
can drive up prices to the point where workers need a top-10-percent
income for what would have been a basic middle-class lifestyle
elsewhere, then it’s easier to pressure them into more questionable
practices.
What’s the best defense against surveillance marketing? In some
cases, another surveillance marketer. Just like hackers lock up a
vulnerable system after they break in to protect against other hackers,
surveillance marketers who know what they’re doing are helping to
protect users from other companies’ data collection practices.
Facebook: Late last week Facebook announced it
would eliminate all third-party data brokers from its platform. It
framed this announcement as a response to the slow motion train wreck
that is the Cambridge Analytica story. Just as it painted Cambridge as a
“bad actor” for compromising its users’ data, Facebook has now vilified
hundreds of companies who have provided it fuel for its core business
model, a model that remains at the center of its current travails.
(Newco Shift | Facebook: Tear Down This Wall.) (And Facebook even runs
a Tor hidden service.)
[A]n ad tech lobby group called ‘IAB Europe’ published a new research
study that claimed to demonstrate that the behavioural ad tech companies
it represents are an essential lifeline for Europe’s beleaguered
publishers….the report claimed that behavioural advertising technology
produces a whopping €10.6 billion in revenue for Europe’s
publishers.
Surely, the ad tech lobby argued, Parliament would permit websites to
use “cookie walls” that force users to consent to behavioural ad tech
tracking and profiling their activity across the Web. The logic is that
websites need to do this because it is the only way for publishers to
stay in business.
We now know that a startling omission is at the heart of this report.
Without any indication that it was doing so, the report combined Google
and Facebook’s massive revenue from behavioural ad tech with the far
smaller amount that Europe’s publishers receive from it.
The IAB omitted any indication that the €10.6 billion figure for
“publishers” revenue included Google and Facebook’s massive share
too.
That’s not the only startling omission. The most often ignored player
in the ePrivacy debate is adtech’s old frenemy, the racket that’s the
number two source of revenue for international organized crime and the
number three player in targeted behavioral advertising—adfraud.
And ePrivacy, like browser
privacy improvements, is like an inconveniently placed motion
detector that threatens to expose fraud gangs and fraud-heavy adtech
firms.
The same tracking technologies that enable the behavioral targeting
that IAB defends are the tracking technologies that make adfraud bots
possible. Bots work by visiting legit sites, getting profiled as a
high-value user, and then getting tracked while they generate valuable
ad impressions for fraud sites. Adfraud works so well today because most
browsers support the same kind of site-to-site tracking behavior that a
fraudbot relies on.
And regulations that make it easier for users to protect themselves
from being followed from one site to another are another source of
anti-fraud power. If bots need to opt in to tracking in order for fraud
to work, and most users, when given a clear and fair choice, don’t, then
that’s one more data point that makes it harder for adfraud to hide.
Publishers pay for adfraud. That’s because adfraud is no big secret,
and it’s priced into the market. Even legit publishers are forced to
accept a fraud-adjusted price for human ad impressions. I’m sure that
not every adtech firm that opposes ePrivacy or browser privacy
improvements is deliberately tolerating fraud, but opposing privacy
regulations and privacy technologies is the position that tends to
conceal and protect fraud. That’s the other omission here.
updated 27 Aug 2018: copy edits for clarity, add
introduction.
EDITOR’S INTRODUCTION – What follows is an edited Q-and-A
exchange between Jo Ellen Green Kaiser, board chair of the Information
Trust Exchange Governing Association, and Don Marti. The exchange is
focused on explaining why a voluntary user data privacy policy adopted
by quality publishers is a valuable companion to upcoming changes in
web-browser software. The browser changes will tend to reduce the
ability of publishers to manage advertising across hundreds of
independent ad networks which don’t coordinate privacy
policies.
Jo
Ellen: In your blog
piece you point out that the news system has to work with user
privacy principles. Most of the conversation is about putting into place
a set of systems based on opt-in tracking but it is not clear how the
principles will impact the opt-in tracking and consent management. I’d
like to hear more about that.
Don: The incentive from the browser side is clear for independent
browser businesses that don’t have a surveillance marketing business
attached. What is it that a big incumbent browser will have trouble
doing but that users clearly want?
Extensive user research indicates that users prefer a browser that
will protect them from having their activity in one context follow them
over to another context, and they also want a clear and non-confusing
user experience. So this sets up an opportunity for browsers. They can
compete over who can best manage user data in order to meet people’s
norms and preferences on how that data is used.
Browser management decisions being made day to day are based on how
to acquire users, and keep users once they are already running a
browser. So what are the side effects of this new browser competitive
area? Why are publishers going to need to be concerned about it, and
where can they get some sustainable advantage from it? And the answer is
that when user data gets managed in accordance with users’ norms and
preferences, then sites that are trusted by users to use their data have
an advantage over untrusted sites. And the biggest place this will show
up immediately is in ad fraud, because the way that fraud bots work is
they leak user data from high-value sites to fraud sites. They do
exactly what the mainstream browsers do today in facilitating tracking
the user from high value sites to low value sites.
Can the platform that connects permissioned data function now
without anything more than GDPR or do you see the need for more detailed
privacy protections?
There is a need for comprehensive privacy policies across sites
because it is prohibitively expensive for small news organizations to
keep up with all the details of all the privacy tools and requirements
across every possible tech platform and jurisdiction. One major US
publishing company was unable to do GDPR compliance for their sites so
they ended up blocking a whole bunch of US news sites for European
visitors.
When I see a site that isn’t able to comply with GDPR, I see a site
that is getting its clock cleaned by data leakage. Every single person
using that site is getting their data leaked out to other places so they
can get reached without the original publisher getting any benefit from
it.
If you can’t even do GDPR as a big publishing company how are you
going to be able to do California, Europe, and India as a small
independent web site, or do clean user-data collection across Firefox,
Safari, and other browsers out there?
This is good. We are talking about creating trusted news
sites based on the way they work with user data.
The ways users indicate trust with a site are potentially all over
the place. They might say they trust their local public radio station by
pledging and getting a coffee mug. They might indicate they trust their
local news site by filling in a traffic survey saying what neighborhood
they live and work in. A user might indicate trust for a site by leaving
a comment or a letter to the editor. Many different platforms all have a
small view into user trust and all have an opportunity to capture some
kind of consent for data use, but there’s no good way to integrate all
those. And if you do it through a conventional surveillance marketing
mechanism you may be doing it in a way that doesn’t even capture
consent. User data without consent is not going to be sustainable on a
regulatory or technical basis.
Your typical news site has 50-70 third-party domains showing up on
it, and every one of them has a separate privacy policy, all written by
different lawyers with the objective of staying out of trouble while
giving you the least privacy possible. So if you are a publisher running
some skeevy tracker on your site without the right consent, future
browsers are going to look at that and say there is no way this user has
given consent to this firm from a dark corner of the Lumascape, I’m not
going to reveal any user data to that firm.
So what you end up with is news sites with reputable content not having
the right consent bits set in order to be able to prove that they have a
valuable audience. We saw this with GDPR and unconsented impressions
coming into real-time bidding platforms. Some of those impressions are
coming in without the right consent bits set which means they aren’t
going to get bids from some advertisers. Even users who trust the site
are not producing ad impression value for the publisher they trust, and
that’s a big problem. That’s the first thing that publishers are going
to be concerned about with browser privacy improvements. Without all the
non-permissioned data we are used to seeing attached to the impressions,
those are not going to have much value. Publishers are going to be
selling remnant impressions on a quality site because they don’t have
the data.
Let’s imagine we have a way to collect opt-in data from a
variety of different news sites, and also the merchants and apps that
supply those news sites with services. It provides uniform opt-in rules
to gather that data and then is able to serve those opt-in users with
different types of content. Sort of an opted-in Taboola. If that kind of
platform were created would there still be a need for privacy policies
as well, or would the consent management system replace that need for
the privacy policies?
Consent strings in Apple Safari are managed like any other tracking
state would be. So the platform has to be aware of the policies and
limitations of all the privacy tools that feed into a user data
collection opportunity. Privacy Badger is a niche tool. They look for a
specific third-party tracking policy. That is not as important for
mainstream adoption directly but some of the list-based tools out there
like Disconnect, which Firefox feeds off, can be informed by trackers
detected by Privacy Badger.
A common policy has a real role because it lets you address
incompatibilities one at a time instead of having a big n-by-m
matrix of site privacy policies and privacy tool policies. It
is kind of like open source licenses. If you go to build a project and
want to keep your licenses compatible, it is way easier if you have a
single software license across that ecosystem or at least a set of
compatible licenses.
That is super helpful.
This platform needs to come into existence in an incremental way.
Many local sites are signed up with Google and use Google Tag Manager
for their ad serving. Google has a lot of the needed functionality built
out for their European customers, so the process of moving from
unpermitted user data sharing to permission-based user data sharing can
be done incrementally if you work it the right way.
Sites can use the Google tools according to their design, taking
features that have been developed for compliance in Europe and applying
those features to another need, like an off-label use of
GDPR compliance features. It’s like discovering you can cure some ulcers
by taking a specific dose of antibiotics. This is a big opportunity for
Google as well.
There is a need for a comprehensive policy because it is too
complicated to do it across all the platforms, and even if there is a
private label way to create some kind of opt in, how do you rely upon
consent management? Like an open source license, you need a privacy
policy that gives you a common language that allows you to cross all
these different jurisdictions, tools, and browsers.
Yes, and when this common policy is out there and able to be part of
a discussion with tool and browser developers, that policy will inform
the future decisions made by those developers. People will say I don’t
really want my tool to block permitted data sharing with trusted sites,
how do I make my tool better reflect what the users are doing?
(update 20 Nov 2018: copy edit, add a link to Dr. Johnny Ryan’s
CNIL article)
Today’s web advertising relies on 1990s browser behavior—most
browsers fail to protect users from being tracked from site to site, and
advertisers are used to taking advantage of that old defect. But because
browsers
do user research and respond to what users want, that’s changing.
Browsers are making it harder to track users from site to site without
their permission. Along with privacy regulations, this change is
creating an opportunity for new, “post-creepy” web advertising that:
works with user privacy principles
has fewer of the negative externalities of targeted ads
gives more market power and revenue to sites that users choose to
trust
The big opportunity is in enabling publishers to reclaim control over
their own audience data, not in establishing a new choke point such as a
cryptocurrency or paid whitelisting program. (If publishers wanted to
give up control to a tech firm, they can do that already.) Most of the
development that is needed here can be provided by third parties that
publishers are already using, because third parties are coming into
compliance with privacy regulations. For example, Google Tag Manager
already has the required functionality in order to comply with the
European GDPR.
The missing piece is a way for sites to collect enough
user data to show advertisers that the site is trusted by human users,
in order to make the ads on that site saleable.
In the new environment, user data alone is insufficient—data must be
accompanied by the consent required to use it. And that can’t
be just “click to make this dialog go away and consent to adtech as
usual”. Both regulators and browser developers are going to require
real consent. So the web advertising system needs to evolve away from
dependence on large quantities of un-permissioned data towards the
ability to use less data accompanied by permission. (Post-creepy web ads
won’t be able to swim in abundant unpermissioned data with the nutria of
the Lumascape. Consent is scarcer than raw data, and only data
accompanied by consent is safe to use. Publishers will have to collect
and conserve every drop of data, like Muad’Dib, the desert
mouse of Arrakis.) Possible sources include:
Differences in browser behavior between trusted and untrusted
sites
and more.
Consent management is a tricky problem. IAB Europe is doing some work
toward addressing it, with the open-source Transparency and Consent
Framework. Although existing implementations are designed to nudge the
user into not-transparent data practices, and are not yet getting real
consent, this framework does provide a starting point on which to build
consent management that both implements the user’s preferences
accurately and provides a smooth user experience. (more info: Global
Consent Manager, a client-side component that you can try in Firefox
now and that can interact with server-side data platforms.)
In principle, privacy regulation and browser privacy improvements
have the potential to lower the return on investment on creepy tracking,
and raise the return on investment on building reputation and getting
consent. But publishers, who have the reputation to get users to agree
that they have the right to use data, don’t have the
development budgets or time to build the tools for data
gathering.
User data and opportunities to get consent are everywhere, in
CMSs, other software, and in third-party services. The missing piece is
a platform that will collect data, with permission, from all the above
sources and
run either on the publisher’s own infrastructure or as a
third-party service so that small publishers don’t need to touch the CMS
or deploy and manage a new service
comply with current and future data protection
regulations
work with and anticipate privacy improvements in
browsers
provide reports and APIs in a usable format for advertisers and
agencies
Many of today’s ad agencies, even sympathetic ones, won’t come to the
new system by choice, because it won’t allow for tracking desirable
audiences to cheap sites. We can assume that advertisers and agencies
will ignore the new system until they see that it’s a way to reach a
significant audience that they can’t reach in other ways: the
tracking-protected early adopters today, and the mainstream
tracking-protected web audience in the near future.
I’ll be at
the Voice of Blockchain conference in Chicago on Friday and
Saturday. Two panels: “Journalism: Incentivizing the Truth” on Friday,
and “Crowdsourcing, Bounties, and Democratizing Access to Jobs” on
Saturday.
So what does blockchain have to do with incentivizing journalism?
One important reason that we have standards of fairness and accuracy
in news is that news organizations sell
advertising to mainstream brands. Brands that want to be able to
sell to everyone, not just one side of a political or social issue.
High-reputation news sites don’t respond individually to the demands of
advertisers, but the principles on which high-reputation news sites
operate have developed in parallel with the needs of brand safety.
On today’s web, reputation-based advertising is not so much of a
thing. Adtech
firms place ads from legit brands on brand-unsafe sites, usually
without anyone at the brand knowing about it. Faris
Yakob points out, By squeezing fees and margin procurement put
incredible pressure on agency principals, who have obligations to hit
certain targets from the holding companies. Rock meet hard place. Thus
new sources of revenue were found, in media rebates, or opacity, or
programmatic trading desks, or production fixing - all conflicts of
interest that can be leveraged to try to appease both masters…for a
time.
When agencies try to get ad impressions in front of the desired
audience at a bargain price, a lot of ad money ends up with fraudulent
or brand-unsafe sites. Even legit sites end up running 50 to 70 tracking
scripts because they lack
the market power to protect their audience from being tracked to
cheaper sites.
Incentivizing journalism depends on helping users protect their
personal information from being tracked from one site to another. As
users get the tools to control who they share their information with
(and they don’t want to leak it to everyone) then the web advertising
business has to transform into a reputation contest. Whoever can build
the most trustworthy place for users to choose to share their
information wins.
Blockchains are slow and expensive compared to databases or
conventional payment systems, but cheap compared to trust networks. As
browsers take a more active role in protecting users from third-party
tracking, reputable news sites will need a new technical infrastructure
for Internet advertising that accurately reflects the trust
relationships between brands, agencies, sites, and users.
What about “Crowdsourcing, Bounties, and Democratizing Access to
Jobs”? This is a fun area. Learn market
design is the new learn to code.
Developers would prefer to release open source software at a high
quality level and get paid for it. Many users would prefer to use
software at a higher quality level if they could pay for it. The current
software market, though, incentivizes companies to release at a low
quality level, in order to get early adoption and build network effects.
One approach is to build a new kind of market, one that allows users to
hedge their software quality risks while enabling developers to trade on
the likelihood of bug fixes. More info: Rao
et al.
Targeted advertising (where the browsing habits of consumers are tracked
and then used to provide them with more specific adverts) was another
commonly cited source of anxiety, with many respondents feeling
powerless to stop the intrusion. One described how “a lot of my
particular anxieties came into full swing when I learned more about how
online advertising works. When I noticed Facebook ‘Like’ buttons on
unrelated pages and when ads follow me around. The feeling that I had no
privacy was claustrophobic and has led to so many anxiety attacks I have
lost count”.
The link from the Chipotle ad redirected consumers to an Amazon
gift card scam that presents the viewer with a fraudulent message that
is intended to prompt a click to steal the user’s personal
information.
The web advertising game is changing from a hacking
contest to a reputation contest. It would have had to happen anyway,
but the shift is happening quickly right now because of two trends.
Privacy regulation (starting with the European Union, California
and India). Some regulations will have impact outside their own
jurisdictions when companies choose not to write and enforce separate
second-class privacy policies for users not covered by those
regulations.
New “browser wars” over which browser can best implement
widely-held user norms on sharing their personal information. (Web
browsers are good at showing you a web page that looks the same as it
does on the other web browsers. Why switch browsers? For many users,
because one browser does better at implementing your preferences on
personal data sharing.)
Right
now the web is terrible as a tool for brand building. But the web
doesn’t have to get better at signaling, or less fraudulent,
than print
or broadcast. In a lot of places the web just has to be better than Android.
Fixing web advertising is not one big coordination problem. People who
are interested in web advertising, from the publisher and ad agency
point of view, have a lot of opportunities for innovative and
remunerative projects.
Browser privacy improvements, starting with Apple Safari’s Intelligent
Tracking Prevention, are half of a powerful anti-fraud system. The
better that the browser protects the user’s information from leaking
from one site to another, the less it looks like a fraudbot. How
can publishers and brands build the other half, to shift ad budgets away
from fraud?
“Conscious
choosers” are an increasingly well-understood user segment, thanks
to ongoing user research. For some brands and publishers, the best
strategy may be to continue to pursue “personalization
pioneers”, the approximately one-third of users who don’t object to
having their information collected for ad targeting. Other brands have
more appeal to mainstream, vaguely creeped out, users, or to users who
more actively defend their personal info. How can “conscious
chooser” research inform brands?
Regulation and browser privacy improvements are making contextual
targeting more important. Where are the opportunities to reach
human audiences in the right context? Where does conventional
programmatic advertising miss out on high-context, signalful ad
placements because of gaps in data?
As sharing of user data without permission becomes less common,
new platforms are emerging to enable users to share information about
themselves by choice. For example, a user who comments on a local news
site about traffic may choose to share their neighborhood and the mode
of transportation that they take to work. User data sharing platforms
are in the early stages, and agencies have an opportunity to understand
where publishers and browsers are going. (Hint: it’ll be harder to get
big-budget eyeballs on low-value or fraudulent sites.) Which
brands can benefit from user-permissioned data
sharing?
(Complementary to data sharing issues) Consent management is
still an unsolved problem. While the Transparency and Consent
Framework provides a useful foundation to build on, today’s consent
forms are too annoying for users and also make it difficult
and time-consuming to do anything except select a single
all-or-nothing choice. This doesn’t
accurately reflect the user’s data sharing choices. The first
generation of consent management is getting replaced with a better front
end that not only sends a more accurate consent decision, but also takes
less time and attention and is less vulnerable to consent
string fraud. How will accurate and convenient consent
management give advantages to sites and brands that users
trust?
Workshops are in progress on all this stuff. Clearly it’s not all
just coming from the browser side—forward-thinking people at ad agencies
and publishers are coming up with most of it.
Inner procrastinator: HEY LET’S FIND SOME K3WL
ARTICLES TO READ ON THE INTERNET
Sense of duty: No, must update project status.
(Ctrl-T to open new tab)
Web browser: HEY WEREN’T YOU LISTENING TO INNER
PROCRASTINATOR JUST NOW? HERE IS SOME RECOMMENDED CONTENT
Me: Preferences → Home → Firefox Home Content.
Uncheck everything except “Web Search” and “Bookmarks”.
Anyway, happy Friday. Since you’re already reading blogs, you might
as well read something good, so here is some stuff that the RSS reader
dragged in. (My linklog is no longer getting posted to Facebook because
reasons,
so if you were clicking on links from me there you will have to figure
something else out. The raw linklog is: feed. Ideas?)
The Segway
patent expires next June. If you thought the scooters of San
Francisco were annoying this year, just wait for the summer of
generic-Segway-on-demand startups.
The “legitimate interest” provision in the GDPR will not save
behavioral advertising and data brokers from the challenge of obtaining
consent for personally identifiable data.
The obvious question is: how many of the vendors listed on the
Global Vendor and CMP List are actually relying on LI for purposes
of Ad selection, delivery, reporting? Worth writing a simple script to check.
Looks like 151 of 409, or about 37%.
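The script is short. Here’s a sketch against the TCF v1 Global Vendor List (the URL is the one I remember from the v1 framework; the schema changed in TCF v2):

```js
// Count vendors claiming Legitimate Interest for purpose 3
// ("Ad selection, delivery, reporting") in the v1 Global Vendor List.
const https = require("https");

https.get("https://vendorlist.consensu.org/vendorlist.json", (res) => {
  let body = "";
  res.on("data", (chunk) => { body += chunk; });
  res.on("end", () => {
    const vendors = JSON.parse(body).vendors;
    const li3 = vendors.filter((v) => (v.legIntPurposeIds || []).includes(3));
    console.log(`${li3.length} of ${vendors.length} vendors`);
    li3.forEach((v) => console.log(v.name));
  });
});
```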
Purpose 3 is:
Ad selection, delivery, reporting: The collection of
information, and combination with previously collected information, to
select and deliver advertisements for you, and to measure the delivery
and effectiveness of such advertisements. This includes using previously
collected information about your interests to select ads, processing
data about what advertisements were shown, how often they were shown,
when and where they were shown, and whether you took any action related
to the advertisement, including for example clicking an ad or making a
purchase. This does not include personalisation, which is the collection
and processing of information about your use of this service to
subsequently personalise advertising and/or content for you in other
contexts, such as websites or apps, over time.
And here’s the list of vendors with a “3” in their
legIntPurposeIds:
Today’s web advertising is mostly a hacking contest. Whoever can
build the best system to take personal information from the user wins,
whether or not the user knows about it. Publishers are challenging
adfraud and adtech hackers to a hacking contest, and, no surprise,
coming in third.
Mainstream browsers, starting
with Apple Safari, are doing better at implementing user preferences
on tracking. Most users don’t
want to be “followed” from one site to another. Users generally want
their activity on a trusted site to stay with that trusted site. Only
about a
third of users prefer ads to be matched to them, so browsers are
putting more emphasis on the majority’s preferences.
Privacy law—from Europe,
to California,
to India,
is being updated to better reflect user expectations and to keep up with
new tracking practices.
As users get the tools to control who they share their information
with (and they don’t want to leak it to everyone) then the web
advertising business is transforming from a hacking contest into a
reputation contest. The rate-limiting reactant for web advertising isn’t
(abundant and low-priced) user data, it’s the (harder to collect) consent
bits required to use that data legally. Whoever can build the most
trustworthy place for users to choose to share their information wins.
This is good news if you’re in the business of reporting trustworthy
news or fairly compensating people for making cultural works, not so
good news if you’re in the business of tricking people out of their
data.
Federated paywall systems are not just yet another attempt at
micropayments, but also have value as a tool for collecting trust. The
user’s willingness to pay for something is a big trust signal. A small
payment to get past a paywall can produce a little money, but a lot of
valuable user data and the consent bits that are required to use that
data.
The catch is to figure out how to design federated paywalls so that
the trusted site, not the paywall platform, captures the value of the
data, and so that the platform can’t leak or sell the user’s data
consent outside the context in which they gave it. In the long run, a
consent system that tries to hack around user data norms to rebuild
conventional adtech is going to fail, but not before a lot of
programmers lose a lot of carpal tunnels on privacy vs. anti-privacy
coding, and a lot of users face a lot of frustrating consent dialogs.
Browser improvements and court cases will filter deceptively
collected consent bits out of the system.
Consent bits are a new item of value that needs new rules. The web ad
business is not going to be able to sell and sync consent bits the
same way that it handles tracking cookies now. Consent bits are not a
“data is the new oil” commodity, and can really only move along trust
networks, with all the complexity that comes with them. New tools such
as federated paywalls are an opportunity to implement consent handling
in a sustainable way.
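As a sketch of what that could mean in practice (this structure is hypothetical, not any existing spec), a consent bit would be bound to the context in which it was given, so it can’t be synced or resold the way a tracking cookie can:

```js
// Hypothetical consent record. The point is the binding: the site, purposes,
// and expiration travel with the consent, so it has no value outside the
// trust relationship it came from.
const consentRecord = {
  user: "opaque-site-local-id",      // never a cross-site identifier
  site: "https://news.example.com",  // the context the consent applies to
  purposes: ["frequency-capping"],   // what the user actually agreed to
  granted: "2018-09-30T12:00:00Z",
  expires: "2019-09-30T12:00:00Z",
};
```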
(Update 18 Aug 2018: Fix an error to be consistent with the
source quoted.)
(I work for Mozilla. None of this is secret. None of this is
official Mozilla policy. Not speaking for Mozilla here.)
The good news is that interesting competition
among web browsers is back, not just because of ongoing performance
improvements in Firefox, but also because of Apple Safari’s good work on
protecting users from some kinds of cross-site tracking by default. Now
the challenge for other browsers is to learn from the Safari work and
build on it, to even more accurately implement the user’s preferences on
sharing their personal information. According to research
by Tini Sevak at YouGov, 36% of users are more likely to engage
with adverts that are tailored to them, while 55% are creeped
out by personalized ads. The browser has to get its data sharing
settings right for the individual user, while minimizing the manual
settings and decision fatigue that the user has to go through.
A short-term problem for sites, though, is that the current price for
highly tracked ad impressions facilitated by cross-site tracking is
still way above the price of impressions delivered to users who choose
to protect themselves. Tim Peterson, on Digiday, covers the natural
experiment of GDPR consenters and non-consenters:
If an exchange or SSP declines to sign the agreement, it is limited
to only selling non-personalized ads through DBM. Those generic ads
generate less revenue for publishers than personalized ads that are
targeted to specific audiences based on data collected about them. Some
publishers that are heavily reliant on DBM have seen their revenues
decline by 70-80 percent since GDPR took effect because they were
limited to non-personalized ads, said another ad tech exec.
Users are more likely to share information with a site they
trust.
But in the short term, what can browsers do to help address the
market dislocation from the user data crunch?
One possibility is to take advantage of an important side effect of
browser privacy improvements: better anti-fraud data.
Today, unprotected browsers and fraudbots are hard to tell apart.
Both maintain a single “cookie jar” across trusted and untrusted sites.
For fraudbots, cross-site trackability is not a bug as it is for a human
user’s browser—it’s a feature. A fraudbot can only produce valuable ad
impressions on a fraud site if it is somehow trackable from a legit
site.
As browser users start to upgrade to nightly releases that include
more protection, though, a trustworthy site’s real users will start to
look more and more different from fraudbots. Low-reputation and fraud
sites claiming to offer the same audience will have a harder and harder
time trying to sell impressions to agencies that can see it’s not the
same people. This does require better integration with anti-fraud tools,
so it’s something sites and anti-fraud vendors can do in parallel with
the browser release process.
Can the anti-fraud advantages of browser privacy improvements
completely swamp out the market effects of reducing cross-site
trackability? Depends on how much adfraud there is. We don’t know.
If an exchange or SSP declines to sign the agreement, it is limited
to only selling non-personalized ads through DBM. Those generic ads
generate less revenue for publishers than personalized ads that are
targeted to specific audiences based on data collected about them. Some
publishers that are heavily reliant on DBM have seen their revenues
decline by 70-80 percent since GDPR took effect because they were
limited to non-personalized ads, said another ad tech exec. That revenue
drop has put pressure on exchanges and SSPs to sign Google’s consent
agreement lest their publishers move their inventory to other platforms
that can run DBM’s personalized ads on their sites, the second exec
said.
A lot of those “specific audiences” are, of course, adfraud bots.
Fraud hackers are better at adtech than
adtech firms are. So ads shown to bots, on shitty sites, are going
for more than ads seen by humans on legit sites.
Meanwhile, tracking-resistant, personalization-averse readers are
overrepresented in some customer categories. Web developers are a good
example. (40% protected based on recent data from one popular site.)
Of course, today’s web ad system is based on tracking the best
possible prospect to the cheapest possible site, so it won’t be easy to
take advantage of this nice piece of market inefficiency. First step is
figuring out how well protected the people you want to reach are.
Recent question about futures markets on software bugs: what’s the
business model?
As far as I can tell, there are several available models, just as
there are multiple kinds of companies that can participate in any
securities or commodities market.
Oracle operator: Read bug tracker state, write
futures contract state, profit. This business would take an agreed-upon
share of any contract in exchange for acting as a referee. The market
won’t work without the oracle operator, which is needed in order to
assign the correct resolution to each contract, but it’s possible that a
single market could trade contracts resolved by multiple oracles.
Actively managed fund: Invest in many bug futures in
order to incentivize a high-level outcome, such as support for a
particular use case, platform, or performance target.
Bot fund: An actively managed fund that trades
automatically, using open source metrics and other metadata.
Analytics provider: Report to clients on the quality
of software projects, and the market-predicted likelihood that the
projects will meet the client’s maintenance and improvement requirements
in the future.
Stake provider: A developer participant in a bug
futures market must invest to acquire a position on the fixed side of a
contract. The stake provider enables low-budget developers to profit
from larger contracts, by lending or by investing alongside them.
Arbitrageur: Helps to re-focus development efforts
by buying the fixed side of one contract and the unfixed side of
another. For example, an arbitrageur might buy the fixed side of several
user-facing contracts and the unfixed side of the contract on a deeper
issue whose resolution will result in a fix for them.
Arbitrageurs could also connect bug futures to other kinds of
markets, such as subscriptions, token systems, or bug bounties.
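To make the oracle operator’s job concrete, here is a toy sketch (all names and numbers hypothetical) of a contract and its resolution:

```js
// A bug futures contract: traders stake money on whether a specific bug
// will be fixed by a maturity date.
const contract = {
  bugUrl: "https://bugzilla.example.org/show_bug.cgi?id=12345", // hypothetical
  maturity: "2019-06-01",
  fixedStake: 20,   // staked by traders betting the bug gets fixed
  unfixedStake: 80, // staked by traders betting it stays open
};

// Oracle operator: read bug tracker state, write futures contract state,
// keeping an agreed-upon share of the pot as the referee's fee.
function resolve(contract, bugStatusAtMaturity, oracleFeeShare = 0.01) {
  const pot = contract.fixedStake + contract.unfixedStake;
  const fee = pot * oracleFeeShare;
  const winner = bugStatusAtMaturity === "FIXED" ? "fixed side" : "unfixed side";
  return { winner, payout: pot - fee, oracleFee: fee };
}

console.log(resolve(contract, "FIXED"));
// { winner: 'fixed side', payout: 99, oracleFee: 1 }
```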
Despite theoretically adverse conditions, we find these markets are
relatively efficient, and improve upon the forecasts of experts at all
three firms by as much as a 25% reduction in mean squared error.
(This paper covers a related market type, not bug futures. However
some of the material about interactions of market data and corporate
management could also turn out to be relevant to bug futures
markets.)
(Image: pipeline monument in Cushing, Oklahoma; photo by Roy Luck for
Wikimedia Commons, licensed under the Creative Commons Attribution 2.0
Generic license.)
Companies don’t advertise on sites like YouTube, sites teeming with
copyright infringers and nationalist extremists, because those companies
are run by copyright infringers or nationalist extremists. They
advertise there because marketing decision-makers are incentivized to
play a corrupt online advertising game that rewards them for supporting
infringement and extremism.
So the trick here is to help people move marketing money out of bad
things (negative externalities) and toward good things (positive
externalities). We know that YouTube is a
brand-unsafe shitshow because Google won’t advertise its own
end-user-facing products and services there without a whole extra layer
of brand safety protection.
Big Internet companies are set up to insulate decision-makers from
the consequences of their own online asshattery, anyway. The way to
affect those big Internet companies is through their advertisers. So how
about a tweak to any laws that give a safe harbor to an Internet
service? Keep the safe harbor for the service itself, but remove safe
harbor protections for companies that advertise on it. This
should help in several ways.
Give legit services some flexibility. If your web site’s business
model is anything other than “get cheap eyeballs with other people’s
creative work” or “get cheap eyeballs by recommending divisive bullshit”
then you don’t have to change a thing.
Incentivize sites to pay for new creative work, by making works
covered by an author or artist contract a more attractive place for paid
advertising than “content” uploaded by random users.
Make it easier for marketers who want to do the right thing, by
pointing out the risks of supporting bad people.
Move some of the risks of online advertising away from the public
and toward the decision makers who can make a difference.
(This is a cleaned-up and lightly edited version of my talk from Nudgestock 2018.)
First I have to give everybody a disclaimer. This is 100% off
message. I work for Mozilla. I am NOT speaking for Mozilla
here.
If you follow Rory, you have probably heard a lot about signaling in
advertising, so I’m going to go over this material pretty quickly. Why
does Homo economicus read magazine advertising but hang up on
cold calls? To put it another way why is every car commercial the same?
You could shoot the “car driving down the windy road” commercial with
any car. All that the car commercial tells you is: if it was a waste of
your time to test drive our car then it would have been a waste of our
money to make this little movie about it.
There’s a whole literature of economics and math about signaling
involving deceptive senders and honest senders. With this
paper, Gardete and Bart show that when the sender wants to really
get a message across, counter-intuitively the best thing for the sender
to do is deprive themselves of some information about the receiver. If
you’re in the audience and you know what the sender knows about you,
then you can’t tell are they honestly expressing their intentions in the
market, or are they just telling you what you want to hear? Anyone who
used to read Computer Shopper magazine for the ads didn’t
just read it for specific information about all the parts that you might
put into your computer. You read it to find out which manufacturers are
adopting which standards so you don’t buy a motherboard that won’t
support the video card that you might want to upgrade to next year.
There are three sets of papers in the signaling literature. There are
papers that have pure math where you devise kind of a mathematical game
of buyers and sellers and see how that game works out. And there are
papers where you take users in an experimental setting. Ambler
and Hollier took 540 users, showed them different versions of expensive
looking and cheap looking advertising that conveys the same
information. Finally you’ve got the kind of research that looks at
spending across different product categories; one such study
found that a product category’s advertising-to-sales
ratio really depends on how much extra user experience it takes to
evaluate that product.
The feedback loop here is that when brands have signaling power, then
that means market power for the publishers that carry their advertising,
which means advertising rates tend to go up, which means the publishers
can afford to make obviously expensive content. And when you attach
advertising to obviously expensive content, that means more signaling
power. It’s kind of a loop that builds more and more value for the
advertiser.
Some people compare
this to the signaling that a bank does when they build this monstrous
stone building to keep your money. For what a bank actually
does, a stone building doesn’t do any more for keeping money in
it than a metal building or a concrete building would, but it
shows that they’ve got this big stone building with their name on it, so
if they turned out to be deceptive it would be more costly for them to
do it. That’s the pure signaling model. But the other area that we can
see when we compare this kind of classic signal-carrying advertising to
online advertising, the kind of ads that are targeted to you based on
who you are, is what’s up with the norms enforcers?
Rory has his blue
checkmark on Twitter which means he doesn’t see Twitter ads. I’m
less Internet Famous, so I still get the advertising on Twitter. A lot
of the ads that I get are deceptive issue ads.
This is one. A company that’s getting sued for lead paint related issues
is trying to convince residents of California that government inspectors
are coming to their houses to declare them a nuisance. This is bogus and
it’s the kind of thing that if it appeared in the newspaper that
everyone got to see then journalists and public interest lawyers, and
everyone else who enforces the norms on how we communicate, would call
it out. But in a targeted ad medium this kind of deceptive advertising
can target me directly.
So let me show a little
simulation here. What we’re looking at is deceptive sellers making a
sale. When a deceptive seller makes a sale that’s a red line. When an
honest seller makes a sale, that’s a green line. The little blue squares
are norms enforcers, and the only thing that makes a norms enforcer
different in this game from a regular customer is when a deceptive
seller contacts a norms enforcer the deceptive seller pays a higher
price than they would have made in profit from a sale. So with honest
sellers and deceptive sellers evolving and competing in this primordial
soup of customers, what ends up happening to the deceptive sellers that
try to do a broad reach and hit a bunch of different customers is, well
you saw them, they hit the norms enforcers, the blue squares lit up.
Advertisers who are deceptive and try to reach a bunch of different
people end up getting squeezed out in this version of the game. An
honest advertiser like this little square down here can reach over the
whole board because they don’t pay the penalty for reaching the norms
enforcer.
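For readers who want the shape of the game without the animation, here’s a back-of-the-envelope version in code (all parameters invented):

```js
// Expected per-contact profit for honest vs. deceptive sellers. A sale earns
// 1; a deceptive seller who contacts a norms enforcer pays a penalty larger
// than the profit from a sale. Honest sellers never pay the penalty.
function expectedProfitPerContact({ enforcerShare = 0.05, penalty = 25 } = {}) {
  const honest = 1;
  const deceptive = (1 - enforcerShare) * 1 - enforcerShare * penalty;
  return { honest, deceptive };
}

console.log(expectedProfitPerContact());
// { honest: 1, deceptive: -0.3 } — broad-reach deception loses money,
// which is why targeting (avoiding the enforcers) changes the game.
```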
So what does this really mean for the real web? On the World Wide
Web, have we inadvertently built a game that gives an unfair advantage
to deceptive sellers? If somebody can take advantage of all the user
profiling information that’s available out there, and say, “oh I believe
that these people are rural, low-income, unlikely to be finance
journalists, therefore I’m going to hit them with the predatory finance
ads,” does that cause users to pay less attention to the medium?
Online advertising effectiveness has declined since the launch of the
first banner advertisement in 1994. That’s certainly not news. This is a
slide that appeared in Mary Meeker’s famous Internet Trends
presentation, and as you can see blue is percentage of ad spending, grey
is percentage of people’s time. So TV is 36% of the time 36% of the
money. Desktop web 18%, 20%, about right.
What’s going on with print? Print is 9% of the money for 4% of the
time. Now you might say this is just inertia, that that this year people
are finally just cutting back on spending money in print because of
people spending less time on print and it’ll eventually catch up. But I
went back and plotted the
same slide from the same presentation going back to 2011, and I’ve
got time plotted across the bottom, money plotted on the y
axis, and what do we see about print? Print is on a whole different
trend line. Print is on a trend line of much more value to the
advertiser per unit of time spent than these other ad media. My
hypothesis is that targeting breaks signaling and this means an
opportunity.
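To put rough numbers on that trend line, divide money share by time share (figures from the slide quoted above):

```js
// Money share over time share: 1.0 means an ad medium gets money in
// proportion to the time people spend on it.
const media = { tv: [36, 36], desktopWeb: [20, 18], print: [9, 4] };
for (const [medium, [moneyPct, timePct]] of Object.entries(media)) {
  console.log(medium, (moneyPct / timePct).toFixed(2));
}
// tv 1.00, desktopWeb 1.11, print 2.25 — print earns more than twice
// its time share.
```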
Targeting means that when you see an ad coming in targeted to you
it’s more like a cold call. It doesn’t carry credible information about
the seller’s intention in the market.
From the point of view of who has an incentive to support
signal-carrying ad media instead, the people who have an interest in
that signal-for-attention bargain and its positive feedback loop are of
course the publishers, high reputation brands that want to be able to
send that signal, writers, photographers, and editors, people who get
paid by that publisher, and people who benefit from the positive
externalities of those signal carrying ads that support news and
cultural works.
So if the signaling model is such a big thing then why are there so
many targeted ads still out there?
Nudges.
Let’s have a look at, just to pick an example, the Facebook
advertising policy. As you know, the Facebook advertising platform will
let you micro target individuals extremely specifically. You can pick
out seven people in Florida, you can pick out everyone who’s looking for
an apartment who doesn’t have a certain ethnic affinity, that kind of
thing. But the one thing you’re not allowed to do with Facebook
targeting is put
anything in your ad that might indicate how you’re targeting it. The
policy
says:
ads must not contain content that asserts or implies personal
attributes
You can’t say, I know you’re male or female, I know your sexual
orientation, I know what you do for a living. The ad copy has to be
generic even if the targeting can be extremely specific. You can’t even
say “other.” You can’t say “meet other singles” because that
implies that the advertiser knows that the reader is single. Facebook
will let you target people with depression but you can’t reveal that you
know that about them. Another good example is Target. They do targeting
of individuals who they believe to be pregnant, but they’ll pad out
those ads for baby stuff with ads for other types of products so as not
to creep everybody out.
Back to our shared interest in the signal-for-attention bargain. Pretty
much everybody has an interest in that original positive feedback loop
of getting the higher reputation for brands of getting reputation driven
publishers that’ll build high quality content for us. Writers and
photographers have an interest in getting paid, and people who are
shopping for goods are the ones who want the signal the most. All that
stands on the opposite side is behavioral tricks to conceal targeting.
Now I’m not going to say this as a privacy issue. I know that there are
privacy issues here but that is really not my department. Besides,
Facebook just announced a dating site so they’re going to breed privacy
preferences out of their user base anyway.
Can the web as an advertising medium be redesigned to make it work
better for carrying signal? We know from the existence of print that
this type of signal carrying ad medium can exist. Print is an existence
proof of signal carrying advertising. We also know that building that
kind of an ad medium can’t be that hard because print was built when
people were breathing fumes from molten lead all day.
The prize for building a signal-carrying ad medium is all the
cultural works that you get when somebody like Kurt Vonnegut can quit
his job as manager of a car dealership and write for Collier’s magazine
full-time. This
book is still on sale with the resulting stories. And of course
local news. Democracy depends on the vital flow of information of
public interest. Some people say that the problem with news and
information on the web is that it’s all been made free, and if people
would just subscribe we could fix the system. But honestly, if free
was the problem, then Walter Cronkite would have destroyed the media
business in 1962. It’s a market design problem and a signaling problem,
not just a problem of who has to pay for what.
And the web browsers got a bunch of things wrong in the 1990s. There
are certain patterns of information flow that the browser facilitated,
like third-party tracking, where browsers enable some companies to
follow your activity from site to site, and data leakage. Things that
just don’t work according to the way that people expect. Most
people don’t want their activity on one site to follow them over to
another site, and the original batch of web browsers got that terribly
wrong. The good news is web browsers are getting it right, and web
browsers are under tremendous pressure now to do so. As a product the
web browser is pretty much complete and working and generic. The whole
point of a web browser is it shows web sites the same as all the other
web browsers do, so there’s less and less reason for a user to want to
switch web browsers. But everybody who is trying to get you to install a
web browser needs for there to be a reason, so the opportunity for
browsers is to align with those interests of users that the browser
wasn’t able to pick up on previously.
At Mozilla some user researchers recently did a study
on users with no ad blocker installed and users within the first few
weeks of installing an ad blocker. Anybody want to guess at the
increase in engagement? How much more time do those ad blocker users spend
with that same browser than the non-ad-blocker users? Anybody shout out
a number. All right, 28%. From the point of view of the browser those
kinds of numbers, moving user engagement in a way that helps that
browser meet its goals, that’s something that that the browser can’t
ignore. So that means we’re going from the old web game where everyone
tries win by collecting as much data on people can without their
permission to a new game in which the browser, high reputation
publishers, and high reputation brands are all aligned in trying to
build enough trust to work on information that users choose to
share.
I know when I say information that users choose to share
you’re going to think about all these GDPR dialogs and I know I’ve seen
these too, and there are just tons of companies on these. To be honest,
looking at some of these company names it looks like most of them were
made up by guys from Florida who communicate primarily by finger guns.
Users should not have to micromanage their consent for all this data
collection activity any more than email users should have to go in and
read their SMTP headers to filter spam. And really if you think about
what brands are, it’s offloading information about a product buying
decision onto the reputation coprocessor in the user’s brain. It’s kind
of like taking a computational task and instead of running it on the CPU
in your data center where you have to pay the power and cooling bills
for it, you offload it and run it on the GPU on the client. It’ll run
faster, it’ll run better, and the audience is maintaining that
reputation state.
The future is here, it’s just not very evenly distributed, as William Gibson
said. This picture is the cyberpunk
of the 1990s. Today all of that stuff he’s carrying, his video
camera, his laptop, his scanner, all that stuff’s on a phone and
everybody has it.
Today, the privacy sensitive users, the ones who are already working
based on sharing data with permission, they’re out there. But they’re in
niches today. If you have a relationship with those people now, then now
is an opportunity to connect with them, figure out how to build that
signal carrying advertising game, and create a reputation-based
advertising model for the web. Thank you very much.
Are there parallels between the rise of Worse Is Better in
software and the success of the “uncreative counterrevolution” in
advertising? (for more on that second one: John
Hegarty: Creativity is receding from marketing and data is to blame)
The winning strategy in software is to sacrifice consistency and
correctness for simplicity. (probably because of network effects,
principal-agent problems, and market failures.) And it seems like
advertising has similar trade-offs between
Measurability (How well can we measure this project’s effect on
sales?)
Message (Is it persuasive and on brand?)
Just as it’s rational for software decision-makers to choose
simplicity, it can be rational for marketing decision-makers to choose
measurability over signal and message. (This is probably why there is a
brand crisis going on—short-term CMOs are better off when they choose
brand-unsafe tactics, sacrificing Message.)
As we’re now figuring out how to use market-based tools to fix market
failures in software, where can we use better market design to fix
market failures in advertising?
Actually, none of those three statements is true. And Facebook knows
it.
The American Red Cross has given Facebook this highly personal
information about me, by adding my contact info to an “American Red
Cross Blood Donors” Facebook
Custom Audience. If any of that stuff were true, I wouldn’t have
been allowed to give blood.
When I heard back from the American Red Cross about this personal
data problem, they told me that they don’t share my health
information with Facebook.
That’s not how it works. I’m listed in the Custom Audience as a blood
donor. Anyway, too late. Facebook has the info now.
So, which of its promises about how it uses people’s personal
information is Facebook going to break next?
And is some creepy tech bro right now making a killer pitch to Paul
Graham about a business plan to “disrupt” the health insurance market
using blood donor information?
Why don’t you CCPA
Facebook? Since companies are passing your info to Facebook anyway,
you might as well keep an account there to at least see what’s going
on.
Hugo-award-winning author Charles Stross said that a corporation is
some kind of sociopathic
hive organism, but as far as I can tell a corporation is really more
like a monkey troop cosplaying a sociopathic hive organism.
This is important to remember because, among other reasons, it turns
out that the money that a corporation spends to support democracy
and creative work comes from the same advertising budget as the
money it spends on random
white power trolls and actual
no-shit Nazis. The challenge for customers is to help people at
corporations who want to do the right thing with the advertising budget,
but need to be able to justify it in terms that won’t break character
(since they have agreed to pretend to be part of a sociopathic hive
organism that only cares about its stock price).
Techcrunch
reports that “Facebook Custom Audiences,” the system where
advertisers upload contact lists to Facebook in order to target the
people on those lists with ads, will soon require permission from the
people on the list. Check it out: Introducing
New Requirements for Custom Audience Targeting | Facebook Business.
On July 2, Facebook’s own rules will extend a subset of Europe-like
protection to everyone with a Facebook account. Beaujolais!
So this is a great opportunity to help people who work for
corporations and want to do the right thing. Denying permission to share
your info with Facebook can move the advertising money that they spend
to reach you away from evil stuff and towards sites that make something
good. Here’s a permission
withdrawal letter to cut and paste. Pull requests welcome.
At Nudgestock 2018
I mentioned the signaling
literature that provides background for understanding the targeted
advertising problem. Besides being behind paywalls, a lot of this
material is written in math that takes a while to figure out. For
example, it’s worth working through this Gardete
and Bart paper to understand a situation in which the audience is
making the right move to ignore a targeted message, but it can take a
while.
Are people rational to ignore or block targeted advertising in some
media, because those media are set up to give
an incentive to deceptive sellers? Here’s a simulation of an ad
market in which that might be the case. Of course, this does not show
that in all advertising markets, better targeting leads to an advantage
for deceptive sellers. But it is a demonstration that it is possible to
design a set of rules for an advertising market that gives an advantage
to deceptive sellers.
What are we looking at? Think of it as a culture medium where we can
grow and evolve a population of single-celled advertisers.
The x and y coordinates are some arbitrary characteristic of offers
made to customers. Customers, invisible, are scattered randomly all over
the map. If a customer gets an offer for a product that is close enough
to their preferences, it will buy.
Advertisers (yellow to orange squares) get to place ads that reach
customers within a certain radius. The advertiser has a price that it
will bid for an ad impression, and a maximum distance at which it will
bid for an impression. These are assigned randomly when we populate the
initial set of advertisers.
High-bidding advertisers are more orange, and lower-bidding
advertisers are more pale yellow.
An advertiser is either deceptive, in which case it makes a slightly
higher profit per sale, or honest. When an honest advertiser makes a
sale, we draw a green line from the advertiser to the customer. When a
deceptive advertiser makes a sale, we draw a red line. The lines appear
to fade out because we draw a black line every time there is an ad
impression that does not result in a sale.
So why don’t the honest advertisers die out? One more factor: the
norms enforcers. You can think of these as product reviewers or
regulators. If a deceptive advertiser wins an ad impression to a norms
enforcer, then the deceptive advertiser pays a cost, greater than the
profit from a sale. Think of it as having to register a new domain and
get a new logo. Honest advertisers can make normal sales to the norms
enforcers, which are shown as blue squares. An ad impression that
results in an “enforcement penalty” is shown as a blue line.
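If the rules are easier to read as code, here’s a minimal sketch of one round of the market. This is my own reconstruction of the rules above; the constants, population sizes, and margins are placeholders, not the values the live simulation actually uses.

```javascript
// One round of the toy ad market. All numbers are placeholders.
const WIDTH = 100, HEIGHT = 100;
const rand = (n) => Math.random() * n;

// Customers are scattered randomly; some are norms enforcers.
const customers = Array.from({ length: 200 }, () => ({
  x: rand(WIDTH), y: rand(HEIGHT),
  enforcer: Math.random() < 0.1,
}));

// Advertisers get a random bid, targeting radius, and honesty flag.
const advertisers = Array.from({ length: 50 }, () => ({
  x: rand(WIDTH), y: rand(HEIGHT),
  bid: rand(1), radius: rand(30),
  deceptive: Math.random() < 0.5,
  profit: 0,
}));

const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

for (const c of customers) {
  // The impression goes to the highest bidder targeting this customer.
  const bidders = advertisers.filter((a) => dist(a, c) <= a.radius);
  if (bidders.length === 0) continue;
  const winner = bidders.reduce((hi, a) => (a.bid > hi.bid ? a : hi));
  winner.profit -= winner.bid; // pay for the impression
  if (c.enforcer && winner.deceptive) {
    winner.profit -= 5; // enforcement penalty, bigger than any sale
  } else if (dist(winner, c) < 5) {
    // Offer close enough to the customer's preferences to make a sale;
    // deceptive sellers take a slightly higher margin per sale.
    winner.profit += winner.deceptive ? 1.2 : 1;
  }
}
// (Evolution step not shown: unprofitable advertisers die off, and
// profitable ones spawn mutated copies.)
```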
So, out of those relatively simple rules—two kinds of advertisers and
two kinds of customers—we can see several main strategies arise. Your
run of the simulation is unique, and you can also visit the big version.
What I’m seeing on mine is some clusters of finely targeted deceptive
advertisers, in areas with relatively few norms enforcers, and some
low-bidding honest advertisers with a relatively broad targeting radius.
Again, I don’t think that this necessarily corresponds to any real-world
advertising market, but it is interesting to figure out when and how an
advertising market can give an advantage to deceptive sellers, and what
kinds of protections on the customer side can change the game.
Thanks for coming to my Nudgestock 2018 talk.
First, as promised, some links to the signaling literature. I don’t know
of a full bibliography for this material, and a lot of it appears to be
paywalled. A good way to get into it is to start with this widely cited
paper by Phillip Nelson: Advertising
as Information | Journal of Political Economy: Vol 82, No 4 and work
forward.
Gardete and Bart “We find that when the sender’s
motives are transparent to the receiver, communication can only be
influential if the sender is not well informed about the receiver’s
preferences. The sender prefers an interior level of information
quality, while the receiver prefers complete privacy unless disclosure
is necessary to induce communication.” Tailored
Cheap Talk | Stanford Graduate School of Business. The Gardete and Bart paper makes sense if you ever read
Computer Shopper for the ads. You want to get an idea of each
manufacturer’s support for each hardware standard, so that you can buy
parts today that will keep their value in the parts market of the near
future. You don’t want an ad that targets you based on what you already
have.
Kihlstrom and Riordan “A great deal of advertising
appears to convey no direct credible information about product
qualities. Nevertheless such advertising may indirectly signal quality
if there exist market mechanisms that produce a positive relationship
between product quality and advertising expenditures.” Advertising
as a Signal
For those of you who are verified on Twitter, and so haven’t seen what
I’m talking about with the deceptive ads there, I have started
collecting some: dmarti/deceptive-ads
I mentioned the alignment of interest between high-reputation brands
and high-reputation publishers. More on the publisher side is in a
series of guest posts for Digital Content Next, which represents large
media companies that stand to benefit from reputation-based advertising:
Don
Marti, Author at Digital Content Next Also more from the publisher
point of view in Notes
and links from my talk at the Reynolds Journalism Institute.
What’s next? The web advertising mess isn’t one snarled-up
collective action problem. It’s a complex set of problems that interact
in a way that creates some big opportunities for the right projects. Work together
to fix web ads? Let’s not.
Rule number one of dealing with the big Internet companies is: never
complain to them about all the evil stuff
they support. It’s a waste of time and carpal tunnels. All of the major
Internet companies have software, processes, and, most important, contract
moderators, to attenuate complaints. After all, if Big Company
employees came in to work and saw real user screenshots of the beheading
videos, or the child abuse channel, or the ethnic cleansing memes, then
that would harsh their mellow and severely interfere with their ability
to, as they say in California, bro down and crush code.
Fortunately, we have better options than engaging with a process
that’s designed to mute a complaint. Follow the money.
Your average Internet ad does not come from some ominous all-seeing
data-driven Panopticon. It’s probably placed by some marketing person
looking at an ad dashboard screen that’s just as confusing to
them as the ad placement is confusing to you.
Contact a brand’s marketing decision makers directly.
Briefly make a specific request.
Put your request in terms that make not granting it
riskier and more time-consuming.
This should be pretty well known by now. What’s new is a change in
European privacy regulations. The famous European GDPR
applies not just to Europeans, but to natural persons. So I’m
going to test the idea that if I ask for something specific and easy to
do, it will be easier for people to just do it, instead of having to
figure out that (1) they have a different policy for people whose GDPR
requests they won’t honor and (2) they can safely assign me to the
non-GDPR group and ignore me.
My simple request is not to include me in a Facebook Custom
Audience. I can find the brands that are doing this by downloading
ad data from Facebook, and here’s a letter-making web
thingy that I can use. Try it if you like. I’ll follow up with how
it’s going.
Trans
Europa Express was covered on ghacks.net. This is an experimental
Firefox extension that tries to get web sites to give you European-level
privacy rights, even if the site classifies you as non-European.
Since the version they mentioned, I have updated it with a few new
features.
big one: make the Google Tag Manager support actually work, as I
figured out the wrappedJSObject
thing.
Add support for another variant on Google ads (the “adsbygoogle”
object); a rough sketch of the idea is below.
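For the curious, this is the general shape of the adsbygoogle part. The requestNonPersonalizedAds flag is Google’s documented way for a page to ask for non-personalized ads; the rest is a simplified sketch, not the extension’s actual source.

```javascript
// Content script sketch (Firefox WebExtension). Page JavaScript lives
// behind an Xray wrapper, so to reach objects the page itself created
// you go through wrappedJSObject. Simplified; see the repo for the
// real code.
const page = window.wrappedJSObject;

if (page.adsbygoogle) {
  // Ask for non-personalized ads: the same flag a site would set for
  // its European visitors.
  page.adsbygoogle.requestNonPersonalizedAds = 1;
}
```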
Anyway, check
it out. Seems to have actual users now, so I’ve got that going for
me. But lots of secret European mode switches still remain unactivated.
If you see one, please make a new
issue.
I know I haven’t posted for a while, but I can’t skip GDPR
Day. You don’t see a lot of personal info from me here on this blog. But
just for once, I’m going to share something.
I’m a blood donor.
This doesn’t seem like a lot of information. People sign up for blood
drives all the time. But the serious privacy problem here is that when I
give blood, they also test me for a lot of diseases, many of which could
have a big impact on my life and how much of certain kinds of healthcare
products and services I’m likely to need. The fact that I’m a blood
donor might also help people infer
something about my sex life but the health data is TMI already.
And I have some bad news. I recently got the ad
info from my Facebook account and there it is, in the file
advertisers_who_uploaded_a_contact_list_with_your_information.html.
American Red Cross Blood Donors. Yes, it looks like the
people I chose to trust with some of my most sensitive personal info
have given it to the least
trusted company on the Internet.
In today’s marketing scene, the fact that my blood donor information
leaked to Facebook isn’t too surprising. The Red Cross clearly has some
marketing people, and targeting the existing contact list on Facebook is
just one of the things that marketing people do without
thinking about it too much. (Not thinking about
privacy concerns is a problem for Marketing as a career field long-term.
If everyone thinks of Marketing as the Department of Creepy Stuff it’s
going to be harder to recruit creative people.)
So, wait a minute. Why am I concerned that Facebook has
positive health info on me? Doesn’t that help maintain my status in the
data-driven economy? What’s the downside? (Obvious joke about healthy-blood-craving
Facebook board member Peter Thiel redacted—you’re welcome.)
The problem is that my control over my personal data isn’t just a
problem for me. As Prof. Arvind Narayanan said
(video), Poor privacy harms society as a whole. Can I trust
Facebook to use my blood info just to target me for the Red Cross, and
not to sort people by health for other purposes? Of course not. Facebook
has crossed
every creepy line that they have promised not to. To be fair, that’s
not just a Facebook thing. Tech bros do risky and mean things all the
time without really thinking them through, and even when they do set
appropriate defaults they half-ass the implementation and shit
happens.
Will blood donor status get you better deals, or apartments, or jobs,
in the future? I don’t know. I do know that the Red Cross made a big
point about confidentiality when they got me signed up. I’m waiting for
a reply from the Red Cross privacy officer about this, and will post an
update.
Doc Searls
is optimistic that surveillance marketing is going away, but what’s
going to replace it? One idea that keeps coming up is the suggestion
that prospective buyers should be able to sell purchase intent data to
vendors directly. This seems to be appealing because it means that the
Marketing department will still get to have Big Data and stuff, but I’m
still trying to figure out how voluntary transactions in intent data
could even be a thing.
Here’s an example. It’s the week before Thanksgiving, and I’m
shopping for a kitchen stove. Here are two possible pieces of intent
information that I could sell.
“I’m cutting through the store on the way to buy something else.
If a stove is on sale, I might buy it, but only if it’s a bargain,
because who needs the hassle of handling a stove delivery the week
before Thanksgiving?”
“My old stove is shot, and I need one right away because I have
already invited people over. Shut up and take my money.”
On a future intent trading platform, what’s my incentive to reveal
which intent is the true one?
If I’m a bargain hunter, I’m willing to sell my intent information,
because it would tend to get me a lower price. But in that case, why
would any store want to buy the information?
If I need the product now, I would only sell the information for a
price higher than the expected difference between the price I would pay
and the price a bargain hunter would pay. But if the information isn’t
worth more than the price difference, why would the store want
to buy it?
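Toy numbers make the dead end easier to see; these prices are made up just for illustration.

```javascript
// Why the intent trade doesn't close, with made-up numbers.
const urgentPrice = 1000; // what the store can charge a buyer who needs it now
const bargainPrice = 800; // what it takes to get the bargain hunter to buy

// The most the store can gain by learning that I'm the urgent type:
const storeGain = urgentPrice - bargainPrice; // 200

// The least I'd accept for revealing it, since the revelation costs me
// my shot at the bargain price:
const myAsk = urgentPrice - bargainPrice; // 200, plus something for my trouble

console.log(storeGain > myAsk); // false: no price works for both sides
```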
So how can a market for purchase intent data happen?
Or is the idea of selling access to purchase intent only feasible if
the intent data is taken from the “data subject” without permission?
Anyway, I can see how search advertising and signal-based advertising
can assume a more important role as surveillance marketing becomes less
important, but I’m not sure about markets for purchase intent. Maybe
user data sharing will be not so much a stand-alone thing but a role for
trustworthy news and cultural sites, as people choose to share data as
part of commenting and survey completion, and that data, in aggregated
form, becomes part of a site’s audience profile. More: personal
AI in the rugpull economy
It would make me really happy to be able to yellow-list
Google web ads in Privacy Badger. (Yellow-listed domains are not
blocked, but have their cookies restricted in order to cut back on
cross-site tracking.) That’s because a lot of news and cultural sites
use DoubleClick
for Publishers and other Google services to deliver legit,
context-based advertising. Unfortunately, as far as I can tell, Google
mixes in-context ads with crappy, spam-like, targeted stuff. What I want
is something like Doc
Searls style ads: Just give me ads not based on tracking
me.
Sort of. GDPR-compliant terms written by Google aren’t exactly the
same as EFF’s privacy-friendly
Do Not Track (DNT) Policy (all these different
tracking policies are reminding me of open source licenses for some
reason), but close enough. The catch is that as an end user, you
can’t just turn on Google’s European mode. You have to do some
JavaScript. I think I figured out how to do this in a simple browser
extension to unlock secret European status.
Google doesn’t appear to have their European mode activated yet, so I
added
a do-nothing “European mode” to the Aloodo project, for testing. I’m
not able to yellow-list Google yet, but when GDPR takes effect later
this month I’ll test it some more.
In the meantime, I’ll keep looking for other examples of hidden
European mode, and see if I can figure out how to activate them.
Lots of GDPR advice out there. As far as I can tell it pretty much
falls into three categories.
Play
it straight and handle user consent correctly. Good part: you end up
with less personal data, but what you do have is better quality and you
clearly know what data you can use for what purposes. Bad part: UX gets
annoying because users have to fill out a bunch of web forms.
Cut
back on surveillance marketing. Good part: better for brand equity
in the long run. All advertising is brand
advertising. Some of it is just brand advertising in the wrong
direction. Bad part: what long run? CMO is a short-term job, and
surveillance marketing projects get budgets for a reason. Strip-mining
brand equity is a short-term win.
Add microformats to
label consent forms as consent forms, and appropriate links to the data
usage policy to which the user is being asked to agree. (A sketch of
what this could look like follows this list.)
Release a browser extension that will do the right thing with the
consent forms, and submit automatically if the user is fine with the
data usage request and policy, and appears to trust the site. Lots of
options here, since the extension can keep track of known data usage
policies and which sites the user appears to trust, based on their
activity.
Publish user research results from the browser extension. At this
point the browsers can compete to do their own versions of step 3, in
order to give their users a more trustworthy and less annoying
experience.
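Here’s roughly what steps 3 and 4 could look like together. Everything here is hypothetical: “h-consent” and “u-data-usage-policy” are made-up microformat names, and siteIsTrusted() is a stand-in for whatever trust heuristic the extension ends up using.

```javascript
// Extension-side sketch, with made-up microformat class names.
const KNOWN_POLICIES = new Set([
  "https://example.com/data-usage-policy-v1", // policies the user has OK'd before
]);

function siteIsTrusted(hostname) {
  // Stand-in: could look at visit frequency in browser history.
  return false;
}

for (const form of document.querySelectorAll("form.h-consent")) {
  const policyLink = form.querySelector("a.u-data-usage-policy");
  const policy = policyLink ? policyLink.href : null;
  if (policy && KNOWN_POLICIES.has(policy) && siteIsTrusted(location.hostname)) {
    form.submit(); // the user already agreed to this exact policy
  }
  // Otherwise leave the form alone and let the user decide.
}
```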
Browsers need to differentiate in order to attract new users and keep
existing users. Right now a good way to do that is in creating a
safer-feeling, more trustworthy environment. The big opportunity is in
seeing the overlap between that goal for the browser and the needs of
brands to build reputation and the needs of high-reputation publishers
to shift web advertising from a hacking game that adtech/adfraud wins
now, to a reputation game where trusted sites can win.
Why does the Peak Advertising
effect occur most in the most accurately targeted ad media? Why do
people tend to filter out targeted ads, using habit power, technology,
and regulation, while paying more attention to less finely targeted ad
media?
One explanation is that buying ad space is an example of costly
signaling. On this view, advertising is basically an exchange of
signal for attention, and ads that don’t pay their way with some
kind of proof of spend are not worth paying attention to because they
don’t convey useful information about the seller’s beliefs on how
valuable the audience would find the product.
Another possible explanation is that targetable ad media are more
suitable for deception, and that where advertisers bid for space in a
medium, deceptive advertisers will tend to outbid the honest ones.
This seems counterintuitive, since we might suppose that the customer
lifetime value of an honest seller’s newly acquired customer could in
many cases be greater than the profit from a quick score by a deceptive
seller. But targeting doesn’t just match ad impressions with prospective
buyers. When used by a deceptive seller, it can also conceal an ad
impression from potentially costly attention.
For honest direct marketers, the expected profit from reaching a
buyer is positive, and the expected profit from reaching a non-buyer is
zero. But the audience does not just contain buyers and non-buyers.
People can also be divided into enforcers and non-enforcers. Enforcers
can be anything from professional law enforcement people, to someone who
takes apart a bogus product and makes a video about it, to just the
writer of a bad online review. What enforcers have in common is that for
a dishonest seller, the expected profit from reaching an enforcer is
negative.
Some kinds of enforcer can impose costs even without buying. For
example, a reader might send the publisher a screenshot containing a
scam ad and get the advertiser added to an
advertiser exclusion list. Other kinds of enforcer might only take
action if they buy the product and find it to be a scam. A deceptive
advertiser might incur costs when their ad is shown to either kind of
enforcer.
For an honest advertiser, the expected profit from a single
impression is:
probability of reaching a buyer × expected profit per
sale
For the dishonest advertiser, the expected profit is:
probability of reaching a buyer × expected profit per sale −
probability of reaching an enforcer × expected loss per
enforcer
The expected loss per enforcer is typically high compared to the
profit per sale. For example, a small number of contacts with review
writers might require a seller to re-launch under a new name. In an ad
impression market with both honest and deceptive sellers, where sellers
can choose which impressions to bid on, an ad impression that a
deceptive seller believes is unlikely to reach an enforcer has extra
value to that deceptive advertiser but not to an honest advertiser.
Deceptive sellers will tend to outbid honest ones for certain
impressions.
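Plugging in some made-up numbers makes the outbidding effect concrete; none of these values come from a real market.

```javascript
// Made-up numbers: 1% of impressions reach a buyer, profit per sale is
// 10 for honest sellers and 12 for deceptive ones, and an enforcer
// contact costs 200.
const pBuyer = 0.01;
const honestValue = pBuyer * 10; // 0.10 per impression

// Untargeted, with 2% of impressions reaching an enforcer, deception
// loses money:
const deceptiveBroad = pBuyer * 12 - 0.02 * 200; // -3.88

// Targeted to dodge enforcers (0.005% reach one), the deceptive seller
// can now outbid the honest one:
const deceptiveTargeted = pBuyer * 12 - 0.00005 * 200; // 0.11 > 0.10
```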
A member of the audience might be able to see targeting criteria, but
not the advertiser’s internal weighting of targeting criteria. (For
example, a targeted ad platform might reveal to you that you are being
targeted for an ad because your computer is running the latest release
of the OS. What they won’t tell you is that the seller is bidding on
impressions to your OS version because they’re selling a tainted
nutritional supplement, and the lead testing department at the Ministry
of Health is still on the old OS version.)
So, some ad impressions will tend to be purchased by deceptive
sellers, but a low-information member of the audience can’t tell which
impressions those are. Is this an ad from an honest seller that might be
reaching both me and enforcers, or is this an ad from a dishonest seller
targeted to reach me but not enforcers? When you read a magazine that
reaches a community of practice of which you’re a member, you can be
confident that product reviewers and editors are seeing the same ads you
are. A web ad could be targeted to avoid experienced and
better-connected members of the community of practice.
[I]f the ad could have been targeted to just the subset of consumers
most receptive to its message, they might not have raised any
complaints. As a result, the harmful message would have gone
unchallenged, missing an opportunity for the regulator to update their
guidelines in keeping with current social norms.
One possible explanation for the Peak Advertising effect is the
interaction between deceptive sellers discovering how to use a new ad
medium’s targeting capabilities to avoid enforcers, and the audience
discovering the fraction of deceptive sellers.
Ban
Targeted Advertising by David Dayen in The New
Republic. (I’m not so much interested in whether or not targeted
advertising should be banned as I am in the reasoning behind why people
choose to protect themselves from it. The story of matching the exact
right buyer to the exact right product is much less compelling for most
purchase decisions than the buyer’s story of finding an adequate product
and avoiding deceptive sellers.)
Post-creepy web ad sightings: What’s next for web
advertising after browser privacy improvements and regulatory changes
make conventional adtech harder and harder?
The answer is probably something similar to what’s already starting
to pop up on niche sites. Here’s a list of ad platforms that work more
like print, less like spam: list of
post-creepy web ad systems. Comments and suggestions welcome (mail
me, or do a GitHub pull request from the link at the bottom.)
Good question on Twitter, but one that might take more than (what is
it now, 280 characters?) to answer.
Sir, why do you pay so much attention on internet advertising? I have
hardly read your tweet that isn't related to internet advertising. I
used Privacy Badger for some time last year. It's useful but a little
heavy😅
Why do I pay attention to Internet advertising? Why not just block it
and forget about it? By now, web ad revenue per user is so small that it
only makes sense if you’re running a platform with billions of users, so
sites are busy figuring out other ways to get paid anyway.
To the generation that never had a print magazine subscription,
advertising is just a subset of “creepy shit on the Internet.” Who wants
to do that for a living? According
to Charlotte Rogers at Marketing Week, the lack of information
out there explaining the diverse opportunities of a career in marketing
puts the industry at a distinct disadvantage in the minds of young
people. Marketing also has to contend with a perception problem among
the younger generation that it is intrinsically linked with advertising,
which Generation Z notoriously either distrust or dislike.
The answer is that I’m interested in Internet advertising for two
reasons.
First, because I’m a Kurt Vonnegut fan and have worked for a
magazine. Some kinds of advertising can have positive externalities.
Vonnegut was able to quit his job at a car dealership, and write full
time, because advertising paid for original fiction in
Collier’s magazine. How did advertising lose its ability to
pay for news and cultural works? Can
advertising reclaim that ability?
Second, because most of the economic role of advertising is in an
area that Internet advertising hasn’t been able to get a piece of. While
Internet advertising plays a game of haha, look what I tricked you
into clicking on for chump change, the real money is in
signal-carrying advertising that helps build brand reputation. Is it
possible to make Internet advertising into a medium that can get a piece
of the action?
Maybe make that three reasons. As long as Internet advertising fails
to pull its weight in either supporting news and cultural
works or helping to send a credible economic signal for
brands then the scams, malware and mental manipulation will
only continue. More: World’s
last web advertising optimist tells all!
We know that advertising on the web has reached a low point of fraud,
security risks, and lack of brand safety. And it’s not making much money
for publishers anyway. So a lot of people are talking about how to fix
it, by building a new user data sharing system, in which individuals are
in control of which data they choose to reveal to which companies.
Unlike today’s surveillance marketing, people wouldn’t be targeted
for advertising based on data that someone figures out about them and
that they might not choose to share.
A big win here will be that the new system would tend to lower the
ROI on creepy marketing investments that have harmful side effects such
as identity theft and facilitation of state-sponsored misinformation,
and increase the ROI for funding ad-supported sites that people trust
and choose to share personal information with.
A user-permissioned data sharing system is an excellent goal with the
potential to help clean up a lot of the Internet’s problems. But I have
to be realistic about it. Adam Smith once wrote,
The pride of man makes him love to domineer, and nothing mortifies him
so much as to be obliged to condescend to persuade his inferiors.
So the big question is still:
Why would buyers of user data choose to deal with users (or
publishers who hold data with the user’s permission) when they can just
take the data from users, using existing surveillance marketing
firms?
Some possible answers.
GDPR? Unfortunately, regulatory capture is still
a thing even in Europe. Sometimes I wish that American privacy nerds
would quit pretending that Europe is ruled by Galadriel or
something.
brand safety problems? Maybe a little around the
edges when a particularly bad video gets super viral. But platforms and
adtech can easily hide brand-unsafe “dark” material from marketers, who
can even spend time on Youtube and Facebook without ever developing a
clue about how brand-unsafe they are for regular people. Even as
news-gatherers get better at finding the worst stuff, platforms will
always make hiding brand-unsafe content a high priority.
fraud concerns? Now we’re getting somewhere.
Fraud hackers are good at making realistic user data. Even
“people-based” platforms mysteriously have more users in desirable
geography/demography combinations than are actually there according to
the census data. So, where can user-permissioned data be a fraud
solution?
signaling? The brand equity math must be out
there somewhere, but it’s nowhere near as widely known as the direct
response math that backs up the creepy stuff. Maybe some researcher at
one of the big brand advertisers developed the math internally in the
1980s but it got shredded when the person retired. Big possible future
win for the right behavioral economist at the right agency, but not in
the short term.
improvements in client-side privacy? Another
good one. Email spam filtering went from obscure nerdery to mainstream
checklist feature quickly—because email services competed on it. Right
now the web browser is a generic product, and browser makers need to
differentiate. One promising angle is for the browser to help build a
feeling of safety in the user by reducing user-perceived creepiness, and
the browser’s need to compete on this is aligned with the interests of
trustworthy sites and with user-permissioned data sharing.
With the rise of fake news and revelations about how the Russians
used social platforms to influence both the US election and EU
referendum, the need for change is pressing, both for the platforms and
for the advertisers that support them.
For the call to action to work, Unilever really needs other brands to
rally round but these have so far been few and far between.
Other brands? Why?
If brands are worth anything, they can at least help people tell one
product apart from another.
Saying that other brands need to participate in saving Unilever’s
brands from the three-ring shitshow of brand-unsafe advertising is like
saying that Volkswagen really needs other brands to get into simple
layouts and natural-sounding copy just because Volkswagen’s agency did.
Not everybody has to make the same stuff and sell it the same way.
Brands being different from each other is a good thing. (Right?)
Sometimes a problem on the Internet isn’t a “let’s all work together”
kind of problem. Sometimes it’s an opportunity for one brand to get out
ahead of another.
What if every brand in a category kept on playing in the trash fire
except one?
(I work for Mozilla. None of this is secret. None of this is
Mozilla policy. Not speaking for Mozilla here.)
A big objection to tracking protection is the idea that the tracker
will always get through. Some people suggest that as browsers give users
more ability to control how their personal information gets leaked
across sites, things won’t get better for users, because third-party
tracking will just keep up. On this view, today’s easy-to-block
third-party cookies will be replaced by techniques such as passive
fingerprinting where it’s hard to tell if the browser is succeeding at
protecting the user or not, and users will be stuck in the same place
they are now, or worse.
I doubt this is the case because we’re playing a more complex game
than just trackers vs. users. The game has at least five sides, and some
of the fastest-moving players with the best understanding of the game
are the adfraud hackers. Right now adfraud is losing in some areas where
they had been winning, and the resulting shift in adfraud is likely to
shift the risks and rewards of tracking techniques.
Data center adfraud
Fraudbots, running in data centers, visit legit sites (with
third-party ads and trackers) to pick up a realistic set of third-party
cookies to make them look like high-value users. Then the bots visit
dedicated fraudulent “cash out” sites (whose operators have the same
third-party ads and trackers) to generate valuable ad impressions for
those sites. If you wonder why so many sites made a big deal out of
“pivot to video” but can’t remember watching a video ad, this is why.
Fraudbots are patient enough to get profiled as, say, a car buyer, and
watch those big-money ads. And the money is good enough to motivate
fraud hackers to make good bots, usually based on real browser code.
When a fraudbot network gets caught and blocked from high-value ads, it
gets recycled for lower and lower value forms of advertising. By the
time you see traffic for sale on fraud boards, those bots are probably
only getting past just enough third-party anti-fraud services to be
worth running.
This version of adfraud has minimal impact on real users. Real users
don’t go to fraud sites, and fraudbots do their thing in data centers
(doesn’t everyone do their Christmas shopping while
chilling out in the cold aisle at an Amazon AWS data center? Seems legit
to me) and don’t touch users’ systems. The companies that pay for
it are legit publishers, who not only have to serve pages to
fraudbots—remember, a bot needs to visit enough legit sites to look like
a real user—but also end up competing with adfraud for ad revenue.
Adfraud has only really been a problem for legit publishers. The adtech
business is fine with it, since they make more money from fraud than the
fraud hackers do, and the advertisers are fine with it because fraud is
priced in, so they pay the fraud-adjusted price even for real
impressions.
What’s new for adfraud
So what’s changing? More fraudbots in data centers are getting
caught, just because the adtech firms have mostly been shamed into
filtering out the embarrassingly obvious traffic from IP addresses that
everyone can tell probably don’t have a human user on them. So where is
fraud going now? More fraud is likely to move to a place where a bot can
look more realistic but probably not stay up as long—your computer or
mobile device. Expect adfraud concealed within web pages, as a payload
for malware, and of course in lots and lots of cheesy native mobile
apps. (The Google Play Store has an ongoing problem
with adfraud, which is content
marketing gold for Check Point Software, if you like “shitty app did
WHAT?” stories.) Adfraud makes way more money than cryptocurrency
mining, using less CPU and battery.
So the bad news is that you’re going to have to reformat your uncle’s
computer a lot this year, because more client-side fraud is coming. Data
center IPs don’t get by the ad networks as well as they once did, so
adfraud is getting personal. The good news is, hey, you know all that
big, scary passive fingerprinting that’s supposed to become the
harder-to-beat replacement for the third-party cookie? Client-side fraud
has to beat it in order to get paid, so they’ll beat it. As a bonus,
client-side bots are way better at attribution fraud (where a fraudulent
ad gets credit for a real sale) than data center bots.
Advertisers have two possible responses to adfraud: either try to
out-hack it, or join the “flight to quality” and cut back on trying to
follow big-money users to low-reputation sites in the first place.
Hard-to-detect client-side bots, by making creepy fingerprinting
techniques less trustworthy, tend to increase the uncertainty of the
hacking option and make flight to quality relatively more
attractive.
What if I told you that there was an Internet ad technology that…
can reach the same user on mobile and desktop
uses open-standard persistent identifiers for users
can connect users to their purchase history
reaches the users that the advertiser chooses, at the time the
advertiser chooses
and doesn’t depend on the Google/Facebook duopoly?
Don’t go looking for it on the Lumascape.
I’m describing email spam.
Every feature that adtech is bragging on, or working toward? Email
spam had it in the 1990s.
So why didn’t brand advertisers jump all over spam? Why did they
mostly leave it to low-reputation brands and scammers?
To be honest, it probably wasn’t a decision decision in most
cases, just corporate sloth. But staying away from spam was the right
answer. In the email inbox, spam from a high-reputation brand doesn’t
look any different from spam that any fly-by-night operation can send.
All spammers can do the same stuff:
They can sell to people…for a fraction of what marketing used to
cost. And they can collect data on these consumers, track what they buy,
what they love and hate about the experience, and market to them
directly much more effectively.
It’s the direct consumer relationships, and the use of consumer data,
that is completely game-changing for the marketing world. And most big
marketers, such as Procter & Gamble and Unilever, are not ready for
this new reality, the IAB says.
But of course they’re ready. The difference is that those established
brand advertisers aren’t any more ready than some guy who
watched a YouTube video series on “growth hacking” and is ready to start
buying targeted ads and drop-shipping.
The “new reality,” the targeted advertising business that the IAB
wants brands to join them in, is a place where you win based not on how
much the audience trusts you, but on how well you can out-hack the
competition. And like any information space organized by hacking skill,
it’s a hellscape of deceptive crap. Read The
Strange Brands in Your Instagram Feed by Alexis C. Madrigal.
Some Instagram retailers are legit brands with employees and
products. Others are simply middlemen for Chinese goods, built in
bedrooms, and launched with no capital or inventory. All of them have
been pulled into existence by the power of Instagram and Facebook ads
combined with a suite of e-commerce tools based around Shopify.
Of course, not every brand that buys a social media ad or other
targeted ad is crap.
But a social media ad is useless for telling crap brands from
non-crap ones. It doesn’t carry economic signal. There’s
no such thing as a free watch. (PDF)
Many billions of pounds of advertising expenditure have been shifted
from conventional media, most notably newspapers, and moved into digital
media in a quest for targeted efficiency. If advertising simply works by
the conveyance of messages, this would be a sensible thing to do.
However, it is beginning to become apparent that not all, perhaps not
even most, advertising works this way. It seems that a large part of
advertising creates trust and conviction in its audience precisely
because it is perceived to be costly.
If anyone knows that any seller can watch a few YouTube videos and do
a certain activity, does that activity really help the audience
distinguish a high-reputation seller from a low-reputation one?
And how does it affect a legit brand when its ads show up on the same
medium with all the crappy ones? (Twitter has a
solution that keeps its ads saleable: just don’t show any ads to
important people. I’m surprised they can get away with this, but given
the mix of rip-off and real brand ads I keep seeing there, it seems to
be working.)
Extremists and state-sponsored misinformation campaigns aren’t
“abusing” targeted advertising. They’re just taking advantage of a
system optimized for deception and using it normally.
Now, I don’t want to blame targeted advertising for all of the
problems of brand equity. When you put high-fructose corn syrup in your
product, brand equity suffers. When you outsource or de-skill the
customer support function, brand equity suffers. All the half-ass “looks
good this quarter” stuff that established brands are doing is bad for
brand equity. It just turns out that the kinds of advertising that you
can do on the Internet today are all half-ass “looks good this
quarter” stuff. If you want to send a credible economic signal, buy TV
time or put a flagship store on some expensive real estate. The
Internet’s got nothing for you.
Failure to create signal-carrying ad units should be more of a
concern for people who want to earn ad money on the Internet than it is.
See Bob
Hoffman’s “refrigerator test.” All that work that went into building
the most complicated ad medium ever? It went into building an ad medium
optimized for low-reputation advertisers. And that kind of ad medium
tends to see rates go down over time. It doesn’t hold value.
And the medium can’t gain value until the users trust it, which means
they have to trust the browser. In-browser tracking protection is going
to have to enable the legit web advertising industry the same way that
spam filters enable the legit email newsletter industry.
I am sure Google, Facebook and lesser purveyors of advertising online
will find less icky ways to stay in business; but it is becoming clear
that next May 25, when the GDPR goes into full effect, will be an
extinction-level event for tracking-based advertising (aka adtech) as a
business model.
Big impact? Not so fast. There’s also a “business as usual” story,
and that one, you’ll find at Digital Advertising
Consent.
Our complex ecosystem of companies must cooperate more closely than
ever before to meet the transparency and consent requirements of
European data protection law.
Personally, I’m not buying either one of these GDPR visions. Because,
just for fun and also because reasons, I run my own mail server.
And every little decision I have to make about how to configure the
damn thing is based on playing a game with email spammers. Regulation is
a part of my complete breakfast, but it’s not the whole story.
The government doesn’t give you freedom from spam. You have to take
it for yourself, one filtering rule at a time. Or, do what most people
do, and find a company that does it for you, but it has to be a company
that you trust with your information.
A mail sender’s decision to comply, or not comply, with some
regulation is a bit of information. That feeds into the software that
makes the final decision: inbox, spam folder, or reject. When a spam
message complies with the regulations of some country, my mail server
doesn’t say, “Oh, wow, compliant! I can skip all the other checks and
send this one straight to the inbox!” It uses the regulation compliance
along with other information to make that decision.
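In toy form, the decision looks something like this. The weights and field names are made up; the point is just that compliance is one input to a score, never a free pass.

```javascript
// Toy version of the routing decision. All weights are made up.
function route(msg) {
  let score = 0;
  if (msg.senderOnAllowlist) score -= 5;
  if (msg.failsSPF) score += 3;
  if (msg.relayOnDNSBL) score += 4;
  if (msg.hasComplianceNotices) score -= 0.5; // worth a little, not everything
  if (score < 0) return "inbox";
  if (score < 5) return "spam folder";
  return "reject";
}
```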
So whatever extra consent forms that surveillance marketers are
required to send by GDPR? They’re not the final decision on What The
User Must See. They’re just data, coming over the network.
Some of that data will be interpreted to mean that this request is an
obvious mismatch with how the user chooses to share their info. The user
might not even see those consent forms, or the browser might pop up a
notification:
4 requests to do creepy shit, that’s obviously against your
preferences, already denied. Isn’t this the best browser ever?
(No, I don’t write copy for browser notifications. But you get the
idea.)
Browsers that implement tracking protection might end up with a
feature where they detect requests for permission to do things that the
user has already said no to—by turning on tracking protection in the
first place—and auto-deny them.
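In sketch form, that auto-deny feature might look like this. None of these names are a real browser API; this is purely a hypothetical illustration of the idea.

```javascript
// Hypothetical browser-internals sketch: tracking protection as a
// standing answer to a whole class of consent requests.
function handleConsentRequest(request, settings) {
  if (settings.trackingProtectionOn && request.purpose === "cross-site-tracking") {
    request.deny();
    notifyUser(request.site + " asked to track you across sites; " +
               "denied automatically, per your settings.");
    return;
  }
  showConsentDialog(request); // anything else still goes to the user
}
```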
Legit email senders had to learn “deliverability,” the art and
science of making legit mail look legit so that it can get past email
spam filters. Legit advertisers will have to learn that users aren’t
identical and spherical, users choose tools to implement their data
sharing preferences, and that regulatory compliance is only part of the
job.
Let’s run a technical challenge on the Internet. Team A vs. Team
B.
Team A gets to work where they want, when they want. Team B has to
work in an open-plan office, with people walking behind them, talking on
the phone, doing all that annoying office stuff.
Members of Team A get paid for successful work within weeks or
months. Members of Team B get a base salary that they have to spend on
rent in an expensive location, but just might get paid extra for
successful work in four years.
Team A will let anyone try to join, and those who aren’t successful
have to drop out quickly. Team B will only let members who are a “good
cultural fit” join, and it takes a while to get rid of an unsuccessful
member.
Team A can deploy unproven work for real-world testing, using
infrastructure that they get for free on the Internet. Team B can only
deploy their work when production-ready, on infrastructure they have to
pay for.
If Team A breaks the rules, the penalty is that they have to spend a
little money to register new domain names. If Team B breaks the rules,
they risk lengthy regulatory and/or legal consequences.
Team A scores a win any time they can beat whoever is the weakest
member of Team B at that time. Team B can only score a win when they can
consistently defeat all of the most active members of Team A.
Team A is adfraud.
Why is so much marketing money being bet on Team B?
The IAB roadblocked the W3C Do Not Track initiative in 2012 that was
led by a cross functional group that most importantly included the
browser makers. In hindsight this was the only real chance for the
industry to solve consumer needs around data privacy and advertising
technology. The IAB wanted self-regulation. In the end, DNT died as the
IAB hoped.
As third-party tracking made the ad experience crappier and crappier,
browser makers tried to play nice. Browser makers tried to work in the
open and build consensus.
That didn’t work, which shouldn’t be a surprise. Imagine if email
providers had decided to build consensus with spammers about spam
filtering rules. The spammers would have been all like, “It
replaces the principle of consumer choice with an arrogant ‘Hotmail
knows best’ system.” Any sensible email provider would ignore the
spammers but listen to deliverability concerns from senders of legit
opt-in newsletters. Spammers depend on sneaking around the user’s intent
to get their stuff through, so email providers that want to get and keep
users should stay on the user’s side. Fortunately for legit mail senders
and recipients, that’s what happened.
And now Google is doing their own thing. Some positive parts about
it, but by focusing on filtering annoying types of ad units they’re
closer to the Adblock Plus “Acceptable Ads” racket than to a real
solution. So it’s better to let Ben
Williams at Adblock Plus explain that one. I still don’t get how it
is that so many otherwise capable people come up with “let’s filter
superficial annoyances and not fundamental issues” and “let’s shake down
legit publishers for cash” as solutions to the web advertising problem,
though. Especially when $16
billion in adfraud is just sitting there. It’s almost as if the
Lumascape doesn’t care about fraud because it’s priced
in so it comes out of the publisher’s share anyway.
The web advertising problem looks big, but I want to think positive
about it.
billions of web users
visiting hundreds of web sites
with tens of third-party trackers per site.
That’s trillions of opportunities for tiny victories against
adfraud.
Right now most browsers and most fraudbots are hard to tell apart.
Both maintain a single “cookie jar” across trusted and untrusted sites,
and both are subject to fingerprinting.
For fraudbots, cross-site trackability is a feature. A fraudbot can
only produce valuable ad impressions on a fraud site if it is somehow
trackable from a legit site.
For browsers, cross-site trackability is a bug, for two reasons.
Leaking activity from one context to another violates widely held
user norms.
Because users enjoy ad-supported content, it is in the interest
of users to reduce the fraction of ad budgets that go to fraud and
intermediaries.
Browsers don’t have to solve the whole web advertising problem to
make a meaningful difference. As soon as a trustworthy site’s real users
look different enough from fraudbots, because fraudbots make themselves
more trackable than users running tracking-protected browsers do, then
low-reputation and fraud sites claiming to offer the same audience will
have a harder and harder time trying to sell impressions to agencies
that can see it’s not the same people.
Of course, the browser market share numbers will still over-represent
any undetected fraudbots and under-represent the “conscious chooser”
users who choose to turn on extra tracking protection options. But
that’s an opportunity for creative ad agencies that can buy underpriced
post-creepy ad impressions and stay away from overvalued or worthless
bot impressions. I expect that data on who has legit users—made more
accurate by including tracking protection measurements—will be
proprietary to certain agencies and brands that are going after customer
segments with high tracking protection adoption, at least for a
while.
There’s enough bullshit on the Internet already, but I’m afraid I’m
going to quote some more. This time from Ilyse
Liffreing at IBM.
The reality is none of us can say with certainty that anywhere in the
world, we are [brand] safe. Look what just happened with YouTube. They
are working on fixing it, but even Facebook and Google themselves have
said there’s not much they can do about it. I mean, it’s hard. It’s not
black and white. We are putting a lot of money in it, and pull back on
channels where we have concerns. We’ve had good talks with the YouTube
teams.
Bullshit.
One important part of this decision is black and white.
If Nazis are better at “programmatic” than the resting-and-vesting
chill bros trying to deal with brand safety issues at the programmatic
ad firms (and, face it, Nazis kick ass at programmatic), then the choice
to spend ad money in a
we’re-kind-of-not-sure-if-this-goes-to-Nazis-or-not way is a choice that
puts your brand on the wrong side of a black and white line.
There are plenty of Nazi-free places for brands to run ads. They
might not be the cheapest. But I know which side of the line I buy
from.
Remove all the tracking widgets from your site. That Facebook “Like”
button only serves to exfiltrate your valuable data to an entity that
doesn’t have your best interests at heart. If you’ve got a valuable
audience, why would you want to help the ad tech industry which promises
“I can find the same and bigger audience over here for $2 CPM, so don’t
buy from the publisher?” Sticking your own head in the noose is never a
good idea.
That advice makes sense for the Facebook “like button.” That button
is just a data shoplifter. The others, though? All those extra trackers
come in as side effects of ad deals, and they’re likely to be
contractually required to make ads on the site saleable.
Yes, those trackers feed bots and data leakage, and yes, they’re even
terrible at fighting adfraud. Augustine Fou points out that Fraud
filters don’t work. “In some cases it’s worse when filter is
on.”
So in an ideal world you would be able to pull all the third-party
trackers, but as far as day-to-day operations go, user
tracking is a Chesterton’s Fence problem. What happens if a legit
site unilaterally takes down the third-party trackers? All the targeted
ad impressions that would have given that site a (small) payment end up
going to bots.
So what can a site do? Understand that the real fix has to
happen on the browser end, and nudge the users to either make
their browsers less data-leaky, or switch to browsers that are
leakage-resistant out of the box.
Start A/B testing some notifications to remind users to turn on
tracking protection.
Can you get users who are already choosing “Do Not Track” to turn
on real protection if you inform them that sites ignore their DNT
choice?
If a user is running an ad blocker with a paid
whitelisting scheme, can you inform them about it to get them to
switch to a better tool, or at least add a second layer of protection
that limits the damage that paid whitelisting can do?
When users visit privacy pages or opt-out of a marketing program,
are they also willing to check their browser privacy settings?
Every site’s audience is different. It’s hard to know in advance how
users will respond to different calls to action to turn up their privacy
and create a win-win for legit sites and legit brands. We do know that
users are concerned and confused about web advertising, and the good
news is that the JavaScript
needed to collect data and administer nudges is as easy to add as
yet another tracker.
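As a concrete starting point, the Do Not Track nudge from the list above might be as small as this. navigator.doNotTrack and localStorage are real browser APIs; showBanner() and the help URL are placeholders for whatever the site actually uses.

```javascript
// Nudge sketch: users who already send DNT get told that sites ignore
// it, and pointed at real protection. showBanner() and the URL are
// placeholders, and the copy is only a first draft to A/B test.
if (navigator.doNotTrack === "1" && !localStorage.getItem("dntNudgeShown")) {
  localStorage.setItem("dntNudgeShown", String(Date.now()));
  showBanner(
    "You've turned on Do Not Track, but most sites ignore it. " +
      "Want to turn on your browser's tracking protection instead?",
    "https://example.com/tracking-protection-howto"
  );
}
```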
If you’re responsible for a brand and somewhere in the mysterious
tubes of adtech your money is finding its way to Nazis, what is the
right course of action?
One wrong answer is to write a “please help me” letter to a company
that will just ignore it. That’s just admitting to knowingly sending
money to Nazis, which is clearly wrong.
Here’s another wrong idea, from the upcoming IAB
Annual Leadership Meeting session on “brand safety” (which is the
nice, sanitary professional-sounding term for “trying not to sponsor
Nazis, but not too hard.”)
Threats to brand safety arise internally and externally, in your
control and out of your control—and the stakes have never been higher.
Learn how to minimize brand safety risks and maximize odds of survival
when your brand takes a hit (spoiler alert: overreacting is as bad as
underreacting). Best Buy and Starcom share best practices based on
real-world encounters with brand safety issues.
Really, people? Overreacting is as bad as underreacting? The
IAB wants you to come to a deluxe conference about how it’s fine to send
a few bucks to Nazis here and there as long as it keeps their whole
adtech/adfraud gravy train running on time.
I disagree. If Best Buy is fine with (indirectly of course) paying
the occasional Nazi so that the IAB companies can keep sending them
valuable eyeballs from the cheapest possible sites, then I can shop
elsewhere.
Any nationalist extremist movement has its obvious supporters, who
wear the outfits and get the tattoos and go march in the streets and all
that stuff, and also the quiet supporters, who come up with the money
and make nice with the powers that be. The supporters who can keep it
deniable.
Can I, as a potential customer from the outside, tell the difference
between quiet Nazi supporters and people who are just bad at online
advertising and end up supporting Nazis by mistake? Of course not. Do I
care? Of course not. If you’re not willing to put the basic “don’t pay
Nazis to do Nazi stuff” rule ahead of a few ad clicks, I don’t want your
brand anyway. And I’ll make sure to install and use the tracking
protection tools that help keep my good data away from
bad sites.
In 2018, we’ll see the rapid decline of “place-ism,” the
discrimination against people who aren’t in a central office. Technology
is making it easier not just to communicate with distant colleagues
about work, but to have the personal interactions with them that are the
foundation of trust, teamwork, and friendship.
Really, “place-ism” only works if you can afford to overpay the
workers who are themselves overpaying for housing. And management can
only afford to overpay the workers by giving in to the temptations of
rent-seeking and deception. So the landlord makes the nerd pay too much,
the manager has to pay the nerd too much, and you end up with, like the
man said, [“debts that no honest man can pay”](https://en.wikipedia.org/wiki/Atlantic_City_(song)).
Open source business news: Docker, Inc is Dead.
Easy to see this as a run-of-the-mill open source business failure
story. But at another level, it’s the story of how the existing open
source incumbents used
open practices to avoid having to bid against each other for an
overfunded startup.
Next time I hear someone from a social site talking about how much
they’re doing about extremists and misinformation and such, I have to
remember to ask: have you adjusted your revenue targets for political
advertising down in order to reflect the bad shit you’re not doing any
more? How
Facebook’s Political Unit Enables the Dark Art of Digital
Propaganda
Or are you just encouraging the “dark social” users to hide it
better?
I love “nopoly controls entire industry so there is no point
in it any more” stories: The
Digital Advertising Duopoly Good news on advertising. The
Millennials are burned out on advertising—most of what
they’re exposed to now is just another variant of “creepy annoying
shit on the Internet”—but the generation after the Millennials
are going to have hella mega opportunities building the next Creative
Revolution.
Bitcoin to the moooon: The futures market is
starting up, so here comes a bunch more day trader action. More
important, think about all the bucket shops (I even saw an “invest in
Bitcoin without owning Bitcoin” ad on public transit in London), legit
financial firms, Libertarian true believers, and coins lost forever
because of human error. Central bankers had better
keep an eye on Bitcoin, though. Last recession we saw that printing
money doesn’t work as well as it used to, because it ends up in the
hands of rich people who, instead of priming economic pumps with it,
just drive up the prices of assets. I would predict “Entire Round of
Quantitative Easing Gets Invested in Bitcoin Without Creating a Single
New Job” but I’m saving that one for 2019. Central banks will need to
innovate. Federal Reserve car crushers? Relieve medical debt by letting
the UK operate NHS clinics at their consulates in the USA, and we trade
them US green cards for visas that allow US citizens to get treated
there? And—this is a brilliant quality of Bitcoin that I
recognized too late—there is no bad news that could credibly hurt the
value of a purely speculative asset.
The lesson for regular people here is not so much what to do with
Bitcoin, but remember to keep putting some well-considered time into
actions that you predict have unlikely but large and favorable outcomes.
Must remember to do more of this.
High-profile Bitcoin kidnapping in the USA ends in
tragedy: Kidnappers underestimate the amount of Bitcoin
actually available to change hands, ask for more than the victim’s
family (or fans? a crowdsourced kidnapping of a celebrity is now a
possibility) can raise in time. Huge news but not big enough to slow
down something that the finance scene has already committed to.
Tech industry reputation problems hit open source.
California Internet douchebags talk like a positive social movement but
act like East Coast vampire squid—and people are finally
not so much letting them define the terms of the conversation. The
real Internet economy is moving to a three-class system: plutocrats,
well-paid brogrammers with Aeron chairs, free snacks and good health
insurance, and everyone else in the algorithmically-managed precariat.
So far, people are more concerned about the big social and surveillance
marketing companies, but open source has some of the same issues. Just
as it was widely considered silly for people to call Facebook users “the
Facebook community” in 2017, some of the “community” talk about open
source will be questioned in 2018. Who’s working for who, and who’s
vulnerable to the risks of doing work that someone else extracts the
value of? College athletes are ahead of the open source scene on this
one.
Adfraud becomes a significant problem for end users:
Powerful botnets in data centers drove the pivot to
video. Now that video adfraud is well-known, more of the fraud
hackers will move to attribution fraud. This ties in to adtech
consolidation, too. Google is better at beating simple to midrange fraud
than the rest of the Lumascape, so the steady progress towards a
two-logo Lumascape means fewer opportunities for bots in data
centers.
Attribution fraud is nastier than servers-talking-to-servers fraud,
since it usually depends on having fraudulent and legit client software
on the same system—legit to be used for a human purchase, fraudulent to
“serve the ad” that takes credit for it. Unlike botnets that can run in
data centers, attribution fraud comes home with you. Yeech. Browsers and
privacy tools will need to level up from blocking relatively simple
Lumascape trackers to blocking cleverer, more aggressive attribution
fraud scripts.
Wannabe fascists keep control of the US Congress, because of
your Marketing budget: “Dark” social campaigns (both ads and
fake “organic” activity) are still a thing. In the USA, voter
suppression and gerrymandering have been cleverly enough done that
social manipulation can still make a difference, and it will.
In the long run, dark social will get filtered out by habits,
technology, norms, and regulation—like junk fax and email spam before
it—but we don’t have a “long run” between now and November 2018. The
only people who could make an impact on dark social now are the legit
advertisers who don’t want their brands associated with this stuff. And
right now the expectations to advertise on the major social sites are
stronger than anybody’s ability to get an edgy, controversial “let’s not
SPONSOR ACTUAL F—–G NAZIS” plan through the 2018 marketing budget
process.
Yes, the idea of not spending marketing money on supporting
nationalist extremist forums is new and different now. What a year.
Short puzzle relevant to some diversity and inclusion threads that
encourage people to share salary info. (I should tag this as “citation
needed” because I don’t remember where I heard it.)
Alice, Bob, Carlos, and Dave all want to know the average salary of
the four, but none wants to reveal their individual salary. How can the
four of them work together to determine the average? Answer below.
Answer
Alice generates a random number, adds it to her salary, and gives the
sum to Bob.
Bob adds his salary and gives the sum to Carlos.
Carlos adds his salary and gives the sum to Dave.
Dave adds his salary and gives the sum to Alice.
Alice subtracts her original random number, divides by the number of
participants, and announces the average. No participant had to share
their real salary, but everyone now knows if they are paid above or
below the average for the group.
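A quick runnable sketch of the round, with made-up salaries:

```typescript
// The four salaries are made up; only the protocol matters.
const salaries = { alice: 95_000, bob: 80_000, carlos: 105_000, dave: 70_000 };

// Alice masks her salary with a random number only she knows.
const mask = Math.floor(Math.random() * 1_000_000);

let sum = salaries.alice + mask; // Alice hands this to Bob
sum += salaries.bob;             // Bob hands it to Carlos
sum += salaries.carlos;          // Carlos hands it to Dave
sum += salaries.dave;            // Dave hands it back to Alice

// Alice removes her mask and announces the average.
const average = (sum - mask) / 4;
console.log(`Average salary: ${average}`);
```

One caveat worth knowing: the protocol only resists curiosity, not collusion. If Bob and Dave compare notes, the difference between what Dave received and what Bob passed along is exactly Carlos’s salary.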
Stuff the Internet needs: home fiber connections,
symmetrical, flat rate, on neutral terms.
Stuff the Internet is going nuts over:
cryptocurrencies.
Big problem with building fiber to the home:
capital.
Big problem with cryptocurrencies: stability.
Two problems, one solution? Hard to make any kind of currency useful
without something stable, with evidence-based value, to tie its value
to. Fiat currencies are tied to something of value?
Yes, people have to pay taxes in them. Hard to raise capital for
“dumb pipe” Internet service because it’s just worth about the same
thing, month after month. So what if we could combine the hotness and
capital-attractiveness of cryptocurrencies with the stability and actual
usefulness of fiber?
One quick question for anyone who still isn’t convinced that tracking
protection needs to be a high priority for web browsers in 2018. Web tracking isn’t just about items from your online
shopping cart following you to other sites. Users who are vulnerable to
abusive practices for health or other reasons have tracking protection
needs too.
Screenshot from the American Cancer
Society site, showing 24 web trackers
Who has access to the data from each of the 24 third-party trackers
that appear on the American Cancer Society’s Find Cancer Treatment and
Support page, and for what purposes can they use the data?
Well, in order to help slow down the spread of political speech
enforcement that is apparently stopping all of us cool innovator type
people from saying the Things We Can’t Say, here’s a Git hook to make sure that every
time you blog, you include at least one of the forbidden words.
If you blog without including one of the forbidden words, you’re
obviously internalizing censorship and need more freedom, which you can
maybe get by getting out of California for a while. After all, a lot of
people here seem to think that “innovation” is building more creepy
surveillance as long as you call it “growth hacking” or writing apps to
get members of the precariat to do the stuff that your Mom used to do
for you.
You only have to include one forbidden word every time you commit a
blog entry, not in every file. You only need forbidden words in blog
entries, not in scripts or templates. You can always get around the
forbidden word check with the --no-verify
command-line option.
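I haven’t reproduced the linked hook here; below is a hypothetical sketch of the same idea as a Node script. The word list and the blog/ path are stand-ins, not the linked hook’s actual contents. Compile it (or run it via ts-node) and install it as .git/hooks/pre-commit:

```typescript
// Hypothetical pre-commit hook along the same lines; the word list and
// the blog/ path are stand-ins.
import { execSync } from "node:child_process";

const FORBIDDEN = ["vulnerable", "diversity", "evidence-based"];

// Only staged blog entries count, not scripts or templates.
const staged = execSync("git diff --cached --name-only", { encoding: "utf8" })
  .split("\n")
  .filter((f) => f.startsWith("blog/"));

if (staged.length > 0) {
  const text = staged
    .map((f) => execSync(`git show :"${f}"`, { encoding: "utf8" }))
    .join("\n")
    .toLowerCase();
  if (!FORBIDDEN.some((w) => text.includes(w))) {
    console.error("Commit rejected: include at least one forbidden word.");
    process.exit(1); // bypass with: git commit --no-verify
  }
}
```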
Not much time to blog because work travel, but here is some of the
stuff I would have been linking to if I were writing anything. I plan to
get started again over the holiday break.
For developers: Find an open issue that matches your
skills and interests. Buy a futures contract connected to that issue
that will pay you when the issue is fixed. Work on the issue, in the
open—then decide if you want to hold your contract until maturity, or
sell it at a profit. Report an issue and pay to reward others to fix
it
For users: Create a new issue on the project bug
tracker, or select an existing one. Buy a futures contract on that issue
that will cost you a known amount when the issue is fixed, or pay you to
compensate you if the issue goes unfixed. Reduce your exposure to
software risks by directly signaling the project participants about what
issues are important to you. Invest in futures on an open source
market
Bug futures also open up the possibility of incentivizing other kinds
of work, such as clarifying and translating bug reports, triaging bugs,
writing failing tests, or doing code reviews—and especially arbitrage of
bugs from project to project.
Bug futures are different from open source bounty systems, which have
been repeatedly tried but have so far failed to take off. The big
problem with conventional open source bounty systems is that, as far as
I can tell, they fail to incentivize cooperative work, and in a lot of
situations might incentivize un-cooperative behavior. If I find a bug in
a web application, and offer a bounty to fix it, the fix might require
JavaScript and CSS work. A developer who fixes the JavaScript and gets
stuck on the CSS might choose not to share partial work in order to
contend for the entire bounty. Likewise, the developer who fixes the CSS
part of the bug might get stuck on the JavaScript. Because of how
bounties are structured, if the two wanted to split the bounty they
would need to find, trust, and coordinate with each other. Meanwhile, if
the bug was the subject of a futures contract, the JavaScript developer
could write up a good commit message explaining how their partial work
made progress toward a fix, and offer to sell their side of the
contract. A CSS developer could take on the rest of the work by buying
out that position.
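To make the contrast concrete, here is a toy model of a two-sided contract with transferable positions. All the names and numbers are made up; the hard parts of real market design (matching, escrow, deciding what counts as “fixed”) are not modeled:

```typescript
// Toy contract: a FIXED side that collects if the issue is fixed by
// maturity, an UNFIXED side that collects otherwise. Transferable
// positions are what let partial work be sold on.
type Side = "FIXED" | "UNFIXED";

interface Position {
  holder: string;
  stake: number; // amount this side has escrowed
}

class BugFuture {
  constructor(
    public issueUrl: string,
    public fixed: Position,
    public unfixed: Position,
  ) {}

  // Anyone can buy out a side; the price carries the information about
  // how likely the remaining work is to get done.
  transfer(side: Side, newHolder: string, price: number): void {
    const pos = side === "FIXED" ? this.fixed : this.unfixed;
    console.log(`${newHolder} buys ${side} from ${pos.holder} for ${price}`);
    pos.holder = newHolder;
  }

  // At maturity the whole pot goes to whichever side was right.
  settle(issueWasFixed: boolean): string {
    const pot = this.fixed.stake + this.unfixed.stake;
    const winner = issueWasFixed ? this.fixed : this.unfixed;
    return `${winner.holder} collects ${pot}`;
  }
}

// The JavaScript developer from the example sells partial work by
// selling their FIXED position to a CSS developer.
const f = new BugFuture(
  "https://bugs.example/1234", // hypothetical issue
  { holder: "js-dev", stake: 40 },
  { holder: "reporter", stake: 60 },
);
f.transfer("FIXED", "css-dev", 55);
console.log(f.settle(true)); // css-dev collects 100
```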
Futures trading and risk shifts
But will bug futures tend to shift the risks of software development
away from the “owners” of software (the owners don’t have to be
copyright holders, they could be those who benefit from network effects)
and toward the workers who develop, maintain, and support it?
I don’t know, but I think that the difference between bug futures and
piecework is where you put the brains of the operation. In piecework and
the gig economy, the matching of workers to tasks is done by management,
either manually or in software. Workers can set the rate at which they
work in conventional piecework, or accept and reject tasks offered to
them in the gig economy, but only management can have a view of all
available tasks.
Bug futures operate within a commons-based
peer production environment, though. In an ideal peer production
scene, all participants can see all available tasks, and select the most
rewarding tasks. Somewhere in the economics
literature there is probably a model of task selection in open source
development, and if I knew where to find it I could put an impressive
LaTeX equation right around here. Of course, open source still
has all kinds of barriers that make matching of workers to tasks less
than ideal, but it’s a good goal to keep in mind.
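Since the spot for the impressive equation is right here, a toy version (my assumption, not a citation): if p_{i,t} is the probability that participant i completes task t, r_t is the reward attached to the task, and c_{i,t} is i’s cost of attempting it, ideal self-selection looks like

```latex
% Toy task-selection model, an assumption rather than a citation:
% participant i picks the open task t that maximizes expected payoff.
t_i^{*} = \operatorname*{arg\,max}_{t \in T}
  \left( p_{i,t}\, r_{t} - c_{i,t} \right)
```

The peer production advantage, in these terms, is that each participant gets to evaluate their own p_{i,t} and c_{i,t} instead of having management estimate them.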
If you do bug futures right, they interfere as little as possible
with the peer production advantage—that it enables workers to match
themselves to tasks. And the futures market adds the ability for people
who are knowledgeable about the likelihood of completion of a task,
usually those who can do the task, to profit from that knowledge.
Rather than paying a worker directly for performing a task, bug
futures are about trading on the outcomes of tasks. When participating,
you’re not trading labor for money, you’re trading on information you
hold about the likelihood of successful completion of a task. As in
conventional financial markets, information must be present on the
edges, with the individual participants, in order for them to
participate. If a feature is worth $1000 to me, and someone knows how to
fix it in five minutes, bug futures could facilitate a trade that’s
profitable to both ends. If the market design is done right, then most
of that value gets captured by the endpoints—the user and developer who
know when to make the right trade.
The transaction costs of trading in information tend to be lower than
the transaction costs of trading in labor, for a variety of reasons
which you will probably believe in to different extents depending on
your politics. What if we could replace some direct trading in labor
with trading in the outcomes of that labor by trading information? Lower
transaction costs, more gains from trade, more value created.
Some random notes about open source metrics, related to work on CHAOSS,
where Mozilla is a member and I’m on the
Governing Board.
As far as I can tell, there are three kinds of open source
metrics.
Impact metrics cover how much value the software
creates. Possible good ones include count of projects dependent on this
one, mentions of this project in job postings, books, papers, and
conference talks, and, of course sales of products that bundle this
project.
Contributor reward metrics cover how the software is
a positive experience for the people who contribute to it. Job postings
are a contributor reward metric as well as an impact metric. Contributor
retention metrics and positive results on contributor experience surveys
are some other examples.
But impact metrics and contributor reward metrics tend to be harder
to collect, or slower-moving, than other kinds of metrics, which I’ll
lump together as activity metrics. Activity metrics
include most of the things you see on open source project dashboards,
such as pull request counts, time to respond to bug reports, and many
others. Other activity metrics can be the output of natural language
processing on project discussions. An example of that is FOSS Heartbeat,
which does sentiment analysis, but you could also do other kinds of
metrics based on text.
IMHO, the most interesting questions in the open source metrics area
are all about: how do you predict impact metrics and contributor
reward metrics from activity metrics? Activity metrics are easy
to automate, and make a nice-looking dashboard, but there are many
activity metrics to choose from—so which ones should you look at?
Which activity metrics are correlated to any impact metrics?
Which activity metrics are correlated to any contributor reward
metrics?
Those questions are key to deciding which of the activity metrics to
pay attention to. I’m optimistic that we’ll be seeing some interesting
correlations soon.
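The basic computation is nothing fancy. A sketch, with made-up numbers for one project, using pull requests merged as the activity metric and job postings as the impact metric:

```typescript
// Pearson correlation between monthly samples of an activity metric
// and an impact metric. The sample numbers are made up.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  const mean = (v: number[]) => v.reduce((a, b) => a + b, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

// Activity: pull requests merged per month.
const prsMerged = [14, 18, 22, 19, 30, 34];
// Impact: job postings mentioning the project, same months.
const jobPostings = [2, 2, 3, 3, 5, 6];

console.log(pearson(prsMerged, jobPostings).toFixed(2)); // ≈ 0.98 here
```

The interesting work is not the arithmetic but choosing which activity series to test, and checking whether the correlation holds with a time lag across many projects.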
40 trackers. Not bad, but not especially good either. That purple
box of data leakage—third-party trackers that forced Linux Journal
into an advertising race to the bottom against low-value and fraud
sites—is not so deep as a well, nor so wide as a church door…but it’s
there. A magazine that was a going concern in print tried to make the
move to the web and didn’t survive.
This might get the privacy activists mad at me, but as far as I can
tell it’s still counterproductive to ask a web site you visit to remove
its third-party trackers.
Right now, advertising on the site you’re writing to probably isn’t
saleable without the creepy trackers.
And if the site you’re writing to is a brand site, the marketing
department can’t justify spending money to support content sites without
showing some kind of tracking data. (more on this problem: User
tracking as Chesterton’s Fence) So what can privacy people
productively ask sites for? Some good ones are:
Fix any “turn off your ad blocker” scripts to detect ad blockers
only, and not falsely alert on privacy tools. (A sketch of one way to do
this appears at the end of this post.)
Remove links to the confusing and broken “YourAdChoices”
site. Adtech opt-outs don’t cover all trackers, and are much less
effective than real privacy tools. (I have never had all the opt-outs
work on that site, even from a fresh, pristine browser. Somehow I get
the sense that the adtech firms don’t exactly put their best people on
it.)
Link to the privacy pages for the third parties the site uses. If
the advertising on the site is set up so that this is hard to do, and
users might see a tracker from an unknown domain, say so.
Fix up the privacy page to add links to appropriate privacy tools
based on the user’s browser. Better to have users on privacy tools than
get enrolled in a paid
whitelisting scheme.
If you maintain a privacy tool, offer to do a campaign with the
site. Privacy tool users are high-quality human traffic. Free or
discounted privacy tools might work as a subscription promotion. Where’s
the win-win?
Asking a site to walk away from money with no credible alternative is
probably not going to work. Asking a site to consider next steps to get
out of the current web advertising mess? That might.
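Here’s the sketch promised above: one way to check for an element-hiding ad blocker without false-positiving on tracking protection. Element-hiding blockers hide a “bait” element that matches filter-list class names; tracking protection only blocks third-party requests and leaves the bait alone. The class names here are common filter-list patterns, an assumption rather than a guarantee:

```typescript
// Bait-element check: only element-hiding ad blockers hide the bait;
// tracking protection alone never touches it.
function detectsElementHidingBlocker(): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const bait = document.createElement("div");
    bait.className = "ad ad-banner adsbox";
    bait.style.height = "10px";
    document.body.appendChild(bait);
    // Give the blocker's stylesheets a moment to apply.
    setTimeout(() => {
      const hidden = bait.offsetHeight === 0;
      bait.remove();
      resolve(hidden);
    }, 100);
  });
}

// Only nag actual ad blocker users; a visitor running tracking
// protection alone never sees the "turn off your ad blocker" banner.
detectsElementHidingBlocker().then((blocked) => {
  if (blocked) console.log("element-hiding ad blocker detected");
});
```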
Last time I recommended the
Tracking Protection feature in Firefox 57, coming tomorrow. The fast
browser is even
faster when you block creepy trackers, which are basically untested
combinations of third-party JavaScript.
But what about sites that mistakenly detect Tracking Protection as
“an ad blocker” and give you grief about it? Do you have to turn
Tracking Protection off?
So far I have found that the answer is usually no. I can usually use
NJS to
turn off JavaScript for that site instead. (After all, if a web
developer can’t tell an ad blocker from a tracking protection tool, I
don’t trust their JavaScript anyway.)
NJS will also deal with a lot of “growth hacking” tricks such as
newsletter signup forms that appear in front of the main article. And it
defaults to on, so that sites with JavaScript will work normally until I
decide that they’re better off without it.
In the country where I live, kidnapping for ransom is not a very
common crime.
That’s because picking up the ransom is too risky.
It’s easy to kidnap someone, and easy to let the person go when the
ransom is paid, but picking up the ransom exposes you. Wannabe
kidnappers who are motivated by money tend to choose other crimes.
As the [family relationship redacted] of a [family member information
redacted], I’m happy that kidnapping is difficult here. High transaction
costs for some kinds of transaction are a good thing.
Now, here comes Bitcoin.
As we’re already seeing with ransomware, harder-to-trace ransom drops
are now a thing.
So, even though I don’t actually hold Bitcoin, someone could grab my
family member (low risk), demand that I exchange some of my conventional
assets for Bitcoin (low risk) and send the Bitcoin as ransom (low risk).
The balance between risk and reward for the crime of kidnapping for
ransom has changed.
Make the Bitcoin business eat the costs of payments made under
duress.
New rule: If I ever trade any assets for Bitcoin in order to comply
with a threat, and then transfer the Bitcoin under duress (kidnapping,
ransomware, whatever), then I can go back to whoever I gave the assets
to with a copy of the police report on the incident and get my original
assets (and any fees) back.
Yes, that makes it harder for regular people to trade assets for
Bitcoin. Exchanges would have to hold the money for a while, check that
I’m not under duress, and probably do all kinds of other
pain-in-the-ass, possibly costly, work. But I’d rather have that than
the alternative.
Firefox 57 is coming on
Tuesday, and as you may have heard, add-ons
must use the WebExtensions API. I have been running Firefox Nightly
for a while, so add-on switching came for me early. Here is what I have
come up with.
HTTPS
Everywhere. This is pretty basic. Use the encrypted version of a
site where available.
Link
Cleaner. Get rid of crappy tracking parameters in URLs, and speed up
some navigation by skipping data collection redirects.
NJS.
Minimal JavaScript disable/enable button that remembers the setting by
site and defaults to “on”. Ever notice how the sites that use JavaScript
for real web applications are different from the sites that use
JavaScript for “growth hacking” such as newsletter popups? This add-on
keeps JavaScript working normally for most sites, and lets me revoke the
JavaScript privileges of wannabe growth hackers.
Privacy Badger is not
on here just because I’m using Firefox Tracking Protection. I like
both.
Blogging, development and testing
blind-reviews.
This is an experiment to help break your own habits of bias when
reviewing code contributions. It hides the contributor name and email
when you first see the code, and you can reveal it later. Right now it
just does Bugzilla, but watch this space for an upcoming GitHub version.
(more
info)
Copy
as Markdown. Not quite as full-featured as the old “Copy as HTML
Link” but still a time-saver for blogging. Copy both the page title and
URL, formatted as Markdown, for pasting into a blog.
Firefox
Pioneer. Participate in Firefox user research. Studies have
extremely strict and detailed privacy policies.
Test
Pilot. Try new Firefox features. Tracking Protection was on Test
Pilot for a while. Right now there is a new speech recognition one, an
in-browser notepad, and more.
Advanced (for now) nerdery
Cookie
AutoDelete. Similar to the old “Self-Destructing Cookies”. Cleans up
cookies after leaving a site. Useful but requires me to whitelist the
sites where I want to stay logged in. More time-consuming than other
privacy tools.
PrivacyPass. This is
new. Privacy Pass interacts with supporting websites to introduce an
anonymous user-authentication mechanism. In particular, Privacy Pass is
suitable for cases where a user is required to complete some
proof-of-work (e.g. solving an internet challenge) to authenticate to a
service. Right now I don’t use any sites that have it, but it could
be a great way to distribute “tickets” for reading articles or leaving
comments.
What to use instead? For most people, either the built-in Firefox
Tracking Protection or EFF’s Privacy Badger will
provide good protection. I would try one or both of those before a
conventional ad blocker. If sites have a broken ad blocker detector that
falsely identifies a tracking protection tool as an ad blocker, you can
usually get around it by turning off JavaScript for that site with NJS.
If you still want to get rid of more ads and join the blocker
vs. anti-blocker game (I don’t), there’s always uBlock
Origin, which does not do paid whitelisting. (The project site has more
info.) But try either the built-in tracking protection or Privacy
Badger first.
It’s getting hard to explain still taking web advertising seriously
in 2017, so I had better write something down. To start with, what is
web advertising exactly?
Fraud shitshow
where intermediaries make enough money from fraud to be understandably
uninterested in fixing it, and react
with hostility when one browser does something to make a
difference?
Fallback business model for sites that can’t do anything else?
Advertising is to web companies as scrap value is to machine tools. Even
originally ad-supported sites are getting
into other
businesses.
From the point of view of users, web advertising has
failed to hold up its end of the signal for attention
bargain, and substituted nasty attempts at manipulation. No wonder
people block it.
From the point of view of clients, web advertising
has failed to meet the basic honesty standards that any third-rate print
publication can. And every web advertising company is calling fraud an
industry-wide problem, which is what business people say when
they really don’t care about fixing something.
From the point of view of publishers, web
advertising has failed to show the proverbial money. It’s stuck at a
fraction of the value per user minute that print can pull in, which
means that as print goes away, so does the ad money.
Web advertising has failed the audience, the advertisers, and the
people who make ad-supported news and cultural works. Maybe I should go
be a fan of something else, like securitizing
bug trackers or something. Web advertising just is that annoying,
creepy thing that browsers are competing to block in different,
creative, ways. [T]he online ad sector transitioned from a
creative-led industry to a data and algorithms-led industry, wrote
venture capitalist Adam Fisher, who is understandably proud of not
investing in it.
Some new companies, such as Scroll, are all about making it
easier for readers to buy out of seeing advertising. Advertising is to
web sites as annoying “UNREGISTERED SHAREWARE” banners and dialogs are
to computer software.
At least search advertising is working. Bob Hoffman calls it a
“much better yellow pages.” But any kind of brand-building,
signal-carrying advertising, where most of the money is? Not there. Ever
notice how much of the evidence for “data-driven” advertising is
anecdotal?
Is anyone speaking up for web advertising? Not really. Where
advertising still has a policy voice, it’s a bunch of cut-and-paste
anti-privacy advocacy that sounds like what you might get from eighth
grade Libertarians, or from people who are so bad at math they assume
that it’s humanly possible to read and understand Terms of Service from
70 third-party trackers on one web page. The Interactive Advertising
Bureau has become the voice of schemes that are a few pages of fine
print away from malware and spam. By expanding
to include members whose interests oppose those of legit publishers and
advertisers, and defending every creepy user privacy violation
scheme that the worst members come up with, an organization that could
have been a voice for pro-advertising policy positions has made itself
meaningless. Right now the IAB is about as relevant to web advertising
policy as the Tetraethyl Lead Industry Association is relevant to
transportation policy.
Bad news all the way around, right? But some of us have been
somewhere like this before.
Remember the operating systems market in the late 1990s?
All the right-thinking people were going Windows NT.
Yes, even Tim O’Reilly, who built version 1.0 of his company on Unix,
had apparently written it off. The spring 1998 O’Reilly catalog had all
Windows books on the cover, and the Unix stuff was in back. O’Reilly and
Associates was promoting the company’s first and only shrink-wrap
software, a web server for Windows NT.
And why not? Bickering Unix vendors were doing short-sighted stunts
such as removing the compiler from the basic version, and charging
hard-to-justify prices for workstations and servers that users could
beat with a properly-configured PC. Who needed it?
We know what happened shortly after that. The Unix scene (did anyone
ever make a “Lumascape”-like chart of the Unix vendors?) faded away
and, with enough drama to make for good IT
news coverage but not enough to interfere with successful efforts to fix
the Year 2000 Problem, the Linux scene replaced it.
The good news is that people employed in the Unix scene were able to
move, in most cases happily, to the Linux scene. (Which is big enough
that it has become the OS for the “IoT”, “SaaS” and “Cloud” businesses,
and a majority of “mobile” by units, but not, of course, profits.) So maybe
my experience living through the end of Unix is why I’m still a web
advertising optimist. The economic niche for advertising hasn’t gone
away. Just as software had to get some important licensing
and API
decisions right in order to make the Linux boom happen, web advertising
is so close to getting it right, too. Now that we know the basics…
People have norms about data sharing. Browsers must reflect those
norms or get replaced.
People don’t like to micromanage their attention and privacy, and
expect companies they deal with to cover the costs of coming into
compliance with norms.
…the next steps are coming together pretty quickly.
Always run a shell script in the directory in which it appears, and
change back to the directory you were in when you ran it even if it
fails.
trap popd EXIT
pushd "$PWD"
cd "$(dirname "$0")"
Works for me in bash. The pushd command does a
cd but saves the directory where you were on a stack, and
popd pops the saved directory from the stack. The
trap ... EXIT is a bash way to run something when the
script exits, no matter how, and dirname "$0" is the
directory name of the script.
(Taken from the deploy.sh script that rebuilds and
deploys this blog, so if you can read this, it works.)
What’s this? It’s basically the spawn of Git and a NoSQL database. So
why would anybody want to make that? For Science, of course. A lot of
research produces huge data files, and people would like to have a
resilient way to collaborate on them, using commands they already
know—but have it scale horizontally across large numbers of nodes, NoSQL
style.
Git has the advantage that a lot of people know it, but it doesn’t
really handle huge files that well. There are add-on solutions to make
it work by connecting to another system for handling large files, but
then you have to set up and trust two systems. And one of my favorite
properties of Git is that any authorized user of a project can check the
integrity of the entire project back to the beginning.
So what Attaca does is to consistently split huge files across a
cluster, using cluster nodes that can be cheap VPSs, low-end servers
with spinning disks, whatever. (In the test environment, nodes are just
Linux containers.)
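The general technique underneath a system like this is content-addressed chunking; a sketch follows. This is the generic version, not Attaca’s actual on-disk format, and the 4 MiB chunk size is an arbitrary choice:

```typescript
// Generic content-addressed chunking, not Attaca's actual format.
import { createHash } from "node:crypto";

const CHUNK_SIZE = 4 * 1024 * 1024; // 4 MiB, an arbitrary choice

// A huge file becomes hash-named chunks that can be scattered across
// cheap nodes, plus a small ordered manifest that gets versioned the
// way Git versions a blob.
function chunkAndHash(data: Buffer): { hash: string; chunk: Buffer }[] {
  const out: { hash: string; chunk: Buffer }[] = [];
  for (let off = 0; off < data.length; off += CHUNK_SIZE) {
    const chunk = data.subarray(off, off + CHUNK_SIZE);
    out.push({
      hash: createHash("sha256").update(chunk).digest("hex"),
      chunk,
    });
  }
  return out;
}

// Any node holding a chunk can verify it by re-hashing, which is what
// preserves the Git-style "verify the whole history" property.
const manifest = chunkAndHash(Buffer.alloc(10 * CHUNK_SIZE, 1)).map(
  (c) => c.hash,
);
console.log(manifest.length, "chunks"); // 10
```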
Next steps are to test it out with some scientific data (genomes,
medical imaging, and so on), implement some more Git commands so that
people can check files out and not just in, and build a (Raspberry Pi?)
demo cluster.
Please stop by our
demo of Trading futures, fixing bugs: a live Smart Contracts
installation.
What is it?
Bugmark is a market that connects people who want better software to
the people who can build it.
In order to make open collaboration more effective, we are using
simple market mechanisms to add incentives to do useful work.
Bugmark allows you to
Put financial value directly in the hands of the people who can
fix the software issues that are most important to you.
Discover which issues really matter to your project’s
users.
Work with open source practices and not against them.
Solve part of a problem and still get paid, instead of contending to
claim credit for a bounty payment.
Find an issue, fix it, and earn money
Visit Bugmark to find an open issue that matches your skills and
interests. Buy a futures contract connected to that issue that will pay
you when the issue is fixed. Work on the issue, in the open—then decide
if you want to hold your contract until maturity, or sell it at a
profit.
Report an issue and pay to reward others to fix it
Create a new issue on the project bug tracker, or select an existing
one. Buy a futures contract on that issue that will cost you a known
amount when the issue is fixed, or pay you to compensate you if the
issue goes unfixed. Reduce your exposure to software risks by directly
signaling the project participants about what issues are important to
you.
Invest in futures on an open source market
Development isn’t the only task required to make a software project a
success. You can trade futures to earn a profit from other vital tasks,
such as clarifying and translating bug reports, triaging bugs, writing
failing tests, or doing code reviews.
Looking for a way to get dedicated readers to un-block some of the
ads on your site? One way could be to update and integrate the AdLeaks system:
Our ads contain code that encrypts an empty message with the AdLeaks
public key and sends the ciphertext back to AdLeaks. This happens on all
users’ web browsers. A whistleblower’s browser substitutes the
ciphertext with encrypted parts of a disclosure. The protocol ensures
that an adversary who can eavesdrop on the network communication cannot
distinguish between the transmissions of regular browsers and those of
whistleblowers’ browsers.
Naturally sites would want to encourage whistleblowers (and others)
to block the regular creepy ad trackers—but building post-creepy ads and
hooking this up to them could be a way to encourage the dedicated
readers to treat the high-reputation ads differently from the
low-reputation ones.
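A sketch of the core trick, using plain RSA-OAEP rather than the real AdLeaks scheme: pad every plaintext to the same slot size, and every browser’s upload, empty or not, becomes a ciphertext of identical length:

```typescript
// Not the actual AdLeaks construction; just the indistinguishability
// idea, using Node's built-in RSA-OAEP.
import { generateKeyPairSync, publicEncrypt, constants } from "node:crypto";

const { publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const SLOT = 128; // fixed plaintext slot size, an arbitrary choice

function encryptSlot(message: string): Buffer {
  const padded = Buffer.alloc(SLOT);         // zero padding
  Buffer.from(message, "utf8").copy(padded); // message must fit in SLOT
  return publicEncrypt(
    { key: publicKey, padding: constants.RSA_PKCS1_OAEP_PADDING },
    padded,
  );
}

// Both ciphertexts are the same length (256 bytes for a 2048-bit key);
// only the key holder can tell the empty upload from the disclosure.
console.log(encryptSlot("").length, encryptSlot("disclosure part 1").length);
```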
(I work for Mozilla. None of this is secret. None of this is
official Mozilla policy. Not speaking for Mozilla here.)
The following is an interesting business model, so I’m going to tell
it whether it’s true or not. I once talked with a guy from rural China
about the tofu business when he was there. Apparently, considering the
price of soybeans and the price you can get for the tofu, you don’t earn
a profit just making and selling tofu. So why do it? Because it leaves
you with a bunch of soybean waste, you feed that to pigs, and you make
your real money in the hog business.
Which is sort of related to the problem that (all together now)
hard news isn’t brand-safe. It’s hard to sell travel agency ads
on a plane crash story, or real estate ads on a story about asbestos in
the local elementary schools, or any kind of ads on a disturbing,
but hard to look away from, political scene.
In the old-school newspaper business, the profitable ads can go in
the lifestyle or travel sections, and subsidize the hard news operation.
The hard news is the tofu and the brand-friendly sections are the
hogs.
On the web, though, where you have a lot of readers coming in from
social sites, they might be getting their brand-friendly content from
somewhere else. Sites that are popular for their hard news are stuck
with just the tofu.
This is one of the places where it’s going to be interesting to watch
the shift
from unpermissioned user data collection to user data sharing by
permission. As people get better control of how they share data with
sites—whether that’s through regulation, browsers scrambling for users,
or both—how will a site’s ability to deliver trustworthy hard news give
it an advantage?
The browser may have to adapt to treat
trustworthy and untrustworthy sites differently, in order to come up
with a good balance of keeping sites working and implementing user norms
on data sharing. Will news sites that publish hard news stories that are
often visited, shared, and commented on, get a user data advantage that
translates into ad saleability for their more brand-safe pages? Does
better user data control mean getting the hog business back?
(I work for Mozilla. None of this is secret. None of this is
official Mozilla policy. Not speaking for Mozilla here.)
Browsers are going to have to change tracking protection defaults,
just because the settings that help acquire and retain users are
different from the current defaults that leave users fully trackable all
the time. (Tracking protection is also an opportunity for open web
players to differentiate themselves from mobile
tracking devices.)
Before switching defaults, there are a bunch of opportunities to do
collaboration and data collection in order to make the right choices and
increase user satisfaction and trust (and retention). Interestingly
enough, these tend to give an advantage to any browser that can attract
a diverse, opinionated, values-driven user base.
Do innovation challenges and crowdsourcing for tracking
protection tools. Use the results to expand the available APIs and
built-in options.
Develop a variety of tracking protection methods, and ship them
in a turned-off state so that motivated users can find the configuration
and experiment with them, and to enable user research. Borrow approaches
from other browsers (such as Apple
Safari) where possible, and test them.
For example: avoid blocklist politics, and increase surveillance
marketing uncertainty, by building Privacy-Badger-like tracker
detection. Enable tracking protection without the policy implications of
a top-down list. This is an opportunity for a crowdsourcing
challenge: design better algorithms to detect trackers, and block them
or scramble state. (A minimal sketch of this kind of detection appears
at the end of this post.)
Ship alternate experimental builds of the browser, with privacy
settings turned on and/or add-ons pre-installed.
Communicate a lot about capabilities, values, and research. Spend
time discussing what the browser can do if needed, and
discussing the results of research on how users prefer to share their
personal info.
Only communicate a little about future defaults. When asked about
specifics, just say, “we’ll let the user data help us make that
decision.” (Do spam filters share their filtering rules with spammers?
Do search engines give their algorithms to SEO consultants?)
Build functionality to “learn” from the user’s activity and
suggest specific settings that differ from the defaults (in either
direction). For example, suggest more protective settings to users who
have shown an interest in privacy—especially users who have installed
any add-on whose maintainers misrepresent it as a privacy tool.
Do research to help legit publishers and marketers learn more
about adfraud and how it is enabled by the same kinds of cross-site
tracking that users dislike. As marketers better understand the risk
levels of different approaches to web advertising, make it a better
choice to rely less on highly intrusive tracking and more on
reputation-driven placements.
Provide documentation and tutorials to help web developers
develop and test sites that will work in the presence of a variety of
privacy settings. “Does it pass Privacy Badger” is a good start, but
more QA tools are needed.
If you do it right, you can force up the risks of future surveillance
marketing just by increasing the uncertainty of future user
trackability, and drive more marketing investment away from creepy
projects and toward pro-web, reputation-driven projects.
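And the detection sketch promised above. Privacy Badger’s published heuristic is roughly “block a third party seen setting tracking state on three or more first-party sites”; everything else here is my simplification:

```typescript
// Privacy-Badger-style heuristic sketch: no central blocklist, just
// observation. The threshold of 3 matches Privacy Badger's published
// behavior; the rest is simplified.
const sightings = new Map<string, Set<string>>(); // tracker -> first parties
const BLOCK_THRESHOLD = 3;

function observe(thirdParty: string, firstParty: string, setsCookie: boolean) {
  if (!setsCookie) return; // plain content, not tracking state
  const sites = sightings.get(thirdParty) ?? new Set<string>();
  sites.add(firstParty);
  sightings.set(thirdParty, sites);
}

function shouldBlock(thirdParty: string): boolean {
  return (sightings.get(thirdParty)?.size ?? 0) >= BLOCK_THRESHOLD;
}

observe("tracker.example", "news.example", true);
observe("tracker.example", "shop.example", true);
observe("tracker.example", "blog.example", true);
console.log(shouldBlock("tracker.example")); // true
```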
This is OFF MESSAGE. No Mozilla policy here. This is my
personal blog.
(This is the text from my talk at the Reynolds Journalism Institute’s
Revenue
Models that Work event, with some links added. Not exactly as
delivered.)
Hi. I may be the token advertising optimist here.
Before we write off advertising, I just want to try to figure out the
answer to: why can’t Internet publishers make advertising work as well
as publishers used to be able to make it work when they were breathing
fumes from molten lead all day? Has the Internet really made something
that much worse?
I have bought online advertising, written and edited for ad-supported
sites, had root access to some of the servers of an adtech firm that you
probably have cookies from right now, and have written an ad blocker.
Now I work for Mozilla. I don’t have any special knowledge on what
exactly Mozilla intends to do about third-party cookies, or
fingerprinting, or ad blocking, but I can share some of what I have
learned about users’ values, and some facts about the browser business
that will inform those decisions for Mozilla and other browsers.
First of all, I want to cover how new privacy tools are breaking web
advertising as we know it. But that’s fine. People don’t like web
advertising as we know it.
So what don’t they like?
A 2009 study
at the University of Pennsylvania came up with the result that “most
adult Americans do not want advertisers to tailor advertisements to
their interests.”
When the researchers explained how ad targeting works, the percentage
went up.
We have known for quite a while that people have norms about how they
share their personal information.
Pagefair study
That Pennsylvania study isn’t the only one. Just recently a company
called Pagefair did a survey on when people would choose to share their
info on the web.
They surveyed 300 publishers, adtech people, brands, and various
others, on whether users will consent to tracking under the GDPR and the
ePrivacy Regulation.
Some examples:
The survey asked if users would allow for tracking on one site only,
and for one brand only, in addition to “analytics partners”. 79% of
respondents said they would click “No” to this limited consent
request.
And what kind of tracking policy would people prefer in the browser
by default? The European Parliament suggested that “Accept only first
party tracking” should be the default. But only 20% of respondents said
they would select this. Only 5% were willing to “accept all tracking”.
56% said they would select “Reject tracking unless strictly necessary
for services I request”. The very large majority (81%) of respondents
said they would not consent to having their behaviour tracked by
companies other than the website they are visiting.
Users say that they really don’t like being tracked. So, right about
now is where you should be pointing out that what people say
about what they want is often different from what they do.
It’s hard to see exactly what people do about particular ads, but we
can see some indirect evidence that what people do about creepy ads is
consistent with what they say about privacy.
First, ad blockers didn’t catch on until people started to see
retargeting.
Second, companies indirectly reveal their user research in
policies and design decisions.
Back in 1998, when Google was still “google.stanford.edu” I wrote an
ad blocker. And there were a bunch of other pretty good ones in the late
1990s, too. WebWasher, AdSubtract, Internet Junkbuster. But none of that
stuff caught on. That was back when most people were on dialup, and
downloading a 468x60 banner ad was a big deal. That’s before browsers
came with pop-up blockers, so a pop-up was a whole new browser window
and those could get annoying real fast.
But users didn’t really get into ad blocking. What
changed between then and now? Retargeting. People could see that the
ad on one site had “followed them” from a previous site. That creeped
them out.
Some Facebook research clearly led in the same direction.
As we should all know by now, Facebook enables an extremely fine
level of micro-targeting.
Yes, you can target 15 people in Florida.
But how do the users feel about this?
We can’t see Facebook’s research. But we can see the result of it, in
Facebook
Advertising Policies. If you buy an ad on Facebook, you can target
people based on all kinds of personal info, but you can’t reveal that
you did it.
Ads must not contain content that asserts or implies personal
attributes. This includes direct or indirect assertions or implications
about a person’s race, ethnic origin, religion, beliefs, age, sexual
orientation or practices, gender identity, disability, medical condition
(including physical or mental health), financial status, membership in a
trade union, criminal record, or name.
So you can say “meet singles near you” but you can’t say “other
singles”. You can offer depression counseling in an ad, but you can’t
say “treat your depression.”
Facebook is constantly researching and tweaking their site, and, of
course, trying to sell ads. If personalized targeting didn’t creep
people the hell out, then the ad policy wouldn’t make you hide that you
were doing it.
Mozilla
All right, so users don’t want to be followed around.
Where does Mozilla come in?
Well, Mozilla is supposed to be all about data privacy for the user.
We have these Data Privacy
Principles
No surprises Use and share information in a way
that is transparent and benefits the user.
User control Develop products and advocate for
best practices that put users in control of their data and online
experiences.
Limited data Collect what we need, de-identify
where we can and delete when no longer necessary.
Sensible settings Design for a thoughtful
balance of safety and user experience.
Defense in depth Maintain multi-layered security
controls and practices, many of which are publicly verifiable.
If you want a look at what Mozilla management is thinking about the
tracking protection slash ad blocking problem, there’s always Proposed
Principles for Content Blocking by Denelle Dixon.
Content Neutrality: Content blocking software
should focus on addressing potential user needs (such as on performance,
security, and privacy) instead of blocking specific types of content
(such as advertising).
Transparency & Control: The content blocking
software should provide users with transparency and meaningful controls
over the needs it is attempting to address.
Openness: Blocking should maintain a level
playing field and should block under the same principles regardless of
source of the content. Publishers and other content providers should be
given ways to participate in an open Web ecosystem, instead of being
placed in a permanent penalty box that closes off the Web to their
products and services.
If we have all those great values though, why aren’t we doing more to
protect users from tracking?
Here’s the problem from the browser point of view.
Firefox had a tracking protection feature in
2015.
Firefox had a proposed “Cookie Clearinghouse” that was going to
happen with Stanford, back in 2013. Firefox developers were talking about
third-party cookie blocking then, too.
But the mainstream browsers have always been held back by two
things.
First, browser developers have been cautious about not breaking
sites. We know that users prefer not to be tracked from site to site,
but we know that they get really mad when a site that used to work just
stops working. There is a lot of code in a lot of browsers to handle
stuff that no self-respecting web designer has done for decades.
Remember the 1996 movie “Space Jam”? Check out the web
site some time. It’s a point of pride to keep all that 1996 web
design working. And seriously, one of those old 1996 vintage pages might
be the web-based control panel for somebody’s emergency generator, or
something. Yes, browsers consider the users’ values on tracking, but
priority one is not breaking stuff.
And that includes third-party resources that are not creepy ad
trackers—stuff like shopping carts and comment forms and who knows
what.
Besides not breaking sites, the other thing that keeps browsers from
implementing users’ values on tracking is that we know people like free
stuff. For a long time, browsers didn’t have enough good data, so they
deferred to the adtech business’s story about how sites make
money. It looks obvious, right? Sites that release free stuff make money
from ads, ads work a certain way, so if you interfere with how the ads
work, then sites make less money, and users don’t get the free
stuff.
Mozilla backed down on third-party cookies in 2013, and again on
tracking protection in 2015.
Microsoft backed down on Tracking Protection Lists.
Both times, after the adtech industry made a big fuss about it.
So what changed? Why is now different?
Well, that’s an easy answer, right? Apple put Intelligent Tracking
Prevention into their Safari browser, and now everybody else has to
catch up.
Apple so far is putting their users ahead of the usual
alarmed letters from the adtech people. Steven Sinofsky, former
president of the Windows Division at Microsoft, tweeted,
Stand strong Apple [rhetorical]. Had these groups come after us trying
to offer browsing safety. MS backed down. — Steven Sinofsky (@stevesi),
September 14, 2017
But that’s not all of it.
You’re going to see other browsers make moves that look like they’re
“following Safari” but really, browsers are not so much following each
other as making similar decisions based on similar information.
When users share their values they say that they want control over
their information.
When users see advertising that seems “creepy” we can see them take
steps to avoid ads following them around.
Some people say, well, if users really want privacy, why don’t they
pay for privacy products? That’s not how humans work. Users don’t pay
for privacy, because we don’t pay other people to come into compliance
with basic social norms. We don’t pay our loud neighbors to quiet
down.
Apple looks like a magic company that releases magic things that they
make up out of their own heads. “Designed by Apple in California.” This
is a great show. It’s part of their brand. I have a lot of respect for
their ability to make things look simple.
But that doesn’t mean that they just make stuff up.
Apple does a lot of user research. Every so often we get a little
peek behind the curtain when there is discovery in a lawsuit. They do
research on their own users, on Samsung’s users, everybody.
Mozilla has user research, too.
For a long time, browser people thought that there was a conflict
between giving the users something that complies with their tracking
norms and giving them something that keeps them happy with the sites
they want to use.
But now it turns out that we have some ways that we could act in
accordance with user values that also produce measurably more satisfied
users.
How badly does privacy protection break sites?
Mozilla’s testing team has built, deployed to users, and tested nine
different sets of cookie and tracking protection policies.
Lots of people thought the choice was between settings that break sites
and protect users, or settings that leave sites working and leave users
vulnerable.
It turns out that there is a configuration that gives both better
values alignment and less breakage.
Because a lot of that breakage is caused by third-party
JavaScript.
We’re learning that in a few important areas, even though Apple
Safari is in the lead, Apple’s Intelligent Tracking Prevention doesn’t
go far enough.
What users want
It turns out that when you do research with people who are not
current users of ad blockers, and offer them choices of features, the
popular choices are tracking blockers, malvertising protection, and
blocking annoying ads such as auto-play videos. Among those users who
aren’t already using an ad blocker, the offer of an ad blocker wasn’t as
popular.
Yes, people want to see fewer annoying ads. And nobody likes malware.
But people are also interested in protection from tracking. Some users
even put tracking protection ahead of malvertising protection.
If you only ask about annoying ad formats you get a list of which ad
formats are popular now but get on people’s nerves. This is where Google
is now. I have no doubt that they’ll catch up. Everyone who’s ever
moderated a comment section knows what the terrible ads are. And any
publisher has the motivation to moderate and impose standards on the ads
on their site. Finding which ads are the crappy ones is not the
problem. The problem is that legit sites and crappy sites are in the
same ad space market, competing for the same eyeballs. As a legit site,
you have less market power to turn down an ad that does not meet your
policies.
We are coming to an understanding of where users stand. In a lot of
ways we’re repeating the early development of spam filters, but in slow
motion.
Today, a spam filter seems like a must-have feature for any email
service. But MSN started talking about its spam filtering back when
Sanford Wallace, the “Spam King,” was saying stuff like this.
I have to admit that some people hate me, but I have to tell you
something about hate. If sending an electronic advertisement through
email warrants hate, then my answer to those people is “Get a life.
Don’t hate somebody for sending an advertisement through email.” There
are people out there that also like us.
According to spammers, spam filtering was just Internet nerds
complaining about something that regular users actually like. But the
spam debate ended when big online services, starting with MSN, started
talking about how they build for their real users instead of for
Wallace’s hypothetical spam-loving users.
If you missed the email spam debate, don’t worry. Wallace’s talking
points about spam filters constantly get recycled by the IAB and the
DMA, every time a browser makes a move toward tracking protection. But
now it’s not email spam that users supposedly crave. Today, they tell us
that users really want those ads that follow them around.
So here’s the problem. Users are clear about their values and
preferences. Browsers must reflect user values and preferences. Browsers
have enough of a critical mass of users demanding better protection from
tracking that browsers are going to have to move or become
irrelevant.
That’s what the email providers did on spam. There were not enough
pro-spam users to support an email service without a spam filter.
And there may not be enough pro-targeting users to support a browser
without privacy tools.
As I said, I do not know exactly how Mozilla is going to handle this,
but every browser is going to have to.
But I can make one safe prediction.
Browsers need users. Users prefer tracking protection. I’m going to
make a really stupid, safe prediction here.
User adoption of tracking protection will not affect the amount of
user data available, or affect any measurement of number of targeted ad
impressions available in any way.
Every missing trackable user will be replaced by an adfraud bot.
Every missing piece of user data will be replaced by an “inferred”
piece of data.
How much adfraud is there really?
There are people who will stand up and say that we have 2 percent
fraud, or 85 percent. Of course it’s different from campaign to campaign
and some advertisers get burned worse than others.
You can see “IAS safe traffic” on fraud boards. Because video views
are worth so much more, the smartest bots go there. We do know that when
you look for adfraud seriously, you can find it. Just recently the
Financial Times found a bunch.
The publisher has found display ads against inventory masquerading as
FT.com on 10 separate ad exchanges and video ads on 15 exchanges, even
though the FT doesn’t even sell video ads programmatically, with 300
accounts selling inventory purporting to be the FT’s. The scale of the
fraud uncovered is vast — the equivalent of one month’s supply of bona
fide FT.com video inventory was fraudulently appearing in a single
day.
[W]e were seated next to the head of this advertising company, who
said to me something like, “Well, I really always liked AllThingsD and
in your first week I think Recode’s produced some really interesting
stuff.” And I said, “Great, so you’re going to advertise there, right?
Or place ads there.” And he said, “Well, let me just tell you the truth.
We’re going to place ads there for a little bit, we’re going to drop
cookies, we’re going to figure out who your readers are, we’re going to
find out what other websites they go to that are way cheaper than your
website and then we’re gonna pull our ads from your website and move
them there.”
The current web advertising system is based on paying publishers less and charging brands more. Revenue share for legit publishers is at 30
to 40 percent according to the Association of National Advertisers.
But all revenue split numbers are wrong because undetected fraud ends up
in the ‘publisher’ share.
When your model is based on data leakage, on catching valuable
eyeballs on cheap sites, the inevitable overspray is fraud.
People aren’t even paying attention to what could be the biggest
form of adfraud.
Part of the conventional wisdom on adfraud is that you can beat it by
tracking users all the way to a sale, and filter the bots out that way.
After all, if they made a bot good enough to actually buy stuff it
wouldn’t be a problem for the client.
But the attribution models that connect impressions to sales are,
well, they’re hard enough to understand that most of the people who
understand them are probably fraud hackers.
The dispute between Steelhouse and Criteo settled last year, so we
didn’t get to see how two real adtech companies might or might not have
been hacking each other’s attribution numbers.
But today we have another chance.
I used to work for Linux Journal, and we followed the SCO case pretty
intently. There was even a dedicated news site just about the case,
called Groklaw. If there’s a case that needs a Groklaw for web
advertising, it’s Uber v. Fetch.
This is the closest we have to a tool to help us understand
attribution fraud. When the bad guys have the ability to make bogus ads
claim credit for real sales, that’s a much more powerful motivation for
fraud than just making a bot that looks like a real user watching a
video.
Legit publishers have a real incentive to find and control adfraud.
Adtech intermediaries, not so much. That’s because the core value of ad
tech is to find the big money user at the cheapest possible site. If you
create that kind of industry, you create the incentive for fraud bots
who appear to be members of a valuable audience. You create incentives
to produce fraudulent sites because all of a sudden, those kinds of
sites have market value that they would not otherwise have had because
of data leakage.
As browsers and sites implement user norms on tracking, they get
fraud protection for free.
So where is the outrage on adfraud?
I thought I could write a script for a heist movie about
adfraud.
At first I thought, this is awesome! Computer hacking, big
corporations losing billions of dollars—should be a formula for an
awesome heist movie, right?
Every heist movie has a bunch of scenes that introduce the
characters, you know, getting the crew together. Forget it. All the
parts of adfraud can be done independently and connected on the free
market. It’s all on a bunch of dumb-looking PHP web boards. There go a
whole bunch of great scenes.
Hard-boiled detectives trying to catch the gang? More like over easy.
The adtech industry “committed $1.5 million in funding” (and set up a
24-member committee!) to fight an eleven billion dollar problem. Adfraud
isn’t taking candy from a baby, it’s taking candy from a dude whose job
is giving away candy. More fraud means more money for adtech
intermediaries.
Dramatic risk of getting caught? Not a chance of going to prison—the
worst that happens is that some of the characters get their accounts or
domains banned, and they have to make new ones. The adfraud movie’s
production designer is going to have to work awful hard to make that
“Access denied” screen look cool enough to keep the audience awake.
So the movie idea is a no-go, but as people learn that today’s web
ads don’t just leave the publisher with 30 percent but also feed fraud,
we should see a flight to quality effect.
The technical decisions that enabled the Lumascape to rip off Walt
Mossberg are the same decisions that facilitate fraud, and the same
decisions that make users come looking for tracking protection.
I said I was an advertising optimist and here’s why.
The tracking protection trend is splitting web advertising.
We have the existing high-tracking, high-fraud market and a new
low-tracking opportunity.
Some users are getting better protected from cross-site tracking.
The bad news is that it will be harder to serve those users a
lucrative ad enabled by third-party tracking data.
The good news is that those users can’t be tracked from high-value to
low-value sites. It starts to become possible to tell those users apart from
fraudbots.
For that subset of users, web advertising starts to shift from a
hacking game to a reputation game.
In order to sell advertising you need to give the advertiser some
credible information on who the audience is. Most browsers have been bad
at protecting personal information about the user, so web advertising
has become a game where a whole bunch of companies compete to covertly
capture as much user info as they can.
But some browsers are getting better at implementing people’s
preferences about sharing their information. The result, for those
users, is a change in the rules of the game. Investment in taking
people’s personal info is becoming less rewarding, as browsers compete
to reflect people’s preferences.
And investments in building sites and brands that are trustworthy
enough for people to want to share their information will tend to become
more rewarding. This shift naturally leads to complaints from people who
are used to winning the old game, but will probably be better for
customers who want to use trustworthy brands and for people who want to
earn money by making ad-supported news and cultural works.
There are people building a new web advertising system around
user-permissioned information, and they’ve been doing it for a long
time. But until now, nobody has really wanted to deal with them, because
adtech is just selling that information taken from the user without
permission. Tracking protection will be the motivation for
forward-thinking brand people to catch the flight to quality and shift
web ad spending from the hacking game to the reputation game.
Now that we have better understanding of how user norms are aligned
with the interests of independent browsers and with the interests of
high-reputation sites, what’s next?
Measure the tracking-protected audience
Legit sites are in a strong position to gather some important data
that will shift web ads from a hacking game to a reputation game. Let’s
measure the tracking-protected audience.
Tracking protection is a powerful sign of a human audience. A legit
site can report a tracking protection percentage for its audience, and
any adtech intermediary who claims to offer advertisers the same
audience, but delivers a suspiciously low tracking protection number, is
clearly pushing a mismatched or bot-heavy audience and is going to have
a harder time getting away with it.
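In back-of-the-envelope form, the check an advertiser could run looks like this sketch (all numbers and the threshold are invented):

```javascript
// Compare a site's self-measured tracking protection rate with the rate
// seen in traffic an intermediary claims is the "same audience."
const siteRate = 0.31;     // protected share measured on the site itself
const resellerRate = 0.02; // protected share in the reseller's traffic

if (resellerRate < siteRate / 2) {
  console.log("Suspicious: mismatched or bot-heavy audience.");
}
```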
Showing prospective advertisers your tracking protection data lets
you reveal the tarnish on the adtech “Holy Grail”—the promise of
high-value eyeballs on crappy sites.
You can’t sell advertising without data on who the audience is. Much
of that data will have to come from the tracking-protected audience.
When quality sites share tracking protection data with advertisers, that
helps expose the adfraud that intermediaries have no incentive to track
down.
This is an opportunity for service journalism.
Users are already concerned and confused about web ads. That’s an
opportunity that some legit sites such as the Wall Street Journal and
The New York Times are already taking advantage of. The more that
someone learns about how web advertising works, the more that he or she
is motivated to get protected.
But if you don’t talk to your readers about tracking protection, who
will?
A lot of people are getting caught up today in publisher-hostile
schemes such as adblockers with paid whitelisting, or adblockers that
come with malware or adware.
If you don’t recommend a publisher-friendly protection tool or
setting, they’ll get a bad one from somewhere else.
I really like ads.
At the airport on the way here I saw that they just came out with a
hardcover
collection of the complete Kurt Vonnegut stories. A lot of those
stories were paid for by Collier’s ads run in the 1950s, and we’re still
getting the positive externalities from that advertising today.
Advertising done right can be a growth spiral of
economic growth, reputation building, and creation of cultural works.
It’s one of the most powerful forces to produce news, entertainment
goods, fiction. Let’s fix it.
One of the problems with a bug futures
market is: where do you get the initial investment, or “stake”, for
a developer who plans to take on a high-value task?
In order to buy the FIXED side of a contract and make a profit when
it matures, the developer needs to invest some cryptocurrency. In a bug
futures market, it takes money to make money.
One possible solution is to use personal tokens, such as the new Evancoin. Evancoin is backed by hours
of work performed by an individual (yes, his name is Evan).
If I believe that n hours of work from Evan are likely to
increase the probability of a Bugmark-traded bug getting fixed, and my
expected gain is greater than n * (current price of Evancoin),
then I can
buy the FIXED side of the Bugmark contract
redeem n Evancoin for work from Evan on the bug
sell my Bugmark position at a profit, or wait for it to
mature.
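In rough numbers, the whole decision is one inequality. Here’s a sketch, with every price and probability invented (neither Bugmark nor Evancoin is actually quoted in these units):

```javascript
// Back-of-the-envelope check for the Evancoin-plus-Bugmark trade.
// All numbers are made up for illustration.
const payoutIfFixed = 1.0;   // what one FIXED contract pays at maturity
const pFixedAsIs = 0.2;      // my estimate of P(fixed) with no extra work
const pFixedWithEvan = 0.65; // my estimate after n hours of Evan's work
const n = 3;                 // Evancoin (hours of work) to redeem
const evancoinPrice = 0.05;  // current market price of one Evancoin

const expectedGain = (pFixedWithEvan - pFixedAsIs) * payoutIfFixed;
const cost = n * evancoinPrice;

if (expectedGain > cost) {
  // Buy FIXED, redeem the Evancoin, sell at a profit or hold to maturity.
  console.log("expected profit per contract:",
              (expectedGain - cost).toFixed(2));
}
```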
Evan is not required to accept cryptocurrency exchange rate risk, and
does not have to provide the “stake” himself. It’s the opposite—he has
already sold the Evancoin on an exchange. Of course, he has an incentive
to make as much progress on the bug as possible, in order to support the
future price of Evancoin.
If Evan is working on the bug I selected, he would also know that
he’s doing work that is likely to move the price of the Bugmark
contract. So he can use some of the proceeds from his Evancoin sale to
buy additional FIXED on Bugmark, and take a profit when I do.
Evan’s skills tend to improve, and my understanding of which tasks
would be a profitable use of Evan’s time will tend to increase the more
Evancoin I redeem. So the value of Evancoin to me is likely to continue
rising. Therefore I am probably going to do best if I accumulate
Evancoin in advance of identifying good bugs for Evan to work on.
When you release open source software, you have this egalitarian idea
that you’re making it available to people who can really use it, who can
then built on it to make amazing things….While this is a fine position
to take, consider who has the most resources to build on top of a
project that requires development. With most licenses, you’re issuing a
free pass to corporations and other wealthy organizations, while
providing no resources to those needy users. OpenSSL, which every major
internet company depends on, was until recently receiving just $2,000 a
year in donations, with the principal author in financial
difficulty.
This is a good example of one of the really interesting problems of
working in an immature industry. We have a similar
problem in web advertising. We’re over-rewarding the ability to collect
numbers that show the effectiveness of a marketing project, while
under-rewarding the ability to build brand reputation. Web ads also have
an opportunity to fix incentives. We don’t have our incentives
hooked up right yet.
Why does open source have some bugs that stay open longer than
careers do?
Why do people have the “I’ve been coding to create lots of value for big companies for years and I’m still broke” problem?
Why is the meritocracy of open source even more biased
than other technical and collaborative fields? (Are we at the bottom of
the standings?) Why are we walking away from that many potential
contributors?
It is to the benefit of software companies and programmers to claim
that software as we know it is the state of nature. They can do stupid
things, things we know will result in software vulnerabilities, and they
suffer no consequences because people don’t know that software could be
well-written. Often this ignorance includes developers themselves. We’ve
also been conditioned to believe that software rots as fast as fruit.
That if we waited for something, and paid more, it would still stop
working in six months and we’d have to buy something new. The cruel
irony of this is that despite being pushed to run out and buy the latest
piece of software and the latest hardware to run it, our infrastructure
is often running on horribly configured systems with crap code that
can’t or won’t ever be updated or made secure.
We have two possible futures.
People finally get tired of software’s boyish antics (read: lethal irresponsibility), and impose a regulatory regime.
Rent-seekers rejoice. Software innovation as we know it ceases, and we
get something like the pre-breakup Bell System—you have to be an insider
to build and deploy anything that reaches real people.
The software scene outgrows the “disclaimer of implied warranty”
level of quality, on its own.
How do we get to the second one? One approach is to use market
mechanisms to help quantify software risk, then enable users with a
preference for high quality and developers with a preference for high
quality to interact directly, not through the filter of software
companies that win by releasing early at a low quality level.
There is an opportunity here for the kinds of companies that are now
doing open source license analysis. Right now they’re analyzing
relatively few files in a project—the licenses and copyrights. A tool
will go through your software stack, and hooray, you don’t have anything that depends on something with an incompatible license, or on a license that would look bad to the people you want to sell your company to.
What if that same tool would give you a better quality number for
your stack, based on walking your dependency tree and looking for weak
points based on market activity?
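Nobody offers that number today, so here is only a sketch of the shape of the idea, with a stubbed-out market lookup standing in for a service that doesn’t exist yet:

```javascript
// Score a stack by walking the dependency tree and taking the weakest
// point, using (hypothetical) bug futures prices as the quality signal.
const market = {
  // Invented numbers: price of the "won't be fixed" side of each
  // package's open bugs, as a rough proxy for quality risk.
  unfixedPrice: (name) => ({ libfoo: 0.1, libbar: 0.7 }[name] ?? 0.5),
};

function stackRisk(pkg) {
  const own = market.unfixedPrice(pkg.name);
  const depRisks = (pkg.dependencies || []).map(stackRisk);
  // One weak dependency compromises the whole stack, so take the max.
  return Math.max(own, ...depRisks);
}

const app = {
  name: "myapp",
  dependencies: [{ name: "libfoo" }, { name: "libbar" }],
};
console.log("stack risk:", stackRisk(app)); // 0.7 -- dominated by libbar
```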
What to do about different kinds of user data interchange:

Good data, collected without permission: build tools and norms to reduce the amount of reliable data that is available without permission.

Good data, collected with permission: develop and test new tools and norms that enable people to share data that they choose to share.

Bad data, collected without permission: report on and show errors in low-quality data that was collected without permission.

Bad data, collected with permission: offer users incentives and tools that help them choose to share accurate data and correct errors in voluntarily shared data.
Most people who want data about other people still prefer data that’s
collected without permission, and collaboration is something that
they’ll settle for. So most voluntary user data sharing efforts will
need a defense side as well. Freedom-loving technologists have to help
people reduce the amount of data that they allow to be taken from them
without permission, in order to get the people who want data to listen to people about sharing data.
(I work for Mozilla. None of this is secret. None of this is
official Mozilla policy. Not speaking for Mozilla here.)
Setting tracking protection defaults for a browser is hard. Some
activities that the browser might detect as third-party tracking are
actually third-party services such as single sign-on—so when the browser
sets too high of a level of protection it can break something that the
user expects to work.
Meanwhile, new
research from Pagefair shows that The very large majority (81%)
of respondents said they would not consent to having their behaviour
tracked by companies other than the website they are visiting. A
tracking protection policy that leans too far in the other direction
will also fail to meet the user’s expectations.
So you have to balance two kinds of complaints.
“your dumbass browser broke a site that was working
before”
“your dumbass browser let that stupid site do stupid
shit”
Maybe, though, if the browser can figure out which sites the user
trusts, you can keep the user happy by taking a moderate tracking
protection approach on the trusted sites, and a more cautious approach
on less trusted sites.
If the user has not interacted with example.com in the last 30 days,
example.com website data and cookies are immediately purged and continue
to be purged if new data is added. However, if the user interacts with
example.com as the top domain, often referred to as a first-party
domain, Intelligent Tracking Prevention considers it a signal that the
user is interested in the website and temporarily adjusts its behavior
(More…)
But it looks like this could give large companies an advantage—if the
same domain has both a service that users will visit and third-party
tracking, then the company that owns it can track users even on sites
that the users don’t trust. Russell Brandom: Apple’s
new anti-tracking system will make Google and Facebook even more
powerful.
It might make more sense to set the trust level, and the browser’s
tracking protection defaults, based on which site the user is on. Will
users want a working “Tweet® this story” button on a news site they
like, and a “Log in with Google” feature on a SaaS site they use, but
prefer to have third-party stuff blocked on random sites that they
happen to click through to?
How should the browser calculate user trust level? Sites with
bookmarks would look trusted, or sites where the user submits forms
(especially something that looks like an email address). More testing is
needed, and setting protection policies is still a hard problem.
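To make it concrete, a toy version of the heuristic might look like this. Every signal and weight here is a guess, not any browser’s actual policy:

```javascript
// Toy per-site trust score, built from signals the browser already has.
function trustScore(site) {
  let score = 0;
  if (site.bookmarked) score += 2;
  if (site.formsSubmitted > 0) score += 1;
  if (site.gaveEmailAddress) score += 2; // strong "I have an account here" signal
  return score;
}

// Moderate tracking protection on trusted sites, cautious everywhere else.
function protectionMode(site) {
  return trustScore(site) >= 3 ? "moderate" : "cautious";
}

console.log(protectionMode({ bookmarked: true, formsSubmitted: 1 }));
// "moderate"
```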
The investigative journalism organization ProPublica is teaming
up with three German news sites to collect political ads on Facebook
in advance of the German parliamentary election on Sept. 24.
Because typical Facebook ads are shown only to finely targeted
subsets of users, the best way to understand them is to have a variety
of users cooperate to run a client-side research tool. ProPublica
developer Jeff
Larson has written a WebExtension, that runs on Mozilla Firefox and
Google Chrome, to do just that. I asked him how the development
went.
Q: Who was involved in developing your
WebExtension?
A: Just me. But I can’t take credit for the idea. I was at a
conference in Germany a few months ago with my colleague Julia Angwin,
and we were talking with people who worked at Spiegel about our
work on the Machine Bias
series. We all thought it would be a good idea to look at political
ads on Facebook during the German election cycle, given what little we
knew about what happened in the U.S. election last year.
Q: What documentation did you use, and what would you
recommend that people read to get started with
WebExtensions?
A: I think both Mozilla and Google’s documentation sites are great. I
would say that the tooling for Firefox is much better due to the web-ext
tool. I’d definitely start there (Getting
started with web-ext) the next time around.
Basically, web-ext takes care of a great deal of the fiddly bits of
writing an extension—everything from packaging to auto reloading the
extension when you edit the source code. It makes the development
process a lot more smooth.
Q: Did you develop in one browser first and then test in the
other, or test in both as you went along?
A: I started out in Chrome, because most of the users of our site use
Chrome. But I started using Firefox about halfway through because of
web-ext. After that, I sort of ping-ponged back and forth because I was using source maps and each browser
handles those a bit differently. Mostly the extension worked pretty
seamlessly across both browsers. I had to make a couple of changes but I
think it took me a few minutes to get it working in Firefox, which was a
pleasant surprise.
Q: What are you running as a back end service to collect ads
submitted by the WebExtension?
A: We’re running a Rust server that collects the ads and uploads
images to an S3 bucket. It is my first Rust project, and it has some
rough edges, but I’m pretty much in love with Rust. It is pretty
wonderful to know that the server won’t go down because of all the built-in type and memory safety in the language. We’ve open sourced the project, and I could use help if anyone wants to contribute: Facebook
Political Ad Collector on GitHub.
Q: Can you see that the same user got a certain set of ads,
or are they all anonymized?
A: We strive to clean the ads of all identifying information. So, we
only collect the id of the ad, and the targeting information that the
advertiser used. For example, people 18 to 44 who live in New York.
Q: What are your next steps?
A: Well, I’m planning on publishing the ads we’ve received on a web
site, as well as a clean dataset that researchers might be interested
in. We also plan to monitor the Austrian elections, and next year is
pretty big for the U.S. politically, so I’ve got my work cut out for
me.
Q: Facebook has refused to release some “dark” political ads
from the 2016 election in the USA. Will your project make “dark” ads in
Germany visible?
A: We’ve been running for about four days, and so far we’ve collected
300 political ads in Germany. My hope is we’ll start seeing some of the
more interesting ones from fly by night groups. Political advertising on
sites like Facebook isn’t regulated in either the United States or
Germany, so on some level just having a repository of these ads is a
public service.
Q: Your project reveals the “dark” possibly deceptive ads in
Chrome and Firefox but not on mobile platforms. Will it drive deceptive
advertising away from desktop and toward mobile?
A: I’m not sure, that’s a possibility. I can say that Firefox on
Android allows WebExtensions and I plan on making sure this extension
works there as well, but we’ll never be able to see what happens in the
native Facebook applications in any sort of large scale and systematic
way.
Q: Has anyone from Facebook offered to help with the
project?
A: Nope, but if anyone wants to reach out, I would love the help!
Question about A Trading
Market for Prices in Peer Production: what’s the difference between
a futures market on software bugs and an open source bounty system
connected to the issue tracker? In many simple cases a bug futures
market will function in a similar way, but we predict that some
qualities of the futures market will make it work differently.
Open source bounty systems have extra transaction costs of
assigning credit for a fix.
Open source bounty systems can incentivize contention over who
can submit a complete fix, when we want to be able to incentivize
partial work and meta work.
Incentivizing partial work and meta work (such as bug triage) would
be prohibitively expensive to manage using bounties claimed by
individuals, where each claim must be accepted or rejected. The bug
futures concept addresses this with radical simplicity: the owners of
each side of the contract are tracked completely separately from the
reporter and assignee of a bug in the bug tracker.
And bug futures contracts can be traded in advance of expiration.
Any work that you do that meaningfully changes the probability
of the bug getting fixed by the contract closing date can move the
price.
You might choose to buy the “fixed” side of the contract, do some
work that makes it look more fixable, sell at a higher price. A futures
market might make it practical to do “day trading” of small steps, such
as translating a bug report originally posted in a language that the
developers don’t know, helping a user submit a log file, or writing a
failing test.
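The trade itself is nothing fancy: buy, do the meta work, sell. A sketch, with invented prices:

```javascript
// "Day trading" meta work on a bug future. All prices invented.
const qty = 100;
const buyPrice = 0.15;  // FIXED side, while the report is a mess
// ...translate the report, attach the log file, write the failing test...
const sellPrice = 0.3;  // the bug now looks far more fixable
console.log("profit:", qty * (sellPrice - buyPrice)); // 15, for meta work alone
```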
With the right market design, participants in a bug futures market
have the incentive to talk
their books, by sharing partial work and metadata.
(Updated 10 Nov 2022: ended up actually removing Twitter tracking
script. Still not worried about Twitter, too much, just worried about
whoever ends up with their assets.)
(Updated 4 Sep 2017: add screenshot and how to see the
warning.)
Advice from yan, on Twitter:
if your site embeds tweets, add <meta name="twitter:dnt"
content="on"> so that Twitter doesn't track your visitors — yan
(@bcrypt), September 1, 2017
I decided not to do that for this site.
Yes, user tracking is creepy, and yes, collecting user information
without permission is wrong. But read on for what could be a better
approach for sites that can make a bigger difference.
First of all, Twitter is so far behind in their attempts to do
surveillance marketing that they’re more funny and heartening than
ominous. If getting targeted by one of the big players is like getting
tracked down by a pack of hunting dogs, then Twitter targeting is like
watching a puppy chew on your sock. Twitter has me in their database
as…
Owner of eight luxury cars and a motorcycle.
Medical doctor advising patients about eating High Fructose Corn
Syrup.
Owner of prime urban real estate looking for financing to build a
hotel.
Decision-maker for a city water system, looking to read up on the
pros and cons of cast iron and concrete pipes.
Active in-market car shopper, making all decisions based on
superficial shit like whether the car has Beats® brand speakers in the
doors. (Hey, where am I supposed to park car number 9?)
Advice from “me” as I appear on Twitter: As your
doctor, I advise you to cut out HFCS entirely unless you’re at a family
thing where you should just eat a little and not be an ass about it.
When you’re in town, stay at my hotel, where the TV is a 4k monitor on
an arm that moves to make it usable from the sit-stand desk, and the
WiFi is fast and free. No idea on the city water pipe thing
though.
Funny wrong Twitter ad targeting is one of my reliable Internet
amusements for the day. But that’s not why I’m not especially concerned
with tagging quoted Tweets. Just doing that doesn’t protect this site’s
visitors from retargeting schemes on other sites.
And every time someone clicks on a retargeted ad from a local
business on a social site (probably Facebook, since more people spend
more time there) then that’s 65 cents or whatever of marketing money
that could have gone to local news, bus benches, Little League, or some
other sustainable, signal-carrying
marketing project. (That’s not even counting the medium
to heavy treason angle that makes me really uncomfortable about
seeing money move in Facebook’s direction.)
So, instead of messing with quoted Tweet tagging, I set up this
script:
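(Roughly the following; the script URL and the callback name here are from memory, so check the aloodo.org docs before copying anything.)

```javascript
// Load the Aloodo trackability detector, and if this browser turns out
// to be easy to track from site to site, swap the page header for a nag.
// The onTrackable hook is an assumption; the selector is site-specific.
var s = document.createElement("script");
s.src = "https://ad.aloodo.com/track.js";
s.onload = function () {
  aloodo.onTrackable(function () {
    document.querySelector(".site-header").innerHTML =
      '<a href="https://aloodo.org/test/">Your browser is easy to track. ' +
      'Get protected before you click on anything else.</a>';
  });
};
document.head.appendChild(s);
```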
This will load the Aloodo third-party tracking
detector, and, if the browser shows up as easily trackable from site
to site, switch out the page header to nag the user.
[Screenshot: the tracking warning header]
(If you are viewing this site from an unprotected browser and still
not seeing the warning, it means that your browser has not yet visited
enough domains with the Aloodo script to detect that you’re trackable.
Take a tracking protection test
to expose your browser to more fake tracking, then try again.)
If the other side wants it hidden, then reveal it
Surveillance marketers want tracking to happen behind the scenes, so
make it obvious. If you have a browser or privacy tool that you want to
recommend, it’s easy to put in the link. Every retargeted ad impression
that’s prevented from happening is more marketing money to pay for
ad-sponsored resources that users really want. I know I can’t get all
the users of this site perfectly protected from all surveillance
marketing everywhere, but hey, 65 cents is 65 cents.
Bonus tweet
Bob Hoffman’s new book is out! Go click on this quoted Tweet, and do
what it says.
Today's Newsletter (special clickbait edition) GUNFIGHT AT THE AD TECH
SALOON https://t.co/OI9JXs1hbM
What’s the difference between a futures market on software
bugs and a prediction market? We don’t know how much a bug
futures market will tend to act like a prediction market, but here are a
few guesses about how it may turn out differently.
Prediction markets tend to have a relatively small number of
tradeable questions, with a large number of market participants on each
side of each question. Each individual bug future is likely to have a
small number of participants, at least on the “fixed” side.
Prediction markets typically have participants who are not in a
position to influence the outcome. For example, The Good
Judgment Project recruited regular people to trade on worldwide
events. Bug futures are designed to attract participants who have
special knowledge and ability to change an outcome.
Prediction markets are designed for gathering knowledge. Bug futures
are for incentivizing tasks. A well-designed bug futures market will
monetize haters by turning a “bet” that a project will
fail into a payment that makes it more likely to succeed. If successful
in this, the market will have this feature in common with Alex
Tabarrok’s Dominant
Assurance Contract.
Prediction markets often implement conditional trading. Bug markets
rely on the underlying bug tracker to maintain the dependency
relationships among bugs, and trades on the market can reflect the
strength of the connections among bugs as seen by the participants.
What’s the difference between spam and real advertising?

                  No signaling      Signaling
Interruption      spam              advertising
No interruption   organic social    content marketing
Advertising is a signal-for-attention bargain. People pay attention
to advertising that carries some hard-to-fake information about the
seller’s intentions in the market.
Rory
Sutherland says, “What seems undoubtedly true is that humans, like peahens, attach significance to a piece of communication in some way proportionally to the cost of generating or transmitting it.”
If I get spam email, that’s clearly signal-free because it costs
practically nothing. If I see a magazine ad, it carries signal because I
know that it cost money to place.
Today’s web ads are more like spam, because they can be finely
targeted enough that no significant advertiser resources stand behind
the message I’m looking at. (A bot might have even written the copy.)
People don’t have to be experts in media buying to gauge the relative
costs of different ads, and filter out the ones that are clearly
micro-targeted and signal-free.
Our data, and data about us, is the crude that Facebook and Google
extract, refine and sell to advertisers. This by itself would not be a
Bad Thing if it were done with our clearly expressed (rather than merely
implied) permission, and if we had our own valves to control personal
data flows with scale across all the companies we deal with, rather than
countless different valves, many worthless, buried in the settings pages
of the Web’s personal data extraction systems, as well as in all the
extractive mobile apps of the world.
Today’s web advertising business is a hacking contest.
Whoever can build the best system to take personal information from the
user wins, whether or not the user knows about it. (And if you challenge
adfraud and adtech hackers to a hacking contest, you can expect to come in
third.)
As users get the tools to control who they share their information
with (and they don’t want to leak it to everyone) then the web
advertising business has to transform into a reputation
contest. Whoever can build the most trustworthy place for users to
choose to share their information wins.
This is why the
IAB is freaking out about privacy regulations, by the way. IAB
member companies are winning at hacking and failing at building
reputation. (I want to do a user focus group where we show people a
random IAB company’s webinar, then count how many participants ask for
tracking protection support afterward.) But regulations are a sideshow.
In the long run regulators
will support the activities that legit business needs. So Doc has an
important point. We have a big opportunity to rebuild important parts of
the web advertising stack, this time based on the assumption that you
only get user data if you can convince the user, or at least convince
the maintainers of the user’s trusted tools, that you will use the data
in a way that complies with that user’s norms.
One good place to check is: how many of a site’s readers are set up
with protection tools that make them “invisible” to Google Analytics and
Chartbeat? (script) And
how many of the “users” who sites are making decisions for are just
bots? If you don’t have good answers for those, you get dumbassery like
“pivot
to video” which is a polite expression for “make videos for bots
because video ad impressions are worth enough money to get the best bot
developers interested.”
Yes, “pivot to video” is still a thing, even though
Google is issuing refunds for ads that ran on websites with fake
traffic…
…
Google’s refunds amount to only a fraction of the cost of the ads
served to invalid traffic, which has left some advertising executives
unsatisfied…
…
In the recent cases Google discovered, the affected traffic involved
video ads, which carry higher ad rates than typical display ads and are
therefore an attractive target for fraudsters.
(read
the whole thing. If we’re lucky, Bob Hoffman will blog about
that story. “Some advertising executives unsatisfied”? Gosh, Bob, you
think so?)
The good news here is that legit publishers, trying to transform web
advertising from a hacking game into a reputation game, don’t have to do
a perfect job right away. Incrementally make reputation-based,
user-permissioned advertising into a better and better investment, while
adfraud keeps making unpermissioned tracking into a worse and worse
investment. Then wait for some ambitious marketer (and marketers are
always looking for a new angle to reinvent Marketing) to discover the
opportunity and take credit for it.
(update 13 Mar 2020: Remove link to the old Self-Destructing
Cookies extension and add a link to the new Cookie AutoDelete extension.
Both do about the same thing but Cookie AutoDelete is maintained on
current browsers.)
User privacy is at risk from both hackers and lawyers. Right now,
lawyers are better at attacking lists, and hackers are better at
modifying tracker behavior to get around protections.
The more I think about it, the more that I think it’s
counterproductive to try to come up with one grand unified set of
protection rules or cookie policies for everybody.
Spam filters don’t submit their scoring rules to ANSI—spammers would
just work around them.
Search engines don’t standardize and publish their algorithms,
because gray hat SEOs would just use the standard to make useless word
salad pages that score high.
And different people have different needs.
If you’re a customer service rep at an HERBAL ENERGY SUPPLEMENTS
company, you need a spam filter that can adjust for your real mail. And
any user of a site that has
problems with list-based tracking protection will need to have the
browser adjust, and rely more on cleaning up third-party state after a
session instead of blocking outright.
Does your company intranet become unusable if you fail to accept
third-party tracking that comes from an internal domain that your
employer acquired and still has some services running on? Browser
developers can’t decide up front, so the browser will need to adjust. Every change breaks someone’s
workflow.
That means the browser has to work to help the user pick a working
set of protection methods and rules.
1. Block trackers by list and by behavior
This will need to include both list-based protection and monitoring of tracking behavior, as Privacy Badger does, because hackers and lawyers are each good at getting around a different one.
2. Limit data sent to third-party sites
Apple Safari does this, so it’s likely to get easier to do cookie double
keying without breaking sites.
3. Scramble or delete unsafe data
If a tracking cookie or other identifier does get through, delete or
scramble it on leaving the site or later, as the Cookie
AutoDelete extension does. This could be a good backup for when the
browser “learns” that a user needs some third-party state to do
something like a shopping cart or comment form, but then doesn’t want
the info to be used for “ads that follow me around” later.
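Sketched as WebExtension code, in the spirit of Cookie AutoDelete but not its actual source, that could look like this (the cookies and tabs APIs are real; the trust check and the session bookkeeping are stubs):

```javascript
// Purge cookies for a domain the user doesn't trust, after the session.
async function purgeCookies(domain) {
  const cookies = await browser.cookies.getAll({ domain });
  for (const cookie of cookies) {
    const url = "http" + (cookie.secure ? "s" : "") + "://" +
      cookie.domain.replace(/^\./, "") + cookie.path;
    await browser.cookies.remove({ url, name: cookie.name });
  }
}

const userTrusts = (domain) => false; // stub: consult the user's real choices

browser.tabs.onRemoved.addListener(async () => {
  // Simplified: a real extension would track which domains the closed
  // tab actually touched instead of using a fixed list.
  for (const domain of ["tracker.example"]) {
    if (!userTrusts(domain)) await purgeCookies(domain);
  }
});
```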
The best answer for now is 31%. Of the clients that ran JavaScript on
this site over the past two weeks, 31% did not also run JavaScript from
the Aloodo “fake third-party tracker”.
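The measurement itself is just log counting and a set difference. A sketch, with invented log fields:

```javascript
// Share of clients that ran this site's first-party JavaScript but
// never ran the Aloodo fake-tracker script. Field names are invented.
function protectedShare(logEntries) {
  const ranFirstParty = new Set();
  const ranAloodo = new Set();
  for (const e of logEntries) {
    if (e.script === "first-party") ranFirstParty.add(e.client);
    if (e.script === "aloodo") ranAloodo.add(e.client);
  }
  const protectedClients = [...ranFirstParty].filter((c) => !ranAloodo.has(c));
  return protectedClients.length / ranFirstParty.size; // 0.31 on this site
}
```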
All of the conversations on the newspaper side have been focused on
how can we join the advertising technology ecosystem. For example, how
can a daily newspaper site in Bismarck, North Dakota deliver targeted
advertising to a higher-value soccer mom? And none of the newspapers
them have considered the fact that when they join that ecosystem they
are enabling spam sites, fraudulent sites – enabling those sites to get
a higher CPM rate by parasitically riding on the data collected from the
higher-value newspaper sites.
The field of Search Engine Optimization has white hat SEO, black hat
SEO, and gray hat SEO.
White hat SEO helps a user get a better search
result, and complies with search engine policies. Examples include
accurately using the same words that users search on, and getting honest
inbound links.
Black hat SEO is clearly against search engine
policies. Link farming, keyword stuffing, cloaking, and a zillion other
schemes. If they see you doing it, your site gets penalized in search
results.
Gray hat SEO is everything that doesn’t help the
user get a better search result, but technically doesn’t violate a
search engine policy.
Most SEO experts advise you not to put a lot of time and effort into
gray hat, because eventually the search engines will notice your gray
hat scheme and start penalizing sites that do it. Gray hat is just stuff
that’s going to be black hat when the search engines figure it out.
This scheme seems to be intended to get around existing third-party
cookie protection, which is turned on by default in Apple Safari and
available in other browsers.
But how long will it work?
Maybe the browser of the future won’t run a “kangaroo cookie court”
but will ship with a built-in “kangaroo law school” so that each copy of
the browser will develop its own local “courts” and its own local “case
law” based on the user’s choices. It will become harder to predict how
long any single gray hat adtech scheme will continue working.
In the big picture: in order to sell advertising you need to give the
advertiser some credible information on who the audience is. Since the
“browser wars” of the 1990s, most browsers have been bad at protecting
personal information about the user, so web advertising has become a
game where a whole bunch of companies compete to covertly capture as
much user info as they can.
Today, browsers are getting better at implementing people’s
preferences about sharing their information. The result is a change in
the rules of the game. Investment in taking people’s personal info is
becoming less rewarding, as browsers compete to reflect people’s
preferences. (That patent will be irrelevant thanks to browser updates
long before it expires.)
Adfraud is the other half of this story.
Fraudbots are getting smarter at creating human-looking ad impressions
just as humans are getting better protected. If you think that a web
publisher’s response to harder-to-detect bots, viewing more high-CPM
video ads, should be “pivot to video!!1!!” I don’t know if I can help
you.
And investments in building sites and brands that are trustworthy
enough for people to want to share their information will tend
to become more rewarding. (This shift naturally leads to complaints
from people who are used to winning the old game, but will probably
be better for customers who want to use trustworthy brands and for
people who want to earn money by making ad-supported news and cultural
works.)
As far as I know, there are three ways to match an ad to a user.
User intent: Show an ad based on what the user is
searching for. Old-school version: the Yellow Pages.
Context: Show an ad based on where the user is, or
what the user is interested in. Old-school versions: highway billboards
(geographic context), specialized magazines (interest context).
User identity: Show an ad based on who the user is.
Old-school version: direct mail.
Most online advertising is matched to the user based on a mix of all
three. And different players have different pieces of the action for
each one. For user intent, search engines are the
gatekeepers. The other winners from matching ads to users by intent are
browsers and mobile platforms, who get
paid to set their default search engine. Advertising based on
context rewards the owners of reputations for producing
high-quality news, information, and cultural works. Finally,
user identity now has a whole Lumascape
of vendors in a variety of categories, all offering to help identify
users in some way. (the Lumascape is rapidly consolidating, but that’s
another story.)
Few of the web ads that you might see today are matched to you purely
based on one of the three methods. Investments in all three tend to
shift as the available technology, and the prevailing norms and laws,
change.
The basic functionality of the internet, which is built on data
exchanges between a user’s computer and publishers’ servers, can no
longer be used for the delivery of advertising unless the consumer
agrees to receive the ads – but the publisher must deliver content to
that consumer regardless.
This doesn’t look accurate. I don’t know of any proposal that would
require publishers to serve users who block ads entirely. What
Rothenberg is really complaining about is that the proposed regulation
would limit the ability of sites and ad intermediaries to match ads to
users based on user identity, forcing them to rely on
user intent and context. If users
choose to block ads delivered from ad servers that use their personal
data without permission, then sites won’t be able to refuse to serve
them the content, but will be able to run ads that are relevant to the
content of the site. As far as I can tell, sites would still be able to
pop a “turn off your ad blocker” message in place of a news story if the
user was blocking an ad placed purely by context, magazine style.
Privacy regulation is not so much an attack on the basic
functionality of the Internet, as it is a shift that lowers the
return on investment on knowing who the user is, and drives up the
return on investment on providing search results and content. That’s a
big change in who gets paid: more money for search and for trustworthy
content brands, and less for adtech intermediaries that depend on user
tracking.
Advertising: a fair deal for the user?
That depends. Search advertising is clearly the result of a user
choice. The user chooses to view ads that come with search results, as
part of choosing to do a search. As long as the ads are marked as ads,
it’s pretty obvious what is happening.
The same goes for ads placed in context. The advertiser trades
economic signal, in the form of costly support of an ad-supported
resource, for the user’s attention. This is common in magazine and
broadcast advertising, and when you use a site with one of the (rare)
pure in-context ad platforms such as Project
Wonderful, it works about the same way.
The place where things start to get problematic is ads based on user
identity, placed by tracking users from site to site. The more that
users learn how their data is used, the less tracking they tend to want.
In one
survey, 66% of adult Americans said they do not want marketers to
tailor advertisements to their interests, and when the researchers
explained how ad targeting works, the percentage went up.
If users, on average, dislike tracking enough that sites choose to
conceal it, then that’s pretty good evidence that sites should probably
ask for permission to do it. Whether this opt-in should be enforced by
law, technology, or both is left as an exercise for the reader.
So what happens if, thanks to new regulations, technical improvements
in browsers, or both, cross-site tracking becomes harder? Rothenberg
insists that this transformation would end ad-supported sites, but the
real effects would be more complex. Ad-supported sites are already
getting a remarkably lousy share of ad budgets. “The supply chain’s
complexity and opacity net digital advertisers as little as 30 cents to
40 cents of working media for every dollar spent,” ANA
CEO Bob Liodice said.
Advertising on high-reputation sites tends
to be a better investment than using highly intermediated, fraud-prone stacks of user tracking to try to chase good users to cheap
sites. But crap ad inventory, including fraudulent and brand-unsafe
stuff, persists. The crap only has market value because of user
tracking, and it drives down the value of legit ads. If browser
improvements or regulation make knowledge of user
identity rarer, the crap tends to leave the market and the
value of user intent and context go
up.
Rothenberg speaks for today’s adtech, which despite all its acronyms
and Big Data jive, is based on a pretty
boring business model: find a user on a legit site, covertly follow
the user to a crappy site where the ads are cheaper, sell an ad
impression there, profit. Of course he’s entitled to make the case for
enabling IAB members to continue to collect their “adtech tax.” But
moving ad budgets from one set of players to another doesn’t end
ad-supported sites, because marketers adjust. That’s what they do.
There’s always something new in marketing, and budgets move around. What
happens when privacy regulations shift the incentives, and make more of
advertising have to depend on trustworthy content? That’s the real
question here.
Moral values in society are collapsing? Really? Elizabeth Stoker
Bruenig writes,
The baseline moral values of poor people do not, in fact, differ that
much from those of the rich. (read the whole thing).
Unfortunately, if you read the fine print, it’s more complicated than
that. Any market economy depends on establishing trust between people
who trade with each other. Tim Harford writes,
Being able to trust people might seem like a pleasant luxury, but
economists are starting to believe that it’s rather more important than
that. Trust is about more than whether you can leave your house
unlocked; it is responsible for the difference between the richest
countries and the poorest.
Somehow, over thousands of years, business people have built up a set
of norms about high-status and low-status business activities.
Craftsmanship, consistent supply of high-quality staple goods, and
construction of noteworthy projects are high-status activities. Usury
and deception
are examples of low-status activities. (You make your money in quarters,
gambling with retired people? You lend people $100 until Friday at a
300% interest rate? No club invitation for you.)
Somehow, though, that is now changing in the USA. Those who earn
money through deception now have seats at the same table as legitimate
business. Maybe it started with the shift into “consumer
credit” by respectable banks. But why were high-status bankers
willing to play loan shark to begin with? Something had to have been
building, culturally. (It started too early to blame the Baby
Boomers.)
We tend to blame information technology companies for complex,
one-sided Terms of Service and EULAs, but it’s not so much a tech trend
as it is a general business culture trend. It shows up in tech fast,
because rapid technology change provides cover and concealment for
simultaneous changes in business terms. US business was rapidly losing
its connection to basic norms when it was still moving at the speed of
FedEx and fax. (You can’t say, all of a sudden, “car crashes in existing
fast-food drive-thrus are subject to arbitration in Unfreedonia” but you
can stick that kind of term into a new service’s ToS.) There’s some kind
of relativistic effect going on. Tech bros just seem like bigger
douchebags because they’re moving faster.
Regulation isn’t the answer. We have a system in which business
people can hire lobbyists to buy the laws and regulations we want. The
question is whether we’re going to use our regulatory capture powers in
a shortsighted, society-eroding hustler way, or in a conservative way.
Economic conservatism means not just limiting centralized state control
of capital, but preserving the balance among all the long-standing
stewards of capital, including households, municipalities, and religious
and educational institutions. Economic conservatism and radical
free-marketism are fundamentally different.
People blame trashy media for the erosion of norms among the poor, so
let’s borrow that explanation for the erosion of norms among the rich as
well. Maybe our problem with business norms results from the
globalization and sensationalism of business media. Joe CEO isn’t just
the most important corporate leader of Mt. Rose, MN, any more—on a
global scale he’s just another broke-ass hustler.
One of the common oversimplifications in discussing open-source
software licenses is that copyleft licenses are “idealistic” while
non-copyleft licenses are “pragmatic.” But that’s not all there is to
it.
Instead of treating the downstream developer’s employer as a hive mind, it can be more productive to assume
good faith on the part of the individual who intends to contribute
to the software, and think about the license from the point of view of a
real person.
Releasing source for a derivative work costs time and money. The
well-intentioned “downstream” contributor wants his or her organization
to make those investments, but he or she has to make a case for them.
The presence of copyleft helps steer the decision in the right
direction. Jane Hacker at an organization planning to release a
derivative work can say, matter-of-factly, “we need to comply with the
upstream license” if copyleft is involved. The organization is then more
likely to do the right thing. There are always violations, but the
license is a nudge in the right direction.
(The extreme case is university licensing offices. University-owned
software patents can exclude a graduate student from his or her own
project when the student leaves the university, unless he or she had the
foresight to build it as a derivative work of something under
copyleft.)
Copyleft isn’t a magic commons-building tool, and it isn’t right for
every situation. But it can be enough to push an organization over the
line. (One place where I worked had to do a source release for one dependency licensed under GPLv2, and it turned out to be easiest to just
build one big source code release with all the dependencies in it, and
offer that.)
Least surprising news story ever: The
Campaign Against Facebook And Google's Ad "Duopoly" Is Going Nowhere
Independent online publishers can’t beat the big surveillance marketing
companies at surveillance marketing? How about they try to beat Amazon
and Microsoft at cloud services, or Apple and Lenovo at laptop
computers? There are possible winning strategies for web publishers, but
doing the same as the incumbents with less money and less data is not
one of them.
Meanwhile, from an investor point of view: It’s
the Biggest Scandal in Tech (and no one’s talking about it) Missing
the best investment advice: get out of any B-list adtech company that is
at risk of getting forced into a low-value acquisition by a sustained
fraud story. Or short it and research the fraud story yourself.
Apple’s
Upcoming Safari Changes Will Shake Up Ad Tech: Not surprisingly,
Facebook and Amazon are the big winners in this change. Most of their
users come every day or at least every week. And even the mobile users
click on links often, which, on Facebook, takes them to a browser. These
companies will also be able to buy ad inventory on Safari at lower
prices because many of the high-dollar bidders will go away. A good
start by Apple, but other browsers can do better. (Every click on a
Facebook ad from a local business is $0.65 of marketing money that’s not
going to local news, Little League sponsorships, and other legit
places.)
Just went to realfavicongenerator.net and
did what it says, and added the resulting images and markup to the
site.
That’s about it. Now there’s a little mouse in the browser tab (and
it should do the right thing with the icons if someone pins it to their
home screen on mobile.)
A lot of privacy people these days sound like a little kid arguing
with a sibling. You’re going to be in big trouble when Dad
gets home!
Dad, here, is the European Union, who’s going to put the General
Data Protection Regulation foot down, and then, oh, boy, those
naughty surveillance marketers are going to catch it, and wish that they
had been listening to us about privacy all along.
The problem is that perfectly normal businesses are using
GDPR-violating sneaky
tracking pixels and other surveillance marketing as part of their
daily marketing routine.
As the GDPR deadline approaches, surveillance marketers in
Europe are going to sigh and painstakingly explain to European
politicians that of course this GDPR thing isn’t going to work. “You
see, politicians, it’s an example of political overreach that completely
conflicts with technical reality.” European surveillance marketers will
use the same kind of language about GDPR that the freedom-loving side
used when we talked about the proposed CBDTPA.
It’s just going to Break the Internet! People will lose their jobs!
The result is predictable. GDPR will be delayed, festooned with
exceptions, or both, and the hoped-for top-down solution to privacy
problems will not come. There’s no shortcut. We’ll only get a
replacement for surveillance marketing when we build the tools, the
networks, the business processes, the customer/voter norms, and then the
political power.
Update 24 Aug 2017: How to get the big video file
from an Air Mozilla page.
Sign in if needed and go to the page with the video on
it.
Control-I to open the page info window.
Open the “Media” tab in the page info window, and find the item
with type “Video”.
Click “Save As” to save the video.
Got a big video, and want a copy of just the audio for listening on a
device with limited storage? Use Soundconverter.
soundconverter -b -m mp3 -s .mp3 long-video.webm
(MP3 patents are expired now,
hooray! I’m just using MP3 here because if I get a rental car that lets
me plug in a USB stick for listening, the MP3 format is most likely to
be supported.)
Soundconverter has a GUI but you can use -b for batch
mode from the shell. soundconverter --help for help. You do
need to set both the MIME type, with -m, and the file
suffix, with -s.
Procter & Gamble makes products that help you comply with widely
held cleanliness norms.
Digital ads are micro-targeted to you as an individual.
That’s the worst possible brand/medium fit. If you don’t know that
the people who expect you to keep your house or body clean are going to
be aware of the same product, how do you know whether to buy it?
I thought it would be fun to try Twitter ads, and, not surprisingly,
I started getting fake followers pretty quickly after I started a
Twitter follower campaign.
Since I’m paying nine cents a head for these followers, I don’t want
to get ripped off. So naturally I put in a support ticket to Twitter,
and just heard back.
Thanks for writing in about the quality of followers and engagements.
One of the advantages of the Twitter Ads platform is that any RTs of
your promoted ads are sent to the retweeting account’s followers as an
organic tweet. Any engagements that result are not charged, however
followers gained may not align with the original campaign’s targeting
criteria. These earned followers or engagements do show in the campaign
dashboard and are used to calculate cost per engagement, however you are
not charged for them directly.
Twitter also passes all promoted engagements through a filtering
mechanism to avoid charging advertisers for any low-quality or invalid
engagements. These filters run on a set schedule so the engagements may
show in the campaign dashboard, but will be deducted from the amount
outstanding and will not be charged to your credit card.
If you have any further questions, please don’t hesitate to
reply.
That’s pretty dense San Francisco speak, so let me see if I can
translate to the equivalent for a normal product.
Hey, what are these rat turds doing in my raisin
bran?
Thanks for writing in about the quality of your raisin bran eating
experience. One of the advantages of the raisin bran platform is that
during the production process, your raisin bran is made available to our
rodent partners as an organic asset.
I paid for raisin bran, so why are you selling me
raisin-plus-rat-turds bran?
Any ingredients that result from rodent engagement are not charged,
however ingredients gained may not align with your original
raisin-eating criteria.
Can I have my money back?
We pass all raisin bran sales through a filtering mechanism to avoid
charging you for invalid ingredients. The total weight of the product,
as printed on the box, includes these ingredients, but the weight of
invalid ingredients will be deducted from the amount charged to your
credit card.
So how can I tell which rat turds are “organic” so I’m not
paying for them, and which are the ones that you just didn’t catch and
are charging me for?
(?)
Buying Twitter followers: Fiverr or Twitter?
On Fiverr, Twitter followers are about half a cent each ($5/1000). On
Twitter, I’m getting followers for about 9 cents each. The Twitter
price is about 18x the Fiverr price.
But every follower that someone else buys on Fiverr has to be “aged”
and disguised in order to look realistic enough not to get banned. The
bot-herders have to follow legit follower campaigns such as mine and not
just their paying customers.
(I call them “sleepers.” They do all sorts of natural things (following
suggested accounts, tweeting quotes) aging into “trusted” zone.)
If Twitter is selling those “follow” actions to me for nine cents
each, and the bot-herder is only making half a cent, how is Twitter not
making more from bogus Twitter followers than the bot-herders are?
If you’re verified on Twitter, you may not be seeing how much of a
shitshow their ad business is. Maybe they’re going to have to sell
Twitter to me sooner than I thought.
Just thinking about approaches to incentivizing production of
information goods, and where futures markets might fit in.
Artificial property
Article 1, Section 8, of the US Constitution still covers this one
best.
To promote the Progress of Science and useful Arts, by securing for
limited Times to Authors and Inventors the exclusive Right to their
respective Writings and Discoveries;
We know about the problems with this one. It encourages all kinds of
rent-seeking and freedom-menacing behavior by the holders of
property interests in information. And the transaction costs are
too high to incentivize the production of some useful kinds of
information.
Commoditize the complement
Joel Spolsky explained it best, in Strategy
Letter V. Smart companies try to commoditize their products’
complements. (See also: the list of business models in the “Some
Easily Rebutted Objections to GNU’s Goals” section of the GNU Manifesto)
This one has been shown to work for some categories of information
goods but not others. (We have Free world-class browsers and OS kernels
because search engines and hardware are complements. We don’t have free
world-class software in categories such as CAD.)
Signaling
Release a free information good as a way to signal competence in
performing a service, or at least a large investment by the author in
persuading others that the author is competent. In software, this works
at the level of the individual labor market and in consulting. Also a
major part of certain kinds of ad-supported
business models.
Game and market mechanisms
With “gamified crowdsourcing” you can earn play rewards, at very low
transaction costs, by contributing very small tasks.
Higher transaction costs are associated with “crowdfunding” which
sounds similar but requires more collaboration and administration.
From games to markets
In the middle, between crowdsourcing and crowdfunding, is a niche for
a mechanism with lower transaction costs than crowdfunding but more
rewards than crowdsourcing.
By using the existing bug tracker to resolve contracts, a bug futures
market keeps transaction costs low. By connecting to an existing payment
platform, a bug futures market enables a kind of reward that is more
liquid, and transferable among projects.
We don’t know how wide the bug futures niche is. Is it a tiny space
between increasingly complex tasks that can be resolved by crowdsourcing
and increasingly finer-grained crowdfunding campaigns?
Or are bug futures capable of achieving low enough transaction costs
to be an attractive incentivization mechanism for a lot of tasks that go
into a variety of information goods?
Bryan Alexander has a good description of an “open web” reading
pipeline in I
defy the world and go back to RSS. I’m all for the open web, but 40
separate folders for 400 feeds? That would drive me nuts. I’m a lumper,
not a splitter. I have one folder for 12,387 feeds.
My chosen way to use RSS (and one of the great things about RSS is
you can choose UX independently of information sources) is a “scored
river”. Something like Dave Winer’s River
of News concept, that you can navigate by just scrolling, but not
exactly a river of news.
with full text if available, but without images. I can click
through if I want the images.
items grouped by score, not feed. (Scores are assigned by a
dirt-simple algorithm in which a feed “invests” a percentage of its points
in every link, and the investments pay out in a higher score for that
feed if the user likes a link. A sketch follows below.)
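Here is a rough Python sketch of that scoring loop, just to make the
"investment" idea concrete. The constants and names are made up, not
taken from any real feed reader.

INVEST_RATE = 0.01   # fraction of a feed's points staked on each link
PAYOUT = 1.0         # bonus paid to a feed when the user likes a link

scores = {}          # feed -> current score
stakes = {}          # link -> list of (feed, staked amount)

def on_new_link(feed, link):
    # Each feed "invests" a percentage of its points in every link.
    score = scores.setdefault(feed, 1.0)
    stake = score * INVEST_RATE
    scores[feed] = score - stake
    stakes.setdefault(link, []).append((feed, stake))

def on_user_like(link):
    # A liked link pays every investing feed back, with a bonus, so
    # feeds whose links keep getting liked drift up the river.
    for feed, stake in stakes.pop(link, []):
        scores[feed] += stake + PAYOUT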
I also put the byline at the bottom of each item. Anyway, one thing I
have found out about manipulating my own filter bubble is that linklog
feeds and blogrolls are great inputs.
Here are some actual links.
This might look funny: How
I ran my kids like an Atlassian team for a month. But think about it
for a minute. Someone at every app or site your kids use is
doing the same thing, and their goals don’t include “Dignity and
Respect” or “Hard Work Smart Work”.
For an Inclusive Culture, Try Working Less: As I said, to this day, my
team at J.D. Edwards was the most diverse I’ve ever worked on….Still, I
just couldn’t get over that damned tie.
The Al Capone theory of sexual harassment: Initially, the connection
eluded us: why would the same person who made unwanted sexual advances
also fake expense reports, plagiarize, or take credit for other people’s
work?
Jon Tennant - The Cost of Knowledge: But there’s something much more
sinister to consider; recently a group of researchers saw fit to publish
Ebola research in a ‘glamour magazine’ behind a paywall; they cared more
about brand association than the content. This could be life-saving
research, why did they not at least educate themselves on the preprint
procedure….
2014: When there’s no other dude in the car, the cost of taking an Uber
anywhere becomes cheaper than owning a vehicle. So the magic there is,
you basically bring the cost below the cost of ownership for everybody,
and then car ownership goes away.
2018 (?): When there’s no other dude in the
fund, the cost of financing innovation anywhere becomes cheaper than
owning a portfolio of public company stock. So the magic there is, you
basically bring the transaction costs of venture capital below the cost
of public company ownership for everybody, and then public companies go
away.
Could be a thing for software/service companies faster than we might
think. Futures contracts on bugs→equity crowdfunding→bot-managed
follow-on fund…
Here’s a probably stupid idea: give bots the right to accept proposed
changes to a software project. Can automation encourage less
burnout-provoking behavior?
A set of bots could interact in interesting ways.
Regression-test-bot: If a change only adds a
test, applies cleanly to both the current version and to a previous
version, and the previous version passes the test, accept it, even if
the test fails for the current version.
Harmless-change-bot: If a change is below a
certain size, does not modify existing tests, and all tests (including
any new ones) pass, accept it.
Revert-bot: If any tests are failing on the
current version, and have been failing for more than a certain amount of
time, revert back to a version that passes.
Would more people write regression tests for their issues if they
knew that a bot would accept them? Or say that someone makes a bad
change but gets it past harmless-change-bot because no existing test
covers it. No lengthy argument needed. Write a regression test and let
regression-test-bot and revert-bot team up to take care of the problem.
In general, move contributor energy away from arguing with people and
toward test writing, and reduce the size of the maintainer’s to-do
list.
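Here is a minimal Python sketch of the three policies, assuming a
hypothetical Change record and a test runner that can report results for
both the current and previous versions; none of this is a real bot
framework.

from dataclasses import dataclass

@dataclass
class Change:
    size: int                    # lines changed
    only_adds_test: bool
    modifies_existing_tests: bool
    applies_to_current: bool     # applies cleanly to the current version
    applies_to_previous: bool    # applies cleanly to a previous version

def regression_test_bot(change, previous_version_passes):
    # Accept a test-only change if a previous version passes the new
    # test, even if the current version fails it.
    return (change.only_adds_test and change.applies_to_current
            and change.applies_to_previous and previous_version_passes)

def harmless_change_bot(change, all_tests_pass, max_size=20):
    # Accept small changes that leave existing tests alone and keep the
    # whole suite (including any new tests) green. max_size is made up.
    return (change.size <= max_size
            and not change.modifies_existing_tests and all_tests_pass)

def revert_bot(hours_failing, grace_period_hours=48):
    # Roll back to a passing version once tests have been failing for
    # longer than the grace period. The threshold is made up too.
    return hours_failing > grace_period_hours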
Just tried a Twitter advertising trick that a guy who goes by “weev”
posted two years ago.
It still works.
They didn’t fix it.
Any low-budget troll who can read that old blog post and come up with
a valid credit card number can still do it.
Maybe Twitter is a bad
example, but the fast-moving nationalist right wing manages to
outclass its opponents on other social marketing platforms, too.
Facebook won’t
even reveal how badly they got played in 2016. They thought they
were putting out cat food for cute Internet kittens, but the rats ate
it.
This is not new. Right-wing shitlords, at least the best of them, are
the masters of database marketing. They absolutely kill it, and
they have been ever since Marketing as we know it became a thing. Some
good examples:
All the creepy surveillance marketing stuff they’re doing today is
just another set of tools in an expanding core competency.
Every once in a while you get an exception. The environmental
movement became a direct mail operation in response to Interior Secretary
James G. Watt, who alarmed environmentalists enough that
organizations could reliably fundraise with direct mail copy quoting
from Watt’s latest speech. And the Democrats tried that “Organizing for
America” thing for a little while, but, man, their heart just wasn’t in
it. They dropped it like a Moodle site during summer vacation. Somehow,
the creepier the marketing, the more it skews “red”. The more creativity
involved, the more it skews “blue” (using the USA meanings of those
colors.) When we make decisions about how much user surveillance we’re
going to allow on a platform, we’re making a political decision.
News sites want to go to Congress, to get permission to play for
third place in their own business? You want permission to bring
fewer resources and less experience to a surveillance
marketing game that the Internet companies are already losing?
We know the qualities of a medium that you win by being creepier, and
we know the qualities of a medium that you can win with reputation and
creativity. Why waste time and money asking Congress for the opportunity
to lose, when you could change the game instead?
Maybe achieving balance in political views depends on achieving
balance in business model. Instead of buying in to the surveillance
marketing model 100%, and handing an advantage to one side, maybe news
sites should help users control what data they share in
order to balance competing political interests.
Given this, it appears that an open source venture (a company that
can scale to millions of worker/owners creating a new economic
ecosystem) that builds massive human curated databases and decentralizes
the processing load of training these AIs could become extremely
competitive.
But what if the economic ecosystem could exist without the venture?
Instead of trying to build a virtual company with millions of
workers/owners, build a market economy with millions of participants in
tens of thousands of projects and tasks? All of this stuff scales
technically much better than it scales organizationally—you could still
be part of a large organization or movement while only participating
directly on a small set of issues at any one time. Instead of holding
equity in a large organization with all its political risk, you could
hold a portfolio of positions in areas where you have enough knowledge
to be comfortable.
Robb’s opportunity is in training AIs, not in writing code. The
“oracle” for resolving AI-training or dataset-building contracts would
have to be different, but the futures market could be the same.
The cheating project problem
Why would you invest in a futures contract on bug outcomes when the
project maintainer controls the bug tracker?
And what about employees who are incentivized from both sides: paid
to fix a bug but able to buy futures contracts (anonymously) that will
let them make more on the market by leaving it open?
In order for the market to function, the total reputation of the
project and contributors must be high enough that outside participants
believe that developers are more motivated to maintain that reputation
than to “take a
dive” on a bug.
That implies that there is some kind of relationship between the
total “reputation capital” of a project and the maximum market value of
all the futures contracts on it.
Open source metrics
To put that another way, there must be some relationship between the
market value of futures contracts on a project and the maximum
reputation value of the project. So that could be a proxy for a
difficult-to-measure concept such as “open source health.”
Open source journalism
Hey, tickers to put into stories! Sparklines! All the charts and
stuff that finance and sports reporters can build stories around!
Update 18 Dec 2017: The blind-reviews add-on now supports both
Bugzilla code reviews and GitHub pull requests. Updated project status.
Added a forbidden word.
This paper presents the largest study to date on gender bias, where
we compare acceptance rates of contributions from men versus women in an
open source software community. Surprisingly, our results show that
women’s contributions tend to be accepted more often than men’s.
However, women’s acceptance rates are higher only when they are not
identifiable as women.
A followup,
from Alice Marshall, breaks out the differences between acceptance
of “insider” and “outsider” contributions.
For outsiders, women coders who use gender-neutral profiles get their
changes accepted 2.8% more of the time than men with gender-neutral
profiles, but when their gender is obvious, they get their changes
accepted 0.8% less of the time.
The experiment, launching this month, will help reviewers who want to
try breaking habits of unconscious bias (whether by gender or
insider/outsider status) by concealing the name and email address of a
code author during a review on Bugzilla. You’ll be able to un-hide the
information before submitting a review, if you want, in order to add a
personal touch, such as welcoming a new contributor.
The extension will “cc” one of two special accounts on a bug, to
indicate if the review was done partly or fully blind. This lets us
measure its impact without having to make back-end changes to
Bugzilla.
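A sketch of how cheap that measurement is: classify each reviewed bug by
which special account got cc’d. The account names here are made up, not
the real ones.

FULLY_BLIND_CC = "review-fully-blind@example.org"    # hypothetical
PARTLY_BLIND_CC = "review-partly-blind@example.org"  # hypothetical

def classify_review(cc_list):
    # Returns how blind the review was, based only on bug metadata.
    if FULLY_BLIND_CC in cc_list:
        return "fully blind"
    if PARTLY_BLIND_CC in cc_list:
        return "partly blind"
    return "not blind"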
(Yes, browser add-ons let you experiment with changing a user’s
experience of a site without changing production web applications or
content sites. Bonus link: FilterBubbler.)
Adfraud is a big problem, and we keep seeing two basic approaches to
it.
Flight to quality: Run ads only on trustworthy
sites. Brands are now playing the fraud game with the “reputation
coprocessors” of the audience’s brains on the brand’s side. (Flight to
quality doesn’t mean just advertise on the same major media sites as
everyone else—it can scale downward with, for example, the Project
Wonderful model that lets you choose sites that are “brand safe” for
you.)
Increased surveillance: Try to fight adfraud by
continuing to play the game of trying to get big-money impressions from
the cheapest possible site, but throw more tracking at the problem.
Biggest example of this is to move ad money to locked-down mobile
platforms and away from the web.
Anyway, I’m interested in and optimistic about the results of the
recent Mozilla/Caribou Digital report. It turns out that USA-style
adtech is harder to do in countries where users are (1) less accurately
tracked and (2) equipped with blockers to avoid bandwidth-sucking
third-party ads. That’s likely to mean better prospects for ad-supported
news and cultural works, not worse. This report points out the good news
that the so-called adtech tax is lower in developing countries—so what
kind of ad-supported businesses will be enabled by lower “taxes” and “reinvention,
not reinsertion” of more magazine-like advertising?
Of course, working in those markets is going to be hard for big US or
European ad agencies that are now used to solving problems by throwing
creepy tracking at them. But the low rate of adtech taxation sounds like
an opportunity for creative local agencies and brands. Maybe the report
should have been called something like “The Global South is
Shitty-Adtech-Proof, so Brands Built Online There Are Going to Come Eat
Your Lunch.”
More on the
third connection in Benkler’s Tripod, which was pretty general. This
is just some notes on more concrete examples of how new kinds of direct
connections between markets and peer production might work in the
future.
Smart contracts should make it possible to enable these in a
trustworthy, mostly decentralized, way.
Feature request I want emoji support on my blog, so
I file, or find, a wishlist bug on the open source blog package I use:
“Add emoji support.” I then offer to enter into a smart contract that
will be worthless to me if the bug is fixed by September 1, or give me
my money back if the bug is still unfixed on that date.
A developer realizes that fixing the bug would be easy, and wants to
do it, so takes the other side of the contract. The developer’s side
will expire worthless if the bug is unfixed, and pay out if the bug is
fixed.
“Unfixed” results will probably include bugs that are open, wontfix,
invalid, or closed as duplicate of a bug that is still open.
“Fixed” results will include bugs closed as fixed, or any bug closed
as a duplicate of a bug that is closed as fixed.
If the developer fixes the bug, and its status changes to fixed, then
I lose money on the smart contract but get the feature I want. If the
bug status is still unfixed, then I get my money back.
So far this is just one user paying one developer to write a feature.
Not especially exciting. There is some interesting market design work to
be done here, though. How can the developer signal serious interest in
working on the bug, and get enough upside to be meaningful, without
taking too much risk in the event the fix is not accepted on time?
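To make the mechanics concrete, here is a toy Python resolution
function. The status names follow the fixed/unfixed definitions above;
the winner-takes-the-pot escrow is an assumption about how such a
contract might settle.

# "dup-of-fixed" stands in for "closed as duplicate of a fixed bug";
# "dup-of-open" for "duplicate of a bug that is still open".
FIXED = {"fixed", "dup-of-fixed"}
UNFIXED = {"open", "wontfix", "invalid", "dup-of-open"}

def resolve(user_stake, developer_stake, status):
    # At the contract date, pay the whole escrowed pot to one side.
    pot = user_stake + developer_stake
    if status in FIXED:
        return {"developer": pot}  # I lose my stake, but got my feature
    if status in UNFIXED:
        return {"user": pot}       # my money back (and, in this toy
                                   # model, the developer's stake too)
    raise ValueError("unknown status: " + status)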
Arbitrage I post the same offer, but another user
realizes that the blog project can only support emoji if the template
package that it depends on supports them. That user becomes an
arbitrageur: takes the “fixed” side of my offer, and the “unfixed” side
of the “Add emoji support” bug in the template project.
As an end user, I don’t have to know the dependency relationship, and
the market gives the arbitrageur an incentive to collect information
about multiple dependent bugs into the best place to fix them.
Front-running Dudley Do-Right’s open source project
has a bug in it, users are offering to buy the “unfixed” side of the
contract in order to incentivize a fix, and a trader realizes that
Dudley would be unlikely to let the bug go unfixed. The trader takes the
“fixed” side of the contract before Dudley wakes up. The deal means that
the market gets information on the likelihood of the bug being fixed,
but the developer doing the work does not profit from it.
This is a “picking up nickels in front of a steamroller” trading
strategy. The front-runner is accepting the risk of Dudley burning out,
writing a long Medium piece on how open source is full of FAIL, and
never fixing a bug again.
Front-running game theory could be interesting. If developers get
sufficiently annoyed by front-running, they could delay fixing certain
bugs until after the end of the relevant contracts. A credible threat to
do this might make front-runners get out of their positions at a
loss.
CVE prediction A user of a static analysis tool
finds a suspicious pattern in a section of a codebase, but cannot
identify a specific vulnerability. The user offers to take one side of a
smart contract that will pay off if a vulnerability matching a certain
pattern is found. A software maintainer or key user can take the other
side of these contracts, to encourage researchers to disclose
information and focus attention on specific areas of the codebase.
Security information leakage Ernie and Bert discover
a software vulnerability. Bert sells it to foreign spies. Ernie wants to
get a piece of the action, too, but doesn’t want Bert to know, so he
trades on a relevant CVE prediction. Neither Bert nor the foreign spies
know who is making the prediction, but the market movement gives
white-hat researchers a clue on where the vulnerability can be
found.
Open source metrics: Prices and volumes on bug
futures could turn out to be a more credible signal of interest in a
project than raw activity numbers. It may be worth using a bot to trade
on a project you depend on, just to watch the market move. Likewise, new
open source metrics could provide useful trading strategies. If
sentiment analysis shows that a project is melting down, offer to take
the “unfixed” side of the project’s long-running bugs? (Of course, this
is the same market action that incentivizes fixes, so betting that a
project will fail is the same thing as paying them not to. My brain
hurts.)
What’s an “oracle”?
The “oracle” is the software component that moves information from
the bug tracker to the smart contracts system. Every smart contract has
to be tied to a given oracle that both sides trust to resolve it
fairly.
For CVE prediction, the oracle is responsible for pattern matching on
new CVEs, and feeding the info into the smart contract system. As with
all of these, CVE prediction contracts are tied to a specific
oracle.
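A sketch of the oracle’s job, with assumed tracker and contract-system
interfaces; nothing here is a real API.

import time

def run_oracle(tracker, contracts, poll_seconds=3600):
    # Poll the trusted bug tracker and feed statuses to the contract
    # system so that expired contracts can settle.
    while True:
        for contract in contracts.unsettled():
            status = tracker.status(contract.bug_id)  # e.g. "fixed"
            if contract.past_expiry():
                contracts.settle(contract, status)
        time.sleep(poll_seconds)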
Bots
Bots might have several roles.
Move investments out of duplicate bugs. (Take a “fixed” position
in the original and an “unfixed” position in the duplicate, or vice
versa; see the sketch after this list.)
Make small investments in bugs that appear valid based on project
history and interactions by trusted users.
Track activity across projects and social sites to identify
qualified bug fixers who are unlikely to fix a bug within the time frame
of a contract, and take “unfixed” positions on bugs relevant to
them.
For companies: when a bug is mentioned in an internal customer
support ticketing system, buy “unfixed” on that bug. Map confidential
customer needs to possible fixers.
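For example, the first bot on that list might look something like this
sketch, which assumes contract prices are quoted between 0 and 1 and
that a duplicate resolves the same way as its original; the market API
is made up.

def duplicate_arbitrage(market, original, duplicate):
    # Price of the "fixed" side of each contract, quoted in (0, 1).
    p_orig = market.price_fixed(original)
    p_dup = market.price_fixed(duplicate)
    # The two contracts resolve identically, so buying "fixed" where it
    # is cheaper and "unfixed" on the other costs less than the certain
    # payout of 1 whenever the prices differ.
    if p_orig < p_dup:
        market.buy_fixed(original)
        market.buy_unfixed(duplicate)
    elif p_dup < p_orig:
        market.buy_fixed(duplicate)
        market.buy_unfixed(original)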
Why would you want the added complexity of a market where anyone can
take either side of a futures contract on the status of a software bug,
and not just offer to pay people to fix bugs like a sensible person?
IMHO it’s worth trying not just because of the promise of lower
transaction costs and more market liquidity (handwave) but because it
enables other kinds of transactions. A few more.
Partial work I want a feature, and buy the “unfixed”
side of a contract that I expect to lose. A developer decides to fix it,
does the work, and posts a pull request that would close the bug. But
the maintainer is on vacation, leaving her pull request hanging with a
long comment thread. Another developer is willing to take on the
political risk of merging the work, and buys out the original
developer’s position.
Prediction/incentivization With the right market
design, a prediction that something won’t happen is the same as an
incentive to make it happen. If we make an attractive enough way for
users to hedge their exposure to lack of innovation, we create a pool of
wealth that can be captured by innovators. (Related: dominant
assurance contracts)
Bug triage Much valuable work on bugs is in the form
of modifying metadata: assigning a bug to the correct subsystem,
identifying dependency relationships, cleaning up spam, and moving
invalid bugs into a support ticket tracker or forum. This work is hard
to reward, and infamously hard to find volunteers for. An active futures
market could include both bots that trade bugs probabilistically based
on status and activity, and active bug triagers who make small market
gains from modifying metadata in a way that makes the bugs more likely
to be resolved.
Content Neutrality: Content blocking software
should focus on addressing potential user needs (such as on performance,
security, and privacy) instead of blocking specific types of content
(such as advertising).
Transparency & Control: The content blocking
software should provide users with transparency and meaningful controls
over the needs it is attempting to address.
Openness: Blocking should maintain a level
playing field and should block under the same principles regardless of
source of the content. Publishers and other content providers should be
given ways to participate in an open Web ecosystem, instead of being
placed in a permanent penalty box that closes off the Web to their
products and services.
[T]he police are the public and that the public are the police, the
police being only members of the public who are paid to give full-time
attention to duties which are incumbent on every citizen in the
interests of community welfare and existence.
Web browser developers have similar responsibilities to those of
Peel’s ideal police: to build a browser to carry out the user’s intent,
or, when setting defaults, to understand widely held user norms and
implement those, while giving users the affordances to change the
defaults if they choose.
The question now is how to apply content blocking principles to
today’s web environment. Some qualities of today’s situation are:
Tracking protection often doesn’t have to be perfect, because
adfraud. The browser can provide some protection, and influence the
market in a positive direction, just by getting legit users below the
noise floor of fraudbots.
Tracking protection has the potential to intensify a
fingerprinting arms race that’s already going on, by forcing more adtech
to rely on fingerprinting in place of third-party cookies.
Fraud is bad, but not all anti-fraud is good. Anti-fraud
technologies that track users can create the same security risks as
other tracking—and enable adtech to keep promising real eyeballs on
crappy sites. The “flight to quality” approach to anti-fraud does not
share these problems.
Adtech and adfraud can peek at Mozilla’s homework, but Mozilla
can’t see theirs. Open source projects must rely on unpredictable users,
not unpredictable platform decisions, to create uncertainty.
Which suggests a few tactics—low-risk ways to apply content blocking
principles to address today’s adtech/adfraud problems.
Empower WebExtensions developers and users. Much of
the tracking protection and anti-fingerprinting magic in Firefox is
hidden behind preferences. This makes a lot of sense because it enables
developers to integrate their work into the browser in parallel with
user testing, and enables Tor Browser to do less patching. IMHO this
work is also important to enable users to choose their own balance
between privacy/security and breaking legacy sites.
Inform and nudge users who express an interest in
privacy. Some users care about privacy, but don’t have enough
information about how protection choices match up with their
expectations. If a user cares enough to turn on Do Not Track, change
cookie settings, or install an ad blocker, then try suggesting a
tracking protection setting or tool. Don’t assume that a user who has
installed an ad
blocker with deceptive privacy settings would
not choose privacy if asked clearly.
Understand and report on adfraud. Adfraud is more
than just fake impressions and clicks. New techniques include
attribution fraud: taking advantage of tracking to connect a bogus ad
impression to a real sale. The complexity of attribution models makes
this hard to track down. (Criteo and Steelhouse settled a lawsuit about
this before discovery could reveal much.)
A multi-billion-dollar industry is devoted to spreading a story that
minimizes adfraud, while independent research hints at a complex and
lucrative adfraud scene. Remember how there were two Methbot
stories: Methbot got a bogus block of IP addresses, and Methbot
circumvented some widely used anti-fraud scripts. The ad networks dealt
with the first one pretty quickly, but the second is still a work in
progress.
The more that Internet freedom lovers can help marketers understand
adfraud, and related problems such as brand-unsafe ad placements, the
more that the content blocking story can be about users, legit sites,
and brands dealing with problem tracking, and not just privacy nerds
against all web business.
Since most software is sold with an “as is” license, meaning the
company is not legally liable for any issues with it even on day one, it
has not made much sense to spend the extra money and time required to
make software more secure quickly.
The software business is still stuck on the kind of licensing that
might have made sense in the 8-bit micro days, when “personal computer
productivity” was more aspirational than a real thing, and software
licenses were printed on the backs
of floppy sleeves.
Today, software is part of products that do real stuff, and it makes
zero sense to ship a real product, that people’s safety or security
depends on, with the fine print “WE RESERVE THE RIGHT TO TOTALLY
HALF-ASS OUR JOBS” or in business-speak, “SELLER DISCLAIMS THE IMPLIED
WARRANTY OF MERCHANTABILITY.”
But what about open source and collaboration and science, and all
that stuff? Software can be both “product” and “speech”. Should
there be a warranty on speech? If I dig up my shell script for
re-running the make command when a source file changes, and
put it on the Internet, should I be putting a warranty on it?
It seems that there are two kinds of software: some is more
product-like, and should have a grown-up warranty on it like a real
business. And some software is more speech-like, and should have ethical
requirements like a scientific paper, but not a product-like
warranty.
What’s the dividing line? Some ideas.
“productware is shipped as executables, freespeechware is
shipped as source code” Not going to work for
elevator_controller.php or a home router security tool
written in JavaScript.
“productware is preinstalled, freespeechware is downloaded
separately” That doesn’t make sense when even implanted
defibrillators can update over the net.
“productware is proprietary, freespeechware is open
source” Companies could put all the fragile stuff in open
source components, then use the DMCA and CFAA to enable them to treat
the whole compilation as proprietary.
Software companies are built to be good at getting around rules. If a
company can earn all its money in faraway Dutch Sandwich Land and be
conveniently too broke to pay the IRS in the USA, then it’s going to be
hard to make it grow up licensing-wise without hurting other people
first.
How about splitting out the legal advantages that the government
offers to software and extending some to productware, others to
freespeechware?
Freespeechware licenses
license may disclaim implied warranty
no anti-reverse-engineering clause in a freespeechware license is
enforceable
freespeechware is not a “technological protection measure” under
section 1201 of Title 17 of the United States Code (DMCA
anticircumvention)
exploiting a flaw in freespeechware is never a violation of the
Computer Fraud and Abuse Act
If the license allows it, a vendor may sell freespeechware, or a
derivative work of it, as productware. (This could be as simple as
following the “You may charge any price or no price for each copy that
you convey, and you may offer support or warranty protection for a
fee” term of the GPL.)
Productware licenses:
license may not disclaim implied warranty
licensor and licensee may agree to limit reverse engineering
rights
DMCA and CFAA apply (reformed of course, but that’s another
story)
It seems to me that there needs to be some kind of quid pro quo here.
If a company that sells software wants to use government-granted legal
powers to control its work, that has to be conditioned on not using
those powers just to protect irresponsible releases.
Check it out—I’m “on Facebook” again. Just fixed my gateway through
dlvr.it. If you’re reading this on
Facebook, that’s why.
Dlvr.it is a nifty service that will post to social sites from an RSS
feed. If you don’t run your own linklog feed, the good news is that Pocket
will generate RSS feeds from the articles you save, so if you want
to share links with people still on Facebook, the combination of Pocket
and dlvr.it makes that easy to do without actually spending human
eyeball time there.
There’s a story about Thomas
Nelson, Jr., leader of the Virginia Militia in the Revolutionary
War.
During the siege and battle Nelson led the Virginia Militia whom he
had personally organized and supplied with his own funds. Legend had it
that Nelson ordered his artillery to direct their fire on his own house
which was occupied by Cornwallis, offering five guineas to the first man
who hit the house.
Would Facebook’s owners do the same, now that we know that foreign
interests use Facebook to subvert America? Probably not. The Nelson
story is just an unconfirmed patriotic anecdote, and we can’t expect
that kind of thing from today’s post-patriotic investor class. Anyway,
just seeing if I can move Facebook’s bots/eyeballs ratio up a
little.
Is open source participation a way to signal
that you have skills and are capable of cooperation with others?
Is open source a way to build connections and social capital so that
you have more awareness of new job openings and can more easily move to
a higher-paid position?
Does open source participation just increase your skills so that you
do better work and get paid more for it?
Are open source codebases a complementary good to open source
maintenance programming, so that a lower price for access to the
codebase tends to drive up the price for maintenance programming
labor?
Is “we hire open source people” just an excuse for bias, since the
open source scene at least in the USA is less diverse than the general
pool of programming job applicants?
I’m thankful that the sewing machine
was invented a long time ago, not today. If the sewing machine were
invented today, most sewing tutorials would be twice as long, because
all the thread would come in proprietary cartridges, and you would
usually have to hack the cartridge to get the type of thread you need in
a cartridge that works with your machine.
Tracking protection is still hard. You have to provide good
protection from third-party tracking, which users generally don’t want,
without breaking legit third-party services such as content delivery
networks, single sign-on systems, and shopping carts. Protection is a
balance, similar to the problem of filtering spam while delivering legit
mail. Just as spam filtering helps enable legit email marketing,
tracking protection tends to enable legit advertising that supports
journalism and cultural works.
In the long run, just as we have seen with spam filters, it will be
more important to make protection hard to predict than to run the
perfect protection out of the box. “Do not repeat the
tactics which have gained you one victory, but let your methods be
regulated by the infinite variety of circumstances.” — Sun Tzu. A spam
filter, or browser, that always does the same thing will be analyzed and
worked around. A mail service that changes policies to respond to
current spam runs, or an unpredictable ecosystem of tracking protection
add-ons that browser users can install in unpredictable combinations, is
likely to be harder.
But most users aren’t in the habit of installing add-ons, so browsers
will probably have to give them a nudge, like Microsoft Windows does
when it nags the user to pick an antivirus package (or did last time I
checked.) So the decentralized way to catch up to Apple could end up
being something like:
When new tracking protection methods show up in the privacy
literature, quietly build the needed browser add-on APIs to make it
possible for new add-ons to implement them.
Do user research to guide the content and timing of nudges. (Some
atypical users
prefer to be tracked, and should be offered a chance to silence the
warnings by affirmatively choosing a do-nothing protection
option.)
Help users share information about the pros and cons of different
tools. If a tool saves lots of bandwidth and battery life but breaks
some site’s comment form, help the user make the right choice.
Sponsor innovation challenges to incentivize development,
testing, and promotion of diverse tracking protection tools.
Any surveillance marketer can install and test a copy of Safari, but
working around an explosion of tracking protection tools would be
harder. How to set priorities when they don’t know which tools will get
popular?
What about adfraud?
Tracking protection strategies have to take adfraud into account.
Marketers have two choices for how to deal with adfraud:
flight to quality
extra surveillance
Flight to quality is better in the long run. But it’s a problem from
the point of view of adtech intermediaries because it moves more ad
money to high-reputation sites, and the whole point of adtech is to
reach big-money eyeballs on cheap sites. Adtech firms would rather see
surveillance-heavy responses to adfraud. One way to help shift marketing
budgets away from surveillance, and toward flight to quality, is to make
the returns on surveillance investments less predictable.
This is possible to do without making value judgments about certain
kinds of sites. If you like a site enough to let it see your personal
info, you should be able to do it, even if in my humble opinion it’s a
crappy site. But you can have this option without extending to all
crappy sites the confidence that they’ll be able to live on leaked data
from unaware users.
I have to admit that some people hate me, but I have to tell you
something about hate. If sending an electronic advertisement through
email warrants hate, then my answer to those people is “Get a life.
Don’t hate somebody for sending an advertisement through email.” There
are people out there that also like us.
According to spammers, spam filtering was just Internet nerds
complaining about something that regular users actually like. But the
spam debate ended when big online services, starting with MSN, started
talking about how they build for their real users instead of for
Wallace’s hypothetical spam-loving users.
If you missed the email spam debate, don’t worry. Wallace’s talking
points about spam filters constantly get recycled by
surveillance marketers talking about tracking protection. But now
it’s not email spam that users supposedly crave. Today, the Interactive
Advertising Bureau tells us that users want ads that “follow them
around” from site to site.
Enough background. Just as the email spam debate ended with MSN’s
campaign, the third-party web tracking debate ended
on June 5, 2017.
With Intelligent Tracking Prevention, WebKit strikes a balance
between user privacy and websites’ need for on-device storage. That
said, we are aware that this feature may create challenges for
legitimate website storage, i.e. storage not intended for cross-site
tracking.
“Legitimate” uses do not include cross-site tracking.
Safari’s protection is automatic and client-side, so no blocklist
politics.
Surveillance marketers come up with all kinds of hypothetical reasons
why users might prefer targeted ads. But in the real world, Apple
invests time and effort to understand user experience. When Apple
communicates about a feature, it’s because that feature is likely to
keep a user satisfied enough to buy more Apple devices. We can’t read
their confidential user research, but we can see what the company
learned from it based on how they communicate about products.
(Imagine for a minute that Apple’s user research had found that real
live users are more like the Interactive Advertising Bureau’s idea of a
user. We might see announcements more like “Safari automatically shares
your health and financial information with brands you love!” Anybody got
one of those to share?)
Saving an out-of-touch ad industry
Advertising supports journalism and cultural works that would not
otherwise exist. It’s too important not to save. Bob
Hoffman asks,
[H]ow can we encourage an acceptable version of online advertising
that will allow us to enjoy the things we like about the web without the
insufferable annoyance of the current online ad model?
The browser has to be part of the answer. If the browser does its
job, as Safari is doing, it can play a vital role in re-connecting users
with legit advertising—just as users have come to trust legit email
newsletters now that they have effective spam filters.
Safari’s Intelligent Tracking Prevention is not the final answer any
more than Paul Graham’s “A
plan for spam” was the final spam filter. Adtech will evade
protection tools just as spammers did, and protection will have to keep
getting better. But at least now we can finally say debate over,
game on.
Looks like the spawn of Privacy Badger and cookie double-keying,
designed to balance user protection from surveillance marketing with
minimal breakage of sites that depend on third-party resources.
(Now all the webmasters will fix stuff to make it work with
Intelligent Tracking Prevention, which makes it easier for other
browsers and privacy tools to justify their own features to protect
users. Of course, now the surveillance marketers will rely more on
passive fingerprinting, and Apple has an advantage there because there
are fewer different Safari-capable devices. But browsers need to fix
fingerprinting anyway.)
Apple does massive amounts of user research and it’s fun to watch the
results leak through when they communicate about features. Looks like
they have found that users care about being “followed” from site to site
by ads, and that users are still pretty good at applied behavioral
economics. The side effect of tracking protection, of course, is that it
takes high-reputation sites out of competition with the bottom-feeders
to reach their own audiences, so Intelligent Tracking Prevention is
great news for publishers too.
Meanwhile, I don’t get Google’s
weak “filter” thing. Looks like a transparently publisher-hostile
move (since it blocks some potentially big-money ads without addressing
the problem of site commodification), unless I’m missing something.
Benkler builds on the work of Ronald Coase, whose The Nature of
the Firm explains how transaction costs affect when companies can
be more efficient ways to organize work than markets. Benkler adds a
third organizational model, peer production. Peer production, commonly
seen in open source projects, is good at matching creative people to
rewarding problems.
As peer production relies on opening up access to resources for a
relatively unbounded set of agents, freeing them to define and pursue an
unbounded set of projects that are the best outcome of combining a
particular individual or set of individuals with a particular set of
resources, this open set of agents is likely to be more productive than
the same set could have been if divided into bounded sets in firms.
Firms, markets, and peer production all have their advantages, and in
the real world, most productive activity is mixed.
Managers in firms manage some production directly and trade in
markets for other production. This connection in the firms/markets/peer
production tripod is as old as firms.
The open source software business is the second connection.
Managers in firms both manage software production directly and sponsor
peer production projects, or manage employees who participate in
projects.
But what about the third possible connection between legs of the
tripod? Is it possible to make a direct connection between peer
production and markets, one that doesn’t go through firms? And why would
you want to connect peer production directly to markets in the first
place? Not just because that’s where the money is, but because markets
are a good tool for getting information out of people, and projects need
information. Save the whole Kooths et al. paper to
read later. Best case against open source that I know of—all the points
that a serious open source proponent needs to be able to address.
Stefan Kooths, Markus Langenfurth, and Nadine Kalwey wrote,
in “Open-Source Software: An Economic Assessment” (PDF),
Developers lack key information due to the absence of pricing in
open-source software. They do not have information concerning customers’
willingness to pay (= actual preferences), based on which production
decisions would be made in the market process. Because of the absence of
this information, supply does not automatically develop in line with the
needs of the users, which may manifest itself as oversupply (excessive
supply) or undersupply (excessive demand). Furthermore, the functional
deficits in the software market also work their way up to the upstream
factor markets (in particular, the labor market for developers)
and–depending on the financing model of the open-source software
development–to the downstream or parallel complementary markets (e.g.,
service markets) as well.
Because the open-source model at its core deliberately rejects the
use of the market as a coordination mechanism and prevents the formation
of price information, the above market functions cannot be satisfied by
the open-source model. This results in a systematic disadvantage in the
provision of software in the open-source model as compared to the
proprietary production process.
The workaround is to connect peer production to markets by way of
firms. But the more that connections between markets and peer production
projects have to go through firms, the more chances to lose information.
That’s not because firms are necessarily dysfunctional (although most
are, in different ways). A firm might rationally choose to pay for the
implementation of a feature that they predict will get 100 new users,
paying $5000 each, instead of a feature that adds $1000 of value for
1000 existing users, but whose absence won’t stop them from
renewing. The first feature is worth $500,000 in revenue the firm can
capture; the second creates $1,000,000 of user value that never shows
up on the firm’s books.
Some ways to connect peer production to markets are already working.
Crowdfunding
for software projects and Patreon
are furthest along, both offering support for developers who have
already built a reputation.
Dominant
assurance contracts, by Alexander Tabarrok, build on the
crowdfunding model, with the extra twist that the person proposing the
project has to put up some seed money that is divided among backers if
the project fails to secure funding. This is supposed to bring in extra
investment early on, before a project looks likely to meet its goal.
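A toy Python version of that payoff rule, with everything about running
the campaign itself left out; the function and field names are
illustrative only.

def settle_campaign(pledges, goal, seed):
    # pledges: backer -> amount. If the goal is met, the project keeps
    # the pledges and the proposer recovers the seed. If not, each
    # backer gets a refund plus an equal share of the seed money,
    # per Tabarrok's design.
    total = sum(pledges.values())
    if total >= goal:
        return {"project": total, "proposer": seed, "refunds": {}}
    share = seed / len(pledges) if pledges else 0.0
    return {"project": 0.0, "proposer": 0.0,
            "refunds": {b: amt + share for b, amt in pledges.items()}}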
Tom W. Bell’s “SPEX”, in Prediction
Markets for Promoting the Progress of Sciences and the Useful Arts,
is a proposed market to facilitate transactions in a variety of
prediction certificates, each one of which promises to pay its bearer in
the event that an associated claim about science, technology, or public
policy comes true. The SPEX looks promising as a way for investors
to hedge their exposure to lack of innovation. If you own data centers
and need energy, take a short position in SPEX contracts on cold fusion.
(Or, more likely, buy into a SPEX fund that invests for your industry.)
The SPEX looks like a way to connect the market to more difficult
problems than the kinds of incremental
innovation that tend to be funded through the VC system.
What happens when the software industry is forced to grow up?
I’m starting to think that finishing the tripod, with better links
from markets to peer production, is going to matter a lot more soon,
because of the software quality problem.
Today’s software, both proprietary and open source, is distributed
under ¯\_(ツ)_/¯ terms. “Disclaimer of implied warranty of
merchantability” is lawyer-speak for “we reserve the right to half-ass
our jobs lol.” As Zeynep
Tufekci wrote in the New York Times, “The World Is Getting Hacked.
Why Don’t We Do More to Stop It?” At some point the users are going to
get fed up, and we’re going to have to. An industry as large and wealthy
as software, still sticking to Homebrew Computer Club-era disclaimers,
is like a 40-something-year-old startup bro doing crimes and claiming
that they’re just boyish hijinks. This whole disclaimer of implied
warranty thing is making us look stupid, people. (No, I’m not for
warranties on software that counts as a scientific or technical
communication, or on bona fide collaborative development, but on a
product product? Come on.)
Grown-up software liability policy is coming, but we’re not ready for
it. Quality software is not just a technically hard problem. Today,
we’re set up to move fast, break things, and ship dancing pigs—with
incentives more powerful than incentives to build secure software. Yes,
you get the occasional DARPA
initiative or tool
to facilitate incremental cleanup, but most software is incentivized
through too many layers of principal-agent problems. Everything
is broken.
If governments try to fix software liability before the software
scene can fix the incentives problem, then we will end up with a
stifled, slowed-down software scene, a few incumbent software companies
living on regulatory capture, and probably not much real security
benefit for users. But what if users (directly or through their
insurance companies) are willing to pay to avoid the costs of broken
software, in markets, and open source developers are
willing to participate in peer production to make
quality software, but software firms are not set up to
connect them?
What if there is another way to connect the “I would rather pay a
little more and not get h@x0r3d!” demand to the “I would code that right
and release it in open source, if someone would pay for it” supply?
Related
Why Coase’s Penguin didn’t fly — Crooked Timber: If self-interested
actors have the power to block changes that would hurt their self
interest, they are going to do so. Therefore, the kinds of
efficiency-driven processes of change that Coase (and Benkler) talk
about are only going to happen under very unusual conditions.
Corporate Prediction
Markets: Evidence from Google, Ford, and Firm X on JSTOR.
[R]unning markets on OKRs draws attention to them, presumably
intensifying reputational incentives to achieve them. At the same time,
if a project’s participants hold initially optimistic views of the
likelihood that it will be completed on time, debiasing these views may
not be in the company’s interest.
Software:
annoying speech or crappy product? The liability side of the
software quality problem is hard because software can be both speech and
a product. So what if we split up the menu of legal protections offered
to software into speechware and productware sets of
options?
In the matter of reforming things, as distinct from deforming them,
there is one plain and simple principle; a principle which will probably
be called a paradox. There exists in such a case a certain institution
or law; let us say, for the sake of simplicity, a fence or gate erected
across a road. The more modern type of reformer goes gaily up to it and
says, “I don’t see the use of this; let us clear it away.” To which the
more intelligent type of reformer will do well to answer: “If you don’t
see the use of it, I certainly won’t let you clear it away. Go away and
think. Then, when you can come back and tell me that you do see the use
of it, I may allow you to destroy it.”
Bob Hoffman makes a good case for getting
rid of user tracking in web advertising. But in order to take the
next steps, and not just talk among ourselves about things that would be
really great in the future, we first need to think about the needs that
tracking seems to satisfy for legit marketers.
What I’m not going to do is pull out the argument that’s in every
first comment on every blog post that criticizes tracking: that “adtech”
is just technology and is somehow value-neutral. Tracking, like all
technologies, enables some kinds of activity better than others. When
tracking offers marketers the opportunity to reach users based on who
the user is rather than on what they’re reading, watching, or listening
to, then that means:
But if tracking is so bad, then why, when you go to any message board
or Q&A site that discusses marketing for small businesses, is
everyone discussing those nasty, potentially civilization-extinguishing
targeted ads? Why is nobody popping up with a question on how to make
the next They
Laughed When I Sat Down At the Piano?
Targeted ads are self-serve and easy to get started with. If you
have never bought a Twitter or Facebook ad, get out your credit card and
start a stopwatch. These ads might be crappy,
but they have the lowest time investment of any legit marketing project,
so they’re probably the only marketing project that time-crunched
startups can do.
Targeted ads keep your OODA loop tight. Yes, running targeted ads
can be addictive—if you thought the attention
slot machine game on social sites was bad, try the advertiser
dashboard. But you’re able to use them to learn information that can
help with the rest of marketing. If you have the budget to exhibit at
one conference, compare Twitter ads targeted to attendees of conference
A with ads targeted to attendees of conference B, and you’re closer to
an answer.
Marketing has two jobs: sell stuff to customers and sell
Marketing to management. Targeting is great for the second one, since it
comes with the numbers that will help you take credit for
results.
We’re not going to be able to get rid of risky tracking until we can
understand the needs that it fills, not just for big advertisers who can
afford the time and money to show up in Cannes every year, but for the
company founder who still has $1.99 business
cards and is doing all of Marketing themselves.
(The party line among web privacy people can’t just be that GDPR is
going to save us because the French powers that be are all
emmerdés ever since the surveillance/shitlord
complex tried to run a US-style game on their political system. That
might sound nice, but put not your trust in princes, man. Even the most
arrogant Eurocrats in the world will not be able to regulate
indefinitely against all the legit business people in their countries
complaining that they can’t do something they see as essential. GDPR
will be temporary air cover for building an alternative, not a fix
in itself.)
Post-creepy web advertising is still missing some key features.
Branding and signaling metrics. We know the hard
math works out against tracking and targeting, and we know about the
failure
of targeted media to build brands in the long run, but we don’t have
good numbers that are usable day to day. The “customer journey” has nice
graphs, but brand equity doesn’t.
Quick, low-risk service. With the exception of
the Project
Wonderful model, targeted ads are quick and low-risk, while
signal-carrying ads are the opposite. A high-overhead direct ad sales
process is not a drop-in replacement for an easy web form.
I don’t think that’s all of them. But I don’t think that the move to
post-creepy web advertising is going to be a rush, all at once, either.
Brands that have fly-by-night low-reputation competitors, brands that
already have many tracking-protected customers, and brands with solid
email lists are going to be able to move faster than marketers who are
still making tracking work. More: Work together to
fix web ads? Let’s not.
I’m still two steps behind in devops coolness for my network stuff. I
don’t even have proper configuration management, and that’s fine because
Configuration
Management is an Anti-pattern now. Anyway, I still log in and
actually run shell commands on the server, and the LWN review of mosh was
helpful to me. Now using mosh for connections that persist across
suspending the laptop and moving it from network to network. More info:
Mosh: the mobile shell
write a long Medium post apologizing to your users for
failing
end date for IP Maximalism
When did serious “Intellectual Property Maximalism” end? I’m going to
put it at September
18, 2006, which is the date that the Gates Foundation announced
funding for the Public Library of Science’s journal PLoS Neglected
Tropical Diseases. When it’s a serious matter of people’s health,
open access matters, even to the author of “Open Letter to Hobbyists”.
Since then, IP Maximalism stories have been mostly about rent-seeking
behavior, which had been a big part of the freedom lovers’ point all
along. (Nobody quoted in this story is pearl-clutching about
“innovation”, for example: Supreme
Court ruling threatens to shut down cottage industry for small East
Texas town.)
The marketplace of ideas is now struggling with the increasing
incidence of algorithmic manipulation and disinformation campaigns. There
are bots. Look around.
Is it just me, or does it look to anyone else like the man in the
photo is checking the list of third-party web trackers on the site to
see who he can send a National Security Letter to?
Could a US president who is untrustworthy enough to be removed from
office possibly be trustworthy enough to comply with his side of a “Privacy
Shield” agreement?
If it’s necessary for the rest of the world to free itself of its
dependence on the U.S., does that apply to US-based Internet
companies that have become a bottleneck for news site ad revenue, and
how is that going to work?
Twitter ad with a celebrity photo and the
headline, What happened to Jenna Jameson? We Can’t Look Away…The New
Look is Mesmerizing
If you’re “verified” on Twitter, you probably miss these, so I’ll
just use my Fair Use rights to share that one with you.
You’re welcome.
Twitter is a uniquely influential medium, one that shows up on the TV
news every night and on news sites all day. But somehow, the plan to
make money from Twitter is to run the same kind of crappy ads that
anyone can run. And the latest Twitter news is a privacy update that
includes, among other things, more
tracking of users from one site to another. Yes, the same kind of
thing that Facebook already does, and better, with more users. And the
same kind of thing that any web site can already get from an entire Lumascape
of companies. Boring.
If you want to stick this kind of ad on your WordPress site, you just
have to cut and paste some ad network HTML—not build out a deluxe office
space on Market Street in San Francisco the way Twitter has. But the
result is about the same.
What makes Twitter even more facepalm-worthy is that they make a
point of not showing the ads to the influential people who draw
attention to Twitter to start with. It’s like they’re posting a big sign
that says STUPID AD ZONE: UNIMPORTANT PEOPLE ONLY. Twitter is building
something unique, but they’re selling generic impressions that
advertisers can get anywhere. So as far as I can tell, the Twitter
business model is something like:
Money out: build something unique and expensive.
Money in: sell the most generic and shitty thing in
the world.
Facebook can make this work because they have a lot of
eyeball-minutes and a lot of sources of personal info (pixels, SDKs,
server-to-server tracking…). But Facebook is an outlier in that area, and
there aren’t enough minutes in the day for another. So Twitter is on
track to get sold for $500,000, like Digg
was. Which is good news for me because I know enough Twitter users
that I can get that kind of money together.
So why should you help me buy Twitter when you could just
get the $500,000 yourself? Because I have a secret plan, of course.
Twitter is the site that everyone is talking about, right? So run the
ads that people will talk about. Here’s the plan.
Sell one ad per day. And everybody sees the same
one.
Sort of like the back cover of the magazine that everybody in the
world reads (but there is no such magazine, which is why this is an
opportunity). No more need to excuse the verified users from the ads.
Yes, an advertiser will have to provide a variety of sizes and
localizations for each ad (and yes, Twitter will have to check that the
translations match). But it’s the same essential ad, shown to every
Twitter user in the world for 24 hours.
No point trying to out-Facebook Facebook or out-Lumascape the
Lumascape. Surveillance ads are high in fraud and weak on signal,
and a bunch of other companies are doing them more cost-effectively and
at higher volume, anyway.
Of course, this is not for everybody. It’s for brands that want to
use a memorable, creative ad to try for the same kind of global signal
boost that a good Tweet® can get. But if you want generic targeted ads
you can get those everywhere else on the Internet. Where else can you
get signal? In order to beat current Twitter revenue, the One Twitter Ad
needs to go for about the same price as a Super Bowl commercial. But if
Twitter stays influential, that’s reasonable, and I make back the 500
grand and a lot more.
Internet users have been asking what they can do to protect their own
data from this creepy, non-consensual tracking by Internet providers—for
example, directing their Internet traffic through a VPN or Tor. One idea
to combat this that’s recently gotten a lot of traction among
privacy-conscious users is data pollution tools: software that fills
your browsing history with visits to random websites in order to add
“noise” to the browsing data that your Internet provider is
collecting.
…
[T]here are currently too many limitations and too many unknowns to
be able to confirm that data pollution is an effective strategy at
protecting one’s privacy. We’d love to eventually be proven wrong, but
for now, we simply cannot recommend these tools as an effective method
for protecting your privacy.
This is one of those “two problems one solution” situations.
The problem for makers and users of “data pollution” or spoofing
tools is QA. How do you know that your tool is working? Or are
surveillance marketers just filtering out the impressions created by the
tool, on the server side?
The problem for companies using so-called Non-Human Traffic (NHT)
is that when users discover NHT software (bots), the users tend to
remove it. What would make users choose to participate in NHT schemes so
that the NHT software can run for longer and build up more valuable
profiles?
So what if the makers of spoofing tools could get a live QA metric,
and NHT software maintainers could give users an incentive to install
and use their software?
NHT market as a tool for discovering information
Imagine a spoofing tool that offers an easy way to buy bot pageviews,
I mean buy Perfectly Legitimate Data on how fast a site loads from
various home Internet connections. When the tool connects to its server
for an update, it gets a list of URLs to visit—a mix of random sites,
popular sites, and paying customers.
Now the spoofing tool maintainer will be able to tell right away
if the tool is really generating realistic traffic, by looking at the
market price of pageviews. The maintainer will even be able to tell
whose tracking the tool can beat, by looking at which third-party
resources are included on the pages getting paid-for traffic.
The money probably won’t be significant, since real web ad money is
moving to whitelisted, legit sites and away from fraud-susceptible
schemes anyway, but in the meantime it’s a way to measure
effectiveness.
Setting up a couple of Linux systems to work with FilterBubbler, which is one of the
things
that I’m up to at work now. FilterBubbler is a WebExtension,
and the setup instructions use web-ext,
so I need NPM. In order to keep all the NPM stuff under my own home
directory, but still put the web-ext tool on my $PATH, I
need to make one-line edits to three files.
One line in ~/.npmrc
prefix = ~/.npm
One line in ~/.gitignore
.npm/
One line in ~/.bashrc
export PATH="$PATH:$HOME/.npm/bin"
(My ~/.bashrc has a bunch of export PATH=
lines so that when I add or remove one it’s more likely to get a clean
merge. Because my home directory is in git.) I think that’s it. Now I can
do
npm install --global web-ext
with no sudo or mess. And when I clone my home directory on another
system it will just work.
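To check that all three edits took effect, the prefix and the resulting
path should line up something like this (a sketch; the home directory
shown is just an example):
$ npm config get prefix     # should print your ~/.npm
/home/me/.npm
$ which web-ext             # should resolve inside it
/home/me/.npm/bin/web-ext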
(This is an answer to a question
on Twitter. Twitter is the new blog comments (for now) and I’m more
likely to see comments there than to have time to set up and moderate
comments here.)
Adfraud is an easy way to make mad cash, adtech is happily supporting
it, and it all works because the system has enough layers between CMO
and fraud hacker that everybody can stay as clean as they need to. Users
bear the privacy risks of adfraud, legit
publishers pay for it, and adtech makes more money from adfraud than
fraud hackers do. Adtech doesn’t have to communicate or coordinate with
adfraud, just set up a fraud-friendly system and let the actual fraud
hackers go to work. Bad for users, people who make legit sites, and
civilization in general.
But one piece of good news is that adfraud can change quickly.
Adfraud hackers don’t have time to get stuck in conventional ways of
doing things, because adfraud is so lucrative that the high-skill
players don’t have to stay in it for very long. The adfraud hackers who
were most active last fall have retired to run their resorts or
recording studios or wineries or whatever.
So how can privacy tools get a piece of the action?
One random idea is for an obfuscation tool to participate in the
market for so-called sourced
traffic. Fraud hackers need real-looking traffic and are willing to
pay for it. Supplying that traffic is sketchy but legal. Which is
perfect, because put one more layer on top of it and it’s not even
sketchy.
And who needs to know if they’re doing a good job at generating
real-looking traffic? Obfuscation tool maintainers. Even if you write a
great obfuscation tool, you never really know if your tricks for helping
users beat surveillance are actually working, or if your tool’s traffic
is getting quietly identified on the server side.
In proposed new privacy tool model, outsourced QA pays YOU!
Set up a market where a Perfectly Legitimate Site that is looking for
sourced traffic can go to buy pageviews, I mean buy Perfectly Legitimate
Data on how fast a site loads from various home Internet connections.
When the obfuscation tool connects to its server for an update, it gets
a list of URLs to visit—a mix of random sites, popular sites, and paying
customers.
Set a minimum price for pageviews that’s high enough to make it
cost-ineffective for DDoS. Don’t allow it to be used on random sites,
only those that the buyer controls. Make them put a secret in an
unlinked-to URL or something. And if an obfuscation tool isn’t well
enough sandboxed to visit a site that’s doing traffic sourcing, it isn’t
well enough sandboxed to surf the web unsupervised at all.
Now the obfuscation tool maintainer will be able to tell right
away if the tool is really generating realistic traffic, by looking at
the market price. The maintainer will even be able to tell whose
tracking the tool can beat, by looking at which third-party resources
are included on the pages getting paid-for traffic. And the whole thing
can be done by stringing together stuff that IAB members are already
doing, so they would look foolish to complain about it.
But I was just thinking—people started adding lots of refined sugar
to their diets long before anybody discovered how dental caries
works.
And today we have Internet distractions, and surveillance marketing,
doing to our brains what sugar did to
people’s teeth.
And people have both sugar and teeth today. Dental hygiene is
awesome: it’s a set of norms, technologies, and habits, grounded in
scientific understanding. Mental hygiene is just getting started.
The sugar industry moved faster to start with, but people agree that
teeth matter. So do brains.
If you want people on the Internet to argue with you, say that you’re
making a statement about values.
If you want people to negotiate with you, say that you’re making a
statement about business.
If you want people to accept that something is inevitable, say that
you’re making a statement about technology.
The mixup between values arguments, business arguments, and
technology arguments might be why people are confused about Brands
need to fire adtech by Doc Searls.
The set of trends that people call adtech is a values-driven
business transformation that is trying to label itself as a
technological transformation.
Some of the implementation involves technological changes (NoSQL
databases! Nifty!) but fundamentally adtech is about changing how
media business is done. Adtech does have a set of values, none of
which are really commonly held even among people in the marketing or
advertising field, but let’s not make the mistake of turning this into
either an argument about values (that never accomplishes anything) or a
set of statements about technology (that puts those with an inside POV
on current technology at an unnecessary advantage). Instead, let’s look
at the business positions that adtech is taking.
Adtech stands for profitable platforms, with
commodity producers of news and cultural works. Michael
Tiffany, CEO of advertising security firm White Ops, said: “The
fundamental value proposition of these ad tech companies who are
de-anonymizing the Internet is, Why spend big CPMs on branded sites
when I can get them on no-name sites?” This is not a healthy
one.
Adtech stands for the needs of low-reputation
sellers over the needs of high-reputation sellers. High-reputation
and low-reputation brands need different qualities from an ad medium
and adtech has to under-serve the high-reputation ones. Again,
not technologically inevitable, but a business position that
high-reputation brands and their agencies don’t have to accept.
Adtech stands for making advertisers support
criminal and politically heinous activity. I’ll
just let Bob Hoffman explain that one. Fraudulent and brand-unsafe
content is just the overspray of the high value platforms/commoditized
content system, and advertisers have to accept it in order to
power that system. Or do they?
People have a lot of interesting decisions to make: policy,
contractual, infrastructural, and client-side. When we treat the
adtech movement as simply technology, we take the risk of missing
great opportunities to negotiate for the benefit of brands, publishers,
and the audience.
This is a brand new blog, so I’m setting up the basics. I just
realized that I got the whole thing working without a single script,
image, or HTML table. (These kids today have it easy, with their media
queries and CSS
Grid and stuff.)
One big question that I’m wondering about is: how many of the people
who visit here are using some kind of protection from third-party
tracking? Third-party tracking has been an unfixed vulnerability in web
browsers for a long time. Check out the Unofficial Cookie
FAQ from 1997. Third-party cookies are in there…and we’re
still dealing with the third-party tracking problem?
In order to see how bad the problem is on this site, I’m going to set
up a little bit of first-party data collection to measure
people’s vulnerability to third-party data collection.
The three parts of that big question are:
Does first-party JavaScript load and run?
Does third-party JavaScript (from a site on popular filter lists)
load and run?
Can a third-party tracker see state from other
sites?
I’m including a single-pixel image and two scripts: the Aloodo one
and a new first-party script.
In most tracking protection configurations, the Aloodo script will be
blocked, because ad.aloodo.com appears on the commonly used
tracking protection lists.
All it does is swap out the tracking image source three times.
When the script runs, to check that this is a
browser with JavaScript on.
When the Aloodo tracking script runs, to check
if this browser is blocking the script from loading.
When the Aloodo script confirms that tracking is
possible.
The work is done in the setupAloodo function, which runs
after the page loads. First, it sets the src for the
tracking pixel to js.png, then sets up two callbacks: one
to run after the Aloodo script is loaded, and switch the image to
ld.png, and one to run if the script can track the
user, and switch the image to td.png.
Step three: check the logs
Now I can use the regular server logs to compare the number of
clients that load the original image, and the JavaScript-switched one,
to the number that load the two tracking images.
I’ll run some reports on the logs and post again about the results.
(If you want to see your own results in the meantime, you can take a tracking protection test.)
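The reports don’t need to be anything fancy. A sketch, assuming the
image names above end up as plain GET requests in a common-format
access log:
# clients that ran first-party JavaScript
grep -c 'GET /js.png' access.log
# clients that loaded the commonly-blocked Aloodo script
grep -c 'GET /ld.png' access.log
# clients that were actually trackable across sites
grep -c 'GET /td.png' access.log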
Metalsmith is pretty fun. The basic pipeline from the article seems
to work pretty well, but I ran into a couple of issues. I might have
solved these in ways that are completely wrong, but here’s what works
for me.
First, I needed to figure out how to get text from an earlier
stage of the pipeline. My Metalsmith build is pretty basic:
turn Markdown into HTML (plus article metadata stored with it,
wrapped up in a JavaScript object)
apply a template to turn the HTML version into a complete
page.
That’s great, but the problem seems to be with getting a copy of just
the HTML from step 1 for building the index page and the RSS feed. I don’t want the entire HTML page from
step 2, just the inner HTML from step 1.
The solution seems to be metalsmith-untemplatize.
This doesn’t actually strip off the template, just lets you capture an
extra copy of the HTML before templatization. This goes into the
pipeline after “markdown” but before the “layouts” step.
Here’s the world’s simplest Firefox add-on, which just turns on
Tracking Protection (ordinarily buried somewhere in about:config) and
sets third-party cookie policy to a sane value.
Internet meetings are a pain in the behind. (Clearly online meeting
software is controlled by the fossil fuel industry, and designed to be
just flaky enough to make people drive to work instead.)
Here’s a work in progress to get an MSIE VM running on Fedora.
(Will edit as I check these steps a few times. Suggestions
welcome.)
Select Browse, then Browse Local, then select the .qcow2 file.
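If you would rather not click through virt-manager, virt-install can
import the same image from a shell. A sketch only; the file name,
memory size, and OS variant are placeholders, and flag spellings vary
a little between virt-install versions:
virt-install --name msie-vm --memory 2048 --import \
    --disk path=$HOME/vm/msie.qcow2 \
    --os-variant win7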
That’s it. Now looking at a virtual MS-Windows guest that I can use
for those troublesome web conferences (and for testing web sites under
MSIE. If you try the tracking
test, it should take you to a protection page that prompts you to
turn on the EasyPrivacy Tracking Protection List. That’s a quick and
easy way to speed up your web browsing experience on MSIE.)
BY USING AND/OR VISITING THIS WEB SITE, YOU SIGNIFY YOUR ASSENT TO
THESE TERMS AND CONDITIONS. IF YOU DO NOT AGREE TO ANY OF THESE TERMS,
YOU SHOULD NOT USE THE WEB SITE. Your use of this web site, unless
governed by a separate written agreement, is subject to the terms of
this Agreement between you and Donald B. Marti (hereinafter “Site
Owner”). Site Owner reserves the right to modify this Agreement without
notice to you. The current version of this Agreement is available from a
link on the web site home page. YOU AGREE THAT YOUR USE OF THIS
WEB SITE SHALL BE AT YOUR SOLE RISK. THIS WEB SITE AND ITS CONTENT IS
PROVIDED ON AN “AS IS” BASIS WITHOUT ANY WARRANTY OF ANY KIND, INCLUDING
WITHOUT LIMITATION THE WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT,
AND FITNESS FOR A PARTICULAR PURPOSE. “Web Site Software” means
any software code, product, or work used in the creation, editing,
storage, delivery, operation, or maintenance of this web site or its
content, including but not limited to operating system, utilities,
software development tools, graphic design and editing tools, audio
recording, editing, and processing tools, language runtimes, and web
server software. “Licensable” means having the right to grant, to the
maximum extent possible, whether at the time of the initial grant or
subsequently acquired, any and all rights to make, have made, use, sell,
have sold and offer to sell any Web Site Software. “Patent Claims” means
any patent claim(s), now owned or hereafter acquired, including without
limitation, method, process, and apparatus claims, in any patent
Licensable by you or your employer. In consideration for your use of and
access to this web site, you grant Site Owner a worldwide,
sublicensable, non-exclusive license under Patent Claims infringed by
the making, using or selling of the Web Site Software, for the duration
of the patents. You represent that you have the authority to grant this
license on behalf of your employer. This license shall survive
termination of this Agreement. If you choose to terminate this
Agreement, you may do so by ceasing to use this web site, destroying any
and all copies of any content from this site, or data collected from
this site (including but not limited to the output of any network
client, scanning or diagnostic software) in your possession, and
notifying Site Owner in writing.
I have come to believe that advertising is the original sin of the
web. The fallen state of our Internet is a direct, if unintentional,
consequence of choosing advertising as the default model to support
online content and services.
[T]he advertising industry has become the web’s lapdog –
irresponsibly exaggerating the effectiveness of online advertising and
social media, ignoring the abominable results of display advertising,
glossing over the fraud and corruption, and becoming a de facto
sales arm for the online ad industry.
Advertising can be a good thing. Some of my favorite cultural goods
are leftovers
paid for by advertising at its best. There should be a way to make
advertising work for the web, the way it has worked for print
magazines.
But Hoffman and Zuckerman are both right. Web advertising
has failed. We’re throwing
away most of the potential value of the web as an ad medium by
failing to fix privacy bugs. Web ads today work more like email spam
than like magazine ads. The quest for “relevance” not only makes
targeted ads less valuable than untargeted ones, but also wastes most of
what advertisers spend. Buy an ad on the web, and more of your money
goes to intermediaries
and fraud
than to the content that helps your ad carry a signal.
From Zuckerman’s point of view, advertising is a problem, because
advertising is full of creepy stuff. From Hoffman’s point of view, the
web is a problem, because the web is full of creepy stuff. (Bonus link:
Big
Brother Has Arrived, and He’s Us )
So let’s re-introduce the web to advertising, only this time, let’s
try
it without the creepy stuff. Brand advertisers and web content
people have a lot more in common than either one has with database
marketing. There are a lot of great opportunities on the post-creepy
web, but the first step is to get the right people talking.
Andrew Cowie has written something
similar. The main thing that this one does differently is to ask
make which files matter to it, instead of doing an
inotifywatch on the whole directory. Comments and suggestions
welcome.
(I wrote this story a while ago, but the news is catching up
fast, so I'd better put it up now.)
"Look, Agent Bellamy, I appreciate you coming out, but it's three in
the morning. Can we set up a time to discuss this tomorrow, and can your
people check the house while I'm at the office?"
Jack Murphy was too tired to follow some involved technical
discussion with the big Intellectual Property Enforcement Agency agent,
who sat in Murphy's old steam-bent office chair, briefcase at his feet.
Murphy, quickly dressed in chinos and Stanford Law sweatshirt, sat in
his new ergonomic chair at a gleaming glass and metal desk. The desk
looked out of place in the much-remodeled Maryland house that Linda had
found when Murphy accepted the appointment in DC.
"I'm afraid it won't wait, sir," Bellamy said. "Sorry for the
unannounced visit, but as you're probably aware, our agency tracks the
Free Markets closely."
"The Free Markets? That underground money web site?"
"Yes, basically. Although it's not really a site, just a system for
communicating and trading. That's what makes it so hard to shut
down."
"Well, all I know is that if you jailbreak your computer you can get
on anonymously and buy drugs or guns or whatever."
"That's right. Let me show you an example." Bellamy pulled a plastic
IPEA evidence bag out of his briefcase. Inside was a heavy semiautomatic
pistol. It was raw machined steel without the usual blued finish, and a
blank slide where the manufacturer's name and serial number would be.
The plastic bag, oily on the inside, looked like it was lined with
little rainbows. "You can't make steel parts like this on a 3D printer,
but you can make parts for a plastic machine that will cut aluminum.
Then you can use aluminum parts to make machines that can cut steel.
People trade machines, parts and weapons every step of the way. This
one's complete, and it works. It was on its way to an underground
gunsmith who puts a nice finish on them."
Murphy could see the shiny steel reflected in both of the room's
immaculate black windows. "It's like Adam Smith's pin factory."
"Yes. And this piece could have come from any combination of
thousands of basement workshops. It's completely untraceable, and
infringes a zillion patents. These things are a headache for us, but
that's not why I'm here."
Murphy leaned over the desk, and Bellamy continued.
"There's also an online scene called the prediction markets. Oh, hold
on, sorry." Bellamy spoke quietly into his jacket cuff. The agents who
had arrived with Bellamy were still doing some kind of security sweep of
the house. Murphy was glad that Linda was away, dropping Jack Jr. off at
college. Security stuff always put her on edge.
"All right. Prediction markets," Bellamy said. "If I want to bet on a
football game, I can buy a prediction, say 'Eagles win on Sunday.' If
they win, after the game the prediction expires and I get a dollar."
"Sounds like just online gambling. They're just saying 'prediction'
instead of 'bet.'" Murphy yawned and shook his head to try to clear
it.
"Yes, it's like an ordinary bet in a lot of ways. If the Eagles lose,
my prediction expires worthless. Just like losing a bet. But those
predictions trade up and down, like stocks and bonds, right up until the
end of the game."
"And they're untaxed and anonymous."
"Right. And there are other predictions I could make. I could buy a
prediction on 'Jack Murphy dead before October 14th.' And if, for
whatever reason, you're no longer with us that day, I make a
dollar."
"So is that how the assassination market works? Someone just makes a
bet that somebody else will be dead?"
"That's one side of the deal. That's the bet that the assassin makes.
Someone else has to take the other side of the bet, and lose. If you
want somebody dead, you just place a bet that they'll be alive. You lose
your bet, but they get taken care of."
One of the agents who had come in with Bellamy was standing in the
office door. His light blue gloves and shoe covers didn't go with his
dark blue suit. He was holding Murphy's laptop computer, with Murphy's
mobile phone and charger on top.
"We're going to need to check those in the van," Bellamy said. "We'll
have them back in ten minutes."
Murphy nodded and the agent turned and left. Bellamy had introduced
him but Murphy was too tired to remember the name.
"So the original client, or whatever you want to call him, makes a
bet, and loses, and the assassin wins, and that's how the assassin gets
paid. But you said a dollar. Nobody's going to murder someone for a
dollar."
"Right. There has to be some volume in the market for it to be a
significant risk. A lot of people have to be willing to buy those
predictions of 'Jack Murphy alive' and lose the money."
"So how is my stock doing?" Murphy knew that DC was still chattering
about the news of his surprise appointment. The Secretary was an old
colleague from think tank days, but nobody expected that the President
would go along with bringing Murphy in. The President was too good a
politician not to have his own person in every department's number two
spot.
"That's why we're here. There's a lot of volume. A lot of outstanding
predictions on you alive."
"They're predicting I'll be alive because they want me dead." Murphy
finally yawned and got his hand over it.
Bellamy just continued. "Yes, that's right. The good news is that the
administration has an independent fund for protecting appointees. Our
agency can't know about it officially, of course. That fund buys the
same 'dead' predictions that an assassin would. Makes it less profitable
for the assassin. Basically, we play the market to lose. It's expensive,
and it's not a hundred percent solution, but it's the best answer so
far."
"What about just going after the people who want me dead?"
"Frankly, sir, that wouldn't scale. Between the senior citizens and
the cat thing, our market model says that more than four hundred
thousand people have some money on you. If you're alive next week, they
make a little money. If you're dead, they're happy too."
Murphy was silent.
Bellamy said, "They don't really think of it as gambling. More like
they're hedging their exposure to your continued existence."
Murphy looked up. One of the other agents, whose name Murphy didn't
remember either, was standing in the doorway. "We're clear, sir. No
cameras or devices left. Verified no other residents present. Charlie
team is watching the egress. We're good to go."
"All right." Bellamy ripped open the evidence bag and pulled out the
raw steel untraceable pistol. The room smelled of some kind of oil.
"What are you doing?" Murphy yelled. His voice went up in a squeak at
the end. He grabbed for his desk phone and realized it was gone.
"Sorry, sir," said Bellamy. "But the money in that slush fund has to
come from somewhere. Sometimes we play to win."
The process is going to be a little different from what you might be
used to with another OS. If you shop carefully (and reading blogs is a
good first step) then the drivers you will need are already available
through your Linux distribution’s printer setup tool.
HP has done a good job with enabling this. The company has already
released the necessary printer software as open source, and your Linux
distribution has already installed it. So, go to printers
fully supported with the HPLIP software, pick a printer you like,
and you’re done.
If you want a recommendation from me, the HP
LaserJet 3055, a black and white all-in-one device, has worked fine
for me with various Linux setups for years. It’s also a
scanner/copier/fax machine, and you get the extra functionality for not
much more than the price of a regular printer. It also comes with a
good-sized toner cartridge, so your cost per page is probably going to
be pretty reasonable.
Other printer brands have given me more grief, but fortunately the HP
LaserJets are widely available and don’t jam much.
It’s important not to show a smug expression on your face while
printing if users of non-Linux OSs are still dealing with driver CDs or
vendor downloads.
When you give travel directions, you include landmarks, and “gone too
far” points. Turn left after you cross the bridge. Then look for my
street and make a right. If you go past the water tower you’ve gone too
far.
System administration instructions are much easier to follow if they
include those kinds of check-ins, too. For example, if you explain
how to set up server software you can put in quick “landmark” tests,
such as, “at this point, you can run nmap and see the port
in the results.” You can also include “gone too far” information by
pointing out problems you can troubleshoot on the way.
A full-scale troubleshooting guide is a good idea, but quick warning
signs as you go along are helpful. Much better than finding yourself
lost at the end of a long set of setup instructions.
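For example, a landmark test for a freshly configured web server could
look like this (the port and host are just examples):
$ nmap -p 80 localhost
PORT   STATE SERVICE
80/tcp open  http
If the port shows closed at this point, stop and troubleshoot before
moving on to the rest of the setup.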
GNU
seq doesn’t accept dotted quads for ranges, but fortunately
most of the commands that accept an IP address will also take it in the
form of a regular decimal. (Spammers used to use this to hide their
naughty domains from scanners that only looked for the dotted quad while
the browser would happily go to
http://3232235520/barely-legal-mortgage.html or
something.)
So here’s an ugly-ass shell function to convert an IP address to a
decimal. If you have a better one, please let me know and I’ll update
this page. (Yes, I know this would be one line in Perl.)
# convert a dotted quad to a decimal integer, recursively:
# "192.168.0.0" -> "192 168 0 0" -> ((192*256+168)*256+0)*256+0
dq2int()
{
    if echo "$1" | grep -q '\.'; then
        # split the dotted quad into four separate arguments
        dq2int $(echo "$1" | tr '.' ' ')
    elif [ $# -eq 1 ]; then
        # one argument left: that's the answer
        echo "$1"
    else
        # fold the leftmost two arguments together and recurse
        total=$1; next=$2; shift 2
        dq2int $(( total * 2**8 + next )) "$@"
    fi
}
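Once the function is loaded into a shell, it works like this (same
address as the example above):
$ dq2int 192.168.0.0
3232235520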
To quote the GNU bc
manual, “For bases greater than 16, bc uses a multi-character digit
method of printing the numbers where each higher base digit is printed
as a base 10 number.”
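That multi-character digit method means bc can do the reverse
conversion, decimal back to octets, since base-256 digits are just the
pieces of the dotted quad (the exact padding and spacing of bc’s
output may differ):
$ echo 'obase=256; 3232235520' | bc
 192 168 000 000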
It says “Personal and Confidential” or “IMPORTANT CORRESPONDENCE
REGARDING YOUR OVERPAYMENT” on the envelope—can you really discard it
without opening it? You sure can. Some junk mailers disguise their mail
pieces as important correspondence from companies you actually do
business with, and the USPS helped them out a lot by renaming “Bulk
Mail” to “Standard Mail”. But you can look at the postage to discard
“stealth” junk mail without opening it.
Postal regulations require
that any bills or mail containing specific information about your
business relationship with the company must be mailed First Class.
So, if “Standard Mail” or “STD” appears in the upper right corner,
it’s not a bill, it’s not your new credit card, and it’s not a check.
It’s just sneaky junk mail.
Most of what separates Web 2.0 from pre-Web 2.0 is not really about
the web. Paul Graham writes that
Web 2.0 is about three things: AJAX, Democracy, and Don’t Maltreat
Users. Democracy here means using users to collect
decision-making information, such as evaluating whether something is
worth reading or buying, for other users. Yes, visitors come to your
site to get value that other users brought to it. Ka-ching!
Tim O’Reilly writes
that Web 2.0 is about constantly improving software that’s available as
a service; mixing data from multiple sources, including users; offering
your own data and services in a flexible, mixable way; and going
beyond the page metaphor (which sounds like AJAX to me.)
Anyway, the key part of being Web 2.0 is that you’re building value
from many small information contributions that users don’t mind making.
Every user whitewashes
a little bit of the fence. Paul Graham points out that Google is a
good example of this. When I say great
burritos in San Francisco, Google uses my link-making work (and that
of others) to amass awesome burrito (and other thing)-recommending power
and rule the world. And I like it because I want my favorite
San Francisco burrito place to succeed.
When people put
geographical directions up using microformats, someone will crawl
them and string the route decisions together to get a directions search
engine with common sense (because it borrowed the common sense of
millions of users) that doesn’t tell people to make an illegal left into
oncoming traffic, the way a certain map site used to tell me to leave my
old house every day. (70mph combined speed motor vehicle slalom!
Yaaaaahooooo!)
Where AJAX fits into all this is that you’re snarfing one reputation
information unit per click, quickly, instead of waiting for a whole page
to render to suck the value out of the user’s head into your MySQL
cluster where it becomes valuable. And you have to let users pull data
back out and mix it, since that creates attention incentives for other
users to push data in.
So far this Web 2.0 stuff sounds like it’s all about web sites. How
can companies that aren’t basically web sites or mail-order catalogs be
Web 2.0? Some already are. Remixed FedEx lately? Download
their sample code and try their API.
Hold on a second—you don’t have to be a FedEx partner to do
that? No, and that’s the first concrete difference between Web 2.0 and
non-Web-2.0 companies. From a pre-2.0 point of view, the partner program
is what enables companies to interact with you. Start thinking 2.0,
though, and the partner program looks more and more like pointless
bureaucracy that keeps non-partner companies out. Just as you
want Googlebot to crawl your product pages, (and some of you will go
flame Matt Cutts if it doesn’t) you want any company whose stuff can
plug into yours to try your API.
You could probably do a pretty reliable Web-2.0-or-not-o-meter based
on dates in the RSS feed for API announcements vs. dates in press
releases matching /partner/i.
What next? Larry Augustin points out that sales and
marketing accounts for 82 percent of new software license revenue.
Ouch! Let’s throw some Web 2.0 magic at that number. And I don’t mean
the sales part. The web, together with open source licensing,
easy-to-demo ASP, and virtualization, is already taking a huge chunk out
of the sales side.
But a huge, expensive part of software marketing is involved in
information gathering, too. It’s really expensive to hire Software
Marketing people to gather requirements from users, write big word
processor documents full of what the users want, and show each other
Microsoft PowerPoint presentations, then tie up the developers showing
them again.
Within organizations, we know all about using Extreme Programming and
other agile methodologies that replace the
obsolete-as-soon-as-finished overdocumented waterfall development
process. But often, as soon as projects cross organizational lines,
we’re back to the kind of thing that
with-it software people rightly make fun of.
Where web sites are concerned, Web 2.0 took the excellent idea of
APIs from software developers and gave them to the webmasters. In
off-web businesses, it’s taking development methodologies from
the developers and giving them to marketing people.
But how do you know what to build when the Big Dumb Word Processor
Document of What To Build is gone? This is where it gets fun. The
customers are already telling you what they want, if you know how to
listen to them. Web 2.0 companies are concentrating on building the
place for that conversation to take place, instead of writing the damn
document themselves. You get better, faster, cheaper when you lose the
waterfall.
After all, some users will sit still for Focus Groups and other
20th-century marketing, but even if they do, you’re stuck dealing with
the resulting data yourself. Canonical Ltd. takes the specification
process where Google takes the search result ranking process—outside
the company as much as possible. That doesn’t mean that random users
design Canonical’s products for it, any more than search engine spammers
define Google results. But in Web 2.0 you get the users to whitewash the
fence.
There’s some overlap between being a Web 2.0 company and being
an open source one. Here’s where I think Web 2.0 goes further
than open source. If open source is trees, Web 2.0 is hemp. Instead of
harvesting big participation from a committed developer, tester, user,
partner, or customer, you get a small quantity of fiber per transaction,
fast, and you do a lot of them.
I think there’s a limit to how far pure conversationality and social
software can take this, and that we’re going to have to get hairier
information-sucking-a-little-bit-at-a-time-from-peoples-heads tools such
as prediction markets. But Web 2.0 unplugged from the web can
take us a lot further, faster, than the alternatives can, and,
especially in the area of business software, companies are already using
it.
By opening this package, you are agreeing to be bound by the
following agreement.
This software product is copyrighted, and all rights are reserved by
the publisher and the author. You are licensed to use the software on a
single computer. You may copy and/or modify the software as needed to
facilitate your use of it on a single computer. Making copies of the
software for any other purpose is a violation of the United States
copyright laws.
This software is sold AS IS without warranty of any kind, either
expressed or implied, including but not limited to the implied
warranties of merchantability and fitness for a particular purpose.
Neither the publisher nor its dealers or distributors assumes any
liability for any alleged or actual damages arising from the use of this
program. (Some states do not allow the exclusion of implied warranties,
so this exclusion may not apply to you.)
All that is really needed on computers is a “Calculate” button or
omnipresent menu command that allows you to take an arithmetic
expression, like 248.93 / 375, select it, and do the calculation whether
in the word processor, communications package, drawing or presentation
application or just at the desktop level.
Fortunately, there’s a blue “Access IBM” button on this keyboard that
doesn’t do much. So, I configured tpb to make “Access IBM” do
this:
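Something like the following, though this is a sketch reconstructed
from the summary below; the xte names and the tpb hookup are the parts
most likely to need adjusting on another setup:
#!/bin/sh
# read the X primary selection, evaluate it as a Perl expression,
# and write the result back into the primary selection
xsel | perl -pe '$_ = eval' | xsel -i
# fake a Delete (removing the still-selected expression), then a
# middle-click to paste the result in its place
xte 'key Delete' 'mouseclick 2'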
(That is, get the contents of the X primary selection, run it through
a Perl “eval”, put the result back into the X primary selection, then
fake a delete and paste.)
If you want to do this, besides tpb, you’ll need
xsel and
xte, which is part of
xautomation.
If you don’t have an unused button, you could also set up a binding in
your window manager or build a big red outboard USB “eval” button or
something.
If you make a new ssh key and try to use it with ssh -i while running
ssh-agent, ssh tries the agent first. You could end up using a key
provided by the agent instead of the one you specify. You can fix this
without killing the agent. Use:
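ssh -o IdentitiesOnly=yes -i ~/.ssh/new_key user@host
IdentitiesOnly tells ssh to offer only the identity files you name,
instead of everything the agent is holding. (The key path and host
here are just examples.)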
The most important part of picking a Linux distribution is thinking
about where you will go for help, and what distribution that source of
help understands. That’s true if your source of help is a vendor, a
consultant, a user group, or an online forum about a particular Linux
use case or application.
As a home user, you’ll probably be asking your local Linux users
group for help when you need it. So get on the mailing list and just
“lurk” for a while. See what the most helpful people on the list use,
and install that. That way if you have a question, you’ll be more likely
to reach someone who has already dealt with it.
If you’re getting into uses for Linux that are different from those
of your local user group, it’s more important to use a list of people
working on projects similar to yours than just the geographically
closest user group. Find where people are talking about that kind of
project and pay attention to what they’re using.
If you have an application that you want to run on Linux, either
because you’re moving from another OS and are already running it on
there, or because you have read or watched the tutorials and want to use
it, then get on that application’s support list or forum and see who
gets good help. Sometimes an application only gets packaged by a subset
of the distributions, and sometimes an application’s users find it hard
to get support because their distribution has an older or patched
version. Some application maintainers are fired up about Flatpak as a way to distribute on Linux,
so users of Flatpak-friendly distributions will get better support. But
you don’t need to get into the details of packaging methods. Figure out
where people are getting support for the application you want, and look
at which distributions tend to be better supported.
If you have a script that uses ssh, here’s something to put at the
beginning of the script to make sure the necessary passphrase has
already been entered, and the remote host is reachable, before starting
a time-consuming operation such as an rsync.
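A minimal sketch, with the host and paths as placeholders: in
BatchMode, ssh fails instead of stopping to prompt for a passphrase,
so the script can bail out early if the agent isn’t primed or the host
is unreachable.
#!/bin/sh
# fail fast instead of stalling partway into a long transfer
if ! ssh -o BatchMode=yes backuphost true; then
    echo 'ssh check failed: is the key loaded and the host up?' >&2
    exit 1
fi
# the time-consuming part
rsync -az ./work/ backuphost:work/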