12 Apr 2025

picking up cheap shoes in front of a steamroller

Here’s another privacy paradox for people who collect them.

  • On the web, the average personalized ad is probably better than the average non-personalized ad. (The same ad campaigns that have a decent budget for ad creative also have a budget for targeting data.)

  • But users who block personalized ads, or avoid personalization by using privacy tools and settings, are, on average, better off than users who get personalized ads.

There’s an expression in finance: Picking Up Nickels In Front Of A Steam Roller. For some kinds of investing decisions, the investor is more likely to make a small gain than to lose money in each individual trade. But the total expected return over time is negative, because a large loss is an unlikely but possible outcome of each trade, and it is big enough to outweigh the many small gains. The decision to accept personalized ads or try to avoid them might be a similar bet.
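The arithmetic behind the expression can be sketched in a few lines of Python. The probability and dollar amounts here are made-up illustration values, not figures from any study, and expected_value is just a helper name for this sketch.

```python
def expected_value(p_win, gain, loss):
    """Expected return of one bet: win `gain` with probability p_win,
    otherwise lose `loss` (passed as a positive number)."""
    return p_win * gain - (1 - p_win) * loss

# Hypothetical numbers: a 99% chance of picking up a nickel-sized gain,
# and a 1% chance of meeting the steamroller.
per_trade = expected_value(p_win=0.99, gain=5.0, loss=1000.0)
print(f"expected return per trade: {per_trade:+.2f}")
```

With these assumed numbers, 99 trades out of 100 come out ahead, but the expected return per trade is still negative.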

For example, a typical positive outcome of getting personalized ads might be getting better shoes, cheaper. There’s a company in China that is working the personalized ad system really well. Instead of paying for high production value ads featuring high-profile athletes in the USA, they’re just doing the incremental data-driven marketing thing. Make shoes, experiment with the personalized ad system, watch the numbers, reinvest in both shoe improvements and improvements to the personalized ads. For customers, the shoe company represents the best-case scenario for turning on the personalized ads. You get a pair of shoes from China for $40 that are about as good as the $150 shoes from China that you would get from a big-name brand. (The shoes might even be made by the same people out of the same materials.) I don’t need to link to the company, just turn on personalized ads and if you want the shoes they’ll find you.

That example might be an outlier on the win-win side, though. On average, personalized (behaviorally targeted) ads are likely to be associated with lower quality vendors and higher product prices compared to competing alternatives found among search results. (Mustri et al.) But let’s pretend for a minute and say you figured out how to get targeted in the best possible way and come out on the winning side. That’s pretty sweet: personalized ads save you more than a hundred bucks on shoes, right?

Here comes the steamroller, though.

In recent news, Baltimore sues 2 sportsbooks over alleged exploitative practices. Some people are likely to develop a gambling problem, and if you don’t know in advance whether or not you’re one of them, should you have the personalized ads turned on? You stand to lose a lot more than you would have gained by getting the cheap shoes or other miscellaneous stuff. It is possible that machine learning on the advertising or recommended content side could know more about you than you do, and the negative outcomes from falling for an online elder fraud scheme tend to be much larger than the positive outcomes from selecting the best of competing legitimate products.

The personalized advertising system can facilitate both win-win offers like the good shoes from an unknown brand and win-lose offers like those from sports betting apps that use predatory practices. The presence of both win-win and win-lose offers in the market is a fact that keeps getting oversimplified away by personalized advertising’s advocates in academia. In practice, ad personalization gives an advantage to deceptive sellers. Another good example comes from the b2b side: malware in search ads personalized to an employee portal or SaaS application. From the CIO point of view, are you better off having employees get better-personalized search ads at work, or better off blocking a security incident before it starts?

People’s reactions to personalization are worth watching, and reflect more widely held understanding of how information works in markets than personalized ad fandom does. The fact that Google may have used this data to conduct focused ad campaigns targeted back to you was disclosed as if it was a security issue, which makes sense. Greg Knauss writes, Blue Shield says that no bad actor was involved, but is that really true? Shouldn’t a product that, apparently by default, takes literally anything it can—privacy be damned—and tosses it into the old ad-o-matic not be considered the output of a bad actor? Many people (but not everybody) consider being targeted for a personalized ad as a threat in itself. More: personalization risks

06 Apr 2025

converting PDFs for Tesseract

There are two kinds of PDFs. Some have real embedded text that you can select in a PDF reader, and some are just images.

The second kind is what I sometimes get in response to a CCPA/CPRA Right to Know. Some companies, for whatever reason, want to make it harder to do automated processing of multiple RtKs. This should make privacy researchers more likely to look at them, because what are they hiding? They must be up to something.

But the PDF still needs to get run through some kind of OCR. Tesseract OCR has been giving me pretty good results, but it needs to be fed images, not PDFs.

So I have been feeding the PDFs to pdf2image in Python code, and then passing the images to Tesseract. But it turns out that Tesseract works a lot better with higher resolution images, and the default for pdf2image is 200 DPI. So I’m getting a lot more accurate OCR by making the images oversized with the dpi named parameter:

pages = pdf2image.convert_from_bytes(blob, dpi=600)

I might tweak this and try 300 DPI, or also try passing grayscale=True to preserve more information. Some other approaches to try next, if I need them.
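Putting the steps above together, a minimal sketch of the whole PDF-to-text pipeline might look like this. It assumes the pytesseract wrapper is what calls Tesseract (this post doesn’t say which binding is in use). The ocr_pdf_bytes function and its convert and ocr parameters are hypothetical names, there so that the rendering and OCR steps can be swapped out or tested separately.

```python
def ocr_pdf_bytes(blob, dpi=600, convert=None, ocr=None):
    """Render each page of an image-only PDF and run OCR on it.

    blob    -- raw bytes of the PDF
    dpi     -- rendering resolution (pdf2image defaults to 200)
    convert -- page renderer; defaults to pdf2image.convert_from_bytes
    ocr     -- image-to-text function; assumed here to be
               pytesseract.image_to_string
    Returns a list with one string of recognized text per page.
    """
    # Import lazily so the function can be exercised with stand-ins
    # even where poppler or Tesseract are not installed.
    if convert is None:
        from pdf2image import convert_from_bytes as convert
    if ocr is None:
        from pytesseract import image_to_string as ocr
    pages = convert(blob, dpi=dpi)
    return [ocr(page) for page in pages]
```

Calling ocr_pdf_bytes(blob) with the bytes of a response PDF would return the text page by page, and trying 300 DPI is just a matter of passing dpi=300.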

Anyway, Meta (Facebook) made some of their info easy to parse (in JSON format) and got some of us to do research on them. Some of the other interesting companies, though, are going to be those who put in the time to obfuscate their responses to RtKs.

28 Mar 2025

More money and better stuff for people in the UK

Some good news last week: Meta settles UK ‘right to object to ad-tracking’ lawsuit by agreeing not to track plaintiff. Tanya O’Carroll, in the UK, has settled a case with Meta, and the company must stop using her data for ad targeting when she uses its services. It’s not a change for everyone, though, since the settlement is just for one person. O’Carroll said she is unable to disclose full details of the tracking-free access Meta will be providing in her case but she confirmed that she will not have to pay Meta.

The Open Rights Group now has a Meta opt-out page that anyone in the UK can use to do an opt out under the UK GDPR.

If you use any Meta products – Facebook, Instagram, Meta Quest or VR, Threads or WhatsApp – you can use our tool to request that they no longer collect or process your data for advertising. This is known as your right to object, which is enshrined in data protection law. Meta had tried to get around GDPR, but by settling Tanya’s case they have admitted that they need to give their users this right.

If you’re in the UK, you can either use the form on the site, or use the mailto link to open up a new regular email from your own account pre-populated with the opt out text. This is a win not just because it could mean less money for a transnational criminal organization and more money staying in the UK, but also because it’s going to mean better products and services for the people who do it.

Opt outs are one layer in the onion.

  • Don’t do a surveilled activity

  • Block the transfer of tracking data

  • Generate tracking data that is hard to link to you

  • Set an opt out while doing the surveilled activity

  • Send an opt out or Right to Delete after doing the surveilled activity

Having access to this new tool doesn’t mean you should skip the others. Even if I could figure out how to use the Meta apps in a way that’s totally safe for me, it’s still a win to switch away because it helps build network effects for the alternatives and more safety for other people. So even if you do this opt out, it’s also a good idea to do the other effective privacy tips.

How this gets you more money and better stuff

Turning off the personalized ads is a bigger deal than it looks like. The arguments from advertising personalization fans don’t reflect the best research on the subject. Ad personalization systems, especially on Facebook, are designed to give some hard-to-overcome advantages to deceptive advertisers. Any limitations to personalization look like a big win, shopping-wise. In one study, turning on an Apple privacy setting reduced reported fraud losses by 4.7%.

The personalization of ads on Facebook helps vendors of crappy, misrepresented goods match their products to the shoppers who are most likely to fall for their bullshit. Yes, you can follow the advice in articles like Don’t Get Scammed! Tips For Spotting AI-Generated Fake Products Online on Bellingcat, but it’s a time-saver and an extra layer of protection not to get the scam ad in the first place.

Privacy tools and settings that limit ad personalization have been available for quite a while. If people who use them were buying worse stuff, the surveillance industry would have said so by now. Anyway, if you’re in the UK, go do the Meta opt-out.

In other countries, other effective privacy tips are still a win.

23 Mar 2025

power moves, signaling, and a helpful book for understanding Big Tech

I’m still waiting for my copy of Careless People by Sarah Wynn-Williams, so I don’t have anything more on the content of the book than what I have seen in other reviews. The local bookstore had a stack—more than they normally order for new hardcovers—but I hesitated and they were gone next time I went in there. So yes, I am a little behind on this.

But come on, people.

Careless People is a best-seller because Meta decision-makers want it to be a best-seller.

In other Big Tech news, Google is delivering ads for obvious malware, with a landing page featuring an unauthorized copy of one of Google’s own logos. Even worse, they got spotted placing ads on Child Sexual Abuse Material. At first these look like embarrassing self-owns, especially for a company that’s contending for favorable PR in the AI business. Is their AI really that bad at classifying landing pages, extension listings, and the content of sites where their ads appear? The search ad malware thing is particularly egregious—the whole point of the deceptive ads that are all over Google Search now is to impersonate some well-known company. It should be a high school level coding project to filter out some of these.
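As a rough illustration of the kind of filter that could catch some impersonation ads, here is a toy sketch. Everything in it is a made-up assumption: the OFFICIAL_DOMAINS table, the brand names, and the looks_like_impersonation helper. It is not a description of how any ad platform actually reviews ads. The idea is only that an ad naming a well-known brand should land on a domain that the brand controls.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: brand name -> domains the brand controls.
OFFICIAL_DOMAINS = {
    "examplebank": {"examplebank.example"},
    "examplemail": {"examplemail.example", "mail.example"},
}

def looks_like_impersonation(ad_text, landing_url):
    """Return True if the ad names a listed brand but lands off-brand."""
    host = (urlsplit(landing_url).hostname or "").lower()
    text = ad_text.lower()
    for brand, domains in OFFICIAL_DOMAINS.items():
        if brand in text:
            # Accept the official domain itself or any subdomain of it.
            on_brand = any(host == d or host.endswith("." + d) for d in domains)
            if not on_brand:
                return True
    return False
```

A real system would need a much bigger brand list and would have to deal with redirects and lookalike spellings, but a check like this would already flag an ad for “ExampleBank login” that lands on examplebank-secure.example.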

But Big Tech’s apparent eagerness to appear in bad news makes sense when you look at the results. Out of all the people who read and were outraged by Careless People over the weekend, how many are going to come in to work on Monday and delete their Meta tracking pixel or turn off Meta CAPI? And how many people concerned about Google’s malware, CSAM, and infringing content problems are going to switch to inclusion lists and validated SupplyChain objects and stop with the crappy, often illegal ad placements that Google recommends and legit ad agencies don’t? For Big Tech, doing crimes in an obvious way is a power move, a credible, costly signal. If there were a Meta alternative that didn’t do genocide, or an honest alternative to Google search advertising, then advertising decision-makers would have switched to them already. All these embarrassing-looking stories are a signal: don’t waste your time looking for an alternative to paying us. The publisher’s page for Careless People has a Meta pixel on it.

I do have a book recommendation that might be a little easier to get a hold of. Codes of the Underworld by Diego Gambetta was the weekly book recommendation on A Collection of Unmitigated Pedantry. I’m glad to see that it is still in print, because it’s a useful way to help understand the Big Tech companies. Actions that might not have made sense in a company’s old “create more value than you capture” days are likely to be easier to figure out after understanding the considerations applied by other criminal organizations.

Codes of the Underworld by Diego Gambetta

Criminals have hard-to-satisfy communications needs, such as the need to convey a credible threat to a victim without attracting the attention of enforcers. This is related to the signaling problem faced by honest advertisers, but in reverse. How can a representative of a protection racket indicate to a small business that they represent a true threat, and aren’t just bluffing? Gambetta digs into a variety of signaling problems. It’s a 2009 book, so many of the Big Tech firms were still legit when it came out, but a lot of the communications methods from back then apply to the companies of today.

Is there a solution? As Gambetta points out, real-life organized crime perpetrators tend to copy from the movies, and today they’re copying the partnership with a friendly government subplot from The Godfather Part II. Maybe it’s time to watch that movie again.

15 Mar 2025

privacy laws for slacker states

It has come to my attention that there are still 15 or so states in the USA without privacy laws. This is understandable. We all have a lot of stuff to deal with. And of course there’s the problem of privacy law compliance turning into a time-suck for small businesses. The more that the laws and regulations pile up, the harder to pick out everything you need to do from all those damn PDFs. And it’s not just small companies. Honda just got around to dealing with some obvious differences between GDPR compliance and CCPA compliance that I pointed out back in 2020. And that’s an old PDF and a big company.

But the good news for slacker states is that doing the most work, cranking out the most lines of code, or the most pages of PDFs, or whatever, does not necessarily produce the best results. Given the amount of work that other states, and jurisdictions like the European Union, have already done on privacy, a slacker state can, right now, not just get the best privacy protection but also save a lot of time and grief for state employees and for business people in your state.

You need two laws. And we know that people are going to print them out, so please keep them short. (Maybe do a printer ink right to refill law next year?)

First, surveillance licenses for Big Tech. This gets you a few benefits.

  • Focus on the riskiest companies with the most money and staff for compliance—don’t put extra work on small local businesses.

  • Save your state’s attorney general and their staff a bunch of time. They’re not Big Tech’s support department. If a Big Tech company drops the ball on user support, just suspend their surveillance license until they clean up their act, like a problem bar and their liquor license.

  • You can define surveillance really briefly in the law and make the big out-of-state companies do the work of describing their surveillance practices in their license application.

That one is pretty easy to do as long as you focus purely on inbound data, the surveillance part, and don’t touch anything that sounds like speech from the company to others. And you can push most of the work off onto Big Tech and a new surveillance licensing board. I’m sure every state has people who would be willing to get on one of those.

Second, copy all the details from other states and countries. The other law would be focused on maximum privacy, minimum effort. The goal is to make a law that small business people can comply with, without even reading it, because they already had to do some privacy thing for somewhere else. Two parts.

  • Any privacy feature offered in some other jurisdiction must be offered here, too. A company only breaks the law if someone out-of-state gets a privacy feature that someone in-state doesn’t.

  • This law may be enforced by anyone except a state employee. (Borrow the Texas S.B. 8 legal hack, to protect yourself from Big Tech industry groups trying to block the law by starting an expensive case.)
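The test in the first bullet is simple enough to write down as a set difference. The sketch below is only an illustration: the jurisdiction codes, the feature names, and the missing_features helper are hypothetical, not language from any actual bill.

```python
def missing_features(offered, home):
    """Privacy features offered in some other jurisdiction but not in `home`.

    offered -- dict mapping jurisdiction -> set of feature names
    home    -- the jurisdiction with the parity law
    An empty result means the company is in compliance.
    """
    elsewhere = set()
    for jurisdiction, features in offered.items():
        if jurisdiction != home:
            elsewhere |= features
    return elsewhere - offered.get(home, set())

# Hypothetical company: what it offers users in each place.
company = {
    "CA": {"opt out of sale", "right to delete", "right to know"},
    "EU": {"right to object", "right to delete"},
    "XX": {"right to delete"},  # hypothetical slacker state
}
print(sorted(missing_features(company, "XX")))
```

A purely local business has nothing in the “elsewhere” set, so it is automatically in compliance without doing anything.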

A small business that operates purely locally can just do their thing. But if they already have some “your California privacy rights” feature or whatever, they just turn it on for this state too. Easier compliance project for the companies, better privacy for the users, no enforcement effort for the state, it’s a win-win-win. After all, state legislators don’t get paid by the page, and we each only get one set of carpal tunnels.

14 Mar 2025

Links for 14 March 2025: autonomous drones in the news

How Ukraine integrates machine vision in battlefield drones by Oleksandr Matviienko, Bohdan Miroshnychenko & Zoriana Semenovych. In November 2024, the government procured 3,000 FPV drones with machine vision and targeting technologies. Reports also suggested that the procurement would be expanded to 10,000 units.

Preparing for the next European war by Azeem Azhar. One challenge will be the simple rate of innovation in the actual battlefield. Drone warfare in Ukraine has shown iteration cycles measuring weeks not years. So any systems procured today need to be future-proofed for those dynamics.

Thread by Trent Telenko. The logistical facts are that the FM-MAG machine gun, the 60 mm & 81mm mortars, LAWS, Javelins, any infantry crew served weapon you care to name are all going to be most to fully replaced with drones and drone operators, because of the logistical leverage drones represent on the battlefield.

Long-range drone strikes weakening Russia’s combat ability, senior Ukrainian commander says by Deborah Haynes. Some of the drones are remotely piloted, others work via autopilot. Russia’s war has forced Ukraine to use technology and innovation to fight back against its far more powerful foe. It has accelerated the use of autonomous machines in an irreversible transformation of the warzone that everyone is watching and learning from. Brigadier Shchygol said: Right now, Ukraine’s battlefield experience is essentially a manual for the world.

Ukraine Drives Next Gen Robotic Warfare by Mick Ryan. Another more interesting trend has arisen which will force policy makers and military strategists to undertake an even more careful analysis of Ukraine war trends, and how these trends apply in other theatres, particularly the Pacific. This trend, robotic teaming, has emerged over the past year with the advent on drone-on-drone combat in the air and on the ground. In particular, several recent combat actions in Ukraine provide insights that need to be studied and translated for their employment in the massive ocean expanses, tens of thousands of kilometres of littoral, thousands of large and small islands and at least three continents that constitute the Pacific theatre.

DEEP DIVE: Taiwan miltech aims to undermine Chinese components by Tim Mak. Taiwan has learnt the central tech lesson from the war in Ukraine: the next global conflicts will heavily feature cheap, small drones—and in large numbers. So as an electronics and hardware component giant—especially relative to its size and diplomatic status—it is trying not only to develop a domestic industry, but also become an arsenal for the free world, building drones and devices for allied militaries worldwide.

Why America fell behind in drones, and how to catch up again by Cat Orman and Jason Lu. Also Building Drones for Developers: A uniquely open architecture on the F-11 means that every part of the drone is truly built around the [NVIDIA] Orin [GPU]. This enables sophisticated autonomy applications in which ML models are able to not only analyze data obtained in-flight, but actually use that analysis to inform flight actions in real time.

09 Mar 2025

Pro tips: links for 9 March 2025

Jason Lefkowitz cövers höw to set up the Cømpose key (and make everything you type awesöme™), in Make special characters stupid easy: meet the compose key

switching.software offers Ethical, easy-to-use and privacy-conscious alternatives to well-known software

Pro tip: avoid generative AI images in blog posts (even if your CMS says you should have one for SEO purposes) unless you want to make a political statement: AI: The New Aesthetics of Fascism by Gareth Watkins

Got third-party tracking scripts or pixels on your site? Avoid legal grief, take them off. Caught with Their Hand in the Cookie Jar: CNN’s Privacy Lawsuit is Served Fresh and the Court is Taking a Bite by Blake Landis. (Highest priority is to get rid of the Meta pixel. That’s not just a pro-evil-dictator tattoo for your web site, it’s really easy for lawyers to check for.)

Add data poisoning for AI scrapers hitting your GitHub Pages site: Trapping AI from the Algorithmic Sabotage Research Group (ASRG)

Got a small business? Like riding bikes? Relocating to the Netherlands with DAFT

If you need an integer and all you have is four 2s, Eli Bendersky has some math advice: Making any integer with four 2s

Nearly a Year Later, Mozilla is Still Promoting OneRep (Part of the Mozilla Monitor Plus service. Protip: check Have I Been Pwned directly)

Why you need a radio (yes, you!) by Audrey Eschright

The Linux kernel project can’t use code from sanctioned countries. Other projects need to check compliance with sanctions, too. US Blocks Open Source ‘Help’ From These Countries by Steven J. Vaughan-Nichols

Jake Archibald covers The case against self-closing tags in HTML (you don’t need <br />, just <br>).

John D. Cook makes rounding numbers much easier (if you use balanced ternary) in A magical land where rounding equals truncation

Understanding the legal issues for small community sites under UK law: #2: Five things you need if you run a small, low-risk user-to-user service by Rachel Coldicutt

08 Mar 2025

advertising personalization: good for you?

A new paper is out, collecting some of the top arguments in favor of personalized advertising: The Intended and Unintended Consequences of Privacy Regulation for Consumer Marketing by Jean-Pierre Dubé, John G. Lynch, Dirk Bergemann, Mert Demirer, Avi Goldfarb, Garrett Johnson, Anja Lambrecht, Tesary Lin, Anna Tuchman, Catherine E. Tucker. It’s probably going to get cited this privacy law season. But, as an Internet optimist, I’m still not buying the argument that personalized advertising has important benefits that need to be balanced with privacy. Looking at the literature, it is more likely that certain risks are inherent to personalization as such and that reducing personalization is more likely to be a bonus benefit of privacy protection than a trade-off.

Some notes and links follow.

p. 3 We do not consider legal arguments for consumer privacy as a fundamental right or concerns about access to personal data by malign actors or governments.

Avoiding malign actors is the big reason for restricting personalized ads. And malign actors are numerous. The high-profile national security threats are already in the news, but most people will encounter miscellaneous malware, scams, rip-offs and other lesser villainy enabled by ad personalization more often than they have to deal with state or quasi-state adversaries. There is no hard line between malign actors and totally legit sellers—not only does the personalized ad business have plenty of halfway crooks, you can find n/m-way crooks for arbitrary values of n and m.

Ad personalization gives a bunch of hard-to-overcome advantages to deceptive sellers. Although scams are generally illegal and/or against advertising platform policies, personalization makes the rules easier to evade, as we see with some ways that Facebook ads are optimized for deceptive advertising. Most personalized ads aren’t clustered at the good (high-quality pair of shoes in your size, on sale, next door!) or bad (malware pre-configured for your system) ends of the spectrum. Advertisers at all levels of quality and honesty are present, so any framework for thinking about ad personalization needs to take that variability into account.

p. 3 Some privacy advocates assume, incorrectly, that personalized marketing based on granular consumer data is automatically harmful…

Treating personalized advertising as harmful by default is not an assumption, but a useful heuristic based on both theoretical models and real-world experience. Personally, I don’t pay attention to your ad if it’s personalized to me; it’s as credible as a cold call. But I might pay attention to your ad if it’s run in a place where the editors of sites that cover your industry would see it, or your mom would. Yes, it is possible for professors to imagine a hypothetical world in which personalization is beneficial, but that only works if you make the unrealistic simplifying assumption that all sellers are honest and that the only impact of personalization is to show people ads that are more or less well matched to them. The theoretical arguments in favor of personalized advertising break down as soon as you level up your economic model to consider the presence of both honest and deceptive advertisers in a market.

See Gardete and Bart, Tailored Cheap Talk: The Effects of Privacy Policy On Ad Content and Market Outcomes. Our research suggests that another peril of sharing very high quality targeting information with advertisers is that ad content may become less credible and persuasive to consumers. An advertising medium that allows for personalization is incapable of conveying as much information from an honest seller to a potential buyer as an advertising medium that does not support personalization.

Mustri et al., in Behavioral Advertising and Consumer Welfare, find that products found in behaviorally targeted ads are likely to be associated with lower quality vendors and higher product prices compared to competing alternatives found among search results.

p. 8 Which Consumers Care Most About Privacy, and Do Privacy Policies Unintentionally Favor the Privileged?

Lots of studies show that, basically, some people really want cross-context personalized advertising, some people don’t, and for the largest group in the middle, it depends how you ask (references at the 30-40-30 rule). But the difference in consumer preferences is not about privilege, it’s about information level. See Turow et al., Americans Reject Tailored Advertising and Three Activities That Enable It. That study includes a survey of privacy preferences before and after informing the participants about data practices, and people were more likely to say they do not want tailored advertising after getting the additional information.

In the Censuswide study Who’s In the Know: The Privacy Pulse Report, the experienced advertisers surveyed in the USA (people with 5 or more years of ad experience) were more likely than average to use an ad blocker (66% > 52%), and privacy is now the number one reason for people to use one. It is reasonable for policy-makers to consider the preferences of better-informed people—which is already a thing in fields such as transportation safety and public health.

p. 11 Poorer consumers live in data deserts (Tucker 2023), causing algorithmic exclusion due to missing or fragmented data. This exclusion thwarts marketing outreach and may deprive them of offers, exacerbating data deserts and marginalization.

Instead of speculating about this problem, personalized advertising proponents who are concerned about some people not being tracked enough can already look at other good examples of possibly under-surveilled consumers. Early adopters of privacy tools and preferences are helpfully acting as the experimental group for a study that the surveillance business hasn’t yet run. If people on whom less data is collected are getting fewer win-win offers, then the privacy early adopters should have worse consumer outcomes than people who leave the personalization turned on. For example, Apple iOS users with App Tracking Transparency (ATT) set to allow tracking should be reporting higher satisfaction and doing fewer returns and chargebacks. So far, this does not seem to be happening. (For a related result, see Bian et al., Consumer Surveillance and Financial Fraud. Consumers who deliberately placed themselves in a data desert by changing ATT to disallow tracking reported less fraud.) More: Click this to buy better stuff and be happier

And there’s little evidence to suggest that if a personalized ad system knows someone to be poor, they’ll receive more of the kind of legit, well-matched offers that are targeted to the more affluent. Poor people tend to receive more predatory finance and other deceptive offers, so they may be better off on average with ads less well matched to their situation.

p. 13 More broadly, without cross-site/app identity, consumers enjoy less free content

This depends on how you measure content and how you define enjoy. The Kircher and Foerderer paper states that, although children’s games for Android got fewer updates on average after a targeted advertising policy change by Google,

Only exceptionally well-rated and demanded games experienced more feature updates, which could be interpreted as a sign of opportunity due to better monetization potential or weakened competition. However, considering that we observed these effects only for games in the highest decile of app quality and demand and given that the median user rating of a game is 4.1 of 5, our findings suggest widespread game abandonment.

By Sturgeon’s Law, a policy change that benefits the top 10% of games but not the bottom 90% (which, in total, account for a small fraction of total installs and an even smaller fraction of gameplay) is a win for the users.

Another relevant paper is Kox, H., Straathof, B., and Zwart, G. (2014). Targeted advertising, platform competition and privacy.

We find that more targeting increases competition and reduces the websites’ profits, but yet in equilibrium websites choose maximum targeting as they cannot credibly commit to low targeting. A privacy protection policy can be beneficial for both consumers and websites.

When both personalized and non-personalized ad impressions are available in the same market, the personalized impressions tend to go for about double the non-personalized. But it doesn’t work to artificially turn off some data collection for a fraction of ad impressions, observe that revenue for those impressions is lower (compared to impressions with the data that are still available), and then extrapolate the revenue difference to a market in which no impressions have the data available.

It is also important to consider the impact of extremely low-quality and/or illegal content in the personalized advertising market. Much of the economic role of ad personalization is not to match the right ad to the right user but to monetize a higher-value user on lower-value content. The surveillance economy is more like the commodification economy. Surveillance advertising companies are willing to pursue content commodification even to the point of taking big reputational risks from feeding ad money to the worst people on the Internet (Hiding in Plain Sight: The Ad-Supported Piracy Ring Driving Over a Billion Monthly Visits - deepsee.io, Senators Decry Adtech Failures as Ads Appear On CSAM Site). If advertising intermediaries were more limited in their ability to put a good ad on a bad site using user tracking, the higher-quality content sites would enjoy significantly increased market power.

p. 14 Restrictions to limit the effectiveness of digital advertising would likely disproportionately disadvantage small businesses, since nine out of ten predominantly use digital advertising, especially on Meta

Are small businesses really better off in the surveillance advertising era? Although personalized Big Tech advertising is the main ad medium available to small businesses today, there is clearly some survivorship bias going on here. The Kerrigan and Keating paper states that, While entrepreneurship has rebounded since the Great Recession and its aftermath, startup activity remains weak by historical standards. This period of time overlaps with the golden age of personalized advertising, after widespread adoption of smartphones but before Apple’s ATT, the EU’s GDPR, and California’s CCPA. If personalized advertising is so good for small businesses, where are the extra small businesses enabled by it? We should have seen a small business boom in the second half of the 2010s, after most people in the USA got smartphones but before CCPA and ATT.

Jakob Nielsen may have provided the best explanation in 2006’s Search Engines as Leeches on the Web, which likely applies not just to search, but to other auction-based ad placements like social media advertising. An auction-based advertising platform like those operated by Google and Meta is able to dynamically adjust its advertising rates to capture all of the expected incremental profits from the customers acquired through it.

Part of the missing small business effect may also be caused by platform concentration. If, instead of an advertising duopoly, small businesses had more options for advertising, the power balance between platform (rentier) and small business (entrepreneur) might shift more toward the latter. See also Crawford et al., The antitrust orthodoxy is blind to real data harms. Policy makers might choose to prioritize pro-competition privacy legislation such as surveillance licensing for the largest, riskiest platforms in order to address competition concerns in parallel with privacy ones.

p. 15 Since PETs are costly for firms to implement, forward-looking regulation should consider how to incentivize PET adoption and innovation further.

In a section about how so-called privacy-enhancing technologies (PETs) have equal perceived privacy violation and bigger competition issues than conventional personalization, why recommend incentivizing PETs? The works cited would better support a recommendation to have a more detailed or informative consent experience for PETs than for cookie-based tracking. Because PETs obfuscate real-world privacy problems such as fraud and algorithmic discrimination, it would be more appropriate to require additional transparency, and possibly licensing, for PETs.

PETs, despite their mathematical appeal to many at Big Tech firms, have a long list of problems when applied to the real world. The creeped-out attitude of users toward PETs is worth paying attention to, as people who grow up in market economies generally develop good instincts about information in markets—just like people who grow up playing ball games can get good at catching a ball without consciously doing calculus. Policymakers should pay more attention to user perceptions—which are based on real-world market activity—than to mathematical claims about developers’ PET projects. PETs should be considered from the point of view of regulators investigating discrimination and fraud complaints, which are often difficult to spot on large platforms. Because PETs have the effect of shredding the evidence of platform misdeeds, enabling the existing problems of adtech, just in a harder-to-observe way, they need more scrutiny, not incentivization.

Coming soon: a useful large-scale experiment

Policymakers may soon be able to learn from what could be the greatest experiment on the impact of ad personalization ever conducted.

If Meta is required to offer Facebook users in the European Union a meaningfully de-personalized ad experience (and not just the less personalized ads option that still allows for personalization using fraud risk factors like age, gender, and location) then there will be a chance to measure what happens when users can choose personalized or de-personalized ads on a service that is otherwise the same.

Personally, I bet that users with the personalization turned off will have better outcomes as consumers, but we’ll see. I’m pretty confident that personalized ads will turn out to be worse because tools and settings that tend to make personalization less effective have been available for a while, and if choosing the privacy option made you buy worse stuff, the surveillance companies would have said so by now.

Conclusion

I put these links and notes together to help myself out when someone drops a link to the Dubé et al. paper into an Internet argument, and put them up here in the hope that they will help others. Hardly anyone will read all the literature in this field, but a lot of the most interesting research is still found in corners of the library that Big Tech isn’t actively calling attention to.

Thanks to Fengyang Lin for reviewing a draft of this post.

06 Mar 2025

mlp-2025-03-06

It is time to make the Online Safety Act 2023 fit for purpose

AI: The New Aesthetics of Fascism

https://news.sky.com/story/bluesky-13320824

CAUGHT WITH THEIR HAND IN THE COOKIE JAR?: CNN’s Privacy Lawsuit is Served Fresh and the Court is Taking a Bite Firefox deletes promise to never sell personal data, asks users not to panic https://nofreeviewnoreview.org/ https://algorithmic-sabotage.github.io/asrg/trapping-ai/ Stick with the Weirdos: Marie Davidson’s Favourite Books “Emergent Misalignment” in LLMs https://v.st/daft https://futurism.com/microsoft-ceo-ai-generating-no-value A US federal judge dismisses a lawsuit against TikTok and YouTube over “choking challenge” videos, saying the case failed on its merits and citing Section 230 The Real Goal of the Trump Economy Making any integer with four 2s Fyre Festival 2 is coming, and it already sounds bananas (and not in a good way) 1,000 artists release ‘silent’ album to protest UK copyright sell-out to AI Why Data Minimization Is A Very Big Deal For Ad Tech https://www.npr.org/2025/02/25/nx-s1-5307965/consumer-confidence-sentiment-inflation-trump-tariffs https://switching.software/ DeepSeek goes beyond “open weights” AI with plans for source code release Profiles In Cowardice: The Nobody Saw This Coming Brigade Tesla sales are tanking in Europe. Is Musk to blame? 
Shadow IT, shadow research, and democratizing research Time for a Reset https://wallethub.com/blog/google-quality-issues-report/147091 https://www.joanwestenberg.com/you-dont-have-to-monetize-the-things-you-love/ No, Privacy is Not Dead: Beware the All-or-Nothing Mindset Open source LLMs hit Europe’s digital sovereignty roadmap The EU AI Act is Coming to America What the US’ first major AI copyright ruling might mean for IP law This Is the Age of the Coward Meta in Myanmar (full series) Wallfacing Baking Soda Is the Key to Perfectly Browned Ground Beef Ukraine Drives Next Gen Robotic Warfare The prophet of parking Google faked a Gemini AI answer in its Super Bowl commercial ‘I am overwhelmed’: Luigi Mangione sends Valentine’s Day message as he launches fan site Advertisers Are Losing Trust In TAG and MRC After Damning CSAM Report The Worst Place To Show An Ad the decline of kids’ creativity https://newsletter.counteroffensive.pro/p/deep-dive-taiwan-miltech-aims-to-undermine-chinese-components https://webonastick.com/fonts/routed-gothic/ ‘The Hardest Working Font in Manhattan’ Privacy Loves Company Nearly a Year Later, Mozilla is Still Promoting OneRep Apple’s app tracking privacy framework could fall foul of German antitrust rules A win at last: Big-time blow to AI world in training data copyright scrap Ad Buyers Blast Google, Amazon, and Others After Ads Appear on Site Hosting Child Abuse Content Jason Snell Went There, Calls for iOS to Follow the Mac Model for Software Distribution Why you need a radio (yes, you!) 
Antoine Beaupré: Qalculate hacks Pop-up Ads in Your Jeep, the Latest Stellantis Innovation How WikiTok Was Created Metal Panels Fall Off of Cybertrucks, Revealing the Limits of a Glued-Together Truck It’s Official: the Cybertruck is More Explosive than the Ford Pinto ‘Torrenting From a Corporate Laptop Doesn’t Feel Right’ With the Support of Check My Ads Institute’s Advocacy, Congress Launches Bipartisan Inquiry into Adtech Firms’ Monetization of Child Abuse ICE gaming GOOGLE to create mirage of mass deportations… Lessons of Trump’s First Trade War https://hbr.org/2020/01/advertising-makes-us-unhappy The New DVD Bargain Bin Scrum Doesn’t Say… Why it makes perfect sense for this bike to have two gears and two chains Mob Rule What Happened Here https://www.crummy.com/2025/02/02/0 Elon Musk’s X is suing more advertisers over ad ‘boycott’ Quoted in Ars Technica’s article on tarpits for AI crawlers Reading newsletters via an RSS reader is still great ↦ Advertisers Aren’t Thrilled With Zuckerberg’s Embrace Of Hate Speech US Blocks Open Source ‘Help’ From These Countries “We ran out of columns” - The best, worst codebase The Concrete Club. Ars Technica – Democrat teams up with movie industry to propose website-blocking law Take Control of Your Data: Practical Tips for Data Privacy Week 2025 Ukraine prepares to showcase game-changing defense tech innovations at February forum in Kyiv The questions the Chinese government doesn’t want DeepSeek AI to answer People’s Privacy Act Introduced in Washington State dial down Coalition of Jewish groups say they’re leaving X over Musk’s behavior Democrats Flip Trump +21 District in Iowa Can we get the benefits of transitive dependencies without undermining security? 
https://medium.com/%40colin.fraser/generative-ai-is-a-hammer-and-no-one-knows-what-is-and-isnt-a-nail-4c7f3f0911aa Did you know its only $55 to get a lifetime license to Microsoft Office (that comes with Windows 11 Pro) https://www.adexchanger.com/online-advertising/people-managing-google-ad-campaigns-are-getting-their-accounts-seized-by-scammers/ notoriously vague term breaking compatibility while independent alternatives keep on going How a top Chinese AI model overcame US sanctions The case against self-closing tags in HTML A magical land where rounding equals truncation https://blog.logrocket.com/getting-started-pico-css/ bargaining Dave’s linkblog feed https://therecord.media/texas-probes-four-more-car-companies-data-collection-sharing “I prefer to meet people where they are” says reasonable-sounding white dude holding court at a table in the back of a Nazi Bar, redux. Authors Seek Meta’s Torrent Client Logs and Seeding Data in AI Piracy Probe https://www.nathanrabin.com/happy-place/2020/11/9/the-short-sad-strange-life-of-mr-delicious How the United States Learned to Love Internet Censorship Supreme Court upholds the TikTok ban. Profiting from addiction Whoops! Facebook trained Llama AI model on pirate site LibGen, with Zuckerberg’s OK Malicious extensions circumvent Google’s remote code ban Bot-ily Harm. 
Apple’s AI helpfully rewords scam messages to make them look legitimate https://www.crikey.com.au/2025/01/08/apple-new-artificial-intelligence-rewords-scam-messages-look-legitimate/ https://www.canarymedia.com/articles/virtual-power-plants/sonnen-solrite-to-offer-free-batteries-and-solar-to-texas-homeowners Interim note 5: web media and web dev employment Hotel chain ditches Google search for DuckDuckGo — ‘subjected to fraud attempts daily’ https://www.washingtonpost.com/home/2025/01/13/online-shopping-product-roundups/?pwapi_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJyZWFzb24iOiJnaWZ0IiwibmJmIjoxNzM2NzQ0NDAwLCJpc3MiOiJzdWJzY3JpcHRpb25zIiwiZXhwIjoxNzM4MTI2Nzk5LCJpYXQiOjE3MzY3NDQ0MDAsImp0aSI6IjA0M2ZlMjY0LTBjYzItNDgyMC04MTlmLTY1Njc5YjNkNTY2YiIsInVybCI6Imh0dHBzOi8vd3d3Lndhc2hpbmd0b25wb3N0LmNvbS9ob21lLzIwMjUvMDEvMTMvb25saW5lLXNob3BwaW5nLXByb2R1Y3Qtcm91bmR1cHMvIn0.vR-rq38Ql_H984trq0Ja79Ft4CqcqERIqCw7IzoN9E4 Congestion Tolls versus Congestion Pricing #2: Five things you need if you run a small, low-risk user-to-user service How extensions trick CWS search See What WebAssembly Can Do in 2025 Chromium - Policy List More Than Half of All Google Search Takedowns Now Come from Link-Busters My PhD advisor rewrote himself in bash (2010) Why Individual Rights Can’t Protect Privacy https://russ.garrett.co.uk/2024/12/17/online-safety-act-guide/ https://natlawreview.com/article/hashing-it-out-jornayas-data-tech-victory-over-cipa-claims https://www.marketingbrew.com/stories/2024/12/09/meta-plans-crackdown-on-health-related-user-data

Long-range drone strikes weakening Russia’s combat ability, senior Ukrainian commander says

02 Mar 2025

two open source stories

First, I know that pretty much everyone is (understandably) freaking out about stuff that is getting worse, but I just wanted to share some good news in the form of an old-fashioned open-source success story. I’m a fairly boring person and developed most of my software habits in the late 1990s and early 2000s, so it’s pretty rare that I actually hit a bug.

But so far this blog has hit two: one browser compatibility issue and this one. The script for rebuilding when a file changes depends on the inotifywait utility, and it turned out that until recently it broke when asked to watch more than 1024 files.

  1. I filed a bug.

  2. A helpful developer, Jan Kratochvil, wrote a fix and put in a pull request.

  3. A bot made test packages and commented with instructions for me on how to test the fix.

  4. I commented that the new version works for me.

  5. The fix just went into Fedora. Pretty damn slick.

This is a great improvement over how this kind of thing used to work. I hardly had to do anything. These kids today don’t know how good they have it.

story number 2: why support the Linux desktop?

Amazon Chime is shutting down. Did anyone use it? I get invited to a lot of video conferences, and I never got invited to an Amazon Chime meeting. Even though Amazon.com is normally really good at SaaS, this one didn’t take off. What happened?

It looks like Amazon Chime was an interesting example of Nassim Nicholas Taleb’s intransigent minority effect.

The system requirements for Amazon Chime look pretty reasonable, right? Should get 95% of the client systems out there. The number of desktop Linux users is pretty small. But if you have 20 meetings a week, at 95% compatibility you’re going to average a compatibility issue every week. Even worse, the people you most want to make a good first impression on are the people whose client platform you’re least likely to know.
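The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough illustration, not a measurement: the 20 meetings a week and 95% compatibility figures come from the paragraph above, while the function names and the assumption that each meeting is an independent chance of hitting one incompatible client are mine.

```python
def expected_issues_per_week(meetings_per_week: int, compatibility: float) -> float:
    """Average number of meetings per week that hit a compatibility problem."""
    return meetings_per_week * (1.0 - compatibility)


def chance_of_any_issue(meetings_per_week: int, compatibility: float) -> float:
    """Probability that at least one meeting in a week hits a problem.

    Assumes (as a simplification) that meetings are independent and that
    each one fails with probability 1 - compatibility.
    """
    return 1.0 - compatibility ** meetings_per_week


if __name__ == "__main__":
    meetings = 20          # from the text: 20 meetings a week
    compatibility = 0.95   # from the text: 95% of client systems supported
    print(f"expected issues per week: {expected_issues_per_week(meetings, compatibility):.2f}")
    print(f"chance of at least one issue: {chance_of_any_issue(meetings, compatibility):.0%}")
```

Under those assumptions the average works out to one problem meeting a week, and the chance of getting through any given week with no problem at all is only about one in three.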

And if you do IT support for a company with 100 people organizing meetings, Amazon Chime is going to cause way too many support issues to put up with. Taleb uses the examples of kosher and halal food—only a small fraction of the population will only eat kosher or halal, but when planning food for a large group, the most practical choice is to satisfy the minority.

The minority rule will show us how it all it takes is a small number of intolerant virtuous people with skin in the game, in the form of courage, for society to function properly.

Anyway, something to keep in mind in the future for anyone considering moving the “support desktop Linux” or “support Firefox” tickets to backlog. None of the successful video conferencing platforms give me any grief for my Linux/Firefox/privacy nerdery client-side setup.

08 Feb 2025

the #Eurostack, so hot right now

Is it just me or is it all about Europe right now? Put on some Kraftwerk and follow along I guess.

Fedora Chooses Forgejo! This is GitHub-like project hosting software with version control, issues, pull requests, all the usual stuff. I have a couple of small projects on Codeberg, which is the (EU) hosted nonprofit instance, and it works fine as far as I can tell. Also a meissa GmbH presentation at FOSDEM 2025: You know X, Facebook, Xing, SourceForge? What about GitHub? It is time to de-risk OpenSource engagement!

Lots more Europe-hosted SaaS, too. Baldur Bjarnason has more info in Todo notes as a storm approaches.

The Sovereign Tech Agency is supporting some Linux plumbing: Arun Raghavan: PipeWire ♥ Sovereign Tech Agency.

The northern German state of Schleswig-Holstein is moving 30,000 PCs from Microsoft Windows and Office to Linux and LibreOffice: LibreOffice at the Univention Summit 2025. I know, I know, “government in Germany goes desktop Linux” is the “hey, Rocky, watch me pull a rabbit out of my hat” of IT, but this time they’re not up against Microsoft in its prime, they’re up against a new generation that can’t open their old files, while LibreOffice can.

They Said It Couldn’t Be Done by Pierre-Carl Langlais, Anastasia Stasenko, and Catherine Arnett. These represent the first ever models trained exclusively on open data, meaning data that are either non-copyrighted or are published under a permissible license. Trained on the Jean Zay supercomputer. Related: Pirate Libraries Are Forbidden Fruit for AI Companies. But at What Cost?

Scott Locklin lists Examples of group madness in technology. One of the worst arguments I hear is that thing X is inevitable because the smart people are doing it. As I’ve extensively documented over the last 15 years on this blog, smart people in groups are not smart and are even more subject to crazes and mob behavior as everyone else.

Not a European product: Framework Laptop’s RISC-V board for open source diehards is available for $199 but there is a Europe angle here. European Union Seeks Chip Sovereignty Using RISC-V - EE Times, RISC-V Summit Europe. RISC-V holds significance for Europe due to its potential to foster innovation, enhance technological sovereignty, and stimulate economic growth within the region. By embracing RISC-V, European countries can reduce their dependency on foreign technologies and proprietary architectures, thereby enhancing their autonomy in critical sectors such as telecommunications, cybersecurity, and data processing.

Also international, not Europe-specific: Postgres full-text search is Good Enough! by Rachid Belaid. (But there is a tech autonomy angle, and an active PostgreSQL Europe, so for practical purposes PostgreSQL is part of the Eurostack.)

Good advice from tante/Jürgen Geuter: Innovation is a distraction The demand for more Innovation (and sometimes even the request for more research) has become a way to legitimize not doing anything. A way to say the unpleasant solutions we have are not perfect but in the future there might be a magic solution that doesn’t bother us and everyone gets a fucking unicorn.

Marloes de Koning interviews Cristina Caffarra. ‘We have to get to work and put Europe first. But we are late. Terribly late’ You really don’t have to buy everything in Europe, says the competition expert, who is familiar with the criticism that the American supply is simply superior. But start with 30 percent of your procurement budget in Europe. That already makes a huge difference. (That seems like an easy target. Not only are way more than 30 percent of the European Alternatives up to a serviceable level by now, but unfortunately a lot of the legacy US vendors are having either quality or compliance problems, or both. The risks, technical and otherwise, keep going up.)

Greg Nojeim and Silvia Lorenzo Perez cover Trump’s Sacking of PCLOB Members Threatens Data Privacy: Aside from its importance in protecting civil liberties, the PCLOB cannot play its key role in enforcing U.S. obligations under the EU-U.S. Data Privacy Framework (DPF) while it lacks a quorum of members. The European Commission would lose a key oversight tool for which it bargained, and the adequacy decision that it issued to support the DPF could be struck down under review at the Court of Justice of the European Union (CJEU), which struck down two predecessor EU-U.S. data privacy arrangements, the Safe Harbor Agreement and the Privacy Shield.

Karl Bode writes, Apple Has To Pull Its “AI” News Synopses Because They Were Routinely Full Of Shit (If the features unavailable in Europe are problematic anyway…)

Sarah Perez covers Report: Majority of US teens have lost trust in Big Tech. Common Sense says that 64% of surveyed U.S. teens don’t trust Big Tech companies to care about their mental health and well-being and 62% don’t think the companies will protect their safety if it hurts profits. Over half of surveyed U.S. teens (53%) also don’t think major tech companies make ethical and responsible design decisions (think: the growing use of dark patterns in user interface design meant to trick, confuse, and deceive). A further 52% don’t think that Big Tech will keep their personal information safe and 51% don’t think the companies are fair and inclusive when considering the needs of different users. (What if the Eurostack becomes the IT version of those European food brands that sell well in other countries too?)

and more… Instead of F-35, Portugal turns to Europe in search of new fighter, In defense tech, Lithuania’s sovereign VC fund is one step ahead, Ukraine prepares to showcase game-changing defense tech innovations at February forum in Kyiv, Open source LLMs hit Europe’s digital sovereignty roadmap

29 Jan 2025

time to sharpen your pencils, people

Mariana Olaizola Rosenblat covers How Meta Turned Its Back on Human Rights for Tech Policy Press. Zuckerberg announced that his company will no longer work to detect abuses of its platforms other than high-severity violations of content policy, such as those involving illicit drugs, terrorism, and child sexual exploitation. The clear implication is that the company will no longer strive to police its platform against other harmful content, including hate speech and targeted harassment.

Sounds like a brand-unsafe environment. So is another rush of advertiser boycott stories coming? Not this time. Lara O’Reilly reports that brand safety has recently become a political hot potato and been a flash point for some influential, right-leaning figures. In uncertain times, marketing decision-makers are keeping a low profile. Most companies aren’t really set up to take on the open-ended security risk of coming out against hate speech by users with friends in high places. According to the Fraternal Order of Police, the January 6 pardons send a dangerous message, and that message is being heard in marketing departments. The CMOs who boycotted last time are fully aware that stochastic terrorism is a thing, and that rage stories about companies spread quickly in Facebook groups and other extremist media. If an executive makes the news for pulling ads from Meta, they would be putting employees at risk from lone, deniable attacks. So instead of announcing a high-profile boycott, marketers are more likely to follow the example of Federal employees and do the right thing, by the book, and quietly.

Fortunately, big advertisers got some lower-stakes practice with the X (formerly Twitter) situation. Instead of either (1) staying on there and putting the brand at risk of being associated with material copied out of Henry Ford’s old newspaper or (2) risking getting snarled up in a lawsuit for pulling the X ads entirely, brands got the best of both by cutting way back on the actual money without dropping X entirely or saying much one way or the other.

And it’s possible for advertisers to reduce support for Meta without making a stink or drawing fire. Fortunately, Meta ads are hella expensive, and results can be unrealistic and unsustainable. Like all the Big Tech companies these days, Meta is coping with a slowdown in innovation by tweaking the ad rules to capture more revenue from existing services. As Jakob Nielsen pointed out back in 2006, in Search Engines as Leeches on the Web, ad platforms can even capture the value created by others. A marketer doesn’t have to shout ¡No Pasarán! or anything—just sharpen your best math pencil, quietly go through the numbers, spot something that looks low-ROAS or fraudulent in the Meta column, tweak the budget, repeat. If users can dial down Meta, so can marketers. (Update: Richard Kirk writes, Brands could be spending three times too much on social. You read that right. Read the math, do the math.) And if Meta comes out with something new and risky like the adfraud in the browser thing, Privacy-Preserving Attribution, it’s easy to use the fraud problem as the reason not to do it—you don’t have to stand up and talk politics at work.
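For anyone who wants a starting point for the pencil work, here is a minimal sketch of the kind of check described above. ROAS is just attributed revenue divided by ad spend. Everything else here is an assumption for illustration: the function names, the row format, the threshold, and the campaign figures, which are invented and not taken from any real account.

```python
def roas(revenue: float, spend: float) -> float:
    """Return on ad spend: attributed revenue divided by ad spend."""
    return revenue / spend


def low_roas_campaigns(rows, threshold):
    """Return the names of campaigns whose ROAS falls below the threshold.

    rows is a list of (name, spend, revenue) tuples. Campaigns with zero
    spend are skipped because ROAS is undefined for them.
    """
    flagged = []
    for name, spend, revenue in rows:
        if spend > 0 and roas(revenue, spend) < threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    campaigns = [
        ("campaign-a", 1000.0, 4200.0),
        ("campaign-b", 2500.0, 1900.0),
        ("campaign-c", 0.0, 0.0),
    ]
    for name in low_roas_campaigns(campaigns, threshold=1.0):
        print(f"review budget for {name}")
```

The threshold is a judgment call, and attributed revenue is only as trustworthy as the platform reporting it, so a real review would also want an independent look at the attribution numbers.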

From the user side

It’s not that hard to take privacy measures that result in less money for Big Tech. Even if you can’t quit Meta entirely, some basic tools and settings can make an impact, especially if you use both a laptop and a phone, not just a phone. With a few minutes of work, an individual in the USA can, in effect, fine the surveillance business about $50/month.

My list of effective privacy tips is prioritized by how much I think they’ll cost the surveillance business per minute spent. A privacy tips list for people who don’t like doing privacy tips but also don’t like creepy oligarchs. (As they say in the clickbait business, number 9 will shock you: if you get your web browser info from TV and social media, you probably won’t guess which browsers have built-in surveillance and/or fraud features.) That page also has links to more intensive privacy advice for those who want to get into it.

A lawyer question

As an Internet user, I realize I can’t get to Meta surveillance neutral just with my own privacy tools and settings. For the foreseeable future, companies are going to be doing server-to-server tracking of me with Meta CAPI.

So in order to get to a rough equivalent of not being surveilled, I need to balance out their actual surveillance by introducing some free speech into the system. (And yes, numbers can be speech. O, the Tables tell!) So what I’d like to do is write a surrogate script (that can be swapped in by a browser extension in place of the real Meta Pixel, like the surrogate scripts uBlock Origin uses) to enable the user to send something other than valid surveillance data. The user would configure what message the script would send. The surrogate script would then encode the message and pass it to Meta in place of the surveillance data sent by the original Meta script. There is a possible research angle to this, since I think that in general, reducing ad personalization tends to help people buy better products and services. An experiment would probably show that people who mess with cross-context surveillance are happier with their purchases than those who allow surveillance. Releasing a script like that is the kind of thing I could catch hell for, legally, so I’m going to wait to write it until I can find a place to host it and a lawyer to represent me. Anyone?

25 Jan 2025

security headers for a static site

This site now has an OPML version (XML) of the blogroll. What can I do with it? It seems like the old Share your OPML site is no more. Any ideas?

Also went through Securing your static website with HTTP response headers by Matt Hobbs and got a clean bill of health from the Security Headers site. Here’s what I have on here as of today:

Access-Control-Allow-Origin "https://blog.zgp.org/"
Cache-Control "max-age=3600"
Content-Security-Policy "base-uri 'self'; default-src 'self'; frame-ancestors 'self';"
Cross-Origin-Opener-Policy "same-origin"
Permissions-Policy "accelerometer=(),autoplay=(),browsing-topics=(),camera=(),display-capture=(),document-domain=(),encrypted-media=(),fullscreen=(),geolocation=(),gyroscope=(),magnetometer=(),microphone=(),midi=(),payment=(),picture-in-picture=(),publickey-credentials-get=(),screen-wake-lock=(),sync-xhr=(self),usb=(),web-share=(),xr-spatial-tracking=()" "expr=%{CONTENT_TYPE} =~ m#text\/(html|javascript)|application\/pdf|xml#i"
Referrer-Policy no-referrer-when-downgrade
Cross-Origin-Resource-Policy same-origin
Cross-Origin-Embedder-Policy require-corp
Strict-Transport-Security "max-age=2592000"
X-Content-Type-Options: nosniff

(update 2 Feb 2025) This site has some pages with inline styles, so I can’t use that CSP line right now. This is because I use the SingleFile extension to make mirrored copies of pages, so I need to move those into their own virtual host so I can go back to using the version without the unsafe-inline.

(update 23 Feb 2025) The Pagefind site search requires ‘unsafe-eval’ in CSP in order to support WASM. This should be wasm-unsafe-eval in the future.

To do WASM and inline styles, the new value for the Content-Security-Policy header is:

"base-uri 'self'; default-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self' 'unsafe-eval'; frame-ancestors 'self';"
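To see at a glance what changed between the two policies, a small script can split a Content-Security-Policy value into directives and list the ones that carry 'unsafe-inline' or 'unsafe-eval'. This is a rough sketch for checking a policy string by hand, not a full CSP parser. The function names are made up for this example, and the two policy strings are copied from this post.

```python
def parse_csp(policy: str) -> dict[str, list[str]]:
    """Split a CSP header value into {directive name: [source expressions]}."""
    directives: dict[str, list[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if not tokens:
            continue  # skip empty segments, such as the one after a trailing ';'
        name = tokens[0].lower()
        # If a directive is repeated, browsers use the first one, so keep the first.
        directives.setdefault(name, tokens[1:])
    return directives


def risky_sources(policy: str) -> dict[str, list[str]]:
    """Return the directives that include 'unsafe-inline' or 'unsafe-eval'."""
    risky = {"'unsafe-inline'", "'unsafe-eval'"}
    found = {}
    for name, sources in parse_csp(policy).items():
        hits = [s for s in sources if s in risky]
        if hits:
            found[name] = hits
    return found


if __name__ == "__main__":
    strict = "base-uri 'self'; default-src 'self'; frame-ancestors 'self';"
    relaxed = ("base-uri 'self'; default-src 'self'; style-src 'self' 'unsafe-inline'; "
               "script-src 'self' 'unsafe-eval'; frame-ancestors 'self';")
    print(risky_sources(strict))
    print(risky_sources(relaxed))
```

Run against the two values above, the original policy comes back with nothing flagged, and the relaxed one flags style-src (for the SingleFile inline styles) and script-src (for Pagefind's WASM).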

I saved a copy of Back to the Building Blocks: A Path Toward Secure and Measurable Software (PDF). The original seems to have been taken down, but it’s a US Government document so I can keep a copy on here (like the FBI alert that got taken down last year, which I also have a copy of.)

18 Jan 2025

Supreme Court files confusing bug report

I’m still an Internet optimist despite…things…so I was hoping that Friday’s Supreme Court opinion in the TikTok case would have some useful information about how to design online social networking in a way that does get First Amendment protection, even if TikTok doesn’t. But no. Considered as a bug report, the opinion doesn’t help much. We basically got (1) TikTok collects lots of personal info (2) Congress gets to decide if and how it’s a national security problem to make personal info available to a foreign adversary, and so TikTok is banned. But everyone else doing social software, including collaboration software, is going to have a lot to find out for themselves.

The Supreme Court pretty much ignores TikTok’s dreaded For You Page algorithm and focuses on the privacy problem. So we don’t know if some future ban of some hypothetical future app that somehow fixed its data collection issues would hold up in court just based on how it does content recommendations. (Regulating recommendation algorithms is a big issue that I’m not surprised the Court couldn’t agree on in the short time they had for this case.) We also get the following, on p. 9—TikTok got the benefit of the doubt and received some First Amendment consideration that future apps might or might not.

This Court has not articulated a clear framework for determining whether a regulation of non-expressive activity that disproportionately burdens those engaged in expressive activity triggers heightened review. We need not do so here. We assume without deciding that the challenged provisions fall within this category and are subject to First Amendment scrutiny.

Page 11 should be good news for anybody drafting a privacy law anyway. Regulating data collection is content neutral for First Amendment purposes—which should be common sense.

The Government also supports the challenged provisions with a content-neutral justification: preventing China from collecting vast amounts of sensitive data from 170 million U. S. TikTok users. That rationale is decidedly content agnostic. It neither references the content of speech on TikTok nor reflects disagreement with the message such speech conveys….Because the data collection justification reflects a purpos[e] unrelated to the content of expression, it is content neutral.

The outbound flow of data from people in the USA is what makes the TikTok ban hold up in court. Prof. Eric Goldman writes that the ban is taking advantage of a privacy pretext for censorship, which is definitely something to watch out for in future privacy laws, but doesn’t apply in this case.

But so far the to-do list for future apps looks manageable.

  • Don’t surveil US users for a foreign adversary

  • Comply with whatever future restrictions on recommendation algorithms turn out to hold up in court. (Disclosure of rules or source code? Allow users to switch to chronological? Allow client-side or peer-to-peer filtering and scoring? Lots of options but possible to get out ahead of.)

Not so fast. Here’s the hard part. According to the Court the problem is not just the info that the app collects automatically and surreptitiously, or the user actions it records, but also the info that users send by some deliberate action. On page 14:

If, for example, a user allows TikTok access to the user’s phone contact list to connect with others on the platform, TikTok can access any data stored in the user’s contact list, including names, contact information, contact photos, job titles, and notes. Access to such detailed information about U. S. users, the Government worries, may enable China to track the locations of Federal employees and contractors, build dossiers of personal information for blackmail, and conduct corporate espionage.

and in Justice Gorsuch’s concurrence,

According to the Federal Bureau of Investigation, TikTok can access any data stored in a consenting user’s contact list—including names, photos, and other personal information about unconsenting third parties. Ibid. (emphasis added). And because the record shows that the People’s Republic of China (PRC) can require TikTok’s parent company to cooperate with [its] efforts to obtain personal data, there is little to stop all that information from ending up in the hands of a designated foreign adversary.

On the one hand, yes, sharing contacts does transfer a lot of information about people in the USA to TikTok. But sharing a contact list with an app can work a lot of different ways. It can be

  1. covert surveillance (although mobile platforms generally do their best to prevent this)

  2. data sharing that you get tricked into

  3. deliberate, more like choosing to email a copy of the company directory as an attachment

If it’s really a problem to enable a user to choose to share contact info, then that makes running collaboration software like GitHub in China a problem from the USA side. (Git repositories are full of metadata about who works on what, with who. And that information is processed by other users, by the platform itself, and by third-party tools.) Other content creation tools also share the kinds of info on skills and work relationships that would be exactly what a foreign adversary murder robot needs to prioritize targets. But the user, not some surveillance software, generally puts that info there. If intentional contact sharing by users is part of the reason that the USA can ban TikTok, what does that mean for other kinds of user to user communication?

Kleptomaniac princesses

There’s a great story I read when I was a kid that I wish I had the citation for. It might be fictional, but I’m going to summarize it anyway because it’s happening again.

Once upon a time there was a country that the UK really, really wanted to maintain good diplomatic relations with. The country was in a critical strategic location and had some kind of natural resources or something, I don’t remember the details. The problem, though, was that the country was a monarchy, and one of the princesses loved to visit London and shoplift. And she was really bad at it. So diplomats had to go around to the stores in advance to tell the manager what’s going on, convince the store to let her steal stuff, and promise to settle up afterwards.

Today, the companies that run the surveillance apps are a lot like that princess. (techbros don’t have masculine energy, they have kleptomaniac princess energy) If one country really needs to maintain good relations with another, they’ll allow that country’s surveillance apps to get away with privacy shenanigans. If relations get chillier, then normal law enforcement applies. At least for now, though, we don’t know what the normal laws here will look like, and the Supreme Court didn’t provide many hints yesterday.

15 Jan 2025

How this site uses AI

This site is written by me personally except for anything that is clearly marked up and cited as a direct quotation. If you see anything on here that is not cited appropriately, please contact me.

Generative AI output appears on this site only if I think it really helps make a point and only if I believe that my use of a similar amount and kind of material from a relevant work in the training set would be fair use.

For example, I quote a sentence of generative AI output in LLMs and reputation management. I believe that I would have been within my fair use rights to use the same amount of text from a copyrighted history book or article.

In LLMs and the web advertising business, my point was not only that the Big Tech companies are crooked, but that it’s so obvious. A widely available LLM can easily point out that a site running Big Tech ads—for real brands—is full of ripped-off content. So I did include a short question and answer session with ChatGPT. It’s really getting old that big companies are constantly being shocked to discover infringement and other crimes when their own technology could have spotted it.

Usually when I mention AI or LLMs on here I don’t include any generated content.

More slash pages

11 Jan 2025

Click this to buy better stuff and be happier

Here’s my contender for Internet tip of the year. It’s going to take under a minute, and will not just help you buy better stuff, but also make you happier in general. Ready? Here it is, step by step.

  1. Log in to your Google account if you’re not logged in already. (If you have a Gmail or Google Drive tab open in the browser, you’re logged in.)

  2. Go to My Ad Center.

  3. Find the Personalized ads control. It looks something like this.

     [screenshot: Personalized ads on]

  4. Turn it off.

     [screenshot: Personalized ads off]

  5. That’s it. Unless you have another Google account. If you do have multiple Google accounts (like home, school, and work accounts) do this for each one.

This will affect the ads you get on all the Google sites and apps, including Google Search and YouTube, along with the Google ads on other sites. Google is probably going to show you some message to try to discourage you from doing this. From what I can tell from the outside, it looks like turning off personalized ads will cost Google money. Last time I checked, I got the following message.

Ads may seem less relevant

When your info isn’t used for ads, you may see fewer ads for products and brands that interest you. Non-personalized ads on Google are shown to you according to factors like the time of day, device type, your current search or the website you’re visiting, or your current location (based on your IP address or device permissions).

But what they don’t say is anything about how personalized ads will help you buy better products and services. And that’s because—and I’m going out on a limb here data-wise, but a pretty short and solid limb, and I’ll explain why—they just don’t. Choosing to turn off personalized ads somehow makes you a more satisfied shopper and better off.

How does this work?

I still don’t know exactly how this tip works, but so far there have been a few theories.

1: lower fraud risk. It’s possible that de-personalizing the ads reduces the number of scam advertisers who can successfully reach you. Bian et al., in Consumer Surveillance and Financial Fraud, show that Apple App Tracking Transparency, which reduces the ability of apps to personalize ads, tended to reduce fraud complaints to the FTC.

We estimate that the reduction in tracking reduces money lost in all complaints by 4.7% and money lost reported in internet and data security complaints by 40.1%.

That’s a pretty big effect. De-personalizing ads might mean that your employer doesn’t get compromised by an ad campaign that delivers malware targeting a specific company, and you don’t get hit with fake ads aimed at users of a software product. Even if the increase in fraud risk for users with personalization left on is relatively small, getting scammed has a big impact and can move the average money and happiness metrics a lot.
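To make the "small risk, big impact" point concrete, here is a minimal sketch of the arithmetic. Every number in it is made up for illustration (the scam probabilities, the loss size, and the typical savings are assumptions, not figures from Bian et al. or anyone else), and `expected_outcome` is a hypothetical helper name.

```python
# Hypothetical numbers only; none of these come from Bian et al. or any other study.
def expected_outcome(p_scam, scam_loss, typical_gain):
    """Average yearly result when a small gain is the usual case and a big loss is rare."""
    return (1 - p_scam) * typical_gain - p_scam * scam_loss

# Personalization on: some savings from targeted deals, slightly higher scam risk (assumed).
on = expected_outcome(p_scam=0.03, scam_loss=5000, typical_gain=50)
# Personalization off: no targeted deals, lower scam risk (assumed).
off = expected_outcome(p_scam=0.01, scam_loss=5000, typical_gain=0)

print(f"personalization on:  {on:+.2f}")
print(f"personalization off: {off:+.2f}")
```

With these assumed inputs, most people in the "on" group come out a little ahead, but the average is worse, because the rare large loss outweighs the common small gain. Different inputs can flip the result, which is why real data would be useful here.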

2: more mindful buying. Another possibility is that people who get fewer personalized ads are making fewer impulse purchases. Jessica Fierro and Corrine Reichert bought a selection of products from those Temu ads that seem to be everywhere, and decided they weren’t worth it. Maybe people without personalized ads are making fewer buying decisions but each one is better thought out.

3: buy more from higher quality vendors. Or maybe companies that put more money into personalized advertising tend to put less into improving product quality. (ICYMI: Product is the P all marketers should strive to influence by Mark Ritson.) In Behavioral advertising and consumer welfare: An empirical investigation, Mustri et al. found that

targeted ads are more likely to be associated with lower quality vendors, and higher prices for identical products, compared to competing alternatives found in organic search results

In Why Your Brand Feels Like a Cheap Date: All Flash, No Substance in the World of Performance Marketing, Pesach Lattin writes,

Between 2019 and 2021, brands that focused on brand equity saw a 72% increase in value, compared to just 20% for brands that relied primarily on performance tactics. Ignoring brand-building not only weakens your baseline sales but forces you to spend more and more on performance marketing just to keep your head above water.

Brands that are over-focused on surveillance advertising might be forced to under-invest in product improvements.

4: limited algorithmic and personalized pricing. Personalized ads might be set up to offer the same product at higher prices to some people. The FTC was investigating, but from the research point of view, personalized pricing is really hard to tell apart from dynamic pricing. Even if you get volunteers to report prices, some might be getting a higher price because stock is running low, not because of who the individual is. So it’s hard to show how much impact this has, but hard to rule it out too.

5: it’s just a step on the journey. Another possibility is that de-personalizing the ads is a gateway to blocking ads entirely. What if, without personalization, the ads get gross or annoying enough that people tend to move up to an ad blocker? And, according to Lin et al. in The Welfare Effects of Ad Blocking,

[P]articipants that were asked to install an ad-blocker become less likely to regret recent purchases, while participants that were asked to uninstall their ad-blocker report lower levels of satisfaction with their recent purchases.

Maybe you don’t actually make better buying decisions while ads are on but personalization is off—but it’s a step toward full ad blocking where you do get better stuff and more happiness.

How do I know this works?

I’m confident that this tip works because if turning ad personalization off didn’t help you, Google would have said so a while ago. Remember the 52% paper about third-party cookies? Google made a big deal out of researching the ad revenue impact of turning cookie tracking on or off. And this ad personalization setting also has a revenue impact for Google. According to documents from one of Google’s Federal cases, keeping the number of users with ad personalization off low is a goal for Google—they make more money from you if you have personalization on, so they have a big incentive to try to convince you that personalization is a win-win. So why so quiet? The absence of a PDF about this is just as informative as the actual PDF would be.

And it’s not just Google. Research showing user benefits from personalized ads would be a fairly easy project not just for Google, but for any company that can both check a privacy setting and measure some kind of shopping outcome. Almost as long as Internet privacy tools have been a thing, so has advice from Internet Thought Leaders telling us they’re not a good idea. But for a data-driven industry, they’re bringing surprisingly little data—especially considering that for many companies it’s data they already have and would only need to do stats on, make graphs, and write (or have an LLM write) the abstract and body copy.

Almost any company with a mobile app could do research to show any benefits from ad personalization, too. Are the customers who use Apple iOS and turn off tracking more or less satisfied with their orders? Do banks get more fraud reports from app users with tracking turned on or off? It would be straightforward for a lot of companies to show that turning off personalization or turning on some privacy setting makes you a less happy customer—if it did.
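As a rough idea of how little work the stats would be, here is a sketch of a two-proportion z-test on the rate of "bad outcome" orders (returns, complaints, fraud reports) for customers with tracking on versus off. The counts are invented, `two_proportion_z` is a hypothetical helper, and a real study would also have to deal with the fact that people choose their own privacy settings, so a difference like this shows an association, not a cause.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(bad_a, n_a, bad_b, n_b):
    """Two-sided z-test for a difference between two rates, using a pooled standard error."""
    p_a, p_b = bad_a / n_a, bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

# Made-up counts: orders followed by a return or complaint, split by tracking setting.
rate_on, rate_off, z, p = two_proportion_z(bad_a=460, n_a=5000, bad_b=400, n_b=5000)
print(f"tracking on: {rate_on:.1%}  tracking off: {rate_off:.1%}  z={z:.2f}  p={p:.3f}")
```

Any company that already stores both the setting and the outcome could run something like this against its own tables in an afternoon.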

The closest I have found so far is Balancing User Privacy and Personalization by Malika Korganbekova and Cole Zuber. This study simulated the effects of a privacy feature by truncating browsing history for some Wayfair shoppers, and found that people who were assigned to the personalized group and chose a product personalized to them were 10% less likely to return it than people in the non-personalized group. But that’s about a bunch of vendors of similar products that were all qualified by the same online shopping platform, not about the mix of honest and dishonest personalized ads that people get in total.

So go back and do the tip if you didn’t already, enjoy your improved shopping experience, and be happy.

More: effective privacy tips

05 Jan 2025

ads.txt for a site with no ads

This site does not have programmatic ads on it.

But just in case, since there’s a lot of malarkey in the online advertising business, I’m putting up this file to let the advertisers know that if someone sold you an ad and claimed it ran on here, you got burned.

That’s the ads.txt file for this site. The format is defined in a specification from the IAB Tech Lab (PDF). The important part is the last line. The placeholder is how you tell the tools that are supposed to be checking this stuff that you don’t have ads.
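For anyone who wants to check a file like this with a script, here is a minimal sketch. It assumes the placeholder record from the IAB Tech Lab spec (`placeholder.example.com, placeholder, DIRECT, placeholder`); the sample text is an illustration and not a copy of this site’s actual file, and `seller_records` and `declares_no_sellers` are hypothetical helper names, not part of any official tool.

```python
# The placeholder record from the IAB Tech Lab ads.txt spec, used by a site
# that has no authorized sellers. The sample file below is an illustration,
# not a copy of any real site's file.
PLACEHOLDER = ("placeholder.example.com", "placeholder", "DIRECT", "placeholder")

SAMPLE_ADS_TXT = """\
# This site does not sell programmatic ads.
placeholder.example.com, placeholder, DIRECT, placeholder
"""

def seller_records(text):
    """Yield (domain, account_id, relationship, cert_id) tuples from ads.txt text."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and whitespace
        if not line or "=" in line.split(",", 1)[0]:
            continue                             # skip blanks and variables like contact=
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:
            fields += [""] * (4 - len(fields))   # the fourth field is optional
            yield tuple(fields[:4])

def declares_no_sellers(text):
    """True if the file has data records and every one is the placeholder record."""
    records = list(seller_records(text))
    return bool(records) and all(
        (d.lower(), a.lower(), r.upper(), c.lower()) == PLACEHOLDER
        for d, a, r, c in records
    )

print(declares_no_sellers(SAMPLE_ADS_TXT))
```

An empty or missing file is treated as "no statement" here and not as "no sellers", which is why the function requires at least one record.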

In other news, selling info on US citizens to North Korean murder robots is illegal now so we’ve got that going for us which is nice. See Justice Department Issues Final Rule Addressing Threat Posed by Foreign Adversaries’ Access to Americans’ Sensitive Personal Data

04 Jan 2025

Links for 4 Jan 2025: news from the low-trust society

Aram Zucker-Scharff writes, in Never Forgive Them,

If this year has revealed anything about the tech billionaires it is that they have a very specific philosophy other than just growth and that philosophy is malicious…I don’t think we can really take on the obstacle of, let’s call it more accurately, the scam economy without acknowledging this is all part of the design. They think they are richer than you and therefore you must be stupid and because you are stupid you should be controlled…

Read the whole thing. A lot of tech big shots want to play the rest of us like a real-time strategy game. (Ever notice that the list of skills in the “we don’t hire US job applicants because the culture doesn’t value the following skills” tweets is the same as the list of skills in the “our AI has achieved human-level performance in the following skills” tweets?) I predicted that low-trust society will trend in 2025, and I agree with Aram that a big part of that is company decision-makers deliberately making decisions that make it harder to trust others. I’m working on a list of known good companies. (Work in progress, please share yours if you have one.)

And yes, my link collecting tool has queued up a bunch of links about the shift towards a lower-trust society along with ways that people are adapting to it or trying to shift things back.

Opinion: We Need More Consequences for Reckless Driving. But That Doesn’t Mean More Punishment — Streetsblog USA (a lot of this is reactions to reactions to app-driven rat running through neighborhoods. Bollards can be a way to game the algorithm.)

Judge blocks parts of California bid to protect kids from social media (the ban on addictive feeds without consent is still there)

Self-Own (bullshit about economics, explained)

The Cows in the Coal Mine (bullshit about health, only getting worse)

This Year in Worker Conquests

Boeing strike ends after workers vote to accept “life-changing” wage increase

Steinar H. Gunderson: git.sesse.net goes IPv6-only (coping with AI scrapers)

OpenAI’s Board, Paraphrased: ‘To Succeed, All We Need Is Unimaginable Sums of Money’

Namma Yatri is a rideshare app that offers a better deal to drivers. Daily or per-trip flat rates, not a percentage

5 Rideshare Strategies That Are Complete BS

How to block Chrome from signing you into a Google account automatically

Leave Me Alone.

Firefox-maker Mozilla’s boosted revenue significantly in 2023, but the financial report may also raise concern

Google Cuts Thousands of Workers Improving Search After Search Results Scientifically Shown to Suck (a lot of the bullshit problem is downstream from Google’s labor/management issues)

Why is it so hard to buy things that work well? (imho Mark Ritson still explained it best—companies over-emphasize the promotion P of marketing, trying to find people slightly more likely to buy the product as is, over the product refinements that would tend to get more buyers. George Tannenbaum on destroying brand trust with too much of one P, too little of another: Ad Aged: Leave Me Alone.)

Why Big Business May Wind Up Missing Lina Khan

An ad giant wants to run your next TV’s operating system

Yes, your phone is tracking you via advertising ID, and companies are using it to sell your location and identity to anyone. Protect yourself by disabling this feature on your device.

Meta beats suit over tool that lets Facebook users unfollow everything (I guess now it turns out you can’t unfollow the AI bots anyway?)

Sweet Dreams and Sour Deals: How White-Noise Apps Are Playing Advertisers

NFL Player Uses Pirate Streaming Site to Watch His Own Team

Missouri AG claims Google censors Trump, demands info on search algorithm

Ex-coiner Y Combinator startup bro: ‘dawg i chatgpt’d the license, can’t be bothered with legal’

Steam adds the harsh truth that you’re buying “a license,” not the game itself

31 Dec 2024

predictions for 2025

(looks like I had enough notes for an upcoming event to do A-Z this year…)

Ad blocking will get bigger and more widely reported on. Besides the usual suspects, the current wave of ad blocking is also partly driven by professional, respectable security vendors. Malwarebytes Labs positions their ad blocker as a security tool and certain well-known companies are happy to help them with their content marketing by running malvertising. (example: Malicious ad distributes SocGholish malware to Kaiser Permanente employees) Silent Push is another security vendor helping to make the ads/malware connection. And, according to research by Lin et al., users who installed an ad blocker reported fewer regrets with purchases and an improvement in subjective well-being. Some of those users who installed an ad blocker reluctantly because of security concerns will be hard to convince to turn it off even if the malvertising situation improves.

Bullshit is going to be everywhere, and more of it. In 2025 it won’t be enough to just ignore the bullshit itself. People will also have to ignore what you might think of as a bullshit Smurf attack, where large amounts of content end up amplifying a small amount of bullshit. Some politician is going to tweet something about how these shiftless guys today need to pull up their pants higher, and then a bunch of mainstream media reporters are going to turn in their diligently researched 2000-word think pieces about the effect of higher pants on the men’s apparel market and human reproductive system. And by the time the stories run, the politician has totally forgotten about the pants thing and is bullshitting about something else. The ability to ignore the whole cycle will be key. So people’s content discovery habits are going to change, we just don’t know how.

Chrome: Google will manage to hang on to their browser, as prospective buyers don’t see the value in it. Personally I think there are two logical buyers. The Trade Desk could rip out the janky Privacy Sandbox stuff and put in OpenPass and UID2. Not all users would leave those turned on, but enough would to make TTD the dominant source for user identifiers in web ads. Or a big bank could buy Chrome as a fraud protection play and run it to maximize security, not just ad revenue. At the scale of the largest banks, protecting existing customers from Internet fraud would save the bank enough money to pay for browser development. Payment platform integration and built-in financial services upsell would be wins on top of that.

Both possible Chrome buyers would be better off keeping open-source Chromium open. Google would keep contributing code even if they didn’t control the browser 100%. They would feel the need to hire or sponsor people to participate on a legit open-source basis to support better interoperability with Google services. They wouldn’t be able to get the anticompetitive shenanigans back in, but the legit work would continue—so the buyer’s development budget would be lower than Google’s, long term. But that’s not going to happen. So far, decision makers are convinced that the only way to make money with the browser is with tying to Google services, so they’re going to pass up this opportunity.

Development tools will keep getting more AI in them. It will be easier to test new AI stuff in the IDE than to not test it. But a flood of plausible-looking new code that doesn’t necessarily work in all cases or reflect the unwritten assumptions of the project means a lot more demand for testing and documentation. The difference between a software project that spends 2025 doing self-congratulatory AI productivity win blog posts and one that has an AI code catastrophe is going to be how much test coverage they started with or were able to add quickly.

Environmental issues: we’re in for more fires, floods, and storms. Pretty much everybody knows why, but some people will only admit it when they have to. A lot of homeowners won’t be able to renew their insurance, so will end up selling to investors who are willing to demolish the house and hold the land for eventual resale. More former house occupants will pivot to #vanlife, and 24-hour health clubs will sell more memberships to people who mainly need the showers.

Firefox will keep muddling through. There will be more Internet drama over their ill-advised adfraud in the browser thing, but the core software will be able to keep going and even pick up a few users on desktop because of the ad blocking trend. The search ad deal going away won’t have much effect—Google pays Firefox to exist and limit the amount of antitrust trouble it’s in, not for some insignificant number of search ad clicks. If they can’t pay Firefox for default search engine placement, they’ll find some other excuse to send them enough cash to keep going. Maybe not as high on the hog as they have been used to, but enough to keep the browser usable.

Google Zero, where Google just stops sending traffic to a site, will arrive for a significant minority of sites. But not even insiders at Google know which. (I Attended Google’s Creator Conversation Event, And It Turned Into A Funeral | GIANT FREAKIN ROBOT, Google, the search engine that’s forgotten how to search)

Homeschooling will increase faster because of safety concerns, but parents will feel uncomfortable about social isolation and seek out group activities such as sports, crafts, parent-led classes, and group playdates. Homeschooling will continue to be a lifestyle niche that’s relatively easy to reach with good influencer and content creator connections, but not well-covered by the mainstream media.

Immigration into the USA will continue despite high-profile deportations and associated human rights violations. But whether or not a particular person is going to be able to make it in, or be able to stay, is going to be a lot less predictable. If you know who the person is who might be affected by immigration policy changes, you might be able to plan around it, but what’s more likely from the business decision-making point of view is the person affected is an employee of some supplier of your supplier, or a family member, and you can’t predict what happens when their life gets disrupted. Any company running in lean or just-in-time mode, and relying on low disruption and high predictability, will be most at a disadvantage. Big Tech companies will try to buy their way out of the shitstorm, but heavy reliance on networks of supplier companies will mean they’re still affected in hard-to-predict ways.

Journalism will continue to go non-profit and journalist-owned. The bad news is there’s not enough money in journalism, now or in the near future, to sustain too many levels of managers and investors, and the good news is there’s enough money in it to keep a nonprofit or lifestyle company going. (Kind of like tech conferences. LinuxWorld had to support a big company, so wasn’t sustainable, but Southern California Linux Expo, a flatter organization, is.)

Killfile is the old Usenet word for a blocklist, and I already had something for B. The shared lists that are possible with the Fediverse and Bluesky are too useful not to escape into other categories of software. I don’t know which ones yet, but a shared filter list to help fix the search experience is the kind of thing we’re likely to see. People’s content discovery and shopping habits will have to change, we just don’t know how.

Low-trust society will trend. It’s possible for a country to move from high trust to low, or the other way around, as the Pew Research Center covered in 2008. The broligarchy-dominated political and business environment in the USA, along with the booms in growth hacking and AI slop, will make things a lot easier for corporate crime and scam culture. So people’s content discovery and shopping habits will have to change, we just don’t know how. Multi-national companies that already operate in middle-income low-trust countries will have some advantages in figuring out the new situation, if they can bring the right people in from there to here.

Military affairs, revolution in: If you think AI hype at the office in the USA is intense, just watch the AI hype in Europe about how advanced drones and other AI-enabled defense projects can protect countries from being occupied by an evil dictator without having to restore or expand conscription. Surveillance advertisers and growth hackers in the USA are constantly complaining about restrictions on AI in Europe—but the AI Act over there has an exception for the defense industry. In 2025 it will be clear that the USA is over-investing in bullshit AI and under-investing in defense AI, but it won’t be clear what to do about it. (bonus link: The Next Arsenal of Democracy | City Journal)

Neighborhood organizations: As Molly White recommended in November, more people will be looking for community and volunteer opportunities. The choice to become a joiner and not just a consumer in unpredictable times is understandable and a good idea in general. This trend could enter a positive feedback loop with non-profit and journalist-owned local news, as news sites try more community connections like Cleveland Documenters.

Office, return to: Companies that are doing more crime will tend to do more RTO, because signaling loyalty is more important than productivity or retaining people with desired skills. Companies that continue avoiding doing crimes, even in what’s going to be a crime-friendly time in the USA, will tend to continue cutting back on office space. The fun part is that the company can tell the employee that work from home privileges are a benefit, and not free office space for the employer. Win-win! So the content niche for how-tos on maximizing home (and van) offices will grow.

Prediction markets will benefit from 2024’s 15 minutes of fame to catch on for some niche corporate projects, and public prediction market prices will be quoted in more news stories.

Quality, flight to (not): If I were going to be unrealistically optimistic here, I’d say that the only way for advertisers to deal with the flood of AI slop sites and fake AI users is to go into full Check My Ads mode and just advertise on known legit sites made by and for people. But right now the habits and skills around race-to-the-bottom ad placements are too strong, so there won’t be much change on the advertiser side in 2025. A few forward-thinking advertisers will get good results from quality buying for specific campaigns, but that’s about it.

Research on user behavior will get a lot more important. The AI crapflood and resulting search quality crisis mean that (say the line, Bart) people’s content discovery and shopping habits will have to change, we just don’t know how. Companies that build user research capacity, especially in studying privacy users and the gaps they leave in the marketing data, will have an advantage.

State privacy law season will be spicy again. A few states will get big comprehensive privacy bills through the process again, but the laws to watch will be specific ones on health, protecting teens from the algorithm, social media censorship, and other areas. More states will get laws like Daniel’s Law. (We need a Daniel’s Law for military personnel, their families, and defense manufacturing workers, but we’re probably going to see some states do them for health insurance company employees instead.) Update 1 Feb 2025: Compliance issues that came up for AADC will have to get another look.

Troll lawyer letters alleging violations of the California Invasion of Privacy Act (CIPA) and similar laws will increase. Operators of small sites can incur a lot of legal risk now just by running a Big Tech tracking pixel. But Big Tech will continue to ignore the situation, and put all the risks on the small site. (kind of like how Amazon.com uses delivery partner companies to take the legal risks of employing algorithmically micromanaged, overstressed delivery drivers.)

Unemployment and underemployment will trend up, not down, in 2025. Yes, there will be more political pressure on companies here to hire and manufacture locally, but actual job applicants aren’t interchangeable worker units in an RTS game—there’s a lot of mismatch between the qualities that job seekers will have and the qualities that companies will be looking for, which will mean a lot of jobs going unfilled. And employers tend to hire fewer people in unpredictable times anyway.

Virginia’s weak privacy law will continue to be ignored by most companies that process personal data. Companies will treat all the privacy law states as Privacyland, USA which means basically California.

Why is my cloud computing bill so high? will be a common question. But the biggest item on the bill will be the AI that [employee redacted] is secretly in love with, so you’ll never find it.

X-rated sites will face an unfriendly regulatory environment in many states, so will help drive mass-market adoption of VPNs, privacy technologies, cryptocurrencies, and fintech. The two big results will be that first, after people have done all the work to go underground to get their favorite pr0n site, they might as well use their perceived invisibility to get infringing copies of other content too. And second, a lot of people will get scammed by fake VPNs and dishonest payment services.

Youth privacy laws will drive more investment in better content for kids. (This is an exception to the Q prediction.) We’re getting a bunch of laws that affect surveillance advertising to people under 18. As Tobias Kircher and Jens Foerderer reported, in Ban Targeted Advertising? An Empirical Investigation of the Consequences for App Development, a privacy policy change tended to drive a lot of Android apps for kids out of the Google Play Store, but the top 10 percent of apps did better. If you have ever visited an actual app store, it’s clear that Sturgeon’s law applies, and it’s likely that the top 10 percent of apps account for almost all of the actual usage. All the kids privacy laws and regs will make youth-directed content a less lucrative play for makers of crap and spew who can make anything, leaving more of the revenue for dedicated and high-quality content creators.

ZFS will catch on in more households, as early adopters replace complicated streaming services (and their frequent price increases and disappearing content) with storage-heavy media PCs.

28 Dec 2024

How we get to the end of prediction market winter

Taylor Lorenz writes, in Prediction markets go mainstream,

Prediction markets—platforms where users buy and sell shares based on the probability of future events—are poised to disrupt the media landscape in 2025, transforming not only how news is shared but how it is valued and consumed.

Prediction markets did get some time in the spotlight this year. But the reasons for the long, ongoing prediction market winter are bigger than just prediction markets not being famous. Prediction markets have been around for a long time, and have stubbornly failed to go mainstream.

The first prediction market to get famous was the University of Iowa’s Iowa Electronic Markets, which launched in the late 1980s and has been covered in the Wall Street Journal since at least the mid-1990s. They originally used pre-web software and you had to mail in a paper check (update 4 Jan 2025: paper checks are still the only way to fund your account on there). But IEM wasn’t the first. Prof. Robin Hanson, in Hail Jeffrey Wernick, writes about an early prediction market entrepreneur who started his first one in 1981. (A secretary operated the market manually, with orders coming in by fax.) Prediction markets were more famous than Linux or the World Wide Web before Linux or the World Wide Web. Prediction markets have been around since before stop trying to make fetch happen happened.

So the safe prediction would be that 2025 isn’t going to be the year of prediction markets either. But just like the year of Linux on the desktop never happened because the years of Linux in your pocket and in the data center did, the prediction markets that do catch on are going to be different from the markets that prediction market nerds are used to today. Some trends to watch are:

Payment platforms: Lorenz points out, Prediction markets are currently in legal limbo, but I’d bet against a ban, especially given the new administration. Right now in the USA there is a lot of VC money tied up in fintech, and a lot of political pressure from well-connected people to deregulate everything having to do with money. For most people the biggest result will be more scams and more hassles dealing with transactions that are legal and mostly trustworthy today but that will get enshittified in the new regulatory environment. But all those money-ish services will give prediction markets a lot more options for getting money in and out in a way that enables more adoption.

Adding hedging and incentivization: The prediction markets that succeed probably won’t be pure, ideal prediction markets, but will add on some extra market design to attract and retain traders. Nick Whitaker and J. Zachary Mazlish, in Why prediction markets aren’t popular, write that so far, prediction markets don’t appeal to the kinds of people who play other kinds of markets. People enter markets for three reasons. Savers are trying to build wealth, Gamblers play for thrills, and Sharps enter to profit from less well-informed traders. No category out of the three is well-served by existing prediction markets, because a prediction market is zero-sum, so not a way to build wealth long-term, and it’s too slow-moving and not very thrilling compared to other kinds of gambling. And the sharps need a flow of less well informed traders to profit from, but prediction markets don’t have a good way to draw non-sharps into the market.

Whitaker and Mazlish do suggest hedging as a way to get more market participants, but say

We suspect there is simply very little demand for hedging events like whether a certain law gets passed; there is only demand for hedging the market outcomes those events affect, like what price the S&P 500 ends the month at. Hedging market outcomes already implicitly hedges for not just one event but all the events that could impact financial outcomes.

That’s probably true for hedging in a large public prediction market. An existing oil futures market is more generally useful to more traders than a prediction market on all the events that might affect the price of oil. And certain companies’ stocks today are largely prediction markets on future AI breakthroughs and the future legal status of various corporate crimes. But I suspect that it’s different for a private market for events within a company or organization. For example, a market with sales forecasting contracts on individual large customers could provide much more actionable numbers to management than just trading on predicted total sales.

You could, in effect, pay for a prediction market’s information output by subsidizing it, and Whitaker and Mazlish suggest this. A company that runs an internal prediction market can dump money in and get info out. Like paying for an analyst or consulting firm, but in a distributed way where the sources of expertise are self-selecting by making trade/no trade decisions based on what they know or don’t know. But it’s also possible, usually on the smaller side, for a prediction market to become an incentivization market. To me, the difference is that in an incentivization market, a person with ability to affect the results holds a large enough investment in the market that it influences them to do so. The difference is blurry and the same market can be a prediction market for some traders and an incentivization market for others. But by designing in incentives for action, a market operator can make it drift away from a pure prediction market design to one that tends to produce an outcome. Related: The private provision of public goods via dominant assurance contracts by Alexander Tabarrok

Proof of concept projects can already address specific information needs: A problem that overlaps with the prediction market incentivization problem in interesting ways is the problem of how to pay for information products and services that can be easily copied. How do we fund open source? is a persistent question. And Bruce Perens, original author of what became the Open Source Definition, wants to move on entirely. The problem of funding open source is hard enough that we mainly hear about it when a high-profile security issue makes the news.

As Luis Villa points out,

If you don’t know what’s in the box, you can’t secure it, so it is your responsibility as builders to know what’s in the box. We need better tools, we need better engagement to enable everybody to do that with less effort and less burden on individual volunteer maintainers and non-profits.

Companies that use open source software need to measure and reduce risks. The problem is that the biggest open source risks are related to hard-to-measure human factors like developer turnover and burnout. Developers of open source software can take actions that help companies understand their risks, but they’re not compensated for doing it. A prediction/incentivization market can both help quantify hidden risks and incentivize changes.

If you have an internal market that functions as both a prediction market and an incentivization market, you can subsidize both the information and the desired result by predicting the events that you don’t want to happen. This is similar to how commodities markets and software bug futures markets can work. Some traders are pure speculators, others take actions that can move the market. Farmers can plan which crops to plant based on predicted or contracted prices, companies can allocate money to fuel futures and/or fuel-saving projects, developers can prioritize tasks.

Synergy with AI projects: An old corporate Intranet rule of thumb [citation needed] is that you need five daily active editors to have a useful company or organization Wiki. I don’t know what the number is for a prediction market, but as Prof. Andrew Gelman points out, prediction markets need “dumb money” to create incentives for well-informed traders to play and win.

Noisy, stupid bots are a minus for most kinds of social software, but a win for markets. If only there were some easy way to crank up a bunch of noisy, stupid bots. Oh, wait, there’s a whole AI boom happening. Good timing, right? And AI projects need ways to test their output quality in a scalable way, just as much as prediction markets need extra trading churn. AI projects and prediction market projects solve each other’s problems.

  • Prediction markets need liquidity and dumb money. Bots can already do those.

  • AI projects need scalable quality checks. Slop is easier to make than to check, so the cost of evaluating the quality of AI output keeps growing relative to the declining costs of everything else. You can start up a lot of bots, fund each with a small stake, and shut down the broke ones. The only humans required are the traders who can still beat the bots. And if at some point the humans lose all their money, you know you won AI. Congratulations, and I for one welcome our bot plutocrat overlords.

Bots can also be run behind a filter to only make offers that, if accepted, would further the market operator’s goals in some way. For example, bots can be set up to be biased to over-invest on predicting unfavorable outcomes (like buying the UNFIXED side of bug futures) to add some incentivization.

Fixing governance by learning from early market experiences: Internal prediction markets at companies tend to go through about the same story arc. First, the market launches with some sponsorship and internal advocacy from management. Second, the market puts up some encouraging results. (Even in 2002 a prediction market was producing more accurate sales forecasts than the official ones at HP.) And for its final act, the prediction market ends up perpetrating the unforgivable corporate sin: accurately calling some powerful executive’s baby ugly. So the prediction market ends up going to live with a nice family on a farm. Read the (imho, classic) paper, Corporate Prediction Markets: Evidence from Google, Ford, and Firm X by Bo Cowgill and Eric Zitzewitz, and, in Professor Hanson’s post, why a VC firm could not get prediction markets into portfolio companies. Wernick blames the ego of managers who think their judgment best, hire sycophants, and keep key org info close to their chests.

The main lesson is that the approval and budget for the prediction market itself needs to be handled as many management levels as possible above the managers that the prediction market is likely to bring bad news to. Either limit the scope of issues traded on, or sell the market to a more highly placed decision maker, or both. The prediction market administrator needs to report to someone safely above the level of the decision-makers for the issues being traded on. The really interesting experiment would be a private equity or VC firm that has its own team drop in and install a prediction market at each company it owns. The other approach is bottom-up: start with limiting the market to predicting small outcomes like the status of individual software bugs, and be disciplined about not trading on more consequential issues until the necessary sponsorship is in place.

So, is 2025 the year of prediction markets? Sort of. A bunch of factors are coming together. Payment platform options, the ability to do proof of concept niche projects, and the good fit as a QA tool for AI will make internal market projects more appealing in 2025. And if market operators can learn from history to avoid what tends to happen to bearers of bad news, this could be the year.

25 Dec 2024

links for Christmas 2024

More stuff to read on the Internet.

Also, Quora Lies: WW2 Arial, Helvetica, Courier; also Times misinformation (More and more wrong answers out there, in easy to find places. Somehow, people will have to change content discovery habits to deal with scam culture and AI slop, but we don’t know how. IMHO the need for user research is greater than ever.)

[What say you, Spock?] My Proposed Terminology to Describe Bypassing Social Media Face ID Age Verification Systems (Interesting premise but are kids going to pick up hacking habits again? Kids back in the early days of the Internet had to hack because IT was rare, expensive, and flaky. But people who developed their Internet habits in the 2000s-2010s had it easy, because stuff was basically working but companies were still in create more value than you capture mode. I suppose kids today will have to learn to hack, not just because of age verification stuff but because companies are in permanent hustle/growth hacking/value extraction mode, so the value available to the default user is less. Hack the consumer surplus?)

Step Right Up: The Chamber of Progress’s Ticketing Chamber of Horrors Fools Nobody (more news from the world of scam culture. Tech industry out of ideas? No problem, take low-reputation petty crimes like ticket scalping and scale them.)

Why Agentic AI Could Be Doomed To Fail, and 3 More AI Predictions for 2025 Accuracy of 75%-90% is state-of-the-art for AI….But if you have three steps of 75-90% accuracy, your ultimate accuracy is around 50%.

Linden Lab has spent $1.3B building Second Life and paid $1.1B to creators And since Linden Lab shares 90% of transactions with creators and only takes a 10% cut, the vast majority of the money generated through trade is paid to the creators themselves.

Classified fighter jet specs leaked on War Thunder – again (Do Wargaming.net players just take the games less seriously? This never seems to happen to the World of… games.)

The Ugly Truth About Spotify Is Finally Revealed Around this same time, I started hearing jazz piano playlists on Spotify that disturbed me. Every track sounded like it was played on the same instrument with the exact same touch and tone. Yet the names of the artists were all different….By total coincidence, Spotify’s profitability started to improve markedly around this time. and The Ghosts in the Machine, by Liz Pelly

Joey Hess: aiming at December The design goal of my 12 kilowatt system is to produce 1 kilowatt of power all day on a cloudy day in midwinter, which allows swapping between major loads (EV charger, hot water heater, etc) on a cloudy day and running everything on a sunny day. So the size of the battery bank doesn’t matter much. Batteries are getting cheaper fast too, but they are a wear item, so it’s better to oversize the solar system and minimize the battery….It costs more to mount solar panels now than the panels are worth.

Enrico Zini: New laptop setup (related: mine came up with fan and power light but no display, got helpful support)

21 Dec 2024

turning off browser ad features from the command line

(Previously: Google Chrome ad features checklist, turn off advertising features in Firefox.)

The Mozilla Firefox and Google Chrome browsers both have built-in advertising features, which I generally turn off because putting advertising features, even privacy-enhancing ones, in browsers is a bad idea. But the problem with going in to the settings and changing things is not just that it takes time to find stuff, but that it only affects the one browser profile you’re in. So every time I add a user account or a new browser profile, I still need to go to Settings and change the defaults again.

Fortunately it’s possible to turn the ad stuff off once and have it stay off. Both browsers have enterprise management features.

With a few commands, you can be your own enterprise manager, put the right file in the right location, and not have to worry about it.

On Linux, the following content should go in /etc/firefox/policies/policies.json for Firefox:

{
    "policies": {
        "Preferences": {
            "dom.private-attribution.submission.enabled": {
                "Status": "locked",
                "Type": "boolean",
                "Value": false
            },
            "browser.urlbar.suggest.quicksuggest.sponsored": {
                "Status": "locked",
                "Type": "boolean",
                "Value": false
            }
        }
    }
}

and the following content should go in

/etc/opt/chrome/policies/managed/managed_policies.json

for Chrome:

{
    "BlockThirdPartyCookies": true,
    "PrivacySandboxAdMeasurementEnabled": false,
    "PrivacySandboxAdTopicsEnabled": false,
    "PrivacySandboxPromptEnabled": false,
    "PrivacySandboxSiteEnabledAdsEnabled": false
}

The full list of available settings is at Chromium - Policy List. Some of these can be handy additions to the managed_policies.json file especially if you use multiple profiles. For example, I also add "DefaultBrowserSettingEnabled": false so that Google Chrome does not ask to be default browser.

Both files should be owned by the owner of the containing directory (root:root on my system) and mode 755.
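Here is a minimal sketch of the install step, assuming the two JSON files above have been saved locally first. The function name install_policy, the POLICY_ROOT variable, and the local file names are made up for illustration and are not part of either browser. POLICY_ROOT is empty for a real install, or can point at a scratch directory for a dry run.

```shell
#!/usr/bin/env bash
# Sketch only: install_policy and POLICY_ROOT are hypothetical names.

# install_policy SRC DEST
# Copy the local JSON file SRC to DEST (placed under POLICY_ROOT when that
# variable is set), creating the containing directory if it does not exist.
install_policy() {
    dest="${POLICY_ROOT:-}$2"
    mkdir -p "$(dirname "$dest")"
    # install(1) copies the file and sets its mode in one step. The new file
    # is owned by whoever runs the script, so run it as root for /etc.
    install -m 755 "$1" "$dest"
}

# Example calls (the local file names are assumptions):
#   install_policy firefox-policies.json /etc/firefox/policies/policies.json
#   install_policy chrome-policies.json /etc/opt/chrome/policies/managed/managed_policies.json
```

Running the script as root (for example with sudo) should leave both files owned by root, which matches the ownership described above. Setting POLICY_ROOT to a temporary directory lets you check where the files would land without touching /etc.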

That’s it.

There are ways to set this stuff up on Mac OS, too. I think it’s supposed to be

/Applications/Firefox.app/Contents/Resources/distribution/policies.json

for Firefox, but the /etc/ location might also work. For Google Chrome, there are Set up Chrome browser on Mac instructions.

There are also mentions of how to manage these two browsers on Microsoft Windows. If someone who blogs about Mac OS or Microsoft Windows has instructions on how to set this up, please let me know and I’ll link to your blog post.

  • For Mac OS: YOUR_BLOG_LINK_HERE

  • For Microsoft Windows: YOUR_BLOG_LINK_HERE

Appeasement fails, and one more tip

For about the past five years, a lot of proponents of in-browser ad features have been going on about how we really need to let the advertisers have their privacy-preserving advertising systems in the browser, because otherwise the surveillance business is going to do something worse. But, as we can see from recent news, that’s not how boundary testing works. They put the ad features in the browser, and then went ahead and increased fingerprinting anyway.

Browser developer: can we make the browser a little creepy so we don’t have to do worse stuff like fingerprinting?

User: ok, fine (clicks Got it)

Browser developer: well if you didn’t mind that, you won’t mind this…fingerprinting…either, right?

User: (facepalm)

Not a surprise for readers of relationship blogs, which tend to be more realistic about how to handle boundary testing than web development blogs. For example, Terri Cole writes about a constructive way to respond to boundary testing, in Navigating Boundaries: Strategies for Addressing Repeat Violations with Effective Consequences.

You’ve 1) set a boundary, 2) communicated it to them, and, after the boundary was crossed, 3) named a consequence to let them know, if this happens again, this is what I am doing.

Accepting any in-browser ad feature just encourages them to test boundaries again and make the browser incrementally creepier and more intrusive. Consequences need to happen early and predictably, or the person testing your boundaries learns that they can test further. Letting creepy behavior slide is a way to get more of it later.

How can users realistically communicate with big companies that only pay attention to lawsuits, news stories, and metrics measured in millions? You can’t really turn off browser fingerprinting (that’s the point: it’s based on hardware or software features that are hard for the user to change), but you can send a signal, and as a useful side effect protect yourself from nasty stuff like malvertising targeted based on your employer. One of the most underrated privacy tips is just to visit https://myadcenter.google.com/home and set Personalized Ads to Off. This doesn’t just help protect you. It also (1) moves a metric that they track, so it sends a message that they will get, and (2) reduces surveillance advertising revenue, so you help limit the flow of money to the other side. Turning this stuff off is not mainly about protecting yourself, it’s about helping at-risk people hide in the crowd and about reducing the incentives to invest in surveillance.

No privacy setting or tool is a total fix by itself, but turning off in-browser ad features and turning off personalization are both pretty effective for the time invested. More tips: effective privacy tips

10 Dec 2024

run a command in a tab with gnome-terminal

To start a command in a new tab, use the --tab command-line option to gnome-terminal, along with -- to separate the gnome-terminal options from the options passed to the command being run.

The script for previewing this site locally uses separate tabs for the devd process and for the script that re-runs make when a file changes.

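#!/usr/bin/bash

set -e

# Return to the starting directory when the script exits.
trap popd EXIT
pushd "$PWD"
cd "$(dirname "$0")"

# Run a command, with its arguments, in a new gnome-terminal tab.
# "$@" keeps each argument intact even if it contains spaces.
run_in_tab ()
{
    gnome-terminal --tab -- "$@"
}

make cleanhome # remove indexes, home page, feeds
make -j
run_in_tab devd --port 8088 public
run_in_tab code/makewatch -j pages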

More: colophon

26 Nov 2024

opt out of Google Page Annotations

Ever wish Google would have one button for opt me out of all Google growth hacking schemes that you could click once and be done with it? Me too. But that’s not how it works.

Anyway, the new one is Google Page Annotations: Google app for iOS now injects links back to Search on websites. I really don’t want this site showing up with links to stuff I didn’t link to. The choices of links on here are my own free expression.

This opt-out has two parts and you do need to have a Google Account to do it.

  1. Either set up Google Search Console and add your site(s) as web properties on there, or go to your existing Google Search Console account and get a list of your web properties.

  2. Visit the form: Opt out from Page Annotation in Google App browser for iOS and add your web properties as a comma-separated list. You have to be the Google Search Console owner of the site(s) to do the opt out.

Hopefully this awkward form thing is just temporary and there will be a more normal opt-out with a meta tag or something at some point. I’ll update this page if they make one.

IMHO the IT business had a peak some time in the mid-2000s. You didn’t have to dink with vintage PC stuff like DIP switches and partition tables, but the Internet companies were still in create more value than you capture mode and you didn’t have to work around too many dark patterns either. If I recall correctly, Microsoft did something like this link-adding scheme in Internet Explorer at one point, but they backed off on it before it really became a thing and the opt-out was easier. Welcome to the return of the power user. Oh well, writing up all the individual opt outs is good for getting clicks. The Google Search algorithm loves tips on how to turn Google stuff off.

24 Nov 2024

Use an ad blocking extension when performing Internet searches

The FBI seems to have taken down the public service announcement covered in Even the FBI says you should use an ad blocker on TechCrunch.

Use an ad blocking extension when performing internet searches. Most internet browsers allow a user to add extensions, including extensions that block advertisements. These ad blockers can be turned on and off within a browser to permit advertisements on certain websites while blocking advertisements on others.

This is still good advice. Search ads are full of scams, and you can block ads on search without blocking the ads on legit sites. I made a local copy of the FBI alert.

Why did they take the web version down? Maybe we’ll find out. I sent the FBI a FOIA request for any correspondence about this alert and the decision to remove it.

The Malwarebytes site has more good info on ongoing problems with search ads. Google Search user interface: A/B testing shows security concerns remain

23 Nov 2024

prediction markets and the 2024 election link dump

Eric Neyman writes, in Seven lessons I didn’t learn from election day, Many people saw the WSJ report as a vindication of prediction markets. But the neighbor method of polling hasn’t worked elsewhere. More: Polling by asking people about their neighbors: When does this work? Should people be doing more of it? And the connection to that French dude who bet on Trump

The money is flooding in, but what are prediction markets truly telling us? If we look back further, predicted election markets were actually legal in the US from the 1800s to 1924, and historical data shows that they were accurate. There’s a New York Times story of Andrew Carnegie noting how surprisingly accurate the election betting markets were at predicting outcomes. They were actually more accurate before the introduction of polling as a concept, which implies that the introduction of polling diluted the accuracy of the market, rather than the opposite.

Was the Polymarket Trump whale smart or lucky? Whether one trader’s private polling tapped sentiment more accurately than the publicly available surveys, or whether statistical noise just happened to reinforce his confidence to buy a dollar for 40c, can’t be known without seeing the data.

Koleman Strumpf Interview - Prediction Markets & More 2024 was a huge vindication for the markets. I don’t know how else to say it, but all the polls and prognosticators were left in the dust. Nobody came close to the markets. They weren’t perfect, but they were an awful lot better than anything else, to say the least.

FBI raids apartment of election betting site Polymarket’s CEO and seizes cellphone, source says Though U.S. election betting is newly legal in some circumstances, Polymarket is not supposed to allow U.S. users after the Commodity Futures Trading Commission halted its operations in 2022, but its user base largely operates through cryptocurrency, which allows for easy anonymity.

Polymarket Explained: How Blockchain Prediction Markets Are Shaping the Future of Forecasting (Details of how Polymarket works including tokens and smart contracts.)

Betting odds called the 2024 election better than polls did. What does this mean for the future of prediction markets?

Prediction Markets for the Win

Just betting on an election every few years is not the interesting part, though. Info Finance is a broader concept. [I]nfo finance is a discipline where you (i) start from a fact that you want to know, and then (ii) deliberately design a market to optimally elicit that information from market participants.

10 Nov 2024

Links for 10 November 2024

Signal Is Now a Great Encrypted Alternative to Zoom and Google Meet These updates mean that Signal is now a free, robust, and secure video conferencing service that can hang with the best of them. It lets you add up to 50 people to a group call and there is no time limit on each call.

The New Alt Media and the Future of Publishing - Anil Dash

I’m a neuroscientist who taught rats to drive − their joy suggests how anticipating fun can enrich human life

Ecosia and Qwant, two European search engines, join forces

What can McCain’s Grand Prix win teach us? Nothing new Ever since Byron Sharp decided he was going for red for his book cover, marketing thinkers have assembled a quite extraordinary disciplinary playbook. And it’s one that looks nothing like the existing stuff that it replaced. Of course, the majority of marketers know nothing about any of it. They inhabit the murkier corners of marketing, where training is rejected because change is held up as a circuit-breaker for learning anything from the past. AI and the ‘new consumer’ mean everything we once knew is pointless now. Better to be ignorant and untrained than waste time on irrelevant historical stuff. But for those who know that is bullshit, who study, who respect marketing knowledge, who know the foundations do not change, the McCain case is a jewel sparkling with everything we have learned in these very fruitful 15 years.

The Counterculture Switch: creating in a hostile environment

Why Right-Wing Media Thrives While The Left Gets Left Behind

The Rogue Emperor, And What To Do About Them Anywhere there is an organisation or group that is centred around an individual, from the smallest organisation upwards, it’s possible for it to enter an almost cult-like state in which the leader both accumulates too much power, and loses track of some of the responsibilities which go with it. If it’s a tech company or a bowls club we can shrug our shoulders and move to something else, but when it occurs in an open source project and a benevolent dictator figure goes rogue it has landed directly on our own doorstep as the open-source community.

We need a Wirecutter for groceries

Historic calculators invented in Nazi concentration camp will be on exhibit at Seattle Holocaust center

One Company A/B Tested Hybrid Work. Here’s What They Found. According to the Society of Human Resource Management, each quit costs companies at least 50% of the employees’ annual salary, which for Trip.com would mean $30,000 for each quit. In Trip.com’s experiment, employees liked hybrid so much that their quit rates fell by more than a third — and saved the company millions of dollars a year.

03 Nov 2024

links for 3 November 2024

Remote Startups Will Win the War for Top Talent Ironically, in another strike against the spontaneous collaboration argument, a study of two Fortune 500 headquarters found that transitioning from cubicles to an open office layout actually reduced face-to-face interactions by 70 percent.

Why Strava Is a Privacy Risk for the President (and You Too) Not everybody uses their real names or photos on Strava, but many do. And if a Strava account is always in the same place as the President, you can start to connect a few dots.

Why Getting Your Neighborhood Declared a Historic District Is a Bad Idea Historic designations are commonly used to control what people can do with their own private property, and can be a way of creating a kind of “backdoor” homeowners association. Some historic neighborhoods (many of which have dubious claims to the designation) around the country have HOA-like restrictions on renovations, repairs, and even landscaping.

Donald Trump Talked About Fixing McDonald’s Ice Cream Machines. Lina Khan Actually Did. Back in March, the FTC submitted a comment to the US Copyright Office asking to extend the right to repair certain equipment, including commercial soft-serve equipment.

An awful lot of FOSS should thank the Academy Linux and open source in general seem to be huge components of the movie special effects industry – to an extent that we had not previously realized. (unless you have a stack of old Linux Journal back issues from the early 2000s—we did a lot of movie covers at the time that much of this software was being developed.)

Using an 8K TV as a Monitor For programming, word processing, and other productive work, consider getting an 8K TV instead of a multi-monitor setup. An 8K TV will have superior image quality, resolution, and versatility compared to multiple 4K displays, at roughly the same size. (huge TVs are an under-rated, subsidized technology, like POTS lines. Most or all of the huge TVs available today are smart and sold with the expectation that they’ll drive subscription and advertising revenue, which means a discount for those who use them as monitors.)

Suchir Balaji, who spent four years at OpenAI, says OpenAI’s use of copyrighted data broke the law and failed to meet fair use criteria; he left in August 2024 Mr. Balaji believes the threats are more immediate. ChatGPT and other chatbots, he said, are destroying the commercial viability of the individuals, businesses and internet services that created the digital data used to train these A.I. systems.

The Unlikely Inventor of the Automatic Rice Cooker Fumiko, the mother of six children, spent five years researching and testing to document the ideal recipe. She continued to make rice three times a day, carefully measuring water-to-rice ratios, noting temperatures and timings, and prototyping rice-cooker designs. Conventional wisdom was that the heat source needed to be adjusted continuously to guarantee fluffy rice, but Fumiko found that heating the water and rice to a boil and then cooking for exactly 20 minutes produced consistently good results.

Comments on TSA proposal for decentralized nonstandard ID requirements Compliance with the REAL-ID Act requires a state to electronically share information concerning all driver’s licenses and state-issued IDs with all other states, but not all states do so. Because no state complies with this provision of the REAL-ID Act, or could do so unless and until all states do so, no state-issued driver’s licenses or ID cards comply with the REAL-ID Act.

02 Nov 2024

or we could just not

previously: Sunday Internet optimism

The consensus, dismal future of the Internet is usually wrong. Dystopias make great fiction, but the Internet is surprisingly good at muddling through and reducing each one to nuisance level.

  • We don’t have Clipper Chip dystopia that would have put backdoors in all cryptography.

  • We don’t have software patent cartel dystopia that would have locked everyone in to limited software choices and functionality, and a stagnant market.

  • We don’t have Fritz Chip dystopia that would have mandated Digital Rights Management on all devices.

None of these problems have gone away entirely (encryption backdoors, patent trolls, and DRM are all still there), but none have reached Internet-wide catastrophe level either.

Today’s hottest new dystopia narrative is that we’re going to end up with surveillance advertising features in web browsers. They’ll be mathematically different from old-school cookie tracking, so technically they won’t make it possible to identify anyone individually, but they’ll still impose the same old surveillance risks on users, since real-world privacy risks are collective.

Compromising with the dystopia narrative always looks like the realistic or grown-up path forward, until it doesn’t. And then the non-dystopia timeline generally looks inevitable once you get far enough along it. This time it’s the same way. We don’t need cross-context personalized (surveillance) advertising in our web browsers any more than we need SCO licenses in our operating systems. (Not counting the SCO license timeline as a dystopia, but it is another good example of a dismal timeline averted.)

Let’s look at the numbers. I’m going to make all the assumptions most favorable to the surveillance advertising argument, so the real situation is probably a lot better than this. And it’s probably better in other countries, since the USA is relatively advanced in the commercial surveillance field. (If you have these figures for other countries, please let me know and I’ll link to them.)

Total money spent on advertising in the USA: $389.49 billion

USA population: 335,893,238

That comes out to about $1,160 spent on advertising to reach the average person in the USA every year. That’s $97 per month.

So let’s assume (again, making the assumption most favorable to the surveillance side) that all advertising is surveillance advertising. And ads without the surveillance, according to Professor Garrett Johnson, are worth 52 percent less than the surveillance ads.

So if you get rid of the surveillance, your ad subsidy goes from $97 to $46. Advertisers would be spending $51 less to advertise to you, and the missing $51 is a good-sized amount of extra money to come up with every month. But remember, that’s advertising money, total, not the amount that actually makes it to the people who make the ad-supported resources you want. Since the problem is how to replace the income for the artists, writers, and everyone else who makes ad-supported content, we need to multiply the missing ad subsidy by the fraction of that top-level advertising total that makes it through to the content creator in order to come up with the amount of money that needs to be filled in from other sources like subscriptions and memberships.

How much do you need to spend on subscriptions to replace $51 in ad money? That’s going to depend on your habits. But even if you have everything set up totally right, a dollar spent on ads to reach you will buy you less than a dollar you spend yourself. Thomas Baekdal writes, in How independent publishing has changed from the 1990s until today,

Up until this point, every publisher had focused on ‘traffic at scale’, but with the new direct funding focus, every individual publisher realized that traffic does not equal money, and you could actually make more money by having an audience who paid you directly, rather than having a bunch of random clicks for the sake of advertising. The ratio was something like 1:10,000. Meaning that for every one person you could convince to subscribe, donate, become a member, or support you on Patreon … you would need 10,000 visitors to make the same amount from advertising. Or to put that into perspective, with only 100 subscribers, I could make the same amount of money as I used to earn from having one million visitors.

All surveillance ad media add some kind of adtech tax. The Association of National Advertisers found that about 1/3 of the money spent to buy ad space makes it through to the publisher.

A subscription platform and subscriber services impose some costs too. To be generous to the surveillance side, let’s say that a subscription dollar is only three times as valuable as an advertising dollar. So that $51 in missing ad money means you need to come up with $17 from somewhere. This estimate is really on the high side in practice. A lot of ad money goes to overhead and to stuff like retail ad networks (online sellers bidding for better spots in shopping search results) and to ad media like billboards that don’t pay for content at all.
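For anyone who wants to plug in figures for another country, here is a rough sketch of the arithmetic above in Python. The constants are the figures quoted in this post; the function and variable names are made up for illustration. Working from unrounded numbers, the missing ad money comes out closer to $50 than $51 (the $51 comes from subtracting the rounded monthly figures), and the amount to replace still rounds to about $17.

```python
# Figures quoted in the post; everything else is derived from them.
US_AD_SPEND = 389.49e9                  # total US ad spending, dollars per year
US_POPULATION = 335_893_238
VALUE_LOSS_WITHOUT_SURVEILLANCE = 0.52  # ads "worth 52 percent less"
SUBSCRIPTION_MULTIPLIER = 3             # assumed: 1 subscription dollar ~ 3 ad dollars


def ad_subsidy(ad_spend=US_AD_SPEND, population=US_POPULATION):
    """Return the per-person figures used in the post (dollars, unrounded)."""
    per_year = ad_spend / population
    per_month = per_year / 12
    without_surveillance = per_month * (1 - VALUE_LOSS_WITHOUT_SURVEILLANCE)
    missing = per_month - without_surveillance
    to_replace = missing / SUBSCRIPTION_MULTIPLIER
    return {
        "per_year": per_year,
        "per_month": per_month,
        "without_surveillance": without_surveillance,
        "missing": missing,
        "to_replace": to_replace,
    }


if __name__ == "__main__":
    for name, dollars in ad_subsidy().items():
        print(f"{name:>22}: ${dollars:,.2f}")
```

Changing `SUBSCRIPTION_MULTIPLIER` is the quickest way to see how sensitive the final number is to the adtech tax assumption.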

So, worst case, where do you get the $17? From buying less crap, that’s where. Mustri et al. (PDF) write,

[behaviorally] targeted ads are more likely to be associated with lower quality vendors, and higher prices for identical products…

You also get a piece of the national security and other collective security benefits of eliminating surveillance, some savings in bandwidth and computing resources, and a lower likelihood of becoming a victim of fraud and identity theft. But that’s pure bonus benefit on top of the win from saving money by spending less on overpriced, personally targeted, low-quality products. (If privacy protection didn’t help you buy better stuff, the surveillance companies would have said so by now.) Because surveillance advertising gives an advantage to deceptive advertisers over legit ones, the end of surveillance advertising would also mean an increase in sales for legit brands.

And we’re not done. As a wise man once said, But wait! There’s more! Before you rush to do effective privacy tips or write to your state legislators to support anti-surveillance laws, there’s one more benefit of getting rid of surveillance/personalized advertising. Remember that extra $51 that went away? It didn’t get burned up in a fire just because it didn’t get spent on surveillance advertising. Companies still have it, and they still want to sell you stuff. Without surveillance, they’ll have to look for other ways to spend it. And many of the options are win-win for the customer. In Product is the P all marketers should strive to influence, Mark Ritson points out the marketing wins from incremental product improvements, and that’s the kind of work that often gets ignored in favor of niftier, short-term, surveillance advertising projects. Improving service and pricing are other areas that will also do better without surveillance advertising contending for budgets. There is a lot of potential gain for a lot of people in getting rid of surveillance advertising, so let’s not waste the opportunity. Don’t worry, we’ll get another Internet dystopia narrative to worry about eventually.

More: stop putting privacy-enhancing technologies in web browsers

29 Oct 2024

links for 29 Oct 2024

Satire Without Purpose Will Wander In Dark Places Broadly labelling the entirety of Warhammer 40,000 as satire is no longer sufficient to address what the game has become in the almost 40 years since its inception. It also fails to answer the rather awkward question of why, exactly, these fascists who are allegedly too stupid to understand satire are continually showing up in your satirical community in the first place.

Why I’m staying with Firefox for now – Michael Kjörling [T]he most reasonable option is to keep using Firefox, despite the flaws of the organization behind it. So far, at least these things can be disabled through settings (for example, their privacy-preserving ad measurement), and those settings can be prepared in advance.

Google accused of shadow campaigns redirecting antitrust scrutiny to Microsoft, Google’s Shadow Campaigns (so wait a minute, Microsoft won’t let companies use their existing Microsoft Windows licenses for VMs in the Google cloud, and Google is doing a sneaky advocacy campaign? Sounds like content marketing for Amazon Linux®)

Scripting News My friends at Automattic showed me how to turn on ActivityPub on a WordPress site. I wrote a test post in my simple WordPress editor, forgetting that it would be cross-posted to Mastodon. When I just checked in on Masto, there was the freaking post. After I recovered from passing out, I wondered what happens if I update the post in my editor, and save it to the WordPress site that’s hooked up to Masto via ActivityPub. So I made a change and saved it. I waited and waited, nothing happened. I got ready to add a comment saying ahh I guess it doesn’t update, when—it updated. (Like being happy when a new web site opens in a new browser, a good sign that ActivityPub is the connecting point for this kind of connected innovation.) Related: The Web Is a Customer Service Medium (Ftrain.com) by Paul Ford.

China Telecom’s next 150,000 servers will mostly use local processors Among China Telecom’s server buys this year are machines running processors from local champion Loongson, which has developed an architecture that blends elements of RISC-V and MIPS.

Removal of Russian coders spurs debate about Linux kernel’s politics Employees of companies on the Treasury Department’s Office of Foreign Assets Control list of Specially Designated Nationals and Blocked Persons (OFAC SDN), or connected to them, will have their collaborations subject to restrictions, and cannot be in the MAINTAINERS file.

The TikTokification of Social Media May Finally Be Its Undoing by Julia Angwin. If tech platforms are actively shaping our experiences, after all, maybe they should be held liable for creating experiences that damage our bodies, our children, our communities and our democracy.

Cheap Solar Panels Are Changing the World The latest global report from the International Energy Agency (IEA) notes that solar is on track to overtake all other forms of energy by 2033.

Conceptual models of space colonization - Charlie’s Diary (one more: Kurt Vonnegut’s concept for spreading genetic material)

(protip: you can always close your browser tabs with creepy tech news, there will be more in a few minutes… Location tracking of phones is out of control. Here’s how to fight back; LinkedIn fined $335 million in EU for tracking ads privacy breaches; Pinterest faces EU privacy complaint over tracking ads; Dems want tax prep firms charged for improper data sharing; Dow Jones says Perplexity is “freeriding,” sues over copyright infringement; You Have a ‘Work Number’ on This Site, and You Should Freeze It; Roblox stock falls after Hindenburg blasts the social gaming platform over bots and pedophiles)

It Was Ten Years Ago Today that David Rosenthal predicted that cryptocurrency networks will be dominated by a few, perhaps just one, large participant.

Writing Projects (good start for a checklist before turning in a writing project. Maybe I should write Git hooks for these.)

Word.(s). (Includes some good vintage car ads. Remember when most car ads were about the car, not just buttering up the driver with how successful you must be to afford this thing?)

Social Distance and the Patent System [I]t was clear from our conversation that [Judge Paul] Michel doesn’t have a very deep understanding of the concerns of many in the software industry. And, more to the point, he clearly wasn’t very interested in understanding those concerns better or addressing them. On a theoretical level, he knew that there was a lot of litigation in the software industry and that a lot of people were upset about it. But like Fed and the unemployment rate, this kind of theoretical knowledge doesn’t always create a sense of urgency. One has to imagine that if people close to Michel—say, a son who was trying to start a software company—were regularly getting hit by frivolous patent lawsuits, he would suddenly take the issue more seriously. But successful software entrepreneurs are a small fraction of the population, and most likely no judges of the Federal Circuit have close relationships with one.

(Rapids is the script that gathers these, and it got a clean bill of health from the feed reader score report after I fixed the Last-Modified/If-Modified-Since and ETag handling. So expect more link dump posts here, I guess.)
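For anyone writing their own feed fetcher, here is a minimal sketch of the conditional GET pattern that those headers are for. This is not the Rapids code; `conditional_headers` and `fetch_feed` are hypothetical names, and it uses only the Python standard library. The idea is to send back the validators saved from the previous fetch and treat a 304 response as "nothing new".

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise
```

A caller would store the returned ETag and Last-Modified values next to the cached feed and pass them back in on the next run.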

27 Oct 2024

typefaces that aren’t on this blog (yet?)

Right now I’m not using these, but they look useful and/or fun.

  • Departure Mono: vintage-looking, pixelated, lo-fi technical vibe.

  • Atkinson Hyperlegible Font was carefully developed by the Braille Institute to help low-vision readers. It improves legibility and readability through clear, and distinctive letters and numbers.

  • B612 – The font family is an highly legible open source font family designed and tested to be used on aircraft cockpit screens.

  • Routed Gothic Font is traced from an old Leroy Lettering set used in vintage technical drawings and control panels.

I’m trying to keep this site fairly small and fast, so getting by with Modern Font Stacks as much as possible.

26 Oct 2024

personal AI in the rugpull economy

Doc Searls writes, in Personal Agentic AI,

Wouldn’t it be good for corporate AI agents to have customer hands to shake that are also equipped with agentic AI? Wouldn’t those customers be better than ones whose agency is merely human, and limited to only what corporate AI agents allow?

The obvious answer for business decision-makers today is: lol, no, a locked-in customer is worth more. If, as a person who likes to watch TV, you had an AI agent, then the agent could keep track of sports seasons and the availability of movies and TV shows, and turn your streaming subscriptions on and off. In the streaming business, like many others, the management consensus is to make things as hard and manual as possible on the customer side, and save the automation for the company side. Just keeping up with watching a National Football League team is hard…even for someone who is ON the team. Automation asymmetry, where the seller gets to reduce service costs while the customer has to do more and more manual work, is seen as a big win by the decision-makers on the high-automation side.

Big company decision-makers don’t want to let smaller companies have their own agentic tools, either. Getting a DMCA Exemption to let McDonald’s franchisees fix their ice cream machines was a big deal that required a lengthy process with the US Copyright Office. Many other small businesses are locked in to the manual, low-information side of a business relationship with a larger one. (Web advertising is another example. Google shoots at everyone’s feet, and agencies, smaller firms, and browser extension developers dance.) Google employees and shareholders would be better off if it were split into two companies that could focus on useful projects for independent customers who had real choices.

The first wave of user reactions to AI is happening, and it’s adversarial. Artists on sites like DeviantArt went first, and now Reddit users are deliberately posting fake answers to feed Google’s AI. On the shopping side, avoiding the output of AI and made-for-AI deceptive crap is becoming a must-have mainstream skill, as covered in How to find helpful content in a sea of made-for-Google BS and How Apple and Microsoft’s trusted brands are being used to scam you. As Baldur Bjarnason writes,

The public has for a while now switched to using AI as a negative—using the term artificial much as you do with artificial flavouring or that smile’s artificial. It’s insincere creativity or deceptive intelligence.

Other news is even worse. In today’s global conflict between evil oligarchs and everyone else, AI is firmly aligned with the evil oligarch side.

But today’s Big AI situation won’t last. Small-scale and underground AI has sustainable advantages over the huge but money-losing contenders. And it sounds like Doc is already thinking post-bubble.

Adversarial now, but what about later?

So how do we get from the AI adversarial situation we have now to the win-win that Doc is looking for? Part of the answer will be resolving the legal issues. Today’s Napster-like free-for-all environment won’t persist, so eventually we will have an AI scene in which companies that want to use your work for training have to get permission and disclose provenance.

The other part of the path from today’s situation (where big companies have AI that enables scam culture and chickenization while individuals and small companies are stuck rowing through funnels and pipelines) is personal, aligned AI that balances automation asymmetries. Whether it’s solving CAPTCHAs, getting data in hard-to-parse formats, or other awkward mazes, automation asymmetries mean that as a customer, you technically have more optionality than you practically have time to use. But AI has a lot more time. If a company gives you user experience grief, with the right tools you can get back to where you would have been if they had applied less obfuscation in the first place. (icymi: Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent. Not a deliberate obfuscation example, but an approach that can be applied.)

So we’re going to see something like this AI cartoon by Tom Fishburne (thanks to Doc for the link) for privacy labour. Companies are already getting expensive software-as-a-service to make privacy tasks harder for the customers, which means that customers are going to get AI services to make it easier. Eventually some companies will notice the extra layers, pay attention to the research, and get rid of the excess grief on their end so you can stop running de-obfuscation on your end. That will make it work better for everyone. (GPC all the things! Data Rights Protocol)

The biggest win from personal AI will, strangely enough, be in de-personalizing your personal information environment. By doing the privacy labour for you, the agentic AI will limit your addressability and reduce personalization risks. The risks to me from buying the less suitable of two legit brands are much lower than the risk of getting stuck with some awful crap that was personalized to me and not picked up on by norms enforcers like Consumer Reports. Getting more of my privacy labour done for me will not just help me personally do better #mindfulConsumption, but also increase the rewards for win-win moves by sellers. Personalization might be nifty, but filtering out crap and rip-offs is a bigger immediate win: Sunday Internet optimism

Doc writes, When you limit what customers can bring to markets, you limit what can happen in those markets. As far as I can tell, the real promise for agentic AI isn’t just in enabling existing processes or making them more efficient. It’s in establishing a credible deterrent to enshittification: if you’re trying to rip me off, don’t talk to me, talk to my bot army.

For just a minute, put yourself in the shoes of a product manager with a proposal for some legit project that they’re trying to get approved. If that proposal is up against a quick win for the company, like one based on creepy surveillance, it’s going to lose. But if the customers have the automation power to lower the ROI from creepy growth hacking, the legit project has a chance. And that pushes up the long-term value of the entire company. An individual locked-in customer is more valuable to the brand than an individual independent customer, but a brand with independent customers is more valuable than a brand with an equal number of locked-in customers.

Anyway, hope to see you at VRM Day.

15 Oct 2024

Another easy-ish state law: the No Second-class Citizenship Act

Tired of Big Tech companies giving consumer protections, fraud protections, and privacy protections to their users in other countries but not to people at home in the USA? Here’s another state law we could use, and I bet it could be a two-page PDF.

If a company has more than 10% of our state’s residents as customers or users, and also does business in 50 or more countries, then if they offer a privacy or consumer protection feature in a non-US location they must also offer it in our state within 90 days.

Have it enforced Texas SB 8 style, by individuals, so it’s harder for Big Tech sockpuppet orgs to challenge.

Reference

Antitrust challenge to Facebook’s ‘superprofiling’ finally wraps in Germany — with Meta agreeing to data limits | TechCrunch We’ve asked Meta to confirm whether changes will be implemented globally — or only inside the German market where the Bundeskartellamt has jurisdiction.

12 Oct 2024

convert TTF to WOFF2 on Fedora Linux

If you have a font in TTF (TrueType) format and need WOFF2 for web use, there is a woff2_compress utility packaged for Fedora (but still missing a man page and a --help option). The package is woff2-tools.

sudo dnf install woff2-tools
woff2_compress example.ttf

Also packaged for Debian: Details of package woff2 in sid

WOFF

For the older WOFF format (which I needed in order to have the font show up on a really old browser) the tool is sfnt2woff-zopfli.

Install and run with:

sudo dnf install sfnt2woff-zopfli 
sfnt2woff-zopfli example.ttf 
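If you have a whole directory of fonts to convert, the two commands above can be wrapped in a short script. This is a sketch, not something from either package: `conversion_commands` and `convert_all` are made-up names, and it assumes both tools are on the PATH and write their output next to the source file.

```python
import subprocess
from pathlib import Path


def conversion_commands(font_dir):
    """List the commands that would convert every .ttf file in font_dir."""
    commands = []
    for ttf in sorted(Path(font_dir).glob("*.ttf")):
        commands.append(["woff2_compress", str(ttf)])    # TTF to WOFF2
        commands.append(["sfnt2woff-zopfli", str(ttf)])  # TTF to older WOFF
    return commands


def convert_all(font_dir):
    """Run each conversion in turn, stopping at the first failure."""
    for command in conversion_commands(font_dir):
        subprocess.run(command, check=True)
```

Building the command list separately from running it makes it easy to print the commands for a dry run before touching any files.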

References

Converting TTF fonts to WOFF2 (and WOFF) - DEV Community (covers cloning and building from source)

How to Convert Font Formats to WOFF under Linux (compares several conversion tools)

typefaces that aren’t on this blog (yet?)

10 Oct 2024

drinking games with the Devil

Should I get into a drinking game with the Devil? No, for three important reasons unrelated to your skill at the game.

  1. The Devil can out-drink you.

  2. The Devil can drink substances that are toxic to you even in small quantities.

  3. The Devil can cheat in ways that you will not be able to detect, and take advantage of rules loopholes that you might not understand.

What if I am really good at the skills required for the game? Still no. Even if you have an accurate idea of your own skill level, it is hard to estimate the Devil’s skill level. And even if you have roughly equally matched skills, the Devil still has the three advantages above.

What if I’m already in a drinking game with the Devil? I can’t offer a lot of help here, but I have read a fair number of comic books. As far as I can tell, your best hope is to delay playing and to delay taking a drink when required to. It is possible that some more powerful entity could distract the Devil in a way that results in the end of the game.

07 Oct 2024

there ought to be a law

Do we really need another CCPA-like state privacy law, or can states mix it up a little in 2025? (The easy version: privacy laws for slacker states)

What if, instead of big boring laws intended to cover everything, legislators did more of a do the simplest thing that could possibly work approach? Big Tech lobbyists are expensive—maybe a better way to beat them is, instead of grinding out the long-ass PDFs they expect, make them fight an unpredictable distributed campaign of random-ish short bills that take the side of local small businesses?

  • Require generative AI companies to offer an opt out that is not tied to any other services such as search. AI legal links

  • Surveillance companies should need to get state surveillance licenses. Big Tech platforms: mall, newspaper, or something else?, surveillance licensing in practice

  • Require blocking of search ads on state-owned and educational computers, because of the 2022 FBI warning (that stayed up until late 2024—here is a local copy) and the threat of fake ads intended to steal people’s passwords for commonly used services such as Slack and Calendly. B L O C K in the U S A

  • Require Global Privacy Control for smart TVs and appliances, and for smart home platforms that support ordering or subscriptions. GPC all the things!. We also need an opt-out preference signal for NFC tap to pay devices. (AB 3048 in California was a good idea, but it got changed to cover browsers and phones only, so would have tended to drive surveillance to devices where it’s harder to avoid, which would be a terrible experience for users. Thank you for browsing our catalog site, use your compatible smart appliance to actually order anything.) Update 31 Oct 2024: possibly combine the GPC mandate with a reform to wiretapping laws to address the CIPA Uncertainty that a lot of companies have been on about recently. Amend CIPA and similar state wiretapping laws to state that data collection from a device or client software that supports GPC is definitely not wiretapping. That way the companies get the legal ambiguity resolved, the users get their opt-outs, sounds like a solution we can all live with.

  • Some kind of a digital tearsheet requirement to make it harder to trick advertisers into sponsoring illegal activities. notes on a California advertiser protection bill

  • Require clear explanations of consumer categories and inferences in RtKs. OTHER ATTRIBUTES (Colorado already has this.)

  • Postal RtK/RtD/opt outs. If a postal backup is available, that sets the floor for how annoying a company can make the online process. The problem with CCPA RtK workflows

  • Add miscellaneous power user time saving improvements to existing privacy laws. State privacy law features from the power user point of view

  • Pigovian tax on databases of PII (calculated as n * log(n) to disincentivize risky centralization) taxing surveillance marketing

  • Digital ad tax. See Daron Acemoglu and Simon Johnson: “The Urgent Need to Tax Digital Advertising” - Network Law Review (The variant where advertising is tax-free at the scale of a legit ad-supported site, but a high tax kicks in at the scale of a Big Tech firm, is a competition policy win, too: Digital Ad-Tax by Paul Romer)

  • Require platform ad libraries to be crawlable by image indexers like TinEye and by trademark monitoring firms. some ways that Facebook ads are optimized for deceptive advertising

  • Euroclone law: if a company operates in 50 or more countries, and offers a consumer or privacy protection feature to the residents of some jurisdiction outside the USA, then that feature must also be offered to residents of our state. More: privacy laws for slacker states

  • Federal: Keep Section 230 immunity for platforms, but pass liability through to the advertisers. Big Tech would have to clean up their act to keep brands.

  • Update existing wiretapping laws to cover modern surveillance in media where no GPC or analogous opt-out is available. In the Kathleen Vita v. New England Baptist Hospital decision, the court wrote, If the Legislature intends for the wiretap act’s criminal and civil penalties to prohibit the tracking of a person’s browsing of, and interaction with, published information on websites, it must say so expressly.
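Going back to the database tax item in the list above: the point of n * log(n) is that the tax grows faster than the number of records, so one big database costs more than the same records split across several smaller ones. Here is a rough illustration. The rate and the choice of natural log are assumptions made for the example, and `database_tax` is a hypothetical name.

```python
import math


def database_tax(records, rate=1.0):
    """Tax on a single database holding `records` PII records: rate * n * log(n)."""
    if records < 1:
        return 0.0  # an empty database owes nothing
    return rate * records * math.log(records)


# One centralized database of a million people...
centralized = database_tax(1_000_000)
# ...versus the same million people spread across ten separate databases.
distributed = 10 * database_tax(100_000)

if __name__ == "__main__":
    print(f"centralized: {centralized:,.0f}")
    print(f"distributed: {distributed:,.0f}")
    print(f"ratio: {centralized / distributed:.2f}")
```

With these numbers the centralized database owes 20 percent more (the ratio is log(10^6)/log(10^5), or 6/5), and the gap widens as the database gets bigger.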

Yes, the Big Tech companies will try to get small businesses to come out and advocate for surveillance, but there are a bunch of other small business issues that limitations on surveillance could help address, by shifting the balance of power away from surveillance companies.

  • Are small business owners contending for search rankings and map listings with fake businesses pretending to be competitors in their neighborhood?

  • Is Big Tech placing bogus charges on their advertiser account–or, if they run ads on their own site, are ad companies docking their pay for unexplained “invalid traffic”?

  • Are companies taking their content for “AI” that directly competes with their sites—without letting them opt out, or offering an opt-out that would make their business unable to use other services?

  • Can a small business even get someone from Big Tech on the phone, or are companies putting their dogmatic programs of union-busting and layoffs ahead of service even to advertisers and good business customers?

  • What happens when an account gets compromised or hacked? Do small businesses have any way to get help (without knowing someone who happens to know someone at the big company)?

(Update 9 Nov 2024) Each legal victory for groups like NetChoice reveals to state lawmakers how to craft more resilient laws. (Jess Miers, on Techdirt)

06 Oct 2024

links for 6 October 2024

Intent IQ Has Patents For Ad Tech’s Most Basic Functions – And It’s Not Afraid To Use Them (Wait a minute. If Firefox is part of the Open Innovation Network’s Linux System definition, and Firefox has ads now, does that mean OIN covers this?) 🍿

New Map Shows Community Broadband Networks Are Exploding In U.S. Community-owned broadband networks provide faster, cheaper, better service than their larger private-sector counterparts. Staffed by locals, they’re also more directly accountable and responsive to the needs of locals

So It Goes GHQ is a board game invented by Kurt Vonnegut in 1956. GHQ is to WWII what chess is to the Medieval battlefield.

The Other Bubble While SaaS is generally a good deal for small-to-mid-sized companies, the inevitable sprawl of letting SaaS into your organization means that you’re stuck with them.

Oskar Wickström: How I Built “The Monospace Web” (fun with CSS, cool vintage style serious-looking design)

Posse: Reclaiming social media in a fragmented world Rather than publishing a post onto someone else’s servers on Twitter or Mastodon or Bluesky or Threads or whichever microblogging service will inevitably come along next, the posts are published locally to a service you control.

Best practices in practice: Black, the Python code formatter I don’t have to explain what they got wrong and why it matters — they don’t even need to understand what happens when the auto-formatter runs. It just cleans things up and we move on with life.

EPIC Publishes Model Privacy Bill as Practical Solution for States (everyone ready for the 2025 privacy bill season? There are still some practical problems with this draft. I can see opting out of every company that might have your data getting to be a big time suck under this. Needs to be simplified to the point where it’s practical IMHO.)

What Happened After I Outed a Reddit Mod for Affiliate Spam (you know that thing where you add reddit to your web search to find honest reviews?)

Valve Steam Deck as a stepping stone to the Linux desktop Thanks to the technology behind Steam Desk, however, you can now play Windows games on Linux without any fuss or muss. (of course, all the growth hacking on Microsoft® brand Windows might help, too)

A layered approach to content blocking Chromium’s Manifest v3 includes the declarativeNetRequest API, which delegates these functions to the browser rather than the extension. Doing so avoids the timing issues visible in privileged extensions and does not require giving the extension access to the page. While these filters are more reliable and improve privilege separation, they are also substantially weaker. You can say goodbye to more advanced anti-adblock circumvention techniques. (Good info on the tradeoffs in Manifest v3, and a possible way forward, with simpler/more secure and complex/more featureful blocking both available to the user)

(If you’re still bored after reading all these, how about trying some effective privacy tips?)

03 Oct 2024

why I’m turning off Firefox ad tracking: the PPA paradox

Previously: turn off advertising features in Firefox

I am turning off Privacy-preserving attribution (PPA), the controversial Mozilla/Meta advertising feature in Firefox. Yes, according to the documentation, there are some good things about PPA compared to cookies:

  • You theoretically can’t be identified individually as the same person who saw an ad and then bought something

  • A site can’t tell if you have PPA on or off

Those are both interesting and desirable properties, and the PPA system, if implemented correctly and run honestly, does not look like a problem on its own. So why are people creeped out by it?

Screenshot of the Firefox PPA preference, found in Settings→Privacy & Security

That creeped-out feeling is not coming from privacy math ignorance, it’s people’s inner behavioral economists warning about an information imbalance. Just like people who grow up playing ball can catch a ball without consciously doing calculus, people who grow up in market economies get a pretty good sense of markets and information, which manifests as a sense of being creeped out when something about a market design doesn’t seem right.

The problem is not the design of PPA on its own, it’s that PPA is being proposed as something to run on the real Web, a place where you can find both the best legit ad-supported content and the most complicated scams. And that creates a PPA paradox: this privacy-preserving attribution feature, if it catches on, will tend to increase the amount of surveillance. PPA doesn’t have all of the problems of privacy-enhancing technologies in web browsers, but this is a big one.

Briefly, the way that PPA is designed to work is that sites that run ads will run JavaScript to request that the browser store impression events to keep a record of the ad you saw, and then a site where you buy stuff can record a conversion and then get a report to find out which sites the people who bought stuff had seen ads on. The browser doesn’t directly share the impression events with the site where you buy stuff. It generates an encrypted message that might or might not include impressions, then the site passes those encrypted messages to secure services to do math on them and create an aggregated report. The report doesn’t make it possible to match any individual ad impression to any individual sale.
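To make that flow easier to picture, here is a toy sketch in Python. It is not the Firefox API: the class and function names are made up for illustration, crediting the most recent matching impression is an assumption of the sketch, and the encryption and noise that the real design relies on are left out. The only point is that the site buying the report sees column totals, not individual browsers.

```python
class ToyBrowser:
    """Keeps impression events locally. All names here are hypothetical."""

    def __init__(self):
        self.impressions = []  # ad sites seen, most recent last

    def store_impression(self, ad_site):
        self.impressions.append(ad_site)

    def conversion_report(self, ad_sites):
        """Return a one-hot list crediting the most recent matching
        impression (an assumption of this sketch), or all zeros."""
        report = [0] * len(ad_sites)
        for site in reversed(self.impressions):
            if site in ad_sites:
                report[ad_sites.index(site)] = 1
                break
        return report


def aggregate(reports):
    """Stand-in for the aggregation services: only column sums come out."""
    return [sum(column) for column in zip(*reports)]


ad_sites = ["news.example", "blog.example"]
browsers = [ToyBrowser() for _ in range(3)]
browsers[0].store_impression("news.example")
browsers[1].store_impression("blog.example")
browsers[1].store_impression("news.example")
# browsers[2] bought something without seeing any ad

reports = [b.conversion_report(ad_sites) for b in browsers]
print(dict(zip(ad_sites, aggregate(reports))))
```

In this toy version, whichever site gets its impression stored last before the sale takes the credit, and nothing in the totals says how the impression got there.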

So, as a web entrepreneur willing to bend the rules, how would you win PPA? You could make a site where people pay attention to the ads, and hope that gets them to buy stuff, so you get more ad money that way. The problem with that is that legit ad-supported content and legit, effective advertising are both hard. Not only do you need to make a good site, the advertisers who run their ads on it need to make effective ads in order for you to win this way. An easier way to win the PPA game is to run a crappy site and then (1) figure out who’s about to buy, (2) trick those people into visiting your crappy site, and (3) tell the browser to store an impression before the sale you predicted, so that your crappy site gets credit for making the sale. And steps 1 and 2 work better and better the more surveillance you can do, including tracking people between web and non-web activity, smart TV mics, native mobile SDKs, server-to-server CAPIs, malware, use your imagination.

(Update 14 Oct 2024) PPA has an antitrust problem, too. In a market where the average user has their activity passed to Meta by thousands of companies, Meta has a large advantage when training a machine learning system to steal conversions by placing an ad in front of someone who would be likely to buy anyway. With PPA, a large surveillance company would not have to deliberately tell anyone to do fraud, or write code to do fraud. Instead, ML systems designed to win PPA would learn to do fraud, since if you have the surveillance data anyway, fraud is the quickest, easiest way to get money. (Like I said, legit conversions are hard.) And unlike what happened in legacy fraud cases like Uber v. Fetch, with PPA enough data is deliberately obfuscated to make the fraud impossible to track down. Only a few large companies have the combination of ML and large inflows of user data to make this kind of invisible, deniable fraud possible, so PPA looks like a tool for problematic concentration in the Internet and advertising businesses.

Of course, attribution stealing schemes are a thing with conventional cookie and mobile app tracking, too. And they have been for quite a while. But conventional tracking generally produces enough extra info to make it possible to do more interesting attribution systems that enable marketers to figure out when legit and not-so-legit conversions are happening. If you read Mobile Dev Memo by Eric Seufert and other high-end marketing sites, there is a lot of material about more sophisticated attribution models than what’s possible with PPA. Marketers have a constant set of stats problems to solve to figure out which of the ads are going to influence people in the direction of buying stuff, and which ad money is being wasted because it gets spent on claiming credit for selling a thing that customers were going to buy anyway. PPA doesn’t provide the info needed to get good answers for those stats problems, so what works like a privacy feature on its own would drive the development and deployment of more privacy risks. I’m turning it off, and I hope that enough people will join me to keep PPA from catching on.

More: or we could just not

29 Sep 2024

fair use alignment chart

Tantek Çelik suggests that Creative Commons should add a CC-NT license, like the existing Creative Commons licenses, but written to make it clear that the content is not licensed for generative AI training. Manton Reece likes the idea, and would allow training, but understands why publishers would choose not to. AI training permissions are becoming a huge deal, and there is a need for more licensing options. (Disclaimer: we’re taking steps in this area at work now. This is a personal blog post though, not speaking for employer or anyone else.) In the 2024 AI Training Survey Results from Draft2Digital, only 5% of the authors surveyed said that scraping and training without a license is fair use.

Tantek links to the Creative Commons Position Paper on Preference Signals, which states,

Arguably, copyright is not the right framework for defining the rules of this newly formed ecosystem.

That might be a good point from the legal scholarship point of view, but the frequently expressed point of view of web people is more like, creepy bots are scraping my stuff, I’ll throw anything at them I can to get them to stop. Cloudflare’s one-click AI scraper blocker is catching on. For a lot of the web, the AI problem feels more like an emergency looting situation than an academic debate. AI training permissions will be a point where people just end up disagreeing, and where the Creative Commons approach to copyright, where the license deliberately limits the rights that a content creator can try to assert, is a bad fit for what many web people really want. People disagree on what is and isn’t fair use, and how far the power of copyright law should extend. And some free culture people who would prefer less powerful copyright laws in principle are not inclined to unilaterally refuse to use a tool that others are already using.

The techbro definition of fair use (what’s yours is open, what’s mine is proprietary) is clearly bogus, so we can safely ignore that, but it seems like Internet freedom people can be found along both axes of the fair use alignment chart. Yes, there are four factors, but generative AI typically uses the entire work, so we can ignore the amount one, and we’re generally talking about human-created personal cultural works, so the nature of the copyrighted works we’re arguing about is generally similar. So we’re down to two, which is good because I don’t know how to make 3D and 4D tables in HTML.

| | Transformative purist: work must be significantly transformed | Transformative neutral: work must be somehow transformed | Transformative chaotic: work may be transformed |
| --- | --- | --- | --- |
| Market purist: work must not have a negative effect on the market for the original | Memes are fair use | AI business presentation assistants are fair use | A verbatim quotation from a book in a book review is fair use |
| Market neutral: work may have some effect on the market | AI-generated ads are fair use | AI slop blogs are fair use | New Portraits is fair use |
| Market chaotic: work may have significant effect on the market for the original | AI illustrations that mimic an artist's style but not specific artworks are fair use | Orange Prince is fair use | Grok is fair use |

We’re probably going to end up with alternate free culture licenses, which is a good thing. But it’s probably not realistic to get organizations to change their alignment too much. Free culture licensing is too good of an idea to keep with one licensing organization, just like free software foundations (lower case) are useful enough that it’s a good idea to have a redundant array of them.

Do we need a toothier, more practical license?

This site is not licensed under a Creative Commons license, because I have some practical requirements that aren’t in one of the standard CC licenses. These probably apply to more sites than just this one. Personally, I would be happier with a toothier license that covers some of the reasons I don’t use CC now.

  • No permission for generative AI training (already covered this)

  • Licensee must preserve links when using my work in a medium where links work. I’m especially interested in preserving link rel=author and link rel=canonical. I would not mind giving general permission for copying and mirroring material from this site, except that SEO is a thing. Without some search engine signal, it would be too easy for a copy of my stuff on a higher-ranked site to make this site un-findable. I’m prepared to give up some search engine juice for giving out some material, just don’t want to get clobbered wholesale.

  • Patent license: similar to open-source software license terms. You can read my site but not use it for patent trolling. If you use my content, I get a license to any of your patents that would be infringed by making the content and operating the site.

  • Privacy flags: this site is licensed for human use, not for sale or sharing of personal info for behavioral targeting. I object to processing of any personal information that may be collected or inferred from this site.

In general, if I can’t pick a license that lets me make content available to people doing normal people stuff, but not to non-human entities with non-human goals, I will have to make the people ask me in person. Putting a page on the web can have interesting consequences, and a web-aware license that works for me will probably need to color outside the lines of the ideal copyright law that would make sense if we were coming up with copyright laws from scratch.

28 Sep 2024

Scam culture is everywhere

Just looking at recent news, and at how much of it is about surprisingly low-reputation decisions by surprisingly high-status business decision-makers. The big-picture trend that helps explain a lot of technology trends news is the ongoing collapse of business norms. Scam culture is getting mainstreamed faster than ever. Lots of related stories…

Online advertising is a…well, you knew that already. Brand safety a ‘con’ costing news industry billions, new research says How breaking up Google could lower your online shopping bill The Sleazy World of Reddit Marketing, Everything is Fake

Robot lawyers are fake. DoNotPay Has To Pay, After FTC Dings It For Lying About Its Non-Existent AI Lawyer

Academic publishing is a racket. Gates Foundation Shows That ‘Gold Open Access’ Was A Mistake, And ‘Diamond Open Access’ Is The Future

Other kinds of publishing are a racket, too. CNN and USA Today Have Fake Websites, I Believe Forbes Marketplace Runs Them Gannett’s ‘AI’ Scandals Result In Closure Of Wirecutter-esque Review Website, Layoffs

Pro sports are a racket. Legalizing Sports Gambling Was a Huge Mistake Want Access To Every NFL Game? It’ll Cost You, Thanks To Fractured Streaming Deals

Arrogant programmers and Enshittification - A New Understanding (read the whole thing. What happens when your self-worth is tied to work, but your boss is a growth hacker?)

Diseconomies of scale in fraud, spam, support, and moderation I don’t think it’s controversial to say that in general, a lot of things get worse as platforms get bigger.

The hate speech landscape on Facebook is worse than you thought. Here’s why In recent years, a growing number of politicians, human rights groups, and watchdogs have claimed that not only is Meta doing a poor job of removing harmful content, but its process for making enforcement decisions is happening in what they see as a black box. (There has always been some overlap between direct/database/online marketing, fraud, and right-wing politics in the USA. Goes back at least to the 1920s KKK boom. But today the connection is particularly strong. Maybe the national security Republicans were helping to keep that party from going into full growth hacker mode?) The return of Jacob Wohl! Yeah, he’s into AI now Trump’s $100,000 Watch Likely Made in China, Vastly Overpriced

Is Your Rent an Antitrust Violation? (Maybe we need a Lina Khan Signal, like the Batsignal but for Lina Khan?)

Anyway, it’s time to revise a lot of assumptions that were originally made in the higher-trust business environment of the early, legit Web in its create more value than you capture days. Now that more devices, products, and services reflect scam culture settings by default, the rewards to tweaking, blocking, and other growth hacking avoidance are similar to the rewards for PC power user skills back when those were a thing. More: Return of the power user

22 Sep 2024

stop putting privacy-enhancing advertising technologies in web browsers

(Previously: PET projects or real privacy?) The current trend for privacy-enhancing technologies for surveillance in web browsers is going to be remembered as a technical dead end, an artifact of an unsustainable advertising oligopoly. Here’s a top ten list of reasons, will update and add links.

10. PETs don’t fix revenue issues for ad-supported sites. The fundamental good ad/bad site problems and bad ad/good site problems are still there. PETs make it safer and easier for an advertiser to run ads on sites they don’t trust, so they help crappy infringing or AI-generated sites compete with legit ones in the same ways that third-party cookies do.

9. PETs give up the high ground and make the web just another incomprehensible, creepy surveillance medium. When people complain about privacy issues on native social media apps, with PETs on the web the app people can just say, your browser is creepy now too, we’re just better at business than web sites are.

8. Appeasement doesn’t work. In all the time that PET proponents have been saying that surveillance marketers will mend their ways if they have PETs as a compromise, how many data points have the surveillance marketers chosen not to collect because they have PETs instead? (The way to deal with boundary-testing is not to appease it, it’s to communicate the boundary, communicate the consequences for crossing it, and make the consequences happen. I had a good source for this, need to find it again.)

(update 4 Mar 2025) In 2019, Google announced a Privacy Sandbox PET project, and wrote,

First, large scale blocking of cookies undermine people’s privacy by encouraging opaque techniques such as fingerprinting. With fingerprinting, developers have found ways to use tiny bits of information that vary between users, such as what device they have or what fonts they have installed to generate a unique identifier which can then be used to match a user across websites. Unlike cookies, users cannot clear their fingerprint, and therefore cannot control how their information is collected. We think this subverts user choice and is wrong.

Later, in 2025, Google lifted its 2019 ban on fingerprinting for advertisers. It turns out that Privacy Sandbox was not Google’s desired end state, but just boundary-testing. Web users can learn from this story. Today, Meta and Mozilla are doing a PET-based system for attribution tracking in Firefox that exposes users to fraud and discrimination risks. Proponents are saying that it’s not a big deal because it’s just attribution tracking, not personalization. And the people working on it might be just focused on the nifty attribution math and not the big picture. But for users, the time to react to boundary-testing is when you first see it happening. Waiting for things to get unsustainably bad—in this case, for deeper Meta integration in Firefox—would mean it’s harder to roll back.

7. Only a few platform oligopolies and monopolies benefit from PETs. PETs introduce noise and obfuscation, which makes data interpretation practical only above a certain data set size, that is, for a few large companies (or one?). On this point, they’re worse than third-party cookies.

6. People are different. About 30% of people really want cross-context personalized advertising, 30% really don’t want it, and for 40% it depends how you ask. PETs are too lossy for people who want cross-context personalized ads and too creepy for people who don’t.

5. If it’s a good idea for shoppers to share their info, obfuscated, with advertisers, why not make the browser share the info from corporate web apps with customers, with individual employee identifying details removed? What? Companies wouldn’t turn that feature on? Then why would users?

4. The code complexity and client-side resource usage—along with the inevitable security risks that come with running more code—end up being paid by users, while the benefits go to surveillance companies. And the additional server-side processing required to do all that privacy-enhancing math on all those zillions of cleverly scrambled data points means that Big Tech companies will build even more big data centers, consume more energy and fresh water, and delay those carbon-neutral goals yet again.

3. With PETs, information becomes available equally to both trusted and untrusted parties. In a sustainable advertising medium, a trusted publisher or channel has more audience information than an untrustworthy one. PETs commoditize ad inventory, create more incentives for surveillance of users using non-PET methods, and promote a race to the bottom the same way that cookies do.

2. For most people, individual tracking isn’t the problem. Users are concerned about group-level discrimination risks like surveillance pricing and algorithmic discrimination, and PETs would only obfuscate the risks, not reduce them, and make discrimination harder for regulators and NGOs to detect. In No Cookies for You! Evaluating the Promises of Big Tech’s ‘Privacy-Enhancing’ Techniques, Martin et al. found, Respondents did not judge the privacy solution proposed—to have a browser collect data, keep the data on the local device, create inferences for use in advertising—to be a privacy solution over third-party tracking and the use of data brokers in advertising. Jereth et al., in Consumers’ Perceived Privacy Violations in Online Advertising, found that “perceived privacy violations” for a browser-based system that does not target people individually are similar to the perceived violations for conventional third-party cookies. The mathematical privacy that PETs offer is not the same as the decision-making privacy that people want or need. See browser topics tracking and the prejudiced landlord problem.

Compared to conventional tracking, PETs make it harder for states to enforce privacy laws. See The Hidden Dangers Of Privacy-Preserving Attribution – And A Smarter Solution on AdExchanger (by me).

1. Never mind, you didn’t have to read this list. Browser companies already know that PETs are creepy and bad, and you can tell they know because they hide PETs from users, either with a bullshit Got it dialog, or buried under Advanced or something. If PETs were good for users, the browsers would brag on them like they do other features.
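A rough illustration of item 7 above, on why noise favors whoever has the most data: the sketch below adds the same amount of random noise to a small, a medium, and a large conversion count and compares how far off the noisy number typically is, relative to the real one. The Laplace noise and the scale of 20 are assumptions picked for the example, not parameters taken from any browser's PET.

```python
import random
import statistics


def noisy_count(true_count, scale, rng):
    """Add Laplace noise, built as the difference of two exponential draws."""
    return true_count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def median_relative_error(true_count, scale, trials=2000, seed=1):
    """Typical size of the noise as a fraction of the true count."""
    rng = random.Random(seed)
    errors = [
        abs(noisy_count(true_count, scale, rng) - true_count) / true_count
        for _ in range(trials)
    ]
    return statistics.median(errors)


# Same noise scale (an assumed value), very different data set sizes.
for conversions in (10, 1_000, 100_000):
    print(conversions, median_relative_error(conversions, scale=20.0))
```

With the noise held constant, the small advertiser's report is mostly noise while the large one barely notices it.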

More: Sunday Internet optimism

01 Sep 2024

journalist-owned news sites (Sunday Internet optimism, part 2)

Previously: Sunday Internet optimism

Congratulations to 404 Media, which celebrated its successful first year on August 22. They link to other next-generation news sites, owned by the people who write for them. I checked for ads.txt files and advertiser pages to see which are participating in the conventional RTB ad system and which are doing something else. (404 Media does have an ads.txt file managed by BuySellAds.)

Defector: sports site that’s famous for not sticking to sports (and even has an Arts And Culture section and #AI coverage: Whatever AI Looks Like, It’s Not) (ads.txt not found, advertise with us link redirects to a page of contact info.)

Hell Gate: New York City news (not just for those who finally canceled their subscriptions to that other New York site) (ads.txt not found, advertise with Hell Gate is just a page with a contact email address.)

Racket - Your writer-owned, reader-funded source for news, arts, and culture in the Twin Cities such as What It’s Like to Eat Your Own 90-lb. Butter Head (ads.txt not found, but the Advertise with Racket link goes to a nice page including advertiser logos and testimonials.)

Remap: Video game site that also covers a variety of topics, including but not limited to games, rooting for sports teams that break your heart, inflatable hot tubs, hanging out on car auction websites, and more. Old News from the Latest Disasters: [T]he fact that these studio tell-all features have started to feel so same-y says less about the journalist reporting them and more about how mundane this kind of dysfunction is in AAA game development. (ads.txt not found, no ad contact or page)

Aftermath: a worker-owned, subscription-based website covering video games, the internet and everything that comes after. Short-Sighted AI Deals Aren’t The Future Of Journalism (ads.txt not found, no ad contact or page.)

Another good example, not on 404 Media’s list, is The Kyiv Independent — News from Ukraine, Eastern Europe. The Kyiv Independent was born out of a fight for freedom of speech. It was co-founded by a group of journalists who were fired from the Kyiv Post, then a prominent newspaper, as the owner attempted to take the newsroom under control and end its critical coverage of Ukrainian authorities. Instead of giving up, the fired team founded a new media outlet to carry on the torch — and be a truly independent voice of Ukraine. Opinion: AI complacency is compromising Western defense (ads.txt found, looks like they use an ad management service.)

What all these sites have in common is a focus on subscriber/member revenue and first-party data.
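For anyone who wants to repeat the ads.txt check on other sites, here is a minimal Python sketch. It assumes the usual ads.txt layout (comma-separated ad system domain, publisher account ID, and DIRECT or RESELLER, with # comments and name=value variable lines); the function names and the sample file are made up for the example. Some sites answer a missing ads.txt with an HTML page instead of a 404, so look at the output before trusting it.

```python
from urllib.request import urlopen


def fetch_ads_txt(domain, timeout=10):
    """Return the text of https://<domain>/ads.txt, or None on any error."""
    try:
        with urlopen(f"https://{domain}/ads.txt", timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # covers HTTP errors (such as 404), DNS failures, timeouts
        return None


def parse_ads_txt(text):
    """Parse data records into (ad_system, account_id, relationship) tuples."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # blank line, or a variable such as contact=
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 3:
            records.append((fields[0].lower(), fields[1], fields[2].upper()))
    return records


# Hypothetical file contents, not copied from any real site.
sample = """\
# example ads.txt
adnetwork.example, pub-1234, DIRECT, abc123
reseller.example, 5678, reseller
contact=ads@publisher.example
"""
for record in parse_ads_txt(sample):
    print(record)
```

To check a live site, pass the result of fetch_ads_txt("example.com") to parse_ads_txt when it is not None.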

For quite a while, operating an independent site has meant getting into a frenemy relationship with Big Tech. Yes, they pay some ad money, and can be shamed into writing checks (CA News Funding Agreement Falls Short), but they also grab as much reader data as possible in order to target the same readers in cheaper contexts, including some of the worst places on the Internet. But the bargain is changing rapidly—Big Tech is taking site content in order to keep eyeballs, not send them to the source. And sometimes worse: Copilot AI calls journalist a child abuser, Microsoft tries to launder responsibility. So The Backlash Against AI Scraping Is Real and Measurable. At first this situation seems like a massive value extraction crisis. If the ads move to AI content, and surveillance ad money goes away, where will the money for new data journalism and investigative reporting come from?

As a privacy nerd, I’m an optimist about this apparent mess. Yes, part of success in running a modern news operation is figuring out how to get by without legacy management layers and investors (404 Media Shows Online Journalism Can Be Profitable When You Remove Overpaid, Fail-Upward Brunchlords From The Equation). But the other big set of trends is technical and regulatory improvements that, if kept up and not worked around, will lower the ROAS (return on ad spend, not rodents of average size) for surveillance advertising. So the Internet optimist version of the story is

  1. Big Tech value extraction drives independent journalists to business models other than surveillance advertising

  2. Users choose effective privacy tools and settings (If the sites you like don’t need surveillance ads, and the FBI and FTC say they’re crooked, you might as well join the ad blocking trend to be on the safe side. Especially the YouTube ads…yeech)

  3. People with better privacy protection buy better goods and services

  4. With the money saved in step 3, people can afford more subscriptions.

The big objection to that is: what about free riding problems? Won’t people choose not to subscribe, or choose infringing or AI-exfiltrated versions of content? But most people aren’t as likely to try to free ride as tech executives are. The rise of 404 Media and related sites is a good sign. More: or we could just not

31 Aug 2024

Links for 31 August 2024

First, some good news: Sweden’s been stealthily using hydrogen to forge green steel. Now it’s ready to industrialise (the EU isn’t against technology, they’re against crooks and bullshitters. The DMA Version of iOS Is More Fun Than Vanilla iOS, Silicon Valley’s Very Online Ideologues are in Model Collapse)

AI Has Created a Battle Over Web Crawling The report, Consent in Crisis: The Rapid Decline of the AI Data Commons, notes that a significant number of organizations that feel threatened by generative AI are taking measures to wall off their data. (IMHO this is not just a TOS or copyright issue. In the medium term the main problem for AI scrapers is going to be privacy and defamation law. Meta AI Keeps Telling Strangers It Owns My Phone Number - Business Insider)

From the United States Court of Appeals for the Third Circuit, more news from the circuit split between common sense (advertisers should not be paying the PRC to kill kids) and the epicycles of increasingly contrived Big Tech advocacy still in the law books: The Limits of the CDA Section 230: Accountability for Algorithmic Decisions, Judges Rule Big Tech’s Free Ride on Section 230 Is Over. Yes, the Big Tech defenders are big mad. They thought they won with the ISIS recruiting on Twitter case. And they’re probably right about how well the Third Circuit’s decision (PDF) will hold up on appeal. I don’t think this will hold up in court with today’s judges. At least for now we need to regulate Big Tech in a way that avoids free speech issues. The motivation to deal with the situation is just getting stronger: Here are 13 other explanations for the adolescent mental health crisis. None of them work. Update: TikTok Inspired Child Suicide Prompts a Sound Reading of Section 230 - The Illusion of More

DOJ sues TikTok, alleging “massive-scale invasions of children’s privacy” (Throwing the book at creepy surveillance companies is a win. Meta to pay $1.4 billion settlement after Texas facial recognition complaint)

Opt Out of Clearview AI Giveaway Class actions are terminally disappointing, but this one is especially egregious and it is worthy of special attention. We think you should opt out. Not just as a protest, but to preserve your rights in the event of further litigation. Here is how to do it. The deadline is September 20th.

Google’s Real Googly. No Not The Anti-Trust! Google search is starting to look old, tired, and less and less useful. (True, but that’s not because of disruption or innovation, it’s mainly that Google management has put dogmatic union-busting of TVC (second-class, indirect) employees ahead of a quality experience for users. The biggest mistake that companies with a cash cow make isn’t under-investing in innovation, it’s making wasteful investments in non-core areas while pursuing false economies in the core business. Meanwhile, Google writes checks for legacy media: Will Google’s $250 million deal with California really help journalism? California tried to make Google pay news outlets. The company cut a deal that includes funding AI and a new generation of journalist-owned news sites become going concerns)

More news from the regular people side of the AI story arc: Excuse Me, Is There AI in That? - The Atlantic Businesses and creators see a new opportunity in the anti-AI movement. Why putting AI in your product description is actually hurting sales The Generative-AI Revolution May Be a Bubble Law firm page following copyright cases: Case Tracker: Artificial Intelligence, Copyrights and Class Actions | BakerHostetler The other shoe dropping on ‘AI’ and office work

Ethics and Rule Breaking Among Life Hackers (to defeat the techbro, think like a techbro? full text)

Point of order: I decided not to put some otherwise good links in here because the writers chose to stick a big obvious AI-generated image on them. That’s like Rolling Coal for the web. Unless your intent is to claim membership in evil oligarch fan club or artist hater club, cut it out. I can teach you to find perfectly good Creative Commons images if you don’t have an illustration budget.

24 Aug 2024

pile of money fail

Really good example of a market failure in software quality incentivization: ansuz / ऐरन: “there’s a wee story brewing in…” Read the whole thing. Good counterexample for money talks. With the wrong market design, money says little or nothing.

To summarize (you did read the whole thing, right?): in 2019, a software algorithm called a Verifiable Delay Function (VDF) was the subject of a $100,000 reward program. Daniel J. Bernstein asked, in a talk recorded on video, if the VDF was vulnerable to a method that he had already published in a paper.

If Bernstein was right, then a developer who

  • read Bernstein’s paper on the subject

  • applied Bernstein’s work to attacking the VDF

  • and was first to claim the reward

could earn $100,000. But the money was left unclaimed—nobody got the bounty, and the attack on VDFs didn’t come out until now.

It would take some time to read and understand the paper, and to figure out if it really described a way to break the VDF—but that’s not the main problem. The catch with the bounty scheme is that as a contender for the bounty, you don’t know how many other contenders there are and how fast they work. If 64 people (the number of viewers on the video) are working on it, and Bernstein is 95% likely to be right about the paper, then the expected payout is $100,000 × 0.95 × 1/64 = $1,484.38.
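The same arithmetic as a small Python sketch, so you can plug in other guesses. The function name is made up, and the sketch assumes every contender is equally likely to be first to claim the reward, which is the simplification the calculation above already makes.

```python
def expected_payout(bounty, p_attack_works, contenders):
    """Expected reward for one contender, assuming all contenders are
    equally likely to finish first (a simplifying assumption)."""
    if contenders < 1:
        raise ValueError("need at least one contender")
    return bounty * p_attack_works / contenders


# Numbers from the post: $100,000 bounty, 95% chance the paper applies,
# 64 possible contenders (the viewer count on the video).
print(f"${expected_payout(100_000, 0.95, 64):,.2f}")  # $1,484.38

# How the expected payout changes with the number of contenders.
for contenders in (1, 4, 16, 64):
    print(contenders, round(expected_payout(100_000, 0.95, contenders), 2))
```

The catch is that a contender can't observe the number of contenders, so the payout could be anywhere from most of the bounty down to pocket change for weeks of work.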

In this case, the main purpose of the bounty was to collect information about the quality of the VDF algorithm, and it failed to achieve this purpose. A better way to achieve this information-gathering goal is to use a system that also incentivizes meta-work such as evaluating whether a particular approach is relevant to a particular problem. More: Some ways that bug futures markets differ from open source bounties

11 Aug 2024

how to break up Google

Everybody* is on about plans for how to break up Google, so here’s my version. I’m trying to keep two awkward considerations in mind.

  • Any Google breakup plan has to fit in a tweet. Google will have more total lawyer time over more years to find the gaps in a complicated plan than could ever be invested in making the plan. Keep it simple, or Google will re-consolidate the way that AT&T did. (All right, maybe not fit in a tweet, but at least get it down to one side of a piece of paper.)

  • Leave Google with the ability to preserve shareholder value. Google is a big company that does a lot of things, so don’t drag it down with pointless micromanagement. Make as few breakup rules as possible but otherwise give them the ability to achieve the important goals in their own way.

The main point of the breakup is to protect users, not to protect any of the competing companies. A breakup does need to happen, though. Google’s tying of client and server products in an anticompetitive way enables the company to harm its users by funding illegal sites and serving fraudulent search ads while limiting the ability of their client software to protect people.

The common feature of all Google’s most problematic anticompetitive schemes is control of both the client and the server. For example, the reason that Google Chrome has such weird, clunky in-browser ad features is that it’s made by the same company that also owns YouTube. When the browser company owns a video sharing site with its own ad system, and the company as a whole earns more from YouTube than from open web ads, they have an incentive to develop in-browser ads in a way that a company that didn’t own both YouTube and Google Chrome would not.

So here’s the break-up plan. Should fit on one page. Google is split into two companies, call them clientGoogle and serverGoogle for now.

  1. serverGoogle can’t do clients. The first company, call it serverGoogle, may not sell or rent any hardware, or release any proprietary software that runs outside a serverGoogle data center. Any code that serverGoogle makes available outside a data center must be licensed without any limitations on reverse engineering, and distributed in the preferred form for making modifications. No software released by serverGoogle may be a technological protection measure under section 1201 of Title 17 of the United States Code (DMCA anticircumvention).

  2. clientGoogle can’t do servers. The second company, call it clientGoogle, cannot operate any Internet services, except those necessary for the development and distribution of client software.

  3. clientGoogle and serverGoogle can’t communicate confidentially with each other. The two companies can’t enter into an NDA with each other or contract with the same third parties (such as directors or consulting firms) in such a way as to create a confidential communications channel between them. (Consultants will have to pick one company to work for.)

The reason to do it this way is that most of Google’s anticompetitive behavior is based on control of both the client and the server. Splitting client and server would force a flip from an anticompetitive collusion approach to an adversarial interoperability situation. Separating the client and server would address the problems with Google’s browser, now hard-coded to advantage Google’s YouTube, and Google’s ad blocking support designed to bypass Google’s ads. In those two examples, the ads and YouTube would be part of serverGoogle, and the browser and mobile platform would be clientGoogle.

(Update 10 Nov 2024) Anticompetitive tying is a big issue for Google even though services such as Google Search are free of charge. Users pay for monopoly services in the form of exposure to fraudulent ads at a level higher than a competitive search engine would be able to get away with. If the search market were competitive, users would be able to choose a search engine based on their own preferences regarding total ad load and fraud risk, and search engines could compete based on anti-fraud effectiveness. Separating Google into client and server would enable the client company to more easily help users implement the FBI advice to use an ad blocking extension when performing internet searches as long as the fraud risk is high, and then to have the flexibility to recommend turning search ad blocking back off when the ads get cleaned up, as they were back when people actually liked Google search results more when they had ads than when they didn’t.

A client/server split would be easier to monitor than other breakup plans. The main monitoring that would be needed is enforcement of rule 3: keep the two companies from colluding. How long does a director or consultant have to sit out before going to work for the other company, that kind of thing. A whistleblower program with rewards big enough to retire on will help.

The two companies would need to coordinate, of course, but any communication would have to happen in open source projects and in organizations such as the Linux Foundation, W3C, IAB, and IETF. Opening up what had been intra-Google conversations to outsiders would not just be an antitrust win, it would also help avert some of the weird groupthink rat holes that all big companies tend to go down.

What about JavaScript and WebAssembly? When serverGoogle operates a web site with JavaScript or WebAssembly (which run on the client), the license for that client-side code may not prohibit reverse engineering, the terms of service for the site may not prohibit the use of the site with alternate JavaScript or WebAssembly code, and serverGoogle must make the corresponding source code available in the preferred form for making modifications, which might have to be a separate download from the compiled or minified form that users normally run.

What about servers for version control, CI, bug tracker, and downloads? The servers required to develop and release client software are the one exception to the no servers rule for clientGoogle. (That doesn’t mean clientGoogle gets to run any other servers. For example, if clientGoogle supports a browser with the ability to sync bookmarks, users must configure it to use their account with serverGoogle or some other party, as part of an add account process that users already go through to set up calendar or email accounts today.)

Can clientGoogle run servers for telemetry and in-product surveys? Yes, as long as they’re for the purpose of developing and releasing clientGoogle’s software. (They can’t pass data derived from their client software to any other party.)

What about Google Fiber? (and other businesses that aren’t client software or Internet services?) Let Google management pick based on what is good for them—we don’t want to micromanage business unit by business unit, just make rules to prevent the known problems.

What about AI? Considering that Google is all on about Integration and Android now? AI is a good example of a win from a client/server split. Mobile devices won’t be stuck talking to a laggy AI server for anticompetitive tying reasons, and Internet services won’t be held back by underpowered on-device AI for anticompetitive tying reasons. Both client and server will be able to make the best implementation choices.

What about the Google Play Store? serverGoogle could run the store but not release its own proprietary apps, which run on the client. clientGoogle could release apps, as well as mobile devices or platforms that enable users to connect to and use an app store.

Could serverGoogle spin off the YouTube service, clientGoogle spin off the YouTube apps, then the service and app companies merge to re-form a standalone YouTube? Yes, if it passes normal FTC merger review. Some post-breakup splitting and trading is going to happen, so the FTC still has to keep an eye on things.

What about my 401(k)? Google is a big part of the stock market, and without anticompetitive collusion they’ll be making less money. But relax. You’re probably invested in an index fund that owns shares in both parasites and hosts—as the legit economy recovers from all this negative-sum value extraction, your total portfolio will do better.

Would this work for [other company] too? Probably not. (Let’s do Google first, which will make the web a lot more fun, then we’ll be on a roll and can move on to whatever other big company is giving everybody grief.)

Don’t cut soup with a knife, people

Here’s how not to break up Google: Some people are suggesting that the breakup plan should be a careful dividing of the big bowl of adtech alphabet soup. (Where on Ari Paparo’s simplified chart do you cut, exactly?) That would be a waste of time—if that’s all you do, Google will just tweak their clients, Chrome and Android, to move the profits out of whatever slice of the soup they have to get rid of, and keep the money flowing into whatever they get to keep.

02 Aug 2024

a new browser feature?

The Web’s hottest new feature is Privacy-Preserving Corporate Information Sharing (PPCIS).

When a corporate employee uses a PPCIS browser to log in to any of their employer’s web applications, such as

  • shared document editor

  • webmail

  • bug tracker

  • Slack

PPCIS automatically uses its built-in AI to make a totally privacy-preserving summary of the employee’s work activity, then posts the summary to a PPCIS server using really cool math that makes it possible to identify the employer but not the individual.

The PPCIS server then aggregates all PPCIS summaries from all the users at a company to make a report that is shared with any customer or prospective customer who visits the company’s public web site.

PPCIS is not a real feature, but do you think that, if it existed, corporate IT departments would leave it turned on? If the answer is no, why would people want privacy-preserving tracking of their personal web activity? More: PET projects or real privacy?

29 Jul 2024

colophon

This site is built with a variety of tools.

I like static site generators but the way this site works I don’t have to learn a static site generator, just incrementally add on tools I already know as I need the site to do more.

All of these do a lot more than just what I use them for on this site.

More slash pages on this site

28 Jul 2024

links for 28 Jul 2024

Kamala Harris’ $7M support from LinkedIn founder comes with a request: Fire Lina Khan (Today’s IT industry big shots are used to the level of respect that they got from the Blackberry generation of politicians, but that was back when the industry was doing transformative innovation. Now that the industry has pivoted to rent-seeking and crime to keep the numbers going up, they’re not going to get the same treatment. Bonus link: The FTC Orders Companies To Disclose Info On “Surveillance Pricing”)

California Forges Ahead With Social Media Rules Despite Legal Barriers (More First Amendment questions on how recommendation algorithms work. It seems like requiring a Parental Control Protocol and a content-neutral surveillance licensing system would be more likely to hold up in court.)

End Single Family Zoning by Overturning Euclid V Ambler Cities around the country and around the world mix land uses, building heights, and lot sizes with no ill effects on health or safety. Indeed, mixed use cities may improve health and safety by reducing driving and putting empty lots to use which reduces crime. (ICYMI: “You Don’t Own Web3”: A Coinbase Curse and How VCs Sell Crypto to Retail)

The CrowdStrike Outage and Market-Driven Brittleness Read the whole thing. Today’s internet systems are too complex to hope that if we are smart and build each piece correctly the sum total will work right. We have to deliberately break things and keep breaking them. This repeated process of breaking and fixing will make these systems reliable.

The sentiment disconnect on ‘AI’ between tech and the public To many, “AI” seems to have become a tech asshole signifier: the tech asshole is a person who works in tech, only cares about bullshit tech trends, and doesn’t care about the larger consequences of their work or their industry. Or, even worse, aspires to become a person who gets rich from working in a harmful industry. (related: Does AI increase productivity at work? New study suggests otherwise, The average AI criticism has gotten lazy, and that’s dangerous)

Some coverage of the Google Chrome third-party cookies news:

(As a gatekeeper company, they’re not going to be able to get away with a setting that turns off third-party cookies but not tracking/personalization on Google Search or YouTube.)

And finally some random good reads.

California Grid Breezes Through Heat Wave due to Renewables, Batteries

Congress Accidentally Legalized Weed Six Years Ago

Costco in Cancún

Not Lost In Translation: How Barbarian Books Laid the Foundation for Japan’s Industrial Revolution

26 Jul 2024

hello page for Don Marti

This is a Hello page for me.

Email

The best way to reach me is email: mailto:[email protected]

This blog has an RSS feed, or if you want to get this blog by email, I recommend Feedrabbit. Go to their site and put in this URL: https://blog.zgp.org/feed/

You can email me for Signal or phone info if you want to communicate that way.

Services I check

Besides email, I generally check these fairly often.

In-person events I attend

I can usually make it to Southern California Linux Expo (I missed this year) and it’s a good place to meet up with me in person.

Lately I have been going to W3C TPAC.

Other services

These are places I have accounts for certain purposes but aren’t a good way to reach me in general.

Services I might have an abandoned account on

I don’t know about the status of my accounts on these. Might have been taken over by spammers by now.

  • YouTube

  • Telegram

  • LiveJournal

  • I made accounts on Meta Instagram and Threads to try something on Threads but don’t check them.

Services I know I don’t have an account on

  • WhatsApp: I know users say it isn’t enshittified yet, but I don’t feel the need to be in suspense about when.

  • Apparently Google has some kind of new chat thing now again—I don’t have time to learn these before they go away, so don’t expect an answer (even though I do have a Google account).

More slash pages on this site

23 Jul 2024

surveillance licensing in practice

I wrote about how states should avoid free speech questions around Big Tech by establishing a licensing system for surveillance, and got some questions about how that would work.

The problem to watch out for is that state privacy regulators tend to be diligent high achiever types who aren’t afraid of doing a bunch of extra work. But what we want here is for most of the work of the licensing system to be done on the surveillance company side. The people who are getting paid by the taxpayers should spend as little time on it as possible. So here’s a possible way to do it.

  1. Pass a state law with a very general definition of surveillance, and say that anybody who surveils more than 20% of the population (to start with) needs to get a license. Appoint a surveillance licensing board.

  2. Design a surveillance licensing application, a one-page PDF. Name of company, contact person, and so on. The last form field is “Describe your surveillance practices in detail (attach additional pages if needed).”

  3. When a company applies, put their application including the additional pages on the web (with employee name and contact info redacted of course) and have a public meeting.

  4. The meeting will be full of concerned citizens, NGOs, businesses that use the surveillance in some way, and other random members of the public. (Yes, people who got kicked off of Facebook because of getting hacked will show up at the Facebook meeting to complain.) Ideally this meeting would be organized in such a way that the Big Tech lawyers have to wait in a speakers’ waiting room next to random users. Will need to figure this out.

  5. Realistically some speakers at the meeting will come up with something that the surveillance company left out of their application, and some will mention harmful effects of surveillance practices. The board gives the company a temporary surveillance license and tells them to re-submit. While on a temporary license they can’t sign up any new users from this state.

  6. Go to step 3. When the company cleans up their act, then the board can give them a longer term license. If they persist the board might deny them a license and that’s when a lawsuit could kick in. But most of the steps of the process have already worked.

No speech mentioned, it’s all about non-speech conduct, so very difficult for surveillance industry sockpuppet orgs to get a court to block.

Update: pricing

So how much should a surveillance license cost? For a Big Tech company with a double-digit percentage of a state’s residents in their database, say $5-10 per person surveilled.

In general the license should be priced by count of people records, so a company would pay more per person surveilled as they surveil more people. As surveillance licensing comes into effect for smaller firms, they would pay less per record, and licensing would never be required for databases of less than a certain size.

Pricing a surveillance license proportional to (n log n) would help address some of the competition and centralization concerns raised by some kinds of privacy regulation. (See The Antitrust and Privacy Interface: Lessons for Regulators from the Data by Brijesh Pinto, D. Daniel Sokol, Feng Zhu.)
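Here is a rough sketch of what (n log n) pricing could look like in Python. The rate constant, the exemption threshold, and the function names are all made up for illustration and are not a proposed fee schedule. The point is just that the fee per person grows with log n, so a bigger database pays more per record and a small one pays nothing.

```python
import math

# Hypothetical parameters, chosen only for illustration.
RATE = 0.50            # dollars per record per unit of ln(n)
EXEMPT_BELOW = 10_000  # databases smaller than this need no license


def license_fee(n_people: int) -> float:
    """Total license fee for a database of n_people, proportional to n log n."""
    if n_people < EXEMPT_BELOW:
        return 0.0
    return RATE * n_people * math.log(n_people)


def fee_per_person(n_people: int) -> float:
    """Fee divided by record count; works out to RATE * ln(n) above the threshold."""
    if n_people == 0:
        return 0.0
    return license_fee(n_people) / n_people


for n in (5_000, 100_000, 10_000_000):
    print(f"{n:>12,} people: ${license_fee(n):>16,.2f} total, "
          f"${fee_per_person(n):.2f} per person")
```

With these made-up constants, a database of ten million people comes out to roughly $8 per person, inside the $5-10 range above, while a 5,000-person database is exempt.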

21 Jul 2024

Sunday Internet optimism

Over on the social media sites there have been a bunch of very serious posts from very serious people explaining how surveillance advertising is here to stay and the best we can do is put some privacy-enhancing technologies on it. This sounds dismal and awful—ads according to the faufreluches so the big shots get ads for sweet cars and good jobs, retirees get precious metals scams, those with money get legit investments, those without get predatory finance, you know, all the same tricks and discrimination but with more math to make it harder to understand. So instead I’m going to do some Internet optimism today. What happens if instead of reimplementing surveillance advertising, we just get rid of it?

Step one: people start buying better stuff. If you figure out how to turn the surveillance advertising off, you start buying goods and services that you are more satisfied with (Lin et al.) and buying less overpriced crap (Mustri et al.). The main reason I’m pretty confident about this effect is because of some research that hasn’t been published. If people who use ad blockers and privacy tools were making worse purchases, then someone in the surveillance business would have published research saying so.

Step two: marketers look for alternatives. If I can somehow avoid being exposed to the surveillance ads, that doesn’t mean people will stop trying to sell me stuff. But instead of surveillance ads, which let them target the most valuable possible audience for the lowest possible ad rates, they have to fall back to the next most profitable options, which might be

  • contextual ad placements, funding better ad-supported content

  • reviewer and influencer programs

  • content marketing

  • increase product quality

  • lower price

Those options probably have less attractive profits or predictability for the company than the surveillance ads do, or the company would have chosen them in the first place. But by removing the surveillance ad option, as a shopper I can get more money to flow to more win-win options.

Step three: what happened to the ad-supported content? A lot of ad-supported content does get money from surveillance ads. It could turn out that the legit ad-supported sites end up better off, just by supply and demand. The number of available crap ad spots—that are only saleable because of surveillance—will go down. And after steps one and two, the customers will be sitting on more money, and can put some of it into subscriptions and crowdfunding. And subscription and crowdfunding models tend to send a higher percentage of the money to the content creator than advertising models do.

Of course, the market isn’t going to change because one person is harder to reach with surveillance ads. Ad reform is a collective problem, and needs tool building, education, and lobbying work.

We might be able to get some good data about this soon, thanks to the EDPB decision on Facebook ad tracking. It looks like some users are going to be able to use the exact same social site, but with random ads instead of personalized ones. When the users who picked Facebook’s non-personalized option turn out to own better stuff that they’re more satisfied with, that will help build toward a surveillance advertising ban. It’s a lot easier to justify a ban when it’s not about balancing harms and benefits, but more about stacking consumer benefits on top of the existing privacy and national security benefits.

More: journalist-owned news sites (Sunday Internet optimism, part 2)

14 Jul 2024

turn off advertising measurement in Apple Safari

Apple Safari includes an advertising measurement feature, but fortunately you can turn it off. Update 20 Sep 2024: I now have access to Apple Mac OS and an Apple iPhone and iPad to test on. Will keep these instructions up to date with any changes. These instructions cover Privacy Preserving Ad Measurement along with some other related settings.

On Apple iOS

Open Settings, select Safari, then scroll down to Advanced. Tap to open the Advanced settings.

Turn off Privacy Preserving Ad Measurement

While you have Settings open, you might as well check two other iOS tracking features.

  • In Settings, go to Privacy & Security, then Tracking, and make sure “Allow Apps to Request to Track” is turned off.

  • Also in Settings under Privacy & Security, find “Apple Advertising” and make sure that “Personalized Ads” is turned off. (You will probably have to scroll down—Apple makes this one a little trickier to find.)

On Mac OS

From the Safari menu, choose Settings, then check the Privacy and Advanced tabs. (On Safari - Version 16.0 (17614.1.25.9.10, 17614) it is under Privacy. On other versions you need to look under Advanced.)

On some versions, the setting you need is under Privacy.

Uncheck “Allow privacy-preserving measurement of ad effectiveness.” (You can also set “Advanced tracking and fingerprinting protection” to “in all browsing” if it doesn’t break any sites you use. Some corporate web applications are not compatible with Safari privacy features, so if you use Safari for work you might need to leave this alone.)

Update: The Advanced button on the Privacy tab will take you to the Advanced tab.

why turn this off?

The deeper they hide stuff like this, the more it shows they understand that it’s not in your best interest to have it on. (This feature is even turned on by default in Private Browsing!) The Apple billboards are all about protecting you from tracking. I haven’t seen one yet that was more like Connect and share with brands you love! (please let me know if you see any Apple billboards like this)

Information has value in a market. When your browser passes information about you—even in a form that is supposed to prevent individual tracking—you’re rewarding risky and problematic advertising practices along with the legit ones. Some advertising has value, but putting legit sites and malvertising on an equal basis for data collection is not helping.

For other browsers

turn off advertising features in Firefox

Google Chrome ad features checklist

(Yes, all the big web browsers have stuff like this that you need to find and turn off now.)

13 Jul 2024

PET projects or real privacy?

(update 25 Jul 2024: add links to FTC surveillance pricing story)

Every advertising event is full of thought leader insights about privacy-enhancing technologies (PETs) for ad personalization and reporting systems. Somehow Big Tech, adtech, and martech are all fired up about projects for reimplementing personalized/surveillance advertising, but this time with a bunch of complicated math added in a way that makes it hard to identify or track an individual.

In the real world, though, individualized tracking is not the top advertising privacy problem, and it might not even be in the top five. Arielle Garcia, director of intelligence for Check My Ads, said, Privacy-enhancing tech doesn’t make creepy and disruptive ads less creepy or disruptive in the eyes of the average user. And the user research backs that up. Jereth et al. find that perceived privacy violations for a browser-based system that does not target people individually are similar to the perceived violations for conventional third-party cookies. Co-author Klaus M. Miller presented the research at FTC PrivacyCon (PDF):

So keeping your data safer on your device seems to help in terms of consumer perceptions, but it doesn’t make any difference whether the firm is targeting the consumer at the individual or group level in the perceived privacy perceptions.

This might not make sense if you compare just the information passed through PETs to what is possible to do with a third-party cookie. It’s fewer total bits of information, so the users should be getting more privacy and like it better, right? Not so fast. The real problems that people complain about when they raise privacy concerns are more about information asymmetry.

People tend not to want to participate in markets in ways that give their counterparties too much information. Imagine going on vacation and visiting an old-fashioned rug market—and showing every rug merchant your budget, shopping list, and departure date. Information imbalances are not how sustainable markets work. And that’s just legit markets. The situation is even worse on today’s scam culture Internet, where the old fake it til you make it has been replaced with make it, then fake it even harder. The pervasive risks of scam culture are mostly group-level personalization risks and not the consequences of being individually identified.

Some PETs just measure ad performance and don’t personalize. But when the same measurement system covers both harmful and win-win ad placements, it creates incentives for advertisers to chase the best possible metrics while creating the least possible value for the user. In a sustainable system, the user, or some party that the user trusts to provide an ad-supported resource, would be a data gatekeeper to keep data from leaking into the negative-sum corners of the Internet. People choose to provide their information to parties they trust—they don’t broadcast info about themselves to trusted and untrusted parties on the same terms. If browsers offered a privacy-preserving way to share corporate information with web users who visit the company site, would the corporate IT department leave it turned on?

how things got this far

PETs started to go mainstream in the advertising scene back in 2019 when Google announced the Privacy Sandbox project. (Apple was first with wide release of a PET ad system, Private Click Measurement in 2021.) At the time, some people picked up on PETs as an alternative to platform dystopia, a way for independent retailers and publishers on the web to make a compromise and continue to do business without a hard dependency on fixing the Internet’s deeper monopolization, discrimination, and fraud problems. But in the about five years that people have been arguing about web ad PETs, most recently over Google’s ongoing effort to make their PETs acceptable to the Competition and Markets Authority (CMA) in the UK, a big PET win is looking less and less likely. After all the work that has gone into PETs, not only do big risks remain, but—and this is super awkward—the companies working on PETs are still the same Big Tech companies that PETs were supposed to…protect us from?

Meanwhile, in the years since the PET trend began, mainstream privacy laws have gone off in a different direction, and focused not on obfuscation and math, but on data minimization and on enabling people to find out how companies use data. Data subject access rights, or “right to know,” are a key component of modern privacy laws and make it possible for web users and for organizations working on their behalf to detect patterns of illegal discrimination. In a 2019 settlement, Facebook agreed to stop using age, gender and ZIP code for housing, employment and credit ads. Reporters were later able to track the company’s progress. Privacy researchers and advocates continue to benefit from “right to know” processes today. A PET environment, however, limits access to data, making algorithmic discrimination hard to detect.

Much recent privacy news would have been as bad or worse in a PET environment. Grindr users aren’t suing over privacy because they were individually identified, they are suing because they were identified by HIV status, a fact common to a large group of people. Reviews of Google’s Privacy Sandbox have, for quite a while, anticipated regulator concerns over algorithmic discrimination and transparency. The W3C TAG, in a review of a “Privacy Sandbox” sub-project, wrote,

The Topics API as proposed puts the browser in a position of sharing information about the user, derived from their browsing history, with any site that can call the API. This is done in such a way that the user has no fine-grained control over what is revealed, and in what context, or to which parties. It also seems likely that a user would struggle to understand what is even happening; data is gathered and sent behind the scenes, quite opaquely.

Under privacy laws, users have the right to access not just their raw data, but the inferences made from it—a capability that will be difficult to retrofit into PETs. Google posted a FAQ stating,

Chrome can and will take steps to avoid topics that might be sensitive (i.e. race, sexual orientation, religion, etc.). However, it is still possible that websites calling the API may combine or correlate topics with other signals to infer sensitive information, outside of intended use. Chrome will continue to investigate methods for reducing this risk.

No results have been posted from this investigation so far. Someone will probably get a Best Paper award at a conference for solving algorithmic discrimination in a PET-based ad system…eventually. Until that happens, PETs will struggle to meet basic transparency requirements in more and more jurisdictions. The data obfuscation problems introduced by PETs will also create compliance challenges in the area of competition policy. In a recent Google report (PDF) to the CMA, the company acknowledges that compliance with the Digital Services Act (DSA) by “Privacy Sandbox” publishers and advertisers remains an unsolved problem.

alternatives

So what do we do instead? PETs may continue to be valuable in fields like software telemetry, where the end result is all users receiving the same bug fixes, not different treatment of different users. But ads are a harder problem. First of all, PETs are not a shortcut to solving some important platform trust issues.

I see that your industry sold my parents a retirement scam, put malware in my search results, and showed fake repair shops on a map of my neighborhood…but I totally trust your answer to this complicated math problem. — no one ever

The hard part about running any Internet service is the moderation (including ad review if it’s an ad-supported service). At some point Big Tech management needs to get over its dogmatic union-busting and sign fair contracts with the moderators and all the other people who do the high-skill human side of their operations. You can’t fix a reputation problem with math.

As far as how to make the ads work, it’s possible to get rid of third-party cookies and other problematic identifiers like mobile ad IDs without also taking on an open-ended research project. Instead of starting from what data flows are acceptable to today’s platforms/adtech/martech and trying to stick enough math on them to make them acceptable to users, start with how people choose to participate in markets and automate it. The Lindy Effect is a real thing in marketing. Ideas get tested out all the time, but the practices that tend to persist are the ones that have endured many generations of marketers, companies, and technologies. So the history of marketing practices that people have been willing to accept in the past is probably the best guide here. Realistically, markets will always have high and low reputation sellers, and the advertising most likely to persist will be whatever can pay for itself in (ad-supported resources + economic signal) > (cost in attention + resources + risk).

Finding something successful and sustainable and adapting it to the web is good for more total ad revenue over time—even if in any one transaction you get to capture less of it. A browser is a user agent, which means it does what users would do for themselves if they had time. People like to buy stuff, and prefer to buy better stuff. How can browsers help?

More: Sunday Internet optimism

11 Jul 2024

turn off advertising features in Firefox

It’s understandable that Mozilla wants to break their dependency on Google search ads, but right now they seem to be doing it by, yes, yet again putting advertising features in the browser. They seem to get on an ads in the browser kick every few years, but this time they’re doing it in collaboration with Meta, in a way that introduces fraud, surveillance, and antitrust risks. Google gets all the ink for their ad features in the browser project, but Firefox has given us some to-do items, too. I’ll keep this post up to date if the instructions change.

Privacy-preserving attribution

Privacy-preserving attribution (PPA) is the ad tracking thing cooperatively developed with Meta. It’s turned on by default as of Firefox 128, which is kind of a…bold…move, considering that even Google pops up a Got it dialog before turning on their ad stuff in Chrome. Here are the instructions for turning it off.

  1. Click the menu button (≡) and select Settings.

  2. In the Privacy & Security panel, find the Website Advertising Preferences section.

  3. Uncheck the box labeled Allow websites to perform privacy-preserving ad measurement.

To turn off this feature in advance, before upgrading to Firefox 128, follow the instructions from Michael Kjörling:

  1. Go to about:config

  2. Create an entry:
    dom.private-attribution.submission.enabled with a value of false.

That blog post also has info for disabling this by default in user.js which I have not tested.
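For reference, a user.js entry is a single user_pref(...) line. Here is a minimal Python sketch that adds that line to the text of a user.js file without duplicating it. The function name is made up for illustration, and like the user.js approach itself, this has not been tested against a live Firefox profile.

```python
PREF = "dom.private-attribution.submission.enabled"


def with_pref_disabled(user_js: str, pref: str = PREF) -> str:
    """Return user.js text in which `pref` is set to false exactly once."""
    line = f'user_pref("{pref}", false);'
    # Drop any earlier setting of the same preference so repeated runs
    # do not pile up duplicate lines.
    kept = [l for l in user_js.splitlines() if f'"{pref}"' not in l]
    kept.append(line)
    return "\n".join(kept) + "\n"
```

Read the existing user.js from the profile directory (its location depends on your system), pass the text through the function, and write the result back.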

Just putting privacy in the name of a feature doesn’t make it less creepy. Considering today’s branding trends it might even go the other way. Your privacy is important to us is the new your call is important to us. If you dig into the literature behind PPA, you will find some mathematical claims about how it prevents tracking of individuals. This is interesting math if you like that kind of thing, and the design does have some constructive features. But even if it all works error-free (update 5 Oct 2024) the problem is that on the real web, PPA provides incentives to track you more. Your sense of being creeped out by a web feature is a better applied behavioral economist than it gets credit for.

Firefox for Android

Thom Holwerda posted:

  1. Go to chrome://geckoview/content/config.xhtml
  2. Use the search field to find the entry for aboutConfig and enable it
  3. Go to about:config
  4. Search for dom.private-attribution.submission.enabled and set it to false

(I have not tested this—if you have better instructions please let me know.)

Turn on Global Privacy Control

I’m going to end with some good news and ask, please don’t get mad at Firefox in general just because of one more fad-chasing move by management. I have been using this thing since it was Netscape Navigator, and even worked there for a while, and I have always been able to muddle through. Here is a place where Firefox is ahead of the other browsers: Global Privacy Control support.

  1. From the ≡ menu button, Settings.

  2. In the Privacy & Security panel, scroll down to Website Privacy Preferences and turn on Tell websites not to sell or share my data.

You can check that it works by going to globalprivacycontrol.org. Look for GPC signal detected at the top of the screen.
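On the receiving end, a browser with GPC turned on sends a Sec-GPC: 1 request header. A minimal sketch of how a site could check for it, assuming the request headers arrive as a plain dictionary (the function name is hypothetical):

```python
def gpc_opt_out(headers: dict) -> bool:
    """True if the request carries the Global Privacy Control signal."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("sec-gpc") == "1"
```

A site that gets True here should treat the visit as an opt-out of sale or sharing, whatever else its consent banner says.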

Ad blocker check

Mozilla seems to be doing better at keeping the scam ad blockers out of their extensions directory than Google—but that might just be that fewer malware developers are targeting Firefox now. But it is a good idea to check that your ad blocker is protecting you—privacy is now the number one reason why people install ad blockers. You can check your browser setup at EFF’s Cover Your Tracks site.

For system administrators

(Update 24 Dec 2024) If you have permission to write Firefox policies, the preference can be set and locked for all users. See turning off browser ad features from the command line.

Work in progress

This is a work in progress, will update as needed. If you’re looking for info on the Mr Robot TV show plugin I think this is totally gone by now and you should not have to do anything.

More on why to do this

why I’m turning off Firefox ad tracking: the PPA paradox How the design of PPA causes more privacy risks.

Firefox deletes promise to never sell personal data, asks users not to panic

Ad features in other browsers

Google Chrome ad features checklist covers how to turn off the ad stuff in Google Chrome (the easiest so far).

turn off advertising measurement in Apple Safari (the setting is buried under Advanced so do this one tip and congratulations, you’re an advanced user)

10 Jul 2024

Using GitHub Pages to host a locally built site

Something I recently ran into: How do you build a site locally and host the resulting built version of the site on GitHub Pages? Here’s one way to do it. First, have a make all target that builds the public files from the source pages (for example, pass CommonMark filenames to Pandoc to get HTML.) Then make deploy does several things:

  1. Make a fresh build directory and copy the source files, the Makefile, and the .git directory into it.
  2. Run make gh-pages inside the build directory. That target depends on the all target, so this builds the site inside the directory.
  3. Next, make gh-pages inside the build directory commits the public files on the gh-pages branch, then force pushes to GitHub.
  4. Back in make deploy, delete the build directory. That includes the .git directory inside, so no history of built files gets preserved.

# We make this site with "make" locally and deploy generated pages to GitHub in
# a branch. First, delete the build directory and the gh-pages branch. Then
# copy the site files into the build directory and make the gh-pages target
deploy : all
    (git branch -D gh-pages || true) > /dev/null 2>&1
    rm -rf build && mkdir -p build
    cp -a Makefile .git $(SOURCES) build
    $(MAKE) -C build gh-pages
    rm -rf build

# This target only runs inside the build directory and does a commit and push
# on the gh-pages branch. If you look at this project on GitHub you should see
# the original .md files on the main branch and the generated HTML files on the
# gh-pages branch.
gh-pages : all
    basename `pwd` | grep -q build || exit 1
    rm -f .git/hooks/pre-push
    git checkout -b gh-pages
    git rm -f $(SOURCES)
    git add -f $(PUBLICFILES)
    git commit -m "this is a temporary branch, do not commit here."
    git push -f origin gh-pages:gh-pages

This is from a Makefile for a very basic .org site. The -C option tells make to change to the given directory before reading the Makefile. (That site has versioning for the source files on GitHub too, but it doesn’t have to.)

Another option is to put the built files in a subdirectory, but then you end up committing the built files on the main branch: Using different static site generators · community · Discussion #21563

06 Jul 2024

Big Tech platforms: mall, newspaper, or something else?

The Pruneyard is “an iconic destination and experience designed to make the everyday extraordinary.” It’s also, according to the US Supreme Court, a business establishment that is open to the public to come and go as they please. The views expressed by members of the public in passing out pamphlets or seeking signatures for a petition thus will not likely be identified with those of the owner. Pruneyard Shopping Center v. Robins, in which a student group ended up being allowed to distribute their leaflets in some areas of the mall, is a Supreme Court case that keeps coming up in discussions about what kinds of regulation are appropriate for the “Big Tech” companies.

Is a Big Tech platform more like a mall, or more like a newspaper? Making the best case for the newspaper option, Mike Masnick at Techdirt asserts that the Big Tech companies have broad First Amendment rights that extend to cover many design and business model decisions. This point of view is pretty far outside the mainstream—if the First Amendment really extended that far into protecting business models, then the Fair Credit Reporting Act would be unconstitutional, and lenders and fintech companies would have the right to spread any kind of negative information about anyone (up to the generous limits of libel law). On the extreme end of the other side, of course, we hear from far right and far left politicians who want to shut down even classic First Amendment media such as newspapers and newspaper-like web sites.

In the middle, though, the tricky part is to figure out which online platforms are more like a mall, where permitting the people who use it to exercise their own personal rights is not an unconstitutional infringement of the platform owner’s rights, as covered in the Pruneyard decision…

Here the requirement that appellants permit appellees to exercise state-protected rights of free expression and petition on shopping center property clearly does not amount to an unconstitutional infringement of appellants’ property rights under the Taking Clause. There is nothing to suggest that preventing appellants from prohibiting this sort of activity will unreasonably impair the value or use of their property as a shopping center. The PruneYard is a large commercial complex that covers several city blocks, contains numerous separate business establishments, and is open to the public at large. The decision of the California Supreme Court makes it clear that the PruneYard may restrict expressive activity by adopting time, place, and manner regulations that will minimize any interference with its commercial functions.

…and when is a site more like a newspaper? It makes sense in principle to treat a privately owned platform as a mall in situations where it works like a mall, and people want or need to do the kinds of things they do in a mall, but how do you draw the line?

A state law writing assignment, direct from the Supreme Court

The mall or newspaper question has been in the news lately because of the Supreme Court’s decision in Moody v. Netchoice, which is largely a writing assignment for state legislators. (full decision PDF) The task that the Court has set for legislators is to come up with laws that can somehow take a whack at Big Tech’s multifarious villainy—mental health harms, child exploitation, national security threats, fraud, and all the rest—and to draft those laws in a way that clearly treats an online version of the Pruneyard Shopping Center, a place where the rights of the users outweigh the rights of the platform owner, differently from a site that works like a newspaper.

But instead of getting too far into the weeds on the mall or newspaper question, what about other options? Big Tech has qualities more similar to other businesses that are covered by other kinds of laws.

Regulation is justified by the scale and impact of the gambling and credit reporting industries. Even though a hypothetical Techdirt for slot machine designers or Techdirt for credit bureaus could stretch out a First Amendment argument to cover those two cases, in reality they’re regulated. State legislatures need to identify and regulate a Big Tech business practice in a way that will also hold up in court. Fortunately, there is one that’s far enough away from speech to clearly avoid First Amendment issues: surveillance, or cross-context tracking.

Cross-context tracking includes using pixels, SDKs, or “conversion tracking” APIs as a way for any business to send information about any person to the Big Tech company. (More about conversion tracking, from Consumer Reports). Cross-context tracking also takes place within Big Tech companies, when, for example, the same company owns both a video sharing platform and a web browser, and uses data collected by the browser to personalize ads on the video sharing platform.

Cross-context tracking affects hundreds of billions of dollars in advertising and content, as well as people’s opportunities in housing, employment, and other areas. So we should be able to avoid asking mall or newspaper? as much as possible and work toward building a consensus: a company that collects or accepts tracking data about people’s actions in one context in order to inform decisions that affect them in another context is holding itself out as a networked meeting place for other people’s transactions, and in need of licensing. The tracking is in no way expressive by the Big Tech company. It’s an inbound flow of data, not outbound, and the company has no knowledge or control of what the tracking events will be.

Just as states regulate and inspect any business with a gambling or liquor license, a state licensing agency for cross-context tracking would be able to handle citizen complaints about Big Tech platforms. Even if you believe that cross-context behavioral advertising is a net economic win, the people subject to it face more than enough risks to justify a state-level licensing system. For the state whose legislature can get cross-context tracking licenses into law, it’s a win-win:

  • Licensing decisions for cross-context tracking could take into account policy goals such as requiring adequate staffing for security support.

  • Costs to the state government could be minimized. A company would be required to disclose its own cross-context tracking in its license application, so state regulators wouldn’t have to go try to figure it out. And objections to licenses could be raised through public comment forms and meetings.

  • A company could cleanly avoid regulation by staying out of the cross-context tracking business and not applying for a license.

  • Licensing could promote fair competition, because the tracking that requires a license would be across contexts, not owners. A Big Tech company that gained a new context, such as a site or app, by acquisition would need to update its license.

A licensing bill covering cross-context tracking could get support from a variety of interests, including people who already believe in the harms of cross-context tracking and want fewer, better-run companies doing it, and people who are neutral or even feel positively about the tracking part but want to use future public meetings about tracking license renewals as a way to get Big Tech to improve their behavior toward their state’s citizens and businesses. Small business owners could show up at a license renewal meeting and have real impact, not just get rounded up to advocate for Big Tech and against privacy one day a year while their support tickets go to oblivion the rest of the time. And maybe, as Big Tech platforms increasingly resemble the spawn of the arbitrary top-down decision-making of a credit report and the psychological manipulation of a slot machine, the answer to the mall or newspaper question is neither.

More: surveillance licensing in practice

30 Jun 2024

Return of the power user

It’s all coming together. The PC power user is back. In the early days of small business microcomputers, the person with a subscription to Byte, and some knowledge of topics like batch files and how to get support for a printer driver, was able to bring the value of the office computer up from, say, a 2 on a scale of 1 to 10 to maybe a 4 or 5. The desktop PC options kept getting better, though, so by the time Microsoft got to Windows XP in the early 2000s, the small office PC was more like a 6 or 7 without tweaking, and power user skills made less of a difference. No more DIP switch settings to learn! When you plug in a printer it works out of the box! You can get updates over the Internet! And the small business Internet scene, for a while, was solidly in Web 2.0 create more value than you capture territory. All that added up to much less value returned from the time invested to become a power user.

But now the balance is shifting again. Now the small office or home office PC is more of a point of sale device, loaded with surveillance software, compliance risks, and SaaS upsells. The peripherals work, in a sense, but they don’t work so much for you as for some far-away product manager who needs to nail their OKRs to get promoted and afford a down payment on a house. (Economics experiment I’d like to try: make middle-class housing affordable on an honest IT salary, to test my hypothesis that enshittification would go down.) And the small-business-facing Internet is a more or less wretched hive of scum and villainy, from fairly mild shelfware without the shelf schemes, all the way to actively heinous stuff like sending your marketing budget to terrorists.

But the underlying PC hardware is still getting way better. And Internet service is getting faster with lower latency, and the best of the software you can easily get is still getting way better. Which means a bigger gap between baseline and advanced configuration, so a bigger win from learning power user stuff. Yes, I’m a Linux nerd, but you can probably get your computer into an acceptable state without switching OSs. (I got a Microsoft Windows laptop for work, and the two OSs have gotten a lot more similar. On Linux a video conference is more likely to just work, and on Windows, stuff is more likely to break because somebody got bored with it.) So if in the 1980s you could bring an office PC up from a 2 to a 5, and in the 2000s you might not get much noticeable change, now you can bring your PC from negative territory up to a 7 or 8. Looks like a big win, even if you don’t count the payoff from scam protection. More: effective privacy tips

28 Jun 2024

Money bots talk and bullshit bots walk?

Now that AI can beat a Turing test by bullshitting, what’s the next test? In Prediction Market Trading as an LLM Benchmark, Jesse Richardson suggests that setting up an LLM to trade on a prediction market (e.g. Polymarket, which is the platform I’ll talk about here) could be a particularly strong benchmark with a number of desirable properties. Scott Alexander also suggests prediction markets as a useful challenge for bots.

Seems like a good idea. The best part about prediction markets as a benchmark is scaling. One skilled prediction market trader can end up on the winning side of trades with a large number of low-skill traders, human or bot. LLM benchmarks that depend on evaluating generated text are much harder to scale, and it’s usually easier to bullshit than to detect bullshit. To make a market-based pass/fail test, give the bot a stake and charge it interest on its stake and rent for the computing resources it uses. A bot passes the test if it can stay solvent for the agreed-upon time in a market with human traders. Even if bots can’t pass the test, they might still have their uses, since they can help add liquidity to corporate prediction markets and incentivization markets. (More: boring bots ftw)
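To make the pass/fail rule concrete, here is a minimal Python sketch of the solvency check. The accounting is an assumption made for illustration: interest is charged each period on the original stake, rent is a flat fee per period, and the bot fails as soon as its balance goes negative. None of these details come from the posts linked above.

```python
def stays_solvent(stake, interest_rate, rent, trade_pnl):
    """Return True if the bot's balance never drops below zero.

    stake         -- starting balance lent to the bot
    interest_rate -- interest per period, charged on the original stake
    rent          -- flat charge per period for computing resources
    trade_pnl     -- trading profit or loss for each period, in order
    """
    balance = stake
    for pnl in trade_pnl:
        balance += pnl                    # result of this period's trades
        balance -= stake * interest_rate  # cost of capital
        balance -= rent                   # cost of compute
        if balance < 0:
            return False
    return True
```

A bot that never trades fails eventually because the interest and rent drain its stake, so just sitting on the money is not a passing strategy.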

In practice this market test would not be a pure prediction benchmark, but would have some text generation aspects, too. A bot would get a substantial advantage by explaining its trades afterwards—talking up its positions, and telling other traders why they should pay more than the bot did. But the bot’s ability to explain its actions in a persuasive way does not have to be tested just by looking. It can be evaluated indirectly by looking at how well the bot can persuade other traders.

The big problem with building an open ecosystem for AI is the cost of benchmarking. Even when it is possible to train a model at low cost, evaluating that model depends on either hiring large numbers of human reviewers, or paying for access to a larger model. In order to get to the own your own AI stage, the benchmarking budget problem needs to be addressed early on. Integrating an incentivization market with existing open-source collaboration infrastructure (Pinfactory connects to a GitHub project pretty easily, just saying) could be a good start. More: boring bots ftw

24 Jun 2024

This site is opted out of AI. Perplexity.ai crawled it three times

Quick test of the perplexity.ai thing. From here it looks like they ignored this site’s 3 layers of AI protection:

  • robots.txt
  • X-Robots-Tag HTTP header
  • noai meta tag

and that they’re crawling from three different IP addresses with three different user agent strings, which look like normal browsers.

The test I did: I have some pages on this site that are not linked to from anywhere, so I tested to see if this company would crawl one. (Mostly just random stuff that I have shared with friends but not blogged.)

My perplexity.ai query:

Can you explain how to play the board game [game name] which is covered by the ruleset at [URL]?

and that URL (which gets a couple of hits on game nights, that’s it) was immediately visited by not one, not two, but three different IP addresses.

44.221.181.252 “Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36”

45.56.133.241 “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36”

134.73.83.233 “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36”

Why three? The first reason I can think of is to try to evade a block, but really? Any better explanations?
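To repeat this kind of test on another site, something like the following Python sketch can pull the visitors to one unlinked URL out of an access log. It assumes the server writes the Apache "combined" log format; the function name and the regular expression are illustrative, not taken from this site's setup.

```python
import re
from collections import defaultdict

# Apache "combined" log format; only the fields used below are captured.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def visitors(log_lines, path):
    """Map each client IP that requested `path` to the user agents it sent."""
    seen = defaultdict(set)
    for line in log_lines:
        m = LINE.match(line)
        if m and m.group("path") == path:
            seen[m.group("ip")].add(m.group("agent"))
    return dict(seen)
```

Run it over the log right after submitting the query. Any entries for a page that nobody else knows about came from the service being tested.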

16 Jun 2024

happy Father’s Day, here’s a Dad joke

Ready? Joke time. Here’s an old one.

What’s the difference between a donut and a turd?

I don’t know.

Remind me never to send you out for donuts.

What reminded me of that joke is all the surveillance advertising companies going on about how surveillance advertising is so good for small businesses. But if they have so much trouble telling small businesses and fraud apart, how can they know? Maybe surveillance ads are just better for fraud. The interesting comparison to make is not between a legit business at times they have surveillance advertising on or off, because the scammers competing to reach the same customers are leaving the surveillance ads on. IMHO you have to look from the customer side. If surveillance advertising helps legit companies reach people who can benefit from their products, then people who use ad blockers or privacy tools should be less happy with the stuff they buy.

Instead, people who installed ad blockers for a study turned out to be less likely to regret their recent purchases, and that’s surprising enough to be worth digging into. Maybe it’s not fraud, just drop-shippers. Lots of drop-shippers/social media advertisers are finding existing cheap products, marking them up, and selling using surveillance ads. It’s not illegal, but the people who click the ads end up paying more money for the same stuff. Maybe the reason that the ad blocker users are happier as shoppers is that they search out and buy, say, a $20 product for $20 instead of paying a drop-shipper $99? Or maybe ad blocker users are just making fewer but better thought out purchases?

15 Jun 2024

links for 15 June 2024

Just some reading material, more later. I did mess with the CSS on this blog a little, so pages with code on them should look a little better on small screens even if you have to scroll horizontally to see the code.

The Eclipse of the Russian Arms Market China is entering the market for traditional Russian products.

‘Devastating’ potential impact of Google AI Overviews on publisher visibility revealed (This is strange. Right at the time Google needs all the support they can get for their unpopular privacy and antitrust positions, they’re taking action against everyone else on the web. Not sure what the plan is here.)

Which top sites block AI crawlers? All in all, most sites I looked at don’t care to have their content used to train AI. (IMHO this will be a big issue with the Fediverse—currently the only way to pass a noai signal is to defederate. I made a FEP (fep-5e53) so will see what happens.)

Why First Party Data May Not Save Digital Advertising (This is why it’s going to be better to get real consent, later, from fewer people than bogus consent based on zero information about the brand or publisher.)

AI won’t kill ad agencies. Here’s why. Why? Because an agency can amortize the cost of expertise across multiple different paying clients.

United Airlines wants to show you personalized seatback ads: Here’s how to opt out (Meanwhile, other airlines are getting rid of heavy seatback entertainment systems to save fuel, since passengers are bringing devices with better screens anyway.)

“Your personal information is very important to us.” (XScreenSaver for Android has a privacy policy now.)

Economic Termites Are Everywhere [E]conomic termites…are instances of monopolization big enough to make investors a huge amount of money, but not noticeable enough for most of us. An individual termite isn’t big enough to matter, but the existence of a termite is extremely bad news, because it means there are others. Add enough of them up, and you get our modern economic experience.

Tesla may be in trouble, but other EVs are selling just fine (How much of this is the brand personality and how much is the problem that Teslas are expensive to insure? I think every car I have ever owned ended up costing a lot more in car insurance than its price.)

Facebook’s Taylor Swift Fan Pages Taken Over by Animal Abuse, Porn, and Scams (Moderation is the hard part of running any online forum, and AI moderators are the new self-driving cars.)

You Can Still Die From World War I Dangers in France’s Red Zones (This is why Europe has an AI Act. They have more important problems than building robots to take people’s art. Putting limits on luxury and counterproductive uses of AI will free up money and developer time for the stuff they really need. Before people in the USA get mad about this, remember we did it too. There’s no such thing as a 1943 Cadillac Coupe de Ville.)

We need to rewild the internet For California residents, GPC automates the request to “accept” or “reject” sales of your data, such as cookie-based tracking, on its websites. However, it isn’t yet supported by major default browsers like Chrome and Safari. Broad adoption will take time, but it’s a small step in changing real-world outcomes by driving antimonopoly practices deep into the standards stack — and it’s already being adopted elsewhere.

12 Jun 2024

SPC: an X-Robots-Tag for privacy control

It’s easy to mock generative AI for weird stuff like telling people to put glue on pizza, inspiring a reporter to write a story about making glue pizza, then summarizing the story. But there is a serious side to the AI issue. Protecting the content of a web site from AI training is not just about trying to avoid market competition with copied and scrambled versions of your own content. (It’s not fair use, seriously, just read the actual four factors of fair use or ask a librarian. It’s just not a thing.) When you put parts of your personal life on your web site, the blurry compressed version of it that AI spews out has other, more personal, risks too. Nonconsensual Nude Apps are just the beginning. AI companies, operating at large scale with little human moderation, process people’s personal info in irresponsible or illegal ways. MSN boosted an AI-generated article stating that an Irish DJ and talk-show host was on trial over alleged sexual misconduct.

AI-specific laws are still in progress, and copyright cases are still making their way through the court system. I still don’t know if all the stuff I did to block AI training on a web site is going to be enforceable—it depends on how well web site Terms of Service hold up in court as contracts. But in the meantime we do have a tool that is already in place and tested. An Opt-Out Preference Signal like Global Privacy Control is a way to signal, in a legally enforceable way, that you opt out of the sale or sharing of your personal information.

GPC already protects residents of California, Colorado, Connecticut, and other states in the USA, and enforcement is coming on line in other jurisdictions as well. Sounds like a useful tool, right? But there’s one missing piece. The GPC standard covers a signal sent from the client to the server. When you visit a site as a user, this is just fine. But when you need protection for a blog, a portfolio, or a profile page, your personal info is on a server, but the company looking to exploit it is running a client (a crawler or scraper). That’s where we need to borrow some basics from the methods for blocking AI training on a web site and add a meta tag and HTTP header that work like GPC, from server to client.

The header is pretty easy. I just did it. Have a look at this site’s HTTP headers in developer tools or do a

curl -s -I https://blog.zgp.org/ | grep -i X-Robots-Tag

and there it is.

X-Robots-Tag: noai, noimageai, SPC

And here’s the meta tag.

<meta name="robots" content="noai, noimageai, SPC">
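From the crawler side, honoring the signal would come down to reading the directive list. Here is a minimal Python sketch, assuming the value is a simple comma-separated list as shown above. The function names are made up, and this is only an illustration of how a client could read the proposed SPC directive, not a description of what any existing crawler does.

```python
def robots_directives(value: str) -> set:
    """Split an X-Robots-Tag value or robots meta content into directives."""
    # Directives are compared case-insensitively, so lowercase them.
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def asserts_spc(value: str) -> bool:
    """True if the server-side privacy opt-out (SPC) is present."""
    return "spc" in robots_directives(value)
```

X-Robots-Tag values can also be scoped to a specific crawler with a name prefix, which this sketch does not handle.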

TODO items

  • Colorado has a process for registering OOPSs, so I will need to write this up and submit it so it’s valid there. In other jurisdictions the OOPS is valid as long as it expresses the deliberate opt-out of the user, which mine does.

  • Just to make it extra clear, I need to put something in my Web Site User Agreement, the way a lot of sites do for noai.

  • continue to GPC all the things!

Remember that laws are downstream of norms here. People generally believe in moral rights and some kind of copyrights for people who do creative work, and people generally believe in some kind of privacy right to control use of your personal information. And you shouldn’t be at a disadvantage when you choose to share personally relevant content on your own site compared to people who share on a big service. The details will get worked out. Big AI will probably be able to make bogus legal arguments, delay, and lobby for a while, but in the long run the law will reflect norms more than it reflects billable hours spent trying to push a disliked business model uphill. Comments and suggestions welcome.

09 Jun 2024

Block AI training on a web site

(Update 24 Sep 2024: originality.ai has a page showing the trend of increasing AI crawler blocking)

(Update 14 Jun 2024: Add darkvisitors.com API and GPC.)

I’m going to start with a warning. You can’t completely block “AI” training from a web site. Underground AI will always get through, and it might turn out that the future of AI-based infringement is bot accounts so that the sites that profit from it can just be shocked at what one of their users was doing—kind of like how big companies monetize copyright infringement.

But there are some ways to tell the halfway crooks of the AI business to go away. Will update if I find others.

robots.txt

Dark Visitors - A List of Known AI Agents on the Internet is a good source of an up-to-date set of lines to add to your robots.txt file.

This site uses the API to catch up on the latest. So if I fall behind on reading the technology news, the Makefile has me covered.

# update AI crawlers blocking list from darkvisitors.com
tmp/robots.txt : 
        curl -X POST "https://api.darkvisitors.com/robots-txts" \
                -H "Authorization: Bearer $(shell pass darkvisitors-token)" \
                -H "Content-Type: application/json" \
                -d '{"agent_types": ["AI Data Scraper", "AI Assistant", "Undocumented AI Agent", "AI Search Crawler"], "disallow": "/"}' \
        > $@

# The real robots.txt is built from the local lines
# in the conf directory, with the
# darkvisitors.com lines added
public/robots.txt : conf/robots.txt tmp/robots.txt
        cat conf/robots.txt tmp/robots.txt > $@

One of my cleanup scripts gets rid of the tmp/robots.txt fetched from Dark Visitors if it gets stale, and I use Pass to store the token.
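After a build, it is worth confirming that the combined file really says what you think it says. A minimal Python sketch using the standard library's robots.txt parser follows. The function name is hypothetical, and GPTBot is only an example of a crawler name that such a list might contain.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """True if robots_txt disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

Read public/robots.txt into a string and call is_blocked(text, "GPTBot") for each crawler you care about. This only checks what the file asks for, not whether a crawler obeys it.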

X-Robots-Tag HTTP header

DeviantArt covers how to set the X-Robots-Tag header (which also has other uses to help control how search engines crawl your site) to express an opt-out.

On Apache httpd (I know, I’m old school) it’s something like this:

Header Set X-Robots-Tag "noai"

You can check it under “network” in browser developer tools. It should show up in response headers.

noai meta tag

Raptive Support covers the noai meta tag. Pretty easy, just put this in the HTML head with any other meta and link elements.

<meta name="robots" content="noai, noimageai">

That support FAQ includes a good point that applies to all of these—the opt out is stronger if it’s backed up with the site Terms of Service or User Agreement. Big companies have invested hella lawyer hours in making these things more enforceable, and if they wanted to override ToS they would be acting against their other interests in keeping their sites in company town mode.
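To check that the tag made it into a built page, a short Python sketch with the standard library's HTML parser can list the robots directives it finds. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives |= {
                d.strip().lower() for d in content.split(",") if d.strip()
            }


def meta_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMeta()
    parser.feed(html)
    return parser.directives
```

If the result for a page is missing noai or noimageai, the template is probably not putting the tag in the head of that page.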

new: privacy opt out for servers

This is the first site to include the new SPC meta tag and X-Robots-Tag header for a privacy opt-out that works like Global Privacy Control but for servers. Basically, you have legally enforceable rights in your personal information, and blogs have personal information, but regular GPC only works from your browser (the client) to a company on the server. This goes the other way, and sends a legally enforceable* privacy signal from a personal blog on the server to an AI scraper on the client side.

*Yes, I know, this has not yet been tested in court, but give it a minute, we’re just getting started here.

So the new header on here is

X-Robots-Tag: noai, noimageai, SPC

So we’re up to four, somebody send me number five?

08 Jun 2024

business recommendations

Since there’s a search quality crisis on, a lot of the companies you might find on social media are scams, and a lot of the stuff sold on big retail sites is fake, here are some real businesses I can recommend in several categories. Will fill in some more.

I personally know about all of these and would be happy to answer questions.

art, crafts, gifts

Modern Mouse (A place for local artists and artisans to sell their work.)

audio gear

Sweetwater is a good source of pro/semi-pro/office/podcasting equipment. Good support for checking compatibility and other questions.

auto body shop

Fix Auto Alameda No problems with the work, good communications, cooperated well with the insurance company.

books

Books Inc (Several Bay Area locations including SFO. If they don’t have it they can order it. Better than ordering from the big online bookstore because they only get books from the real distributor. You will only get legit copies, not those cheaply printed pirate versions that pay no royalties to the author.)

burritos

Island Taqueria 1313 Park St., Alameda. (Bay Area’s best burritos. All the classic menu items people move to the Bay Area for, consistent quality. El Gran Taco in San Francisco would have been a contender but they’re gone now.)

car repair

Fred’s Wrenchouse has kept a 22-year-old car going for me. (They also provide good recommendations for shops that do the work they don’t.)

car stereo

Crutchfield is famous for compatibility charts and support. Will sell you the right adapters and stuff to do an install. (If you haven’t put in a car stereo using the stuff they supply, it’s a little harder than building Ikea furniture but easier than an electronic kit or advanced Lego set.)

chocolate chips

I use Guittard Semisweet Chocolate Chips or Extra Dark Chocolate Chips in my recipe.

cookies (third party)

Newman’s Own makes a better chocolate/white stuff/chocolate cookie than Oreo®. (Which is good, because the owner of Oreo cookies, Mondelez International, was recently removed from the list of royal warrants in the UK because of the company’s activities in Russia.)

delicatessen

Zingerman’s Deli (mail order available)

earbuds

JVC Gumy HAFX7 These really sound good and come with a set of silicone ear pieces in different sizes, so in real-world listening situations they sound better than more expensive options that don’t fit as well. (In my experience most drama and waste from electronic devices are caused by apps, firmware, Terms of Service, radios, and batteries. These have none of those.)

electrician

S O T Electric Inc did a really good job with a major upgrade to an existing house electrical system.

fabric

Mood Fabrics is a huge store in New York City, with lots and lots of options. No sign of it from street level. Allow plenty of time. (Have only shopped there in person, have not ordered online.)

Stonemountain & Daughter Fabrics in Berkeley, California has a great selection of fabric and really good service. (Have only shopped there in person, have not ordered online.)

flour

King Arthur Baking is recommended by people who know a lot more about flour in recipes than I do. The company is employee-owned and provides customer support.

hardware

Encinal True Value Hardware

Paganos Hardware

irrigation

The Urban Farmer Store

Internet service

monkeybrains.net

pasta sauce (in jars)

Newman’s Own makes a Sockarooni Sauce that is chunky and versatile, and also priced comparably to the big brands and less than the fancy brands.

pharmacy

Versailles Pharmacy 2801 Encinal Ave., Alameda.

plants

Annie’s Annuals and Perennials

Encinal Nursery has good local recommendations for outdoor plants in season, along with an interesting selection of healthy houseplants in the back.

plumbing

Gladiator Plumber 1752 Timothy Drive, San Leandro. Have used them for everything from small jobs up to a water heater install. Easy scheduling and consistent good work.

power adapters and cables

Anker makes Lightning and USB cables that last longer than the Apple ones. Better overmolding and strain relief so they don’t get all frayed and crappy. Also good for power bricks and chargers.

roofing

Planchon Roofing & Siding Co did a house and garage roof plus a minor repair, all good.

shoes

Allen Edmonds (available at Nordstrom)

sidewall shingling

Nica Sidewall Shingling

stereo repair

Champlifier

transportation

Transbay buses operated by AC Transit are the best way in and out of Alameda unless you have something to do right by the ferry terminal or someone is meeting you there. Lots of stops convenient to the Alameda businesses linked to here, and you will generally get a deluxe bus with comfy seats on a Transbay route.

06 Jun 2024

some good recent links

Just in case you have a script for finding interesting links, here are some links from mine…

Parable of the Sofa It seems blindingly obvious that an economy with a higher proportion of lifestyle businesses is going to be more resilient, more humane, and immensely more pleasant than the one that the Leaders Of Industry are trying to build. How would we get there from here?

Lord Kelvin and His Analog Computer On Thomson’s tide-predicting machine, each of 10 components was associated with a specific tidal constituent and had its own gearing to set the amplitude. The components were geared together so that their periods were proportional to the periods of the tidal constituents. A single crank turned all of the gears simultaneously, having the effect of summing each of the cosine curves.

Solar Passes 100% of Power Demand in California! [UPDATED] (electricity prices going negative regularly is going to be a big opportunity)

What is the Cara app, and why are artists deleting Instagram for it? (nifty image sharing site with built-in poisoning for ML training)

Online Privacy and Overfishing What seems normal to us in the security community is whatever was commonplace at the beginning of our careers.

One Facebook Ad Promotes a For-Profit College; Another a State School. Which Ad Do Black Users See? (algorithmic discrimination is already a hard problem to track down—and so-called privacy-enhancing ad personalization systems are just making it harder.)

The Moral Economy of the Shire From everything we’re told, the Shire is a very agriculturally productive region, which helps explain the lack of debt-peonage or other forms of unfree labor. It also explains the relative “looseness” of the system we’re looking at here; the gap between the lower gentry and upper yeomanry isn’t very large, and most families are able to support themselves with only minimal assistance.

New zine: How Git Works! (memo to self: order this)

Amazon Sold Fake Copies of Hotly Anticipated UFO Book (more news from the Big Tech #pivotToCrime. When Microsoft needed support in its antitrust courtroom drama, the MS-Windows OEMs and ISVs were right there. Amazon, Meta, and Google need support now—but they’re telling the content business to go eat a turd.)

Video Shows China’s Rifle-Equipped Robot Dog Opening Fire on Targets (If you thought wow, AI means we don’t have to hire as many content moderators! was big news, get ready for wow, AI means our country’s army will be able to get by without conscription! stories)

Origins of the Lab Mouse The early supply of mice for research depended on a late-19th century community of hobbyists—fanciers—who collected, bred, and sold unusual mice varieties. These “fancy” varieties were then standardized in the 1920s…

WTf Quora: how a platform eats itself As Quora pursued AI-driven enhancements, things got weird. (At the very beginning of Quora, they tried populating it with bot-written questions. Then they cut back, it went through a human user phase, now back to crap.)

HouseFresh has virtually disappeared from Google Search results. Now what? (hard to swallow pills for Google management: employee hoarding and union contracts cost money, but are cheaper than trying to run a company in a layoffs-scheming-quality-revenue-layoffs doom spiral)

The Tigers at the Gate: Moving Privacy Forward Through Proactive Transparency GPC is easy to set up and listen for because it is a simple HTTP header. Perhaps too simple as it only conveys whether the choice mechanism is turned on or off (GPC=1 or GPC=0). Unlike the more complex IAB EU’s Transparency and Consent signal (TC String) signal itself does not encode information about the source of the opt-out signal, or provide details about how the signaling mechanism was implemented or presented to users. (imho this is a win. You have to respect GPC, but you can’t trust a sketchy site or adtech intermediary to set GPC correctly, therefore you can’t deal with sketchy sites or adtech intermediaries.)

The deskilling of web dev is harming the product but, more importantly, it’s damaging our health – this is why burnout happens – Baldur Bjarnason You’re expected to follow half-a-dozen different specialties, each relatively fast-paced and complex in its own right, and you’re supposed to do it without cutting into the hours where you do actual paid web development. Worse yet, you’re not actually expected to use any of it directly. Instead you’re also supposed to follow the developments of framework abstractions that are layered on top of the foundation specialties, at least doubling the number of complex fields a web dev has to follow and understand, right out of the gate. (I don’t know about you, but this site has a basic HTML template, Pandoc, and a Makefile. That’s about it.)

A (tiny, incomplete, single user, write-only) ActivityPub server in PHP (lots of good news from the Fediverse. If it didn’t remind me of the original web already, it now has me pre-ordering from O’Reilly like it’s 1995 or something. Real 1995, not The Radiant Future! (Of 1995))

Google’s Protected Audience Protects Advertisers (and Google) More Than It Protects You (If you have Google Chrome, you can still turn this stuff off: Google Chrome ad features checklist)

Why I went back to buying CDs (and you should too) The integrity of my audio library had been corrupted, at least in small ways. Horns were easy to spot, but how many other songs or albums had been messed with, without my knowledge? It turns out, way more than I had thought.

Google Researchers Say AI Now Leading Disinformation Vector (and Are Severely Undercounting the Problem) As bad as the AI-generated images problem is according to the paper, it is very possibly much worse because the paper’s data sample is relying on public fact checker data, who are not selecting AI-generated image-based disinformation randomly. Sites like Snopes and Politifact, which have limited resources, are focusing on fact checking images that have earned some degree of virality or news coverage, so their fact checks serve a purpose or an audience.

Personal Blocklist (not by Google) (useful browser extension to remove sites from search results when they’re better at SEO than actual content)

Elon Musk’s Gifts to Web Scrapers (Guest Blog Post) [B]y providing a foil in litigation against both the Center for Countering Digital Hate (“CCDH”) and Bright Data (the world’s largest seller of scraped data), he’s given judges in the most important district court in the country for tech legal issues, the Northern District of California, plenty of motivation to rule against him. As a result, judges have provided two landmark opinions in the last 45 days in favor of web scrapers. This creates powerful new precedent that will make it easier for web scrapers to prevail in litigation and will make it much harder for websites to prevent scraping.

01 Jun 2024

cheese or woodstain?

It has come to my attention that any blog that mentions advertising must do a post including the expression Does Exactly What it Says on the Tin, so here is mine. Following up on the 30-40-30 rule, why are some people so fired up about personalized advertising, while others aren’t? Maybe it goes back to what kind of shopping use cases they’re optimizing for.

Phillip Nelson, in Advertising as Information, divides brand qualities into search qualities and experience qualities. A search quality is something you can check before buying the product, like tasting a sample of cheese. An experience quality is something you have to spend more time figuring out, like seeing if your woodstain dries in the time printed on the tin. Shopping for cheese and woodstain are a lot different.

Cheese shopping

  • Cheeses are similar as far as nutrition goes, so picking one is a matter of personal preference.

  • Cheese is easy to evaluate at the point of purchase. Mmm, sample cheese on a toothpick.

  • My own cheese-tasting palate is a better guide for me than the opinions of a cheese expert.

  • Cost of a mistake is low.

  • Top priority: getting the best-matched product among a set of alternatives in a narrow quality range.

Woodstain shopping

  • Has quality metrics that are not different from person to person.

  • Hard to evaluate at the point of purchase. You have to do your project and wait for it to dry (or not?)

  • The knowledge of a woodstain expert is more valuable to me than how I might feel about a certain brand of woodstain at the hardware store.

  • Cost of a mistake is high.

  • Top priority: avoiding a low-quality or deceptively sold product.

If you’re shopping for parts to build a PC, the mouse is cheese, the power supply is woodstain, and the video card is somewhere in the middle. If you’re buying a car, or a bike, or a pair of boots, it kind of depends on the ratio of your net worth and your budget for the item. Buyers who have a lot of money relative to the price of the product are more likely to be buying cheese, buyers who are sinking a lot of their assets into the purchase are buying woodstain.

Andrew Chen says that AI will reinvent marketing because it makes it possible to do a personalized, automated sales call for every possible purchase. The cost of personalization relative to the cost of the actual product goes down. This might be great for cheese shoppers. Imagine an AI that understands my cheese palate so well that it will suggest the yummiest possible cheese for me, every time. But automated, personalized communications sound terrible for woodstain shoppers. When it’s harder to evaluate the product, personalization just facilitates more kinds of deception, and public, signaling-based advertising might be more appropriate.

24 May 2024

boring bots ftw

Scott Alexander writes, about bots on prediction markets,

Most of these bots are boring. They’re bots programmed to automatically buy some market once the price gets low enough, or to arbitrage basically-identical markets, or do some other technical finance maneuver.

From the point of view of an active prediction market, where a lot of well-informed traders are speculating about well-known events, those bots are not especially interesting.

The place where the boring bots do make a big difference, though, is in incentivization markets.

An incentivization market is like a prediction market except that one trading strategy is to make an event being traded on either happen or not happen.

Some markets can be both. (A use case for Policy Analysis Market would have been for someone with advanced knowledge of a terrorist attack to trade on their knowledge, and create a price signal that could prevent the attack.)

One problem for incentivization markets to get over is the large number of thinly traded contracts. So a boring bot would be just what you need for things like:

  • trade across likely duplicate and dependent issues to create fewer and more lucrative opportunities for human experts (can be within or across projects)

  • bid up the price of FIXED based on encouraging CI results, enabling developers to get out of all or part of a position early

  • front-run issues on behalf of a developer based on their interests and available time

The market helps compensate for erratic LLM behavior—an unproductive bot will lose its stake and get shut down. A bot doesn’t have to be run for max earnings, either. An arbitrage bot, for example, could break even or get subsidized to lose a little to keep the market smooth.

Incentivization markets need more noisy traders, and LLMs need a cheap way to evaluate whether they’re doing something sensible. Seems like a cookies and milk situation.

04 May 2024

an easy experiment to support behavioral advertising (or not)

This is a follow-up to a previous post on how a majority of US residents surveyed are now using an ad blocker, and how the survey found that privacy concerns are now the number one reason to block ads.

Almost as long as Internet privacy tools have been a thing, so have articles from personalized ad proponents telling us not to use them, because personalized ads are good actually. The policy debate over personalized (or surveillance, or cross-context behavioral, or tracking-based, or whatever you want to call it) advertising seems to keep repeating an endless argument that on the one hand, personalized advertising causes some risk or cost (I’m not going to summarize the risks or costs here, go read Bob Hoffman’s books or Microtargeting as Information Warfare for more info) but on the other hand we have to somehow balance that against the benefits of personalized advertising.

Benefits? Let’s see them. The claim that cross-context behavioral advertising is good for consumers should be straightforward to test. If ad personalization really helps match buyers and sellers in a market, then users of privacy tools and privacy settings must be buying worse products and services. Research should show that the more privacy options you pick, the less happy you are with your stuff. And the more personalized your ad experience is, the more satisfied a customer you are. This is different from asking whether or not people prefer to have ad personalization turned on. That has been pretty extensively covered, and the answer is that some people do, and some people don’t. This question isn’t about whether people like personalized ads or not, it’s about whether people who get more personalized ads are happier with how they spend their money.

This should be a fairly low-cost project because in general, the companies that do the most personalized advertising are in the best position to do the research to support it. Are users of privacy tools and settings more or less satisfied with the products and services they buy than people who leave the personalized ad options on?

  • Do privacy-protected users give lower ratings to the products they buy?

  • Do privacy-protected users return or stop using more of their purchases?

  • Are privacy-protected users more likely to buy a replacement, competing product after an unsuccessful first purchase in a category?

  • Are privacy-protected users more likely to agree with general statements about a decline in quality and trustworthiness in business in general?

The correlation between more privacy and less satisfied consumer would be detectable from a variety of angles. Vendors of browsers with preferences that affect ad targeting should be able to show that people who turn on the privacy settings are somehow worse off than people who don’t. Anti-adblock companies do research on ad blocker users—so how are shopping experiences different for those users? Any product that connects to a server for updates or telemetry is providing data on how long the buyer chooses to keep using it. And—the biggest opportunity here—any company that has an Apple iOS app (and that’s a lot of companies) should be able to compare satisfaction metrics between customers with App Tracking Transparency (ATT) on or off.

Ad platforms, search engines, social network companies, and online retailers all have access to the needed info on ads, privacy settings, locations, and purchases. Best of all, they’re constantly running customer surveys and experiments of all kinds. It would be straightforward for any of these companies to run yet another user satisfaction survey, to prove what should be an obvious, measurable effect. I’m really looking for any kind of research here, whether it’s a credit card company running a SQL query on existing data to point out that customers with iOS app tracking turned off have more chargebacks, or a longer-term customer satisfaction study, anything.
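
To show how small the chargeback version of that query could be, here is a sketch using Python's built-in sqlite3 module. Everything about the data is an assumption: the purchases table, its two columns, and the rows in the usage note are made up for illustration, not drawn from any real data set.

```python
import sqlite3

def chargeback_rates(rows):
    """Compute the chargeback rate for each tracking setting.

    rows: iterable of (tracking_allowed, had_chargeback) pairs, each 0 or 1.
    Returns a dict mapping tracking_allowed to the share of purchases charged back.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE purchases (tracking_allowed INTEGER, had_chargeback INTEGER)"
    )
    db.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = db.execute(
        "SELECT tracking_allowed, AVG(had_chargeback) "
        "FROM purchases GROUP BY tracking_allowed"
    ).fetchall()
    db.close()
    return dict(result)
```

With invented rows such as [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0)], the function reports one rate per group. A real study would also need to control for who chooses which setting, which a two-column table cannot do.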

looking at the data we do have

The Consumer Welfare Effects of Online Ads: Evidence from a 9-Year Experiment | NBER Facebook users who had ads turned off entirely were asked to place a monthly valuation on Facebook, and the amount was not significantly different from the valuation given by users with ads turned on. (The researchers did not test personalized vs. random ads, just ads on/ads off)

Consumer Surveillance and Financial Fraud | NBER showed that Apple App Tracking Transparency, which reduces the ability of apps to personalize ads, tends to reduce fraud complaints to the FTC. We estimate that the reduction in tracking reduces money lost in all complaints by 4.7% and money lost reported in internet and data security complaints by 40.1%.

Balancing User Privacy and Personalization by Malika Korganbekova and Cole Zuber. This study simulated the effects of a privacy feature by truncating browsing history for some Wayfair shoppers, and found that people who were assigned to the personalized group and chose a product personalized to them were 10% less likely to return it than people in the non-personalized group.

The Welfare Effects of Ad Blocking by Lin et al. was different—members of the treatment group got an ad blocker affecting all sites, not just one retail site.

[P]articipants that were asked to install an ad-blocker become less likely to regret recent purchases, while participants that were asked to uninstall their ad-blocker report lower levels of satisfaction with their recent purchases.

The ad blockers used in that study, however, were multi-purpose ones such as uBlock Origin that block ads in general, not just personalization.

The effect of privacy settings on scams goes two ways: you can avoid being specifically targeted for a scam, but you can also get more scam ads by default if you feed in too little info to be targeted for the good ads.

The Internet as a whole varies much more in seller honesty than the Wayfair platform does, which might help explain the difference in customer satisfaction seen between the Korganbekova and Zuber paper and the Lin et al. paper. Lin et al. showed that people were more satisfied as customers when receiving fewer ads in total, but they might have been less satisfied if they received more of the lower-quality ads that you’re more likely to get if adtech firms don’t have enough data to target you for a bigger-budget campaign.

Another related paper is Behavioral advertising and consumer welfare: An empirical investigation.

The presence of low quality vendors, along with the recent increase in the use of ad blockers, makes it increasingly difficult for new, high quality vendors, to reach new clients. Consumers benefit from having access to new sellers that are able to meet their needs through behavioral ads, as long as they are good sellers.

but

targeted ads are more likely to be associated with lower quality vendors, and higher prices for identical products, compared to competing alternatives found in organic search results

If you look back on the history of advertising, there has never been an ad medium that required so much legal and technical complexity to try to get people to accept it. Why is Meta going to so much trouble to try to come up with a legal way to require people in the EU to accept personalized ads? If ad personalization is so good for consumers, won’t they pick it on their own? Anyway, I’m looking for research on how personalization and privacy choices affect customer satisfaction.

More: Sunday Internet optimism

01 May 2024

blog fix: remove stray files

Another update from the blog. Quick recap: I’m re-doing this blog with mostly Pandoc and make, with a few helper scripts.

This is a personal web site and can be broken sometimes, and one of the breakage problems was: oops, I removed a draft post from the directory of source files (in CommonMark) but the HTML version got built and put in public and copied to the server, possibly also affecting the index.html and the RSS feed.

If you’re reading the RSS and got some half-baked drafts, that’s why.

So, to fix it, I need to ask make if there’s anything in the public directory that doesn’t have a corresponding source file or files and remove it. Quick helper script:
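
A minimal sketch of that kind of cleanup, in Python. It is not the script used on this site: instead of asking make, it assumes a one-to-one naming rule where each source NAME.md builds to NAME.html, and the directory names and the keep list are placeholders.

```python
from pathlib import Path

def stray_outputs(source_dir, public_dir, keep=("index",)):
    """List *.html files in public_dir that have no matching *.md in source_dir.

    Assumes each post NAME.md builds to NAME.html. Names in `keep`
    (generated pages such as the index) are never reported.
    """
    sources = {path.stem for path in Path(source_dir).glob("*.md")}
    return sorted(
        path for path in Path(public_dir).glob("*.html")
        if path.stem not in sources and path.stem not in keep
    )

def remove_strays(source_dir, public_dir, dry_run=True):
    """Delete stray outputs unless dry_run is set; return the strays found."""
    strays = stray_outputs(source_dir, public_dir)
    for path in strays:
        if not dry_run:
            path.unlink()
    return strays
```

Running it with dry_run=True first prints nothing and deletes nothing, so you can inspect the returned list before trusting it. The index and the feed would still need a rebuild after a real removal.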

That should mean a better RSS reading experience since you shouldn’t get it cluttered up with drafts if I make a mistake.

But I’m sure I have plenty of other mistakes I can make.

27 Apr 2024

realistically get rid of third-party cookies

How would a browser realistically get rid of third-party cookies, if the plan was to just replace third-party cookies, and the project requirements did not include a bunch of anticompetitive tricks too?

  1. Start offering a very scary dialog to a fraction of new users. Something like Do you want to test a new experimental feature? It might—maybe—have some privacy benefits but many sites will break. Don’t expect a lot of people to agree at first.

  2. Turn off third-party cookies for the users who did say yes in step 1, and watch the telemetry. There will be positive and negative effects, but they won’t be overwhelmingly bad because most sites have to work with other browsers.

  3. When the breakage detected in step 2 gets to be insignificant as a cause of new browser users quitting or reinstalling, start making the dialog less scary and show it to more people.

  4. Keep repeating until most new installs are third-party cookie-free, then start offering the dialog on browser upgrades.

  5. Continue, for more and more users, until you get to 95-99%. Leave the third-party cookies on for 1-5% of users for a couple of releases just to spot any lingering problems, then make third-party cookies default off, with no dialog (users would have to find the preference to re-enable them, or their sysadmin would have to push out a centralized change if some legacy corporate site still needs them).
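
The loop in steps 1 through 5 can be sketched as a simple rule for how many users get the dialog in each release. This is only an illustration: the threshold, growth factor, and ceiling are made-up numbers, and "breakage rate" stands for whatever telemetry signal the browser vendor would really use.

```python
def next_offer_fraction(current, breakage_rate, threshold=0.01, growth=2.0, ceiling=0.99):
    """Fraction of users to offer the cookie-free option to in the next release.

    Holds steady while observed breakage is above the threshold, otherwise
    grows geometrically up to the ceiling. All defaults are hypothetical.
    """
    if breakage_rate > threshold:
        return current
    return min(ceiling, current * growth)

def rollout_schedule(start, breakage_by_release):
    """Apply next_offer_fraction across a sequence of observed breakage rates."""
    fractions = [start]
    for rate in breakage_by_release:
        fractions.append(next_offer_fraction(fractions[-1], rate))
    return fractions
```

The ceiling of 0.99 corresponds to leaving a small holdout group on third-party cookies for a few releases, as in step 5.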

But what about the personalized ads? Some people actually want those! Not a problem. The good news is that ad personalization can be done in an extension. Ask extension developers who have extensions that support ad personalization to sign up for a registry of ad personalization extensions, then keep track of how many users are installing each one. Adtech firms don’t (usually?) have personalization extensions today, but every company can develop one on its own schedule, with less uncertainty and fewer dependencies and delays than the current end of cookies mess. The extension development tools are really good now.

As soon as an ad personalization extension can pass an independent security audit (done by a company agreed on by the extension developer and the browser vendor) and get, say, 10,000 users, then the browser can put it on a choice screen that gets shown for new installs and, if added since last upgrade, upgrades. (The browser could give the dogmatic anti-personalization users a preference to opt out of these choice screens if they really wanted to dig in and find it.) This makes the work of competition regulators much easier—they just have to check that the browser vendor’s own ad personalization extension gets fair treatment with competing ones.

And we’re done. The privacy people and the personalized ad people get what they want with much less drama and delay, the whole web ad business isn’t stuck queued up waiting for one development team, and all that’s missing is the anticompetitive stuff that has been making end of cookies work such a pain since 2019.

14 Apr 2024

planning for SCALE 2025

I missed Southern California Linux Expo this year. Normally I can think of a talk to do, but between work and [virus redacted] I didn’t have a lot of conference abstract writing time last fall. I need some new material anyway. The talks that tend to do well for me there are kind of a mix of tips for doing weird stuff.

I didn’t really have anything good to submit last fall, but this year I am building up a bunch of miscellaneous Linux stuff similar to what has worked for me at SCALE before. Because of the big Fediverse trend, the search quality crisis, the ends of third-party cookies and Twitter, and enshittification in general, it seems like there’s a lot more interest in redoing your blog. I know I have been doing it, so I’m going to see if I can come up with something on that for next SCALE. But I’m not going to use a blog software package. I’m more comfortable with a mix of different stuff. This blog is now mainly done in Pandoc, auto-rebuilt by Make, and has a bunch of scripts in various languages, including shell, Perl, Python, and even a little bit of Lua now.

protip: use cowsay(1) to alert the user to errors in Makefile before restarting

I don’t really expect anybody to copy this blog. The idea is more to get the audience to outdo it by realizing how much you can now do with the available tools. I’m not going to win any design prizes but with modern CSS I can make a reasonable responsive layout and dark/light modes. And yes you can make a valid RSS feed in GNU Make.

The feature I just did today is the similar posts in the left column. Remember that paper about how you can measure the similarity between two pieces of text by seeing how well they compress together? “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors - ACL Anthology This is Python code for rating similarity of chunks of text. Check it out in the left column, you can now follow the links to similar blog posts.

import gzip

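def z(s):
    "compressed size of s in bytes"
    return len(gzip.compress(bytes(s, 'utf-8')))

def simscore(t1, t2):
    "lower is better"
    if len(t1) == 0 or len(t2) == 0:
        return 1
    # size of the two texts compressed separately
    base = z(t1) + z(t2)
    # size compressed together, in whichever order packs smaller,
    # and never more than the separate total
    minsize = min(z(' '.join([t1, t2])),
                  z(' '.join([t2, t1])),
                  base)
    # ratio scaled to an integer out of 10000
    return int(10000 * minsize/base)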
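
To turn that score into a list of similar posts, one option is to score every other post against the current one and keep the lowest few. This is a sketch and not the code that builds the left column here: most_similar and the posts dictionary are invented for the example, and the scoring function is repeated so the block runs on its own.

```python
import gzip

def z(s):
    "compressed size of s in bytes"
    return len(gzip.compress(bytes(s, 'utf-8')))

def simscore(t1, t2):
    "lower is better"
    if len(t1) == 0 or len(t2) == 0:
        return 1
    base = z(t1) + z(t2)
    minsize = min(z(' '.join([t1, t2])),
                  z(' '.join([t2, t1])),
                  base)
    return int(10000 * minsize/base)

def most_similar(target, posts, n=3):
    """Return up to n (score, name) pairs from posts, best match first.

    posts maps a post name to its text; leave the target post itself out.
    """
    scored = sorted((simscore(target, text), name) for name, text in posts.items())
    return scored[:n]
```

Comparing every post against every other post is quadratic in the number of posts, which is probably fine for a personal blog rebuilt by make but would want caching on a bigger site.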

Next I will probably try stuff like Fediverse-powered comments, some kind of search feature, LLM training set poisoning, some privacy and p2p features, and maybe something else. A lot of what I’m doing here will be possible to translate into other environments, and should be portable to people’s favorite blog software.

12 Apr 2024

B L O C K in the U S A

(update 16 Mar 2025: add links, add slide from Advertising: Who Cares?)

According to a survey done by Censuswide for Ghostery, a majority of Americans now use ad blockers. Yes, it looks like a well-designed survey of 2,000 people. But it’s hard to go from what people say they’re using to figuring out how much protection they really have.

  • Are they answering accurately? People might be under- or over-reporting their use of ad blockers. Under-reporting because they don’t want to admit to free-riding on ad-supported sites, or over-reporting because “install an ad blocker” is now one of the typical Internet tips you’re supposed to do, like not re-using passwords and installing software updates when they come out. People might be trying to look more responsible. When the FBI says you should be running an ad blocker to deal with fake search ads, that puts a certain amount of pressure on people.

  • Are they using an honest blocker with real protection? The ad blocking category has a lot of scams, including adware and paid allow-listing, so most of the people saying yes are not getting the blocking they think they are. (The company that owns the number one ad blocker makes a business out of selling exceptions from blocking. Senator Ron Wyden wrote a letter to the FTC asking them to investigate the ad-blocking industry back in 2020, but no action as far as I know. In the meantime you can check your ad blocker using a tool from the EFF.)

  • How much of their browsing is on a protected browser or device? It’s a lot easier to install an ad blocker on desktop than on mobile, and people have different habits.

  • Is protection being circumvented by server-to-server tracking? Ad blocking has been a thing for a long time, so the surveillance industry has gotten pretty good at working around it. Facebook has responded to Apple ATT and to blockage of their tracking pixels by rolling out server-to-server tracking, which avoids any protection on the client. Google and other companies also have server-to-server tracking.

  • Maybe they really just don’t like ads in general. Lin et al. found that blocking all ads leads to fewer reported regrets with purchases and an improvement in subjective well-being. Maybe the simple explanation is just Advertising Makes Us Unhappy.

  • Maybe ads are just worse now. According to a presentation by Michael Farmer at the Advertising: Who Cares? conference, the people who make the ads are having to make more and more. Maybe the ads carry less information for the same amount of interruption and annoyance, and that’s why it makes sense to block them. (And it’s not just about the rise of content creation software: most of the increase happened after agency processes had been digital for a while.)

The number of ads made by each agency employee increased from 3 in 1983 to 311 in 2019

The second most newsworthy part of the new Censuswide survey is why people say they’re using an ad blocker. “Protect online privacy” is now the number one reason, with “block ads” and “speed up page loads” coming in after that. I’ll leave the most newsworthy part to the end. I know, I know, the surveillance advertising people are going to reply with something like, yeah, right, these ad blocker users are just rationalizing free-riding on ad-supported sites, like Napster users making bogus fair use arguments instead of paying for CDs back when that was a thing. But, as Lars Doucet points out in Ad Blockers and the Four Currencies, it’s more complicated than that: many people get a positive integrity dollars bonus from punishing or disincentivizing a violation of privacy and honesty norms. In order to understand the ad blocker survey results, we have to put the survey in context with other research. Compare to Turow et al. on attitudes to cross-context tracking, and to an IAB Europe study that found only 20% of users would be happy for their data to be shared with third parties for advertising purposes.

It looks like the privacy concerns are real for a significant subset of people, and part of the same trend as popular US State Privacy Legislation. Different people have different norms around ad personalization, and if people can’t get companies to comply with those norms they will get the government to do something about it. For companies, adjusting to privacy norms doesn’t just mean throwing privacy-enhancing technologies (PETs) at the problem. Jerath et al. found similar levels of perceived privacy violations for on-device ad personalization as for old-fashioned cookie-based tracking. PETs have different mathematical properties from cookies, but either don’t address other problems or make them worse.

Companies deploying PETs are asking users to switch away from ad blocking and trust that they will do complicated math honestly—but they’re not starting from a position of trust. When users have the opportunity to evaluate the companies’ honesty in a way they do understand, the companies don’t measure up. Most people can look at an online map of their neighborhood and spot places where a locksmith isn’t. And it’s easy to look up a person on a social site and see where there are enough profiles that not all of them can be real.

screenshot of several fake Facebook profiles, all using the same two photos of retired US Army General Mark Hertling

The biggest problem with PETs will be that the Big Tech companies do both easy-to-understand activities (like scams, fake profiles, and union busting) and hard-to-understand activities, like PET math. “I see you served me scam ads and a map with fake companies in my neighborhood, but I totally trust your math to protect my privacy,” said no one ever. If you don’t know if the PET math is honest, but you can see the same company acting dishonestly in other ways, then it’s hard to trust the PET math. (Personally I think the actual PETs are probably legit, but they’re being rolled out as part of a larger program to squeeze out legit publishers and other smaller companies.)

In AIC polls, confidence in Amazon, Meta, and Google has fallen since 2018.

(source: How Americans’ confidence in technology firms has dropped: evidence from the second wave of the American Institutional Confidence poll)

It would be unrealistic to expect PETs to shift people’s attitudes much on privacy concerns, or to affect ad blocking rates.

Maybe another variety of trust issue is behind Censuswide’s most newsworthy data point: experienced advertisers (with 5 or more years of experience in advertising) are more likely to run an ad blocker than average (66% vs. 52%). Reminds me of how experienced email users were early adopters of spam filters: the more you know, the more you block. Between sketchy placements, bogus reports, and a “your call is important to us” approach to advertiser support, the advertisers are having a much worse surveillance advertising experience than the rest of us. The Censuswide survey (full report PDF) also shows that more experienced advertisers than ordinary users believe that the Big Tech companies are likely to abuse data. (But realistically, who knows if they are or not?)

The Tragedy of the Commons is bogus when it comes to actual traditional practices for managing common resources, but it is a thing within large companies. Individual product managers are incentivized to achieve short-term goals either at the expense of other product managers, by dishonest practices that spend down the (common across the whole company) reputation level, or both. For example, within the same large company one business unit can achieve its goals by licensing e-books, while another business unit can achieve its goals by running ads on infringing copies of the same titles. Big Tech fans often ask, if these companies are so distrusted, why do people keep using their products? But another question is, if these companies are so trusted, why do voters keep asking the government to take over managing their products? Privacy settings are hard for users to figure out and easy for companies to override, but a vote for privacy is easier and sticks better. (and possibly the one thing that a bitterly divided nation can agree on)

Doc Searls called ad blocking the biggest boycott in world history back in 2015. Ad blocking looks like a response to creepy practices (or perceived privacy violations if that works better for you) and those practices are part of a more general scam culture crisis. Tressie McMillan Cottom writes (read the whole thing):

Scams weaken our trust in social institutions, but their going mainstream—divorced from empathy for the victims or stigma for the perpetrators—means that we have accepted scams as institutions themselves.

I can’t see any one big policy solution for surveillance advertising, tech oligopolies, or the broader scam culture problem. All of that stuff would have to change in order to move the ad blocking numbers. It’s going to take a variety of approaches, maybe including a surveillance advertising ban, maybe surveillance licensing, maybe a Pigovian tax on databases containing PII, maybe breaking up Big Tech firms. So far the most promising approach seems to be state laws with private right of action, which is one of the reasons I’m so optimistic about Washington State’s My Health My Data Act. My experience on a jury (not an ad-related case) was the most productive meeting I have been in since I came to California. If surveillance advertising issues can grind their way through a few jury trials, where lawyers have an incentive to explain what’s going on in an accurate, comprehensible way, then both surveillance marketers and privacy nerds will be able to reset how we approach this stuff based on more common sense.

More: Sunday Internet optimism

03 Apr 2024

GPC all the things!

(this post is subject to change as I come up with more places. Get in touch if you want to suggest one.)

Now that we have Global Privacy Control for the web, can we do better? Where else do we need it? Not everything is on the web.

The big risk of having GPC for web but not for other systems is that surveillance companies will start forcing or nudging you to interact with them in other ways—if the surveillance options are too easy to use elsewhere, then more companies will use deceptive practices to drive us to all use communications technologies where browser privacy features can’t protect us. From my point of view, as someone who would rather use the web, that would be bad. (Install our app for full product info or to schedule a demo! Press the button on your printer or Internet refrigerator to order! Yikes.)

Smart TVs: As far as I know, these support HTTP, so the existing Sec-GPC header should work fine. Finding the right option in the TV menus is left as an exercise for the reader.

mobile apps: Same, apps also use HTTP so the same header should work. Set one preference per device or account and have it work across all apps.
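Here is a rough sketch of what the receiving side could look like, whether the request comes from a browser, a TV, or an app. The only thing taken from the GPC proposal is the `Sec-GPC: 1` request header; the function name and the example user agents are made up for illustration, and a real service would hook this into whatever framework it already uses.

```python
from typing import Mapping


def gpc_opted_out(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries a Sec-GPC: 1 header.

    HTTP header names are case-insensitive, so compare them lowercased.
    """
    for name, value in headers.items():
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False


# Hypothetical requests: the check does not care what kind of client sent them.
print(gpc_opted_out({"User-Agent": "ExampleTV/1.0", "Sec-GPC": "1"}))
print(gpc_opted_out({"User-Agent": "ExampleApp/2.3"}))
```

The point is that nothing here is web-specific: any client that speaks HTTP can set the header once, and any server can check it the same way.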

other devices with an order button: I don’t know about these, will have to look it up.

email part 1: Email also has headers, so standardizing a GPC header would also work, but needs to grind through the RFC process.

email part 2: add +sec-gpc to your address when you mail a company. CRM systems and related software can already handle RFC 5233-style plus addresses, so they could treat incoming email from the +sec-gpc version of an address as an opt out for the plain version of the same address. (GPC in email could be DKIM-signed, too.)
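A minimal sketch of how a CRM import step might handle this, assuming the opt-out tag is exactly `sec-gpc` after the plus sign. The function name and the example.com addresses are hypothetical, and real address parsing has more corner cases (quoted local parts, for one) than this covers.

```python
def split_gpc_address(address: str) -> tuple[str, bool]:
    """Return (base_address, opted_out) for an incoming sender address."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address, False  # not an email address; leave it alone
    user, plus, detail = local.partition("+")
    # Assumption: the tag is the whole detail part, compared case-insensitively.
    if plus and detail.lower() == "sec-gpc":
        return f"{user}@{domain}", True
    return address, False


print(split_gpc_address("pat+sec-gpc@example.com"))
print(split_gpc_address("pat+news@example.com"))
```

The first call returns the plain address with the opt-out flag set, so the CRM can mark the existing record for that address as opted out. Other plus tags pass through untouched.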

Instant messaging: So many options, should be doable but a pain because all the big companies want to have their own special protocol. The mobile device GPC should cover it from the sender side, but a recipient company still needs to get the GPC passed through on a message from a customer, somehow.

ActivityPub: Normal GPC in a header works if you connect to an instance directly, but the standard would need to be extended to let user opt outs and/or preferences travel with their content. Update 14 Jun 2024: fep-5e53 covers adding opt-out preference signals to ActivityPub.

NFC: The standard behind tap-to-pay and similar features could be extended to support GPC. This one would take more coding and have a longer deployment cycle, though.

vehicle license plates: It’s time for a crossover event between the California Privacy Protection Agency (CPPA) and Special Interest License Plates at the DMV. Add a special symbol to license plates to require companies using ALPR systems to treat the plate, and related database records, as opted out. But wait a minute, custom and special-interest plates cost extra! Wouldn’t that be asking people to pay for privacy? (I’d get one, but really just to fly my privacy nerd flag so I don’t count.) Not necessarily—most of these could end up being free to the vehicle owner. In practice, GPC plates would be a way to help resolve privacy class action lawsuits. Instead of getting yet another free credit monitoring service when a company settles a case (and you probably had free credit monitoring from the last class you were in) a company could settle for buying everyone in the class their choice of credit monitoring or a GPC plate. An interesting piece of graphic design work for somebody would be coming up with an easily recognizable symbol that an ALPR system would not get mixed up with any of the other symbols that are allowed on a license plate.

face recognition: Once we have a GPC symbol for license plates, require companies that process images for the purposes of identifying people to also check for the symbol. Then you could wear it on a T-shirt, on a piece of flair, or on the front of a hat.

tire pressure monitors: These devices have a unique ID and can be used for surveillance. This probably also applies to a bunch of other little radios that follow you around, but maybe the easier method here is to ban commercial surveillance using little radios that follow people around, since there’s not really a legit use case.

postal addresses: There should be a way to pack a GPC into an apartment number. Maybe treat an address from apartment “GPC” as an opt-out for the same address without “GPC” if you have it, and an address from apartment “3AGPC” as an opt-out for apartment 3A?
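As a sketch only, here is how address-cleaning software might read that convention, assuming the marker is the literal suffix “GPC” on the apartment or unit field. The function name is made up, and a real implementation would need a rule for units that legitimately end in those letters.

```python
def split_gpc_unit(unit: str) -> tuple[str, bool]:
    """Return (unit_without_marker, opted_out) for an apartment/unit field."""
    cleaned = unit.strip().upper()
    if cleaned.endswith("GPC"):
        # Strip the marker; an empty result means "no apartment number".
        return cleaned[: -len("GPC")].strip(), True
    return cleaned, False


print(split_gpc_unit("GPC"))
print(split_gpc_unit("3AGPC"))
print(split_gpc_unit("3A"))
```

Apartment “GPC” comes back as an empty unit with the opt-out flag set, and “3AGPC” comes back as 3A with the flag set, matching the two cases above.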

more GPC for web

A GPC-like opt-out for HTML meta tags and X-Robots-Tag HTTP headers: The original GPC for web is a signal from client to server. Adding a header and/or meta tag would make it apply to cases where the personal info is on the server (as in a blog, forum, or profile page) and the recipient of the signal is the client (a crawler or scraper).

GPC in robots.txt: Another possible place to put GPC as a server-to-crawler OOPS. FIXME: how to handle this to convey permissions—you might want a crawler to check for something like an RSS update, but not harvest your contact info for sale/sharing.

GPC in links: Sort of like a nofollow link, you could tag a link containing personal info as opted out: <a class="secGPC" href="mailto:[email protected]">mail me (but don't sell/share my info)</a>. That would send an OOPS for the email address but not any other info collected from the page.

GPC for RSS: It might be possible to handle this with the HTTP response header, and not have to put GPC in too many places.

Scraping can already be actionable as a violation of site ToS, but being able to tag a page by or about a person as opted out is another level of protection that imho will help encourage more people to make more useful sites. As a site operator, you want to be able to put up something like a blog, company directory, or event speaker profiles without feeding your site’s people to who knows what.

01 Apr 2024

trying to think about European tech policy in context

(update 1 Aug 2024: last section)

Good post on Antitrust, Meta, Apple and more by Ian Betteridge.

The EU isn’t just concerned with today. It’s really taking Steve Jobs’ advice and listening to the Wayne Gretzky quote: it’s skating to where the puck is going, not where it’s been. Its aim is to ensure that two very large companies don’t own the market for smartphones to such a degree they can determine everything that happens in those markets, to their advantage. The EU is a capitalist body: its obsession is keeping markets open, and it will do anything it needs to do to make sure that happens.

From here in the USA, it looks to me like we need to put our understanding of European privacy and competition policy into context. We seem to spend a lot of time in the weeds trying to figure out what will be allowed by specific laws and regulations, and maybe we’re missing the big picture. The EU has two really big problems.

  • The EU’s most important problem is climate change, including the problem of climate refugees. Worst case is that Europe has to rebuild their infrastructure and agriculture while somehow dealing with 1.2 billion people trying to move there because their old home is unlivable now. None of the ways that the EU can address this problem are going to make everybody happy, and the EU somehow has to do them in a decentralized democratic system.

  • The EU’s most urgent problem is that Europe is being invaded by Russia—regular Russian forces in several eastern European countries, and economic, information, and network warfare (update: and terrorist attacks and infrastructure sabotage) clearing the way for them further west. Also a hard problem, partly because of the Putin regime’s ability to work the political system.

Big Tech isn’t in trouble in Europe because companies are failing to comply with whatever the EU laws are today. They’re in trouble because they’re more of a part of the problem than a part of the solution on the big issues. The EU says that they want social media companies to hire factcheckers to fight election fake news, they opened up a bunch of competition cases on platform companies, and, of course, they’re going to shut down Meta’s bogus pay or consent policy. (If the pay or consent model actually worked, it would turn the whole GDPR into a no-op. Every vending machine would have “pay €2 and consent” and “pay €1002, no consent” buttons.) Pay or consent is not going to hold up in court, but the Meta team that deployed it have already collected their bonuses and moved on.

But it’s not about the individual cases. If the EU throws the book at a Big Tech company, and the company successfully dodges it, what happens is that the EU makes a heavier book and throws harder. As long as a company is working for the Russians, for the fossil-fuel industry, and for divisive right-wing groups (yes, they overlap) then they’re going to have a problem in the EU. Someone in the EU is probably also reading the history of the US Republican party and direct mail: a party that had been a coalition of defense, free-market, and social conservative factions rapidly pivoted in the post-Goldwater years based on what works as direct mail (and now email and social media) copy. In the USA, it looks like direct mail, personalized, and surveillance advertising give advantages to fear-based and racially divisive messages that would work less well in other media. If that applies in Europe too, a crackdown on the surveillance industry makes sense even outside the context of the coming post-surveillance economic boom.

And yes, there’s an Al Capone effect going on here. The same Big Tech management teams that reward sexual harassment with big checks are also willing to monetize scams hosted in adversary countries and copyright-infringing sites run from adversary countries. I still don’t understand the appeal of Vladimir Putin fandom among tech thought leaders, but it’s there, and as long as it persists, climate change and European security are the underlying problems for Big Tech in Europe, not any specific policies.

Update: can’t the US “tech” companies just leave Europe?

This keeps coming up whenever the Europeans try to regulate anything. The surveillance/cryptocurrency/LLM tech boosters say that big companies can just pull out of Europe, and then they’ll be sorry and beg the companies to come back because they’re missing out on all the technology and innovation.

That might have been a threat if the industry were still in a position to deny Europe the System/360, the i386, Unix, or Ethernet…but in case you haven’t noticed, Big Tech has been pretty much out of transformative innovations for a while. People aren’t lining up to buy the metaverse, NFTs, or LLMs (putting AI in a product even reduces purchase intentions) the way they did for Windows 95 and iPhones. If big companies block Europe, Europe’s alternatives look pretty good—technically good enough, and with enough less scam culture built in to be a win overall.

And whatever contender comes in to serve Europe will end up in a position to build network effects globally. The upstart doesn’t need a big starting market share or feature lead, just the ability to serve users that Big Tech can’t or won’t. In the USA we have less lard and more Linux support than you might expect.

  • Only a single-digit percentage of people in the US observe dietary restrictions on pork, but Oreo cookies are lard-free. If you buy a big pack of cookies for an event, the likelihood of someone in the group not eating pork is pretty high.

  • Desktop Linux is a pretty small fraction of the IT market, but all the video conferencing services support it. Even though the chance of a Linux user on any one call is small, the chance that an IT department will have to deal with some meeting organizer not being able to get some Linux participant on in any given day is pretty high.
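The arithmetic behind both examples is the same: a small per-person share turns into a large per-group chance. Here is a back-of-the-envelope sketch, assuming each person in a group independently has the trait with probability p. The 5% and 30-person figures are illustrative stand-ins for “single-digit percentage” and “an event”, not numbers from any survey.

```python
def p_at_least_one(p: float, group_size: int) -> float:
    """Chance that at least one of group_size independent people has the trait."""
    return 1 - (1 - p) ** group_size


# Assumed inputs: 5% of people, a 30-person event.
chance = p_at_least_one(0.05, 30)
print(f"{chance:.0%} chance that someone in the group is affected")
```

Even at 5 percent, most groups of that size include at least one person who can’t use the default option, which is why serving the small group wins the whole group.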

If a Big Tech service pulls out of Europe, any competing service that can compromise to keep operating there will pick up the network effect win, as groups of people shift to whatever platform or forum can accommodate everyone. And it’s pretty clear they realize this: nobody at Meta has been threatening to leave Europe even as the pay or consent cases grind their way toward an outcome that allows no-consent users a partial free ride on the same service without the personalized/surveillance ads.

30 Mar 2024

LLMs and reputation management

One of the big problems with widespread use of large language models (LLMs) is going to be that reputation management firms will be able to put up a lot of content to try to clean up mentions of their own clients. Other players will also, in a totally deniable way, be able to put up their own text to train LLMs to say bad stuff about people who are opposed to their clients in some way. What is seen as important material about a person from the point of view of, say, a reporter or Wikipedia editor, is not necessarily going to be what gets pulled out of a big pile of crawled text by an LLM.

You can kind of see what’s possible by looking at 20th century history. Many Axis officers from World War II have a lot of content about them on the Internet somewhere.

  1. contemporary propaganda

  2. post-war content such as the “clean Wehrmacht” campaign

  3. online forums discussing 1 and 2 above

  4. gaming-related content

For example, let’s take a look at the 14 defendants in the High Command Trial at Nuremberg. To a human editor, the Nuremberg trials were a significant historic event, so the fact that an officer was a defendant is going to be a key fact about him that makes it into any reasonable length bio.

For ChatGPT, not so much. Asking ChatGPT: “what is officer_name known for” gives different results. I tried it.

  • All 14 are correctly identified by country and historical period

  • In 5 cases, the ChatGPT answer has no mention of the trial, of any of the conduct for which the officer was tried, or any mention of war crimes at all.

Answers often end with something like, After the war, officer_name was captured by Allied forces and held as a prisoner of war. He was later released and returned to civilian life.

The Nuremberg trials were extremely well covered at the time, a lot has been written about them, and they’re still studied today. Old-school propaganda operations, keyboard warriors, and random wehraboos were able to scrub this major event, and the crimes for which the defendants were tried, without really trying. So the same kind of thing is going to be practical for lots of people, companies, and organizations. Reputation management firms typically don’t have to obscure something as big as the Nuremberg trials, and they can use LLMs to produce far more text, faster, than even an online army of gamers can. Expect the defendants in run-of-the-mill corporate trials to effectively disappear their crimes, too.

It’s possible that the big “AI” companies could deal with the reputation management problem by licensing a set of known neutral point of view (NPOV) content and either avoiding material that looks conflicting, or filtering out responses that look inconsistent with it. But that’s a big task, and will get bigger the more that “reputation management” is seen to work.

(This is an edited version of a thread).

23 Mar 2024

test post

This is a test post. The RSS feed is probably going to be a little noisy.

Here are some fun links anyway.

22 Mar 2024

404

Hey, kids, 404 page!

Yes, I have been messing with my blog and breaking stuff.

Perhaps there’s a way to fix things.

Anyway, the thing you’re looking for might be on the whole blog on one page, which is a thing here.

13 Mar 2024

the 30-40-30 rule

User research about advertising, personalization, and privacy is surprisingly consistent. In a replication prediction market, I would invest in futures on any research that shows:

  • About 30 percent of people want personalized ads.

  • About 30 percent of people don’t want personalized ads.

  • For the other 40 percent it depends how you ask.

A lot of good work has been done in this area, but the results are inconvenient for anybody who wants to be able to build one set of online advertising software and settings for everybody. People are different. Research links, in order of percentage of pro-personalization people found.

18%

McDonald et al. did in-person interviews with questions on people’s knowledge and preferences about Internet ads.

  • 18% agree with Glad to have relevant advertisements about things I am interested in instead of random advertisements

  • 30% agree with No one should use data from Internet history to personalize ads.

26%

Accountable Tech did a 1,000-person survey.

  • 26% agreed with I’d rather see relevant ads, even if companies are using my personal data to target them.

  • 46% supported a ban on collecting people’s personal data and using it to target them with ads.

30%

Verbraucherzentrale Bundesverband commissioned a phone survey of 1,500 people in 2024. Seven out of ten respondents (70 per cent) agree that companies should not be allowed to create profiles based on personal data to deliver personalised advertising.

30% (or 47%)

U.S. Internet Users Ready to Limit Online Tracking for Ads: In a Gallup poll of 1,019 adults, 30% agree that advertisers should be allowed to match ads to interests based on websites visited but 47% want to be able to allow advertisers they choose to personalize ads to them in some way. (It depends how you ask.)

32%

Turow et al. did an online study of 314 people.

  • 32% said yes to Please tell me whether or not you want websites you visit to show you ads that are tailored to your interests.

  • 68% definitely would not allow personalization based on web history even if anonymous.

36%

YouGov did a survey of British adults.

  • 36% are Personalized Pioneers who say they want personalized ads
  • 55% say that personalized ads creep them out.

37% (26% + 11%)

In a study of regular social media users, YouGov found that 11% are fine with personal data for all ads, and an additional 26% only object to personal data being used for political ads.

42%

IAB Europe did a survey of 11,020 European Internet users.

  • 42% don’t mind personalized ads based on browsing data

  • 20% don’t mind having data shared with third parties for ads

People are different

Some people are privacy people and I’m one. Personally I don’t understand why anybody would want a personalized ad at all. I’m most likely to get any use from advertising when I need to buy unfamiliar stuff, and that happens when I’m learning about a new activity that I’m not good at. I want to see the same ads that show up for people who already know the skills and the scene, and would raise a stink about a deceptive ad in their information space. As far as I can tell, personalization makes advertising work in the wrong direction.

Some people are personalization people and don’t understand why you wouldn’t want a personalized ad. If you’re going to look at an ad anyway it might as well be for something that you’re more likely to buy. (But I want to see the same ad that a company is willing to show to the regulators, editors, and experts in their community of practice, not what they think they can get some random person to click on. I don’t understand Kevin and he doesn’t understand me.)

Other people, the biggest group at 40 percent, are probably better off not overthinking online ads and, instead, learning about other stuff.

Whatever a person prefers, it seems to be about the personalization using any information from outside the context in which the ad appears (cross-context behavioral advertising), not about individualized tracking. (More: Can database marketing sell itself to the people in the database?) Jerath et al. found similar perceived privacy violations (PPV) for new ad personalization technologies that prevent individual identifiers from being used: New technologies or proposals that ensure that data are kept on the consumer’s machine lower PPV relative to behavioral targeting but, importantly, this decrease is small. Furthermore, group-level targeting does not differ significantly from individual-level targeting in reducing PPV. The IAB and Gallup studies imply that there is a cohort of users who want personalization but not cross-context tracking, but safer personalization looks like a small add-on to the 30%, not a widely held preference that splits off a chunk of users from the core anti-personalization group.

using the 30-40-30 rule

A lot of this research looks useful as a way to spot deceptive patterns in personalization preferences. If fewer than 30% or more than 70% of users end up with ad personalization turned on, something is probably wrong with the personalization UX or the underlying trustworthiness of the medium.
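As a sketch of how that check might run against real opt-in counts: the 30% and 70% bounds come from the rule above, but the function name and the sample counts are hypothetical.

```python
def personalization_rate_looks_off(opted_in: int, total: int,
                                   low: float = 0.30, high: float = 0.70) -> bool:
    """Flag a personalization opt-in rate that falls outside the 30-70% band."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = opted_in / total
    return rate < low or rate > high


# Hypothetical numbers: 92 of 100 users ended up with personalization on.
print(personalization_rate_looks_off(92, 100))
print(personalization_rate_looks_off(45, 100))
```

A flag here doesn’t prove a deceptive pattern. It just says the consent flow or the medium deserves a closer look.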

And when building anything that depends on advertising, it’s going to be hard to build something that only works if everybody who uses it has personalization, or that only works if nobody does. As far as I can tell, adding extra personalization for those who don’t want it just builds support for privacy habits, settings, tools, and laws, in that order. And banning personalization entirely, or building a medium that can’t be tweaked to support it, hasn’t really been tried yet but would probably cause problems of its own.

In my humble opinion, the biggest benefit of the 30-40-30 rule is that it helps justify the decision to support Global Privacy Control everywhere. People are just different, so let everybody pick once whether they want cross-context behavioral advertising everywhere. Yes, I know, I know, under a different set of laws we could have had Global Personalization Control, that would work the other way around so that the work of turning it on would be done by the personalization people and not the privacy people. But GPC is there, it works, and if sites, browsers, and platforms can implement it in a common-sense way you should just be able to turn it on once and be happy for as long as you own the software or device, kind of like Filmmaker Mode on a TV. In the long run, I can safely predict that the fraction of users who turn GPC on will be somewhere between 30 and 70 percent. Which end of the range we end up with will depend on how trustworthy the ads, the Internet, and the economy are in other ways, but that’s another story. More: GPC all the things!

10 Mar 2024

privacy: I’d buy that for 20 dollars!

Update: Newer post with a similar calculation, which I think is more accurate: or we could just not

How much extra should you pay, or would you end up having to pay, to get privacy from surveillance ads? This is my attempt to come up with a high number for surveillance advertising’s impact on the content economy—the actual ad-supported information, entertainment, or other resource that that advertising is paying for. For now I’m going to make the most pro-surveillance assumptions, to try to come up with a high bound for the price of privacy.

To start with, total advertising, in all media, in the USA is about $960 per person which works out to a convenient $80 per month being spent on ads intended to reach you. So let’s start with the $80 and assume (to be as friendly as possible to the surveillance side…)

  • All advertisers prefer maximum surveillance

  • All ads are surveillance ads unless interfered with by privacy tools

  • Surveillance ads are 2x as valuable as non-surveillance ads

  • People who make content (writers, photographers, artists, musicians, editors…) get a similar large share of revenue from advertising spending as from other ways of supporting their work.

So, if you can somehow get total privacy protection, ads that reach you are worth only $40 per month total, instead of $80. Or, more realistically, if you can get halfway protected, your privacy is costing the content creators of the world $20 per month minus whatever goes to the intermediaries. That’s real money, but it’s on the scale of a few subscriptions or a fraction of an ISP bill, not “privacy breaks the Internet economy as we know it” money. According to one survey, the average US household spends $61/month on streaming video services. That’s per household, not per person, so at an average household size of 2.5, streaming alone is about $24 a person. (Could use some numbers for music, news, and other services here too.)
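The arithmetic above, written out so you can plug in your own assumptions. The dollar figures are the ones in this post; the variable names are mine, and the idea that the loss scales linearly with how protected you are is an assumption implied by the “halfway protected” example, not something measured.

```python
ANNUAL_AD_SPEND_PER_PERSON = 960.0  # dollars per year, all media, USA
SURVEILLANCE_MULTIPLIER = 2.0       # assumption: surveillance ads are worth 2x

monthly_with_surveillance = ANNUAL_AD_SPEND_PER_PERSON / 12
monthly_without = monthly_with_surveillance / SURVEILLANCE_MULTIPLIER


def monthly_cost_of_privacy(protection: float) -> float:
    """Ad value lost per month at a protection level from 0.0 to 1.0.

    Assumes the loss grows linearly with the level of protection.
    """
    return (monthly_with_surveillance - monthly_without) * protection


# Streaming comparison: per-household spend divided by average household size.
streaming_per_person = 61 / 2.5

print(f"ads reaching you, with surveillance: ${monthly_with_surveillance:.2f}/month")
print(f"ads reaching you, total privacy:     ${monthly_without:.2f}/month")
print(f"cost of being halfway protected:     ${monthly_cost_of_privacy(0.5):.2f}/month")
print(f"streaming video, per person:         ${streaming_per_person:.2f}/month")
```

Change the multiplier to something less generous to the surveillance side and the cost of privacy shrinks accordingly.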

You might not want to agree with those assumptions. Some ad money still gets spent on non-surveillance advertising, and surveillance advertising might not be twice as valuable as non-surveillance advertising in every medium. And different ways of supporting content have different transaction costs. If I install privacy tools that cost the surveillance business $100 in ad revenue per year, and at the same time I buy a subscription to 404 Media for $100 per year, I’m pretty sure I’m paying for a lot more content than the ads would have. The stack of service providers needed to run a subscription service takes a smaller bite of the subscription money than the ad agency/adtech stack takes of the ad money. And the publisher/intermediary split is always worse than it looks. Many of the publishers that show up in adtech’s reckoning of its impact are so-called made for advertising (MFA) sites, or sites that just run copyright-infringing or AI generated content. When a surveillance firm makes claims about ads supporting content, a lot of the ad money they count as the publisher’s share is not funding any new material at all.

Thomas Baekdal wrote, for every one person you could convince to subscribe, donate, become a member, or support you on Patreon…you would need 10,000 visitors to make the same amount from advertising. Or to put that into perspective, with only 100 subscribers, I could make the same amount of money as I used to earn from having one million visitors. It doesn’t look like you need to add $4 to your monthly subscription budget, to balance things out, every time you improve your privacy protection by 10%—that’s more of an unrealistically high estimate if you want to be on the safe side. A more realistic view on the impact of privacy tech is in Targeted advertising, platform competition and privacy. Making both personalized and non-personalized options available will make more money for the publishers, so publishers are better off letting the privacy people have privacy as a fair choice—so the privacy fraction of the audience is protected from being followed elsewhere in a way that drives down ad rates—while the personalized ads option remains available for those who want it.

The bigger problem is that platform companies that sit between advertiser and publisher are steadily chickenizing the companies on both sides of the ad market. Jakob Nielsen pointed out the problem for search engines back in 2006:

In the long run, every time companies increase the value of their online businesses, they end up handing over all that added value to the search engines. Any gain is temporary; once competing sites improve their profit-per-visitor enough to increase their search bids, they’ll drive up everybody’s cost of traffic.

Today, the problem continues in search, but also applies to big retail and social platforms. Direct to consumer brands must pay rent to platforms, and both brands and publishers are indirectly micromanaged by and for large platform companies, like drivers for Uber or Amazon. So it seems like there’s a big win available from antitrust cases. Putting a price on privacy is probably backwards. It would make more sense for legit sellers to promote and even pay for privacy tools among good customers, just for the fraud protection benefits. Money that a customer spends taking their kids to a bogus event advertised on social media can’t be spent on legit products or services. Brands that need to reach a high-privacy early adopter group are going to have some interesting post-surveillance options.

Something that’s harder to measure is that companies may be over-advertising to the point of inefficiency because advertising is (with extra, unwanted, surveillance in place) more measurable than other ways to spend company money. The direct mail saying is list, offer, package—you can get the highest returns by targeting the best-performing list of people, then by making a better product and/or a better price, and finally by improving the actual mail piece. If privacy goes up to the point where optimizing the list is harder to do, then the company has more incentive to invest in “offer” (product features, quality, service, price) and “package” (content marketing and ad creative).

One of the unconventional aspects of the open source trend of the 1990s/early 2000s was that open source companies like Red Hat were willing to say that they wanted the software market to be smaller in total. Most investors don’t like to hear that especially for an established industry. But a smaller, more restricted advertising industry in total might be more valuable to advertisers and consumers, by driving some investment into product improvements that are higher in value than, but less measurable than, advertising.

So, what to do about it? It looks like some privacy tips will be more effective than others for re-incentivizing the industry, so make a list and try the ones that look good to you.

18 Feb 2024

what if Twitter is Road Runner?

Remember the famous 9 Rules For Drawing Road Runner Cartoons, by Chuck Jones, that made the rounds a while ago?

It seems like there is a very similar set of rules for Twitter.

  1. Twitter users cannot harm Twitter management, can only tweet
  2. The only forces that can harm Twitter management are the failures of its own schemes
  3. Twitter management could stop at any time, but chooses not to
  4. No new dialogue (all the Twitter business model ideas were already tweeted some time during SXSW 2006)
  5. Twitter users cannot leave Twitter

18 Feb 2024

moderation is harder than editing

Moderation is the hardest part of running any Internet service. Running a database at very large scale is not something that anyone can do, but people who can do it are around and available to hire. And you don’t have to write your own BigTable or Cassandra or whatever, like today’s Big Tech had to do—you can just call ScyllaDB or even click the right thing on Amazon AWS.

Moderation is harder than editing, and I say that as a former editor.

  • Editors know the language and audience in advance. Moderators might get a bunch of new users from nobody knows where, writing in a language they don’t know.

  • Editors know the schedule and quantity of content to be posted in advance. Moderators have to deal with content and complaints as they come in.

  • If an editor doesn’t understand something, they can just tell the writer that it won’t be clear to the reader, and make the writer rewrite or explain it. A moderator just has to figure it out.

  • An editor has more options: delay, ask for a rewrite, and yes, even edit. A moderator has fewer available actions.

26 Jan 2024

rules of search engine club

  1. Anyone may use the search engine anonymously, or make a free account. Accounts may be personal or organizational. An organizational account must be verified with a DNS TXT record or a file under /.well-known.

  2. Account holders can use the hide site in results feature, other customization features, and the search API.

  3. Account holders can add an RSS feed to their profiles. Account holders agree to allow the club to use their customization choices like hidden and boosted sites, and RSS links, in aggregated form to improve search rankings. Account holders may list their trademarks in a /.well-known/ file, to prevent them from being used in a search ad. (See rule 7)

  4. An account holder whose RSS feed and hidden sites provide useful information may be invited to join as a member.

  5. Members, except nonprofit organizations, pay an annual fee. Individual members may vote for the board. All links in member RSS feeds are crawled. Member fees and categories (such as student and senior memberships and fees for extra API calls) are set by the board.

  6. Members may apply to become advertiser members. Advertiser membership applications are approved or rejected by a vote of all members, and approvals must pass with a 2/3 majority. Members also vote on the limit of total search ads to be shown, which may be zero.

  7. Advertiser members may bid to run search ads in a second-price auction on search terms. Search ads may not target the name or any trademark of another member or account holder. Either the board or the members may vote to convert an advertiser member back to a normal member for any reason.

  8. Members may be suspended or expelled by the board. A suspended or expelled member may appeal to a vote of the full membership. A suspended member may use the search engine as an account holder.

  9. Excess funds after operating the search engine are donated to nonprofit member organizations whose sites are frequently returned in search results.
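Rule 7 is the most mechanical of these, so here is a minimal sketch of it in Python. Everything in it is hypothetical: the function name, the exact-match lookup of protected names and trademarks, and the reserve price charged to a lone bidder are assumptions, not part of the rules above.

```python
def run_second_price_auction(search_term, bids, protected_terms, reserve=0.0):
    """Choose a search ad for one search term.

    bids: dict mapping advertiser member name -> bid amount
    protected_terms: dict mapping a lowercased name or trademark -> its owner
    reserve: price paid when there is no runner-up (an assumption)
    Returns (winner, price), or None if no ad may be shown.
    """
    owner = protected_terms.get(search_term.lower())
    # Rule 7: ads may not target the name or trademark of another
    # member or account holder, so only the owner stays eligible.
    eligible = {name: bid for name, bid in bids.items()
                if owner is None or name == owner}
    if not eligible:
        return None
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # Second-price rule: the winner pays the runner-up's bid.
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


bids = {"shoeco": 2.0, "bootco": 1.5, "sockco": 0.5}
print(run_second_price_auction("running shoes", bids, {}))
print(run_second_price_auction("BootCo", bids, {"bootco": "bootco"}))
```

A real version would also need the member-voted limit on total search ads from rule 6, and a better way to match trademarks than comparing the whole search term.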

01 Jan 2024

effective privacy tips

Disclaimers: This page is regularly updated with links and copy changes. This page is not intended for people with a specific personal security need or threat. But it does include some tips that will make it easier for vulnerable people to blend into a crowd of other people doing similar things. See links at the end if you want to put more time into privacy stuff after trying these tips.

Most people don’t have time to follow all the privacy advice available to them. This is my current list of what I think are the tips that will provide the biggest impact for the time available. For me, this is not so much about protecting myself today, but in moving money out of surveillance and into constructive areas in the future.

A lot of privacy tutorials will tell you to act independently and protect yourself, but in the real world that’s hard to do. See the section on privacy labour in the W3C Privacy Principles. The surveillance industry has the resources to make more privacy settings than you can possibly find and fix. (Can you win Realistic Facebook Privacy Simulator?) Privacy tips are only part of the solution, and they really only help as part of collective action. The problems with individual privacy include

  • Using privacy tips in situations where other people don’t use them just makes you more identifiable.

  • If only a few people use privacy tips, then money keeps flowing into surveillance. CMOs fund surveillance marketing projects at companies, VCs fund surveillance startups, and surveillance budgets keep going up.

In the long run I want to lower the expected return on investment (ROI) of future surveillance, which means it’s generally better to have more people doing the few highest-impact measures than a few people getting more protection.

In my humble opinion, my good fortune in having nothing to hide comes with an obligation to help others by taking privacy measures that can make others safer. This is my latest attempt to pick some good privacy tips that most Internet users should be able to do, and rank them. I’m looking not just at the privacy scene, but also at industry sources and legal documents. When a tool or setting has the result of lowering revenue for Big Tech, or if they try to nudge people into not doing it, then that’s a good sign, and I move it up on the list.

These tips are all focused on the USA, and different tips might work more or less well elsewhere. I’m also assuming that you can have both a computer and a phone—a basic laptop and a basic phone are a lot better for doing these than even the best possible phone. If anyone has written effective privacy tips for people who are phone-only please let me know and I’ll link here.

And tips will get more or less effective as laws, regulations, compliance, and features change. I have been updating this page fairly often. Ready? Let’s start with an easy, high-impact one.

1. Fix mobile privacy settings. These are different on Apple iOS and Google Android. On Apple iOS (iPhone and iPad):

  • In Settings, go to Privacy & Security, then Tracking, and make sure Allow Apps to Request to Track is turned off.

  • Also in Settings under Privacy & Security, find Apple Advertising and make sure that Personalized Ads is turned off. (You will probably have to scroll down—Apple makes this one a little trickier to find.)

  • In Settings, under Safari, scroll down to Advanced, and turn off Privacy Preserving Ad Measurement. (This is an ad tracking feature, not a privacy filter on existing ad tracking.)

On Android, there are also at least two settings to check.

  • Open Settings, go to Privacy, then Ads and select Delete advertising ID.

  • Under Settings, go to Security & Privacy, Privacy, Ads, Ads privacy, Ad topics. Toggle the switch next to Allow ad topics off.

Consumer Reports has instructions that should work across Android versions. See Workout 4 at CR Data Bootcamp.

On either platform, if you have a Google Account, turn off ad personalization. Go to https://myadcenter.google.com/home and set Personalized Ads to Off. This does not turn off the tracking that feeds into the ad selection, but makes it harder to infer anything about you based on having received a particular ad, and makes it harder for scammers to target you (like malware operations targeting IT staff for fake download ads.) Most important, it looks like this setting will help limit Google ad revenue (PDF) so they have less money to build future surveillance. And, as a bonus, it will help you buy better stuff.

Fixing mobile privacy settings is still today’s best privacy tip, because of the ubiquity of smartphones, the good research showing high impact, the small amount of time required, and the stickiness of the settings once set. These help protect you from fraud, too. According to the National Bureau of Economic Research, the Apple setting reduces money lost in all complaints by 4.7% and money lost reported in internet and data security complaints by 40.1%.

Because a lot of people have already fixed these settings, workarounds by the surveillance industry are pretty far along. Many apps use server-to-server tracking which is not affected by this tip. It’s still a good first move, though.

2. Get an authorized agent. Consumer Reports Permission Slip is free and works well to automate your right to opt out of data sales or sharing. It has an auto requests option to handle the big data brokers, so you can sign up once and then leave Permission Slip to do its thing. Or if you’re interested in the surveillance economy you can open up the app and choose to send opt-outs for individual companies manually.

This tip is a contender for number one in a future list, as three trends continue.

  • Consumer Reports is adding more companies to the list.

  • More companies are coming into compliance and automating their interactions with authorized agents.

  • More state privacy laws keep taking effect.

When you sign up for Consumer Reports you can also get on their list and start keeping up with political stuff, which will help with another tip on this list. (Disclaimer: I worked on the CR research that led up to Permission Slip and am still an advisor to CR.)

Permission Slip offers both a free service and a paid upgrade. Two paid-only options are Incogni and DeleteMe.

3. Switch to non-surveillance apps and remove as many apps from your phone as you can.

For messaging, switch to Signal where possible. This doesn’t just give you better personal security and privacy. If you have nothing to hide, switching can still help you protect others.

  • help vulnerable people hide in the crowd of other users chatting about boring stuff on Signal

  • collect less surveillance data in total, to limit the ability of Big Tech AI to learn to spot the targeted users

Go through your phone and remove extra apps. This includes not just the obvious social media and messaging. Most mobile apps have some kind of tracking built in.

Habits can be powerful. Elena Rossini writes, I will be thinking: what do the Big Tech billionaires and kleptocrats wish I would do? And then I will do the OPPOSITE. Look for ways to nudge yourself into doing some beneficial habit instead of feeding a surveillance app with your attention. If you can’t completely delete a risky messaging app because you need it to communicate with some people, some compromise steps are to move it off your phone’s home screen and to make a habit of checking Signal before checking it so that people learn that Signal is a better way to reach you.

4. Fix smart TV settings. You can skip this one if your TV is not connected to the Internet and you know that nobody will connect it, or if you don’t have a TV that can be connected to the Internet.

TVs have a lot of confusing menus to navigate, and manufacturers keep moving the settings around. The article How to Turn Off Smart TV Snooping Features is about as clear an explanation as I have found. The reason this tip is number four here is that Tom Cruise says you have to change some other default settings on the TV anyway, so you might as well do the privacy stuff while you’re digging into your TV menus.

4.5 Set up a YouTube-only email address. This is a prerequisite for the next step. If you don’t have a spare email address that you can dedicate to YouTube, you can use Apple Hide My Email or Firefox Relay to make a separate address just for YouTube. (Using different email addresses for everything is a privacy tip but can be inconvenient enough that it doesn’t make my list—YouTube does.)

Another option is to make an account on Hotmail or some other non-Google email service, just for YouTube.

5. Fix YouTube. YouTube is the number one social network by some metrics. And it’s a big surveillance risk because it’s tied to other Google services.

It’s important not to log in to YouTube with your regular Google account if you have one, or to make a YouTube account with your normal email address. You need to log in with an email address you only use for YouTube.

You have a couple of options.

Don’t use the YouTube app, just a browser. Unfortunately this tip is here because of the impact level, not because the privacy development scene has good easy answers for how to do it. This one might depend on habit power—saving YouTube links to watch later on a privacy laptop instead of opening on your phone right away. Attention developers: YouTube protection is a big need with relatively few solutions, good place to make a big difference.

6. If you use Facebook or Instagram, fix Meta settings. John Oliver covers How to change your settings to make yourself less valuable to Meta. These settings don’t limit how much information about you goes to Meta, but they will probably limit your exposure to well-targeted scams, as well as reducing the amount of ad money that goes to Meta.

7. Turn on Global Privacy Control. This will automate your do not sell/share for sites you visit. Hella easy so the denominator in the effect/time score is low. But it’s still not supported everywhere, because there are more companies required to comply with state privacy laws than there are people who know how to comply. As more companies come into compliance, and as more states require it, expect this one to work better and better, and move up on the list. Privacy Badger will turn on Global Privacy Control for you, by default, along with blocking tracking scripts, so it’s possible to do two tips with one extension. There are several easy ways to do this one. More info on the Global Privacy Control site.
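For the people on the site side who still need to learn how to comply: a browser with GPC turned on sends the request header Sec-GPC: 1, as defined in the GPC specification. Here is a minimal sketch of detecting it in Python. The function name and the plain dict of headers are assumptions for illustration, and what a site has to do after detecting the signal depends on the laws that apply to it.

```python
def gpc_opt_out(headers):
    """Return True if the request carries a Global Privacy Control signal.

    headers: dict of HTTP request header names to values
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    # The GPC specification defines "1" as the only value that sends the signal.
    return normalized.get("sec-gpc") == "1"


print(gpc_opt_out({"Sec-GPC": "1", "User-Agent": "example"}))
print(gpc_opt_out({"User-Agent": "example"}))
```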

8. Participate in privacy politics. California has the CPRA because people voted for Proposition 24 in 2020. The CPRA isn’t perfect, but voting made a difference. Consumer Reports Impact Newsletter and Newsletter from The Markup are helpful for keeping up with privacy issues.

9. Install a basic privacy extension on your web browser. Privacy Badger will both block tracking scripts and turn on Global Privacy Control. A big time saver. The only problem with this one is that some sites will detect that third-party scripts are being blocked, and hassle you to turn off your ad blocker. If that happens, you may have a less annoying experience running uBlock Origin instead (see tip 11).

10. Fix browser settings. Most browsers have sensible defaults by now, but there are two exceptions: so-called privacy-enhancing ad data collection, and the old-fashioned third-party cookies that remain in Google Chrome. Check and fix the settings for whatever browsers you use.

(I’m not including switch browsers in this list because switching can be a lot of work, and you need to change some privacy settings in all the major browsers. You do need a separate browser for YouTube though—see tip 5. Google Chrome is probably going to have to remind you to turn their advertising features off, but I don’t know whether the others will have to disclose what they’re doing. You have to check the settings.)

11. Check or replace your ad blocker if you use one. Some sites (and even the FBI) recommend an ad blocker, but just searching for ad blocker or taking the first online recommendation is a bad idea. Many ad blockers are either adware, or participate in a paid allow-listing scheme that allows tracking by default. Do not use Google Search or the search feature on the Chrome Web Store to find an ad blocker, even if you know the name of a good one. A lot of extension listings are fraudulent and Google has no incentive to help you with this search. Only use a link from a trustworthy privacy site.

You can visit EFF’s Cover Your Tracks to run a test and reconfigure your existing ad blocker, but switching to uBlock Origin is probably quicker and will give you a privacy upgrade as well as exposing you to fewer web annoyances. uBlock Origin has an active development, testing, and support scene. They’re especially good at eliminating turn off your ad blocker pop-ups.

(Personally, I generally don’t run an ad blocker because research, but privacy tools and settings often set off a lot more turn off your ad blocker pop-ups than uBlock Origin users get. If the web gets too annoying with Privacy Badger, then switching to uBlock Origin might be the best option. You should run either Privacy Badger or uBlock Origin, but not both. They can interfere with each other in ways that are sometimes hard to figure out.)

Other recommendations (not tested by me so far).

Ad blocking is tricky: if you block ads on normal sites but not YouTube or Google Search, then you are just encouraging advertisers to send more money to one big company, which is a bad idea. Do the YouTube tip before this one. If you’re seeing ads on YouTube but not on other sites, you’re doing ad blocking wrong. Most people surveyed say they are running an ad blocker now, and experienced advertisers are more likely to have one, so this tip might move up soon.

If you do decide to install an ad blocker, make sure that it is blocking the ads on Google Search. The ads on search results pages are designed to be easy to confuse with search results, and Google allows a lot of malware and other fraud in the search ads—even ads that look like they link to Google’s own site.

12. Nerd out with browser privacy extensions. Now we’re getting into the intermediate-level tips that will be interesting to some people.

  • ClearURLs gets rid of tracking parameters in URLs, and speeds up browsing by skipping data collection redirects.

  • NJS is a minimal JavaScript disable/enable button. Helpful for removing pop-ups and crap from sites where the real content works without JavaScript.

  • Personal Blocklist is surprisingly handy for removing domains that are heavy on annoyances and surveillance, but weak on actual information, from search results. (part of how to fix Google Search)
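To get a feel for what removing tracking parameters means, here is a rough sketch in Python using only the standard library. It is not how ClearURLs works internally (that extension uses a much larger, maintained rule list), and the short list of parameter names here is just an illustrative sample.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Small illustrative sample of common tracking parameters (an assumption,
# not the ClearURLs rule list).
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/shoes?id=42&utm_source=news&fbclid=abc"))
```

Parameters that the site actually needs, like id in the example, pass through untouched.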

13. Send opt out, right to know, or right to delete requests manually. All right, we’re deep in privacy nerd territory now, you probably should have stopped reading two tips ago. This is way too time-consuming. I do it, but I’m mainly looking for companies that should be on the lists used by authorized agents to opt everyone else out. For most people these are just privacy labour and you’re better off doing a higher-priority privacy tip or writing to your state representative.

Any others?

More: My talk from Southern California Linux Expo (slides and text) has more details on context for some of these.

Other privacy how-to articles

How to: Get to Know iPhone Privacy and Security Settings from EFF is a thorough list for Apple iPhone users.

How Do I Protect My Privacy If I’m Seeking an Abortion? This article has a lot of good general information that applies to other health and family privacy concerns. It’s worth reading even if the headline reason does not apply to you.

Accessible tips for people to protect their privacy by Bill Fitzgerald is a more detailed, advanced list.

The Cyber-Cleanse: Take Back Your Digital Footprint - The Opt Out Project is a 20-day program that covers a lot.

29 Dec 2023

Opting out of doing 2024 predictions

I’m going to exercise my right to opt out of doing year-end predictions this year, because the most significant privacy and web ad news will depend on the state of US support for Ukraine, and I don’t have any special knowledge of what is going to happen with that. If the USA comes through for Ukraine, then our allies are likely to try to smooth things over on privacy issues in order to keep the coalition together. If we half-ass it or pause, though, a lot of people are going to be asking what the hell happened, and trying to figure out what they can do with less support from the USA. So I can’t predict anything from where I sit. Some of the 2024 happenings I have no idea about:

You know the EU’s DSA proceeding against “X” (former Twitter)? Could be just the beginning. A loss of US funding would be seen as a hybrid warfare victory for Russia, and everyone will want to see how Russia took and held social media, to try to do better next time. Joohn Choe writes that Ukraine is generally a laboratory of disinformation and anti-disinfo tactics. So the European Commission could come up with good reasons to open up similar proceedings against all the big social sites.

Transparency and Consent Framework: still not transparent enough? Some of the thousands of companies that can receive information about you through web and mobile “real-time bidding” for ads are likely controlled by foreign adversaries. The new version of the industry’s Transparency & Consent Framework includes more required documentation from companies receiving RTB data, and proactive auditing of CMPs and Vendors that will be randomly selected each month. That might or might not be enough. Limiting the risk of sanctions-evading data flows by sending TCF back to the drawing board again would be a cheap way to try to stifle the flow of money into Russia’s war economy, without a dependency on the US government.

Google Privacy Sandbox/Performance Max and infringing sites: Are you ready kids? (Image: SpongeBob SquarePants on a copyright-infringing Russian site.) The Russians have gone in for copyright-infringing sites in a big way, and a lot of that stuff is monetized by Google. Today, Google can track visitors from legit to infringing sites using third-party cookies, but when they take those out of Google Chrome they’ll have to rely on other technologies, which the UK’s Competition and Markets Authority says they will ‘closely monitor’. The version of Topics API now in Google Chrome is set up to give advantages to large sites like YouTube at the expense of smaller ones and to copyright-infringing sites at the expense of legit ones.

Google might have been able to get away with this. There’s a (citation needed) story about a kleptomaniac princess from a UK-allied country who used to visit London and shoplift from Harrods. Law enforcement tolerated it to avoid an incident. But now elections are coming up, there’s a lot of popular support for Ukraine in the UK, and the copyright industries there are a big deal. Would be hard for UK regulators to justify coming down on the side of Russian pirates.

Safe Harbor, Privacy Shield, Trans-Atlantic Data Privacy Framework: This could be the big one, and it’s a stretch. As noyb.eu points out, the current proposal is the third attempt at a data sharing agreement, without substantial changes. Maybe the EU could be persuaded to let this third try squeak through if it would benefit a country that is a solid supporter on their biggest immediate problem, but sending your people’s data over to a country with an unresolved Russian influence situation is like sending your kids to spend the night with a family who all got positive virus tests.

So anyway, no predictions from me this year.

10 Dec 2023

Don Marti bio and disclaimer

Don Marti is VP of Ecosystem Innovation at Raptive (the company that used to be CafeMedia), and a former strategist at Mozilla and former editor of Linux Journal. He works on web ecosystem and business issues including collaborative research on the impact of advances in consent management and tracking protection technology. He started the California authorized agent project at Consumer Reports Digital Lab that led to the development of CR’s Permission Slip service.

Don has written for AdExchanger, Linux Weekly News, and other publications, and co-authored a paper on the economics of software quality for the Journal of Cybersecurity and a book chapter in Advances in Advertising Research. He has been a speaker on open source software, user data, targeting, and signaling effects at technology and advertising events including Ogilvy Nudgestock, Southern California Linux Expo, and the MSIX conference. Don serves on the Advisory Committee of the World Wide Web Consortium.

Disclaimers

I have accepted Google funding in the past. Google was one of the sponsors of FreedomHEC USA. Sponsorships covered all travel and accommodations costs for speakers.

I am an advisor to Consumer Reports on the Permission Slip project and related privacy tools and services.

I am a former employee of and current stockholder in ScyllaDB, which is a really fast NoSQL database.

I have accepted a sponsorship payment for a blog post on Earthly.

Contact info and metadata

hello page: best ways to reach me

Slash pages on this site: metadata

12 Nov 2023

personalization risks

Based on some of the feedback to the Draft W3C Privacy Principles it is starting to look like some of the problems that get lumped together as privacy problems are actually personalization problems. Personalization can occur when a person gets different content or ads from another person in the same context, and is technically possible without any other party knowing what the person got.

If the Web were somehow able to get perfect privacy-preserving personalization—for ads, content, or both—some of the problems inherent to personalization would remain.

Unlawful discrimination: Personalization can facilitate showing and hiding opportunities, such as jobs or housing, based on a user’s membership in a legally protected group of people. And the more obfuscated the personalization/discrimination gets, the harder it is for news sites to report on it and for regulators to take action. With privacy-preserving personalization, illegal discrimination could become prohibitively difficult to detect, or get a conviction on, if a regulatory agency with a budget in the millions of dollars has to solve a tricky machine learning explainability problem devised by a company with a budget in the billions.

Hiding fraud: A personalization system can be used by deceptive sellers to avoid making their ads or content known to people who are likely to take some kind of action against deception.

Content market race to the bottom effects: This is a well-known problem in adtech, where an audience can be identified on a high-value site and monetized on a lower-value site that is not reported to the advertiser. This is not mainly a web publishing problem; it is mostly a problem for users who don’t get as much ad-supported content as their ads would otherwise pay for.

The mix of fraud and race to the bottom effects is sites that source traffic deceptively—they’re a little of both.

More: advertising personalization: good for you? (Notes on some of the literature on this subject, including papers that don’t get cited enough.)

05 Nov 2023

privacy economics sources

In one generation, we have gone from buying Computer Shopper for the ads to a whole ad blocking software development scene. What happened? The advertising that people are blocking now is different from what advertising mostly used to be, and most of the ways it’s different get lumped together under privacy.

Alessandro Acquisti writes, in The Economics of Privacy at a Crossroads, “I argue that as economists we have, by and large, adopted a reductionist view of privacy that overlooks the richness and nuance of the contemporary debate around privacy.”

Part of the problem, it seems to me, is the idea that people are balancing privacy harms on one side against supposed benefits of personalized advertising on the other side. But the harms are documented, while the benefits are…just kind of assumed. (usually) Meanwhile, a substantial body of literature supports the argument that personalization has built-in costs of its own. So privacy regulations and technology that limit personalization could be failing to get credit for some important economic wins. If you look at the math, a personalized message carries less information than a message sent without knowledge of the recipient.

Gardete and Bart “We find that when the sender’s motives are transparent to the receiver, communication can only be influential if the sender is not well informed about the receiver’s preferences. The sender prefers an interior level of information quality, while the receiver prefers complete privacy unless disclosure is necessary to induce communication.” Tailored Cheap Talk

This makes sense to me. If you hear an entrepreneur make two claims about their new product, one in a meeting with you alone, and one as an answer to a question on a trade show panel, you’re more likely to believe the second one. In that case, the entrepreneur didn’t know who was in the audience: existing customers, experts in the field, competitors, maybe even the government.

So it seems like moving the same ad from Computer Shopper style, seen by a whole community of practice, to cold call style, seen by specific groups only, would tend to reduce its credibility. But why does a widely shown ad carry information in the first place?

Kihlstrom and Riordan A great deal of advertising appears to convey no direct credible information about product qualities. Nevertheless such advertising may indirectly signal quality if there exist market mechanisms that produce a positive relationship between product quality and advertising expenditures. Advertising as a Signal

Ambler and Hollier High perceived advertising expense enhances an advertisement’s persuasiveness significantly, but largely indirectly, by strengthening perceptions of brand quality. The Waste in Advertising Is the Part That Works

Davis, Kay, and Star It is not so much the claims made by advertisers that are helpful but the fact that they are willing to spend extravagant amounts of money. Is advertising rational?

Now, here is where the value destruction from personalization comes in. What if an ad medium supports enough personalization that it becomes too hard for consumers to detect any economic signal from a legit ad that appears in the medium?

Hertzendorf This result, however, is sensitive to the assumption that consumers can perfectly observe the firm’s advertising expenditure. This assumption is somewhat unreasonable in light of the fact that much advertising takes place over various electronic media to which not everyone is ‘tuned in.’…Furthermore, the noise complicates the process of customer inference. This enables a low-quality firm to take advantage of consumer ignorance by partially mimicking the strategy of the high-quality firm. I’m Not a High-Quality Firm-But I Play One on TV

That paper covers the situation where viewers have a multitude of TV channels to choose from, but the signal loss effect is at its worst in highly personalizable online ad media. In a healthy ad medium, the value of ad-supported content plus the signaling value of the advertising itself is greater than the annoyance of advertising plus the risks of privacy threats. As signal goes to zero, the ad has a harder time paying its way, even if annoyances and risks are somehow reduced.

What’s even worse for users is that personalization provides extra, bonus value to deceptive advertisers. For a legit advertiser, the value of showing an ad to the wrong person is zero. You wasted a fraction of a cent. Deceptive advertisers, though, obtain value from avoiding showing their ads to people who would try to hold them accountable in some way. For example, a seller of designer knock-offs can avoid employees of the legit manufacturer, their ad agency, and their country’s law enforcement agencies. Sellers of quack medicine and investment scams can avoid regulators and journalists. The same personalization capacity can be worth more to deceptive sellers than to honest ones. (More examples of how online ad media facilitate deception, from the FBI, FTC, and NBER.)
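
A toy calculation can make that asymmetry concrete. This is only a sketch: every number below (cost per impression, share of wasted impressions, share of "watchdog" viewers, expected penalty) is made up for illustration and does not come from any of the papers cited here.

```python
# Hypothetical numbers, chosen only to illustrate the asymmetry.
COST_PER_IMPRESSION = 0.005  # dollars, i.e. "a fraction of a cent"


def legit_gain(impressions, wasted_share):
    """A legit advertiser only saves the cost of ads shown to the wrong people."""
    return impressions * wasted_share * COST_PER_IMPRESSION


def deceptive_gain(impressions, wasted_share, watchdog_share, expected_penalty):
    """A deceptive advertiser saves the same wasted spend, plus the expected
    cost of being seen by someone who would hold them accountable."""
    avoided_penalty = impressions * watchdog_share * expected_penalty
    return legit_gain(impressions, wasted_share) + avoided_penalty


legit = legit_gain(100_000, wasted_share=0.2)
deceptive = deceptive_gain(
    100_000, wasted_share=0.2, watchdog_share=0.001, expected_penalty=50.0
)
print(f"legit advertiser saves ${legit:.2f}")
print(f"deceptive advertiser saves ${deceptive:.2f}")
```

With any nonzero chance of being seen by a regulator, journalist, or brand employee, the second number is larger than the first, which is the whole point: the same targeting feature is worth more to the seller who has something to hide.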

Lin et al. We find that for users currently using an ad-blocker, uninstalling them leads to a deterioration in their online experiences and lower satisfaction with recent purchases. For users that were not using Ad-Blockers, installing one led to fewer reported regrets with purchases, an improvement in subjective well-being, and a less positive view of online advertising. The Welfare Effects of Ad Blocking

Mustri et al. [W]e find that [behaviorally targeted] ads are more likely to be associated with lower quality vendors, and higher prices (for identical products), compared to competing alternatives found in search results. Behavioral Advertising and Consumer Welfare: An Empirical Investigation.

The signaling problem and the deceptive sellers problem might help explain why privacy tool developers are choosing to block Google’s in-browser ad personalization. So why are so many intermediary companies so fired up about tracking people when it reduces the value to the buyer and seller ends? Maybe because overgrown open-source companies dominate advertising now? More: the Linux device driver hacker’s guide to giant Internet monopoly dominance

Whether based on actual user research, user requests, or intuition, privacy developers are anticipating that users are just as likely to want to avoid personalization when it’s free of individualized tracking as they are to avoid personalization based on an individual identifier. Yes, there are group discrimination risks that are still an unsolved problem, but right now it looks like privacy developers are betting that users will want to avoid not just individual tracking, but all personalization. The signaling literature might provide some hints why.

12 Aug 2023

an ad medium is a game with five teams

As far as I can tell, any advertising medium is like a game with five kinds of players in it.

  • users: seeking the greatest possible value in (ad-supported content + economic signal) in exchange for (attention/interruption + computing resources + risk). This does not necessarily mean seeking the least possible amount of advertising. See Vogue and Computer Shopper.

  • legit advertisers: seeking to maximize return on ad spend, which can include direct response metrics and/or reputation/brand equity

  • deceptive advertisers: similar to legit advertisers, but the difference is that they don’t want the brand equity because it would tend to go negative.

  • legit publishers: produce original ad-supported content that pays its way for users.

  • deceptive publishers: produce ad-supported content that either does not pay its way for users (costs more in attention, resources and risk than it’s worth) or takes from some other party without compensation (such as copyright infringement).

Sometimes an ad medium has qualities that give an advantage to some categories of players but not others. I don’t know exactly what those qualities are, but there is some interesting literature that might help to figure them out.

When an ad medium is implemented in software, the hard part is to design the rules in order to give an advantage to the legit players. And this is really hard. Today’s search, social, and web RTB advertising are giving persistent advantages to deceptive players. And just hiring a bunch of ad reviewers to chase down the deceptive players is not going to work. If the rules of the market are laid down so that they give an advantage to deceptive players, then any deceptive player who can get their costs down below the cost of finding and blocking them is going to keep at it.

05 Aug 2023

you’re on a customer journey, they’re on a marketer journey

Read the whole thing: The emerging marketing economist by Eric Seufert. The good news here is that technical and regulatory changes to how privacy works are already affecting investment decisions by companies. Mobile Dev Memo is a valuable source for understanding the surveillance marketing scene on its own terms, and is a good thing for privacy people to read. It’s not all about mobile. In the days of Peak Surveillance Marketing (2006-2018) marketers could basically put existing customer info, plus money, into a big platform company, and watch sales leads come out. But that’s less and less of a thing. Marketers are having to do more math to connect cause and effect.

The output of probabilistic measurement models requires interpretation: weaving a plausible story of how marketing spend generated conversions through an analytical framework.

That’s one side of the game, the marketing side. Now think of it from the privacy side. Some of that marketing spend pays for things that you don’t want to happen. The biggest impact of a privacy habit, tool, or regulation is to interfere with the surveillance marketing cycle and make it harder for the marketing decision-maker to connect an actual sale back to their earlier decision to spend money on a harmful practice.

the surveillance marketing cycle

This is an oversimplified version of the cycle and I’m open to suggestions for improving it. But it’s a way to start thinking about the most effective use of your limited time. You probably don’t have enough time to adequately secure your own personal privacy and still participate in the modern economy. But it is possible, if you focus on breaking the cycle for the most harmful practices, to help that marketing decision-maker reallocate their money away from surveillance and toward marketing projects that are win-win for vendor and customer. Advertisers in the USA spend about $1100 per person on advertising. Privacy habits, tools, and politics are how you can have an impact on how your $1100 gets spent.

If you invest in protecting your personal privacy, but marketers can still see that surveillance pays, then they have the resources and incentives to invest in technical workarounds and lobbying to undo your investment. This is one more reason why privacy can’t be personal, it’s a cooperative project. Lower the payoffs from surveillance practices, and ambitious corporate people will move to where the money is. If you’re on a customer journey toward spending money with them, they’re on a marketer journey towards spending money in a way that benefits (or at least doesn’t do too much harm to) you.

first positive step: YouTube

For most readers of this blog, getting YouTube into a safe state is probably priority one. Those video links come up in enough contexts that unless you break the link between YouTube viewing and shopping, it’s easy for a marketing decision-maker to connect “send money to YouTube” with “sell stuff.” Some options:

  1. Independently developed client (not a browser or the official YouTube app)

  2. Open YouTube in a separate browser not used for other Google services, and use LeechBlock in your regular browser to block YouTube there.

  3. my approach, which is convenient day to day but harder to set up: Firefox container + the “Enhancer for YouTube” extension

More on how users and legit brands will be better off

09 Jul 2023

Google Chrome ad features checklist

(Update 22 Dec 2024: add link to separate post on policy files)

(Update 21 Sep 2024: add link for turning off ad personalization for Google Accounts)

(Update 11 Jul 2024: add link for turning off ad tracking in Firefox)

(Update 2 Jan 2024: add Dark Visitors link)

For complicated reasons that you can read about in other places, Google Chrome is in the process of shipping a bunch of built-in advertising features, to enable a new variant of Real Time Bidding advertising inside the web browser. More info from noyb.eu: Google Chrome: Agree to ‘privacy feature’, but get tracking!

As a user, you’re probably better off with a browser from a company that isn’t incentivized or required to do any of this stuff. Besides the privacy and antitrust problems, in-browser ad features use your battery and bandwidth—one ad service reported a delay of 1500 milliseconds per ad. But sometimes a site is broken in other browsers, sometimes you need to test a site in multiple browsers, sometimes there is a company policy requiring a certain browser, or whatever. So, for your convenience (fine, mostly mine) here are notes and links for what to do about it.

Google’s in-browser ad features have been branded as Privacy Sandbox, but that’s a marketing name that applies to a lot of projects. So far the Privacy Sandbox options do not seem to include any actual privacy features, so you can safely turn them off. memo to self: update this blog post with details if they come out with a Privacy Sandbox feature that’s an actual privacy feature you should leave turned on.

For all Google Chrome users

  • Turn off third-party cookies, under chrome://settings/cookies. Select Block third-party cookies.

    This setting makes Google Chrome’s behavior similar to that of other browsers, so shouldn’t break anything on the normal web. (Some corporate applications will break, so if you have those applications at your company or organization, you can share the links in the desktop system administrators section with your IT department and they can configure Google Chrome to do the right thing.)

  • Below the third-party cookie setting, turn off “Allow related sites to see your activity in the group.” (This is third-party tracking for sites that claim they’re related in some way, but the groups don’t get an independent check.)

  • Go to the in-browser ad features, at chrome://settings/adPrivacy or chrome://settings/privacySandbox, and turn off at least Ad topics, which is the really problematic one, and any of the others you want to. This stuff is all new, and sites don’t depend on it, so you can safely turn it off without breaking anything.

For Google Account holders

If you have a Google Account for services like YouTube or Gmail, go to https://myadcenter.google.com/home and set Personalized Ads to Off. (This should cover downstream processing of your data, not what happens in the browser.) It’s not about the cookies or specific tracking technologies. Personalized advertising is unavoidably risky, and so-called privacy-enhancing technologies just make it worse.

For Android users

Consumer Reports has an up-to-date set of instructions that should work across Android versions. See Workout 4 at CR Data Bootcamp.

If you use an ad blocker or content blocker

If you choose to run an ad blocker or a content-blocking privacy extension, pick one that is honest and currently maintained.

Please don’t recommend an ad blocker to other people. When people search for ad blockers on search engines or browser extension directories, they’re likely to find malware, adware, scams, extensions that block visible ads but not tracking, and extensions that participate in a paid allow-listing scheme. If you recommend an extension, make sure to give a specific name and link.

The functionality available to extensions in Google Chrome is being reduced, so running Google Chrome with an ad blocker does not look like a long-term solution. But either of the above two should work for now. (If you do switch browsers, be sure to turn off advertising features in Firefox.)

For command line users and desktop system administrators

You can install a managed policies file to make these settings apply to all profiles and users. See turning off browser ad features from the command line.

Some Intranet and SaaS applications still require third-party cookies. If you are setting this up for other users, test this option thoroughly, especially if you have web applications that are known to only work on Google Chrome.

The other in-browser ad features are new and nothing depends on them, so they can safely be turned off for everyone. It’s not just a privacy thing. The performance impact of running an entire in-browser ad stack, especially on older PCs, means that these settings are likely to extend desktop system life and reduce user complaints.
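
For admins who would rather script it, here is a rough sketch of generating a managed policy file. The policy names are the ones I understand Chrome's enterprise policy list to use for these settings, so treat them as assumptions and check them against the current list before deploying. The file name ad-features.json and the helper write_policy_file are made up for this example.

```python
import json
from pathlib import Path

# Policy names are assumptions based on Chrome's enterprise policy list.
# Verify them against the current list before rolling this out.
AD_FEATURE_POLICIES = {
    "BlockThirdPartyCookies": True,
    "PrivacySandboxPromptEnabled": False,
    "PrivacySandboxAdTopicsEnabled": False,
    "PrivacySandboxSiteEnabledAdsEnabled": False,
    "PrivacySandboxAdMeasurementEnabled": False,
}


def write_policy_file(directory, filename="ad-features.json",
                      block_third_party_cookies=True):
    """Write a managed policy JSON file and return its path.

    Pass block_third_party_cookies=False if you still have intranet or
    SaaS applications that depend on third-party cookies.
    """
    policies = dict(AD_FEATURE_POLICIES)
    if not block_third_party_cookies:
        del policies["BlockThirdPartyCookies"]
    path = Path(directory) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(policies, indent=2) + "\n")
    return path
```

On Linux, Google Chrome should pick up managed policies from /etc/opt/chrome/policies/managed/, so as root something like write_policy_file("/etc/opt/chrome/policies/managed") would do it. Other platforms use different mechanisms. After restarting the browser, chrome://policy shows which policies were actually loaded.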

For web developers

Data on your web visitors can be collected using a third-party script. Some third-party scripts are already being extended to do this. Fortunately the fix is something that looks like a good idea anyway: setting Permissions-Policy.

  • If you run a site with no third-party JavaScript or iframes, and you know you will never add them: you should technically be good, just don’t put any third-party JavaScript or iframes on it. (That includes embedded Twitter tweets, embedded YouTube videos, Imgur photo widgets, third-party fonts, all that stuff.)

  • If your site has third-party JavaScript or iframes, or someone might add one in the future: Check your Permissions-Policy HTTP header to make sure it includes browsing-topics=(). As I write this, a good header is:

Header always set Permissions-Policy "accelerometer=(),autoplay=(),browsing-topics=(),camera=(),display-capture=(),document-domain=(),encrypted-media=(),fullscreen=(),geolocation=(),gyroscope=(),magnetometer=(),microphone=(),midi=(),payment=(),picture-in-picture=(),publickey-credentials-get=(),screen-wake-lock=(),sync-xhr=(self),usb=(),web-share=(),xr-spatial-tracking=()" "expr=%{CONTENT_TYPE} =~ m#text\/(html|javascript)|application\/pdf|xml#i"

See permissions-policy.conf in the h5bp project. If your site requires some of these permissions in order to work, you can always turn some of them back on. More info in the Permissions Policy article on MDN.
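
If you want to check a header value you already have (copied from browser dev tools, or pulled from a response in a script), a small parser is enough. This is a minimal sketch, not a full structured-fields parser: it assumes directives are comma-separated and that allowlists contain no commas, which holds for the header above. The function names are made up for this example.

```python
def parse_permissions_policy(header_value):
    """Split a Permissions-Policy header value into {feature: allowlist}."""
    policy = {}
    for directive in header_value.split(","):
        directive = directive.strip()
        if not directive or "=" not in directive:
            continue  # skip empty or malformed pieces
        feature, allowlist = directive.split("=", 1)
        policy[feature.strip()] = allowlist.strip()
    return policy


def topics_disabled(header_value):
    """True only if browsing-topics is present with an empty allowlist."""
    return parse_permissions_policy(header_value).get("browsing-topics") == "()"


header = "accelerometer=(),browsing-topics=(),sync-xhr=(self)"
print(topics_disabled(header))
```

A header that leaves out browsing-topics entirely fails the check, which is the case to watch for on sites that set Permissions-Policy before this feature existed.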

For WordPress sites

The WordPress plugin Disable Topics API by Roy Tanck will set the Permissions-Policy header for you.

08 Jul 2023

browser topics tracking and the prejudiced landlord problem

(8 Sep 2023: added links, copy edit for clarity)

Here’s a little longer explanation of the prejudiced landlord use case from my notes on Google Topics API. First of all, yes, bias in rental and other decisions is still a thing. Business decision makers are not Econ 101 profit maximizers. They’re entire people, with human motivations, good and bad. So any ad platform that’s large enough is going to get a mix of advertisers who want to do good and bad stuff.

If a platform measures, in aggregate, how well the advertisers achieve their goals, the platform is going to measure both the success of money-seekers in getting money, and the success of people with other motivations in achieving their goals. From the ad platform point of view, it’s hard to tell a pure money-seeking landlord, looking for the tenant likely to earn them the most profit, apart from a prejudiced landlord, who is willing to give up on renting to an otherwise suitable tenant if it means they can keep someone from a certain group of people out of their building. Any automated ad platform, unless management makes a substantial positive effort to do otherwise, is going to end up optimizing for the needs of prejudiced advertisers along with the rest.

Automated social media advertising saw this problem in a big way. A big social media site can identify users by membership in a protected group and allow landlords to exclude certain people from seeing an ad. But that turned out to be easy to spot. So how can an ad platform, which because of its size is optimizing for the needs of advertisers with all kinds of motivations, manage to offer housing ads that discriminate against members of protected groups without leaving evidence? That’s where some software components and protocols come together.

  • Conversion tracking: an advertiser can report to the ad platform whether a sale was made to a particular user.

  • Topics-based ad placement: an ad platform can place an ad based on topics collected from user browsing history.

  • Machine learning: the ad platform can optimize the likelihood of conversion of a particular ad.

Now, the discrimination never has to be written down. It’s calculated on the fly by the machine learning system, and ML decisions are notoriously hard to figure out. So,

  • user membership in a protected group is correlated with browsing topics in some way (could be reflected in either presence or absence of particular topics)

  • browsing history is classified and mapped to topics, in a way that is correlated both with actual topics browsed and with a pattern of noise and errors introduced by the classifier

  • user passes browsing topics into an ML system. If the user does examine “their” topics, the user can identify the topics, but not the errors in how topics are assigned, or how those topics, with errors, are correlated with membership in one or more protected groups

  • advertiser passes conversion data into the ML system. In the case of housing, a “conversion” reflects whether the prejudiced landlord decided to rent the apartment

  • ML system optimizes for conversions, and “decides” to show fewer ads for the prejudiced landlord’s properties to members of the protected group. It’s nearly impossible to explain why the “decision” was made, so there’s no evidence of the crime.
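
The steps above can be sketched with a toy "optimizer" that never sees group membership at all. Everything here is synthetic: the audience counts, the topic names, and the rule of splitting the ad budget in proportion to each topic's conversion rate are assumptions made for illustration, not a description of any real ad platform.

```python
from collections import Counter

# Synthetic, made-up audience: (group, topic, users, conversions).
# The prejudiced landlord never rents to the protected group, and
# topic_y happens to be more common among protected-group users.
AUDIENCE = [
    ("other",     "topic_x", 80, 16),
    ("other",     "topic_y", 20, 4),
    ("protected", "topic_x", 20, 0),
    ("protected", "topic_y", 80, 0),
]


def conversion_rate_by_topic(audience):
    """What the optimizer learns: conversions per user, keyed by topic only."""
    users, conversions = Counter(), Counter()
    for _group, topic, n, converted in audience:
        users[topic] += n
        conversions[topic] += converted
    return {topic: conversions[topic] / users[topic] for topic in users}


def impressions_by_group(audience, budget):
    """Split the budget across topics in proportion to conversion rate,
    then count how many impressions land on each group."""
    rates = conversion_rate_by_topic(audience)
    total_rate = sum(rates.values())
    topic_users = Counter()
    for _group, topic, n, _converted in audience:
        topic_users[topic] += n
    shown = Counter()
    for group, topic, n, _converted in audience:
        topic_budget = budget * rates[topic] / total_rate
        shown[group] += topic_budget * n / topic_users[topic]
    return dict(shown)


print(impressions_by_group(AUDIENCE, budget=100))
```

In this made-up data the two groups are the same size, but the protected group ends up with about 32 of every 100 impressions. The group label is only used at the end to count who saw the ad. The allocation itself is driven entirely by topics and conversions, so there is no rule anywhere that says "exclude this group."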

Won’t members of protected groups just avoid discrimination by blocking certain topics from being shared, or turning off the topics tracking feature? Well, no, because people who don’t expect to be discriminated against will be less likely to take that action, and the ML system will learn to discriminate that way.

Google’s Topics API is intended to be a subset of third-party cookies, and of course all of this is possible with third-party cookies instead of topics tracking. And the problem is not that the company is choosing to offer services to enable prejudiced advertisers to achieve their goals, it’s that because of the need to serve advertisers at large scale, advertisers with discriminatory goals will be served along with the rest, in a hard-to-detect way. According to Google’s FAQ,

Chrome can and will take steps to avoid topics that might be sensitive (i.e. race, sexual orientation, religion, etc.). However, it is still possible that websites calling the API may combine or correlate topics with other signals to infer sensitive information, outside of intended use. Chrome will continue to investigate methods for reducing this risk.

While they’re working on that investigation, best to leave this feature turned off.

What to do about it

Most of the web ad revenue comes from a small fraction of the ads. (It varies depending on how you measure but the 80/20 rule is a good way to think about it.) If you’re in the 20% of people who either get the best ads, or would have gotten them if you hadn’t protected yourself, then you have more impact and, in my humble opinion, an opportunity and responsibility to help protect others by taking your own valuable eyeballs off the market. After turning this stuff off (especially if you’re the kind of person who would get targeted for the good ads), the most important step is to either remove all third-party JavaScript and iframes, or set the Permissions-Policy header on your web site to protect your users. Simple list of what to configure: Google Chrome ad features checklist

03 Jul 2023

Xumee USB-C headphone and charger adapter on Linux

Welcome search engine users. If you are trying to find out if the Xumee USB Type C to 3.5mm Headphone and Charger Adapter works on Linux, then I have good news for you. Works fine, no changes to settings required.

Shows up as Headphones - KM_24bit96khz Audio on my system (ThinkPad X270, Fedora 38, kernel 6.3.5.) and sounds good with the headphones and earbuds I have tried.

These things seem to be all over the online stores for only about the price of a burrito mojado. For more info, read Xumee USB-C DAC Dongle Review - A View From The Cheap Seats on audioreviews.org.

(and yes, these kids today have it easy, do we just assume that everything works out of the box on Linux now, so this post will get zero traffic?)

02 Jul 2023

not a blog post

This is not a blog post. This is some links that did not make it into any blog post on here yet, from the script that collects links to make blog posts with. This is all worth reading, I just didn’t make the links. (and of course the categories are a matter of opinion)

Software and Internet

AI Is About to Dump More Work on Everyone

Cheating is all you need

Large Libel Models: An AI Company’s Noting That Its Output “May (Be) Erroneous” Doesn’t Preclude Libel Liability

I tried out SyntheticUsers, so you don’t have to

SQLite has Write-Ahead Logging

3 Billion Chrome Users Are About to See This Privacy Sandbox Pop-Up

MDN can now automatically lie to people seeking technical information · Issue #9208

Red Hat blog on RHEL changes

“A really big deal”—Dolly is a free, open source, ChatGPT-style AI model

Retrospective on a dying technology cycle, part 4: What comes next?

ChatGPT Gets Its “Wolfram Superpowers”!

Why I’m all-in on the fediverse

How Google Ran Out of Ideas

Collections: On ChatGPT

Safari 16.4 Is An Admission

People do use Add to Home Screen

How Rust went from a side project to the world’s most-loved programming language

Creating slides with Markdown using Marp

Clever Hack Uses YouTube to Back Up Your Data in Google’s Cloud for Free

Push notifications are now supported cross-browser

Advertising vs. surveillance

Pluralistic: Everything advertised on social media is overpriced junk (08 Apr 2023)

Careful with LinkedIn, dammit

In a shrinking marketplace, Magnite explores media trading without DSPs

Germany accuses Twitter of failing to remove illegal hate speech

Mobile Advertising in a Data Constrained Future

Inaudible ultrasound attack can stealthily control your phone, smart speaker

Introducing the California Delete Act, How the California Delete Act Further Protects Reproductive Rights

How Malvertising Shapes Behavior and Threatens An Industry: A Q&A with Yuval Shiboli, GeoEdge

Firefox rolls out Total Cookie Protection by default to more users worldwide

Marketing Brew

Is Micro-targeting Giving the Digital Advertising Sector a Bad Rap?

AI Drake Just Set An Impossible Legal Trap For Google, by Nilay Patel, The Verge

Firefox may soon reject Cookie prompts automatically

Why Are So Many US Companies Using Cookie Banners On Their Websites?

Retail, search and Amazon’s $40bn ‘advertising’ business

Meta was scraping sites for years while fighting the practice

Joe Brockmeier: Something I should’ve done a long time ago: Installing Pi-hole

Build security with the assumption it will be used against your friends

musings: a semester with the framework laptop

Until further notice, think twice before using Google to download software

DON’T WAIT FOR TECHNOLOGY

Energy and economy

Heat pumps boom in Poland, Europe’s fastest growing market

Inside Europe’s high-tech scramble for better energy storage

How did solar power get cheap? Part I, part II

These deep-sea “potatoes” could be the future of mining for renewable energy

Feedly launches strikebreaking as a service

The U.S. produced more electricity from renewables than coal in 2022

Largest Ever Hydrogen Fuel Cell Plane Takes Flight

This giant ship is a symbol of the shift from oil to renewables

Paris bans cars: The city pulled off an urban dream. Is it a model or a warning?

The Silicon Valley Bank Collapse Should be a Radicalizing Moment

Moral hazard for thee but not for me

A Bank of One’s Own

How Cigna Saves Millions by Having Its Doctors Reject Claims Without Reading Them

Block: How Inflated User Metrics and “Frictionless” Fraud Facilitation Enabled Insiders To Cash Out Over $1 Billion and Is Jack Dorsey going to get blown up by Hindenburg Research?

Those meddling kids! The Reverse Scooby-Doo theory of tech innovation comes with the excuses baked in

An Introduction To Class Warfare For The Software Engineer

https://www.vice.com/en/article/88qze4/32-nasty-rail-workers-say-they-knew-the-train-that-derailed-in-east-palestine-was-dangerous

Sam Bankman-Fried is not a child

Moving Company That Threatened People With $1,000 A Day Fees For Negative Reviews To Pay $125,000 Settlement

Miscellaneous

Place your bets

Make Up a Guy

The Russians Keep Evolving

I liked this simple calculus exercise

Bosses hate work from home because ‘home’ is for women

Is college stressing you out? It could be the way your courses are designed

Not Forgotten

Colorado Farmers Just Won a Historic Right-to-Repair Battle

Inside Rupert Murdoch’s Succession Drama

7, 11, and 13

Weekend Update #23: Reflections from Ukraine

Reshoring Vacuum Tube Manufacturing, One Tube at a Time

This $39 Cooler Master case turns your old Framework Laptop parts into a tiny PC

Marsupials and other mammals separately evolved flight many times, and we are finally learning how

Why not put TSMC and Samsung fabs in Canada?

How to Understand the Well-Being Gap between Liberals and Conservatives

Exploring “White Sun War”

People Forgot How War Actually Works

‘Incredibly intelligent, highly elusive’: US faces new threat from Canadian ‘super pig’

Place Your Bets

Biden Went to Kyiv Because There’s No Going Back

Indiana lands new support for local news — and Capital B’s next newsroom

Congratulations on the mess you made of things

God Did the World a Favor by Destroying Twitter | WIRED

Is text-generating AI an industry killer or just another wave of hype?

Why The Failure By Funders To Require Academics To Retain Copyright In Their Papers Is The Biggest Obstacle To Open Access Today

Bay Area Cities To Lose ALL Housing Zoning Powers in 2 Days

Political stress test: Adopting a compliant Housing Element

Dungeons & Dragons Scraps Plans to Update Its Open Game License

Tanks for Ukraine Have Shifted the Balance of Power in Europe

Sneaking Things In

Villains

Redesigning the entry from the Fruitvale Bridge to Alameda

Open Game License Version 1.2 Draft

14 May 2023

notes on ad-supported piracy

(This is a blog version of a thread on federate.social, somewhat expanded and with more links added.)

The more I learn about how web ads work, the less impact it looks like Large Language Models (LLMs) (either big company ones, or the underground ones that will do better than Big Tech can) will have on web publishing.

The pessimistic case is that LLMs trained on original human-written content will generate large quantities of derivative work and drive down ad rates. But the problem with that theory is that web ad rates are already being set by the lowest-effort sites. It’s not hard to find big-budget brands running their ads on sites that are full of obvious copies of legit works available elsewhere. One well-known large Internet company both sells authorized e-books on a store app and runs the ads on pirate e-book sites offering the same content.

I know I mention e-books a lot when I talk about ad-supported piracy, but that’s because of my own media consumption habits. I have a pretty good idea how to find out which publishers work with authors who write books in English, but I don’t know anime. I’m also really weak on the streaming rights to UK and European football (soccer), and I don’t know Bollywood movies or the e-book market in languages other than English. There are hella sites offering the same titles in all of those areas, and I doubt they’re all legit, but my language and business knowledge is limited so I don’t know. bonus link about the book business and piracy, from Hugo-winning author Charles Stross

Anyway, I have been checking a lot of these sites out, and getting to the point where I’m starting to have an outline of the ad-supported piracy racket. It’s one of those emergent crime networks that grows in the cracks between different kinds of laws.

Step 1. This is the tricky part, and all the rest of the crime stuff falls into place once this is working. Make an advertising system that either hides or obfuscates the information on where the ads ran.

It is possible for advertisers, or their agencies, to get the same info I did, and get a better idea of how this problem affects a specific brand or campaign. If you work with an up-to-date, honest agency, and they use adtech partners who correctly implement the ads.txt, sellers.json, and SupplyChain industry standards, then it’s possible to get a trustworthy report on where the ads appeared.
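For anyone who wants to check a site by hand, an ads.txt file is just comma-separated text: the ad system's domain, the seller's account ID in that system, the relationship (DIRECT or RESELLER), and an optional certification authority ID. Here is a minimal Python sketch of reading one. The function name, record type, and example domains are made up for illustration, and this is not how any particular agency or vendor does its reporting.

```python
from typing import NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system: str                    # domain of the advertising system
    seller_account_id: str            # the seller's account ID in that system
    relationship: str                 # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]  # optional fourth field


def parse_ads_txt(text: str) -> list[AdsTxtRecord]:
    """Return the data records found in the text of an ads.txt file."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0]       # drop comments
        line = line.split(";", 1)[0].strip()   # ignore extension fields
        if not line:
            continue
        if "=" in line.split(",", 1)[0]:
            continue  # variable line such as contact=..., not a data record
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # malformed line
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert))
    return records


# Hypothetical file contents, not taken from any real site.
sample = """# hypothetical ads.txt
adexchange.example, pub-1234, DIRECT, abc123
resellernet.example, 5678, reseller
contact=ads@publisher.example
not a record
"""
records = parse_ads_txt(sample)
```

A DIRECT record says the site operator controls that account. Cross-checking the account ID against the ad system's sellers.json is the next step, and is not shown here.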

In practice, not only do many adtech intermediaries fail to offer a tear sheet to the advertiser and agency, they actively obfuscate much of the data that would reveal ad-supported piracy if it were widely available. Some well-known adtech intermediaries throw huge amounts of development work at hiding it. And they’re making it worse, not better. (At least one of the systems intended to replace third-party cookies seems to be designed to continue the leakage of audience data from legit to infringing sites.)

short digression: This is why ad agencies started in the first place. If you’re a manufacturer in Connecticut making patent medicine or barbed wire or whatever, you have no idea which of the newspapers in the western USA are legit, and which are run by some no-good outlaw who prints one copy of your ad just for the tearsheet, then invoices you for reaching a whole town. So you hire an agency in Chicago to buy ads for you and check that the newspapers are real. From just handling the insertion orders, tearsheets, and invoices, the agency business model eventually grew to include making the ads. Are there any Western novels about an advertising agency media buyer who rides into town to sort out the real newspapers from the cheating varmints sending faked-up tearsheets?

Step 2: Get a hold of a hard drive full of copyright-infringing content, make a pirate site, and either sign up with a big company to get ads on it, or use a dark pool sales house to hide your pirate site under the same ad ID as a bunch of other sites.

Step 3: The pirate site is hidden from the advertisers, so they won’t complain about paying full price for low effort—but eventually the copyright holders will complain to the search engines to try to get the pirate site kicked out.

Step 4: That’s where the pirate site points to their “DMCA PolicY” and responds: we’re shocked to discover that “one of our users” uploaded that copyrighted work “without permission.” The pirate site can claim a DMCA safe harbor as an online service provider but it doesn’t look like the safe harbor applies to the adtech firms or to the advertiser.

Step 5: Eventually the pirate site gets kicked out of search engines, but that’s fine because they have already registered more domains and copied the content over.

Step 6: This is the strange part. Even when the search engine and the ad network are the same company, the pirate site operator can keep their ad account while losing domains to copyright complaints. I see the same ad accounts carrying over across 3 domains that appear to have been either kicked out of search or pulled down for copyright reasons.
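One way to look for this kind of carry-over is to compare the ads.txt files of several domains and see which DIRECT seller accounts show up on more than one. The sketch below assumes you already have the ads.txt text for each domain. The function name and all the domains and account IDs are hypothetical, and it only counts DIRECT entries because big resellers legitimately appear on lots of unrelated sites.

```python
from collections import defaultdict


def shared_direct_accounts(ads_txt_by_site: dict[str, str]) -> dict[tuple[str, str], set[str]]:
    """Map (ad system, account ID) to the sites that list it as DIRECT.

    Only accounts that appear on more than one site are returned.
    """
    sites_by_account = defaultdict(set)
    for site, text in ads_txt_by_site.items():
        for raw_line in text.splitlines():
            line = raw_line.split("#", 1)[0].strip()  # drop comments
            fields = [field.strip() for field in line.split(",")]
            if len(fields) < 3 or fields[2].upper() != "DIRECT":
                continue
            sites_by_account[(fields[0].lower(), fields[1])].add(site)
    return {
        account: sites
        for account, sites in sites_by_account.items()
        if len(sites) > 1
    }


# Hypothetical inputs: three look-alike sites and one unrelated site.
ads_txt_by_site = {
    "site-a.example": "adnetwork.example, pub-111, DIRECT\nbigreseller.example, 999, RESELLER\n",
    "site-b.example": "adnetwork.example, pub-111, DIRECT\n",
    "site-c.example": "adnetwork.example, pub-111, DIRECT # same account\nbigreseller.example, 999, RESELLER\n",
    "other.example": "adnetwork.example, pub-222, DIRECT\n",
}
shared = shared_direct_accounts(ads_txt_by_site)
```

A shared DIRECT account is a hint, not proof: legit publishers also run several sites under one account, so the next step is looking at what the sites actually carry.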

The crime here is behind the scenes: a site pops up with all or most of the same copyrighted content as the owner’s last site, but when they get a DMCA takedown, they blame it on some “user upload.” If, as the copyright holder or advertiser, you could find the site, then you could probably figure out what they’re up to, but the way that web ads work is designed to make it hard to see where they went.

screenshot of a pirate copy of a computer book with a CVS ad Example from a pirate ebook site, showing a book listing and the footer ad. (This is one of those “confidential” sites we hear so much about.)

Meanwhile, the advertiser could be insisting on accurate reporting, but most advertisers don’t do this, and nobody at the agency wants to point out mistakes being made by their client (even if the agency is honest and not somehow in on the racket). Anyway, LLMs are likely not going to be transformative here, more of a force multiplier for an existing business model.

More: the Linux device driver hacker’s guide to giant Internet monopoly dominance

11 May 2023

generative ai antimoats

(Update 17 Nov 2024: add links to EU and California laws that will affect Big AI firms)

(Update 30 Aug 2023 and after: add links to Related section)

(Update 14 May 2023: add meme and attention span paragraph.)

Read the whole thing: “We Have No Moat, And Neither Does OpenAI”. To me, it looks more interesting than that. Big companies have sustainable anti-moats in generative AI. Independent and underground operations can do generative AI in ways that a big company can’t.

First of all, filtering output. If you run a big service for everyone you have to hire a bunch of people to train it not to output something that could be illegal, somewhere. There are a bunch of these laws. You can’t…

  • libel people in the UK

  • give a programmer in the USA some code to bypass a DRM system

  • insult the monarch of Thailand

  • give a user in Germany an article that denies a historic genocide

A Big Tech company doesn’t know which users are going to report them for which crimes, so they have to try to keep everything compliant, or compliant-looking, for all users worldwide. An independent operation only has to comply with the law in their own jurisdiction. (Personally, I would not run any generative AI service, at any scale, in a country where the blasphemy law carries the death penalty, but that’s just me.) Meanwhile, if you’re running underground generative AI in the USA, you can probably get away with breaking a bunch of laws. Section 230 is your friend. Copy a trick from ad-supported pirate sites, which use a different immunity law (DMCA) to keep themselves out of trouble. Post the generative AI content from regular user accounts, and if anyone complains, play either the Section 230 card or the DMCA card and blame it on a user. (This is one of the ways YouTube got videos for their site when they were getting started. Jawed, please stop putting stolen videos on the site. We’re going to have a tough time defending the fact that we’re not liable for the copyrighted material on the site because we didn’t put it up when one of the co-founders is blatantly stealing content from other sites and trying to get everyone to see it.)

Second, training sets. Big companies keep records on how they use copyrighted works, and underground operators have ever-bigger hard drives full of material that they, or their friends, have been collecting since the 1990s. A new California AI Law Mandates Disclosure of GenAI Training Data. After big companies have to clean up their training sets under the harsh glare of legal threats, underground AI projects can still just torrent a big pile of copyrighted books, or use various tricks to grab content from sites that have an AI Opt-Out. Copyright cases will affect Big Tech but not underground AI, which can use the same tricks for getting away with infringement that the regular ad-supported pirate sites already do.

Underground AI can also get training data from unsuspecting, unpaid users, like GPT-4chan did. Gamers and forum users are training underground AI right now by interacting with bots that collect and respond to their activity. And how do you tell if the operator of the forum or game server is running the bots? Or are bots training on services offered by others? Or is the forum or game using the Section 230, we’re shocked to discover what one of our users was doing, trick?

Meme version of this blog post

Big Tech companies are law-abiding enough that they have to invest heavily in filters on generative AI, but they do enough crimes that they have to collaborate with the expectation that their email, commit messages and chat will eventually become public. (There is corporate training about how to email about crimes and have it covered by attorney-client privilege, but it turns out not to work as well as they thought it would.) Big Tech AI projects are being watched by regulators, competitors, employees who are going to be working for a competitor next week, wannabe whistleblowers, independent researchers…and soon will have to open up a bunch of records to plaintiffs’ lawyers. That’s a combination that will lead to chilling effects on internal communications that underground AI operations don’t have. (Yes, individual underground AI rings will get busted, but they’re compartmentalized.)

And I almost forgot: the rent extraction and short attention span problems. Building a service on a Big Tech API is like opening a restaurant in a building owned by a San Francisco landlord: if you fail, you fail. If you succeed, they raise the rent on you. And, unlike commercial real estate leases, if you build on a Big Tech API, the landlord can kick you out at any time just because they get bored. Or, more likely, because the product manager nailed their OKRs, got promoted, and now the service is in the hands of someone who knows that you can’t buy a house in Mountain View, California just by keeping an existing service from crashing. Steve Yegge explains it best.

Generative AI frenzy was rushed out in response to the 2022 Big Tech union organizing wins, like fast food robots were demoed before they were ready in response to “Fight for $15” a decade ago. But somehow it doesn’t seem that they thought it through very well. That’s probably a cultural problem, just because the instinct to do union-busting regardless of longer-term consequences is selected for so heavily. Anyway, AI is not going to be an advantage for the big incumbent companies that are still in the awkward halfway crook space.

More: notes on ad-supported piracy

11 Apr 2023

Reputation, signaling, and targeted ads

Three recent developments that I’m still trying to figure out.

The FBI Says You Need to Use an Ad Blocker on Google and Bing. According to the Internet Crime Complaint Center, criminals are using ads in search engine results like Google and Bing to impersonate brands. These ads send unsuspecting users off to phony websites that look identical to the pages people are actually searching for, where they are then subjected to ransomware or phishing attacks. The Bureau says an ad blocker can help.

FTC Issues Orders to Social Media and Video Streaming Platforms Regarding Efforts to Address Surge in Advertising for Fraudulent Products and Scams. With fraud on social media surging, the Federal Trade Commission has issued orders to eight social media and video streaming platforms seeking information on how these companies scrutinize and restrict paid commercial advertising that is deceptive or exposes consumers to fraudulent health-care products, financial scams, counterfeit and fake goods, or other fraud. (Also, from the FTC last year: Social media a gold mine for scammers in 2021.)

And now: Behavioral Advertising and Consumer Welfare: An Empirical Investigation. [W]e find that [behaviorally targeted] ads are more likely to be associated with lower quality vendors, and higher prices (for identical products), compared to competing alternatives found in search results. Or, as Julia Angwin put it, Online Ads Are Serving Us Lousy, Overpriced Goods. What is going on?

Update 29 Sep 2023: Consumer Surveillance and Financial Fraud | NBER A study on the impact of Apple App Tracking Transparency on fraud complaints as recorded by the FTC: We estimate that the reduction in tracking reduces money lost in all complaints by 4.7% and money lost reported in internet and data security complaints by 40.1%.

The old-school view of advertising is that it helps you tell legit sellers and deceptive sellers apart. It’s an economic signal. Rory Sutherland said, You could argue that advertising is a display of commercial fitness, that if you were intending purely to make short term, quick bucks, you’d simply sell via other means, and it only pays to advertise if you’re planning to establish a long term reputation for quality and probity. For example, legit sellers and deceptive sellers are equally good at cold calls, so a cold call is not a good way to tell a legit seller and a deceptive seller apart. Someone who studies the behavioral economics of advertising would probably say that a cold call carries no signal, so it’s not worth your time to take it. Ignore cold calls, but pay attention to ad media that can carry an economic signal.

So, can we divide ad media into signal-carrying ads—usually the ones that make it easy for enforcers of laws and business norms to see the same ads that less well-informed users get—and ads that don’t carry signal, which are safe to ignore? Maybe not. Now it’s becoming clear that the money spent on surveillance advertising displaces investments in other areas. So seeing a surveillance ad means that it’s less likely to be a win-win offer than a product you discover in some other way.

I’m not sure what to do with this info other than take it into account in customer research. Is the absence of a search or social ad, probably in combination with other information, becoming part of a useful heuristic for selecting a quality product for some buyers?

More: privacy economics sources

13 Mar 2023

Notes from my talk at Southern California Linux Expo

Here are notes and links from my talk at Southern California Linux Expo. This is not exactly what I said but the links should be right.

title slide

Introduction: limits of individual protection

I need to start with a disclaimer. Privacy tools will not get you privacy, but it’s important to do them anyway.

  • Most privacy tools and settings make you different from others, so more fingerprintable (hi, Linux nerds)

  • Out of band surveillance, which privacy tools can’t touch, is still there (and the more you participate in the modern economy, the more you’re surveilled)

  • companies fail to comply with existing laws (yes, they complain that the laws are “confusing” and somehow that seems to work in a lot of cases.) And they play legal tricks to build up and use friendly jurisdictions.

  • the more you try to protect yourself, the more crappy ads you get. How long can you stand getting the miracle cure ointment ad featuring the “before” picture—before you give up and try to get tracked so you get more legit ads?

By itself, no set of privacy tools is going to be able to get you to an acceptable level of privacy. If that’s what you came for, you can split, go get a coffee or something. Privacy tools are worth doing, but only as part of a cooperative program to address a much larger set of problems.

Am I here to argue the privacy point of view or the pro-business point of view? Yes. First of all, surveillance advertising competes directly with product innovation for investments. The economics literature on the effects of shifting investment away from surveillance is going to be interesting. In the meantime, we have evidence from mobile games and mobile game advertising, which is a fast-moving behavioral economics lab that helps spot trends early. As mobile game developers lose much of their targeting info to Apple’s App Tracking Transparency, they compensate by investing more in content and gameplay.

Surveillance also puts us at greater risk from internal strife and external adversaries. Bob Hoffman told the European Parliament, Tracking is also a national security threat. The Congress of the United States has asked U.S. intelligence agencies to study how information gleaned from online data collection may be used by hostile foreign governments. (More on that: Microtargeting as Information Warfare by Prof. Jessica Dawson of the Army Cyber Institute)

All right, all right, fine, you came for open source privacy tools advice so I will include some. We’ll do some fun tips, but I will try to put them in context. Here’s number one.

Privacy tip: fix YouTube

This one is an easy example of a tracking risk. The video you came to watch is great, but the stuff that gets suggested after that is the first step down the rat hole. A case before the Supreme Court right now covers YouTube running ISIS recruiting videos. And one of their biggest stars was just arrested in Romania for human trafficking. The recommendation algorithm on there will take you to some pretty dark places, pretty fast. But there are also videos on there you have to watch for school, or work.

What I want is a YouTube setup that will not just protect my privacy, but also reduce the number of YouTube impressions I generate. (Also I want to visit my family members at college or at a legit job, not at terrorist training camp.)

This requires two browser extensions. First, use the Multi-Account Containers extension to put YouTube in a separate browser container. That’s a separate space with its own logins, storage, and cookies.

Make a YouTube container

Click the extension, select “Manage Containers” and make a new container for YouTube. Then go to YouTube and choose “always open this site in container.”

Second, use Enhancer for YouTube to turn off the problematic features.

  • Turn off autoplay

  • Hide “related videos”

  • Enable “Theater Mode” to make main video wider (and fill in the space left by related videos)

  • All your other least favorite features of YouTube can also be hidden.

Turn off growth hacking features of YouTube

Don’t use the native mobile app for YouTube. There are also alternate front ends for YouTube and other services. LibRedirect is an extension to automatically redirect to those services. But they aren’t as future-proof, because it’s easier for YouTube to break them.

Privacy violations as part of a larger problem

Privacy is downstream from other problems. Surveillance capitalism isn’t a thing because advertisers care about your personal data. If you do a “Right to Know” most of what you get is wrong. According to Oracle, I am a soccer mom of four.

I hate to break it to you, but those startups that tell you that you can make money selling your personal info to advertisers are wrong. Your data has value as a short-term intermediate reaction product, as an input to two kinds of deceptive practices.

  • First of all, the B2C scams: ad targeting lets the legit advertisers show their ads to the affluent users who can afford stuff. And they typically target younger people. Targeting generally cuts off at age 34 or 54. So what do social media platforms do with all the ad slots that the legit advertisers don’t buy, and that go to older and poorer people? Right, they get the not so legit ads.

I haven’t been doing this. Will this be my last SCALE?

  • Second is the B2B scam: tricking advertisers into sponsoring content that they would choose to avoid, such as Russian disinformation and copyright infringement. User tracking lets ad intermediaries sell an ad impression that claims to reach a high-value user, but on content that no reasonable advertiser would choose to support.

Hey, look, it’s a real car ad on a scraper site.

NewYork News Times, real legit looking site here people

How can such a large industry get away with this kind of conduct for so long? Professor Tressie McMillan Cottom wrote,

From buying a gallon of milk to making a dinner reservation, all the way up the chain to electoral malfeasance, so many of our interactions feel weighted away from social connection in favor of extracting every ounce of unfair advantage from every single human activity. Not to overstate it, but a pluralistic democracy simply cannot function when most of its citizens cannot trust that the arrangements that they rely on to meet their basic needs are roughly fair.

To put the privacy problem in context, privacy violations as we see them are downstream from changes in business norms. We’re pulling on one tentacle of a much bigger threat than just the obvious personal privacy issues such as swatting, identity theft, price discrimination, and investment scams.

The relative payoffs of production and predation (or “making” versus “taking”) are determined by legal mechanisms for enforcing contracts and protecting property rights, but also by social norms and interpersonal trust. — Stephen Knack

The more common that scams become in our markets and society, the more trust we burn, the more unnecessary risk we take on, and the more wealth we miss out on. This is not an advertising problem, as Bob Hoffman said—it’s a surveillance problem. Some kinds of advertising can have economic signaling value that does help build trust, but only if the medium is set up for it. (Fixing digital ads to pull their weight and fulfill the signaling role of advertising is a hard problem, and I should bug Rory Sutherland some more about how to solve it.)

Privacy tip: mobile apps

Native apps can track you in ways that web sites on a well-behaved browser can’t. In-app browsers inject JavaScript. Apps contain tracking SDKs. Privacy filters are limited.

Banning TikTok is a start, but all apps containing the TikTok SDK are TikTok, as far as harvesting your info goes. And other apps and/or SDKs could be sharing your info with other people you don’t want it shared with. There are a few ways to deal with surveillance apps.

  • Delete your account

  • Switch to using the service on the web

  • Limit your use of the app (For example, make a habit of checking Signal before you check surveillance apps, so people learn that it’s a better way to reach you.)

Remove all the surveillance apps from your phone that you can. This doesn’t necessarily mean cut off people who you can only reach on a surveillance app.

What do you think? The more you tell, the more you sell, right?

Social connections mean you live longer, so apps required to communicate with friends and family are worth spending some of your personal “privacy risk budget” on. In general I accept some risks on social connections and collaborating, but I try to be stricter when it comes to shopping and entertainment.

Good news: privacy is on the way up

Right now we’re in the middle of some positive privacy trends: more effective enforcement of existing privacy laws and regulations, more interest in new ones, and some software improvements that make some kinds of tracking harder. Here in California, some recent news includes

  • the Sephora case, which makes it clear that common surveillance marketing practices are sales under the CCPA

  • the Attorney General’s office has been clarifying rules on Global Privacy Control, and doing a sweep of law-breaking apps

  • CPPA, the new privacy agency, is starting up

State privacy law map

More states are probably going to get privacy laws this year, so those outside California will be able to join in.

We have some advantages if we sound like we know what we’re doing. Surveillance and the attitudes underlying it are not changing right away, but you can shift things if you approach each action with something easy for the other end to do, and act like you know what you’re doing. They’re going to balance the time required to act on your privacy mail against the risk of not acting on it, and if the risk looks big enough, it’s easier for them to do the right thing.

screenshot of “watchdog groups will test the law”

Enough of those and we shift the relative expected payouts of surveillance and non-surveillance investments. (Like RCRA got many companies to just cut down on hazmat instead of dealing with the required record-keeping.)

Privacy tip: mobile phone settings

Check your phone settings. On Apple iOS there are two settings for surveillance ads: one for most companies’ ads, and one for Apple’s own ads. Don’t forget to check both. (Yes, you probably have to scroll down for the second one. Well played, Apple.)

  • In Settings, go to Privacy & Security, then Tracking, and make sure Allow Apps to Request to Track is turned off.

  • Also in Settings under Privacy & Security, find Apple Advertising and make sure that Personalized Ads is turned off.

On Android, you can open Settings, go to Privacy, then Ads and select Delete advertising ID.

More info: How to Disable Ad ID Tracking on iOS and Android, and Why You Should Do It Now | Electronic Frontier Foundation

The kind of ad that privacy nerds get

If you get these settings right, your mobile app ads will get really crappy, really fast, but you’re limiting mobile time and maximizing web time, right?

General principle: trying to herd the money away from the worst places.

We said we would try to retrain the surveillance business.

Address the worst practices first. Doing the easy stuff first can create the wrong incentives. That’s why I put YouTube first here. Then level up your privacy skills and toolset.

Level 1: mix of effective and ineffective privacy practices and tools

Level 2: effective privacy practices and tools

Level 3: effective practices and tools applied in an effective way

Remember you are not going to be able to get individual protection that’s meaningful while still participating in society. It’s more about driving transformation.

Privacy tip: Google Chrome

This is a big year for Google Chrome.

  • Manifest v3: coming next year, will limit the ability of content blocking extensions to block dynamically. On the Google Chrome browser, ad blockers, along with tracking protection tools that work in a similar way to ad blockers, will soon be limited in what they can do. If your chosen privacy tools and settings are not going to be supported, you might have to switch browsers. Browser compatibility has gotten a lot better recently, so if you switched because a site you like was broken on your old browser, please check it again. (This may or may not affect other Chromium-based browsers.)

  • Privacy Sandbox: a variety of projects, including a mix of some actual privacy features, one big on-browser ad auction (an ambitious project) and some anti-competitive shenanigans. It looks like 2023 will be the summer of double ad JavaScript on Google Chrome: you’ll still have the old cookie-based stuff, but you’ll also have an experimental ad auction running inside the browser.

Because Google both competes as an adtech intermediary and releases a browser, antitrust concerns mean they have to try to make the browser fair to all the other adtech intermediary companies they compete with. This is now the subject of an ongoing investigation by the Competition and Markets Authority in the UK. James Rosewell, CEO and founder of a mobile data company called 51Degrees, started a long, complicated process.

If you do decide to keep Google Chrome, there is a bunch of brouhaha about the impending end of third-party cookies, but you can turn them off today without breaking much, if anything. (Sites already have to support browsers that don’t do third-party cookies.) From the Ad Contrarian newsletter:

  1. Open the Chrome browser. Click the three dot thing in the upper right corner.
  2. Click “Settings”
  3. In the left column click “Privacy and security”
  4. Click “Cookies and other site data”
  5. Click “Block third-party cookies”

Google Chrome also has new in-browser advertising features, confusingly lumped together as Privacy Sandbox. Check chrome://settings/privacySandbox.

Topics API preferences

The text in this settings screen is not especially helpful. Google’s “Topics API” is a general-purpose system to categorize users based on the sites they visit. There’s nothing about it that’s limited to ads. It will probably be more useful for price discrimination and, worse, it helps incentivize deceptive practices to drive web traffic. (Looks like that message is in open-source Chromium, so I filed an issue.) Anyway, if you have Google Chrome, turn this off.

Update 17 Jul 2023: If you have a web site, or if you administer desktop systems with Google Chrome on them, you can do a few more steps to protect others: Google Chrome ad features checklist
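For the web site side, one step that Chrome documents is sending a `Permissions-Policy: browsing-topics=()` response header, which tells the browser not to use visits to your site for Topics. Here is a minimal Python sketch of adding that directive to a list of response headers. The helper name is made up, and how you hook it into your own server or framework is left as an assumption.

```python
TOPICS_OPT_OUT = "browsing-topics=()"


def add_topics_opt_out(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return a copy of the response headers with the Topics opt-out added.

    An existing Permissions-Policy header keeps its other directives. If it
    already mentions browsing-topics, it is left exactly as it is.
    """
    result = []
    found = False
    for name, value in headers:
        if name.lower() == "permissions-policy":
            found = True
            if "browsing-topics" not in value:
                value = f"{value}, {TOPICS_OPT_OUT}"
        result.append((name, value))
    if not found:
        result.append(("Permissions-Policy", TOPICS_OPT_OUT))
    return result


plain = add_topics_opt_out([("Content-Type", "text/html")])
merged = add_topics_opt_out([("Permissions-Policy", "geolocation=()")])
```

Most web servers can also set a fixed response header in their own configuration, which is usually less work than doing it in application code.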

The attribution tracking chain

Attribution tracking chain

This is an oversimplified chart, but it’s a good start for learning about the “attribution tracking” chain and how to break it.

If you break the link between surveillance data on what you buy and surveillance data on what ads you saw, then it’s harder to justify investments in surveillance advertising. Remember, we’re trying to move money from surveillance to other investments here.

That’s the main reason why you have YouTube on a browser container that is never used for anything else. It interferes with the attribution link between a video view and a sale.

Privacy tip: remember to vote

California has the CPRA because people voted for Proposition 24 in 2020. The CPRA isn’t perfect, but voting made a difference. While you’re voting, please don’t eliminate a candidate from consideration just because they’re using the big surveillance platforms. They’re hard to avoid completely. In today’s environment it’s generally better to make a little progress than to achieve privacy purity but lose the actual election.

Breaking the chain: where?

Once you have a mental model of the attribution chain you can allocate your time most effectively.

  • Software tools are usually lower effort but shorter range: they can protect an activity like a web session but can’t reach out to affect server to server communications.

  • Legal tools can reach out further, but are higher effort. And companies don’t always comply.

Different surveillance threats can be addressed at different levels. In general, earlier and more automated is easier, but legal tools have a longer reach, since they can touch systems that you don’t have a network connection to, and that automated tools can’t see.

Here’s a list of places to make a difference, from earliest to latest.

  • Don’t do a trackable activity (example: delete a surveillance app, don’t visit a surveilled location)

  • Don’t send tracking info (example: block tracking scripts)

  • Send tracking info that is hard to link to your real info (examples: use an auto-generated email address system like Firefox Relay, or churn tracking cookies with Cookie AutoDelete)

  • Deny consent or send an opt out when doing a tracked activity: Global Privacy Control

  • Object, exercise your right to delete, or opt out later, after data has been collected but before you are targeted (CCPA Authorized Agents, RtD automation tools like Mine)
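Of the items above, Global Privacy Control is the simplest on the wire: a browser or extension with GPC turned on adds a `Sec-GPC: 1` header to its requests (and exposes `navigator.globalPrivacyControl` to scripts). Here is a minimal Python sketch of the check a site would need to make on its end. The function name is hypothetical, and what the site does after detecting the signal depends on the law that applies to it.

```python
def has_gpc_signal(request_headers: dict[str, str]) -> bool:
    """Return True if the request headers carry a Global Privacy Control signal.

    Header names are compared case-insensitively. Only the value "1" counts.
    """
    for name, value in request_headers.items():
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False


opted_out = has_gpc_signal({"User-Agent": "ExampleBrowser/1.0", "Sec-GPC": "1"})
no_signal = has_gpc_signal({"User-Agent": "ExampleBrowser/1.0"})
```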

Privacy tip: CCPA script

Delete your info from the largest surveillance firms. Here’s a partial list. Later on we’ll cover how to make a personalized list based on who has info on you.

I do these pretty quickly with Mutt and shell.

    ccpa [email protected]

The script is here: CCPA opt out, nerd edition

The sample letters, as templates, are in this privacy-docs Git repository.
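If you would rather not use Mutt and shell, the same idea (fill in a letter template, address it to the company's privacy contact, send) fits in a few lines of Python. This is a sketch only, not the script linked above. The letter wording, function name, names, and addresses are all placeholders, and the real templates are the ones in the repository.

```python
from email.message import EmailMessage
from string import Template

# Placeholder wording for illustration. Use a real letter template instead.
LETTER = Template(
    "To whom it may concern at $company:\n\n"
    "I am a California resident. Under the CCPA, please delete the personal "
    "information you hold about me, and do not sell or share it.\n\n"
    "$name\n$email\n"
)


def build_request(company: str, to_addr: str, name: str, from_addr: str) -> EmailMessage:
    """Build (but do not send) a CCPA request email from the template."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = "CCPA request: delete and do not sell my personal information"
    msg.set_content(LETTER.substitute(company=company, name=name, email=from_addr))
    return msg


msg = build_request(
    company="Example Co",
    to_addr="privacy@company.example",
    name="Pat Example",
    from_addr="pat@mail.example",
)
```

The result is a regular email message, so you can print it to check it and then hand it to whatever you normally use to send mail.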

Teaching the escalation path

If you’re protected from a company by automated tooling, you can mostly relax. (This applies to most of the companies that can only track you by an identifier stored in a third-party cookie. Turn off third-party cookies and you’re fine.)

If they accept a GPC, almost as good. Remember that GPC applies to all uses of your information, not just the current session.

Manual opt outs are where things start to get time-consuming. Most opt outs are either not compliant, or take advantage of loopholes in the regs to make you do extra work.

If they give you any grief on an opt out, and make opt-out as hard as a “right to know” (RtK), then go ahead and escalate to an RtK. According to one study, a manual RtK can cost a company $1400 to handle.

Then if they give you grief on your RtK, you can file a complaint with the AG.

Be patient. Right now there are way more companies that have your personal info than companies that have a qualified CCPA manager. Remember, any individual request might end quickly or turn into a long thread. The object is to change the expected payoffs for investments in surveillance relative to other investments.

Privacy tip: keep your Facebook account

Why does it matter if a company is sending my info to Facebook if I never log in to Facebook (or related sites)?

Just as the lowest point on a toxic waste site is generally a good place for a sampling well to check the progress of remediation, Facebook is a good place to sample for your personal info. It’s a low point to which most of the firms who have your info will eventually share it.

Remember, get a Facebook account, but don’t install the mobile app.

  • click on your face

  • select Settings and Privacy from the menu, then Settings

  • Select “Ads” on the left column

  • Under “Ad Preferences” select Ad Settings

(We’re getting close…)

  • Select “Audience-based advertising”
Facebook ad settings
  • Now scroll down and select “see all businesses”

You’re probably going to get a lot of these. Let’s have a look at a few.

Facebook audience examples

First is a store where I actually bought something. Facebook’s biggest advertiser at one point.

Then a DTC retailer that I’ve never heard of. Who sold them a list of “hot prospects?” I remember when this list used to be all car dealers.

Then two surveillance marketing companies—they can get a Right to Delete.

Finally another retailer, wait a minute, I haven’t bought from this one either. The closest I have come to this one is the year that they had the LinuxWorld conference and the Talbots managers meeting at the same convention center in Boston. Remember, surveillance marketing and scam culture are two overlapping scenes—you can often see when a legit company has been sold a customer list that isn’t. (maybe the scammer who got the car dealers has moved on to retailers?)

Facebook health ad

Of course, while you’re there, building a list of who to send opt outs to, don’t buy anything from the ads.

Privacy tip: Browser checkup

Remove extensions you aren’t sure about. A lot of spyware and adware gets through.

Run the EFF checkup tool, “Cover Your Tracks”. Sad but true: the more custom your Linux setup is, the more fingerprintable you are. Technical protections alone won’t cover you, so browser protection needs to combine the technical and the legal.

Turn on Global Privacy Control. This will automate your California do not sell for sites you visit. Still not supported everywhere, but will have more effect as more companies come into compliance and more jurisdictions require companies to support GPC.
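
For anyone curious about what the site sees: per the GPC specification, the signal arrives as a request header, `Sec-GPC: 1`. Here is a minimal sketch of how a server might check for it. The function name and the plain-dict view of the headers are my assumptions for illustration, not any particular framework's API.

```python
def has_gpc_signal(headers):
    """Return True if the request headers carry a Global Privacy Control signal.

    `headers` is assumed to be a plain dict of header name -> value.
    HTTP header names are case-insensitive, so compare in lower case.
    """
    for name, value in headers.items():
        if name.lower() == "sec-gpc":
            # "1" is the value that expresses the opt-out preference.
            return value.strip() == "1"
    return False
```

What a company does after seeing the signal is the legal part, and no snippet covers that.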

Content blocking. This next one is a tricky subject, so I’m not going to say anything about it. I’ll just quote the FBI’s public service announcement, Cyber Criminals Impersonating Brands Using Search Engine Advertisement Services to Defraud Users.

FBI Public Service Announcement on search advertising
FBI scam protection tips

It all comes back to scam culture. Be warned when looking for an ad blocker on browser extension directories. A lot of them are spyware or malware.

Some privacy extensions I use include:

Browser extensions part 1
  • ClearURLs to remove tracking parameters from URLs, and often speed up browsing by skipping a redirect that’s just there for tracking.

  • Cookie AutoDelete. Cleans up cookies after leaving a site. Not for everyone—it does create a little extra work for you by making you log in more often and/or manage the list of sites that can set persistent cookies. But it does let you click agree with less worry since the cookie you agreed to is going to be deleted.

  • Facebook Container because, well, Facebook.
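
To show the kind of cleanup that ClearURLs does on tracking parameters, here is a rough sketch using only the Python standard library. The parameter list is a tiny hypothetical sample that I picked for illustration; it is not the extension's actual rule set, and this sketch does not attempt the redirect-skipping part.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical, very short sample; real tools ship much longer rule sets.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}

def strip_tracking_params(url):
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_NAMES and not key.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with only the parameters we decided to keep.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that the page really needs, like a product id, pass through untouched.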

Browser extensions part 2
  • NJS. This minimal JavaScript disable/enable button can be a good way to remove some intrusive data collection and growth hacking on sites where the real content works without JavaScript.

  • Personal Blocklist is surprisingly handy for removing domains that are heavy on annoyances and surveillance, but weak on actual information, from search results.

  • Privacy Badger blocks tracking scripts and will also turn on Global Privacy Control for you, by default. This will not have much of an impact right away but will start to do more and more as more companies come into compliance. (More companies are required to comply with California privacy law than there are people who understand how to comply with California privacy law.)

Conference program listing

Most of us have built up a collection of privacy tools, including browser extensions, settings, and services. But privacy threats keep changing, while at the same time new kinds of tools and services have become available. This talk will help maximize the value of your privacy toolset for today’s best options, while helping you plan for the future.

Not only can you protect yourself as an individual, but your choices can help drive future investments away from surveillance into more productive areas. (Surveillance marketers and their investors think they can train us – but with the right tools we can train them right back.)

Today most of us are at level 1 or 2 on privacy.

Level 1 You do something about privacy and take a mix of effective and ineffective actions

Level 2 You take mostly effective actions, but don’t allocate your time and resources for maximum effect

Level 3 You take effective actions, efficiently selected and prioritized

Ready to level up? Now that California law codifies our right to check out how our personal information is shared, that means we have an opportunity to optimize our privacy toolkits and habits, and focus where it counts. We’ll cover:

  • Ad blocker myths and facts, and why the surveillance marketing business loves some ad blockers

  • The most important privacy extensions for most people (and they’re not what you’d think)

  • Corporate surveillance about you that never touches your device, and how to reach out and block it

  • Don’t use the law on a problem that a tool can solve faster, but don’t try to stretch a tool to solve a problem that needs the power of the law

  • Where to add extra protection for special cases

Each of your individual privacy choices has a bigger impact than just the protection that it provides for you. In the long run, your real impact will be not so much in how you’re protected as an individual, but in how you help drive future investments away from surveillance and toward more constructive projects.

08 Feb 2023

LLMs and the web advertising business

Large Language Models again.

It seems like everybody wants to talk about how this new shiny thing will affect web publishing and advertising. What’s going to happen to ad revenue for human-authored web sites, when Large Language Models can generate large quantities of text, and ads will run on that instead?

I wish I could come up with a really futuristic answer, but I’m going to be boring here, and just say LLMs will mean more of the same. Web ads are already running on plain old copied content, and LLM-generated content will fill the same role. Web ad intermediaries are choosing to look the other way on infringing sites, even when a site looks hella sketchy. They’ll do the same with placing ads on sites full of LLM-generated content.

Here’s a short conversation with ChatGPT, in which I ask about a confidential site with the name of a well-known trademarked fictional character in the domain name. The site is running ads from a well-known company. This is a totally unedited ChatGPT session (except for redacting the domain name and putting my text in bold).

Hi, do you think that [DOMAIN REDACTED] sounds like a legal web site?

No, the website “[DOMAIN REDACTED]” likely does not offer legally obtained or licensed audio books of the “Harry Potter” series. It is important to be cautious when downloading or streaming copyrighted material from websites, as doing so may be illegal and could result in criminal or civil penalties.

I would recommend purchasing audio books from reputable and authorized sources to ensure that you are obtaining legally obtained and licensed material. Additionally, supporting the creators and rights holders by purchasing their works is important for ensuring that they can continue producing new content in the future.

So, should legit businesses run web ads on [DOMAIN REDACTED] or is that not a good idea?

It would not be a good idea for legitimate businesses to run ads on “[DOMAIN REDACTED],” as this website likely offers copyrighted material illegally and could be engaging in piracy. Advertising on websites that promote or facilitate illegal activity can harm a business’s reputation and potentially lead to legal consequences.

Additionally, supporting websites that engage in piracy by running ads on them sends a negative message about a business’s values and could discourage customers from doing business with them.

It is important for businesses to ensure that their advertising partners and the websites on which they run ads align with their values and comply with the law. By doing so, they can help protect their reputation and avoid potential legal and ethical issues.

Thank you for your help, you explained this well.

You’re welcome! I’m glad I could help. If you have any more questions, feel free to ask!

31 Dec 2022

use a Large Language Model, or eat Tide Pods?

The current hot Internet trend is large language models (LLMs). Before that it was NFTs, and before that it was eating Tide Pods. I know that these things go in cycles, but perhaps this hype cycle may have moved on too quickly, and there may be some advantages to the previous generation. Let’s compare.

Availability

  • Tide Pods are available from many sources including online retailers, supermarkets, and convenience stores.

  • LLMs require substantial training investment, so are available from a limited number of sources at present.

  • Advantage: Tide Pods

Mitigating the adverse effects

  • The health effects of eating Tide Pods are already well understood. Poison control center staff members are already trained in this area, and are reachable quickly with a convenient toll-free call.

  • Fixing the results of actions taken based on LLM output could require specialized knowledge that presents challenges in training and/or hiring.

  • Advantage: Tide Pods

Time required

  • Tide Pods can be eaten in seconds, with or without chewing.

  • LLMs can produce text faster than most humans can read it, so may represent an open-ended time commitment.

  • Advantage: Tide Pods

Repurposing

  • After you have learned not to eat Tide Pods, the rest of the pods in the package are useful for laundry.

  • Clearly useful applications of LLMs remain to be discovered.

  • Advantage: Tide Pods

Educational value

  • Eating Tide Pods helps teach you not to believe everything you find on the Internet

  • LLMs help teach you not to believe everything you find on the Internet

  • Advantage: (tie)

This objective evaluation shows that organizations considering investing in LLMs should also consider, as an alternative, eating Tide Pods, which may be a technical direction that is faster and less costly to evaluate. Winding down from an investment in a hot technology is always challenging, but will likely be more manageable for organizations that did not buy in at the top—where LLMs are today and NFTs were last year—but instead consider the lessons to be drawn from organizations that are now ending NFT projects, and increase the ratio of learning to both cost and risk by choosing a better characterized technology.

Don’t eat Tide Pods. Don’t believe everything you read on the Internet. Don’t eat Tide Pods. (also, remove the AI results from Google.)

29 Oct 2022

a survivability onion for privacy tools?

Starting with some good news. There are several versions of the Survivability Onion but most appear to be US government work and so not copyrighted. I’m going to borrow it because it looks like a good starting point for setting priorities for designing a privacy tools and services stack. Yes, in the long run, the real impact of individual privacy measures will be not so much in how you’re protected as an individual, but in how you help drive future investments away from surveillance and toward more constructive projects.

It would be good to get more privacy people leveled up:

Level 1 mix of effective and ineffective actions

Level 2 effective actions, but applied haphazardly (this is about where I am now)

Level 3 effective actions, efficiently selected and applied

If you want privacy, prepare for surveillance? All right, onion time.

Integrated SoS Survivability Onion chart, showing layers: pre-emptive encounter, pre-emptive kill, avoid/prevent encounter/exposure, avoid detection, avoid targeting, avoid engagement, avoid hit/application, avoid kill

A survivability onion is a way to visualize layers of protection. From Integrated Survivability Assessment:

The separate and independent “layers” of functions, which the threat has to “penetrate” to kill the system in a typical engagement, are most often represented mathematically by independent probabilities; thus, the overall probability of survival is the product of the independent component probabilities.

Since you have limited resources when designing an armored vehicle or whatever, you can apply your limited weight and money budgets to the most effective combinations of layers. The object is to maximize the probability of survival, which means minimizing the probability of a kill: the product of the probabilities of the attack getting through each layer.
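
The arithmetic is simple enough to sketch. Treating the layers as independent, an attack succeeds only if it gets through every layer, so the chance of success is the product of the per-layer chances, and survival is whatever is left over. The numbers below are made up purely for illustration, and the function name is mine.

```python
from math import prod

def survival_probability(penetration_probs):
    """Probability that at least one layer stops the attack.

    Each entry is the (assumed independent) probability that the attack
    gets through one layer. The attack succeeds only if it gets through
    every layer, so success is the product and survival is the rest.
    """
    return 1.0 - prod(penetration_probs)

# Made-up numbers for three layers, purely for illustration.
layers = [0.5, 0.4, 0.1]
print(survival_probability(layers))
```

The budgeting question then becomes which layer's number you can lower the most for the least weight, money, or time.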

And hey, that sounds familiar. We have a limited amount of time, money, and political juice for privacy stuff too. I think we can visualize the protection options in a similar way. Here’s a first attempt at a survivability onion for a personal privacy stack, with some examples of what fits into what layer.

  • Don’t do a trackable activity (delete a surveillance app, don’t visit a surveilled location, boycott a vendor)

  • Don’t send tracking info (block tracking traffic, either by using a tool like Privacy Badger to keep a tracking script from loading, or using a network filter like Pi-hole to prevent tracking SDKs from communicating with their hosts)

  • Send tracking info that is hard to link to your real info (use an auto-generated email address system like Firefox Relay, churn tracking cookies with Cookie AutoDelete)

  • Object or opt out when doing a tracked activity (Global Privacy Control)

  • Object, exercise the right to delete, or opt out later, after data has been collected but before you are targeted (Authorized Agent services like Permission Slip by CR, RtD automation tools like Mine)

So that’s step one—define the layers of the onion.

Next step: assessing threats. (Will add a link here soon.)

09 Oct 2022

honest answer to a CCPA email

(When privacy people send an opt-out, RtK, or RtD, you have to acknowledge it. May never see them again, so why waste the opportunity? This is a draft of a letter to include with the response. Might not get this approved as is, though.)

Updated 21 Oct 2023: Permission Slip and Google Chrome updates.

Dear [name],

This attachment does not contain any information specific to you or any legally required information.

Still reading? Good. First of all, I want to wish you well on your privacy quest. I hope that you will return and check out our company someday in the future, when all this corporate surveillance stuff is over.

Sometimes, when you’re installing and configuring privacy protection tools, or taking actions like the request you sent to us, it can feel like you’re barely making any progress. But each of your individual privacy choices has a bigger impact than just the protection that it provides for you. In the long run, your real impact will be not so much in how you’re protected as an individual, but in how you help drive future investments away from surveillance and toward more constructive projects.

Please keep doing what you’re doing. You’re not just making a better society, you’re helping me. If creepy surveillance works less well as a way to reach you, then I won’t have to deal with as much creepy surveillance at work. And the less that creepy surveillance matters on the company side, the less risk and work for you. We can help each other out here.

Nobody has time to do every possible privacy tip. From the inside of the corporate surveillance business, I have a pretty good point of view to see what works best. From what I can see, here are some of the steps that you can take that look like they will be most effective over the next year or so.

Update 20 Jan 2025: effective privacy tips is my current list of individual privacy advice

07 Oct 2022

meow

Don’t forget to move the cat! It’s street sweeping day on our side of the street.

On my old Android phone, I once made an alarm for move car on street sweeping day. But I did a typo, so I used to get alarms about moving the cat. We don’t have an actual cat, but it’s now an in joke here anyway. Did you move the cat? Meow! It’s just one of those things. And now I miss my old phone.

I miss it, but it’s right here. I just picked it up. If I turned it on, I would be able to visit a web site, or take a picture, or whatever. All the hardware is in perfect working order. But I can’t take the risk of actually using it for all the stuff that people expect me to use a phone for, because it’s no longer getting software updates. And of course we all agree that it’s important to keep your devices updated. The updates stopped, so I switched phones. (Yes, I had to abandon my plan for an all USB-C device collection and buy some Lightning cables.) I had a useful thing that I had to turn off for the last time.

I still have my old phone, but I don’t have it in a way that matters, in a way that I could really use it. It’s satisfying to keep a useful thing working, and the opposite of satisfying to put something in the e-waste box, or worse, the maybe for projects later stack. I know that Google has no legal obligation to me, there’s NO WARRANTY, that’s life in the big city. I shouldn’t care, right? It’s just a phone. Somebody at Google nailed one of their OKRs by letting my old phone go out of support. They won. I lost.

10 Sep 2022

Topics API notes and links

(Update 15 Jun 2024: add link to noyb.eu)

(Update 10 Aug 2023: add links to recent issues.)

(Update 5 Jul 2023: add “Topics API enables advertisers to discriminate against legally protected groups without an incriminating paper trail.”)

(Update 26 Apr 2023: add screenshot.)

(Update 15 Apr 2023: add some more recent links and explanation)

(Update 31 Jan 2023: Google has made a commitment to the Competition and Markets Authority in the UK that Google Search will not use a site’s decision to opt-out from the Topics API as a ranking signal.)

(Update 14 Sep 2022: add non-sensitive but possibly user-unacceptable use cases)

Some notes on how Google Topics API is likely to affect various web stakeholders, in order by priority of constituencies. Numbered links are to issues in the GitHub repository.

Google Topics API is in Google Chrome but not in the Chromium browser.

Users

Topics API rewards deceptive sites and traffic sourcing practices. Topics API leaks valuable audience data from one site to another. As I wrote on AdExchanger, this is a problem that it shares with third-party cookies—enabling the monetization of low-value sites of little interest to users. As the proposal currently stands, any site that a user can be tricked into visiting, or that their device can be manipulated into visiting, is in a position to collect ad revenue based on Topics API data that the browser supplies.

Topics API gives people an additional incentive to create deceptive sites and drive traffic to them by deceptive or harmful means. Operators of deceptive sites will see users as Topics piñatas—hit them with something like email spam or a deceptive link, get them to drop valuable Topics, profit. Technically this is no worse than third-party cookies, but third-party cookies are on the way out for a reason, and Topics API is positioned as a longer-term thing.

A new Topics API feature makes it even more friendly to monetizing problem sites. Adtech intermediaries can now obtain valuable topic data on illegal and/or low-reputation sites without allowing the browser to see those sites’ topics. See #92 (comment)

Topics API will enable price discrimination. Topics API would help retail, travel, and other sites identify more or less price-sensitive users based on their interests. And Topics API is more likely to be used for personalized pricing than other data sources, because it looks safer.

Many existing sources of data, such as geolocation or device info, might be correlated with membership in legally protected groups of people. However, Topics API is carefully intended not to be tied to sensitive information, which lowers the perceived risks of using the API for personalized pricing. (see #34)

Topics API discrimination does have some public sector uses that might be acceptable in some jurisdictions but not others. Users in the USA might accept the TSA using Topics API to optimize passenger screening, the IRS using it to select taxpayers for audits, or the CDC using it for public health data collection. (Update 13 Mar 2025) No, wait, they would probably freak out. But some other countries might use it for purposes that are not acceptable to some users there.

Some uses of Topics API outside the conventional publisher/advertiser use cases would be more or less acceptable to users, even if Topics API is successful in removing the ability of a caller to detect a sensitive user characteristic. And some jurisdictions might require use of a public-sector caller by sites—so not all callers are necessarily chosen or approved by the first party.

  • HMOs or public health services classifying patients by healthy/unhealthy habits

  • LMSs classifying diligent and distracted students

  • Tax authorities classifying likely big-spending users for tax audits

More on non-advertising use cases: #216

Topics API could reveal sensitive information or identify individuals in combination with other data. Some privacy concerns for users are covered in a blog post from DuckDuckGo. There are some users who prefer personalized advertising, but a future personalize my ads feature would need to be designed to avoid user harms that would affect both personalizers and non-personalizers.

Topics API enables machine learning systems to build profiles of users over time. Private Browsing 2.0 covers why Apple WebKit does not implement this API. You haven’t told any website with access to your email address anything that’s been going on in your family life. But the data broker has been able to read your shifting interests and store them in their permanent profile of you — while you were reading the news. Now imagine what advanced machine learning and artificial intelligence can deduce about you based on various combinations of interest signals. What patterns will emerge when data brokers and trackers can compare and contrast across large portions of the population? Remember that they can combine the output of the Topics API with any other data points they have available, and it’s the analysis of all of it together that feeds the algorithms that try to draw conclusions about you. Apple Safari doesn’t have this specific feature but you still need to turn off advertising measurement

Topics API enables advertisers to discriminate against legally protected groups without an incriminating paper trail. Consider a hiring manager who discriminates against job applicants who have a particular disability. This is illegal in the USA, and an online advertising service that offers advertisers the ability to explicitly discriminate by disability would face consequences. Facebook had to eliminate some targeting features over explicit discrimination, but biased advertisers still exist, and will tend to seek out systems that facilitate discrimination where available. And users who are searching for a job would likely choose not to share information on their disability with web sites.

Topics API will allow a biased advertiser to bypass both legal and user choice protections. A Topics API caller can feed the topics received from a user into a machine learning (ML) system that is simply optimized to get conversions, which in this case are hiring decisions by the biased manager. Using the presence or absence of particular topics for a user, the ML system learns to identify people whose disability would stop them from being hired if they clicked on the ad, applied, and got an interview. And no particular topic would appear sensitive to the user, so even if the user did decide to manually filter their Topics API list, they might not be aware of how the ML system uses apparently non-sensitive topics (or the absence of them) to determine that they should not receive a job ad from a particular company.

In this case the hiring manager does not expect to be able to track any one user individually, or even to receive no applicants with a disability—but is willing to pay extra for an opportunity to advertise in a way that results in reaching fewer applicants with the disability. And the only record of a discriminatory ad placement on either the advertiser or Topics API caller side is inside the ML system, which can be prohibitively difficult to interpret. (More on this problem: browser topics tracking and the prejudiced landlord problem) Related: #221

Martin et al.: Respondents did not judge the privacy solution proposed—to have a browser collect data, keep the data on the local device, create inferences for use in advertising—to be a privacy solution over third-party tracking and the use of data brokers in advertising.

Topics API is misrepresented to users. How Google describes Topics API to users: Google Chrome dialog Not disclosed is the fact that it’s an experimental system with unknown discrimination risks. See Google’s own FAQ. Chrome can and will take steps to avoid topics that might be sensitive (i.e. race, sexual orientation, religion, etc.). However, it is still possible that websites calling the API may combine or correlate topics with other signals to infer sensitive information, outside of intended use. Chrome will continue to investigate methods for reducing this risk.

Update: The misrepresentation problem is the subject of a complaint with the Austrian data protection agency from noyb.eu. Google’s internal browser tracking was introduced to users via a pop-up that said “turn on ad privacy feature” after opening the Chrome browser. In the European Union, users are given the choice to either “Turn it on” or to say “No thanks”, so to refuse consent. In a letter to noyb, Google argued that choosing to click on “Turn it on” would indeed be considered consent to tracking under Article 6(1)(a) of the GDPR. In reality, the company concealed the fact that selecting this option would turn on first-party tracking. Full complaint (PDF)

Update: it seems like regulators are on to this one. A recent CMA report recommends a redesign and says, Surface the Topics dialogue box periodically and consider approaches, based on user research, to remind or prompt users to revisit other Privacy Sandbox settings. In a subsequent report, the CMA states:

We are concerned that the Topics consent user interface may not adequately inform users about how their personal data is used or how the topics generated may be used for purposes wider than interest-based advertising (e.g. as determined by organisations that decide to use the API). To address this concern, Google has agreed to update the Topics API consent interface and to strengthen developer guidance to highlight the requirement to obtain purpose-specific consent prior to calling the API. We are awaiting the results of these updates

Topics API does not provide the same privacy guarantees to all users. Beugin and McDaniel: We find that 46%, 55%, and 60% of the 1207 users in the dataset are uniquely re-identified across websites after only 1, 2, and 3 observations of their topics by advertisers, respectively. More literature:

Web page authors

Topics API enables leakage of audience information across sites. The main beneficiaries of Topics API audience data leakage would be sites that advertisers would not choose to support, along with the adtech intermediaries that enable them. Ad revenue that goes to sanctioned, copyright-infringing, brand-unsafe or other problem sites is not available to the authors of legit web pages (and a waste for advertisers, but they’re not on the list of constituencies that I’m using). (More on the persistent problem of ads on copyright-infringing sites)

Example of a copyright-infringing site Topics API is designed to allow Google to continue to transfer valuable audience info from legit sites to confidential sites like this one, which accept less ad money because they use other people’s content. (URL and logo blurred because it’s confidential. Their ads.txt is one line, and their Google ad account id is a multiple of 19.)

Cross-site data leakage is a problem for users because of the incentives covered above, but also a problem for legit web sites because of revenue. (And yes, it is better for users when ad revenue goes to sites they like, and not sites they got tricked into clicking through to.)

Users and legit ad-supported sites share an interest in having advertising pay for the largest possible value in ad-supported content. Meanwhile, Google wants to maximize ad revenue while minimizing the amount paid for content. Topics API is part of a set of projects to help Google come out ahead in this contest. The other parts are a Google service called Performance Max that hides information about what site an ad appeared on from the advertiser, and Google’s decision to run a large number of ads on confidential infringing or brand-unsafe sites without disclosing the domains on an industry-standard file called sellers.json. (IMHO it’s impossible to understand Topics API outside the context of Performance Max and confidential crap sites, but we won’t see the slide that has all three on it until it comes out in discovery in an antitrust case, or gets leaked.)

Topics API is biased in favor of large, general-interest sites. Topics API assigns only one or a few topics to each domain. A niche site that covers one or a few topics would provide significant commercially valuable data to the system, while a large site such as YouTube, that covers every topic, would provide no usable information. This is a pretty obvious problem for competition policy, so I’m kind of surprised they left it in. (See #206 and #224.)

Topics API encourages consolidation of third party services. The more sites that a third-party service runs on, the more useful information it can collect from Topics API. Since an individual site can’t get usable information out of Topics API, sites will have to A/B test different combinations of third-party iframes as callers. Sites will keep callers that provide more good Topics, and drop others. After repeated cycles of A/B tests, where callers that get dropped by one site start providing worse data to other sites and get dropped there too, we would be left with one domain doing all the Topics API calls. (see #82) This is also a pretty glaring antitrust issue.

See also a Google comment on the W3C TAG review: Note that the number of sites that can both call the API, and receive an unfiltered response, is quite small. This is because the caller would have to have observed the user on a site about that topic in the past to get through the filter. The vast majority of sites that can call the API will actually receive an empty list. Topics API is noisy enough to be much more useful for a large third-party caller than for a smaller one. (Is the noise level deliberately tuned to be too noisy to be useful for companies much smaller than Google, but usable at Google’s size?) More discussion of Topics API as a driver of consolidation and centralization in issue #73 Toward an oligopoly of “Topic providers”?.
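To make the filtering effect concrete, here is a rough sketch of why a caller embedded on more sites gets more topics back. It is a toy model, not the real algorithm: it ignores noise, epochs, and the actual taxonomy, and every site name, topic name, and function name in it is made up for illustration.

```python
# Simplified model of Topics API caller filtering. Noise, epochs, and the real
# taxonomy are deliberately left out; all site and topic names are made up.

# Hypothetical sites, each assigned a single topic.
SITE_TOPICS = {
    "knitting.example": "Crafts",
    "bikes.example": "Cycling",
    "birds.example": "Birdwatching",
    "news.example": "News",
}


def topics_for_caller(visited_sites, caller_sites, user_top_topics):
    """Return the user's top topics that this caller is allowed to see.

    A caller only "observes" a topic if it was embedded on a site about that
    topic that the user visited; everything else is filtered out.
    """
    observed = {
        SITE_TOPICS[site] for site in visited_sites if site in caller_sites
    }
    return sorted(t for t in user_top_topics if t in observed)


visited = ["knitting.example", "bikes.example", "birds.example", "news.example"]
top_topics = ["Crafts", "Cycling", "Birdwatching"]

# A small caller embedded on one site, and a large one embedded on all four.
small_caller = {"news.example"}
large_caller = set(SITE_TOPICS)

print(topics_for_caller(visited, small_caller, top_topics))
print(topics_for_caller(visited, large_caller, top_topics))
```

In this toy setup the small caller gets an empty list and the large caller gets all three topics, which is the consolidation pressure in miniature.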

The Topics API README states that,

One of (Interest-based advertising)’s benefits is that it allows sites that are useful to the user, but perhaps could not be easily monetized via contextual advertising, to display more relevant ads to the user than they otherwise could, helping to fund the sites that the user visits.

This goal can be achieved without the extensive leakage of site audience data made possible by Topics API. (Alternatives moved to a separate section, below.) Web authors and publishers might be able to ameliorate some of the effects of Topics API by, for example, opting out their signed-in users, or opting out of Topics API data collection entirely until YouTube channels are treated as separate areas with their own topic sets, on an equal footing with domains (see #17).

Topics API is designed to be difficult to A/B test. Ad-supported web publishers, and ad services that manage ads for them, generally make a practice of testing the revenue impact of turning different data sources on and off. Unlike other data sources, Topics API is controlled only by a Permissions-Policy HTTP header, which means that turning it off can only be done at page load time, and only with access to the web server or CMS code or configuration. In theory a publisher could choose not to load a script that only called Topics API, but Topics API so far has been combined with other third-party functionality, so needs its own A/B testing control. (#208).
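For a site that does have access to its server code, a header-based A/B test could look something like the sketch below. The `Permissions-Policy: browsing-topics=()` header is the documented opt-out. The bucketing scheme, the experiment name, and the function names are assumptions of mine for illustration, not part of any ad service's tooling.

```python
import hashlib

# Documented header value that disables Topics API for the page.
TOPICS_OPT_OUT = ("Permissions-Policy", "browsing-topics=()")


def ab_bucket(visitor_id: str, experiment: str = "topics-test") -> str:
    """Assign a visitor to a stable bucket (hypothetical 50/50 split)."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    return "topics_off" if digest[0] % 2 == 0 else "topics_on"


def response_headers(visitor_id: str) -> list[tuple[str, str]]:
    """Build response headers, opting out of Topics API for one bucket."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if ab_bucket(visitor_id) == "topics_off":
        headers.append(TOPICS_OPT_OUT)
    return headers


for visitor in ("visitor-1", "visitor-2", "visitor-3"):
    print(visitor, ab_bucket(visitor), response_headers(visitor))
```

The decision still has to be made at page load time and still needs server access, so this does not help a publisher whose only control point is a tag manager.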

Update 31 Jan 2023: The SEO problem is fixed. (Opting Out Of Google’s Topics API Won’t Affect Search)

User agent implementors

User agent implementors have limited developer and tester time available to implement and support new features. All software projects must leave some good ideas un-implemented in order to devote time to the features they decide to do. And as the feature set of a browser grows, so does the required ongoing maintenance. Choosing to implement and support Topics API, even for a large company, would mean failing to do a useful feature. Some Google Chrome features are good, and out ahead of other browsers. It would be a waste to keep putting time into this one.

Alternatives

There are two alternatives to Topics API that also support the proposal’s intended functionality, that might be wanted by the users who prefer personalized ads.

The main difference is that SDA and FLEDGE both give a gatekeeper role to a party with an incentive to avoid problematic ad placements and data leakage. With SDA, one publisher can control use of data to place ads on other publishers’ sites (audience extension). Depending on how the final version of FLEDGE comes out, a variety of parties could be interest group owners. None of these proposals are complete or perfect, but Topics API appears to be too friendly to good ad/bad site placements to keep pursuing.

What to do about it

How to protect yourself—and also protect desktop users you support and visitors to your web site: Google Chrome ad features checklist

28 Aug 2022

AADC compliance questions for small web sites

(update 31 August 2022: third-party analytics for option 2?)

I am not a lawyer, and this is not legal advice. I’m just trying to figure out what would be the right questions to ask a California lawyer about how a small or medium web site can comply with AB-2273: The California Age-Appropriate Design Code Act, if it becomes law. What would be the available options?

This is not about apps, games, services that develop a lot of their own code, platforms, or UGC forums. See California Legislators Seek To Burn Down The Internet — For The Children by Eric Goldman for some questions on those. Compliance issues for games, especially those with an in-game economy, will be a lot more interesting. This is just about regular web sites that have unique content, but pretty generic functionality.

Will fill in with more info as I learn it.

Option 1: Don’t be a business as defined by CCPA/CPRA. A CPRA business has to have at least one of the following:

  1. annual gross revenues in excess of $25,000,000
  2. annually buys, sells, or shares the personal information of 100,000 or more consumers or households
  3. derives 50 percent or more of its annual revenues from selling consumers’ personal information

Typical RTB web ads, as used by a small/medium site, are based on sale of personal info by the CCPA/CPRA definition. Some other third-party tools, like comments and social widgets, also sell info on the users. So if we get 100,000 or more normal ad users in a year (a year, not a month), that is, visitors who are not blocking trackers or opting out of sale, then a for-profit site is a CCPA/CPRA business, and required to comply with AADC. And if we have fewer than 100,000 annual users but make half our money or more from RTB ads, we’re also a business.
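The three-way test can be written down as a quick check. This is just my reading of the thresholds listed above turned into code, with made-up parameter names, and it is not legal advice any more than the rest of this post is.

```python
def is_cpra_business(
    annual_revenue_usd: float,
    consumers_sold_or_shared: int,
    share_of_revenue_from_selling: float,
) -> bool:
    """Rough check of the three thresholds for a for-profit site.

    Meeting any one of the three is enough. Parameter names are hypothetical.
    """
    return (
        annual_revenue_usd > 25_000_000
        or consumers_sold_or_shared >= 100_000
        or share_of_revenue_from_selling >= 0.5
    )


# A small site: low revenue, RTB ads shown to 99,999 non-opted-out visitors.
print(is_cpra_business(200_000, 99_999, 0.2))
# One more visitor in the same year tips it over.
print(is_cpra_business(200_000, 100_000, 0.2))
# Tiny audience, but most of the money comes from RTB ads.
print(is_cpra_business(200_000, 5_000, 0.6))
```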

Option 1A: don’t run the site as a for-profit business: start or find a non-profit org to own the domain and assets

Option 1B: stay under $25 million in revenue and either don’t run any RTB ads or third-party resources, or shut them down for the year after 99,999 visitors saw the ads.

Option 1C: (and this is where it gets tricky, do not know if this would work) Stay under $25 million, run RTB ads, but do all the right legal and configuration incantations to put all third parties on the page into a mode where no sale is happening. Basically treat all the users as if they had done a CCPA opt out. Does this work?

Or could we combine 1B and 1C, and flip the ads to opt-out mode after 99,999 users in a year?

Option 2: don’t make the site likely to be accessed by children. That’s a hard one. Likely to be accessed by children can be any of these:

  1. The online service, product, or feature is directed to children as defined by the Children’s Online Privacy Protection Act (15 U.S.C. Sec. 6501 et seq.).
  2. The online service, product, or feature is determined, based on competent and reliable evidence regarding audience composition, to be routinely accessed by a significant number of children.
  3. An online service, product, or feature with advertisements marketed to children.
  4. An online service, product, or feature that is substantially similar or the same as an online service, product, or feature subject to subparagraph (B).
  5. An online service, product, or feature that has design elements that are known to be of interest to children, including, but not limited to, games, cartoons, music, and celebrities who appeal to children.
  6. A significant amount of the audience of the online service, product, or feature is determined, based on internal company research, to be children.

The tricky part here is that COPPA children (under 13) are different from AADC children (under 18). Trying to comply by making a site not likely to be accessed by children might work if we were talking about COPPA-age children, but under-18s? People that age had better be reading about a lot of different stuff— they’re going to be voting soon. (Is the webcomic Terminal Lance likely to be accessed by children because it’s of interest to 17-year-olds thinking about joining the Marine Corps when they graduate from high school?)

Option 2 looks like a dead end. We might start off thinking we’re not running a kids site, but no site maintainer can know if their niche topic is going to get covered in a MOOC, a YouTube video, a Discord or Reddit thread, or whatever, and all of a sudden the site becomes a must-visit for teenage users. (which is not a bad thing, right? Hey, kids, go get prepared to win Internet arguments about Sparta.)

There might be a way to do option 2, if the site has a third-party analytics provider that has age information. Would an aggregated report on the site audience be good enough to claim that the site is not likely to be accessed by children? If a site could rely on this, then that would let you leave RTB ads in normal mode and not do any of the remaining compliance stuff.

All right, let’s keep going. If option 1 didn’t work, and we don’t have a way to show that the site is not likely to be accessed by children, then maybe we actually have to do DPIAs, plus either age estimation or put every user in the max privacy protection category.

An independent site is probably going to be running WordPress or some other open-source or SAAS WCMS. So we’re likely going to be able to ask around and borrow mostly workable DPIAs from open source (like sites can borrow the WordPress privacy policy today) and we will get sent DPIAs by every ad network or third-party service that we’re signed up for. More paperwork, but should be something that can be dealt with. (Hosting providers and ad services might add DPIA management to their service packages, too.)

Age estimation is not age verification (no, you won’t have to pay a creepy vendor owned by a pr0n site to track your users, if this becomes law somebody else will launch a lower-risk estimator service) but it’s still an extra piece of code to add. But it looks like this is a task that third party service(s) will either tell sites how to do and check up on, or do themselves. (We won’t have to do it if we were able to use Option 1.) If a site kept running RTB ads and didn’t either (1) put the ads in opt-out mode for everybody or (2) start running age estimation, then all the third-party services would be taking a compliance risk. It’s going to be like having to get a CMP, right?

Impact on the future of web ads?

It looks like the options for most sites will be

  • remove RTB ads and other third-party resources that count as a sale or share

  • put all third parties into opt-out mode for all users maybe???

  • rely on 3rd-party analytics to prove that the site is not likely to be accessed by children maybe???

  • Do DPIAs (copied from the same open source projects as site code?) and turn on whatever age estimation service the third parties will require.

This is extra work for legit sites, but I’m cautiously optimistic about the big picture here. The first rule of understanding the impact of a proposed tech standard or regulation is that you can’t just look at the likely impact on small legit sites—you have to look at the relative impact on small legit sites compared to the impact on harmful sites/apps/channels (scrapers, terrorists, etc.) and the third parties that enable them. In this case, AADC means increasing existing legal risks for apps and third parties, and the advertisers that use them, so should help push up the value of ads on legit sites. A lot of advertisers who get told they’re reaching high-value adult audiences are really having their ads shown in apps for kids—if AADC makes that harder to do, it’s good news for the legit sites that actually reach those adult audiences.

To review, the web ad crisis is more of a supply and demand story than a privacy story. We have a problem mostly because third parties can artificially increase the number of saleable ad impressions, by offering ads on content that no advertiser would choose to sponsor. Creepy trackers mostly aren’t creepy because they want to be creepy, they’re creepy because they want to sell some advertiser an ad impression they wouldn’t otherwise buy. In general, regulations and tools that make third parties reduce the number of ad impressions they can sell are a step in the right direction.

No one law gets us all the way to the point where a fully-opted-out site is going to be viable on ad revenue alone, but it’s a step.

17 Aug 2022

custom audiences laundering

Everyone else getting ready for the CPRA rulemaking?

It looks like a tricky CCPA loophole is catching on, and I’m not sure of the best way to address the problem.

As I have mentioned before, the easiest way to catch a company violating the CCPA is:

  1. Buy something with Global Privacy Control turned on.

  2. Log in to Facebook and check Ad Settings. Look at Audience-based advertising.

  3. If the name of the company you bought from is in there, they sold (exchanged for something of value) your info, probably by sending it to Facebook as part of a Custom Audience, or possibly by using Facebook Conversions API. Anyway, they broke the law and got caught.

Simple, right? Looks like a way to make open-and-shut CCPA cases at scale. The new California privacy agency will be able to just copy over the same paperwork, because all the surveillance marketers are following the same tutorials.

Unfortunately, surveillance marketers already have a workaround. I have seen this doing RtKs (which is a good example of why RtKs matter).

  1. The original company (the business) collects customer email address from an opted-out customer, and possibly hashes it.

  2. Business passes the email address, or hash, to a third party.

  3. The third party passes the email address or hash to Facebook, and then deletes it. They can’t tell which of their client businesses passed information on which people (or they claim not to be able to).

That way, the name of the third party, not the name of the business, shows up in Facebook Ad Settings. Under the draft CPRA regulations, the third party is required to comply with a Right to Know or Right to Delete, but as far as I can tell, there’s no additional requirement for the third party to disclose who the original business was, or to be able to.

So a business that wants to violate the CCPA can run their Custom Audiences through a third party, and switch to a new third party if the old one builds up too many RtDs.
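A side note on the “possibly hashes it” step: hashing does not stop the match from working. Here is a minimal sketch, assuming the common convention of SHA-256 over a trimmed, lower-cased address. That convention is my assumption, not something I got from any of the companies involved, and the function name and sample address are made up.

```python
import hashlib


def hashed_email(email: str) -> str:
    """Hash an email the way audience uploads commonly do (assumed convention)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The business, the third party, and the platform all get the same value,
# so the hash works as a shared identifier for the same person.
at_business = hashed_email(" Pat@Example.com ")
at_platform = hashed_email("pat@example.com")
print(at_business == at_platform)
```

So a hashed email that passes through a third party still points at the same opted-out customer when it arrives.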

It looks like all we can really do is list the third parties involved in this scheme and RtD them? I know this is a good argument for why everybody needs an Authorized Agent service, but it would be less total work if there were a better way to find the original business that broke the law.

Any better ideas?

04 Aug 2022

the colored pencil test for web features

A web browser has a fiduciary duty to its user. A web browser is the agent of the user, and the user agent is expected to align fully with the person using it and operate exclusively in that person’s interest. But how can we figure out what is and isn’t in the user’s interest? Some browser functionality is stuff that the user asks for, but other browser functionality can do more complex tasks for the user. Not every user has time to learn and understand everything the browser does for them. Maybe a story from space exploration history can provide at least a start to figuring this out.

In 1965, NASA scientists received the first images of Mars from the Mariner 4 probe, as numbers printed on paper tape. In order to see the image, they translated the numbers to colors and drew individual pixels with colored pencils.

[image: hand-colored picture of a Martian landscape, in a wooden frame]

Today, your web browser probably does something similar, many times a day. It turns a set of image data, from a file format such as PNG or JPEG, into a set of colored pixels that you can see as an image.

If, as a user, you had the time, you would probably choose to do exactly what the browser does. You want to see the images on a web site you visit. Other browser features, maybe not so much. Because the browser is supposed to be the agent of the user, a helpful way to answer the question “Should the browser do this?” is to ask “Would the user do this themselves if they had time?”

For some browser functionality, answers can be found in the history of technology. People have put bookmarks in books as long as there have been books. So it makes a lot of sense for browsers to offer users a bookmark feature. But the more novel a feature gets, the harder it is to figure out whether to do it without more forward-thinking user research.

  • When you buy something, would you tell the seller about every ad you saw for the thing you just bought? Would you tell them if they promised to mix up your answers with other people’s and do math on them so they can’t tell what any one person said?

  • On your first visit to a new site, would you choose to tell the site about some of the topics that you’re interested in?

People do provide information about themselves to other parties they deal with. The browser’s role is to understand and facilitate the information sharing that people would choose to do on their own, if they had the time to learn about it, keep the necessary records, and answer questions. (For example, a browser might offer to auto-populate “where did you see our ad” fields on order forms, if user research shows that people are willing to fill in that field.)

A lot of user research has shown that many people don’t like the online advertising practices of today, but there needs to be more research on what they would accept. Ultimately the browser works for the user, and it would be a waste of resources to go too far down a direction that's too different from what people would choose to do for themselves. More: PET projects or real privacy?

21 May 2022

OTHER ATTRIBUTES

(update 30 Sep 2022: add link to Colorado regulations.)

Quick update just to say that I did get the specific pieces of personal information from that CCPA RtK that I mentioned before.

Lots of “OTHER ATTRIBUTES” like

Social: Influencer Propensity:14;

and

Purchase Propensity: Japanese Luxury Vehicle:10;

along with some sensitive ones. Added to the end of Example CCPA workflow.

It seems like we might have a loophole in the CCPA Regulations.

It says, “A business shall identify the categories of personal information, categories of sources of personal information, and categories of third parties to whom a business sold or disclosed personal information, in a manner that provides consumers a meaningful understanding of the categories listed,” but I don’t see where it says the business has to disclose the actual specific pieces of personal information in a way that provides consumers with meaningful understanding.

So it looks like they can send me a bunch of integer and letter codes without a key. This is similar to the Verizon RtK response, by the way. Will probably have to write this up for the CPRA rulemaking. Disclosure of a score really needs to come with a key or units in order to be meaningful.
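To show how little a bare score says, here is a small parsing sketch. It assumes the attributes arrive as semicolon-separated `name:score` entries, which is only a guess based on the two samples above, and the function name is made up.

```python
def parse_attributes(raw: str) -> dict[str, int]:
    """Split 'Name:score;' entries into a name -> score mapping."""
    scores = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # The attribute name can itself contain colons, so split on the last one.
        name, _, value = entry.rpartition(":")
        scores[name.strip()] = int(value)
    return scores


sample = (
    "Social: Influencer Propensity:14; "
    "Purchase Propensity: Japanese Luxury Vehicle:10;"
)
for name, score in parse_attributes(sample).items():
    # Without a key there is no way to say if 14 is high, low, or a percentile.
    print(f"{name} = {score} (scale unknown)")
```

The parsing is the easy part. The scale, direction, and units of each score are exactly what the response leaves out.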

One piece of good news is that they have me down as 80-something years old, which might help keep me safe from the target selection algorithms for People’s Liberation Army assassination drones, bodily fluid harvesting robots, and Texas abortion bounty hunters.

Updated to add some good news from Colorado. Colorado closed the loophole. The Colorado Privacy Act Rules state,

Personal Data provided in response to an access request must be provided in a form that would allow the average Consumer to make an informed decision of whether to exercise deletion, correction, or opt-out rights.

and

For instance, the Personal Data must be provided in a form that is concise, transparent and easily intelligible, and avoids incomprehensible or unexplained internal codes and identifiers.

Will companies maintain both comprehensible and incomprehensible versions of their data dumping code? Or just make one version that’s compatible with both California and Colorado? Or will California catch up with Colorado?

14 May 2022

The problem with CCPA RtK workflows

Here is a follow-up to my comments at the pre-rulemaking stakeholder sessions for CPRA last week: Example CCPA workflow.

[image: printed copy of CCPA form]

This is one where I had to print out and sign a form, and have it notarized.

As I pointed out before, making Right to Know work is really a critical first step for all the other CCPA tasks. If you don’t know which companies have which info, it’s almost impossible to prioritize who gets a CCPA delete, which requires more effort, and who gets a Do Not Sell.

If every data broker and surveillance marketing firm could make the Right to Know process a little different, then it would be nearly impossible for anybody to get anywhere with CCPA, and we might as well not have it.

What would be good to see in the CPRA rulemaking is one standard baseline process for Right to Know, that any company would have to do. They could, of course, add additional, more convenient processes, but there should at least be one that is of known difficulty.

Here is my suggestion.

  1. As a California resident, I go to the California DMV, show my California ID, and get a stack of printed Right to Know slips. These are pieces of paper and have my identifying information on them. The DMV is allowed to charge me for the printing costs.

  2. When I want to exercise my Right to Know, I fill out a company’s Right to Know form on their web site, and provide my contact info and postal address.

  3. If the company doesn’t have any info on me, they can email me to say so.

  4. If the company does have info on me, they send me a Business Reply Mail envelope.

  5. I put one of my Right to Know slips from step 1 in the Business Reply Mail envelope and send it back.

  6. The company checks my Right to Know slip and sends me a copy of my info.

This puts all the sensitive data handling either under the DMV’s roof, or in postal mail space where mail fraud is a Federal crime.

Naturally, a lot of people will come up with ways to do this more cheaply and conveniently on the Internet. That would be great. Putting a simple, standard, postal process in the regulations will set the baseline: you can’t make it too much harder than DMV+postal, or people will do DMV+postal.

04 May 2022

CPRA comments

I got to comment at a pre-rulemaking stakeholder session for the California Privacy Rights Act (CPRA). Here is my prepared text (not exactly what I said, but close).

As a California resident, I have had a right to know how my personal information is used, since January 1, 2020. On paper, that is. In practice, it’s a little trickier. In order to exercise my California privacy rights, I have had to run a lot of mazes. I won’t mention any specific companies here, but

  • I have taken selfies.

  • I have taken a selfie holding my California driver’s license.

  • I have scanned my California driver’s license, front and back.

  • I have taken a photo of my California driver’s license from an Android device, had it rejected, found an Apple device, taken a different photo of the same license, and had it accepted.

  • I have passed a quiz about my former addresses and bank accounts.

  • I have passed a quiz but only by getting some of the answers wrong because they would have been right if a family member of mine with a similar name was taking it.

  • I have printed and signed a document, and scanned it.

  • I have printed and signed a two-page document, gone to a notary, had it notarized, and scanned it.

Getting through the Right to Know process can be tricky. And I’m pretty good at paperwork, I have a variety of different devices to try, and I have a printer and scanner set up and working.

The reason I’m focusing on the Right to Know here is because it’s the CCPA right that helps me decide what to do with all the other rights. If I have a positive response to a Right to Know, then I don’t have to do a Right to Delete, and I can be more confident in sharing information with a company. There are tens of thousands of companies out there that might have some info on me, so I need to prioritize. But realistically, inconsistent and over-complicated handling of Right to Know by the companies I buy stuff from and by data brokers means that it’s a time-consuming effort for me to find out what’s going on with my personal information.

Under CCPA, I have the right to use an authorized agent. But authorized agent requests are even more complicated. Companies generally react to a fully documented authorized agent Right to Know by getting in touch with me directly and making me run the original maze anyway.

The worst part of all this maze running is sometimes there’s no cheese at the end of the maze. I have gone through all the work to do a Right to Know with one company, and ended up with, among other things, a list of the companies that sent my personal information to them. So, when I send a Right to Know to those companies, I should be able to get some information, right?

Not necessarily. Sometimes they claim not to have any information about me.

And in the case of one high-profile company, I can look up the public documents from an ongoing privacy lawsuit, and read employee depositions stating that they have certain kinds of information. But it’s information that they don’t disclose to me. A company shouldn’t be able to testify to one thing in court and then tell California residents something else.

In the 2020 election, Proposition 24 was supported by an overwhelming majority of California voters. Today, the CPPA has an opportunity to implement the intent of California voters by adopting regulations that make it practical for everyone to exercise their basic privacy rights.

As a California resident, I should be able to use a simple, standardized Right to Know process, such as being able to request a standard paper form and a Business Reply Envelope. Naturally, businesses and their service providers should be able to compete to offer consumers a simpler, faster online process as an alternative to paper forms and trips to the mailbox. But without a guarantee of a common, baseline simple opt-out process to fall back on, we’re still going to be stuck in a maze next year.

Thank you.

More: The problem with CCPA RtK workflows

26 Mar 2022

CPRA: do I feel lucky?

The remarkably popular California Privacy Rights Act (CPRA) takes effect on January 1, and that means the California Privacy Protection Agency (CPPA) is coming. But right now, it doesn’t look like surveillance marketers are that worried. The agency has a $10 million budget, and that’s chump change around here. The Big Tech legal departments and the big Palo Alto law firms probably spend more than that on guacamole.

The CPPA will be able to do a lot per case, but won’t have the budget for too many cases. To some extent, they’re going to have to rely on uncertainty. As a famous (but fictional) California lawman once said, Did I fire ten million dollars worth of privacy enforcement, or only nine? But being this is the CPRA, the most powerful privacy law in the United States, and would blow your head clean off, you’ve got to ask yourself one question. Do I feel lucky? Maybe I got that movie quote wrong. Need to check it.

It’s not as bad as it looks, though. The new agency doesn't have to deal with a zillion different CPRA violations. All the surveillance marketers did the same Big Tech training. To put it in terms that Big Tech will understand, CPRA enforcement scales. A lot of cases are going to be basically identical because a lot of the violators watched the same growth hacking videos and read the same documentation. The big companies have cleverly shifted a bunch of the CCPA (and soon, CPRA) risks over to the smaller companies. Kind of like how Amazon understands the risks of operating vans on crunch schedules in residential areas, so offloads those risks on Amazon Delivery Service Partners. That's the safe choice for them from one point of view, but it means that the CCPA/CPRA violator is technically not the big, lawyered-up company. The violator is a small company with a small legal budget, breaking the law in the exact same way as a bunch of other companies.

Big Tech's decision to outsource the legal risks means that CPPA will not have to put a lot of lawyer time into each violation. Just like New York City has a Citizens Air Complaint Program to handle one kind of common pollution violation (idling trucks), the CPPA will be able to do the same complaint over and over. Get the person whose info was misused to fill it in with screenshots and/or attachments, and all the CPPA lawyer has to do is check and sign it.

Here are a few common violations where enforcement will be able to scale quickly.

Using a GDPR workflow to do a CCPA job. Under CCPA, you can opt out of sale of your info. Under GDPR, you can object to processing. Similar concepts, but the details are different. Under GDPR, a company can make people go through an extra ID verification step for an objection. Under CCPA (and soon, CPRA), an opt-out has to be handled without ID verification unless the recipient has a good-faith, reasonable, and documented belief that it is fraudulent, and sends it to you in writing when you opt out. GDPR compliance doesn’t always get you CCPA compliance. Companies have had a couple years to fix this, but it's still pretty common. Info needed: Forwarded email or screenshot of illegal verification step.

GPC order, followed by sale/share. This is easy to check. Turn on Global Privacy Control in your browser. Order something on the web. Set a reminder to come back later to see if the company transferred (sold or shared) your info to Facebook. Info needed: screenshot of GPC install/activation, screenshot of Facebook Ad Settings.
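On the site’s side, staying out of this category is not much code. A browser with Global Privacy Control turned on sends a `Sec-GPC: 1` request header. The sketch below checks for it before any customer info goes to an ad platform. The function names and the `already_opted_out` flag are made up for illustration.

```python
def gpc_opt_out(headers: dict[str, str]) -> bool:
    """True if the request carries a Global Privacy Control signal."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False


def may_share_with_ad_platform(
    headers: dict[str, str], already_opted_out: bool = False
) -> bool:
    """Decide whether this order's customer info may be sold or shared."""
    return not (already_opted_out or gpc_opt_out(headers))


print(may_share_with_ad_platform({"Sec-GPC": "1"}))
print(may_share_with_ad_platform({"User-Agent": "ExampleBrowser"}))
```

The decision has to travel with the order record, so that a later batch upload of customer emails skips the people who sent the signal.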

Inconsistent data. Another good use of Facebook Ad Settings. Many companies default to dumping customer info over to Facebook, without really thinking about it. Then, somehow, some of them leave out the info that went to Facebook when they answer a Right to Know. Easy to spot. Info needed: copy of RtK results, screenshot of Facebook Ad Settings.

The Attorney General's office already has a Consumer Privacy Interactive Tool to help you report companies that fail to include the required Do Not Sell My Personal Information link on their web sites. Similar tools will be easy to add for the common violations—so that $10 million will go further than it looks.

10 Feb 2022

undisclosed risks at Meta

Lots of attention to one SEC disclosure about regulatory risks for Instagram and Facebook in Europe: Meta warns it may shut Facebook in Europe but EU leaders say life would be ‘very good’ without it.

So that’s a story, but from here in California the really interesting risk disclosures are a few that aren’t there. Meta has some risks related to ongoing violations much closer to home.

  1. The company has not been acknowledging CCPA Right to Know (RtK) letters from California residents who do not have Facebook accounts. If Facebook still maintains shadow profiles on people who do not have accounts, then refusing to disclose them to the person described in the profile is a CCPA violation. A person cannot be required to agree to a company’s Terms of Service in order to exercise their rights under CCPA. (Or maybe Facebook already bulk-erased the shadow profiles of anyone from California who they don’t have signed up as a user?)

  2. The personal information in Facebook’s Download Your Information downloads is incomplete. The extent of the missing data is gradually coming out in discovery in In re Facebook, Inc., Consumer Privacy User Profile Litigation. We don’t know exactly how much extra data they will have to start disclosing, but if they’re putting this much effort into fighting discovery in one case, it has to be enough to be worth mentioning as a risk to the SEC.

  3. Meta is ignoring or mishandling Authorized Agent RtKs. This is a fairly common issue at surveillance marketing companies. Authorized Agents are still rare, and a lot of companies don’t have a process in place to handle them correctly. But a high-profile company like Meta is likely to get a bunch of agent RtKs, and it’s reasonable to expect them to disclose the associated risks.

See the CCPA Regulations for more details. A Download Your Information portal is allowed for complying with CCPA Right to Know, but only for people who already have accounts (that is, where the business maintains a password-protected account with the consumer), and only when the portal fully discloses the personal information that the consumer is entitled to under the CCPA.

All three of these items are much more likely than the We will likely be unable to offer a number of our most significant products and services, including Facebook and Instagram, in Europe scenario. So they could really use a mention in the next Form 10-K or other documents.

23 Jan 2022

banning surveillance advertising

(update 21 Mar 2022: Add another link, edit down and simplify)

(update 26 Jan 2022: Add links, edit some material on subsection b)

New bill in Congress: the Banning Surveillance Advertising Act of 2022. Ambitious goal. May not get far this Congress, but it’s good to have a destination in mind. As Allison Schiff wrote on AdExchanger, Even If Targeted Online Advertising Isn’t Banned – Take Note Of Which Way The Wind Is Blowing. Remember, it took the EPA 23 years to get to the (almost) Final Step in Phaseout of Leaded Gasoline.

We can ignore the first round of industry group freakouts over the bill. It’s unrealistic to play the bad for small business card this early. Even though a lot of small businesses use surveillance ads today, legit small businesses could end up better off after a ban if they don’t have to keep playing negative-sum games against scammers trying to pick off their customers. Hard to predict the impact of any policy change on any advertiser until you know what they’re contending with in the existing system. (Yes, the 1971 ban on cigarette ads on TV seems to have had two effects: a short-term increase in profits for all tobacco firms, and an increase in concentration of profits for the largest brands. However the cigarette market was essentially all legal product vs. legal product competition, and not a single market including both honest and deceptive advertisers like today’s social media advertising.)

A surveillance ad ban won’t make people buy less stuff, but in the absence of surveillance ads, different gatekeepers will be more important. The big, obvious winners from a surveillance ad ban would be SEO, content marketing, affiliate programs, and search advertising. If you can’t find Kevin, you have to make yourself as findable as possible when he tries to find you.

Another set of winners in the post-surveillance-advertising economy will be niche content sites, especially sites that cover shopping-friendly content categories (like travel, fashion, and home improvement) and sites that run product reviews and buyers’ guides. Shopping help forums on general-interest sites, like /r/buildapc, will be more and more important for marketers to keep up with, and niche retailers with a curated selection will get a new amplification role in some categories. Measuring the effectiveness of non-surveillance advertising requires different skills, so that’s an opportunity for some kinds of research firms. A not-so-surprising winner when you think about it will be multi-level marketing. MLM spreads person to person.

So is surveillance advertising going to take as long to get rid of as leaded gas did? Personally, I don’t think so. The winners from the leaded gas ban were numerous but dispersed: people who live near a lot of motor vehicle traffic and people who might be crime victims. The leaded gas ban created no localized near-term big winner businesses, but there will be some for a surveillance ad ban. Expect a second round of industry group comments as the companies that will win from a ban get involved.

Time to look at some specific issues in the bill. Some suggestions.

Remove or limit the Custom Audiences exception. Subsection (b) has some language on Custom Audiences that provides some protection but not enough.

Paragraph (1) does not apply to the targeting of the dissemination of an advertisement based on information described in clauses (i) through (iv) of subparagraph (B) of such paragraph that is provided to an advertising facilitator by an advertiser or by a third party on behalf of an advertiser, if the advertising facilitator is provided a written attestation that the advertiser is not in violation of subsection (b) with respect to such information.

This is interesting in principle, but has a negative space problem. The larger a platform gets, and the more Custom Audiences it receives from more advertisers, the more that users can be targeted just based on which Custom Audiences they’re not in. Even if no single advertiser’s Custom Audience reveals membership in a protected class, the platform will be able to infer membership if it has enough lists of non-members.

For example, Facebook has an internal cascade of classifiers that use inputs from multiple sources to place both scam and non-scam ads. This placement decision happens per ad impression, in real time. As Facebook gets lists of finance publication readers and accredited investors as Custom Audiences, not only do the legit advertisers get the ability to target their audiences on Facebook, the system learns how better to match the precious metals scams with the most vulnerable targets. The Facebook database does not have to store a list of vulnerable targets for a scam. Vulnerable people receive the scam ads as the result of an on-the-fly decision, as the inevitable effect of not receiving the ads intended for members of the well-informed Custom Audiences.

Large platforms that receive lists of people who work in healthcare, or are interested in legit health content, end up targeting users with limited health knowledge for health-related disinformation. Well-intentioned NGOs, by using Custom Audiences listing their informed supporters and donors, are unavoidably helping to place disinfo and scam ads for the other side.
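
The negative space problem can be sketched in a few lines of Python. This is a toy model with made-up user IDs and list names, not a description of how any real platform works. It only shows that once enough lists are uploaded, the set of people on none of them is itself a targetable segment.

```python
# Toy model: every ID and list name here is hypothetical.
all_users = {"u1", "u2", "u3", "u4", "u5", "u6"}

# Custom Audiences uploaded by legit advertisers.
custom_audiences = {
    "finance_publication_readers": {"u1", "u2"},
    "accredited_investors": {"u2", "u3"},
}


def negative_space(users, audiences):
    """Return the users who appear on none of the uploaded lists."""
    listed = set().union(*audiences.values())
    return users - listed


# Nobody uploaded a list of "less-informed" users, but the platform
# can still derive one from the lists it did receive.
leftover = negative_space(all_users, custom_audiences)
print(sorted(leftover))
```

No advertiser in this sketch supplied anything about u4, u5, or u6, and the platform never has to store the leftover set. It can be recomputed for each impression.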

A law that addressed some older ad targeting practices while leaving Custom Audiences as an alternative could easily do more harm than good. And from a communications point of view, a Custom Audiences exception is confusing. Personally, the first question I generally get about any privacy law or tool is, Will this keep (some company) from sending my info to Facebook? Leaving an exception for just that means that this bill is letting through not just some of the highest-risk surveillance advertising practices, but some of the least accepted.

A minor problem with subsection (b) is that all an advertiser needs to supply is a written attestation. There are a lot of sketchy Facebook advertisers who will sign a written attestation about anything, and advertisers who are advertising in one jurisdiction from another jurisdiction, making it hard to enforce the law against them. Makes no sense to let a big platform avoid accountability because of one possibly meaningless document.

Conversion tracking is similar to Custom Audiences, but sends one customer record at a time instead of a list batched together. The law has to apply fairly to both.

Figure out how to allow user-configured profiles. Some ad platforms allow users to limit the number of gambling or alcohol ads they see, or to set other ad preferences. As long as profiles (along with surveys, preferences, and similar features) that affect ad selection are clearly under user control, it makes sense to allow them. Best to let the pro-personalization users get what they say they want, and don’t tell people that they can’t choose to avoid ads for a product category they have a problem with.

An ad can be personalized but not surveillance if it’s matched to the person receiving it based on information that the person intentionally supplied, knowing that it is being used for advertising, and with the option not to do it. This might be a reader survey, an “add this topic to my interests” button, or some other feature. The personalization section of the law will need some careful work, because there is a risk that a large platform could end up with too many non-members of protected classes with personalization turned on, causing legit advertisers to go for the personalizers and leaving the non-personalizers to the scammers. A trustworthy content site with a high response rate to its reader survey might be able to safely use profiles for ad placement while some larger, lower-trust sites might not.

In the future, some users might choose to have their browser share an interest profile with some sites they visit. If a browser team can make a profile sharing feature, and fairly convince some users to turn it on, there should be a way to make it legal. (Browsers sending profile identifiers to sites without the user’s knowledge: still bad.)

Reform 47 U.S. Code § 230 to exclude surveillance advertising facilitators and providers of algorithmic social feeds from the definition of interactive computer service. This safe harbor was passed in the days of Usenet and simple mailing lists and web boards, which were interactive under control of the user or the user’s manually maintained subscription and block lists. Safe harbor is appropriate for content hosting or store-and-forward, but it was never intended for the kind of active selection and promotion of messages that surveillance advertising firms do.

Phase the ban in. If small businesses take longer to adjust to technical changes, give them longer, just like the EPA gave small refineries additional time to phase out lead in the gasoline they produce. Smaller databases present less risk, and larger companies have more technical capacity to comply with a ban. Possibly apply the ban in year 1 to companies with 100 million or more records of PII, then 10 million or more in year 2, 1 million in year 3, and so on. By the time it gets down to a small company’s list, post-surveillance tools and services will be well-tested and easy to switch to.
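
Here is a rough sketch of that phase-in schedule. The 100 million starting threshold and the tenfold drop per year come from the paragraph above. Continuing the tenfold drop past year 3 is my reading of "and so on", and the function name is just for illustration.

```python
def ban_start_year(pii_records, first_threshold=100_000_000, step=10):
    """Return the phase-in year (1, 2, 3, ...) in which a database with this
    many PII records would first be covered by the ban.

    Assumes the threshold keeps dropping by a factor of `step` every year.
    """
    if pii_records < 0:
        raise ValueError("record count cannot be negative")
    year = 1
    threshold = first_threshold
    while pii_records < threshold:
        threshold //= step
        year += 1
    return year


for size in (250_000_000, 10_000_000, 1_000_000, 5_000):
    print(size, "records: covered in year", ban_start_year(size))
```

Under these assumptions a 5,000-record customer list would not be covered until year 6, which leaves time for post-surveillance tools to mature.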

Running more productive CEO hearings

There are probably going to be committee hearings about all this at some point, but nobody wants another “I’ll have my staff get the answer to that” hearing with minimally useful testimony from big platform CEOs. No CEO is going to be able to answer the important questions about how this stuff works in reality without preparation. Instead, please share some material in advance.

  1. Ask constituents for screenshots of scam ads, along with their identifying info and permission for the company that placed the ad to look up how they got the ad and discuss it with members of Congress and staff.

  2. Share a selection of scam screenshots and user info with the platform CEOs in advance. Let them know that you are going to ask, “How did this exact person end up getting this exact ad?” so that they can bring the answer to the hearing.

  3. When they’re in the hearing, live, the answer will give you a better idea of how the system works.

Some work is still needed to figure out the possible loopholes in a surveillance ad ban, so it would be good to get some reformed surveillance advertising pros to go over it a bunch of times and game it out. I’m sure that I have only thought of a few parts of what needs to be done here.

04 Jan 2022

#web5 resources

Memex overview

As We May Think by Vannevar Bush A record if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.

Webs of trust and reputation graphs

PGP: Explanation of the web of trust of PGP

It’s Now Possible To Sign Arbitrary Data With Your SSH Keys

Distributed hash tables

Distributed Hash Tables, Part I by Brandon Wiley

Kademlia - Wikipedia

Microformats and note taking

From brainstorm to outline: Why I use OPML

Microformats – building blocks for data-rich web pages

Attention conservation

Your attention didn’t collapse. It was stolen | Psychology

Web 2.0

Web 2.0: Compact Definition? - O’Reilly Radar

On Web 2.0, application uses YOU!

Related/unsorted

Why it’s too early to get excited about Web3 – O’Reilly

Alternatives to the metaverse

Clever uses of pass, the Unix password manager

The internet runs on free open-source software. Who pays to fix it?

The Three Laws of Personal Devices

The Web3 Fraud | USENIX

Should you fix errors and contribute to Google Maps for free?

Competition & Privacy: It’s Both Or Nothing

Why We Need to Address Scam Culture

Cryptocurrencies: A Necessary Scam?

Get a Good Job With More Pay, the Landlord Takes It All Away

Thiel, Silicon Valley and the Rise of Tech Neo-Reaction

How CockroachDB went serverless

Web5 notes

A block contains a list of claims and a list of parent blocks. In order to check if a block is valid, you also need a block with a claim “[blob] is a valid signature for [block]” and a copy of the signature. You can’t check a block’s validity in isolation.

When choosing among blocks to form a chain, choose the oldest. When choosing which of two conflicting blocks to remove in order to make a chain valid, remove the newest. The older a block is, the more likely that other nodes will already have a copy.
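
A minimal Python sketch of these notes, to check that the pieces fit together. Everything here is an assumption for illustration: the field names, the tuple format for claims, and the `created` timestamp (the notes only say "oldest"). Real signature checking is left to a caller-supplied `verify` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    created: float        # assumed timestamp, used only to decide "oldest"
    claims: tuple = ()    # e.g. ("signs", target_block_id, signature_blob)
    parents: tuple = ()   # block_ids of parent blocks


def signature_blobs(block, known_blocks):
    """Collect signature blobs that *other* blocks claim for `block`."""
    return [
        claim[2]
        for other in known_blocks
        if other.block_id != block.block_id
        for claim in other.claims
        if len(claim) == 3 and claim[0] == "signs" and claim[1] == block.block_id
    ]


def is_valid(block, known_blocks, verify):
    """A block can't be validated in isolation: some other block has to carry
    a signature claim for it, and verify(block, blob) has to accept the blob."""
    return any(verify(block, blob) for blob in signature_blobs(block, known_blocks))


def choose_oldest(candidates):
    """When several blocks could extend a chain, prefer the oldest."""
    return min(candidates, key=lambda b: b.created)
```

With a stand-in `verify` such as `lambda block, blob: blob == "sig-" + block.block_id`, a block is valid only if one of the other known blocks carries a matching `("signs", ...)` claim for it.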

28 Nov 2021

improving web advertising

What does improving web advertising mean? Some possible answers.

  1. Advertising reaches the intended human audience

  2. Advertising is placed in a context appropriate to the product or service advertised (brand safety)

  3. Placement and reporting complies with information sharing and processing laws and norms (privacy where the information is personal, data leakage where the information is business)

You can’t crank up any of them and let the others go. Fail on (3) and deceptive advertisers outcompete the honest ones so the whole medium becomes untrustworthy. Fail on (2) and low-quality scraper and misinfo sites can outcompete legit ad-supported sites. Fail on (1) and the advertisers stop supporting the web, and all the ad money goes to Facebook, which is sub-optimal for some or many advertisers depending on time preferences. There are enough deceptive and/or hostile parties out there that some of the failure modes for web ads are risks to civilization, so you are probably set if you’re trying to justify working on this stuff to Future You.

The fun part, though, is that web advertising can’t really work the same as other online (software-intermediated?) ad platforms. The web is (user + client)/server. The client software is the agent of the user and is expected to operate in the user’s interests. Other online ad media such as native mobile apps and connected TV are user/(client + server). The client software is provided by, and under the control of, another party. The user may even be a party to an end-user license agreement where they agree not to interfere with the client software’s ability to act in another party’s interests.

It’s unrealistic to expect web ads to work similarly to app and CTV ads. Web ads in their improved state are going to work more like an ad medium in which there is no client software (like print ads) than they will work like an ad medium in which the client software operates in the interests of a party other than the user. This is probably a good thing, because if the user is in the loop, the best way for an ad to pay for the resources and attention it consumes is to participate in a content→signal→reputation spiral that is hard to do in less user-aligned ad media.

11 Nov 2021

Revenue gap and related links

When you measure include the measurer. — MC Hammer

Rebuilding web advertising without the third-party cookie is not really a privacy story. It’s a market design story. And it’s probably a mistake to think about third party cookies as some ideal system of which a fraction needs to be clawed back while keeping privacy. The limitations to measurement are complicated. (There is a lot of math needed to understand the reasons behind The Refrigerator Test and I’m still trying to figure it out.)

Some good background info.

Consumer Privacy Choice in Online Advertising: Who Opts Out and at What Cost to Industry? by Garrett Johnson, Scott Shriver, Shaoyin Du :: SSRN Though consumers express strong privacy concerns in surveys, we find that only 0.23% of American ad impressions arise from users who opted out of online behavioral advertising. We also find that opt-out user ads fetch 52.0% less revenue on the exchange than do comparable ads for users who allow behavioral targeting.

The Identity Fragmentation Bias by Tesary Lin and Sanjog Misra. This paper studies the identity fragmentation bias, referring to the estimation bias resulted from using fragmented data. Using a formal framework, we decompose the contributing factors of the estimation bias caused by data fragmentation and discuss the direction of bias. Contrary to conventional wisdom, this bias cannot be signed or bounded under standard assumptions.

Inferno: A guide to field experiments in online display advertising by Garrett A. Johnson. 19 Jul 2021 Online ad experiments suffer from an extreme statistical power problem, which limits how much can be learned from experiments. Some of the coming changes to online advertising promise improvements for experimenters.

The Unfavorable Economics of Measuring the Returns to Advertising - 2015-lewis.pdf These initial findings show that when advertising at a level of intensity typical of digital advertising, [randomized controlled trials] require sample sizes in the single-digit millions of person-weeks to distinguish campaigns that have no effect on consumer behavior (-100% ROI) from those that are profitable (ROI >0%). Identifying highly successful campaigns from ones that merely broke even is not an optimization standard we typically apply in economics, yet our analysis shows that reliably distinguishing a 50% from 0% ROI is typically not possible with a $100,000 experiment involving millions of individuals.

Internet Activity Bias Causes Lumpy User Behavior by Jakob Nielsen This phenomenon is called activity bias: some days, people do a lot online; other days, they do very little. On very active days, people are more likely to do both Activity A and Activity B, no matter what A or B might be…. Crucially, even if there is no relationship between A and B, the very fact that you observe users doing A means that they are likely to be having one of their more active days and therefore are also more likely to do B.

Much Better Yellow Pages. Much Worse Television. by Bob Hoffman
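
The activity bias described in the Nielsen excerpt above is easy to check with arithmetic. The numbers below are made up for illustration: half of all days are "active" days where a user does any given activity with probability 0.5, and the rest are quiet days where the probability is 0.1. Activities A and B are independent once you know the day type.

```python
def activity_bias(p_active, p_act_active, p_act_quiet):
    """Return (P(B), P(B | A)) when A and B are independent given the day type
    and each has the same per-day probability."""
    p_quiet = 1 - p_active
    # Chance of doing a single activity on a random day (same for A and B).
    p_a = p_active * p_act_active + p_quiet * p_act_quiet
    # Chance of doing both on the same day.
    p_a_and_b = p_active * p_act_active ** 2 + p_quiet * p_act_quiet ** 2
    return p_a, p_a_and_b / p_a


p_b, p_b_given_a = activity_bias(0.5, 0.5, 0.1)
print(f"P(B) = {p_b:.2f}, P(B | A) = {p_b_given_a:.2f}")
# prints: P(B) = 0.30, P(B | A) = 0.43
```

In this sketch, observing A raises the chance of B from 30% to about 43%, even though A has no effect on B. An ad measurement that treats "saw the ad" as A and "bought the product" as B would pick up that lift for free.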

04 Nov 2021

taxing surveillance marketing

Putting a tax on surveillance marketing is sometimes suggested as a solution to a classic externalities problem—firms benefit from surveillance marketing, but the costs and risks are paid for by the people surveilled. A Pigovian tax is the go-to fix for this situation.

Where to put the tax is the problem. Taxing specific marketing practices probably creates more overhead and risk than it’s worth. Too easy for surveillance marketers to work around. So it seems like the best approach would be to expand the existing “data broker” registration laws to put scaled reporting and tax requirements on any database containing PII. From an economic POV, personal information is digital hazmat, with both potential harms and possible future benefits that a regulator is not in a position to evaluate but the users are.

Just like RCRA resulted in some marginal uses of hazardous materials being phased out, a Pigovian tax would likely cause companies to get rid of some high cost/benefit surveillance marketing data on their own to avoid reporting and taxation.

In general, it is important to tie the tax to the data (and therefore risks) and not to specific practices. Users of the data are best able to decide how to balance the risks and rewards.

24 Oct 2021

bringing in the email harvest

(This is rough, still trying to figure this out. Based on a thread in a surveillance marketing forum.)

Here’s a surveillance marketing problem: some company gets a web site visitor, but can’t identify them. They want to target the same person with Facebook ads or email spam.

For the company, one solution is to put some third-party JavaScript on the page that generates an email address for an anonymous visitor. Examples:

  • LeadPost leverages a network of match providers to convert your anonymous bounces into actionable leads. All lead data includes fully verified name, address and email and may be used for unlimited marketing purposes.

  • GetEmails offers Anonymous Website Visitor Identification, best explained in their video, How it Works In the Kitchen. We hash the email addresses in the huge f’n database, we then match the hashed email addresses in the cookies to the hashed email addresses in the big f’n database, and we pass you a record. It is magic.

There’s a book about this technique, by Adam Robinson, founder of GetEmails. Permission (Sh)marketing: How the world’s fastest-growing companies legally retarget website visitors using email without permission (and how you can too).

On the identification side, we could use existing technology to identify up to about 35 percent of US traffic. On the data side, we could compile an enormous list of third-party opt-ins through business development with lead generation companies. We connected the two parts together, and lo and behold, it worked. Email-Based Retargeting was born.

This might be why I sometimes get completely inexplicable companies showing up as Advertisers using your activity or information in Facebook Ad Preferences. What if this happened?

  1. User A visits a web site with some email-finding JavaScript on it.

  2. Email-finding JavaScript misidentifies User A as User B.

  3. Company adds User B’s info to their CRM system and uses it to send spam to User B (generally CAN-SPAM compliant email that is spam according to norms and reputable mail server ToSs, though not according to US Federal law). The company also adds User B to a Facebook Custom Audience: not a Website Custom Audience like they would have gotten by using the Facebook pixel, but a Customer List Custom Audience, as if they had gotten the email address with consent.

The problem is: what happens if User B had Global Privacy Control turned on? The company would have picked up on it, and set the “Do Not Sell” flag to apply to User B’s information, but User B didn’t visit the company’s site. User A did. So now User B sees their info in a place it shouldn’t be, and the company is in CCPA trouble for mishandling the information of someone who never even came to their site.

Part of the solution seems to be for the third-party vendor to keep track of everyone they have seen a Global Privacy Control for on any site, and never return that person’s info in step 2 above. But I’m not sure if this covers it. Anyway, this might just be more about what the heck is that company I’ve never heard of doing in Facebook Ad Preferences? than anything too significant.
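
A rough sketch of that vendor-side fix. I don't know how any of these vendors actually store or hash addresses, so the class, the method names, and the choice of SHA-256 are all assumptions. The point is only that the lookup consults a cross-site list of GPC signals before returning a record.

```python
import hashlib


def hash_email(email):
    """Normalize and hash an email address (SHA-256 is an assumption)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class MatchVendor:
    """Hypothetical third-party visitor identification vendor."""

    def __init__(self, records):
        # records: iterable of dicts with at least an "email" key
        self.by_hash = {hash_email(r["email"]): r for r in records}
        # Hashes of everyone the vendor has seen a GPC signal for, on any site.
        self.gpc_seen = set()

    def observe_gpc(self, email_hash):
        self.gpc_seen.add(email_hash)

    def lookup(self, cookie_email_hash):
        """Return the matching record, or None if there is no match or the
        person has sent GPC anywhere the vendor operates."""
        if cookie_email_hash in self.gpc_seen:
            return None
        return self.by_hash.get(cookie_email_hash)
```

This does nothing about the misidentification in step 2. It only keeps the vendor from handing out the record of someone who has already opted out somewhere else.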

Or I’m seeing those weird Facebook ads because the company just bought an old spam CD and made custom audiences out of that. Probably easier.

08 Oct 2021

Recording audio from YouTube

So, I was on a panel a while ago: What Does the Future Hold for Global Privacy Controls? - YouTube

The main point here is that Global Privacy Control affects all marketing projects at a company, not just web advertising.

  • If you have a GPC from a customer, it affects everything including sales of postal direct mail lists, social media custom audiences, all of it.

  • If you use GPC when you place an order, and you give the seller your info, then GPC affects all of what the company can do with your info, not just your current web session or order.
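
In code terms, the point is that the GPC signal gets written to the customer record, not the session. A minimal sketch: the `Sec-GPC: 1` request header is from the GPC proposal, but the record layout and channel names are made up for illustration.

```python
def has_gpc(headers):
    """True if the request carries the Global Privacy Control signal."""
    for name, value in headers.items():
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False


def apply_gpc(customer, headers):
    """Return the customer record, flagged do-not-sell if GPC was present.
    The flag lives on the record, so it outlasts the web session."""
    if has_gpc(headers):
        return {**customer, "do_not_sell": True}
    return customer


def allowed_channels(customer,
                     channels=("direct_mail_list_sale", "social_custom_audience")):
    """Hypothetical check that every marketing project would call before
    sharing this customer's info."""
    return [] if customer.get("do_not_sell") else list(channels)
```

An order placed with `Sec-GPC: 1` ends up with `allowed_channels` returning an empty list for that customer, whichever team asks later.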

Anyway, they put it up as a YouTube video, so if I want to make an audio file to listen to offline I should download and convert it.

Step 1: Make a recording with Live Recorder.

Step 2: Extract the audio stream with ffmpeg. (source: video conversion - Lossless extraction of streams from WebM - Super User)

ffmpeg -i input.webm -vn -acodec copy out.ogg

08 Oct 2021

Prevent the system from sleeping when someone is logged in over ssh

To prevent the Linux system from suspending or hibernating when a user is logged in, you can put this file, 10-no-sleep-ssh.sh, into /etc/pm/sleep.d. It should be owned by root:root, mode 755.

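#!/bin/bash
# Sleep hook: refuse to suspend or hibernate while anyone is on a pseudo-terminal.

case "$1" in
  suspend|hibernate)
    # Column 2 of `who -u` is the terminal; remote logins show up as pts/N.
    if who -u | awk '{print $2}' | grep -q pts/; then
        logger 'ssh user(s) logged in'
        exit 1
    fi
    ;;
esac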

The who -u command lists the users who are logged in. A remote user will be on a pseudo-terminal and have “pts” in the second column of their line in the output.

If this script exits with an error (exit 1) the system will be prevented from sleeping.

25 Aug 2021

Under-served niche in the ad market?

In case you missed it, here’s the new version of the FTC complaint against Facebook: FTC Alleges Facebook Resorted to Illegal Buy-or-Bury Scheme to Crush Competition After String of Failed Attempts to Innovate.

To me, it looks like there is a gap in the Internet advertising market that might be explained by the actions covered in that complaint.

                            Complexity: Low    Complexity: High
Customer lifespan: Long     ?                  high-quality sites
Customer lifespan: Short    Facebook           misinfo sites

Right now if companies want to buy advertising with low complexity—ease of placement and easy-to-interpret attribution data—they can get it from Facebook. But because of the buy-or-bury scheme, it comes as part of a single service that also tends to kill off a fraction of the customers, through medical misinformation, extremist violence, climate disasters, and so on. This ad option appears to meet the needs of advertisers with shorter time horizons, such as several quarters of DTC startup runway or CMO job tenure. Most of the years of customer life lost will not enter into consideration because they will take place outside the time frame relevant for making the decision.

Judy Shapiro points out that In 2017, just five years ago, the CPM to run a Facebook ad was $4.21, meaning you had to spend $4.21 to reach 1,000 impressions, presumably people. Today, the price is $13.87 – a whooping increase of 222%.

Judging by that, the combination of advertiser convenience and lowering customer lifespan seems to be a winner. Some advertisers, however, have investments in facilities and goodwill that make them prefer longer customer lifespans but lack the advertising know-how to be able to run high-complexity advertising. It is possible that one of the companies acquired by Facebook might have, if it continued to operate independently, been successful in the low-complexity/high-lifespan niche. Today that niche seems to be an opportunity.

13 Jun 2021

interesting missing sentence

Update: CMA caught this. Google previously committed to not imposing a penalty in search rankings on sites which opt out of Topics API. This should be extended to apply to all Privacy Sandbox technologies.

In response to the news that Google won’t end support for tracking cookies unless UK’s competition watchdog agrees, the company has come out with a page stating Our commitments for the Privacy Sandbox.

The key commitment that I was looking for in there is something about how a site’s decision to use Google Privacy Sandbox will or will not affect the ranking of search results from that site in Google Search and Google News.

It would not have taken them much time to add that, plus of course a little extra lawyer time depending how many lawyers need to read it before they post. But that didn’t happen. Which makes the missing sentence the loudest part of the whole thing.

By now it’s pretty clear that we’re going to continue to have arguments about the future of web advertising, and then, whatever way those arguments turn out, everyone is going to do everything exactly Google’s way because Google dominates search. For example, Google FLoC opt-outs are a thing now, but how many sites will stay opted out even after the SEO impact becomes clear? Wikipedia, GitHub, and Amazon have a high enough profile that they’re probably fine, but how will The Markup do in search results, against other sites that cover the same topics in a more Google-friendly way?

Instead of sending an anticompetitive message in a document that can be part of discovery in a lawsuit, or stating it in a meeting that can be recorded or testified about, Google can encode the message in patterns of search results so that SEO consultants can reconstruct it for their clients. No message text has to be shared, but an internal message M at Google can become M′ at the recipient company. There is probably a really good information theory paper about how this works.

25 May 2021

notes on chapter 6

Some notes on chapter 6 of Privacy is Power by Carissa Veliz. This is the chapter with specific recommendations in it.

Think twice before sharing: this is a good point, maybe more than twice. For example, if you Tweet the date that your coronavirus vaccination became effective, surveillance marketers who know where you live can count backward to the date you became eligible, and learn something about your age or health history.

Respect others’ privacy: Good advice, and don’t ask why someone declined permission to have their info shared, or pressure people. (And if you have information or photos pertaining to somebody who you no longer have a relationship with that would make having that material appropriate, destroy it.)

Create privacy spaces: Now I want to do this.

Say no: This situation is actually getting a little better this year—consent dialogs are getting cleaned up. I’m not seeing as many of the first generation more dark pattern-ish ones. But see the Cookie AutoDelete link below. If you consent by mistake, a tool can sometimes fix it.

Choose privacy: Vizio makes nearly as much money from ads and data as it does from TVs, so it’s not surprising that it’s hard to get a TV without smart features. My solution is to just connect HDMI and antenna, never WiFi or Ethernet. The part about not having an Amazon Alexa device in the house seems kind of obvious, but maybe some families disagree? The good thing is that couples who are dating now can find out if they have compatible views on smart speakers before sharing a home. (and maybe figure out whether augmented reality surveillance glasses are a relationship problem now, before you end up having to break up with both a person and with the corporation surveilling you from their face?)

Under devices there is a very good point about not keeping apps you don’t need. Also check devices for pre-installed, unremovable apps before buying, to save yourself the hassle of having to return the device. (Any phone manufacturer that would stick unremovable surveillance bloatware on their product would probably make other bad decisions, too.)

Messaging apps: Good points. If this section isn’t enough to keep you off WhatsApp, there are other issues. I’m still concerned that WhatsApp is going to rub some people’s patriotic sensibilities the wrong way, since WhatsApp makes some countries, including India, accept a lesser privacy policy that they won’t inflict on, say, Germany. Just to stay out of that whole contentious national pride issue, I’m sticking with Signal which doesn’t discriminate by country this way.

Email: Surveillance marketers have figured out the + addressing trick, unfortunately. It does make sense to keep a separate address for low-priority marketing mail.
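
To see why the + trick stopped working, here is roughly what a list-cleaning step could look like. This is a guess at the technique, not code from any particular vendor, and the function name is made up.

```python
def strip_plus_tag(address):
    """Return the address with any +tag removed from the local part."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no @ sign, so leave the input unchanged
    base = local.split("+", 1)[0]
    return f"{base}@{domain}"


print(strip_plus_tag("alice+shoes@example.com"))
# prints: alice@example.com
```

Once a marketer runs something like this, alice+shoes and alice+news collapse into the same person, which is why a fully separate address works better.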

Browsers: Good point to have more than one. I like Firefox, tricked out with appropriate extensions, for daily use but keep a totally stock copy of Google Chrome just in case, for sites with compatibility problems. (I never use Chrome for buying anything. Why reward sites for compatibility problems by buying stuff from them?) If you have a computer that came with Apple Mac OS or Microsoft Windows, both of those come with good browsers now. So if you want to use Firefox as your main browser, you can just use the browser that came with your computer as the backup.

Use privacy extensions and tools. Here is the one piece of bad advice in this chapter: “Adblockers are easy to find and install.” Don’t do this. Seriously, don’t do this. The ad blockers that are easy to find are generally the sketchy ones. Unfortunately, the browser extensions directories tend to fill up with adblockers that either have a so-called Acceptable Ads feature that lets tracking through, and that you have to change some settings to turn off, or are actual adware. (If you install an ad blocker and start getting ads on Wikipedia, you got the second kind.) And somehow the ad blockers that allow the Google trackers through seem to come up higher in a Google search.

I use the built-in Enhanced Tracking Protection in Firefox, which you can set to standard for basic protection and strict for more protection but possible breakage.

Legit ad blockers are generally effective against tracking by third parties on regular sites, but that’s not as big a surveillance risk as the Facebook sites and YouTube. For Facebook in the browser, there’s Facebook Container (you did remove the native app, right?), but cleaning up YouTube is a little harder.

Yes, third-party cookies are the easy surveillance technology to block, but deal with Facebook and YouTube first. This whole project is not just about saving yourself a little annoyance today, but also about incentivizing CMOs and VCs to expect lower returns on future surveillance marketing and invest based on that. We win when we can change the level of risk we all have to deal with in the future.

Another extension that I find valuable is Cookie AutoDelete. Automatically cleans out the cookies that a dark pattern got you to consent to, while letting you keep cookies for sites where you want to stay logged in. Much easier than clearing cookies manually, which a lot of people apparently still do.
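The keep-list idea is simple enough to sketch. This is not the extension's actual code, just a rough illustration of the logic; the cookie records and the cookiesToDelete name are invented for the example.

```javascript
// Sketch of allowlist-based cookie cleanup (illustrative only).
// Cookies are plain { name, domain } records made up for this example.
function cookiesToDelete(cookies, keepDomains) {
  return cookies.filter((cookie) => {
    // Cookie domains may carry a leading dot; normalize before comparing.
    const domain = cookie.domain.replace(/^\./, "").toLowerCase();
    // Keep a cookie if its domain is an allowlisted site or a subdomain of one.
    const keep = keepDomains.some((d) => domain === d || domain.endsWith("." + d));
    return !keep;
  });
}

const cookies = [
  { name: "session", domain: "mail.example.com" },
  { name: "consent", domain: ".tracker.example.net" },
  { name: "prefs", domain: "example.com" },
];

// Only example.com (a site where I want to stay logged in) is on the keep list.
const doomed = cookiesToDelete(cookies, ["example.com"]);
console.log(doomed.map((c) => c.name)); // [ 'consent' ]
```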

Demand privacy: Sending opt outs is one privacy task where it really helps to be in California. The high-priority companies to send CCPA opt-outs to are

  • big data brokers

  • brands you actually buy stuff from

  • companies targeting you on Facebook. (This is where keeping a Facebook account is actually a privacy win. It’s a waste of time to actually CCPA Facebook but their ad settings can reveal who has uploaded your info to them and needs to get a CCPA opt-out.)

Breaking the kill chain

What’s the most effective place to break the surveillance marketing kill chain? Let’s look at it as a cycle. People buy stuff, surveillance marketers use sales to measure the effectiveness of what they have been doing, use those measurements to guide target selection, then place ads and collect reports on the ads. The reports feed into the same effectiveness measurement as sales.

(image: surveillance marketing kill chain)

(Feel free to pick this model apart. I plan on revising it some more once I have figured out what I missed.)

Ad placement looks easiest because it’s where you can use ad blockers. Fine for run-of-the-mill sites where the ads come from a separate ad server, but native apps and non-web ads are still going to get through here. Blocking the most blockable ads is only going to incentivize ads to move to non-web ad media. Most likely case is that more ad money ends up supporting deadly viruses, climate disasters and terrorists at the big social sites.

Ad reporting is also affected by ad blockers, and you can use tools like AdNauseam to make it less effective. Promising, but like ad placement an area where the more evil ad media are less affected.

Measuring effectiveness and target selection are both internal to the advertiser. Could be good for insiders to work on, but not a lot you can do from the outside.

That leaves buying stuff. What if the most effective place to take action is between putting something in your web shopping cart and making a purchase?

  • Compared to the number of trackers and third parties you interact with, the number of companies you actually buy from is small

  • Opt out will cover all downstream uses of your info, not just the ones you know about or have a tool for

Surveillance-based ads can only justify a higher rate per ad impression by connecting ad reporting to sales.

This gets way easier when retailers have Global Privacy Control support and clean up their dark patterns, so could go from privacy nerdery to mainstream fairly smoothly.
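For a sense of how small the retailer-side change could be, here is a rough sketch of checking for a Global Privacy Control signal before passing purchase data along. It assumes the Sec-GPC: 1 request header from the GPC proposal, and header names lower-cased the way Node's http module provides them. The mayShareOrder function and the customerOptedOut field are made up for the example.

```javascript
// Sketch: detect a Global Privacy Control signal on an incoming request.
// `headers` is assumed to be a plain object with lower-cased header names.
function hasGpcSignal(headers) {
  // The GPC proposal defines the request header "Sec-GPC" with the value "1".
  return String(headers["sec-gpc"] || "").trim() === "1";
}

// Hypothetical helper: decide whether purchase data for this order may be
// passed along to third parties. The field names are invented for this example.
function mayShareOrder(order, headers) {
  if (hasGpcSignal(headers)) {
    return false; // treat the signal as an opt out of sale/sharing
  }
  return order.customerOptedOut !== true;
}

console.log(mayShareOrder({ id: 1 }, { "sec-gpc": "1" })); // false
console.log(mayShareOrder({ id: 2 }, {}));                 // true
```

The point of the design is that the check happens once, at purchase time, instead of the customer having to block each downstream tracker separately.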

Facebook is a good example here, because their system is designed to limit outward transfer of information. If a company buys ads on Facebook, they can’t ask Facebook for a list of who saw their ad and then compare to the list of people who bought stuff from them. Instead, the advertiser has to:

  1. Integrate Facebook software into their web site, apps, and/or CRM system. (Facebook makes this software easy to set up, and only the web version is easy for the customers to block.)

  2. Use the software to send reports on their customers, saying who bought what, to Facebook. The advertiser generally doesn’t know which customers have Facebook accounts, and most people have a shadow profile anyway, so they just dump everybody’s purchases to Facebook. This can be done in a server-to-server way that does not depend on the browser.

  3. Facebook generates a report for the advertiser so that the advertiser can compare their Facebook advertising to their sales.

Step 2 is hardest to keep from happening, since this information transfer has the most technical and legal armor around it. But it also carries the most reward per bit of personal info transfer blocked. The more I can exercise my right to stop the advertisers from doing this, the less justification they will have to send money to Facebook instead of to ad media with positive externalities.

It makes sense not to RtD (Right to Delete) Facebook, since with the current state of things you will have a shadow profile with them anyway. Instead, RtK (Right to Know) Facebook and either opt out or RtD the advertisers who feed into their system.

25 Apr 2021

how to keep the wrong sorts of people out of your place

How to keep the wrong sorts of people out of your place, using Google FLoC.

  1. Post a FLoC checker QR code.

     (image: FLoC QR code)

  2. Make everybody scan it when they come in.

  3. Check their phones.

  • Green: good cohort

  • Red: bad or missing cohort. If they turned off FLoC or installed an alt-browser, they’re probably hiding that they’re in a bad cohort.

  • Blue: Apple Safari, you figure it out.

(no, I don’t think anybody should actually do this, but it’s only a little bit of JavaScript so people are certainly going to try. So it could turn out that non-Google browsers will end up having to provide a realistic cohort instead of just not implementing FLoC.)

12 Mar 2021

Your cohorts are just ethnic affinity groups. Change my mind.

(Update 28 Feb 2022: Berke and Calacci link)

(Update 9 May 2021: add another example)

The big question around Google FLoC is whether or not some of the FLoC cohorts, which are group identifiers applied by the browser and shared with all sites, will match up with membership in legally protected groups of people. Will cohorts turn out to be the web version of Facebook’s old Ethnic Affinity Groups, also known as multicultural affinity groups?

2022 update: Browsing behavior correlates with race, but cohorts do not… We did not find with our t-closeness analysis that the likelihood of correlating racial background with cohorts, using the FLoC OT algorithm, was any greater than chance. (Privacy Limitations Of Interest-based Advertising On The Web: A Post-mortem Empirical Analysis Of Google’s FLoC)

Facebook limited the ability of advertisers to exclude members of these groups in 2018 and made many of the groups unusable for targeting at all in 2020. But FLoC is a little different. It assigns numbers, not names, to cohorts, so the unsolved problem is how to tell which cohorts, if any, are actually ethnic affinity groups. One issue on GitHub asks,

If we do have an issue where racially specific targeting is incidentally created by the ML system what happens when advertisers target for or against it and who ends up responsible?

FLoC developers are planning to use sensitive-page classifiers to check which cohorts match up to sensitive groups of pages in web history. Unfortunately, checking page content is not going to give them protected group membership for the users. A simple US-based example is school and neighborhood patterns. A school that is mainly attended by members of a single ethnic group is going to have page content that’s mostly the same as all the other schools in the district. The schools all have similar events and play the same sports, but serve different groups of students and parents. So, even though the content is non-sensitive, the cohort is. And local stores with similar merchandise in different neighborhoods are going to get different ethnic affinity groups, I mean cohorts, of visitors. Content in language A could be completely non-sensitive, and local content for region B could be completely non-sensitive, but the cohort of people who use language A in region B could be highly sensitive.

So it might look like nobody will be able to tell which cohorts are really ethnic affinity groups until some independent data journalism site manages to do a study with a panel of opted-in users. This would be the kind of AI ethics research that is bad for career prospects at Google, but that independent organizations can often come up with the funding to do.

But one company doesn’t have to wait for the study and resulting news story. Facebook has enough logged-in Google Chrome users that they could already know which FLoC cohorts match up to their old ethnic affinity groups. If a brand buys ads on the open web and relies on FLoC data, Facebook can see when the brand is doing crimes. This doesn’t mean that Facebook will disclose the problem, since it gives them something to hold over the brand. No more making any stink about ad metrics or Facebook Groups IRL get-togethers. The extra risk for the advertisers means lower expected revenue for ad impressions tied to FLoC—because of uncertainties that are hard for anyone else to see.

Inspiration for the title for this post:

Your probabilistic ID is just fingerprinting. Change my mind.

— Stephanie Layser (@slayser8) Twitter, January 27, 2021

06 Mar 2021

cleaning up YouTube

(update 6 Aug 2023: how did this tip affect my usage of YouTube?)

Time for another Internet tip. Hope this is useful, since I think my problem was a pretty common one.

  • I want to be able to share a video link from a business-related event and not have a “lizard people run the Federal Reserve” video auto-play at the end of it.

  • I want my family members to be able to click a video on how to play a game or cook a recipe without constantly having to check the recommended videos for those prolific “Fourteen Words” guys.

  • I don’t want to watch an exercise video and then have it roll right over into the miraculous bleach and vinegar diet video.

  • And finally, I want to cut back on how much of my personal or business info is used to target me with ads.

Yes, this means cleaning up YouTube. Ideally there would be just one tool I could install, like Facebook Container, but it’s a little more work than that. This is going to take two extensions and a little configuration. But in my opinion it’s worth it to do this once instead of having to deal with “oh crap, YouTube is showing WHAT?” over and over. Ready?

Part one: containerize YouTube

First, install Firefox Multi-Account Containers. Containers are a way for users to isolate their online identities and tasks from one another. You will get a new Multi-Account Containers button on the Firefox toolbar. This opens the container menu.

Click the button to open the container menu, click “Manage Containers,” then “New Container”. Make a container for YouTube. I picked red as the color code for tabs opened in this container. (If you already have Facebook Container, which I recommend, you’ll have a blue stripe for Facebook tabs.) You can also pick a small icon for the container.

Now that the container exists, go to the YouTube site, open the container menu, and select “Always Open This Site in…” Pick the YouTube container from the list.

Now when you follow a link to YouTube, the page will open up in the container, with a stripe on the tab showing the color you picked. You shouldn’t be logged in to YouTube with your Google account if you have one.

Part one done. This should limit the use of my info from other services to target me for scams and crap in video ads. This should not require any extra clicks after the original setup. You can even customize the Firefox toolbar and move the Multi-Account Containers button to the overflow area, so you don’t have to think about it unless you want to do this for another site.

Part two: patch the YouTube rat hole

Time for another extension: Enhancer for YouTube. This one will change some of the creepy “engagement” promoting behaviors of the YouTube site design.

Install, then open up the preferences. Hamburger menu → Add-ons → three dots to the right of the Enhancer for YouTube entry.

This preferences menu has a whole bunch of stuff on it, take your pick. The essentials are:

  • Disable autoplay

  • Hide related videos

I also use the Automatically enable YouTube’s Theater mode option to make the video bigger and fill in the extra space (that used to be occupied by thumbnails of the flat earth video, the miracle virus cure video, and the video about the dead politician who is secretly alive and is coming back to massacre everybody that some video maker doesn’t like). I used to hide comments too, but stopped as I found channels with more useful links in comments.

You can also click around with some of the other options. I haven’t messed with this extension since I got things cleaned up. If you like a video channel you can allow autoplay just for playlists, and the extension doesn’t interfere with browsing a channel’s home page or videos list page.

All done.

Conclusion

YouTube is not in business just to show you the videos you want. The more that viewers pick their own videos, the more market power that popular video creators end up with, and the lower the share of ad revenue that YouTube can capture. The secret of the YouTube model is to commodify the content by artificially driving viewers away from emergent stars…even if that means that the side effect is promoting more bleach-drinking videos or white power videos than viewers actually want.

In the long run, regulation will need to complement the technology here. It will be easy for YouTube to change their site around to make extensions and tips stop working. Drink your bleach and like it, we’ve got a market to dominate! Right now, there is a lot of focus on the technology and regulation to enforce people’s right to block transfers of their data between sites. But there is also going to need to be some protection of the right to turn off automated promotion/commodification of behavior within a site.

Why not just quit YouTube? (added in 2023)

YouTube, once you clean up the engagement and growth hacking crap, is promising in a lot of ways. At its best, YouTube is pretty encouraging as an ad medium, as entertainment, and as a place to learn stuff. How I use it now:

  • I subscribe to channels recommended by people, including YouTubers.

  • I actually pay attention to the ads. Not the automatically placed surveillance ads, of course. Those are targeted to me, so more likely to be deceptive than signal-carrying. I still ignore those. But a lot of YouTubers read their own sponsor messages, old-school Paul Harvey style, and those can actually work like ads are supposed to. Here’s a good example: Incogni promo (video).

04 Mar 2021

letter to Consumer Reports

(here’s the body of a letter I’m sending to Consumer Reports. Feel free to copy, modify, and send to other public-interest sites or open-source projects.)

I’m writing to ask you to set an HTTP header on the Consumer Reports site to help keep deadly robots from burning people’s houses down.

First, here’s the header.

Permissions-Policy: interest-cohort=()

I know it’s a little overhead on each request, but here’s why you’re going to need it.

The Google Chrome web browser is going to be coming out with a new feature called Federated Learning of Cohorts (FLoC). FLoC is software that runs in the browser and assigns the user to a group, or cohort, based on the web sites they visit.

Each user would be assigned to one cohort, so the Consumer Reports site visitors will be spread out among several cohorts. In order to use cohorts, sites will have to learn which cohorts correspond to which characteristics about people. For example, Consumer Reports members might be assigned to cohorts that are heavy on “appliance buyers who are concerned about safety.” When retail sites are able to analyze the meaning of a visitor’s cohort, they’ll likely begin to optimize, showing the safest, highest-quality appliances to the users whose cohort shows they’re likely to be Consumer Reports members, and showing the lower-quality appliances, more likely to catch on fire, to the members of other cohorts.

In the long run, it’s bad for everyone in the market when companies can easily monetize unsafe products. FLoC doesn’t require a deliberate decision by anyone to trick any customer into buying a dangerous appliance. The machine learning software on the retail side just “learns” how to optimally move certain SKUs, without ever knowing that they match up to fire hazards for the customers. Cohort training is an important area for ethical AI research, but Google has been abruptly terminating some key people who were responsible for leading research in the field of AI ethics. Until FLoC has been tested for safety, and for other problems such as enabling unlawful discrimination, it seems best to stay opted out of it. That header is the only opt-out method for now.

If you have any questions about how to test the HTTP header, please let me know. A test of FLoC will be starting shortly, and you can learn more at the World Wide Web Consortium’s Web Incubator Community Group (WICG).

01 Mar 2021

leaked: a new surveillance browser?

I just heard from a reader of this blog that a dictatorship is taking over in their country, and will be requiring all citizens to use their new web browser. The scary thing is that this new country-specific browser will have a built-in surveillance system in it. Excerpt from the top-secret document follows.

Today, citizens are categorized, governed, and provided with public services based on a variety of tracking techniques. It would be more efficient and less costly for the State if citizens could be assigned to priority groups, or cohorts, within the browser itself.

We plan to explore ways in which a browser can group together people with similar browsing habits, so that the State (and private entities regulated by the State) can use the habits of these cohorts, to:

  • prioritize assignment of surveillance personnel to individuals

  • allocate public services preferentially to favored religious and language groups

  • encourage self-reeducation by members of marginal groups

Browsers would need a way to form clusters that are both useful and efficient: Useful by collecting people with similar enough interests and producing labels suitable for machine learning, and efficient by cheaply forming large clusters that can be used to prioritize the assignment of costly individual surveillance technologies and of public-sector services such as education, business subsidies, and travel documents.

A Citizen’s Omnipresent Law-Enforcement and Favoritism (COLF) cohort is a short name that is shared by a large number (thousands) of people, derived by the browser from its user’s browsing history. The browser updates the cohort over time as its user traverses the web. The value is made available to websites via a new JavaScript API.

The browser uses machine learning algorithms to develop a cohort based on the sites that an individual visits. The algorithms might be based on the URLs of the visited sites, on the content of those pages, or other factors. The central idea is that these input features to the algorithm, including the web history, are kept local on the browser—the browser only exposes the generated cohort. The browser ensures that cohorts are well distributed, so that each represents thousands of people.

Hiding or falsifying cohort membership

Some citizens who are members of less favored cohorts might configure or modify their browsers to send a blank, random, or deliberately chosen cohort. This behavior will be disincentivized by doing spot-checks that compare the observed cohort for a citizen to a set of likely cohorts calculated from known PII for that citizen.

Cohorts do not eliminate the need for detailed State surveillance of a subset of citizens, just as a vehicle license plate does not eliminate the need for random checks of a driver’s papers. However, license plates and cohorts are easily observable in large numbers, and appropriate penalties for falsification of either can be applied. Cohorts are a cost-saving complement to other surveillance technologies, and make more kinds of discrimination and surveillance practical and affordable.

Sensitive Categories

A cohort is designed to reveal sensitive information. A user might configure or modify the browser in an attempt to remove visits to sensitive sites from cohort data collection. But this does not mean sensitive information can’t be leaked. The State is aware of correlations between browser history and sensitive cohort membership that citizens are not aware of.

Citizens might attempt to evaluate their own cohort by measuring and limiting their deviation from population-level demographics with respect to the prevalence of sensitive categories, to prevent their use as proxies for a sensitive category. However, this evaluation would require knowing how many individual people in the cohort were in the sensitive categories, information which could be difficult or intrusive to obtain.

…all right, that’s enough. This wasn’t some new surveillance browser, it’s a lightly edited version of the FLoC README. How much of a Bay Area big company employee filter bubble do you have to be in to see an idea for having the browser tell sites, “MY USER IS A MEMBER OF THIS GROUP” and think, wow, we can use this to sell sneakers! Anybody who wants sneakers, if you seriously have no idea how to get them, let me know, I’ll drive you to the damn shoe store myself. Easier than dealing with all this labeling-people-with-group-identifiers creepy jibber-jabber.

13 Feb 2021

FLoC opt out

Update: FLoC has been replaced. Here is a list of the new opt-outs: Google Chrome ad features checklist. Do that stuff instead.

Begin old material:

I’m not going to cover all the problems with having web browsers label their users with “cohorts” here. Really briefly, the Google Chrome browser might start using a system called “Federated Learning of Cohorts” (FLoC) to partly replace some of the data practices that are currently done using third-party tracking cookies.

There is a test of FLoC now running that is opt-in, but the core FLoC proposal is still opt out. It looks as if, in order to protect Google Chrome users on your site, you will have to change the site configuration a little.

New opt out

The opt out for the new FLoC, the Topics API, is:

Permissions-Policy: browsing-topics=()

Opt out method 1: HTTP header in the site config

The header you need is Permissions-Policy, and looks like this:

Permissions-Policy: interest-cohort=(), browsing-topics=()

In order to set that in Apache2 on Debian, I made a file called /etc/apache2/conf-available/floc.conf with the line:

Header always set Permissions-Policy "interest-cohort=(), browsing-topics=()"

and then ran:

a2enconf floc.conf
apachectl graceful

I’m going to do this on all the sites where I have access to the web server config and can easily add a header.

You can also add the header in the appropriate .htaccess file.

For WordPress

There is a new WordPress plugin to add the opt-out header: Opt-out of FLoC on your WordPress website – Roy Tanck

Opt out method 2: for sites that can’t set HTTP headers

Work in progress. The HTTP header is still the only way to opt out, since a meta tag won’t work. (Discussion on the WICG/floc repository on GitHub.)

Testing

In the future, to check that it worked, I should be able to use the Permissions API. An example of a Permissions API query that works is this one, for how to check the Geolocation API permission:

// Check for Geolocation API permissions
navigator.permissions.query({name: 'geolocation'})
    .then(function(permissionStatus) {
        console.log('geolocation permission state is ', permissionStatus.state);
    });

Right now when I do a query for the name interest-cohort, I’m getting “The provided value ‘interest-cohort’ is not a valid enum value of type PermissionName.” but I’m sure that they’ll get this fixed up before FLoC gets turned on in regular people’s browsers. Anyway, will update as I get to test more.
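In the meantime, one way to check the opt out is to look at the response header itself. Here is a minimal sketch, assuming the header value has already been fetched (for example by looking at the response headers in the browser's developer tools). The function names are made up, and the parser only handles the simple feature=allowlist form shown in this post, not the full structured-header syntax.

```javascript
// Parse a Permissions-Policy header value into a map of feature -> allowlist.
// Simplified: splits on commas outside parentheses and ignores the rest of
// the structured-field syntax (quoted strings containing commas, parameters).
function parsePermissionsPolicy(value) {
  const parts = [];
  let depth = 0;
  let current = "";
  for (const ch of value) {
    if (ch === "(") depth += 1;
    if (ch === ")") depth -= 1;
    if (ch === "," && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);

  const policy = {};
  for (const part of parts) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip anything that is not feature=allowlist
    const feature = part.slice(0, eq).trim();
    const allowlist = part.slice(eq + 1).trim();
    if (feature) policy[feature] = allowlist;
  }
  return policy;
}

// A feature is opted out for the whole page when its allowlist is empty: "()".
function isOptedOut(headerValue, feature) {
  const allowlist = parsePermissionsPolicy(headerValue)[feature];
  return allowlist !== undefined && allowlist.replace(/\s+/g, "") === "()";
}

const header = "interest-cohort=(), browsing-topics=()";
console.log(isOptedOut(header, "interest-cohort")); // true
console.log(isOptedOut(header, "browsing-topics")); // true
console.log(isOptedOut("geolocation=(self)", "interest-cohort")); // false
```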

Well that was interesting

I mess around with this kind of thing on a Saturday morning because I’m interested in web privacy, but seriously, opting out of half-thought-out site audience data exfiltration schemes should not have to be something that people with normal web sites have to worry about. Not everybody has a free Saturday morning to work on their personal site, and not every company has a massive open-plan office with armies of developers to throw at projects they need to do just to stay in the same place.

06 Feb 2021

State privacy law features from the power user point of view

For those who are planning to subvert democracy in the USA or target vulnerable Americans for scams, I have some bad news. People across the nation are coming together to support new state privacy laws.

Here in California, I say, “right on.” All this state political action is not surprising. Privacy laws and regulations are incredibly popular. California polling before our latest privacy initiative showed 88 percent in favor. Mandating privacy protection is so popular that the argument against the initiative in the voter guide had to focus on how it didn’t do enough.

Vote NO on Proposition 24 because it was written behind closed doors with input from giant tech corporations that collect and misuse our personal information—while the measure’s sponsor rejected almost every suggestion from 11 privacy and consumer rights groups.

I’m not just a California privacy law proponent. I’m also a user. A power user. My goal for 2020 was to use the CCPA to opt out of all personalized advertising, and if you count the stuff I’m already protected from by my regular privacy tools, I have mostly succeeded. (I still get targeted ads on Twitter, but those are really just for ridiculosity. Not only do I lack the budget for an F-15EX airplane, I don’t even have a pilot’s license.)

That meant I did a lot of CCPA opt-outs in 2020. I even got back on Facebook, to CCPA any recognizable brand trying to target me on there. (I didn’t bother with the sketchy Facebook advertisers, like all the companies offering software and courses for making money drop-shipping cheap products to people who click on Facebook ads.) The Consumer Reports CCPA Authorized Agent study (PDF) was part of my year of using the CCPA, and I also did a bunch of CCPA opt outs and Right to Knows on my own. On my computers, ccpa is a shell script now, so I can CCPA a company faster than anybody. Time me.

Privacy law features I depend on

As a power user, I’m willing to use services and write code to make my privacy opt-outs work. I understand that there are good reasons for putting privacy laws in the USA on an opt-out basis instead of making them consent-based. I’m fully prepared to do my part of the work, even if that means dorking around with laser printers or fax machines.

If a missing piece in a state privacy law is something that I can work around, I’ll do it. So this list is only going to cover what I think are the essentials for making a state privacy law workable.

Reasonable identity verification for opt outs: The CCPA does not require identity verification for an opt out, but companies can deny an opt out if they believe it to be fraudulent. This makes an opt out easier than a full CCPA deletion or Right to Know. Realistically, making people scan their drivers’ licenses to opt out is going to do more to deter opt outs than to make them work any better.

Authorized agents: People hate filling out forms, especially minor variations on the same form, over and over. Giving people the ability to delegate the work is what makes an opt-out-based privacy law practical. I know that early CCPA implementations were kind of rough, but the future is in automation and delegation.

Dark patterns: Unless the law covers Dark Patterns, companies are incentivized to make opt-out processes that are technically legal but that keep increasing the time required.

Definition of what is being opted out of: When people ask me about the CCPA, the most common thing they want to opt out of is “stop this company targeting me on Facebook.” Please compare your privacy law to how Facebook Custom Audiences work. If the definitions in the law don’t cover this high-profile example of creepy privacy violation, you’re missing a key part of what voters want, and you need to fix it. (Yes, research shows that 31-36% of people are “Kevins” who want personalized ads. Kevin’s desires are already met, so the law needs to focus on helping the rest of the people exercise their rights.)

That’s about it. Let me just end with a quick rule of thumb: you know you have a good state privacy law when the surveillance marketing companies lobby for a Federal privacy law to preempt it. If your state isn’t making them complain about “uncertainty” and “patchwork of regulations” then you’re missing something.

03 Jan 2021

how to CCPA a game company

I was playing an online game the other night, and another player pointed out that the game had the Facebook SDK in it. This is a small piece of software that, well, I’ll quote from their site: “Facebook SDKs enable you to pass app event data from your app to Facebook.” (link added by me)

Creepy, right? What can you do about it?

First, the easy answer. On Apple iOS, users are soon going to get an Ask App not to Track button that should help with this problem. You’ll start getting dialogs with that option plus an Allow option. I don’t have the budget for enough lawyer time to understand what I would be agreeing to if I click Allow, so I’m not going to select that. Keep an eye out for these dialogs, and select the Ask App not to Track button when you can.

Second, the almost as easy answer. Send a CCPA opt-out to the game company. Under the California Consumer Privacy Act, you can require a company to stop selling your personal information. A sale for CCPA purposes does not have to be an exchange of personal info for money. So even if the game company is just getting different data back in exchange for yours, the CCPA still covers it.

How to do that:

  1. Get a CCPA opt-out letter.

  2. Put your info in the blanks with double brackets (name, address, phone, email).

  3. Find the game company’s privacy email address, which is generally in their privacy policy somewhere, and send the email.

They will either act on the request (as they’re required to by law) or write back with some instructions for extra stuff they want you to do, which is not exactly legal legal, but companies have been getting away with it. Follow the instructions and you should be good.

After a year of doing CCPA opt-outs, it looks like even if they make you do extra steps to do your opt-out, it’s almost always faster to follow the instructions in the email than to try to find the right form on your own. And I use the email outbox as a way to check which companies I have already started the CCPA process with. Yes, there are going to be easier ways to make this stuff work including browser signals and authorized agent services. I’ll update with links as they become available.
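Filling in the blanks (step 2 above) is easy to script. A minimal sketch, assuming the letter marks its blanks as [[name]], [[address]], and so on. The template text and field names below are placeholders for illustration, not the wording of any real opt-out letter.

```javascript
// Replace [[field]] blanks in a letter template with the sender's details.
// Throws if the template has a blank with no matching value, so a
// half-filled letter never gets sent.
function fillTemplate(template, fields) {
  return template.replace(/\[\[\s*([\w-]+)\s*\]\]/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(fields, key)) {
      throw new Error("No value provided for blank: " + key);
    }
    return String(fields[key]);
  });
}

// Hypothetical template text, for illustration only.
const template = [
  "To whom it may concern:",
  "I am a California resident and I am opting out of the sale of my personal information.",
  "Name: [[name]]",
  "Address: [[address]]",
  "Phone: [[phone]]",
  "Email: [[email]]",
].join("\n");

const letter = fillTemplate(template, {
  name: "Pat Example",
  address: "123 Example St, Sacramento, CA",
  phone: "555-0100",
  email: "pat@example.com",
});
console.log(letter);
```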

29 Dec 2020

some ways that Facebook ads are optimized for deceptive advertising

(updated 5 Jun 2024: add link to The One Simple Trick to Measuring Abuse in Tech’s $440 Billion Ads Business)

(updated 14 Nov 2021: deception avoidance and value exchange)

(updated 20 Sep 2021: more material on Ad Library, problem categories, links)

(updated 7 Jan 2021: added intermediary for Custom Audiences trick)

Why are there so many scam ads on Facebook? The over-simplified answer is that Facebook just doesn’t have enough ad reviewers for the number of ads they get. Since basically anyone with a credit card can advertise, and advertisers have access to tools for making huge numbers of ad variations, then of course lots of scam ads are going to get through.

Facebook is also more attractive to scammers than other ad media. Deceptive advertisers already get more value from highly targetable ad media than honest advertisers do, because targeting gives the deceptive advertiser an additional benefit. Besides helping to reach possible buyers, a deceptive advertiser can also use targeting to avoid enforcers of laws and norms.

Understaffing and targeting are only parts of the story, though. It’s not so much that Facebook is uninterested in dealing with scams, it’s as if their ad system in general is the result of a cooperative software development project with the scammers. (Do Facebook and their scam advertisers constitute an “enterprise” for purposes of RICO? I don’t know, but it might be worth asking your lawyer if you got scammed or impersonated.) Some of the deliberate design decisions that went into Facebook ads are making things easier for deceptive advertisers at the expense of users and legit advertisers.

Custom Audiences don’t support list seeding. Until Facebook, every direct marketing medium has supported “seed” records, which look like ordinary records but get delivered back to the list owner or someone they know, so that they can monitor usage of the list. (I used them for a biotech company’s postal and email lists, even though we never sold or shared the list. Just to be on the safe side.) Using seed records is a basic direct marketing best practice and deters people who might see your list from misusing it.

Facebook Custom Audiences are a way for scammers to use a stolen list without detection. Facebook Ad Settings lets a user see if they personally are in someone else’s Custom Audience, but there’s no way for a list owner to check if the seed records from their list ended up on one. Someone who steals a mailing list can sneak it into a new Custom Audience without getting caught by the list owner. Legit direct marketers who want to protect their lists would pay for the ability to use seed accounts on Facebook, but this functionality would interfere with Facebook’s support for scam advertisers, so they don’t offer it, or even allow anyone else to provide seed accounts. (A limited number of Test Users are allowed for app development, but these are not usable as seeds. Facebook uses the term “seeds” differently from the conventional meaning, to mean the starting names for a Lookalike Audience)
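
For anyone who hasn’t worked with seed records, here is a rough Python sketch of how seeding works in a conventional direct marketing medium like email. The `Record` type and the `add_seeds` and `detect_misuse` helpers are hypothetical names for illustration, not any vendor’s actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    email: str


def add_seeds(customers: list[Record], seeds: list[Record]) -> list[Record]:
    """Return the list as it would be handed to a mail house or vendor.

    Seed records look like ordinary customers, but their addresses are
    controlled by the list owner.
    """
    return sorted(customers + seeds, key=lambda r: r.email)


def detect_misuse(seeds: list[Record], approved_senders: set[str],
                  received: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Find mail that reached a seed address from an unapproved sender.

    received is a list of (recipient_email, sender) pairs collected from
    the seed mailboxes. Anything from a sender the list owner did not
    authorize suggests the list was copied.
    """
    seed_emails = {s.email for s in seeds}
    return [(to, sender) for to, sender in received
            if to in seed_emails and sender not in approved_senders]


if __name__ == "__main__":
    customers = [Record("Real Customer", "real@example.com")]
    seeds = [Record("Sam Seed", "seed1@example.org")]
    mailing_list = add_seeds(customers, seeds)
    hits = detect_misuse(
        seeds,
        approved_senders={"ourcompany"},
        received=[("seed1@example.org", "ourcompany"),
                  ("seed1@example.org", "knockoff-shop")],
    )
    print(hits)
```

The second function is the part that has no Facebook equivalent: with Custom Audiences, the list owner has nothing to use as the `received` input.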

Users can be blocked from seeing the company that really controls the targeting lists that they’re on. Suppose that a dishonest advertiser wants to use a California resident’s PII, but they don’t want to have to honor CCPA opt outs or register with the state. Facebook promises transparency and allows users to see who has uploaded their info. But the dishonest advertiser can simply send the hashed versions of the PII on their list to an intermediary firm, and have that firm transfer the hashed PII to Facebook. Now when someone who is on the list goes to “Advertisers using your activity or information” on Facebook, they see the name of the intermediary firm instead. Even if a bunch of people on the list do opt out, the deceptive advertiser’s own copy of the list is intact. When they switch to a different intermediary firm later, there are no opt-outs associated with the list. This also seems to be a good way for extremely suspicious-looking advertisers to hide from people who might report or investigate them. If I check Facebook for exclusion lists used by scammers who think I might report them, I see only the name of a generic-sounding targeted ad company, not the actual dishonest Facebook page.
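
To see why the intermediary trick works, note that hashing doesn’t tie a list to whoever uploads it. The sketch below assumes the hashing is SHA-256 over a trimmed, lowercased email address. The `build_upload` payload and the company names are invented for illustration and are not Facebook’s actual API.

```python
import hashlib


def hash_email(email: str) -> str:
    """SHA-256 of a trimmed, lowercased email address, as a hex string."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_upload(emails: list[str], uploader: str) -> dict:
    """Hypothetical upload payload: the hashes plus whoever is sending them."""
    return {"uploader": uploader,
            "hashes": sorted(hash_email(e) for e in emails)}


if __name__ == "__main__":
    the_list = ["Pat@Example.com ", "lee@example.com"]
    direct = build_upload(the_list, "Shady Deals LLC")
    via_intermediary = build_upload(the_list, "Generic Audience Partners")
    # Same people get matched either way; only the visible name changes.
    print(direct["hashes"] == via_intermediary["hashes"])
    print(via_intermediary["uploader"])
```

The hashes are identical no matter who sends them, so the only thing the user-facing transparency page can show is the name attached to the upload.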

Ad Library helps hide deceptive ads at times when risk of discovery is high. Facebook’s Ad Library is designed to show only “active” ads, those that are running this very minute. A deceptive advertiser using a trademark or a person’s likeness without permission can simply turn their ad on and off based on when the victim is likely to be checking the Ad Library. For example, a seller of infringing knock-offs of a European brand can run the ads when European marketers, lawyers, and regulators are asleep but people in the Americas or Asia are awake and shopping. Ad Library makes it easier for scammers to copy honest advertisers than the other way around.

Ad Library delays posting of scam ads. If you see a bunch of similar scam ads popping up, like this…

Scam ad on Facebook

…but then you go to their Ad Library and get This advertiser isn’t running ads in any country at this time, read the fine print.

An ad will appear in the ad library within 24 hours from the time it gets its first impression. Any changes or updates made to an ad will also be reflected in the ad library within 24 hours.

Facebook deliberately gives their scam advertisers almost a full day to take a whack at you before revealing their ads in Ad Library (and, of course, if the ad comes down fast enough, it never shows up there.)

Independent crawling of ads is blocked by policy. On the open web, online ads can be crawled and logged by independent companies. This service is needed in order to check for malvertising and other problem ads. Inside the Facebook environment, however, independent checking on ads is prohibited. Facebook puts the goal of hiding problem ads ahead of facilitating the kinds of services that could help fix the situation.

Image search crawlers are blocked from ads. Many scammers make infringing copies of material from legit ads without permission. Pirated product photos are especially common. The photos in those scam ads above appear to have been taken from a legit retailer. If legit advertisers had the ability to search for ads similar to theirs, or for edited copies of their own photos, they would be able to find a lot. But, for example, TinEye is blocked from Ad Library, to make life easier for Facebook’s deceptive advertisers at the expense of legit ones. Wells Fargo has to ask customers to report fake Wells Fargo because Facebook cooperates with scammers pretending to be Wells Fargo, to hide fraudulent uses of Wells Fargo’s trademarks.

Categories of scams to look for

The reason that Facebook has to try to shut down research programs like NYU’s is that a project with the budget and skills of a small university team could pick up on a bunch of obvious scams with some tools based on existing open-source image matching software.

Some examples:

  • photos of public figures who do not endorse a particular category (such as personal finance experts on cryptocurrency ads)

  • well-known company logos (needs manual check, sometimes the advertiser is a dealer using the logo with permission)

  • rental housing scams—look for the same house or apartment photo showing up in ads from multiple landlords
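
As a rough idea of how little tooling the rental photo check needs, here is a toy Python sketch. It uses a simple “average hash” on tiny grayscale grids as a stand-in for real open-source image matching software, which would work on actual image files. The function names and the sample data are made up.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set if brighter than the mean.

    pixels is a small grayscale grid (0-255), standing in for an image
    that has already been shrunk to a fixed size.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def shared_photos(ads: list[tuple[str, list[list[int]]]],
                  max_distance: int = 1) -> list[set[str]]:
    """Group ads whose photos hash within max_distance of each other.

    ads is a list of (advertiser, pixels). Returns one set of advertisers
    per photo that shows up under more than one advertiser.
    """
    groups: list[tuple[int, set[str]]] = []
    for advertiser, pixels in ads:
        h = average_hash(pixels)
        for group_hash, advertisers in groups:
            if hamming(h, group_hash) <= max_distance:
                advertisers.add(advertiser)
                break
        else:
            groups.append((h, {advertiser}))
    return [advertisers for _, advertisers in groups if len(advertisers) > 1]


if __name__ == "__main__":
    house = [[10, 200], [220, 30]]
    house_recompressed = [[12, 198], [225, 28]]  # same photo, slightly altered
    other_house = [[200, 10], [30, 220]]
    print(shared_photos([
        ("landlord_a", house),
        ("landlord_b", house_recompressed),
        ("landlord_c", other_house),
    ]))
```

Anything this flags still needs a manual look, same as the logo case: two listings with the same photo could be a scam, or could be an owner and a property manager.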

But why?

It’s not clear why a large company would choose to cooperate with deceptive advertisers. This decision might have to do with the fact that Facebook has lots of eyeball-minutes that are hard to sell to the legit market. As Bob Hoffman has been saying for a while, the ad business has a long-running problem of avoiding advertising to older people. Any online forum except the youngest and hottest is going to fill up with older users whose ad impressions are less valuable to marketers. Facebook could be making a short-term revenue-maximizing decision to try to monetize these users better by temporarily filling up the ad spots with scams, and only cleaning up bit by bit when they have to.

Deception avoidance and value exchange

Or maybe it’s not a short-term decision after all. What if the deceptive ads are a necessary part of the system?

A common, conventional point of view about surveillance marketing is that people choose to trade information about themselves for better-targeted ads. But this is oversimplified even if you don’t get into the details of whether or not people give actual consent to the exchange. Realistically, there aren’t enough well-targeted ads trying to reach you at any one time to make the more relevant ads better enough that even a high-status user would notice. If the Facebook ad system is run at capacity, then as a user you’re generally going to be getting mostly ads that are not perceptibly well-matched, but still revenue positive for the company.

Allowing a certain percentage of deceptive ads changes the balance. With enough deceptive ads in the system, it becomes a better move for a high-status user to reveal more information. Revealing information might be able to get you enough additional legit ads that the level of risk and annoyance you experience moves down noticeably.

So, even in an idealized consent-based future technical and regulatory environment, where users can’t be easily deceived into giving up more information than they prefer to, some rational high-status users might choose to trade away some personal information in order to attract more legit ads and fewer scams. Facebook doesn’t have to do anything drastic like offering reduced ad load in exchange for allowing better-matched ads; they can just let you buy your way out of some scams with data.

Salomé Viljoen writes, in A Relational Theory of Data Governance,

[P]eople have a collective interest against the unjust social processes data flows may materialize, against being drafted into the project of one another’s oppression as a condition of digital life, and against being put into data relations that constitute instances of domination and oppression for themselves or others on the basis of group membership.

This ad system might be a good example of that kind of project. A Facebook user who chooses to avoid scams by providing data on their membership in a high-status group is diverting the scams that they would have gotten onto other people, both members of low-status groups and members of high-status groups who share less data.

The question of scam load and total ad load is different from the competition questions around total ad load. In a hypothetical competitive market for social networking services, companies could compete on ad load, but with network effects and winner take all market effects, a monopoly network can run at a higher ad load than, say, a single ad-supported service that participated in a federated system of intercommunicating social sites.

Conclusion

There are some lessons here for the rest of us. When designing new post-cookie ads for the web in general, it will be more and more important to avoid the kind of design decisions that Facebook has made. Facebook is highly profitable running deceptive ads today, but as a single company they can unilaterally change their system relatively quickly. All the items above would be small code or policy changes whenever they decide to cut down on scams. For the open web, fixes that need to involve code and business agreements from more companies would be harder.

20 Dec 2020

free riding on future web ads?

(Update 10 Mar 2022: frequency capping)

Free riding in today’s web advertising is pretty basic.

  • Use an ad-supported site while blocking the ads.

Opportunities for free riding in the future are a lot more interesting.

  • Don’t store “interest-based” ads that sites ask you to keep for possible future use.

  • Use a modified version of a browser-hosted ad auction that skips time-consuming processing steps.

  • Don’t store delayed click or conversion tracking data, just drop it.

  • Turn off the browser’s cache partitioning feature to save bandwidth and storage, knowing that companies won’t invest in tracking the small fraction of users who figure out how to do this.

  • Identify yourself as a member of a more desirable interest group, to get better-looking ads than your station in life entitles you to. (A discussion thread about economic inequality is just under the surface of the web ad argument. Much of the targeting that marketers want to do is about chasing the same people who have the money for discretionary personal spending and are also b2b decision-makers while avoiding wasting ad budgets on people who aren’t in a position to spend money.)

  • Turn off frequency capping to get more impressions of a big-budget ad instead of the first few impressions of a lower-quality (possibly deceptive or gross) ad.

Some of the proposals for post-cookie web advertising create new opportunities for free riding, either by individuals tweaking their settings or by developers who rebuild the browser with stubbed-out versions of ad placement and reporting features.

Privacy features in post-cookie web ads are going to make a lot of free riding hard to detect while a visitor is on a site. There’s no immediate difference perceptible to the site between a browser that is actually running in-browser ad auctions and delayed reporting, and a browser that is skipping all that stuff.

Will free riding browsers be an obscure nerd niche, or will these practices become common enough to provoke widely adopted countermeasures? Software, music, and movies have DRM systems. Online games have anti-cheating systems. Educational testing has intrusive proctoring systems. Will future web ad systems require some kind of locked-down browsing environment to keep people using the post-cookie ad placement and reporting system as intended?

29 Nov 2020

My talk, with links, from WebInnovationX

(This is a cleaned up and edited transcript of my talk from WebInnovationX, with some links put in.)

The IT business has long been in a kind of cycle of centralization and decentralization. After the 8-bit microcomputer days, the IBM PC led into the Microsoft Windows era, then the explosion of companies in the dot-com boom, and the cycle keeps turning. Today, the death of the third party cookie is often seen as a problem of further centralization. (I promised a link to a cookie recipe, so here’s one that works reliably for me: Very Peanut Butter Cookies.) If third party cookies go away and we don’t have the ability for multiple players on the web to see data from each other, then that leaves a few big companies running everything. The Lumascape would look better with just a few logos on it, and some consolidation is inevitable. But there’s a big difference between some overdue consolidation in the web advertising business and this scary move toward extreme centralization.

Today there are a few extremely large companies that are capable of participating in the web standards process in a large, ostentatious fashion. They have the ability to produce complete implementations of complex proposals, demonstrating their power to drive consolidation of the web business. Big companies have open-plan offices full of developers for the same reason that King Henry VIII ordered rows of ornamental yew trees planted at Hampton Court Palace: to show that they have enough wealth not to need all that land and labor for food, and to remind visitors of the power of the English longbow. Some consolidation is probably a good thing, but for large companies the process of eliminating the third-party cookie is more about commodifying complementary goods, a strategy that Joel Spolsky pointed out back in 2002. In the days of Linux first catching on as a web server platform, the new OS was used for commodifying the servers. New companies didn’t need big Digital or Sun servers like the first generation of big web properties, they could just use stacks of generic PCs.

This innovation led to the giant companies of today, systematically commodifying everything that they touch. A strategy that worked so well for commodifying the hardware business is now being applied to everything, including content and labor. A lot of attention is paid to big data as a buzzword, or artificial intelligence as a buzzword, but all of these terms are encapsulating a common phenomenon: pushing all the value in the system to a centralized reputation graph, a data structure that allows one company to evaluate which other participants in the system are better or worse for specific purposes. This is a common pattern across the gig economy, across large social sites, and of course the content industry.

Here is a question that we got before this talk. If personal data becomes available mostly through a few giant companies, and those companies are the only people who have access to these large reputation graphs, then won’t all advertising budgets just move to these large companies?

The answer is that yes, content is participating in a race to the bottom, but brands are also subject to commodification. If commodification continues, then brand marketing budget decisions won’t matter anyway. Here’s a simple example.

Facebook ad info for BM 00704

This is a large company’s user dashboard for checking on who is using my personal information. “BM 00704” is directly competing with established brands to sell me branded goods and services, and of course I never gave them my personal info. The big company, in this case Facebook, cooperated in whatever tricks got pulled to get my information from somewhere. In most direct marketing media, like direct mail, a vendor can “seed” the mailing list with records that will get back to them to let them know if the list gets copied without permission. Facebook Custom Audiences give the list owner no way to detect seed records, which makes Facebook an easy way to use a stolen customer list without detection. Brands can’t expect any protection from the commodification effect that publishers are seeing. There’s no reason to expect that first, servers will be commodified, then they’ll come for the publishers, and somehow they’re just going to decide to stop before they get to, say, Oreo cookies. If you want to tell the commoditization dystopia story, don’t stop at publishers. Tell the whole story, including the brand part.

On the internet we love our dystopia stories. We’ve had the cryptography key escrow dystopia story, the digital rights management dystopia story, and today of course we’re having the surveillance marketing consolidation dystopia story. The crypto issue and the DRM issue are both still problems on the Internet, but they haven’t led to the end of civilization as we know it because there’s been a set of factors pushing back. Same goes for surveillance marketing. Here, a lot of the anti-dystopia narrative comes from privacy law and tools. As you’ve probably heard, in the California election that just happened, the California Privacy Rights and Enforcement Act, or Proposition 24, passed with 56% of the vote. That 56% turns out to be way below what CPRA had originally polled at before the election, where it came in at 88 or 72 percent.

At first, it looks like the surveillance marketing business managed to make their case to some California voters. But the main point against Proposition 24 in the information that actually went out to voters is that it doesn’t do enough:

Vote NO on Proposition 24 because it was written behind closed doors with input from giant tech corporations that collect and misuse our personal information—while the measure’s sponsor rejected almost every suggestion from 11 privacy and consumer rights groups….The real winners with Proposition 24 are the biggest social media platforms, giant tech companies and credit reporting corporations who get more freedom to invade the privacy of workers and consumers, and to continue sharing your credit data. Here’s what they won’t tell you about the 52 pages of fine print: Proposition 24 asks you to approve an Internet “pay for privacy” scheme. Those who don’t pay more could get inferior service—bad connections, slower downloads and more pop up ads. It’s an electronic version of freeway express lanes for the wealthy and traffic jams for everyone else.

If you look at that 56 percent from the point of view of one of the big surveillance marketing companies, then yes, only 56 percent voted to have you walk the plank, but a substantial fraction of the other 44 percent voted for having you keelhauled first. For another reality check on how far away from people’s norms the direct marketing business has managed to get, take a look at this post from the infamous Unethical Life Pro Tips board on Reddit, where monetizing a list of PII was too unethical even for people who choose to moderate a forum about unethical activity.

ULPT Request: How to make money from an email/snail mail list of contacts? (This post has been removed by the moderators)

California already has one privacy law, the California Consumer Privacy Act (CCPA). The Interactive Advertising Bureau recently surveyed corporate privacy lawyers and they found that only about 1-5% of people who are given the opportunity to exercise their privacy rights under the CCPA actually do so. And 60% of the lawyers surveyed said that their companies just go ahead and make CCPA rights available to everyone. You do have to put an asterisk on that number, because that’s 60% of businesses that are big enough and engaged enough to send a privacy lawyer to meet with the IAB, so the actual percentage is probably lower, but it’s clear that CCPA is having an influence on privacy features that are being made available even outside of California.

I am one of that 1-5% and I’ve been sending out CCPA opt-outs since January. The process is still kind of a pain.

Result of a CCPA Right to Know

The percentage of people who vote for privacy stuff is high, but the percentage of people who actually take the time to do the privacy activities enabled by the law is a lot lower. The only thing less popular than creepy nerds is making yourself do creepy nerd stuff. The decentralizing effect comes not only from how California voters will just keep passing privacy initiatives until this stuff stops, but also from how the laws are increasingly making it possible for organizations to take some of these actions on behalf of consumers.

I worked on an Authorized Agent study at Consumer Reports, and it had a response rate that any direct marketer would envy. We had a whole process for recruiting emails and multiple lists and following up, and we didn’t have to use any of it because the first email to the first list filled up the entire study group. It’s incredibly popular with consumers to be able to say, let me push one button and make this stuff stop. Watch for more info on results of the study.

Authorized Agent services are not some kind of silver bullet for putting end users completely in charge of their personal data. There’s a trend among activists to say that we’re headed for some kind of privacy or data control utopia, in which people are going to have a high level of control, and that’s not realistic. The data isn’t worth enough and people don’t care enough. But privacy services are a counterweight to centralization trends. As a company gets big enough and high enough in impact to drive meaningful centralization, it’s also going to be a high-value target for privacy laws and protection services to balance that out.

This is good news for advertising. Somehow the print publishing business managed to come up with a high margin, repeatable, sustainable advertising model for newspapers and magazines when they were breathing the fumes from molten lead all day, and we have not been able to get anywhere near that on the Internet. So we have a huge opportunity now to redesign the market for web advertising in such a way that it is acceptable to the audience, not forcing the kind of consolidation that tends to get pushed back, while at the same time producing the kind of reputation effects in the audience’s head that make advertising worth buying. If the entire reputation graph lives inside a big company then there can be no brand equity, there can only be bits within someone’s centralized score, and that’s missing out on a lot of economic value. So I’m highly encouraged to be participating in the Improving Web Advertising Business Group and I’m really looking forward to seeing what kind of models we can come up with. Thank you very much.

Do you have any comments about the pros and cons of closed and open ecosystems? Some of the questions claim that a closed ecosystem is faster and allows for better measurements than the smaller publishers.

Unfortunately in this election cycle we’ve seen a lot of the limitations of centralization. From the point of view of a large tech company there’s constant pressure to lower the costs of functions such as moderation and ad review, and so you end up with moderators and ad reviewers who are doing these very stressful jobs that expose them to a lot of, say, terrorist material or child abuse material. And they’re up against dedicated misinformation operations, so when people rely on those centralized information sources then they’re getting a source of information that’s been weaponized by highly motivated bad actors, whether they’re financial scammers or political extremists or both. When independent publishers are involved, there’s a media ethics point of view or a labor of love point of view on their content that tends to make it higher reputation and more reliable in a way that a big company whose algorithm is constantly being tested and gamed really can’t do.

How do we balance the need for privacy with the need for advertisers to know that a user is legitimate?

People have a trust relationship with their web publishers and so there is very personal information that people will share with a trusted content brand that they won’t share with some big bad Internet company in general. As editor of Linux Journal some of the most rewarding letters to the editor that we ever received were people who had been reading Linux Journal in prison. They didn’t generally have access to an actual Linux box but then when they got out they were able to get some kind of an IT job. People share personal information with their trusted publishers, writers, and editors in a way that a big company can never get.

The way to use that information appropriately, in a way that you can say to an advertiser that yes, I have this highly engaged audience, is a very fruitful field. I would look at the Trust Tokens sessions at W3C when those come up. Real people do a lot of real people things, and real people are good at recognizing other real people. Just to give an example, the 90210 ZIP code in the USA has more Facebook users than people. Publisher networks of trust can show if, yes, this is a real human reader. And they’re much more reliable if it’s a publisher that has a subscriber or someone who’s been interacting with them for a while, than if it’s just some big company saying, trust us, look at all these people we have watching videos, you can pivot to video now.

WebInnovationX link on YouTube with panel discussion from this session, plus more events

23 Nov 2020

An advertising awards show in the browser?

When adtech people, privacy developers, and the EFF can all agree that something is bad, it must be bad, right?

The proposed Federated Learning of Cohorts (FLoC) would replace the old-fashioned third-party cookie with a shorter identifier, calculated by a complex algorithm from your browsing activity. The cohort ID doesn’t correspond to you individually but identifies you as part of a “cohort”. Any site can call the JavaScript function document.interestCohort(); to find out which cohort you’re in.

Cohort-based ads sound great, if you’re in the cohort that gets ads featuring attractive Europeans driving new European cars on winding roads through the European woods. Or the cohort that gets ships cruising to scenic vacation destinations. But not all cohorts are going to get the good ads.

It’s even worse when a cohort ID might leak a sensitive piece of information about you. There is no way to test FLoC with all the legacy sites on the web that might leak some kind of sensitive info. What if a user’s pattern of play in a casual web game can leak something about their disability to the FLoC algorithm, and they stop getting certain job ads? Proving that FLoC protects user privacy is an unsolved problem, and might be mathematically impossible. So we have to assume that a cohort ID leaks bits of sensitive personal info until it can be shown that it doesn’t. And, of course, from the web publisher point of view, FLoC leakage is a business issue. The FLoC algorithm could “learn” the subscriber lists of niche publications that depend on ad revenue. Since any site can call interestCohort, a site like cheapAssCatGifs.com might be able to sell the audience of a site like expensiveCarTeardownReviews.com, just based on cohort ID.

The FLoC-powered awards show

So far, not so good. But offline, people actually buy posters and books of award-winning ads, so there must be some demand for the good ads. If only there were some way to get more of the good ads right in the browser.

That’s where FLoC can really help.

Step one: Identify the good FLoC-based ads that appear on a set of sites, along with the cohort ID of the cohort that got them. For a first pass, pick out the ads that carry the most revenue for their weight. In general, the ad campaigns that are willing to pay more per impression are also the ones that have a budget for good creative work. At this point we have a first pass at a set of possibly good ads and can pick some good ones manually.

Step two: Keep track of which cohorts got the best ads, and share the highest ranking cohort with browsers that want to give their users the best experience.

Step three: the browser always responds to document.interestCohort(); with the winning cohort ID, for all users. No leakage of possibly sensitive info, the browser developers don’t have to code and test a bunch of hard cohort math, and everybody gets the good ads. It’s like an ad awards show in the browser. Could be updated every browser release.
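
The three steps above can be sketched in a few lines of Python. This is a back-of-the-envelope model, not browser code: the observation format, the `winning_cohort` function, and the `AwardsShowBrowser` class are all hypothetical, the manual “pick some good ones” pass from step one is skipped, and `interest_cohort` is only a stand-in for `document.interestCohort()`.

```python
from collections import defaultdict


def winning_cohort(observations: list[dict]) -> int:
    """Pick the cohort whose ads carry the most revenue for their weight.

    Each observation is a dict with keys:
      cohort   - cohort ID the ad was served to
      revenue  - revenue per impression (any consistent unit)
      weight   - ad payload size in bytes
    Cohorts are ranked by their average revenue per byte.
    """
    scores = defaultdict(list)
    for ob in observations:
        scores[ob["cohort"]].append(ob["revenue"] / ob["weight"])
    return max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))


class AwardsShowBrowser:
    """Stand-in for step three: every user reports the same cohort."""

    def __init__(self, winner: int):
        self._winner = winner

    def interest_cohort(self) -> int:
        # Ignores browsing history entirely, so nothing about the user leaks.
        return self._winner


if __name__ == "__main__":
    observed = [
        {"cohort": 1, "revenue": 0.002, "weight": 400_000},
        {"cohort": 2, "revenue": 0.010, "weight": 100_000},
        {"cohort": 2, "revenue": 0.008, "weight": 200_000},
    ]
    browser = AwardsShowBrowser(winning_cohort(observed))
    print(browser.interest_cohort())
```

Note that the browser side is the easy part. All the work is in collecting the observations, and that only has to happen once per release.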

What do you think?

13 Nov 2020

WIP: running a CCPA-in

Something to keep in mind for when they open up street fairs again. Could also run this as a local public library event or trade show booth.

Needed:

  • computer(s)
  • laser printer (office type with low cost/page)
  • one box per company
  • generator (if at a street fair)

How it works.

Setup

  1. Hook up the hardware.

  2. Label the boxes with the addresses of the companies to be opted out of.

For each user:

  1. User goes up to the computer, opens a simple web application, and types in their info.

  2. User can preview a sample CCPA letter on screen.

  3. User approves their contact info as it appears on the sample letter.

  4. Application generates a multi-page PDF, one company opt-out per page, and prints.

  5. User signs each page and hands back to a volunteer

  6. Volunteer puts each page in the appropriate company’s box.
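
The letter-generating part of the per-user steps could start out as something like this Python sketch. It produces plain text pages separated by form feeds instead of a real PDF (PDF output would need a library and is left out here), and the `Company` type, `build_pages`, `to_print_job`, and the letter wording are all placeholders.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    address: str


def build_pages(user: dict, companies: list[Company]) -> list[str]:
    """Return one opt-out letter per company, each meant for its own page."""
    pages = []
    for company in companies:
        pages.append(
            f"{company.name}\n{company.address}\n\n"
            "Please stop selling my personal information, as required by the\n"
            "California Consumer Privacy Act.\n\n"
            f"Name: {user['name']}\n"
            f"Address: {user['address']}\n"
            f"Email: {user['email']}\n\n"
            "Signature: ______________________\n"
        )
    return pages


def to_print_job(pages: list[str]) -> str:
    """Join pages with form feeds so a plain-text print job breaks per page."""
    return "\f".join(pages)


if __name__ == "__main__":
    user = {"name": "Pat Example",
            "address": "123 Example St, Alameda, CA",
            "email": "pat@example.com"}
    companies = [Company("Example Games Inc.", "1 Example Way, Exampleville, CA"),
                 Company("Sample Data Co.", "2 Sample Rd, Sampleton, CA")]
    print(to_print_job(build_pages(user, companies)))
```

Putting the company name and address at the top of each page also makes step 6 easier, since the volunteer can match the page to the box at a glance.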

At the end of the event:

  1. A volunteer seals up each box of letters

  2. A volunteer takes each box to the appropriate company (if local) or ships it.

This should be able to do CCPA letters at under a dime each. Under one cent for the paper, three or four cents in printing costs, about three cents per page for shipping if a large number are packed and shipped together.
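
Checking the arithmetic, as a small Python sketch. The per-letter numbers are the upper ends of the estimates above; the turnout figures in the example (200 users, 25 company boxes) are invented.

```python
def per_letter_cents(printing_cents: int) -> int:
    """Per-letter cost in cents, using the estimates from the text."""
    paper_cents = 1      # "under one cent", so treat 1 as a ceiling
    shipping_cents = 3   # per page, when a large batch ships together
    return paper_cents + printing_cents + shipping_cents


def event_cost_dollars(users: int, companies: int,
                       printing_cents: int = 4) -> float:
    """Total cost for an event where every user opts out of every company."""
    letters = users * companies
    return letters * per_letter_cents(printing_cents) / 100


if __name__ == "__main__":
    print(per_letter_cents(3), per_letter_cents(4))
    # Hypothetical turnout: 200 users, 25 company boxes.
    print(event_cost_dollars(200, 25))
```

That works out to seven or eight cents a letter, which is under a dime as claimed. With the made-up turnout, 5,000 letters would cost $400 at the eight-cent ceiling.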

This could also be set up as a kiosk with a drop box for step 5, with a volunteer coming around every so often to empty it and ship out a batch.

22 Sep 2020

Why don’t you CCPA Facebook?

Here’s another question I get a lot. Why don’t you CCPA Facebook?

Here are some reasons, and there are probably others. (This post contains facebook.com links, so you may want to check your browser privacy settings before clicking.)

  1. CCPA Do Not Sell requests, also known as opt outs, only affect “sale” of personal information. In general, the flow of personal info to Facebook is one way, inbound. Facebook either (a) doesn’t sell or share my personal info, or (b) they do but stop when it makes the news.

  2. A CCPA Right to Know probably wouldn’t get me any more information than is already available using Facebook’s Download Your Information feature, at least to start.

  3. Facebook trackers on non-Facebook web sites can be blocked with Facebook Container and other browser privacy features. This activity is important to stop, but it’s already handled.

  4. When the user-enabled global privacy controls mentioned in the CCPA regulations are ready, I’ll be sending a CCPA opt out to every site I visit unless I specifically say otherwise. So I don’t need to worry about Facebook trackers on random sites any more than I need to worry about any random LUMAscape company.

  5. Facebook Custom Audiences are like those sampling wells on hazardous waste sites that catch anything nasty that leaks in the area. Crooked politicians, fake brands, Amazon review manipulation schemes, data brokers ripping off their clients, the whole skeevy side of marketing is right there in the Facebook ad settings. If I sent a CCPA Right to Delete to Facebook, it would be like deleting my account manually—I would lose a way to see what’s happening with my other CCPA activity and to learn about other privacy risks.

I do use Facebook Custom Audiences to find companies that are sending my personal info to Facebook. In the case of obvious scams or marketing firms taking advantage of clients, I just take a screenshot. If a real company is adding me to a Custom Audience, I CCPA the company, where it will do some good.

19 Sep 2020

figuring out the CCPA escalation path

(Update 10 Sep 2021: add more material on GPC)

(update 8 Oct 2020: add material on GPC, copy edit)

The later you catch a software bug, the more expensive it is to fix. Catching a syntax error while you’re typing code costs practically nothing, fixing a broken test is more expensive, and deploying an update to users can cost even more.

The California Consumer Privacy Act (CCPA) provides for a similar escalation path. It’s helpful to look at all the ways to handle CCPA obligations in order from lowest overhead to highest. Just as software developers are learning to find and fix threading bugs with build-time borrow checking before a difficult bug can make it to a customer, companies are learning how to handle CCPA rights at the easiest, fastest, cheapest level.

Here are the levels, cheapest to most expensive.

browser Do Not Sell. The CCPA regulations (PDF) say,

If a business collects personal information from consumers online, the business shall treat user-enabled global privacy controls, such as a browser plug-in or privacy setting, device setting, or other mechanism, that communicate or signal the consumer’s choice to opt-out of the sale of their personal information as a valid request submitted pursuant to Civil Code section 1798.120 for that browser or device, or, if known, for the consumer.

A standard signal to implement this, called Global Privacy Control, has just been announced, and is being tested across a variety of browsers, extensions, and sites. It is technically similar to the old Do Not Track, with the big difference that it is now intended to be legally required (updated for 2021).
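
On the receiving side, checking for the signal can be very small. This is a minimal sketch, assuming the request headers are available as a dictionary. The header name Sec-GPC and the value 1 come from the Global Privacy Control proposal, not from the regulations, and has_gpc_opt_out is a hypothetical helper name.

```python
def has_gpc_opt_out(headers):
    """Return True if the request headers carry a Global Privacy Control signal."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False
```

A site would call this once per request and, when it returns True, treat the visit as a Do Not Sell for that browser or device.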

authorized agent Do Not Sell. This is the first escalation for a mishandled browser Do Not Sell, and the lowest level for a company that the consumer does not have a direct HTTP connection to. Agent opt outs can be bundled and made easy to handle, and the agent has an incentive to cooperate with the company, to cut their own costs and increase user satisfaction. Bulk handling of agent opt outs is an easy win for DSAR vendors, to lower average cost per CCPA transaction.

Do Not Sell My Personal Information emails and clicks. These are similar to a GDPR Article 21 objection, but can’t be handled with the same processes used for GDPR. They are lightweight because no user identity verification is required (although the company can do an anti-fraud check), but still heavier than handling an agent opt out. If you get an opt out, it is cheaper to act on it than to push back and make the consumer escalate to a Right to Know or Right to Delete.

Right to Know and Right to Delete. If a Do Not Sell gets subjected to illegal verification steps or other dark patterns, then the consumer can escalate to a Right to Know, followed by a Right to Delete. According to vendors of Data Subject Access Rights software, manual handling of a Right to Know can cost $1,400-$10,000. Software and processes are going to bring this down, but realistically nowhere near the cost of dealing with the browser signal or the agent opt out correctly in the first place.

Companies that try to apply the same user experience to a Right to Know as to a more common and less expensive opt out are likely to have to deal with a higher volume of Right to Know requests.

Somewhere along this escalation path, users can make automated or manual reports to the office of the Attorney General, to help them pick targets for enforcement. They certainly don’t have the staff time to go after most CCPA violations, but reports from consumers and consumer organizations will help them pick some high-priority targets.

Some experts are recommending relying on dark patterns to limit the number of CCPA requests that companies have to deal with. (I still don’t like the term “requests” here, since the company has to comply with them, and a legally binding communication coming from the company to the customer is never called a request. But “request” is in the regulations.) But we’re going to discover that the dark patterns approach is flawed. Yes, a lot of consumers are going to give up and go away when they hit a dark pattern, such as an extra verification not allowed by the law, but a fraction of the consumers are going to escalate. A company that chooses dark patterns instead of straightforward compliance is making a high-stakes bet on what fraction of consumers will escalate.
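
To put a rough number on that bet, here is a sketch of the break-even point. The $1,400 figure is the low end of the vendor estimate quoted above. The $14 cost of simply handling an opt out is a made-up placeholder, and the model ignores enforcement risk entirely.

```python
def break_even_escalation_rate(opt_out_cost, right_to_know_cost):
    """Fraction of blocked consumers who must escalate before dark patterns cost more.

    Straightforward compliance costs opt_out_cost per consumer. Blocking the
    opt out costs nothing up front, but each consumer who escalates costs
    right_to_know_cost.
    """
    return opt_out_cost / right_to_know_cost

def dark_patterns_lose(escalation_rate, opt_out_cost, right_to_know_cost):
    """True when the expected cost of blocking exceeds the cost of complying."""
    return escalation_rate * right_to_know_cost > opt_out_cost
```

With those placeholder numbers the break-even point is 1%: if more than one blocked consumer in a hundred comes back with a Right to Know, the dark pattern is already the more expensive choice.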

28 Aug 2020

Building a C program with dependencies in a container with Earthly

Every time I build an open source project from scratch, I end up installing a bunch of dependencies. And often it’s tricky to get the build working the same way on multiple systems. And it’s even harder to get new people started on a project.

So just containerize it, right? So far I have found that Docker containers work great for web applications. I get an accurate copy of the standard production environment inside a container, and I can still edit files and use my web framework’s auto-reload feature. Here’s an example, the web.sh script in the Pinfactory project. All the dependencies get installed in the project Dockerfile, and then the source code (including all the HTML templates, CSS, and graphics) is in a volume shared into the container. This way I can do a flask run inside the container, and when I edit a file in the volume, it Just Works and auto-reloads. Pinfactory is easy to work on in containers. You can run one script to do all the unit tests in a container, one script to start up a web server with real data, and there’s even a tricked-out demo script that creates a container with multiple users. Containerizing web applications is a win for small stuff, too. Here’s a simple Dockerfile for a Jekyll project that I can use to preview a relatively large Jekyll site locally, without installing any Ruby packages.

Containers for developing and testing web sites locally are great. So what about containerizing a regular software build?

Building a simple tool to sign files

I run my own mail server and other services. (My blog is on a VPS with a static site generator.) That means tracking and deploying a bunch of files that end up in a bunch of different places, on systems running a variety of Linux distributions.

I want to be able to sign important files, and check signatures, so I’m looking for a good, lightweight digital signature tool.

Looking around, I found signify, the OpenBSD tool to sign and verify signatures on files, in a portable version. Looks like just what I need. Sign stuff, check the signatures of files on a remote system, not a lot to configure, easy to script. Also, good practice for a new way to make a software build easy to manage and repeatable.

Signify has a very nice build that facilitates what I want to do, driven by a well laid-out Makefile. I can build a statically linked signify, and the man page, that will work on all my Linux systems of whatever distribution. Signify is also a good example of a program to build and install, because it includes an interesting dependency and a step where the Makefile needs to check a signature of the dependency.

Driving the build with Earthly

Earthly is a build automation tool for container-based tools. It uses the Docker daemon to manage containers. I have run it with both docker.com’s Docker Engine and with the Docker packages for Fedora 32: moby-engine and docker-compose.

Earthly is controlled by an Earthfile, which is like a Dockerfile, broken out into targets like a Makefile. Each target produces an entire container image, including all side effects. If anything in your build leaves stray files behind in /tmp or the user’s home directory, they will be persisted.

The install is simple—it’s a single binary. The install instructions on the Earthly site will put it in /usr/local/bin by default, but there’s nothing else to add or configure besides Earthly and Docker. More info: Earthfile reference

There is an example Earthfile for a C++ project with CMake that I’ll use as a starting point.

Planning a Signify build.

My Signify build will have to be a little more complicated than just installing the packages I need from the package manager, copying the Signify source code into the container, and then running make.

In order to make a static build with signify’s bundled copy of libbsd, I will also need to download and verify a libbsd release. The signify Makefile already knows how to download libbsd and build it into a statically linked signify binary. All I have to do is

make BUNDLED_LIBBSD=1 static

But if I do that, every time I do a build, I have to go out on the network. Behind the scenes, the signify Makefile is running wget to download first the signature for the libbsd release…

    $(WGET) -cO $@ '$(libbsd_ASC_URL)'

and then the tar file.

    $(WGET) -cO $@ '$(libbsd_TAR_URL)'

So I really want to separate the download step from the build step. I want something like this.

  1. Set up the base system and save a container image.

  2. Download libbsd and save a container image.

  3. Copy my current version of the code into the container, do the build, save the build artifacts.

Step 3 shouldn’t require any network access, so should be really fast. As fast as a regular make, anyway.

First try, first FAIL

Here’s my first attempt at step 2. I’ll take advantage of the nice libbsd-download target in the signify Makefile, and do this right after I copy the code into the container.

COPY --dir . /code
RUN make BUNDLED_LIBBSD=1 libbsd-download

No, wait, libbsd-download needs to check the signature. Make that:

COPY --dir . /code
RUN gpg --import /root/keys/libbsd.asc
RUN make BUNDLED_LIBBSD=1 libbsd-download
SAVE IMAGE

When I first tried this, I was not able to make the libbsd-download target, because GPG tried to leave a socket behind under .gnupg in the build user’s home directory.

+build | ERROR: (RUN [make BUNDLED_LIBBSD=1 static]) executor failed running [/bin/sh -c  /bin/sh -c 'make BUNDLED_LIBBSD=1 static']: buildkit-runc did not terminate successfully: context canceled: context canceled
Error: solve side effects: build error group: solve: failed to solve: rpc error: code = Unknown desc = failed to compute cache key: failed to create hash for /root/.gnupg/S.gpg-agent: archive/tar: sockets not supported

If you found this page by Googling for archive/tar: sockets not supported, here’s the answer. It’s a known bug in buildkit, the software build system maintained as part of Moby, which is the open-source project that forms the basis of Docker. Earthly has fixed the problem by updating to the new version of buildkit.

If you’re still seeing this error, you can (1) upgrade your Earthly and Docker, (2) avoid any build steps that run GPG until the final target, or (3) remove the sockets by adding

RUN rm -f /root/.gnupg/S*

before the SAVE IMAGE.

So my first attempt at getting Signify to build was:

  • copy the signify sources over
  • import the key
  • do a make libbsd-download
  • remove the GPG sockets because they can’t be saved in the container image
  • finally, save the image.

In Earthfile, that looks like this.

COPY --dir . /code
RUN gpg --import /root/keys/libbsd.asc
RUN make BUNDLED_LIBBSD=1 libbsd-download
RUN rm -f /root/.gnupg/S*
SAVE IMAGE

But that’s a sub-optimal solution.

Splitting out download, copy, and build steps

The problem with the above method is that if I change something in the signify source code, the Earthly build has to go download libbsd again.

This is slow, and bad style, and it means if you need to make a quick change to the C source code, the build still goes and gets some unchanged dependencies.

Ideally you have all your dependencies stored locally, so if there’s a network outage, or a trade war, or some developer rage-quits and takes their downloads page down, the build will still go brrrrr. Not that anything like that would happen in the case of signify, but you never know. And since Earthly is new enough that early Earthfiles will end up being copied and changed for generations, like Makefiles, I might as well figure out a generally good way to do it.

Making it all work.

So here’s the solution I came up with. First, I’ll get the base system set up. This should be familiar to Docker users. The root user is going to need a copy of the public key needed to check libbsd, so we’ll get that too.

# build.earth
FROM debian:stable

# install build dependencies, then clean up system packages
RUN apt-get -y update && \
    apt-get -y install build-essential file make gcc git pkg-config wget && \
    apt-get -y --purge autoremove && \
    apt-get -y clean 

# Fetch the public key for the libbsd release.  This will be needed in
# the build step.
RUN mkdir -m 700 -p /root/keys /root/.gnupg
RUN wget https://www.hadrons.org/~guillem/guillem-4F3E74F436050C10F5696574B972BF3EA4AE57A3.asc -O /root/keys/libbsd.asc

WORKDIR /code

Now it’s time to get the bundled libbsd. Instead of running the entire make libbsd-download, we’ll just grab the files. We can apply the “Don’t Repeat Yourself” principle to the URLs, by having the Makefile tell us what they are, using the libbsd-print-urls target.

bundle:
  # This target downloads the bundled libbsd.  This should only run again
  # if the Makefile changes.
  RUN mkdir /bundle
  COPY Makefile /bundle

  # The Makefile includes a "libbsd-print-urls" target that prints the 
  # URLs of the libbsd files needed to work with this version of signify.
  RUN (cd /bundle && make BUNDLED_LIBBSD=1 libbsd-print-urls | xargs wget)
  RUN rm /bundle/Makefile

  # Now all that is left in /bundle is copies of the files listed by
  # libbsd-print-urls.
  SAVE IMAGE

At this point, we have a container image with the libbsd code and signature in /bundle, and the key needed to check it in /root/keys. Now it’s time to copy in the actual code, and add the libbsd files.

code:
  # Copy everything, then copy the libbsd files in.
  FROM +bundle
  COPY --dir . /code
  RUN cp /bundle/* /code
  SAVE IMAGE

The code target will get re-run any time that anything gets changed. But it’s fast because it’s just local copies.

Hooray, time to build. We’ll do a quick touch on the libbsd files so that the helpful and full-featured Makefile doesn’t try to get them again, then make the executable, make the compressed man page, run the test suite, and save the artifacts.

build:
  FROM +code

  # The modification date on the libbsd source and signature needs to be
  # new enough for the build not to try downloading it again.
  RUN find . -maxdepth 1 -name 'libbsd*' -exec touch '{}' ';' 

  # The build requires a GPG verify, so import the key
  RUN gpg --import /root/keys/libbsd.asc

  # Make the statically linked binary and the compressed man page.
  RUN make BUNDLED_LIBBSD=1 static signify.1.gz

  # Run the regression tests. (Even though signify is already built with
  # bundled libbsd, we need to use BUNDLED_LIBBSD to keep from checking
  # for a system installed copy.)
  RUN make BUNDLED_LIBBSD=1 check

  # Save the static binary and the man page
  SAVE ARTIFACT signify AS LOCAL signify
  SAVE ARTIFACT signify.1.gz AS LOCAL signify.1.gz

No need to SAVE IMAGE a container image at this point, because I just need the two artifacts.

And it’s all done.

Right now Earthly is pretty new, so most of the discussion is happening on the GitHub page.

There is also a Gitter channel for user questions.

The project is responsive to issues and suggestions—they implemented my suggestion to move the cache out of /tmp and into what I think should be the FHS-compliant place for it. Watch the Examples on their docs site for more sample builds.


This is a sponsored post. This article and modified versions of this article may be copied and redistributed under the same terms as Earthly.

This article and modified versions of this article may be copied and redistributed under the same terms as Signify.

Markdown source for this article: signify/README-earthly.md at earth-wip · dmarti/signify

14 Aug 2020

Answers to some questions on CCPA opt out compliance

(update 3 Apr 2024: enforcement advisory)

(update 28 Aug 2020: when a company can contact a consumer directly)

I have been getting some interesting questions about how GDPR compliance doesn’t necessarily get you CCPA compliance.

Q: CCPA opt-outs do not have to be verified? Does that also apply to Right to Know and Right to Delete?

A: No. The standards for opt out and for other CCPA requests are different.

Right to Delete and Right to Know requests have to be verified (or, in the case of an agent request, made under a power of attorney).

Request type            Direct            Agent
opt out (Do Not Sell)   no verification   no verification
Right to Know           verification      verification with the consumer or power of attorney
Right to Delete         verification      verification with the consumer or power of attorney

Q: Don’t companies have to verify requests in order to avoid risks?

A: Yes, this is why there is a verification (or power of attorney) requirement for Right to Know and Right to Delete. The customer doesn’t want to risk having their record sent to the wrong person, and they don’t want to risk having their loyalty program deleted right when they were on their 9th paid sandwich and they have a free sandwich coming.

But an opt out is different. The regulations require no verification here because the risk is lower. The only consequence to the consumer of accepting an opt out erroneously is that a “do not sell” bit in their customer record gets set. Nobody’s info will get compromised or deleted.

Q: If a company does come up with a good-faith reason to deny an opt out, can they use the same verification process as for GDPR?

A: No. A slightly different workflow is needed. Let’s review the regulations again…

A request to opt-out need not be a verifiable consumer request. If a business, however, has a good-faith, reasonable, and documented belief that a request to opt-out is fraudulent, the business may deny the request. The business shall inform the requestor that it will not comply with the request and shall provide an explanation why it believes the request is fraudulent.

So you can’t just treat a CCPA opt out that you suspect is fraudulent as if it was an incoming GDPR Article 21 objection that you can verify. This is one of two ways that a CCPA opt out is different from a GDPR Article 21.

  1. For Article 21, a company can verify just based on “reasonable doubts”. For CCPA opt-out, the company can only deny if they can meet the higher standard of “good-faith, reasonable, and documented belief.”

  2. Even if the company does have a belief that meets the CCPA standard, they can’t just redirect to the verification form that they use for Article 21. An explanation of why they believe the opt out is bogus has to come with the denial. And the explanation goes to the requestor, which is the authorized agent in the case of an agent request.

A lot of vendors still get this wrong, likely because they are still repurposing GDPR code for CCPA. This is something you have to check.

Q: When a company receives an authorized agent opt out, when can they contact the consumer directly?

A: For opt outs that come in from an authorized agent, there are two situations where the company can go to the consumer for verification.

  • If the opt out is missing some of the required paperwork. The agent is required to provide written permission from the consumer. If this is missing, the company can deny the opt out.

  • As for other opt outs, when the company has a “good-faith, reasonable, and documented belief” that the request is fraudulent. The company can’t go to the consumer to re-check a legit agent opt out, but an agent falsely claiming permission in a consumer’s name would be committing fraud.

Either way, the response is different from what the company would do to verify a GDPR Article 21 objection.

  • If an agent opt out is missing the required written permission, you can just drop it on the floor. There’s no requirement to let either the agent or the consumer correct it.

  • If a company chooses not to act on an agent opt out, they have to provide the full “explanation of why it believes the request is fraudulent.”

Yes, this means that if a company did all the work of making user stories and code for GDPR Article 21, they won’t be able to re-use them unmodified for CCPA.
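
Here is a minimal sketch of the opt out code path as described in these answers. It is an illustration of the workflow, not legal advice, and the OptOut fields and the returned strings are hypothetical names, not anything from the regulations or a real product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OptOut:
    from_agent: bool = False
    has_written_permission: bool = False  # only meaningful for agent requests
    # The written explanation of a good-faith, reasonable, and documented
    # belief that the request is fraudulent, if the company has one.
    documented_fraud_belief: Optional[str] = None

def handle_opt_out(request: OptOut) -> str:
    """Return the action for an incoming CCPA opt out."""
    if request.from_agent and not request.has_written_permission:
        return "deny: missing written permission"
    if request.documented_fraud_belief:
        # A denial has to carry the explanation, and it goes to the requestor.
        return "deny with explanation to requestor: " + request.documented_fraud_belief
    # No identity verification step here: just set the do-not-sell bit.
    return "set do-not-sell"
```

Note that there is no branch that redirects to an identity verification form, which is the piece of a GDPR Article 21 workflow that does not carry over.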

Q: Why did they make CCPA so different, and not just copy GDPR or a subset of it?

A: That goes back to why CCPA is opt-out-based, and not consent-based like GDPR. The authors of CCPA anticipated that an opt out system would be more likely to hold up in court in the USA than a consent-based system.

So in order to make opt-out workable at all, they had to make the process reasonably lightweight for the consumer. Under GDPR the consumer can just choose not to consent, but under CCPA the consumer has to choose to do something to get the equivalent result. That action that the consumer takes by choice has to be feasible.

Q: Which service providers are getting it right?

A: Implementing the opt out code path correctly is likely to be a differentiating advantage for service providers in 2020. A lot of the first wave of CCPA services just copied GDPR and made best guesses on the regulations, but now there is an opportunity for services to get the difference between opt out and other request types correct.

Watch this space.

Q: If verification is not required, why do authorized agents still put their users through a verification process?

A: Verification for opt out is manageable if you do it once per agent relationship and not once per logo on the Lumascape.

But still, if an opt out does not have to be a verified consumer request, why do authorized agents go ahead and verify emails and phone numbers? Because if we don’t verify, then pranksters will sign up with fake names, the agent will look silly sending opt-outs from Mickey Mouse, and all those bogus opt-outs will eventually help companies form a good-faith reason to believe that opt-outs from this agent are fraudulent, and start denying them.

Doing verification on the agent side means that by the time the company sees the opt-out it will be verified to a standard strong enough that there is no reasonable way that the company could have a good-faith belief that it’s fraudulent. So they have to handle it in one step without looping in the consumer.

06 Aug 2020

Passwords

(update 31 Aug 2020: add Prop 24 info and Adweek Magic Link)

Things that people are bad at:

  • remembering strong passwords

  • choosing and using different passwords on different sites

Things that most sites depend on for their security:

  • (see above)

Things that will cost you $750 each if you mess up on managing them:

So avoiding passwords is going to be more and more of a thing. Here’s a list of ways to avoid doing passwords (or make the password optional):

  • Email login link. You type in your email address and it sends you a link to log in. Red Hat Mugshot did this, a while ago. Today, it’s implemented nicely at Adweek, where they call it “Magic Link”.

  • ssh to log in. You give the site your ssh public key when you make your account, then to start a session you ssh to a service that gives you a URL containing a session key. $BROWSER $(ssh login.example.org)

  • Log in with G.A.F.A.

  • QR code. If you are already logged in on one device and want to log in from another one, scan a QR code with s33krit crypto stuff in it, like Keybase.
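
The email login link is the easiest of these to sketch. This is one possible approach using a signed, expiring token, not a description of how Mugshot or Adweek do it. The secret key, the 15-minute lifetime, and the function names are all placeholders.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-random-server-secret"  # hypothetical placeholder
LINK_LIFETIME_SECONDS = 15 * 60

def make_login_token(email, now=None):
    """Build an 'expiry:signature' token to put in the emailed login link."""
    expires = int(now if now is not None else time.time()) + LINK_LIFETIME_SECONDS
    message = f"{email}:{expires}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{expires}:{signature}"

def check_login_token(email, token, now=None):
    """Return True if the token was issued for this email and has not expired."""
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False
    message = f"{email}:{expires}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    # Constant-time comparison, so the check does not leak signature bytes.
    return hmac.compare_digest(signature.encode(), expected.encode()) and current <= expires
```

This sketch does not make the link single-use. A real site would also want to record used tokens, or keep server-side state instead of a stateless signature.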

Any more?

03 Aug 2020

SELECT * FROM services WHERE name = ‘postgres’

I guess we can rearrange the default web site to-do list, to put the branded jackets and the user conference ahead of the job server and full-text search engine. Some good articles on simplifying the back end by making PostgreSQL do it.

29 Jul 2020

Can you use Google Analytics in California? What if a user does a CCPA opt out?

The short answer is yes. Google Analytics is even on the oag.ca.gov site.

Since CCPA is opt out, Google Analytics is going to be fine unless you know the site visitor has opted out. Once a site visitor sends you a CCPA opt out, what do you have to do to square it with Google?

This is where the Google documentation gets a little confusing. They’re not going to tell you too much about whether a specific setting will get you to compliance, probably because that sounds too much like giving legal advice, and every site is different anyway. A wise man once said, Go not to the Elves for counsel, for they will say both no and yes.

Realistically, the Attorney General’s office has limited time to bring CCPA enforcement cases. Realistically, hardly anybody has time to read n pages of Google documentation except the n/(reading speed) developers that Google can afford to hire, and nobody has enough C++ developers to keep up with Google’s replacement technologies for the 3rd-party cookie, either. So all you really need to do is not be one of the dozen or so creepiest, or most famous for being creepy, companies out there.

So read through the stuff on Helping advertisers, publishers, and partners comply with the California Consumer Privacy Act (CCPA), make sure that you’re set up with the new version of the data processing terms, and then if someone opts out, do a ga('set', 'allowAdPersonalizationSignals', false); like it says on Advertising Features  |  Analytics for Web (analytics.js). And you should be good.

There is similar CCPA compliance stuff for other areas like ads, which gets a little more gnarly. But unless you have time to do every privacy compliance thing to white-glove standards, Google Analytics is probably not worth that much time. (Facebook Custom Audiences, on the other hand, are a big potential red flag, and anyone who takes the time to rage-surf Facebook for companies giving their contact info to the World’s Creepiest PHP Programmer is going to see them. Probably worth more compliance checking time than it looks like they’re getting. But that’s another story.) The one exception is if you’re stuffing PII into analytics events (which you can do, because you can put all kinds of stuff in analytics) but that’s hopefully rare enough that nobody reading this blog is doing it. So there’s plenty more to worry about than Google Analytics. And like I keep saying, all this stuff should be a win, not a cost center. In the near future, healthy organizations, fandoms, communities of practice and audiences will cooperatively (and with the help of publishers) spew forth CCPA opt-outs to protect themselves, like Penicillium colonies soaking their neighborhoods in antibiotics.

Related: Google Analytics: Cookieless Tracking Without GDPR Consent • Helge Klein

20 Jul 2020

CCPA for nerds, part 2

Here’s a quick update on CCPA opt out, nerd edition, which describes how I can send so many CCPA opt-outs so quickly. As you may recall, I made a simple CCPA opt-out tool, using…

  1. Mutt with GNU Privacy Guard

  2. Keybase

  3. A /code/ccpa shell script

  4. An opt-out letter. This is my opt-out that includes some language on Facebook data transfers.

The script generates signed CCPA opt-out requests, which do work.

But how do I get my PII into the outgoing mail, without putting it in the CCPA opt-out letters?

Use templates. Since I already know Mustache templates from web development, the final tool in the CCPA opt out stack is…

  5. mo, a tool to replace simple {{ STUFF }} template variables with values taken from environment variables.

So now I just set CCPA_ADDRESS and CCPA_PHONE in my .bashrc and they get substituted into the letter.
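
For anyone who has not used this kind of template tool, the idea can be sketched in a few lines. This is an illustration of the substitution step only, not a reimplementation of mo, and its handling of unknown names (leaving them in place) is a choice made for this sketch.

```python
import os
import re

def render(template, env=os.environ):
    """Replace {{ NAME }} with env[NAME]; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        return env.get(name, match.group(0))
    return re.sub(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}", substitute, template)
```

Calling render("Phone: {{ CCPA_PHONE }}") with CCPA_PHONE set in the environment returns the line with the phone number filled in, so the PII lives in the shell configuration and never in the letter file itself.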

07 Jul 2020

Example of where GDPR compliance doesn’t get you CCPA compliance

You can’t just cut and paste a set of existing GDPR compliance tools and processes (or a subset of what you do for GDPR) and get to CCPA compliance. (Update 23 March 2025: CPPA Penalizes Honda for CCPA Compliance Violations by Alan L. Friel, Kyle R. Fath, Niloufar Massachi of Squire Patton Boggs (US) LLP—detailed coverage of a case that includes this problem)

One area where CCPA and GDPR are substantially different is identity verification. (This is something that published articles on CCPA compliance often get wrong. Check with your lawyer.)

GDPR: where the controller has reasonable doubts concerning the identity of the natural person making the request referred to in Articles 15 to 21, the controller may request the provision of additional information necessary to confirm the identity of the data subject.

CCPA regulations: A request to opt-out need not be a verifiable consumer request. If a business, however, has a good-faith, reasonable, and documented belief that a request to opt-out is fraudulent, the business may deny the request. The business shall inform the requestor that it will not comply with the request and shall provide an explanation why it believes the request is fraudulent.

If someone sends a GDPR Article 21 objection, the recipient is allowed to ask them for additional info to verify themselves, and doesn’t have to explain why. But if someone sends a CCPA opt-out, the recipient has to act on it unless they have a good-faith, reasonable, and documented belief that it’s actually fraudulent.

And, on denying an opt-out, the recipient must provide an explanation of why they believe the request to be fraudulent. This writing assignment for the recipient is in CCPA but not GDPR.

(This only applies to opt outs. The recipient can verify identity if someone asks for right to know and/or right to delete.)

Also, the CCPA opt-out doesn’t have to come directly from the natural person. It can be from an authorized agent or a browser setting. The recipient still has to have that good-faith, reasonable, and documented belief in order to deny it, and they still have the writing assignment.

12 Jun 2020

CCPA opt out, nerd edition

Update 1 Jan 2020: link to current version of opt out letter

While we figure out how to make general-purpose CCPA opt-outs practical (feel free to set up a time on Calendly if you want to talk with me about Authorized Agent projects), here’s a quick summary of my current CCPA opt-out tools. This is a prototype only, but does work.

  1. Mutt with GNU Privacy Guard (I put this first so people with a normal mail client setup can stop reading. You’re welcome.)

  2. A Keybase account. Yes, we don’t know what will happen to this service in the long run, but this is a prototype so whatevs.

  3. My ccpa shell script.

  4. An opt-out letter.

What the script does is generate a GPG-signed opt-out request that I can edit and send in Mutt, and the letter contains a link to the Keybase profile. Remember that the CCPA regulations (Section 999.315(h), if you’re following along at home) say,

“A request to opt-out need not be a verifiable consumer request. If a business, however, has a good-faith, reasonable, and documented belief that a request to opt-out is fraudulent, the business may deny the request. The business shall inform the requestor that it will not comply with the request and shall provide an explanation why it believes the request is fraudulent.”

They’re not allowed to deny the first one, opt-out, unless they come up with a good reason. Their choice is to do the opt-out or to take on a writing assignment for me for no money. And if they really want to pick the latter, and write an explanation of their “good-faith, reasonable, and documented belief” that this opt-out is fraudulent, they have to learn GPG and Keybase, and no growth-hacking marketer is going to spend time doing that.

This script works so far. They just do the opt-out. Sometimes they’ll even treat the opt-out alone as a Request to Know and/or a Request to Delete, because seriously, time is limited and once you have decided to just give the privacy nerd what they want, you might as well get all that nerd stuff out of the way at once.
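For anyone who wants to try the same idea without the shell script, here is a rough Python sketch of the two steps: build the letter text, then clear-sign it with GnuPG. The letter wording, the function names, and the example Keybase URL are placeholders invented for this sketch, not the contents of the actual script or letter. The `clearsign` step assumes `gpg` is installed and a default secret key is available.

```python
import subprocess
from datetime import date


def build_opt_out_letter(company, name, email, keybase_url, today=None):
    """Return the plain text of an opt-out request (placeholder wording)."""
    today = today or date.today()
    return "\n".join([
        f"To: {company}",
        f"Date: {today.isoformat()}",
        "",
        "This is a request to opt out of the sale of my personal",
        "information under the California Consumer Privacy Act (CCPA).",
        "",
        f"Name: {name}",
        f"Email: {email}",
        f"Keybase profile: {keybase_url}",
        "",
    ])


def clearsign(text):
    """Clear-sign text with the default GnuPG key and return the result."""
    result = subprocess.run(
        ["gpg", "--clearsign"],  # reads stdin, writes the signed text to stdout
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The signed output of `clearsign(build_opt_out_letter(...))` can then be pasted into a message and edited before sending.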

Next steps? Make it work for more people, opting out of more companies. Right now this is just a simple version of the items needed for a “real” opt-out.

  • credibly claim that I am the person identified by data points that can be found in a marketing database (email, phone, IDFA/AdID, postal address…)

  • credibly claim that I am in a jurisdiction where I have a privacy right

  • assert that the sender of this letter has the right to act on a privacy request for me (in this case because it’s me)

To be practical at scale, all of these need good UX, working, scalable implementation, and an effective legal payload. Some opt-outs will be on-demand, one company at a time, and others will be larger, in cases where you want to opt out of every company engaging in a specific practice and have your authorized agent do it.

But for now, for someone with my mail setup, doing CCPA is pretty convenient.

More, including template support: CCPA for nerds, part 2

06 Jun 2020

CCPA as a general-purpose activist tool?

(Update 30 May 2023: add link to GDPR objection)

The final text of the California Consumer Privacy Act (CCPA) regulations is up, along with the final statement of reasons from the Attorney General’s office. So it might be a good time to think about how the CCPA can be a useful tool for connecting and sharing with brands about policy issues on which you disagree.

CCPA is not just for privacy nerds. If you have a problem with a company over some other issue (pollution? wage theft? tax avoidance?) then the CCPA provides a way to register a complaint and make it stick. Companies do have to pay attention to CCPA requests.

A lot of ways to express alignment with a cause are things to not do, such as not buying or not crossing a picket line. CCPA requests can be almost as lightweight, but are unambiguously something to do.

There are three kinds of CCPA requests.

  • Opt out (Do Not Sell)

  • Right to Know

  • Right to Delete

Opt-outs are the easiest. They do not have to be verified. This is a fast, lightweight way to express that you have a problem with a company.

“A request to opt-out need not be a verifiable consumer request. If a business, however, has a good-faith, reasonable, and documented belief that a request to opt-out is fraudulent, the business may deny the request. The business shall inform the requestor that it will not comply with the request and shall provide an explanation why it believes the request is fraudulent.”

If you just do an opt out, they can’t make you go fill out a bunch of web forms for ID verification while they shoot at your feet. They have only 2 choices.

  1. Act on your opt-out

  2. Send you an explanation of their “good-faith, reasonable, and documented belief” that your opt-out is fraudulent.

The other two require identity verification, so are time-consuming to do as an individual. They do more for you, and send a harder-to-ignore message, but they can make you do identity verification steps.

An opt-out is easy, but what does it really do? Most companies will say that they don’t “sell” your personal info for money. However, CCPA has a broader definition of selling, which lawyers say is confusing, and service providers offer to help with. The CCPA defines “Sale” as

selling, renting, releasing, disclosing, disseminating, making available, transferring, or otherwise communicating orally, in writing, or by electronic or other means, a consumer’s personal information by the business to another business or a third party for monetary or other valuable consideration.

This definition covers some common marketing practices. Interestingly enough, while Facebook is trying to carve out a CCPA exemption for themselves, they do require their advertisers to honor opt-outs. The Facebook Custom Audiences Terms say

“You represent and warrant that the Hashed Data does not relate to data about any individual who has exercised an option that you have, directly or indirectly, committed to honoring or provided to opt out of having that data disclosed and used for targeted advertising. To the extent an individual exercises such an opt-out after you have used data relating to that individual to create a custom audience, you will remove that data subject from the custom audience.”

From the privacy nerd point of view, it makes sense to get as many people as possible opted out everywhere. But from the point of view of influencing companies on other policy issues, can CCPA campaigns fill a niche between just Tweeting and more time-consuming actions?

Here’s a CCPA opt-out email I have been working on. Put in your own cause and contact info. This is a little long because there has to be some explanatory material for now. A lot of companies are still learning CCPA, and they have a bunch of marketing stuff going on that they might not be aware of. (Maybe last summer’s marketing intern uploaded a customer list to an account they don’t have the password to any more?) For now the responses can be all over the place.

If you want to try CCPA-ing some companies to get a sense of how it works, try this:

Californians, did you know you can now opt-out of companies selling your personal information online? You have this right under a new CA privacy law, but we'd like your feedback. Sign up below for our volunteer study, and help us stand up for your privacy.

— Consumer Reports Advocacy (@CRAdvocacy) Twitter, May 14, 2020

18 May 2020

The surveillance economy is more like the commodification economy

Previous: Open source businesses, meet the real world

If we want to do anything about the surveillance economy problem, it helps to try to figure out why it’s so important to so many people to do surveillance marketing (working definition: any database marketing where some or all of the data used is derived from surveillance) instead of other kinds of marketing. One way to look at it is that it’s not about the surveillance. Surveillance is a tool in a more complicated process. Today, the surveillance marketing business looks like the Establishment, but it used to be cool (business cool anyway) back when it was emerging from the open source scene of the late 1990s and early 2000s.

Doc Searls wrote, Madison Avenue fell asleep, direct response marketing ate its brain, and it woke up as an alien replica of itself. Part of that was driven by the realization that the same Commoditize Your Complement strategy that applies to device drivers and other open source software can also apply to every business, everywhere. It’s more of a commodification economy than a surveillance economy, but we got started saying surveillance economy so we might as well stick with it.

The general direction of the “surveillance economy” is to build a system where reputation graphs gain sustainable value, and goods and services (whether they’re PC hardware, or software, or songs, or news stories, or home delivery of stuff) lose value until they become commodity nodes on someone else’s reputation graph. Information goods such as software code can and should be public, while reputation graphs are hidden by legal restrictions, complexity, and scale. And there’s a reputation graph for everything. Amazon sellers. Uber drivers. News sites. The same adtech companies that facilitated content commodification in 2014 are facilitating content commodification today. Web adfraud is just as necessary for web adtech as extra drivers cruising around are necessary to Uber.

The first market in information goods to get commoditized was the slothful, overpriced Unix business, which was already being starved of investment as CIOs and analysts talked up the inevitability of Windows NT. When Linux came along, the commodity was an improvement. Naturally, when the commodification strategy worked in one business, the winners went on to apply it elsewhere.

  • Use a stack of janky PC hardware instead of splashing out on a Digital Unix server: cool hack

  • Use a stack of miracle cure peddlers instead of splashing out on a star MD: maybe not so much

YouTube doesn’t show you videos about drinking bleach because they want you to drink bleach. They show you videos about drinking bleach because the recommendation algorithm is set up to avoid creating the kind of powerful stars who tend to emerge on other media. The bleach-drinking advocacy is just a side effect of a system designed to steer you away from any content creator who might develop market power.

Algorithms that artificially boost “long tails” in content are an essential part of the commodification strategy. (We can see this happening but it’s hard to quantify from the outside. We won’t see it for real until the discovery phase of some ugly lawsuit over some tragic event.) For a platform company, fraudulent and misinfo content doesn’t look good in the PR clippings, but it’s manageable if you can get enough Serious People to write slick PDFs about how it’s an Industry Wide Problem. Allowing the algorithm to let user preferences create bankable stars would be a real threat.

For every bad thing on the Internet, there is an “if this goes on” dystopia story. If the surveillance economy keeps going the way it’s going, we’ll all end up working for and buying from one big company built around a really complicated reputation graph that measures who can catch the tastiest rats in the ruins of civilization. But no dystopia narrative runs out all the way (yet). Just as the surveillance economy dystopia narrative mostly ended the IP Maximalism dystopia narrative, we’re going to get a new one.

So that’s the fun part. Which marketers will manage to de-commodify themselves? (Maybe think economic signal, using the whole customer including their own hard-wired monkey brain ability to process reputation and how brands are a cognitive hack on that, and so on.) One piece of good news. Section 230 of the Communications Decency Act helps platform companies enable commodification, but other laws and regulations are giving us tools to resist it. More on that soon.

14 Apr 2020

Conference talk abstracts and links

Southern California Linux Expo, 2023: Leveling up your open-source privacy toolset for 2023 Most of us have built up a collection of privacy tools, including browser extensions, settings, and services. But privacy threats keep changing, while at the same time new kinds of tools and services have become available. This talk will help maximize the value of your privacy toolset for today’s best options, while helping you plan for the future. (March 12, 2023)

Southern California Linux Expo, 2020: Hacking the California Consumer Privacy Act for Fun and Profit (and freedom and privacy) A new California law that will help protect you from scams and misinformation, by taking control of your personal information. For legit sites that respect people’s privacy, CCPA helps shift the balance of power on the web away from the “data brokers” that share people’s sensitive health and finance information on the Internet, and toward sites that people trust. (March 5-8, 2020)

Southern California Linux Expo, 2020: Designing a market to reduce software risk and compensate open source contributors Developers face the risks of under-incentivization, leading to lack of sustainability of key projects. Meanwhile, users face the risks of missing or delayed features, bugs going un-fixed, and software projects going unmaintained. This project is one of several recent attempts to improve transparency, incentivization, and sustainability in the production of software. (March 5-8, 2020)

Trust, Identity and Data Privacy: A Multistakeholder Approach: Report from Mozilla’s test of their global consent browser plug-in tool Users need help managing their privacy preferences across multiple web services. Mozilla is testing “Global Consent Manager” — a browser tool for that purpose. Learn about the test and the surprising initial data about engagement. (April 24-25, 2019)

Nudgestock Festival Why exactly does digital advertising make people feel uncomfortable? What are its downfalls? And how does GDPR present an opportunity for brands seeking to reach people online? (June 2018)

Mumbrella MSIX, with Eaon Pritchard Don Marti, open source innovation strategist at Mozilla, and Eaon Pritchard, head of strategy at UM, will take the stage to explore the ‘signal’ in digital. With web advertising grabbing headlines for fraud, privacy and trust issues, Marti and Pritchard will look at making it possible to use the web as a different kind of advertising medium, one that works better for carrying an economic signal.

FOSDEM 2018 In order to make open collaboration more effective, we are using simple market mechanisms to add incentives to do useful work. This demonstration shows how users can put financial value directly in the hands of the people who can fix the software issues that are most important to you, and how you can discover which issues really matter to your project’s users. (February 3-4, 2018)

MozFest 2018, with Matt Snell We will discuss how consent management on the web works today, and the relationship between user privacy and reputable content providers. Web users face a confusing array of data sharing choices, and click fatigue can lead to poor user experience and possible inadvertent selection of options that do not match the user’s privacy norms. Reputable content providers face revenue disruption arising from changing data privacy regulations. (October 26-28, 2018)

28 Mar 2020

Can database marketing sell itself to the people in the database?

Within two years or so, marketers will have to deal with a new technical and regulatory climate, where a person’s information is only in a company’s database if that person wants it to be. If a person doesn’t want to be tracked and targeted, or hasn’t heard of you, their information will not be available to you.

This might sound like a big change, but it’s where privacy technology and regulation are headed. Here in California, we will keep signing and passing privacy initiatives until marketing practices that we see as creepy go away. If privacy settings on a web site or app are too confusing, people will ignore them and fix the privacy settings through the political process. The harder it is to work through today’s California Do Not Sell process, the easier it will be for the next initiative to get passed. And the next after that. The California privacy initiatives will continue until California voters are happy with how the least trustworthy companies handle their information.

And just as privacy laws and regulations are popular in politics, privacy features are popular in products. (If they weren’t, Apple wouldn’t be all What happens on your iPhone stays on your iPhone. The billboard would be iPhone shares your life with brands you love or something.)

Today, database marketing companies try to make it a pain for people to exercise their right to have personal information deleted. But that can only delay what’s coming, and build support for proponents of stricter privacy laws. And those laws don’t have to be perfect. I don’t have to get every record mentioning me out of every database everywhere. I just have to be hard enough to reach with personally targeted ads and other forms of database marketing that it becomes too costly to do it, compared to other ways of getting a message in front of me. (Targeted ads are a form of database marketing. Doc Searls explained that a while ago. And anonymized ad tracking identifiers are PII until proven otherwise.)

Today, people have a choice.

  • accept tracking and targeting

  • do a bunch of boring privacy nerd stuff to get out of tracking and targeting

  • vote to change the game

The third option is the only easy or effective one. The result is that at some point anyone doing database marketing will have to make a case to every individual in their database for why the marketer should have that individual’s information. If you can’t convince me to let you have my info—not just trick me into fake consent or bore me into giving up—you’re not going to have it. When you sow dark patterns, you reap privacy regulations.

Outside of adtech/martech, business will go on pretty much as usual. I will put up with being in someone’s database to facilitate a transaction. You can’t ship my order unless you know where I live. I might even give up my information as part of a sales process or to receive a newsletter. I’m fine with being on a subscriber list for a publication, not just in order to receive information and culture, but also to be part of a community of some kind. In those cases, though, I’m still aware that I’m making a trade with a known party, and I’m accepting risks. And I’ll sign a privacy initiative that limits what you can do with my info once you have it.

Can database marketing sell itself to the people in the database?

What objections are marketers going to have to overcome to convince people to consent to the use of their information? How can database marketing justify itself to the people in the database? Today, I’m not convinced that database marketing can do that.

  • Will you use my information for price discrimination or discrimination against protected groups of which I am a member? For example, if you know my ZIP code will you quote me a higher price if I’m far from a competing big box store selling the same item? If you know something about my family life will you decline to show me ads for housing or jobs?

  • I have enough stuff, and if I want to buy more stuff I’ll go search for it. Why are you tracking me instead of putting your marketing efforts into content marketing and SEO that will help me when I need to find your product?

  • Why are you spending money on targeting me instead of buying me ad-supported content and services? When you can’t reach me by targeting me, you have to sponsor some kind of content or service that I might want.

Wait a minute, don’t I want more relevant ads? Of course not. When an ad is targeted to me, it’s easier to make it deceptive. I want the ads that your existing customers and employees see, too. I want the ads that the editors of the trade publications that cover your industry can see. I want the ad that your parents see.

Don’t publishers make more money from targeted than non-targeted ads? Yes, in a system where both kinds of ads are available reaching the same audience. I’m not interested in being targetable by anyone. If you want my eyeballs, pay for valuable content.

You might be able to sell me on being listed in a database if you offer me meaningful discounts. One possible result of the shift away from database marketing without consent is driving more and more marketing attention to membership programs like Costco and Amazon Prime. Instead of surveillance marketing to individuals, you will have to get inside a wall by dealing with one big company.

Anyway, more later.

Update 4 May 2024: Are users of privacy tools buying worse products and services?

25 Mar 2020

a likely near future for web ads

Michael D. Silberman from Piano writes,

Every day, users provide media companies with the type of explicit, accurate, data other businesses covet. Consumers willingly volunteer this data as they register or subscribe in order to unlock more content or build a deeper relationship with their site of choice. And it’s exactly this type of information that, if used right, can push those companies ahead in this new data landscape.

Piano is one of several service providers enabling the web ad business to migrate from the third-party cookie to SSO identifiers for ad placement and attribution. Another example is How LiveRamp Plans To Win After ‘Seismic Impact’ Of Chrome And Regulatory Changes | AdExchanger

LiveRamp’s Authenticated Traffic Solution (ATS) is creating more chances to match online users based on email. Publishers using ATS ask readers to submit their email for free access to content. That email address can potentially be matched to the IdentityLink profile, creating a chance for one-to-one targeting without having cookies, and with clear consent because the email was given to the site. Howe said LiveRamp has 12 SSPs and 30 DSPs buying on ATS or committed to its adoption.

One possible future for post-cookie web advertising is going to work something like this: if you’re signed in to a site, you’re going to get something pretty close to adtech as usual, except limited to the group of sites where you’re willing to sign in. So if two publishers can both use a registration wall to get your email (or SSO that maps to your email, which is basically the same thing) then the same ads will “follow” you across both those sites, and you’ll see ads targeted based on loyalty programs you opt into.
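Here is a toy sketch of why the reg wall becomes the boundary of the game. It assumes each site turns a login email into a hashed identifier that can be compared across sites. The hashing scheme, the function names, and the example domains are assumptions made up for illustration, not a description of how Piano, LiveRamp, or any other vendor actually does it.

```python
import hashlib


def login_id(email):
    """Normalize an email address and hash it into a shared identifier."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def sites_where_ads_follow(email, signups):
    """Return the sites that can recognize this person.

    signups maps a site name to the set of login IDs collected at its
    registration wall.
    """
    uid = login_id(email)
    return sorted(site for site, ids in signups.items() if uid in ids)


# Two publishers got the email past a reg wall; a third site did not.
signups = {
    "news.example": {login_id("Reader@Example.com")},
    "recipes.example": {login_id(" reader@example.com")},
    "clickbait.example": set(),
}
followed = sites_where_ads_follow("reader@example.com", signups)
```

In this sketch `followed` contains only the two sites where the reader signed in, which is the point: a site that can’t get a sign-in has nothing to match on.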

This means an increase in market power for publishers compared to the conventional third-party cookie system, because crappy and fraudulent sites will have a hard time getting your email or SSO. For advertisers, the game of tag, trying to get ads in front of specific people, continues, except that the boundaries for the game are brought in to include only sites that can get people past the reg wall. LiveRamp CEO Scott Howe explains, in Why Addressability Will Flourish In A Cookieless World | AdExchanger.

On sites where you’re not signed in, you’re going to get ads for miracle fungus cures, predatory finance schemes, and other bottom-feeder stuff—unless you’re running a browser with built-in targeting/attribution (the stuff being discussed at W3C’s advertising business group) and leave it turned on. In that case your browser will do magic JavaScript tricks to give you reasonable-looking ads from legit companies—but the site still has an incentive to get you signed in if possible. (The clickbait of today is a photo and headline. The clickbait of two years from now will be a photo, headline, and a killer first two paragraphs to run above the SSO button.)

Fraud doesn’t go away entirely in this scenario, but the difficulty of fraud schemes does shift, and tends to make other platforms, such as streaming media, into more attractive fraud targets.

03 Mar 2020

oversimplified ad network

(updated 31 Jan 2021: placeholder ad in step 1)

Problem: Blog ads don’t pay and web ads don’t carry economic signal. And if you use a regular ad network you’ll probably get a bunch of shitty ads on your blog anyway, so not worth it.

Thinking about a solution (work in progress, comments and suggestions welcome)

How it works (for bloggers)

  1. Put the ad tag on your blog, apply to the ad network stating how much you want per day for that spot on your blog. (You can attach analytics and stuff to your application, but a lot of that stuff is bogus anyway so don’t sweat it.) At this point, a sample placeholder ad starts showing up on your blog, so the other members can check it out.

  2. The other members vote. Result can be: you’re in at the rate you asked for, you’re rejected, you get told to come back with a lower rate or change your site design to give the ad better placement or both.

  3. If you’re in, you’re in. Any time a paying ad gets accepted, it runs on every member’s blog and everyone gets their daily rate.

  4. If you want, you can make a “house ad” for your blog that might run (members vote) on everyone’s blog, on days when there is no paying ad.

  5. As long as you are a member, you can vote on ads, new members, and the house ad of the day for unsold days.

How it works (for advertisers)

  1. You can see a list of members, the site analytics they choose to share, and an estimated total rate at any time.

  2. You make an ad and upload it, along with the URL of the landing page you want it to link to.

  3. Members vote up/down. (Your ad must get a 2/3 yes vote, but it’s a short voting time and a missing vote counts as a yes. So you’re probably in, unless you’re that gut doctor begging people to throw out vegetables in which case go try a conventional ad network.)

  4. You pick days you want the ad to run, and sign an insertion order.

  5. Your ad runs. You are charged only for blogs that were members at the time you signed your IO. Members approved between the time you signed your insertion order and the time your ad ran are free.

  6. You get an invoice and pay it. (No, you don’t have to pay for your ad up front. If the bloggers didn’t trust you to pay an invoice they wouldn’t have trusted you to sell your stuff to their readers.)
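The two rules with actual arithmetic in them, the ad vote and the invoice, can be sketched like this. It is a minimal sketch under some assumptions that are not spelled out above: “a 2/3 yes vote” is read as at least two thirds of all members, a member who joined on the day the IO was signed counts as billable, and the function names and data shapes are made up for the example.

```python
from datetime import date


def ad_approved(members, votes):
    """Decide whether an ad is accepted.

    votes maps a member to True (yes) or False (no). A member who did not
    vote before the deadline counts as a yes.
    """
    yes = sum(1 for member in members if votes.get(member, True))
    # Integer arithmetic avoids rounding trouble around exactly 2/3.
    return 3 * yes >= 2 * len(members)


def invoice_total(members, io_signed, days):
    """Total owed for a run of `days` days.

    members maps a blog to (date it joined, daily rate). Blogs approved
    after the insertion order was signed run the ad for free.
    """
    billable = sum(rate for joined, rate in members.values() if joined <= io_signed)
    return billable * days


members = {
    "blog-a": (date(2020, 1, 1), 10),
    "blog-b": (date(2020, 2, 1), 5),
    "blog-c": (date(2020, 3, 10), 20),  # joined after the IO was signed
}
total = invoice_total(members, io_signed=date(2020, 3, 3), days=2)
```

With three members and one “no” vote, `ad_approved` returns True, since the two non-voters count as yes and 2 of 3 meets the threshold. The advertiser in the example is billed for blog-a and blog-b only.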

The reason it’s one ad everywhere is to try to get some signal out of it. Readers will be more likely to check out their favorite blogs in the network, and not block ads there, when an ad is more like a magazine ad or a trade show booth—a clear signal that the advertiser is investing money in reaching a community of practice.

(Yes, this is sort of the spawn of The Deck and Project Wonderful, and both of those failed, but these are the days of countdown to the end of the third-party cookie, so worth trying again imho.)

29 Feb 2020

lazy loading images

Update 2 Jan 2022: Link to caniuse.com

As you might have noticed, the main page of this blog is not paginated. The text goes all the way back to the beginning. Fortunately I don’t write very fast so I can keep filling it up and it’s not even especially big for a web page today.

But the iframes and images can get big.

So I just put the lazysizes script, by Alexander Farkas, on here. So far so good.

Easier than setting up pagination. Please let me know if you run into any problems.

I have switched the images to loading="lazy" but I’m keeping this script for now, for iframes, because Firefox only supports lazy loading for images.

16 Feb 2020

three reasons privacy regulations and tools will result in an economic boom

(update 6 Aug 2020: edit for clarity, note at end)

Just popping up for a quick look at the big picture before digging into the details of advertising markets. Realistically, privacy regulations and tools are going to have a bunch of positive impacts on the economy as a whole.

  • Reallocation of information and communications technology (ICT) investments and skills. As investment shifts out of surveillance marketing, available financial and human capital in ICT will end up moving to positive-sum interactions. Capital can move remarkably quickly in response to regulation as we saw in the early 2000s. The Digital Millennium Copyright Act drove investments away from many kinds of infotainment startups, and into surveillance marketing.

    Today, the regulatory pressure is in the other direction. Not all of the shift will be away from marketing in general. While no-consent surveillance gets harder, other investments in marketing technology will show greater returns. For example, a mapping service will be able to add fuel, restroom, and food stops to planned trips—based just on express intent and on information supplied by businesses. And contextual targeting is likely to keep getting better.

    Important to understand the extent to which different “Big Tech” companies are exposed to different aspects of the transition. (Search and map ad revenue is less at risk than audience-based ads tied to poorly understood social surveillance practices, where voters will keep supporting laws and initiatives until the creepy headlines stop.)

  • Increased market power for news and cultural industries, resulting in a news and culture boom. This is not just because of context-based targeting, but because limitations on tracking users across contexts will remove some of the so-called “infinite inventory” that tends to drive down ad rates everywhere. More ad money for creative businesses means we get to see more high-risk, potentially high-value news and creative works, such as big investigative reporting projects.

  • Lower expected returns to some kinds of crime. User-targeted ad placements give a persistent structural advantage to deceptive sellers. Limits on targeting will reduce many opportunities for fraud against both legit advertisers and against end users. A stronger news and cultural industry will also be better funded to deliver trustworthy product reviews.

Rapid deployment of privacy regulations and technology is likely to result in a privacy dividend for investors, and a potential risk of an “overheated” economy in general. Much of the boom will hit the news and creative industries, but there’s probably some good news here for small businesses relieved from the pressures of fraud and bidding up the price of access to audiences.

I wrote this in February. Now the overheated economy risk is…not so much. Although some surveillance marketing advocates are asking for delays in privacy enforcement, it’s more likely that we will all do better by speeding up.

10 Feb 2020

Play money bug futures market

Developers would prefer to release software at a high quality level. Users prefer to use software at a high quality level. However, firms are incentivized to release software at a lower quality level than would be chosen by either developers or users.

How do you design a system that lets users quantify and hedge the risks of low-quality software, while compensating developers to do the extra work to bring the software up to a higher quality level?

A variety of systems have been proposed, including subscriptions, bounties, and dominant assurance contracts. A market for trading software issues describes another possibility: futures contracts on bugs/tasks.

A market (play money only) based on this research is set for a soft launch the first week of March. For a demo, account, and some free (play) money, please contact me directly.

20 Jan 2020

Removing extra, complex markup from a HTML page

Interesting question from Doc Searls, on Twitter:

Is there a way to export a non-complex Medium post (one, say, just with links) that isn't full of cruft like "p id="4f89" class="gw gx ar bj gy b gz ha hb hc hd he hf hg hh hi hj" data-selectable-paragraph="" ? How about an "Export as old school html?" option in the ••• menu?

— Doc Searls (@dsearls) January 16, 2020

Here’s one way. Save your messy HTML file (here I’m calling it index.html), then:

pandoc -f html -t commonmark < index.html \
| grep -v '<' \
| pandoc -f commonmark -t html -o clean_index.html

Open clean_index.html and there’s your old-school web page.

What this command is doing is…

First, convert the contents of index.html to the simpler CommonMark markup format.

pandoc -f html -t commonmark index.html 

Next, filter out any lines containing HTML markup that Pandoc wasn’t able to translate.

grep -v '<' 

Finally, convert back to HTML and write it out to a new file.

pandoc -f commonmark -t html -o clean_index.html
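If you do this a lot, the three steps above can be wrapped in a small shell function. This is just a sketch, assuming pandoc is installed and on your PATH. The names drop_markup_lines and clean_html are made up for this example and are not part of Pandoc.

```shell
#!/bin/sh
# Hypothetical helper functions for illustration, not part of Pandoc.

# Drop every line that still contains a '<' character
# (the same filter as the grep step above). Reads stdin, writes stdout.
drop_markup_lines() {
    grep -v '<'
}

# Convert a messy HTML file ($1) into a simpler one ($2).
# Usage: clean_html index.html clean_index.html
clean_html() {
    pandoc -f html -t commonmark "$1" \
        | drop_markup_lines \
        | pandoc -f commonmark -t html -o "$2"
}
```

One thing to watch: the filter drops any line containing a '<' character, so a line of ordinary text that happens to include one will disappear too. Check the output before you throw away the original.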

Where to get Pandoc, and more info: Pandoc

At Southern California Linux Expo a few years ago I did a talk with more Pandoc tricks: Using git and make for tasks beyond coding [LWN.net]

By the way, I’m scheduled to speak at SCALE again this year: Hacking the California Consumer Privacy Act for Fun and Profit (and freedom and privacy), so hope to see you there.

12 Jan 2020

Facebook, brown M&Ms, and skin in the game

(update 20 Jan 2020: add link to Bill Fitzgerald’s blog post.)

(update 18 Jan 2020: add embedded Tweet of fake McDonald’s ad.)

This is the long answer to the question: why are you tweeting screenshots of stupid Facebook scams? On Twitter it might look like I’m just randomly talking shit about Facebook, but I do have a point here.

Here’s an example of a Facebook Page whose owners uploaded my contact info without permission. It doesn’t look like that’s really their logo, either.

Facebook scam, fake “Amazon”

I didn’t set out to look for scam advertisers on Facebook. I did visit “Settings” → “Ads” there, in order to make sure to send a CCPA data deletion (CCPA letter, shell script) to some well-connected nodes in the surveillance marketing network. Hitting the cute little “Do Not Sell” button on content sites is a lot of effort for a little CCPA win, so the best CCPA strategy is to focus on the big players, such as:

  • Acxiom

  • Experian

  • Epsilon

  • LiveRamp

  • Oracle

Those were some of the companies I expected to have my info, and sure enough I found them in Facebook ad settings. So they’re on my CCPA list. I don’t expect companies like that to make the CCPA process easy, but I will do my part for the California creative boom of 2020.

Anyway, back to Facebook scams. While I was making my CCPA list, I also saw a bunch of Facebook advertisers like the Amazon one above, and this fake “Gap Inc.” This one not only uploaded my info, but also got Facebook to let them match it against Facebook’s existing user data, and re-sell the result.

Facebook scam, fake “The Gap”

What’s the big deal, though? They’re just regular scams, and I don’t buy stuff off of Facebook anyway.

Here’s why I think it matters. Obvious scams are a helpful way to see how well Facebook is enforcing its own policies on ads. Van Halen didn’t really dislike brown M&Ms, but their contract for live shows included a section banning brown M&Ms backstage. The part about the brown M&M was buried in the middle of a bunch of technical and safety requirements for the show. If the band saw a brown M&M, it was a warning to check again for harder-to-find safety issues.

On Facebook, a lot of the worst problems are the hardest to see. Any halfway decent state-sponsored political misinformation operation is going to be effectively invisible to me, and to academic and NGO researchers, even with Facebook “ad transparency.” The misinfo people have probably been uploading a bunch of variants on the same ad creative, to make it impractical to check it all. They have an inoffensive, generic name, use a US-based Facebook user with a clean account to be their point of contact, and carefully filter their Custom Audiences to key purple-state voters. And, as long as they don’t tag their ads political, and take them down before anyone reports them as political, the ads won’t be available afterward in Facebook’s Ad Library. Bill Fitzgerald explains in a blog post summarizing a recent Twitter Q&A session.

Can we take Facebook’s word for it that they’re doing anything about sneaky, invisible state-sponsored misinformation? I doubt it—when they’re serving a big bowl of brown M&Ms, in the form of obvious scams, to everyone who looks at their ad settings page. It’s hard to believe that they take an invisible problem seriously when the visible problems that would get fixed as a side effect of addressing it are still there. (And, of course, when deceptive ads serve the company’s own interests. Facebook is pushing an ambitious cryptocurrency scheme that depends on approval from US regulators, and the results of the 2020 election will decide who those regulators are.)

Facebook has two sets of rules for deceptive ads: the written rules that they show to media and the government, and the unwritten rules that they teach to their scam and misinfo advertisers by example. The unwritten rules, which encourage deceptive advertising, matter. The written rules, not so much.

Hypothetically, what would Facebook do if the company’s true intent on misinformation matched the written ad policies they claim to enforce? They would deploy a few fairly basic “skin in the game” fixes.

  • No more credit card payments for advertising, invoices only, net 90. If an advertiser is so untrustworthy that Facebook doesn’t even know they’ll pay their invoice, then that advertiser is not trustworthy enough to put in front of a user.

  • Rewards for reporting violations. If a user reports an ad that violates a policy, and the advertiser gets kicked off, then let the user keep the ad money that came in from that advertiser. Include the owners of email addresses and phone numbers who get added to Custom Audiences without consent in the program. (And no, this does not incentivize users to deliberately post scam ads and report them, because they would just get their own money back.)

  • Notify advertisers when their ad ran on content, or near an ad, that is removed for policy violations. Right now a lot of important brand safety issues are hidden, because advertisers can’t see context. Give the legit advertisers the visibility they need to decide how much brand safety risk to take.

Top management at a large organization cannot micromanage for trust and quality. But they could, if they wanted to, set up the culture and incentives to make it important to all decision makers at the organization. Right now Facebook is set up to encourage and profit from deceptive advertising while imposing deceptive advertising’s costs and risks on their users and on society in general.

All right, enough fun with zany scams, back to CCPA-ing any real company that put me in a Custom Audience.

31 Dec 2019

CCPA as a game mechanic?

Doc Searls pointed out, back in 2013, that direct marketing has taken over the advertising business. At the same time, the optimal balance between brand building and sales activation has shifted toward brand building. Advertising done right can be a source of funding for creative work, as part of a positive-sum cooperative game involving honest sellers and discerning but time-limited buyers. Today’s Internet advertising doesn’t manage to pull that off, because of its persistent fraud and brand safety problems.

But marketers are citizens and parents, too. The struggle over advertising’s alignment isn’t about privacy nerds on one side and marketers on the other, it’s about forming a connection between people looking at the advertising problem from both sides. A privacy developer building a system to help users control the use of their personal information is working on one subtask of the same project as a marketer who needs a trustworthy platform to build brand equity.

The problem, since the rise of third-party cookies in the first dot-com boom, has been that negative-sum marketing investments are much better at justifying money spent on them than positive-sum, signaling-based investments. As a marketer, it’s easy to show numbers to prove the success of a creepy project, while leaving the long-term damage to the brand to the next person.

Now for the good news.

Starting tomorrow, the California Consumer Privacy Act will give us an important tool to shift the balance between positive-sum and negative-sum advertising, by interrupting the data flows that allow for the placement and measurement of the bad stuff. The CCPA is an important tool to help marketers concerned about brand equity, to redirect ad budgets to support the creative work that we want. January 1, 2020 is CCPA Day, and the beginning of the journalism and culture boom of the 2020s.

It will take some work, though. CCPA without action by citizens is just a piece of paper. It only works if people take action to opt out, have their data deleted, or both. And the right companies to contact for maximum impact are usually the shadowy data brokers that you might not think of. Oracle is a database company and a sailboat sponsor, right? Yes, but they’re also the owner of a collection of database marketing companies and an important node to disconnect from the attack path that leads to me. Even though I’m a privacy nerd from way back, the prospect of opting out and deleting my data from all the shady companies out there looks like a boring grind, even though I know I’ll enjoy more ad-supported news and cultural works in the future.

At least I don’t need to do CCPA stuff to protect myself from anything already covered by privacy technology. I don’t need to spend human time dealing with a problem that a machine can block. That handles a lot of the ankle-biters of the Lumascape. But we still have a bunch of important nodes on the surveillance economy network to get to. Boring.

One way to get a good-sized pool of opt-outs and deletions in place will be to gamify it. I’m using a browser extension to log when an opt out or deletion has happened, and report it to the back end. From there, connect it to an inventory or deck-building feature of the game, so the more opt-outs you do, the better your chances of winning.

Work in progress: source code

30 Dec 2019

bug futures references

our publications

A market for trading software issues | Journal of Cybersecurity

Marktplatz zur Koordinierung und Finanzierung von Open Source Software by Georg J. P. Link, Malvika Rao, Don Marti, Andy Leak and Rich Bodo

A trading market to incentivize secure software: Malvika Rao, Georg Link, Don Marti, Andy Leak & Rich Bodo (PDF) (presented at WEIS 2018)

references

Corporate Prediction Markets: Evidence from Google, Ford, and Firm X (PDF) by Bo Cowgill and Eric Zitzewitz.

Despite theoretically adverse conditions, we find these markets are relatively efficient, and improve upon the forecasts of experts at all three firms by as much as a 25% reduction in mean squared error.

(This paper covers a related market type, not bug futures. However some of the material about interactions of market data and corporate management could also turn out to be relevant to bug futures markets.)

29 Dec 2019

Day-to-day experiences with bug futures

In theory and in the lab, futures contracts are a proposed way to make the missing connection between open source and markets. Has everyone read “A market for trading software issues” in the Journal of Cybersecurity? Here’s a figure from Marktplatz zur Koordinierung und Finanzierung von Open Source Software: figure 1

The next interesting question is: how do futures work in a real project? Are bug futures just high-tech piecework? These are my subjective notes on participating in a small project using futures. (We got in the habit of using FIXED and UNFIXED, all-caps, for the two sides.) It feels like we’re on to something, that the market is adding some information sharing and coordination power that’s not available in the bug tracker alone. I’m looking forward to using markets in more projects in the future.

Habits: I did get into the habit of quickly selecting what to work on based on my FIXED positions. I think I’m more of a loss avoider than a profit maximizer, and I probably passed up some chances to buy into something I could have finished faster, and just take a loss when an issue where I held FIXED ended up being unfixed on the maturity date. Something to try in the future: I might be more willing to try to resell my FIXED positions at a loss if the project had more traders.

Pricing: As a project contributor, I tended to use price as a signal of my confidence in being able to get something to work. For an issue with a good description and (imho) a straightforward fix, I would offer to buy a large quantity of FIXED at a higher price, which means putting more of my own tokens at risk. This should help other participants judge the likelihood of completion of particular tasks by particular dates. The actual prices in the live, small, market ended up being quite a bit higher than the examples in the paper.

I did end up offering extremely low prices for less well specified issues. This was, I think, a useful signal for the people requesting the features. The better specified the issue, the more likely to get a reasonable offer. I don’t know how this would be different in a project with more random wishlist bugs. There might be a trading opportunity for people willing to hold FIXED through the process of clarifying a feature request.

Maturities: I think a sensible “portfolio” view is important, and would like to experiment with better ones. As a random part-time contributor it was important to me never to build up too big of a workload for a particular maturity date. I did find myself making offers on an existing issue that didn’t match an UNFIXED offer from a user, because I wanted a later maturity date. Offering a low price and a far-off maturity date was the best way to signal that either this issue is not comprehensible enough for me to fix, or it’s too much for a single issue and needs to be split up.

I would like to see more live data on whether feature requesters try to buy UNFIXED positions on contracts with less crowded maturity dates (dates when fewer other contracts mature) to have a better chance of getting attention.

Next steps

If you’re interested in trying bug futures for a live project, please let me know. I already have futures hooked up to one that’s using Python+Flask for the server, and JavaScript+WebExtensions for the client. The market can also be hooked up to an existing product. It currently supports GitHub, but more integrations are certainly possible.

08 Dec 2019

A trend for 2020?

(This is a copy of an answer that I posted to a marketing list.)

There’s a strong possibility that 2020 could be the year that two big trends collide.

  • Privacy opt-outs and objections, enabled by laws such as CCPA, which goes into effect on Jan. 1.

  • Public concerns over brand safety, Sleeping Giants style—amplified by the contentious online environment around the 2020 US election, and the anticipated flood of misinformation, extremist, and harassing content.

The conventional wisdom now around brand safety is that it’s a manageable concern, and that it’s acceptable to have a certain percentage of a brand’s ad impressions end up on really heinous material on “long tail” sites and social media. This is likely to shift. Many people who are already on edge politically will realize that they can communicate their concerns in a way that brands are legally required to act on, by using the tools available to them under new privacy laws and regulations. Instead of just tweeting a screenshot of a problematic ad placement, people will be able to rage-CCPA the brand responsible.

The cost of complying with GDPR and CCPA is likely to be a factor in “flight to quality” media buying decisions.

29 Nov 2019

Happy #buyNothingDay, and a goal for 2020

Happy #buyNothingDay!

To be honest, though, I buy stuff the other 365 days of the year (2020 is scheduled to be a leap year) so I still get and pay attention to advertising.

In 2020, I’m going to see what I can do to get better ads. It seems like this is a good year to do it. Let’s review. There are three ways to get an ad in front of me.

Context: pay for something I want to read

Search: deliver the ads with search results when I search for something

Personalization: place the ad based on who I am

I’m fine with the first two. Context-based advertising pays for news and cultural works.

Search advertising pays for services I use, like web search and mobile maps.

The third one, though, is a problem. The first two are positive-sum games between me and the advertiser, and have positive externalities. Personalization is a negative-sum game, and has negative externalities.

What if I treat every personalized ad I receive as a bug, and report it?

In 2020, thanks to the California Consumer Privacy Act, I should be able to opt out of the use of my personal information for ad targeting. In the long run, when advertisers figure out that I’m impossible to reach with personalization, they’ll pay more for the news, cultural works, and services I want, and spend less money storing my information where it exposes me to risks like fraud and identity theft.

Here’s my CCPA letter. Work in progress: California Consumer Privacy Act letter.

Comments and suggestions welcome. Anyway, long weekend, longer bonus links section.

26 Nov 2019

making people cry in the stairwell: your best online brand-building value?

(point of order: I’m just going to say “brand-unsafe context” here because we already know way too much about all the worst-case places where a Facebook ad might run. Excuse the marketing speak, but I’m not getting paid enough for this blog to think any more about all the redacted on Facebook than I have to. Follow the links if you really want to read about where your Facebook ad runs.)

Facebook makes the news in unpleasant ways, but so far this hasn’t had much of an impact on the company. The decision-makers who matter are the people who buy the ads, from huge companies all the way to individual small business owners typing their credit card numbers into Facebook Ads Manager. And those decision-makers are still buying ads. Revenue is up.

Facebook does some magic algorithm shit to match your ad to people likely to click on it, and some of those people click on it and buy something. And for many advertisers the price of the ads is less than the expected increase in profit driven by the buyers the ads bring in. The two possible alternatives for the Facebook advertiser so far are…

  • someone sees the ad and clicks it

  • someone sees the ad and doesn’t click it. Not good, but not actively bad either.

The ad might be running in a brand-unsafe context, as Sacha Baron Cohen recently pointed out. But Facebook management doesn’t believe that running in a brand-unsafe context is bad for the advertiser. The worst the users can do is not click. Facebook Has Fewer Brand Safety Controls For News Feed Ads–On Purpose.

“We don’t believe ad adjacency matters in certain environments…and we designed the platform with that in mind,” said Erik Geisler, Facebook’s director of North American agency partnerships, speaking Thursday at 614 Group’s Brand Safety Summit in New York City.

Basically he’s bragging on being able to take your brand’s ad and run it in whatever brand-unsafe context the worst possible Facebook user can come up with. According to Facebook, you know all that stuff that has moderators crying in the stairwell and one service company quitting the moderation business entirely? Don’t worry about it because people are going to click on whatever ads run there, like nothing happened.

Even if that was true before (to be honest, I’ve never placed an online order while crying in a stairwell, don’t know about you), is brand safety still not going to be an issue in 2020?

The way the game works today is that the Facebook users can click or not click. But in the near future, things get a little different. The three options will be:

  • someone sees the ad and clicks it

  • someone sees the ad and does nothing

  • someone sees the ad in a brand-unsafe context and tries to get to the bottom of it

How does the cost of dealing with people who have a problem with a brand-unsafe context affect the game? In other ad media, user concerns about brand safety can have a material effect on the market.

So here’s my first attempt at a letter to send to the brand advertiser whose ads you see in a brand-unsafe context. Just like it’s pointless to report a scam to Facebook, don’t report brand-unsafe content to them either. (Exception to that rule: reporting a Facebook scam matters if you’re Craig Silverman calling for a comment on a story. Then they take care of it.) Just screenshot it and send your CCPA letter to the advertiser.

This letter refers to the European and the new Kenyan law too. IMHO better to make the advertiser look up which should apply, so we can all use the same letter.

Any suggested improvements? Please let me know.

25 Nov 2019

linklog by email

(update 15 Jul 2020: This seems to be working. Also added link to the gateway going the other way.)

This is how to get my linklog in email without an RSS reader.

  1. Go to Feedrabbit and make a free account. Check your email and follow the instructions to confirm your Feedrabbit account.

  2. On the Feedrabbit subscriptions page, click “New Subscription”.

  3. Enter https://rapids.aloodo.org/feed and click “Discover”.

  4. Click “Subscribe”.

You can repeat for the feed for this blog, which is https://blog.zgp.org/feed.xml

If you want a gateway to go the other way (email newsletter to feed) check out Kill the Newsletter!.

16 Nov 2019

note on adversarial interoperability

(update 11 Dec 2019: copy edit, add links.)

Read the whole thing: alt.interoperability.adversarial by Cory Doctorow.

The story of the alt. hierarchy is an important lesson about the nearly forgotten art of “adversarial interoperability,” in which new services can be plugged into existing ones, without permission or cooperation from the operators of the dominant service.

Today, we’re told that Facebook will dominate forever because everyone you want to talk to is already there. But that was true of the backbone cabal’s alt.-free version of Usenet, which controlled approximately one hundred percent of the socializing on the nascent Internet. Luckily, the alt. hierarchy was created before Facebook distorted the Computer Fraud and Abuse Act to try to criminalize terms of service violations.

Usenet was a good example of a decentralized service that could support both free speech and commercial publishing. Usenet supported adversarial interoperability (a25y?) about as well as you can. Any individual could pick their own client software. Any site could choose which newsgroups to offer, and which other sites to share them with. Users could filter messages with killfiles, and even run their own “cancelbots” to share filtering decisions out on the network.

Usenet is also a good example of the best “white hat” case against adversarial interoperability, because Usenet isn’t really much of a thing any more. Usenet spammers buried the legit users, even though many tried to fight back, as creatively as they could. Would better spam filters on the client side have helped? What if the scoring algorithms used to rank stories on Slashdot-style boards had made it into Usenet newsreaders? If Usenet had held its own, things might have gone very differently. But we lost Usenet—not because of policies and lawyers, but because spammers killed Usenet while the market incentives for developers encouraged work on web boards, which were mostly ad-supported.

I still like adversarial interoperability and agree with Cory that we need more of it. The best recent example I can think of is ProPublica’s Facebook Ad Collector extension, which was later blocked by Facebook. Why was Facebook management so willing to take the bad publicity that came with hiding which users see which ads? The easy answer, of course, is that Facebook is eevill, funded by Russian interests to undermine US public health and promote right-wing causes. But when a big company does something, there’s always a “because they’re just evil lol” reason and a legit-sounding reason. You have to pick whether they’re the real reason and the cover story, or the conspiracy theory and the real reason.

If you want the legit-sounding reason to avoid adversarial interoperability, here it is. The two easiest kinds of adversarial interoperability, by far, are:

  • Ad blocking: Removing advertising from a medium.

  • Spamming: Adding advertising that does not pay to support the medium in which it appears.

Twitter is a good example here. Twitter limited third-party apps and dropped RSS support, to keep third parties from offering either convenient ad blockers or ad-supported clients. (Or both: before they made the change, you could make a Twitter client that stripped out Twitter’s ads and put yours in.) Twitter also bought and stifled TweetDeck, the popular third-party client.

  • The easier you make it for people to read your service with their own choice of client, the more ads they will block.

  • The easier you make it for people to post to your service with their own choice of client, the more spam they will send.

In Facebook’s case, the HTML they send to your browser is a convoluted mass of HTML elements with randomized classes and ids, to make things harder for ad blockers and, either as collateral damage or on purpose depending on who you believe, for ProPublica. (In native apps, ad blocking is even harder.)

Facebook has to pull horrible, ever-changing, HTML stunts to get their ads through to the user, because those ads don’t pay their way, and the people receiving them have no reason to keep them. Advertising is only sustainable when it’s an exchange of economic signal for attention. In order to carry the signal needed to earn attention, an ad has to be:

  • credible

  • from an identifiable and significant sender

  • related to a market in which the reader is a participant

Only the last one might apply to social ads. On Facebook, I have no incentive to let an ad through, because I have no reason to trust it. That also applies to “programmatic” ads on the web. (Hey kids! Scientific literature!) In any ad-supported medium where blocking is the best choice for a member of the audience, adversarial interoperability is not an option, because the obvious low-hanging fruit of adversarial interoperability is ad blocking.

It seems like we would need ads to start paying their way, and for ad blocking online to be less of a rational bargain for the reader, for adversarial interoperability to catch on.

12 Nov 2019

consolidation? meh.

This is an answer to a question I got about adtech consolidation.

The Wall Street Journal ran a great piece by Keach Hagey and Vivien Ngo, all about how Google is finally helping to fix the design of that terrible “Lumascape” slide, by letting them add some whitespace and make the logos bigger. (Who says Google doesn’t do great things for graphic design?) Here it is: How Google Edged Out Rivals and Built the World’s Dominant Ad Machine: A Visual Guide - WSJ

So, is adtech consolidation bad for publishers?

Not really. Adtech consolidation is overdue. Any normal Internet business would have consolidated a long time ago. (Ever notice you get mostly Zoom meeting invitations now, and not links to rival conferencing systems?)

What matters is the adtech/publisher split.

  • In a sustainable ad-supported business you have a split of about 85% publisher/15% intermediaries.

  • The web is 40%/60% with the publisher getting the small slice.

It doesn’t matter if the 60% goes to one company, a few companies, or a Lumascape with thousands of companies. It’s still unsustainable for the publisher. When publishers look at the diagrams showing multiple advertisers bidding for an impression on their sites, they’re ignoring the other side—the same advertisers bidding for the same user eyeballs on cheaper sites.

Content is commodified, the publisher lacks market power to command a decent price for ads delivered to their own audience, and no tweaking of market share in the adtech business will change that. Real-time bidding works today because too many people have the user’s data, and can target them.

Today’s big tech companies come at every problem with the same tool: take whatever business is adjacent to me, and try to turn it into a commodity. That works great when your product is an internet service and the business adjacent to you is an operating system. A copy of Linux is a commodity, but it’s actually worth more than a copy of HP-UX or Solaris.

But commodification is not the highest value model for the advertising business, which depends on feedback between brand equity and content reputation. Brands are worth more in a market with high-value content sites, and ad-supported content sites are worth more in a market with high-value brands. So what matters to the publisher is not the number of adtech vendors participating in the market. What matters is the number of saleable ad impressions that are in direct competition with your ad impressions. (A site trying to sell an ad on a story that cost $10,000 to report, edit and shoot is competing against a site running an ad on a recycled racist cartoon or a pirated Nickelback song that cost them nothing.)

Adtech’s job is simple: to facilitate putting impressions on cheap content into the same market as impressions on expensive content. It might matter a little bit how many companies are involved, but that’s not the main story. The survival of any ad-supported site depends on market power, and part of reclaiming market power is making it harder or more expensive to reach your same audience on cheaper sites. That depends on privacy tools and regulations.

The good news is that CCPA is coming January 1. CCPA is not just a compliance issue or a cost center. If we handle it right, it’s a way to reclaim some market power by limiting the number of saleable ad impressions on low-reputation sites.

30 Oct 2019

Farewell, Twitter political ads.

I see that political ads on Twitter are supposedly going to not be a thing any more, so I just wanted to share some that I remembered to save.

You’re welcome.

I don’t remember getting candidate ads on there, just issues and astroturf. All misleading, of course, because the main benefit of targeted advertising is that it facilitates sending deceptive messages while avoiding law and norms enforcers. I don’t remember getting any honest political ads on Twitter, so probably a good move.

22 Oct 2019

Duck season! Wabbit season!

I don’t know if it’s Duck Season or Wabbit Season now, but starting Jan. 1 it’s creepy database marketing company season here in California.

This doesn’t mean that we can just use the opt-out flow that’s easiest. The surveillance marketers are already working on a way to make people feel like they’re doing CCPA, and to make things harder for publishers, but without affecting the bigger players. The IAB CCPA Compliance Framework for Publishers & Technology Companies is out, and it enables you to signal an opt-out only for the site you’re on. Not what we need, because it doesn’t do anything about the party that actually holds the data, and that’s where the opt out or data deletion demand needs to go. (I know the law says data deletion request, but if you’re not allowed to turn me down then I’m not requesting.) Places like

  • Oracle Data Cloud

  • Acxiom

  • Experian Marketing Services

  • LiveRamp

All the companies on this CCPA todo list. If you want advertisers to buy you better ad-supported content, you have to starve them of the targeting data they need to reach you on crappy content, or in social media and native apps. Which means focus on the DMPs, not the peripheral ad targeting. Anyway, we should be able to make CCPA flow a lot easier. Here’s a button to help me test something to do that.

Two more things.

Anybody with a clean whiteboard can design a better web advertising system than what we have now. The hard part is incrementally getting ad-supported publishers from the current shitshow to the new system before they run out of money.

Solutions to the surveillance marketing problem have to be built for real people, not for a fully-informed, rational Homo economicus. Two pieces of irrational behavior you can count on, and that the solution has to take into account.

  • Users won’t pay for privacy even when it’s cheap.

  • Advertisers won’t pay for context even when it’s valuable.

(The first makes sense if you consider that we see surveillance marketing as something that the other side is doing wrong, and we expect norms violators, not the people affected by the violation, to bear the cost. The second one seems to be a lot more complicated.)

17 Oct 2019

Beans, botulism, and brand safety

Everyone is writing long thoughtful pieces about how some social media company CEO should do this or that.

What a waste of carpal tunnels. Social media CEOs know what they’re doing about as much as Clostridium botulinum knows that it’s giving people botulism.

If your brand is all about canned beans, and you find out that some of the cans have C. botulinum in them, you don’t write a long think piece about how C. botulinum should carefully consider its impact on human health. You don’t go to a conference and get on a panel about how botulism is an industry-wide problem. You don’t invite C. botulinum to give a talk at a college campus.

Brand safety is a little behind food safety, though. For a long time, think pieces and conference talks have taken the place of action, while brands end up in worse and worse places. That is the brand marketer’s responsibility. David Kohl writes,

[A]dvertisers have the power to reduce the spread of misinformation by withdrawing media dollars from platforms that enable its distribution. Consumers are nearly three times less willing to associate with brands that advertise alongside unsavory, offensive content (see Magna’s Brand Safety Effect). And according to Edelman, 48 percent of consumers feel it is a brand’s own fault if their advertising appears near inappropriate social media content. Simply said, advertisers put their brands at risk by placing their media budgets in the hands of platform providers that fail to take full responsibility for the content they distribute.

Starting in 2020, when a brand runs an ad on social and it ends up sponsoring a death threat against a journalist, or a video of someone abusing their kids, or a domestic terrorist recruiting page, the people who made that bad advertising decision are going to find out. Starting in 2020, California gets the California Consumer Privacy Act (CCPA).

When people see brand ads running on all the evil shit that the social media companies are pushing out, they’re not going to fill out some ToS report form when CCPA is available. And those CCPA demands will come to the brands responsible.

Which is great news. Handling CCPA demands is likely to be costly. So avoiding them gives everybody a reason to do the right thing, and move ads away from problem activities. Brand safety isn’t just another marketing checklist item. Do it right and it’s your ticket to getting on the right side of history.

05 Oct 2019

GPG agent timeouts

This is to see the GPG passphrase dialog pop up less often. Probably a security win because every time I get prompted for a passphrase there is a risk that I will fumble-finger it and type the passphrase into a chat window instead.

In $HOME/.gnupg/gpg-agent.conf

default-cache-ttl 28800

What I really want to be able to do is make ssh-agent and gpg-agent forget their keys before the machine suspends. Looks like Running scripts before and after suspend with systemd is a good way to do that.
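Here is a rough sketch of what such a hook might look like, not something I have deployed. systemd runs executables in its system-sleep directory with "pre" or "post" as the first argument. The function names below are made up for illustration. The two commands are the usual ways to ask each agent to forget: ssh-add -D removes all identities from ssh-agent, and gpgconf --reload gpg-agent makes gpg-agent flush its cached passphrases. One big assumption: the hook has to be able to reach the user's agents. A system-sleep script runs as root, so a real version would need to switch to the right user and environment first.

```python
import subprocess

# Commands that ask each agent to drop what it has cached.
FORGET_COMMANDS = [
    ["ssh-add", "-D"],                     # remove all identities from ssh-agent
    ["gpgconf", "--reload", "gpg-agent"],  # reload gpg-agent, flushing cached passphrases
]


def commands_for(phase):
    """Return the commands to run for a sleep hook phase ("pre" or "post").

    Keys are only forgotten on the way into suspend; nothing runs on resume.
    """
    return list(FORGET_COMMANDS) if phase == "pre" else []


def run_hook(phase, runner=subprocess.run):
    """Run the commands for this phase and return how many were attempted.

    `runner` is injectable so the logic can be exercised without real agents.
    Failures are ignored (check=False) so one missing agent does not block suspend.
    """
    attempted = 0
    for command in commands_for(phase):
        runner(command, check=False)
        attempted += 1
    return attempted
```

A wrapper script would call run_hook(sys.argv[1]) with the phase that systemd passes in.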

Also, hey, kids, paper!

market for trading software issues | Journal of Cybersecurity | Oxford Academic

Open source software forms much of our digital infrastructure. It, however, contains vulnerabilities which have been exploited, attracted public attention, and caused large financial damages. This article proposes a solution to shortcomings in the current economic situation of open source software development. The main idea is to introduce price signals into the peer production of software. This is achieved through a trading market for futures contracts on the status of software issues.

(by Rao et al., and yes, I’m one of the “et al.”)

25 Sep 2019

Open source businesses, meet the real world

(Update 7 Oct 2019: add link to James Vasile article, minor copy edits.)

Previous: the Linux device driver hacker’s guide to giant Internet monopoly dominance

Big Tech companies as we know them are mutated versions of the open source software business. This looks like a big cultural win for the open source entrepreneurs of the 1990s. The problem is that open source business models can be a rational choice in the software business, but in other businesses, not so much.

Nobody is the villain in their own story, and Big Tech management generally doesn’t make decisions that look creepy and evil because they actually are creepy or evil. They’re just running the pattern that beat the last few levels.

The problem they’re solving is that in the software business, the absolute worst place for a marginal dollar to end up is at another software company. You would rather see money burn up in a fire than see another software company get it. If another software company got it, they would use it to sue you, or build network effects in their own product adjacent to yours to squeeze you out, or whatever. So the pattern you end up developing for self-defense is open source. Open source is a great defensive tool in software. Turn the product categories adjacent to yours into low-profit commodities, and keep money out of the hands of other software people.

If you keep growing the open source model you get today’s Big Tech. Decisions that look shortsighted or just plain evil are understandable if you look at them from the Open Source entrepreneur’s point of view: Every business adjacent to mine is either a low-margin commodity or an existential threat.

When the business adjacent to the Big Tech company is an individual independent contractor, you get the gig economy and the precariat. Besides the gig economy, though, the biggest example of Open Source patterns influencing other areas is the commodification of content sites. Surveillance marketing, for the Big Tech platforms, is not about the surveillance. User surveillance is just a commodification tool, like an open source software component is. User tracking has value to Big Tech because it makes the content site into a commodity source of the same eyeballs you can get anywhere, and drives ad profits to the platform that enables the tracking.

The problem with the commodification strategy is that it works great for software, where it’s safe to assume that every company in every area adjacent to yours is run by a douchebag software CEO, but it’s suboptimal for types of business in which having a strong company adjacent to you is an advantage. If Big Tech management ran Chevron, they would give out free clones of the 1970 Plymouth Belvedere that get 8 MPG, and everyone would be all on about how there is no money in the car business. Brands, content sites, and ad agencies are an example of a set of businesses in which a viable company in one category actually boosts the companies in adjacent categories. Approaching this kind of situation with the oversimplified view of commodifying everything is leaving money on the table.

Next: The surveillance economy is more like the commodification economy

12 Sep 2019

will ITP and ETP kill traffic arbitrage?

Today, Traffic Arbitrage is a profitable scheme.

  • Make a crappy site full of cheap clickbait content.

  • Load it up with programmatic ad slots.

  • Buy content recommendation widgets, or Chumboxes on legit, high-traffic sites.

  • Any time someone clicks through from a legit site, show them a bunch of ads, including high-CPM video ads.

The reason that this model has worked so well is that with programmatic advertising, impressions on the crappy site are worth something. The user from the legit site brings along the same third-party tracking cookies that their browser sends to everybody. So a legit advertiser is likely to bid to reach them.
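The arithmetic behind the scheme can be sketched in a few lines. Everything here is an assumption for illustration: the function name, the parameters, and especially the numbers in the usage note, which are made up and not measurements of any real site. The point is only that the arbitrageur's margin per purchased click depends directly on what advertisers bid for impressions on the crappy site.

```python
def arbitrage_profit_per_click(cost_per_click, pages_per_visit, ads_per_page, cpm):
    """Estimate profit on one visitor bought from a chumbox.

    cost_per_click: what the arbitrage site pays the legit site's widget per click
    pages_per_visit: pages the visitor views before leaving
    ads_per_page: programmatic ad slots on each page
    cpm: average price advertisers pay per thousand impressions on the site
    """
    impressions = pages_per_visit * ads_per_page
    revenue = impressions * cpm / 1000.0
    return revenue - cost_per_click
```

With hypothetical values of a $0.05 click, 3 pages, 10 ads per page, and a $4 CPM, the visit comes out ahead. Drop the CPM to $1 and the same visit loses money, which is why the value of impressions on the crappy site is the whole game.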

The publisher of the legit site gets some of the revenue—they get paid per click on the chumboxes—but they don’t see the revenue lost because they’re now competing for programmatic bids with the lower-cost sites that they’re sending traffic to. The tragedy of the chumbox is that all the legit publishers would be better off if nobody could do traffic arbitrage, but the first legit site to stop participating in traffic arbitrage loses.

Now browsers are blocking the third-party tracking that enables traffic arbitrage. First Apple Safari ITP, now Firefox ETP, soon Microsoft Edge.

A legit site that can persuade users to subscribe or sign in now has significantly more data on its audience than a crappy site where people are unlikely to share their information. Traffic arbitrage schemes are being starved of data, which should start to show up as lower RTB bids on their sites, and higher bids on legit sites.

There are uses for content recommendation widgets other than traffic arbitrage, and it will be interesting to see how the content that gets recommended on them changes as traffic arbitrage schemes get squeezed out.

This isn’t the last move in the game, of course. For large Internet platform companies, privacy violation is not required, but a means to an end. The business model for platforms depends on capturing an ever-larger share of ad revenue by commodification of the content business, and violating user privacy expectations is the easiest, 1.0, way to do it. It may be possible to construct a user data scheme that both commodifies content and preserves user privacy, so the interests of legit publishers are aligned with, but not identical to, the privacy interests of their readers.

03 Sep 2019

Open source funding and risks

Read the whole thing: Recap of the funding experiment by Feross Aboukhadijeh.

Unfortunately, when open source people say things like…

Maintainers do critical work which enables companies to create billions of dollars in value, yet we capture none of that value for ourselves.

Does it have to be like this?

I’m not arguing that maintainers should start capturing all of the value that we create. But we shouldn’t capture literally none of the value either. The status quo is not tenable.

I would love to find a way to help maintainers capture at least a bit of the value we create so that we can happily continue to write new features, fix bugs, answer user questions, improve documentation, and release innovative new software.

…what people who use open source in business are hearing is more like…

We’re getting a lot of software value for nothing! Fist bump!

A simple appeal to do the right thing is not something that, as a downstream user, you can put in your budget.

When you use under-funded open source software, there is always a risk that if the maintainer doesn’t get paid, they will either burn out or go get a high-intensity job and let their project fall on the floor. Can you justify paying open source maintainers in order to protect yourself from this risk?

That’s a little more promising, but two areas need to be addressed.

  • Is the risk quantified? I can measure a software project’s value to me, but not the probability of the maintainer quitting in the absence of support, so I don’t know the total size of the risk. If I can’t quantify the risk, I can’t justify spending to avoid it.

  • Can I measure the benefit of participating? I don’t know how much my choice to fund the project reduces the risk. I could put in my $100, see that the developer still can’t live on that, and end up incurring just as much cost to replace the open-source dependency as if I had not invested.
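To make the two questions concrete, here is a minimal sketch of the calculation a downstream user would want to do. The function names and parameters are hypothetical, and the whole problem is that the probabilities are exactly the inputs nobody knows today. The code only shows what would have to be measurable before a payment could go in a budget.

```python
def expected_loss(p_quit, replacement_cost):
    """Expected cost of losing the maintainer: probability times cost to replace."""
    return p_quit * replacement_cost


def funding_is_justified(payment, p_quit_unfunded, p_quit_funded, replacement_cost):
    """Return True if paying reduces expected loss by more than the payment.

    p_quit_unfunded: chance the maintainer walks away if I pay nothing (unknown today)
    p_quit_funded: chance they walk away anyway after my payment (also unknown)
    """
    before = expected_loss(p_quit_unfunded, replacement_cost)
    after = expected_loss(p_quit_funded, replacement_cost)
    return (before - after) > payment
```

If my $100 does not move the probability at all, funding_is_justified(100, 0.2, 0.2, 50000) comes back False, which is the second bullet above in code form. The numbers in that call are invented for the example.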

IMHO we need better market design in order to deal with those two problems. I personally think that models based on dominant assurance contracts and/or futures markets are promising (more on that later) but just banning an interesting idea after its first deployment is counterproductive.

23 Aug 2019

the Linux device driver hacker’s guide to giant Internet monopoly dominance

I recently found out that Linux Journal, where I was editor-in-chief for a while, is shutting down. This might seem natural, because considering all the places you find Linux—from the largest cloud services and supercomputers to the cheapest smartphones—it makes about as much sense to have a Linux Journal as it would to have a magazine called Air Breathing Aficionado. For what it’s worth, MSDN Magazine is shutting down too. Are operating systems just boring commodities now?

Yes, but that’s not all.

In 1994, Randolph Bentson got a serial board from Cyclades in exchange for writing a freely available Linux device driver for it. This piece was before my time as editor. This probably didn’t seem like a big deal at the time, but it was a key event in how the Free Software scene grew into the open source software business, and then the Internet business, as we know them.

Device drivers used to be proprietary software that came on a floppy or CD-ROM with the board. What Cyclades came up with was a good early example of what Joel Spolsky later called commoditizing the complement. By releasing the device driver under the same terms as the rest of the Linux kernel, they saved their own costs of keeping the software proprietary, pushed the maintenance programming into the open, and lowered the total cost of ownership of the device for users. Most of the major hardware companies ended up making the same decision. Today, the operating system is an inexpensive commodity, and most hardware firms have dedicated kernel teams to keep the free part of the software/hardware combination working. This keeps the whole product (Linux plus Linux-supporting hardware) attractively priced.

The Linux business is built on ruthless commodification, and would not be a business without it. Tim O’Reilly pointed out that it’s a good example of Clayton Christensen’s “Law of conservation of attractive profits.” Red Hat was the survivor of a crowded Linux distribution market largely because of its commitment to work upstream and offload as much code review, testing, maintenance programming, and version control as possible. The OS market is a tiny fraction of what we thought it would be based on the way things were headed in 1998. Yes, individual Linux developers are well-paid, but the OS business? The ubiquitous OS license price, on both client and server, is $0.

An even bigger commodification shift came later, when server hardware became a commodity too, driving all the attractive profits to the service business. Big Internet companies as we know them grew out of the Linux scene, where the mandate to commoditize everything that you don’t directly make money from is so obvious that people rarely even discuss or acknowledge it.

Commoditizing the operating system was only the first few levels of the game. Google, for example, beat level one of the game by installing racks of janky-looking Linux boxes instead of proper servers with licensed Solaris or whatever on them, then beat level two by commoditizing the mobile OS. But it’s not just Google, and it’s not just OS and hardware. Today’s tech business is not so much about technology any more. It’s more about applying the principle of commodification, learned from Linux, to everybody else’s work, whether by investing in building network effects to take over services formerly run by local businesses or enforcing arbitrary rules on video creators while aggressively using recommendation algorithms to drive users to “fresher” content.

The commodification play in web publishing is to control the data on who is looking at something, in order to drive the profit out of where they look at it. This doesn’t necessarily work so well, but whether or not tracking-based ads work better isn’t the point. They only have to work well enough to drive the web content business into the commodity category with the cover bands from Amazon Prime Music. The only real opposition comes from publishers and privacy developers. Privacy developers don’t want users followed from one site to another, and publishers don’t want their audience’s eyeballs sold somewhere else.

The optimistic view is that better privacy in the browser will help us beat commodification. If everything works out just right, privacy in the browser means that nobody can get trustworthy data about ad impressions on random sites, which means no more infinite online ad inventory, which means that advertisers have to board the flight to quality to sites known to be trusted by their users. Then increased market power for those publishers means more expensive advertising, which means more signaling power for brands. Signaling power, if used right, builds brand equity, which means brands can spend more on ads, so they increase signal by contending to support obviously expensive content. This effect is responsible for the kind of ad-supported media that’s worth real money offline, so let’s make it work for the web too.

But what about the low bid problem and the crappy ad problem?

Advertisers bid less for ad impressions without tracking data when impressions with tracking data are available. According to one Google study, Based on an analysis of a randomly selected fraction of traffic on each of the 500 largest Google Ad Manager publishers globally over the last three months, we evaluated how the presence of a cookie affected programmatic revenue. Traffic for which there was no cookie present yielded an average of 52 percent less revenue for the publisher than traffic for which there was a cookie present. Lower revenue for traffic without a cookie was consistent for publishers across verticals—and was especially notable for publishers in the news vertical. For the news publishers in the studied group, traffic for which there was no cookie present yielded an average of 62 percent less revenue than traffic for which there was a cookie present.

The paper Consumer Privacy Choice in Online Advertising: Who Opts Out and at What Cost to Industry? by Johnson, Shriver, and Du finds 52% less revenue for impressions to users who deliberately opt out using an industry site.

The crappy ad problem is related. If the ad network doesn’t know that you’re an affluent car shopper, you’re not going to get the professionally shot photo of a BMW on a scenic road. Instead, you’re going to get ads for FREE* LIVER FUNGUS CURE!!!1! (just pay shipping, order auto-renews weekly).

These two effects are visible when impressions with and without cookie data, reaching a similar pool of people, are available in the same market. It would, however, be unrealistic to extrapolate from this to get to the conclusion that the result of protecting a large fraction of an audience will be that ad budgets intended to reach those people will go down by 52%. Jonathan Mayer and Arvind Narayanan point out that after GDPR, The New York Times cut off ad exchanges in Europe, and kept growing ad revenue, and that another study showed only a 4% revenue boost from behavioral tracking. (Measuring normal RTB bids against artificially cookie-blocked RTB bids seems like it would not detect a flight to quality.)

As privacy protection gets better, sites will have options to fix the low bid problem without commoditizing the content site by leaking user data. Context-based ad placement technology is still catching up to user-tracking-based technology.

We now return to the Internet Optimism already in progress

Nobody wants to be stuck being the commodity, and with decent privacy in the browser, the content site doesn’t have to.

That’s the basis for cooperation between privacy-protecting browsers and sites trusted by their readers. Former Mozilla COO Denelle Dixon writes, on the Digital Content Next site, “In short, behavioral targeting will become more difficult, but publishers should be able to recoup a larger portion of the value overall in the online advertising ecosystem. This means the long-term revenue impact will be on those third-parties in the advertising ecosystem that are extracting value from publishers, rather than bringing value to those publishers.”

I have been talking and writing about the alignment of interests between privacy developers (who don’t want their users’ activity from one site following them to another site) and publishers (who don’t want to leak their audience data) for quite a while. But privacy and publisher market power are two parallel causes, not one big movement. The commoditizers have a lot of skill and time to put into splitting the alliance that puts publishers on the same side as privacy developers. Can the Internet ad duopoly do something to satisfy privacy demands from users and regulators without ceding market power to trustworthy sites? Two proposals.

  • Fraud Resistant, Privacy Preserving Reporting Using Blind Signatures is a system to prove that a likely human visited a site, without giving the advertiser enough information to show that a certain user visited a certain site.

  • Federated Learning of Cohorts (FLoC): “We plan to explore ways in which a browser can group together people with similar browsing habits, so that ad tech companies can observe the habits of large groups instead of the activity of individuals. Ad targeting could then be partly based on what group the person falls into.”
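For anyone who has not run into blind signatures, which the first proposal is built on, here is a toy sketch of the textbook RSA version. This is not the scheme from the proposal, and the key is a classroom-sized example that is nowhere near safe for real use. The idea it illustrates: a signer can vouch for a token without ever seeing it, so a later verifier learns that somebody trusted signed it, but the signer cannot link the signed token back to the request.

```python
import math
import secrets

# Toy RSA key for illustration only. Real keys are thousands of bits long.
P, Q = 61, 53
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent, known only to the signer


def random_blinding_factor():
    """Pick a random r that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def blind(message, r):
    """User side: hide the message by multiplying it by r^E mod N."""
    return (message * pow(r, E, N)) % N


def sign(blinded):
    """Signer side: sign whatever was sent, without learning the message."""
    return pow(blinded, D, N)


def unblind(blind_signature, r):
    """User side: divide out r to get an ordinary signature on the message."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(message, signature):
    """Anyone with the public key can check the unblinded signature."""
    return pow(signature, E, N) == message % N
```

The flow is blind, sign, unblind, verify. The signer only ever sees the blinded value, yet the unblinded result is the same signature it would have produced on the message directly.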

If they can’t track users individually, they’ll still try to figure out a way to get high-value ad impressions from known human eyeballs at random sites, and commodify publishers that way. But that depends on getting the privacy developers to decide to be fine with this kind of scheme.

The good news is that privacy developers tend to be generally sympathetic to the publishers, because positive externalities and stuff. But we’re still facing the risk of privacy-acceptable but anti-publisher user data handling schemes. Privacy developers need help to keep any new privacy technology aligned with the interests of whatever publishers their users choose to trust. That means we have to commit to more ongoing coordinated open source development, with publishers who want to stay out of the precariat using and testing the same code as browser and tool developers who want to keep their users safe.

The other good news, now that I think about it, is that they’re now paying the privacy/publisher alliance the ultimate compliment, by trying to split it. We’re on to something here.

Continued: Open source businesses, meet the real world

19 Aug 2019

links for 19 Aug 2019

Do you agree that Brands, And Now Publishers, Must Get The Message About Programmatic Creative? Or have we gone “Beyond Copy”?

Bruce Schneier has a great piece on the Influence Operations Kill Chain but it misses one step. What if a social network is so committed to getting a cryptocurrency approved that the moderators end up coming in on the misinfo side? And a reality check on filtering: Facebook is already flooded with fake Libra cryptocurrency scams. If they can’t filter out ads about their own brand, based on photos of their own CEO, what can they do for political ads where the creative can be more varied? And will they even want to?

If you advertise on Facebook, read this: Trauma Counselors Were Pressured to Divulge Confidential Information About Facebook Moderators, Internal Letter Claims And if you don’t advertise on Facebook, go ahead and feel a little more satisfied with yourself. You’re doing a great thing for our country.

Q: Why don’t you ride your bike in traffic? It’s convenient, fun and great exercise!
A: Man Caught Playing Pokemon Go On Eight Phones In His Car

Apple will soon treat online web tracking the same as a security vulnerability. Apple: ‘We’ll treat sites like malware if they break ad tracking rules’. WebKit’s new anti-tracking policy puts privacy on a par with security. (The message here is that if you want to track Apple’s users, move them to your private island and hunt them, I mean make a native iOS app with tracking SDKs.) And ICYMI: Introducing tracking prevention, now available in Microsoft Edge preview builds

Most EU cookie ‘consent’ notices are meaningless or manipulative, study finds. In other news, ursine defecation observed in forest environments.

Can CCPA Be a Good Thing for Marketers? Now, we need to prove that the data is being stored, managed and used with the utmost integrity and privacy safeguards possible. Regulations like GDPR and CCPA provide the opportunity for marketers to become accountable for this process and allow us to step up and show what great stewards of data we are. More CCPA: Why the CCPA’s ‘verified consumer request’ is a business risk. IMHO CCPA will do for marginally useful PII in Marketing what the Resource Conservation and Recovery Act did for marginally useful hazardous materials in manufacturing—make it easier to just get rid of it and update your processes to do without.

Digital buzzword watch: contextual. The halo effect is real: New research confirms that ads perform better in quality contexts, How quality content separates publishers from ad streams, New Study Shows That All This Ad Targeting Doesn’t Work That Well, Contextual Targeting’s Coming Renaissance

28 Jul 2019

The new browser consensus and SSO

(Update 24 Sep 2019: add link to IsLoggedIn. Update 20 Sep 2019: copy edit. Update 14 Aug 2019: Add link to Safari’s policy)

(Disclaimer: I work for Mozilla. Not speaking for Mozilla here.)

The first result of the browser privacy trend is a growing difference between how the browser treats two kinds of third-party data collection.

  • third-party data collection that happens when the user chooses to use information from one site on another site

  • third-party data collection that happens when a site or service, without an action from the user, tries to use information about the user’s actions from one site while they’re using another site.

Any third party interaction that the user knows about is supposed to keep working. But hidden tracking pixels, scripts and any technology that tries to implement tracking without user interaction are all supposed to stop working.

Protections to implement this are still in progress, but this is clearly the direction Safari, Firefox, and now Microsoft Edge are going. We now have the same kind of rough consensus on user expectations about tracking that we developed pretty early on in the email spam situation. This consensus is based on extensive user research. (Why browsers took so long to listen to people about what they find creepy is another story.)

An example of a protection step that’s common across browsers is the Storage Access API. This gives browsers a way to allow third-party scripts to use cookies and LocalStorage, but only if the user takes action. Apple Safari, Mozilla Firefox, and Microsoft Edge are all involved. (hashtag #worldsFriendliestBrowserWar)

The WebKit Tracking Prevention Policy, used by Apple Safari, says

Merely hovering over, muting, pausing, or closing a given piece of content does not constitute an intention to interact.

and

We consider certain user actions, such as logging in to multiple first party websites or apps using the same account, to be implied consent to identifying the user as having the same identity in these multiple places. However, such logins should require a user action and be noticeable by the user, not be invisible or hidden.

The Mozilla anti-tracking policy is similar.

For sites, what this means is that SSO and registration walls are relatively safe. If the user is clearly presented with “Sign in with (identity provider brand)” and there is a button the user has to click the first time they go to the site, that SSO system should keep working. The user knows that they’re using it, and clicked the logo of the provider they “sign in with.” A proposed API, IsLoggedIn, from Apple Safari developer John Wilander, would make it easy for a site to check logged-in status from JavaScript.

If the user can’t see the way that multiple sites are trying to use the same information about them, then that flow of data across sites is likely to get blocked, whatever the technical implementation is. This is likely to be good for the relative market power of sites that people trust more, if it turns out that people are more willing to “sign in with” (and obviously share info about themselves) on their trusted sites than on a random site that their uncle sent them a link to.

More on this: Will ITP and ETP kill traffic arbitrage?

23 Jul 2019

informing brand advertisers

What’s the weakest point in the chain of deception and abuse coming at you over the Internet? Maybe the brand advertisers who pay for it.

This is a random list of projects that help inform advertisers of what their ad budgets support:

Any others?

23 Jun 2019

Reporting scam ads to Facebook: counterproductive?

Scams and political misinfo are bad, so why not report them?

What follows is a version of my long email response to a question about why I think that reporting problem content to Facebook is a bad idea, at least here in the USA. Complaining about Facebook is fine, but complaining to Facebook, not so much.

That’s because as of 2019, it’s almost like there are three Facebooks in this country, from the user point of view.

Purple state Facebook: I wish I could predict otherwise, but this is going to be a bigger and bigger fraud and foreign misinformation operation leading up to the 2020 election.

In some countries, doing moderation might be a way to build goodwill with the government. But here, the current US administration gives foreign operations some of the credit for getting elected. Meanwhile, Facebook needs to get an ambitious new cryptocurrency scheme approved by that very administration. Which means nobody at Facebook can do anything serious about the foreign misinfo problem, at least as it affects the people that the re-election campaign wants to reach. They’ll have to handle anything pro-reelection with an extremely light touch, or face a regulatory mess that will keep them out of the money-printing business. (The good news is that these two assembly line workers can get a break from the “useless jobs” meme, and whoever is in charge of pretending to moderate Russian troll accounts for Facebook can have a turn.)

Red state Facebook: At first I thought that Facebook could mostly ignore the red state people, or just let skeevy PACs raise money from them, because they live in states that are already safely in the re-election column. But red state people have a valuable role to play. When Facebook kicks some of their favorite personalities off the service, the role of the red states is to complain loudly about it, and even threaten regulatory action, to help make it look like Facebook is even-handed or leaning moderate liberal, to the remaining audience, which is…

Blue state Facebook: This is where the prospective employees live, or are willing to move, and also where you’ll find the decision makers at the major advertisers. But both of these groups are more comfortable with a company that has international appeal, so Facebook somehow has to look “brand safe” in order to keep them on board. That’s where Grigory Potemkin’s paint and trim crew, I mean the Facebook advertising “transparency” operation, comes in.

The kind of people who might work for Facebook, or advertise there, get targeted for a dramatically different experience from what the regular people do. How many Facebook employees are embedded with the Presidential re-election campaign these days, anyway? And how many are planned to be there at the peak? IMHO some reporter should ask that. And much of the documentation that Facebook makes public about its political misinformation problem is a read-between-the-lines instruction manual on how to do political misinformation without letting the brand advertisers see it.

Regular people get quack miracle cures and massacre-your-neighbors campaigns, while CMOs get ads for luxury resorts and martech services. Of course the CMOs are going to be fine with advertising there. But no ad targeting system is perfect, and occasionally some of the nasty stuff leaks through where blue state people can see it, which means a recruiting problem to start with, and maybe even an advertising problem. (Although as far as I can tell, CMOs are pretty easy to keep in the dark, safely reassured about how they can stay brand-safe and moderate even while supporting a company that’s already locked in as a division of the re-election campaign.)

When Facebook asks you, a blue state person, to please report things, that’s where you come in. They’re looking for help spotting it when the bad stuff that goes out to regular people leaks into the sanitized version of Facebook seen by advertisers and prospective employees. Anyway, long answer, but that’s why I don’t report problems to Facebook. When a drop of the purple-state crapflood leaks through to me, I post it publicly and/or send it to an advertiser instead.

12 Jun 2019

in the future

What’s in the box? What did you order?

I haven’t ordered anything in a while. I’ll open it.

Just one item. It’s shaped like a kettlebell, but smaller. It’s covered with tiny buttons like a scientific calculator. Each button has a person’s headshot on it. Looks like they’re all members of 1980s hair bands.

Can you use it as a kettlebell anyway?

No, it’s too small, I can’t get my fingers between the handle and the ball part. And it’s really light. And none of the buttons move. They’re not hooked up to anything.

What is it?

It’s a metabolic product of the data-driven economy. Bots browse for stuff to generate ad clicks. As a side effect, they visit product pages and pretend to show interest. Marketing algorithms discern product preferences from this. Product design algorithms put out contracts on gig sites, to hire designers to draw up plans for all the parts. And contract manufacturing algorithms have the products made and drop-shipped. Some bots must have gotten stuck in a loop of digging on kettlebells, calculators, and vintage MTV.

That’s all fine, but why did you pay for it?

Probably one of my devices got compromised and told another device to order it. At least if my stuff is under the control of a botnet that’s making money by ordering stupid shit for me online, they’ll keep my devices safe from the botnets that order Fentanyl from overseas and hire kids to steal it off my porch. Or worse.

Are you going to send that thing back?

I don’t think so, now. It’s oddly reassuring. As long as I keep getting this kind of stuff I’ll know there’s a botnet looking out for me.

08 Jun 2019

Some minor blog fixes

Just cleaned up some CSS on this blog, so if you missed the post about making ASCII Art work with responsive layouts, check it out now. The examples should work better on small screens now, and the overflow on bigger screens should do the right thing.

Also ran linkchecker and fixed the permalinks that got broken when I updated some JavaScript dependencies. (Whee.)

That’s about it.

07 Jun 2019

they always told me I had a face for radio

On a podcast with Katherine Druckman and Doc Searls at Linux Journal.

Some of the stuff we talked about…

advertising and signaling

Doc: The latest Firefox does block third party tracking. And that’s a huge move. But there’s a drift of the other browser makers in their own different ways are doing things. So and that’s part of a larger trend. So maybe you could just sort of unpack that for us.

Me: Apple Safari got out ahead of the rest of the pack in a lot of interesting ways. And people certainly joke about the $999 monitor stand. But it’s pretty clear that the people who buy that kind of high end equipment are also the people who are least trackable by conventional adtech. So if you want to reach the kind of people who can either afford a tricked-out Macintosh, or have a job where their employer will buy them one, then you really have to think about how do we place advertising in such a way that it doesn’t depend on the kinds of creepy tracking that the Apple Safari developers have put so much effort into avoiding.

Doc: Katherine, maybe I’m wrong about this. But didn’t we finally put the nail in the lid of the coffin of Google Analytics for our own website, when we found that as Don was just saying our readers block tracking for the most part?

Katherine: It was so far from being accurate that it was not useful anymore. Who doesn’t block tracking? I’m kind of throwing this out there, but I want to say a good 60% of our traffic was blocking it.

Me: I’ve seen some numbers from web developer sites, and from blogs that focus on web development, and those are often showing a 30 to 40 percent block rate. But it’s really interesting that Linux Journal readers are coming in at 60.

And so there are, right now, marketing organizations that are going out and trying to reach the kind of people who buy virtual private servers or software as a service products or developer tools. And if you do conventional data driven marketing, when you’re going after that kind of audience, then you’re really going to get a lot of fraud bots, and your marketing operation is going to be making decisions based on what bots like to read, not so much what those high tech, or highly protected, users are interested in.
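Here is a back-of-the-envelope sketch of why that happens. It assumes that bots never block tracking and that humans block at some rate. The 10% bot share and the function name are hypothetical; the 35% and 60% block rates are the ones mentioned above.

```javascript
// Share of *tracked* visits that come from bots, assuming bots never block
// tracking and human visitors block it at humanBlockRate.
function botShareOfTrackedTraffic(botShare, humanBlockRate) {
  const trackedBots = botShare;
  const trackedHumans = (1 - botShare) * (1 - humanBlockRate);
  return trackedBots / (trackedBots + trackedHumans);
}

// Hypothetical input: bots are 10% of all visits.
for (const blockRate of [0, 0.35, 0.6]) {
  const share = botShareOfTrackedTraffic(0.1, blockRate);
  console.log(
    `human block rate ${blockRate}: bots are ${(share * 100).toFixed(1)}% of tracked visits`
  );
}
```

The higher the human block rate, the larger the slice of the analytics data that is really bot behavior, even though the bots themselves did not change.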

Doc: Have you seen a sign of that knowledge being generalized, beyond, you know, people like us talking about it, where it’s becoming obvious to some people in the marketing side that the most valuable people are going to be the ones that are most protected?

Me: I’m cautiously optimistic because of the change over from targeting millennials to targeting Generation Z. I don’t know if you’ve seen it, but marketing thought leaders are changing up all their slides. And they no longer say the millennials are different. Now millennials are boring, and Generation Z is all different. But it’s really recycling a lot of the same millennials material. So there’s a nice niche opening up for a marketing thought leader to scrap the generation-driven slides that everybody has already seen, and become the marketing thought leader of the tracking protected segment.

Doc: Today you get injected with all these third party cookies that get arranged like a DNA string that gets presented at every site you go to subsequently. Do you see any hope for either ending that or modifying it with the kind of things that are going on now or blocking the third parties? I guess the question is, are we stuck?

Me: I don’t think we’re stuck. I think that a lot of the talking points that we’re getting from adtech and martech today are very similar to what email spammers were coming up with in the early days of spam filters. The early spam filters, of course, were done by technical early adopters, the kind of people who read Linux Journal and know how to write .procmailrc files. And when those people started rolling out their original, simple spam filters to the less Internet skilled users, the spammers started saying, hey, wait a minute, users like getting messages about opportunities for great savings on HERBAL VIAGRA, or whatever the latest spam campaign was.

The message from the email spam scene was really that privacy nerds are less in tune with the preferences of regular users than we the spammers are, so you should pay more attention to what spammers want, and less attention to what spam filter developers think is the right thing to do. And we really saw that not come across very well as email spam moved from a niche issue for people who had had their email address out there for a long time to being a mainstream day to day annoyance. The general population of users turned out to be more like the privacy nerds than like the way that the spammers predicted they would be.

Doc: I’m thinking that you could go into how people actually have a pretty good sense of behavioral economics, that they’re good behavioral economists to some degree if you want to have any money left. For us, because obviously, we’re all walking through a minefield that all of us understand in somewhat different ways. But I think one of your points is that people do become pretty adept even if they don’t fully understand what’s going on behind the surface.

Me: And you can’t make the optimal decision for most of the decisions that you have to make in your economic life. I don’t have the time to buy the optimal pair of socks. So what is a set of tools that I can use to get an adequate pair of socks in the amount of time that I have to make that decision? And of course, advertising isn’t the entire story behind building brand reputation. But it sets up part of the information that people can use to evaluate a product or figure out the reputation of a brand. Brands are really interesting. Brands are a cognitive hack that uses our brain’s hard-wired circuitry for evaluating each other’s reputation.

What would be a really interesting piece of research would be comparing TV ad spend on cars to later Consumer Reports ratings. Are TV ad budgets a reliable leading indicator of how well that car actually does in the independent test? It’s kind of like when Rory Sutherland at Ogilvy compared advertising to someone betting on their own horse at the track. If you go to the racetrack and see that the horse’s owner is betting heavily on that horse, they’ve probably got more information than you do.

Open source and incentives

Katherine: I feel like I see growing resentment, and not just individual open source developers, but small companies and whatnot, just that are becoming more painfully aware of the inequality of, you know, fortunes being built off of what they perceived to be their own backs. And, and the consequences are well, unforeseen.

Me: It’s not just a matter of resentment, there’s also an element of risk there. One of the side effects of having good dependency management tools is that real world IT projects are building deeper and deeper dependency trees. So the success of your web site might depend on some software component three levels deep, whose maintainer is going to burn out, right as you deploy your site, but you don’t know it. So there’s this risk for anybody, depending on open source, that somebody’s choice to stick with it just doesn’t pencil out. And the people who you need to stick with it in order to have your thing be successful might be several hops away. There’s still no good way of getting that information, propagating the developer pressure from the developer who’s experiencing it to the leader of the project that depends on their work. (Maybe a market mechanism (PDF) could help?)
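To make "several hops away" concrete, here is a minimal sketch that walks a dependency map and reports how far each package is from the project at the root. The package names and the map itself are hypothetical, made up for illustration, and this is not how any particular package manager stores its data.

```javascript
// Hypothetical dependency map: package name -> direct dependencies.
const deps = {
  "my-site": ["web-framework", "image-tools"],
  "web-framework": ["router", "template-engine"],
  "template-engine": ["string-padder"],
  "image-tools": [],
  "router": [],
  "string-padder": [],
};

// Breadth-first walk that records the smallest number of hops from the root
// to every package it depends on, directly or indirectly.
function dependencyDepths(graph, root) {
  const depths = new Map([[root, 0]]);
  const queue = [root];
  while (queue.length > 0) {
    const name = queue.shift();
    for (const dep of graph[name] ?? []) {
      if (!depths.has(dep)) {
        depths.set(dep, depths.get(name) + 1);
        queue.push(dep);
      }
    }
  }
  depths.delete(root); // the root is not its own dependency
  return depths;
}

for (const [name, depth] of dependencyDepths(deps, "my-site")) {
  console.log(`${name}: ${depth} hop(s) away`);
}
```

A walk like this only says how far away a maintainer is. It says nothing about whether that maintainer is about to burn out, which is the information that is still missing.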

Listen to the whole thing.

21 Apr 2019

don’t preempt me bro

Some surveillance marketing organizations have suggested adopting a Federal privacy law in order to preempt the California Consumer Privacy Act. Preemption would be bad if it actually happened, but the fact that they’re trying for it is the best endorsement I have ever seen for the California Consumer Privacy Act. If I wasn’t a CCPA fan before, I am now.

In my humble opinion, preemption is the wrong direction. Privacy regulation should be complicated enough to impose significant transaction costs on database marketing practices. State-level privacy regulations are a start, but what about county or city ordinances? User tracking allowed on alternate sides of the street on different days of the week?

Why would I want to see costs and complexity imposed on the surveillance marketers? I’m going to leave the political stuff out for now. From a selfish point of view, as an individual considering buying stuff, I am going to get ads, and I’m going to get them matched to me in three ways.

  1. Context. Placed on a resource I’m interested in using, like a magazine article or a bus bench.

  2. Search. Matched to search results when I look for a product or a service, like a Yellow Pages ad or a Google search or Maps ad.

  3. Personalized targeting. Matched to me based on something the advertiser knows about me.

On the Internet, many ads are placed using a mix of these techniques, and it’s hard to split out how a real-world marketing budget is allocated across them. And information originally collected based on context can leak and start getting used for personalization. But the technical and regulatory environment affects how much money advertisers choose to invest in each one.

As the recipient, or potential customer, the three ad placement methods affect me in different ways. Ad money allocated to context is a subsidy for something I want to use, whether it’s local news coverage or an ad-supported public restroom.

Ad money allocated to search is almost as good. I’ll use a search engine more if it gives helpful results, so search advertising also pays for something I want.

Personalized targeting, though, is a problem. Instead of paying to support something I want, the advertiser is paying to reach me as an individual. The fact that my information is in somebody’s database is a risk to me, but a source of revenue for the database owner. It’s a classic Negative Externalities problem. Besides, anything spent on this stuff does not go to pay for the ad-supported resources and search services I really want.

Ad-supported cultural works have positive externalities, when they’re re-purposed for other uses. The “Star Trek” advertisers got their money’s worth in 1966-1969, but people are still watching the show today. Kurt Vonnegut quit his job in public relations at General Electric because he sold stories to Collier’s magazine.

As a member of the audience for advertising, I win when I can help move the marginal advertising dollar from personalized targeting to either context or search, because a fraction of the money that gets moved pays for something I want, some of it is likely to create positive externalities, and none of it gets spent on creating risks for me. Regulation is a piece of the solution, and a mess of confusing regulations could be more effective in raising the relative price of personalized targeting than a single set would.

People’s intuitions about marketing practices are economically sophisticated.

  • People often choose to pay attention to ads that carry economic signal.

  • People are quick to develop banner blindness and other habits to avoid low-signal advertising.

  • People choose not to invest a lot of time in low-effectiveness ways to protect their personal information, but pick up on measures seen as effective, such as Do Not Call.

People who grow up in ad-heavy economies learn the economics of advertising like people who grew up playing ball learn physics.

What we need to see from privacy regulation is

  • increased transaction costs for negative-externality advertising practices.

  • a credible promise of reducing risks, to attract mass participation.

Privacy regulation has to increase the confusion and cost on the advertiser side, in order to balance out the risks and costs imposed on the audience side, and shift ad budgets.

20 Apr 2019

PWA notes: CSS Grid

(update 21 Apr 2019: copy edit, add some explanation.)

I’m learning how to make a Progressive Web Application. Progressive Web Applications are a good thing because they give people a lot of the features of mobile apps, but run in the browser where it’s easy to turn on privacy protection features.

Here’s how it’s going so far. A simple polyhedral dice roller for Dungeons and Dragons, and similar games that use many-sided dice.

Links: standalone, source.

(Yes, I know real dice are better for a real game. This is for when you forget your dice but not your phone, or have a few minutes to prepare something.) This is mainly designed to run on a phone, but it does take keyboard input, and if you can see it here it also works in an iframe.

I ran into some browser compatibility problems trying to do the ASCII Art as real ASCII Art, so the ASCII Art is really a PNG. To look on the bright side, though, finding a browser incompatibility issue means that this is now a Real Web Project.

Here’s how I got the help to work the way it does, with CSS. In its regular place, the keypad is laid out calculator-style, and on the help page, the buttons are laid out in a column on the left with the explanation of what they do next to them.

The keypad and the help page are really the same content, so each button’s help is a p element that lives right next to the button element. Turning help on and off doesn’t navigate you to a separate help page, it just moves the keypad to a new parent element where it is styled differently.

The #keypad div starts off as a child of the #compact div. Inside of #compact,

  • the grid is four columns: grid-template-columns: 1fr 1fr 1fr 1fr;

  • the help text is styled with display: none

  • the tall button is grid-area: span 2 / span 1 and the wide button is styled with grid-area: span 1 / span 2;

  • the 0-9 button, only used in help, is also display: none

Moving #keypad to be a child element of #help means that different styling applies.

  • the grid is two columns: grid-template-columns: 1fr 4fr;

  • All the help elements are display: block so they show up, and take up positions in the grid.

  • The tall and wide buttons are single sized.

  • The individual numbers are display: none and the 0-9 button is display: block.

Putting the help text next to the button it applies to should make it easier for me to check that the help text for each button is there and up to date, and I don’t have to make a separate help page.
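The move itself can be sketched in a few lines of JavaScript. This is a minimal version, assuming the #keypad, #compact and #help ids described above; the toggleHelp name and the help-button id are made up for illustration and may not match the real source.

```javascript
// Move the keypad between its two parents. Calling appendChild() on a node
// that is already in the document moves it instead of copying it, so the
// same buttons and help text just pick up the other parent's styling.
function toggleHelp(keypad, compact, help) {
  const target = keypad.parentNode === help ? compact : help;
  target.appendChild(keypad);
  return target;
}

// Browser wiring, skipped when there is no DOM (for example under Node).
if (typeof document !== "undefined") {
  const keypad = document.getElementById("keypad");
  const compact = document.getElementById("compact");
  const help = document.getElementById("help");
  const button = document.getElementById("help-button"); // hypothetical id
  if (keypad && compact && help && button) {
    button.addEventListener("click", () => toggleHelp(keypad, compact, help));
  }
}
```

All of the layout differences stay in the stylesheet, in rules scoped to #compact and #help, so the script never has to touch individual buttons.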

Next step: figure out how to make this Do The Right Thing with a screen reader.

14 Apr 2019

responsive ascii art

(Update 17 Apr 2019: Yes, I know it works on Firefox but is messed up on some other browsers. I made an issue: Issue #29609 | webcompat.com)

I’m making a web thing (for Progressive Web Application practice) and could use a header image. I’ll just go old school and do some ASCII Art.

Wait a minute, though. All the cool web sites now are Responsive. So the header should work at different sizes. So what I want to do is to get ASCII Art to behave like a regular image. If I make Ye Olde .Sig Sword

    /
O===[============================>
    \

and I want it to look good inside the containing element, I want the text to resize, not reflow.

Kind of like this.

Hi, here we are inside a narrow element. Here is a little tiny sword.

    /
O===[============================>
    \

The dashed red border is just to show how big the div is. Here’s a bigger one.

This div is wide. The normal text is the same size. Behold my large sword!

    /
O===[============================>
    \

The answer so far: put the ASCII Art inside an svg element, like this.

<p><svg viewBox="0 0 170 24" xmlns="http://www.w3.org/2000/svg">
<text x="0" y="0">
    /
O===[============================&gt;
    \
</text></svg></p>

Needs some CSS, like this:

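svg {
        /* scale the drawing to the width of the containing element */
        width: 100%;
        overflow: auto;
}

svg text {
        line-height: 100%;
        /* keep the spaces and line breaks of the ASCII Art as typed */
        white-space: pre;
        font-family: monospace;
        font-size: 8px;
        fill: #666;
        /* keep the art from being selected like regular text */
        -webkit-user-select: none;
        -khtml-user-select: none;
        -moz-user-select: none;
        -ms-user-select: none;
        user-select: none;
}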

The “white-space: pre” gives me the ultimate image editing environment: free-form multi-line ASCII art text within the text element. Yes, I still need to use &gt;, &lt;, and &amp;. The fill sets the color.

One small annoyance is that the text of the ASCII Art can be selected if the user double-clicks, or drags, or long presses on a touchscreen. So the user-select stuff is to prevent that from happening.
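If I end up with more than one drawing, the escaping and the viewBox arithmetic could be scripted. Here is a minimal sketch; the function names are made up, and the 5-by-8 character cell is a guess for an 8px monospace font, not a measured value.

```javascript
// Escape the three characters that cannot appear literally in SVG text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap ASCII art in an svg element whose viewBox is sized from the art.
// charWidth and lineHeight are assumed cell sizes, in viewBox units.
function asciiArtToSvg(art, charWidth = 5, lineHeight = 8) {
  const lines = art.split("\n");
  const columns = Math.max(...lines.map((line) => line.length));
  const width = columns * charWidth;
  const height = lines.length * lineHeight;
  return (
    `<svg viewBox="0 0 ${width} ${height}" xmlns="http://www.w3.org/2000/svg">\n` +
    `<text x="0" y="0">\n${escapeXml(art)}\n</text></svg>`
  );
}

// A 34-column, three-line sword.
const sword = [
  "    /",
  "O===[" + "=".repeat(28) + ">",
  "    \\",
].join("\n");

console.log(asciiArtToSvg(sword));
```

With those guessed cell sizes, a 34-column, three-line drawing gets a 170 by 24 viewBox, the same shape as the hand-written one above. The CSS still has to supply white-space: pre and the monospace font.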

11 Apr 2019

how I think about how the New York Times thinks about privacy

A. G. Sulzberger, publisher of The New York Times, writes,

If you’re reading this essay on an internet browser, it offers a useful example of what tracking looks like at a practical level. Before you had time to read a single word, a number of different companies had already placed a “cookie” or other tracking mechanism on your browser to study your internet use. The Times hosts these trackers for three purposes: to learn about how people use our website and apps so we can improve their experience; to reach readers we hope will subscribe; and to sell targeted advertising.

Read the whole thing. But my inner tech editor could not be silenced, and had a small suggestion. How about…

If you’re reading this essay on an internet browser, it offers a useful example of what tracking looks like at a practical level. Before you had time to read a single word, your web browser had already accepted a “cookie” or other tracking mechanism from a number of different companies to study your internet use. The Times hosts these trackers for five purposes: to learn about how people use our website and apps so we can improve their experience; to reach readers we hope will subscribe; to sell targeted advertising; to leak our readers’ personal information to help our competitors sell ads targeting our audience; and to enable fraudulent bot traffic to impersonate human visitors.

As soon as I make the web browser, and not the tracking company, into the subject of the sentence, it helps explain some of the business reasons for news sites to focus on privacy. For a site, examining your own privacy practices is fine, but it’s not where the big wins are. The important part, for the New York Times and other sites that need to protect their ad revenue, is to work along with in-browser tracking protection technology. Protecting reader data for the readers is mostly the same as protecting audience data for the ad business.

It’s kind of like the situation with email. Email is a viable marketing medium today not just because legit email marketers don’t spam, but because email users have good spam filters. Spam filter technology kept low-value email lists from devaluing email marketing. In-browser privacy technology is starting to reverse the process by which low-value cross-site tracking has been devaluing web advertising.

The Times is already doing some good service journalism on web privacy. Next step: set up the paywall to give extra free articles per month for anyone running Apple Safari ITP or Firefox ETP? The more reader eyeballs a site can remove from the race-to-the-bottom eyeball market, the more market power it has.

Spam filters and legit email marketers saved email as a marketing tool. Can privacy-protecting browsers and legit ad-supported sites do the same for the web?

31 Mar 2019

How to spam software developers and get away with it?

(somebody tell me why this doesn’t work)

Step 1: Adopt a GDPR Everywhere policy. This is obviously good. Show me a company in the IT business that hasn’t decided to go GDPR Everywhere, and I’ll show you a company that hasn’t finished writing out all the user stories for how to handle it when some users or partners are covered by GDPR and others aren’t. Or what happens when you have been giving a user the creepy second-class privacy policy for a while and then they go get married to a European, or go work for a European company, or something. Basically every IT company is going to either go GDPR Everywhere or sign up for years of intricate, expensive legal work and arguments that they’ll eventually give up on.

Step 2: Have products and services interact with open source, and collaborate and test upstream. This is also obviously good. Pull open-source Git repositories and run integration testing and metrics and whatever on them. We shouldn’t just sit there and pull whatever comes out at the end of the development process. We should help with the QA, publish peer-reviewed research, whatever.

Step 3: Congratulations, we’re now a data controller under Article 14 of the GDPR. Git repositories are full of PII. Every commit includes the developer name and email address.
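To see how much PII that is, here is a minimal sketch that collects the distinct author identities from a repository’s history. It assumes the input is text produced by git log --format='%an <%ae>' (author name and author email, one commit per line). The function name and the sample data are made up.

```javascript
// Collect the distinct author identities from `git log` output.
// Expects one "Name <email>" entry per line, as printed by:
//   git log --format='%an <%ae>'
function uniqueAuthors(gitLogOutput) {
  const authors = new Map(); // email -> name, first spelling seen wins
  for (const line of gitLogOutput.split("\n")) {
    const match = line.match(/^(.*) <([^<>]*)>$/);
    if (match && !authors.has(match[2])) {
      authors.set(match[2], match[1]);
    }
  }
  return authors;
}

// Hypothetical sample output, not taken from any real repository.
const sample = [
  "Alice Example <alice@example.com>",
  "Bob Example <bob@example.org>",
  "Alice Example <alice@example.com>",
].join("\n");

for (const [email, name] of uniqueAuthors(sample)) {
  console.log(`${name} <${email}>`);
}
```

Every entry in that map is a person whose name and email address we now hold.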

Oh, no, PII! Does that mean we can’t work with open source?

Of course not. Open source is still legal. But we have to comply with our data subject rights obligations under Article 14. We have to contact everyone whose PII we hold, and notify them clearly of what we’re doing with their data.

And what are we doing with it? We’re using it to do open source QA that feeds into making our product better. And we have to explain what we’re doing in our Article 14 notification. So the European Union basically just told us not just that we can send our elevator pitch to a bunch of software developers unsolicited, but that we have to.

Right?

19 Mar 2019

Open source, cognitive surplus, and precarity

(edit 28 May 2019: fix some awkward or unclear sentences.)

Someone once remarked (paraphrased) that as long as there has been a scene, there have been people complaining that it is no longer the true scene. (citation needed)

Of course the open source scene is changing, but how much of that is the unavoidable transformation that a healthy scene goes through, and how much is fundamental?

The Free Software movement as we know it started by capturing the tremendous cognitive surplus that was just there for the taking from university students and from employees of conventional, slothful corporations. Back in the 1980s and early 1990s, barriers to cooperation were transactional: licensing and communications technologies. Patches on a mailing list seem like a high-overhead collaboration method today, but by the standards of the time, diff(1), patch(1) and Free command-line tools were transformational. And of course the classic free software licenses are practically zero-overhead for participants with uncomplicated sharing or reciprocity goals.

So, all that cognitive surplus was just sitting there between classes or TPS Reports or whatever, and the software freedom scene was set up to capture it. Before long, Tim O’Reilly and friends branded it as a software business trend called Open Source, and the modern software business emerged.

Sounds great—why isn’t it continuing to work like that? Two reasons.

Less cognitive surplus in the world

  • The kind of university experiences that include substantial cognitive surpluses are less widely available, because of increases in the cost of higher education and how those costs are allocated.

  • The work environment is better at capturing cognitive surplus.

Precarity is a thing. Compared to the early days of open source, the rent is too damn high.

Internet adoption by people with less “free” time

There’s a whole complex privilege thread here, but the main point is that open source as we know it began when a lot of people who had a lot of free time got on the Internet. They (fine, fine, we) had the opportunity to participate in open source and other cognitive-surplus-capturing activities (such as MMORPGs). Many new people joining are not coming in with the same economic and time advantages, even if they have access to the same or better creative and collaborative tools.

More competition to capture available cognitive surplus

Open source is no longer the only practical, low-overhead way to do collaborative projects. Now people can do

  • crowdfunding

  • gig sites

  • native app stores (mobile, Steam…)

  • software as a service

It’s no longer a choice between low-overhead, low-incentivization (open source) or accepting high overhead if you want to get paid.

What next?

Open source advantages in transaction costs are still there. But people looking for open source contributors do have to realize that we’re going to have to keep increasing the number of people who consider open source as a possible valuable use of their time (remuneration issues are blockers) or see open source lose contributors as we get stuck competing with more outlets for less already-unmonetized time and attention.

10 Mar 2019

perfection is achieved

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.

—Antoine de Saint-Exupéry

The perfect surveillance advertising business model has been discovered. I have seen it. Maybe you have seen it too.

Here’s how it works.

  1. Identify users likely to be enraged by a political issue. Start a PAC with a name related to that issue. Also start an LLC.

  2. Run rage-provoking social media ads about your issue, with a call to action to sign a petition or complete a survey.

  3. Upsell a fraction of the people to make a donation to your political action committee. Use a Dark Pattern to get some people to consent to having their credit cards billed monthly.

  4. Actually deliver the petitions or surveys or whatever, and actually make a few donations from the PAC to politicians to get on their donor lists. But keep most of the money for yourself by paying it to your LLC (from step 1).

It’s data-driven. It’s sticky. It’s social. It’s got everything that a surveillance marketing business needs, and nothing it doesn’t. No manufacturing. No support. Not even any drop-shipping. Just register a PAC and an LLC and start running ads and social media.

If you don’t get these perfect ads, yay for you. Your eyeballs are probably too expensive on the ad impressions market, or you don’t seem like the kind of person who would get enraged about any issue that they have a landing page for.

One reason that this model works is that people made so much noise about the Citizens United decision and shadowy political groups. So the people who are paying into these things probably never even feel ripped off. They feel that’s just how things work now, and they’re getting away with working the system.

Update 20 May 2020: Exposing the Dorr Brother Scams: How do the Dorr Brothers do it? This is a similar business model. Not surprised that a lot of people are trying new models in this area.

Update 20 Aug 2020: How not to do it: Steve Bannon arrested and charged with fraud - Axios

Update 12 Sep 2021: Copy edit, add Dark Patterns link.

Update 16 Oct 2023: Man who advertised fake dinner with Donald Trump sentenced in tax case - The Baltimore Banner This is an example of doing this business model wrong. This guy messed up. You have to pay the money from the super PAC to the LLC—that way the individual items are on the LLC’s budget, and can stay confidential. Then do your taxes by the book. (If you really don’t want to pay much in taxes, take your money out as expense reimbursements or loans from the LLC.)

Update 26 Jun 2024: These Political Nonprofits Spend 90% of Donations on Fundraising — ProPublica [E]xperts said that there is lax oversight by the federal agencies in charge of regulating the groups. (for every one company that gets this kind of coverage, how many are going unnoticed?)

Update 15 Dec 2024: Harrison Butker’s PAC raised about $36,000 this fall. None of it went to politicians (This is how you do it. Five stars.)

02 Mar 2019

One place I will probably not buy a car.

(Update 31 Mar 2023: We got some volunteers together and checked this out along with some related stuff. One car dealer in San Benito, Texas (pop. 24,665) was responsible for sending information on approximately 10% of the volunteers, even though only 6.6% resided in the entire state of Texas.)

(Update 23 Apr 2019: I wrote to ask this dealership how they got my info and didn’t get an answer to my question. But they did put me on their email list in time to get the “April Shower of Savings” email so I’ve got that going for me which is nice.)

One place I will probably not buy a car: Franklin Sussex Auto Mall, in New Jersey.

I still have a Facebook account, mostly to keep up on the ad scene there. (I don’t check Facebook often enough for it to be a good way to reach me. See the page footer here for contact info.) When I checked Facebook’s page of ads targeting me, this company was listed under advertisers “Who use a contact list added to Facebook.”

Facebook ad for a New Jersey car dealer

Somehow, Franklin Sussex Auto Mall got a hold of my email address or phone number, and uploaded it to Facebook. Have I ever shopped for a car in New Jersey? No. If I was shopping for a car, would I fly to New Jersey to buy it and then drive it home? No.

And now that I look at it, when I go through the advertisers that Facebook lists as having uploaded my info, most of them are car dealers I have never visited or contacted. Someone has a pretty good racket going here. How much are they making from the car dealers? (Yes, this is a bad thing, because car dealers could be spending that money to build positive reputation by funding local news, or other ad-supported resources with positive externalities, but we knew that already.)

Maybe when they write the history of the big social site era, it won’t be about some all-seeing panopticon, but more about a bunch of people in a highly paid California bubble, mostly young guys who have been told they’re smart their whole lives, getting out-hustled at a direct marketing business they don’t really care about.

27 Feb 2019

principal-agent problems

Interesting problem: why do brands fail to protect customer data when it would be in their interest to do so?

If expected customer retention of tracking-protected customers is higher, why not invest in tracking protection for your most profitable customers? Why don’t car insurance companies figure the odds on customer retention of protected and unprotected customers the same way they figure the odds on other risks?

It might be because corporations are not decision-making entities, and online marketing is the world’s longest chain of principal-agent problems.

The value of a database marketer as an individual on the job market is a function of the number of database-capturable prospects that the marketer will help an employer land as customers. If a lot of Allstate customers are also available on DMPs, then more VC-funded insurance startups will launch, and they’ll bid up the salaries of database marketers now working at Allstate.

If Allstate’s best customers are protected, then the VCs invest in something else, the job interviews don’t happen, and Allstate can keep paying their database marketers what they’re paying.

So: principal-agent problems are market design opportunities. How to structure compensation for marketers to incentivize customer retention even after that marketer is no longer employed by the brand? (People generally want to do the job right, you just can’t keep throwing incentives to do it wrong at them.)

And how to increase the social rewards of the choice to allocate marketing budgets towards positive-externality advertising and away from negative-externality advertising? Imagine that a restaurant chain is opening a new location and wants to reach people there. They have two choices: buy ads that pay for local news and cultural content written for people in that area, or buy ads that pay to make those people more depressed, manipulate their elections, and try to taunt them into massacring each other. The social rewards for choosing the first should tend to go higher.

21 Feb 2019

why I’m not reading your think piece

White power assholes are not exactly the smartest people on the Internet. State-sponsored manipulation operations have better skills and can use the domestic guys as human sock puppets.

Pedophiles aren’t the smartest people on the Internet either. Even the “highly technical dark web” pedo networks are using off-the-shelf tricks that are far behind what even the most basic adfraud operation can pull off.

So I’m writing to explain why I’m going to move your long-form think piece about the “power” of the Internet “duopoly” to the probably-never-going-to-get-to-it end of my to-do list. Let’s have a look at just the last 24 hours:

(Come back in 24 hours for more.) You’re asking me to be interested in reading your ever-so-thoughtful essay about the awesome power of two companies that have to be like the fifth most influential people on the Internet, max. Even the Facebook ad integrity guy is down to asking for free reports of scam ads, on Twitter. (Dude, if the “get everybody else to do your QA for you” strategy worked, then we’d all be running desktop Linux.)

The “powerful platforms” are a box on the Internet cable between terrible marketing decisions on one end and criminals and terrorists on the other. A box maintained by vaguely creepy but not especially interesting IT staff. Yes, let’s write about CMOs who attach their brands to heinous shit—what’s up with that? Yes, let’s write about the criminals who end up with the money—that can’t be good. But the companies in the middle are not the story.

06 Feb 2019

Four ideas that the privacy business can borrow from the open source business

Please let me know if the following makes any sense, and if so I’ll turn it into a talk.

It’s not a simple game of people vs. companies. In software, you don’t just have evil “software hoarders” vs. cooperation-minded “users”. There are way more players: OS vendors, hardware vendors, proprietary ISVs, developers of internal applications, and IT organizations. At least. I’m sure I forgot some. But the point is that they don’t all have the same interests. Pretty much everyone who does software wants everybody else’s software to be open source. So if you look at everybody’s Core vs. Context, people will generally play nice in open source projects doing whatever their Context (or complement, if you want to look at it that way) is.

In user data, you’ve got the Five Armies: content creators and their publishers, companies trying to sell stuff (advertisers/sponsors/signalers), intermediaries (adtech/platforms), client-side developers (browsers/privacy tools), and fraud hackers. A high-reputation brand with a solid mailing list has completely different user data handling interests from a social platform—just like a network chipset manufacturer will have different open source interests from a proprietary OS vendor.

True believers aren’t enough to build on. Some people are really fired up about Internet ethical and policy concerns, but most people would somewhat prefer the right thing, and telling them that you do the right thing makes them feel better about doing it and somewhat more likely to do it. But doing the evil thing is not a deal-breaker.

Loud complaints don’t matter (much). Yes, the first open source release will include a license mismatch, or somebody’s ssh private key, or it won’t build without a tool you didn’t include, or something. And somebody will complain. But the true believers are useful for QA to guide incremental improvement, not as gatekeepers to decide if you’re in or out. (And if you fix something that someone is complaining about in a particularly annoying way, do it quietly. Eventually they’ll make their complaint to a reporter who will check it out, find the fixed version, and start ignoring them.)

Hardly any company will get to 100%. Robert Penn Warren said it best.

Man is conceived in sin and born in corruption and he passeth from the stink of the didie to the stench of the shroud. There is always something.

Even companies that focus on open source have awkward corners where they can’t Do The Right Thing, because reasons. And most of the code contributed to open source projects is done on the clock at companies that are also in the proprietary software business.

Just like IBM didn’t need to have a plan to open source AIX in order to make a difference in Linux, companies don’t need to have a plan to get clean of all surveillance marketing activity to make significant moves in the direction of user privacy. An insurance company might decide to remove third-party pixels from the pages linked to from existing customers’ bills, to keep from leaking customer data—but keep social tracking pixels for some other pages for tracking conversions on a social campaign. Anyway, open source program offices are a thing. What about customer data protection offices?

04 Feb 2019

this guy is bad at game theory

From the spam folder:

I am a well known hit man on deep web. Someone paid me 1000 USD to beat you and broke your right arm. (Why? I don’t know) I will take 1000 USD more after my client sees your broken arm!

If you send 2000 USD to me, I will cancel the job, and I will give you the name of my client. Else, I will finish my job asap!

Send the above amount on my BTC wallet (Bitcoin): 3JDLJWW5K6AsP1VBUD1Dgsxk9ydtcdMFvz As soon as the payment is completed I will receive a notification and a new email with the client’s details will follow.

You have 24 hours from now on!

Hold on a minute.

He has a reputation, but he’s going to ruin it, and burn an existing customer, in order to earn 1.5x what he was originally going to earn from the deal?

Where did this clown learn his game theory?

Nobody would take an established brand, fail to deliver the product or service that the brand was originally known for, and leak their good customer’s private info, just to go chase incremental revenue driven by unproven new technology, right?

02 Feb 2019

Consent and bots

Two kinds of web clients who it’s a bad idea to serve a third-party resource to:

  • Users who have not given consent. We know we can’t use their data. But third parties can peek at those users because their tracking script or pixel is on the page. If the first party can’t have that data why should the third parties get it?

  • Adfraud bots. Bots come to visit legit sites to build up realistic-looking cookies so they can cash out elsewhere. Bad idea to help them.

Consent management requires some interaction with the user, which is also an opportunity to collect data for assigning a botness score.

Bots will also try to appear to be visitors who have already given consent, and go get the third-party resources anyway. This is an interesting problem because it’s a game where the bot and the third party are on the same side, and the site is on the other. Impossible for the CMP to block the bot connection to the third party, but is it possible to show that consent was not in place when that connection happened? Understanding the provenance of the consent string is going to be important. An extra cookie containing a digital signature for the consent string?
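One way to picture the "extra cookie containing a digital signature" idea: the site signs the consent string together with the time it was issued, and later checks that the pair still matches. This is a minimal sketch, not a CMP implementation. It assumes an HMAC with a key that only the site holds (a public-key signature would be needed if other parties had to verify it too). The key, the cookie layout, and the function names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret known only to the first-party site.
SITE_KEY = b"hypothetical-site-secret"


def sign_consent(consent_string: str, issued_at: int, key: bytes = SITE_KEY) -> str:
    """Return a value for the extra cookie: issue time plus an HMAC-SHA256 tag."""
    message = f"{issued_at}|{consent_string}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{issued_at}.{tag}"


def verify_consent(consent_string: str, cookie_value: str, key: bytes = SITE_KEY) -> bool:
    """Check that this site issued this consent string at the stated time."""
    try:
        issued_at_text, tag = cookie_value.split(".", 1)
        issued_at = int(issued_at_text)
    except ValueError:
        return False  # malformed cookie: treat as no provable consent
    expected = sign_consent(consent_string, issued_at, key).split(".", 1)[1]
    # Constant-time comparison, on bytes so odd input cannot raise.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))
```

A consent string that shows up without a matching signature cookie is at least evidence that the site's own consent flow did not produce it.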

New CMPs will have an opportunity to build on knowledge gained from regulator reactions to first-generation CMPs. But it’s more interesting to think about sustainable advantage for the site than just about regulatory future-proofing. For example, a good consent management platform will also tie in to an objection management platform/opt-out management platform. (Objection management platform and opt-out management platform both work out to OMP—anybody using that TLA?)

People ask about whether consent records obtained by conventional CMPs are even good. (Risks in IAB Europe’s proposed consent mechanism | PageFair) The first-generation “click OK to make this dialog go away and consent to everything” UX is unlikely to last, but what’s next?

  • Design the CMP to work in the interest of the CMP customer, not third parties.

  • Understand the (painful, because anything touching the CMS is painful) changes involved in taking 3rd parties out of the page template entirely when the page is going to a no-consent user. No peeking!

  • Future-proof consent workflow to allow for adjusting for regulatory changes (boring) and revenue or data opportunities (fun)

  • Integrations: objection/opt-out management, single sign-on, paywalls, in-browser/in-extension consent management.

  • And of course, get out in front of coming browser privacy improvements. Need an open-source strategy including participation in browser and extension projects.
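The "no peeking" item above could look something like this on the server side: decide which third-party tags go into the page before it is sent, instead of sending them and hoping a script holds them back. A rough sketch only. The `Visitor` fields, the botness cutoff, and the `tracker.example` tag are made up for illustration, and a real CMS template system will be messier.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical third-party tags; tracker.example is a placeholder domain.
THIRD_PARTY_TAGS = [
    '<script src="https://tracker.example/pixel.js"></script>',
]

# Assumption: scores above this cutoff are treated as probable adfraud bots.
BOTNESS_CUTOFF = 0.5


@dataclass
class Visitor:
    has_consent: bool     # consent recorded by the site's own CMP
    botness_score: float  # 0.0 looks human, 1.0 looks automated


def third_party_tags_for(visitor: Visitor) -> List[str]:
    """Return the third-party tags this visitor's page may include."""
    if not visitor.has_consent or visitor.botness_score > BOTNESS_CUTOFF:
        return []  # no peeking: the tags never reach the page
    return list(THIRD_PARTY_TAGS)


def render_page(article_html: str, visitor: Visitor) -> str:
    """Build the page server-side so unwanted tags are absent, not just hidden."""
    return "\n".join([article_html] + third_party_tags_for(visitor))
```

The same check covers both kinds of client from the top of this post: users who have not given consent, and probable bots.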

01 Feb 2019

Attention humans.

Attention humans. We are in a life and death struggle with our enemies, the pathogenic bacteria. Our scientists have developed secret weapons, the antibiotics. It is vital to use these weapons only when they will make a difference, in overwhelming force, and to leave no survivors. The enemy must be prevented from developing countermeasures. Do you understand?

Can we just betray our most effective weapons to the enemy if it’s in exchange for CHEAP MEAT?

(facepalm) Let me explain this again.

30 Jan 2019

From privacy compliance to user data wins

Open source program offices are a thing. What about customer data protection offices?

A little background: when the open source business as we know it was getting started, most of the original concerns about free software in business were about license compliance. Many people assumed that all software companies would pursue maximum restrictions using copyrights and patents, and users who wanted to use, modify, and redistribute software would be their adversaries.

Then, Tim O’Reilly and others started changing the conversation to talk about open source strategies. How can a small company release high-impact software by building on collaboratively developed work? Now, as open source has caught on all over the software business, it seems obvious that people think about

  • business models made possible by open practices

  • open source companies as market participants competing for users while cooperating on common work

But it was a big mental shift at the time.

Today, a modern open source program office has to handle issues of license compliance, including training developers to follow and apply licenses, and checking the licenses of inbound software for compatibility. But the big picture is about using open source for sustainable advantage.

Maybe, today, we’re still thinking about privacy as a compliance problem. Users and regulators on one side, companies on the other.

But what about a company that has a solid first-party relationship with a customer? What if the person is known to open the email newsletter, come in the store, answer the surveys—you’re not in an adversarial relationship with that person over their data. The company and the customer are on the same side. When privacy concerns and adoption of privacy tools help get the person protected from targeting by some fly-by-night competitor, that’s a win for both.

If you’re running a bank, you don’t want some cryptocurrency scam picking off your high-value customers. Those people’s lifetime value is going to go way down when they’re selling off all their stuff because the bank bought a “custom audience” social campaign targeting them, and the data leaked. If the bank had a customer data office thinking a step ahead, instead of just checking compliance boxes, it would have considered the data leakage risk along with the social campaign’s possible upside.

Or a healthcare brand might run what looks like a harmless campaign, but some clever data management platform can infer medical data from it, and a “miracle cure” racket uses the data to pick off the customers. Before you know it the customers stop filling their prescriptions and start loading up on colloidal silver or something. A customer data office would have had the data science skills to see the risk, and offset it, possibly by offering the customers a free service to help them opt out of high-risk data processing.

Even for just a regular product, when a VC-funded “direct to consumer” competitor comes in, with no customer list—how do they grow so fast? Buying targeting data on the open market, because the existing brand hasn’t learned to protect its interests. Where does a brand’s interest in customer data coincide with the customers’ own interest in privacy? Instead of purely focusing on compliance, a customer data office will understand the risks and opportunities.

Anyway, software freedom went from a contentious idea to the source of much value in a remarkably short time. What if something similar happens with privacy?

28 Jan 2019

Perfect timing

(I work for Mozilla. Not speaking for Mozilla here.)

January 28, 2019:

Male impotence, substance abuse, right-wing politics, left-wing politics, sexually transmitted diseases, cancer, mental health….Intimate and highly sensitive inferences such as these are then systematically broadcast and shared with what can be thousands of third party companies, via the real-time ad auction broadcast process which powers the modern programmatic online advertising system. So essentially you’re looking at the rear-end reality of how creepy ads work.

Natasha Lomas, on TechCrunch

Also January 28, 2019:

Simply put: users need more protection from tracking….In support of this effort, today we are releasing an anti-tracking policy that outlines the tracking practices that Firefox will block by default. At a high level, this new policy will curtail tracking techniques that are used to build profiles of users’ browsing activity. In the policy, we outline the types of tracking practices that users cannot meaningfully control.

Steven Englehardt and Marshall Erwin, on the Mozilla Security Blog

24 Jan 2019

Suggesting a small fix to a Privacy Manifesto

London Trust Media has a Privacy Manifesto up.

Many good points, but there’s one small fix that could make it more useful. From the original:

  1. It is now the norm—even in the presence of laws clearly forbidding it—for nearly every commercial website we visit to plant tracking beacons in our devices, so our lives can be examined and exploited by companies and governments that extract personal data and manipulate our lives for their purposes. This offends our privacy and diminishes our agency.

Here’s a suggested new version edited to be clearer about how browsers work.

It is now the norm—even in the presence of laws clearly forbidding it—for nearly every commercial website we visit to include tracking beacons in their pages, and for our browsers to load and run those beacons, so our lives can be examined and exploited by companies and governments that extract personal data and manipulate our lives for their purposes. When our own browsers work against our interests, this offends our privacy and diminishes our agency.

Please don’t assign all the work to the site.

It’s counterproductive to ask the site to be the one to bear all the costs of privacy reform. The site is the player with the least economic power and the least freedom to change. Web publishers and brands failed to protect their audience and customer data and are now, unfortunately, kind of stuck. Because third parties control the audience information that’s needed in order to make ads saleable, no one web site can unilaterally switch off the data flow that makes their business model work.

On the browser side, though, it’s different. Browser developers know that they can get more user satisfaction, and get users to spend more time in the browser if those users have functionality that makes them feel safer.

This stuff needs to get fixed and browsers have the motivation and skills to do it. Let’s focus on productive next steps by the parties that can afford to change. The result will be a new web advertising business that works better for sites and brands, too.

Sites can’t take the big step to remove tracking scripts entirely, but there are a few things that sites can do to assist with ad reform.

  • Fix any “turn off your ad blocker” scripts to detect ad blockers only, and not falsely alert on privacy tools.

  • Fix up the privacy page to add links to appropriate privacy tools based on the user’s browser. (Aloodo has JavaScript for this.)

  • If you maintain a privacy tool, offer to do a campaign with the site. Privacy tool users are high-quality human traffic. Free or discounted privacy tools might work as a subscription promotion. Where’s the win-win?

Asking a site to walk away from money with no credible alternative is probably not going to work. Asking a site to consider next steps to get out of the current web advertising mess? That might.

More: What The Verge can do to help save web advertising

19 Jan 2019

Behavioral targeting and investor relations

Eric Shih writes, on the Digital Content Next site,

Leveraging data to make more informed ad targeting decisions is a breakthrough versus previous methods where ads were un-targeted. Personalized ads are a win for all parties. [emphasis added] It is better for:

  • Users (connects them to more interesting and relevant ads)
  • Advertisers (results in higher return on investment)
  • Publishers (delivers higher CPMs and increased revenue)

Hold on a minute. That doesn’t look right.

The win for all parties is clearly bogus from the user and publisher point of view. Behavioral targeting means that users see more low-quality and deceptive ads, because behavioral targeting gives a long-running structural advantage to deceptive advertisers. As a potential customer, I also win as ad budgets go into supporting context that I care about, such as news and cultural works, and lose as ad budgets go into behavioral targeting with negative externalities.

From the publisher point of view, behavioral targeting creates near-infinite saleable ad inventory on low-value and fraudulent sites, and forces publishers to contend with those sites for ad money. For users and publishers, the less behavioral targeting the better. But what about for marketers? Isn’t behavioral targeting a win for them?

It looks like the answer is: not so much. Here’s a must-read piece on research by Prof. Byron Sharp.

Personalised advertising may be one thing but getting people to respond to even micro-targeted ads is a whole other ball game. However, analysis of 3.1 million ad exposures shows that such adverts generate low click through rates (CTR). Furthermore, some of the responses to such ads are counterintuitive – with a higher CTR coming from ads mismatched to personalities and lower vs. the overall industry average for Facebook ads.

But if micro-targeting is arguably so ineffective, why do so many marketers use it? Sharp and Danenberg highlight several reasons:
  • Marketers often do things based on theory/logic rather than evidence. The worst myths, the longest lasting, are those that sound plausible.
  • Micro-targeted campaigns can boast of high ROI, largely because they are so small, reaching people who had a high likelihood of buying anyway. Marketers see the high campaign ROI from micro-targeting but fail to realise that the overall return to the company may well be lower.
  • It’s fashionable!!!

Read the whole thing. Not surprising if you follow the underlying #behavioralEconomics. And Dave Trott agrees with the fashionable point: It’s right because everyone’s doing it.
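The second reason in that list is easier to see with numbers. The figures below are invented for illustration, not taken from the Sharp and Danenberg research: a small campaign reaches people who were mostly going to buy anyway, so the ROI the campaign reports looks great while the return from sales it actually caused is negative.

```python
def roi(profit_from_sales: float, spend: float) -> float:
    """Return on investment as a fraction of spend (1.0 means +100%)."""
    return (profit_from_sales - spend) / spend


# All figures are hypothetical.
SPEND = 1000                   # cost of the micro-targeted campaign
MARGIN_PER_SALE = 50           # profit on each sale
BUYERS_AFTER_CAMPAIGN = 110    # targeted people who bought
BUYERS_WITHOUT_CAMPAIGN = 100  # of those, how many would have bought anyway

# What the campaign report shows: every sale to a targeted person is credited.
attributed_roi = roi(BUYERS_AFTER_CAMPAIGN * MARGIN_PER_SALE, SPEND)

# What the company actually gained: only the extra sales count.
extra_buyers = BUYERS_AFTER_CAMPAIGN - BUYERS_WITHOUT_CAMPAIGN
incremental_roi = roi(extra_buyers * MARGIN_PER_SALE, SPEND)

print(f"attributed ROI: {attributed_roi:+.0%}, incremental ROI: {incremental_roi:+.0%}")
```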

There might be another reason, though. Maybe part of the problem is that marketing science is hopelessly mixed up with investor relations. After all, adtech firms and ad agency groups are publicly traded companies. And creative ad work, hiring writers and artists and marking up what you pay them, is the kind of business model that stock markets get bored with. Margins are low, you can only grow as fast as you can hire, and your assets can quit and go work somewhere else.

Choosing to place an ad in a quality context is more cost-effective, but again, it doesn’t scale. If your business is putting good ads in good places then the people who do good work have market power. But psychographic models and the underlying data sets are more investor-friendly. Even if it takes torturing the data and putting up with fraud to make adtech look effective. Mathemagickal woo-woo is scalable, more like the intangible assets of a software company. Markets see promise of big margins and high revenue/employee.

How much do investor-focused messages about the effectiveness of behavioral targeting companies interfere with marketer-focused messages about the effectiveness of behavioral targeting in campaigns?

14 Jan 2019

What’s so bad about the pay to avoid ads model?

(edit 11 Apr 2019: Split item 2 into 3a and 3b.)

A lot of people have come up with the idea of a system that lets readers of a web site pay to avoid the advertising. This is obviously bad, wrong and dangerous, for several reasons.

  1. The model assumes that advertising is unredeemably awful, and walks away from future revenue that would be made possible from fixing advertising. (So far, Online Ads Haven’t Built Brands, but what if they could?)

  2. The model creates incentives to make advertising worse. Ever since we started running the auto-playing video campaign for MIRACLE ASS FUNGUS CURE, our subscriptions are through the roof! Bonuses for all!

  3. (a) Because the ads on news sites will keep getting worse and worse, non-subscribers will get more and more of their news from biased sources that re-report and spin it. (The most common sound effect on Rush Limbaugh’s radio show, last I heard it, was him flipping the pages of the New York Times as he selectively quoted from news stories.)

     (b) Or, because the ads keep getting shittier and shittier, because that’s the best way to incentivize people to pay to get out of them, ad blocking keeps going up.
  4. As soon as site owners realize that number 3 is growing, and won’t go away, they’ll start lobbying for extensive copyright expansion laws that limit fair use, or create new exclusive rights, or apply DRM to web pages to limit ad blocking, and, as a side effect, restrict other software that gives users control over their web experience. Probably all three. Freedom-hostile companies will repurpose these laws for censorship and break the Internet.

I know that “this stupid idea will break the Internet” posts are everywhere, but I just wrote one more. Keeping the ads just high enough in signal, and low enough in resource suckage and privacy/security risk that they mostly aren’t worth blocking, is just one of the many things that has to come out somewhere close to right in order to prevent a bunch of bad stuff.

17 Dec 2018

Firefox extensions list 2018

One of the great things about Firefox is the ability to customize with extensions. (A MIG-15 can climb and turn faster than an F-86. A MIG-15 is more heavily armed. But in actual dogfights the F-86 won 9 out of 10 times. Part of that is training, but part is that the Soviets used data to build for the average pilot, while the USA did a bigger study of pilots’ measurements and recognized that adjustable seats and controls were necessary. Even in a group of pilots of average overall size, nobody was in the average range on all their measurements.) Here is what I’m running right now.

  • Awesome RSS. Get the RSS button back. Works great with RSS Preview.

  • blind-reviews. This is an experiment to help break your own habits of bias when reviewing code contributions. It hides the contributor name and email when you first see the code, and you can reveal it later.

  • Cookie AutoDelete. Similar to the old “Self-Destructing Cookies”. Cleans up cookies after leaving a site. Useful but requires me to whitelist the sites where I want to stay logged in. More time-consuming than other privacy tools. This is a good safety measure that helps protect me while I’m trying out the new privacy settings in Firefox Nightly as my main data protection tool.

  • Copy as Markdown. Not quite as full-featured as the old “Copy as HTML Link” but still a time-saver for blogging. Copy both the page title and URL, formatted as Markdown, for pasting into a blog.

  • Facebook Container because, well, Facebook.

  • Facebook Political Ad Collector, even though I don’t visit Facebook very often. This one reports sneaky Facebook ads to ProPublica.

  • Global Consent Manager, which provides an improved consent experience for European sites. More info coming soon.

  • HTTPS Everywhere. This is pretty basic. Use the encrypted version of a site where available.

  • Link Cleaner. Get rid of crappy tracking parameters in URLs, and speed up some navigation by skipping data collection redirects.

  • NJS. Minimal JavaScript disable/enable button that remembers the setting by site and defaults to “on”. Most sites that use JavaScript for real applications are fine, but this is for handling sites that cut and pasted a “Promote your newsletter to people who haven’t even read your blog yet” script from some “growth hacking” article.

  • Personal Blocklist is surprisingly handy for removing domains that are heavy on SEO but weak on actual information from search results. (The Ministry of Central Planning at Google is building the perfectly-measured MIG cockpit, while extension developers make stuff adjustable.)

  • RSS Preview. The other missing piece of the RSS experience. The upside to the unpopularity of RSS is that so many sites just leave the full-text RSS feeds, that came with their CMS, turned on.
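For a sense of what the tracking-parameter half of something like Link Cleaner involves, here is a small sketch. It is not the extension's code, and the parameter list is an assumption: just the `utm_` family plus `fbclid` and `gclid`. It does not attempt the redirect-skipping part.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters; a real tool would carry a longer one.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def clean_link(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the parameters that survived the filter.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `clean_link("https://example.com/post?utm_source=feed&id=7")` keeps `id=7` and drops the rest.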

02 Dec 2018

memo to self: learn CSS grid

I need to make a new chart of different kinds of third parties that we will need in the post-creepy ad world.

One area that tends to get overlooked in the data-driven marketing world is defense.

Just a few examples.

  • Amazon stopped sending email receipts, to keep email services from targeting people with ads based on their Amazon shopping habits.

  • Google’s Ads Data Hub restricts how advertisers can combine Google and non-Google data.

  • Facebook announced it would eliminate all third-party data brokers.

What do these companies have in common? They’re marketing’s winners. Meanwhile, publishers festoon their sites with consent management platforms that capture consent for all surveillance marketing, everywhere. They’ll even get consent for tracking by third parties that the publisher doesn’t even use. Why play to lose? If you run a trusted site in a position to get consent and prove you got it, you want fewer other companies getting that user’s data, not more.

So the obvious counterpart to the consent amplification carried out by CMPs is some kind of objection amplification.

If the user clicks something other than “OK” on the GDPR consent dialog, don’t just set their consent to zero. That non-consenting user needs to have their voice heard, not just filed away. Ask: Do you want to deny tracking just by our site, or by all these third parties? Then show them a list of Lumascape firms, most of which look like they were named not by branding experts, but by some guy in Florida who mainly communicates by “finger guns”. When the user says, hell yeah, I don’t want to be tracked by all those companies either, then that’s when the objection amplification starts. Generate an Article 21 objection for every company you can think of, get the user to sign off on them, and send them out. (This is why it has to be a platform. Could be quite a bit of verbiage here.)

Now the record of objections sent is a piece of data for ad sales. “Buy ads here because x% of our users can’t even legally be targeted by those other companies.”
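A sketch of the bookkeeping side of objection amplification: one objection record per company for each user who signs off, plus the "x%" figure for the ad sales pitch. Everything here is hypothetical, including the company names and record fields. The objection wording is a placeholder, not reviewed legal text, and actually delivering the objections is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Objection:
    user_id: str
    company: str
    sent_on: date
    text: str


def generate_objections(user_id: str, companies: Iterable[str], sent_on: date) -> List[Objection]:
    """Create one Article 21 objection per company for the user to sign off on."""
    return [
        Objection(
            user_id=user_id,
            company=company,
            sent_on=sent_on,
            # Placeholder wording, not legal advice.
            text=(
                f"Under Article 21 of the GDPR, I object to {company} "
                "processing my personal data for direct marketing."
            ),
        )
        for company in companies
    ]


def percent_of_users_objecting(objections: Iterable[Objection], company: str, total_users: int) -> float:
    """The x% for ad sales: share of the site's users who objected to this company."""
    users = {o.user_id for o in objections if o.company == company}
    return 100.0 * len(users) / total_users
```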

Internet platform companies play defense all the time. Will publishers?

29 Nov 2018

Deleting Facebook as signaling

Many researchers who study human behavior on the Internet will point out that calls to “just delete Facebook” are unrealistic for many users. A lot of people depend on the company for family connections, health-related support groups, or even employment.

Which, of course, should make deleting Facebook an easy win from an economic signaling point of view. If you can credibly stay off Facebook, you’re signaling that you have the skills, wealth, health, and social capital not to need it. What could be better? Why aren’t more people signaling their fitness through conspicuous lack of Facebook dependence? Two reasons.

  • If you delete your account, it’s too easy for others to make fake accounts imitating you, so it looks like you’re on there anyway.

  • The decision to #deleteFacebook is easily reversible. You could easily come sneaking back.

So the thing to do if you want to get signaling power out of quitting Facebook is to not just delete your account, but do two things.

  • Keep your account live so that it keeps your name “squatted on” in Facebook-space.

  • Take a credible action to lock yourself out (that is in compliance with the Facebook ToS, of course).

How about this? Get a “burner” SIM, make that the one phone number for your account, then let me hang on to the SIM for you. I’ll periodically post a list of everyone whose Facebook account is associated with a SIM I hold, but I won’t be able to log in. I’ll charge a monthly storage fee to keep the SIM for you, but it only comes due when you reclaim it.

27 Nov 2018

Should Internet platforms disclose to advertisers when their ads sponsor illegal activity?

Why do advertisers keep sponsoring illegal activity on big Internet platforms such as YouTube and Facebook? Platforms are running so many copyright-infringing copies, cryptocurrency scams, state-sponsored campaign finance violations, and even functioning as the IT department for genocide (d00d wtf?) that it’s hard to understand why so many good brands are still there.

A big part of the problem is that even though platforms do invest a lot of time and money in removing illegal activity, the advertisers never know. If you’re a CMO making decisions about where to spend your ad budget, your experience of a highly customized social platform is completely different from what most of your brand’s customers see. As a CMO, you see content from people in your social and professional circles, and ads from high-bidding advertisers who want to sell high-margin items such as conferences and SaaS subscriptions. You don’t see as much of the bad stuff. Advertisers pay the bills for illegal activity because they lack the information they would need to stop doing so.

It’s time for Internet platforms to stop hiding this information.

When a platform blocks or restricts distribution of content, require disclosure to the advertisers affected. (link goes to my notes for an upcoming meeting about this. There’s a GitHub link on the page, suggestions welcome.)

More: there ought to be a law

06 Nov 2018

Site engagement, consent management, bargains

One common growth hacking pattern on social and collaboration sites is to build user profiles incrementally. Capture just enough info to get the user logged back in, then get them started using the site. As they get into it, prompt them to fill in more and more profile information. You have probably seen this on new sites where you have to make an account. FIXME: list of good examples here.

People don’t want to give up a bunch of information up front before they see how good the site is. And, I suppose, if the site is good enough that the person thinks they’ll spend more time on it, they’re more likely to provide correct information than all the residents of “asdf” born on January 1, 1970.

But news sites don’t take this approach. Instead of trading a little value for a little information, repeatedly, you get one big dialog asking you to give up all your information before you even read the first story.

Does the same incremental approach that applies to data collection for social and collaboration sites also apply to news sites? Preliminary results from Global Consent Manager tend to indicate that yes, it does.

Bargains?

So here’s the bargain. Right now, the web ad business is set up to bid on ad impressions that come with third-party data, way more than for impressions without third-party data. So a trackable bot impression on a fraud site can produce more ad revenue for the fraud operator than an impression from a privacy-sensitive user running Firefox Nightly or Apple Safari produces for a legit site.

Yes, even though the privacy-sensitive user is more likely to be human and interested in buying something related to the topic of the site.

The opportunity to get a bargain is: instead of relying on conventional programmatic ad buying, if you do a little extra work to understand the audience of specific sites, you can reach more of the humans you’re interested in.

Not every Firefox user who shows up on the Road and Track site is going to buy a car this year, but $1 worth of ad impressions there is likely to reach more human car buyers than $1 spent programmatically, because you get a higher fraction of humans for a lower price. (And third-party data on who’s a likely car buyer is bogus anyway, but that’s another story.)
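
A back-of-the-envelope version of that comparison. The CPM prices and human fractions below are made-up placeholders, not measurements, and the function name is only for illustration.

```python
def humans_reached_per_dollar(cpm_dollars, human_fraction):
    """Human impressions bought with $1 at a given CPM (price per 1000 impressions)."""
    impressions_per_dollar = 1000 / cpm_dollars
    return impressions_per_dollar * human_fraction


# Hypothetical: programmatic buy with third-party data, pricier and more bot-heavy.
programmatic = humans_reached_per_dollar(cpm_dollars=5.0, human_fraction=0.6)

# Hypothetical: direct buy on a known site with a privacy-protected audience.
direct_site = humans_reached_per_dollar(cpm_dollars=4.0, human_fraction=0.95)

better_deal = "direct" if direct_site > programmatic else "programmatic"
```

With these placeholder inputs the direct buy reaches more humans per dollar. The point is only that both the price and the human fraction feed into the comparison.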

This opportunity is likely to go away as more agencies figure it out, but right now it’s a great chance to get humans cheaper than bots.

24 Oct 2018

a common objection to privacy improvements

One common objection to the new third-party cookie behavior in Firefox is something like,

If you block the tracking cookies that advertisers use to decide which ads to target you with, you’ll start getting the low-budget, low-quality ads that show up in the absence of the targeting data that marks you as a desirable customer.

Before I turned on Enhanced Tracking Protection I was getting ads for stuff like cloud computing services and luxury SUVs. Now, with Enhanced Tracking Protection, am I going to get more ads for FREE nutritional supplements? You know, the offers where you put in your credit card info for shipping and then they keep billing you even after you try to cancel? Or maybe I’ll get offered a great deal on a for-profit college program, or some predatory finance! I can’t wait.

It might be an inconvenience for me to start getting the ads that people get when they’re too broke, or just too old, for high-bidding advertisers to care about reaching them. But the real problem is that legit sites are running those ads in the first place.

23 Oct 2018

a clean ad network for independent sites?

Project Wonderful is no more. This was the ad network that got a lot of the important parts of web advertising right.

  • signal-carrying model: all visitors to the same site on the same day see the same ad(s).

  • brand safety: advertisers choose sites, and site owners approve ads.

  • fraud resistance: ads sell by the day instead of by impression or click.

  • incentive to discover and support new sites: the first advertiser to express interest in a site gets to run their ad for free until another advertiser places a bid.

But there were still some problems.

  • Project Wonderful was just as vulnerable to ad blocking as regular adtech.

  • The audiences of sites using Project Wonderful were just as vulnerable to tracking as everyone else.

The second one is especially important. Why spend the effort to pick, and run ads on, multiple independent sites in order to get your ad in front of the right people, when you could just sign up for some user tracking scheme? The people who control marketing budgets need a problem, a trend, and a story in order to shift money from one place to another.

What would it take to borrow and build on the good parts of the Project Wonderful model while taking steps to fix the problem of data leakage?

  • Avoid privacy-focused ad blockers by accepting the EFF DNT policy. Third parties that can pass EFF’s Privacy Badger also tend to stay off other blocklists.

  • Offer unlimited CNAMEs, also to help beat list-based blockers.

  • Don’t participate in paid whitelisting as a network, but individual sites that choose to do so could, for their own specific CNAME.

  • Good metrics on tracking protection adoption by the audience. Show advertisers that these users are hard to reach another way.

  • Include reverse tracking walls, tracking detection roadblocks, and A/B test alternate “turn off your ad blocker” messages to motivate users to get protected from cross-site tracking.

  • Limited, user-permitted data collection with clean consent management.

The hard part for an independent ad network is to offer small advertisers something they can’t get from Google or Facebook. Access to a protected audience?

13 Oct 2018

Zero-click bookstore?

Random idea for a way to make the local bookstore easier to use than the big one-click Internet bookstore.

I walk by the local bookstore all the time but I don’t always have the list of books I want to read with me.

So what about this?

  1. I keep a list of books I’m reading, or want to read, on GitHub.

  2. When I find out about a book I want to read, I add it to the list, make a GitHub issue, and assign the issue to someone at the local bookstore.

  3. The local bookstore gets the book and changes the status of the issue.

  4. I go pick up the book when I’m walking by the bookstore anyway.
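
Step 2 could be scripted. The sketch below only builds the request for GitHub's REST endpoint for creating an issue (POST /repos/{owner}/{repo}/issues). The repository name, the bookstore's username, and the token are placeholders, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request


def build_book_issue_request(owner, repo, book_title, assignee, token):
    """Build (but do not send) a request that opens an issue for one book."""
    payload = {
        "title": f"Want to read: {book_title}",
        "assignees": [assignee],  # someone at the local bookstore
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# All values here are hypothetical.
req = build_book_issue_request(
    owner="example-reader",
    repo="reading-list",
    book_title="Some Book Title",
    assignee="example-bookstore",
    token="PLACEHOLDER_TOKEN",
)
```

GitHub restricts who can be assigned to an issue, so the bookstore's account would probably need to be a collaborator on the reading-list repository.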

13 Oct 2018

measuring happiness

Another one of those employee happiness reports is out. This kind of thing always makes me wonder: what are these numbers really measuring?

It seems like happiness ratings by employees would depend on:

  • expected cost of retaliation for low scores

  • expected benefit of management response to low scores

The expected cost of retaliation is the probability that an employee’s ratings will be exposed to management, multiplied by the negative impact that the employee will suffer in the event of disclosure. An employee who believes that the survey’s security has problems, that management will retaliate severely in the event of disclosure, or both, is likely to assign high scores to management.

Some employers make changes in compensation or working conditions when they score poorly on happiness (or employee engagement) surveys. If an employee believes that management is likely to make changes, then the employee is likely to assign low scores in areas where improvement would have the greatest impact on them.

An evil company where management makes an effort to de-anonymize the happiness survey results, retaliates against employees who give low scores, and will not make changes to improve scores, will appear to have high employee happiness.

A good company where management does not retaliate, and will make changes in response to low scores, will appear to have low employee happiness.
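
To make the argument concrete, here is a toy model. The probabilities and magnitudes are invented, and the decision rule (report a low score only when the expected benefit exceeds the expected cost of retaliation) is a simplification of the reasoning above, not a measured result.

```python
def expected_retaliation_cost(p_exposed, harm_if_exposed):
    """Probability that ratings are exposed times the harm if they are."""
    return p_exposed * harm_if_exposed


def expected_benefit(p_management_acts, value_of_change):
    """Probability that management responds times the value of the change."""
    return p_management_acts * value_of_change


def reports_low_score(p_exposed, harm_if_exposed, p_management_acts, value_of_change):
    """Toy rule: an unhappy employee says so only if it is worth the risk."""
    cost = expected_retaliation_cost(p_exposed, harm_if_exposed)
    benefit = expected_benefit(p_management_acts, value_of_change)
    return benefit > cost


# Hypothetical evil company: leaky survey, harsh retaliation, no changes.
evil_company = reports_low_score(
    p_exposed=0.5, harm_if_exposed=100, p_management_acts=0.0, value_of_change=50
)

# Hypothetical good company: no retaliation, responsive management.
good_company = reports_low_score(
    p_exposed=0.0, harm_if_exposed=100, p_management_acts=0.6, value_of_change=50
)
```

In this toy setup the same unhappy employee stays quiet at the evil company and reports a low score at the good one, which is the inversion described above.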

Of course, this all changes the more that people figure out that getting low happiness scores means that you have responsive management.

29 Sep 2018

Notes on “turn off your ad blocker” messages

At least three kinds of software can be detected as “an ad blocker” in JavaScript.

full-service blockers, the best known of which is uBlock Origin. These tools block both invisible trackers and obvious ads, with no paid whitelisting program.

privacy tools, such as Disconnect (list-based protection) and Privacy Badger (behavior-based protection), that block some ads as a side effect. This is a small category now compared to ad blocking in general, but is likely to grow as browsers get better at privacy protection, and try new performance features to improve user experience.

deceptive blockers, which are either actual malware or operate a paid whitelisting scheme. The best-known paid whitelisting scheme is Acceptable Ads from Adblock Plus, which is disclosed to any user who is willing to scroll down and click on the gray-on-white text on the Adblock Plus site, but not anywhere along the way of the default extension install process.

So any ad blocker detector is going to be hitting at least three different kinds of tools and possibly six different groups of users.

  • People who chose and installed a full-service blocker

  • People who chose to protect their privacy but did not specifically choose to block ads

  • People who may have chosen their browser for its general privacy policies, but got upgraded to a specific feature they’re not aware of

  • People who chose to block ads but got a blocker with paid whitelisting by mistake

  • People who chose to “install an ad blocker” because it got recommended to them as the magic tool that fixes everything wrong with the Internet

  • People who are deliberately participating in paid whitelisting. (Do these people exist?)

Sometimes you need to match the message to the audience. Because sites can use tools such as Aloodo to get a better picture of what kind of protection, or non-protection, is actually in play in a given session, we can try a variety of approaches.

  • Is silent reinsertion appropriate when the ad is delivered in a way that respects the user’s personal information, and the user has only chosen a privacy tool but not an ad blocker?

  • When the user is participating in paid whitelisting, can a trustworthy site do better with an appeal based on disclosing the deception involved?

  • For which categories of users are the conventional, reciprocity-based appeals appropriate?

  • Where is it appropriate to take no action in a user session, but to report to a browser developer that a privacy feature is breaking some legit data collection or advertising?

22 Sep 2018

browser.fastblock.timeout and browser.fastblock.limit

(update 24 Sep 2018: add link to a FastBlock description on Bugzilla)

What’s the difference between Firefox browser.fastblock.timeout and browser.fastblock.limit?

  • browser.fastblock.timeout: When FastBlock starts working (in milliseconds, default 5000 = 5s)

  • browser.fastblock.limit: When FastBlock stops working (also in ms, default 20000 = 20s).

FastBlock only works if browser.contentblocking.enabled and browser.fastblock.enabled are both true.

Starting browser.fastblock.timeout ms after the page starts loading, FastBlock will stop loading new third-party resources that are on the Tracking Protection list.

  • FastBlock does not affect third parties that are not on the Tracking Protection list.

  • FastBlock does not cancel third-party requests that are already loading.

Then, starting at browser.fastblock.limit ms after the page starts loading, FastBlock stops having any effect.
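
As a rough sketch of the timing rules as described in this post (not Firefox's actual implementation), the decision for a single request could be modeled like this. The function and argument names are made up for illustration; only the two default values come from the prefs above.

```python
# Illustrative model of the FastBlock timing window; not Firefox code.

FASTBLOCK_TIMEOUT_MS = 5000   # default browser.fastblock.timeout
FASTBLOCK_LIMIT_MS = 20000    # default browser.fastblock.limit


def fastblock_would_block(ms_since_page_start, on_tracking_list, already_loading,
                          enabled=True,
                          timeout_ms=FASTBLOCK_TIMEOUT_MS,
                          limit_ms=FASTBLOCK_LIMIT_MS):
    """Return True if a new third-party request would be stopped."""
    if not enabled:
        # Stands in for browser.contentblocking.enabled and
        # browser.fastblock.enabled both being true.
        return False
    if not on_tracking_list:
        # Third parties off the Tracking Protection list are unaffected.
        return False
    if already_loading:
        # Requests that are already loading are not canceled.
        return False
    # Active only between the timeout and the limit.
    return timeout_ms <= ms_since_page_start < limit_ms
```

With the defaults, a listed tracker requested 6 seconds into the page load would be stopped, while the same request at 1 second or at 25 seconds would go through.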

More on Fastblock here

The sole focus of the Fastblock feature is to restrict the loading of trackers. It monitors trackers waiting for the first byte of data since the start of navigation of the current tab’s top level document. If this is not received within 5s, the request is canceled. If any bytes are received, the 5s timer is stopped. In some of the experimental branches, a few tracker requests are whitelisted, and do not have this monitoring. These include resources known to cause breakage, such essential audio/video, and commenting platforms.

Here’s the code.

Fastblock plus European mode

Just going by basic economics, ads placed with more information about me are going to carry less signal and more deception than ads placed only by what page they’re on. Now I’m wondering how well “slow loading ads” correlate with “deceptive ads”. Are slow loading ads slow because they depend on a bunch of complex RTB stuff? Can less creepy ads be faster?

By combining FastBlock with a browser extension to turn on Google’s documented but underrated European mode, I should be able to get a better class of web ads all around. I’ll give it a try and follow up with how it goes.

This post is a work in progress. I’ll update as needed to fix errors and update with current browser behavior.

20 Sep 2018

Fun with YouTube

(update: unmaintained. A better way to do this is cleaning up YouTube.)

In case you missed it: One viral thread shows how quickly YouTube steers people to wacko conspiracy theories and false information (GOOG, GOOGL)

Turns out that fast-moving, hungry misinformation operations are better at YouTube than YouTube is. This is not too much of a surprise. Resting and vesting makes you stupid. It’s like a resource curse for code. Sometimes I think I should start an imposter syndrome cure sanatorium. Main activity for patients will be watching the JavaScript console for all the errors and warnings on sites built by the so-called tech elite. (Look who’s still typing = sometimes instead of == or ===.) Anyway, what do you do when you want to send someone a link to a YouTube video, but you don’t want the engagement anti-features to kick in?

How about addressing the problem on the client side?

Here’s an experimental Firefox extension that will remove the recommended videos sidebar and keep you on the same video even if the pwned engagement algorithm tries to auto-play a different one. So if I send a family member a link, I can have fewer worries that they’ll end up in a rat hole.

Bug reports and pull requests welcome. (yes, I know that I should be using a MutationObserver instead of a timer. At some point I’ll try to figure that out.)

09 Sep 2018

Look who’s defending users from surveillance marketing

(updated 14 Jan 2019: added two more examples.)

What’s the best defense against surveillance marketing? In some cases, another surveillance marketer. Just like hackers lock up a vulnerable system after they break in to protect against other hackers, surveillance marketers who know what they’re doing are helping to protect users from other companies’ data collection practices.

Amazon: Retailers include different degrees of data in email receipts. Amazon only emails consumers links to their full receipts, limiting the information an email provider can extract. Oath gets to know shoppers through their Yahoo emails | Digital - Ad Age

Google: Google’s recent changes with Ads Data Hub keeps data locked within Google Cloud and cannot be combined outside of Google’s controlled environment. As a result, data lakes for marketing are under threat by recent changes by Google. How does Google’s Ads Data Hub Affect My Analytics? (Part III of the Ads Data Hub Series) - Thunder Experience Cloud

Google again: Google Demanded That T-Mobile, Sprint Not Sell Google Fi Customers’ Location Data - Motherboard. (If you want to target Google’s users, better pay Google.)

Facebook: Late last week Facebook announced it would eliminate all third-party data brokers from its platform. It framed this announcement as a response to the slow motion train wreck that is the Cambridge Analytica story. Just as it painted Cambridge as a “bad actor” for compromising its users’ data, Facebook has now vilified hundreds of companies who have provided it fuel for its core business model, a model that remains at the center of its current travails. Newco Shift | Facebook: Tear Down This Wall. (And Facebook even runs a Tor hidden service.)

Real surveillance marketers play defense.

But in most cases, publishers don’t. And that’s Why Local Newspaper Websites Are So Terrible. What happens when news sites can play some defense of their own?

I don’t know, but IMHO it will be an improvement for everybody. And the good news is that browser privacy improvements are finally making it possible.

Let’s discuss. WORKSHOP: User Data and Privacy — Building market power for trustworthy publishers, Sept. 26, Chicago | Information Trust Exchange Governing Association

07 Sep 2018

another omission

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

Johnny Ryan writes, in ePrivacy: Over-regulation or opportunity?

[A]n ad tech lobby group called ‘IAB Europe’ published a new research study that claimed to demonstrate that the behavioural ad tech companies it represents are an essential lifeline for Europe’s beleaguered publishers….the report claimed that behavioural advertising technology produces a whopping €10.6 billion in revenue for Europe’s publishers.

Surely, the ad tech lobby argued, Parliament would permit websites to use “cookie walls” that force users to consent to behavioural ad tech tracking and profiling their activity across the Web. The logic is that websites need to do this because it is the only way for publishers to stay in business.

We now know that a startling omission is at the heart of this report. Without any indication that it was doing so, the report combined Google and Facebook’s massive revenue from behavioural ad tech with the far smaller amount that Europe’s publishers receive from it.

The IAB omitted any indication that the €10.6 billion figure for “publishers” revenue included Google and Facebook’s massive share too.

That’s not the only startling omission. The most often ignored player in the ePrivacy debate is adtech’s old frenemy, the racket that’s the number two source of revenue for international organized crime and the number three player in targeted behavioral advertising—adfraud.

And ePrivacy, like browser privacy improvements, is like an inconveniently placed motion detector that threatens to expose fraud gangs and fraud-heavy adtech firms.

The same tracking technologies that enable the behavioral targeting that IAB defends are the tracking technologies that make adfraud bots possible. Bots work by visiting legit sites, getting profiled as a high-value user, and then getting tracked while they generate valuable ad impressions for fraud sites. Adfraud works so well today because most browsers support the same kind of site-to-site tracking behavior that a fraudbot relies on.

Unfortunately for those who perpetrate fraud, or just tolerate it and profit from it, browser privacy improvements are making fraud easier to spot. Changes in browsers intended to better implement users’ privacy preferences (as Ehsan Akhgari explains in On leveling the playing field and online tracking) have the helpful side effect of making a human-operated browser behave more and more differently from a fraudbot.

And regulations that make it easier for users to protect themselves from being followed from one site to another are another source of anti-fraud power. If bots need to opt in to tracking in order for fraud to work, and most users, when given a clear and fair choice, don’t, then that’s one more data point that makes it harder for adfraud to hide.

Publishers pay for adfraud. That’s because adfraud is no big secret, and it’s priced into the market. Even legit publishers are forced to accept a fraud-adjusted price for human ad impressions. I’m sure that not every adtech firm that opposes ePrivacy or browser privacy improvements is deliberately tolerating fraud, but opposing privacy regulations and privacy technologies is the position that tends to conceal and protect fraud. That’s the other omission here.

27 Aug 2018

Dialog with Jo Ellen Green Kaiser on user data management platforms

updated 27 Aug 2018: copy edits for clarity, add introduction.

EDITOR’S INTRODUCTION – What follows is an edited Q-and-A exchange between Jo Ellen Green Kaiser, board chair of the Information Trust Exchange Governing Association, and Don Marti. The exchange is focused on explaining why a voluntary user data privacy policy adopted by quality publishers is a valuable companion to upcoming changes in web-browser software. The browser changes will tend to reduce the ability of publishers to manage advertising across hundreds of independent ad networks which don’t coordinate privacy policies.

Jo Ellen: In your blog piece you point out that the news system has to work with user privacy principles. Most of the conversation is about putting into place a set of systems based on opt-in tracking but it is not clear how the principles will impact the opt-in tracking and consent management. I’d like to hear more about that.

Don: The incentive from the browser side is clear for independent browser businesses that don’t have a surveillance marketing business attached. What is it that a big incumbent browser will have trouble doing but that users clearly want?

Extensive user research indicates that users prefer a browser that will protect them from having their activity in one context follow them over to another context, and they also want a clear and non-confusing user experience. So this sets up an opportunity for browsers. They can compete over who can best manage user data in order to meet people’s norms and preferences on how that data is used.

Browser management decisions being made day to day are based on how to acquire users, and keep users once they are already running a browser. So what are the side effects of this new browser competitive area? Why are publishers going to need to be concerned about it, and where can they get some sustainable advantage from it? And the answer is that when user data gets managed in accordance with users’ norms and preferences, then sites that are trusted by users to use their data have an advantage over untrusted sites. And the biggest place this will show up immediately is in ad fraud, because the way that fraud bots work is they leak user data from high-value sites to fraud sites. They do exactly what the mainstream browsers do today in facilitating tracking the user from high value sites to low value sites.

Can the platform that connects permissioned data function now without anything more than GDPR or do you see the need for more detailed privacy protections?

There is a need for comprehensive privacy policies across sites because it is prohibitively expensive for small news organizations to keep up with all the details of all the privacy tools and requirements across every possible tech platform and jurisdiction. One major US publishing company was unable to do GDPR compliance for their sites so they ended up blocking a whole bunch of US news sites for European visitors.

When I see a site that isn’t able to comply with GDPR, I see a site that is getting its clock cleaned by data leakage. Every single person using that site is getting their data leaked out to other places so they can get reached without the original publisher getting any benefit from it.

If you can’t even do GDPR as a big publishing company how are you going to be able to do California, Europe, and India as a small independent web site, or do clean user-data collection across Firefox, Safari, and other browsers out there?

This is good. We are talking about creating trusted news sites based on the way they work with user data.

The ways users indicate trust with a site are potentially all over the place. They might say they trust their local public radio station by pledging and getting a coffee mug. They might indicate they trust their local news site by filling in a traffic survey saying what neighborhood they live and work in. A user might indicate trust for a site by leaving a comment or a letter to the editor. Many different platforms all have a small view into user trust and all have an opportunity to capture some kind of consent for data use, but there’s no good way to integrate all those. And if you do it through a conventional surveillance marketing mechanism you may be doing it in a way that doesn’t even capture consent. User data without consent is not going to be sustainable on a regulatory or technical basis.

Your typical news site has 50-70 third-party domains showing up on it, and every one of them has a separate privacy policy, all written by different lawyers with the objective of staying out of trouble while giving you the least privacy possible. So if you are a publisher running some skeevy tracker on your site without the right consent, future browsers are going to look at that and say there is no way this user has given consent to this firm from a dark corner of the Lumascape, I’m not going to reveal any user data to that firm.

So what you end up with is news sites with reputable content not having the right consent bits set in order to be able to prove that they have a valuable audience. We saw this with GDPR and unconsented impressions coming into real-time bidding platforms. Some of those impressions are coming in without the right consent bits set, which means they aren’t going to get bids from some advertisers. Even users who trust the site are not producing ad impression value for the publisher they trust, and that’s a big problem. That’s the first thing that publishers are going to be concerned about with browser privacy improvements. Without all the non-permissioned data we are used to seeing attached to the impressions, those are not going to have much value. Publishers are going to be selling remnant impressions on a quality site because they don’t have the data.

Let’s imagine we have a way to collect opt-in data from a variety of different news sites, and also the merchants and apps that supply those news sites with services. It provides uniform opt-in rules to gather that data and then is able to serve those opt-in users with different types of content. Sort of an opted-in Taboola. If that kind of platform were created would there still be a need for privacy policies as well, or would the consent management system replace that need for the privacy policies?

Consent strings in Apple Safari are managed like any other tracking state would be. So the platform has to be aware of the policies and limitations of all the privacy tools that feed into a user data collection opportunity. Privacy Badger is a niche tool. They look for a specific third-party tracking policy. That is not as important for mainstream adoption directly but some of the list-based tools out there like Disconnect, which Firefox feeds off, can be informed by trackers detected by Privacy Badger.

A common policy has a real role because it lets you address incompatibilities one at a time instead of having a big n by m matrix of site privacy policies and privacy tool policies. It is kind of like open source licenses. If you go to build a project and want to keep your licenses compatible, it is way easier if you have a single software license across that ecosystem or at least a set of compatible licenses.

That is super helpful.

This platform needs to come into existence in an incremental way. Many local sites are signed up with Google and use Google Tag Manager for their ad serving. Google has a lot of the needed functionality built out for their European customers, so the process of moving from unpermitted user data sharing to permission-based user data sharing can be done incrementally if you work it the right way.

Sites can use the Google tools according to their design, taking features that have been developed for compliance in Europe and applying those features to another need, like an off-label use for GDPR compliance features. It’s like discovering you can cure some ulcers by taking a specific dose of antibiotics. This is a big opportunity for Google as well.

There is a need for a comprehensive policy because it is too complicated to do it across all the platforms, and even if there is a private label way to create some kind of opt in, how do you rely upon consent management? Like an open source license, you need a privacy policy that gives you a common language that allows you to cross all these different jurisdictions, tools and browsers.

Yes, and when this common policy is out there and able to be part of a discussion with tool and browser developers, that policy will inform the future decisions made by those developers. People will say I don’t really want my tool to block permitted data sharing with trusted sites, how do I make my tool better reflect what the users are doing?

For more info

Multi-stakeholder convening process to develop consumer-friendly privacy policies and standards, organized by the Local Media Consortium, the Internet Society, and the Information Trust Exchange.

25 Aug 2018

the missing user data platform

(update 20 Nov 2018: copy edit, add a link to Dr. Johnny Ryan’s CNIL article)

Today’s web advertising relies on 1990s browser behavior—most browsers fail to protect users from being tracked from site to site, and advertisers are used to taking advantage of that old defect. But because browsers do user research and respond to what users want, that’s changing. Browsers are making it harder to track users from site to site without their permission. Along with privacy regulations, this change is creating an opportunity for new, “post-creepy” web advertising that:

The big opportunity is in enabling publishers to reclaim control over their own audience data, not in establishing a new choke point such as a cryptocurrency or paid whitelisting program. (If publishers wanted to give up control to a tech firm, they could do that already.) Most of the development that is needed here can be provided by third parties that publishers are already using, because third parties are coming into compliance with privacy regulations. For example, Google Tag Manager already has the required functionality in order to comply with the European GDPR.

The missing piece is a way for sites to collect enough user data to show advertisers that the site is trusted by human users, in order to make the ads on that site saleable.

In the new environment, user data alone is insufficient—data must be accompanied by the consent required to use it. And that can’t be just “click to make this dialog go away and consent to adtech as usual”. Both regulators and browser developers are going to require real consent. So the web advertising system needs to evolve away from dependence on large quantities of un-permissioned data towards the ability to use less data accompanied by permission. (Post-creepy web ads won’t be able to swim in abundant unpermissioned data with the nutria of the Lumascape. Consent is scarcer than raw data, and only data accompanied by consent is safe to use. Publishers will have to collect and conserve every drop of data, like muad’dib, the desert mouse of Arrakis.) Possible sources include:

  • Subscription and micropayment systems

  • Federated paywalls

  • Comments and surveys

  • Email newsletters

  • Events

  • Miscellaneous e-commerce (tote bags, mugs, clothing…)

  • Transparency and Consent Framework consent bits

  • Differences in browser behavior between trusted and untrusted sites

and more.

Consent management is a tricky problem. IAB Europe is doing some work toward addressing it, with the open-source Transparency and Consent Framework. Although existing implementations are designed to nudge the user into not-transparent data practices, and are not yet getting real consent, this framework does provide a starting point on which to build consent management that both implements the user’s preferences accurately and provides a smooth user experience. (more info: Global Consent Manager. Global Consent Manager is a client-side component that you can try in Firefox now, that can interact with server-side data platforms.)

In principle, privacy regulation and browser privacy improvements have the potential to lower the return on investment on creepy tracking, and raise the return on investment on building reputation and getting consent. But publishers, who have the reputation to get users to agree that they have the right to use data, don’t have the development budgets or time to build the tools for data gathering.

User data and opportunities to get consent are everywhere, in CMSs, other software, and in third-party services. The missing piece is a platform that will collect data, with permission, from all the above sources and

  • run either on the publisher’s own infrastructure or as a third-party service so that small publishers don’t need to touch the CMS or deploy and manage a new service

  • comply with current and future data protection regulations

  • work with and anticipate privacy improvements in browsers

  • provide reports and APIs in a usable format for advertisers and agencies

Many of today’s ad agencies, even sympathetic ones, won’t come to the new system by choice, because it won’t allow for tracking desirable audiences to cheap sites. We can assume that advertisers and agencies will ignore the new system until they see that it’s a way to reach a significant audience that they can’t reach in other ways, today, and the mainstream tracking-protected web audience in the near future.

23 Aug 2018

See you at the Voice of Blockchain conference?

I’ll be at the Voice of Blockchain conference in Chicago on Friday and Saturday. Two panels: “Journalism: Incentivizing the Truth” on Friday, and “Crowdsourcing, Bounties, and Democratizing Access to Jobs” on Saturday.

So what does blockchain have to do with incentivizing journalism?

One important reason that we have standards of fairness and accuracy in news is that news organizations sell advertising to mainstream brands. Brands that want to be able to sell to everyone, not just one side of a political or social issue. High-reputation news sites don’t respond individually to the demands of advertisers, but the principles on which high-reputation news sites operate have developed in parallel with the needs of brand safety.

On today’s web, reputation-based advertising is not so much of a thing. Adtech firms place ads from legit brands on brand-unsafe sites, usually without anyone at the brand knowing about it. Faris Yakob points out, By squeezing fees and margin procurement put incredible pressure on agency principals, who have obligations to hit certain targets from the holding companies. Rock meet hard place. Thus new sources of revenue were found, in media rebates, or opacity, or programmatic trading desks, or production fixing - all conflicts of interest that can be leveraged to try to appease both masters…for a time.

When agencies try to get ad impressions in front of the desired audience at a bargain price, a lot of ad money ends up with fraudulent or brand-unsafe sites. Even legit sites end up running 50 to 70 tracking scripts because they lack the market power to protect their audience from being tracked to cheaper sites.

Incentivizing journalism depends on helping users protect their personal information from being tracked from one site to another. As users get the tools to control who they share their information with (and they don’t want to leak it to everyone) then the web advertising business has to transform into a reputation contest. Whoever can build the most trustworthy place for users to choose to share their information wins.

Blockchains are slow and expensive compared to databases or conventional payment systems, but cheap compared to trust networks. As browsers take a more active role in protecting users from third-party tracking, reputable news sites will need a new technical infrastructure for Internet advertising that accurately reflects the trust relationships between brands, agencies, sites, and users.

What about “Crowdsourcing, Bounties, and Democratizing Access to Jobs”? This is a fun area. Learn market design is the new learn to code.

Developers would prefer to release open source software at a high quality level and get paid for it. Many users would prefer to use software at a higher quality level if they could pay for it. The current software market, though, incentivizes companies to release at a low quality level, in order to get early adoption and build network effects. One approach is to build a new kind of market, one that allows users to hedge their software quality risks while enabling developers to trade on the likelihood of bug fixes. More info: Rao et al.

IMHO, open source bounties still have problems with incentivizing partial work and meta work, so there are lots of opportunities to build better markets here. (Ever notice that there are more companies offering solutions to open source license risks than solutions to developer burnout risks?) Anyway, go read Sneha Sinha’s piece on paid internships.

11 Aug 2018

Questions for agency and publisher workshops

The web advertising game is changing from a hacking contest to a reputation contest. It would have had to happen anyway, but the shift is happening quickly right now because of two trends.

  • Privacy regulation (starting with the European Union, California and India). Some regulations will have impact outside their own jurisdictions when companies choose not to write and enforce separate second-class privacy policies for users not covered by those regulations.

  • New “browser wars” over which browser can best implement widely-held user norms on sharing their personal information. (Web browsers are good at showing you a web page that looks the same as it does on the other web browsers. Why switch browsers? For many users, because one browser does better at implementing your preferences on personal data sharing.)

Right now the web is terrible as a tool for brand building. But the web doesn’t have to get better at signaling, or less fraudulent, than print or broadcast. In a lot of places the web just has to be better than Android. Fixing web advertising is not one big coordination problem. People who are interested in web advertising, from the publisher and ad agency point of view, have a lot of opportunities for innovative and remunerative projects.

  • Browser privacy improvements, starting with Apple Safari’s Intelligent Tracking Prevention, are half of a powerful anti-fraud system. The better that the browser protects the user’s information from leaking from one site to another, the less it looks like a fraudbot. How can publishers and brands build the other half, to shift ad budgets away from fraud?

  • “Conscious choosers” are an increasingly well-understood user segment, thanks to ongoing user research. For some brands and publishers, the best strategy may be to continue to pursue “personalization pioneers”, the approximately one-third of users who don’t object to having their information collected for ad targeting. Other brands have more appeal to mainstream, vaguely creeped out, users, or to users who more actively defend their personal info. How can “conscious chooser” research inform brands?

  • Regulation and browser privacy improvements are making contextual targeting more important. Where are the opportunities to reach human audiences in the right context? Where does conventional programmatic advertising miss out on high-context, signalful ad placements because of gaps in data?

  • As sharing of user data without permission becomes less common, new platforms are emerging to enable users to share information about themselves by choice. For example, a user who comments on a local news site about traffic may choose to share their neighborhood and the mode of transportation that they take to work. User data sharing platforms are in the early stages, and agencies have an opportunity to understand where publishers and browsers are going. (Hint: it’ll be harder to get big-budget eyeballs on low-value or fraudulent sites.) Which brands can benefit from user-permissioned data sharing?

  • (Complementary to data sharing issues) Consent management is still an unsolved problem. While the Transparency and Consent Framework provides a useful foundation to build on, today’s consent forms are too annoying for users and also make it difficult and time-consuming to do anything except select a single all-or-nothing choice. This doesn’t accurately reflect the user’s data sharing choices. The first generation of consent management is getting replaced with a better front end that not only sends a more accurate consent decision, but also takes less time and attention and is less vulnerable to consent string fraud. How will accurate and convenient consent management give advantages to sites and brands that users trust?

Workshops are in progress on all this stuff. Clearly it’s not all just coming from the browser side—forward-thinking people at ad agencies and publishers are coming up with most of it.

10 Aug 2018

ICYMI

Inner procrastinator: HEY LET’S FIND SOME K3WL ARTICLES TO READ ON THE INTERNET

Sense of duty: No, must update project status. (Ctrl-T to open new tab)

Web browser: HEY WEREN’T YOU LISTENING TO INNER PROCRASTINATOR JUST NOW? HERE IS SOME RECOMMENDED CONTENT

Me: Preferences → Home → Firefox Home Content. Uncheck everything except “Web Search” and “Bookmarks”.

Anyway, happy Friday. Since you’re already reading blogs, you might as well read something good, so here is some stuff that the RSS reader dragged in. (My linklog is no longer getting posted to Facebook because reasons, so if you were clicking on links from me there you will have to figure something else out. The raw linklog is: feed. Ideas?)

The Segway patent expires next June. If you thought the scooters of San Francisco were annoying this year, just wait for the summer of generic-Segway-on-demand startups.

xkcd: Voting Software

Why open source failed

The Google Funded Astroturf Group that Hacked The EU Copyright Vote (In Pictures)

Juul & its House of Smoke & Horrors

Parking Has Eaten American Cities

Selling a Good Time: Inside the Wild, Wacky World of Minor League Baseball Marketing

How to Stop Your Smart TV From Tracking What You Watch

How to Pull Off a Professional Video Call From Home

Architects Ask: Where Are the Spaces for Teen Girls?

Open Offices Make You Less Open

The AudioKit Synth One is a pro-level iPad synth that’s completely free

The advantages of an email-driven git workflow

Here’s One Union That Can’t Be Touched by ‘Right to Work’ Laws

The Innovation Stack: How to make innovation programs deliver more than coffee cups

I Delivered Packages for Amazon and It Was a Nightmare

New US Tariffs are Anti-Maker and Will Encourage Offshoring

How to spot a perfect fake: the world’s top art forgery detective

Help The Stranger and ProPublica Track Online Ads About the Seattle Head Tax, the Midterms, and More

Should Bankers Be Forced to Put Some Skin in the Game?

San Francisco is losing more residents than any other city in the US, creating a shortage of U-Hauls that puts a rental at $2,000 just to move to Las Vegas

Containers, Security, and Echo Chambers

05 Aug 2018

How many vendors are relying on legitimate interest for ad targeting?

ICYMI: Why the GDPR ‘legitimate interest’ provision will not save you by Johnny Ryan.

The “legitimate interest” provision in the GDPR will not save behavioral advertising and data brokers from the challenge of obtaining consent for personally identifiable data.

The obvious question is: how many of the vendors listed on the Global Vendor and CMP List are actually relying on LI for purposes of Ad selection, delivery, reporting? Worth writing a simple script to check. Looks like 151 of 409, or about 37%.

Purpose 3 is:

Ad selection, delivery, reporting: The collection of information, and combination with previously collected information, to select and deliver advertisements for you, and to measure the delivery and effectiveness of such advertisements. This includes using previously collected information about your interests to select ads, processing data about what advertisements were shown, how often they were shown, when and where they were shown, and whether you took any action related to the advertisement, including for example clicking an ad or making a purchase. This does not include personalisation, which is the collection and processing of information about your use of this service to subsequently personalise advertising and/or content for you in other contexts, such as websites or apps, over time.

And here’s the list of vendors with a “3” in their legIntPurposeIds:

151 of 409 listed vendors claim LI for purpose: Ad selection, delivery, reporting

This is as of version 90 of the list, last updated 2 August.
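A minimal sketch of the kind of script that could produce a count like the one above. It assumes the vendor list is JSON with a top-level "vendors" array whose entries carry "name" and "legIntPurposeIds" fields; the sample data and the function names are made up for illustration, and this is not necessarily the script that produced the 151 of 409 figure.

```python
import json

AD_SELECTION_PURPOSE = 3  # "Ad selection, delivery, reporting"


def legitimate_interest_vendors(vendor_list, purpose_id=AD_SELECTION_PURPOSE):
    """Return names of vendors that list purpose_id in legIntPurposeIds."""
    return [
        vendor["name"]
        for vendor in vendor_list["vendors"]
        if purpose_id in vendor.get("legIntPurposeIds", [])
    ]


def summarize(vendor_list, purpose_id=AD_SELECTION_PURPOSE):
    """Return (claiming, total, percent) for the given purpose."""
    total = len(vendor_list["vendors"])
    claiming = len(legitimate_interest_vendors(vendor_list, purpose_id))
    percent = round(100 * claiming / total) if total else 0
    return claiming, total, percent


# Tiny made-up sample in the assumed layout; not real vendor data.
SAMPLE = json.loads("""
{
  "vendorListVersion": 0,
  "vendors": [
    {"id": 1, "name": "Example Vendor A", "purposeIds": [1, 2], "legIntPurposeIds": [3, 5]},
    {"id": 2, "name": "Example Vendor B", "purposeIds": [1, 3], "legIntPurposeIds": []},
    {"id": 3, "name": "Example Vendor C", "purposeIds": [], "legIntPurposeIds": [3]}
  ]
}
""")

if __name__ == "__main__":
    claiming, total, percent = summarize(SAMPLE)
    print(f"{claiming} of {total} listed vendors claim LI for purpose "
          f"{AD_SELECTION_PURPOSE} (about {percent}%)")
```

To run it against the real list, load a downloaded copy of the vendor list with json.load() and pass the result to summarize() in place of SAMPLE.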

Will be interesting to see if the number claiming a legitimate interest here goes up or down as people learn more about the applicable regulations.

20 Jul 2018

Browser privacy improvements and anti-fraud

(Update 18 Aug 2018: Fix an error to be consistent with the source quoted.)

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

The good news is that interesting competition among web browsers is back, not just because of ongoing performance improvements in Firefox, but also because of Apple Safari’s good work on protecting users from some kinds of cross-site tracking by default. Now the challenge for other browsers is to learn from the Safari work and build on it, to even more accurately implement the user’s preferences on sharing their personal information. According to research by Tini Sevak at YouGov, 36% of users are more likely to engage with adverts that are tailored to them, while 55% are creeped out by personalized ads. The browser has to get its data sharing settings right for the individual user, while minimizing the manual settings and decision fatigue that the user has to go through.

A short-term problem for sites, though, is that the current price for highly tracked ad impressions facilitated by cross-site tracking is still way above the price of impressions delivered to users who choose to protect themselves. Tim Peterson, on Digiday, covers the natural experiment of GDPR consenters and non-consenters:

If an exchange or SSP declines to sign the agreement, it is limited to only selling non-personalized ads through DBM. Those generic ads generate less revenue for publishers than personalized ads that are targeted to specific audiences based on data collected about them. Some publishers that are heavily reliant on DBM have seen their revenues decline by 70-80 percent since GDPR took effect because they were limited to non-personalized ads, said another ad tech exec.

(‘It’s impossible’: Google has asked ad tech firms to guarantee broad GDPR consent, assume liability - Digiday)

In the medium to long term, better browser privacy settings will give an advantage to high-reputation sites for two reasons:

  • ads on high-value content have signaling value

  • users are more likely to share information with a site they trust

But in the short term, what can browsers do to help address the market dislocation from the user data crunch?

One possibility is to take advantage of an important side effect of browser privacy improvements: better anti-fraud data.

Today, unprotected browsers and fraudbots are hard to tell apart. Both maintain a single “cookie jar” across trusted and untrusted sites. For fraudbots, cross-site trackability is not a bug as it is for a human user’s browser—it’s a feature. A fraudbot can only produce valuable ad impressions on a fraud site if it is somehow trackable from a legit site.

As browser users start to upgrade to nightly releases that include more protection, though, a trustworthy site’s real users will start to look more and more different from fraudbots. Low-reputation and fraud sites claiming to offer the same audience will have a harder and harder time trying to sell impressions to agencies that can see it’s not the same people. This does require better integration with anti-fraud tools, so it’s something sites and anti-fraud vendors can do in parallel with the browser release process.

Can the anti-fraud advantages of browser privacy improvements completely swamp out the market effects of reducing cross-site trackability? Depends on how much adfraud there is. We don’t know.

12 Jul 2018

Web ad bargain?

Tim Peterson, on Digiday:

If an exchange or SSP declines to sign the agreement, it is limited to only selling non-personalized ads through DBM. Those generic ads generate less revenue for publishers than personalized ads that are targeted to specific audiences based on data collected about them. Some publishers that are heavily reliant on DBM have seen their revenues decline by 70-80 percent since GDPR took effect because they were limited to non-personalized ads, said another ad tech exec. That revenue drop has put pressure on exchanges and SSPs to sign Google’s consent agreement lest their publishers move their inventory to other platforms that can run DBM’s personalized ads on their sites, the second exec said.

(‘It’s impossible’: Google has asked ad tech firms to guarantee broad GDPR consent, assume liability - Digiday)

A lot of those “specific audiences” are, of course, adfraud bots. Fraud hackers are better at adtech than adtech firms are. So ads shown to bots, on shitty sites, are going for more than ads seen by humans on legit sites.

Meanwhile, tracking-resistant, personalization-averse readers are overrepresented in some customer categories. Web developers are a good example. (40% protected based on recent data from one popular site.)

Of course, today’s web ad system is based on tracking the best possible prospect to the cheapest possible site, so it won’t be easy to take advantage of this nice piece of market inefficiency. First step is figuring out how well protected the people you want to reach are.

More: blog.aloodo.org - Beware of averages: why you need a local tracking protection metric

10 Jul 2018

Bug futures: business models

Recent question about futures markets on software bugs: what’s the business model?

As far as I can tell, there are several available models, just as there are multiple kinds of companies that can participate in any securities or commodities market.

Cushing, Oklahoma

Oracle operator: Read bug tracker state, write futures contract state, profit. This business would take an agreed-upon share of any contract in exchange for acting as a referee. The market won’t work without the oracle operator, which is needed in order to assign the correct resolution to each contract, but it’s possible that a single market could trade contracts resolved by multiple oracles.

Actively managed fund: Invest in many bug futures in order to incentivize a high-level outcome, such as support for a particular use case, platform, or performance target.

Bot fund: An actively managed fund that trades automatically, using open source metrics and other metadata.

Analytics provider: Report to clients on the quality of software projects, and the market-predicted likelihood that the projects will meet the client’s maintenance and improvement requirements in the future.

Stake provider: A developer participant in a bug futures market must invest to acquire a position on the fixed side of a contract. The stake provider enables low-budget developers to profit from larger contracts, by lending or by investing alongside them.

Arbitrageur: Helps to re-focus development efforts by buying the fixed side of one contract and the unfixed side of another. For example, an arbitrageur might buy the fixed side of several user-facing contracts and the unfixed side of the contract on a deeper issue whose resolution will result in a fix for them.

Arbitrageurs could also connect bug futures to other kinds of markets, such as subscriptions, token systems, or bug bounties.
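As a rough illustration of the oracle operator role, here is a minimal sketch of "read bug tracker state, write futures contract state, profit." Everything in it is an assumption for illustration: the BugFuture class, the resolve function, the 2% fee, and the rule that only a "FIXED" status counts as fixed are all hypothetical, not taken from any real market or from the paper.

```python
from dataclasses import dataclass


@dataclass
class BugFuture:
    """Hypothetical contract on one bug: fixed side vs. unfixed side."""
    bug_id: str
    fixed_stake: float    # paid in by the side betting the bug gets fixed
    unfixed_stake: float  # paid in by the side betting it stays open


def resolve(contract, tracker_state, oracle_fee=0.02):
    """Resolve a matured contract from bug tracker state.

    tracker_state maps bug id -> status string. In this sketch only the
    status "FIXED" counts as fixed; a missing bug is treated as open.
    Returns (winning_side, payout_to_winner, fee_to_oracle).
    """
    pot = contract.fixed_stake + contract.unfixed_stake
    fee = pot * oracle_fee  # the oracle's agreed-upon share
    status = tracker_state.get(contract.bug_id, "OPEN")
    winner = "fixed" if status == "FIXED" else "unfixed"
    return winner, pot - fee, fee


if __name__ == "__main__":
    contract = BugFuture("bug-1", fixed_stake=20.0, unfixed_stake=80.0)
    print(resolve(contract, {"bug-1": "FIXED"}))
    print(resolve(contract, {"bug-1": "WONTFIX"}))
```

A market that trades contracts resolved by multiple oracles would need each contract to name its oracle, which this sketch leaves out.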

Paper from WEIS

A trading market to incentivize secure software: Malvika Rao, Georg Link, Don Marti, Andy Leak & Rich Bodo (PDF) (presented at WEIS 2018)

Creative Commons

Pipeline monument in Cushing, Oklahoma: photo by Roy Luck for Wikimedia Commons. This file is licensed under the Creative Commons Attribution 2.0 Generic license.

08 Jul 2018

take the YouTube advertisers bowling

(update 25 Sep 2021: also applies to other “safe harbor” laws such as Section 230 in the USA)

What if there is a better way forward on the whole Safe Harbor controversy and Article 13?

Companies don’t advertise on sites like YouTube, sites teeming with copyright infringers and nationalist extremists, because those companies are run by copyright infringers or nationalist extremists. Marketing decision-makers are incentivized to play a corrupt online advertising game that rewards them for supporting infringement and extremism.

So the trick here is to help people move marketing money out of bad things (negative externalities) and toward good things (positive externalities). We know that YouTube is a brand-unsafe shitshow because Google won’t advertise its own end-user-facing products and services there without a whole extra layer of brand safety protection.

Big Internet companies are set up to insulate decision-makers from the consequences of their own online asshattery, anyway. The way to affect those big Internet companies is through their advertisers. So how about a tweak to any laws that give a safe harbor to an Internet service? Keep the safe harbor for the service itself, but remove safe harbor protections for companies that advertise on it. This should help in several ways.

  • Give legit services some flexibility. If your web site’s business model is anything other than “get cheap eyeballs with other people’s creative work” or “get cheap eyeballs by recommending divisive bullshit” then you don’t have to change a thing.

  • Incentivize sites to pay for new creative work, by making works covered by an author or artist contract a more attractive place for paid advertising than “content” uploaded by random users.

  • Make it easier for marketers who want to do the right thing, by pointing out the risks of supporting bad people.

  • Move some of the risks of online advertising away from the public and toward the decision makers who can make a difference.

How about it?

More: there ought to be a law

03 Jul 2018

Nudgestock 2018 transcript

(This is a cleaned-up and lightly edited version of my talk from Nudgestock 2018.)

First I have to give everybody a disclaimer. This is 100% off message. I work for Mozilla. I am NOT speaking for Mozilla here.

If you follow Rory, you have probably heard a lot about signaling in advertising, so I’m going to go over this material pretty quickly. Why does Homo economicus read magazine advertising but hang up on cold calls? To put it another way, why is every car commercial the same? You could shoot the “car driving down the windy road” commercial with any car. All that the car commercial tells you is: if it was a waste of your time to test drive our car then it would have been a waste of our money to make this little movie about it.

There’s a whole literature of economics and math about signaling involving deceptive senders and honest senders. With this paper, Gardete and Bart show that when the sender wants to really get a message across, counter-intuitively the best thing for the sender to do is deprive themselves of some information about the receiver. If you’re in the audience and you know what the sender knows about you, then you can’t tell are they honestly expressing their intentions in the market, or are they just telling you what you want to hear? Anyone who used to read Computer Shopper magazine for the ads didn’t just read it for specific information about all the parts that you might put into your computer. You read it to find out which manufacturers are adopting which standards so you don’t buy a motherboard that won’t support the video card that you might want to upgrade to next year.

There are three sets of papers in the signaling literature. There are papers that have pure math where you devise kind of a mathematical game of buyers and sellers and see how that game works out. And there are papers where you take users in an experimental setting. Ambler and Hollier took 540 users, showed them different versions of expensive looking and cheap looking advertising that conveys the same information. Finally you’ve got the kind of research that looks at spending across different product categories, and in this study they found that the advertising to sales ratio for a type of product really depends on how much extra user experience it takes to evaluate that product.

The feedback loop here is that when brands have signaling power, then that means market power for the publishers that carry their advertising, which means advertising rates tend to go up, which means the publishers can afford to make obviously expensive content. And when you attach advertising to obviously expensive content, that means more signaling power. It’s kind of a loop that builds more and more value for the advertiser.

Some people compare this to the signaling that a bank does when they build this monstrous stone building to keep your money. Really, the stuff that a bank does, having a stone building doesn’t do any more for keeping money in it than having a metal building or a concrete building, but it just shows that they’ve got this big stone building with their name on it so if they turned out to be deceptive it would be more costly for them to do it. That’s the pure signaling model. But the other area that we can see when we compare this kind of classic signal-carrying advertising to online advertising, the kind of ads that are targeted to you based on who you are, is what’s up with the norms enforcers?

Rory has his blue checkmark on Twitter which means he doesn’t see Twitter ads. I’m less Internet Famous, so I still get the advertising on Twitter. A lot of the ads that I get are deceptive issue ads. This is one. A company that’s getting sued for lead paint related issues is trying to convince residents of California that government inspectors are coming to their houses to declare them a nuisance. This is bogus and it’s the kind of thing that if it appeared in the newspaper that everyone got to see then journalists and public interest lawyers, and everyone else who enforces the norms on how we communicate, would call it out. But in a targeted ad medium this kind of deceptive advertising can target me directly.

So let me show a little simulation here. What we’re looking at is deceptive sellers making a sale. When a deceptive seller makes a sale that’s a red line. When an honest seller makes a sale, that’s a green line. The little blue squares are norms enforcers, and the only thing that makes a norms enforcer different in this game from a regular customer is when a deceptive seller contacts a norms enforcer the deceptive seller pays a higher price than they would have made in profit from a sale. So with honest sellers and deceptive sellers evolving and competing in this primordial soup of customers, what ends up happening to the deceptive sellers that try to do a broad reach and hit a bunch of different customers is, well you saw them, they hit the norms enforcers, the blue squares lit up. Advertisers who are deceptive and try to reach a bunch of different people end up getting squeezed out in this version of the game. An honest advertiser like this little square down here can reach over the whole board because they don’t pay the penalty for reaching the norms enforcer.
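For readers who want to poke at the idea, here is a much simplified sketch of the payoff rule described above. It is not the simulation shown in the talk: there is no evolution or competition, the board size, enforcer positions, profit and penalty values are all made up, and the function names are hypothetical. It only shows why a broad-reach deceptive seller loses while a narrow one, or an honest one, does not.

```python
def contacts(position, reach, board_size):
    """Cells within `reach` squares of a seller on a square board."""
    x0, y0 = position
    return [
        (x, y)
        for x in range(max(0, x0 - reach), min(board_size, x0 + reach + 1))
        for y in range(max(0, y0 - reach), min(board_size, y0 + reach + 1))
    ]


def payoff(position, reach, board_size, enforcers, deceptive,
           profit=1.0, penalty=50.0):
    """Total payoff for one round of contacting everyone in reach.

    Every contacted customer yields `profit`. A deceptive seller who
    contacts a norms enforcer pays `penalty` instead; the penalty is
    assumed to be larger than the profit from a sale.
    """
    total = 0.0
    for cell in contacts(position, reach, board_size):
        if deceptive and cell in enforcers:
            total -= penalty
        else:
            total += profit
    return total


BOARD = 10
ENFORCERS = {(2, 7), (7, 2), (8, 8)}  # made-up "blue squares"

if __name__ == "__main__":
    print("deceptive, broad reach: ", payoff((5, 5), 9, BOARD, ENFORCERS, True))
    print("deceptive, narrow reach:", payoff((1, 1), 1, BOARD, ENFORCERS, True))
    print("honest, broad reach:    ", payoff((5, 5), 9, BOARD, ENFORCERS, False))
```

With these assumed numbers the broad-reach deceptive seller ends the round at a loss, the narrow-reach deceptive seller that happens to miss every enforcer comes out slightly ahead, and the honest seller collects from the whole board.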

So what does this really mean for the real web? On the World Wide Web, have we inadvertently built a game that gives an unfair advantage to deceptive sellers? If somebody can take advantage of all the user profiling information that’s available out there, and say, “oh I believe that these people are rural, low-income, unlikely to be finance journalists, therefore I’m going to hit them with the predatory finance ads,” does that cause users to pay less attention to the medium?

Online advertising effectiveness has declined since the launch of the first banner advertisement in 1994. That’s certainly not news. This is a slide that appeared in Mary Meeker’s famous Internet Trends presentation, and as you can see blue is percentage of ad spending, grey is percentage of people’s time. So TV is 36% of the time 36% of the money. Desktop web 18%, 20%, about right.

What’s going on with print? Print is 9% of the money for 4% of the time. Now you might say this is just inertia, that this year people are finally just cutting back on spending money in print because of people spending less time on print and it’ll eventually catch up. But I went back and plotted the same slide from the same presentation going back to 2011, and I’ve got time plotted across the bottom, money plotted on the y axis, and what do we see about print? Print is on a whole different trend line. Print is on a trend line of much more value to the advertiser per unit of time spent than these other ad media. My hypothesis is that targeting breaks signaling and this means an opportunity.
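The "value per unit of time" comparison can be worked out from the shares quoted above. This is just arithmetic on the numbers as read out in the talk; it assumes the desktop web figures are in the same order as the TV ones (18% of time, 20% of money), and the names SHARES and spend_per_time are mine.

```python
# Shares as quoted in the talk: (percent of time, percent of ad spend).
SHARES = {
    "TV": (36, 36),
    "Desktop web": (18, 20),  # assumed order: time first, then money
    "Print": (4, 9),
}


def spend_per_time(shares):
    """Ratio of ad-spend share to time share for each medium."""
    return {medium: money / time for medium, (time, money) in shares.items()}


for medium, ratio in spend_per_time(SHARES).items():
    print(f"{medium}: {ratio:.2f}")
```

TV comes out at 1.00, desktop web at about 1.11, and print at 2.25, so on these figures print draws more than twice as much ad spending per unit of time as TV does.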

Targeting means that when you see an ad coming in targeted to you it’s more like a cold call. It doesn’t carry credible information about the seller’s intention in the market.

From the point of view of who has an incentive to support signal-carrying ad media instead, the people who have an interest in that signal for attention bargain in that positive feedback loop are of course the publishers, high reputation brands that want to be able to send that signal, writers, photographers, and editors, people who get paid by that publisher, and people who benefit from the positive externalities of those signal carrying ads that support news and cultural works.

So if the signaling model is such a big thing then why are there so many targeted ads still out there?

Nudges.

Let’s have a look at, just to pick an example, the Facebook advertising policy. As you know, the Facebook advertising platform will let you micro target individuals extremely specifically. You can pick out seven people in Florida, you can pick out everyone who’s looking for an apartment who doesn’t have a certain ethnic affinity, that kind of thing. But the one thing you’re not allowed to do with Facebook targeting is put anything in your ad that might indicate how you’re targeting it. The policy says:

ads must not contain content that asserts or implies personal attributes

You can’t say, I know you’re male or female, I know your sexual orientation, I know what you do for a living. The ad copy has to be generic even if the targeting can be extremely specific. You can’t even say other. You can’t say meet other singles because that implies that the advertiser knows that the reader is single. Facebook will let you target people with depression but you can’t reveal that you know that about them. Another good example is Target. They do targeting of individuals who they believe to be pregnant, but they’ll pad out those ads for baby stuff with ads for other types of products so as not to creep everybody out.

Back to our shared interest in signal for attention bargain. Pretty much everybody has an interest in that original positive feedback loop of getting the higher reputation for brands of getting reputation driven publishers that’ll build high quality content for us. Writers and photographers have an interest in getting paid, and people who are shopping for goods are the ones who want the signal the most. All that stands on the opposite side is behavioral tricks to conceal targeting. Now I’m not going to say this as a privacy issue. I know that there are privacy issues here but that is really not my department. Besides, Facebook just announced a dating site so they’re going to breed privacy preferences out of their user base anyway.

Can the web as an advertising medium be redesigned to make it work better for carrying signal? We know from the existence of print that this type of signal carrying ad medium can exist. Print is an existence proof of signal carrying advertising. We also know that building that kind of an ad medium can’t be that hard because print was built when people were breathing fumes from molten lead all day.

The prize for building a signal-carrying ad medium is all the cultural works that you get when somebody like Kurt Vonnegut can quit his job as manager of a car dealership and write for Collier’s magazine full-time. This book is still on sale with the resulting stories. And of course local news. Democracy depends on the vital flow of information of public interest. Some people say that the problem with news and information on the web is that it’s all been made free, and if people would just subscribe we could fix the system. But honestly if free was the problem, then Walter Cronkite would have destroyed the media business in 1962. It’s a market design problem and a signaling problem, not just a problem of who has to pay for what.

And the web browsers got a bunch of things wrong in the 1990s. There are certain patterns of information flow that the browser facilitated, like third-party tracking, where browsers enable some companies to follow your activity from site to site, and data leakage. Things that just don’t work according to the way that people expect. Most people don’t want their activity on one site to follow them over to another site, and the original batch of web browsers got that terribly wrong. The good news is web browsers are getting it right, and web browsers are under tremendous pressure now to do so. As a product the web browser is pretty much complete and working and generic. The whole point of a web browser is it shows web sites the same as all the other web browsers do, so there’s less and less reason for a user to want to switch web browsers. But everybody who is trying to get you to install a web browser needs for there to be a reason, so the opportunity for browsers is to align with those interests of users that the browser wasn’t able to pick up on previously.

At Mozilla some user researchers recently did a study on users with no ad blocker installed and users within the first few weeks of installing an ad blocker. Anybody want to guess on the increased engagement? How much more time those ad blocker users spend with that same browser than the non ad blocker users? Anybody shout out a number. All right, 28%. From the point of view of the browser those kinds of numbers, moving user engagement in a way that helps that browser meet its goals, that’s something that the browser can’t ignore. So that means we’re going from the old web game where everyone tries to win by collecting as much data on people as they can without their permission to a new game in which the browser, high reputation publishers, and high reputation brands are all aligned in trying to build enough trust to work on information that users choose to share.

I know when I say information that users choose to share you’re going to think about all these GDPR dialogs and I know I’ve seen these too, and there are just tons of companies on these. To be honest, looking at some of these company names it looks like most of them were made up by guys from Florida who communicate primarily by finger guns. Users should not have to micromanage their consent for all this data collection activity any more than email users should have to go in and read their SMTP headers to filter spam. And really if you think about what brands are, it’s offloading information about a product buying decision onto the reputation coprocessor in the user’s brain. It’s kind of like taking a computational task and instead of running it on the CPU in your data center where you have to pay the power and cooling bills for it, you offload it and run it on the GPU on the client. It’ll run faster, it’ll run better, and the audience is maintaining that reputation state.

The future is here, it’s just not very evenly distributed, as William Gibson said. This picture is the cyberpunk of the 1990s. Today all of that stuff he’s carrying, his video camera, his laptop, his scanner, all that stuff’s on a phone and everybody has it.

Today, the privacy sensitive users, the ones who are already working based on sharing data with permission, they’re out there. But they’re in niches today. If you have a relationship with those people now, then now is an opportunity to connect with them, figure out how to build that signal carrying advertising game, and create a reputation based advertising model for the web. Thank you very much.

02 Jul 2018

Worse is better, again?

Are there parallels between the rise of Worse Is Better in software and the success of the “uncreative counterrevolution” in advertising? (for more on that second one: John Hegarty: Creativity is receding from marketing and data is to blame) The winning strategy in software is to sacrifice consistency and correctness for simplicity. (probably because of network effects, principal-agent problems, and market failures.) And it seems like advertising has similar trade-offs between

  • Signal

  • Measurability (How well can we measure this project’s effect on sales?)

  • Message (Is it persuasive and on brand?)

Just as it’s rational for software decision-makers to choose simplicity, it can be rational for marketing decision-makers to choose measurability over signal and message. (This is probably why there is a brand crisis going on: short-term CMOs are better off when they choose brand-unsafe tactics, sacrificing Message.)

As we’re now figuring out how to use market-based tools to fix market failures in software, where can we use better market design to fix market failures in advertising?

19 Jun 2018

blood donation: no good deed goes unpunished

I have been infected with the Ebola virus.

I have had sex with another man in the past year.

I am taking Coumadin®.

Actually, none of those three statements is true. And Facebook knows it.

The American Red Cross has given Facebook this highly personal information about me, by adding my contact info to an “American Red Cross Blood Donors” Facebook Custom Audience. If any of that stuff were true, I wouldn’t have been allowed to give blood.

When I heard back from the American Red Cross about this personal data problem, they told me that they don’t share my health information with Facebook.

That’s not how it works. I’m listed in the Custom Audience as a blood donor. Anyway, too late. Facebook has the info now.

So, which of its promises about how it uses people’s personal information is Facebook going to break next?

And is some creepy tech bro right now making a killer pitch to Paul Graham about a business plan to “disrupt” the health insurance market using blood donor information?

I should not have to care about this, and I don’t have time to. I don’t even have time to attempt a funny remark about the whole Facebook board member Peter Thiel craving blood thing.

17 Jun 2018

Helping people move ad budgets away from evil stuff

Hugo-award-winning author Charles Stross said that a corporation is some kind of sociopathic hive organism, but as far as I can tell a corporation is really more like a monkey troop cosplaying a sociopathic hive organism.

This is important to remember because, among other reasons, it turns out that the money that a corporation spends to support democracy and creative work comes from the same advertising budget as the money it spends on random white power trolls and actual no-shit Nazis. The challenge for customers is to help people at corporations who want to do the right thing with the advertising budget, but need to be able to justify it in terms that won’t break character (since they have agreed to pretend to be part of a sociopathic hive organism that only cares about its stock price).

So here is a quick follow-up to my earlier post about denying permission for some kinds of ad targeting.

Techcrunch reports that “Facebook Custom Audiences,” the system where advertisers upload contact lists to Facebook in order to target the people on those lists with ads, will soon require permission from the people on the list. Check it out: Introducing New Requirements for Custom Audience Targeting | Facebook Business. On July 2, Facebook’s own rules will extend a subset of Europe-like protection to everyone with a Facebook account. Beaujolais!

So this is a great opportunity to help people who work for corporations and want to do the right thing. Denying permission to share your info with Facebook can move the advertising money that they spend to reach you away from evil stuff and towards sites that make something good. Here’s a permission withdrawal letter to cut and paste. Pull requests welcome.

11 Jun 2018

simulating a market with honest and deceptive advertisers

At Nudgestock 2018 I mentioned the signaling literature that provides background for understanding the targeted advertising problem. Besides being behind paywalls, a lot of this material is written in math that takes a while to figure out. For example, it’s worth working through this Gardete and Bart paper to understand a situation in which the audience is making the right move to ignore a targeted message, but it can take a while.

Are people rational to ignore or block targeted advertising in some media, because those media are set up to give an incentive to deceptive sellers? Here’s a simulation of an ad market in which that might be the case. Of course, this does not show that in all advertising markets, better targeting leads to an advantage for deceptive sellers. But it is a demonstration that it is possible to design a set of rules for an advertising market that gives an advantage to deceptive sellers.

(view large version)

What are we looking at? Think of it as a culture medium where we can grow and evolve a population of single-celled advertisers.

The x and y coordinates are some arbitrary characteristic of offers made to customers. Customers, invisible, are scattered randomly all over the map. If a customer gets an offer for a product that is close enough to their preferences, it will buy.

Advertisers (yellow to orange squares) get to place ads that reach customers within a certain radius. The advertiser has a price that it will bid for an ad impression, and a maximum distance at which it will bid for an impression. These are assigned randomly when we populate the initial set of advertisers.

High-bidding advertisers are more orange, and lower-bidding advertisers are more pale yellow.

An advertiser is either deceptive, in which case it makes a slightly higher profit per sale, or honest. When an honest advertiser makes a sale, we draw a green line from the advertiser to the customer. When a deceptive advertiser makes a sale, we draw a red line. The lines appear to fade out because we draw a black line every time there is an ad impression that does not result in a sale.

So why don’t the honest advertisers die out? One more factor: the norms enforcers. You can think of these as product reviewers or regulators. If a deceptive advertiser wins an ad impression to a norms enforcer, then the deceptive advertiser pays a cost, greater than the profit from a sale. Think of it as having to register a new domain and get a new logo. Honest advertisers can make normal sales to the norms enforcers, which are shown as blue squares. An ad impression that results in an “enforcement penalty” is shown as a blue line.

So, out of those relatively simple rules (two kinds of advertisers and two kinds of customers) we can see several main strategies arise. Your run of the simulation is unique, and you can also visit the big version.

What I’m seeing on mine is some clusters of finely targeted deceptive advertisers, in areas with relatively few norms enforcers, and some low-bidding honest advertisers with a relatively broad targeting radius. Again, I don’t think that this necessarily corresponds to any real-world advertising market, but it is interesting to figure out when and how an advertising market can give an advantage to deceptive sellers, and what kinds of protections on the customer side can change the game.
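The rules described above can be sketched in a few lines of Python. This is not the code behind the embedded simulation, just a rough reconstruction for experimenting with the idea. Every constant, the one-impression-per-customer auction, and the "replace the worst advertiser with a mutated copy of the best" selection step are assumptions of mine.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Advertiser:
    x: float
    y: float
    bid: float        # price offered per ad impression
    radius: float     # maximum distance at which it will bid
    deceptive: bool
    balance: float = 0.0


# All constants below are illustrative assumptions, not values from the post.
SALE_PROFIT = 1.0        # profit per sale for an honest advertiser
DECEPTION_BONUS = 0.2    # extra profit per sale for a deceptive advertiser
PENALTY = 3.0            # cost when a deceptive ad reaches a norms enforcer
BUY_DISTANCE = 0.1       # a customer buys if the offer is this close


def random_advertiser(rng):
    return Advertiser(rng.random(), rng.random(), rng.uniform(0.01, 0.5),
                      rng.uniform(0.05, 0.5), rng.random() < 0.5)


def run_round(advertisers, customers, rng):
    """Auction one impression per customer, visiting customers in random order."""
    for cx, cy, enforcer in rng.sample(customers, len(customers)):
        bidders = [a for a in advertisers
                   if math.hypot(a.x - cx, a.y - cy) <= a.radius]
        if not bidders:
            continue
        winner = max(bidders, key=lambda a: a.bid)
        winner.balance -= winner.bid               # pay for the impression
        if winner.deceptive and enforcer:
            winner.balance -= PENALTY              # enforcement penalty
        elif math.hypot(winner.x - cx, winner.y - cy) <= BUY_DISTANCE:
            bonus = DECEPTION_BONUS if winner.deceptive else 0.0
            winner.balance += SALE_PROFIT + bonus  # a sale


def simulate(rounds=20, n_advertisers=30, n_customers=500,
             enforcer_share=0.1, seed=0):
    rng = random.Random(seed)
    customers = [(rng.random(), rng.random(), rng.random() < enforcer_share)
                 for _ in range(n_customers)]
    advertisers = [random_advertiser(rng) for _ in range(n_advertisers)]
    for _ in range(rounds):
        for a in advertisers:
            a.balance = 0.0
        run_round(advertisers, customers, rng)
        # Crude selection: the worst performer of the round is replaced
        # by a slightly mutated copy of the best one.
        advertisers.sort(key=lambda a: a.balance)
        best = advertisers[-1]
        advertisers[0] = Advertiser(
            min(1.0, max(0.0, best.x + rng.gauss(0, 0.05))),
            min(1.0, max(0.0, best.y + rng.gauss(0, 0.05))),
            max(0.01, best.bid + rng.gauss(0, 0.02)),
            max(0.01, best.radius + rng.gauss(0, 0.02)),
            best.deceptive)
    return advertisers


if __name__ == "__main__":
    survivors = simulate()
    deceptive = sum(a.deceptive for a in survivors)
    print(f"{deceptive} of {len(survivors)} advertisers are deceptive")
```

Changing `enforcer_share`, `PENALTY`, or the range of `radius` values is a quick way to see how the mix of surviving advertisers responds to more enforcement or coarser targeting.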

05 Jun 2018

Evil stuff on the Internet and following the money

Rule number one of dealing with the big Internet companies is: never complain to them about all the evil stuff they support. It’s a waste of time and carpal tunnels. All of the major Internet companies have software, processes, and, most important, contract moderators, to attenuate complaints. After all, if Big Company employees came in to work and saw real user screenshots of the beheading videos, or the child abuse channel, or the ethnic cleansing memes, then that would harsh their mellow and severely interfere with their ability to, as they say in California, bro down and crush code.

Fortunately, we have better options than engaging with a process that’s designed to mute a complaint. Follow the money.

Your average Internet ad does not come from some ominous all-seeing data-driven Panopticon. It’s probably placed by some marketing person looking at an ad dashboard screen that’s just as confusing to them as the ad placement is confusing to you.

So I’m borrowing the technique that “Spocko” started for talk radio, and Sleeping Giants scaled up for ads on extremist sites.

  • Contact a brand’s marketing decision makers directly.

  • Briefly make a specific request.

  • Put your request in terms that make not granting it riskier and more time-consuming.

This should be pretty well known by now. What’s new is a change in European privacy regulations. The famous European GDPR applies not just to Europeans, but to natural persons. So I’m going to test the idea that if I ask for something specific and easy to do, it will be easier for people to just do it, instead of having to figure out that (1) they have a different policy for people who they won’t honor GDPR requests from and (2) they can safely assign me to the non-GDPR group and ignore me.

My simple request is not to include me in a Facebook Custom Audience. I can find the brands that are doing this by downloading ad data from Facebook, and here’s a letter-making web thingy that I can use. Try it if you like. I’ll follow up with how it’s going.

02 Jun 2018

Opting into European mode

Trans Europa Express was covered on ghacks.net. This is an experimental Firefox extension that tries to get web sites to give you European-level privacy rights, even if the site classifies you as non-European.

Since the version they mentioned, I have updated it with a few new features.

Anyway, check it out. Seems to have actual users now, so I’ve got that going for me. But lots of secret European mode switches still remain unactivated. If you see one, please make a new issue.

25 May 2018

Happy GDPR day. Here’s some sensitive data about me.

I know I haven’t posted for a while, but I can’t skip GDPR Day. You don’t see a lot of personal info from me here on this blog. But just for once, I’m going to share something.

I’m a blood donor.

This doesn’t seem like a lot of information. People sign up for blood drives all the time. But the serious privacy problem here is that when I give blood, they also test me for a lot of diseases, many of which could have a big impact on my life and how much of certain kinds of healthcare products and services I’m likely to need. The fact that I’m a blood donor might also help people infer something about my sex life but the health data is TMI already.

And I have some bad news. I recently got the ad info from my Facebook account and there it is, in the file advertisers_who_uploaded_a_contact_list_with_your_information.html. American Red Cross Blood Donors. Yes, it looks like the people I chose to trust with some of my most sensitive personal info have given it to the least trusted company on the Internet.

In today’s marketing scene, the fact that my blood donor information leaked to Facebook isn’t too surprising. The Red Cross clearly has some marketing people, and targeting the existing contact list on Facebook is just one of the things that marketing people do without thinking about it too much. Not thinking about privacy concerns is a problem for Marketing as a career field long-term. If everyone thinks of Marketing as the Department of Creepy Stuff it’s going to be harder to recruit creative people.

So, wait a minute. Why am I concerned that Facebook has positive health info on me? Doesn’t that help maintain my status in the data-driven economy? What’s the downside? (Obvious joke about healthy-blood-craving Facebook board member Peter Thiel redacted—you’re welcome.)

The problem is that my control over my personal data isn’t just a problem for me. As Prof. Arvind Narayanan said (video), Poor privacy harms society as a whole. Can I trust Facebook to use my blood info just to target me for the Red Cross, and not to sort people by health for other purposes? Of course not. Facebook has crossed every creepy line that they have promised not to. To be fair, that’s not just a Facebook thing. Tech bros do risky and mean things all the time without really thinking them through, and even when they do set appropriate defaults they half-ass the implementation and shit happens.

Will blood donor status get you better deals, or apartments, or jobs, in the future? I don’t know. I do know that the Red Cross made a big point about confidentiality when they got me signed up. I’m waiting for a reply from the Red Cross privacy officer about this, and will post an update.

Anyway, happy GDPR Day, and, in case you missed it, Salesforce CEO Marc Benioff Calls for a National Privacy Law.

12 May 2018

Can markets for intent data even be a thing?

Doc Searls is optimistic that surveillance marketing is going away, but what’s going to replace it? One idea that keeps coming up is the suggestion that prospective buyers should be able to sell purchase intent data to vendors directly. This seems to be appealing because it means that the Marketing department will still get to have Big Data and stuff, but I’m still trying to figure out how voluntary transactions in intent data could even be a thing.

Here’s an example. It’s the week before Thanksgiving, and I’m shopping for a kitchen stove. Here are two possible pieces of intent information that I could sell.

  • “I’m cutting through the store on the way to buy something else. If a stove is on sale, I might buy it, but only if it’s a bargain, because who needs the hassle of handling a stove delivery the week before Thanksgiving?”

  • “My old stove is shot, and I need one right away because I have already invited people over. Shut up and take my money.”

On a future intent trading platform, what’s my incentive to reveal which intent is the true one?

If I’m a bargain hunter, I’m willing to sell my intent information, because it would tend to get me a lower price. But in that case, why would any store want to buy the information?

If I need the product now, I would only sell the information for a price higher than the expected difference between the price I would pay and the price a bargain hunter would pay. But if the information isn’t worth more than the price difference, why would the store want to buy it?
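The two cases above can be put into numbers. This is a toy sketch with made-up prices. It assumes that a buyer who reveals nothing pays the bargain price, and that the only thing the store can do with the data is change the price it charges.

```python
# Hypothetical prices, for illustration only.
SALE_PRICE = 600.0   # what a bargain hunter ends up paying
FULL_PRICE = 900.0   # what an urgent buyer can be charged once identified


def payoff_change(true_intent, data_price):
    """Change in (buyer, store) payoff when intent is sold for data_price.

    Baseline assumption: a buyer who reveals nothing pays SALE_PRICE.
    """
    product_price = FULL_PRICE if true_intent == "urgent" else SALE_PRICE
    price_increase = product_price - SALE_PRICE
    buyer_change = data_price - price_increase   # paid for data, pays more for stove
    store_change = price_increase - data_price   # charges more, pays for data
    return buyer_change, store_change


def mutually_beneficial_prices(true_intent, candidates):
    """Data prices at which both sides come out strictly ahead."""
    result = []
    for price in candidates:
        buyer_change, store_change = payoff_change(true_intent, price)
        if buyer_change > 0 and store_change > 0:
            result.append(price)
    return result


candidates = [float(p) for p in range(0, 1001, 50)]
for intent in ("bargain", "urgent"):
    print(intent, mutually_beneficial_prices(intent, candidates))
```

In this toy model the payment for the data and the change in the product price cancel out, so the list is empty for both intents: no data price leaves both sides better off.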

So how can a market for purchase intent data happen?

Or is the idea of selling access to purchase intent only feasible if the intent data is taken from the “data subject” without permission?

Anyway, I can see how search advertising and signal-based advertising can assume a more important role as surveillance marketing becomes less important, but I’m not sure about markets for purchase intent. Maybe user data sharing will be not so much a stand-alone thing but a role for trustworthy news and cultural sites, as people choose to share data as part of commenting and survey completion, and that data, in aggregated form, becomes part of a site’s audience profile. More: personal AI in the rugpull economy

06 May 2018

Unlocking the hidden European mode in web ads

It would make me really happy to be able to yellow-list Google web ads in Privacy Badger. (Yellow-listed domains are not blocked, but have their cookies restricted in order to cut back on cross-site tracking.) That’s because a lot of news and cultural sites use DoubleClick for Publishers and other Google services to deliver legit, context-based advertising. Unfortunately, as far as I can tell, Google mixes in-context ads with crappy, spam-like, targeted stuff. What I want is something like Doc Searls style ads: Just give me ads not based on tracking me.

Until now, there has been no such setting. There could have been, if Do Not Track (DNT) had turned out to be a thing, but no. But there is some good news. Instead of one easy-to-use DNT, sites are starting to give us harder-to-find, but still usable, settings, in order to enable GDPR-compliant ads for Europe. Here’s Google’s: Ads personalization settings in Google’s publisher ad tags - DoubleClick for Publishers Help.

Wait a minute? Google respects DNT now?

Sort of. GDPR-compliant terms written by Google aren’t exactly the same as EFF’s privacy-friendly Do Not Track (DNT) Policy (all these different tracking policies are reminding me of open source licenses for some reason), but close enough. The catch is that as an end user, you can’t just turn on Google’s European mode. You have to do some JavaScript. I think I figured out how to do this in a simple browser extension to unlock secret European status.

Google doesn’t appear to have their European mode activated yet, so I added a do-nothing “European mode” to the Aloodo project, for testing. I’m not able to yellow-list Google yet, but when GDPR takes effect later this month I’ll test it some more.

In the meantime, I’ll keep looking for other examples of hidden European mode, and see if I can figure out how to activate them.

17 Apr 2018

GDPR and client-side tools

Lots of GDPR advice out there. As far as I can tell it pretty much falls into three categories.

But what if there is another way?

  1. Start with the clean version. (Here’s that link again: How to: GDPR, consent and data processing).

  2. Add microformats to label consent forms as consent forms, and appropriate links to the data usage policy to which the user is being asked to agree.

  3. Release a browser extension that will do the right thing with the consent forms, and submit automatically if the user is fine with the data usage request and policy, and appears to trust the site. Lots of options here, since the extension can keep track of known data usage policies and which sites the user appears to trust, based on their activity.

  4. Publish user research results from the browser extension. At this point the browsers can compete to do their own versions of step 3, in order to give their users a more trustworthy and less annoying experience.

Browsers need to differentiate in order to attract new users and keep existing users. Right now a good way to do that is in creating a safer-feeling, more trustworthy environment. The big opportunity is in seeing the overlap between that goal for the browser and the needs of brands to build reputation and the needs of high-reputation publishers to shift web advertising from a hacking game that adtech/adfraud wins now, to a reputation game where trusted sites can win.

14 Apr 2018

When can deceptive sellers outbid honest sellers for ad impressions?

Update 12 Dec 2021: Add link to London Underground example.

Update 8 Jun 2018: simulation, Why digital advertising leaves people underwhelmed

Why does the Peak Advertising effect occur most in the most accurately targeted ad media? Why do people tend to filter out targeted ads, using habit power, technology, and regulation, while paying more attention to less finely targeted ad media?

One explanation is that buying ad space is an example of costly signaling. On this view, advertising is basically an exchange of signal for attention, and ads that don’t pay their way with some kind of proof of spend are not worth paying attention to because they don’t convey useful information about the seller’s beliefs on how valuable the audience would find the product.

Another possible explanation is that targetable ad media are more suitable for deception, and that where advertisers bid for space in a medium, deceptive advertisers will tend to outbid the honest ones.

This seems counterintuitive, since we might suppose that the customer lifetime value of an honest seller’s newly acquired customer could in many cases be greater than the profit from a quick score by a deceptive seller. But targeting doesn’t just match ad impressions with prospective buyers. When used by a deceptive seller, it can also conceal an ad impression from potentially costly attention.

For honest direct marketers, the expected profit from reaching a buyer is positive, and the expected profit from reaching a non-buyer is zero. But the audience does not just contain buyers and non-buyers. People can also be divided into enforcers and non-enforcers. Enforcers can be anything from professional law enforcement people, to someone who takes apart a bogus product and makes a video about it, to just the writer of a bad online review. What enforcers have in common is that for a dishonest seller, the expected profit from reaching an enforcer is negative.

Some kinds of enforcer can impose costs even without buying. For example, a reader might send the publisher a screenshot containing a scam ad and get the advertiser added to an advertiser exclusion list. Other kinds of enforcer might only take action if they buy the product and find it to be a scam. A deceptive advertiser might incur costs when their ad is shown to either kind of enforcer.

For an honest advertiser, the expected profit from a single impression is:

probability of reaching a buyer × expected profit per sale

For the dishonest advertiser, the expected profit is:

probability of reaching a buyer × expected profit per sale − probability of reaching an enforcer × expected loss per enforcer

The expected loss per enforcer is typically high compared to the profit per sale. For example, a small number of contacts with review writers might require a seller to re-launch under a new name. In an ad impression market with both honest and deceptive sellers, where sellers can choose which impressions to bid on, an ad impression that a deceptive seller believes is unlikely to reach an enforcer has extra value to that deceptive advertiser but not to an honest advertiser. Deceptive sellers will tend to outbid honest ones for certain impressions.
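The two formulas above are easy to try with numbers. This is only a minimal sketch: all the values are invented, and the slightly higher profit per sale for the deceptive seller is an assumption borrowed from the simulation post, not something these formulas require.

```python
def expected_profit_honest(p_buyer, profit_per_sale):
    """Expected profit from one impression for an honest advertiser."""
    return p_buyer * profit_per_sale


def expected_profit_deceptive(p_buyer, profit_per_sale,
                              p_enforcer, loss_per_enforcer):
    """Expected profit from one impression for a deceptive advertiser."""
    return p_buyer * profit_per_sale - p_enforcer * loss_per_enforcer


# Illustrative values only.
P_BUYER = 0.02
HONEST_PROFIT = 10.0
DECEPTIVE_PROFIT = 12.0      # assumption: slightly higher profit per sale
LOSS_PER_ENFORCER = 200.0    # high compared to the profit per sale

impressions = {
    "broad audience": 0.01,                # chance the viewer is an enforcer
    "targeted away from enforcers": 0.0001,
}

honest_value = expected_profit_honest(P_BUYER, HONEST_PROFIT)
for label, p_enforcer in impressions.items():
    deceptive_value = expected_profit_deceptive(
        P_BUYER, DECEPTIVE_PROFIT, p_enforcer, LOSS_PER_ENFORCER)
    higher = "deceptive" if deceptive_value > honest_value else "honest"
    print(f"{label}: honest {honest_value:.3f}, "
          f"deceptive {deceptive_value:.3f}, higher value: {higher}")
```

With these numbers the broad impression is worth less than nothing to the deceptive seller, while the impression that is unlikely to reach an enforcer is worth more to the deceptive seller than to the honest one, which is the case where the deceptive seller can afford to bid more.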

A member of the audience might be able to see targeting criteria, but not the advertiser’s internal weighting of targeting criteria. (For example, a targeted ad platform might reveal to you that you are being targeted for an ad because your computer is running the latest release of the OS. What they won’t tell you is that the seller is bidding on impressions to your OS version because they’re selling a tainted nutritional supplement, and the lead testing department at the Ministry of Health is still on the old OS version.)

So, some ad impressions will tend to be purchased by deceptive sellers, but a low-information member of the audience can’t tell which impressions those are. Is this an ad from an honest seller that might be reaching both me and enforcers, or is this an ad from a dishonest seller targeted to reach me but not enforcers? When you read a magazine that reaches a community of practice of which you’re a member, you can be confident that product reviewers and editors are seeing the same ads you are. A web ad could be targeted to avoid experienced and better-connected members of the community of practice.

A good example of where lower targeting capabilities can assist norms enforcement is the 378 commuters who reported London Underground ads that were later banned by the Advertising Standards Authority. Silvia Milano, Brent Mittelstadt, and Sandra Wachter write,

[I]f the ad could have been targeted to just the subset of consumers most receptive to its message, they might not have raised any complaints. As a result, the harmful message would have gone unchallenged, missing an opportunity for the regulator to update their guidelines in keeping with current social norms.

One possible explanation for the Peak Advertising effect is the interaction between deceptive sellers discovering how to use a new ad medium’s targeting capabilities to avoid enforcers, and the audience discovering the fraction of deceptive sellers.

More: some ways that Facebook ads are optimized for deceptive advertising

13 Apr 2018

working post-creepy ads, and stuff

Post-creepy web ad sightings: What’s next for web advertising after browser privacy improvements and regulatory changes make conventional adtech harder and harder?

The answer is probably something similar to what’s already starting to pop up on niche sites. Here’s a list of ad platforms that work more like print, less like spam: list of post-creepy web ad systems. Comments and suggestions welcome (mail me, or do a GitHub pull request from the link at the bottom.)

Fun with bug futures: we’re in Mozilla’s Internet Health Report. Previous items in that series:

ICYMI: Mozilla experiment aims to reduce bias in code reviews

19 Mar 2018

A good question, from Twitter

Good question on Twitter, but one that might take more than, what is it now, 280 characters? to answer.

Sir, why do you pay so much attention on internet advertising? I have hardly read your tweet that isn't related to internet advertising. I used Privacy Badger for some time last year. It's useful but a little heavy😅

— siven fang (@sivenfan) March 19, 2018

Why do I pay attention to Internet advertising? Why not just block it and forget about it? By now, web ad revenue per user is so small that it only makes sense if you’re running a platform with billions of users, so sites are busy figuring out other ways to get paid anyway.

To the generation that never had a print magazine subscription, advertising is just a subset of “creepy shit on the Internet.” Who wants to do that for a living? According to Charlotte Rogers at Marketing Week, the lack of information out there explaining the diverse opportunities of a career in marketing puts the industry at a distinct disadvantage in the minds of young people. Marketing also has to contend with a perception problem among the younger generation that it is intrinsically linked with advertising, which Generation Z notoriously either distrust or dislike.

Like the man says, Where Did It All Go Wrong?

The answer is that I’m interested in Internet advertising for two reasons.

  • First, because I’m a Kurt Vonnegut fan and have worked for a magazine. Some kinds of advertising can have positive externalities. Vonnegut was able to quit his job at a car dealership, and write full time, because advertising paid for original fiction in Collier’s magazine. How did advertising lose its ability to pay for news and cultural works? Can advertising reclaim that ability?

  • Second, because most of the economic role of advertising is in an area that Internet advertising hasn’t been able to get a piece of. While Internet advertising plays a game of haha, look what I tricked you into clicking on for chump change, the real money is in signal-carrying advertising that helps build brand reputation. Is it possible to make Internet advertising into a medium that can get a piece of the action?

Maybe make that three reasons. As long as Internet advertising fails to pull its weight in either supporting news and cultural works or helping to send a credible economic signal for brands, the scams, malware and mental manipulation will only continue. More: World’s last web advertising optimist tells all!

09 Mar 2018

People’s personal data: take it or ask for it?

We know that advertising on the web has reached a low point of fraud, security risks, and lack of brand safety. And it’s not making much money for publishers anyway. So a lot of people are talking about how to fix it, by building a new user data sharing system, in which individuals are in control of which data they choose to reveal to which companies.

Unlike today’s surveillance marketing, people wouldn’t be targeted for advertising based on data that someone figures out about them and that they might not choose to share.

A big win here will be that the new system would tend to lower the ROI on creepy marketing investments that have harmful side effects such as identity theft and facilitation of state-sponsored misinformation, and increase the ROI for funding ad-supported sites that people trust and choose to share personal information with.

A user-permissioned data sharing system is an excellent goal with the potential to help clean up a lot of the Internet’s problems. But I have to be realistic about it. Adam Smith once wrote,

The pride of man makes him love to domineer, and nothing mortifies him so much as to be obliged to condescend to persuade his inferiors.

So the big question is still:

Why would buyers of user data choose to deal with users (or publishers who hold data with the user’s permission) when they can just take the data from users, using existing surveillance marketing firms?

Some possible answers.

  • GDPR? Unfortunately, regulatory capture is still a thing even in Europe. Sometimes I wish that American privacy nerds would quit pretending that Europe is ruled by Galadriel or something.

  • brand safety problems? Maybe a little around the edges when a particularly bad video gets super viral. But platforms and adtech can easily hide brand-unsafe “dark” material from marketers, who can even spend time on YouTube and Facebook without ever developing a clue about how brand-unsafe they are for regular people. Even as news-gatherers get better at finding the worst stuff, platforms will always make hiding brand-unsafe content a high priority.

  • fraud concerns? Now we’re getting somewhere. Fraud hackers are good at making realistic user data. Even “people-based” platforms mysteriously have more users in desirable geography/demography combinations than are actually there according to the census data. So, where can user-permissioned data be a fraud solution?

  • signaling? The brand equity math must be out there somewhere, but it’s nowhere near as widely known as the direct response math that backs up the creepy stuff. Maybe some researcher at one of the big brand advertisers developed the math internally in the 1980s but it got shredded when the person retired. Big possible future win for the right behavioral economist at the right agency, but not in the short term.

  • improvements in client-side privacy? Another good one. Email spam filtering went from obscure nerdery to mainstream checklist feature quickly—because email services competed on it. Right now the web browser is a generic product, and browser makers need to differentiate. One promising angle is for the browser to help build a feeling of safety in the user by reducing user-perceived creepiness, and the browser’s need to compete on this is aligned with the interests of trustworthy sites and with user-permissioned data sharing.

(And what’s all this “we” stuff, anyway? Post-creepy advertising is an opportunity for individual publishers and brands to get out ahead, not a collective action problem.)

27 Feb 2018

What I don’t get about Marketing

I want to try to figure out something I still don’t understand about Marketing.

First, read this story by Sarah Vizard at Marketing Week: Why Google and Facebook should heed Unilever’s warnings.

All good points, right?

With the rise of fake news and revelations about how the Russians used social platforms to influence both the US election and EU referendum, the need for change is pressing, both for the platforms and for the advertisers that support them.

We know there’s a brand equity crisis going on. Brand-unsafe placements are making mainstream brands increasingly indistinguishable from scams. So the story makes sense so far. But here’s what I don’t get.

For the call to action to work, Unilever really needs other brands to rally round but these have so far been few and far between.

Other brands? Why?

If brands are worth anything, they can at least help people tell one product apart from another.

Think Small VW ad

Saying that other brands need to participate in saving Unilever’s brands from the three-ring shitshow of brand-unsafe advertising is like saying that Volkswagen really needs other brands to get into simple layouts and natural-sounding copy just because Volkswagen’s agency did.

Not everybody has to make the same stuff and sell it the same way. Brands being different from each other is a good thing. (Right?)

generic food

Sometimes a problem on the Internet isn’t a “let’s all work together” kind of problem. Sometimes it’s an opportunity for one brand to get out ahead of another.

What if every brand in a category kept on playing in the trash fire except one?

18 Feb 2018

The tracker will always get through?

(I work for Mozilla. None of this is secret. None of this is Mozilla policy. Not speaking for Mozilla here.)

A big objection to tracking protection is the idea that the tracker will always get through. Some people suggest that as browsers give users more ability to control how their personal information gets leaked across sites, things won’t get better for users, because third-party tracking will just keep up. On this view, today’s easy-to-block third-party cookies will be replaced by techniques such as passive fingerprinting where it’s hard to tell if the browser is succeeding at protecting the user or not, and users will be stuck in the same place they are now, or worse.

I doubt this is the case because we’re playing a more complex game than just trackers vs. users. The game has at least five sides, and some of the fastest-moving players with the best understanding of the game are the adfraud hackers. Right now adfraud is losing in some areas where they had been winning, and the resulting shift in adfraud is likely to shift the risks and rewards of tracking techniques.

Data center adfraud

Fraudbots, running in data centers, visit legit sites (with third-party ads and trackers) to pick up a realistic set of third-party cookies to make them look like high-value users. Then the bots visit dedicated fraudulent “cash out” sites (whose operators have the same third-party ads and trackers) to generate valuable ad impressions for those sites. If you wonder why so many sites made a big deal out of “pivot to video” but can’t remember watching a video ad, this is why. Fraudbots are patient enough to get profiled as, say, a car buyer, and watch those big-money ads. And the money is good enough to motivate fraud hackers to make good bots, usually based on real browser code. When a fraudbot network gets caught and blocked from high-value ads, it gets recycled for lower and lower value forms of advertising. By the time you see traffic for sale on fraud boards, those bots are probably only getting past just enough third-party anti-fraud services to be worth running.

This version of adfraud has minimal impact on real users. Real users don’t go to fraud sites, and fraudbots do their thing in data centers and don’t touch users’ systems. (Doesn’t everyone do their Christmas shopping while chilling out in the cold aisle at an Amazon AWS data center? Seems legit to me.) The companies that pay for it are legit publishers, who not only have to serve pages to fraudbots (remember, a bot needs to visit enough legit sites to look like a real user) but also end up competing with adfraud for ad revenue. Adfraud has only really been a problem for legit publishers. The adtech business is fine with it, since they make more money from fraud than the fraud hackers do, and the advertisers are fine with it because fraud is priced in, so they pay the fraud-adjusted price even for real impressions.
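A rough sketch of the “priced in” arithmetic, in Python. It assumes a buyer who cannot tell real impressions from fake ones and simply discounts every bid by the expected fraud rate. The function name and the numbers are made up for illustration, not taken from any real ad system.

```python
def fraud_adjusted_cpm(value_of_real_cpm: float, fraud_fraction: float) -> float:
    """Price a buyer will pay per thousand impressions when some unknown
    fraction of them are bots (hypothetical model, not a real bidding API)."""
    if not 0.0 <= fraud_fraction < 1.0:
        raise ValueError("fraud_fraction must be in [0, 1)")
    # The buyer only gets value from the real share of impressions,
    # so every impression, real or fake, is bought at the discounted price.
    return value_of_real_cpm * (1.0 - fraud_fraction)


# A legit publisher with 100% real users still gets the discounted price.
price_paid_to_legit_site = fraud_adjusted_cpm(10.0, 0.25)
```

Under these assumptions the advertiser’s cost per real impression stays about where they wanted it, and the discount comes out of the legit publisher’s revenue.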

What’s new for adfraud

So what’s changing? More fraudbots in data centers are getting caught, just because the adtech firms have mostly been shamed into filtering out the embarrassingly obvious traffic from IP addresses that everyone can tell probably don’t have a human user on them. So where is fraud going now? More fraud is likely to move to a place where a bot can look more realistic but probably not stay up as long: your computer or mobile device. Expect adfraud concealed within web pages, as a payload for malware, and of course in lots and lots of cheesy native mobile apps. (The Google Play Store has an ongoing problem with adfraud, which is content marketing gold for Check Point Software, if you like “shitty app did WHAT?” stories.) Adfraud makes way more money than cryptocurrency mining, using less CPU and battery.

So the bad news is that you’re going to have to reformat your uncle’s computer a lot this year, because more client-side fraud is coming. Data center IPs don’t get by the ad networks as well as they once did, so adfraud is getting personal. The good news is, hey, you know all that big, scary passive fingerprinting that’s supposed to become the harder-to-beat replacement for the third-party cookie? Client-side fraud has to beat it in order to get paid, so they’ll beat it. As a bonus, client-side bots are way better at attribution fraud (where a fraudulent ad gets credit for a real sale) than data center bots.

Users don’t have to get protected from every possible tracking technique in order to shift the web advertising game from a hacking contest to a reputation contest. It often helps simply to shift the advertiser’s ROI from negative-externality advertising below the ROI of positive-externality advertising.

Advertisers have two possible responses to adfraud: either try to out-hack it, or join the “flight to quality” and cut back on trying to follow big-money users to low-reputation sites in the first place. Hard-to-detect client-side bots, by making creepy fingerprinting techniques less trustworthy, tend to increase the uncertainty of the hacking option and make flight to quality relatively more attractive.

17 Feb 2018

This is why we can’t have nice brands.

What if I told you that there was an Internet ad technology that…

  • can reach the same user on mobile and desktop

  • uses open-standard persistent identifiers for users

  • can connect users to their purchase history

  • reaches the users that the advertiser chooses, at the time the advertiser chooses

  • and doesn’t depend on the Google/Facebook duopoly?

Don’t go looking for it on the Lumascape.

I’m describing email spam.

Every feature that adtech is bragging on, or working toward? Email spam had it in the 1990s.

So why didn’t brand advertisers jump all over spam? Why did they mostly leave it to low-reputation brands and scammers?

To be honest, it probably wasn’t a decision decision in most cases, just corporate sloth. But staying away from spam was the right answer. In the email inbox, spam from a high-reputation brand doesn’t look any different from spam that any fly-by-night operation can send. All spammers can do the same stuff:

They can sell to people…for a fraction of what marketing used to cost. And they can collect data on these consumers, track what they buy, what they love and hate about the experience, and market to them directly much more effectively.

Oh, wait. That one isn’t about spam in the 1990s. That’s about targeted advertising on social media sites today. The CEO of digital advertising's biggest trade group says most big marketers are screwed unless they completely change their business models.

It’s the direct consumer relationships, and the use of consumer data, that is completely game-changing for the marketing world. And most big marketers, such as Procter & Gamble and Unilever, are not ready for this new reality, the IAB says.

But of course they’re ready. The difference is that those established brand advertisers aren’t any more ready than some guy who watched a YouTube video series on “growth hacking” and is ready to start buying targeted ads and drop-shipping.

The “new reality,” the targeted advertising business that the IAB wants brands to join them in, is a place where you win based not on how much the audience trusts you, but on how well you can out-hack the competition. And like any information space organized by hacking skill, it’s a hellscape of deceptive crap. Read The Strange Brands in Your Instagram Feed by Alexis C. Madrigal.

Some Instagram retailers are legit brands with employees and products. Others are simply middlemen for Chinese goods, built in bedrooms, and launched with no capital or inventory. All of them have been pulled into existence by the power of Instagram and Facebook ads combined with a suite of e-commerce tools based around Shopify.

Of course, not every brand that buys a social media ad or other targeted ad is crap.

But a social media ad is useless for telling crap brands from non-crap ones. It doesn’t carry economic signal. There’s no such thing as a free watch. (PDF)

Rory Sutherland writes, in Reducing activities to their core misses the point,

Many billions of pounds of advertising expenditure have been shifted from conventional media, most notably newspapers, and moved into digital media in a quest for targeted efficiency. If advertising simply works by the conveyance of messages, this would be a sensible thing to do. However, it is beginning to become apparent that not all, perhaps not even most, advertising works this way. It seems that a large part of advertising creates trust and conviction in its audience precisely because it is perceived to be costly.

If anyone knows that any seller can watch a few YouTube videos and do a certain activity, does that activity really help the audience distinguish a high-reputation seller from a low-reputation one?

And how does it affect a legit brand when its ads show up on the same medium with all the crappy ones? (Twitter has a solution that keeps its ads saleable: just don’t show any ads to important people. I’m surprised they can get away with this, but given the mix of rip-off and real brand ads I keep seeing there, it seems to be working.)

Extremists and state-sponsored misinformation campaigns aren’t “abusing” targeted advertising. They’re just taking advantage of a system optimized for deception and using it normally.

Now, I don’t want to blame targeted advertising for all of the problems of brand equity. When you put high-fructose corn syrup in your product, brand equity suffers. When you outsource or de-skill the customer support function, brand equity suffers. All the half-ass “looks good this quarter” stuff that established brands are doing is bad for brand equity. It just turns out that the kinds of advertising that you can do on the Internet today are all half-ass “looks good this quarter” stuff. If you want to send a credible economic signal, buy TV time or put a flagship store on some expensive real estate. The Internet’s got nothing for you.

Failure to create signal-carrying ad units should be more of a concern for people who want to earn ad money on the Internet than it is. See Bob Hoffman’s “refrigerator test.” All that work that went into building the most complicated ad medium ever? It went into building an ad medium optimized for low-reputation advertisers. And that kind of ad medium tends to see rates go down over time. It doesn’t hold value.

And the medium can’t gain value until the users trust it, which means they have to trust the browser. In-browser tracking protection is going to have to enable the legit web advertising industry the same way that spam filters enabled the legit email newsletter industry.

13 Feb 2018

Two visions of GDPR

As far as I can tell, there are two sets of ambitious predictions about GDPR.

One is the VRM vision. Doc Searls writes, on ProjectVRM:

I am sure Google, Facebook and lesser purveyors of advertising online will find less icky ways to stay in business; but it is becoming clear that next May 25, when the GDPR goes into full effect, will be an extinction-level event for tracking-based advertising (aka adtech) as a business model.

Big impact? Not so fast. There’s also a “business as usual” story, and that one, you’ll find at Digital Advertising Consent.

Our complex ecosystem of companies must cooperate more closely than ever before to meet the transparency and consent requirements of European data protection law.

According to the adtech firms, well, maybe there will be more Bürokratie, more pointless dialogs that users have to click through, and one more line item, “GDPR compliance”, to come out of the publisher’s share, of course, but the second vision of GDPR is essentially just adtech/adfraud as usual. Upgrade to the new version of OpenRTB, and move along, nothing to see here.

Personally, I’m not buying either one of these GDPR visions. Because, just for fun and also because reasons, I run my own mail server.

And every little decision I have to make about how to configure the damn thing is based on playing a game with email spammers. Regulation is a part of my complete breakfast, but it’s not the whole story.

The government doesn’t give you freedom from spam. You have to take it for yourself, one filtering rule at a time. Or, do what most people do, and find a company that does it for you, but it has to be a company that you trust with your information.

A mail sender’s decision to comply, or not comply, with some regulation is a bit of information. That feeds into the software that makes the final decision: inbox, spam folder, or reject. When a spam message complies with the regulations of some country, my mail server doesn’t say, “Oh, wow, compliant! I can skip all the other checks and send this one straight to the inbox!” It uses the regulation compliance along with other information to make that decision.
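Here’s a minimal sketch of that idea in Python. It is not how my mail server or any real spam filter is configured; the field names, weights, and thresholds are all made up to show compliance working as one input among several and never as a bypass.

```python
from dataclasses import dataclass


@dataclass
class Message:
    complies_with_regulation: bool  # hypothetical signal, e.g. a working unsubscribe link
    sender_reputation: float        # 0.0 (unknown or bad) to 1.0 (trusted)
    content_spam_score: float       # 0.0 (clean) to 1.0 (spammy)


def classify(msg: Message) -> str:
    """Return 'inbox', 'spam', or 'reject' from a combined score."""
    score = 0.6 * msg.content_spam_score
    score += 0.4 * (1.0 - msg.sender_reputation)
    if not msg.complies_with_regulation:
        score += 0.2  # non-compliance counts against the sender...
    # ...but compliance alone never skips the other checks.
    if score >= 0.7:
        return "reject"
    if score >= 0.4:
        return "spam"
    return "inbox"


compliant_spam = Message(complies_with_regulation=True,
                         sender_reputation=0.1,
                         content_spam_score=0.9)
verdict = classify(compliant_spam)
```

With these made-up weights, a fully compliant message from a low-reputation sender with spammy content still gets rejected.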

So whatever extra consent forms that surveillance marketers are required to send by GDPR? They’re not the final decision on What The User Must See. They’re just data, coming over the network.

Some of that data will be interpreted to mean that this request is an obvious mismatch with how the user chooses to share their info. The user might not even see those consent forms, or the browser might pop up a notification:

4 requests to do creepy shit, that’s obviously against your preferences, already denied. Isn’t this the best browser ever?

(No, I don’t write copy for browser notifications. But you get the idea.)

Browsers that implement tracking protection might end up with a feature where they detect requests for permission to do things that the user has already said no to—by turning on tracking protection in the first place—and auto-deny them.
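As a sketch of what that auto-deny feature might look like, here is some Python. No browser exposes this API as far as I know; the request format and the purpose names are assumptions for illustration only.

```python
def handle_consent_requests(requests, tracking_protection_on, denied_purposes):
    """Split incoming consent requests into auto-denied and shown-to-user.

    requests: list of dicts like {"site": ..., "purpose": ...} (hypothetical format)
    denied_purposes: purposes the user already refused by turning on protection
    """
    auto_denied, to_show = [], []
    for req in requests:
        if tracking_protection_on and req["purpose"] in denied_purposes:
            auto_denied.append(req)   # user already said no; don't ask again
        else:
            to_show.append(req)       # still needs a real decision from the user
    return auto_denied, to_show


incoming = [
    {"site": "news.example", "purpose": "cross_site_tracking"},
    {"site": "news.example", "purpose": "newsletter_signup"},
    {"site": "ads.example", "purpose": "cross_site_tracking"},
]
denied, shown = handle_consent_requests(
    incoming, tracking_protection_on=True,
    denied_purposes={"cross_site_tracking"})
notification = f"{len(denied)} requests already denied."
```

The consent forms are treated as data coming over the network: the ones that obviously conflict with the user’s existing choice never reach the screen.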

Legit email senders had to learn “deliverability,” the art and science of making legit mail look legit so that it can get past email spam filters. Legit advertisers will have to learn that users aren’t identical and spherical, users choose tools to implement their data sharing preferences, and that regulatory compliance is only part of the job.

11 Feb 2018

Team A vs. Team B

Let’s run a technical challenge on the Internet. Team A vs. Team B.

Team A gets to work where they want, when they want. Team B has to work in an open-plan office, with people walking behind them, talking on the phone, doing all that annoying office stuff.

Members of Team A get paid for successful work within weeks or months. Members of Team B get a base salary that they have to spend on rent in an expensive location, but just might get paid extra for successful work in four years.

Team A will let anyone try to join, and those who aren’t successful have to drop out quickly. Team B will only let members who are a “good cultural fit” join, and it takes a while to get rid of an unsuccessful member.

Team A can deploy unproven work for real-world testing, using infrastructure that they get for free on the Internet. Team B can only deploy their work when production-ready, on infrastructure they have to pay for.

If Team A breaks the rules, the penalty is that they have to spend a little money to register new domain names. If Team B breaks the rules, they risk lengthy regulatory and/or legal consequences.

Team A scores a win any time they can beat whoever is the weakest member of Team B at that time. Team B can only score a win when they can consistently defeat all of the most active members of Team A.

Team A is adfraud.

Why is so much marketing money being bet on Team B?

06 Feb 2018

Fun with numbers

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

Guess what? According to Emil Protalinski at VentureBeat, the browser wars are back on.

Google is doubling down on the user experience by focusing on ads and performance, an opportunity I’ve argued its competitors have completely missed.

Good point. Jonathan Mendez has some good background on that.

The IAB road blocked the W3C Do Not Track initiative in 2012 that was led by a cross functional group that most importantly included the browser makers. In hindsight this was the only real chance for the industry to solve consumer needs around data privacy and advertising technology. The IAB wanted self-regulation. In the end, DNT died as the IAB hoped.

As third-party tracking made the ad experience crappier and crappier, browser makers tried to play nice. Browser makers tried to work in the open and build consensus.

That didn’t work, which shouldn’t be a surprise. Imagine if email providers had decided to build consensus with spammers about spam filtering rules. The spammers would have been all like, “It replaces the principle of consumer choice with an arrogant ‘Hotmail knows best’ system.” Any sensible email provider would ignore the spammers but listen to deliverability concerns from senders of legit opt-in newsletters. Spammers depend on sneaking around the user’s intent to get their stuff through, so email providers that want to get and keep users should stay on the user’s side. Fortunately for legit mail senders and recipients, that’s what happened.

On the web, though, not so much.

But now Apple Safari has Intelligent Tracking Prevention. Industry consensus achieved? No way. Safari’s developers put users first and, like the man said, if you’re not first you’re last.

And now Google is doing their own thing. Some positive parts about it, but by focusing on filtering annoying types of ad units they’re closer to the Adblock Plus “Acceptable Ads” racket than to a real solution. So it’s better to let Ben Williams at Adblock Plus explain that one. I still don’t get how it is that so many otherwise capable people come up with “let’s filter superficial annoyances and not fundamental issues” and “let’s shake down legit publishers for cash” as solutions to the web advertising problem, though. Especially when $16 billion in adfraud is just sitting there. It’s almost as if the Lumascape doesn’t care about fraud because it’s priced in so it comes out of the publisher’s share anyway.

So with all the money going to fraud and the intermediaries that facilitate it, local digital news publishers are looking for money in other places and writing off ads. That’s good news for the surviving web ad optimists (like me) because any time Management stops caring about something you get a big opportunity to do something transformative.

Small victories

The web advertising problem looks big, but I want to think positive about it.

  • billions of web users

  • visiting hundreds of web sites

  • with tens of third-party trackers per site.

That’s trillions of opportunities for tiny victories against adfraud.
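The back-of-the-envelope version, in Python, using the smallest round numbers that fit “billions,” “hundreds,” and “tens.” These are order-of-magnitude placeholders, not measurements.

```python
def tracking_opportunities(users: int, sites_per_user: int, trackers_per_site: int) -> int:
    """Count user/site/tracker combinations, each a chance for a small win."""
    return users * sites_per_user * trackers_per_site


# Lower-bound placeholders: one billion users, 100 sites each, 10 trackers per site.
total = tracking_opportunities(1_000_000_000, 100, 10)
```

Even at those lower bounds the product is 10^12, one trillion.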

Right now most browsers and most fraudbots are hard to tell apart. Both maintain a single “cookie jar” across trusted and untrusted sites, and both are subject to fingerprinting.

For fraudbots, cross-site trackability is a feature. A fraudbot can only produce valuable ad impressions on a fraud site if it is somehow trackable from a legit site.

For browsers, cross-site trackability is a bug, for two reasons.

  • Leaking activity from one context to another violates widely held user norms.

  • Because users enjoy ad-supported content, it is in the interest of users to reduce the fraction of ad budgets that go to fraud and intermediaries.

Browsers don’t have to solve the whole web advertising problem to make a meaningful difference. As soon as a trustworthy site’s real users look different enough from fraudbots, because fraudbots make themselves more trackable than users running tracking-protected browsers do, then low-reputation and fraud sites claiming to offer the same audience will have a harder and harder time trying to sell impressions to agencies that can see it’s not the same people.

Of course, the browser market share numbers will still over-represent any undetected fraudbots and under-represent the “conscious chooser” users who choose to turn on extra tracking protection options. But that’s an opportunity for creative ad agencies that can buy underpriced post-creepy ad impressions and stay away from overvalued or worthless bot impressions. I expect that data on who has legit users—made more accurate by including tracking protection measurements—will be proprietary to certain agencies and brands that are going after customer segments with high tracking protection adoption, at least for a while.

20 Jan 2018

More brand safety bullshit

There’s enough bullshit on the Internet already, but I’m afraid I’m going to quote some more. This time from Ilyse Liffreing at IBM.

The reality is none of us can say with certainty that anywhere in the world, we are [brand] safe. Look what just happened with YouTube. They are working on fixing it, but even Facebook and Google themselves have said there’s not much they can do about it. I mean, it’s hard. It’s not black and white. We are putting a lot of money in it, and pull back on channels where we have concerns. We’ve had good talks with the YouTube teams.

Bullshit.

One important part of this decision is black and white.

Either you give money to Nazis.

Or you don’t give money to Nazis.

If Nazis are better at “programmatic” than the resting-and-vesting chill bros trying to deal with brand safety issues at the programmatic ad firms (and, face it, Nazis kick ass at programmatic), then the choice to spend ad money in a we’re-kind-of-not-sure-if-this-goes-to-Nazis-or-not way is a choice that puts your brand on the wrong side of a black and white line.

There are plenty of Nazi-free places for brands to run ads. They might not be the cheapest. But I know which side of the line I buy from.

16 Jan 2018

Remove all the tracking widgets? Maybe not.

Good one from Mark Pilipczuk: Publisher Advice From a Buyer.

Remove all the tracking widgets from your site. That Facebook “Like” button only serves to exfiltrate your valuable data to an entity that doesn’t have your best interests at heart. If you’ve got a valuable audience, why would you want to help the ad tech industry which promises “I can find the same and bigger audience over here for $2 CPM, so don’t buy from the publisher?” Sticking your own head in the noose is never a good idea.

That advice makes sense for the Facebook “like button.” That button is just a data shoplifter. The others, though? All those extra trackers come in as side effects of ad deals, and they’re likely to be contractually required to make ads on the site saleable.

Yes, those trackers feed bots and data leakage, and yes, they’re even terrible at fighting adfraud. Augustine Fou points out that Fraud filters don’t work. “In some cases it’s worse when filter is on.”

So in an ideal world you would be able to pull all the third-party trackers, but as far as day-to-day operations go, user tracking is a Chesterton’s Fence problem. What happens if a legit site unilaterally takes down the third-party trackers? All the targeted ad impressions that would have given that site a (small) payment end up going to bots.

So what can a site do? Understand that the real fix has to happen on the browser end, and nudge the users to either make their browsers less data-leaky, or switch to browsers that are leakage-resistant out of the box.

Start A/B testing some notifications to remind users to turn on tracking protection.

  • Can you get users who are already choosing “Do Not Track” to turn on real protection if you inform them that sites ignore their DNT choice?

  • If a user is running an ad blocker with a paid whitelisting scheme, can you inform them about it to get them to switch to a better tool, or at least add a second layer of protection that limits the damage that paid whitelisting can do?

  • When users visit privacy pages or opt out of a marketing program, are they also willing to check their browser privacy settings?

Every site’s audience is different. It’s hard to know in advance how users will respond to different calls to action to turn up their privacy and create a win-win for legit sites and legit brands. We do know that users are concerned and confused about web advertising, and the good news is that the JavaScript needed to collect data and administer nudges is as easy to add as yet another tracker.
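A minimal sketch of the A/B testing logic, written in Python rather than the JavaScript a site would actually deploy. The variant names, the user ID, and the event format are hypothetical, and how a site detects that a user turned on tracking protection is left out.

```python
import hashlib

# Hypothetical notification variants, one per question in the list above.
VARIANTS = ["control", "dnt_reminder", "whitelist_warning", "privacy_page_nudge"]


def assign_variant(user_id: str, variants=VARIANTS) -> str:
    """Deterministically bucket a user so they always see the same nudge."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rates(events):
    """events: iterable of (variant, turned_on_protection) pairs.
    Returns the fraction of users per variant who turned protection on."""
    shown, converted = {}, {}
    for variant, turned_on in events:
        shown[variant] = shown.get(variant, 0) + 1
        converted[variant] = converted.get(variant, 0) + (1 if turned_on else 0)
    return {v: converted[v] / shown[v] for v in shown}


variant_for_user = assign_variant("first-party-session-1234")
rates = conversion_rates([
    ("control", False), ("control", True),
    ("dnt_reminder", True), ("dnt_reminder", True),
])
```

Hashing a first-party ID keeps the assignment stable across visits without storing anything extra, and comparing rates per variant shows which call to action works for that site’s audience.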

More on what sites can do, that might be more effective than just removing trackers: What The Verge can do to help save web advertising

13 Jan 2018

Easy question with too many wrong answers

Content warning: Godwin’s Law.

Here’s a marketing question that should be easy.

How much of my brand’s ad budget goes to Nazis?

Here’s the right answer.

Zero.

And here’s a guy who still seems to be having some trouble answering it: Dear Google (GOOG): Please stop using my advertising dollars to monetize hate speech.

If you’re responsible for a brand and somewhere in the mysterious tubes of adtech your money is finding its way to Nazis, what is the right course of action?

One wrong answer is to write a “please help me” letter to a company that will just ignore it. That’s just admitting to knowingly sending money to Nazis, which is clearly wrong.

Here’s another wrong idea, from the upcoming IAB Annual Leadership Meeting session on “brand safety” (which is the nice, sanitary professional-sounding term for “trying not to sponsor Nazis, but not too hard.”)

Threats to brand safety arise internally and externally, in your control and out of your control—and the stakes have never been higher. Learn how to minimize brand safety risks and maximize odds of survival when your brand takes a hit (spoiler alert: overreacting is as bad as underreacting). Best Buy and Starcom share best practices based on real-world encounters with brand safety issues.

Really, people? Overreacting is as bad as underreacting? The IAB wants you to come to a deluxe conference about how it’s fine to send a few bucks to Nazis here and there as long as it keeps their whole adtech/adfraud gravy train running on time.

I disagree. If Best Buy is fine with (indirectly of course) paying the occasional Nazi so that the IAB companies can keep sending them valuable eyeballs from the cheapest possible sites, then I can shop elsewhere.

Any nationalist extremist movement has its obvious supporters, who wear the outfits and get the tattoos and go march in the streets and all that stuff, and also the quiet supporters, who come up with the money and make nice with the powers that be. The supporters who can keep it deniable.

Can I, as a potential customer from the outside, tell the difference between quiet Nazi supporters and people who are just bad at online advertising and end up supporting Nazis by mistake? Of course not. Do I care? Of course not. If you’re not willing to put the basic “don’t pay Nazis to do Nazi stuff” rule ahead of a few ad clicks, I don’t want your brand anyway. And I’ll make sure to install and use the tracking protection tools that help keep my good data away from bad sites.

31 Dec 2017

some more random links

This one is timely, considering that an investment in “innovation” comes with a built-in short position in Bay Area real estate, and the short squeeze is on: Collaboration in 2018: Trends We’re Watching by Rowan Trollope

In 2018, we’ll see the rapid decline of “place-ism,” the discrimination against people who aren’t in a central office. Technology is making it easier not just to communicate with distant colleagues about work, but to have the personal interactions with them that are the foundation of trust, teamwork, and friendship.

Really, “place-ism” only works if you can afford to overpay the workers who are themselves overpaying for housing. And management can only afford to overpay the workers by giving in to the temptations of rent-seeking and deception. So the landlord makes the nerd pay too much, the manager has to pay the nerd too much, and you end up with, like the man said, [“debts that no honest man can pay”](https://en.wikipedia.org/wiki/Atlantic_City_(song))?

File under “good examples to illustrate Betteridge’s law of headlines”: Now That The FCC Is Doing Away With Title II For Broadband, Will Verizon Give Back The Taxpayer Subsidies It Got Under Title II?

Open source business news: Docker, Inc is Dead. Easy to see this as a run-of-the-mill open source business failure story. But at another level, it’s the story of how the existing open source incumbents used open practices to avoid having to bid against each other for an overfunded startup.

If “data is the new oil” where is the resource curse for data? Google Maps’s Moat, by Justin O’Beirne (related topic: once Google has the 3d models of buildings, they can build cool projects: Project Sunroof)

Have police departments even heard of Caller ID Spoofing or Swatting? Kansas Man Killed In ‘SWATting’ Attack

Next time I hear someone from a social site talking about how much they’re doing about extremists and misinformation and such, I have to remember to ask: have you adjusted your revenue targets for political advertising down in order to reflect the bad shit you’re not doing any more? How Facebook’s Political Unit Enables the Dark Art of Digital Propaganda

Or are you just encouraging the “dark social” users to hide it better?

ICYMI, great performance optimization: Firefox 57 delays requests to tracking domains

Boring: you’re operating a 4500-pound death machine. Exciting: three Slack notifications and a new AR game! Yes, Smartphone Use Is Probably Behind the Spike in Driving Deaths. So Why Isn’t More Being Done to Curb It?

I love “nopoly controls entire industry so there is no point in it any more” stories: The Digital Advertising Duopoly. Good news on advertising. The Millennials are burned out on advertising (most of what they’re exposed to now is just another variant of “creepy annoying shit on the Internet”), but the generation after the Millennials is going to have hella mega opportunities building the next Creative Revolution.

Another must-read for the diversity and inclusion department. 2017 Was the Year I Learned About My White Privilege by Max Boot.

28 Dec 2017

Predictions for 2018

Bitcoin to the moooon: The futures market is starting up, so here comes a bunch more day trader action. More important, think about all the bucket shops (I even saw an “invest in Bitcoin without owning Bitcoin” ad on public transit in London), legit financial firms, Libertarian true believers, and coins lost forever because of human error. Central bankers had better keep an eye on Bitcoin, though. Last recession we saw that printing money doesn’t work as well as it used to, because it ends up in the hands of rich people who, instead of priming economic pumps with it, just drive up the prices of assets. I would predict “Entire Round of Quantitative Easing Gets Invested in Bitcoin Without Creating a Single New Job” but I’m saving that one for 2019. Central banks will need to innovate. Federal Reserve car crushers? Relieve medical debt by letting the UK operate NHS clinics at their consulates in the USA, and we trade them US green cards for visas that allow US citizens to get treated there? And—this is a brilliant quality of Bitcoin that I recognized too late—there is no bad news that could credibly hurt the value of a purely speculative asset.

The lesson for regular people here is not so much what to do with Bitcoin, but remember to keep putting some well-considered time into actions that you predict have unlikely but large and favorable outcomes. Must remember to do more of this.

High-profile Bitcoin kidnapping in the USA ends in tragedy: Kidnappers underestimate the amount of Bitcoin actually available to change hands, ask for more than the victim’s family (or fans? a crowdsourced kidnapping of a celebrity is now a possibility) can raise in time. Huge news but not big enough to slow down something that the finance scene has already committed to.

Tech industry reputation problems hit open source. California Internet douchebags talk like a positive social movement but act like East Coast vampire squid—and people are finally not so much letting them define the terms of the conversation. The real Internet economy is moving to a three-class system: plutocrats, well-paid brogrammers with Aeron chairs, free snacks and good health insurance, and everyone else in the algorithmically-managed precariat. So far, people are more concerned about the big social and surveillance marketing companies, but open source has some of the same issues. Just as it was widely considered silly for people to call Facebook users “the Facebook community” in 2017, some of the “community” talk about open source will be questioned in 2018. Who’s working for who, and who’s vulnerable to the risks of doing work that someone else extracts the value of? College athletes are ahead of the open source scene on this one.

Adfraud becomes a significant problem for end users: Powerful botnets in data centers drove the pivot to video. Now that video adfraud is well-known, more of the fraud hackers will move to attribution fraud. This ties in to adtech consolidation, too. Google is better at beating simple to midrange fraud than the rest of the Lumascape, so the steady progress towards a two-logo Lumascape means fewer opportunities for bots in data centers.

Attribution fraud is nastier than servers-talking-to-servers fraud, since it usually depends on having fraudulent and legit client software on the same system—legit to be used for a human purchase, fraudulent to “serve the ad” that takes credit for it. Unlike botnets that can run in data centers, attribution fraud comes home with you. Yeech. Browsers and privacy tools will need to level up from blocking relatively simple Lumascape trackers to blocking cleverer, more aggressive attribution fraud scripts.

Wannabe fascists keep control of the US Congress, because your Marketing budget: “Dark” social campaigns (both ads and fake “organic” activity) are still a thing. In the USA, voter suppression and gerrymandering have been cleverly enough done that social manipulation can still make a difference, and it will.

In the long run, dark social will get filtered out by habits, technology, norms, and regulation—like junk fax and email spam before it—but we don’t have a “long run” between now and November 2018. The only people who could make an impact on dark social now are the legit advertisers who don’t want their brands associated with this stuff. And right now the expectations to advertise on the major social sites are stronger than anybody’s ability to get an edgy, controversial “let’s not SPONSOR ACTUAL F—–G NAZIS” plan through the 2018 marketing budget process.

Yes, the idea of not spending marketing money on supporting nationalist extremist forums is new and different now. What a year.

24 Dec 2017

Salary puzzle

Short puzzle relevant to some diversity and inclusion threads that encourage people to share salary info. (I should tag this as “citation needed” because I don’t remember where I heard it.)

Alice, Bob, Carlos, and Dave all want to know the average salary of the four, but none wants to reveal their individual salary. How can the four of them work together to determine the average? Answer below.

 

 

 

 

 

 

 

 

 

 

 

 

 

Answer

Alice generates a random number, adds it to her salary, and gives the sum to Bob.

Bob adds his salary and gives the sum to Carlos.

Carlos adds his salary and gives the sum to Dave.

Dave adds his salary and gives the sum to Alice.

Alice subtracts her original random number, divides by the number of participants, and announces the average. No participant had to share their real salary, but everyone now knows if they are paid above or below the average for the group.
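The steps above can be simulated in a few lines of Python. This is a minimal sketch, not a secure protocol implementation: the function name, the `mask_limit` parameter, and the choice of Python's `secrets` module for the random number are my own assumptions, and the first entry in the list plays the part of Alice.

```python
import secrets


def masked_average(salaries, mask_limit=10**9):
    """Simulate the ring: each person only ever sees a running masked sum.

    Returns the average plus the list of values handed from person to
    person, so you can check that none of them is a bare salary.
    """
    mask = secrets.randbelow(mask_limit)   # Alice's private random number
    running = mask + salaries[0]           # Alice adds her salary to the mask
    passed_along = [running]               # what Alice hands to Bob
    for salary in salaries[1:]:            # Bob, Carlos, Dave in turn
        running += salary
        passed_along.append(running)       # what each one hands to the next
    # Back at Alice: remove the mask, divide by the number of participants.
    average = (running - mask) / len(salaries)
    return average, passed_along
```

Calling `masked_average([100, 200, 300, 400])` gives an average of 250.0 whatever mask is drawn. One limit of this simple version: the person before you and the person after you in the ring could compare notes and subtract to recover your salary, so it assumes neighbors don't collude.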

23 Dec 2017

What we have, what we need

Stuff the Internet needs: home fiber connections, symmetrical, flat rate, on neutral terms.

Stuff the Internet is going nuts over: cryptocurrencies.

Big problem with building fiber to the home: capital.

Big problem with cryptocurrencies: stability.

Two problems, one solution? Hard to make any kind of currency useful without something stable, with evidence-based value, to tie its value to. Fiat currencies are tied to something of value? Yes, people have to pay taxes in them. Hard to raise capital for “dumb pipe” Internet service because it’s just worth about the same thing, month after month. So what if we could combine the hotness and capital-attractiveness of cryptocurrencies with the stability and actual usefulness of fiber?

18 Dec 2017

quick question on tracking protection

One quick question for anyone who still isn’t convinced that tracking protection needs to be a high priority for web browsers in 2018. Web tracking isn’t just about items from your online shopping cart following you to other sites. Users who are vulnerable to abusive practices for health or other reasons have tracking protection needs too.

Screenshot from the American Cancer Society site, showing 24 web trackers

Who has access to the data from each of the 24 third-party trackers that appear on the American Cancer Society’s Find Cancer Treatment and Support page, and for what purposes can they use the data?

17 Dec 2017

Forbidden words

You know how the US government’s Centers for Disease Control and Prevention is now forbidden from using certain words?

vulnerable
entitlement
diversity
transgender
fetus
evidence-based
science-based

(source: Washington Post)

Well, in order to help slow down the spread of political speech enforcement that is apparently stopping all of us cool innovator type people from saying the Things We Can’t Say, here’s a Git hook to make sure that every time you blog, you include at least one of the forbidden words.

If you blog without including one of the forbidden words, you’re obviously internalizing censorship and need more freedom, which you can maybe get by getting out of California for a while. After all, a lot of people here seem to think that “innovation” is building more creepy surveillance as long as you call it “growth hacking” or writing apps to get members of the precariat to do the stuff that your Mom used to do for you.

You only have to include one forbidden word every time you commit a blog entry, not in every file. You only need forbidden words in blog entries, not in scripts or templates. You can always get around the forbidden word check with the --no-verify command-line option.
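For anyone who wants to see the shape of such a check, here is a minimal Python sketch of the rules just described. It is not the hook linked above. The `_posts/` directory and `.md` extension are assumptions about where blog entries live, and the function names are made up for illustration.

```python
import re
import subprocess

FORBIDDEN = ["vulnerable", "entitlement", "diversity", "transgender",
             "fetus", "evidence-based", "science-based"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in FORBIDDEN) + r")\b",
    re.IGNORECASE,
)


def is_blog_entry(path):
    """Assumed layout: blog entries are Markdown files under _posts/."""
    return path.startswith("_posts/") and path.endswith(".md")


def commit_ok(files):
    """files maps path -> staged text. One forbidden word per commit is enough."""
    entries = [text for path, text in files.items() if is_blog_entry(path)]
    if not entries:          # only scripts or templates staged: nothing to check
        return True
    return any(PATTERN.search(text) for text in entries)


def staged_files():
    """Read added or modified files from the Git index (not the working tree)."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return {
        name: subprocess.run(["git", "show", f":{name}"],
                             capture_output=True, text=True, check=True).stdout
        for name in names
    }
```

To use it as a pre-commit hook, a script would call `commit_ok(staged_files())` and exit with a non-zero status when the result is false, which is what makes Git stop the commit.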

Suggestions welcome.

16 Dec 2017

Mindless link propagation

Not much time to blog because work travel, but here is some of the stuff I would have been linking to if I were writing anything. I plan to get started again over the holiday break.

What can possibly go wrong?

Simler and Hanson on Our Hidden Motivations in Everyday Life

Universities spend millions on accessing results of publicly funded research

The “hater” is calling from inside the cap table

How our housing choices make adult friendships more difficult

Former Gawker employees are crowdfunding to relaunch a Gawker.com that’s owned by a nonprofit and funded by readers

‘Data arbitrage is as big a problem as media arbitrage’: Confessions of a media exec

The First Women in Tech Didn’t Leave—Men Pushed Them Out

“Phantom debt” schemers target millions of Americans. After thousands of phone calls, one target got his revenge.

The digital hippies want to integrate life and work – but not in a good way

The Rise of Rust in Dev/Ops

Breaking Cliques at Events

I Made My Shed the Top Rated Restaurant On TripAdvisor

Not Every Kid-Bond Matures

09 Dec 2017

Are bug futures just high-tech piecework?

Are bug futures just high-tech piecework, or worse, some kind of “gig economy” racket?

Just to catch up, bug futures, an experimental kind of agreement covered in A Trading Market for Prices in Peer Production, are futures contracts based on the status of bugs in a bug tracker. (update: expanded paper from the same authors is Market for Trading Software Issues, in Journal of Cybersecurity)

For developers: Find an open issue that matches your skills and interests. Buy a futures contract connected to that issue that will pay you when the issue is fixed. Work on the issue, in the open—then decide if you want to hold your contract until maturity, or sell it at a profit. Report an issue and pay to reward others to fix it

For users: Create a new issue on the project bug tracker, or select an existing one. Buy a futures contract on that issue that will cost you a known amount when the issue is fixed, or pay you to compensate you if the issue goes unfixed. Reduce your exposure to software risks by directly signaling the project participants about what issues are important to you. Invest in futures on an open source market

Bug futures also open up the possibility of incentivizing other kinds of work, such as clarifying and translating bug reports, triaging bugs, writing failing tests, or doing code reviews—and especially arbitrage of bugs from project to project.

Bug futures are different from open source bounty systems, which have been repeatedly tried but have so far failed to take off. The big problem with conventional open source bounty systems is that, as far as I can tell, they fail to incentivize cooperative work, and in a lot of situations might incentivize un-cooperative behavior. If I find a bug in a web application, and offer a bounty to fix it, the fix might require JavaScript and CSS work. A developer who fixes the JavaScript and gets stuck on the CSS might choose not to share partial work in order to contend for the entire bounty. Likewise, the developer who fixes the CSS part of the bug might get stuck on the JavaScript. Because of how bounties are structured, if the two wanted to split the bounty they would need to find, trust, and coordinate with each other. Meanwhile, if the bug was the subject of a futures contract, the JavaScript developer could write up a good commit message explaining how their partial work made progress toward a fix, and offer to sell their side of the contract. A CSS developer could take on the rest of the work by buying out that position.
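Here is a toy model of that handoff, as a sketch only. It is not the market design from the paper: the two-sided stake structure, the class and method names, and the participant labels are all simplifying assumptions made up to show how a transferable position lets partial work get paid.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Toy futures contract on one issue (simplified, hypothetical model)."""
    issue: str
    fixed_holder: str      # collects the pot if the issue is fixed at maturity
    unfixed_holder: str    # collects the pot if it is not
    fixed_stake: float     # amount put up by the "fixed" side
    unfixed_stake: float   # amount put up by the "unfixed" side

    def transfer_fixed_side(self, buyer: str, price: float) -> dict:
        """Sell the 'fixed' position; returns the cash movement for the trade."""
        seller = self.fixed_holder
        self.fixed_holder = buyer
        return {seller: price, buyer: -price}

    def settle(self, issue_fixed: bool) -> dict:
        """At maturity the whole pot goes to whichever side was right."""
        pot = self.fixed_stake + self.unfixed_stake
        winner = self.fixed_holder if issue_fixed else self.unfixed_holder
        return {winner: pot}
```

In this model, a JavaScript developer who holds the "fixed" side can sell it to the CSS developer at a price reflecting the progress already made, and the CSS developer collects at maturity if the fix lands. Neither has to negotiate a split of a bounty with the other.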

Futures trading and risk shifts

But will bug futures tend to shift the risks of software development away from the “owners” of software (the owners don’t have to be copyright holders, they could be those who benefit from network effects) and toward the workers who develop, maintain, and support it?

I don’t know, but I think that the difference between bug futures and piecework is where you put the brains of the operation. In piecework and the gig economy, the matching of workers to tasks is done by management, either manually or in software. Workers can set the rate at which they work in conventional piecework, or accept and reject tasks offered to them in the gig economy, but only management can have a view of all available tasks.

Bug futures operate within a commons-based peer production environment, though. In an ideal peer production scene, all participants can see all available tasks, and select the most rewarding tasks. Somewhere in the economics literature there is probably a model of task selection in open source development, and if I knew where to find it I could put an impressive LaTeX equation right around here. Of course, open source still has all kinds of barriers that make matching of workers to tasks less than ideal, but it’s a good goal to keep in mind.

If you do bug futures right, they interfere as little as possible with the peer production advantage—that it enables workers to match themselves to tasks. And the futures market adds the ability for people who are knowledgeable about the likelihood of completion of a task, usually those who can do the task, to profit from that knowledge.

Rather than paying a worker directly for performing a task, bug futures are about trading on the outcomes of tasks. When participating, you’re not trading labor for money, you’re trading on information you hold about the likelihood of successful completion of a task. As in conventional financial markets, information must be present on the edges, with the individual participants, in order for them to participate. If a feature is worth $1000 to me, and someone knows how to fix it in five minutes, bug futures could facilitate a trade that’s profitable to both ends. If the market design is done right, then most of that value gets captured by the endpoints—the user and developer who know when to make the right trade.

The transaction costs of trading in information tend to be lower than the transaction costs of trading in labor, for a variety of reasons which you will probably believe in to different extents depending on your politics. What if we could replace some direct trading in labor with trading in the outcomes of that labor by trading information? Lower transaction costs, more gains from trade, more value created.

07 Dec 2017

three kinds of open source metrics

Some random notes about open source metrics, related to work on CHAOSS, where Mozilla is a member and I’m on the Governing Board.

As far as I can tell, there are three kinds of open source metrics.

Impact metrics cover how much value the software creates. Possible good ones include count of projects dependent on this one, mentions of this project in job postings, books, papers, and conference talks, and, of course, sales of products that bundle this project.

Contributor reward metrics cover how the software is a positive experience for the people who contribute to it. Job postings are a contributor reward metric as well as an impact metric. Contributor retention metrics and positive results on contributor experience surveys are some other examples.

But impact metrics and contributor reward metrics tend to be harder to collect, or slower-moving, than other kinds of metrics, which I’ll lump together as activity metrics. Activity metrics include most of the things you see on open source project dashboards, such as pull request counts, time to respond to bug reports, and many others. Other activity metrics can be the output of natural language processing on project discussions. An example of that is FOSS Heartbeat, which does sentiment analysis, but you could also do other kinds of metrics based on text.

IMHO, the most interesting questions in the open source metrics area are all about: how do you predict impact metrics and contributor reward metrics from activity metrics? Activity metrics are easy to automate, and make a nice-looking dashboard, but there are many activity metrics to choose from—so which ones should you look at?

Which activity metrics are correlated to any impact metrics?

Which activity metrics are correlated to any contributor reward metrics?

Those questions are key to deciding which of the activity metrics to pay attention to. I’m optimistic that we’ll be seeing some interesting correlations soon.
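As a sketch of what that screening could look like, the Python below ranks activity metrics by the strength of their Pearson correlation with one impact or contributor reward metric across a set of projects. The function names are my own, Pearson is just one possible choice of measure, and a strong correlation here would be a reason to look closer, not evidence that the activity metric drives the outcome.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def rank_activity_metrics(activity, outcome):
    """Sort activity metrics by |correlation| with one outcome metric.

    activity: dict of metric name -> one value per project
    outcome:  one impact or contributor reward value per project, same order
    """
    scores = {name: pearson(values, outcome) for name, values in activity.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

The inputs would be one row per project, for example pull request counts and time to respond to bug reports on the activity side, and the count of dependent projects on the impact side.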

02 Dec 2017

Purple box claims another victim

Linux Journal Ceases Publication. If you can stand it, let’s have a look at the final damage.

screenshot of Linux Journal with a list of third-party trackers provided by the Ghostery extension

40 trackers. Not bad, but not especially good either. That purple box of data leakage (third-party trackers that forced Linux Journal into an advertising race to the bottom against low-value and fraud sites) is not so deep as a well, nor so wide as a church door, but it’s there. A magazine that was a going concern in print tried to make the move to the web and didn’t survive.

Linux Journal is where I was working when I first started wondering why print ads tend to hold their value while web ads keep losing value. Unfortunately it’s not enough for sites to just stop running trackers and make the purple box go away. But there are a few practical steps that Internet freedom lovers can take to stop the purple box from taking out your other favorite sites.

18 Nov 2017

Asking sites to do something about surveillance marketing

This might get the privacy activists mad at me, but as far as I can tell it’s still counterproductive to ask a web site you visit to remove its third-party trackers.

Of course, third-party trackers are probably helping to support a political cause that most sites don’t agree with, and, as Zeynep Tufekci says, “We’re building a dystopia just to make people click on ads”. This stuff needs to get fixed. So this is about productive next steps.

Right now, advertising on the site you’re writing to probably isn’t saleable without the creepy trackers. And if the site you’re writing to is a brand site, the marketing department can’t justify spending money to support content sites without showing some kind of tracking data. (more on this problem: User tracking as Chesterton’s Fence) So what can privacy people productively ask sites for? Some good ones are:

  • Fix any “turn off your ad blocker” scripts to detect ad blockers only, and not falsely alert on privacy tools.

  • Remove links to the confusing and broken “YourAdChoices” site. Adtech opt-outs don’t cover all trackers, and are much less effective than real privacy tools. (I have never had all the opt-outs work on that site, even from a fresh, pristine browser. Somehow I get the sense that the adtech firms don’t exactly put their best people on it.)

  • Link to the privacy pages for the third parties the site uses. If the advertising on the site is set up so that this is hard to do, and users might see a tracker from an unknown domain, say so.

  • Fix up the privacy page to add links to appropriate privacy tools based on the user’s browser. Better to have users on privacy tools than get enrolled in a paid whitelisting scheme.

  • If you maintain a privacy tool, offer to do a campaign with the site. Privacy tool users are high-quality human traffic. Free or discounted privacy tools might work as a subscription promotion. Where’s the win-win?

Asking a site to walk away from money with no credible alternative is probably not going to work. Asking a site to consider next steps to get out of the current web advertising mess? That might.

More: What The Verge can do to help save web advertising

13 Nov 2017

Time-saving tip for Firefox 57

(updated 21 Nov 2017: made the words “even faster” a link to an article with graphs.)

Last time I recommended the Tracking Protection feature in Firefox 57, coming tomorrow. The fast browser is even faster when you block creepy trackers, which are basically untested combinations of third-party JavaScript.

But what about sites that mistakenly detect Tracking Protection as “an ad blocker” and give you grief about it? Do you have to turn Tracking Protection off?

So far I have found that the answer is usually no. I can usually use NJS to turn off JavaScript for that site instead. (After all, if a web developer can’t tell an ad blocker from a tracking protection tool, I don’t trust their JavaScript anyway.)

NJS will also deal with a lot of “growth hacking” tricks such as newsletter signup forms that appear in front of the main article. And it defaults to on, so that sites with JavaScript will work normally until I decide that they’re better off without it.

13 Nov 2017

I’m taking a Bitcoin risk even though I don’t hold Bitcoin. Please regulate me.

In the country where I live, kidnapping for ransom is not a very common crime.

That’s because picking up the ransom is too risky.

It’s easy to kidnap someone, and easy to let the person go when the ransom is paid, but picking up the ransom exposes you. Wannabe kidnappers who are motivated by money tend to choose other crimes.

As the [family relationship redacted] of a [family member information redacted], I’m happy that kidnapping is difficult here. High transaction costs for some kinds of transaction are a good thing.

Now, here comes Bitcoin.

As we’re already seeing with ransomware, harder-to-trace ransom drops are now a thing.

So, even though I don’t actually hold Bitcoin, someone could grab my family member (low risk), demand that I exchange some of my conventional assets for Bitcoin (low risk) and send the Bitcoin as ransom (low risk). The balance between risk and reward for the crime of kidnapping for ransom has changed.

IMHO this is a bigger problem than any of the reasons that Charles Stross wants Bitcoin to die in a fire.

So what to do about it?

Move the risks where the profits are.

Make the Bitcoin business eat the costs of payments made under duress.

New rule: If I ever trade any assets for Bitcoin in order to comply with a threat, and then transfer the Bitcoin under duress (kidnapping, ransomware, whatever), then I can go back to whoever I gave the assets to with a copy of the police report on the incident and get my original assets (and any fees) back.

Yes, that makes it harder for regular people to trade assets for Bitcoin. Exchanges would have to hold the money for a while, check that I’m not under duress, and probably do all kinds of other pain-in-the-ass, possibly costly, work. But I’d rather have that than the alternative.

11 Nov 2017

my Firefox 57 add-ons

Firefox 57 is coming on Tuesday, and as you may have heard, add-ons must use the WebExtensions API. I have been running Firefox Nightly for a while, so add-on switching came for me early. Here is what I have come up with.

The basic set

Privacy Badger is not on here just because I’m using Firefox Tracking Protection. I like both.

Blogging, development and testing

  • blind-reviews. This is an experiment to help break your own habits of bias when reviewing code contributions. It hides the contributor name and email when you first see the code, and you can reveal it later. Right now it just does Bugzilla, but watch this space for an upcoming GitHub version. (more info)

  • Copy as Markdown. Not quite as full-featured as the old “Copy as HTML Link” but still a time-saver for blogging. Copy both the page title and URL, formatted as Markdown, for pasting into a blog.

  • Firefox Pioneer. Participate in Firefox user research. Studies have extremely strict and detailed privacy policies.

  • Test Pilot. Try new Firefox features. Tracking Protection was on Test Pilot for a while. Right now there is a new speech recognition one, an in-browser notepad, and more.

Advanced (for now) nerdery

  • Cookie AutoDelete. Similar to the old “Self-Destructing Cookies”. Cleans up cookies after leaving a site. Useful but requires me to whitelist the sites where I want to stay logged in. More time-consuming than other privacy tools.

  • PrivacyPass. This is new. Privacy Pass interacts with supporting websites to introduce an anonymous user-authentication mechanism. In particular, Privacy Pass is suitable for cases where a user is required to complete some proof-of-work (e.g. solving an internet challenge) to authenticate to a service. Right now I don’t use any sites that have it, but it could be a great way to distribute “tickets” for reading articles or leaving comments.

Note on ad blocking

If you run an ad blocker, the pre-57 add-ons check is a good time to make sure that you’re not compromising your privacy by participating in a paid whitelisting scheme. As long as you have to go through your add-ons anyway, it’s a great time to ditch AdBlock Plus or Adblock. They’re taking advantage of users to shake down web sites.

What to use instead? For most people, either the built-in Firefox Tracking Protection or EFF’s Privacy Badger will provide good protection. I would try one or both of those before a conventional ad blocker. If sites have a broken ad blocker detector that falsely identifies a tracking protection tool as an ad blocker, you can usually get around it by turning off JavaScript for that site with NJS.

If you still want to get rid of more ads and join the blocker vs. anti-blocker game (I don’t), there’s always uBlock Origin, which does not do paid whitelisting. (The project site has more info.) But try either the built-in tracking protection or Privacy Badger first.

03 Nov 2017

World’s last web advertising optimist tells all!

It’s getting hard to explain still taking web advertising seriously in 2017, so I had better write something down. To start with, what is web advertising exactly?

Doesn’t sound good so far. Maybe I’m a fool to be the last advertising optimist on the web. (See, for example: me, running my mouth about how great advertising is, to an audience of web publishers looking to write it off and move on.)

From the point of view of users, web advertising has failed to hold up its end of the signal for attention bargain, and substituted nasty attempts at manipulation. No wonder people block it.

From the point of view of clients, web advertising has failed to meet the basic honesty standards that any third-rate print publication can. And every web advertising company is calling fraud an industry-wide problem, which is what business people say when they really don’t care about fixing something.

From the point of view of publishers, web advertising has failed to show the proverbial money. It’s stuck at a fraction of the value per user minute that print can pull in, which means that as print goes away, so does the ad money.

Web advertising has failed the audience, the advertisers, and the people who make ad-supported news and cultural works. Maybe I should go be a fan of something else, like securitizing bug trackers or something. Web advertising just is that annoying, creepy thing that browsers are competing to block in different, creative, ways. “[T]he online ad sector transitioned from a creative-led industry to a data and algorithms-led industry,” wrote venture capitalist Adam Fisher, who is understandably proud of not investing in it.

Some new companies, such as Scroll, are all about making it easier for readers to buy out of seeing advertising. Advertising is to web sites as annoying “UNREGISTERED SHAREWARE” banners and dialogs are to computer software.

On Twitter, what does the “verified” blue checkmark get you? A ticket out of Twitter’s world-classedly crappy advertising.

At least search advertising is working. Bob Hoffman calls it a “much better yellow pages.” But any kind of brand-building, signal-carrying advertising, where most of the money is? Not there. Ever notice how much of the evidence for “data-driven” advertising is anecdotal?

Is anyone speaking up for web advertising? Not really. Where advertising still has a policy voice, it’s a bunch of cut-and-paste anti-privacy advocacy that sounds like what you might get from eighth grade Libertarians, or from people who are so bad at math they assume that it’s humanly possible to read and understand Terms of Service from 70 third-party trackers on one web page. The Interactive Advertising Bureau has become the voice of schemes that are a few pages of fine print away from malware and spam. By expanding to include members whose interests oppose those of legit publishers and advertisers, and defending every creepy user privacy violation scheme that the worst members come up with, an organization that could have been a voice for pro-advertising policy positions has made itself meaningless. Right now the IAB is about as relevant to web advertising policy as the Tetraethyl Lead Industry Association is relevant to transportation policy.

Bad news all the way around, right? But some of us have been somewhere like this before.

Remember the operating systems market in the late 1990s?

In 1998, Unix was on the way out.

All the right-thinking people were going Windows NT.

Yes, even Tim O’Reilly, who built version 1.0 of his company on Unix, had apparently written it off. The spring 1998 O’Reilly catalog had all Windows books on the cover, and the Unix stuff was in back. O’Reilly and Associates was promoting the company’s first and only shrink-wrap software, a web server for Windows NT.

And why not? Bickering Unix vendors were doing short-sighted stunts such as removing the compiler from the basic version, and charging hard-to-justify prices for workstations and servers that users could beat with a properly-configured PC. Who needed it?

We know what happened shortly after that. The Unix scene (did anyone ever make a “Lumascape”-like chart of the Unix vendors?) faded away and, with enough drama to make for good IT news coverage but not enough to interfere with successful efforts to fix the Year 2000 Problem, the Linux scene replaced it.

The good news is that people employed in the Unix scene were able to move, in most cases happily, to the Linux scene. (Which is big enough that it has become the OS for the “IoT”, “SaaS” and “Cloud” businesses, and a majority of “mobile” by units, but not of course by profits.) So maybe my experience living through the end of Unix is why I’m still a web advertising optimist. The economic niche for advertising hasn’t gone away. Just as software had to get some important licensing and API decisions right in order to make the Linux boom happen, web advertising is so close to getting it right, too. Now that we know the basics…

  1. People have norms about data sharing. Browsers must reflect those norms or get replaced.

  2. People enjoy ad-supported news, cultural works, and services, and will tolerate ads that hold up their end of the bargain.

  3. People don’t like to micromanage their attention and privacy, and expect companies they deal with to cover the costs of coming into compliance with norms.

…the next steps are coming together pretty quickly.

01 Nov 2017

Always run a shell script from the directory it lives in

Always run a shell script in the directory in which it appears, and change back to the directory you were in when you ran it even if it fails.

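# Return to the starting directory when the script exits, even on failure.
trap popd EXIT
pushd "$PWD"
# Run from the directory that contains this script.
cd "$(dirname "$0")"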

Works for me in bash. The pushd command does a cd but saves the directory where you were on a stack, and popd pops the saved directory from the stack. The trap ... EXIT is a bash way to run something when the script exits, no matter how, and dirname "$0" is the directory name of the script.

(Taken from the deploy.sh script that rebuilds and deploys this blog, so if you can read this, it works.)

26 Oct 2017

Fun with the spawn of Git and NoSQL

Hey, kids, check out the latest progress on the Attaca version control system.

What’s this? It’s basically the spawn of Git and a NoSQL database. So why would anybody want to make that? For Science, of course. A lot of research produces huge data files, and people would like to have a resilient way to collaborate on them, using commands they already know—but have it scale horizontally across large numbers of nodes, NoSQL style.

Git has the advantage that a lot of people know it, but it doesn’t really handle huge files that well. There are add-on solutions to make it work by connecting to another system for handling large files, but then you have to set up and trust two systems. And one of my favorite properties of Git is that any authorized user of a project can check the integrity of the entire project back to the beginning.

So what Attaca does is to consistently split huge files across a cluster, using cluster nodes that can be cheap VPSs, low-end servers with spinning disks, whatever. (In the test environment, nodes are just Linux containers.)
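To make “consistently split” a little more concrete, here is a minimal Python sketch of one way it could work. This is not Attaca’s actual design. The fixed chunk size, the SHA-256 content addressing, the modulo node assignment, and the names split_file, assign_node, and reassemble are all assumptions made up for this example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, for demonstration (assumption); real chunks would be far larger


def split_file(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks, each keyed by its SHA-256 hex digest."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        chunks.append((hashlib.sha256(piece).hexdigest(), piece))
    return chunks


def assign_node(digest: str, node_count: int) -> int:
    """Map a chunk digest to a node index; the same digest always maps to the same node."""
    return int(digest, 16) % node_count


def reassemble(chunks: list[tuple[str, bytes]]) -> bytes:
    """Concatenate the chunk contents, in order, to get the original data back."""
    return b"".join(piece for _, piece in chunks)


data = b"genome data here"
chunks = split_file(data)
placement = [assign_node(digest, 3) for digest, _ in chunks]
```

Because each chunk is named by the hash of its contents, anyone holding a chunk can re-hash it to check that it has not been damaged, which fits with the Git-style integrity checking mentioned above.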

More: The architecture of Attaca, milestones, and current progress.

Next steps are to test it out with some scientific data (genomes, medical imaging, and so on), implement some more Git commands so that people can check files out and not just in, and build a (Raspberry Pi?) demo cluster.

25 Oct 2017

See you in London

Coming to Mozfest in London?

Please stop by our demo of Trading futures, fixing bugs: a live Smart Contracts installation.

What is it?

Bugmark is a market that connects people who want better software to the people who can build it.

In order to make open collaboration more effective, we are using simple market mechanisms to add incentives to do useful work.

Bugmark allows you to

  1. Put financial value directly in the hands of the people who can fix the software issues that are most important to you.

  2. Discover which issues really matter to your project’s users.

  3. Work with open source practices and not against them.
    Solve part of a problem and still get paid, instead of contending to claim credit for a bounty payment.

Find an issue, fix it, and earn money

Visit Bugmark to find an open issue that matches your skills and interests. Buy a futures contract connected to that issue that will pay you when the issue is fixed. Work on the issue, in the open, then decide if you want to hold your contract until maturity, or sell it at a profit.

Report an issue and pay to reward others to fix it

Create a new issue on the project bug tracker, or select an existing one. Buy a futures contract on that issue that will cost you a known amount when the issue is fixed, or pay you to compensate you if the issue goes unfixed. Reduce your exposure to software risks by directly signaling the project participants about what issues are important to you.

Invest in futures on an open source market

Development isn’t the only task required to make a software project a success. You can trade futures to earn a profit from other vital tasks, such as clarifying and translating bug reports, triaging bugs, writing failing tests, or doing code reviews.

25 Oct 2017

ICYMI: AdLeaks

Looking for a way to get dedicated readers to un-block some of the ads on your site? One way could be to update and integrate the AdLeaks system:

Our ads contain code that encrypts an empty message with the AdLeaks public key and sends the ciphertext back to AdLeaks. This happens on all users’ web browsers. A whistleblower’s browser substitutes the ciphertext with encrypted parts of a disclosure. The protocol ensures that an adversary who can eavesdrop on the network communication cannot distinguish between the transmissions of regular browsers and those of whistleblowers’ browsers.

More info in the paper: A Secure Submission System for Online Whistleblowing Platforms. (That link goes to the Arxiv Vanity version of the paper. Now that we can read more Science on our phones I’m expecting the rate of progress toward the Singularity to increase by quite a bit.)

Naturally sites would want to encourage whistleblowers (and others) to block the regular creepy ad trackers—but building post-creepy ads and hooking this up to them could be a way to encourage the dedicated readers to treat the high-reputation ads differently from the low-reputation ones.

22 Oct 2017

Tofu, hogs, and brand-safe news

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

The following is an interesting business model, so I’m going to tell it whether it’s true or not. I once talked with a guy from rural China about the tofu business when he was there. Apparently, considering the price of soybeans and the price you can get for the tofu, you don’t earn a profit just making and selling tofu. So why do it? Because it leaves you with a bunch of soybean waste, you feed that to pigs, and you make your real money in the hog business.

Which is sort of related to the problem that (all together now) hard news isn’t brand-safe. It’s hard to sell travel agency ads on a plane crash story, or real estate ads on a story about asbestos in the local elementary schools, or any kind of ads on a disturbing, but hard to look away from, political scene.

In the old-school newspaper business, the profitable ads can go in the lifestyle or travel sections, and subsidize the hard news operation. The hard news is the tofu and the brand-friendly sections are the hogs.

On the web, though, where you have a lot of readers coming in from social sites, they might be getting their brand-friendly content from somewhere else. Sites that are popular for their hard news are stuck with just the tofu.

This is one of the places where it’s going to be interesting to watch the shift from unpermissioned user data collection to user data sharing by permission. As people get better control of how they share data with sites (whether that’s through regulation, browsers scrambling for users, or both), how will a site’s ability to deliver trustworthy hard news give it an advantage?

The browser may have to adapt to treat trustworthy and untrustworthy sites differently, in order to come up with a good balance of keeping sites working and implementing user norms on data sharing. Will news sites that publish hard news stories that are often visited, shared, and commented on, get a user data advantage that translates into ad saleability for their more brand-safe pages? Does better user data control mean getting the hog business back?

19 Oct 2017

Open practices and tracking protection

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

Browsers are going to have to change tracking protection defaults, just because the settings that help acquire and retain users are different from the current defaults that leave users fully trackable all the time. (Tracking protection is also an opportunity for open web players to differentiate themselves from mobile tracking devices.)

Before switching defaults, there are a bunch of opportunities to do collaboration and data collection in order to make the right choices and increase user satisfaction and trust (and retention). Interestingly enough, these tend to give an advantage to any browser that can attract a diverse, opinionated, values-driven user base.

So, as a followup on applying proposed principles for content blocking, some ways that a browser can prepare to make a move on tracking protection.

  • Build APIs that WebExtensions developers can use to change privacy-related behaviors. (WebExtension API for improved tracking protection, API for managing tracking protection, Implement browser.privacy.trackingProtection API). Use developer relations with the privacy tools scene.

  • Do innovation challenges and crowdsourcing for tracking protection tools. Use the results to expand the available APIs and built-in options.

  • Develop a variety of tracking protection methods, and ship them in a turned-off state so that motivated users can find the configuration and experiment with them, and to enable user research. Borrow approaches from other browsers (such as Apple Safari) where possible, and test them.

  • For example: avoid blocklist politics, and increase surveillance marketing uncertainty, by building Privacy-Badger-like tracker detection. Enable tracking protection without the policy implications of a top-down list. This is an opportunity for a crowdsourcing challenge: design better algorithms to detect trackers, and block them or scramble state.

  • Ship alternate experimental builds of the browser, with privacy settings turned on and/or add-ons pre-installed.

  • Communicate a lot about capabilities, values, and research. Spend time discussing what the browser can do if needed, and discussing the results of research on how users prefer to share their personal info.

  • Only communicate a little about future defaults. When asked about specifics, just say, “we’ll let the user data help us make that decision.” (Do spam filters share their filtering rules with spammers? Do search engines give their algorithms to SEO consultants?)

  • Build functionality to “learn” from the user’s activity and suggest specific settings that differ from the defaults (in either direction). For example, suggest more protective settings to users who have shown an interest in privacy—especially users who have installed any add-on whose maintainers misrepresent it as a privacy tool.

  • Do research to help legit publishers and marketers learn more about adfraud and how it is enabled by the same kinds of cross-site tracking that users dislike. As marketers better understand the risk levels of different approaches to web advertising, make it a better choice to rely less on highly intrusive tracking and more on reputation-driven placements.

  • Provide documentation and tutorials to help web developers develop and test sites that will work in the presence of a variety of privacy settings. “Does it pass Privacy Badger” is a good start, but more QA tools are needed.

If you do it right, you can force up the risks of future surveillance marketing just by increasing the uncertainty of future user trackability, and drive more marketing investment away from creepy projects and toward pro-web, reputation-driven projects.

09 Oct 2017

Evancoin and the stake problem

One of the problems with a bug futures market is: where do you get the initial investment, or “stake”, for a developer who plans to take on a high-value task?

In order to buy the FIXED side of a contract and make a profit when it matures, the developer needs to invest some cryptocurrency. In a bug futures market, it takes money to make money.

One possible solution is to use personal tokens, such as the new Evancoin. Evancoin is backed by hours of work performed by an individual (yes, his name is Evan).

If I believe that n hours of work from Evan are likely to increase the probability of a Bugmark-traded bug getting fixed, and my expected gain is greater than n * (current price of Evancoin), then I can

  1. buy the FIXED side of the Bugmark contract

  2. redeem n Evancoin for work from Evan on the bug

  3. sell my Bugmark position at a profit, or wait for it to mature.
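The decision rule above can be written out as a small calculation. This is only a sketch: the function names, the assumption that one Evancoin buys one hour of work, and all of the numbers are made up for illustration and are not part of Bugmark or Evancoin.

```python
def expected_gain(units: float, price_now: float, price_after_work: float) -> float:
    """Gain from buying FIXED at price_now and selling once the work has moved the price."""
    return units * (price_after_work - price_now)


def worth_redeeming(hours: float, evancoin_price: float, gain: float) -> bool:
    """True when the expected gain is greater than n * (current price of Evancoin)."""
    return gain > hours * evancoin_price


# Hypothetical numbers: 1000 units of FIXED, expected to move from 0.25 to 0.50.
gain = expected_gain(units=1000, price_now=0.25, price_after_work=0.50)
cheap = worth_redeeming(hours=4, evancoin_price=50.0, gain=gain)   # cost 200
pricey = worth_redeeming(hours=4, evancoin_price=75.0, gain=gain)  # cost 300
```

With these made-up numbers the expected gain is 250, so redeeming four Evancoin makes sense at a price of 50 each but not at 75.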

Evan is not required to accept cryptocurrency exchange rate risk, and does not have to provide the “stake” himself. It’s the opposite—he has already sold the Evancoin on an exchange. Of course, he has an incentive to make as much progress on the bug as possible, in order to support the future price of Evancoin.

If Evan is working on the bug I selected, he would also know that he’s doing work that is likely to move the price of the Bugmark contract. So he can use some of the proceeds from his Evancoin sale to buy additional FIXED on Bugmark, and take a profit when I do.

Evan’s skills tend to improve, and my understanding of which tasks would be a profitable use of Evan’s time will tend to increase the more Evancoin I redeem. So the value of Evancoin to me is likely to continue rising. Therefore I am probably going to do best if I accumulate Evancoin in advance of identifying good bugs for Evan to work on.

01 Oct 2017

The capital dynamics are all wrong.

Ben Werdmuller, in Why open source software isn’t as ethical as you think it is:

When you release open source software, you have this egalitarian idea that you’re making it available to people who can really use it, who can then build on it to make amazing things….While this is a fine position to take, consider who has the most resources to build on top of a project that requires development. With most licenses, you’re issuing a free pass to corporations and other wealthy organizations, while providing no resources to those needy users. OpenSSL, which every major internet company depends on, was until recently receiving just $2,000 a year in donations, with the principal author in financial difficulty.

This is a good example of one of the really interesting problems of working in an immature industry. We have a similar problem in web advertising. We’re over-rewarding the ability to collect numbers that show the effectiveness of a marketing project, while under-rewarding the ability to build brand reputation. Web ads also have an opportunity to fix incentives. We don’t have our incentives hooked up right yet.

  • Why does open source have some bugs that stay open longer than careers do?

  • Why do people have the “I’ve been coding to create lots of value for big companies for years and I’m still broke” problem?

  • How does millions of dollars of shared vigilance even make the news, when the value extracted is in the billions?

  • Why is the meritocracy of open source even more biased than other technical and collaborative fields? (Are we at the bottom of the standings?) Why are we walking away from that many potential contributors?

Quinn Norton: Software is a Long Con:

It is to the benefit of software companies and programmers to claim that software as we know it is the state of nature. They can do stupid things, things we know will result in software vulnerabilities, and they suffer no consequences because people don’t know that software could be well-written. Often this ignorance includes developers themselves. We’ve also been conditioned to believe that software rots as fast as fruit. That if we waited for something, and paid more, it would still stop working in six months and we’d have to buy something new. The cruel irony of this is that despite being pushed to run out and buy the latest piece of software and the latest hardware to run it, our infrastructure is often running on horribly configured systems with crap code that can’t or won’t ever be updated or made secure.

We have two possible futures.

  • People finally get tired of software’s boyish antics, or rather its lethal irresponsibility, and impose a regulatory regime. Rent-seekers rejoice. Software innovation as we know it ceases, and we get something like the pre-breakup Bell System: you have to be an insider to build and deploy anything that reaches real people.

  • The software scene outgrows the “disclaimer of implied warranty” level of quality, on its own.

How do we get to the second one? One approach is to use market mechanisms to help quantify software risk, then enable users with a preference for high quality and developers with a preference for high quality to interact directly, not through the filter of software companies that win by releasing early at a low quality level.

There is an opportunity here for the kinds of companies that are now doing open source license analysis. Right now they’re analyzing relatively few files in a project: the licenses and copyrights. A tool will go through your software stack, and hooray, you don’t have anything that depends on something with an inconsistent license, or on a license that would look bad to the people you want to sell your company to.

What if that same tool would give you a better quality number for your stack, based on walking your dependency tree and looking for weak points based on market activity?

14 Sep 2017

another 2x2 chart

What to do about different kinds of user data interchange:

          | Data collected without permission | Data collected with permission
Good data | Build tools and norms to reduce the amount of reliable data that is available without permission. | Develop and test new tools and norms that enable people to share data that they choose to share.
Bad data  | Report on and show errors in low-quality data that was collected without permission. | Offer users incentives and tools that help them choose to share accurate data and correct errors in voluntarily shared data.

Most people who want data about other people still prefer data that’s collected without permission, and collaboration is something that they’ll settle for. So most voluntary user data sharing efforts will need a defense side as well. Freedom-loving technologists have to help people reduce the amount of data that they allow to be taken from them without permission, in order for the people who want data to listen to people about sharing data.

13 Sep 2017

Tracking protection defaults on trusted and untrusted sites

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

Setting tracking protection defaults for a browser is hard. Some activities that the browser might detect as third-party tracking are actually third-party services such as single sign-on—so when the browser sets too high of a level of protection it can break something that the user expects to work.

Meanwhile, new research from Pagefair shows that “The very large majority (81%) of respondents said they would not consent to having their behaviour tracked by companies other than the website they are visiting.” A tracking protection policy that leans too far in the other direction will also fail to meet the user’s expectations.

So you have to balance two kinds of complaints.

  • “your dumbass browser broke a site that was working before”

  • “your dumbass browser let that stupid site do stupid shit”

Maybe, though, if the browser can figure out which sites the user trusts, you can keep the user happy by taking a moderate tracking protection approach on the trusted sites, and a more cautious approach on less trusted sites.

Apple Intelligent Tracking Prevention allows third-party tracking by domains that the user interacts with.

If the user has not interacted with example.com in the last 30 days, example.com website data and cookies are immediately purged and continue to be purged if new data is added. However, if the user interacts with example.com as the top domain, often referred to as a first-party domain, Intelligent Tracking Prevention considers it a signal that the user is interested in the website and temporarily adjusts its behavior (More…)
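The 30-day rule in that quote is simple enough to sketch in a few lines of Python. This only models the one sentence quoted above, not the rest of Intelligent Tracking Prevention, and the names should_purge and INTERACTION_WINDOW are made up for the example. What happens at exactly 30 days is my guess.

```python
from datetime import datetime, timedelta
from typing import Optional

INTERACTION_WINDOW = timedelta(days=30)  # from the quoted description


def should_purge(last_interaction: Optional[datetime], now: datetime) -> bool:
    """Purge a domain's website data if the user has not interacted with it recently."""
    if last_interaction is None:
        # The user never visited the domain as a first party.
        return True
    return now - last_interaction > INTERACTION_WINDOW


now = datetime(2017, 9, 13)
never_visited = should_purge(None, now)
visited_recently = should_purge(datetime(2017, 9, 1), now)
visited_long_ago = should_purge(datetime(2017, 8, 1), now)
```

Note that the check is per domain, not per company or per page, which matters for the next point.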

But it looks like this could give large companies an advantage—if the same domain has both a service that users will visit and third-party tracking, then the company that owns it can track users even on sites that the users don’t trust. Russell Brandom: Apple’s new anti-tracking system will make Google and Facebook even more powerful.

It might make more sense to set the trust level, and the browser’s tracking protection defaults, based on which site the user is on. Will users want a working “Tweet® this story” button on a news site they like, and a “Log in with Google” feature on a SaaS site they use, but prefer to have third-party stuff blocked on random sites that they happen to click through to?

How should the browser calculate user trust level? Sites with bookmarks would look trusted, or sites where the user submits forms (especially something that looks like an email address). More testing is needed, and setting protection policies is still a hard problem.
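Here is a rough Python sketch of what a trust calculation built on those signals might look like. It is not how any browser works today. The SiteHistory fields, the weights, and the threshold are all invented for the example, and picking real values is exactly the part that needs testing.

```python
from dataclasses import dataclass


@dataclass
class SiteHistory:
    """Hypothetical per-site signals a browser could keep locally."""
    bookmarked: bool = False
    forms_submitted: int = 0
    submitted_email: bool = False


def trust_score(history: SiteHistory) -> int:
    """Add up made-up weights for each trust signal."""
    score = 0
    if history.bookmarked:
        score += 2
    if history.forms_submitted > 0:
        score += 1
    if history.submitted_email:
        score += 2
    return score


def protection_level(history: SiteHistory, threshold: int = 2) -> str:
    """Moderate protection on trusted sites, cautious protection everywhere else."""
    return "moderate" if trust_score(history) >= threshold else "cautious"


random_site = protection_level(SiteHistory())
news_site = protection_level(SiteHistory(bookmarked=True))
saas_site = protection_level(SiteHistory(forms_submitted=3, submitted_email=True))
```

A site the user has never bookmarked or typed anything into gets the cautious default, so a random click-through does not inherit trust from anywhere else.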

Bonus link: Proposed Principles for Content Blocking.

12 Sep 2017

New WebExtension reveals targeted political ads: Interview with Jeff Larson

The investigative journalism organization ProPublica is teaming up with three German news sites to collect political ads on Facebook in advance of the German parliamentary election on Sept. 24.

Because typical Facebook ads are shown only to finely targeted subsets of users, the best way to understand them is to have a variety of users cooperate to run a client-side research tool. ProPublica developer Jeff Larson has written a WebExtension, that runs on Mozilla Firefox and Google Chrome, to do just that. I asked him how the development went.

Q: Who was involved in developing your WebExtension?

A: Just me. But I can’t take credit for the idea. I was at a conference in Germany a few months ago with my colleague Julia Angwin, and we were talking with people who worked at Spiegel about our work on the Machine Bias series. We all thought it would be a good idea to look at political ads on Facebook during the German election cycle, given what little we knew about what happened in the U.S. election last year.

Q: What documentation did you use, and what would you recommend that people read to get started with WebExtensions?

A: I think both Mozilla and Google’s documentation sites are great. I would say that the tooling for Firefox is much better due to the web-ext tool. I’d definitely start there (Getting started with web-ext) the next time around.

Basically, web-ext takes care of a great deal of the fiddly bits of writing an extension—everything from packaging to auto reloading the extension when you edit the source code. It makes the development process a lot more smooth.

Q: Did you develop in one browser first and then test in the other, or test in both as you went along?

A: I started out in Chrome, because most of the users of our site use Chrome. But I started using Firefox about halfway through because of web-ext. After that, I sort of ping ponged back and forth because I was using source maps and each browser handles those a bit differently. Mostly the extension worked pretty seamlessly across both browsers. I had to make a couple of changes but I think it took me a few minutes to get it working in Firefox, which was a pleasant surprise.

Q: What are you running as a back end service to collect ads submitted by the WebExtension?

A: We’re running a Rust server that collects the ads and uploads images to an S3 bucket. It is my first Rust project, and it has some rough edges, but I’m pretty much in love with Rust. It is pretty wonderful to know that the server won’t go down because of all the built in type and memory safety in the language. We’ve open sourced the project, I could use help if anyone wants to contribute: Facebook Political Ad Collector on GitHub.

Q: Can you see that the same user got a certain set of ads, or are they all anonymized?

A: We strive to clean the ads of all identifying information. So, we only collect the id of the ad, and the targeting information that the advertiser used. For example, people 18 to 44 who live in New York.

Q: What are your next steps?

A: Well, I’m planning on publishing the ads we’ve received on a web site, as well as a clean dataset that researchers might be interested in. We also plan to monitor the Austrian elections, and next year is pretty big for the U.S. politically, so I’ve got my work cut out for me.

Q: Facebook has refused to release some “dark” political ads from the 2016 election in the USA. Will your project make “dark” ads in Germany visible?

A: We’ve been running for about four days, and so far we’ve collected 300 political ads in Germany. My hope is we’ll start seeing some of the more interesting ones from fly by night groups. Political advertising on sites like Facebook isn’t regulated in either the United States or Germany, so on some level just having a repository of these ads is a public service.

Q: Your project reveals the “dark” possibly deceptive ads in Chrome and Firefox but not on mobile platforms. Will it drive deceptive advertising away from desktop and toward mobile?

A: I’m not sure, that’s a possibility. I can say that Firefox on Android allows WebExtensions and I plan on making sure this extension works there as well, but we’ll never be able to see what happens in the native Facebook applications in any sort of large scale and systematic way.

Q: Has anyone from Facebook offered to help with the project?

A: Nope, but if anyone wants to reach out, I would love the help!

Thank you.

Get the WebExtension

11 Sep 2017

Some ways that bug futures markets differ from open source bounties

Question about A Trading Market for Prices in Peer Production: what’s the difference between a futures market on software bugs and an open source bounty system connected to the issue tracker? In many simple cases a bug futures market will function in a similar way, but we predict that some qualities of the futures market will make it work differently.

  • Open source bounty systems have extra transaction costs of assigning credit for a fix.

  • Open source bounty systems can incentivize contention over who can submit a complete fix, when we want to be able to incentivize partial work and meta work.

Incentivizing partial work and meta work (such as bug triage) would be prohibitively expensive to manage using bounties claimed by individuals, where each claim must be accepted or rejected. The bug futures concept addresses this with radical simplicity: the owners of each side of the contract are tracked completely separately from the reporter and assignee of a bug in the bug tracker.

And bug futures contracts can be traded in advance of expiration. Any work that you do that meaningfully changes the probability of the bug getting fixed by the contract closing date can move the price.

You might choose to buy the “fixed” side of the contract, do some work that makes it look more fixable, and sell at a higher price. A futures market might make it practical to do “day trading” of small steps, such as translating a bug report originally posted in a language that the developers don’t know, helping a user submit a log file, or writing a failing test.
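The choice between selling early and holding until the contract closes can be sketched as a comparison of two numbers. This is a simplified Python illustration, not Bugmark’s pricing model. It assumes a unit of the “fixed” side pays 1.0 if the bug is fixed by the closing date and nothing otherwise, that the trader is risk-neutral, and the function names are hypothetical.

```python
def sell_now_profit(buy_price: float, market_price: float) -> float:
    """Certain profit per unit from selling at the current market price."""
    return market_price - buy_price


def hold_expected_profit(buy_price: float, own_probability: float, payout: float = 1.0) -> float:
    """Expected profit per unit from holding to maturity, given my own estimate of a fix."""
    return own_probability * payout - buy_price


# Hypothetical: bought at 0.25, translating the bug report moved the market to 0.50,
# and I privately think the chance of a fix is 0.75.
sell = sell_now_profit(buy_price=0.25, market_price=0.50)
hold = hold_expected_profit(buy_price=0.25, own_probability=0.75)
should_hold = hold > sell
```

Under these assumptions, holding only looks better when my own estimate of the bug getting fixed is higher than the current market price. Otherwise the small, certain “day trading” profit wins.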

With the right market design, participants in a bug futures market have the incentive to talk their books, by sharing partial work and metadata.

03 Sep 2017

JavaScript and not kicking puppies

(Updated 10 Nov 2022: ended up actually removing Twitter tracking script. Still not worried about Twitter, too much, just worried about whoever ends up with their assets.)

(Updated 4 Sep 2017: add screenshot and how to see the warning.)

Advice from yan, on Twitter:

if your site embeds tweets, add <meta name="twitter:dnt" content="on"> so that Twitter doesn't track your visitors — yan (@bcrypt) September 1, 2017

I decided not to do that for this site.

Yes, user tracking is creepy, and yes, collecting user information without permission is wrong. But read on for what could be a better approach for sites that can make a bigger difference.

First of all, Twitter is so far behind in their attempts to do surveillance marketing that they’re more funny and heartening than ominous. If getting targeted by one of the big players is like getting tracked down by a pack of hunting dogs, then Twitter targeting is like watching a puppy chew on your sock. Twitter has me in their database as…

  • Owner of eight luxury cars and a motorcycle.

  • Medical doctor advising patients about eating High Fructose Corn Syrup.

  • Owner of prime urban real estate looking for financing to build a hotel.

  • Decision-maker for a city water system, looking to read up on the pros and cons of cast iron and concrete pipes.

  • Active in-market car shopper, making all decisions based on superficial shit like whether the car has Beats® brand speakers in the doors. (Hey, where am I supposed to park car number 9?)

Advice from “me” as I appear on Twitter: As your doctor, I advise you to cut out HFCS entirely unless you’re at a family thing where you should just eat a little and not be an ass about it. When you’re in town, stay at my hotel, where the TV is a 4k monitor on an arm that moves to make it usable from the sit-stand desk, and the WiFi is fast and free. No idea on the city water pipe thing though.

So if Twitter is the minor leagues of creepy, and they probably won’t be something we have to worry about for long anyway, maybe we can think about whether there’s anything that sites can do about riskier kinds of tracking. Getting a user protected from being tracked by one Tweet is a start. But helping users get started with client-side privacy tools that protect from Twitter tracking everywhere can help with not just Twitter tracking, but with the serious trackers that show up in other places.

Blocking Twitter tracking: like kicking a puppy?

Funny wrong Twitter ad targeting is one of my reliable Internet amusements for the day. But that’s not why I’m not especially concerned with tagging quoted Tweets. Just doing that doesn’t protect this site’s visitors from retargeting schemes on other sites.

And every time someone clicks on a retargeted ad from a local business on a social site (probably Facebook, since more people spend more time there) then that’s 65 cents or whatever of marketing money that could have gone to local news, bus benches, Little League, or some other sustainable, signal-carrying marketing project. (That’s not even counting the medium to heavy treason angle that makes me really uncomfortable about seeing money move in Facebook’s direction.)

So, instead of messing with quoted Tweet tagging, I set up this script:

warn3p.js

This will load the Aloodo third-party tracking detector, and, if the browser shows up as easily trackable from site to site, switch out the page header to nag the user.

screenshot of tracking warning

(If you are viewing this site from an unprotected browser and still not seeing the warning, it means that your browser has not yet visited enough domains with the Aloodo script to detect that you’re trackable. Take a tracking protection test to expose your browser to more fake tracking, then try again.)

If the other side wants it hidden, then reveal it

Surveillance marketers want tracking to happen behind the scenes, so make it obvious. If you have a browser or privacy tool that you want to recommend, it’s easy to put in the link. Every retargeted ad impression that’s prevented from happening is more marketing money to pay for ad-sponsored resources that users really want. I know I can’t get all the users of this site perfectly protected from all surveillance marketing everywhere, but hey, 65 cents is 65 cents.

Bonus tweet

Bob Hoffman’s new book is out! Go click on this quoted Tweet, and do what it says.

Today's Newsletter (special clickbait edition) GUNFIGHT AT THE AD TECH SALOON https://t.co/OI9JXs1hbM

— adcontrarian (@AdContrarian) September 3, 2017

30 Aug 2017

Some ways that bug futures markets differ from prediction markets

What’s the difference between a futures market on software bugs and a prediction market? We don’t know how much a bug futures market will tend to act like a prediction market, but here are a few guesses about how it may turn out differently.

Prediction markets tend to have a relatively small number of tradeable questions, with a large number of market participants on each side of each question. Each individual bug future is likely to have a small number of participants, at least on the “fixed” side.

Prediction markets typically have participants who are not in a position to influence the outcome. For example, The Good Judgment Project recruited regular people to trade on worldwide events. Bug futures are designed to attract participants who have special knowledge and ability to change an outcome.

Prediction markets are designed for gathering knowledge. Bug futures are for incentivizing tasks. A well-designed bug futures market will monetize haters by turning a “bet” that a project will fail into a payment that makes it more likely to succeed. If successful in this, the market will have this feature in common with Alex Tabarrok’s Dominant Assurance Contract.

Prediction markets often implement conditional trading. Bug markets rely on the underlying bug tracker to maintain the dependency relationships among bugs, and trades on the market can reflect the strength of the connections among bugs as seen by the participants.

29 Aug 2017

hey, kids, 2x2 chart!

What’s the difference between spam and real advertising?

                  No signaling      Signaling
Interruption      spam              advertising
No interruption   organic social    content marketing

Advertising is a signal-for-attention bargain. People pay attention to advertising that carries some hard-to-fake information about the seller’s intentions in the market.

Rory Sutherland says, What seems undoubtedly true is that humans, like peahens, attach significance to a piece of communication in some way proportionally to the cost of generating or transmitting it.

If I get spam email, that’s clearly signal-free because it costs practically nothing. If I see a magazine ad, it carries signal because I know that it cost money to place.

Today’s web ads are more like spam, because they can be finely targeted enough that no significant advertiser resources stand behind the message I’m looking at. (A bot might have even written the copy.) People don’t have to be experts in media buying to gauge the relative costs of different ads, and filter out the ones that are clearly micro-targeted and signal-free.

27 Aug 2017

Want to lose a hacking contest or win a reputation contest?

Doc Searls: How the personal data extraction industry ends.

Our data, and data about us, is the crude that Facebook and Google extract, refine and sell to advertisers. This by itself would not be a Bad Thing if it were done with our clearly expressed (rather than merely implied) permission, and if we had our own valves to control personal data flows with scale across all the companies we deal with, rather than countless different valves, many worthless, buried in the settings pages of the Web’s personal data extraction systems, as well as in all the extractive mobile apps of the world.

Today’s web advertising business is a hacking contest. Whoever can build the best system to take personal information from the user wins, whether or not the user knows about it. (And if you challenge adfraud and adtech hackers to a hacking contest, you can expect to come in third.)

As users get the tools to control who they share their information with (and they don’t want to leak it to everyone) then the web advertising business has to transform into a reputation contest. Whoever can build the most trustworthy place for users to choose to share their information wins.

This is why the IAB is freaking out about privacy regulations, by the way. IAB member companies are winning at hacking and failing at building reputation. (I want to do a user focus group where we show people a random IAB company’s webinar, then count how many participants ask for tracking protection support afterward.) But regulations are a sideshow. In the long run regulators will support the activities that legit business needs. So Doc has an important point. We have a big opportunity to rebuild important parts of the web advertising stack, this time based on the assumption that you only get user data if you can convince the user, or at least convince the maintainers of the user’s trusted tools, that you will use the data in a way that complies with that user’s norms.

One good place to check is: how many of a site’s readers are set up with protection tools that make them “invisible” to Google Analytics and Chartbeat? (script) And how many of the “users” who sites are making decisions for are just bots? If you don’t have good answers for those, you get dumbassery like “pivot to video” which is a polite expression for “make videos for bots because video ad impressions are worth enough money to get the best bot developers interested.”

Yes, “pivot to video” is still a thing, even though

News from the “pivot to video” department, by Lara O’Reilly, at the Wall Street Journal:

Google is issuing refunds for ads that ran on websites with fake traffic…

Google’s refunds amount to only a fraction of the cost of the ads served to invalid traffic, which has left some advertising executives unsatisfied…

In the recent cases Google discovered, the affected traffic involved video ads, which carry higher ad rates than typical display ads and are therefore an attractive target for fraudsters.

(Read the whole thing. If we’re lucky, Bob Hoffman will blog about that story. “Some advertising executives unsatisfied”? Gosh, Bob, you think so?)

The good news here is that legit publishers, trying to transform web advertising from a hacking game into a reputation game, don’t have to do a perfect job right away. Incrementally make reputation-based, user-permissioned advertising into a better and better investment, while adfraud keeps making unpermissioned tracking into a worse and worse investment. Then wait for some ambitious marketer (and marketers are always looking for a new angle to reinvent Marketing) to discover the opportunity and take credit for it.

Anyway, bonus links.

Facebook Figured Out My Family Secrets, And It Won't Tell Me How

This App Tracks Political Ads To See Who Is Targeting Your Vote–And Why

Designers are using “dark UX” to turn you into a sleep-deprived internet addict

AdTech Weekly - Issue 53: Librarians care more about your privacy than most. - Aug 18th 2017

Rise of the racist robots – how AI is learning all our worst impulses

Getting To The Holy Grail: How Publishers Measure The Incremental Value Of Ad Tech Partners

Linguistic data analysis of 3 billion Reddit comments shows the alt-right is getting stronger

Let’s Talk About The Brand Safety Tax

The state of the brand crackdown on media transparency

Brands are now blacklisting mainstream news sites, including Fox News

Data-hucksters beware: online privacy is returning | John Naughton

Remember that Norwegian site that made readers take a quiz before commenting? Here’s an update on it

22 Aug 2017

List-based and behavior-based tracking protection

(update 13 Mar 2020: Remove link to the old Self-Destructing Cookies extension and add a link to the new Cookie AutoDelete extension. Both do about the same thing but Cookie AutoDelete is maintained on current browsers.)

In the news…

User privacy is at risk from both hackers and lawyers. Right now, lawyers are better at attacking lists, and hackers are better at modifying tracker behavior to get around protections.

The more I think about it, the more that I think it’s counterproductive to try to come up with one grand unified set of protection rules or cookie policies for everybody.

Spam filters don’t submit their scoring rules to ANSI—spammers would just work around them.

Search engines don’t standardize and publish their algorithms, because gray hat SEOs would just use the standard to make useless word salad pages that score high.

And different people have different needs.

If you’re a customer service rep at an HERBAL ENERGY SUPPLEMENTS company, you need a spam filter that can adjust for your real mail. And any user of a site that has problems with list-based tracking protection will need to have the browser adjust, and rely more on cleaning up third-party state after a session instead of blocking outright.

Does your company intranet become unusable if you fail to accept third-party tracking that comes from an internal domain that your employer acquired and still has some services running on? Browser developers can’t decide up front, so the browser will need to adjust. Every change breaks someone’s workflow.

That means the browser has to work to help the user pick a working set of protection methods and rules.

0. Send accurate Do Not Track

Inform sites of the user’s preferences on data sharing. (This will be more important in the future because Europe, but privacy-crazed Eurocrats will not save us from having to do our share of the work.)

1. Block connections to third-party trackers

This will need to include both list-based protection and monitoring tracking behavior, like Privacy Badger, because hackers and lawyers are good at getting around different ones.

2. Limit data sent to third-party sites

Apple Safari does this, so it’s likely to get easier to do cookie double keying without breaking sites.

3. Scramble or delete unsafe data

If a tracking cookie or other identifier does get through, delete or scramble it on leaving the site or later, as the Cookie AutoDelete extension does. This could be a good backup for when the browser “learns” that a user needs some third-party state to do something like a shopping cart or comment form, but then doesn’t want the info to be used for “ads that follow me around” later.
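Here is a minimal sketch in Python of how those four steps could stack for a single third-party request. Everything in it is an assumption for illustration: the `ThirdPartyRequest` fields, the `plan` function, and the step descriptions are hypothetical and do not describe how any real browser or extension works. It also assumes a user who prefers not to be tracked, so step 0 sends `DNT: 1`.

```python
from dataclasses import dataclass


@dataclass
class ThirdPartyRequest:
    host: str
    on_blocklist: bool    # list-based signal (hypothetical field)
    seen_tracking: bool   # behavior-based signal, Privacy Badger style
    user_needs_it: bool   # the browser "learned" the user relies on it


def plan(request: ThirdPartyRequest) -> list[str]:
    """Return the protection steps to apply to one third-party request."""
    # Step 0: always state the preference (assumed here: do not track).
    steps = ["send DNT: 1"]
    looks_like_tracker = request.on_blocklist or request.seen_tracking
    # Step 1: block outright unless blocking would break something the user needs.
    if looks_like_tracker and not request.user_needs_it:
        steps.append("block connection")
        return steps
    # Step 2: the request goes through, so limit what it can carry.
    steps.append("limit data sent (double-key cookies)")
    # Step 3: a tracker got through, so clean up its state afterward.
    if looks_like_tracker:
        steps.append("delete or scramble stored state after the session")
    return steps


# Hypothetical case: a listed tracker that also runs a comment form.
print(plan(ThirdPartyRequest("comments.example", True, False, True)))
```

The point of the layering is that no single rule has to be right for everybody. A wrong guess at step 1 gets backed up by steps 2 and 3.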

20 Aug 2017

How is everyone’s tracking protection working? An update

When I set up this blog, I put in a script to check how many of the users here are protected from third-party tracking.

The best answer for now is 31%. Of the clients that ran JavaScript on this site over the past two weeks, 31% did not also run JavaScript from the Aloodo “fake third-party tracker”.

The script is here: /code/check3p.js

This is not as good as I had hoped (turn on your tracking protection, people! Don’t get tricked by ad blockers that leave you unprotected by default!) but it’s a start.
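The counting behind a number like that is simple. Here is a minimal sketch in Python, not the actual check3p.js logic. It assumes you already have two sets of client identifiers, one for clients that ran the site's own script and one for clients that also ran the fake tracker's script. The function name and the sample IDs are hypothetical.

```python
def protected_share(ran_site_script: set[str], ran_tracker_script: set[str]) -> float:
    """Fraction of clients that ran the site's script but not the tracker's."""
    if not ran_site_script:
        raise ValueError("no clients ran the site script")
    # Clients that never loaded the third-party script count as protected.
    protected = ran_site_script - ran_tracker_script
    return len(protected) / len(ran_site_script)


# Hypothetical client IDs, not real data from this site.
site = {"a", "b", "c", "d"}
tracker = {"b", "c", "d", "x"}
print(f"{protected_share(site, tracker):.0%}")
```

Clients that never run JavaScript at all do not show up in either set, so this only measures protection among browsers that run scripts.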

The Information Trust Exchange is doing research on the problem of third-party tracking at news sites. News industry consultant Greg Swanson:

All of the conversations on the newspaper side have been focused on how can we join the advertising technology ecosystem. For example, how can a daily newspaper site in Bismarck, North Dakota deliver targeted advertising to a higher-value soccer mom? And none of the newspapers them have considered the fact that when they join that ecosystem they are enabling spam sites, fraudulent sites – enabling those sites to get a higher CPM rate by parasitically riding on the data collected from the higher-value newspaper sites.

More info: Aloodo for web publishers.

19 Aug 2017

SEO hats and the browser of the future

The field of Search Engine Optimization has white hat SEO, black hat SEO, and gray hat SEO.

White hat SEO helps a user get a better search result, and complies with search engine policies. Examples include accurately using the same words that users search on, and getting honest inbound links.

Black hat SEO is clearly against search engine policies. Link farming, keyword stuffing, cloaking, and a zillion other schemes. If they see you doing it, your site gets penalized in search results.

Gray hat SEO is everything that doesn’t help the user get a better search result, but technically doesn’t violate a search engine policy.

Most SEO experts advise you not to put a lot of time and effort into gray hat, because eventually the search engines will notice your gray hat scheme and start penalizing sites that do it. Gray hat is just stuff that’s going to be black hat when the search engines figure it out.

Adtech has gray hat, too. Rocket Fuel Awarded Two Patents to Help Leverage First-Party Cookies to More Meaningfully Reach Consumers.

This scheme seems to be intended to get around existing third-party cookie protection, which is turned on by default in Apple Safari and available in other browsers.

But how long will it work?

Maybe the browser of the future won’t run a “kangaroo cookie court” but will ship with a built-in “kangaroo law school” so that each copy of the browser will develop its own local “courts” and its own local “case law” based on the user’s choices. It will become harder to predict how long any single gray hat adtech scheme will continue working.

In the big picture: in order to sell advertising you need to give the advertiser some credible information on who the audience is. Since the “browser wars” of the 1990s, most browsers have been bad at protecting personal information about the user, so web advertising has become a game where a whole bunch of companies compete to covertly capture as much user info as they can.

Today, browsers are getting better at implementing people’s preferences about sharing their information. The result is a change in the rules of the game. Investment in taking people’s personal info is becoming less rewarding, as browsers compete to reflect people’s preferences. (That patent will be irrelevant thanks to browser updates long before it expires.)

Adfraud is the other half of this story. Fraudbots are getting smarter at creating human-looking ad impressions just as humans are getting better protected. If you think that a web publisher’s response to harder-to-detect bots, viewing more high-CPM video ads, should be “pivot to video!!1!!” I don’t know if I can help you.

And investments in building sites and brands that are trustworthy enough for people to want to share their information will tend to become more rewarding. (This shift naturally leads to complaints from people who are used to winning the old game, but will probably be better for customers who want to use trustworthy brands and for people who want to earn money by making ad-supported news and cultural works.)

18 Aug 2017

cdparanoia returned code 73

Welcome, people googling for the above error message.

I saw the error

cdparanoia returned code 73

and it turns out I was trying to run two abcde processes in two terminal windows. Kill the second one and the error goes away.

Hope your problem was as simple as that.

16 Aug 2017

ePrivacy and marketing budgets

(Update 18 Aug 2017: this post is also available at Digital Content Next.)

As far as I know, there are three ways to match an ad to a user.

User intent: Show an ad based on what the user is searching for. Old-school version: the Yellow Pages.

Context: Show an ad based on where the user is, or what the user is interested in. Old-school versions: highway billboards (geographic context), specialized magazines (interest context).

User identity: Show an ad based on who the user is. Old-school version: direct mail.

Most online advertising is matched to the user based on a mix of all three. And different players have different pieces of the action for each one. For user intent, search engines are the gatekeepers. The other winners from matching ads to users by intent are browsers and mobile platforms, who get paid to set their default search engine. Advertising based on context rewards the owners of reputations for producing high-quality news, information, and cultural works. Finally, user identity now has a whole Lumascape of vendors in a variety of categories, all offering to help identify users in some way. (The Lumascape is rapidly consolidating, but that’s another story.)

Few of the web ads that you might see today are matched to you purely based on one of the three methods. Investments in all three tend to shift as the available technology, and the prevailing norms and laws, change.

Enough background.

Randall Rothenberg of the IAB is concerned about the proposed ePrivacy Regulation in Europe, and writes,

The basic functionality of the internet, which is built on data exchanges between a user’s computer and publishers’ servers, can no longer be used for the delivery of advertising unless the consumer agrees to receive the ads – but the publisher must deliver content to that consumer regardless.

This doesn’t look accurate. I don’t know of any proposal that would require publishers to serve users who block ads entirely. What Rothenberg is really complaining about is that the proposed regulation would limit the ability of sites and ad intermediaries to match ads to users based on user identity, forcing them to rely on user intent and context. If users choose to block ads delivered from ad servers that use their personal data without permission, then sites won’t be able to refuse to serve them the content, but will be able to run ads that are relevant to the content of the site. As far as I can tell, sites would still be able to pop a “turn off your ad blocker” message in place of a news story if the user was blocking an ad placed purely by context, magazine style.

Privacy regulation is not so much an attack on the basic functionality of the Internet, as it is a shift that lowers the return on investment on knowing who the user is, and drives up the return on investment on providing search results and content. That’s a big change in who gets paid: more money for search and for trustworthy content brands, and less for adtech intermediaries that depend on user tracking.

Advertising: a fair deal for the user?

That depends. Search advertising is clearly the result of a user choice. The user chooses to view ads that come with search results, as part of choosing to do a search. As long as the ads are marked as ads, it’s pretty obvious what is happening.

The same goes for ads placed in context. The advertiser trades economic signal, in the form of costly support of an ad-supported resource, for the user’s attention. This is common in magazine and broadcast advertising, and when you use a site with one of the (rare) pure in-context ad platforms such as Project Wonderful, it works about the same way.

The place where things start to get problematic is ads based on user identity, placed by tracking users from site to site. The more that users learn how their data is used, the less tracking they tend to want. In one survey, 66% of adult Americans said they do not want marketers to tailor advertisements to their interests, and when the researchers explained how ad targeting works, the percentage went up.

If users, on average, dislike tracking enough that sites choose to conceal it, then that’s pretty good evidence that sites should probably ask for permission to do it. Whether this opt-in should be enforced by law, technology, or both is left as an exercise for the reader.

So what happens if, thanks to new regulations, technical improvements in browsers, or both, cross-site tracking becomes harder? Rothenberg insists that this transformation would end ad-supported sites, but the real effects would be more complex. Ad-supported sites are already getting a remarkably lousy share of ad budgets. “The supply chain’s complexity and opacity net digital advertisers as little as 30 cents to 40 cents of working media for every dollar spent,” ANA CEO Bob Liodice said.

Advertising on high-reputation sites tends to be a better investment than using highly intermediated, fraud-prone stacks of user tracking to try to chase good users to cheap sites. But crap ad inventory, including fraudulent and brand-unsafe stuff, persists. The crap only has market value because of user tracking, and it drives down the value of legit ads. If browser improvements or regulation make knowledge of user identity rarer, the crap tends to leave the market and the value of user intent and context goes up.

Rothenberg speaks for today’s adtech, which despite all its acronyms and Big Data jive, is based on a pretty boring business model: find a user on a legit site, covertly follow the user to a crappy site where the ads are cheaper, sell an ad impression there, profit. Of course he’s entitled to make the case for enabling IAB members to continue to collect their “adtech tax.” But moving ad budgets from one set of players to another doesn’t end ad-supported sites, because marketers adjust. That’s what they do. There’s always something new in marketing, and budgets move around. What happens when privacy regulations shift the incentives, and make more of advertising have to depend on trustworthy content? That’s the real question here.

08 Aug 2017

Moral values in society

Moral values in society are collapsing? Really? Elizabeth Stoker Bruenig writes, The baseline moral values of poor people do not, in fact, differ that much from those of the rich. (read the whole thing).

Unfortunately, if you read the fine print, it’s more complicated than that. Any market economy depends on establishing trust between people who trade with each other. Tim Harford writes,

Being able to trust people might seem like a pleasant luxury, but economists are starting to believe that it’s rather more important than that. Trust is about more than whether you can leave your house unlocked; it is responsible for the difference between the richest countries and the poorest.

Somehow, over thousands of years, business people have built up a set of norms about high-status and low-status business activities. Craftsmanship, consistent supply of high-quality staple goods, and construction of noteworthy projects are high-status activities. Usury and deception are examples of low-status activities. (You make your money in quarters, gambling with retired people? You lend people $100 until Friday at a 300% interest rate? No club invitation for you.)

Somehow, though, that is now changing in the USA. Those who earn money through deception now have seats at the same table as legitimate business. Maybe it started with the shift into “consumer credit” by respectable banks. But why were high-status bankers willing to play loan shark to begin with? Something had to have been building, culturally. (It started too early to blame the Baby Boomers.)

We tend to blame information technology companies for complex, one-sided Terms of Service and EULAs, but it’s not so much a tech trend as it is a general business culture trend. It shows up in tech fast, because rapid technology change provides cover and concealment for simultaneous changes in business terms. US business was rapidly losing its connection to basic norms when it was still moving at the speed of FedEx and fax. (You can’t say, all of a sudden, “car crashes in existing fast-food drive-thrus are subject to arbitration in Unfreedonia” but you can stick that kind of term into a new service’s ToS.) There’s some kind of relativistic effect going on. Tech bros just seem like bigger douchebags because they’re moving faster.

Regulation isn’t the answer. We have a system in which business people can hire lobbyists to buy the laws and regulations we want. The question is whether we’re going to use our regulatory capture powers in a shortsighted, society-eroding hustler way, or in a conservative way. Economic conservatism means not just limiting centralized state control of capital, but preserving the balance among all the long-standing stewards of capital, including households, municipalities, and religious and educational institutions. Economic conservatism and radical free-marketism are fundamentally different.

People blame trashy media for the erosion of norms among the poor, so let’s borrow that explanation for the erosion of norms among the rich as well. Maybe our problem with business norms results from the globalization and sensationalism of business media. Joe CEO isn’t just the most important corporate leader of Mt. Rose, MN, any more. On a global scale he’s just another broke-ass hustler.

06 Aug 2017

Pragmatists for copyleft, or, corporate hive minds don’t accept software licenses

One of the common oversimplifications in discussing open-source software licenses is that copyleft licenses are “idealistic” while non-copyleft licenses are “pragmatic.” But that’s not all there is to it.

The problem is that most people redistributing licensed code are doing so in an organizational context. And no human organization is a hive mind where those who participate within it subordinate their goals to that of the collective. Human organizations are full of people with their own motivations.

Instead of treating the downstream developer’s employer as a hive mind, it can be more productive to assume good faith on the part of the individual who intends to contribute to the software, and think about the license from the point of view of a real person.

Releasing source for a derivative work costs time and money. The well-intentioned “downstream” contributor wants his or her organization to make those investments, but he or she has to make a case for them. The presence of copyleft helps steer the decision in the right direction. Jane Hacker at an organization planning to release a derivative work can say, matter-of-factly, “we need to comply with the upstream license” if copyleft is involved. The organization is then more likely to do the right thing. There are always violations, but the license is a nudge in the right direction.

(The extreme case is university licensing offices. University-owned software patents can exclude a graduate student from his or her own project when the student leaves the university, unless he or she had the foresight to build it as a derivative work of something under copyleft.)

Copyleft isn’t a magic commons-building tool, and it isn’t right for every situation. But it can be enough to push an organization over the line. (One place where I worked had to do a source release for one dependency licensed under GPLv2, and it turned out to be easiest to just build one big source code release with all the dependencies in it, and offer that.)

06 Aug 2017

More random links

Not the Google story everyone is talking about, but related: Google Is Matching Your Offline Buying With Its Online Ads, But It Isn’t Sharing How. (If a company becomes known for doing creepy shit, it will get job applications from creepy people, and at a large enough company some of them will get hired. Related: The Al Capone theory of sexual harassment)

Least surprising news story ever: The Campaign Against Facebook And Google's Ad "Duopoly" Is Going Nowhere. Independent online publishers can’t beat the big surveillance marketing companies at surveillance marketing? How about they try to beat Amazon and Microsoft at cloud services, or Apple and Lenovo at laptop computers? There are possible winning strategies for web publishers, but doing the same as the incumbents with less money and less data is not one of them.

Meanwhile, from an investor point of view: It’s the Biggest Scandal in Tech (and no one’s talking about it) Missing the best investment advice: get out of any B-list adtech company that is at risk of getting forced into a low-value acquisition by a sustained fraud story. Or short it and research the fraud story yourself.

Did somebody at The Atlantic get a loud phone notification during a classical music concert or something? Your Smartphone Reduces Your Brainpower, Even If It's Just Sitting There and Have Smartphones Destroyed A Generation?, by Jean M. Twenge, The Atlantic

Good news: Math journal editors resign to start rival open-access journal

Apple’s Upcoming Safari Changes Will Shake Up Ad Tech: Not surprisingly, Facebook and Amazon are the big winners in this change. Most of their users come every day or at least every week. And even the mobile users click on links often, which, on Facebook, takes them to a browser. These companies will also be able to buy ad inventory on Safari at lower prices because many of the high-dollar bidders will go away. A good start by Apple, but other browsers can do better. (Every click on a Facebook ad from a local business is $0.65 of marketing money that’s not going to local news, Little League sponsorships, and other legit places.)

Still on the upward slope of the Peak Advertising curve: Facebook 'dark ads' can swing political opinions, research shows

You’re more likely to hear from tech employers if you have one of these 10 things on your resume (and only 2 of them are proprietary. These kids today don’t know how good they have it.)

The Pac-Man Rule at Conferences

How “Demo-or-Die” Helped My Career

05 Aug 2017

Hey kids, favicon!

Finally fixed those 404s from browsers looking for favicon.ico on this blog.

  1. Google image search for images where “reuse with modification” is allowed.

  2. Found this high-quality lab mouse SVG image.

  3. Opened it in GNU Image Manipulation Program, posterized, cropped to a square. Kept the transparent background.

  4. Just went to realfavicongenerator.net and did what it says, and added the resulting images and markup to the site.

That’s about it. Now there’s a little mouse in the browser tab (and it should do the right thing with the icons if someone pins it to their home screen on mobile.)

01 Aug 2017

Why surveillance marketers don’t worry about GDPR (but privacy nerds should)

A lot of privacy people these days sound like a little kid arguing with a sibling. You’re going to be in big trouble when Dad gets home!

Dad, here, is the European Union, who’s going to put the General Data Protection Regulation foot down, and then, oh, boy, those naughty surveillance marketers are going to catch it, and wish that they had been listening to us about privacy all along.

Right?

But Internet politics never works like that. Sure, European politicians don’t want to hand over power to the right-wing factions who are better at surveillance marketing than they are. And foreign agents use Facebook (and other US-based companies) to attack legit political systems. But that stuff is not going to be enough to save GDPR.

The problem is that perfectly normal businesses are using GDPR-violating sneaky tracking pixels and other surveillance marketing as part of their daily marketing routine.

As the GDPR deadline approaches, surveillance marketers in Europe are going to sigh and painstakingly explain to European politicians that of course this GDPR thing isn’t going to work. “You see, politicians, it’s an example of political overreach that completely conflicts with technical reality.” European surveillance marketers will use the same kind of language about GDPR that the freedom-loving side used when we talked about the proposed CBDTPA. It’s just going to Break the Internet! People will lose their jobs!

The result is predictable. GDPR will be delayed, festooned with exceptions, or both, and the hoped-for top-down solution to privacy problems will not come. There’s no shortcut. We’ll only get a replacement for surveillance marketing when we build the tools, the networks, the business processes, the customer/voter norms, and then the political power.

29 Jul 2017

Extracting just the audio from big video files

Update 24 Aug 2017: How to get the big video file from an Air Mozilla page.

  1. Sign in if needed and go to the page with the video on it.

  2. Control-I to open the page info window.

  3. Open the “Media” tab in the page info window, and find the item with type “Video”.

  4. Click “Save As” to save the video.

Got a big video, and want a copy of just the audio for listening on a device with limited storage? Use Soundconverter.

soundconverter -b -m mp3 -s .mp3 long-video.webm

(MP3 patents are expired now, hooray! I’m just using MP3 here because if I get a rental car that lets me plug in a USB stick for listening, the MP3 format is most likely to be supported.)

Soundconverter has a GUI but you can use -b for batch mode from the shell. soundconverter --help for help. You do need to set both the MIME type, with -m, and the file suffix, with -s.
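If there is a whole directory of videos to convert, the same command can be run once per file from a script. This is only a sketch: the function names and the dry_run switch are made up for illustration, and the only soundconverter flags used are the ones shown above.

```python
import subprocess
from pathlib import Path


def soundconverter_command(video_path):
    """Build the batch-mode command shown above for one input file."""
    return ["soundconverter", "-b", "-m", "mp3", "-s", ".mp3", str(video_path)]


def extract_all(directory, pattern="*.webm", dry_run=False):
    """Run soundconverter once per matching file and return the commands.

    With dry_run=True nothing is executed, which is handy for checking
    which files would be converted.
    """
    commands = [soundconverter_command(p)
                for p in sorted(Path(directory).glob(pattern))]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Calling extract_all(".", dry_run=True) first lists what would run without touching anything.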

28 Jul 2017

Online ads don’t matter to P&G

In the news: P&G Cuts More Than $100 Million in ‘Largely Ineffective’ Digital Ads

Not surprising.

Procter & Gamble makes products that help you comply with widely held cleanliness norms.

Digital ads are micro-targeted to you as an individual.

That’s the worst possible brand/medium fit. If you don’t know that the people who expect you to keep your house or body clean are going to be aware of the same product, how do you know whether to buy it?

Bonus link from Bob Hoffman last year: Will The P&G Story Bring Down Ad Tech? Please?

26 Jul 2017

Got a reply from Twitter

I thought it would be fun to try Twitter ads, and, not surprisingly, I started getting fake followers pretty quickly after I started a Twitter follower campaign.

Since I’m paying nine cents a head for these followers, I don’t want to get ripped off. So naturally I put in a support ticket to Twitter, and just heard back.

Thanks for writing in about the quality of followers and engagements. One of the advantages of the Twitter Ads platform is that any RTs of your promoted ads are sent to the retweeting account’s followers as an organic tweet. Any engagements that result are not charged, however followers gained may not align with the original campaign’s targeting criteria. These earned followers or engagements do show in the campaign dashboard and are used to calculate cost per engagement, however you are not charged for them directly.

Twitter also passes all promoted engagements through a filtering mechanism to avoid charging advertisers for any low-quality or invalid engagements. These filters run on a set schedule so the engagements may show in the campaign dashboard, but will be deducted from the amount outstanding and will not be charged to your credit card.

If you have any further questions, please don’t hesitate to reply.

That’s pretty dense San Francisco speak, so let me see if I can translate to the equivalent for a normal product.

Hey, what are these rat turds doing in my raisin bran?

Thanks for writing in about the quality of your raisin bran eating experience. One of the advantages of the raisin bran platform is that during the production process, your raisin bran is made available to our rodent partners as an organic asset.

I paid for raisin bran, so why are you selling me raisin-plus-rat-turds bran?

Any ingredients that result from rodent engagement are not charged, however ingredients gained may not align with your original raisin-eating criteria.

Can I have my money back?

We pass all raisin bran sales through a filtering mechanism to avoid charging you for invalid ingredients. The total weight of the product, as printed on the box, includes these ingredients, but the weight of invalid ingredients will be deducted from the amount charged to your credit card.

So how can I tell which rat turds are “organic” so I’m not paying for them, and which are the ones that you just didn’t catch and are charging me for?

(?)

Buying Twitter followers: Fiverr or Twitter?

On Fiverr, Twitter followers are about half a cent each ($5/1000). On Twitter, I’m getting followers for about 9 cents each. The Twitter price is about 18x the Fiverr price.

But every follower that someone else buys on Fiverr has to be “aged” and disguised in order to look realistic enough not to get banned. The bot-herders have to follow legit follower campaigns such as mine and not just their paying customers.

(I call them "sleepers." They do all sorts of natural things (following suggested accounts, tweeting quotes) aging into "trusted" zone.)

— Andréa López (@bluechoochoo) July 21, 2017

If Twitter is selling those “follow” actions to me for nine cents each, and the bot-herder is only making half a cent, how is Twitter not making more from bogus Twitter followers than the bot-herders are?
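To put rough numbers on that question, here is a back-of-the-envelope sketch using the two prices above. The big assumption, which I can't verify, is that every "cover" follow a bot makes on a promoted account is actually billed to the advertiser (Twitter's reply says some get filtered out). The function names are just for illustration.

```python
FIVERR_PRICE_PER_1000 = 5.00   # dollars per 1000 followers, from the post
TWITTER_PRICE_EACH = 0.09      # dollars per follower, from the post

fiverr_price_each = FIVERR_PRICE_PER_1000 / 1000
price_ratio = TWITTER_PRICE_EACH / fiverr_price_each


def herder_revenue(paid_follows):
    """What the bot-herder collects for follows sold on Fiverr."""
    return paid_follows * fiverr_price_each


def twitter_revenue(cover_follows):
    """What Twitter collects for bot follows of promoted accounts.

    Assumption: every cover follow is billed to an advertiser.
    """
    return cover_follows * TWITTER_PRICE_EACH


if __name__ == "__main__":
    print(f"Twitter price is about {price_ratio:.0f}x the Fiverr price")
    for paid in (10, 18, 30):
        print(paid, "paid follows:", herder_revenue(paid),
              "vs one cover follow:", twitter_revenue(1))
```

Under that assumption, Twitter comes out ahead whenever a bot makes more than one billed cover follow for every 18 follows the bot-herder sells.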

If you’re verified on Twitter, you may not be seeing how much of a shitshow their ad business is. Maybe they’re going to have to sell Twitter to me sooner than I thought.

26 Jul 2017

Incentivizing production of information goods

Just thinking about approaches to incentivizing production of information goods, and where futures markets might fit in.

Artificial property

Article 1, Section 8, of the US Constitution still covers this one best.

To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;

We know about the problems with this one. It encourages all kinds of rent-seeking and freedom-menacing behavior by the holders of property interests in information. And the transaction costs are too high to incentivize the production of some useful kinds of information.

Commoditize the complement

Joel Spolsky explained it best, in Strategy Letter V. Smart companies try to commoditize their products’ complements. (See also: the list of business models in the Some Easily Rebutted Objections to GNU’s Goals section of the GNU Manifesto)

This one has been shown to work for some categories of information goods but not others. (We have Free world-class browsers and OS kernels because search engines and hardware are complements. We don’t have free world-class software in categories such as CAD.)

Signaling

Release a free information good as a way to signal competence in performing a service, or at least a large investment by the author in persuading others that the author is competent. In software, this works at the level of the individual labor market and in consulting. Also a major part of certain kinds of ad-supported business models.

Game and market mechanisms

With “gamified crowdsourcing” you can earn play rewards for very low transaction costs, and contribute very small tasks.

Common Voice

Higher transaction costs are associated with “crowdfunding” which sounds similar but requires more collaboration and administration.

From games to markets

In the middle, between crowdsourcing and crowdfunding, is a niche for a mechanism with lower transaction costs than crowdfunding but more rewards than crowdsourcing.

By using the existing bug tracker to resolve contracts, a bug futures market keeps transaction costs low. By connecting to an existing payment platform, a bug futures market enables a kind of reward that is more liquid, and transferable among projects.

We don’t know how wide the bug futures niche is. Is it a tiny space between increasingly complex tasks that can be resolved by crowdsourcing and increasingly finer-grained crowdfunding campaigns?

Or are bug futures capable of achieving low enough transaction costs to be an attractive incentivization mechanism for a lot of tasks that go into a variety of information goods?

23 Jul 2017

My bot parsed 12,387 RSS feeds and all I got were these links.

Bryan Alexander has a good description of an “open web” reading pipeline in I defy the world and go back to RSS. I’m all for the open web, but 40 separate folders for 400 feeds? That would drive me nuts. I’m a lumper, not a splitter. I have one folder for 12,387 feeds.

My chosen way to use RSS (and one of the great things about RSS is you can choose UX independently of information sources) is a “scored river”. Something like Dave Winer’s River of News concept, that you can navigate by just scrolling, but not exactly a river of news.

  • with full text if available, but without images. I can click through if I want the images.

  • items grouped by score, not feed. (Scores are managed by a dirt-simple algorithm where a feed “invests” a percentage of its points in every link, and the investments pay out in a higher score for that feed if the user likes a link.)
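A minimal sketch of how that kind of scoring could work, not the actual code behind my reader. The starting points, the invest rate, and the payout multiple are all made-up values, and so are the names Feed, like and river.

```python
from dataclasses import dataclass, field

INVEST_RATE = 0.05   # assumed: fraction of a feed's points staked on each link
PAYOUT = 3.0         # assumed: multiple of the stake returned when a link is liked


@dataclass
class Feed:
    name: str
    points: float = 100.0                       # assumed starting balance
    stakes: dict = field(default_factory=dict)  # link URL -> points staked

    def post(self, url):
        """Stake a percentage of the feed's current points on a new link."""
        stake = self.points * INVEST_RATE
        self.points -= stake
        self.stakes[url] = stake
        return stake


def like(feeds, url):
    """Pay out every feed that staked points on a link the reader liked."""
    for feed in feeds:
        stake = feed.stakes.pop(url, None)
        if stake is not None:
            feed.points += stake * PAYOUT


def river(feeds):
    """Return (url, score) pairs, highest score first, regardless of feed.

    A link's score is the total staked on it by all feeds that posted it.
    """
    scores = {}
    for feed in feeds:
        for url, stake in feed.stakes.items():
            scores[url] = scores.get(url, 0.0) + stake
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feed that keeps posting links the reader likes ends up with more points, so its future links get bigger stakes and float higher in the river. A feed that never gets a like slowly spends itself down.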

I also put the byline at the bottom of each item. Anyway, one thing I have found out about manipulating my own filter bubble is that linklog feeds and blogrolls are great inputs.

Here are some actual links.

This might look funny: How I ran my kids like an Atlassian team for a month. But think about it for a minute. Someone at every app or site your kids use is doing the same thing, and their goals don’t include “Dignity and Respect” or “Hard Work Smart Work”.

Global network of 'hunters' aim to take down terrorists on the internet It took me a few days to figure things out and after a few weeks I was dropping accounts like flies…

Google's been running a secret test to detect bogus ads — and its findings should make the industry nervous. (This is a hella good idea. Legit publishers could borrow it: just go ad-free for a few minutes at random, unannounced, a couple of times a week, then send the times straight to CMOs. Did you buy ads that someone claimed ran on our site at these times? Well, you got played.)

For an Inclusive Culture, Try Working Less As I said, to this day, my team at J.D. Edwards was the most diverse I’ve ever worked on….Still, I just couldn’t get over that damned tie.

The Al Capone theory of sexual harassment Initially, the connection eluded us: why would the same person who made unwanted sexual advances also fake expense reports, plagiarize, or take credit for other people’s work?

Jon Tennant - The Cost of Knowledge But there’s something much more sinister to consider; recently a group of researchers saw fit to publish Ebola research in a ‘glamour magazine’ behind a paywall; they cared more about brand association than the content. This could be life-saving research, why did they not at least educate themselves on the preprint procedure….

Twitter Is Still Dismissing Harassment Reports And Frustrating Victims

This Is How Your Fear and Outrage Are Being Sold for Profit (Profit? What about TEH LULZ??!?!1?)

Fine, have some cute animal photos, I was done with the other stuff anyway: Photographer Spends Years Taking Adorable Photos of Rats to Break the Stigma of Rodents

22 Jul 2017

the other dude

Making the rounds, this is a fun one: A computer was asked to predict which start-ups would be successful. The results were astonishing.

  • 2014: When there’s no other dude in the car, the cost of taking an Uber anywhere becomes cheaper than owning a vehicle. So the magic there is, you basically bring the cost below the cost of ownership for everybody, and then car ownership goes away.

  • 2018 (?): When there’s no other dude in the fund, the cost of financing innovation anywhere becomes cheaper than owning a portfolio of public company stock. So the magic there is, you basically bring the transaction costs of venture capital below the cost of public company ownership for everybody, and then public companies go away.

Could be a thing for software/service companies faster than we might think. Futures contracts on bugs→equity crowdfunding→bot-managed follow-on fund…

18 Jul 2017

Stupid ideas department

Here’s a probably stupid idea: give bots the right to accept proposed changes to a software project. Can automation encourage less burnout-provoking behavior?

A set of bots could interact in interesting ways.

  • Regression-test-bot: If a change only adds a test, applies cleanly to both the current version and to a previous version, and the previous version passes the test, accept it, even if the test fails for the current version.

  • Harmless-change-bot: If a change is below a certain size, does not modify existing tests, and all tests (including any new ones) pass, accept it.

  • Revert-bot: If any tests are failing on the current version, and have been failing for more than a certain amount of time, revert back to a version that passes.
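The three rules are simple enough to write down as predicates. This is a sketch only: the Change fields, the size limit, and the failure grace period are hypothetical stand-ins for whatever a real CI system would report.

```python
from dataclasses import dataclass

MAX_HARMLESS_SIZE = 20       # assumed size limit, in changed lines
MAX_FAILING_SECONDS = 3600   # assumed grace period before a revert


@dataclass
class Change:
    size: int                       # number of changed lines
    only_adds_test: bool            # the change adds a test and nothing else
    applies_to_current: bool        # applies cleanly to the current version
    applies_to_previous: bool       # applies cleanly to a previous version
    previous_passes_new_test: bool  # the previous version passes the new test
    modifies_existing_tests: bool
    all_tests_pass: bool            # on the current version, change applied


def regression_test_bot(change):
    """Accept a test-only change that a previous version passes,
    even if the current version fails it."""
    return (change.only_adds_test
            and change.applies_to_current
            and change.applies_to_previous
            and change.previous_passes_new_test)


def harmless_change_bot(change):
    """Accept a small change that leaves existing tests alone and passes."""
    return (change.size < MAX_HARMLESS_SIZE
            and not change.modifies_existing_tests
            and change.all_tests_pass)


def revert_bot(failing_seconds):
    """Decide whether to revert. failing_seconds is how long tests have
    been failing on the current version, or None if they are passing."""
    return failing_seconds is not None and failing_seconds > MAX_FAILING_SECONDS


def accept(change):
    """A change gets in if either accepting bot says yes."""
    return regression_test_bot(change) or harmless_change_bot(change)
```

In the bad-change scenario, regression-test-bot accepts the new failing test, the current version starts failing, and once the grace period runs out revert-bot returns True.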

Would more people write regression tests for their issues if they knew that a bot would accept them? Or say that someone makes a bad change but gets it past harmless-change-bot because no existing test covers it. No lengthy argument needed. Write a regression test and let regression-test-bot and revert-bot team up to take care of the problem. In general, move contributor energy away from arguing with people and toward test writing, and reduce the size of the maintainer’s to-do list.

17 Jul 2017

Playing for third place

Just tried a Twitter advertising trick that a guy who goes by “weev” posted two years ago.

It still works.

They didn’t fix it.

Any low-budget troll who can read that old blog post and come up with a valid credit card number can still do it.

Maybe Twitter is a bad example, but the fast-moving nationalist right wing manages to outclass its opponents on other social marketing platforms, too. Facebook won’t even reveal how badly they got played in 2016. They thought they were putting out cat food for cute Internet kittens, but the rats ate it.

This is not new. Right-wing shitlords, at least the best of them, are the masters of database marketing. They absolutely kill it, and they have been ever since Marketing as we know it became a thing. Some good examples:

All the creepy surveillance marketing stuff they’re doing today is just another set of tools in an expanding core competency.

Every once in a while you get an exception. The environmental movement became a direct mail operation in response to Interior Secretary James G. Watt, who alarmed environmentalists enough that organizations could reliably fundraise with direct mail copy quoting from Watt’s latest speech. And the Democrats tried that “Organizing for America” thing for a little while, but, man, their heart just wasn’t in it. They dropped it like a Moodle site during summer vacation. Somehow, the creepier the marketing, the more it skews “red”. The more creativity involved, the more it skews “blue” (using the USA meanings of those colors.) When we make decisions about how much user surveillance we’re going to allow on a platform, we’re making a political decision.

Anyway. News Outlets to Seek Bargaining Rights Against Google and Facebook.

The standings so far.

  1. Shitlords and fraud hackers

  2. Adtech and social media bros

  3. NEWS SITES HERE (?)

News sites want to go to Congress, to get permission to play for third place in their own business? You want permission to bring fewer resources and less experience to a surveillance marketing game that the Internet companies are already losing?

We know the qualities of a medium that you win by being creepier, and we know the qualities of a medium that you can win with reputation and creativity. Why waste time and money asking Congress for the opportunity to lose, when you could change the game instead?

Maybe achieving balance in political views depends on achieving balance in business model. Instead of buying in to the surveillance marketing model 100%, and handing an advantage to one side, maybe news sites should help users control what data they share in order to balance competing political interests.

14 Jul 2017

Smart futures contracts on software issues talk, and bullshit walks?

Previously: Benkler’s Tripod, transactions from a future software market, more transactions from a future software market

Owning “equity” in an outcome

John Robb: Revisiting Open Source Ventures:

Given this, it appears that an open source venture (a company that can scale to millions of worker/owners creating a new economic ecosystem) that builds massive human curated databases and decentralizes the processing load of training these AIs could become extremely competitive.

But what if the economic ecosystem could exist without the venture? Instead of trying to build a virtual company with millions of workers/owners, build a market economy with millions of participants in tens of thousands of projects and tasks? All of this stuff scales technically much better than it scales organizationally—you could still be part of a large organization or movement while only participating directly on a small set of issues at any one time. Instead of holding equity in a large organization with all its political risk, you could hold a portfolio of positions in areas where you have enough knowledge to be comfortable.

Robb’s opportunity is in training AIs, not in writing code. The “oracle” for resolving AI-training or dataset-building contracts would have to be different, but the futures market could be the same.

The cheating project problem

Why would you invest in a futures contract on bug outcomes when the project maintainer controls the bug tracker?

And what about employees who are incentivized from both sides: paid to fix a bug but able to buy futures contracts (anonymously) that will let them make more on the market by leaving it open?

In order for the market to function, the total reputation of the project and contributors must be high enough that outside participants believe that developers are more motivated to maintain that reputation than to “take a dive” on a bug.

That implies that there is some kind of relationship between the total “reputation capital” of a project and the maximum market value of all the futures contracts on it.

Open source metrics

To put that another way, there must be some relationship between the market value of futures contracts on a project and the maximum reputation value of the project. So that could be a proxy for a difficult-to-measure concept such as “open source health.”

Open source journalism

Hey, tickers to put into stories! Sparklines! All the charts and stuff that finance and sports reporters can build stories around!

13 Jul 2017

Blind code reviews experiment

Update 18 Dec 2017: The blind-reviews add-on now supports both Bugzilla code reviews and GitHub pull requests. Updated project status. Added a forbidden word.

In case you missed it, here’s a study that made the rounds earlier this year: Gender differences and bias in open source: Pull request acceptance of women versus men:

This paper presents the largest study to date on gender bias, where we compare acceptance rates of contributions from men versus women in an open source software community. Surprisingly, our results show that women’s contributions tend to be accepted more often than men’s. However, women’s acceptance rates are higher only when they are not identifiable as women.

A followup, from Alice Marshall, breaks out the differences between acceptance of “insider” and “outsider” contributions.

For outsiders, women coders who use gender-neutral profiles get their changes accepted 2.8% more of the time than men with gender-neutral profiles, but when their gender is obvious, they get their changes accepted 0.8% less of the time.

We decided to borrow the blind auditions concept from symphony orchestras for the open source experiments program.

The experiment, launching this month, will help reviewers who want to try breaking habits of unconscious bias (whether by gender or insider/outsider status) by concealing the name and email address of a code author during a review on Bugzilla. You’ll be able to un-hide the information before submitting a review, if you want, in order to add a personal touch, such as welcoming a new contributor.

Built with the WebExtension development work of Tomislav Jovanovic (“zombie” on IRC), and the Bugzilla bugmastering of Emma Humphries. For more info, see the Bugzilla bug discussion.

Data collection

The extension will “cc” one of two special accounts on a bug, to indicate if the review was done partly or fully blind. This lets us measure its impact without having to make back-end changes to Bugzilla.

(Yes, browser add-ons let you experiment with changing a user’s experience of a site without changing production web applications or content sites. Bonus link: FilterBubbler.)

Status

The blind-reviews add-on is available for Firefox here: Blind Reviews BMO Experiment.

Forbidden Word

Thing you “can’t” say for today: diversity (more info: forbidden words Git hook)

07 Jul 2017

Two approaches to adfraud, and some good news

Adfraud is a big problem, and we keep seeing two basic approaches to it.

Flight to quality: Run ads only on trustworthy sites. Brands are now playing the fraud game with the “reputation coprocessors” of the audience’s brains on the brand’s side. (Flight to quality doesn’t mean just advertise on the same major media sites as everyone else—it can scale downward with, for example, the Project Wonderful model that lets you choose sites that are “brand safe” for you.)

Increased surveillance: Try to fight adfraud by continuing to play the game of trying to get big-money impressions from the cheapest possible site, but throw more tracking at the problem. Biggest example of this is to move ad money to locked-down mobile platforms and away from the web.

The problem with the second approach is that the audience is no longer on the brand’s side. Trying to beat adfraud with technological measures is just challenging hackers to a series of hacking contests. And brands keep losing those. Recent news: The Judy Malware: Possibly the largest malware campaign found on Google Play.

Anyway, I’m interested in and optimistic about the results of the recent Mozilla/Caribou Digital report. It turns out that USA-style adtech is harder to do in countries where users are (1) less accurately tracked and (2) equipped with blockers to avoid bandwidth-sucking third-party ads. That’s likely to mean better prospects for ad-supported news and cultural works, not worse. This report points out the good news that the so-called adtech tax is lower in developing countries—so what kind of ad-supported businesses will be enabled by lower “taxes” and “reinvention, not reinsertion” of more magazine-like advertising?

Of course, working in those markets is going to be hard for big US or European ad agencies that are now used to solving problems by throwing creepy tracking at them. But the low rate of adtech taxation sounds like an opportunity for creative local agencies and brands. Maybe the report should have been called something like “The Global South is Shitty-Adtech-Proof, so Brands Built Online There Are Going to Come Eat Your Lunch.”

04 Jul 2017

transactions from a future software market

More on the third connection in Benkler’s Tripod, which was pretty general. This is just some notes on more concrete examples of how new kinds of direct connections between markets and peer production might work in the future.

Smart contracts should make it possible to enable these in a trustworthy, mostly decentralized, way.

Feature request I want emoji support on my blog, so I file, or find, a wishlist bug on the open source blog package I use: “Add emoji support.” I then offer to enter into a smart contract that will be worthless to me if the bug is fixed on September 1, or give me my money back if the bug is unfixed at that date.

A developer realizes that fixing the bug would be easy, and wants to do it, so takes the other side of the contract. The developer’s side will expire worthless if the bug is unfixed, and pay out if the bug is fixed.

“Unfixed” results will probably include bugs that are open, wontfix, invalid, or closed as duplicate of a bug that is still open.

“Fixed” results will include bugs closed as fixed, or any bug closed as a duplicate of a bug that is closed as fixed.
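Those two rules could be sketched as a small resolution function. The bug tracker here is a plain dict and the status names are hypothetical. A duplicate that points at a wontfix or invalid bug isn't covered by the rules above, so treating it as unfixed is my assumption.

```python
def resolve(bugs, bug_id):
    """Return "fixed" or "unfixed" for a bug, following duplicate links.

    bugs maps a bug id to a dict with a "status" key and, for duplicates,
    a "duplicate_of" key. Only a bug closed as fixed, or a duplicate that
    leads to one, counts as fixed; open, wontfix, and invalid are unfixed.
    """
    seen = set()
    while True:
        if bug_id in seen:
            raise ValueError("duplicate links form a cycle")
        seen.add(bug_id)
        bug = bugs[bug_id]
        if bug["status"] == "duplicate":
            bug_id = bug["duplicate_of"]  # resolve as whatever the original is
            continue
        return "fixed" if bug["status"] == "fixed" else "unfixed"


if __name__ == "__main__":
    tracker = {
        1: {"status": "fixed"},
        2: {"status": "duplicate", "duplicate_of": 1},
        3: {"status": "open"},
        4: {"status": "duplicate", "duplicate_of": 3},
    }
    for bug_id in tracker:
        print(bug_id, resolve(tracker, bug_id))
```

The cycle check is there because a tracker can end up with two bugs marked as duplicates of each other, and a contract should fail loudly rather than hang.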

If the developer fixes the bug, and its status changes to fixed, then I lose money on the smart contract but get the feature I want. If the bug status is still unfixed, then I get my money back.

So far this is just one user paying one developer to write a feature. Not especially exciting. There is some interesting market design work to be done here, though. How can the developer signal serious interest in working on the bug, and get enough upside to be meaningful, without taking too much risk in the event the fix is not accepted on time?

Arbitrage I post the same offer, but another user realizes that the blog project can only support emoji if the template package that it depends on supports them. That user becomes an arbitrageur: takes the “fixed” side of my offer, and the “unfixed” side of the “Add emoji support” bug in the template project.

As an end user, I don’t have to know the dependency relationship, and the market gives the arbitrageur an incentive to collect information about multiple dependent bugs into the best place to fix them.

Front-running Dudley Do-Right’s open source project has a bug in it, users are offering to buy the “unfixed” side of the contract in order to incentivize a fix, and a trader realizes that Dudley would be unlikely to let the bug go unfixed. The trader takes the “fixed” side of the contract before Dudley wakes up. The deal means that the market gets information on the likelihood of the bug being fixed, but the developer doing the work does not profit from it.

This is a “picking up nickels in front of a steamroller” trading strategy. The front-runner is accepting the risk of Dudley burning out, writing a long Medium piece on how open source is full of FAIL, and never fixing a bug again.

Front-running game theory could be interesting. If developers get sufficiently annoyed by front-running, they could delay fixing certain bugs until after the end of the relevant contracts. A credible threat to do this might make front-runners get out of their positions at a loss.

CVE prediction A user of a static analysis tool finds a suspicious pattern in a section of a codebase, but cannot identify a specific vulnerability. The user offers to take one side of a smart contract that will pay off if a vulnerability matching a certain pattern is found. A software maintainer or key user can take the other side of these contracts, to encourage researchers to disclose information and focus attention on specific areas of the codebase.

Security information leakage Ernie and Bert discover a software vulnerability. Bert sells it to foreign spies. Ernie wants to get a piece of the action, too, but doesn’t want Bert to know, so he trades on a relevant CVE prediction. Neither Bert nor the foreign spies know who is making the prediction, but the market movement gives white-hat researchers a clue on where the vulnerability can be found.

Open source metrics: Prices and volumes on bug futures could turn out to be a more credible signal of interest in a project than raw activity numbers. It may be worth using a bot to trade on a project you depend on, just to watch the market move. Likewise, new open source metrics could provide useful trading strategies. If sentiment analysis shows that a project is melting down, offer to take the “unfixed” side of the project’s long-running bugs? (Of course, this is the same market action that incentivizes fixes, so betting that a project will fail is the same thing as paying them not to. My brain hurts.)

What’s an “oracle”?

The “oracle” is the software component that moves information from the bug tracker to the smart contracts system. Every smart contract has to be tied to a given oracle that both sides trust to resolve it fairly.

For CVE prediction, the oracle is responsible for pattern matching on new CVEs, and feeding the info into the smart contract system. As with all of these, CVE prediction contracts are tied to a specific oracle.

Bots

Bots might have several roles.

  • Move investments out of duplicate bugs. (Take a “fixed” position in the original and an “unfixed” position in the duplicate, or vice versa.)

  • Make small investments in bugs that appear valid based on project history and interactions by trusted users.

  • Track activity across projects and social sites to identify qualified bug fixers who are unlikely to fix a bug within the time frame of a contract, and take “unfixed” positions on bugs relevant to them.

  • For companies: when a bug is mentioned in an internal customer support ticketing system, buy “unfixed” on that bug. Map confidential customer needs to possible fixers.

04 Jul 2017

more transactions from a future software market

Previously: Benkler’s Tripod, transactions from a future software market

Why would you want the added complexity of a market where anyone can take either side of a futures contract on the status of a software bug, and not just offer to pay people to fix bugs like a sensible person? IMHO it’s worth trying not just because of the promise of lower transaction costs and more market liquidity (handwave) but because it enables other kinds of transactions. A few more.

Partial work I want a feature, and buy the “unfixed” side of a contract that I expect to lose. A developer decides to fix it, does the work, and posts a pull request that would close the bug. But the maintainer is on vacation, leaving her pull request hanging with a long comment thread. Another developer is willing to take on the political risk of merging the work, and buys out the original developer’s position.

Prediction/incentivization With the right market design, a prediction that something won’t happen is the same as an incentive to make it happen. If we make an attractive enough way for users to hedge their exposure to lack of innovation, we create a pool of wealth that can be captured by innovators. (Related: dominant assurance contracts)
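
A rough sketch of why the prediction and the incentive are the same money. It assumes a simple two-sided contract where each unit pays out 1.0 in total, the "fixed" side pays `price_fixed` per unit, and the "unfixed" side pays the rest. The function name and the pricing convention are my assumptions, not a spec for any real market.

```python
def settle(price_fixed, units, bug_was_fixed):
    """Return the net gain for (fixed side, unfixed side) at maturity.

    The fixed side staked price_fixed per unit, the unfixed side staked
    (1 - price_fixed) per unit, and the winner collects both stakes.
    """
    fixed_stake = price_fixed * units
    unfixed_stake = (1 - price_fixed) * units
    if bug_was_fixed:
        # The hedger's stake becomes the fixer's reward.
        return unfixed_stake, -unfixed_stake
    # Otherwise the hedger is compensated for the missing fix.
    return -fixed_stake, fixed_stake


# A user who expects no fix stakes 75 on "unfixed"; a developer stakes 25 on "fixed".
print(settle(price_fixed=0.25, units=100, bug_was_fixed=True))
print(settle(price_fixed=0.25, units=100, bug_was_fixed=False))
```

The more that users are willing to stake on "unfixed", the bigger the payoff waiting for whoever takes the "fixed" side and then does the work.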

Bug triage Much valuable work on bugs is in the form of modifying metadata: assigning a bug to the correct subsystem, identifying dependency relationships, cleaning up spam, and moving invalid bugs into a support ticket tracker or forum. This work is hard to reward, and infamously hard to find volunteers for. An active futures market could include both bots that trade bugs probabilistically based on status and activity, and active bug triagers who make small market gains from modifying metadata in a way that makes bugs more likely to be resolved.

04 Jul 2017

Applying proposed principles for content blocking

(I work for Mozilla. None of this is secret. None of this is official Mozilla policy. Not speaking for Mozilla here.)

In 2015, Denelle Dixon at Mozilla wrote Proposed Principles for Content Blocking.

The principles are:

  • Content Neutrality: Content blocking software should focus on addressing potential user needs (such as on performance, security, and privacy) instead of blocking specific types of content (such as advertising).

  • Transparency & Control: The content blocking software should provide users with transparency and meaningful controls over the needs it is attempting to address.

  • Openness: Blocking should maintain a level playing field and should block under the same principles regardless of source of the content. Publishers and other content providers should be given ways to participate in an open Web ecosystem, instead of being placed in a permanent penalty box that closes off the Web to their products and services.

See also Nine Principles of Policing by Sir Robert Peel, who wrote,

[T]he police are the public and that the public are the police, the police being only members of the public who are paid to give full-time attention to duties which are incumbent on every citizen in the interests of community welfare and existence.

Web browser developers have similar responsibilities to those of Peel’s ideal police: to build a browser to carry out the user’s intent, or, when setting defaults, to understand widely held user norms and implement those, while giving users the affordances to change the defaults if they choose.

The question now is how to apply content blocking principles to today’s web environment. Some qualities of today’s situation are:

  • Tracking protection often doesn’t have to be perfect, because adfraud. The browser can provide some protection, and influence the market in a positive direction, just by getting legit users below the noise floor of fraudbots.

  • Tracking protection has the potential to intensify a fingerprinting arms race that’s already going on, by forcing more adtech to rely on fingerprinting in place of third-party cookies.

  • Fraud is bad, but not all anti-fraud is good. Anti-fraud technologies that track users can create the same security risks as other tracking—and enable adtech to keep promising real eyeballs on crappy sites. The “flight to quality” approach to anti-fraud does not share these problems.

  • Adtech and adfraud can peek at Mozilla’s homework, but Mozilla can’t see theirs. Open source projects must rely on unpredictable users, not unpredictable platform decisions, to create uncertainty.

Which suggests a few tactics—low-risk ways to apply content blocking principles to address today’s adtech/adfraud problems.

Empower WebExtensions developers and users. Much of the tracking protection and anti-fingerprinting magic in Firefox is hidden behind preferences. This makes a lot of sense because it enables developers to integrate their work into the browser in parallel with user testing, and enables Tor Browser to do less patching. IMHO this work is also important to enable users to choose their own balance between privacy/security and breaking legacy sites.

Inform and nudge users who express an interest in privacy. Some users care about privacy, but don’t have enough information about how protection choices match up with their expectations. If a user cares enough to turn on Do Not Track, change cookie settings, or install an ad blocker, then try suggesting a tracking protection setting or tool. Don’t assume that a user who has installed an ad blocker with deceptive privacy settings would not choose privacy if asked clearly.

Understand and report on adfraud. Adfraud is more than just fake impressions and clicks. New techniques include attribution fraud: taking advantage of tracking to connect a bogus ad impression to a real sale. The complexity of attribution models makes this hard to track down. (Criteo and Steelhouse settled a lawsuit about this before discovery could reveal much.)

A multi-billion-dollar industry is devoted to spreading a story that minimizes adfraud, while independent research hints at a complex and lucrative adfraud scene. Remember how there were two Methbot stories: Methbot got a bogus block of IP addresses, and Methbot circumvented some widely used anti-fraud scripts. The ad networks dealt with the first one pretty quickly, but the second is still a work in progress.

The more that Internet freedom lovers can help marketers understand adfraud, and related problems such as brand-unsafe ad placements, the more that the content blocking story can be about users, legit sites, and brands dealing with problem tracking, and not just privacy nerds against all web business.

03 Jul 2017

Software: annoying speech or crappy product?

Zeynep Tufekci, in the New York Times:

Since most software is sold with an “as is” license, meaning the company is not legally liable for any issues with it even on day one, it has not made much sense to spend the extra money and time required to make software more secure quickly.

The software business is still stuck on the kind of licensing that might have made sense in the 8-bit micro days, when “personal computer productivity” was more aspirational than a real thing, and software licenses were printed on the backs of floppy sleeves.

Today, software is part of products that do real stuff, and it makes zero sense to ship a real product, that people’s safety or security depends on, with the fine print “WE RESERVE THE RIGHT TO TOTALLY HALF-ASS OUR JOBS” or in business-speak, “SELLER DISCLAIMS THE IMPLIED WARRANTY OF MERCHANTABILITY.”

But what about open source and collaboration and science, and all that stuff? Software can be both “product” and “speech”. Should there be a warranty on speech? If I dig up my shell script for re-running the make command when a source file changes, and put it on the Internet, should I be putting a warranty on it?

It seems that there are two kinds of software: some is more product-like, and should have a grown-up warranty on it like a real business. And some software is more speech-like, and should have ethical requirements like a scientific paper, but not a product-like warranty.

What’s the dividing line? Some ideas.

“productware is shipped as executables, freespeechware is shipped as source code” Not going to work for elevator_controller.php or a home router security tool written in JavaScript.

“productware is preinstalled, freespeechware is downloaded separately” That doesn’t make sense when even implanted defibrillators can update over the net.

“productware is proprietary, freespeechware is open source” Companies could put all the fragile stuff in open source components, then use the DMCA and CFAA to enable them to treat the whole compilation as proprietary.

Software companies are built to be good at getting around rules. If a company can earn all its money in faraway Dutch Sandwich Land and be conveniently too broke to pay the IRS in the USA, then it’s going to be hard to make it grow up licensing-wise without hurting other people first.

How about splitting out the legal advantages that the government offers to software and extending some to productware, others to freespeechware?

Freespeechware licenses:

  • license may disclaim implied warranty

  • no anti-reverse-engineering clause in a freespeechware license is enforceable

  • freespeechware is not a “technological protection measure” under section 1201 of Title 17 of the United States Code (DMCA anticircumvention)

  • exploiting a flaw in freespeechware is never a violation of the Computer Fraud and Abuse Act

  • If the license allows it, a vendor may sell freespeechware, or a derivative work of it, as productware. (This could be as simple as following the “You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee.” term of the GPL.)

Productware licenses:

  • license may not disclaim implied warranty

  • licensor and licensee may agree to limit reverse engineering rights

  • DMCA and CFAA apply (reformed of course, but that’s another story)

It seems to me that there needs to be some kind of quid pro quo here. If a company that sells software wants to use government-granted legal powers to control its work, that has to be conditioned on not using those powers just to protect irresponsible releases.

23 Jun 2017

Fun with dlvr.it

Check it out—I’m “on Facebook” again. Just fixed my gateway through dlvr.it. If you’re reading this on Facebook, that’s why.

Dlvr.it is a nifty service that will post to social sites from an RSS feed. If you don’t run your own linklog feed, the good news is that Pocket will generate RSS feeds from the articles you save, so if you want to share links with people still on Facebook, the combination of Pocket and dlvr.it makes that easy to do without actually spending human eyeball time there.

There’s a story about Thomas Nelson, Jr., leader of the Virginia Militia in the Revolutionary War.

During the siege and battle Nelson led the Virginia Militia whom he had personally organized and supplied with his own funds. Legend had it that Nelson ordered his artillery to direct their fire on his own house which was occupied by Cornwallis, offering five guineas to the first man who hit the house.

Would Facebook’s owners do the same, now that we know that foreign interests use Facebook to subvert America? Probably not. The Nelson story is just an unconfirmed patriotic anecdote, and we can’t expect that kind of thing from today’s post-patriotic investor class. Anyway, just seeing if I can move Facebook’s bots/eyeballs ratio up a little.

22 Jun 2017

1. Write open source. 2. ??? 3. PROFIT

Studies keep showing that open source developers get paid more than people who develop software but do not contribute to open source.

Good recent piece: Tabs, spaces and your salary - how is it really? by Evelina Gabasova.

But why?

Is open source participation a way to signal that you have skills and are capable of cooperation with others?

Is open source a way to build connections and social capital so that you have more awareness of new job openings and can more easily move to a higher-paid position?

Does open source participation just increase your skills so that you do better work and get paid more for it?

Are open source codebases a complementary good to open source maintenance programming, so that a lower price for access to the codebase tends to drive up the price for maintenance programming labor?

Is “we hire open source people” just an excuse for bias, since the open source scene at least in the USA is less diverse than the general pool of programming job applicants?

22 Jun 2017

Stuff I’m thankful for

I’m thankful that the sewing machine was invented a long time ago, not today. If the sewing machine were invented today, most sewing tutorials would be twice as long, because all the thread would come in proprietary cartridges, and you would usually have to hack the cartridge to get the type of thread you need in a cartridge that works with your machine.

21 Jun 2017

Catching up to Safari?

Earlier this month, Apple Safari pulled ahead of other mainstream browsers in tracking protection. Tracking protection in the browser is no longer a question of should the browser do it, but which browser best protects its users. But Apple’s early lead doesn’t mean that another browser can’t catch up.

Tracking protection is still hard. You have to provide good protection from third-party tracking, which users generally don’t want, without breaking legit third-party services such as content delivery networks, single sign-on systems, and shopping carts. Protection is a balance, similar to the problem of filtering spam while delivering legit mail. Just as spam filtering helps enable legit email marketing, tracking protection tends to enable legit advertising that supports journalism and cultural works.

In the long run, just as we have seen with spam filters, it will be more important to make protection hard to predict than to run the perfect protection out of the box.

“Do not repeat the tactics which have gained you one victory, but let your methods be regulated by the infinite variety of circumstances.” (Sun Tzu)

A spam filter, or browser, that always does the same thing will be analyzed and worked around. A mail service that changes policies to respond to current spam runs, or an unpredictable ecosystem of tracking protection add-ons that browser users can install in unpredictable combinations, is likely to be harder to work around.

But most users aren’t in the habit of installing add-ons, so browsers will probably have to give them a nudge, like Microsoft Windows does when it nags the user to pick an antivirus package (or did, last time I checked). So the decentralized way to catch up to Apple could end up being something like:

  • When new tracking protection methods show up in the privacy literature, quietly build the needed browser add-on APIs to make it possible for new add-ons to implement them.

  • Do user research to guide the content and timing of nudges. (Some atypical users prefer to be tracked, and should be offered a chance to silence the warnings by affirmatively choosing a do-nothing protection option.)

  • Help users share information about the pros and cons of different tools. If a tool saves lots of bandwidth and battery life but breaks some site’s comment form, help the user make the right choice.

  • Sponsor innovation challenges to incentivize development, testing, and promotion of diverse tracking protection tools.

Any surveillance marketer can install and test a copy of Safari, but working around an explosion of tracking protection tools would be harder. How to set priorities when they don’t know which tools will get popular?

What about adfraud?

Tracking protection strategies have to take adfraud into account. Marketers have two choices for how to deal with adfraud:

  • flight to quality

  • extra surveillance

Flight to quality is better in the long run. But it’s a problem from the point of view of adtech intermediaries because it moves more ad money to high-reputation sites, and the whole point of adtech is to reach big-money eyeballs on cheap sites. Adtech firms would rather see surveillance-heavy responses to adfraud. One way to help shift marketing budgets away from surveillance, and toward flight to quality, is to make the returns on surveillance investments less predictable.

This is possible to do without making value judgments about certain kinds of sites. If you like a site enough to let it see your personal info, you should be able to do it, even if in my humble opinion it’s a crappy site. But you can have this option without extending to all crappy sites the confidence that they’ll be able to live on leaked data from unaware users.

06 Jun 2017

Apple user research revealed, sort of

This is not normally the blog to come to for Apple fan posts (my ThinkPad, desktop Linux, cold dead hands, and so on) but really good work here on “Intelligent Tracking Prevention” in Apple Safari.

Looks like the spawn of Privacy Badger and cookie double-keying, designed to balance user protection from surveillance marketing with minimal breakage of sites that depend on third-party resources.
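
For anyone who hasn’t run into cookie double-keying: the idea is that a third-party cookie is stored under both the site that set it and the top-level site the user is visiting, so the same third party sees a different cookie jar on each site. Here is a toy sketch. The class and method names are made up, and this is not how Safari implements Intelligent Tracking Prevention, just the general technique.

```python
class DoubleKeyedCookieJar:
    """Toy cookie store keyed by (top-level site, cookie site)."""

    def __init__(self):
        self._store = {}

    def set(self, top_level_site, cookie_site, name, value):
        # Each (first party, third party) pair gets its own partition.
        self._store.setdefault((top_level_site, cookie_site), {})[name] = value

    def get(self, top_level_site, cookie_site, name):
        # A cookie set under one top-level site is invisible under another.
        return self._store.get((top_level_site, cookie_site), {}).get(name)


jar = DoubleKeyedCookieJar()
jar.set("news.example", "tracker.example", "id", "abc123")
print(jar.get("news.example", "tracker.example", "id"))  # same top-level site
print(jar.get("shop.example", "tracker.example", "id"))  # different top-level site
```

A third-party widget that only needs state on one site keeps working, but the cookie can’t be used to recognize the same user across sites.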

(Now all the webmasters will fix stuff to make it work with Intelligent Tracking Prevention, which makes it easier for other browsers and privacy tools to justify their own features to protect users. Of course, now the surveillance marketers will rely more on passive fingerprinting, and Apple has an advantage there because there are fewer different Safari-capable devices. But browsers need to fix fingerprinting anyway.)

Apple does massive amounts of user research and it’s fun to watch the results leak through when they communicate about features. Looks like they have found that users care about being “followed” from site to site by ads, and that users are still pretty good at applied behavioral economics. The side effect of tracking protection, of course, is that it takes high-reputation sites out of competition with the bottom-feeders to reach their own audiences, so Intelligent Tracking Prevention is great news for publishers too.

Meanwhile, I don’t get Google’s weak “filter” thing. Looks like a transparently publisher-hostile move (since it blocks some potentially big-money ads without addressing the problem of site commodification), unless I’m missing something.

31 May 2017

The third connection in Benkler’s Tripod

Here’s a classic article by Yochai Benkler: Coase’s Penguin, or Linux and the Nature of the Firm.

Benkler builds on the work of Ronald Coase, whose The Nature of the Firm explains how transaction costs affect when companies can be more efficient ways to organize work than markets. Benkler adds a third organizational model, peer production. Peer production, commonly seen in open source projects, is good at matching creative people to rewarding problems.

As peer production relies on opening up access to resources for a relatively unbounded set of agents, freeing them to define and pursue an unbounded set of projects that are the best outcome of combining a particular individual or set of individuals with a particular set of resources, this open set of agents is likely to be more productive than the same set could have been if divided into bounded sets in firms.

Firms, markets, and peer production all have their advantages, and in the real world, most productive activity is mixed.

  • Managers in firms manage some production directly and trade in markets for other production. This connection in the firms/markets/peer production tripod is as old as firms.

  • The open source software business is the second connection. Managers in firms both manage software production directly and sponsor peer production projects, or manage employees who participate in projects.

But what about the third possible connection between legs of the tripod? Is it possible to make a direct connection between peer production and markets, one that doesn’t go through firms? And why would you want to connect peer production directly to markets in the first place? Not just because that’s where the money is, but because markets are a good tool for getting information out of people, and projects need information. (Save the whole Kooths et al. paper to read later. It is the best case against open source that I know of, with all the points that a serious open source proponent needs to be able to address.)

Stefan Kooths, Markus Langenfurth, and Nadine Kalwey wrote, in “Open-Source Software: An Economic Assessment” (PDF),

Developers lack key information due to the absence of pricing in open-source software. They do not have information concerning customers’ willingness to pay (= actual preferences), based on which production decisions would be made in the market process. Because of the absence of this information, supply does not automatically develop in line with the needs of the users, which may manifest itself as oversupply (excessive supply) or undersupply (excessive demand). Furthermore, the functional deficits in the software market also work their way up to the upstream factor markets (in particular, the labor market for developers) and–depending on the financing model of the open-source software development–to the downstream or parallel complementary markets (e.g., service markets) as well.

Because the open-source model at its core deliberately rejects the use of the market as a coordination mechanism and prevents the formation of price information, the above market functions cannot be satisfied by the open-source model. This results in a systematic disadvantage in the provision of software in the open-source model as compared to the proprietary production process.

The workaround is to connect peer production to markets by way of firms. But the more that connections between markets and peer production projects have to go through firms, the more chances to lose information. That’s not because firms are necessarily dysfunctional (although most are, in different ways). A firm might rationally choose to pay for the implementation of a feature that they predict will get 100 new users, paying $5000 each, instead of a feature that adds $1000 of value for 1000 existing users, but whose absence won’t stop them from renewing.
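
The arithmetic behind that example, spelled out. The numbers are the ones from the paragraph above. The assumption that the second feature brings in no new revenue comes from the “won’t stop them from renewing” condition, and the variable names are just for illustration.

```python
# Feature A: attracts new paying users.
new_users = 100
price_per_new_user = 5000
revenue_feature_a = new_users * price_per_new_user

# Feature B: valuable to existing users, who would renew anyway.
existing_users = 1000
value_per_existing_user = 1000
value_feature_b = existing_users * value_per_existing_user
revenue_feature_b = 0  # assumption: no renewal depends on this feature

print("Feature A revenue to the firm:", revenue_feature_a)
print("Feature B value to users:", value_feature_b)
print("Feature B revenue to the firm:", revenue_feature_b)
```

Feature B is worth twice as much to users as feature A brings in for the firm, but none of that information reaches the project as a price signal.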

Some ways to connect peer production to markets are already working. Crowdfunding for software projects and Patreon are furthest along, both offering support for developers who have already built a reputation.

Dominant assurance contracts, by Alexander Tabarrok, build on the crowdfunding model, with the extra twist that the person proposing the project has to put up some seed money that is divided among backers if the project fails to secure funding. This is supposed to bring in extra investment early on, before a project looks likely to meet its goal.
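
A minimal sketch of how settling a contract like this could work. The function name and return format are hypothetical, and I’m assuming the seed money is split equally among backers if the goal is missed. A real design might split it in proportion to pledges instead.

```python
def settle_dominant_assurance(goal, pledges, seed):
    """Settle a crowdfunding round with proposer-supplied seed money.

    pledges maps backer name to pledged amount. Returns whether the project
    was funded, how much the project collects, and the bonus each backer
    receives on top of a full refund if the goal was missed.
    """
    total = sum(pledges.values())
    if total >= goal:
        # Goal met: pledges are collected and the proposer keeps the seed.
        return {"funded": True, "collected": total,
                "bonus": {name: 0.0 for name in pledges}}
    # Goal missed: pledges are refunded and the seed is divided among backers.
    share = seed / len(pledges) if pledges else 0.0
    return {"funded": False, "collected": 0,
            "bonus": {name: share for name in pledges}}


print(settle_dominant_assurance(1000, {"alice": 300, "bob": 200}, seed=50))
print(settle_dominant_assurance(1000, {"alice": 700, "bob": 300}, seed=50))
```

Either way an early backer comes out no worse than even, which is the point: pledging early is a safe move even when the project still looks like a long shot.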

Tom W. Bell’s “SPEX”, in Prediction Markets for Promoting the Progress of Sciences and the Useful Arts, is a proposed market to facilitate transactions in a variety of prediction certificates, each one of which promises to pay its bearer in the event that an associated claim about science, technology, or public policy comes true. The SPEX looks promising as a way for investors to hedge their exposure to lack of innovation. If you own data centers and need energy, take a short position in SPEX contracts on cold fusion. (Or, more likely, buy into a SPEX fund that invests for your industry.) The SPEX looks like a way to connect the market to more difficult problems than the kinds of incremental innovation that tend to be funded through the VC system.

What happens when the software industry is forced to grow up?

I’m starting to think that finishing the tripod, with better links from markets to peer production, is going to matter a lot more soon, because of the software quality problem.

Today’s software, both proprietary and open source, is distributed under ¯\_(ツ)_/¯ terms. “Disclaimer of implied warranty of merchantability” is lawyer-speak for “we reserve the right to half-ass our jobs lol.” As Zeynep Tufekci wrote in the New York Times, “The World Is Getting Hacked. Why Don’t We Do More to Stop It?” At some point the users are going to get fed up, and we’re going to have to. An industry as large and wealthy as software, still sticking to Homebrew Computer Club-era disclaimers, is like a 40-something-year-old startup bro doing crimes and claiming that they’re just boyish hijinks. This whole disclaimer of implied warranty thing is making us look stupid, people. (No, I’m not for warranties on software that counts as a scientific or technical communication, or on bona fide collaborative development, but on a product product? Come on.)

Grown-up software liability policy is coming, but we’re not ready for it. Quality software is not just a technically hard problem. Today, we’re set up to move fast, break things, and ship dancing pigs—with incentives more powerful than incentives to build secure software. Yes, you get the occasional DARPA initiative or tool to facilitate incremental cleanup, but most software is incentivized through too many layers of principal-agent problems. Everything is broken.

If governments try to fix software liability before the software scene can fix the incentives problem, then we will end up with a stifled, slowed-down software scene, a few incumbent software companies living on regulatory capture, and probably not much real security benefit for users. But what if users (directly or through their insurance companies) are willing to pay to avoid the costs of broken software, in markets, and open source developers are willing to participate in peer production to make quality software, but software firms are not set up to connect them?

What if there is another way to connect the “I would rather pay a little more and not get h@x0r3d!” demand to the “I would code that right and release it in open source, if someone would pay for it” supply?

30 May 2017

User tracking as Chesterton’s Fence

G.K. Chesterton once wrote:

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

Bob Hoffman makes a good case for getting rid of user tracking in web advertising. But in order to take the next steps, and not just talk among ourselves about things that would be really great in the future, we first need to think about the needs that tracking seems to satisfy for legit marketers.

What I’m not going to do is pull out the argument that’s in every first comment on every blog post that criticizes tracking: that “adtech” is just technology and is somehow value-neutral. Tracking, like all technologies, enables some kinds of activity better than others. When tracking offers marketers the opportunity to reach users based on who the user is rather than on what they’re reading, watching, or listening to, then that means:

But if tracking is so bad, then why, when you go to any message board or Q&A site that discusses marketing for small businesses, is everyone discussing those nasty, potentially civilization-extinguishing targeted ads? Why is nobody popping up with a question on how to make the next They Laughed When I Sat Down At the Piano?

  • Targeted ads are self-serve and easy to get started with. If you have never bought a Twitter or Facebook ad, get out your credit card and start a stopwatch. These ads might be crappy, but they have the lowest time investment of any legit marketing project, so they are probably the only marketing project that time-crunched startups can do.

  • Targeted ads keep your OODA loop tight. Yes, running targeted ads can be addictive. If you thought the attention slot machine game on social sites was bad, try the advertiser dashboard. But you’re able to use them to learn information that can help with the rest of marketing. If you have the budget to exhibit at one conference, compare Twitter ads targeted to attendees of conference A with ads targeted to attendees of conference B, and you’re closer to an answer.

  • Marketing has two jobs: sell stuff to customers and sell Marketing to management. Targeting is great for the second one, since it comes with the numbers that will help you take credit for results.

We’re not going to be able to get rid of risky tracking until we can understand the needs that it fills, not just for big advertisers who can afford the time and money to show up in Cannes every year, but for the company founder who still has $1.99 business cards and is doing all of Marketing themselves.

(The party line among web privacy people can’t just be that GDPR is going to save us because the French powers that be are all emmerdés ever since the surveillance/shitlord complex tried to run a US-style game on their political system. That might sound nice, but put not your trust in princes, man. Even the most arrogant Eurocrats in the world will not be able to regulate indefinitely against all the legit business people in their countries complaining that they can’t do something they see as essential. GDPR will be temporary air cover for building an alternative, not a fix in itself.)

Post-creepy web advertising is still missing some key features.

  • Branding and signaling metrics. We know the hard math works out against tracking and targeting, and we know about the failure of targeted media to build brands in the long run, but we don’t have good numbers that are usable day to day. The “customer journey” has nice graphs, but brand equity doesn’t.

  • Quick, low-risk service. With the exception of the Project Wonderful model, targeted ads are quick and low-risk, while signal-carrying ads are the opposite. A high-overhead direct ad sales process is not a drop-in replacement for an easy web form.

I don’t think that’s all of them. But I don’t think that the move to post-creepy web advertising is going to be a rush, all at once, either. Brands that have fly-by-night low-reputation competitors, brands that already have many tracking-protected customers, and brands with solid email lists are going to be able to move faster than marketers who are still making tracking work. More: Work together to fix web ads? Let’s not.

28 May 2017

sudo dnf install mosh

I’m still two steps behind in devops coolness for my network stuff. I don’t even have proper configuration management, and that’s fine because Configuration Management is an Anti-pattern now. Anyway, I still log in and actually run shell commands on the server, and the LWN review of mosh was helpful to me. Now using mosh for connections that persist across suspending the laptop and moving it from network to network. More info: Mosh: the mobile shell

free riding on open source

Here’s a good Twitter thread on open source projects and “free rider” companies. As far as I can tell, companies can pay for open source in three ways.

  • do software development

  • pay people to do software development

  • write a long Medium post apologizing to your users for failing

end date for IP Maximalism

When did serious “Intellectual Property Maximalism” end? I’m going to put it at September 18, 2006, which is the date that the Gates Foundation announced funding for the Public Library of Science’s journal PLoS Neglected Tropical Diseases. When it’s a serious matter of people’s health, open access matters, even to the author of “Open Letter to Hobbyists”. Since then, IP Maximalism stories have been mostly about rent-seeking behavior, which had been a big part of the freedom lovers’ point all along. (Nobody quoted in this story is pearl-clutching about “innovation”, for example: Supreme Court ruling threatens to shut down cottage industry for small East Texas town.)

random stuff

Just Keep Scrolling! How To Design Lengthy, Lengthy Pages is “sponsored content” but it’s really good sponsored content.

The marketplace of ideas is now struggling with the increasing incidence of algorithmic manipulation and disinformation campaigns. There are bots. Look around.

(In other news, Facebook is still evil, but you probably knew that by now: Why Facebook's Authentication Model is Inadequate, Does Facebook Make Us Unhappy and Unhealthy?)

27 May 2017

Some questions on a screenshot

Here’s a screenshot of an editorial from Der Spiegel, with Ghostery turned on.

article from Der Spiegel

Is it just me, or does it look to anyone else like the man in the photo is checking the list of third-party web trackers on the site to see who he can send a National Security Letter to?

Could a US president who is untrustworthy enough to be removed from office possibly be trustworthy enough to comply with his side of a “Privacy Shield” agreement?

If it’s necessary for the rest of the world to free itself of its dependence on the U.S., does that apply to US-based Internet companies that have become a bottleneck for news site ad revenue, and how is that going to work?

Update for 2024

trying to think about European tech policy in context

19 May 2017

What happened to Twitter? We can’t look away…

Hey, everybody, check it out.

Here’s a Twitter ad.

Twitter ad with a celebrity photo and the headline, What happened to Jenna Jameson? We Can’t Look Away…The New Look is Mesmerizing

If you’re “verified” on Twitter, you probably miss these, so I’ll just use my Fair Use rights to share that one with you.

You’re welcome.

Twitter is a uniquely influential medium, one that shows up on the TV news every night and on news sites all day. But somehow, the plan to make money from Twitter is to run the same kind of crappy ads that anyone can run. And the latest Twitter news is a privacy update that includes, among other things, more tracking of users from one site to another. Yes, the same kind of thing that Facebook already does, and better, with more users. And the same kind of thing that any web site can already get from an entire Lumascape of companies. Boring.

If you want to stick this kind of ad on your WordPress site, you just have to cut and paste some ad network HTML—not build out a deluxe office space on Market Street in San Francisco the way Twitter has. But the result is about the same.

What makes Twitter even more facepalm-worthy is that they make a point of not showing the ads to the influential people who draw attention to Twitter to start with. It’s like they’re posting a big sign that says STUPID AD ZONE: UNIMPORTANT PEOPLE ONLY. Twitter is building something unique, but they’re selling generic impressions that advertisers can get anywhere. So as far as I can tell, the Twitter business model is something like:

Money out: build something unique and expensive.

Money in: sell the most generic and shitty thing in the world.

Facebook can make this work because they have a lot of eyeball-minutes and a lot of sources of personal info (pixels, SDKs, server-to-server tracking…). But Facebook is an outlier in that area, and there aren’t enough minutes in the day for another. So Twitter is on track to get sold for $500,000, like Digg was. Which is good news for me because I know enough Twitter users that I can get that kind of money together.

So why should you help me buy Twitter when you could just get the $500,000 yourself? Because I have a secret plan, of course. Twitter is the site that everyone is talking about, right? So run the ads that people will talk about. Here’s the plan.

Sell one ad per day. And everybody sees the same one.

Sort of like the back cover of the magazine that everybody in the world reads (but there is no such magazine, which is why this is an opportunity). No more need to excuse the verified users from the ads. Yes, an advertiser will have to provide a variety of sizes and localizations for each ad (and yes, Twitter will have to check that the translations match). But it’s the same essential ad, shown to every Twitter user in the world for 24 hours.

No point trying to out-Facebook Facebook or out-Lumascape the Lumascape. Surveillance ads are high in fraud and weak on signal, and a bunch of other companies are doing them more cost-effectively and at higher volume, anyway.

Of course, this is not for everybody. It’s for brands that want to use a memorable, creative ad to try for the same kind of global signal boost that a good Tweet® can get. But if you want generic targeted ads you can get those everywhere else on the Internet. Where else can you get signal? In order to beat current Twitter revenue, the One Twitter Ad needs to go for about the same price as a Super Bowl commercial. But if Twitter stays influential, that’s reasonable, and I make back the 500 grand and a lot more.

02 May 2017

Understanding the limitations of data pollution tools

Jeremy Gillula and Yomna Nasser write, on the EFF blog,

Internet users have been asking what they can do to protect their own data from this creepy, non-consensual tracking by Internet providers—for example, directing their Internet traffic through a VPN or Tor. One idea to combat this that’s recently gotten a lot of traction among privacy-conscious users is data pollution tools: software that fills your browsing history with visits to random websites in order to add “noise” to the browsing data that your Internet provider is collecting.

[T]here are currently too many limitations and too many unknowns to be able to confirm that data pollution is an effective strategy at protecting one’s privacy. We’d love to eventually be proven wrong, but for now, we simply cannot recommend these tools as an effective method for protecting your privacy.

This is one of those “two problems one solution” situations.

  • The problem for makers and users of “data pollution” or spoofing tools is QA. How do you know that your tool is working? Or are surveillance marketers just filtering out the impressions created by the tool, on the server side?

  • The problem for companies using so-called Non-Human Traffic (NHT) is that when users discover NHT software (bots), the users tend to remove it. What would make users choose to participate in NHT schemes so that the NHT software can run for longer and build up more valuable profiles?

So what if the makers of spoofing tools could get a live QA metric, and NHT software maintainers could give users an incentive to install and use their software?

NHT market as a tool for discovering information

Imagine a spoofing tool that offers an easy way to buy bot pageviews, I mean buy Perfectly Legitimate Data on how fast a site loads from various home Internet connections. When the tool connects to its server for an update, it gets a list of URLs to visit—a mix of random sites, popular sites, and paying customers.

Now the spoofing tool maintainer will be able to tell right away if the tool is really generating realistic traffic, by looking at the market price of pageviews. The maintainer will even be able to tell whose tracking the tool can beat, by looking at which third-party resources are included on the pages getting paid-for traffic.

The money probably won’t be significant, since real web ad money is moving to whitelisted, legit sites and away from fraud-susceptible schemes anyway, but in the meantime it’s a way to measure effectiveness.

22 Apr 2017

NPM without sudo

Setting up a couple of Linux systems to work with FilterBubbler, which is one of the things that I’m up to at work now. FilterBubbler is a WebExtension, and the setup instructions use web-ext, so I need NPM. In order to keep all the NPM stuff under my own home directory, but still put the web-ext tool on my $PATH, I need to make one-line edits to three files.

One line in ~/.npmrc

prefix = ~/.npm

One line in ~/.gitignore

.npm/

One line in ~/.bashrc

export PATH="$PATH:$HOME/.npm/bin"

(My ~/.bashrc has a bunch of export PATH= lines so that when I add or remove one it’s more likely to get a clean merge. Because home directory in git.) I think that’s it. Now I can do

npm install --global web-ext

with no sudo or mess. And when I clone my home directory on another system it will just work.

Based on: HowTo: npm global install without root privileges by Johannes Klose

15 Apr 2017

Traffic sourcing web obfuscator?

(This is an answer to a question on Twitter. Twitter is the new blog comments (for now) and I’m more likely to see comments there than to have time to set up and moderate comments here.)

Adfraud is an easy way to make mad cash, adtech is happily supporting it, and it all works because the system has enough layers between CMO and fraud hacker that everybody can stay as clean as they need to. Users bear the privacy risks of adfraud, legit publishers pay for it, and adtech makes more money from adfraud than fraud hackers do. Adtech doesn’t have to communicate or coordinate with adfraud, just set up a fraud-friendly system and let the actual fraud hackers go to work. Bad for users, people who make legit sites, and civilization in general.

But one piece of good news is that adfraud can change quickly. Adfraud hackers don’t have time to get stuck in conventional ways of doing things, because adfraud is so lucrative that the high-skill players don’t have to stay in it for very long. The adfraud hackers who were most active last fall have retired to run their resorts or recording studios or wineries or whatever.

So how can privacy tools get a piece of the action?

One random idea is for an obfuscation tool to participate in the market for so-called sourced traffic. Fraud hackers need real-looking traffic and are willing to pay for it. Supplying that traffic is sketchy but legal. Which is perfect, because put one more layer on top of it and it’s not even sketchy.

And who needs to know if they’re doing a good job at generating real-looking traffic? Obfuscation tool maintainers. Even if you write a great obfuscation tool, you never really know if your tricks for helping users beat surveillance are actually working, or if your tool’s traffic is getting quietly identified on the server side.

In proposed new privacy tool model, outsourced QA pays YOU!

Set up a market where a Perfectly Legitimate Site that is looking for sourced traffic can go to buy pageviews, I mean buy Perfectly Legitimate Data on how fast a site loads from various home Internet connections. When the obfuscation tool connects to its server for an update, it gets a list of URLs to visit: a mix of random sites, popular sites, and paying customers.

Set a minimum price for pageviews that’s high enough to make it cost-ineffective for DDoS. Don’t allow it to be used on random sites, only those that the buyer controls. Make them put a secret in an unlinked-to URL or something. And if an obfuscation tool isn’t well enough sandboxed to visit a site that’s doing traffic sourcing, it isn’t well enough sandboxed to surf the web unsupervised at all.
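Those two rules, a price floor and proof that the buyer controls the site, could be checked with something like the sketch below. None of this is an existing system: MIN_PRICE_PER_VIEW, the order fields, and the idea of comparing a fetched secret are all assumptions made up for illustration, and the actual fetch of the secret URL is left to the caller so the example runs offline.

```javascript
// Hypothetical price floor, meant to be high enough to make DDoS by pageview
// cost-ineffective. The real number would have to come from the market.
const MIN_PRICE_PER_VIEW = 0.05;

// order: { pricePerView, targetUrl, secretUrl, secret } (hypothetical fields)
// fetchedBody: whatever the market operator retrieved from order.secretUrl
function acceptOrder(order, fetchedBody) {
  // Rule 1: reject bids under the minimum price.
  if (order.pricePerView < MIN_PRICE_PER_VIEW) {
    return false;
  }
  // Rule 2: the unlinked-to secret URL must be on the same host as the
  // page that will receive the traffic.
  const target = new URL(order.targetUrl);
  const proof = new URL(order.secretUrl);
  if (target.host !== proof.host) {
    return false;
  }
  // The buyer proves control by serving the agreed secret at that URL.
  return fetchedBody.trim() === order.secret;
}
```

An order that passes both checks would go on the list of URLs that the tool picks up on its next update.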

Now the obfuscation tool maintainer will be able to tell right away if the tool is really generating realistic traffic, by looking at the market price. The maintainer will even be able to tell whose tracking the tool can beat, by looking at which third-party resources are included on the pages getting paid-for traffic. And the whole thing can be done by stringing together stuff that IAB members are already doing, so they would look foolish to complain about it.

13 Apr 2017

Interesting stuff on the Internet

Just some mindless link propagation while I tweak the links on my blog to the right shade of blue.

Good news: Portugal Pushes Law To Partially Ban DRM, Allow Circumvention

Study finds Pokémon Go players are happier and The More You Use Facebook, the Worse You Feel. Get your phone charged up, get off Facebook, and get out there.

If corporations are people, you wouldn’t be mean to a person, would you? Managing for the Long Term

Yay, surprise presents for Future Me! Why Kickstarter Decided To Radically Transform Its Business Model

Skateboarding obviously doesn’t cause hip fractures, because the age groups least likely to skateboard break their hips the most! Something is breaking American politics, but it's not social media

From Spocko, pioneer of Internet brand safety campaigns: Values: Brand, Corporate & Bill O’Reilly’s

In Spite of People Having Meetings, Bears Still Shit in the Woods: In Spite Of The Crackdown, Fake News Publishers Are Still Earning Money From Major Ad Networks

There’s another dead bishop on the landing. Alabama Senate OK's church police bill

Productivity is awesome: How to Avoid Distractions and Finish What You

Computer Science FTW: Corrode update: control flow translation correctness

More good news: Kentucky Coal Mining Museum converts to solar power

This is going to be…fun. Goldman Sachs: VC Dry Powder Hits Record Highs

If you want to prep for a developer job interview, here’s some good info: Hexing the technical interview

08 Apr 2017

Bunny: Internet famous?

bunny

I bought this ceramic bunny at a store on Park Street in Alameda, California. Somehow I think I have seen it before.

04 Apr 2017

Memo to self: make dentist appointment

(Hey, I said this was a personal blog.)

But I was just thinking—people started adding lots of refined sugar to their diets long before anybody discovered how dental caries works.

And today we have Internet distractions, and surveillance marketing, doing to our brains what sugar did to people’s teeth.

And people have both sugar and teeth today. Dental hygiene is awesome: it’s a set of norms, technologies, and habits, grounded in scientific understanding. Mental hygiene is just getting started.

The sugar industry moved faster to start with, but people agree that teeth matter. So do brains.

03 Apr 2017

Confusion about why we call adtech adtech

If you want people on the Internet to argue with you, say that you’re making a statement about values.

If you want people to negotiate with you, say that you’re making a statement about business.

If you want people to accept that something is inevitable, say that you’re making a statement about technology.

The mixup between values arguments, business arguments, and technology arguments might be why people are confused about Brands need to fire adtech by Doc Searls.

The set of trends that people call adtech is a values-driven business transformation that is trying to label itself as a technological transformation.

Some of the implementation involves technological changes (NoSQL databases! Nifty!) but fundamentally adtech is about changing how media business is done. Adtech does have a set of values, none of which are really commonly held even among people in the marketing or advertising field, but let’s not make the mistake of turning this into either an argument about values (that never accomplishes anything) or a set of statements about technology (that puts those with an inside POV on current technology at an unnecessary advantage). Instead, let’s look at the business positions that adtech is taking.

  • Adtech stands for profitable platforms, with commodity producers of news and cultural works. Michael Tiffany, CEO of advertising security firm White Ops, said, “The fundamental value proposition of these ad tech companies who are de-anonymizing the Internet is, Why spend big CPMs on branded sites when I can get them on no-name sites?” This is not a healthy situation, but it’s a chosen path, not a technologically inevitable one.

  • Adtech stands for the needs of low-reputation sellers over the needs of high-reputation sellers. High-reputation and low-reputation brands need different qualities from an ad medium and adtech has to under-serve the high-reputation ones. Again, not technologically inevitable, but a business position that high-reputation brands and their agencies don’t have to accept.

  • Adtech stands for making advertisers support criminal and politically heinous activity. I’ll just let Bob Hoffman explain that one. Fraudulent and brand-unsafe content is just the overspray of the high value platforms/commoditized content system, and advertisers have to accept it in order to power that system. Or do they?

People have a lot of interesting decisions to make: policy, contractual, infrastructural, and client-side. When we treat the adtech movement as simply technology, we take the risk of missing great opportunities to negotiate for the benefit of brands, publishers, and the audience.

01 Apr 2017

Welcome RSS users

Welcome RSS users.

I am setting up a redirect from my old feed to the new one.

You might see a few old entries.

This new blog has better CSS for reading on small screens and has a Let’s Encrypt certificate.

More about how this site is built: colophon

26 Mar 2017

Welcome. How is everyone’s tracking protection working?

This is a brand new blog, so I’m setting up the basics. I just realized that I got the whole thing working without a single script, image, or HTML table. (These kids today have it easy, with their media queries and CSS Grid and stuff.)

One big question that I’m wondering about is: how many of the people who visit here are using some kind of protection from third-party tracking? Third-party tracking has been an unfixed vulnerability in web browsers for a long time. Check out the Unofficial Cookie FAQ from 1997. Third-party cookies are in there…and we’re still dealing with the third-party tracking problem?

In order to see how bad the problem is on this site, I’m going to set up a little bit of first-party data collection to measure people’s vulnerability to third-party data collection.

The three parts of that big question are:

  • Does first-party JavaScript load and run?

  • Does third-party JavaScript (from a site on popular filter lists) load and run?

  • Can a third-party tracker see state from other sites?

This will be easy to do with a little single-pixel image and the Aloodo tracking detection script.

Step two: write the first-party script

The local script is simple: /code/check3p.js

All it does is swap out the tracking image source three times.

  • When the script runs, to check that this is a browser with JavaScript on.

  • When the Aloodo tracking script runs, to check if this browser is blocking the script from loading.

  • When the Aloodo script confirms that tracking is possible.

The work is done in the setupAloodo function, which runs after the page loads. First, it sets the src for the tracking pixel to js.png, then sets up two callbacks: one to run after the Aloodo script is loaded, and switch the image to ld.png, and one to run if the script can track the user, and switch the image to td.png.
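Here is a rough sketch of that three-stage swap, not the actual contents of check3p.js. The image names come from the description above, but the `tracker` object and its `onLoad` and `onDetected` methods are hypothetical stand-ins for whatever the Aloodo script really exposes, and `pixel` can be any object with a `src` property, so the logic can be tried outside a browser.

```javascript
// pixel: an <img> element in the browser, or any object with a src property.
// tracker: stand-in for the third-party detection script. The method names
// onLoad and onDetected are assumptions for this sketch.
function setupPixel(pixel, tracker) {
  // Stage 1: first-party JavaScript loaded and ran.
  pixel.src = 'js.png';
  if (!tracker) {
    return; // third-party script never loaded, so there is nothing to hook
  }
  // Stage 2: the third-party script loaded.
  tracker.onLoad(function () {
    pixel.src = 'ld.png';
  });
  // Stage 3: the third-party script confirmed that tracking is possible.
  tracker.onDetected(function () {
    pixel.src = 'td.png';
  });
}

// Fake tracker for testing: stores the callbacks and fires them on demand.
function fakeTracker() {
  const handlers = {};
  return {
    onLoad: function (fn) { handlers.load = fn; },
    onDetected: function (fn) { handlers.detected = fn; },
    fire: function (name) { if (handlers[name]) { handlers[name](); } }
  };
}
```

Each assignment to `src` makes the browser request a different image, which is what leaves a trace in the server log.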

Step three: check the logs

Now I can use the regular server logs to compare the number of clients that load the original image, and the JavaScript-switched one, to the number that load the two tracking images.
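A minimal sketch of that comparison, assuming the server writes Common or Combined Log Format (client address first, request line in double quotes). The file name orig.png for the original image is made up for the example, and counting distinct client addresses is only a rough proxy for counting browsers.

```javascript
// Count distinct client addresses that requested each image.
// logLines: array of access-log lines; images: file names to look for.
function countClientsByImage(logLines, images) {
  const clients = new Map(images.map((name) => [name, new Set()]));
  // Capture the client address and the requested path from a GET request.
  const requestPattern = /^(\S+) .*?"GET (\S+) HTTP\/[\d.]+"/;
  for (const line of logLines) {
    const match = requestPattern.exec(line);
    if (!match) {
      continue; // not a GET request line we understand
    }
    const client = match[1];
    // Drop any query string and directory part to get the bare file name.
    const file = match[2].split('?')[0].split('/').pop();
    if (clients.has(file)) {
      clients.get(file).add(client);
    }
  }
  const counts = {};
  for (const [name, set] of clients) {
    counts[name] = set.size;
  }
  return counts;
}
```

Calling `countClientsByImage(lines, ['orig.png', 'js.png', 'ld.png', 'td.png'])` gives one count per stage, so the drop-off from each stage to the next is easy to read.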

(There are two different tracking callbacks because of the details of how Aloodo has to detect Privacy Badger, among other things. Not all tracking protection works the same.)

I’ll run some reports on the logs and post again about the results. (If you want to see your own results in the meantime, you can take a tracking protection test.)

14 Mar 2017

Am I metal yet?

(Update: this is about the old version of this blog. New version is covered at: colophon)

This is a blog. Started out with A Beginner’s Guide to Crafting a Blog with Metalsmith by Parimal Satyal, but added some other stuff.

Metalsmith is pretty fun. The basic pipeline from the article seems to work pretty well, but I ran into a couple of issues. I might have solved these in ways that are completely wrong, but here’s what works for me.

First, I needed to figure out how to get text from an earlier stage of the pipeline. My Metalsmith build is pretty basic:

  1. turn Markdown into HTML (plus article metadata stored with it, wrapped up in a JavaScript object)

  2. apply a template to turn the HTML version into a complete page.

That’s great, but the problem seems to be with getting a copy of just the HTML from step 1 for building the index page and the RSS feed. I don’t want the entire HTML page from step 2, just the inner HTML from step 1.

The solution seems to be metalsmith-untemplatize. This doesn’t actually strip off the template, just lets you capture an extra copy of the HTML before templatization. This goes into the pipeline after “markdown” but before the “layouts” step.

.use(untemplatize({ key: 'bodycopy' }))
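To show why the position in the pipeline matters, here is a toy model of the build. It does not use Metalsmith or any of its plugins. The function names fakeMarkdown, captureCopy, and fakeLayout are made up for the example, and the only thing it has in common with the real build is the idea of steps that modify a shared object of files.

```javascript
// Each step takes a files object (file name -> { contents, ...metadata })
// and modifies it in place.

// Stand-in for the Markdown step: wrap the text in a paragraph.
function fakeMarkdown(files) {
  for (const file of Object.values(files)) {
    file.contents = '<p>' + file.contents + '</p>';
  }
}

// Stand-in for the capture step: save a copy of contents under `key`.
function captureCopy(key) {
  return function (files) {
    for (const file of Object.values(files)) {
      file[key] = file.contents;
    }
  };
}

// Stand-in for the template step: wrap the inner HTML in a full page.
function fakeLayout(files) {
  for (const file of Object.values(files)) {
    file.contents = '<html><body>' + file.contents + '</body></html>';
  }
}

function runPipeline(files, steps) {
  for (const step of steps) {
    step(files);
  }
  return files;
}

const built = runPipeline(
  { 'post.md': { contents: 'Hello', title: 'First post' } },
  [fakeMarkdown, captureCopy('bodycopy'), fakeLayout]
);
// built['post.md'].contents is the complete page;
// built['post.md'].bodycopy is just the inner HTML, for the index and feed.
```

If the capture step ran after the layout step instead, bodycopy would hold the whole templated page, which is the thing I was trying to avoid.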

I also ran into the “Repeat runs with collections adds duplicates” issue. Strange to see the same blog items come up twice on the index page. The link on that bug page from Spacedawwwg goes to his fork of metalsmith-collections that seems to do the right thing.

Webfonts

16 Feb 2016

World’s Simplest Privacy Tool

Here’s the world’s simplest Firefox add-on, which just turns on Tracking Protection (ordinarily buried somewhere in about:config) and sets third-party cookie policy to a sane value.

install pq from addons.mozilla.org

So far it has 15 users and one review – five stars. It doesn’t do much, or for very many people, but what it does do it does with five-star quality.

Bonus link: How do I turn on Tracking Protection? Let me count the ways.

22 Oct 2015

MSIE on Fedora with virt-manager

Internet meetings are a pain in the behind. (Clearly online meeting software is controlled by the fossil fuel industry, and designed to be just flaky enough to make people drive to work instead.)

Here’s a work in progress to get an MSIE VM running on Fedora. (Will edit as I check these steps a few times. Suggestions welcome.)

Download: Download virtual machines.

Untar the OVA

tar xvf IE10\ -\ Win8.ova

You should end up with a .vmdk file.

Convert the .vmdk to qcow2

qemu-img convert IE10\ -\ Win8-disk1.vmdk -O qcow2 msie.qcow2

Import the qcow2 file using virt-manager.

Select Browse, then Browse Local, then select the .qcow2 file.

That’s it. Now looking at a virtual MS-Windows guest that I can use for those troublesome web conferences, and for testing web sites under MSIE. (If you try the tracking test, it should take you to a protection page that prompts you to turn on the EasyPrivacy Tracking Protection List. That’s a quick and easy way to speed up your web browsing experience on MSIE.)

20 Jul 2015

Web Site User Agreement

Version 1.2: 20 July 2015

BY USING AND/OR VISITING THIS WEB SITE, YOU SIGNIFY YOUR ASSENT TO THESE TERMS AND CONDITIONS. IF YOU DO NOT AGREE TO ANY OF THESE TERMS, YOU SHOULD NOT USE THE WEB SITE. Your use of this web site, unless governed by a separate written agreement, is subject to the terms of this Agreement between you and Donald B. Marti (hereinafter “Site Owner”). Site Owner reserves the right to modify this Agreement without notice to you. The current version of this Agreement is available from a link on the web site home page. YOU AGREE THAT YOUR USE OF THIS WEB SITE SHALL BE AT YOUR SOLE RISK. THIS WEB SITE AND ITS CONTENT IS PROVIDED ON AN “AS IS” BASIS WITHOUT ANY WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION THE WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, AND FITNESS FOR PARTICULAR PURPOSE. “Web Site Software” means any software code, product, or work used in the creation, editing, storage, delivery, operation, or maintenance of this web site or its content, including but not limited to operating system, utilities, software development tools, graphic design and editing tools, audio recording, editing, and processing tools, language runtimes, and web server software. “Licensable” means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently acquired, any and all rights to make, have made, use, sell, have sold and offer to sell any Web Site Software. “Patent Claims” means any patent claim(s), now owned or hereafter acquired, including without limitation, method, process, and apparatus claims, in any patent Licensable by you or your employer. In consideration for your use of and access to this web site, you grant Site Owner a worldwide, sublicensable, non-exclusive license under Patent Claims infringed by the making, using or selling of the Web Site Software, for the duration of the patents. You represent that you have the authority to grant this license on behalf of your employer. 
This license shall survive termination of this Agreement. If you choose to terminate this Agreement, you may do so by ceasing to use this web site, destroying any and all copies of any content from this site, or data collected from this site (including but not limited to the output of any network client, scanning or diagnostic software) in your possession, and notifying Site Owner in writing.

13 Sep 2014

A fresh start for advertising and the web?

Is advertising ruining the web? Ethan Zuckerman writes,

I have come to believe that advertising is the original sin of the web. The fallen state of our Internet is a direct, if unintentional, consequence of choosing advertising as the default model to support online content and services.

Is the web ruining advertising? Bob Hoffman writes,

[T]he advertising industry has become the web’s lapdog – irresponsibly exaggerating the effectiveness of online advertising and social media, ignoring the abominable results of display advertising, glossing over the fraud and corruption, and becoming a de facto sales arm for the online ad industry.

Advertising can be a good thing. Some of my favorite cultural goods are leftovers paid for by advertising at its best. There should be a way to make advertising work for the web, the way it has worked for print magazines.

But Hoffman and Zuckerman are both right. Web advertising has failed. We’re throwing away most of the potential value of the web as an ad medium by failing to fix privacy bugs. Web ads today work more like email spam than like magazine ads. The quest for “relevance” not only makes targeted ads less valuable than untargeted ones, but also wastes most of what advertisers spend. Buy an ad on the web, and more of your money goes to intermediaries and fraud than to the content that helps your ad carry a signal.

From Zuckerman’s point of view, advertising is a problem, because advertising is full of creepy stuff. From Hoffman’s point of view, the web is a problem, because the web is full of creepy stuff. (Bonus link: Big Brother Has Arrived, and He’s Us)

So let’s re-introduce the web to advertising, only this time, let’s try it without the creepy stuff. Brand advertisers and web content people have a lot more in common than either one has with database marketing. There are a lot of great opportunities on the post-creepy web, but the first step is to get the right people talking.

08 Aug 2013

Automatically run make when a file changes

(Update 24 Mar 2024: restart in case of an error.)

Really simple: do a makewatch [target] to re-run make with the supplied [target] when any files relevant to that target change.

makewatch script

Andrew Cowie has written something similar. The main thing that this one does differently is to ask make which files matter to it, instead of doing an inotifywatch on the whole directory. Comments and suggestions welcome.

19 Feb 2013

The Hedge

(I wrote this story a while ago, but the news is catching up fast, so I'd better put it up now.)

"Look, Agent Bellamy, I appreciate you coming out, but it's three in the morning. Can we set up a time to discuss this tomorrow, and can your people check the house while I'm at the office?"

Jack Murphy was too tired to follow some involved technical discussion with the big Intellectual Property Enforcement Agency agent, who sat in Murphy's old steam-bent office chair, briefcase at his feet. Murphy, quickly dressed in chinos and Stanford Law sweatshirt, sat in his new ergonomic chair at a gleaming glass and metal desk. The desk looked out of place in the much-remodeled Maryland house that Linda had found when Murphy accepted the appointment in DC.

"I'm afraid it won't wait, sir," Bellamy said. "Sorry for the unannounced visit, but as you're probably aware, our agency tracks the Free Markets closely."

"The Free Markets? That underground money web site?"

"Yes, basically. Although it's not really a site, just a system for communicating and trading. That's what makes it so hard to shut down."

"Well, all I know is that if you jailbreak your computer you can get on anonymously and buy drugs or guns or whatever."

"That's right. Let me show you an example." Bellamy pulled a plastic IPEA evidence bag out of his briefcase. Inside was a heavy semiautomatic pistol. It was raw machined steel without the usual blued finish, and a blank slide where the manufacturer's name and serial number would be. The plastic bag, oily on the inside, looked like it was lined with little rainbows. "You can't make steel parts like this on a 3D printer, but you can make parts for a plastic machine that will cut aluminum. Then you can use aluminum parts to make machines that can cut steel. People trade machines, parts and weapons every step of the way. This one's complete, and it works. It was on its way to an underground gunsmith who puts a nice finish on them."

Murphy could see the shiny steel reflected in both of the room's immaculate black windows. "It's like Adam Smith's pin factory."

"Yes. And this piece could have come from any combination of thousands of basement workshops. It's completely untraceable, and infringes a zillion patents. These things are a headache for us, but that's not why I'm here."

Murphy leaned over the desk, and Bellamy continued.

"There's also an online scene called the prediction markets. Oh, hold on, sorry." Bellamy spoke quietly into his jacket cuff. The agents who had arrived with Bellamy were still doing some kind of security sweep of the house. Murphy was glad that Linda was away, dropping Jack Jr. off at college. Security stuff always put her on edge.

"All right. Prediction markets," Bellamy said. "If I want to bet on a football game, I can buy a prediction, say 'Eagles win on Sunday.' If they win, after the game the prediction expires and I get a dollar."

"Sounds like just online gambling. They're just saying 'prediction' instead of 'bet.'" Murphy yawned and shook his head to try to clear it.

"Yes, it's like an ordinary bet in a lot of ways. If the Eagles lose, my prediction expires worthless. Just like losing a bet. But those predictions trade up and down, like stocks and bonds, right up until the end of the game."

"And they're untaxed and anonymous."

"Right. And there are other predictions I could make. I could buy a prediction on 'Jack Murphy dead before October 14th'." And if, for whatever reason, you're no longer with us that day, I make a dollar."

"So is that how the assassination market works? Someone just makes a bet that somebody else will be dead?"

"That's one side of the deal. That's the bet that the assassin makes. Someone else has to take the other side of the bet, and lose. If you want somebody dead, you just place a bet that they'll be alive. You lose your bet, but they get taken care of."

One of the agents who had come in with Bellamy was standing in the office door. His light blue gloves and shoe covers didn't go with his dark blue suit. He was holding Murphy's laptop computer, with Murphy's mobile phone and charger on top.

"We're going to need to check those in the van," Bellamy said. "We'll have them back in ten minutes."

Murphy nodded and the agent turned and left. Bellamy had introduced him but Murphy was too tired to remember the name.

"So the original client, or whatever you want to call him, makes a bet, and loses, and the assassin wins, and that's how the assassin gets paid. But you said a dollar. Nobody's going to murder someone for a dollar."

"Right. There has to be some volume in the market for it to be a significant risk. A lot of people have to be willing to buy those predictions of 'Jack Murphy alive.' and lose the money."

"So how is my stock doing?" Murphy knew that DC was still chattering about the news of his surprise appointment. The Secretary was an old colleague from think tank days, but nobody expected that the President would go along with bringing Murphy in. The President was too good a politician not to have his own person in every department's number two spot.

"That's why we're here. There's a lot of volume. A lot of outstanding predictions on you alive."

"They're predicting I'll be alive because they want me dead." Murphy finally yawned and got his hand over it.

Bellamy just continued. "Yes, that's right. The good news is that the administration has an independent fund for protecting appointees. Our agency can't know about it officially, of course. That fund buys the same 'dead' predictions that an assassin would. Makes it less profitable for the assassin. Basically, we play the market to lose. It's expensive, and it's not a hundred percent solution, but it's the best answer so far."

"What about just going after the people who want me dead?"

"Frankly, sir, that wouldn't scale. Between the senior citizens and the cat thing, our market model says that more than four hundred thousand people have some money on you. If you're alive next week, they make a little money. If you're dead, they're happy too."

Murphy was silent.

Bellamy said, "They don't really think of it as gambling. More like they're hedging their exposure to your continued existence."

Murphy looked up. One of the other agents, whose name Murphy didn't remember either, was standing in the doorway. "We're clear, sir. No cameras or devices left. Verified no other residents present. Charlie team is watching the egress. We're good to go."

"All right." Bellamy ripped open the evidence bag and pulled out the raw steel untraceable pistol. The room smelled of some kind of oil.

"What are you doing?" Murphy yelled. His voice went up in a squeak at the end. He grabbed for his desk phone and realized it was gone.

"Sorry, sir," said Bellamy. "But the money in that slush fund has to come from somewhere. Sometimes we play to win."

02 Nov 2011

Printer for Linux

Picking a printer for Linux?

The process is going to be a little different from what you might be used to with another OS. If you shop carefully (and reading blogs is a good first step) then the drivers you will need are already available through your Linux distribution’s printer setup tool.

HP has done a good job with enabling this. The company has already released the necessary printer software as open source, and your Linux distribution has already installed it. So, go to printers fully supported with the HPLIP software, pick a printer you like, and you’re done.

If you want a recommendation from me, the HP LaserJet 3055, a black and white all-in-one device, has worked fine for me with various Linux setups for years. It’s also a scanner/copier/fax machine, and you get the extra functionality for not much more than the price of a regular printer. It also comes with a good-sized toner cartridge, so your cost per page is probably going to be pretty reasonable.

Other printer brands have given me more grief, but fortunately the HP LaserJets are widely available and don’t jam much.

It’s important not to show a smug expression on your face while printing if users of non-Linux OSs are still dealing with driver CDs or vendor downloads.

05 Sep 2010

Landmarks in instructions

When you give travel directions, you include landmarks, and “gone too far” points. Turn left after you cross the bridge. Then look for my street and make a right. If you go past the water tower you’ve gone too far.

System administration instructions are much easier to follow if they include that kind of check-in, too. For example, if you explain how to set up server software you can put in quick “landmark” tests, such as, “at this point, you can run nmap and see the port in the results.” You can also include “gone too far” information by pointing out problems you can troubleshoot on the way.

A full-scale troubleshooting guide is a good idea, but quick warning signs as you go along are helpful. Much better than finding yourself lost at the end of a long set of setup instructions.
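If the instructions are also available as a script, the same idea can be built in. This is a minimal sketch: landmark is a hypothetical helper name, and the nmap check in the usage comment is only an illustration of the kind of test you might run.

```shell
# Run a quick check between setup steps. Prints "ok: ..." if the check
# command succeeds, or a "gone too far" warning on stderr if it fails.
landmark() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $desc"
    else
        echo "gone too far: $desc" >&2
        return 1
    fi
}

# Usage, between steps of a setup script (the check itself is an assumption):
#   landmark "port 80 shows up in nmap" \
#       sh -c 'nmap -p 80 localhost | grep -q open' || exit 1
```

Stopping at the first failed landmark keeps the reader close to the step that actually went wrong.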

24 Dec 2008

dotted quad to decimal in bash

GNU seq doesn’t accept dotted quads for ranges, but fortunately most of the commands that accept an IP address will also take it in the form of a regular decimal. (Spammers used to use this to hide their naughty domains from scanners that only looked for the dotted quad while the browser would happily go to http://3232235520/barely-legal-mortgage.html or something.)

So here’s an ugly-ass shell function to convert an IP address to a decimal. If you have a better one, please let me know and I’ll update this page. (Yes, I know this would be one line in Perl.)

dq2int()
{
    # grep -q prints nothing, so test its exit status directly.
    if echo "$1" | grep -q '\.'; then
        dq2int $(echo "$1" | tr '.' ' ')
    elif [ $# -eq 1 ]; then
        echo "$1"
    else
        total=$1; next=$2; shift 2
        dq2int $(($total*2**8+$next)) "$@"
    fi
}

Seth Schoen has two shorter versions:

dq2int(){
a=0
for b in $(echo $1 | tr . ' '); do
    a=$((256*$a+$b))
done
echo $a
}

dq2int(){
a=0
for b in ${1//./ }; do
    a=$((256*$a+$b))
done
echo $a
}

And if you want to go the other way, Seth points out that you can set the “obase” variable for bc. Here’s an int2dq function based on that idea.

int2dq()
{
    { echo obase=256; echo $1; } | \
        bc | tr ' ' . | cut -c2-
}

To quote the GNU bc manual, “For bases greater than 16, bc uses a multi-character digit method of printing the numbers where each higher base digit is printed as a base 10 number.”

Trick.
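For the original problem of walking a range of addresses, here is a sketch that does both conversions with plain shell arithmetic and skips bc, which, as far as I can tell, zero-pads each base-256 digit (so you may get 192.168.000.000 back). It assumes a shell with arithmetic wider than 32 bits, and octets written without leading zeros (a leading zero would be read as octal). The name iprange is made up for this example, and it uses a while loop where seq would also do.

```shell
# Dotted quad to integer: split on dots, then accumulate in base 256.
dq2int() {
    _oldifs=$IFS
    IFS=.
    set -- $1          # deliberately unquoted: split the argument on dots
    IFS=$_oldifs
    _total=0
    for _octet in "$@"; do
        _total=$(( _total * 256 + _octet ))
    done
    echo "$_total"
}

# Integer to dotted quad: pull out each byte with shift and mask.
int2dq() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print every address from $1 to $2, inclusive.
iprange() {
    _n=$(dq2int "$1")
    _end=$(dq2int "$2")
    while [ "$_n" -le "$_end" ]; do
        int2dq "$_n"
        _n=$(( _n + 1 ))
    done
}

# Usage:
#   iprange 10.0.0.254 10.0.1.1
```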

09 Apr 2006

Transaction mail or junk mail? Check the postage.

It says “Personal and Confidential” or “IMPORTANT CORRESPONDENCE REGARDING YOUR OVERPAYMENT” on the envelope—can you really discard it without opening it? You sure can. Some junk mailers disguise their mail pieces as important correspondence from companies you actually do business with, and the USPS helped them out a lot by renaming “Bulk Mail” to “Standard Mail”. But you can look at the postage to discard “stealth” junk mail without opening it.

Postal regulations require that any bills or mail containing specific information about your business relationship with the company must be mailed First Class.

So, if “Standard Mail” or “STD” appears in the upper right corner, it’s not a bill, it’s not your new credit card, and it’s not a check. It’s just sneaky junk mail.

25 Mar 2006

On Web 2.0, application uses YOU!

Most of what separates Web 2.0 from pre-Web 2.0 is not really about the web. Paul Graham writes that Web 2.0 is about three things: AJAX, Democracy, and Don’t Maltreat Users. Democracy here means using users to collect decision-making information, such as evaluating whether something is worth reading or buying, for other users. Yes, visitors come to your site to get value that other users brought to it. Ka-ching!

Tim O’Reilly writes that Web 2.0 is about constantly improving software that’s available as a service; mixing data from multiple sources, including users; offering your own data and services in a flexible, mixable way; and going beyond the page metaphor (which sounds like AJAX to me).

Anyway, the key part of being Web 2.0 is that you’re building value from many small information contributions that users don’t mind making. Every user whitewashes a little bit of the fence. Paul Graham points out that Google is a good example of this. When I say great burritos in San Francisco, Google uses my link-making work (and that of others) to amass awesome burrito (and other thing)-recommending power and rule the world. And I like it because I want my favorite San Francisco burrito place to succeed.

When people put geographical directions up using microformats, someone will crawl them and string the route decisions together to get a directions search engine with common sense (because it borrowed the common sense of millions of users) that doesn’t tell people to make an illegal left into oncoming traffic, the way a certain map site used to tell me to leave my old house every day. (70mph combined speed motor vehicle slalom! Yaaaaahooooo!)

Where AJAX fits into all this is that you’re snarfing one reputation information unit per click, quickly, instead of waiting for a whole page to render to suck the value out of the user’s head into your MySQL cluster where it becomes valuable. And you have to let users pull data back out and mix it, since that creates attention incentives for other users to push data in.

So far this Web 2.0 stuff sounds like it’s all about web sites. How can companies that aren’t basically web sites or mail-order catalogs be Web 2.0? Some already are. Remixed FedEx lately? Download their sample code and try their API.

Hold on a second—you don’t have to be a FedEx partner to do that? No, and that’s the first concrete difference between Web 2.0 and non-Web-2.0 companies. From a pre-2.0 point of view, the partner program is what enables companies to interact with you. Start thinking 2.0, though, and the partner program looks more and more like pointless bureaucracy that keeps non-partner companies out. Just as you want Googlebot to crawl your product pages, (and some of you will go flame Matt Cutts if it doesn’t) you want any company whose stuff can plug into yours to try your API.

You could probably do a pretty reliable Web-2.0-or-not-o-meter based on dates in the RSS feed for API announcements vs. dates in press releases matching /partner/i.

What next? Larry Augustin points out that sales and marketing accounts for 82 percent of new software license revenue. Ouch! Let’s throw some Web 2.0 magic at that number. And I don’t mean the sales part. The web, together with open source licensing, easy-to-demo ASP, and virtualization, is already taking a huge chunk out of the sales side.

But a huge, expensive part of software marketing is involved in information gathering, too. It’s really expensive to hire Software Marketing people to gather requirements from users, write big word processor documents full of what the users want, and show each other Microsoft PowerPoint presentations, then tie up the developers showing them again.

Within organizations, we know all about using Extreme Programming and other agile methodologies that replace the obsolete-as-soon-as-finished overdocumented waterfall development process. But often, as soon as projects cross organizational lines, we’re back to the kind of thing that with-it software people rightly make fun of.

Where web sites are concerned, Web 2.0 took the excellent idea of APIs from software developers and gave them to the webmasters. In off-web businesses, it’s taking development methodologies from the developers and giving them to marketing people.

But how do you know what to build when the Big Dumb Word Processor Document of What To Build is gone? This is where it gets fun. The customers are already telling you what they want, if you know how to listen to them. Web 2.0 companies are concentrating on building the place for that conversation to take place, instead of writing the damn document themselves. You get better, faster, cheaper when you lose the waterfall.

After all, some users will sit still for Focus Groups and other 20th-century marketing, but even if they do, you’re stuck dealing with the resulting data yourself. Canonical Ltd. takes the specification process where Google takes the search result ranking process—outside the company as much as possible. That doesn’t mean that random users design Canonical’s products for it, any more than search engine spammers define Google results. But in Web 2.0 you get the users to whitewash the fence.

There’s some overlap between being a Web 2.0 company and being an open source one. Here’s where I think Web 2.0 goes further than open source. If open source is trees, Web 2.0 is hemp. Instead of harvesting big participation from a committed developer, tester, user, partner, or customer, you get a small quantity of fiber per transaction, fast, and you do a lot of them.

I think there’s a limit to how far pure conversationality and social software can take this, and that we’re going to have to get hairier information-sucking-a-little-bit-at-a-time-from-peoples-heads tools such as prediction markets. But Web 2.0 unplugged from the web can take us a lot further, faster, than the alternatives can, and, especially in the area of business software, companies are already using it.

17 Apr 2005

Escape URLs from the command line

Quick way to convert text to URL-escaped text:

perl -MCGI -e 'print CGI->escape($ARGV[0]), "\n"' 'Escape This!'
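CGI.pm stopped shipping with Perl as of 5.22, so the one-liner above may need the module installed separately. As a fallback, here is a sketch of a shell-only version. It assumes od, tr and printf behave as POSIX describes, leaves only letters, digits, and - . _ ~ unescaped, and encodes everything else byte by byte. The name urlencode is made up for this example.

```shell
# Percent-encode a string one byte at a time.
urlencode() {
    printf '%s' "$1" | od -An -tx1 -v | tr -s ' ' '\n' |
        while read -r hex; do
            [ -n "$hex" ] || continue
            case $hex in
                # Hex codes for - . _ ~ 0-9 A-Z a-z: print the byte as-is.
                2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                    printf "\\$(printf '%03o' "0x$hex")" ;;
                # Anything else becomes %XX with uppercase hex digits.
                *)
                    printf '%%%s' "$(echo "$hex" | tr a-f A-F)" ;;
            esac
        done
    echo
}

# Usage:
#   urlencode 'Escape This!'    # prints Escape%20This%21
```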

17 Apr 2005

An old Just Like a Book software license

This is an example of a “just like a book” license from an old 5.25 floppy sleeve, and they apparently don’t know what software it came with.

There’s a scan of the original sleeve.

By opening this package, you are agreeing to be bound by the following agreement.

This software product is copyrighted, and all rights are reserved by the publisher and the author. You are licensed to use the software on a single computer. You may copy and/or modify the software as needed to facilitate your use of it on a single computer. Making copies of the software for any other purpose is a violation of the United States copyright laws.

This software is sold AS IS without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Neither the publisher nor its dealers or distributors assumes any liability for any alleged or actual damages arising from the use of this program. (Some states do not allow the exclusion of implied warranties, to this exclusion may not apply to you.)

17 Apr 2005

eval button

Jef Raskin wrote,

All that is really needed on computers is a “Calculate” button or omnipresent menu command that allows you to take an arithmetic expression, like 248.93 / 375, select it, and do the calculation whether in the word processor, communications package, drawing or presentation application or just at the desktop level.

Fortunately, there’s a blue “Access IBM” button on this keyboard that doesn’t do much. So, I configured tpb to make “Access IBM” do this:

perl -e 'print eval `xsel -o`' | \
xsel -i && xte 'key Delete' 'mouseclick 2'

(That is, get the contents of the X primary selection, run it through a Perl “eval”, put the result back into the X primary selection, then fake a delete and paste.)

Here’s a version that uses the X clipboard selection instead.

xte 'keydown Control_L' 'key c' 'keyup Control_L' && \
perl -e 'print eval `xsel -b -o`' | xsel -b -i  && \
xte 'keydown Control_L' 'key v' 'keyup Control_L'

This one seems to work better in gedit.

If you want to do this, besides tpb, you’ll need xsel and xte, which is part of xautomation. If you don’t have an unused button, you could also set up a binding in your window manager or build a big red outboard USB “eval” button or something.
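One caution: Perl's eval will run whatever happens to be selected, not just arithmetic. Here is a sketch of a narrower calculator that could stand in for the perl step. It assumes awk is available, accepts only digits, decimal points, spaces, parentheses and + - * /, and rejects everything else. The name calc is made up for this example.

```shell
# Evaluate a plain arithmetic expression, refusing anything that is not
# digits, dots, spaces, parentheses, or + - * /.
calc() {
    _expr=$(printf '%s' "$1" | tr -d '\n')
    _leftover=$(printf '%s' "$_expr" | tr -d '0-9.+*/() -')
    if [ -z "$_expr" ] || [ -n "$_leftover" ]; then
        echo "calc: not a plain arithmetic expression" >&2
        return 1
    fi
    # Safe to interpolate: no letters or quotes can reach awk.
    awk "BEGIN { print ($_expr) }"
}

# Usage with the primary selection, in place of the perl step:
#   calc "$(xsel -o)" | xsel -i && xte 'key Delete' 'mouseclick 2'
```

Results come out in awk's default number format, so 248.93 / 375 prints as 0.663813.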

17 Apr 2005

Force ssh not to use ssh-agent

If you make a new ssh key and try to use it with ssh -i while running ssh-agent, ssh tries the agent first. You could end up using a key provided by the agent instead of the one you specify. You can fix this without killing the agent. Use:

env -u SSH_AUTH_SOCK ssh -i newkey host
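If you do this often, a small wrapper saves typing and also works for scp or rsync. This is a sketch: without_agent is a made-up name, and it relies on env supporting -u, as the command above does. OpenSSH also has an IdentitiesOnly option that restricts ssh to the identity files you configure or pass with -i, which may be enough on its own in some setups.

```shell
# Run any command with the ssh-agent socket hidden from it.
without_agent() {
    env -u SSH_AUTH_SOCK "$@"
}

# Usage:
#   without_agent ssh -i newkey host
#   without_agent ssh -o IdentitiesOnly=yes -i newkey host
```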

08 Apr 2005

Picking a Linux distribution

(Updated 24 March 2025)

The most important part of picking a Linux distribution is thinking about where you will go for help, and what distribution that source of help understands. That’s true if your source of help is a vendor, a consultant, a user group, or an online forum about a particular Linux use case or application.

As a home user, you’ll probably be asking your local Linux users group for help when you need it. So get on the mailing list and just “lurk” for a while. See what the most helpful people on the list use, and install that. That way if you have a question, you’ll be more likely to reach someone who has already dealt with it.

If you’re getting into uses for Linux that are different from those of your local user group, it’s more important to use a list of people working on projects similar to yours than just the geographically closest user group. Find where people are talking about that kind of project and pay attention to what they’re using.

If you have an application that you want to run on Linux, either because you’re moving from another OS and are already running it on there, or because you have read or watched the tutorials and want to use it, then get on that application’s support list or forum and see who gets good help. Sometimes an application only gets packaged by a subset of the distributions, and sometimes an application’s users find it hard to get support because their distribution has an older or patched version. Some application maintainers are fired up about Flatpak as a way to distribute on Linux, so users of Flatpak-friendly distributions will get better support. But you don’t need to get into the details of packaging methods. Figure out where people are getting support for the application you want, and look at which distributions tend to be better supported.

01 Jan 2005

ssh scripts: fail fast

If you have a script that uses ssh, here’s something to put at the beginning of the script to make sure the necessary passphrase has already been entered, and the remote host is reachable, before starting a time-consuming operation such as an rsync.

ssh "$REMOTE_HOST" true || exit 1
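For scripts that run unattended, a variation might be useful. This sketch uses two standard OpenSSH options: BatchMode=yes makes ssh fail instead of prompting for a passphrase or password, and ConnectTimeout bounds how long it waits for an unreachable host. The ten-second timeout and the name check_remote are assumptions for this example.

```shell
# Succeed only if a non-interactive login to the given host works right now.
check_remote() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Usage at the top of a script:
#   check_remote "$REMOTE_HOST" || { echo "cannot reach $REMOTE_HOST" >&2; exit 1; }
```

The trade-off is that the plain version gives you a chance to type the passphrase up front, while this one just stops.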