
Comment Re:WRATH OF THE BLUESKY WOKE MORONS! (Score 1) 73

1) The above is the definition of the 14 core characteristics of fascism from "Ur-Fascism" by Umberto Eco, who grew up in Fascist Italy. It is an influential essay on the characteristics of fascist thought (and one that well predates the new movement).

2) Fascism is not a synonym of Nazism. The Nazis were the first fascist movement to have the word "socialist" in their name. Several minor ones (like the British Nazi Party) later cribbed it from them, but the Italian Fascist movement, for example, didn't use it at all when it came to power, the Spanish described their movement (accurately) as syndicalist, etc. In general, fascist movements were a mix of syndicalism and corporatism, sometimes with a window dressing of socialism to smooth over alliances with powerful oligarchs.

3) Background on the name: fascism is, as Eco notes, an overwhelmingly middle-class movement, but it likes to dress itself up in the trappings of the working class (the working class, by contrast, has historically been more attracted to socialism, which the middle class sees as a threat to its status). Fascist imagery commonly uses and glorifies the image of "the working man", with hyper-masculinization and motifs of glory and sacrifice, and presents the leader as the voice of the working man.

As for Nazism specifically: the Nazi party had its roots in the German Workers' Party, which presented itself as a right-wing alternative to the Communist Party (KPD). Hitler joined it in 1919; in 1920 it was rebranded as the "National Socialist German Workers' Party" (Nationalsozialistische Deutsche Arbeiterpartei / NSDAP), and by 1921 Hitler had taken control of it. With that name, the party sought to tap into the working class (the country was in a massive economic crisis, with large numbers of unemployed people). At the same time, it sought to set itself apart from "socialist parties", such as the Social Democratic Party (SPD), by stressing nationalism. Other parties of the 1930s included the Center Party (Zentrum), aka the Catholics (plus their offshoot, the Bavarian People's Party (BVP)); the German National People's Party (DNVP), aka the monarchists (probably the closest party to the Nazis, philosophically); the German People's Party (DVP), a middle-class, pro-business party, something like your "Never-Trump Republicans" or "Conservadems" (similar: the German Economic Party (Wirtschaftspartei)); and the German Democratic Party (DDP), aka non-communist, pro-democracy leftists.

In the party's early days, from the late 1910s to the early 1920s, it had a mix of left and right stances. It was rabidly nationalist, anti-immigrant, and anti-Jewish from the start, and in general met all of Eco's characteristics of fascism; but it also had some genuinely socialist-leaning members like Gregor Strasser and his brother Otto, who advocated nationalization of industry and land reform. After the failed Beer Hall Putsch, however, Hitler became increasingly dominant. He marginalized the socialist wing in favour of powerful corporate alliances, and then eliminated it outright in the Night of the Long Knives in 1934.

The party still retained a number of superficially socialist policies, but consider these examples:

- They created the German Labor Front (Deutsche Arbeitsfront)... but only after abolishing all independent unions. This let them keep the appearance of supporting workers' rights while bringing all workers under their control and eliminating their actual ability to negotiate or strike.
- They created affordable state-sponsored wellness camps, facilities, and the like, with an emphasis on the outdoors, especially to help people detox. But if you're trying to avoid comparisons when RFK is planning basically the same thing, that's not helping.
- They set a number of price controls and invoked war production acts, but again, that's not really helping the case vs. the current US administration's trade policy either.
- They talked about improving healthcare, housing, land reform, etc., but actually did very little in these areas. Again, not helping.

But overall, they aligned much more strongly with German oligarchs. Even before taking power, Hitler met frequently with industrialists, and at a meeting in February 1933, weeks after becoming chancellor, he got most of them to "bend the knee" and provide financial support. Oligarchs were generally wary of the Nazis, but more afraid of the communists and socialists. One of Hitler's first acts in power was, as mentioned, to suppress all of the unions, which further cemented his alliance with the oligarchs. The legal code was sculpted into one of "guided capitalism" (what one might today call "Putinism"), where oligarchs were allowed (and assisted) to amass wealth, so long as they bent the knee to Nazi goals when it was demanded of them. Cartelization was encouraged rather than discouraged. Large industrialists benefited massively under the Nazis, receiving large orders, suppression of strikes, access to slave labour (late in the regime), protection from nationalization, etc. Krupp expanded dramatically. IG Farben expanded dramatically. Major banks expanded dramatically. German automakers expanded dramatically. These were boom times for German industrialists, and again, all they had to do was bend the knee. The Quandt family, for example, which owns most of BMW today, owes its fortune to largesse from the Nazis.

Again, to reiterate: Nazism was one particular variant of fascism (perhaps notable for its intensely virulent Jewish conspiracy theory; other fascist movements were hardly pro-Jewish, but most did not include this notion of Jews as a "society-destroying woke mind virus").

Comment Re:What could be vs what will be (Score 1) 68

(That said, I do expect that early on we'll see Star Wars Prequels Disease, where some director obsesses over a new tech that's not yet very advanced, as Lucas did with CGI, overuses it badly with little care for quality, and gets audiences sick of it for a while and hypersensitive to when it's badly done. "No no, we can't just suspend that force-lifted apple with a string, we need to use a *CGI apple*!" Substitute AI for CGI here...)

Comment Re:What could be vs what will be (Score 1) 68

Oh yeah, those special effects totally hold up today *eyeroll*

Also, Jurassic Park cost $63M to produce, which is $136M today. Only a small fraction of that was computing hardware.

What's weird about these conversations is that everyone polarizes into all-or-nothing camps. Either AI is going to make movies entirely from scratch, or not a single frame is going to be touched by AI at all. That's not what people like James Cameron are getting at. They want high-quality tweening. You provide specific frames, get those frames exactly how you want them, and then have the AI generate what happens between them, and you get many different clips showing different ways the transition could look. If you like parts of one or another, you can take subframes from those and tween between them. Your actor is standing on a skyscraper, an alien missile slams into it, and the skyscraper starts tilting, then crumbles? You take your last filmed shot and photoshop together (with or without help from AI inpainting) basically a comic book of how you want the attack to go down. You don't actually have to model the building, the missile, and all the explosion physics; you just need to show the stages you want the action to go through.
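For anyone unfamiliar with the term: classic tweening just fills in the frames between hand-picked keyframes. Below is a minimal, non-AI illustration, assuming plain linear interpolation of a single object's 2D position; the `Keyframe` type and `tween()` function are hypothetical. A generative model would synthesize entire images between keyframes (and offer several variants), but the "keyframes in, in-betweens out" contract is the same.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Keyframe:
    """Hypothetical keyframe: a frame index and the (x, y) position of one object."""
    frame: int
    x: float
    y: float


def tween(keyframes: list[Keyframe]) -> list[tuple[float, float]]:
    """Return one (x, y) per frame, linearly interpolated between sorted keyframes.

    Covers frames keyframes[0].frame through keyframes[-1].frame inclusive.
    """
    positions: list[tuple[float, float]] = []
    for a, b in zip(keyframes, keyframes[1:]):
        span = b.frame - a.frame
        # Emit frames a.frame .. b.frame - 1; the final keyframe is appended after the loop.
        for f in range(a.frame, b.frame):
            t = (f - a.frame) / span  # 0.0 at keyframe a, approaching 1.0 at keyframe b
            positions.append((a.x + t * (b.x - a.x), a.y + t * (b.y - a.y)))
    last = keyframes[-1]
    positions.append((last.x, last.y))
    return positions


# Top of a building: upright, then tilting, then dropping (frame numbers are arbitrary).
frames = tween([Keyframe(0, 0.0, 100.0), Keyframe(4, 20.0, 90.0), Keyframe(6, 30.0, 0.0)])
print(frames)
```

The "comic book" of key stages is the input; everything in between is generated, which is exactly the part a director would rather not hand-animate.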

Also, there's editing and post-processing. Uh oh, a stagehand was accidentally visible in one scene? Just blip them away. An actor's dialog was changed, but now their lips don't match? Resync to the new dialog. Basically any inconsistency can be blipped out.

I *totally* get why they want this. And it's not "just write a prompt and get a movie"; it's about being able to make their work faster and better. Yeah, you could go for "WAY faster, but worse", but that'd be a box-office flop, because nobody wants to watch a movie where people move through each other or grow extra legs or whatnot. But pairing actual human work with AI lets each leverage its own strengths and make up for the other's weaknesses.

Comment Re:WRATH OF THE BLUESKY WOKE MORONS! (Score 0) 73

And on that subject... say you're in a movement with these traits:

(1) "Cult of tradition": longing to go back to an imagined former greatness, and seeing progress as backsliding;
(2) "Rejection of modernism": viewing the rationalistic development of Western culture since the Enlightenment as a descent into depravity (but NOT rejecting industrial potency);
(3) "Cult of action for action's sake": such as attacks on modern culture and science, even when the attacks are self-defeating;
(4) "Disagreement is treason";
(5) "Fear of difference": often in the form of racism or an appeal against foreigners and immigrants;
(6) "Appeal to a frustrated middle class": one fearing economic pressure from the demands and aspirations of lower social groups;
(7) "Obsession with a plot": such as a New World Order, Great Replacement Theory, the Deep State, etc.;
(8) "At the same time too strong and too weak": the enemy is simultaneously a massive oppressor with its claws ruthlessly in everything, yet also a bunch of sniveling, frail snowflakes;
(9) "Pacifism is trafficking with the enemy": if you oppose some favoured military action by the movement or its close allies, you too become the enemy to destroy; such regimes also strongly support military armament and expansionist policies;
(10) "Contempt for the weak": both within and between societies, with a strongly Social-Darwinist view of how the world should run;
(11) "Everybody is educated to become a hero": everyone is expected to sacrifice for the cause, with no sacrifice too great;
(12) "Machismo": holding "both disdain for women and intolerance and condemnation of nonstandard sexual habits, from chastity to homosexuality" (in our time, perhaps no topic captures this better than trans people, though the new obsession with testosterone levels and "tradwives" certainly competes);
(13) "Selective populism": "The People", conceived monolithically, have a common will, distinct from and superior to the viewpoint of any individual, which the leader holds himself out as interpreting (though really he alone dictates it); commonly used to delegitimize democratic institutions, which they argue are "no longer represent[ing] the voice of the people";
(14) "Newspeak": catchphrases become mantras and thought-terminating cliches.

... If that sounds at all like a movement you're in, then yes, you're a fascist.

Comment Re:WRATH OF THE BLUESKY WOKE MORONS! (Score 1) 73

Lol, if you think Bluesky is an echo chamber on the topic of Adobe's turn to AI... I can't think of any topic that more evenly divides the site's members than AI.

(And for the record, Bluesky is an Umberto Eco chamber, where we discuss semiotics and the latent characteristics of fascist movements ;) )

Comment Re:Beware of Pooh's Bearing gifts (Score 1) 90

Look, if you want an olive branch here: if you're looking for a local machine for inference of large models for under $10k instead of tens to hundreds of thousands of dollars... yeah, the M3 Ultra IS a good option. I do not object to this, at all.

What I object to is the nonsensical claim that it is "fast" or "efficient" compared to modern Nvidia servers. It is not. At all. Unless, that is, you're constructing lazy, contrived scenarios.

Comment Re:Beware of Pooh's Bearing gifts (Score 1) 90

First, summary != article.

Hey, let's play a little game called "scroll up in the thread": "That said, a lot of this article summary is nonsensical hype"

Literally my very first post in the thread.

That said, everything in the summary is from the article, including that quote, so it doesn't matter which one is referred to.

You're doing it again.
Confusing compute with memory bandwidth.

I'm not "confusing" anything. As was laid out in detail above, compute is maxed out in actual real-world usage. That is exactly why this hardware is built with such extreme compute capabilities.

You brought up an irrelevant data point, and I pointed out the stupidity of it.

The very topic of the thread is that the M3 has the computational performance of a potato, when in properly run, real-world scenarios compute capacity is absolutely critical. That's why servers designed for AI tasks have such immense compute capacity to begin with.

You can run R1, period, in 200W.

You "can" run R1 on 20W. That doesn't make it either fast or efficient. This is a thread about performance and efficiency, as a result of a summary about performance and efficiency, as a result of an article about performance and efficiency.

It's not a naive parallelization approach- it's a simple fact. A network must be evaluated sequentially. The layers must be split between the 2 cards, and you cannot evaluate layer 2 until you have evaluated layer 1.

I *literally described to you different forms of parallelization and their optimizations*, and you keep posting as if that never happened. Pipeline parallelism by layer is NOT the only way to distribute a model across multiple servers. And MoEs CAN distribute whole experts to individual machines, so that only the hidden states before and after the FFN need to be synced.

There are numerous libraries (seemingly growing by the day) for how best to manage parallelization. It is NOT, I repeat, NOT, just "let's put these layers on machine 1, and these other layers on machine 2".
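A toy sketch of the expert-parallel point: experts are pinned to devices, and only per-token hidden states travel to the device that owns the chosen expert and back; the expert weights never move. Assumptions: top-1 routing, a scaling function standing in for each expert's FFN, and a Python loop standing in for the all-to-all exchange. Every name here (`EXPERT_TO_DEVICE`, `route`, `moe_layer`) is hypothetical, and real MoE layers typically route each token to several experts.

```python
EXPERT_TO_DEVICE = {0: 0, 1: 0, 2: 1, 3: 1}  # expert id -> device id (hypothetical placement)


def make_expert(scale):
    """Stand-in for an expert FFN: this toy expert just scales its input."""
    return lambda h: [scale * v for v in h]


# Expert "weights" stay on their device; here expert e simply scales by e + 1.
EXPERTS = {e: make_expert(float(e + 1)) for e in EXPERT_TO_DEVICE}


def route(hidden, router_weights):
    """Top-1 routing: pick the expert whose router vector has the largest dot product."""
    scores = [sum(w * h for w, h in zip(wv, hidden)) for wv in router_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def moe_layer(tokens, router_weights):
    """Return (outputs, floats_transferred) for a batch of token hidden states."""
    # 1) Dispatch: bucket each token's hidden state by the device owning its expert.
    outbox = {d: [] for d in set(EXPERT_TO_DEVICE.values())}
    for i, h in enumerate(tokens):
        e = route(h, router_weights)
        outbox[EXPERT_TO_DEVICE[e]].append((i, e, h))

    outputs = [None] * len(tokens)
    floats_transferred = 0
    for device, bucket in outbox.items():
        # 2) On real hardware this loop body runs on `device`, using only its local experts.
        for i, e, h in bucket:
            outputs[i] = EXPERTS[e](h)
            # 3) Only the hidden state goes out and the result comes back
            #    (counted for every token here, even ones that stay local).
            floats_transferred += 2 * len(h)
    return outputs, floats_transferred


tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
router = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
outputs, sent = moe_layer(tokens, router)
print(outputs, sent)
```

The traffic scales with tokens times hidden size, not with the size of the experts, which is why parking whole experts on separate machines is viable.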

You are also of course correct about batching- which is where the multi-GPU paradigm actually shines- in service multiple inferences at once, even if any particular inference is still limited by the performance of a single card. You, as a person, with your 2 B100s, or 7 RTX3090s, are not going to be helped by that expansion

If you're not a moron and you run speculative decoding, YES, you WILL benefit from that performance. Even in the deeply-abnormal "single-user-issuing-queries-consecutively" scenario. Speculative decoding in effect creates batching from a single prompt.
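A minimal sketch of greedy speculative decoding, assuming toy deterministic "models" (`draft_next` and `target_next` are hypothetical stand-ins). The draft proposes k tokens one at a time; the target then checks all k+1 prefixes, which on real hardware is a single batched forward pass, so one expensive target pass can yield several tokens. The output matches plain greedy decoding with the target exactly; the sampling variant replaces the equality check with a probabilistic acceptance rule.

```python
def target_next(ctx):
    """Hypothetical expensive model: greedy next token for a context of ints."""
    return (sum(ctx) * 7 + len(ctx)) % 10


def draft_next(ctx):
    """Hypothetical cheap draft model: agrees with the target unless len(ctx) % 4 == 0."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 10


def greedy_decode(prompt, n_new):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, n_new, k=4):
    """Greedy speculative decoding; returns (tokens, number_of_target_passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1) The draft model proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) The target scores all k + 1 prefixes. They don't depend on each
        #    other, so on a GPU they form one batched forward pass.
        target_passes += 1
        verified = [target_next(tokens + draft[:i]) for i in range(k + 1)]
        # 3) Keep draft tokens up to the first disagreement, then append the target's
        #    own token at that position (a free "bonus" token if all k matched).
        n_ok = 0
        while n_ok < k and draft[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens += draft[:n_ok] + [verified[n_ok]]
    return tokens[: len(prompt) + n_new], target_passes


out, passes = speculative_decode([1, 2, 3], n_new=12)
print(out == greedy_decode([1, 2, 3], 12), passes)
```

Here 12 tokens come out of 4 target passes instead of 12, with identical output: a single user's request gets the batch-style utilization the target hardware is built for.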

I'll repeat: you keep comparing naive inference approaches as if the year were 2019 and no modern research on fast inference had been done. It's frankly embarrassing.

Comment Re:As my humble zero-analysis Dunning-Kruger take. (Score 1) 109

To me, your "better to have two independent mechanisms that just-so exist with just-so parameters, than one mechanism driven by a more complex underlying process" comes across as the same as saying:

"Hmm, when I drop this rock it goes down but when I release this balloon it goes up... rather than trying to unify the two, which would have to deal with things like density and interactions with the surrounding atmosphere, I'll just say there are two separate forces, one which pulls rocks down and one which lifts balloons up".

Or:

"Hmm, when I roast these nuts, they turn sweet, but when I roast this sugar, it turns bitter. Rather than trying to understand the chemistry behind the Maillard reaction, I'll just define an equation that describes the sweetening of nuts and a different one that describes the bittering of sugar."

Your approach is not just deeply unsatisfying conceptually; it runs counter to the goals of physics research throughout its entire history. We live in a world full of complex interactions whose net effects are the result of their components.

Comment Re:As my humble zero-analysis Dunning-Kruger take. (Score 1) 109

That's an impressive straw man there. The entire theory of quintessential inflation is that it didn't turn off - that its intensity just declined by many orders of magnitude with density/time, and continues to decay with density/time. You're the one inserting "turned off" into the picture.

(And yes, just to head you off, there are numerous papers on reheating with respect to quintessential inflation, and there's a surprisingly large number of viable mechanisms)

And yes, electroweak unification should not have been accepted if it was just an arbitrary mashing together of two things without any evidence.

Again, impressive straw man work there. Nothing - not disjoint inflation vs. dark energy, nor unified inflation with dark energy - should be "accepted" "without evidence". But nobody serious rejected the search for a mechanism linking the two just because "it involves a more complex interaction than just having a few disjoint parameters", which is the argument you've been pushing to conceptually reject quintessential inflation.

Comment Re:Beware of Pooh's Bearing gifts (Score 1) 90

I fully get that you want to ignore the compute capability of the M3 entirely and avoid discussing it at all costs, because it's embarrassingly slow by the standards of AI tasks, and yes, this VERY much matters in the real world. If I were trying to argue your side, I'd likewise be trying to avoid discussing how few FP4 FLOPS the M3 has.

Comment Re:Beware of Pooh's Bearing gifts (Score 1) 90

The super-fast came from your imagination.

Huh, I must have imagined that the summary said "The new DeepSeek-V3-0324 in 4-bit runs at > 20 tokens/second on a 512GB M3 Ultra with mlx-lm!" as if this were some sort of extreme performance figure. They even included an exclamation point for good measure. Or did I imagine that too?

As for efficiency? I don't think that can be reasonably argued. It is vastly more efficient.

Utter nonsense. It has literally an order of magnitude worse FP4 TFLOPS per watt.

The mention of 3090 was only to have a flops comparison point as to how poor the performance of the M3 studio is.

It's not poor at all, particularly in the context that it can do things you need 7 3090s to do.

YOU are the only person here suggesting the absurd notion of using 7 3090s. The 3090 was only brought up to give a sense of the level of compute power. NOT as a VRAM comparison. NOT as a "suggested alternative implementation". That this has been pointed out to you multiple times and yet you persist has moved well into straw man territory. You've decided what scenario you actually want to argue about, a scenario that was never suggested, and you persist in arguing about it rather than having to defend the simply false claim that the M3 delivers higher performance than modern Nvidia servers.

Was the article inarticulate in what the critical difference really is? Of course.

"Inarticulate" is a kind way to spell "wrong".

You do NOT get a mere 20 tokens per second at fp4 precision on Deepseek on Nvidia servers that consume kilowatts of power.

You don't.

Then how can you avoid reaching the conclusion that the author's comparison of the power consumption of the two is absurd?

For our similarly outfit B200 system, we'll need a minimum of 2 B200s (192GB a piece).
Layer offloading is sequential, so we won't be able to leverage the performance of both of them, so for all that power and bandwidth, we're still limited to the token crunching performance of a single B200- roughly 10x that of an M3 Ultra.

Beyond the fact that this is a naive parallelization approach, it presumes zero batching. In the real world, many requests are processed concurrently in batches. The compute work of matrix ops scales with batch size while the weight reads don't, which is why servers are designed with such an emphasis on TFLOPS rather than memory size. You very much *do* actually utilize the 2 orders of magnitude more FLOPS. And if you weren't, you wouldn't be drawing the full power consumption either.
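A back-of-the-envelope roofline sketch of why batching matters: each decode step streams the weights from memory once no matter how many sequences are in the batch, while the FLOPs grow linearly with batch size. The hardware numbers are placeholders, not the specs of the M3 Ultra, a B200, or any other real chip, and the model ignores KV-cache traffic, activations, and interconnect costs.

```python
def decode_step_time(batch, params, bytes_per_param, mem_bw, peak_flops):
    """Roofline estimate (seconds) for one decode step of a dense model.

    Ignores KV-cache reads, activations, and interconnect traffic.
    """
    # Every weight is read once per step and shared by all sequences in the batch.
    t_memory = params * bytes_per_param / mem_bw
    # Roughly 2 FLOPs (one multiply, one add) per weight per sequence.
    t_compute = 2 * batch * params / peak_flops
    # Whichever resource is the bottleneck sets the step time.
    return max(t_memory, t_compute)


def tokens_per_second(batch, **hw):
    """Aggregate throughput: one new token per sequence per step."""
    return batch / decode_step_time(batch, **hw)


# Placeholder numbers, NOT any real chip: 1e9 params at 4 bits (0.5 bytes),
# 1 TB/s memory bandwidth, 1 PFLOP/s peak compute.
HW = dict(params=1e9, bytes_per_param=0.5, mem_bw=1e12, peak_flops=1e15)

for b in (1, 16, 250, 1000):
    print(f"batch={b:5d}  tokens/s={tokens_per_second(b, **HW):,.0f}")
```

With these placeholder numbers the step time stays flat (memory-bound) until the batch reaches peak_flops * bytes_per_param / (2 * mem_bw) = 250, after which throughput plateaus at the compute limit. At batch 1, nearly all of the compute sits idle, which is why a batch-1 comparison says little about what a server is actually doing.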

Even if we take batching (aka the real world) out of the equation, that again is a naive parallelization approach: you're acting like layer parallelism is the only approach, when it's the least efficient way to go about it. Basic tensor parallelism, well implemented, is generally much faster. Beyond that, with MoEs you can do expert parallelism, localizing specific experts to individual servers and needing only to sync the output hidden states. There are also various just-in-time asynchronous data transfer methods (like those used in DualPipe). And then there's speculative decoding, which in effect creates self-batching, so even if you're only serving individual consecutive requests (not a mainstream serving task), you still benefit from the utilization efficiency of batching.
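A minimal sketch of column-wise tensor parallelism for one linear layer, in pure Python so it runs anywhere: each "device" holds a slice of the weight matrix's columns, all devices multiply the same input at the same time, and the partial outputs are concatenated (an all-gather). Unlike layer-by-layer pipelining, no device sits idle waiting for another to finish the token. The function names are hypothetical, and a real implementation would also use row-parallel layers and pay a communication cost for the gather.

```python
def matvec(x, W):
    """y = x @ W for a vector x (length n) and a matrix W (n rows x m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def split_columns(W, parts):
    """Shard W column-wise: shard k gets a contiguous block of columns."""
    m = len(W[0])
    size = -(-m // parts)  # ceiling division so every column lands in some shard
    return [[row[k * size:(k + 1) * size] for row in W] for k in range(parts)]


def tensor_parallel_matvec(x, W, devices=2):
    shards = split_columns(W, devices)
    # Each device multiplies the SAME input by its own shard. These calls are
    # independent, so on real hardware they run simultaneously.
    partials = [matvec(x, shard) for shard in shards]
    # "All-gather": concatenate the partial outputs back into the full vector.
    return [v for part in partials for v in part]


x = [1, 2]
W = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(matvec(x, W), tensor_parallel_matvec(x, W, devices=2))
```

Each device does half the multiply-adds for every token, so both cards' compute is in use at once rather than taking turns.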

You're arguing for a nonsensical scenario.
