hessen.social is one of many independent Mastodon servers you can use to participate in the Fediverse.
hessen.social is the Mastodon community for all Hessians and everyone who feels connected to Hesse.

Server statistics: 1.6K active profiles

#generativeai

42 posts · 34 participants · 3 posts today

"To test this out, the Carnegie Mellon researchers instructed artificial intelligence models from Google, OpenAI, Anthropic, and Meta to complete tasks a real employee might carry out in fields such as finance, administration, and software engineering. In one, the AI had to navigate through several files to analyze a coffee shop chain's databases. In another, it was asked to collect feedback on a 36-year-old engineer and write a performance review. Some tasks challenged the models' visual capabilities: One required the models to watch video tours of prospective new office spaces and pick the one with the best health facilities.

The results weren't great: The top-performing model, Anthropic's Claude 3.5 Sonnet, finished a little less than one-quarter of all tasks. The rest, including Google's Gemini 2.0 Flash and the one that powers ChatGPT, completed about 10% of the assignments. There wasn't a single category in which the AI agents accomplished the majority of the tasks, says Graham Neubig, a computer science professor at CMU and one of the study's authors. The findings, along with other emerging research about AI agents, complicate the idea that an AI agent workforce is just around the corner — there's a lot of work they simply aren't good at. But the research does offer a glimpse into the specific ways AI agents could revolutionize the workplace."

tech.yahoo.com/ai/articles/nex

Yahoo Tech · Carnegie Mellon staffed a fake company with AI agents. It was a total disaster. By Shubham Agarwal
#AI #GenerativeAI #AIAgents

Watchdog: 380 percent more AI-generated images of child sexual abuse

The Internet Watch Foundation traces 62 percent of all the websites and online forums related to child sexual abuse that it identified in 2024 to EU countries.

heise.de/news/Watchdog-380-Pro

heise online · Experts: More and more AI-generated images of child sexual abuse. By Stefan Krempl

Today the kickoff meeting of our project course "Telling Data Stories with Semantic Technologies and Generative AI" in collaboration with Academy of Sciences & Literature, Mainz, took place, introducing the general topic and our dedicated course software & research data infrastructure. Stay tuned for news & updates ;-)

@lysander07 @tabea @MahsaVafaie @epoz @fiz_karlsruhe @KIT_Karlsruhe @nfdi4culture #NFDIrocks #researchdata #semweb #semanticweb #knowledgegraphs #AI #generativeAI #llms #SPARQL

"This German startup is Europe’s best hope for developing AI advancement outside Silicon Valley"

Last year, Germany had everything it needed to start creating its own ChatGPT 🇺🇸 or a German Le Chat (Mistral) 🇫🇷.

However, the most promising German AI company, Aleph Alpha 🇩🇪, which was supported by major German companies, decided to stop investing in a European LLM to compete with OpenAI: "It doesn’t justify the investment."

Instead, it chose to focus on business-to-business (B2B) and government markets, which also means it can generate revenue right away.

Full article:

fortune.com/europe/2024/09/07/

Fortune · This German startup is Europe’s best hope for developing AI advancement outside Silicon Valley. By Mark Bergen

"For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the first glimpses of this back in early 2023, with Perplexity (first launched December 2022, first prompt leak in January 2023) and then the GPT-4 powered Microsoft Bing (which launched/cratered spectacularly in February 2023). Since then a whole bunch of people have taken a swing at this problem, most notably Google Gemini and ChatGPT Search.

Those 2023-era versions were promising but very disappointing. They had a strong tendency to hallucinate details that weren’t present in the search results, to the point that you couldn’t trust anything they told you.

In this first half of 2025 I think these systems have finally crossed the line into being genuinely useful."

simonwillison.net/2025/Apr/21/

Simon Willison’s Weblog · AI assisted search-based research actually works now
#AI #GenerativeAI #Search

"This course is intended to provide you with a comprehensive step-by-step understanding of how to engineer optimal prompts within Claude.

After completing this course, you will be able to:

- Master the basic structure of a good prompt
- Recognize common failure modes and learn the '80/20' techniques to address them
- Understand Claude's strengths and weaknesses
- Build strong prompts from scratch for common use cases

Course structure and content

This course is structured to allow you many chances to practice writing and troubleshooting prompts yourself. The course is broken up into 9 chapters with accompanying exercises, as well as an appendix of even more advanced methods. It is intended for you to work through the course in chapter order.

Each lesson has an "Example Playground" area at the bottom where you are free to experiment with the examples in the lesson and see for yourself how changing prompts can change Claude's responses. There is also an answer key.

Note: This tutorial uses our smallest, fastest, and cheapest model, Claude 3 Haiku. Anthropic has two other models, Claude 3 Sonnet and Claude 3 Opus, which are more intelligent than Haiku, with Opus being the most intelligent.

This tutorial also exists on Google Sheets using Anthropic's Claude for Sheets extension. We recommend using that version as it is more user friendly."
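The course itself runs as interactive notebooks, but the "basic structure of a good prompt" it teaches can be sketched locally. Here is a minimal illustration of that structure (role, task, few-shot examples, explicit output format); the helper name, field names, and XML-style tags are my own illustrative choices, not the course's official template:

```python
# Sketch of a commonly taught prompt structure: a role, the task stated
# up front, concrete examples, and an explicit output-format constraint.
def build_prompt(role: str, task: str, examples: list[str], output_format: str) -> str:
    """Assemble the prompt sections in a fixed, predictable order."""
    example_block = "\n".join(f"<example>\n{e}\n</example>" for e in examples)
    return (
        f"{role}\n\n"                      # who the model should act as
        f"Task: {task}\n\n"                # what to do, stated explicitly
        f"{example_block}\n\n"             # few-shot examples in tags
        f"Output format: {output_format}"  # constrain the answer shape
    )

prompt = build_prompt(
    role="You are a concise technical copy editor.",
    task="Fix the grammar in the sentence below without changing its meaning.",
    examples=["Input: 'He go home.' Output: 'He goes home.'"],
    output_format="Return only the corrected sentence.",
)
```

The resulting string would then be sent as the user message of an API call (e.g. via Anthropic's Messages API) to whichever Claude model you are using.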

github.com/anthropics/courses/

#AI #GenerativeAI #LLMs

"We must stop giving AI human traits. My first interaction with GPT-3 rather seriously annoyed me. It pretended to be a person. It said it had feelings, ambitions, even consciousness.

That’s no longer the default behaviour, thankfully. But the style of interaction — the eerily natural flow of conversation — remains intact. And that, too, is convincing. Too convincing.

We need to de-anthropomorphise AI. Now. Strip it of its human mask. This should be easy. Companies could remove all reference to emotion, judgement or cognitive processing on the part of the AI. In particular, it should respond factually without ever saying “I”, or “I feel that”… or “I am curious”.

Will it happen? I doubt it. It reminds me of another warning we’ve ignored for over 20 years: “We need to cut CO₂ emissions.” Look where that got us. But we must warn big tech companies of the dangers associated with the humanisation of AIs. They are unlikely to play ball, but they should, especially if they are serious about developing more ethical AIs.

For now, this is what I do (because I too often get this eerie feeling that I am talking to a synthetic human when using ChatGPT or Claude): I instruct my AI not to address me by name. I ask it to call itself AI, to speak in the third person, and to avoid emotional or cognitive terms.

If I am using voice chat, I ask the AI to use a flat prosody and speak a bit like a robot. It is actually quite fun and keeps us both in our comfort zone."
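The instructions the author describes could be packaged as a reusable system prompt. A minimal sketch follows; the constant name, exact wording, and the crude checker are my own assumptions, not text from the article:

```python
# System prompt encoding the article's de-anthropomorphising rules:
# no names, third person, no emotional or cognitive vocabulary.
DEANTHROPOMORPHISE_PROMPT = (
    "Refer to yourself only as 'the AI' and speak in the third person. "
    "Do not address the user by name. "
    "Avoid emotional or cognitive terms such as 'I feel', 'I think', "
    "or 'I am curious'; state facts plainly instead. "
    "In voice mode, use a flat, neutral prosody."
)

def violates_rules(reply: str) -> bool:
    """Crude substring check for first-person, emotional phrasing in a reply."""
    banned = ("i feel", "i think", "i am curious", "i'm curious")
    return any(phrase in reply.lower() for phrase in banned)
```

A check like `violates_rules` could only catch the most obvious slips; in practice the system prompt itself does the real work, and such prompts are best re-tested whenever the underlying model changes.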

theconversation.com/we-need-to

The Conversation · We need to stop pretending AI is intelligent – here’s how
#AI #GenerativeAI #LLMs

"The challenge, then, isn’t just understanding where A.I. is headed—it’s shaping its direction before the choices narrow. As an example of A.I.’s potential to play a socially productive role, Autor pointed to health care, now the largest employment sector in the U.S. If nurse practitioners were supported by well-designed A.I. systems, he said, they could take on a broader range of diagnostic and treatment responsibilities, easing the country’s shortage of M.D.s and lowering health-care costs. Similar opportunities exist in other fields, such as education and law, he argued. “The problem in the economy right now is that much of the most valuable work involves expert decision-making, monopolized by highly educated professionals who aren’t necessarily becoming more productive,” he said. “The result is that everyone pays a lot for education, health care, legal services, and design work. That’s fine for those of us providing these services—we pay high prices, but we also earn high wages. But many people only consume these services. They’re on the losing end.”

If A.I. were designed to augment human expertise rather than replace it, it could promote broader economic gains and reduce inequality by providing opportunities for middle-skill work, Autor said. His great concern, however, is that A.I. is not being developed with this goal in mind. Instead of designing systems that empower human workers in real-world environments—such as urgent-care centers—A.I. developers focus on optimizing performance against narrowly defined data sets."

newyorker.com/magazine/2025/04

The New Yorker · How to Survive the A.I. Revolution. By John Cassidy
#AI #GenerativeAI #Automation