Tag: safety

  • Interpretable Features

    A team at Anthropic, creator of the Claude models, published a paper on extracting interpretable features from Claude 3 Sonnet. This is achieved by attaching a sparse autoencoder to a middle layer of the model and training it on that layer's activations. An autoencoder is a neural network that learns to encode input data, here the activations of a middle layer of Claude, into…
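
    The idea can be sketched in a few lines: train an overcomplete encoder/decoder on activation vectors with an L1 penalty, so that only a few hidden dimensions (candidate "features") fire for any given input. All dimensions, hyperparameters, and the synthetic data below are illustrative assumptions, not Anthropic's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 16      # width of the (stand-in) middle-layer activations
d_hidden = 64     # overcomplete dictionary: many more features than dims
l1_coeff = 1e-3   # sparsity penalty on the hidden code
lr = 0.05

# Encoder and decoder weights of the sparse autoencoder.
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))

def relu(x):
    return np.maximum(x, 0.0)

def step(acts):
    """One full-batch gradient step on reconstruction loss + L1 sparsity."""
    global W_enc, b_enc, W_dec
    h = relu(acts @ W_enc + b_enc)    # sparse feature code
    recon = h @ W_dec                 # reconstruction of the activations
    err = recon - acts
    loss = np.mean(err ** 2) + l1_coeff * np.mean(np.abs(h))

    # Backprop by hand (numpy only, for self-containment).
    d_recon = 2 * err / err.size
    dW_dec = h.T @ d_recon
    d_h = d_recon @ W_dec.T + l1_coeff * np.sign(h) / h.size
    d_h *= (h > 0)                    # ReLU gradient mask
    dW_enc = acts.T @ d_h
    db_enc = d_h.sum(axis=0)

    W_enc -= lr * dW_enc
    b_enc -= lr * db_enc
    W_dec -= lr * dW_dec
    return loss

# Synthetic stand-in for activations captured from a middle layer.
acts = rng.normal(0, 1, (256, d_model))
losses = [step(acts) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

    After training, each hidden dimension's decoder row acts as a dictionary element; interpretability work then inspects which inputs make a given dimension fire.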

  • Fraunhofer launches FhGenie

    The Fraunhofer-Gesellschaft, one of Germany's largest and most renowned research organizations, published a paper, "FhGenie: A Custom, Confidentiality-preserving Chat AI for Corporate and Scientific Use", in which the authors describe a customized chat AI named FhGenie. The paper covers the motivation and requirements that led to the design, as well as the solution's architecture. 1.…