% !TEX program = xelatex
\documentclass[10pt,letterpaper,twocolumn]{article}

% === GEOMETRY ===
\usepackage[top=0.75in, bottom=0.75in, left=0.75in, right=0.75in]{geometry}

% === FONTS ===
\usepackage{fontspec}
\usepackage{unicode-math}
\setmainfont{Latin Modern Roman}
\setsansfont{Latin Modern Sans}
\setmonofont{Latin Modern Mono}[Scale=0.85]
\newfontfamily\acbold{ywft-processing-bold}[
  Path=../../system/public/type/webfonts/,
  Extension=.ttf
]
\newfontfamily\aclight{ywft-processing-light}[
  Path=../../system/public/type/webfonts/,
  Extension=.ttf
]

% === PACKAGES ===
\usepackage{xcolor}
\usepackage{titlesec}
\usepackage{enumitem}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{fancyhdr}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{ragged2e}
\usepackage{microtype}
\usepackage{natbib}
\usepackage{listings}
\usepackage{amsmath}
\usepackage[colorspec=0.92]{draftwatermark}

% === COLORS ===
\definecolor{acpink}{RGB}{180,72,135}
\definecolor{acpurple}{RGB}{120,80,180}
\definecolor{acdark}{RGB}{64,56,74}
\definecolor{acgray}{RGB}{119,119,119}
\definecolor{draftcolor}{RGB}{180,72,135}

% === DRAFT WATERMARK ===
\DraftwatermarkOptions{
  text=WORKING DRAFT,
  fontsize=3cm,
  color=draftcolor!18,
  angle=45,
  pos={0.5\paperwidth, 0.5\paperheight}
}

% === JS SYNTAX COLORS ===
\definecolor{jskw}{RGB}{119,51,170}
\definecolor{jsfn}{RGB}{0,136,170}
\definecolor{jsstr}{RGB}{170,120,0}
\definecolor{jsnum}{RGB}{204,0,102}
\definecolor{jscmt}{RGB}{102,102,102}

\lstdefinelanguage{acjs}{
  keywords={function, let, const, return, if, for, export, import, new, true, false},
  keywordstyle=\color{jskw}\bfseries,
  ndkeywords={encodeSampleToBitmap, decodeBitmapToSample, loadPaintingAsAudio, saveTo, loadFrom},
  ndkeywordstyle=\color{jsfn},
  stringstyle=\color{jsstr},
  commentstyle=\color{jscmt}\itshape,
  morecomment=[l]{//},
  morecomment=[s]{/*}{*/},
  morestring=[b]",
  morestring=[b]',
  sensitive=true,
  basicstyle=\ttfamily\scriptsize,
  breaklines=true,
  frame=none,
  columns=fullflexible,
}

% === HYPERREF ===
\hypersetup{
  colorlinks=true,
  linkcolor=acpurple,
  urlcolor=acpurple,
  citecolor=acpurple,
  pdfauthor={@jeffrey},
  pdftitle={Every Sound is a Painting: Sampling as Visual-Auditory Practice in Aesthetic Computer},
}

% === SECTION FORMATTING ===
\titleformat{\section}
  {\normalfont\bfseries\normalsize\uppercase}
  {\thesection.}
  {0.5em}
  {}
\titlespacing{\section}{0pt}{1.2em}{0.3em}

\titleformat{\subsection}
  {\normalfont\bfseries\small}
  {\thesubsection}
  {0.5em}
  {}
\titlespacing{\subsection}{0pt}{0.8em}{0.2em}

% === HEADER/FOOTER ===
\pagestyle{fancy}
\fancyhf{}
\renewcommand{\headrulewidth}{0.4pt}
\fancyhead[L]{\small\textit{Every Sound is a Painting}}
\fancyhead[R]{\small\textit{@jeffrey}}
\fancyfoot[C]{\small\thepage}

% === CUSTOM COMMANDS ===
\newcommand{\ac}{Aesthetic Computer}
\newcommand{\acos}{AC~Native~OS}
\newcommand{\code}[1]{\texttt{#1}}

\begin{document}

% === TITLE ===
{\centering
  {\acbold\large Every Sound is a Painting\par}
  \vspace{0.15em}
  {\aclight\normalsize Sampling as Visual-Auditory Practice in Aesthetic Computer\par}
  \vspace{0.8em}
  {\small @jeffrey\par}
  \vspace{0.3em}
  {\small\color{acgray} Working Draft --- \today\par}
  \vspace{1em}
}

% === ABSTRACT ===
\begin{abstract}
\noindent
\ac{} encodes audio samples as RGB pixel data in shareable ``paintings''---the platform's native visual media format. A recorded sound becomes an image with a short code (e.g., \code{\#k3d}) that anyone can see, share, and play back on any AC device, from bare-metal laptops to web browsers.
This paper traces the evolution of sampling in AC: from microphone capture in \code{notepat.mjs} through the pixel-sample encoding of \code{stample.mjs} to a cross-platform architecture where samples flow through the same infrastructure as visual art. We argue that collapsing the boundary between auditory and visual media is not merely a technical convenience but an aesthetic position: the encoding reveals the structure of sound as color, and the platform's social features (handles, short codes, galleries) make every user's recorded sound a first-class creative object. Following Attali's trajectory from repetition to composition~\citep{attali1985noise}, we propose that treating samples as paintings advances a mode of music-making where the distinction between recording, sharing, and performing dissolves.
\end{abstract}

% ═══════════════════════════════════════════════════════════════
\section{Introduction: The Sample Problem}

Digital audio workstations treat samples as opaque binary files. A WAV or AIFF file contains numbers---amplitude values at regular time intervals---stored in a format that a person cannot read by eye, that a browser can play but not show, and that most social platforms do not treat as a first-class post. Moving a sample from one context to another requires file transfer, format negotiation, and specialized software. The sample is infrastructure, not art.

This opacity is a design choice, not a necessity. A sample is a sequence of numbers. Numbers can be colors. Colors can be pixels. Pixels can be images. Images can be paintings. And paintings, on \ac{}, are first-class creative objects: uploadable, downloadable, addressable by short codes, owned by user handles, browsable in galleries, and renderable on any device.

The question this paper addresses is simple: \emph{what happens when you treat every sound as a painting?}

The answer, as implemented in \ac{}'s \code{stample.mjs} and related infrastructure, is that sampling ceases to be a technical operation (``record audio to file'') and becomes a creative act with visual, social, and cross-platform dimensions. A three-second vocal sample recorded at 48\,kHz (144,000 samples, 48,000 pixels) becomes a 256$\times$188 pixel image. That image has colors---smooth gradients for sine tones, sharp boundaries for percussion, rhythmic patterns for speech. The image gets a short code. The code can be typed into any AC device---a browser, a bare-metal laptop running \acos{}, a phone---and the sound plays.

This is not steganography (hiding data in images) or sonification (mapping data to sound for analysis). It is a deliberate collapse of two media types into one, motivated by the observation that \ac{} already has a complete infrastructure for visual media and no infrastructure for audio media. Rather than build a parallel audio storage system, we encode audio into the visual system and let the existing painting pipeline handle storage, sharing, identity, and distribution.

% ═══════════════════════════════════════════════════════════════
\section{Background}

\subsection{The Political Economy of Sampling}

Attali's \emph{Noise}~\citep{attali1985noise} traces music through four political-economic modes: sacrifice (ritual), representation (concert hall), repetition (recording industry), and composition (a prophesied mode where everyone creates). The recording industry's mode of repetition depends on the sample as commodity---a captured sound that can be duplicated, distributed, and sold. The sample is owned, licensed, and legally contested.

Navas~\citep{navas2012remix} extends this analysis to remix culture, arguing that sampling is the foundational act of contemporary music production. The sampler is the instrument of repetition.
But Attali's composition mode requires something different: not the consumption of pre-made samples, but the creation and sharing of new ones by everyone.

\ac{} is designed for composition in Attali's sense. Every user can record. Every recording can become a painting. Every painting is shareable. There is no sample marketplace, no licensing, no distinction between producer and consumer. The infrastructure treats a child's first microphone recording identically to a professional producer's sound design: both become paintings with short codes.

\subsection{Visual-Auditory Crossings}

The idea that sound has visual form is old. Xenakis~\citep{xenakis1992formalized} composed orchestral works by drawing curves on graph paper and translating them to musical parameters. Levin~\citep{levin2000painterly} built ``painterly interfaces'' where visual gestures generated sound in real time. The spectrogram---a standard tool since the 1940s---represents frequency content over time as an image.

What these approaches share is a \emph{representational} relationship: the visual form represents the auditory content, but the two remain separate media. A magnitude spectrogram discards phase, so it cannot be played back exactly. You cannot share a Xenakis sketch as an audio file.

AC's pixel-sample encoding is different: the image \emph{is} the audio. The RGB values are the sample amplitudes, quantized to 8 bits. Playing the painting and viewing the painting are two interpretations of the same data. There is no representation gap.

\subsection{The Painting Infrastructure}

Before describing how samples become paintings, it is worth noting what ``painting'' means in \ac{}.
A painting is:

\begin{itemize}[nosep]
  \item A bitmap image (PNG) stored on a CDN (DigitalOcean Spaces)
  \item Owned by a user handle (\code{@jeffrey/painting/slug})
  \item Addressable by a short code (\code{\#k3d})
  \item Viewable at a URL (\code{aesthetic.computer/painting/\#k3d})
  \item Tracked in MongoDB with metadata (creator, timestamp, dimensions)
  \item Uploadable from any AC client (web, native) via presigned S3 URLs
  \item Renderable on any AC device (browser, bare-metal OS)
  \item Publishable to the AT Protocol network as a federated social object~\citep{scudder2026identity}
\end{itemize}

This infrastructure was built for visual art. But because it operates on bitmaps---arrays of pixel values---it is agnostic to what those pixel values mean. A painting of a sunset and a painting that encodes a three-second drum loop are handled identically by every layer of the stack.

% ═══════════════════════════════════════════════════════════════
\section{The Encoding}

\subsection{From Float to Pixel}

The pixel-sample encoding maps audio amplitude values to RGB color channels:

\begin{lstlisting}[language=acjs]
function encodeSampleToBitmap(data, width = 256) {
  // Three samples per pixel; the last row may be partially filled.
  const totalPixels = Math.ceil(data.length / 3);
  const height = Math.ceil(totalPixels / width);
  const pixels = new Uint8ClampedArray(
    width * height * 4
  );
  for (let i = 0; i < data.length; i++) {
    const v = Math.max(-1, Math.min(1, data[i])); // clamp
    const byte = Math.round((v + 1) * 127.5); // [-1, 1] -> [0, 255]
    const pixelIndex = Math.floor(i / 3);
    const channel = i % 3; // R, G, or B
    pixels[pixelIndex * 4 + channel] = byte;
    if (channel === 0)
      pixels[pixelIndex * 4 + 3] = 255; // alpha
  }
  // Keep the true sample count so a decoder can drop padding.
  return { width, height, pixels, sampleLength: data.length };
}
\end{lstlisting}

Three consecutive audio samples map to one pixel's red, green, and blue channels.
Each sample value (a 32-bit float, clamped to $[-1, 1]$) is quantized to an 8-bit unsigned integer ($[0, 255]$) via the affine mapping:

\[
  b = \operatorname{round}\left(\frac{v + 1}{2} \times 255\right)
\]

The alpha channel is set to 255 (fully opaque) for every pixel that holds sample data; padding pixels after the last sample in the final row stay fully transparent. The image width defaults to 256 pixels; the height grows with sample duration.

\subsection{Encoding Properties}

\textbf{Compression ratio.} The encoding maps 3 float32 samples (12 bytes) to one 4-byte RGBA pixel, a 3:1 reduction before PNG. After PNG compression, a typical 5-second 48\,kHz sample (240,000 samples, 80,000 pixels, a 256$\times$313 image, about 320\,KB of raw RGBA) produces a file of approximately 100\,KB.

\textbf{Quantization.} Reducing 32-bit float precision to 8-bit integers introduces quantization noise. The theoretical signal-to-noise ratio of 8-bit quantization is approximately 48--50\,dB ($20\log_{10} 2^{8} \approx 48.2$\,dB; $6.02 \times 8 + 1.76 \approx 49.9$\,dB for a full-scale sine), comparable to early 8-bit samplers such as the Fairlight CMI Series~II and well below the roughly 96\,dB of 16-bit CD audio. For the use cases \ac{} targets (voice memos, instrument recordings, sound effects captured on laptop microphones), room and microphone noise often mask the quantization noise, which is most audible in quiet passages and fade-outs.

\textbf{Visual structure.} The encoding is not arbitrary; it produces images whose visual appearance reflects the auditory content:

\begin{itemize}[nosep]
  \item \textbf{Sine waves} produce smooth brightness gradients that repeat with the wave's period. At low frequencies the three samples in each pixel are nearly equal, so the gradients are close to gray; color appears as frequency rises and adjacent samples diverge.
  \item \textbf{Percussion} creates sharp boundaries at each onset, where amplitude changes abruptly from one pixel to the next.
  \item \textbf{Noise} produces visual static: random pixel colors.
  \item \textbf{Speech} shows rhythmic patterns of color intensity, with silent gaps appearing as bands of neutral gray (the zero-amplitude value, 128 in each channel).
  \item \textbf{Silence} is a field of uniform gray: $(128, 128, 128)$.
\end{itemize}

A user browsing a gallery of sample-paintings can develop a visual intuition for how sounds look. This is not a spectrogram---it encodes amplitude, not frequency---but it is a legible visual language for temporal audio structure.

\subsection{The Encoding as Aesthetic Choice}

The quantization from 32-bit to 8-bit is lossy. This is a feature. Like vinyl's surface noise, cassette tape's harmonic distortion, or the bit-crushed textures of early samplers (Fairlight CMI, E-mu SP-1200), the encoding imposes a sonic character. Samples that have passed through the pixel-sample encoding have a subtle graininess that is consistent, predictable, and---for many users---pleasant.

This is not accidental. The E-mu SP-1200's 12-bit converters, widely credited with the machine's characteristic ``warmth,'' are coarse by modern standards, and our 8-bit encoding is coarser still: four fewer bits, or about 24\,dB less dynamic range. The difference is that our encoding also produces a \emph{visible} artifact: the painting itself. The medium is not merely the message~\citep{mcluhan1964understanding}; the medium is simultaneously the audio and the image.

% ═══════════════════════════════════════════════════════════════
\section{Architecture}

\subsection{Recording: notepat.mjs}

\code{notepat.mjs}~\citep{scudder2026notepat} is AC's primary synthesizer instrument. Among its seven waveform types is ``sample'' mode, which enables microphone recording:

\begin{itemize}[nosep]
  \item \textbf{Home key}: hold to record a global sample from the microphone
  \item \textbf{End key}: arm per-key recording (record a different sample for each piano key)
  \item \textbf{Any tone key}: while End is held, pressing a tone key records to that key's sample bank
\end{itemize}

On \acos{} (bare metal), the audio engine captures at 48\,kHz through direct ALSA hardware access. Recorded samples are stored as raw float32 arrays in memory and persisted to \code{/mnt/ac-sample.raw} on the boot media.
On the next boot, the sample auto-loads---the instrument remembers its voice across power cycles.

The bare-metal audio pipeline is described in detail elsewhere~\citep{scudder2026notepat}: 128-sample ALSA periods, 32-voice polyphony with voice stealing, room reverb (3-tap delay, 0.55 feedback), and per-sample exponential smoothing for pitch and volume control.

\subsection{Encoding and Sharing: stample.mjs}

\code{stample.mjs} (``stamp'' + ``sample'') is a web piece that spreads a sample across touchable pads. Its key contribution is the pixel-sample bridge: audio recorded in \code{notepat} or captured live through the microphone is encoded as a painting and stored in the user's painting library.

The encoding path:

\begin{enumerate}[nosep]
  \item User records audio or loads an existing sample
  \item \code{encodeSampleToBitmap()} converts the float32 array to RGBA pixels
  \item The bitmap is uploaded as a painting via \code{track-media.mjs}
  \item The painting receives a short code (e.g., \code{\#k3d})
  \item The painting is now addressable, shareable, and playable
\end{enumerate}

The decoding path:

\begin{enumerate}[nosep]
  \item User enters \code{stample} followed by a painting code (e.g., \code{stample \#k3d})
  \item \code{loadPaintingAsAudio()} resolves the code via the painting API
  \item The PNG is fetched from the CDN
  \item \code{decodeBitmapToSample()} extracts audio from the RGB channels
  \item The audio is registered as a playable sample in the audio engine
\end{enumerate}

\code{loadPaintingAsAudio()} also accepts KidLisp~\citep{scudder2026kidlisp} codes (e.g., \code{\$roz}), rendering the generative program to a bitmap and then decoding that bitmap as audio. Any KidLisp program can thus be heard as sound.

\subsection{Local Library: samples.mjs}

\code{samples.mjs} is a sample library manager for \acos{}.
It maintains a timestamped collection of recorded samples on the boot media at \code{/mnt/samples/}:

\begin{itemize}[nosep]
  \item \textbf{s}: save current sample with timestamp
  \item \textbf{enter}: load selected sample into the audio engine
  \item \textbf{space}: preview (play at original pitch)
  \item \textbf{x}: remove from library
  \item \textbf{arrows/tab}: navigate
\end{itemize}

The library stores samples in the native raw format for instant loading (no decode step) and maintains a \code{manifest.json} index. Future work will extend this with painting upload and download, unifying the local library with the cloud painting infrastructure.

\subsection{Cross-Platform Bridge}

The central architectural challenge is that \acos{} runs on QuickJS with C audio bindings and no DOM, while the web runtime runs in a browser with the Web Audio API and Canvas. The pixel-sample encoding is pure arithmetic---no platform-specific APIs---and can be implemented identically in both environments.

The bridge operates as follows:

\begin{center}
\begin{tabular}{lll}
\toprule
\textbf{Operation} & \textbf{Web} & \textbf{Native} \\
\midrule
Record & Web Audio API & ALSA via C \\
Encode & \code{pixel-sample.mjs} & Same algorithm \\
Upload & \code{fetch()} + S3 & \code{system.fetchPost} \\
Download & \code{fetch()} & \code{system.fetchBinary} \\
Decode & \code{pixel-sample.mjs} & Same algorithm \\
Play & Web Audio & ALSA via C \\
\bottomrule
\end{tabular}
\end{center}

The encoding and decoding layers are platform-independent. Only the I/O boundaries (recording, network, playback) differ. A sample recorded on a bare-metal laptop at a street fair, encoded as a painting, and uploaded over Wi-Fi is indistinguishable from one recorded in a web browser at a desktop: both yield a PNG in the same format, receive a short code through the same pipeline, and play back identically on any AC device.
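
The decoder itself is not listed in this paper. A minimal sketch of a platform-neutral inverse shows why the same code can run under QuickJS and in a browser, since it touches only typed arrays and arithmetic. The sketch assumes a bitmap object shaped like the one \code{encodeSampleToBitmap()} returns. The name \code{decodePixelsToSamples} and its alpha-based fallback for a missing sample length are hypothetical illustrations, not the project's \code{decodeBitmapToSample()}.

\begin{lstlisting}[language=acjs]
// Hypothetical sketch, not the project's decodeBitmapToSample().
// Uses only typed arrays and arithmetic, so it should run unchanged
// in a browser and under QuickJS.
function decodePixelsToSamples(bitmap) {
  const { width, height, pixels } = bitmap;
  let length = bitmap.sampleLength;
  if (length === undefined) {
    // Assumption: without a stored length, count the pixels the encoder
    // wrote (alpha 255). May keep up to two padding samples at the end.
    let used = 0;
    while (used < width * height && pixels[used * 4 + 3] === 255) used++;
    length = used * 3;
  }
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const byte = pixels[Math.floor(i / 3) * 4 + (i % 3)]; // R, G, then B
    samples[i] = byte / 127.5 - 1; // inverse of b = round((v + 1) * 127.5)
  }
  return samples;
}
\end{lstlisting}

Each decoded sample lies within half a quantization step ($1/255$) of its clamped original. Under this inverse, the silence byte 128 decodes to $1/255 \approx 0.004$ rather than exactly zero. A production decoder may or may not correct this small DC offset.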

% ═══════════════════════════════════════════════════════════════
\section{Design Philosophy}

\subsection{Infrastructure Reuse Over Parallel Systems}

The conventional approach to adding audio sharing to a platform is to build audio-specific infrastructure: audio upload endpoints, audio format negotiation, audio CDN configuration, audio metadata schemas, audio player widgets. This produces a system where visual media and audio media are handled by parallel but separate stacks.

\ac{}'s approach is to encode audio \emph{into} the existing visual stack. The cost is the 8-bit quantization described above. The benefit is no new server-side infrastructure: no new API endpoints, no new storage buckets, no new database collections, no new authentication flows. On the client, the new pieces are the encoder and decoder; playback reuses the existing audio engines. Samples inherit every feature that paintings have: short codes, handles, galleries, AT Protocol syndication, and CDN caching.

This is not laziness; it is a design principle. Manovich~\citep{manovich2001language} argues that the logic of new media is the logic of the database: media objects are stored, indexed, and recombined. By encoding audio as visual data, \ac{} ensures that audio objects enter the same database as visual objects, subject to the same operations (search, browse, share, remix).

\subsection{The Social Sample}

On most platforms, a recorded sample is a file on a hard drive. It has no identity, no address, no social context. To share it, you attach it to a message, upload it to a file host, or embed it in a track.

On \ac{}, a recorded sample is a painting with a short code. It has an owner (\code{@jeffrey}), an address (\code{aesthetic.computer/painting/\#k3d}), a creation date, and a position in the owner's gallery alongside their visual art, their KidLisp generative pieces, and their chat messages. The sample is not a file; it is a social object.

This has consequences for creative practice.
When a student in a laptop orchestra~\citep{scudder2026plork, trueman2006plork} records and uploads a sample, that sample exists in the same space as the student's drawings, code, and conversations. The creative identity is unified: ``my paintings'' includes my sounds.

\subsection{Every User Creates}

Attali's composition mode requires that everyone can create, not just consume. The pixel-sample system supports this by making the path from creation to sharing short:

\begin{enumerate}[nosep]
  \item Hold the Home key in \code{notepat} (about one second)
  \item Release (sample saved)
  \item Open \code{samples}, press \textbf{s} (saved with timestamp)
  \item Press \textbf{u} (uploaded as a painting, short code returned)
  \item Share the code
\end{enumerate}

Five actions, no file management, no format selection, no account creation beyond the AC handle the user already has. The path from ``I made a sound'' to ``anyone can play my sound'' takes approximately ten seconds.

% ═══════════════════════════════════════════════════════════════
\section{Future Work}

\textbf{Collaborative sampling.} Multiple devices recording simultaneously, with samples merged into composite paintings. A laptop orchestra where every player's contribution is visible as a colored stripe in a shared image.

\textbf{Generative samples.} KidLisp programs that produce sample-paintings algorithmically. The command \code{stample \$roz} already works: it renders a KidLisp program as a bitmap and plays the pixels as audio. Extending this to purpose-built audio-generative programs should be straightforward.

\textbf{Sample lineage.} Tracking the provenance of samples as they are re-recorded, modified, and re-shared. A sample-painting could carry metadata about its ancestors, creating a version history analogous to git for sound.

\textbf{Physical prints.} Printing sample-paintings on paper and scanning them back as audio.
Survival through JPEG compression and print/scan degradation would need to be demonstrated. The existing 8-bit quantization does not by itself make the signal robust: JPEG typically subsamples chroma, mixing color information from neighboring pixels (including the row below, 768 samples later at the default width), and a scan must recover pixel positions exactly to restore sample order. If intelligibility survives these steps, the additional noise from the physical process would add character, and a concert poster that contains the concert's audio would become feasible.

\textbf{Federated samples.} Via AT Protocol integration~\citep{scudder2026identity}, sample-paintings syndicated across the federated social web. A sample recorded on a bare-metal AC device, uploaded to the user's Personal Data Server, and playable on any ATProto client that understands the \code{computer.aesthetic.painting} lexicon.

% ═══════════════════════════════════════════════════════════════
\section{Conclusion}

The premise of this paper is that the boundary between visual and auditory media is artificial---maintained by file format conventions, not by any fundamental property of the underlying data. By encoding audio samples as RGB pixels and routing them through a painting infrastructure, \ac{} eliminates this boundary for its users. A sound has a color. A painting has a voice. The creative act is unified.

The technical contribution is modest: a 15-line encoding function, a 15-line decoding function, and the observation that an existing painting pipeline can carry audio without modification. The conceptual contribution is larger: in a system where every sound is a painting, sampling becomes a visual practice, sharing becomes social, and the distinction between hearing and seeing---between the ear and the eye---becomes a matter of interpretation rather than infrastructure.

Wishart~\citep{wishart1996sonic} argued for treating sound as a plastic material, shaped by the composer's hands like clay. In \ac{}, the material is literally visible: the sample-painting shows its texture, its rhythm, its dynamic range as colored pixels. The composer's hands shape both the sound and the image, because they are the same thing.

\vspace{1em}
\noindent\rule{\columnwidth}{0.4pt}
{\small
\textbf{Acknowledgments.} This work builds on infrastructure developed across the Aesthetic Computer project. The pixel-sample encoding was first implemented in \code{stample.mjs} in January 2025. The cross-platform bridge described here was developed in March 2026 during the \acos{} boot media campaign.
}

% === REFERENCES ===
\bibliographystyle{plainnat}
\bibliography{references}

\end{document}