% !TEX program = xelatex
\documentclass[10pt,letterpaper,twocolumn]{article}

% === GEOMETRY ===
\usepackage[top=0.75in, bottom=0.75in, left=0.75in, right=0.75in]{geometry}

% === FONTS ===
\usepackage{fontspec}
\usepackage{unicode-math}
\setmainfont{Latin Modern Roman}
\setsansfont{Latin Modern Sans}
\setmonofont{Latin Modern Mono}[Scale=0.85]
\newfontfamily\acbold{ywft-processing-bold}[
  Path=../../system/public/type/webfonts/,
  Extension=.ttf
]
\newfontfamily\aclight{ywft-processing-light}[
  Path=../../system/public/type/webfonts/,
  Extension=.ttf
]

% === PACKAGES ===
\usepackage{xcolor}
\usepackage{titlesec}
\usepackage{enumitem}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{fancyhdr}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{ragged2e}
\usepackage{microtype}
\usepackage{natbib}
\usepackage{listings}
\usepackage{amsmath}
\usepackage[colorspec=0.92]{draftwatermark}

% === COLORS ===
\definecolor{acpink}{RGB}{180,72,135}
\definecolor{acpurple}{RGB}{120,80,180}
\definecolor{acdark}{RGB}{64,56,74}
\definecolor{acgray}{RGB}{119,119,119}
\definecolor{draftcolor}{RGB}{180,72,135}

% === DRAFT WATERMARK ===
\DraftwatermarkOptions{
  text=WORKING DRAFT,
  fontsize=3cm,
  color=draftcolor!18,
  angle=45,
  pos={0.5\paperwidth, 0.5\paperheight}
}

% === JS SYNTAX COLORS ===
\definecolor{jskw}{RGB}{119,51,170}
\definecolor{jsfn}{RGB}{0,136,170}
\definecolor{jsstr}{RGB}{170,120,0}
\definecolor{jsnum}{RGB}{204,0,102}
\definecolor{jscmt}{RGB}{102,102,102}

\lstdefinelanguage{acjs}{
  keywords={function, let, const, return, if, for, export, import, new, true, false},
  keywordstyle=\color{jskw}\bfseries,
  ndkeywords={encodeSampleToBitmap, decodeBitmapToSample, loadPaintingAsAudio, saveTo, loadFrom},
  ndkeywordstyle=\color{jsfn},
  stringstyle=\color{jsstr},
  commentstyle=\color{jscmt}\itshape,
  morecomment=[l]{//},
  morecomment=[s]{/*}{*/},
  morestring=[b]",
  morestring=[b]',
  sensitive=true,
  basicstyle=\ttfamily\scriptsize,
  breaklines=true,
  frame=none,
  columns=fullflexible,
}

% === HYPERREF ===
\hypersetup{
  colorlinks=true,
  linkcolor=acpurple,
  urlcolor=acpurple,
  citecolor=acpurple,
  pdfauthor={@jeffrey},
  pdftitle={Every Sound is a Painting: Sampling as Visual-Auditory Practice in Aesthetic Computer},
}

% === SECTION FORMATTING ===
\titleformat{\section}
  {\normalfont\bfseries\normalsize\uppercase}
  {\thesection.}
  {0.5em}
  {}
\titlespacing{\section}{0pt}{1.2em}{0.3em}

\titleformat{\subsection}
  {\normalfont\bfseries\small}
  {\thesubsection}
  {0.5em}
  {}
\titlespacing{\subsection}{0pt}{0.8em}{0.2em}

% === HEADER/FOOTER ===
\pagestyle{fancy}
\fancyhf{}
\renewcommand{\headrulewidth}{0.4pt}
\fancyhead[L]{\small\textit{Every Sound is a Painting}}
\fancyhead[R]{\small\textit{@jeffrey}}
\fancyfoot[C]{\small\thepage}

% === CUSTOM COMMANDS ===
\newcommand{\ac}{Aesthetic Computer}
\newcommand{\acos}{AC~Native~OS}
\newcommand{\code}[1]{\texttt{#1}}

\begin{document}

% === TITLE ===
{\centering
  {\acbold\large Every Sound is a Painting\par}
  \vspace{0.15em}
  {\aclight\normalsize Sampling as Visual-Auditory Practice in Aesthetic Computer\par}
  \vspace{0.8em}
  {\small @jeffrey\par}
  \vspace{0.3em}
  {\small\color{acgray} Working Draft --- \today\par}
  \vspace{1em}
}

% === ABSTRACT ===
\begin{abstract}
\noindent
\ac{} encodes audio samples as RGB pixel data in shareable ``paintings''---the platform's native visual media format. A recorded sound becomes an image with a short code (e.g., \code{\#k3d}) that anyone can see, share, and play back on any AC device, from bare-metal laptops to web browsers. This paper traces the evolution of sampling in AC: from microphone capture in \code{notepat.mjs} through the pixel-sample encoding of \code{stample.mjs} to a cross-platform architecture where samples flow through the same infrastructure as visual art. We argue that collapsing the boundary between auditory and visual media is not merely a technical convenience but an aesthetic position: the encoding reveals the structure of sound as color, and the platform's social features (handles, short codes, galleries) make every user's recorded sound a first-class creative object. Following Attali's trajectory from repetition to composition~\citep{attali1985noise}, we propose that treating samples as paintings advances a mode of music-making where the distinction between recording, sharing, and performing dissolves.
\end{abstract}

% ═══════════════════════════════════════════════════════════════
\section{Introduction: The Sample Problem}

Digital audio workstations treat samples as opaque binary files. A WAV or AIFF file contains numbers---amplitude values at regular time intervals---stored in a format that no human can read, no browser can natively display, and no social platform can natively share. To move a sample from one context to another requires file transfer, format negotiation, and specialized software. The sample is infrastructure, not art.

This opacity is a design choice, not a necessity. A sample is a sequence of numbers. Numbers can be colors. Colors can be pixels. Pixels can be images. Images can be paintings. And paintings, on \ac{}, are first-class creative objects: uploadable, downloadable, addressable by short codes, owned by user handles, browsable in galleries, and renderable on any device.

The question this paper addresses is simple: \emph{what happens when you treat every sound as a painting?}

The answer, as implemented in \ac{}'s \code{stample.mjs} and related infrastructure, is that sampling ceases to be a technical operation (``record audio to file'') and becomes a creative act with visual, social, and cross-platform dimensions. A three-second vocal sample recorded at 48\,kHz becomes a 256$\times$188 pixel image. That image has colors: smooth gradients for sine tones, sharp boundaries for percussion, rhythmic patterns for speech. The image gets a short code. The code can be typed into any AC device (a browser, a bare-metal laptop running \acos{}, a phone) and the sound plays.

This is not steganography (hiding data in images) or sonification (mapping data to sound for analysis). It is a deliberate collapse of two media types into one, motivated by the observation that \ac{} already has a complete infrastructure for visual media and no infrastructure for audio media. Rather than build a parallel audio storage system, we encode audio into the visual system and let the existing painting pipeline handle storage, sharing, identity, and distribution.

% ═══════════════════════════════════════════════════════════════
\section{Background}

\subsection{The Political Economy of Sampling}

Attali's \emph{Noise}~\citep{attali1985noise} traces music through four political-economic modes: sacrifice (ritual), representation (concert hall), repetition (recording industry), and composition (a prophesied mode where everyone creates). The recording industry's mode of repetition depends on the sample as commodity---a captured sound that can be duplicated, distributed, and sold. The sample is owned, licensed, and legally contested.

Navas~\citep{navas2012remix} extends this analysis to remix culture, arguing that sampling is the foundational act of contemporary music production. The sampler is the instrument of repetition. But Attali's composition mode requires something different: not the consumption of pre-made samples, but the creation and sharing of new ones by everyone.

\ac{} is designed for composition in Attali's sense. Every user can record. Every recording becomes a painting. Every painting is shareable. There is no sample marketplace, no licensing, no distinction between producer and consumer. The infrastructure treats a child's first microphone recording identically to a professional producer's sound design: both become paintings with short codes.

\subsection{Visual-Auditory Crossings}

The idea that sound has visual form is old. Xenakis~\citep{xenakis1992formalized} composed orchestral works by drawing curves on graph paper and translating them to musical parameters. Levin~\citep{levin2000painterly} built ``painterly interfaces'' where visual gestures generated sound in real time. The spectrogram---a standard tool since the 1940s---represents frequency content as a color image.

What these approaches share is a \emph{representational} relationship: the visual form represents the auditory content, but the two remain separate media. You cannot play a spectrogram. You cannot share a Xenakis sketch as an audio file.

AC's pixel-sample encoding is different: the image \emph{is} the audio. The RGB values are the sample amplitudes. Playing the painting and viewing the painting are two interpretations of the same data. There is no representation gap.

\subsection{The Painting Infrastructure}

Before describing how samples become paintings, it is worth noting what ``painting'' means in \ac{}. A painting is:

\begin{itemize}[nosep]
  \item A bitmap image (PNG) stored on a CDN (Digital Ocean Spaces)
  \item Owned by a user handle (\code{@jeffrey/painting/slug})
  \item Addressable by a short code (\code{\#k3d})
  \item Viewable at a URL (\code{aesthetic.computer/painting/\#k3d})
  \item Tracked in MongoDB with metadata (creator, timestamp, dimensions)
  \item Uploadable from any AC client (web, native) via presigned S3 URLs
  \item Renderable on any AC device (browser, bare-metal OS)
  \item Publishable to the AT Protocol network as a federated social object~\citep{scudder2026identity}
\end{itemize}

This infrastructure was built for visual art. But because it operates on bitmaps---arrays of pixel values---it is agnostic to what those pixel values mean. A painting of a sunset and a painting that encodes a three-second drum loop are handled identically by every layer of the stack.

% ═══════════════════════════════════════════════════════════════
\section{The Encoding}

\subsection{From Float to Pixel}

The pixel-sample encoding maps audio amplitude values to RGB color channels:

\begin{lstlisting}[language=acjs]
function encodeSampleToBitmap(data, width = 256) {
  // Three samples share one RGBA pixel (R, G, B).
  const totalPixels = Math.ceil(data.length / 3);
  const height = Math.ceil(totalPixels / width);
  const pixels = new Uint8ClampedArray(
    width * height * 4
  );
  for (let i = 0; i < data.length; i++) {
    // Clamp to [-1, 1], then map affinely onto [0, 255].
    const v = Math.max(-1, Math.min(1, data[i]));
    const byte = Math.round((v + 1) * 127.5);
    const pixelIndex = Math.floor(i / 3);
    const channel = i % 3;  // R, G, or B
    pixels[pixelIndex * 4 + channel] = byte;
    if (channel === 0)
      pixels[pixelIndex * 4 + 3] = 255; // alpha
  }
  // Keep the true length so a decoder can drop padding.
  return { width, height, pixels, sampleLength: data.length };
}
\end{lstlisting}

Three consecutive audio samples map to one pixel's red, green, and blue channels. Each sample value (a 32-bit float in the range $[-1, 1]$) is quantized to an 8-bit unsigned integer ($[0, 255]$) via the affine mapping:

\[
  b = \text{round}\left(\frac{v + 1}{2} \times 255\right)
\]

The alpha channel is set to 255 (fully opaque) for every pixel that carries sample data; padding pixels at the end of the last row keep their zero-initialized values. The image width defaults to 256 pixels; the height varies with sample duration.
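
The inverse mapping is not listed in this section. The following is a minimal sketch of what a decoder could look like, assuming it receives the \code{pixels} and \code{sampleLength} fields returned by \code{encodeSampleToBitmap}; that argument shape is an assumption made for illustration, not the signature used in \code{stample.mjs}. Each byte $b$ is mapped back to $\hat{v} = b / 127.5 - 1$, so the round-trip error is at most about $1/255$ for samples inside $[-1, 1]$.

\begin{lstlisting}[language=acjs]
// Sketch only: the argument shape mirrors the encoder's return value
// and is an assumption, not the stample.mjs signature.
function decodeBitmapToSample({ pixels, sampleLength }) {
  const data = new Float32Array(sampleLength);
  for (let i = 0; i < sampleLength; i++) {
    const pixelIndex = Math.floor(i / 3);
    const channel = i % 3; // R, G, or B; alpha is skipped
    const byte = pixels[pixelIndex * 4 + channel];
    data[i] = byte / 127.5 - 1; // [0, 255] back to [-1, 1]
  }
  return data;
}
\end{lstlisting}

Passing \code{sampleLength} explicitly matters in this sketch because unused channels in the final pixel and the padding pixels of the last row hold zeros, which would otherwise decode to $-1$.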

\subsection{Encoding Properties}

\textbf{Compression ratio.} The encoding maps 3 float32 samples (12 bytes) to 4 bytes (RGBA pixel), a 3:1 compression before PNG. After PNG compression, a typical 5-second 48\,kHz sample (240,000 samples, 80,000 pixels, 256$\times$313 image) produces a file of approximately 100\,KB.
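
The dimensions quoted for the 5-second example can be checked by hand. The symbols below are introduced here for illustration: $f_s$ is the sample rate, $t$ the duration, $w$ the image width, $N$ the sample count, $P$ the pixel count, and $h$ the image height, for a single-channel recording.

\begin{align*}
  N &= f_s\, t = 48{,}000 \times 5 = 240{,}000 \\
  P &= \lceil N / 3 \rceil = 80{,}000 \\
  h &= \lceil P / w \rceil = \lceil 80{,}000 / 256 \rceil = \lceil 312.5 \rceil = 313
\end{align*}

The image therefore holds $313 \times 256 = 80{,}128$ pixels, of which the last 128 are padding.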

\textbf{Quantization.} Reducing 32-bit float precision to 8-bit integer introduces quantization noise. The theoretical signal-to-noise ratio of 8-bit quantization is approximately 48--50\,dB (about 6\,dB per bit), comparable to early 8-bit samplers such as the original Fairlight CMI (1979). In many microphone recordings made in untreated rooms, the ambient noise floor is at least as loud as this quantization noise. For the use cases \ac{} targets (voice memos, instrument recordings, sound effects captured on laptop microphones), 8-bit resolution is therefore rarely the limiting factor.
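
For reference, the standard estimate for an ideal uniform quantizer with $B$ bits, driven by a full-scale sinusoid and assuming the rounding error is uniformly distributed over one step, is sketched below. The symbols $B$, $\Delta$, and $\hat{v}$ are introduced here for illustration, and real recordings that do not use the full range will see a lower figure.

\begin{align*}
  \mathrm{SNR} &\approx 6.02\,B + 1.76\ \text{dB} = 6.02 \times 8 + 1.76 \approx 49.9\ \text{dB} \\
  \Delta &= \frac{2}{255} \approx 0.0078, \qquad |v - \hat{v}| \le \frac{\Delta}{2} = \frac{1}{255} \approx 0.0039
\end{align*}

Here $\Delta$ is the spacing between adjacent byte values on the $[-1, 1]$ amplitude scale and $\hat{v} = b/127.5 - 1$ is the value recovered from byte $b$.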

\textbf{Visual structure.} The encoding is not arbitrary; it produces images whose visual appearance reflects the auditory content:

\begin{itemize}[nosep]
  \item \textbf{Sine waves} produce smooth horizontal color gradients, cycling through the RGB color space at the wave's frequency.
  \item \textbf{Percussion} creates sharp vertical boundaries where amplitude changes rapidly.
  \item \textbf{Noise} produces visual static---uniformly random pixel colors.
  \item \textbf{Speech} shows rhythmic patterns of color intensity, with silent gaps appearing as bands of neutral gray (the zero-crossing color, $\approx$128 in each channel).
  \item \textbf{Silence} is a field of uniform gray: $(128, 128, 128)$.
\end{itemize}

A user browsing a gallery of sample-paintings develops visual intuition for how sounds look. This is not a spectrogram---it encodes amplitude, not frequency---but it is a legible visual language for temporal audio structure.
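
To give a sense of scale for the sine-wave case, consider a 440\,Hz tone at 48\,kHz in a 256-pixel-wide image. The tone and the symbols $S_{\text{row}}$ and $T_{440}$ are chosen here purely as a worked example.

\begin{align*}
  S_{\text{row}} &= 3 \times 256 = 768\ \text{samples} = 16\ \text{ms per row} \\
  T_{440} &= \frac{48{,}000}{440} \approx 109.1\ \text{samples} \approx 36.4\ \text{pixels per cycle} \\
  \frac{S_{\text{row}}}{T_{440}} &= \frac{768 \times 440}{48{,}000} = 7.04\ \text{cycles per row}
\end{align*}

Each row thus shows about seven repetitions of the gradient. Because 7.04 is not an integer, the pattern should shift by roughly a pixel and a half from one row to the next.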

\subsection{The Encoding as Aesthetic Choice}

The quantization from 32-bit to 8-bit is lossy. This is a feature. Like vinyl's surface noise, cassette tape's harmonic distortion, or the bit-crushed textures of early samplers (Fairlight CMI, E-mu SP-1200), the encoding imposes a sonic character. Samples that have passed through the pixel-sample encoding have a subtle graininess that is consistent, predictable, and, for many users, pleasant.

This is not accidental. The E-mu SP-1200's 12-bit converters, widely regarded as producing a characteristic ``warmth,'' belong to the same low-bit-depth tradition as our 8-bit encoding, although 12 bits offer roughly 24\,dB more dynamic range. The difference is that our encoding also produces a \emph{visible} artifact: the painting itself. The medium is not merely the message~\citep{mcluhan1964understanding}; the medium is simultaneously the audio and the image.

% ═══════════════════════════════════════════════════════════════
\section{Architecture}

\subsection{Recording: notepat.mjs}

\code{notepat.mjs}~\citep{scudder2026notepat} is AC's primary synthesizer instrument. Among its seven waveform types is ``sample'' mode, which enables microphone recording:

\begin{itemize}[nosep]
  \item \textbf{Home key}: hold to record a global sample from the microphone
  \item \textbf{End key}: arm per-key recording (record a different sample for each piano key)
  \item \textbf{Any tone key}: while End is held, pressing a tone key records to that key's sample bank
\end{itemize}

On \acos{} (bare metal), the audio engine captures at 48\,kHz via ALSA direct hardware access. Recorded samples are stored as raw float32 arrays in memory and persisted to \code{/mnt/ac-sample.raw} on the boot media. On next boot, the sample auto-loads---the instrument remembers its voice across power cycles.

The bare-metal audio pipeline is described in detail in~\citep{scudder2026notepat}: 128-sample ALSA periods, 32-voice polyphony with voice stealing, room reverb (3-tap delay, 0.55 feedback), and per-sample exponential smoothing for pitch and volume control.
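
For scale, the time spanned by one ALSA period at these settings follows directly from the figures above. The symbols $T_p$, $x$, $y$, and $\alpha$ are introduced here for illustration, and the number of periods held in the hardware buffer is not stated in this paper, so total latency may be a small multiple of $T_p$.

\[
  T_p = \frac{128\ \text{samples}}{48{,}000\ \text{samples/s}} \approx 2.67\ \text{ms}
\]

Per-sample exponential smoothing, in its textbook one-pole form, moves a control value $y$ toward its target $x$ by a fixed fraction $\alpha \in (0, 1]$ on every sample:

\[
  y[n] = y[n-1] + \alpha\,\bigl(x[n] - y[n-1]\bigr)
\]

The coefficient used in \code{notepat.mjs} is not given here, so $\alpha$ is left symbolic.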

\subsection{Encoding and Sharing: stample.mjs}

\code{stample.mjs} (``stamp'' + ``sample'') is a web piece that spreads a sample across touchable pads. Its key contribution is the pixel-sample bridge: audio recorded in \code{notepat} or captured live through the microphone is encoded as a painting and stored in the user's painting library.

The encoding path:

\begin{enumerate}[nosep]
  \item User records audio or loads an existing sample
  \item \code{encodeSampleToBitmap()} converts float32 array to RGBA pixels
  \item The bitmap is uploaded as a painting via \code{track-media.mjs}
  \item The painting receives a short code (e.g., \code{\#k3d})
  \item The painting is now addressable, shareable, and playable
\end{enumerate}

The decoding path:

\begin{enumerate}[nosep]
  \item User enters a painting code (e.g., \code{stample \#k3d})
  \item \code{loadPaintingAsAudio()} resolves the code via the painting API
  \item The PNG is fetched from the CDN
  \item \code{decodeBitmapToSample()} extracts audio from RGB channels
  \item The audio is registered as a playable sample in the audio engine
\end{enumerate}
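
The decoding path can be sketched as a short asynchronous pipeline. This is an illustration of the control flow only: the function name \code{loadPaintingAsAudioSketch} and every dependency it receives (\code{resolveCode}, \code{fetchPixels}, \code{decode}, \code{register}) are hypothetical names supplied by the caller, not identifiers from the AC codebase.

\begin{lstlisting}[language=acjs]
// Hypothetical wiring of the five decoding steps listed above.
// Every dependency name is invented for illustration and is
// injected by the caller; none is taken from the AC codebase.
async function loadPaintingAsAudioSketch(code, deps) {
  const { resolveCode, fetchPixels, decode, register } = deps;
  const url = await resolveCode(code);   // steps 1-2: code to PNG URL
  const bitmap = await fetchPixels(url); // step 3: PNG to RGBA pixels
  const samples = decode(bitmap);        // step 4: RGB to amplitudes
  return register(code, samples);        // step 5: make it playable
}
\end{lstlisting}

Injecting the four steps keeps the sketch independent of any network or audio API, which is also how it could be exercised with stubs on either runtime.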

Remarkably, \code{loadPaintingAsAudio()} also accepts KidLisp~\citep{scudder2026kidlisp} source codes (\code{\$roz}), rendering the generative art program to a bitmap and then decoding that bitmap as audio. Any visual program becomes a sound.

\subsection{Local Library: samples.mjs}

\code{samples.mjs} is a sample library manager for \acos{}. It maintains a timestamped collection of recorded samples on the boot media at \code{/mnt/samples/}:

\begin{itemize}[nosep]
  \item \textbf{s}: save current sample with timestamp
  \item \textbf{enter}: load selected sample into the audio engine
  \item \textbf{space}: preview (play at original pitch)
  \item \textbf{x}: remove from library
  \item \textbf{arrows/tab}: navigate
\end{itemize}

The library stores samples in the native raw format for instant loading (no decode step) and maintains a \code{manifest.json} index. Future work extends this with painting upload and download, unifying the local library with the cloud painting infrastructure.

\subsection{Cross-Platform Bridge}

The central architectural challenge is that \acos{} runs on QuickJS with C audio bindings and no DOM, while the web runtime runs in a browser with Web Audio API and Canvas. The pixel-sample encoding is pure arithmetic---no platform-specific APIs---and can be implemented identically in both environments.

The bridge operates as follows:

\begin{center}
\begin{tabular}{lll}
\toprule
\textbf{Operation} & \textbf{Web} & \textbf{Native} \\
\midrule
Record & Web Audio API & ALSA via C \\
Encode & \code{pixel-sample.mjs} & Same algorithm \\
Upload & \code{fetch()} + S3 & \code{system.fetchPost} \\
Download & \code{fetch()} & \code{system.fetchBinary} \\
Decode & \code{pixel-sample.mjs} & Same algorithm \\
Play & Web Audio & ALSA via C \\
\bottomrule
\end{tabular}
\end{center}

The encoding and decoding layers are platform-independent. Only the I/O boundaries (recording, network, playback) differ. A sample recorded on a bare-metal laptop at a street fair, encoded as a painting, and uploaded over WiFi is indistinguishable from a sample recorded in a web browser at a desktop---both produce the same PNG, the same short code, and the same playback on any AC device.

% ═══════════════════════════════════════════════════════════════
\section{Design Philosophy}

\subsection{Infrastructure Reuse Over Parallel Systems}

The conventional approach to adding audio sharing to a platform is to build audio-specific infrastructure: audio upload endpoints, audio format negotiation, audio CDN configuration, audio metadata schemas, audio player widgets. This produces a system where visual media and audio media are handled by parallel but separate stacks.

\ac{}'s approach is to encode audio \emph{into} the existing visual stack. The cost is the 8-bit quantization described above. The benefit is zero new infrastructure: no new API endpoints, no new storage buckets, no new database collections, no new authentication flows, no new client-side players. Every feature that paintings have---short codes, handles, galleries, AT Protocol syndication, CDN caching---samples get for free.

This is not laziness; it is a design principle. Manovich~\citep{manovich2001language} argues that the logic of new media is the logic of the database: media objects are stored, indexed, and recombined. By encoding audio as visual data, \ac{} ensures that audio objects enter the same database as visual objects, subject to the same operations (search, browse, share, remix).

\subsection{The Social Sample}

On most platforms, a recorded sample is a file on a hard drive. It has no identity, no address, no social context. To share it, you attach it to a message, upload it to a file host, or embed it in a track.

On \ac{}, a recorded sample is a painting with a short code. It has an owner (\code{@jeffrey}), an address (\code{aesthetic.computer/painting/\#k3d}), a creation date, and a position in the owner's gallery alongside their visual art, their KidLisp generative pieces, and their chat messages. The sample is not a file; it is a social object.

This has consequences for creative practice. When a student in a laptop orchestra~\citep{scudder2026plork, trueman2006plork} records a sample, that sample immediately exists in the same space as the student's drawings, code, and conversations. The creative identity is unified: ``my paintings'' includes my sounds.

\subsection{Every User Creates}

Attali's composition mode requires that everyone can create, not just consume. The pixel-sample system supports this by making the creation-to-sharing path trivially short:

\begin{enumerate}[nosep]
  \item Hold Home key in notepat (1 second)
  \item Release (sample saved)
  \item Open samples, press s (saved with timestamp)
  \item Press u (uploaded as painting, short code returned)
  \item Share the code
\end{enumerate}

Five actions, no file management, no format selection, no account creation beyond the AC handle the user already has. The barrier between ``I made a sound'' and ``anyone can play my sound'' is approximately ten seconds.

% ═══════════════════════════════════════════════════════════════
\section{Future Work}

\textbf{Collaborative sampling.} Multiple devices recording simultaneously, with samples merged into composite paintings. A laptop orchestra where every player's contribution is visible as a colored stripe in a shared image.

\textbf{Generative samples.} KidLisp programs that produce sample-paintings algorithmically. The command \code{stample \$roz} already works---it renders a KidLisp program as a bitmap and plays the pixels as audio. Extending this to purpose-built audio-generative programs is straightforward.

\textbf{Sample lineage.} Tracking the provenance of samples as they are re-recorded, modified, and re-shared. A sample-painting could carry metadata about its ancestors, creating a version history analogous to git for sound.

\textbf{Physical prints.} Printing sample-paintings on paper and scanning them back as audio. We expect the encoding to tolerate moderate print/scan degradation, since the 8-bit quantization is already lossy and additional noise from the physical process should add character without destroying intelligibility. Whether it also survives JPEG compression, which treats the color channels unequally, remains to be tested. A concert poster that contains the concert's audio appears technically feasible.

\textbf{Federated samples.} Via AT Protocol integration~\citep{scudder2026identity}, sample-paintings could be syndicated across the federated social web. A sample recorded on a bare-metal AC device could be uploaded to the user's Personal Data Server and played on any ATProto client that understands the \code{computer.aesthetic.painting} lexicon.

% ═══════════════════════════════════════════════════════════════
\section{Conclusion}

The premise of this paper is that the boundary between visual and auditory media is artificial---maintained by file format conventions, not by any fundamental property of the underlying data. By encoding audio samples as RGB pixels and routing them through a painting infrastructure, \ac{} eliminates this boundary for its users. A sound has a color. A painting has a voice. The creative act is unified.

The technical contribution is modest: a 15-line encoding function, a 15-line decoding function, and the observation that an existing painting pipeline can carry audio without modification. The conceptual contribution is larger: in a system where every sound is a painting, sampling becomes a visual practice, sharing becomes social, and the distinction between hearing and seeing---between the ear and the eye---becomes a matter of interpretation rather than infrastructure.

Wishart~\citep{wishart1996sonic} argued for treating sound as a plastic material, shaped by the composer's hands like clay. In \ac{}, the material is literally visible: the sample-painting shows its texture, its rhythm, its dynamic range as colored pixels. The composer's hands shape both the sound and the image, because they are the same thing.

\vspace{1em}
\noindent\rule{\columnwidth}{0.4pt}
{\small
\textbf{Acknowledgments.} This work builds on infrastructure developed across the Aesthetic Computer project. The pixel-sample encoding was first implemented in \code{stample.mjs} in January 2025. The cross-platform bridge described here was developed in March 2026 during the \acos{} boot media campaign.
}

% === REFERENCES ===
\bibliographystyle{plainnat}
\bibliography{references}

\end{document}