We computed basic statistics (counts, sizes, temporal distribution) using pandas.
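
A minimal sketch of how such summary statistics could be computed with pandas. The column names `size_bytes` and `date` are assumptions made for illustration, since the actual catalogue schema is not described here, and the demo rows are made up.

```python
import pandas as pd


def basic_stats(catalogue: pd.DataFrame) -> dict:
    """Summarise item counts, file sizes, and monthly distribution.

    Assumes hypothetical columns "size_bytes" (int) and "date" (ISO string).
    """
    dates = pd.to_datetime(catalogue["date"])
    return {
        "n_items": len(catalogue),
        "total_bytes": int(catalogue["size_bytes"].sum()),
        "mean_bytes": float(catalogue["size_bytes"].mean()),
        # Number of items per calendar month, in chronological order.
        "items_per_month": dates.dt.to_period("M").value_counts().sort_index(),
    }


# Tiny made-up frame standing in for the real catalogue.
demo = pd.DataFrame(
    {
        "size_bytes": [100, 300, 200],
        "date": ["2021-01-05", "2021-01-20", "2021-02-01"],
    }
)
stats = basic_stats(demo)
```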
All items were downloaded from the official Emily18 Com repository (https://archive.emily18.com/2021/full‑sets) under a CC‑BY‑4.0 license. The repository provides a SHA‑256 checksum for each file; integrity was verified before ingestion.
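
The integrity check can be illustrated with the standard library alone. This is a sketch under the assumption that each expected checksum is available as a hexadecimal string; the helper names `sha256_of` and `verify` are hypothetical and not taken from the project's code.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's digest matches the published checksum."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration on a throwaway file rather than a real download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    ok = verify(sample, expected)            # matching checksum
    tampered_ok = verify(sample, "0" * 64)   # deliberately wrong checksum
```

Reading in chunks keeps memory use flat for large media files.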
To capture cross‑modal relationships, we concatenated the three modality‑specific embeddings (image + audio + text) and applied Principal Component Analysis (PCA), retaining 95 % of the variance. This yielded a 128‑dimensional fused representation.
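
The fusion step can be sketched as follows. This is an illustrative NumPy version that computes PCA via singular value decomposition, not necessarily the implementation used for the released vectors. The function name `fuse_and_reduce`, the toy embedding sizes, and the absence of per-modality scaling are all assumptions.

```python
import numpy as np


def fuse_and_reduce(image, audio, text, variance=0.95):
    """Concatenate per-item embeddings and keep enough principal
    components to explain at least `variance` of the total variance."""
    fused = np.concatenate([image, audio, text], axis=1)
    centered = fused - fused.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest k whose cumulative explained variance reaches the target.
    k = int(np.searchsorted(np.cumsum(explained), variance)) + 1
    k = min(k, len(s))
    return centered @ vt[:k].T, k


# Toy embeddings with made-up dimensions (50 items).
rng = np.random.default_rng(0)
image = rng.normal(size=(50, 8))
audio = rng.normal(size=(50, 4))
text = rng.normal(size=(50, 6))
reduced, k = fuse_and_reduce(image, audio, text)
```

With scikit-learn, `PCA(n_components=0.95, svd_solver="full")` selects the number of components by the same criterion. On the real data this criterion is what would give the reported 128 dimensions.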
| File | Description |
|------|-------------|
| catalogue.csv | Master listing of all 1 018 items with SHA‑256 checksums. |
| features/ | Subfolders images/, audio/, texts/ containing modality‑specific embeddings (NumPy .npy). |
| fused_embeddings.npy | 128‑dimensional multimodal vectors (PCA‑reduced). |
| cluster_labels.npy | Cluster ID for each item (aligned with catalogue.csv). |
| analysis.ipynb | Jupyter notebook reproducing all statistical tables, UMAP plots, and LDA topics. |
| requirements.txt | Python ≥ 3.10 dependencies (torchvision, librosa, spacy, hdbscan, umap‑learn, scikit‑learn, pandas). |
All files are available under CC‑BY‑4.0 at: https://github.com/Emily18/2021‑full‑sets‑analysis
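
Because the arrays are aligned with catalogue.csv by row order, a consumer of these files may want to confirm that alignment after loading. The sketch below assumes only what the table states (one row per item, 128‑dimensional fused vectors); the function name `check_alignment` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def check_alignment(catalogue, fused, labels, expected_dim=128):
    """Attach cluster labels to the catalogue after checking row counts."""
    n = len(catalogue)
    if fused.shape != (n, expected_dim):
        raise ValueError(f"fused shape {fused.shape} != ({n}, {expected_dim})")
    if labels.shape != (n,):
        raise ValueError(f"labels shape {labels.shape} != ({n},)")
    return catalogue.assign(cluster=labels)


# With the released files, the inputs would come from:
#   catalogue = pd.read_csv("catalogue.csv")
#   fused = np.load("fused_embeddings.npy")
#   labels = np.load("cluster_labels.npy")
# Synthetic stand-ins are used here so the example runs on its own.
catalogue = pd.DataFrame({"item": ["a", "b", "c"]})
fused = np.zeros((3, 128))
labels = np.array([0, 1, 1])
merged = check_alignment(catalogue, fused, labels)
```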
Prepared by:
Dr. Arielle K. Sato
Department of Media Studies, University of Nova Scotia
Email: a.sato@unova.edu
ORCID: https://orcid.org/0000‑0002‑3456‑7890