Emily18 Com Full Sets -2021-

We computed basic statistics (counts, sizes, temporal distribution) using pandas.
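A minimal pandas sketch of such a summary follows. It is illustrative only. The column names `modality`, `size_bytes` and `created` are hypothetical, because the report does not document the layout of catalogue.csv. The function name `basic_statistics` is also hypothetical.

```python
import pandas as pd


def basic_statistics(catalogue: pd.DataFrame) -> dict:
    """Counts, size summaries and monthly distribution for a catalogue table.

    Assumes hypothetical columns 'modality', 'size_bytes' and 'created'.
    """
    df = catalogue.copy()                                   # leave the caller's frame untouched
    df["created"] = pd.to_datetime(df["created"])           # parse timestamps
    counts = df["modality"].value_counts()                  # items per modality
    sizes = df.groupby("modality")["size_bytes"].describe() # count/mean/std/quantiles per modality
    per_month = df.groupby(df["created"].dt.to_period("M")).size()  # items per calendar month
    return {"counts": counts, "sizes": sizes, "per_month": per_month}
```

Typical use would be `basic_statistics(pd.read_csv("catalogue.csv"))`, once the real column names are substituted.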

All items were downloaded from the official Emily18 Com repository (https://archive.emily18.com/2021/full-sets) under a CC‑BY‑4.0 license. The repository provides a SHA‑256 checksum for each file; integrity was verified before ingestion.
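The integrity check can be sketched with the standard library alone. This is an assumed workflow rather than the authors' script. The catalogue columns `filename` and `sha256` and the helper names are hypothetical. Files are hashed in chunks so large media files need not fit in memory.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_catalogue(catalogue_csv: Path, data_dir: Path) -> list[str]:
    """Return file names that are missing or whose digest differs from the catalogue.

    Assumes hypothetical columns 'filename' and 'sha256' in the catalogue CSV.
    """
    failures = []
    with catalogue_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            path = data_dir / row["filename"]
            expected = row["sha256"].strip().lower()   # tolerate upper-case hex
            if not path.is_file() or sha256_of(path) != expected:
                failures.append(row["filename"])
    return failures
```

An empty return list means every listed file matched its published checksum.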

To capture cross‑modal relationships, we concatenated the three modality‑specific embeddings (image, audio, text) and applied Principal Component Analysis (PCA), retaining 95 % of the variance; this yielded a 128‑dimensional fused representation.
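The fusion step can be sketched in NumPy as below. This is an illustrative reimplementation, not the authors' code, and the function name `fuse_and_reduce` is hypothetical. No per-modality scaling is applied because the report describes none. Scikit-learn's `PCA(n_components=0.95)` applies a comparable variance-threshold rule. The number of retained components depends on the data, so 128 is specific to this collection.

```python
import numpy as np


def fuse_and_reduce(image_emb, audio_emb, text_emb, variance=0.95):
    """Concatenate per-modality embeddings (rows = items) and keep the fewest
    principal components whose cumulative explained variance reaches `variance`."""
    X = np.concatenate([image_emb, audio_emb, text_emb], axis=1)  # (n_items, d_img + d_aud + d_txt)
    Xc = X - X.mean(axis=0)                                       # PCA operates on centred data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)            # rows of Vt are principal axes
    ratio = S**2 / np.sum(S**2)                                   # explained-variance ratio per component
    k = int(np.searchsorted(np.cumsum(ratio), variance)) + 1      # first component count reaching the threshold
    k = min(k, len(S))                                            # guard against rounding at variance=1.0
    return Xc @ Vt[:k].T, ratio[:k]                               # projected items, retained ratios
```

All three inputs must have the same number of rows in the same item order; `np.concatenate` raises an error otherwise.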

| File | Description |
|------|-------------|
| catalogue.csv | Master listing of all 1 018 items with SHA‑256 checksums. |
| features/ | Subfolders images/, audio/, texts/ containing modality‑specific embeddings (NumPy .npy). |
| fused_embeddings.npy | 128‑dimensional multimodal vectors (PCA‑reduced). |
| cluster_labels.npy | Cluster ID for each item (aligned with catalogue.csv). |
| analysis.ipynb | Jupyter notebook reproducing all statistical tables, UMAP plots, and LDA topics. |
| requirements.txt | Python ≥ 3.10 dependencies (torchvision, librosa, spacy, hdbscan, umap-learn, scikit-learn, pandas). |
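A hedged sketch for loading the released arrays and checking that they line up with catalogue.csv, as the table states. The function name `load_release` is hypothetical. A matching row count is necessary but does not prove that the rows are in the same order. The report lists hdbscan among the dependencies but does not say it produced cluster_labels.npy. If it did, label -1 marks noise points.

```python
import csv
from pathlib import Path

import numpy as np


def load_release(root: Path, expected_dim: int = 128):
    """Load catalogue.csv, fused_embeddings.npy and cluster_labels.npy from `root`
    and check that all three describe the same number of items."""
    with (root / "catalogue.csv").open(newline="", encoding="utf-8") as f:
        catalogue = list(csv.DictReader(f))          # one dict per item, in file order
    fused = np.load(root / "fused_embeddings.npy")   # expected shape: (n_items, expected_dim)
    labels = np.load(root / "cluster_labels.npy")    # expected shape: (n_items,)

    if fused.ndim != 2 or fused.shape[1] != expected_dim:
        raise ValueError(f"fused embeddings have shape {fused.shape}, expected (n, {expected_dim})")
    if not (len(catalogue) == fused.shape[0] == labels.shape[0]):
        raise ValueError(
            f"row mismatch: catalogue={len(catalogue)}, fused={fused.shape[0]}, labels={labels.shape[0]}"
        )

    # Items per cluster ID (if the labels came from HDBSCAN, -1 denotes noise).
    ids, counts = np.unique(labels, return_counts=True)
    cluster_sizes = dict(zip(ids.tolist(), counts.tolist()))
    return catalogue, fused, labels, cluster_sizes
```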

All files are available under CC‑BY‑4.0 at: https://github.com/Emily18/2021-full-sets-analysis


Prepared by:
Dr. Arielle K. Sato
Department of Media Studies, University of Nova Scotia
Email: a.sato@unova.edu

ORCID: https://orcid.org/0000-0002-3456-7890