CSV Cross-check · IDA-PTW EEWS Review

TL;DR: Three distinct model runs produce non-overlapping R² values, and reports have mixed them without clear labeling. The current validation_evidence_report.md and manuscript_draft_IEEE.md draw on different runs, which makes them internally inconsistent. Recommendation: adopt a single authoritative source for each piece of evidence.

The Three Runs

Run ID	Produced by	Core config	R²
RUN-A "Heavy stratified"	train_stratified_ida.py, train_eews_windows.py	XGBoost `n_est=800, max_depth=12`; GroupKFold 5-fold; full feature set. N=25,058 (Fixed PTW) or N=2,747 (IDA stratified subset).	0.867 (Fixed 10s), 0.876 (IDA)
RUN-B "103 fast-marathon"	train_xgboost_103_marathon_all.py	XGBoost `n_est=150, max_depth=8`; 5-fold. "Full-Wave" here is 50 s, NOT 341 s.	0.763 (Fixed 10s), 0.738 (IDA), 0.811 (50s)
RUN-C "End-to-end operational"	Not found in codebase; referenced only in manuscript	Includes Stage 1 routing uncertainty (8.91% critical miss); numbers quoted in manuscript Table 11/12.	0.625 (IDA e2e)

Key Discrepancies by Period

IDA-PTW Adaptive at Sa(0.3s)

Source	Value	Run and N
benchmark_results_ida.csv	0.8759	RUN-A · N=2,747
spectral_r2_performance.csv @ T=0.3	0.8798	RUN-A · N=2,747
comparison_r2_table.csv	0.8760	RUN-A · N=2,747
xgboost_103_all_baselines.csv @ T=0.3	0.7381	RUN-B · N=25,058
comparison_golden_metrics.csv @ T=0.300	0.7381	RUN-B · N=25,058
manuscript_draft_IEEE.md Table 11 "IDA-PTW Operational"	0.6252	RUN-C · no CSV source
validation_evidence_report.md	0.8759	RUN-A · N=2,747

⚠ Δ = 25 p.p. spread

Three documents apply the same label, "IDA-PTW Adaptive", but the R² values they quote span 0.876 → 0.738 → 0.625. The manuscript's "Operational" value has no reproducible CSV artifact.
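The spread can be recomputed directly from the table. The following is a minimal sketch that hard-codes the seven quoted values rather than reading the files, so the dictionary and the helper name `spread_pp` are illustrative only. Note that the headline figure of 25 p.p. uses the rounded endpoints 0.876 and 0.625; taking every table entry, including the 0.8798 from spectral_r2_performance.csv, gives a slightly larger spread.

```python
# R² values quoted for "IDA-PTW Adaptive" at Sa(0.3 s), copied from the table above.
quoted_r2 = {
    "benchmark_results_ida.csv": 0.8759,
    "spectral_r2_performance.csv": 0.8798,
    "comparison_r2_table.csv": 0.8760,
    "xgboost_103_all_baselines.csv": 0.7381,
    "comparison_golden_metrics.csv": 0.7381,
    "manuscript_draft_IEEE.md Table 11": 0.6252,
    "validation_evidence_report.md": 0.8759,
}


def spread_pp(values):
    """Return the max-minus-min spread in percentage points."""
    values = list(values)
    return (max(values) - min(values)) * 100.0


spread = spread_pp(quoted_r2.values())
print(f"spread across all sources = {spread:.1f} p.p.")
```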

"Full-Wave" R² at Sa(0.3s): label ambiguity

Source	Value	Window length
validation_evidence_report.md Evidence B	0.9508	~341 s
comparison_r2_table.csv	0.9450	not specified
xgboost_103_all_baselines.csv "Full_Wave"	0.8110	50 s (RUN-B)
manuscript_draft_IEEE.md Table 12 "Post-P Full-Wave"	0.951	~341 s

⚠ Critical issue: "Full_Wave" in RUN-B is actually a 50 s window, not 341 s. The ~341 s values (0.951) in the evidence report trace back to "run c7a50193", but the repo holds no CSV artifact for that run. The values are quoted as scalars only.

Internal Discrepancies Within Documents

1. Manuscript Table 11: mislabeled PTW rows

Table 11 in manuscript_draft_IEEE.md labels its rows "Fixed 2 / 3 / 4 / 6 / 8":

Rows "Fixed 2/3/8" match the CSV: OK.
Row "Fixed 4" values (0.7181 / 0.8595 / 0.8073 / 0.7916) exactly match PTW=5 in benchmark_results_fixed.csv: mislabeled.
Row "Fixed 6" has only a composite R² of 0.7808 and no per-period values: no CSV source.
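A mislabel of this kind can be caught mechanically by searching the CSV for the row whose values equal the manuscript row. The sketch below assumes a hypothetical layout for benchmark_results_fixed.csv: the column names `ptw_s` and `v1` to `v4` are invented for illustration, and only the PTW=5 values are taken from this audit. The PTW=4 row holds placeholder zeros because the true 4 s values are not quoted anywhere in the reports.

```python
import csv
import io
import math

# Hypothetical excerpt of benchmark_results_fixed.csv (column names are assumptions).
# PTW=5 values come from the audit; the PTW=4 row is a zero-filled placeholder.
CSV_TEXT = """ptw_s,v1,v2,v3,v4
4,0.0,0.0,0.0,0.0
5,0.7181,0.8595,0.8073,0.7916
"""

VALUE_COLUMNS = ("v1", "v2", "v3", "v4")


def matching_ptw(manuscript_values, csv_text, tol=5e-5):
    """Return every PTW whose CSV row equals the manuscript row within tol."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        csv_values = [float(row[col]) for col in VALUE_COLUMNS]
        if all(
            math.isclose(a, b, abs_tol=tol)
            for a, b in zip(manuscript_values, csv_values)
        ):
            matches.append(int(row["ptw_s"]))
    return matches


# Manuscript Table 11, row labeled "Fixed 4".
fixed_4_row = [0.7181, 0.8595, 0.8073, 0.7916]
print(matching_ptw(fixed_4_row, CSV_TEXT))  # the label says 4; the match says otherwise
```

Running the same lookup for every manuscript row would also surface rows with no match at all, which is the situation reported for "Fixed 6".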

2. intensity_correlation_metrics.csv N anomaly

Sum across the 4 intensity bins: 7,356 + 8,309 + 8,698 + 3,903 = 28,266 ≠ 25,058, an excess of 3,208 rows. This suggests either that per-trace predictions were enumerated across multiple PTW outputs (double-counting) or that a different training split was used. This file feeds Table 1 of the vFinal draft; if N is inflated, the per-bin R² values may be biased as well.
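The arithmetic and the double-counting hypothesis can both be checked in a few lines. This is a sketch only: the bin counts are copied from the paragraph above, while the `trace_id` and `ptw_s` field names and the sample rows are hypothetical, since the real schema of intensity_correlation_metrics.csv and its upstream prediction file is not documented here.

```python
from collections import Counter

DATASET_N = 25_058
BIN_COUNTS = [7_356, 8_309, 8_698, 3_903]  # the four intensity bins quoted above

total = sum(BIN_COUNTS)
excess = total - DATASET_N
print(f"total={total:,} excess={excess:,} ({excess / DATASET_N:.1%} of dataset N)")


def duplicated_traces(rows, key="trace_id"):
    """Return {trace id: count} for every trace that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {trace: n for trace, n in counts.items() if n > 1}


# Hypothetical rows: the same trace emitted once per PTW output.
sample_rows = [
    {"trace_id": "T001", "ptw_s": 3},
    {"trace_id": "T001", "ptw_s": 5},
    {"trace_id": "T002", "ptw_s": 3},
]
print(duplicated_traces(sample_rows))
```

If the real per-trace table yields an empty result from `duplicated_traces`, double-counting is ruled out and the different-split explanation becomes the one to investigate.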

3. Physics ceiling N = 21,704 vs dataset N = 25,058

scwfparam_equivalence_golden.csv reports N=21,704, but the dataset description claims 25,058. The "golden" subset is a stricter filter (defined in metadata_golden.csv). That is legitimate, but it must be clearly distinguished from the 25,058-trace training dataset in any table that juxtaposes ML R² (N=25,058) against physics R² (N=21,704).

Authoritative Source Mapping (Proposed)

Evidence	Use this CSV	Rationale
A1. Fixed PTW benchmark	benchmark_results_fixed.csv	Heavy RUN-A on full N=25,058
A2. IDA-PTW 3 anchor periods	benchmark_results_ida.csv	RUN-A stratified on N=2,747
A3. IDA-PTW 103-period	spectral_r2_performance.csv	RUN-A stratified, 103 periods
B. Information ceiling	🚨 No CSV exists	Must re-run or flag as legacy
C. Saturation test	saturation_test_results.csv	Direct match to evidence report
D. P-arrival sensitivity	p_arrival_sensitivity.csv	Direct match
E. Newmark-Beta ceiling	scwfparam_equivalence_golden.csv	Physics validation on N=21,704
F. 103-period fast marathon	xgboost_103_all_baselines.csv	Secondary/exploratory only
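The proposed mapping could also live in code, so that a report build fails whenever an evidence item lacks its authoritative CSV. The sketch below mirrors the table; the `unresolved` helper and the idea of passing in the set of available filenames are assumptions made for illustration, not part of the existing codebase.

```python
# Evidence ID -> authoritative CSV, mirroring the proposed mapping table.
# None marks evidence for which no CSV exists yet.
AUTHORITATIVE = {
    "A1": "benchmark_results_fixed.csv",
    "A2": "benchmark_results_ida.csv",
    "A3": "spectral_r2_performance.csv",
    "B": None,
    "C": "saturation_test_results.csv",
    "D": "p_arrival_sensitivity.csv",
    "E": "scwfparam_equivalence_golden.csv",
    "F": "xgboost_103_all_baselines.csv",
}


def unresolved(mapping, available):
    """Return evidence IDs whose CSV is undefined or missing from `available`."""
    return sorted(
        evidence
        for evidence, csv_name in mapping.items()
        if csv_name is None or csv_name not in available
    )


# Pretend every mapped file is present; only Evidence B should be flagged.
present = {name for name in AUTHORITATIVE.values() if name is not None}
print(unresolved(AUTHORITATIVE, present))
```

In practice `available` could be built from a directory listing of reports/, so that a renamed or deleted CSV is flagged the same way as Evidence B.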

Recommended Actions

Rename or retire comparison_golden_metrics.csv. The "Golden" label implies it is authoritative; in fact it comes from the fast marathon (RUN-B). Rename it to comparison_marathon_metrics_preliminary.csv or delete it.
Audit the Full-Wave and Total MiniSEED columns of comparison_r2_table.csv. They appear to be placeholders rather than machine-computed values. Either regenerate them from a real 341 s run or remove the columns.
Recover or document the "c7a50193" (Full-Wave 341 s) and "c3399cac" (Total MiniSEED) runs. Without CSV artifacts, the Evidence B numbers are not reproducible.
Fix the Manuscript Table 11 labels. Either rename rows "Fixed 4/6" → "Fixed 5/10", or regenerate values for true 4 s and 6 s PTW.
Verify N=28,266 in intensity_correlation_metrics.csv and re-run the per-trace grouping logic if double-counting is confirmed.
Decide which IDA-PTW paradigm the paper defends, Stage-2-oracle (R²≈0.88) or end-to-end with routing uncertainty (R²≈0.62–0.73), and use ONE consistently throughout the abstract, tables, and conclusion.
Add a PROVENANCE.md at the top of reports/ listing every CSV with its producing script, dataset N, CV config, model hyperparameters, and run date.
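One possible shape for PROVENANCE.md is a single table generated from a small manifest, so the file cannot drift from what the scripts record. The sketch below is an assumption about layout, not an existing tool: the `Provenance` dataclass and `render` function are hypothetical names. The example entry uses only facts stated in this audit for RUN-B, and the run date is left as "unknown" because no date is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    csv_name: str
    script: str
    dataset_n: int
    cv: str
    hyperparameters: str
    run_date: str


def render(entries):
    """Render provenance entries as the text of a PROVENANCE.md file."""
    header = "| CSV | Script | N | CV | Hyperparameters | Run date |"
    rule = "|---|---|---|---|---|---|"
    rows = [
        f"| {e.csv_name} | {e.script} | {e.dataset_n:,} | {e.cv} "
        f"| {e.hyperparameters} | {e.run_date} |"
        for e in entries
    ]
    return "\n".join(["# Provenance", "", header, rule, *rows]) + "\n"


entries = [
    Provenance(
        csv_name="xgboost_103_all_baselines.csv",
        script="train_xgboost_103_marathon_all.py",
        dataset_n=25_058,
        cv="5-fold",
        hyperparameters="n_est=150, max_depth=8",
        run_date="unknown",  # not recorded anywhere in the audited reports
    ),
]
provenance_text = render(entries)
print(provenance_text)
```

Having each training script append its own entry at the end of a run would keep the manifest current without manual edits.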

Report generated 2026-04-22 · Based on file state of /mnt/DL_Spectra/reports/ as of 2026-04-19.
