There are three distinct model runs whose R² values do not overlap, and all three are mixed together in the reports without clear labeling. validation_evidence_report.md and manuscript_draft_IEEE.md currently pull from different runs, which causes internal inconsistency. Recommendation: adopt a single authoritative source per evidence item.
The Three Runs
| Run ID | Produced by | Core config | Sa(0.3s) R² |
|---|---|---|---|
| RUN-A "Heavy stratified" | train_stratified_ida.py, train_eews_windows.py | XGBoost n_est=800, max_depth=12; GroupKFold 5-fold; full feature set. N=25,058 (Fixed PTW) or N=2,747 (IDA stratified subset). | 0.867 (Fixed 10s), 0.876 (IDA) |
| RUN-B "103 fast-marathon" | train_xgboost_103_marathon_all.py | XGBoost n_est=150, max_depth=8; 5-fold. "Full-Wave" here is 50 s, NOT 341 s. | 0.763 (Fixed 10s), 0.738 (IDA), 0.811 (50s) |
| RUN-C "End-to-end operational" | Not found in codebase; referenced only in manuscript | Includes Stage 1 routing uncertainty (8.91% critical miss); numbers quoted in manuscript Table 11/12. | 0.625 (IDA e2e) |
Key Discrepancies by Period
IDA-PTW Adaptive at Sa(0.3s)
| Source | Value | Run |
|---|---|---|
| benchmark_results_ida.csv | 0.8759 | RUN-A · N=2,747 |
| spectral_r2_performance.csv @ T=0.3 | 0.8798 | RUN-A · N=2,747 |
| comparison_r2_table.csv | 0.8760 | RUN-A · N=2,747 |
| xgboost_103_all_baselines.csv @ T=0.3 | 0.7381 | RUN-B · N=25,058 |
| comparison_golden_metrics.csv @ T=0.300 | 0.7381 | RUN-B · N=25,058 |
| manuscript_draft_IEEE.md Table 11 "IDA-PTW Operational" | 0.6252 | RUN-C · no CSV source |
| validation_evidence_report.md | 0.8759 | RUN-A · N=2,747 |
Three documents apply the label "IDA-PTW Adaptive" to the same quantity, yet the quoted R² values range from 0.876 (RUN-A) through 0.738 (RUN-B) to 0.625 (RUN-C). The manuscript's "Operational" value has no reproducible CSV artifact.
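The spread described above can be checked mechanically. The following is a minimal sketch, assuming the quoted values are copied by hand into a list of (source, value, run) triples; the names QUOTED, spread_by_run and overall_spread are illustrative and do not exist in the codebase.

```python
from collections import defaultdict

# (source, quoted R² at Sa(0.3s), run) triples copied from the table above.
QUOTED = [
    ("benchmark_results_ida.csv", 0.8759, "RUN-A"),
    ("spectral_r2_performance.csv", 0.8798, "RUN-A"),
    ("comparison_r2_table.csv", 0.8760, "RUN-A"),
    ("xgboost_103_all_baselines.csv", 0.7381, "RUN-B"),
    ("comparison_golden_metrics.csv", 0.7381, "RUN-B"),
    ("manuscript_draft_IEEE.md Table 11", 0.6252, "RUN-C"),
    ("validation_evidence_report.md", 0.8759, "RUN-A"),
]


def spread_by_run(rows):
    """Return {run: (min, max)} of the quoted R² values for each run."""
    grouped = defaultdict(list)
    for _source, value, run in rows:
        grouped[run].append(value)
    return {run: (min(vals), max(vals)) for run, vals in grouped.items()}


def overall_spread(rows):
    """Return max minus min over all quoted values, rounded to 4 decimals."""
    values = [value for _source, value, _run in rows]
    return round(max(values) - min(values), 4)


ranges = spread_by_run(QUOTED)
for run, (low, high) in sorted(ranges.items()):
    print(f"{run}: {low:.4f} to {high:.4f}")
print("overall spread:", overall_spread(QUOTED))
```

Within a run the values agree to the third decimal; the large gaps appear only between runs, which is why labeling each quoted number with its run ID matters.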
"Full-Wave" R² at Sa(0.3s) — label ambiguity
| Source | Value | Actual window |
|---|---|---|
| validation_evidence_report.md Evidence B | 0.9508 | ~341 s |
| comparison_r2_table.csv | 0.9450 | not specified |
| xgboost_103_all_baselines.csv "Full_Wave" | 0.8110 | 50 s (RUN-B) |
| manuscript_draft_IEEE.md Table 12 "Post-P Full-Wave" | 0.951 | ~341 s |
Critical issue: "Full_Wave" in RUN-B is actually a 50 s window, not 341 s. The ~341 s values (0.951) in the evidence report trace back to "run c7a50193", but no CSV artifact for that run exists in the repo; the numbers are quoted as scalars only.
Internal Discrepancies Within Documents
1. Manuscript Table 11 — mislabeled PTW rows
Table 11 in manuscript_draft_IEEE.md labels its rows "Fixed 2 / 3 / 4 / 6 / 8":
- Row "Fixed 2/3/8" matches the CSV: OK.
- Row "Fixed 4" values (0.7181 / 0.8595 / 0.8073 / 0.7916) exactly match PTW=5 in benchmark_results_fixed.csv: mislabel.
- Row "Fixed 6" has only a composite R² of 0.7808 and no per-period values: no CSV source.
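A label audit of this kind can be automated by comparing each manuscript row against every PTW row of the benchmark CSV. The sketch below is an illustration under stated assumptions: the dictionaries are typed in by hand, only the PTW=5 benchmark row is reproduced because it is the only one quoted in this report, and the function names are hypothetical.

```python
# Values quoted for manuscript Table 11, keyed by the PTW label in the manuscript.
TABLE_11 = {4: (0.7181, 0.8595, 0.8073, 0.7916)}

# Assumption: rows of benchmark_results_fixed.csv keyed by their true PTW.
# Only the PTW=5 row is known from this report; the other rows are omitted.
BENCHMARK_FIXED = {5: (0.7181, 0.8595, 0.8073, 0.7916)}


def matching_ptws(values, benchmark, tol=5e-5):
    """Return the PTW keys whose benchmark values equal `values` within `tol`."""
    return [
        ptw
        for ptw, ref in benchmark.items()
        if len(ref) == len(values)
        and all(abs(a - b) <= tol for a, b in zip(values, ref))
    ]


def audit_labels(table, benchmark):
    """Classify each manuscript row as ok, mislabeled, or lacking a CSV source."""
    findings = {}
    for label, values in table.items():
        matches = matching_ptws(values, benchmark)
        if label in matches:
            findings[label] = "ok"
        elif matches:
            joined = ", ".join(str(m) for m in matches)
            findings[label] = f"mislabeled: values match PTW={joined}"
        else:
            findings[label] = "no CSV source"
    return findings


print(audit_labels(TABLE_11, BENCHMARK_FIXED))
```

The tolerance of 5e-5 allows for values rounded to four decimals; a real check would load every row of the CSV instead of the single row shown here.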
2. intensity_correlation_metrics.csv N anomaly
Sum across 4 intensity bins: 7,356 + 8,309 + 8,698 + 3,903 = 28,266 ≠ 25,058, an excess of 3,208 rows. This suggests per-trace predictions were enumerated across multiple PTW outputs (double-counting), or a different training split was used. This file feeds Table 1 of the vFinal draft; if the N is inflated, the per-bin R² values may be biased too.
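The arithmetic above, and one way to test the double-counting hypothesis, can be sketched as follows. The bin counts and dataset N come from this report; the trace identifiers passed to duplicated_ids are hypothetical, since the column layout of intensity_correlation_metrics.csv is not described here.

```python
from collections import Counter

# Per-bin counts reported in intensity_correlation_metrics.csv.
BIN_COUNTS = [7356, 8309, 8698, 3903]
# Size of the training dataset according to the dataset description.
DATASET_N = 25058


def excess_rows(bin_counts, dataset_n):
    """Return how many more rows the bins contain than the dataset has traces."""
    return sum(bin_counts) - dataset_n


def duplicated_ids(trace_ids):
    """Return the sorted identifiers that occur more than once.

    `trace_ids` is a hypothetical per-row trace identifier column; a non-empty
    result would support the double-counting explanation.
    """
    counts = Counter(trace_ids)
    return sorted(tid for tid, n in counts.items() if n > 1)


total = sum(BIN_COUNTS)
excess = excess_rows(BIN_COUNTS, DATASET_N)
print(f"bins sum to {total}, excess over dataset N: {excess}")

# Toy identifiers for illustration only, not real data.
print(duplicated_ids(["t1", "t2", "t2", "t3", "t1"]))
```

If the list of duplicated identifiers is empty for the real file, the mismatch more likely comes from a different training split than from double-counting.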
3. Physics ceiling N = 21,704 vs dataset N = 25,058
scwfparam_equivalence_golden.csv reports N=21,704, but the dataset description claims 25,058, a difference of 3,354 traces. The "golden" subset is a stricter filter (defined in metadata_golden.csv). That is legitimate, but it must be clearly distinguished from the 25,058 training dataset in any table that juxtaposes ML R² (25,058) against physics R² (21,704).
Authoritative Source Mapping (Proposed)
| Evidence | Use this CSV | Rationale |
|---|---|---|
| A1. Fixed PTW benchmark | benchmark_results_fixed.csv | Heavy RUN-A on full N=25,058 |
| A2. IDA-PTW 3 anchor periods | benchmark_results_ida.csv | RUN-A stratified on N=2,747 |
| A3. IDA-PTW 103-period | spectral_r2_performance.csv | RUN-A stratified, 103 periods |
| B. Information ceiling | 🚨 No CSV exists | Must re-run or flag as legacy |
| C. Saturation test | saturation_test_results.csv | Direct match to evidence report |
| D. P-arrival sensitivity | p_arrival_sensitivity.csv | Direct match |
| E. Newmark-Beta ceiling | scwfparam_equivalence_golden.csv | Physics validation on N=21,704 |
| F. 103-period fast marathon | xgboost_103_all_baselines.csv | Secondary/exploratory only |
Recommended Actions
- Rename or retire comparison_golden_metrics.csv. The "Golden" label implies it is authoritative; in fact it comes from the fast marathon (RUN-B). Rename to comparison_marathon_metrics_preliminary.csv or delete.
- Audit the Full-Wave & Total MiniSEED columns of comparison_r2_table.csv. These appear to be placeholders, not machine-computed. Either regenerate from a real 341 s run or remove the columns.
- Recover or document the "c7a50193" (Full-Wave 341 s) and "c3399cac" (Total MiniSEED) runs. Without CSV artifacts, Evidence B numbers are not reproducible.
- Fix Manuscript Table 11 labels. Either rename rows "Fixed 4/6" → "Fixed 5/10", or regenerate values for true 4 s and 6 s PTW.
- Verify intensity_correlation_metrics.csv N=28,266 and re-run per-trace grouping logic if double-counting is confirmed.
- Decide which IDA-PTW paradigm the paper defends, Stage-2-oracle (R²≈0.88) or end-to-end with routing uncertainty (R²≈0.62–0.73), and use ONE consistently throughout abstract, tables, and conclusion.
- Add a PROVENANCE.md at the top of reports/ listing every CSV with its producing script, dataset N, CV config, model hyperparameters, and run date.
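The provenance file proposed in the last action could be generated rather than written by hand. The following is a rough sketch, assuming one record per CSV with the six fields listed above; the ProvenanceEntry class and render_provenance function are hypothetical names, and the single example entry pairs the RUN-B script with its CSV as described in this report, with the run date left as "unknown" because it is not recorded here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEntry:
    """One row of the proposed PROVENANCE.md (field names are assumptions)."""
    csv: str
    script: str
    dataset_n: int
    cv_config: str
    hyperparameters: str
    run_date: str  # ISO date, or "unknown" when it was not recorded


def render_provenance(entries):
    """Render the entries as a markdown pipe table, one row per CSV."""
    header = "| CSV | Producing script | Dataset N | CV config | Hyperparameters | Run date |"
    divider = "|---|---|---|---|---|---|"
    rows = [
        f"| {e.csv} | {e.script} | {e.dataset_n:,} | {e.cv_config} "
        f"| {e.hyperparameters} | {e.run_date} |"
        for e in entries
    ]
    return "\n".join([header, divider, *rows]) + "\n"


entries = [
    ProvenanceEntry(
        csv="xgboost_103_all_baselines.csv",
        script="train_xgboost_103_marathon_all.py",
        dataset_n=25058,
        cv_config="5-fold",
        hyperparameters="XGBoost n_est=150, max_depth=8",
        run_date="unknown",
    ),
]

provenance_text = render_provenance(entries)
print(provenance_text)
```

Having each training script append its own entry when it writes a CSV would keep the table in sync with the artifacts, instead of relying on a manual update after every run.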
Report generated 2026-04-22 · Based on file state of /mnt/DL_Spectra/reports/ as of 2026-04-19.