Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference | ResearchPod