
Theoretical Derivations: Cross-Entropy Loss and Energy Functions in LLMs

by reinforcem...June 24th, 2025


Too Long; Didn't Read

Explore rigorous mathematical proofs, including properties of incomplete gamma functions, Stirling's approximation, and derivations of loss functions and partition functions for our theoretical model.



Table of Links

Abstract and 1 Introduction

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

Appendix C. Deferred Proofs from Section 5

C.1 Proof of Proposition 4

C.2

Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.


About Author

Reinforcement Technology Advancements (@reinforcement)

Leading research and publication in advancing reinforcement machine learning, shaping intelligent systems & automation.


