Behavior Transformer
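A transformer-based policy architecture for behavioral cloning, abbreviated BeT. It clusters the continuous actions in the demonstration data into a fixed set of discrete bins using k-means, then trains a causal, GPT-style transformer over the observation history to predict a categorical distribution over those bins, together with a continuous residual offset that refines the chosen bin center into a full-precision action. Because the output is a distribution over bins rather than a single regressed value, BeT can represent multi-modal action distributions, where several distinct actions are valid for the same observation. This is a key advantage over MSE-based behavioral cloning, whose optimum is the mean of the demonstrated actions and can therefore fall between modes, yielding an action that no demonstrator actually took.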
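The discretization step and the mode-averaging problem can be illustrated with a minimal NumPy sketch. It is not the reference BeT implementation: the function names (`fit_action_bins`, `encode`, `decode`), the toy bimodal data, and the plain Lloyd's k-means loop are all assumptions made for illustration, and the transformer itself is omitted. The sketch only shows how a continuous action can be split into a bin index plus a residual offset, the two targets a BeT-style policy would learn to predict, and why a single mean prediction fails on two-mode data.

```python
import numpy as np

def fit_action_bins(actions, k, iters=20, seed=0):
    """Hypothetical helper: plain Lloyd's k-means over action vectors.

    Returns an array of shape (k, action_dim) holding the bin centers.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct demonstration actions.
    centers = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance from every action to every center: shape (N, k).
        dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = actions[labels == j]
            if len(members) > 0:  # keep the old center if a bin is empty
                centers[j] = members.mean(axis=0)
    return centers

def encode(actions, centers):
    """Split each action into (nearest bin index, residual offset from that bin)."""
    dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
    bins = dists.argmin(axis=1)
    offsets = actions - centers[bins]
    return bins, offsets

def decode(bins, offsets, centers):
    """Rebuild continuous actions from bin indices and residual offsets."""
    return centers[bins] + offsets

# Toy bimodal demonstrations (assumed data): for one and the same observation,
# half the demonstrators steer left (about -1) and half steer right (about +1).
rng = np.random.default_rng(0)
actions = np.concatenate([
    rng.normal(-1.0, 0.05, size=(100, 1)),
    rng.normal(+1.0, 0.05, size=(100, 1)),
])

centers = fit_action_bins(actions, k=2)
bins, offsets = encode(actions, centers)
reconstructed = decode(bins, offsets, centers)

# The MSE-optimal constant prediction is the mean, which lies between the modes.
mse_prediction = actions.mean(axis=0)

print("bin centers:", np.sort(centers.ravel()))
print("MSE-optimal action:", mse_prediction)
print("max reconstruction error:", np.abs(reconstructed - actions).max())
```

In this toy setting the two bin centers land near the two modes, while the mean prediction sits near zero, an action that appears nowhere in the data. A policy that first picks a bin from a categorical distribution and then adds the offset commits to one mode instead of blending them.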