
BrainTransformers: Building Language Models with Spiking Neural Networks


Large language models consume enormous amounts of energy. Training GPT-3 used an estimated 1,287 MWh of electricity. Running these models at scale requires thousands of GPUs operating continuously. A 2024 research project, BrainTransformers, demonstrates an alternative approach. By implementing a language model using Spiking Neural Networks instead of conventional artificial neural networks, the researchers achieved competitive performance with dramatically reduced computational overhead.

Spiking Neural Networks for Language

Spiking Neural Networks differ from standard deep learning models in how they process information. Instead of computing continuous-valued activations at each layer, SNNs use discrete spike events. Neurons remain inactive until accumulated input crosses a threshold, at which point they fire a spike and reset. This event-driven computation more closely resembles biological neurons, which communicate through action potentials rather than continuous signals.

The sparse activation pattern of SNNs makes them efficient on specialized neuromorphic hardware. Where a conventional neural network computes an output for every neuron at every timestep, an SNN only performs computation when a spike occurs. For many tasks, most neurons remain silent most of the time, leading to a substantial reduction in the number of operations.
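This threshold-and-reset behavior can be illustrated with a minimal leaky integrate-and-fire neuron in Python. The parameter values here are illustrative choices, not figures from the paper:

```python
import numpy as np

def lif_neuron(inputs, threshold=1.0, decay=0.9):
    """Leaky integrate-and-fire neuron over discrete timesteps.

    The membrane potential leaks, accumulates input, and emits a spike (1)
    when it crosses the threshold, then resets to zero.
    """
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = decay * potential + x  # leak, then integrate input
        if potential >= threshold:
            spikes.append(1)               # fire a discrete spike event
            potential = 0.0                # reset after firing
        else:
            spikes.append(0)               # neuron stays silent
    return spikes

# A weak random input stream: the neuron fires only occasionally, so an
# event-driven implementation does work only at those few timesteps.
rng = np.random.default_rng(0)
out = lif_neuron(rng.uniform(0.0, 0.4, size=20))
print(sum(out), "spikes over", len(out), "timesteps")
```

Note that downstream neurons receive only the sparse 0/1 spike train, which is what lets event-driven hardware skip the silent timesteps entirely.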

Language modeling seemed an unlikely application for SNNs. Standard language models like transformers process entire sequences in parallel, with dense matrix operations across all tokens. SNNs, in contrast, process inputs sequentially over time. The temporal dynamics that make SNNs efficient for sensory processing appeared incompatible with the architectural requirements of language generation.

Technology Readiness Level: TRL 3 (proof of concept). The model demonstrates feasibility with working prototypes, but practical deployment requires additional optimization and specialized hardware.

BrainTransformers Architecture

The BrainTransformers project, published in October 2024 by Zhengzheng Tang and colleagues, solves this problem by designing SNN-compatible versions of transformer components. The key innovations include SNNMatmul for spike-based matrix multiplication, SNNSoftmax for computing attention weights with spikes, and SNNSiLU as an SNN approximation of the SiLU activation function.
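To convey the general idea behind spike-based matrix multiplication, a common textbook approach is rate coding: each input value in [0, 1] becomes the firing probability of a binary spike train, and weight rows are accumulated only when a spike occurs. This is a generic sketch of the principle, not the authors' exact SNNMatmul operator:

```python
import numpy as np

def rate_coded_matmul(x, W, timesteps=2000, seed=0):
    """Approximate y = x @ W with binary spike trains (rate coding).

    Each input value in [0, 1] becomes the firing probability of a
    Bernoulli spike train; rows of W are accumulated only at spike
    times, so a zero input triggers no computation at all.
    """
    rng = np.random.default_rng(seed)
    acc = np.zeros(W.shape[1])
    for _ in range(timesteps):
        spikes = (rng.random(x.shape) < x).astype(float)  # 0/1 spikes
        acc += spikes @ W                                 # active rows only
    return acc / timesteps                                # average over time

x = np.array([0.2, 0.9, 0.0])                  # inputs coded as rates in [0, 1]
W = np.random.default_rng(1).normal(size=(3, 2))
approx = rate_coded_matmul(x, W)
exact = x @ W
print(np.round(approx, 2), np.round(exact, 2))
```

The approximation converges to the dense product as the number of timesteps grows, while silent inputs cost nothing, which is the efficiency argument in miniature.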

The most significant architectural change addresses the quadratic complexity of self-attention. Standard transformer attention scales as O(N²) with sequence length, making it computationally expensive for long sequences. BrainTransformers modifies the attention mechanism to process tokens sequentially rather than in parallel, reducing complexity to O(N). This aligns the sequence dimension of language with the temporal dimension of SNNs.
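One standard way to realize O(N) attention is to replace the N x N score matrix with a constant-size running summary of the keys and values seen so far, updated one token at a time. The sketch below is a hypothetical linear-attention variant in that spirit, not the paper's exact mechanism:

```python
import numpy as np

def sequential_linear_attention(Q, K, V):
    """Attention with O(N) cost: one token per step, constant-size state.

    Instead of forming the full N x N score matrix, keep a running
    summary S and normalizer z over the keys and values seen so far.
    """
    phi = lambda u: np.maximum(u, 0.0) + 1e-6    # positive feature map
    S = np.zeros((Q.shape[1], V.shape[1]))       # running key-value summary
    z = np.zeros(Q.shape[1])                     # running normalizer
    outputs = []
    for q, k, v in zip(Q, K, V):                 # sequential, like SNN time
        S += np.outer(phi(k), v)                 # fold this token's key-value
        z += phi(k)
        outputs.append(phi(q) @ S / (phi(q) @ z))
    return np.stack(outputs)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 4)) for _ in range(3))
out = sequential_linear_attention(Q, K, V)
print(out.shape)
```

Because the state (S, z) has fixed size regardless of sequence length, each new token costs the same amount of work, giving the O(N) total that matches the temporal, step-by-step processing of an SNN.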

The model also implements a Synapsis module to simulate synaptic plasticity, allowing weights to adapt based on spike timing patterns. This brings a form of biologically inspired learning into the language modeling context.
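A textbook spike-timing-dependent plasticity (STDP) rule conveys the flavor of such a mechanism; the constants below are conventional illustrative values, and the actual Synapsis module may differ in detail:

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.055, tau=20.0):
    """One textbook STDP weight update for a single spike pair.

    A presynaptic spike shortly before a postsynaptic spike strengthens
    the synapse; the reverse order weakens it, with exponential decay in
    the timing difference (times in milliseconds).
    """
    dt = t_post - t_pre
    if dt > 0:                                   # pre before post: potentiate
        return w + a_plus * np.exp(-dt / tau)
    return w - a_minus * np.exp(dt / tau)        # post before pre: depress

w = 0.5
w_ltp = stdp_update(w, t_pre=10.0, t_post=15.0)  # causal pairing
w_ltd = stdp_update(w, t_pre=15.0, t_post=10.0)  # anti-causal pairing
print(w_ltp > w > w_ltd)
```

The key property is locality: the update depends only on the relative timing of the two neurons a synapse connects, not on a global error signal as in backpropagation.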

The architecture is trained in three stages. First, standard supervised learning on language modeling objectives. Second, SNN-specific training that takes advantage of temporal spike dynamics. Third, synaptic plasticity training that fine-tunes weights based on spike timing patterns observed during inference.

Performance and Efficiency

The researchers trained a 3-billion-parameter model, BrainTransformers-3B-Chat, and evaluated it on standard language model benchmarks. Results include:

  • MMLU (Massive Multitask Language Understanding): 63.2%
  • BBH (Big Bench Hard): 54.1%
  • ARC-C (AI2 Reasoning Challenge, Challenge Set): 54.3%
  • GSM8K (Grade School Math): 76.3%

These scores are competitive with conventional transformer models of similar size. The model can generate coherent text and solve reasoning tasks at a level comparable to non-spiking baselines.

The efficiency gains appear when running on neuromorphic hardware. Because SNNs only compute when spikes occur, and because most neurons remain silent for most timesteps, the total number of operations is dramatically reduced. The researchers estimate 20x fewer operations compared to a dense neural network implementation.
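A back-of-envelope calculation shows how spike sparsity maps onto operation counts. The figures below are assumptions chosen to illustrate the arithmetic behind an estimate like the authors' 20x:

```python
# Illustrative figures, not measurements from the paper.
dense_ops_per_step = 1_000_000  # MACs a dense layer performs every timestep
firing_rate = 0.05              # assumed fraction of neurons spiking per step

# An event-driven implementation only does work for active neurons:
snn_ops_per_step = dense_ops_per_step * firing_rate
reduction = dense_ops_per_step / snn_ops_per_step
print(f"{reduction:.0f}x fewer operations at a {firing_rate:.0%} firing rate")
```

Under these assumptions, a 5% firing rate yields a 20x reduction; the real ratio depends on the measured sparsity of each layer.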

This translates to lower energy consumption. While exact energy measurements depend on hardware, the operation count reduction suggests proportional energy savings when deployed on neuromorphic chips designed to exploit sparse spiking computation.

BrainTransformers builds on earlier attempts to create spiking language models. SpikeGPT, published in February 2023 by Rui-Jie Zhu and colleagues, demonstrated a 216-million-parameter spiking language model inspired by the RWKV architecture. That model achieved a 22x reduction in synaptic operations compared to dense baselines on the Enwik8 dataset.

IBM’s NorthPole neuromorphic chip, demonstrated in 2023, achieved image classification using a tiny fraction of the energy required by conventional systems while running five times faster. These results showed that the theoretical efficiency advantages of neuromorphic design could be realized in practice with appropriate hardware.

In early 2025, researchers deployed an LLM on Intel’s Loihi 2 neuromorphic processor, reportedly the first language model to run on neuromorphic hardware. The Loihi 2 implementation matched the accuracy of a GPU-based LLM while using roughly half the energy, validating that spiking language models can achieve real-world energy savings.

A 2025 study reported that an SNN architecture achieved a 99.72% reduction in dynamic energy consumption compared to a conventional neural network baseline. The drop from 4,421.52 nJ to 12.55 nJ per inference represents the kind of efficiency gain that could make large-scale AI deployment sustainable.
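The reported percentage follows directly from the two per-inference energy figures:

```python
# Checking the reported reduction against the per-inference energy figures.
before_nj = 4421.52  # dynamic energy per inference, conventional baseline
after_nj = 12.55     # dynamic energy per inference, SNN implementation
reduction_pct = (before_nj - after_nj) / before_nj * 100
print(f"{reduction_pct:.2f}% reduction in dynamic energy")
```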

Hardware and Deployment Challenges

The efficiency benefits of SNNs only materialize on hardware designed for event-driven computation. Running an SNN on a standard GPU does not provide energy savings because the GPU must still compute at every timestep. Neuromorphic chips like Intel’s Loihi, IBM’s TrueNorth, and specialized processors from startups like SpiNNcloud are designed to exploit spike sparsity.

These chips use asynchronous, event-driven circuits that consume dynamic power only when a spike occurs. They implement local memory and computation, avoiding the energy cost of moving data between processor and memory. This neuromorphic architecture is fundamentally different from the von Neumann design of conventional computers.

However, neuromorphic hardware remains limited in scale and availability. Loihi 2 has 128 cores and can simulate roughly 1 million neurons. TrueNorth has 1 million neurons and 256 million synapses. These are orders of magnitude smaller than the billions of parameters in modern language models. Scaling neuromorphic systems to billions of neurons while maintaining energy efficiency is an open engineering challenge.

Programming neuromorphic hardware is also more difficult than programming GPUs. The tools and frameworks for training and deploying SNNs are less mature than those for conventional deep learning. Integration with existing ML pipelines, model conversion, debugging, and optimization all require new approaches.

Implications for Brain Emulation

Spiking neural networks are more biologically realistic than standard deep learning models, but they remain abstractions. Real neurons have complex dendritic trees, multiple ion channels, various neurotransmitters, and intricate feedback loops. Reducing this to integrate-and-fire dynamics, or even to more sophisticated neuron models, is a simplification.

For whole brain emulation, the question is what level of biological detail is necessary to preserve function and consciousness. If cognitive processes emerge from spike timing patterns, then SNNs might capture the essential dynamics. If they depend on subcellular mechanisms like microtubule quantum states, then even spiking models may be insufficient.

The success of BrainTransformers and similar projects suggests that spike based computation is powerful enough for complex cognitive tasks like language understanding. This supports the hypothesis that the computational substrate of cognition operates at the level of neuronal firing patterns rather than deeper biophysical processes.

However, these language models are trained with backpropagation on supervised datasets. They do not develop through the self-organized learning processes that shape biological brains. The SPAUN cognitive model and other brain inspired architectures attempt to capture these developmental and learning dynamics, but scaling them to billions of neurons remains beyond current capabilities.

Path Forward

The immediate future for spiking language models involves hardware optimization and deployment at scale. Companies like SpiNNcloud are working with government research labs to develop specialized neuromorphic chips. If these chips can be manufactured cost-effectively, they could provide an energy-efficient alternative to GPUs for inference workloads.

Research directions include developing specialized fine-tuning tools for SNNs, exploring hybrid architectures that combine spiking and non-spiking components, and investigating how to leverage temporal spike patterns for tasks beyond language modeling. Video processing, speech recognition, and real-time control systems may benefit more from the temporal dynamics of SNNs than static language tasks.

For applications where energy efficiency is critical, such as edge computing, mobile devices, or large-scale deployment, spiking models offer a practical advantage. Data centers running language models at scale could reduce energy costs significantly if neuromorphic hardware becomes viable.

The BrainTransformers project demonstrates that the gap between biological and artificial intelligence can be narrowed through better understanding of neural computation. As we develop more sophisticated models of how real brains process information, we gain both efficiency improvements for AI systems and insights into the computational principles underlying biological cognition.

Whether this path leads toward genuine brain emulation or simply brain inspired computing remains an open question. But the convergence of neuroscience, AI, and specialized hardware is producing systems that increasingly resemble the computational architecture of biological nervous systems, with measurable benefits in energy efficiency and potential for scaling.

Official Sources

Primary Research Papers:

  • Tang, Z., et al. (2024). “BrainTransformers: SNN-LLM.” arXiv preprint arXiv:2410.14687.

  • Zhu, R.-J., et al. (2023). “SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks.” arXiv preprint arXiv:2302.13939.

Neuromorphic Hardware:

  • Eshraghian, J. K., et al. (2025). “Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2.” arXiv preprint arXiv:2503.18002.

  • Modha, D. S., et al. (2023). “Neural inference at the frontier of energy, space, and time.” Science, 382(6668), 329-335.

Energy Efficiency Studies:

  • Dhakal, A. (2025). “Green Neuromorphic Computing: Quantifying the Energy Efficiency of Spiking Neural Networks for Edge AI Applications.” SSRN preprint.

  • Parker, A., &amp; Furber, S. (2025). “Can neuromorphic computing help reduce AI’s high energy cost?” PNAS, 122(4).

Reviews and Surveys:

  • Christensen, D. V., et al. (2025). “The road to commercial success for neuromorphic technologies.” Nature Communications, 16, 1342.

  • Boubakeur, M., et al. (2025). “A comparative review of deep and spiking neural networks for edge AI neuromorphic circuits.” Frontiers in Neuroscience, 19.

Recent Developments:

  • “Neuromorphic Computing 2025: Current State of the Art.” Human Unsupervised, 2025.

  • Synced. (2023). “Introducing SpikeGPT: UCSC &amp; Kuaishou’s LLM With Spiking Neural Networks Slashes Language Generation Costs.”