Model Overview
Model Name: MIDI_Mamba-159M_1536VS
Model Architecture: Mamba
A MIDI-based music generation model built on the Mamba architecture.
Download Pre-Trained Model
Download the pre-trained model from Hugging Face 🤗.
Sample Generations
Here are some sample outputs from the model (not cherry-picked):
The results did not fully match my expectations, so I will continue working on this niche. Feel free to reach out if you'd like to discuss.
Connect with Me
My E-Mail: iamhemantindia@protonmail.com
Open to any suggestions, advice, or collaboration 🤗.
Installation and Setup
# Install required libraries
!pip install pretty_midi midi2audio
!pip install miditok
# Install fluidsynth for audio synthesis
!apt-get install fluidsynth
# Install Mamba dependencies
!pip install "causal-conv1d>=1.1.0"
!pip install mamba-ssm
# Set environment variables
# Note: `!export` runs in a subshell and does not persist in the notebook;
# use the %env magic so the variables apply to the kernel.
%env LC_ALL=en_US.UTF-8
%env LD_LIBRARY_PATH=/usr/lib64-nvidia
%env LIBRARY_PATH=/usr/local/cuda/lib64/stubs
!ldconfig /usr/lib64-nvidia
Code Implementation
import torch
from mamba_ssm import Mamba
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.models.config_mamba import MambaConfig
import numpy as np
from midi2audio import FluidSynth
import IPython.display as ipd
fs = FluidSynth()
Download Model Files
# Download the model and tokenizer
!wget https://huggingface.co/krystv/MIDI_Mamba-159M/resolve/main/MIDI_Mamba-159M_1536VS.pt
!wget https://huggingface.co/krystv/MIDI_Mamba-159M/resolve/main/tokenizer_1536mix_BPE.json
Model Configuration
mc = MambaConfig()
mc.d_model = 768
mc.n_layer = 42
mc.vocab_size = 1536
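As a rough sanity check on these hyperparameters, the parameter count of a Mamba stack is often estimated as about 6·d_model²·n_layer for the mixer blocks plus vocab_size·d_model for the (tied) token embedding. This is a back-of-the-envelope approximation, not the exact mamba-ssm layout:

```python
# Rough parameter estimate for the config above.
# The 6 * d_model^2 per-layer figure is an approximation (assumption),
# not an exact accounting of the mamba-ssm block.
d_model, n_layer, vocab_size = 768, 42, 1536

mixer_params = 6 * d_model * d_model * n_layer  # Mamba mixer blocks (approx.)
embed_params = vocab_size * d_model             # token embedding, tied with the LM head
estimate = mixer_params + embed_params
print(f"~{estimate / 1e6:.0f}M parameters")     # in the ballpark of the reported 159.6M
```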
Load Model and Tokenizer
import pretty_midi
from miditok import MIDILike, REMI, TokenizerConfig
from pathlib import Path
tokenizer = REMI(params='tokenizer_1536mix_BPE.json')
mf = MambaLMHeadModel(config=mc, device='cuda:0')
mf.load_state_dict(torch.load("/content/MIDI_Mamba-159M_1536VS.pt", map_location='cuda:0'))
Generate Music
input_ids = torch.tensor([[1]]).to('cuda:0')  # seed generation with token id 1 (BOS)
out = mf.generate(
input_ids=input_ids,
max_length=512,
temperature=0.9,
top_p=0.9,
top_k=30,
eos_token_id=2,
)
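The `temperature`, `top_k`, and `top_p` arguments control how the next token is sampled. A minimal NumPy sketch of how these three filters combine (an illustration of the sampling scheme, not mamba-ssm's actual implementation):

```python
import numpy as np

def sample_next_token(logits, temperature=0.9, top_k=30, top_p=0.9, rng=None):
    """Temperature + top-k + nucleus (top-p) sampling over one logits vector."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k: mask out everything below the k-th highest logit.
    if top_k > 0:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)

    # Softmax over the surviving logits (masked entries get probability 0).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p: keep the smallest prefix of tokens whose cumulative mass exceeds p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]

    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```

With a sharply peaked distribution the nucleus collapses to the single most likely token, which is why low `top_p` values make generation more conservative.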
m = tokenizer.decode(np.array(out[0].to('cpu')))  # token ids -> Score object
m.dump_midi('output.mid')
Convert to Audio
fs.midi_to_audio('output.mid', 'output.wav')
ipd.Audio("output.wav")
Model Statistics
def count_parameters(model):
    """Return the total number of parameters in the model."""
    return sum(p.numel() for p in model.parameters())
count_parameters(mf)
# Output: 159589632
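For readability, the raw count can be formatted in millions, which is where the "159M" in the model name comes from (a small helper added here for illustration, not part of the original code):

```python
def human_param_count(n):
    """Format a parameter count as a short 'x.yM' string."""
    return f"{n / 1e6:.1f}M"

print(human_param_count(159589632))  # -> 159.6M
```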
GitHub Repository
This project is open source. Feel free to contribute or provide feedback!
