tawsif Profile
tawsif

@sleeping4cat

Followers
266
Following
3K
Media
53
Statuses
2K

ai + neuroscience researcher | ex. @hseas, @DondersInst researcher | posting about scaling laws, music + evals | present. @laion_ai, @sleeping4ai

Malta
Joined March 2021
@sleeping4cat
tawsif
14 hours
made a nice 24M song dataset and made it private on HF. will be announcing it on Tuesday. stay tuned for the biggest dataset in this field.
1
1
3
@sleeping4cat
tawsif
19 hours
it'll be half-paper and half-report cuz I still need some money and GPUs to finish it off and make the dataset rich, ready-to-train and ready for ICLR.
@sleeping4cat
tawsif
19 hours
thinking of dropping this next week.
Tweet media one
0
0
3
@sleeping4cat
tawsif
19 hours
11M songs in one hour.
@sleeping4cat
tawsif
21 hours
I announce 1M songs in 10 minutes. will be very soon dropping something.
0
1
2
@sleeping4cat
tawsif
19 hours
thinking of dropping this next week.
Tweet media one
0
0
1
@sleeping4cat
tawsif
21 hours
I announce 1M songs in 10 minutes. will be very soon dropping something.
0
0
3
@sleeping4cat
tawsif
23 hours
i'm going to drop something soon.
0
0
0
@sleeping4cat
tawsif
23 hours
I was never keen on engineering and backend till I found out today I am so good at it that I can reverse engineer stuff that took seasoned silicon valley startup folks many prime ($) to create.
1
0
0
@sleeping4cat
tawsif
1 day
true.
@tsoding
Тsфdiиg
1 day
@lisyarus I remember being told by my dear high school Informatics teacher that if you don't understand something, that means the time has not come yet. The knowledge is not unlocked yet. You just have to wait and not get too wound up about it.
0
0
0
@sleeping4cat
tawsif
2 days
i'm sleepy and cooking at the same time. another dataset coming soon, folks!
Tweet media one
0
0
0
@sleeping4cat
tawsif
2 days
also really grateful for the feedback, as it helped me fix the issues with my estimate and the dataset. (it happens that even perfect scrapers sometimes don't work in the real, messy world).
0
0
1
@sleeping4cat
tawsif
2 days
earlier tweet:
@sleeping4cat
tawsif
3 days
I introduce the largest known collection of artificially generated songs on the internet, and the entire collection of SUNO. thread🔽.
0
0
1
@sleeping4cat
tawsif
2 days
@xeophon_ thanks for sharing my earlier tweet on this dataset. but now this SUNO fiasco is complete. and I give 1M songs, largest till now.
0
0
1
@sleeping4cat
tawsif
2 days
I plan to expand this dataset family and add 2 additional datasets of the same category. If you have any plans, feel free to tell me. I also plan to write an arXiv paper, because ICLR takes them, LOL! so my interest got much higher now.
0
0
1
@sleeping4cat
tawsif
2 days
I really like this dataset, so I plan to provide more data based off these songs by burning some GPUs. If you have 2xH100s idle, you can give them to me. Otherwise I have to find them, which will take a few weeks to one month.
0
0
1
@sleeping4cat
tawsif
2 days
NO! I don't provide the entire dump, because it contains private data about the users, which I don't think is valuable for training models, only for data mining. again, this dataset is not meant to encourage data mining, only the training of research models.
0
0
1
@sleeping4cat
tawsif
2 days
I gathered this from 130K users and I believe there are 140-150K users in the entire SUNO platform. I didn't map the remaining ones yet but will try.
0
0
1
@sleeping4cat
tawsif
2 days
my current estimates tell me there are between 1.5-2M SUNO songs, but they are too hard and painfully slow to retrieve. In that situation, 1 million and 40 thousand was the best I believe anyone can do without sacrificing quality sleep.
0
0
2
@sleeping4cat
tawsif
2 days
re-introducing the world's largest collection of SUNO songs. It was a big back-and-forth to get all the data from SUNO, since most of my gathering attempts were too fast, so I was getting empty responses. I found 1.04M SUNO songs, the largest known to date.
Tweet media one
9
2
13
@sleeping4cat
tawsif
2 days
fine, I GOT IT. Will be dropping the full SUNO in moments 🫂.
0
1
4
@sleeping4cat
tawsif
2 days
just for light-ref: the music+song field of ai is a notoriously problematic and self-centred field. for no reason a paper can be a hit and something else can be a flop. WASABI, from the INRIA folks, is the best example vs. the Million Song Dataset.
0
0
1