If 'fun at parties' means ignoring the potential of a 146 trillion parameter model, then yeah, I’m the most boring person you'll ever meet. I’ll let the results do the talking from here.
I'm not saying that a 140-whatever-trillion-parameter model can't exist; I'm just saying that your "paper" misleads users into believing that someone single-handedly made an AGI.
Just be realistic: try making a 140-billion-parameter model once and tell me how long it took to train from scratch.
Training a 140B model is a calculation of compute; designing a 146T architecture is a matter of engineering. While you're stuck on the 'time' it takes others, I'm focused on MoE scaling and dataset curation for SKT AI. If you're so concerned about realism, do Go And Check Out Our Repo Lol.
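For what it's worth, the "calculation of compute" can be made concrete with the common ~6·N·D training-FLOPs rule of thumb. The sketch below is a back-of-envelope estimate only; the token count, GPU throughput, cluster size, and utilization figures are hypothetical assumptions, not numbers claimed anywhere in this thread.

```python
# Back-of-envelope training-time estimate (a sketch under assumed numbers,
# not a statement of what any particular team actually did).

def training_time_days(params: float, tokens: float,
                       gpu_flops: float, num_gpus: int,
                       utilization: float = 0.4) -> float:
    """Estimate wall-clock training time in days.

    params      -- model parameters (dense-equivalent active params for MoE)
    tokens      -- number of training tokens
    gpu_flops   -- peak FLOP/s per accelerator
    num_gpus    -- number of accelerators
    utilization -- assumed sustained model FLOPs utilization (MFU)
    """
    total_flops = 6 * params * tokens               # ~6*N*D approximation
    effective = gpu_flops * num_gpus * utilization  # sustained cluster throughput
    return total_flops / effective / 86_400         # seconds -> days


if __name__ == "__main__":
    # Hypothetical scenario: 140B dense model, ~2.8T tokens (~20 tokens/param),
    # 1,024 H100-class GPUs at ~1e15 FLOP/s each, 40% utilization.
    days = training_time_days(params=140e9, tokens=2.8e12,
                              gpu_flops=1e15, num_gpus=1024)
    print(f"Estimated training time: {days:.0f} days")
```

Under these assumed numbers the estimate comes out to roughly two months on a thousand-GPU cluster, and it scales linearly with both parameter count and token count.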