# Simon Willison's Weblog

That model (DeepSeek v3) was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing"... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

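Something like the following should work, assuming Unsloth's Q8_0 quantization on Hugging Face (the quantization tag is my assumption; it matches the roughly 8.5GB download mentioned below):

```bash
# Fetch the GGUF from Hugging Face and drop into an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
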
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

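A sketch of those two steps, assuming the same model identifier Ollama used above:

```bash
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# llm-ollama exposes models Ollama has pulled under the same name
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
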
Each response starts with a `<think>` tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:

> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:

> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

**Update 21st January 2025:** I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):

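Presumably something like this, where the Q3_K_M tag is an assumption consistent with the 34GB figure:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
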
## Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

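Run the same way as the joke, assuming the 70B model from the update above:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
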
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

## Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

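A stanza along these lines should work; treat it as a sketch rather than the exact original (api_key_name points LLM at a key stored under that name):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
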
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

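Put together, that looks something like this (the prompt is just an example):

```bash
llm keys set deepseek
# paste in your API key when prompted

llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
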
This won't show you the reasoning tokens, sadly. Those are served up by the API, but LLM doesn't yet have a way to display them.

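In the meantime you can hit the API directly. A minimal curl sketch, assuming the standard OpenAI-compatible chat completions path; if I'm reading their docs right, the chain of thought comes back in a separate reasoning_content field of the response, alongside the regular content:

```bash
# Ask deepseek-reasoner a question; the reasoning is returned
# in the response alongside the final answer
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "a joke about a pelican"}]
  }'
```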