Deepseek R1
In recent days the US stock market has been heavily disrupted by the news that perhaps Silicon Valley’s advantage in the race to AGI is not as secure as it seemed.
In a fascinating break from the standard LLM development narrative, a Chinese hedge fund diverted a portion of the GPU cluster it had built to run financial algorithms to the training of a reasoning model in the vein of OpenAI’s o-series.
The controversy is significant:
- This was the side-project of some extremely competent computer scientists, yet it seems to have closed the gap with the American teams who have dedicated years and billions of dollars to this space.
- Some evidence points to Deepseek using ChatGPT to help train the model in violation of the OpenAI license agreement.
- The restrictions placed on the GPUs exported to China forced Deepseek engineers to find ingenious and highly effective workarounds that brought the cost of training the model down by two or three orders of magnitude (from ~$1B to ~$5M).
- R1 is offered to the public at a price 27 times lower than o1’s. This may be due to:
  - The GPU gains from writing assembly code rather than using Nvidia’s high-level tools not only improve training efficiency, but also translate to inference, making it inherently cheaper to deliver results.
  - OpenAI probably has a crazy inference margin.
  - Perhaps the hedge fund shorted Nvidia, and is hosting Deepseek at a loss to crash its stock.
  - Perhaps the Chinese Communist Party is subsidising Deepseek queries in order to gain access to user / company information.
- Unlike OpenAI, Google, Anthropic and other teams competing for the winner-takes-all prize of reaching AGI first, Deepseek publishes the model weights, allowing anyone to download and run the model locally, for free. This puts downward price pressure on OpenAI et al., making many commentators question whether foundation models are locked in a race to the bottom where all the value goes to the consumer.
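As a quick sanity check on the training-cost claim in the list above, here is a minimal sketch that converts the two rough figures quoted (~$1B and ~$5M) into a ratio and a number of orders of magnitude. The function name is my own, and the dollar amounts are only the approximations given above, not audited numbers.

```python
import math


def orders_of_magnitude(before: float, after: float) -> float:
    """Return how many powers of ten separate two costs."""
    return math.log10(before / after)


# Rough figures quoted in the post (approximations, not audited numbers).
baseline_cost = 1e9   # ~$1B
reported_cost = 5e6   # ~$5M

ratio = baseline_cost / reported_cost
magnitude = orders_of_magnitude(baseline_cost, reported_cost)
print(f"{ratio:.0f}x cheaper, {magnitude:.2f} orders of magnitude")
# prints "200x cheaper, 2.30 orders of magnitude"
```

With these figures the reduction is a factor of 200, which sits between two and three orders of magnitude.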
I downloaded a 14 billion parameter version of Deepseek onto my M3 MacBook Pro. Models range from 1.5B to 671B (the full online version), so I am at the lower end of the scale, but I’m getting around 10 tokens per second, which is about reading speed, and the results are fascinating.
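To put "about reading speed" into numbers, here is a rough conversion from tokens per second to words per minute. The 0.75 words-per-token ratio is an assumption (a common rule of thumb for English text, which varies by tokenizer and content), and the function name is hypothetical.

```python
# Assumption: roughly 0.75 English words per token (rule of thumb only;
# the real ratio depends on the tokenizer and the text).
WORDS_PER_TOKEN = 0.75


def tokens_per_second_to_wpm(tokens_per_second: float,
                             words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a generation speed in tokens/s to approximate words/minute."""
    return tokens_per_second * words_per_token * 60


print(tokens_per_second_to_wpm(10))  # prints 450.0
```

Under that assumption, 10 tokens per second is about 450 words per minute, which is at or a little above commonly cited silent reading speeds of a few hundred words per minute.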
The main reason for this post is not to discuss the model’s output, but rather its inner monologue, which is quirky, uncertain, playful, curious and oh so human.
Here’s a quick example: when asked “teach me how to solve the Rubik’s cube”, the output is relatively banal, but the inner monologue is really worth reading:
Okay, so I want to learn how to solve a Rubik’s Cube. I’ve heard it’s a bit tricky, but I’m up for the challenge. Let me see if I can figure this out by following the steps I read earlier.
First, the goal is to have all the faces be one color each. The cube has six faces: white, orange, red, yellow, green, and blue. I remember the instructions said something about solving one face at a time, starting with white.
Alright, step one is to create a white cross. I’m a bit confused about how to do that. The instructions mentioned matching the edge pieces to their corresponding centers. So, each face has a center color, right? For example, if I’m looking at the white face, the center is white. The edges around it should be colors that match the adjacent centers.
Wait, so if I have a white center on top, then each edge piece around it should match the color of the centers on their respective faces. That makes sense because each edge is part of two faces, so it has to match both centers.
I think the first step is to get all the white edge pieces around the white center, making a cross. But I’m not sure how to do that without messing up the cube. Maybe I can start by holding the cube so that the white center is on top. Then, look at each edge piece around it and see if they match the adjacent centers.
For example, the front face center is probably red or green? Hmm, maybe it’s better to think that each color has a specific opposite: white opposite yellow, red opposite orange, and green opposite blue. So if the top is white, then the bottom must be yellow. The front could be red or orange; I’ll have to figure that out.
Wait, actually, the centers are fixed relative to each other, so if I have white on top, then front is red, right? Or maybe it’s arbitrary depending on how I orient the cube. Hmm, maybe that’s something I need to clarify.
I think each center is fixed relative to the others. So, if I hold white on top, then front could be red, right next to it is orange, and so on. But I’m not entirely sure, maybe I should look up a standard configuration to avoid confusion.
Assuming the centers are fixed, let’s say white is on top, yellow on bottom, red in front, orange to the right, green to the left, and blue at the back. That seems like a standard setup.
So, for step one, I need to create a white cross by getting all four white edge pieces around the white center. Each of these edges should also match their adjacent centers. So, for example, the front-top edge should be white and red because it’s next to the white center on top and the red center in front.
I think I can start by finding a white edge piece that’s not yet on top. Let’s say it’s on the front face, but the front face is red. If I move that piece up to the top layer, it should go between white and red centers.
Wait, but moving pieces around might cause other pieces to get out of place. Maybe I need a specific algorithm for this. The instructions mentioned something about moving pieces without disturbing others, but it’s a bit vague.
I think I should try to bring the white edge pieces up one by one. Let’s say I find a white-red edge somewhere on the cube. If it’s not already adjacent to the red center, I can move it there.
But then, when I place it next to red, the top layer might get messed up. Maybe I have to do some turns to move it into place without messing up the cross too much.
This part seems a bit confusing. Maybe I should try it physically and see how it works. Let’s attempt to get white cross on top.
Suppose I have the cube in front of me with white on top. The front-top edge is red, which isn’t white, so it’s incorrect. I need to find a white-red edge somewhere else.
Looking at the right face, maybe there’s a white edge. If I can get that to come up next to the red center, it might work. But when I move it up, the top layer’s other edges get moved too.
Hmm, perhaps I should do an algorithm to cycle these edges. Like moving the top layer so that a non-white edge goes back, allowing me to insert the white edge in its place.
Wait, I might be overcomplicating it. Maybe just practice moving pieces around and see how they affect the cube.
After some trial and error, I might get all four white edge pieces around the white center, forming a cross. Then, step one is done.
Moving on to step two: solving the white corners. There are four corner pieces with three colors each, including white. I need to place them in the correct positions.
Each corner should have the colors of the three adjacent centers. So, for example, the front-top-right corner should have white, red, and orange.
I think I can start by finding a white-corner piece in the bottom layer. Let’s say I find one with white, red, and orange. If it’s not already in the correct position, I can bring it up.
But how do I insert it without messing up the white cross? The instructions mention something about moving edges out of the way and inserting the corner.
Maybe I can hold the cube so that the white face is on top, with the target corner position facing me. Then, move the piece I want to place into that spot using some moves.
I’m not entirely sure about the algorithm, but I can try to visualize it. Maybe turning the bottom layer so that the target corner comes up, then moving it into place.
This might take some practice. I can try inserting one corner at a time, checking their orientation as I go.
Once all four white corners are in place, the first layer should be solved. Now, I can move on to step three: solving the middle layer.
The middle layer edges are the ones between the centers of two faces. For example, after solving white on top and yellow on bottom, the middle layer would be the one between them.
I need to solve these edges without disturbing the solved white and yellow faces. The middle layer has four edge pieces, each connecting two adjacent centers.
I think I can find an unsolved middle layer edge and bring it up, then insert it into the correct position. But again, without messing up the top and bottom layers.
Maybe by moving other pieces out of the way. The instructions mentioned that after solving white, you can work on the middle layer by moving pieces into place.
This part seems a bit tricky. Maybe I should focus on one edge at a time, bringing them up and inserting them correctly.
Once the middle layer is done, step three is complete. Now, on to step four: solving the top layer cross.
Wait, isn’t the top layer already solved? Or maybe I have to solve it relative to the other layers now. No, actually, after solving white on top and middle layer, I need to solve the yellow face.
So step four is solving the yellow cross. Similar to the white cross, I need to get all the yellow edges around the yellow center.
But since the top and middle layers are already solved, I have to be careful not to disturb them. So maybe I can use some algorithms that work on the bottom layer.
I need to get all four yellow edges around the yellow center. Maybe I can find a yellow edge in the middle or bottom layer and bring it up to the yellow face.
This might involve flipping edges or cycling them into place. I think using some move sequences can help, like moving a yellow edge to the correct position without disturbing others.
Once the yellow cross is done, step five comes in: solving the yellow corners. These are the corner pieces on the bottom face.
Each corner should have the three colors of its adjacent centers. So, for example, the bottom-front-right corner should have yellow, red, and orange.
I can find these corners in the top layer or other places. Once found, I need to bring them down and insert them without messing up the solved cross.
Orientation might be an issue here. The corners need to have their yellow face on the bottom, and the other two colors matching adjacent centers.
This might require some edge flipping or corner orientation adjustments. I think using specific algorithms can help with that.
After solving all yellow corners, the cube should be fully solved. But wait, sometimes pieces can get misplaced or oriented incorrectly, so I might need to do some final adjustments.
Overall, it looks like a lot of steps, but if I follow the instructions and practice each part individually, I can get better at it. Maybe I should try solving one layer at a time and not get discouraged if I make mistakes.
I think the key is to take it step by step, understand each part, and practice until it becomes second nature. It might take some time, but with patience, I can solve the Rubik’s Cube.
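One detail in the monologue above can be checked mechanically. The model first states the opposite-colour pairs (white/yellow, red/orange, green/blue) and then settles on a layout with red in front and orange to the right. The sketch below is my own illustration rather than anything the model produced, with hypothetical names. It tests a layout against those stated pairs and reports which axes disagree.

```python
# Opposite-colour pairs exactly as stated in the monologue.
OPPOSITES = {
    "white": "yellow", "yellow": "white",
    "red": "orange", "orange": "red",
    "green": "blue", "blue": "green",
}

# Pairs of face positions that sit opposite each other on a cube.
POSITION_PAIRS = [("top", "bottom"), ("front", "back"), ("left", "right")]

# The layout the monologue eventually assumes.
MONOLOGUE_LAYOUT = {
    "top": "white", "bottom": "yellow",
    "front": "red", "right": "orange",
    "left": "green", "back": "blue",
}


def inconsistent_axes(layout: dict[str, str]) -> list[tuple[str, str]]:
    """Return the position pairs whose colours are not stated opposites."""
    return [
        (a, b) for a, b in POSITION_PAIRS
        if OPPOSITES[layout[a]] != layout[b]
    ]


print(inconsistent_axes(MONOLOGUE_LAYOUT))
# prints [('front', 'back'), ('left', 'right')]
```

By the model’s own rule, red in front implies orange at the back, so the layout it calls "a standard setup" contradicts what it said a few sentences earlier. That is a nice small instance of the uncertainty on display throughout the monologue.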