Thoughts on Devin after four months

4/15/2025

Although in my last post I concluded that Devin (by Cognition) was disappointing, we ended up keeping our Devin subscription at Liquid AI. As I worked more with Devin, I learned how to leverage it better and be more productive.

My biggest lesson is to expect Devin to complete only 30-50% of the work, not 100%. Devin can go down a dead end and then work overly hard to dig itself out of an impossible hole. For example, it may try to test a complicated function with tons of mocks but never manage to get all the mocks working. A better solution is to split the big function into a group of smaller, more testable ones and test them one by one. (This is likely because Devin reasonably tries to work within the existing boundaries rather than turn every task into an infinitely open-ended problem.)
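To make the "split it up" advice concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Order` type and the `parse_orders`, `total_by_customer`, and `sync_totals` functions are illustrative names, not code from our codebase. The idea is that parsing and aggregation become pure functions that can be tested with no mocks at all, while the I/O is pushed into a thin orchestrator that takes its dependencies as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Order:
    customer: str
    amount_cents: int


def parse_orders(rows: Iterable[dict]) -> list[Order]:
    """Pure step: convert raw rows into Order objects, skipping malformed rows."""
    orders = []
    for row in rows:
        try:
            orders.append(Order(str(row["customer"]), int(row["amount_cents"])))
        except (KeyError, ValueError, TypeError):
            # Missing field or non-numeric amount: skip the row.
            continue
    return orders


def total_by_customer(orders: Iterable[Order]) -> dict[str, int]:
    """Pure step: sum order amounts per customer."""
    totals: dict[str, int] = {}
    for order in orders:
        totals[order.customer] = totals.get(order.customer, 0) + order.amount_cents
    return totals


def sync_totals(
    fetch: Callable[[], Iterable[dict]],
    save: Callable[[dict[str, int]], None],
) -> dict[str, int]:
    """Thin orchestrator: the only part that touches I/O, so the only part needing fakes."""
    totals = total_by_customer(parse_orders(fetch()))
    save(totals)
    return totals
```

The two pure functions can be tested directly on plain lists, and `sync_totals` only needs a lambda for `fetch` and something like `list.append` for `save`, instead of a tangle of patched clients.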

Devin's greatest strength is its autonomy. When I need to implement a new feature that has multiple possible approaches and I'm not sure which one is best, I can easily launch 2-3 Devin instances, ask each of them to explore a different direction, compare the results, and pick one to pursue. This is possible with local tools like Cursor too, but Cursor needs to use my computer and can only try one approach at a time, while Devin works independently and in parallel. There were times when I was away from my computer and asked Devin to experiment with new ideas for me. Nothing else matches Devin's availability.

Hiring Devin is equivalent to having 3-5 interns per engineer. The number is capped at around 5 because the human engineer still needs to wrap up Devin's work, and one person can only handle so much at a time. Starting this month, I have even subscribed to Devin for my personal projects. Life is short. No one else will help me with my ever-growing Trello board of ideas for bootstrapping. The $500 per month is a good investment (and a decent tax write-off).

For engineers concerned about losing their jobs to AI like Devin: you will worry much less after working closely with AI. These tools are in general still lacking and fail at the slightest increase in complexity. Even for seemingly simple and routine tasks, AI is not good enough. Some common examples include adding Storybook, upgrading ESLint (version 9 just unnecessarily breaks everything), and migrating a Next.js app from the Pages Router to the App Router. Collectively, the engineering community has done these things millions of times, yet when dealing with our own codebase, we still have to do most of them manually. A new complaint I have about Devin is that although Storybook has a great semi-automated command-line tool for integrating it into a new project, Devin does not know to use it and cannot correctly add Storybook to my bare-bones Next.js app. Eventually, all these routine tasks should just be Devin / AI playbooks that everyone can apply with one click.

Only by raising the level of abstraction can we elevate ourselves to deal with more complexity. AI is not moving too fast; it is not moving fast enough.
