The rise of Artificial Intelligence (AI), especially in the post-Covid world, is arguably the most profound technological change underway at the moment. How we grapple with it is perhaps the most important question of our times, given its effects are so unpredictable; scenarios ranging from AI improving scientific output exponentially to ones where it is massively destructive for science are both eminently plausible. It is therefore worth exploring carefully.
This post is motivated both by what I’ve read generally and by a very productive session discussing these issues within my department, whose contributions are drawn on throughout this post.
The History of Progress
Before we delve into AI specifically, I think it’s worth beginning by understanding how human progress is achieved at a more general level, and then keeping this backdrop in mind when we explore the potential and the pitfalls of AI.
Fundamentally, humans all started off as hunter-gatherers and/or subsistence farmers who could survive on their own, independent of others. The history of human progress is one where we give up certain skillsets, either to technology or to other humans, thereby opening up space to devote ourselves to specialising in other skillsets. By doing this, we hope to allow humanity as a collective to produce more output, since it lets us spend more time on more productive activities and less time on less productive ones. However, it also means we become more and more dependent on other humans and on technology.
For example, I am reliant on other humans, and on technology, to produce my food for me, since I have long given up the skills needed to be a successful subsistence farmer or hunter-gatherer and have instead chosen to specialise in being a hopefully half-decent statistician. This arrangement is mutually beneficial since it results in higher productive output than if we all still tried to be subsistence farmers, but it comes at the expense of me being reliant on others for food!
Artificial Intelligence is simply the latest episode in this story. Fundamentally, the question over the use of AI is which skillsets we are willing to give up and transfer to AI to do for us. Doing so will save work hours and allow us to specialise in other, more productive areas, but it also creates a dependency that we may not be comfortable with, especially in a world where there are very few AI systems and hence a high potential for monopolies to develop.
Potential Uses of Artificial Intelligence
With this backdrop in mind, we will now go through a list of areas of science where AI could potentially be used to replace humans, in whole or in part. We go through them in order, from the uses I am most confident about to the least.
Administrative Tasks:
The most natural area where AI can be used, in my view, is to reduce administrative burden. For example, the time spent filling out the forms needed when publishing papers or claiming expenses can be cut down considerably by using AI.
Given these tasks are far removed from the actual research process, this is a skillset one could feel comfortable surrendering to AI to free up time for other activities, because losing the skillsets associated with administrative tasks is unlikely to compromise the research itself.
Mitigating Certain Disabilities:
Another potential use of AI is to mitigate certain disabilities. For example, AI can aid dyslexic people with spelling, or help autistic people add to emails the bits of “small talk” or so-called “pleasantries” that they might ordinarily struggle with.
It’s also worth noting that mitigating a disability is quite different from worrying about losing a skillset to AI that one would otherwise have. Someone with a certain disability either never possessed that skillset to begin with or could only exercise it with great burden and difficulty.
Helping Write Code:
Colleagues have found AI very effective at writing code for certain tasks. For this to work well, however, it is vital that one breaks the task down into sufficiently small chunks. Asking the AI in one prompt to code some broad simulation task will most likely end in tears, but using individual prompts to get the AI to write individual functions on the way to constructing such a simulation is often effective.
One can also supply the AI with one’s past code and ask it to write new code in a similar style, which makes the output easier to check. It’s vital to check each piece of code the AI produces to make sure it does what you actually intended it to do.
I do think it is important, however, that one has done a sufficient amount of programming before using AI to shorten the process of constructing code. This preserves one’s coding skillset, so that one can easily evaluate the AI’s output and check it does what it purports to do. It also ensures one has enough examples of one’s own code for the AI to emulate, which makes checking easier still.
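To make the “small chunks” idea concrete, here is a minimal sketch in Python of how a simulation study might be split into functions small enough to prompt for, and check, one at a time. The example is purely illustrative: the functions and the coverage simulation are hypothetical, not something produced by an AI or drawn from my colleagues’ work.

```python
# Hypothetical illustration: each function below is the kind of self-contained unit
# one might ask an AI to draft in a single prompt, and each is small enough to check by hand.
import numpy as np
from scipy.stats import norm

def simulate_data(n, mu, sigma, rng):
    """Draw one sample of size n from a Normal(mu, sigma) distribution."""
    return rng.normal(mu, sigma, size=n)

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = norm.ppf(0.5 + level / 2)
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    return sample.mean() - z * se, sample.mean() + z * se

def coverage(n, mu, sigma, n_reps, rng):
    """Estimate how often the interval contains the true mean over n_reps replications."""
    hits = 0
    for _ in range(n_reps):
        lo, hi = confidence_interval(simulate_data(n, mu, sigma, rng))
        hits += (lo <= mu <= hi)
    return hits / n_reps

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(coverage(n=30, mu=0.0, sigma=1.0, n_reps=2000, rng=rng))  # should be close to 0.95
```

Each of these functions can be verified independently against cases where the answer is known, which is exactly what makes the piecewise prompting approach checkable.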
Writing Papers:
Some people in my department have found success in getting AI to write extensive papers, provided the prompt is sufficiently long (roughly a full A4 page). Even then, one still needs to make fairly substantial manual corrections to the paper (a couple of hours’ worth of work) to make it suitable, but overall it still seems to save a substantial amount of time.
How widely replicable this is remains unclear, but it is definitely worth investigating.
Advanced Search Engine:
AI could potentially be used to provide improved search engines for materials in a relevant topic area. My biggest worry here is that if we don’t fully understand how the AI system works, it won’t be clear what makes a paper get promoted by an AI search algorithm and what doesn’t. This means that, in the long run, we don’t know whether we will be incentivising bad practices. Especially if the AI is built around large language models (LLMs) rather than genuine conceptual understanding, it seems probable it would favour content that “sounds” impressive, so to speak, rather than work that is genuinely substantive.
With current technology, it seems AI would be suitable as one of many search tools, but it should not be the exclusive or the primary means of searching for papers on a given topic.
Human-AI Hybrid Research Workflows:
Another potential use of AI is to place it within the research pipeline itself. For example, researchers can carry out pieces of research and enter them into an AI system, which amalgamates the findings of multiple labs and then communicates results back to the individual labs with recommendations on what to work on next.
I’m informed that Genomics England are adopting some practices along these lines.
I think it’s really important, however, that this is only done if the AI is genuinely suited to the specifics of the research field. Given the success of AlphaFold, which has made genuine scientific breakthroughs, this might be feasible in some areas like proteomics. Even so, I still don’t feel fully comfortable with it, since it means an AI that may have an incomplete understanding of the field is being handed arguably the most important task in the research pipeline: directing which research questions to ask and investigate in the first place. I should stress, though, that I’m not an expert in these areas, so this could easily be a misperception on my part.
If, however, one is relying on generic LLMs to accumulate the written outputs of individual research groups, this again has the potential to reward content that “sounds” good rather than content that is substantively good. I therefore think this would be a very dangerous practice to engage in.
Pitfalls of Artificial Intelligence
Energy Consumption:
Perhaps the most obvious drawback of Artificial Intelligence is the immense amount of energy required to run the algorithms. Even if performing a task with AI may be efficient in the sense of being quicker than a human, it may be inefficient in terms of the amount of energy expended compared to tasking a human with it.
Fundamentally, there are two resources in the economy at play here, both of which are finite: human labour hours and the total amount of energy produced.
Therefore, whether AI will improve economic output overall depends on whether human labour hours or the total amount of energy produced is the limiting factor in the wider economy. If it’s the former, then using AI to save human labour hours at the expense of energy expenditure will increase economic output overall. If it’s the latter, then AI would be counterproductive unless one finds a way to increase the total amount of energy produced, and that has potentially worrisome implications if the means to do so are likely to exacerbate environmental issues like climate change.
One should keep in mind this tension between efficiency in terms of human time and efficiency in terms of energy expenditure when deciding whether using AI for a given task is worthwhile.
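As a toy illustration of this limiting-factor argument, the sketch below (Python, with entirely made-up numbers that do not reflect real labour or energy figures) computes how many tasks get done under fixed budgets of labour hours and energy, with and without AI; the answer flips depending on which resource is scarce.

```python
# Toy back-of-envelope model: all per-task costs and budgets are hypothetical.
def tasks_completed(labour_hours, energy_kwh, use_ai,
                    human_hours_per_task=2.0, human_kwh_per_task=0.1,
                    ai_hours_per_task=0.2, ai_kwh_per_task=5.0):
    """Tasks achievable given budgets of labour and energy; output is capped by
    whichever resource runs out first."""
    hours_per_task = ai_hours_per_task if use_ai else human_hours_per_task
    kwh_per_task = ai_kwh_per_task if use_ai else human_kwh_per_task
    return min(labour_hours / hours_per_task, energy_kwh / kwh_per_task)

# Labour-scarce world: AI helps (1000 tasks with AI vs 100 without).
print(tasks_completed(200, 10_000, use_ai=True), tasks_completed(200, 10_000, use_ai=False))

# Energy-scarce world: AI hurts (20 tasks with AI vs 1000 without).
print(tasks_completed(10_000, 100, use_ai=True), tasks_completed(10_000, 100, use_ai=False))
```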
Privacy and Data Protection:
Another potential worry with AI is that the more one uses it, the more data it will obtain on us individually. How much one should worry about certain large AI centres controlling data on us is highly subjective and debatable, so I’ll defer that to other discussions.
What I will note, however, is that we could end up in the worst of both worlds, where we have heavy, burdensome regulations under the Data Protection Act but where the use of AI means we end up losing our data privacy anyway. We would then suffer the costs of heavy regulation without even attaining its theoretical benefits.
Loss of Skillsets:
Recalling our opening backdrop on how to think about AI, we need to be alert to which skillsets we humans are giving up when we come to rely on AI. This is especially important if we need to maintain those skillsets to evaluate the AI’s performance.
We might be comfortable surrendering doing our laundry to AI, but I’m much less comfortable surrendering our programming or writing skills. At the very least, I think it’s important that one has personally done substantial amounts of coding and writing manually, without the assistance of AI, before one starts using it to shorten these tasks. This is in order to retain one’s skillsets so that one can evaluate an AI’s code and writing outputs and edit them accordingly.
Exactly how much writing and coding one should have done before starting to use AI on these tasks is unclear. This is definitely a question research departments should consider carefully. Personally, I’d start off cautiously; in other words, demand substantial amounts of writing and programming experience before using AI, and then gradually reduce that threshold of experience until one finds the “optimal” threshold, so to speak.
Priorities Inherently Involve What Humans Value Psychologically:
Another massive issue with AI is that, fundamentally, deciding how important something is depends on human beings’ innate psychological propensity to suffer from or enjoy it. The only reason a musical record, for example, is worth the amount of money we’re willing to trade for it is the intrinsic psychological enjoyment humans get from it.
Because this is intrinsic to the human brain, I think it’s difficult for AI to evaluate how important something should be. Maybe in the future it will be sufficiently advanced that, when trained on data about human reactions, it could predict how much a human is likely to enjoy or abhor something, but I think we are a long way from that. And even then, I think AI would only manage it in the limited contexts that its training data is specific to. Anything beyond that seems like dangerous extrapolation.
Subliminal Discrimination against Minorities:
AI also has the potential to discriminate against minorities and other vulnerable groups. Because we fundamentally don’t understand the black-box process going on underneath the AI, we don’t know whether its decision-making process is inherently discriminatory towards minorities.
For example, I am an autistic person. If the AI is trained to evaluate writing on text written predominantly by non-autistic people, it seems probable to me that it will inherently not view the language and writing typical of autistic people as favourably as it should. Likewise, it might end up more hostile to the writing styles and parlances common among certain minorities that it isn’t used to seeing in its training data.
This flaw is perhaps the one I most worry about on a personal level, and it leaves me extremely cautious about when we should use AI. The fact that there is no “accountability”, so to speak, since we don’t understand the black-box decision-making process going on, is extremely alarming!
Chess Analogy - A Good Move for an AI is not always a Good Move for a Human:
I will now talk about how AI is used in chess and then try and generalise the lessons from it to the use of AI more broadly.
AI chess engines now outplay humans by an enormous margin. Magnus Carlsen is widely regarded as quite comfortably the greatest chess player of all time, and certainly by a huge distance the best player in the world right now; yet if he played a standard chess engine, he would lose literally every single time. That’s how much better AI is at playing chess than humans now.
Indeed, what’s particularly disturbing is that if you take something like AlphaZero and train it to play chess, it performs better if you give it no prerequisite human knowledge to begin with than if you give it some, such as the opening theory humans have developed over literal centuries. It also often plays moves that would make no intuitive sense whatsoever even to high-ranking human grandmasters, such as advancing rooks up the board early in the game or not worrying too much about control of the centre.
This gets at something fundamental about chess: a move that is good for an AI to make might be terrible for a human to make. If I have a chess position and the AI tells me the best move is X, that move will only be effective if I understand how the AI plans to make it pay off subsequently. For example, if the AI is telling me to sacrifice my Queen, I had better know what the AI is planning in order to make that sacrifice worthwhile. If I don’t, then all I’m doing is sacrificing my Queen for nothing, since I won’t know what to do subsequently to exploit the resultant position; I would just be losing my most powerful piece on the board.
This exposes a worry I have with our use of AI in life more generally. If we do something the AI tells us is the best choice, such as investigating a specific research area, it might not be optimal for us humans if we don’t know how the AI came to that decision and how it plans to make that decision pay off subsequently. Indeed, it might not just be suboptimal; it could be disastrous if it is the real-life equivalent of a Queen sacrifice made without knowing how to make it pay off.
Unknown Downstream Consequences:
Reflecting on the chess analogy, this is a key worry I have with using AI while not understanding the fundamental black-box processes underpinning it.
We could end up following its recommendations in certain areas without fully understanding how and why the AI came to those conclusions, or how it plans to exploit the results of the actions it is recommending we take. Even if we just kept following the AI (the chess equivalent would be to keep playing whatever moves it tells us to make rather than using it for a single move), if we don’t understand fundamentally what’s driving its decisions, we could find ourselves a few years down the line utterly clueless as to how we got there. In some sense, the skillset we’re sacrificing is the fundamental domain knowledge we use to evaluate things. That would make it very hard for us to diagnose any issues or mistakes made by the AI, which will surely occur at some point. Overzealous in our use of AI and the tempting power it seems able to wield, we could end up running head-first into a brick wall in five or ten years’ time, so to speak.
Conclusions
Artificial Intelligence has the potential to provide us with immense benefits in our lives and in scientific research. However, it simultaneously presents some potentially disastrous pitfalls if we’re not careful, even setting aside the considerable energy expenditure its use requires.
For this reason, it’s very important that we consider carefully the potential benefits of specific uses of AI, and ask when those benefits are sufficient in a given circumstance to justify that use.
Overall, I favour a cautious approach in which we don’t use AI for anything where we can’t at least evaluate the output it provides for ourselves and check that it is satisfactory based on our own human domain knowledge. Since checking something is good is often orders of magnitude quicker than constructing it, this still allows AI to significantly improve the efficiency of the research process. Examples of such uses include constructing code more efficiently and potentially, eventually, even writing papers, provided one is willing to scrutinise the write-up heavily and make sufficient edits manually.
However, I think we should not use AI for anything where we are not well placed to evaluate the output ourselves, or where we don’t understand how a given output or recommendation from the AI could be used going forward. This is like how, in chess, we should never make a piece sacrifice an AI tells us is optimal if we ourselves don’t understand how to capitalise on that sacrifice subsequently.
If AI research improves enough that these black-box processes get “broken open”, so to speak, allowing us to understand the AI’s decision-making better and hence how it comes to certain outputs and recommendations, it might then become safer to use AI in more ambitious contexts, since we would at least be able to diagnose more easily any mistakes the AI makes. Until that happens, though, we should not take the risk of using AI in more ambitious circumstances. In my view, doing so is rather like racing down a blind alley you hope is a shortcut, with no idea how to turn around and work out where you are if you take a wrong turn and end up at a dead end.
What would be a shortcut given sufficient information could be a fatal disaster without it. In my view, we should err on the side of caution in how we use AI in any important research process unless we are confident in our ability to evaluate the AI’s outputs, or we have sufficient understanding of the process by which it came to them.