Some Problems I'm Thinking About

In my gap year review, I mentioned that there are problems I care about, but I didn't describe any of them. There are many, but I want to highlight two in particular: AI safety and the verification of digital information.

AI Safety

I think it is pretty clear why AI safety is important, and a lot has already been written about it. Here I want to highlight some very near-term, practical points that might be under-discussed compared to the more existential arguments.

The argument is short. Right now, corporations are rushing to shift the paradigm from chat-based language models to autonomous agents that:

1. are fed private information
2. can take actions on users' behalf

But today's LLMs are fundamentally insecure, in that even unsophisticated adversaries (let alone clever ones) can:

1. extract information from the context fed into an LLM (even information that is meant to be hidden)
2. jailbreak LLMs into generating content they shouldn't (in an agent setting, imagine jailbreaking an email agent into deleting someone's emails, or a shopping agent into buying items the user never asked for).

This is bad!

The Verification of Digital Information

We are entering an era in which AI-generated content is becoming nearly indistinguishable from content captured by humans. I don't like this.

Thinking about the kind of world I want to live in: I want to be able to see a picture and know that it was taken by a person. I want to be able to hear a voice recording of my mom and know that it was her speaking, not an algorithm trying to replicate her voice.

This is a hard problem. It spans a lot of areas: cryptography, hardware, policy. But I don't want to live in a future in which we haven't solved it.

It seems like we are getting closer, but there is still a lot of work to be done.

(As a side note: this is probably the area I would want to pivot to, work-wise, if I weren't working in ML. There is a lot that I don't know, so if you know more about these things, or just want to talk about them, I'd be super interested to learn and chat!)