Welcome to the first issue of AI Goes to College! The newsletter is devoted to helping higher education professionals navigate the world of generative artificial intelligence (GAI) so that they can use it ethically and effectively.
This week’s issue:
Discusses whether you should trust AI
Gives a short review of Consensus, a promising new app for research
Fills you in on a couple of recent news items, and
Reveals my choice for the best paid AI service if I could only choose one.
The highlights of this newsletter are available as a podcast, which is also called AI Goes to College. You can subscribe to the newsletter and the podcast at https://www.aigoestocollege.com/. The newsletter is also available on Substack (https://aigoestocollege.substack.com/). Subscribers get a free copy of my Getting Started with Generative AI guide! The guide is published under a Creative Commons non-commercial license, so you can share it as you please (as long as you don’t charge for it).
Should you trust AI?
Should you trust AI? As is often the case, the answer is "maybe" or perhaps "sometimes." Whether to trust the results of generative AI (GAI) is an interesting, tricky question. If you trust it too much, you run the risk of relying on bad outputs. If you don't trust it enough, you spend too much time double-checking its outputs, which can severely hurt the efficiency of using GAI in your work.
In some ways, "Should you trust AI?" is kind of a pointless question. If someone asks me this, I would respond by asking ,"For what?". Whether you should trust GAI depends on two main factors: GAI's capabilities and the consequences of depending on faulty output. In many respects, it's not that different from figuring out whether you should trust a person.
One of the key factors in determining someone's trustworthiness is their ability in a particular context. You can trust me to help you figure out GAI, but you really shouldn't trust me to fix your car. My abilities are pretty good in the first task, but I am NOT competent in car repair. (My first attempt at changing my car's oil was like a bad Monty Python routine.) We're all good at some things and bad at others.
GAI is the same. It's really good for some things and terrible for others. In general, GAI is pretty good at tasks involving language, such as generating exam questions or routine communication. It's also quite good at critiquing things that you've written. For example, I often run parts of this newsletter through GAI to get a critique. It's also good at summarizing documents and other forms of content, including audio and video (which is pretty amazing). GAI is also great at helping you brainstorm ideas. I've used it to develop reflection prompts for my classes. It works pretty well.
GAI is not great at tasks that involve nuanced, complex decision-making or deep understanding. It doesn't do well with very novel tasks or tasks that require highly specialized knowledge, and fact-checking recent events is a weak area. GAI is also a poor substitute for genuine human empathy or an understanding of sensitive contexts.
Developing a solid understanding of GAI's capabilities and limitations requires experience and experimentation. You just have to play with it to test its limits. Experts can help you understand the general capabilities and limitations, but you have to use it yourself to really understand how it fits your particular situation.
The other part of the GAI trust puzzle is to consider the consequences of bad output. I tend to think about trusting GAI the way I think about trusting Wikipedia. I'm interested in history, so I often look up information about some historical event or ancient civilization, and Wikipedia is my go-to for that sort of thing. If I want to know the year of the Battle of Hastings, I go to Google or Wikipedia. (It's 1066, by the way.) Does it really matter if Wikipedia gets some of those facts wrong? Not really. But for my academic work, I don't rely on Wikipedia; if I did and got something wrong, my reputation would suffer. When I'm working on a research paper, I check my facts very carefully, usually through multiple authoritative sources.
So, back to the original question. You should trust GAI for some things but not for others. Sorry, but there aren't any hard and fast rules that apply to all situations. My general advice is to be a skeptical user of GAI until you fully understand its capabilities and limitations for the tasks that are important to you. Verify the important stuff. The more important the task, the more effort you should put into checking GAI's work, just as you would with a human assistant.
Tool of the week - Consensus.app
Consensus (https://consensus.app) has received quite a bit of attention, especially with the recent launch of OpenAI's GPT store (https://chat.openai.com/gpts). Consensus was one of only four featured GPTs on the GPT store page.
Consensus describes itself as "Your AI Research Assistant. Search 200M academic papers from Consensus, get science-based answers, and draft content with accurate citations." That's a pretty tall order but an exciting prospect. As any academic will tell you, it's a lot of work to pore through dozens of papers to get a sense of what the literature says about a topic.
So, does Consensus deliver on this promise? ... kind of. What Consensus does is actually pretty amazing. If you want a quick sense of the literature, Consensus can help. There's even a "Synthesize" button that produces a short overview of the main findings.
Consensus is NOT a chatbot; it's an AI-fueled search engine that uses the Semantic Scholar database as its source material. That database is the source of Consensus' strengths and weaknesses: it's extensive, but problematically limited for some fields.
The results of my initial tests were lukewarm. I asked Consensus about a couple of research areas I know quite well. Although I didn't take exception to the summary that Consensus produced, the specific studies it returned didn't always include the most influential papers in the area, and it missed some highly cited papers in top journals. For example, in one area, it missed papers with hundreds of citations and included a study in a third-rate journal that had only eight citations.
I also wanted to see if Consensus could help with more practice-oriented tasks (as opposed to scholarly research). The results were better here. I asked about programs that can improve the success of first-generation college students. Consensus gave me a nice set of research-backed programs, although the list of supporting papers still seemed a little random. Overall, though, the results were useful.
So, if you want to get a quick handle on the literature in an area, Consensus can help. But, it is no substitute for the hard work of a proper literature review. Despite its current limitations, Consensus is useful for some tasks and has a promising future. In other words, it's a solid start but has quite some way to go.
Is Consensus worth it? Maybe. It depends on your particular use case. I suggest trying it out for yourself. There's a free tier that is more than sufficient for testing its capabilities. A Premium subscription is $6.99 a month (billed annually), which seems reasonable. Here's a link to their pricing page: https://consensus.app/pricing/.
Even if Consensus isn't right for you currently, I recommend setting up a free account and checking back occasionally as the tool matures.
AI in higher ed news
ASU - OpenAI Partnership
Recently, OpenAI and Arizona State University announced a partnership that will give ASU students and faculty access to the "most advanced iteration of ChatGPT." (I'm not quite sure what that means.)
This is big news, although it wasn't all that surprising to me. ASU is the largest public university in the United States (I think), and its president, Michael Crow, is known for pushing the boundaries of higher ed. So it makes sense that ASU would be OpenAI's first higher ed partner.
It will be interesting to see how this plays out, and I have mixed feelings about the partnership. On the one hand, this may turn out to be a bunch of hype without much substance. Anyone who has been around higher ed for a while is skeptical about these sorts of partnerships (with good reason). It's also possible that applications will be shoved down the throats of faculty without sufficient thought and testing. Time will tell.
But I think this is a step in the right direction. Banning generative AI is wrong-headed and ultimately a losing battle. So, I've advocated embracing generative AI since my first interactions with ChatGPT. I teach my students how to use generative AI ethically and effectively, and I urge my colleagues to do the same. Having a huge, well-resourced university act as a test bed may ultimately be beneficial for all of us. As I said, time will tell. Regardless, I'll be keeping a close eye on how this partnership plays out. You might want to do the same.
Anthology's AI Framework for Higher Ed
Anthology, an edtech firm, recently released its AI Policy Framework. The framework is based on seven principles: fairness, accountability, reliability, value alignment, humans in control, transparency & explainability, and privacy, security & safety. The document outlining the framework also includes advice on identifying and engaging stakeholders and other aspects of developing a policy.
Although I'm generally skeptical of documents like this coming from organizations with "skin in the game," Anthology's framework is reasonable, if not particularly innovative. I have no problem with the seven principles the framework is based on. However, I can see the use of the framework turning into a never-ending series of task forces, meetings, and reports ... eventually resulting in the hiring of a consultant.
Look, I get it. Generative AI is groundbreaking for higher ed and we're all trying to figure out how to respond. I personally favor an approach that lays out some common-sense guidelines that can be applied across disciplines, then allows colleges, departments, and individual faculty to develop their own guidelines. Purdue is following this sort of approach: https://www.govtech.com/education/higher-ed/purdue-leaves-generative-ai-guidelines-up-to-professors.
There is a need for universities to develop institutional-level guidelines about how AI is going to be used beyond individual classes. I have some huge privacy and autonomy concerns about the use of AI coupled with learning analytics. So, I'm all in favor of some detailed debate around the use of AI in higher ed. But, faculty need to figure out what to do now; there isn't time to wait for the typical slow pace of higher ed.
If you're trying to develop your own AI guidelines, I'm happy to share mine. If you sign up for this newsletter, I'll send you my Getting Started with Generative AI guide, which includes my policy. Just go to AIGoesToCollege.com and sign up. It's free and I won't try to sell you anything. You can also email me at craig@EthicalAIUse.com.
Poe.com for the win
The options for chat-based generative AI (GAI) seem to be expanding rapidly. I'm all for playing around with the free versions of these tools, just to see what they can do. But paying for the "pro" versions of different tools can quickly get expensive. (Ask me how I know.) I use ChatGPT 4 most of the time, but it wouldn't be my choice if I could only use one paid tool.
If I were only going to pay for one, it would either be ChatGPT Plus (https://chat.openai.com/) or Poe.com's (https://poe.com) paid version. Although I really like ChatGPT Plus, if I had to choose, it would be Poe. A detailed comparison of the options is beyond my scope here, but the main reason I would choose Poe is its flexibility. As I'm writing this, a subscription to Poe gets you access to 29 AI chatbots, along with a huge number of custom bots created by other users. ChatGPT Plus only gives you access to OpenAI's models. Although those are fantastic, other models are useful too, and Poe's versatility lets you take advantage of them.
The paid version of Poe also has a couple of other nice advantages. It includes access to Claude-2 100K, which allows you to upload large documents as part of your chat. For example, you can upload a long PDF and then ask questions about it. Claude-2 100K's document size limit (approximately 75,000 words) is much larger than that of ChatGPT 4. Also, because Poe offers so many different chatbots, running over message limits is less of a problem. ChatGPT 4 currently has a soft limit of 40 messages every three hours. Whether this limit is enforced seems unpredictable, so it's nice to have workarounds available.
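In case you're wondering where that 75,000-word figure comes from: a common rule of thumb is that one token is roughly three-quarters of an English word, so a 100K-token context works out to about 75,000 words. Here's a minimal sketch of that arithmetic (the 0.75 words-per-token figure is an approximation, and actual capacity varies with the text and the tokenizer):

```python
# Rough estimate of how many English words fit in a model's context window.
# Assumes ~0.75 words per token -- a common rule of thumb, not an exact conversion.
WORDS_PER_TOKEN = 0.75

def estimated_word_capacity(context_tokens: int) -> int:
    """Return an approximate word count for a given token limit."""
    return round(context_tokens * WORDS_PER_TOKEN)

print(estimated_word_capacity(100_000))  # Claude-2 100K: about 75,000 words
```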
Microsoft's Copilot (https://copilot.microsoft.com) is also a solid free choice. The paid version opens up some useful capabilities, such as integration with Office apps. But right now, I'd still choose Poe. (You may be able to get the capabilities of Copilot's paid version through your university.)
My general advice is to play around with the free tools before you start paying. Poe is still my choice among the free options since its free tier also includes access to multiple chatbots, including ChatGPT 3.5 and Claude. An email address is all you need for the free version. I recommend Poe to my students since its free version is so flexible. But, there's really no reason not to play with the free versions of all of the major AI chatbots. Your preferences may differ from mine; ultimately the choice is subjective. Also, Poe is my choice today, but tools are evolving rapidly, so I might make a different choice next week.