GitHub Copilot in RStudio,
it’s finally here!


Tom Mock: Product Manager  

linkedin.com/in/jthomasmock/

2024-06-04

This talk closes issue #10148

GitHub copilot integration with RStudio, the issue header

Ok, so what’s generative AI?

Generative AI refers to a category of AI models and tools designed to create new content, such as text, images, videos, music, or code. Generative AI uses a variety of techniques—including neural networks and deep learning algorithms—to identify patterns and generate new outcomes based on them. - GitHub blog

Midjourney prompt:

png transparent background, a seated, robotic android boston terrier wearing pilot goggles, in the style of pixel art, full body --style raw --stylize 50

Generative AI for text

For text generation, Generative AI just wants to predict the next word/token/string!

I might ask ChatGPT (asking is also known as “prompting”):

“Complete the sentence: ‘Every good…’”

‘Trust’ but verify

  • Generative AI doesn’t have a ‘brain’ or general intelligence
  • It’s just a model that’s been trained on a lot of data
  • It’s not always right, appropriate, or optimal
  • It can make up things that aren’t true, or use code that doesn’t actually exist (or run!)
  • So it’s important to verify the output before using it
  • But we can use it to experiment quickly and perhaps find a novel direction, essentially “prompting” ourselves and our own knowledge

What is Copilot?

GitHub Copilot is an AI pair programmer that offers autocomplete-style suggestions and real-time hints for the code you are writing by providing suggestions as “ghost text” based on the context of the surrounding code - GitHub Copilot docs

Autocomplete

  • Parses code and environment
  • Supplies possible completions
  • Static set of completions in popup
  • Provided by the IDE, from local disk

Copilot

  • Parses code, environment and training data
  • Supplies likely completions
  • Dynamic set of completions via ‘ghost text’
  • Provided by generative AI, via an API endpoint

Autocomplete vs Copilot

Autocomplete in RStudio, with a list of possible completions

Copilot in RStudio, with a list of possible completions

RStudio, now with ‘Ghost Text’



Annotated screenshot: the existing code provides the context; Copilot-generated ‘ghost text’ appears after it and doesn’t “exist” in the document until accepted.

Copilot in RStudio

gif of Copilot autocompletion

Copilot in RStudio

Diagram of Copilot autocompletion

Get started

  • Get a subscription to GitHub Copilot, Personal or Business
  • Tools > Global Options > Copilot Tab

Sign-in page; GitHub auth; signed in

Don’t be afraid to play around!

Keyword is a word game, popularized by the Washington Post.

Keyword screenshot; Keyword screenshot, solved

How can we solve the Keyword game?

With RStudio + Copilot of course!

Getting the most out of the generative loop

GitHub Copilot is a generative AI tool that can be used to generate (i.e., predict) output text and, more specifically, code.

Generative AI doesn’t understand anything. It’s just a prediction engine! - David Smith

  • Context - What prompts have been provided?
  • Intent - What is the user trying to do?
  • Output - What is actually returned?

Better context == closer to intent == better output





S2C
Simple, Specific, and use Comments!

Simple and specific: Break down complex tasks

Provide a high-level description of the project goal at the top of the script, and build on it with more specific tasks.

For example, I know how to play Keyword, and I know others have solved similar games (Wordle) with R.

# create a function to solve the Keyword game
# this game uses a 6 letter horizontal word at the
# intersection of 6 other vertical words, 
# where a missing letter from each of the vertical words
# accounts for one letter of the mystery 6 letter length horizontal word

# how to play
# Guess 6 letters across.
# Letters must spell words across and down.

Simple(st): Cheat (or rather, be clever)

library(jsonlite)
url <- "https://keyword-client-prod.red.aws.wapo.pub/levels/2023/08/09.json"
raw_json <- fromJSON(url)
raw_json$answer
[1] "staple"


Ok that’s cheating, but what else is in there?

raw_json$words
[1] "_ee"    "chan_"  "b_re"   "s_ur"   "real_y" "scal_" 
[1] "[s]ee"    "chan[t]"  "b[a]re"   "s[p]ur"   "real[l]y" "scal[e]" 

Simple: Just get the hint words

Break it down into component parts; let’s start by working with the hint words!

json_url function screenshot with ghost-text
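
The screenshot shows Copilot suggesting a json_url() helper as ghost text. A minimal sketch of what such a helper could look like, assuming the URL pattern from the previous slide (an illustration, not the exact Copilot output):

# build the Washington Post Keyword level URL for a given date
# (hypothetical helper matching the json_url() call on the next slide)
json_url <- function(date = Sys.Date()) {
  date <- as.Date(date)
  paste0(
    "https://keyword-client-prod.red.aws.wapo.pub/levels/",
    format(date, "%Y/%m/%d"),
    ".json"
  )
}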

Simple: Just get the hint words

raw_json <- json_url(date = "2023-08-09") |> jsonlite::fromJSON(simplifyVector = FALSE)

hints <- as.character(raw_json$words)
hints
[1] "_ee"    "chan_"  "b_re"   "s_ur"   "real_y" "scal_" 

Specific: Use expressive names (and comments)

  • Use expressive names for variables, functions, and objects (this is best practice anyway!)

Copilot in RStudio, with a list of possible completions

Copilot in RStudio, with a list of possible completions

Specific: Use expressive names

top_words function with ghost-text

letters_from_blank function with ghost-text
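
The screenshots show Copilot proposing top_words() and letters_from_blank() helpers as ghost text. A hypothetical sketch of letters_from_blank(), with stringr::words standing in as a dictionary (not the exact generated code):

# which letters complete a hint word like "b_re"?
# stringr::words is only a stand-in dictionary here
letters_from_blank <- function(hint, dictionary = stringr::words) {
  candidates <- vapply(
    letters,
    function(l) sub("_", l, hint, fixed = TRUE),
    character(1)
  )
  letters[candidates %in% dictionary]
}

letters_from_blank("b_re")  # e.g. "a" (bare), depending on the dictionary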

S2C: guess_keyword() function

# Solved with Copilot and some human ingenuity in RStudio!
guess_keyword("2023-09-09")
 The keyword is one of the following:
 recipe, repipe 

 The played words are some of the following:
 [r]aid, stat[e], [c]ap, gamb[i]t, [p]arent, decr[e]e
 [r]aid, stat[e], [p]ap, gamb[i]t, [p]arent, decr[e]e 
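
guess_keyword() itself was assembled from Copilot suggestions plus manual edits. A rough sketch of how the pieces could fit together, reusing the hypothetical json_url() and letters_from_blank() helpers sketched earlier and stringr::words as a stand-in dictionary (not the exact code behind the output above):

guess_keyword <- function(date, dictionary = stringr::words) {
  raw_json <- jsonlite::fromJSON(json_url(date), simplifyVector = FALSE)
  hints <- as.character(raw_json$words)
  # candidate letters that complete each vertical hint word
  letter_sets <- lapply(hints, letters_from_blank, dictionary = dictionary)
  # one letter per hint, in order, spells a candidate keyword
  combos <- expand.grid(letter_sets, stringsAsFactors = FALSE)
  candidates <- apply(combos, 1, paste, collapse = "")
  # keep only candidates that are themselves valid words
  candidates[candidates %in% dictionary]
}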

Getting stuck?

  • Add more context, and follow S2C (Simple, Specific, and use Comments)
  • Prompt again or in a different way
  • Add more top-level or inline comments
  • Build off your own momentum (write some of your own code)
  • Turn off Copilot for a bit

Better context == closer to intent == better output

More than one way to generate text

Ghost Text

Ghost text in RStudio, with a list of possible completions

Chat with {chattr}

Chat with chattr in RStudio, with a list of possible completions

{chattr}

Chat with chattr in RStudio viewer pane

{chattr} - enriched requests

library(chattr)

data(mtcars)
data(iris)

chattr(preview = TRUE)
#> ── Preview for: Console
#> • Provider: Open AI - Chat Completions
#> • Path/URL: https://api.openai.com/v1/chat/completions
#> • Model: gpt-3.5-turbo
#> • temperature: 0.01
#> • max_tokens: 1000
#> • stream: TRUE
#> 
#> ── Prompt:
#> role: system
#> content: You are a helpful coding assistant
#> role: user
#> content:
#> * Use the 'Tidy Modeling with R' (https://www.tmwr.org/) book as main reference
#> * Use the 'R for Data Science' (https://r4ds.had.co.nz/) book as main reference
#> * Use tidyverse packages: readr, ggplot2, dplyr, tidyr
#> * For models, use tidymodels packages: recipes, parsnip, yardstick, workflows,
#> broom
#> * Avoid explanations unless requested by user, expecting code only
#> * For any line that is not code, prefix with a: #
#> * Keep each line of explanations to no more than 80 characters
#> * DO NOT use Markdown for the code
#> [Your future prompt goes here]
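
Once a back-end is selected, prompts can also be sent straight from the console. A minimal sketch, assuming an OpenAI key is set in the OPENAI_API_KEY environment variable (model aliases vary between chattr versions):

# pick a back-end; the alias depends on your chattr version
chattr_use("gpt35")

# send a prompt; the reply streams back as code, per the system prompt above
chattr("Create a scatter plot of mpg vs wt from mtcars using ggplot2")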

Diagram of chattr request

Generative AI tools with Posit Workbench and RStudio

  • For best outputs, follow S2C (Simple, Specific, and use Comments)

  • GitHub Copilot is an optional integration, available as a Preview feature in the 2023.09 release of RStudio and Posit Workbench

  • To provide feedback or report bugs, please open a GitHub Issue on the RStudio repo

  • {chattr} provides integrations with OpenAI’s REST API models, Copilot Chat, and locally hosted models such as LLaMA, with more back-ends expected. Install via: remotes::install_github("mlverse/chattr")

Robot cat


Additional Slides

What about different backend models for RStudio?

There will be some options to use community-created RStudio add-ins to connect to other models and still make use of ghost text:

Example of arbitrary ghost text
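
As a rough illustration of what such an add-in boils down to (not an existing package): read the editor context with rstudioapi, ask some model for a completion, and put the reply back into the document. ask_model() is a placeholder stub here, and the reply is inserted as plain text rather than true ghost text:

# placeholder stub standing in for any model back-end
ask_model <- function(prompt) {
  paste0("# model reply for: ", substr(prompt, 1, 40))
}

# hypothetical add-in body: complete the current selection
insert_ai_completion <- function() {
  ctx <- rstudioapi::getActiveDocumentContext()
  sel <- ctx$selection[[1]]
  reply <- ask_model(sel$text)
  rstudioapi::insertText(location = sel$range$end, text = reply, id = ctx$id)
}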

What other things can Copilot generate?

# test the time_between function with testthat
library(testthat)

test_that("time_between works", {
  expect_equal(time_between("2023-08-15", "2022-08-15", "days"), -365)
  expect_equal(time_between("2023-08-15", "2022-08-15", "weeks"), -52.1428571428571)
  expect_equal(time_between("2023-08-15", "2022-08-15", "months"), -12)
  expect_equal(time_between("2023-08-15", "2022-08-15", "years"), -1)
  expect_equal(time_between("2022-12-06", Sys.Date(), "decades"), "Please enter a valid time unit")
})
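
The generated tests reference a time_between() function that isn’t shown on the slide. A hypothetical version matching the signature the tests imply, with month and year arithmetic deliberately simplified (no leap-year handling):

time_between <- function(start, end, unit) {
  # guard against units the function doesn't know about
  if (!unit %in% c("days", "weeks", "months", "years")) {
    return("Please enter a valid time unit")
  }
  days <- as.numeric(as.Date(end) - as.Date(start))
  switch(unit,
    days   = days,
    weeks  = days / 7,
    months = days / 365 * 12,
    years  = days / 365
  )
}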

Example of Copilot generating code

Example of Copilot generating code

{chattr} interface

The main way to use chattr is through its Shiny Gadget app. By default it runs inside the Viewer pane; set as_job = TRUE in RStudio to run it as a background job:

# Change default to run as job via: `options(chattr.as_job=TRUE)`
chattr::chattr_app(as_job = TRUE)

Screenshot of chattr running in RStudio

{chattr}: Available models

chattr provides integration with two main LLM back-ends, each of which gives access to multiple LLM types:

  • OpenAI: GPT models accessible via OpenAI’s REST API. chattr provides a convenient way to interact with GPT 3.5 and DaVinci 3. Setup: Interact with OpenAI GPT models.
  • LLamaGPT-Chat: LLM models available on your computer, including GPT-J, LLaMA, and MPT; tested on a GPT4ALL model. LLamaGPT-Chat is a command-line chat program for models written in C++. Setup: Interact with local models.

The idea is that as time goes by, more back-ends will be added.

What about GenAI for other IDEs in Posit Workbench?

  • Posit Workbench will be upgrading JupyterLab to v4, and the dev team is exploring Jupyter AI
  • VS Code has many extensions for Generative AI, including Codeium, Google Gemini, Amazon CodeWhisperer and Amazon Q, Sourcegraph’s Cody, Tabnine, and more…

Customization in RStudio

RStudio with Copilot

Customization in RStudio

RStudio with Copilot
