name: r-python-translation description: >- R-to-Python translation for data analysis. Maps R packages (tidyverse, ggplot2, fixest, survey, sf, plm) to Python equivalents (polars, plotnine, pyfixest, svy, geopandas). Use when user has R background or requests R-equivalent code comments. metadata: audience: research-coders domain: research-methodology skill-last-updated: "2026-03-28"

R-to-Python Translation Skill

R-to-Python translation reference for quantitative social science data analysis. Maps R ecosystem packages (tidyverse/dplyr, ggplot2, fixest, survey, sf, plm, lme4, marginaleffects, rdrobust) to DAAF Python equivalents (polars, plotnine, pyfixest, statsmodels, linearmodels, svy, geopandas). Use when user mentions R/RStudio background, requests R-equivalent code comments, needs to understand Python analysis code from an R perspective, or wants to translate R data analysis concepts to Python. Covers paradigm differences, verb-by-verb operation translations, regression modeling, causal inference, visualization, and workflow adaptation.

Cross-language translation reference for researchers moving between the R and Python data analysis ecosystems. This skill maps R packages, idioms, and workflows to their DAAF Python equivalents so that R-background users can audit, understand, and learn from DAAF-produced code, and so that code-producing agents can annotate their output with R equivalents when directed.

This skill is a routing hub — it provides overview tables, decision trees, and directs readers to the detailed reference files listed below. The reference files contain the exhaustive verb-by-verb mappings, code examples, and edge-case documentation.

What This Skill Does

Maps the R data analysis ecosystem to DAAF's Python stack across data wrangling, modeling, visualization, causal inference, surveys, spatial analysis, and workflow tooling
Provides a structured annotation protocol for agents to add inline R-equivalent comments to Python code
Identifies paradigm gaps where R and Python diverge fundamentally, so users know where to expect friction

Use cases:

R user auditing DAAF Python code and needing to understand what operations are being performed
Agent annotating code with R-equivalent comments for an R-background researcher
R user learning Python for data analysis and needing a conceptual bridge
Translating a specific R operation or idiom to its Python equivalent
Understanding where R tools have no direct Python equivalent (and what the workaround is)

How to Use This Skill

Reference File Structure

Each topic in ./references/ contains focused documentation:

File	Purpose	When to Read
`paradigm-differences.md`	Core language and paradigm differences	Encountering fundamental R-vs-Python confusion
`polars-dplyr.md`	Core dplyr/tidyr to polars verb mapping (select, filter, mutate, joins, reshaping, window functions, lazy eval)	Reading or writing data manipulation code
`polars-strings-dates-factors.md`	String, date/time, and factor operations (stringr, lubridate, forcats to polars)	Working with string/date/categorical columns
`regression-modeling.md`	fixest/stats/plm to pyfixest/statsmodels/linearmodels	Reading or writing regression code
`visualization.md`	ggplot2/plotly R to plotnine/plotly Python	Reading or writing visualization code
`causal-inference.md`	R causal inference ecosystem to Python equivalents	Working with DiD, RDD, IV, event studies
`survey-spatial-ml.md`	survey/sf/tidymodels to svy/geopandas/scikit-learn	Working with surveys, spatial data, or ML
`workflow-environment.md`	RStudio/Quarto workflow to DAAF/marimo workflow	Adapting to DAAF's execution model
`external-resources.md`	Curated guides and tutorials with provenance	Seeking additional learning materials
`gotchas.md`	Common R-user mistakes in Python	Debugging or reviewing code from R perspective

Reading Order

R user auditing DAAF code: paradigm-differences.md then the relevant domain file (e.g., polars-dplyr.md for data wrangling, regression-modeling.md for models) then gotchas.md
Agent annotating code with R equivalents: Agent Code Annotation Protocol section below, then the relevant domain file for the code being annotated
Learning Python from R background: paradigm-differences.md then polars-dplyr.md then workflow-environment.md then external-resources.md
Looking up a specific translation: Quick Decision Trees below, then the relevant reference file

Quick Decision Trees

"How do I do X from R in Python?"

What kind of R operation?
├─ Data wrangling (filter, mutate, join, pivot, summarise)
│   └─ ./references/polars-dplyr.md
├─ Regression / statistical modeling
│   └─ ./references/regression-modeling.md
├─ Plotting / visualization
│   └─ ./references/visualization.md
├─ Causal inference (DiD, RDD, IV, event studies)
│   └─ ./references/causal-inference.md
├─ Surveys / spatial / machine learning
│   └─ ./references/survey-spatial-ml.md
└─ Fundamental language differences (types, syntax, environment)
    └─ ./references/paradigm-differences.md

"Why does this Python code look different from R?"

What looks unfamiliar?
├─ Expression syntax (pl.col().method().alias())
│   └─ ./references/paradigm-differences.md
├─ Missing values (None vs NaN vs null vs NA)
│   └─ ./references/paradigm-differences.md
├─ Formula interface (~) behaves differently
│   └─ ./references/regression-modeling.md
├─ Import patterns and namespacing
│   └─ ./references/gotchas.md
└─ No interactive REPL / console workflow
    └─ ./references/workflow-environment.md

"I want to translate an R script to Python"

What does the R script do?
├─ Loads and wrangles data (read_csv, dplyr verbs)
│   └─ ./references/polars-dplyr.md
├─ Runs regressions (lm, feols, plm)
│   └─ ./references/regression-modeling.md
├─ Creates plots (ggplot, plotly)
│   └─ ./references/visualization.md
├─ Uses survey weights (svydesign, svymean)
│   └─ ./references/survey-spatial-ml.md
├─ Spatial operations (sf, st_join)
│   └─ ./references/survey-spatial-ml.md
├─ Multiple of the above
│   └─ Start with ./references/paradigm-differences.md, then each relevant file
└─ Uses a package not listed above
    └─ ./references/external-resources.md for broader ecosystem guidance

"Something isn't working and I think it's an R habit"

What went wrong?
├─ 1-indexed access gave wrong element
│   └─ ./references/gotchas.md
├─ Factor/categorical behaves differently
│   └─ ./references/gotchas.md
├─ NA handling surprised me
│   └─ ./references/paradigm-differences.md
├─ Pipe operator (|> or %>%) not available
│   └─ ./references/paradigm-differences.md
├─ library() vs import confusion
│   └─ ./references/gotchas.md
└─ Model output structure is different
    └─ ./references/regression-modeling.md

"Which Python package replaces my R package?"

Which R package?
├─ dplyr / tidyr / readr / tibble → polars
│   └─ ./references/polars-dplyr.md
├─ ggplot2 → plotnine
│   └─ ./references/visualization.md
├─ plotly (R) → plotly (Python)
│   └─ ./references/visualization.md
├─ fixest → pyfixest
│   └─ ./references/regression-modeling.md
├─ stats (lm, glm) → statsmodels
│   └─ ./references/regression-modeling.md
├─ plm / lme4 / estimatr → linearmodels
│   └─ ./references/regression-modeling.md
├─ survey → svy
│   └─ ./references/survey-spatial-ml.md
├─ sf / terra → geopandas
│   └─ ./references/survey-spatial-ml.md
├─ tidymodels / caret → scikit-learn
│   └─ ./references/survey-spatial-ml.md
├─ marginaleffects → marginaleffects (Python)
│   └─ ./references/regression-modeling.md
├─ rdrobust / did / synthdid → rdrobust / pyfixest DiD
│   └─ ./references/causal-inference.md
└─ Quarto / RMarkdown → marimo
    └─ ./references/workflow-environment.md

Package Mapping Overview

Python Package	R Equivalent	Fidelity	Key Difference
polars	dplyr + tidyr + data.table	Low	Expression system vs verb grammar; method chaining vs pipe
pyfixest	fixest	High	Near-identical formula syntax; minor SE default differences
plotnine	ggplot2	High	Same grammar of graphics; Python string quoting for aes
plotly	plotly (R)	High	`px.scatter()` vs `plot_ly()`; similar output
statsmodels	base R stats + lmtest + sandwich	Medium	Three formula dialects; manual vcov specification
linearmodels	plm + lme4 + estimatr	Medium	Requires pandas MultiIndex for panel structure
scikit-learn	tidymodels / caret	Medium	Imperative fit/predict vs declarative recipe pipeline
geopandas	sf + terra	Medium	shapely geometries vs sfc; different CRS handling
svy	survey (Lumley)	Medium	Limited GLM family coverage (gaussian/binomial/Poisson only)
marimo	Quarto / RMarkdown	Medium	Reactive cells vs knit-based linear execution

Fidelity key: High = near-direct translation, same mental model. Medium = same capability, different API patterns. Low = fundamentally different paradigm requiring conceptual remapping.

Library Versions

Translations in this skill reference specific library versions. Python versions are pinned in DAAF's Docker environment (Python 3.12). R versions reference CRAN releases as of March 2026. When syntax or behavior has changed between versions, the reference files note the change.

Python Package	DAAF Version	R Equivalent	R Version (CRAN)
polars	1.38.1	dplyr + tidyr + data.table	dplyr 1.2.0, tidyr 1.3.2, data.table 1.18.2
pyfixest	0.40.0	fixest	0.14.0
plotnine	0.15.3	ggplot2	4.0.2
plotly	6.5.2	plotly (R)	4.12.0
statsmodels	0.14.6	base R stats + lmtest + sandwich	lmtest 0.9-40, sandwich 3.1-1
linearmodels	unpinned	plm + lme4 + estimatr	plm 2.6-7, lme4 2.0-1
scikit-learn	1.8.0	tidymodels / caret	tidymodels 1.4.1, caret 7.0-1
geopandas	1.1.3	sf + terra	sf 1.1-0, terra 1.9-11
svy	0.13.0	survey	survey 4.5
marginaleffects	unpinned	marginaleffects (R)	0.32.0
rdrobust	unpinned	rdrobust (R)	3.0.0
marimo	0.19.11	Quarto / RMarkdown	Quarto 1.6.x

Unpinned packages: linearmodels, marginaleffects, and rdrobust install the latest version at Docker build time. Translations for these packages reference their documented API as of March 2026.

R version note: R package versions are from CRAN as of March 2026 (R 4.5.3). Check packageVersion("pkg") in your R installation to verify your local version matches.

Top 10 Paradigm Differences

These are the friction points R users encounter most frequently when reading or writing DAAF Python code. Each is covered in depth in the referenced file.

#	Friction Point	R Way	Python Way	Reference
1	Expression system	`df %>% mutate(x = a + b)`	`df.with_columns((pl.col("a") + pl.col("b")).alias("x"))`	`paradigm-differences.md`
2	Formula fragmentation	One universal `~` syntax	Three dialects (pyfixest, statsmodels, linearmodels)	`regression-modeling.md`
3	Missing values	Single `NA` type	`None`, `NaN`, and `null` (context-dependent)	`paradigm-differences.md`
4	mutate equivalent	`mutate(new = expr)`	`with_columns(expr.alias("new"))`	`polars-dplyr.md`
5	No row index	Tibbles have row numbers	Polars has no row index; use `with_row_index()`	`paradigm-differences.md`
6	Polars-to-pandas bridge	Data frames go directly into models	Must call `.to_pandas()` before statsmodels/pyfixest	`paradigm-differences.md`
7	Factor vs Categorical	`factor()` with ordered levels	`pl.Categorical` / `pd.Categorical` (different semantics)	`gotchas.md`
8	Package fragmentation	One package per domain (fixest does it all)	Multiple packages per domain (statsmodels + linearmodels + pyfixest)	`paradigm-differences.md`
9	1-indexed vs 0-indexed	`x[1]` is first element	`x[0]` is first element	`gotchas.md`
10	Namespace model	`library()` exports all names	`import` requires explicit namespacing	`gotchas.md`

Agent Code Annotation Protocol

This section defines when and how code-producing agents add inline R-equivalent comments to DAAF Python scripts.

When to Annotate

Annotations are added only when the orchestrator explicitly passes an R-background directive to the agent. This is not a default behavior.

Trigger conditions (orchestrator activates this when any apply):

User states they have an R / RStudio background
User requests R-equivalent comments in code
User asks to understand Python code from an R perspective

How the orchestrator passes the directive: The orchestrator adds the following to the agent prompt:

"User has R background. Load r-python-translation skill. Add inline R-equivalent comments for non-trivial data operations."

Comment Format

# R: df %>% filter(year == 2020)
filtered = df.filter(pl.col("year") == 2020)

# R: df %>% mutate(pct = count / sum(count))
result = df.with_columns(
    (pl.col("count") / pl.col("count").sum()).alias("pct")
)

# R: feols(y ~ x1 + x2 | state + year, data = df, cluster = ~state)
fit = pf.feols("y ~ x1 + x2 | state + year", data=pdf, vcov={"CRV1": "state"})

What to Annotate

Annotate: Data wrangling (polars operations), modeling calls (pyfixest, statsmodels, linearmodels), visualization layer construction (plotnine, plotly), causal inference method calls
Do NOT annotate: Import statements, print()/assert validation lines, file I/O boilerplate (pl.read_parquet, df.write_parquet), config sections, section separator comments

Rules

One # R: comment per logical operation, placed on the line immediately above the Python code
Keep annotations to a single line; abbreviate complex R pipelines if needed
R annotations are in addition to standard IAT comments (# INTENT:, # REASONING:, # ASSUMES:), not a replacement
Consumer agents: research-executor, code-reviewer, debugger, data-ingest

Related Skills

Skill	Relationship
`polars`	Python-side data wrangling — detailed API reference for the dplyr/tidyr equivalent
`pyfixest`	Python-side fixed effects regression — detailed API for the fixest equivalent
`plotnine`	Python-side static visualization — detailed API for the ggplot2 equivalent
`plotly`	Python-side interactive visualization — detailed API for plotly R equivalent
`statsmodels`	Python-side general modeling — covers base R stats, lmtest, sandwich equivalents
`linearmodels`	Python-side panel/IV models — covers plm, lme4, estimatr equivalents
`scikit-learn`	Python-side ML — covers tidymodels/caret equivalents
`geopandas`	Python-side spatial data — covers sf/terra equivalents
`svy`	Python-side survey analysis — covers survey (Lumley) equivalents
`marimo`	Python-side notebooks — covers Quarto/RMarkdown workflow equivalents
`stata-python-translation`	Parallel skill for Stata-background users — shares the same Python target stack

Note: Individual tool skills contain library-specific usage guidance (syntax, gotchas, performance). This skill provides the R-to-Python conceptual bridge — use both together when an R-background user is working with a specific library.

Topic Index

Topic	Reference File
Pipe operator (`%>%` / `	>`) equivalents
Expression system (pl.col, .alias)	`./references/paradigm-differences.md`
Missing value semantics (NA vs None/NaN/null)	`./references/paradigm-differences.md`
Type system differences	`./references/paradigm-differences.md`
Package/namespace model	`./references/paradigm-differences.md`
0-indexing vs 1-indexing	`./references/paradigm-differences.md`
Polars-to-pandas conversion for modeling	`./references/paradigm-differences.md`
Row index differences	`./references/paradigm-differences.md`
dplyr verb mapping (filter, select, mutate, arrange)	`./references/polars-dplyr.md`
summarise / group_by equivalents	`./references/polars-dplyr.md`
tidyr verbs (pivot_longer, pivot_wider, separate, unite)	`./references/polars-dplyr.md`
Join operations (left_join, inner_join, anti_join)	`./references/polars-dplyr.md`
String operations (stringr vs polars .str)	`./references/polars-strings-dates-factors.md`
Date operations (lubridate vs polars .dt)	`./references/polars-strings-dates-factors.md`
across() / where() equivalents	`./references/polars-dplyr.md`
case_when equivalent	`./references/polars-dplyr.md`
readr I/O equivalents	`./references/polars-dplyr.md`
fixest formula syntax in pyfixest	`./references/regression-modeling.md`
lm() / glm() in statsmodels	`./references/regression-modeling.md`
Formula interface comparison (three Python dialects)	`./references/regression-modeling.md`
Standard error specification differences	`./references/regression-modeling.md`
plm panel models in linearmodels	`./references/regression-modeling.md`
lme4 mixed effects equivalents	`./references/regression-modeling.md`
marginaleffects (R to Python)	`./references/regression-modeling.md`
Model summary / tidy output	`./references/regression-modeling.md`
Sandwich / robust SE equivalents	`./references/regression-modeling.md`
ggplot2 layer mapping to plotnine	`./references/visualization.md`
aes() string quoting in plotnine	`./references/visualization.md`
Theme customization	`./references/visualization.md`
Scale functions	`./references/visualization.md`
Faceting (facet_wrap, facet_grid)	`./references/visualization.md`
plotly R vs plotly Python	`./references/visualization.md`
ggsave equivalent	`./references/visualization.md`
Difference-in-differences (did, did2s)	`./references/causal-inference.md`
Regression discontinuity (rdrobust)	`./references/causal-inference.md`
Instrumental variables (ivreg vs pyfixest IV)	`./references/causal-inference.md`
Event study designs	`./references/causal-inference.md`
Synthetic control	`./references/causal-inference.md`
Matching / propensity scores	`./references/causal-inference.md`
survey package to svy	`./references/survey-spatial-ml.md`
svydesign / svymean / svyglm equivalents	`./references/survey-spatial-ml.md`
sf spatial operations to geopandas	`./references/survey-spatial-ml.md`
CRS / projection handling	`./references/survey-spatial-ml.md`
Spatial joins (st_join vs sjoin)	`./references/survey-spatial-ml.md`
tidymodels pipeline to scikit-learn	`./references/survey-spatial-ml.md`
RStudio vs DAAF workflow	`./references/workflow-environment.md`
Quarto / RMarkdown vs marimo	`./references/workflow-environment.md`
Interactive console vs file-first execution	`./references/workflow-environment.md`
Package management (renv vs pip/uv)	`./references/workflow-environment.md`
Project structure conventions	`./references/workflow-environment.md`
Curated R-to-Python migration guides	`./references/external-resources.md`
Package documentation links	`./references/external-resources.md`
Tutorial recommendations with provenance	`./references/external-resources.md`
1-indexed list/vector access	`./references/gotchas.md`
Factor vs Categorical pitfalls	`./references/gotchas.md`
library() vs import habits	`./references/gotchas.md`
T/F vs True/False	`./references/gotchas.md`
Assignment operator (<- vs =)	`./references/gotchas.md`
Vectorized operations expectations	`./references/gotchas.md`
NULL vs None differences	`./references/gotchas.md`
apply family vs map/list comprehension	`./references/gotchas.md`
Copying semantics (R copy-on-modify vs Python references)	`./references/gotchas.md`
Logical operators (& /	vs and / or)
String interpolation (glue vs f-strings)	`./references/gotchas.md`
data.table vs polars	`./references/polars-strings-dates-factors.md`
Lazy evaluation (polars LazyFrame vs R lazy tibble)	`./references/polars-dplyr.md`
nest/unnest equivalents	`./references/polars-dplyr.md`
Window functions (over vs mutate + group_by)	`./references/polars-dplyr.md`
Coordinate systems (coord_flip, coord_polar)	`./references/visualization.md`
Stat layers (stat_smooth, stat_summary)	`./references/visualization.md`
Color palette mapping (viridis, brewer)	`./references/visualization.md`
Multi-panel layouts (patchwork vs subplot)	`./references/visualization.md`
Staggered DiD estimators	`./references/causal-inference.md`
Parallel trends testing	`./references/causal-inference.md`
BRR / jackknife replication weights	`./references/survey-spatial-ml.md`
Raster data handling (terra vs rasterio)	`./references/survey-spatial-ml.md`
Feature engineering (recipes vs sklearn Pipeline)	`./references/survey-spatial-ml.md`
Cross-validation (rsample vs sklearn)	`./references/survey-spatial-ml.md`
Environment/workspace differences (.RData vs nothing)	`./references/workflow-environment.md`
Debugging workflow (browser() vs breakpoint())	`./references/workflow-environment.md`
R help system (?func) vs Python help(func)	`./references/workflow-environment.md`
Cheat sheet and quick-reference links	`./references/external-resources.md`
Community resources (Stack Overflow tags, forums)	`./references/external-resources.md`

ナビゲーション

Skillsとは？

リンク

r-python-translation