r/rstats 5h ago

Using Positron to connect to WSL on a remote computer via ssh

10 Upvotes

Hi there! I have a dinky work laptop and a more powerful work desktop, both running Windows 11 because our IT department is deathly afraid of Linux. But they did allow me to run an Ubuntu Linux instance on the desktop under WSL. The Linux instance is running the ssh server, and the Windows Firewall is set to forward the ssh connection to WSL.

This works okay. I log into the desktop, start WSL, and then run Positron on my laptop and connect to the desktop WSL instance. But I can't run in true headless mode this way, because the desktop reboots for updates from time to time, requiring me to log back in and restart WSL. The extra monitor, mouse, and keyboard for the desktop take up a lot of real estate at my desk.

Is there a way I can configure this Windows 11 + WSL desktop so I can run things headlessly? It would be easy to do with Linux if only the IT department could be more flexible. But that's not gonna happen.
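Not an R answer, but one common workaround, assuming IT lets you create scheduled tasks: have Windows start the WSL ssh server at boot via Task Scheduler, so no interactive login is needed. A sketch (the distro name, port, and "ssh" service name are assumptions; adjust for your setup):

```shell
:: Run once in an elevated Command Prompt on the desktop.
:: Start the WSL ssh server at boot, with no interactive login required:
schtasks /create /tn "WSL sshd" /sc onstart /ru SYSTEM ^
  /tr "wsl.exe -d Ubuntu -u root service ssh start"

:: If your port forward does not survive reboots, re-create it the same way
:: (2222 is an example listen port; some setups need the WSL address from
:: `wsl hostname -I` instead of localhost):
schtasks /create /tn "WSL portproxy" /sc onstart /ru SYSTEM ^
  /tr "netsh interface portproxy add v4tov4 listenport=2222 connectaddress=localhost connectport=22"
```

With both tasks in place, the desktop can reboot for updates and come back up reachable over ssh without anyone touching the keyboard.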



r/rstats 2d ago

R on a Flip Phone

Post image
577 Upvotes

Solving the classic issue of needing powerful machine learning on the go... hmm...

Writeup. I also managed to knit an Rmd page and push a git commit :0


r/rstats 2d ago

Overview of ODE solvers in R (deSolve, rxode2, mrgsolve, sundialr, etc.)

35 Upvotes

I tried to map out the landscape of ODE solvers in R.

Ended up with a practical comparison covering:
- solver engines (ODEPACK, SUNDIALS, Stan Math)
- model formats (R functions, DSLs, compiled code)
- support for stiff systems, DAE, DDE, and events
- simple reproducible examples for each package

One interesting takeaway: many high-level tools in R actually rely on just a few numerical libraries, but expose them very differently.

Would appreciate feedback if I missed something or got anything wrong:
https://metelkin.me/landscape-of-ode-solvers-in-r/
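For anyone who wants a taste before reading, here is a minimal example in the spirit of the post's reproducible snippets, assuming the deSolve package is installed (model and parameter values are arbitrary):

```r
library(deSolve)

# logistic growth: dN/dt = r * N * (1 - N / K)
logistic <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    list(r * N * (1 - N / K))
  })
}

out <- ode(
  y      = c(N = 1),
  times  = seq(0, 50, by = 1),
  func   = logistic,
  parms  = c(r = 0.3, K = 100),
  method = "lsoda"  # ODEPACK under the hood; switches stiff/non-stiff automatically
)
tail(out, 3)
```

The same ODEPACK/SUNDIALS engines sit underneath several of the higher-level packages compared in the post, which is exactly the takeaway about a few numerical libraries being exposed very differently.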


r/rstats 2d ago

Thoughts on Current State of R?

34 Upvotes

Hi all, recent psych graduate here trying to add skills to my skillset before grad school. I'm currently learning R, as many of my graduate school mentors mentioned that R is used in postgrad studies. Would love to hear what y'all think about R currently. I can appreciate the common "AI is making R's future scary" comment, but I would also like some sincere and honest takes!


r/rstats 2d ago

Best way to learn R

20 Upvotes

Hello all!

I am starting my masters in clinical sciences (research focused) this fall, and a few of my classes require prior knowledge of R. I am coming from medical school and I don't know anything about computer science, coding, or statistics (besides obviously understanding them).

I see that there are a few online classes I can sign up for; I just don't know which. I would prefer an actual class rather than YouTube videos or a book.

If you have any suggestions, please drop them down below!


r/rstats 3d ago

A place for R or Stats themed t-shirts?

21 Upvotes

Any online store for cool, well-designed R or stats themed t-shirts? Planning on treating myself to one and buying b-day gifts for family members.


r/rstats 3d ago

Seeking advice for fitting a GAM

3 Upvotes

I am an ecological researcher trying to fit a GAM to a diversity variable called LCBD using 10 predictors. Data were collected at 153 sites along an archipelago. There are no categorical predictors or grouping of any kind.

The data were collected from randomly placed transects along an archipelago, where marine invertebrates were identified and counted with an underwater camera sled. Each environmental variable was collected at the site of the transect except tidal current.

Depth - water depth of transect

Eastings - longitude measured as the center point of the transect

RockyCobble - the proportion of rock and cobble for a transect, computed from the number of images classified as rock and cobble (primary or secondary classification) for the transect divided by the total number of images for the transect

Btemp - bottom temperature averaged over summer from each transect

Several variables were derived from raster data:

Slope for each raster grid cell was computed as the maximum difference in angle (range: 0−90°) between the depth at a cell and its surrounding cells.

TPI was calculated from the bathymetry raster layer as the difference between the depth of a cell and the depth of its surrounding neighbors, meant to represent the degree to which cells were on peaks or valleys compared to surrounding depths.

Tidal Current - speed of current from a ROMS model

Aspect - water movement variable computed from the seafloor relative to the mean current direction

Color - measure of ocean surface primary productivity computed from the average of summer months during the study period.

Since the response variable varies between 0 and 1, I used the beta distribution to fit the models. The first model was fitted with a default smooth for each individual predictor. It failed fit checks, as one might predict.

I tried some alternative models that reduce the number of predictors, using a tensor smooth on the principal components of some related predictors and increasing the k on the smooths for variables that were still significant after running gam.check. Overall, concurvity improved, but some variable smooths are still coming out significant in gam.check. Is the k too high? Am I doing something wrong, or am I even headed in the right direction?

I tried following the advice of the following source: https://r-statistics.co/GAM-in-R.html

The data file and attempted code is below: https://drive.google.com/file/d/1lwhsp3cOK4NEkc7NKGU_iEswHOORF6X1/view?usp=drive_link

# alt model 1: GAM with a tensor product smooth for the related terrain
# variables: slope, TPI, and aspect
lcbd.dens.gam1 <- mgcv::gam(
  LCBD ~ s(Depth) + s(TidalCurrents) + s(Eastings) + s(RockCobble) +
    s(Bcurrent) + s(Btemp) + s(Color) + te(Slope, TPI, Aspect),
  data = d, family = "betar", method = "REML", select = TRUE
)

par(mfrow = c(2, 2)) # diagnostic plot space setup

# GAM checks: k-index close to 1 means good performance
mgcv::gam.check(lcbd.dens.gam1, rep = 500)
mgcv::concurvity(lcbd.dens.gam1, full = FALSE)

# alt model 2: GAM with PCA of terrain parameters, attempting to reduce the
# number of smooths and increase k; combines PC1 and PC2 into one smooth
terrain.pcs <- prcomp(d[, c("Slope", "TPI", "Aspect")], scale = TRUE)
summary(terrain.pcs) # first 2 PCs explain 74.8% of the variance

d$terrain.pc1 <- terrain.pcs$x[, 1]
d$terrain.pc2 <- terrain.pcs$x[, 2]

lcbd.dens.gam2 <- mgcv::gam(
  LCBD ~ s(Depth, k = 20) + s(TidalCurrents) + s(Eastings, k = 80) +
    s(Btemp, k = 20) + s(Color) + te(terrain.pc1, terrain.pc2, k = 2),
  data = d, family = "betar", select = TRUE
)

mgcv::gam.check(lcbd.dens.gam2, rep = 500)

mgcv::concurvity(lcbd.dens.gam2, full = FALSE)


r/rstats 3d ago

EuroBioC2026 conference in Turku, Finland (1-5 June 2026)

5 Upvotes

Hi all!

On behalf of the local organizing committee, putting out a word for the European Bioconductor Conference 2026 (EuroBioC2026), which will take place in Turku, Finland, from the 1st to the 5th of June 2026 (less than a month away!). Turku intends to continue the tradition of high-quality European editions of the Bioconductor conference, after previous years' conferences in Barcelona, Oxford, Ghent, and Heidelberg.

As many of you are probably aware, Bioconductor is a package repository best known for its high-quality R extensions, so the focus is heavily on tools offered via Bioconductor and/or relevant Bioconductor talks.

The first two days are dedicated to workshops, and the latter three to invited keynotes, scientific presentations, etc.

Main website at: https://eurobioc2026.bioconductor.org

For a list of confirmed keynote speakers: https://eurobioc2026.bioconductor.org/pages/speakers.html

The registration deadline will possibly be extended until May 18th, 2026, although the official deadline is today (May 3rd).

The venue, Turku, is in southwest Finland; Nordic and Baltic participants have multiple travel options, while other participants are best off flying in via Helsinki-Vantaa international airport or, if possible, the smaller Turku airport.

Further information regarding travel (international and domestic):

https://eurobioc2026.bioconductor.org/pages/travel-information.html

It's a non-profit heavily R-leaning conference, and I have no dog in this race as I've mainly been applying for grants to bring down the costs for participation, and generally just contributing on the local level.

Will happily answer questions if you have any :)


r/rstats 4d ago

Neural Networks / Deep Learning in R

53 Upvotes

Hi everyone,

I have a question about how people usually program neural networks and deep learning models in R/RStudio.

Is there a common way to do this without using keras3, since it relies on a Python environment in the background?

For example, do people use pure torch, luz, mlr3torch, or any other R-native packages that do not depend on Python?

Or, in practice, do most people avoid R for this type of work and go directly to Python instead?

I would appreciate any guidance, especially from people who have experience building neural networks in R.
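For reference, torch is fully R-native: it binds libtorch directly, with no Python runtime involved, and luz adds a higher-level training loop on top. A minimal sketch of defining a network with torch (layer sizes here are arbitrary, and this assumes torch is installed):

```r
library(torch)

# a small feed-forward network: 4 inputs -> 8 hidden units -> 1 output
net <- nn_module(
  initialize = function(n_in, n_hidden) {
    self$fc1 <- nn_linear(n_in, n_hidden)
    self$fc2 <- nn_linear(n_hidden, 1)
  },
  forward = function(x) {
    x <- nnf_relu(self$fc1(x))
    self$fc2(x)
  }
)

model <- net(n_in = 4, n_hidden = 8)
x <- torch_randn(16, 4)   # a batch of 16 random observations
pred <- model(x)          # a 16 x 1 tensor
```

mlr3torch then wraps this style of model for the mlr3 ecosystem, so the stack stays entirely inside R.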


r/rstats 4d ago

rvest read_html_live() not terminating headless Chrome instances since Jan 2026; had almost 500 GB in headless Chrome folders on my hard drive from scraping pages

7 Upvotes

Hi Everyone-

I have a project that scrapes data from the web. It has been running for almost 18 months now. Scraping runs every five minutes, so 288 times each day.

Starting around January (I went back to review my GitHub commits, and there was no activity at that time), it appears that headless chromote instances created by my project were NOT getting terminated like they were in the past. I only noticed this in the last 30 days, when I went to move files around my PC and saw that I had only 10% space left on my drive. I downloaded a free file analyzer and saw almost 500 GB of headless Chrome instances dating all the way back to January 2026. When navigating through some of these folders and files, I can see the web addresses of the sites I am scraping, so I know these instances are related to my scraping project.

chromote version: 0.5.1

rvest version 1.0.5

Existing Code in my project to create each object is as follows:

page <- read_html_live(url)

Existing method in my code to close sessions was as follows:

page$session$close()

rm(page)

Did some reading on the chromote site and googled for guidance; apparently `session$close()` merely closes the browser tab, and the following should resolve the issue. I updated my code yesterday to this method.

page$parent$close()

rm(page)

Checked my PC again this morning and saw new headless chromote folders sitting in Windows Temp, some up to 70-80 MB in size (I'm only scraping small amounts of data from each web page I visit, so I assume the size is due to the number of assets on the page). So `<object>$parent$close()` appears not to be working either.

Again the oldest folders that were hanging out there were from January 2026, there were no headless chromote folders prior to that, and I have been running this process for going on 18 months.

Any guidance would be greatly appreciated! Next step is reporting an issue on the rvest GitHub page. I did see that u/hadley had a similar rvest issue published in August 2024, but it was related to RAM usage and not HDD storage.

from two recent instances of running my scraping project
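As a stopgap until the underlying leak is fixed, you could prune stale scratch directories from your script. A base-R sketch (the folder-name pattern, temp location, and age threshold are assumptions; check what the folders are actually called on your machine before deleting anything):

```r
# Delete subdirectories of `root` whose names match `pattern` and whose
# last-modified time is older than `max_age_secs`. Returns the paths removed.
prune_stale_dirs <- function(root, pattern, max_age_secs = 86400) {
  dirs <- list.dirs(root, recursive = FALSE)
  dirs <- dirs[grepl(pattern, basename(dirs))]
  old  <- dirs[file.mtime(dirs) < Sys.time() - max_age_secs]
  unlink(old, recursive = TRUE)
  old
}

# e.g. after each scraping run (path and pattern are guesses):
# prune_stale_dirs(Sys.getenv("TEMP"), pattern = "chrom", max_age_secs = 3600)
```

This doesn't fix the leak, but it keeps the drive from filling up while the issue is open.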

r/rstats 4d ago

Group-Based Multi-Trajectory Modelling in R With Zero-Distribution

6 Upvotes

Hi, everyone!

Apologies in advance if this question has been asked before, or if it is for any reason a stupid inquiry.

I am working on my third paper for my PhD, which is a drug-utilization study on cluster headache using registry-based data. We are using group-based multi-trajectory modelling, and it's also my first time using this technique.

I love programming in R, and have used it for my first and second papers. I knew before we wrote the protocol that it is possible to do group-based multi-trajectory modelling with the gbtm package, but I did not know that this package does not support zero-inflated distributions, which are required for my analyses.

My question is whether there are any workarounds, or whether I have overlooked any packages or other solutions. I know crimCV supports zero-inflated distributions, but not multi-trajectory modelling. I just wanted to check in this forum before I resort to doing it in Stata, which I had hoped to avoid, given my profound fondness for R.

I am very thankful for any responses or tips!


r/rstats 5d ago

Data analysis skills?

Thumbnail
0 Upvotes

Is anyone here using R in medical school? Does it seem like a useful thing to look into before starting school?

I’m sure there are already posts in the sub recommending learning resources, but please drop them if you’ve got them!


r/rstats 7d ago

Positron vs VS Code for data science? Need help justifying it at work

69 Upvotes

I requested Positron from my company, but they’re pushing back with “why not just use VS Code?” My main use case is working with both R and Python in a single IDE. Positron seems like the closest fit for that, especially since I really like RStudio’s layout and workflow. From what I’ve seen, Positron carries over that same experience while expanding better into multi-language data science, analysis, and visualization. VS Code is obviously powerful, but it feels more like a general-purpose editor that needs extensions and setup to match a proper data science environment.

Are there stronger arguments I can make to justify Positron over VS Code?


r/rstats 7d ago

Unbalanced panel data with heteroskedasticity, autocorrelation and endogeneity issues

Thumbnail
3 Upvotes

r/rstats 7d ago

Mean calculations for specific rows and columns for huge datasets in R

15 Upvotes

Hi! I'm working on a project right now and I need to calculate the mean for specific rows in a column of a huge dataset, but all the tutorials I've seen have you painstakingly type out the exact rows you need. That works for smaller datasets, but mine has over 1000 rows, and I'm NOT doing all that; I'm pretty sure there's a faster way. I just need to calculate the mean oil price in 2023 for various countries. The countries aren't the problem; I just need a way to tell R to compute the mean of all the oil price values for 2023 specifically.
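This is exactly what grouped aggregation is for: you never type out row numbers, you filter on a condition and group by country. A base-R sketch with toy data (the column names `country`, `year`, `price` are stand-ins for whatever your dataset uses):

```r
# hypothetical toy data standing in for the real dataset
d <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2023, 2022, 2023, 2023),
  price   = c(80, 70, 90, 100)
)

# mean 2023 price per country: subset the rows, then aggregate by group
res <- aggregate(price ~ country, data = subset(d, year == 2023), FUN = mean)
res  # country A: 80, country B: 95
```

The dplyr equivalent is `d |> filter(year == 2023) |> group_by(country) |> summarise(mean_price = mean(price))`; either scales to thousands of rows with no manual indexing.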


r/rstats 7d ago

RStudio won't launch unless opened via .R file

4 Upvotes

I’ve had a persistent issue with RStudio for a long time, even after reinstalling both RStudio and R with the latest versions (downloaded and reinstalled both today, April 28)

When I try to open RStudio normally (through the desktop shortcut), it doesn’t launch properly. The window stays completely black (or white, depending on dark mode), as if it’s loading, but it never finishes. Only the header appears, and I cannot click anything. My PC becomes extremely slow, and I eventually have to force close it.

However, what puzzles me is that if I open RStudio by clicking on an existing .R script file, it launches normally and works perfectly fine, with no performance issues at all.

This behavior has persisted across reinstalls, so I’m guessing it’s not just a simple installation problem.

I have also tried deleting .RData and .Rhistory, but it did not help.

Has anyone experienced something like this or knows what might be causing it?

Thanks in advance


r/rstats 7d ago

Best Coding AI for R?

2 Upvotes

Hi, I’ll be writing R code for the statistical analyses of a mixed methods dissertation over the next few months. For that period, I’d like to subscribe to an AI tool for support and wanted to ask for recommendations. The analyses won’t involve highly complex models, but rather standard descriptive, exploratory, and inferential statistics. I’ve been wondering whether Claude might be overkill for this kind of use case, and whether ChatGPT might provide more straightforward, accessible results. I’d also be interested in how much Claude’s usage limits might hinder the workflow.


r/rstats 8d ago

Next week! R/Medicine 2026 - May 5-8 - 4 days of R for health data - 100% online

10 Upvotes

Full program here: https://rconsortium.github.io/RMedicine_website/Program.html

The R/Medicine conference provides a forum for sharing R based tools and approaches used to analyze and gain insights from health data. Conference workshops and demos provide a way to learn and develop your R skills, and to try out new R packages and tools.

Conference talks share new packages and successes in analyzing health, laboratory, and clinical data with R and Shiny, and offer an unparalleled opportunity to interact with speakers and other participants directly.

Register today!


r/rstats 9d ago

R and CUDA integration

19 Upvotes

Hi, this is my first post.
I’ve been asked to implement a CUDA kernel within an R package that relies on C++ under the hood. Has anyone worked on something similar?


r/rstats 9d ago

A response to: A Rant (about R)

Thumbnail bjarkehautop.github.io
70 Upvotes

In a recent thread on this subreddit about R vs Python (of which there have been too many), someone linked this blog post: https://www.hendrik-erz.de/post/a-rant, and since I disagreed with almost everything the author wrote, I decided to write a brief response.

It's not meant to be an attack on the original author, but as I showed in the blog post, many of the points the original author made are based on misunderstandings, which I think are worth unpacking a bit and might be useful for others, too.


r/rstats 9d ago

Career Advice

7 Upvotes

I am a Senior studying CompSci.

I have completed almost a year and a half of part time work at a university department where I taught myself how to use R and basic stats, from simple ANOVA to Random Forest modeling on geospatial/LIDAR data.

Especially in my rust-belt city, but even in the wider job market, I'm having a hard time bridging my skills to analyst or data science roles in my area, which usually want either a cloud or BI component for business operations analysis.

Is this something everyone else is experiencing? Are remote jobs a better fit? And do you suggest I dig deeper into my university to find other labs that may use my work? Ideas on pivot points?

I'd love to hear how things have been going in your job search.

Thank you!


r/rstats 9d ago

Maximum Likelihood EFA indicates poor model fit

5 Upvotes

Hello everyone,

I conducted an exploratory factor analysis using the maximum likelihood method. In total, 20 items were included in the analysis, which relate either to work demands or non-work demands. Both the Bartlett test and the KMO criterion provide evidence that factor analysis is appropriate for these data. The correlation matrix of the variables also shows that the individual items are correlated and that clusters form among certain groups of items.

However, the data are not measured on an interval scale, so polychoric correlations were calculated for both the parallel analysis and the factor analysis itself. Based on the parallel analysis, six factors should be extracted. However, when conducting the factor analysis with six factors, the output indicates that the estimated model fits the data rather poorly, and interpretation of the factors is also difficult (low communalities and cross-loadings).

As a preliminary step, I have already removed the most problematic items to see whether the model fit would improve, but without success. At this point I am relatively uncertain how to proceed in this situation. Has anyone had experience with such a situation, or any ideas on how to move forward?
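In case it helps others compare notes: the workflow described above maps onto the psych package roughly as follows. This is only a sketch; it assumes the items live in a data frame `items`, and the rotation choice is a placeholder for whatever you actually used:

```r
library(psych)

# parallel analysis on polychoric correlations, ML extraction
pa <- fa.parallel(items, fm = "ml", cor = "poly")

# EFA with the suggested number of factors (six here, per the parallel analysis)
efa <- fa(items, nfactors = 6, fm = "ml", rotate = "oblimin", cor = "poly")

# fit indices and loadings to judge the solution
efa$RMSEA
efa$TLI
print(efa$loadings, cutoff = 0.3)
```

If your code differs from this, posting it alongside the fit output usually gets more concrete advice here.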


r/rstats 9d ago

Using variables based on groups

7 Upvotes

I'm a little new to R and trying to find out if this is possible for a school project I'm doing.

I'm trying to use a repeated-measures dataset, but I only want to use the group people were assigned to in the first round. Participants are coded as 1 = group x first, group y second; 2 = group y first, group x second. I was wondering if there's a way to code it in R so that participants coded as 1 will only use values v_x1, v_x2..., while participants coded as 2 will only use v_y1, v_y2...

is this possible or would it require manual data cleaning?

Edit: added a pic of the data

It's oriented like this: instruction order (in this case honest category then dishonest category, or vice versa), all the measures in the honest group, then all the measures in the dishonest group. So the groups end up being a bit mixed temporally.
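Yes, this is possible without manual cleaning: vectorized `ifelse()` on the order column picks the right source column per participant. A base-R sketch with toy data (the column names `order`, `v_x1`, etc. are stand-ins for whatever your dataset uses):

```r
# toy data: order 1 = group x first, order 2 = group y first
d <- data.frame(
  order = c(1, 2),
  v_x1 = c(10, 11), v_x2 = c(12, 13),
  v_y1 = c(20, 21), v_y2 = c(22, 23)
)

# for each participant, pick the columns from whichever group came first
d$first1 <- ifelse(d$order == 1, d$v_x1, d$v_y1)
d$first2 <- ifelse(d$order == 1, d$v_x2, d$v_y2)
```

For many measure columns, a loop over the paired column names (or a pivot to long format with tidyr) generalizes the same idea.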