Recent Activity
How to get the posterior parameters used in each imputation with the monotone regression method?

I wonder how to get the parameters used in each imputation when we apply the monotone regression method in a multiple imputation procedure. Let us use the following example dataset:

data Fish1;
title 'Fish Measurement Data';
input Length1 Length2 Length3 @@;
datalines;
23.2 25.4 30.0 24.0 26.3 31.2 23.9 26.5 31.1
26.3 29.0 33.5 26.5 29.0 . 26.8 29.7 34.7
26.8 . . 27.6 30.0 35.0 27.6 30.0 35.1
28.5 30.7 36.2 28.4 31.0 36.2 28.7 . .
29.1 31.5 . 29.5 32.0 37.3 29.4 32.0 37.2
29.4 32.0 37.2 30.4 33.0 38.3 30.4 33.0 38.5
30.9 33.5 38.6 31.0 33.5 38.7 31.3 34.0 39.5
31.4 34.0 39.2 31.5 34.5 . 31.8 35.0 40.6
31.9 35.0 40.5 31.8 35.0 40.9 32.0 35.0 40.6
32.7 36.0 41.5 32.8 36.0 41.6 33.5 37.0 42.6
35.0 38.5 44.1 35.0 38.5 44.0 36.2 39.5 45.3
37.4 41.0 45.9 38.0 41.0 46.5
;
run;

Then we apply the monotone regression method provided by `proc mi`:

proc mi data=Fish1 nimpute=5 seed=769097 out=outex3;
monotone reg(Length2= Length1 / details);
var Length1 Length2;
ods output MonoReg=MonoReg;
run;

We can see that the estimated parameters used in the first imputation are -0.044703 (Intercept) and 0.982951 (Length1). According to the documentation, the regression method is based on Rubin's book (1987, Multiple Imputation for Nonresponse in Surveys, pp. 166–167), where the specific steps are demonstrated. I followed those steps with the code below, but I can't reproduce the parameters used in the imputation. I only ran through the steps once, so I got a single result: my beta_star values were -0.04867 (Intercept) and 0.9804691 (Length1). What's wrong with my code?

/* Extract records with missing Length2 */
data Length2_missing;
set Fish1;
if missing(Length2);
keep Length1;
run;
/* Extract observed records (non-missing Length2) for regression modeling */
data Length2_observed;
set Fish1;
where not missing(Length2);
keep Length1 Length2;
run;
/* Fit a regression model: Length2 ~ Length1 using GENMOD */
proc genmod data=Length2_observed;
model Length2 = Length1 / dist=normal link=identity; /* Normal distribution with identity link */
ods output ParameterEstimates=beta_est(where=(Parameter='Intercept' | Parameter='Length1'));
output out=residuals
pred=pred_y /* Predicted values */
resraw=raw_res /* Raw residuals */
reschi=pearson_res /* Pearson residuals */
resdev=deviance_res; /* Deviance residuals */
run;
/* Calculate residual variance (s^2) and degrees of freedom (df) */
proc sql noprint;
select count(*) into :n from Length2_observed; /* Total observed sample size (n) */
select &n - 2 into :df from Length2_observed; /* Degrees of freedom: df = n - p (p=2 parameters: intercept + Length1) */
quit;
/* Bayesian posterior sampling for β using IML (Matrix Language) */
proc iml;
/* Read regression coefficients (β_hat) */
use beta_est;
read all var {'Estimate'} into beta_hat; /* Extract β estimates */
close beta_est;
/* Read residuals to compute Sum of Squared Errors (SSE) */
use residuals;
read all var {'raw_res'} into RAW; /* Raw residuals */
RESIDUAL = RAW##2; /* Squared residuals */
SSE = (1/(&n - 2)) * sum(residual[, 1]); /* s^2 = SSE / df (note: despite the name, this variable holds s^2, not the raw SSE) */
close residuals;
/* Construct design matrix X (with intercept) */
use Length2_observed;
read all var {'Length1'} into X; /* Predictor variable */
close Length2_observed;
X = j(nrow(X), 1, 1) || X; /* Add intercept column (1s) */
/* Cholesky decomposition of covariance matrix for sampling */
U = root(inv(X` * X)); /* Cholesky root of (X'X)^(-1) */
/* Set random seed for reproducibility */
call randseed(769097); /* Initialize random number generator */
call streaminit(769097); /* Synchronize random stream */
/* Generate random normal vector Z ~ N(0, I) */
Z = randnormal(1, {0, 0}, {1 0, 0 1}); /* Z: 1x2 vector from standard normal */
/* Sample posterior sigma2 from inverse chi-square distribution */
sigma2 = SSE * &df / rand('chisq', &df); /* sigma2 ~ Inv-Chisq(df, SSE) */
/* Sample β_star from posterior: β_star = β_hat + σ * U * Z */
beta_star = beta_hat + sqrt(sigma2) * U * Z`; /* Posterior draw of coefficients */
print beta_star; /* Display sampled β_star */
/* Save sampled β_star to a dataset */
create beta_sample from beta_star[colname={'beta_star'}];
append from beta_star;
close beta_sample;
quit;
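For comparison, the two posterior-draw steps from Rubin (1987, pp. 166–167) — draw σ²* = s²·df/χ²_df, then β* = β̂ + σ*·Vz with V·V′ = (X′X)⁻¹ — can be sketched in plain numpy. The x/y values below are made-up toy data, not the Fish1 measurements:

```python
import numpy as np

rng = np.random.default_rng(769097)

# Toy observed data (hypothetical values, not the Fish1 dataset)
x = np.array([23.2, 24.0, 26.3, 26.8, 28.5, 29.1, 30.4, 31.3, 32.7, 35.0])
y = np.array([25.4, 26.3, 29.0, 29.7, 30.7, 31.5, 33.0, 34.0, 36.0, 38.5])

X = np.column_stack([np.ones_like(x), x])         # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
resid = y - X @ beta_hat
df = len(y) - X.shape[1]                          # n - p
s2 = resid @ resid / df                           # residual variance s^2

# Step 1: sigma^2* = s^2 * df / chi^2_df (scaled inverse chi-square draw)
sigma2_star = s2 * df / rng.chisquare(df)

# Step 2: beta* = beta_hat + sigma* V z, where V V' = (X'X)^{-1}
V = np.linalg.cholesky(np.linalg.inv(X.T @ X))    # lower-triangular factor
z = rng.standard_normal(2)
beta_star = beta_hat + np.sqrt(sigma2_star) * V @ z
print(beta_star)
```

One thing worth noting about the IML version above: `call randseed`/`randnormal` and `streaminit`/`rand('chisq', ...)` draw from two different random-number streams, so even with the same seed as PROC MI the sequence of draws can differ, which by itself could produce different beta_star values.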
Hi all, I'm working on a predictive modeling pipeline in SAS Viya using Open Source Code nodes (Python). The pipeline works well during training, but when I add a Scoring node, I get this:

ERROR: Scoring cannot be completed because new variables have been created by one or more Open Source node(s).

I understand this may be because new variables (like predictions) are created dynamically in the Python code, but I haven't found a clear way to properly register or declare them for scoring. I've had several long chats with ChatGPT with no luck so far, and I'm now in desperate need of help. I'm attaching the code below for context. Any help, advice, or working example would be sincerely appreciated! Thanks in advance 🙏

Training Code:

import pandas as pd
import numpy as np
import lightgbm as lgb
target_col = 'Stress_level'
target_months = [202308, 202309, 202409, 202410]
record = dm_inputdf.copy()
dm_interval_input = ["DATA_YM", "blood_pressure", "sleep_duration", "work_hours", "age"]
rec_intv = record[dm_interval_input].astype(np.float32)
rec_all = pd.concat([rec_intv.reset_index(drop=True)], axis=1)
record["target_group"] = np.where(record["DATA_YM"].isin(target_months), "tr1", "tr2")
tr1 = rec_all[record["target_group"] == "tr1"]
tr2 = rec_all[record["target_group"] == "tr2"]
y_tr1 = record.loc[record["target_group"] == "tr1", target_col].astype(np.float32)
y_tr2 = record.loc[record["target_group"] == "tr2", target_col].astype(np.float32)
model_tr1 = lgb.LGBMRegressor(
    n_estimators=3000,
    learning_rate=0.05,
    max_depth=12,
    num_leaves=13,
    force_row_wise=True
)
model_tr1.fit(tr1, y_tr1)
model_tr2 = lgb.LGBMRegressor(
    n_estimators=3000,
    learning_rate=0.05,
    max_depth=12,
    num_leaves=13,
    force_row_wise=True
)
model_tr2.fit(tr2, y_tr2)
pred_tr1 = model_tr1.predict(tr1)
pred_tr2 = model_tr2.predict(tr2)
record.loc[record["target_group"] == "tr1", "P_Stress_level"] = np.clip(pred_tr1, 0, None)
record.loc[record["target_group"] == "tr2", "P_Stress_level"] = np.clip(pred_tr2, 0, None)
dm_scoreddf = record.copy()
dm_scoreddf["P_Stress_level"] = dm_scoreddf["P_Stress_level"].astype(np.float64)
dm_scoreddf=dm_scoreddf[[
"Stress_level", "blood_pressure", "sleep_duration", "work_hours", "age", "P_Stress_level", "DATA_YM"]]
dm_scoreddf["P_Stress_level"].attrs.update({
"role": "PREDICTION",
"level": "INTERVAL",
"description": "LightGBM Predition"
}) Scoring Code : import pandas as pd
import numpy as np
def score_method(blood_pressure, sleep_duration, work_hours, age, DATA_YM):
    "Output: P_Stress_level"
    record = pd.DataFrame(
        [[blood_pressure, sleep_duration, work_hours, age, DATA_YM]],
        columns=['blood_pressure', 'sleep_duration', 'work_hours', 'age', 'DATA_YM'])
    dm_interval_input = [col for col in record.columns if col not in dm_class_input]
    rec_intv = record[dm_interval_input]
    rec_intv_imp = imputer.transform(rec_intv)
    rec = np.concatenate((rec_intv_imp,), axis=1)  # tuple of one array: passes rec_intv_imp through unchanged
    rec_pred = model_tr1.predict(rec) if int(DATA_YM) in target_months else model_tr2.predict(rec)
    return float(np.clip(rec_pred[0], 0, None))
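For what it's worth, the month-based model routing inside score_method can be exercised in isolation. The two lambdas below are hypothetical stand-ins for the trained LightGBM boosters, not the real models:

```python
import numpy as np

target_months = [202308, 202309, 202409, 202410]

# Hypothetical stand-ins for model_tr1 / model_tr2
model_a = lambda rec: np.array([rec[:4].sum() * 0.10])
model_b = lambda rec: np.array([rec[:4].sum() * 0.05])

def route_and_score(blood_pressure, sleep_duration, work_hours, age, DATA_YM):
    rec = np.array([blood_pressure, sleep_duration, work_hours, age, DATA_YM],
                   dtype=np.float64)
    # Rows from the target months go to model_a, everything else to model_b
    pred = model_a(rec) if int(DATA_YM) in target_months else model_b(rec)
    return float(np.clip(pred[0], 0, None))  # floor negative predictions at 0

print(route_and_score(120, 7, 8, 40, 202308))  # → 17.5
```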
Hi all, I would like to create a new variable based on another column in SAS Visual Analytics. I have data in this format:

| ItemID | ItemID_reference | My wanted output |
|---|---|---|
| 1111 | . | 0 |
| 2222 | 8888 | 0 |
| 3333 | . | 0 |
| 4444 | 1111 | 1 |
| 5555 | . | 0 |
| 6666 | 7777 | 1 |
| 7777 | . | 0 |

How can I match the value of ItemID_reference against the values of ItemID? In other words: if a value of ItemID_reference appears anywhere in ItemID, return 1, else 0. Thanks!
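Outside of Visual Analytics, the rule is a simple membership test. A pandas sketch of the same logic (rebuilding the sample table above; this illustrates the rule only, it is not a VA expression):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ItemID":           [1111, 2222, 3333, 4444, 5555, 6666, 7777],
    "ItemID_reference": [np.nan, 8888, np.nan, 1111, np.nan, 7777, np.nan],
})

# 1 when ItemID_reference matches any value of ItemID, else 0 (NaN never matches)
df["wanted"] = df["ItemID_reference"].isin(df["ItemID"]).astype(int)

print(df["wanted"].tolist())  # → [0, 0, 0, 1, 0, 1, 0]
```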
I have about forty jobs getting data from somewhere online (where? I don't know; online's everywhere). Most of them pick up only a handful of incremental files each day and put them on our Linux terminal server; they then get pushed to an S3 bucket (somewhere else!), and from there read into Snowflake. As of today it works perfectly, except for one job.
We've ironed out almost all the procedural problems except one. When I run the jobs through DI (I've created a bespoke transformation that takes the name of the source table and automates the whole process through to Snowflake), they all run fine. But when they've been deployed and run under the service account, the big job (which reads roughly 11.5k new files a day) always crashes. Today's run died when it attempted file 3,574.
Because the log quickly becomes unmanageable, and for security reasons, I mask it with option nomprint, but I expose where it's up to and the error messages.
From the log:
File 3,570: 07MAY2025:22:16:11 /org/warehouse/bin/gateway/edh/org_table_name/_change_data/cdc-00068-5dafcc5b-7512-4ecb-9175-91d01fb39600.c000.snappy.parquet
File 3,571: 07MAY2025:22:16:11 /org/warehouse/bin/gateway/edh/org_table_name/part-00066-c7df53d0-0747-4e30-8c39-16c9ac9d075b.c000.snappy.parquet
File 3,572: 07MAY2025:22:16:11 /org/warehouse/bin/gateway/edh/org_table_name/part-00067-e7b5f225-f351-448a-b5f9-4624b613fdc0.c000.snappy.parquet
File 3,573: 07MAY2025:22:16:11 /org/warehouse/bin/gateway/edh/org_table_name/part-00068-f00a655d-10c3-4b17-93dd-2083d213618b.c000.snappy.parquet
ERROR: tkzCapture() failed
ERROR: tkzCapture() failed
ERROR: tkzCapture() failed
ERROR: Unable to establish an SSL connection.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0b4en" is not found.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0b4en" is not found.
ERROR: Message file is not loaded.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0b4en" is not found.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0b4en" is not found.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0a2en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0a2en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0a2en" is not found.
ERROR: Message file is not loaded.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0a2en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0a2en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0a2en" is not found.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0b4en" is not found.
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Extension Load Failure: OS Error: -1 (/sso/sfw/sas/940/SASFoundation/9.4/sasexe/t0b4en.so: cannot open shared object file: Too many open files)
ERROR: Message file "t0b4en" is not found.
ERROR: Message file is not loaded.
WARNING: Apparent symbolic reference SYS_PROCHTTP_STATUS_CODE not resolved.
WARNING: Apparent symbolic reference SYS_PROCHTTP_STATUS_CODE not resolved.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: &sys_prochttp_status_code > 200
ERROR: %EVAL function has no expression to evaluate, or %IF statement has no condition.
I think that sys_prochttp_status_code is destroyed at the top of each http call and created again very soon after, so I suspect that the error is being picked up at the procedure initialisation.
I've checked both my and the service account's Linux ulimit values - both 350,000, so the Too many open files would appear to be a red herring.
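That said, the limit seen by the running process can differ from what a login shell reports. On Linux, one way to check the live session (inspecting the current shell here; substitute the SAS session's PID) is:

```shell
# Per-process file-descriptor limit (may differ from the login shell's ulimit)
grep "Max open files" /proc/$$/limits

# How many descriptors the process actually has open right now
ls /proc/$$/fd | wc -l
```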
Here's the meat of the getfiles macro:
%do i = 1 %to &files;
   %let rc = %sysfunc(fetchobs(&dsid, &i));
   %let file = %sysfunc(strip(&file));
   %if %eval(%sysfunc(indexc(&file, %str(/)))) %then %do;
      %let sub_directory = %sysfunc(scan(&file, 1, %str(/)));
      %if %eval(%sysfunc(fileexist(&parent_directory/&source/&sub_directory)) = 0) %then /* Create each non-existent directory */
         %let rc = %sysfunc(dcreate(&sub_directory, &parent_directory/&source));
   %end;
   %if %eval(%sysfunc(fileexist(&parent_directory/&source/&file)) = 1) %then /* Don't bother re-getting a file */
      %goto EndLoop;
   filename source "&parent_directory/&source/&file";
   %let url = https://&source_url/files/download?;
   %let url = &url.tableName=&source.%nrstr(&file=)&file;
   %let fail_count = 0;
   /*
      Every (hour - 500 seconds), get another bearer code. It is only valid for an hour, so stopping 500 seconds short
      will (prob'ly) always work. If it doesn't, something else has gone wrong. This should be good for around 12-15,000 files at a time.
   */
   %if %sysevalf(%sysfunc(datetime()) > "&bearer_expiry"dt) %then
      %renew_bearer;
   %do %until(%eval(&sys_prochttp_status_code) = 200);
      proc http url="&url"
                proxyhost="http://webproxy.vsp.sas.com:3128"
                oauth_bearer="&bearer"
                in='scope=urn://onmicrosoft.com/vcp/api/vbi/.default'
                out=source
                timeout=1000 /* How long to wait (seconds) */
                method='get';
         headers 'Accept' = 'application/json'
                 'consistencylevel' = 'eventual';
      run;
      %if %eval(&sys_prochttp_status_code > 200) %then %do;
         %put %sysfunc(strip(%sysfunc(datetime(), datetime23.3))) HTTP Status code: &sys_prochttp_status_code %refnumv(val=&i) &=file;
         %let fail_count = %eval(&fail_count + 1);
         %if %eval(&fail_count > 5) %then %do;
            %check_status
            %goto EndMac;
         %end;
         %let rc = %sysfunc(sleep(30, 1));
      %end;
   %end;
   %put File %refnumv(val=&i): %sysfunc(strip(%sysfunc(datetime(), datetime19.))) %sysfunc(strip(%sysfunc(putn(&lastmodified, datetime23.)))) &parent_directory/&source/&file;
   filename source clear;
%EndLoop:
%end;
I could obviously check for the symbol existence of sys_prochttp_status_code before I check its contents - but its non-existence isn't something I had considered!
I'm pretty much convinced that it's something specific with the service account, but my ingestion jobs run through it literally thousands of times a day without error, including many that use proc http, and I've never seen this before.
Has anyone ever seen anything like this before? What is tkzCapture()? What are t0a2en and t0b4en, and why can't they be (re-)opened? They do exist, and are seven years old. Maybe it's a factor of running M6; M8 may be getting installed mid-year.
Hi SAS Community! The aim is to calculate the sample size for an average bioequivalence trial. I would like to replicate the example below from "Sample Size Calculations in Clinical Research" by Chow and Shao (3rd edition), using the formula given there. So, with SD=0.4, delta=0.05, limit=0.223, alpha=0.05 and 80% power, I am using this code:

proc power;
twosamplemeans test=equiv_diff DIST=NORMAL
lower = -0.223
upper = 0.223
meandiff = 0.05
stddev = 0.4
npergroup = .
power = 0.8;
run;

But I am getting 69 subjects required, not 21 (as in the book, or 24 from the table approximation). Is SAS using a different formula? Or should I use a different procedure? Sorry if I am missing something, just trying to get my head around it. Thank you guys, Agnieszka
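For orientation, a crude normal-approximation sample size for a two one-sided tests (TOST) equivalence comparison of two independent groups can be sketched as below. This is an illustrative approximation only, not necessarily the formula PROC POWER or the book uses; in particular, TWOSAMPLEMEANS assumes two parallel groups, whereas the book's example is a crossover design, which is one plausible source of the gap between 69 and 21. The beta-splitting convention is also an assumption here:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta=0.05, limit=0.223, sd=0.4, alpha=0.05, power=0.8):
    """Normal-approximation TOST sample size per group, parallel design.

    Var(mean difference) = 2*sd^2/n; beta is split equally across the
    two one-sided tests (a common convention, assumed here).
    """
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha)
    z_b = z(1 - (1 - power) / 2)
    return ceil((z_a + z_b) ** 2 * 2 * sd ** 2 / (limit - abs(delta)) ** 2)
```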