Error in colmeans x na rm true x must be numeric

I’m trying to execute a Principal Components Analysis, but I’m getting the error: Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

I know all the columns have to be numeric, but how do I handle character columns in the data set? For example:

data(birth.death.rates.1966)
data2 <- birth.death.rates.1966
princ <- prcomp(data2)
  • An example of data2 was attached as an image (not reproduced here).

Should I add a new column that maps each country name to a numeric code? If so, how do I do this in R?

asked May 25, 2017 at 4:28, Rubens Rodrigues

You can convert a character vector to numeric values by going via factor. Each unique value then gets a unique integer code. In this example there are four values, so the codes are 1 to 4, assigned in the sorted (alphabetical) order of the values:

> d = data.frame(country=c("foo","bar","baz","qux"),x=runif(4),y=runif(4))
> d
  country          x         y
1     foo 0.84435112 0.7022875
2     bar 0.01343424 0.5019794
3     baz 0.09815888 0.5832612
4     qux 0.18397525 0.8049514
> d$country = as.numeric(as.factor(d$country))
> d
  country          x         y
1       3 0.84435112 0.7022875
2       1 0.01343424 0.5019794
3       2 0.09815888 0.5832612
4       4 0.18397525 0.8049514

You can then run prcomp:

> prcomp(d)
Standard deviations:
[1] 1.308665216 0.339983614 0.009141194

Rotation:
               PC1          PC2          PC3
country -0.9858920  0.132948161 -0.101694168
x       -0.1331795 -0.991081523 -0.004541179
y       -0.1013910  0.009066471  0.994805345

Whether this makes sense for your application is up to you. Maybe you just want to drop the first column with prcomp(d[,-1]) and work with the numeric data, which seems to be what the other "answers" are trying to achieve.

answered May 25, 2017 at 7:34, Spacedman

The first column of the data frame is character, so you can move it into the row names:

library(tidyverse)
# assign the result back, otherwise data2 is left unchanged
data2 <- data2 %>% remove_rownames() %>% column_to_rownames(var = "country")
princ <- prcomp(data2)

Alternatively, in base R:

# set the row names before dropping the first column
rownames(data2) <- data2[,1]
data2 <- data2[,-1]
princ <- prcomp(data2)

answered May 25, 2017 at 4:49, parth

In R, converting a character column to a factor does not turn it into numeric data.
The encoding lets a machine learning model treat the column mathematically, but the codes are still labels, not measurements.

Example: If a list of names is encoded numerically, one name may end up with a higher numerical value than another, and a model may read meaning into that difference.
That should not happen, because names (text data used only to label a specific set) generally should not define the way a model works.
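One way to avoid the implied ordering described above is to replace the integer codes with indicator (dummy) columns. The sketch below is not from any of the answers: the data frame `d` and its values are made up, and whether indicator columns are appropriate in a PCA is itself a modelling decision.

```r
# Hypothetical toy data: `d` and its columns are made up for illustration.
d <- data.frame(
  country = c("foo", "bar", "baz", "qux"),
  x = c(0.84, 0.01, 0.10, 0.18),
  y = c(0.70, 0.50, 0.58, 0.80)
)

# Integer codes follow the sorted level order, so "foo" becomes 3 only
# because of how the names happen to sort.
codes <- as.numeric(as.factor(d$country))

# Indicator columns: one 0/1 column per country, with no implied ordering.
# The "- 1" drops the intercept so that every level gets its own column.
indicators <- model.matrix(~ country - 1, data = d)

# Combine the indicators with the numeric columns and run the PCA.
d_onehot <- cbind(indicators, d[, c("x", "y")])
pca <- prcomp(d_onehot)
```

With many distinct labels this adds one column per label, so dropping the label column or moving it into the row names is often the simpler choice.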

Also if you try working with this data assuming it to be numeric, you may get the following error:

Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

The reason for this error is explained above.

To overcome this problem, apply the numeric operation only to the numeric columns:

# scale only columns 2 and 3, which hold the original numeric data
training_set[,2:3] = scale(training_set[,2:3])
test_set[,2:3] = scale(test_set[,2:3])

In the screenshot (not reproduced here), columns 1 and 4 hold encoded data and cannot be treated as numeric, while columns 2 and 3 contained numeric data from the start, so we can run our model on that part of the data only. The code above selects all rows and columns 2 and 3.
answered Mar 25, 2020 at 9:38, Aditya Jadhav


One error message you may encounter when using R is:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

This error usually occurs when you attempt to use the prcomp() function to perform principal components analysis in R, yet one or more of the columns in the data frame you’re using is not numeric.

There are two ways to get around this error:

Method 1: Convert Non-Numeric Columns to Numeric

Method 2: Remove Non-Numeric Columns from Data Frame

The following examples show how to use each method in practice.

How to Reproduce the Error

Suppose we attempt to perform principal components analysis on the following data frame that contains a character column:

#create data frame
df <- data.frame(team=c('A', 'A', 'C', 'B', 'C', 'B', 'B', 'C', 'A'),
                 points=c(12, 8, 26, 25, 38, 30, 24, 24, 15),
                 rebounds=c(10, 4, 5, 5, 4, 3, 8, 18, 22))

#view data frame
df

  team points rebounds
1    A     12       10
2    A      8        4
3    C     26        5
4    B     25        5
5    C     38        4
6    B     30        3
7    B     24        8
8    C     24       18
9    A     15       22

#attempt to calculate principal components
prcomp(df)

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

The team column is a character column, which causes an error when we attempt to use the prcomp() function.
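Before choosing a fix, it can help to check which columns are responsible. The following is a minimal sketch using a shortened copy of the data frame above; the names `col_classes` and `non_numeric` are introduced here for illustration only.

```r
# Shortened copy of the example data frame
df <- data.frame(team = c("A", "A", "C", "B"),
                 points = c(12, 8, 26, 25),
                 rebounds = c(10, 4, 5, 5))

# Class of each column; character and factor columns are the usual culprits
col_classes <- sapply(df, class)
print(col_classes)

# Names of the columns that are not numeric
non_numeric <- names(df)[!sapply(df, is.numeric)]
print(non_numeric)
```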

Method 1: Convert Non-Numeric Columns to Numeric

One way to avoid the error is to convert the team column to a numeric column before using the prcomp() function:

#convert character column to numeric
df$team <- as.numeric(as.factor(df$team))

#view updated data frame
df

  team points rebounds
1    1     12       10
2    1      8        4
3    3     26        5
4    2     25        5
5    3     38        4
6    2     30        3
7    2     24        8
8    3     24       18
9    1     15       22

#calculate principal components
prcomp(df)

Standard deviations (1, .., p=3):
[1] 9.8252704 6.0990235 0.4880538

Rotation (n x k) = (3 x 3):
                 PC1        PC2         PC3
team     -0.06810285 0.04199272  0.99679417
points   -0.91850806 0.38741460 -0.07907512
rebounds  0.38949319 0.92094872 -0.01218661

This time we don’t receive any error because each column in the data frame is numeric.

Method 2: Remove Non-Numeric Columns from Data Frame

Another way to avoid the error is to simply remove any non-numeric columns from the data frame before using the prcomp() function:

#remove non-numeric columns from data frame
df_new <- df[ , unlist(lapply(df, is.numeric))]

#view new data frame
df_new

  points rebounds
1     12       10
2      8        4
3     26        5
4     25        5
5     38        4
6     30        3
7     24        8
8     24       18
9     15       22

#calculate principal components
prcomp(df_new)

Standard deviations (1, .., p=2):
[1] 9.802541 6.093638

Rotation (n x k) = (2 x 2):
                PC1       PC2
points    0.9199431 0.3920519
rebounds -0.3920519 0.9199431

Once again, we don’t receive any error because each column in the data frame is numeric.

Note: The first method lets you use all of the data rather than removing some of the columns, but the integer codes impose an arbitrary order on the categories, so check that this makes sense for your analysis before preferring it.

When you are doing a Principal Components Analysis, you will get the “Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric” error message if one of your columns holds characters or other non-numeric values. Fortunately, there is a simple fix: convert the column to a factor and then to a numeric variable.

Description of the error

This error message occurs because the prcomp function only works with numeric values: every column of your data frame has to be numeric. If a column holds characters or factors, you will get our error message. (Missing values in a numeric column do not trigger this message; they cause a different error.) So before running this kind of analysis, make sure that you are giving it only numeric columns.
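Missing values in an otherwise numeric column are a separate issue from the "must be numeric" check. The sketch below uses a made-up data frame `df_na` to show the difference. Dropping incomplete rows with na.omit() is only one possible way to handle them and is an assumption here, not a recommendation from the text.

```r
# Hypothetical data: both columns are numeric, one value is missing
df_na <- data.frame(x = c(1.2, NA, 3.1, 4.8, 5.0),
                    y = c(2.0, 1.5, 0.7, 3.3, 2.9))

# Every column is numeric, so the "'x' must be numeric" check is passed...
all_numeric <- all(sapply(df_na, is.numeric))

# ...but prcomp() still stops, with a different message about missing values
msg <- tryCatch({ prcomp(df_na); "no error" },
                error = function(e) conditionMessage(e))
print(msg)

# Dropping incomplete rows first lets the analysis run
pca <- prcomp(na.omit(df_na))
```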

Explanation of the error

The following example contains code that produces our error message. Note column z of the data frame.

> t = as.numeric(Sys.time())
> set.seed(t)
> z = c("A", "B", "C", "D", "E")
> x = rnorm(5)
> y = rnorm(5)
> df = data.frame(z, x, y)
> df
z x y
1 A 0.02307778 0.41365815
2 B 0.63213959 0.77502100
3 C -0.91366753 1.83374930
4 D 0.90422176 -0.09915274
5 E 0.75987927 -0.77146351
> pr = prcomp(df)
Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

If you look at data frame df you will notice that column z has characters instead of numbers. This is what triggers our error message, because prcomp is looking only for numeric values.

How to fix the error

Here we have an example of how to fix this problem. As long as you can convert the column into a factor, you can easily convert it into numeric values. This is what we do in this example, and it fixes the problem.

> t = as.numeric(Sys.time())
> set.seed(t)
> z = c("A", "B", "C", "D", "E")
> x = rnorm(5)
> y = rnorm(5)
> df = data.frame(z, x, y)
> df
z x y
1 A 1.0158299 1.3621230
2 B -1.0393691 -0.4218296
3 C 0.1113177 0.5536360
4 D 1.8122020 1.1435097
5 E -1.3957393 0.9001602
> df2 = df
> df2$z = as.numeric(as.factor(df2$z))
> df2
z x y
1 1 1.0158299 1.3621230
2 2 -1.0393691 -0.4218296
3 3 0.1113177 0.5536360
4 4 1.8122020 1.1435097
5 5 -1.3957393 0.9001602
> pr = prcomp(df2)
> pr
Standard deviations (1, .., p=3):
[1] 1.664002 1.351141 0.469779

Rotation (n x k) = (3 x 3):
PC1 PC2 PC3
z 0.86675116 -0.4768597 -0.1461071
x -0.49435044 -0.7826437 -0.3782676
y -0.06603079 -0.4000920 0.9140932

If you compare data frames df and df2, you will see that in df2 column z is a series of numbers rather than letters. This was accomplished by converting the column into a factor and then converting the factor into a vector of integer codes.

This error message results from a mistake that is simple to make but also easy to fix: make sure that everything you put through a Principal Components Analysis is numeric. With that one correction, the analysis runs without this error.

In this tutorial, I’ll demonstrate how to avoid the “Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric” in R.

Creation of Example Data

data(iris)                            # Loading example data
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

Example 1: Replicating the Error Message in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

prcomp(iris)                          # prcomp function cannot be applied to non-numeric (factor) column
# Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Example 2: Debugging the Error Message in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

iris_numb <- iris                     # Transforming categories to numbers
iris_numb$Species <- as.numeric(as.factor(iris_numb$Species))
head(iris_numb)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2       1
# 2          4.9         3.0          1.4         0.2       1
# 3          4.7         3.2          1.3         0.2       1
# 4          4.6         3.1          1.5         0.2       1
# 5          5.0         3.6          1.4         0.2       1
# 6          5.4         3.9          1.7         0.4       1
prcomp(iris_numb)                     # Applying prcomp function to new data frame
# Standard deviations (1, .., p=5):
# [1] 2.1996441 0.5023804 0.3094851 0.1914559 0.1443656
# 
# Rotation (n x k) = (5 x 5):
#                      PC1         PC2        PC3         PC4        PC5
# Sepal.Length  0.33402494 -0.68852577  0.4414776 -0.43312829  0.1784853
# Sepal.Width  -0.08034626 -0.68474905 -0.6114140  0.30348725 -0.2423462
# Petal.Length  0.80059273  0.09713877  0.1466787  0.49080356 -0.2953177
# Petal.Width   0.33657862  0.06894557 -0.4202025  0.06667133  0.8372253
# Species       0.35740442  0.20703034 -0.4828930 -0.68917499 -0.3482135
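An alternative to encoding Species is to leave it out of the PCA and attach it to the scores afterwards, so it can label or colour a plot. The following is a minimal sketch of that approach, not part of the original tutorial; the names `num_cols`, `pca` and `scores` are introduced here for illustration.

```r
data(iris)

# Run the PCA on the numeric measurements only
num_cols <- sapply(iris, is.numeric)
pca <- prcomp(iris[, num_cols])

# Attach the species back to the scores so it can label or colour a plot
scores <- data.frame(pca$x, Species = iris$Species)
head(scores)
```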


Hi, I’m trying to do PCA for my data using R and I keep getting the error message:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

I checked my data and all are numeric and no string text exists. I already scaled my data to remove variables as an answer to my initial problem of having correlated variables. Here’s what I got so far:

> summary(dorsalisaf.scale)
       TL          Depth          Temp            Cond            pH
 0.900  :  7   10.58  :207   28.1490:207   161.5239 :207   4.0700 :207
 1.060  :  7   198.00 :115   26.2715:115   1805.4100:115   4.0400 :115
 0.970  :  6   6.92   : 62   29.3471: 62   55.5422  : 62   3.9882 : 62
 1.024  :  6   14.63  : 54   29.1181: 54   143.0539 : 54   3.8267 : 54
 1.051  :  6   0.90   : 24   27.3099: 24   182.1667 : 24   4.0600 : 24
 1.119  :  6   61.75  : 15   25.6917: 15   219.7335 : 15   3.9473 : 15
 (Other):483   (Other): 44   (Other): 44   (Other)  : 44   (Other): 44
      Sal
 0.0699 :207
 0.8900 :115
 0.0227 : 62
 0.0608 : 54
 0.0833 : 24
 0.1034 : 15
 (Other): 44

——

> str(dorsalisaf.scale)
'data.frame': 521 obs. of 6 variables:
$ TL : Factor w/ 284 levels "0.712","0.747",..: 33 120 16 30 24 64 6 78 91 20 ...
$ Depth: Factor w/ 13 levels " 0.90"," 6.92",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Temp : Factor w/ 13 levels "25.4937","25.6917",..: 7 7 7 7 7 7 7 7 7 7 ...
$ Cond : Factor w/ 13 levels " 55.5422"," 143.0539",..: 5 5 5 5 5 5 5 5 5 5 ...
$ pH : Factor w/ 13 levels "3.8122","3.8267",..: 7 7 7 7 7 7 7 7 7 7 ...
$ Sal : Factor w/ 13 levels "0.0227","0.0608",..: 6 6 6 6 6 6 6 6 6 6 ...

I can’t proceed with using prcomp() command because of this.
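The str() output above shows every column stored as a Factor whose levels are number-like strings, which is why prcomp() rejects the data even though the values look numeric. The sketch below uses a made-up stand-in `dat` for the poster's data frame and shows one common way to recover the numbers. Why the columns became factors in the first place is not shown in the post and is not addressed here.

```r
# Hypothetical stand-in for the data in the question: numbers stored as factors
dat <- data.frame(TL    = factor(c("0.900", "1.060", "0.970", "1.024")),
                  Depth = factor(c("10.58", "198.00", "6.92", "14.63")))

# as.numeric() on a factor returns the level codes, not the printed values
codes <- as.numeric(dat$Depth)

# Going through as.character() first recovers the printed values
dat_num <- as.data.frame(lapply(dat, function(col) as.numeric(as.character(col))))

str(dat_num)
pca <- prcomp(dat_num)
```

Note that as.numeric(as.factor(...)), as used elsewhere on this page for label columns, would give the wrong result here, because it returns the codes rather than the measured values.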

Dear Sacha,

Whilst trying to plot a model, I get the following error

Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

I get this error only in the following combination (both conditions must be present):
Condition 1: When sem(fit, df, ordered=c("X1", "X2", …)) is instructed to treat specific variables as ordered.
Condition 2: When, in the semPaths command, I want to use my own layout matrix rather than one of the standard layouts provided by the software.

With my data, when I remove either ordered=, or when use one of the predefined layouts (tree, circle, etc.), I do not get the above problem.

I do not wish to burden you with my data or code, so below is an example which reproduces the result (The example, not my data, creates a problem when I don’t define specific variables as ordered, but hopefully the below example is helpful).

library(lavaan)   # provides cfa()
library(semPlot)  # provides semPaths()

set.seed(1234)
X1 <- sample(0:2, 100, replace=TRUE)
X2 <- sample(0:2, 100, replace=TRUE)
X3 <- sample(0:2, 100, replace=TRUE)
X4 <- sample(0:2, 100, replace=TRUE)

df <- data.frame(X1, X2, X3, X4)

example.model <- '
latent =~ X1 + X2 + X3 + X4
'

fit <- cfa(example.model,
           df,
           ordered = c("X1","X2","X3","X4")
           )
summary(fit, fit.measures=TRUE, standardized=TRUE)

#create plot
semPaths(fit)

# Create plot using self defined matrix
L <- matrix(
  c(
    NA, 	  "latent", 	NA, 	NA, 
    "X1", 	"X2", 	"X3", "X4") 
  ,4 )

Graph <- semPaths(fit, layout=L)
Graph$graphAttributes$Edges$curve <-
  ifelse(Graph$Edgelist$bidir, 1, 0)
plot(Graph)



June 27, 2019, 02:48:51 PM
Hi,
Does anybody know how to solve this R PCA problem? I've tried many times, and I still don't know the reason. I hope somebody can help me.
When I type:
> dtp<-prcomp(tdt,retx= TRUE,center= TRUE,scale= TRUE)
Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric

Thanks in advance for all your help.


Re: Help on PCA: Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric



Reply #1 – June 28, 2019, 08:31:50 AM
Hello,
One of your columns contains non-numeric information. This might be because your data frame (tdt) has a column specifying the treatment. Here is an example with the iris data set, which has "Species" in column 5.

data("iris")
pca <- prcomp(iris)  # Fails with "Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
pca <- prcomp(iris[1:4])  # Success
pca <- prcomp(iris[1:4])  # Success

Hope this helps.

