R Read First 4 Digit of Cell
Addressing Data
Overview
Teaching: 20 min
Exercises: 0 min
Questions
What are the different methods for accessing parts of a data frame?
Objectives
Understand the three different ways R can address data inside a data frame.
Combine different methods for addressing data with the assignment operator to update subsets of data.
R is a powerful language for data manipulation. There are three main ways for addressing data inside R objects.
- By index (subsetting)
- By logical vector
- By name
Let's start by loading some sample data:
dat <- read.csv(file = 'data/sample.csv', header = TRUE, stringsAsFactors = FALSE)
The first row of this csv file is a list of column names. We used the header = TRUE argument to read.csv so that R can interpret the file correctly. We are using the stringsAsFactors = FALSE argument to override the default behaviour of R versions before 4.0.0, which converted strings to factors. Using factors in R is covered in a separate lesson.
Let's take a look at this data.
R has loaded the contents of the .csv file into a variable called dat, which is a data frame.
We can compactly display the internal structure of a data frame using the structure function str.
str(dat)
'data.frame':   100 obs. of  9 variables:
 $ ID           : chr  "Sub001" "Sub002" "Sub003" "Sub004" ...
 $ Gender       : chr  "m" "m" "m" "f" ...
 $ Group        : chr  "Control" "Treatment2" "Treatment2" "Treatment1" ...
 $ BloodPressure: int  132 139 130 105 125 112 173 108 131 129 ...
 $ Age          : num  16 17.2 19.5 15.7 19.9 14.3 17.7 19.8 19.4 18.8 ...
 $ Aneurisms_q1 : int  114 148 196 199 188 260 135 216 117 188 ...
 $ Aneurisms_q2 : int  140 209 251 140 120 266 98 238 215 144 ...
 $ Aneurisms_q3 : int  202 248 122 233 222 320 154 279 181 192 ...
 $ Aneurisms_q4 : int  237 248 177 220 228 294 245 251 272 185 ...
The str function tells us that the data has 100 rows and 9 columns. It also tells us that the data frame is made up of character (chr), integer (int) and numeric (num) vectors.
head(dat)
      ID Gender      Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2
1 Sub001      m    Control           132 16.0          114          140
2 Sub002      m Treatment2           139 17.2          148          209
3 Sub003      m Treatment2           130 19.5          196          251
4 Sub004      f Treatment1           105 15.7          199          140
5 Sub005      m Treatment1           125 19.9          188          120
6 Sub006      M Treatment2           112 14.3          260          266
  Aneurisms_q3 Aneurisms_q4
1          202          237
2          248          248
3          122          177
4          233          220
5          222          228
6          320          294
The data is the result of a (not real) experiment looking at the number of aneurysms that formed in the eyes of patients who undertook 3 different treatments.
Addressing by Index
Data can be accessed by index. We have already seen how square brackets [ can be used to subset data (sometimes also called "slicing"). The generic format is dat[row_numbers, column_numbers].
Selecting Values
What will be returned by dat[1, 1]? Think about the number of rows and columns you would expect as the result.
Solution
[1] "Sub001"
If we leave out a dimension, R will interpret this as a request for all values in that dimension.
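As a small, self-contained sketch of the `[row, column]` pattern, the following uses a hypothetical three-row data frame named `toy` (an assumption for illustration, not the lesson's `dat`), so it runs without `data/sample.csv`.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  ID = c("A", "B", "C"),
  Score = c(10, 20, 30),
  stringsAsFactors = FALSE
)

single_value <- toy[1, 1]  # row 1, column 1: a single value
first_row    <- toy[1, ]   # column index left out: all columns of row 1
second_col   <- toy[, 2]   # row index left out: all rows of column 2

single_value     # "A"
dim(first_row)   # 1 row, 2 columns
second_col       # 10 20 30
```

Leaving out the column index keeps a data frame, while selecting a single column this way returns a plain vector.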
Selecting More Values
What will be returned by dat[, 2]?
Solution
 [1] "m" "m" "m" "f" "m" "M" "f" "m" "m" "f" "m" "f" "f" "m" "m" "m" "f" "m"
[19] "m" "F" "f" "m" "f" "f" "m" "M" "M" "f" "m" "f" "f" "m" "m" "m" "m" "f"
[37] "f" "m" "M" "m" "f" "m" "m" "m" "f" "f" "M" "M" "m" "m" "m" "f" "f" "f"
[55] "m" "f" "m" "m" "m" "f" "f" "f" "f" "M" "f" "m" "f" "f" "M" "m" "m" "m"
[73] "F" "m" "m" "f" "M" "M" "M" "f" "m" "M" "M" "m" "m" "f" "f" "f" "m" "m"
[91] "f" "m" "F" "f" "m" "m" "F" "m" "M" "M"
The colon : can be used to create a sequence of integers.
6:9
[1] 6 7 8 9
Creates a vector of numbers from 6 to 9.
This can be very useful for addressing data.
Subsetting with Sequences
Use the colon operator to index just the aneurism count data (columns 6 to 9).
Solution
dat[, 6:9]
    Aneurisms_q1 Aneurisms_q2 Aneurisms_q3 Aneurisms_q4
1            114          140          202          237
2            148          209          248          248
3            196          251          122          177
4            199          140          233          220
5            188          120          222          228
6            260          266          320          294
7            135           98          154          245
8            216          238          279          251
9            117          215          181          272
10           188          144          192          185
11           134          155          247          223
12           152          177          323          245
13           112          220          225          195
14           109          150          177          189
15           146          140          239          223
16            97          172          203          207
17           165          157          200          193
18           158          265          243          187
19           178          109          206          182
20           107          188          167          218
21           174          160          203          183
22            97          110          194          133
23           187          239          281          214
24           188          191          256          265
25           114          199          242          195
26           115          160          158          228
27           128          249          294          315
28           112          230          281          126
29           136          109          105          155
30           103          148          219          228
31           132          151          234          162
32           118          154          260          160
33           166          176          253          233
34           152          105          197          299
35           191          148          166          185
36           152          178          158          170
37           161          270          232          284
38           239          184          317          269
39           132          137          193          206
40           168          255          273          274
41           140          184          239          202
42           166           85          179          196
43           141          160          179          239
44           161          168          212          181
45           103          111          254          126
46           231          240          260          310
47           192          141          180          225
48           178          180          169          183
49           167          123          236          224
50           135          150          208          279
51           150          166          153          204
52           192           80          138          222
53           153          153          236          216
54           205          264          269          207
55           117          194          216          211
56           199          119          183          251
57           182          129          226          218
58           180          196          250          294
59           111          111          244          201
60           101           98          178          116
61           166          167          232          241
62           158          171          237          212
63           189          178          177          238
64           189          101          193          172
65           239          189          297          300
66           185          224          151          182
67           224          112          304          288
68           104          139          211          204
69           222          199          280          196
70           107           98          204          138
71           153          255          218          234
72           118          165          220          227
73           102          184          246          222
74           188          125          191          157
75           180          283          204          298
76           178          214          291          240
77           168          184          184          229
78           118          170          249          249
79           169          114          248          233
80           156          138          218          258
81           232          211          219          246
82           188          108          180          136
83           169          168          180          211
84           241          233          292          182
85            65          207          234          235
86           225          185          195          235
87           104          116          173          221
88           179          158          216          244
89           103          140          209          186
90           112          130          175          191
91           226          170          307          244
92           228          221          316          259
93           209          142          199          184
94           153          104          194          214
95           111          118          173          191
96           148          132          200          194
97           141          196          322          273
98           193          112          123          181
99           130          226          286          281
100          126          157          129          160
Finally we can use the c() (combine) function to address non-sequential rows and columns.
dat[c(1, 5, 7, 9), 1:5]
      ID Gender      Group BloodPressure  Age
1 Sub001      m    Control           132 16.0
5 Sub005      m Treatment1           125 19.9
7 Sub007      f    Control           173 17.7
9 Sub009      m Treatment2           131 19.4
Returns the first 5 columns for patients in rows 1, 5, 7 and 9.
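The same idea can be tried on a hypothetical five-row data frame (`toy` and its columns are assumptions made up for this sketch). It also shows that `c()` controls the order of the returned columns, not just which ones are kept.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  Score = c(10, 20, 30, 40, 50),
  Passed = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Rows 1, 3 and 5; columns 3 and 1, in that order
picked <- toy[c(1, 3, 5), c(3, 1)]
picked
```

The selected rows keep their original row names (1, 3 and 5), which makes it easy to see where they came from.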
Subsetting Non-Sequential Data
Write code to return the age and gender values for the first 5 patients.
Solution
dat[1:5, c(5, 2)]
   Age Gender
1 16.0      m
2 17.2      m
3 19.5      m
4 15.7      f
5 19.9      m
Addressing by Name
Columns in an R data frame are named.
colnames(dat)
[1] "ID"            "Gender"        "Group"         "BloodPressure"
[5] "Age"           "Aneurisms_q1"  "Aneurisms_q2"  "Aneurisms_q3"
[9] "Aneurisms_q4"
Default Names
If column names are not specified, e.g. using header = FALSE in a read.csv() function, R assigns default names V1, V2, ..., Vn.
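A minimal sketch of the default naming, assuming two made-up lines of CSV text passed through the `text` argument of `read.csv()` rather than a file on disk:

```r
# Two hypothetical rows of CSV text with no header line
csv_text <- "Sub001,m,132\nSub002,f,139"

no_header <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)
colnames(no_header)  # "V1" "V2" "V3"
```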
We usually use the $ operator to address a column by name:
dat$Gender
 [1] "m" "m" "m" "f" "m" "M" "f" "m" "m" "f" "m" "f" "f" "m" "m" "m" "f" "m"
[19] "m" "F" "f" "m" "f" "f" "m" "M" "M" "f" "m" "f" "f" "m" "m" "m" "m" "f"
[37] "f" "m" "M" "m" "f" "m" "m" "m" "f" "f" "M" "M" "m" "m" "m" "f" "f" "f"
[55] "m" "f" "m" "m" "m" "f" "f" "f" "f" "M" "f" "m" "f" "f" "M" "m" "m" "m"
[73] "F" "m" "m" "f" "M" "M" "M" "f" "m" "M" "M" "m" "m" "f" "f" "f" "m" "m"
[91] "f" "m" "F" "f" "m" "m" "F" "m" "M" "M"
When we extract a single column from a data frame using the $ operator, R will return a vector of that column's class and not a data frame.
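The difference can be checked directly. This sketch assumes a hypothetical data frame `toy` and compares `$` with single-bracket addressing by name, which keeps the data frame structure.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  ID = c("A", "B", "C"),
  Score = c(10, 20, 30),
  stringsAsFactors = FALSE
)

as_vector <- toy$Score     # a numeric vector
as_frame  <- toy["Score"]  # a one-column data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
class(as_vector)          # "numeric"
```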
Named addressing can also be used in square brackets.
head(dat[, c('Age', 'Gender')])
   Age Gender
1 16.0      m
2 17.2      m
3 19.5      m
4 15.7      f
5 19.9      m
6 14.3      M
Best Practice
Best practice is to address columns by name. Often, you will create or delete columns and the column position will change.
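To illustrate why positions are fragile, here is a sketch on a hypothetical data frame `toy`: adding a made-up `Batch` column in front shifts every position by one, while the name still finds the intended column.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  ID = c("A", "B", "C"),
  Score = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Insert a new column in front, shifting every existing position by one
shifted <- cbind(Batch = c(1, 1, 2), toy)

by_position <- shifted[, 2]        # position 2 is now the ID column
by_name     <- shifted[, "Score"]  # the name still returns the scores

by_position
by_name
```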
Rows in an R data frame can also be named, and rows can also be addressed by their names.
By default, row names are indices (i.e. the position of each row in the data frame):
rownames(dat)
 [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"
[13] "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"
[25] "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"  "36"
[37] "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44"  "45"  "46"  "47"  "48"
[49] "49"  "50"  "51"  "52"  "53"  "54"  "55"  "56"  "57"  "58"  "59"  "60"
[61] "61"  "62"  "63"  "64"  "65"  "66"  "67"  "68"  "69"  "70"  "71"  "72"
[73] "73"  "74"  "75"  "76"  "77"  "78"  "79"  "80"  "81"  "82"  "83"  "84"
[85] "85"  "86"  "87"  "88"  "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96"
[97] "97"  "98"  "99"  "100"
We can add row names as we read in the file with the row.names parameter in read.csv.
In the following example, we choose the first column ID to become the vector of row names of the data frame, with row.names = 1.
dat2 <- read.csv(file = 'data/sample.csv', header = TRUE, stringsAsFactors = FALSE, row.names = 1)
rownames(dat2)
 [1] "Sub001" "Sub002" "Sub003" "Sub004" "Sub005" "Sub006" "Sub007" "Sub008"
 [9] "Sub009" "Sub010" "Sub011" "Sub012" "Sub013" "Sub014" "Sub015" "Sub016"
[17] "Sub017" "Sub018" "Sub019" "Sub020" "Sub021" "Sub022" "Sub023" "Sub024"
[25] "Sub025" "Sub026" "Sub027" "Sub028" "Sub029" "Sub030" "Sub031" "Sub032"
[33] "Sub033" "Sub034" "Sub035" "Sub036" "Sub037" "Sub038" "Sub039" "Sub040"
[41] "Sub041" "Sub042" "Sub043" "Sub044" "Sub045" "Sub046" "Sub047" "Sub048"
[49] "Sub049" "Sub050" "Sub051" "Sub052" "Sub053" "Sub054" "Sub055" "Sub056"
[57] "Sub057" "Sub058" "Sub059" "Sub060" "Sub061" "Sub062" "Sub063" "Sub064"
[65] "Sub065" "Sub066" "Sub067" "Sub068" "Sub069" "Sub070" "Sub071" "Sub072"
[73] "Sub073" "Sub074" "Sub075" "Sub076" "Sub077" "Sub078" "Sub079" "Sub080"
[81] "Sub081" "Sub082" "Sub083" "Sub084" "Sub085" "Sub086" "Sub087" "Sub088"
[89] "Sub089" "Sub090" "Sub091" "Sub092" "Sub093" "Sub094" "Sub095" "Sub096"
[97] "Sub097" "Sub098" "Sub099" "Sub100"
We can now extract one or more rows using those row names:
dat2["Sub072", ]
       Gender   Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2 Aneurisms_q3
Sub072      m Control           116 17.4          118          165          220
       Aneurisms_q4
Sub072          227
dat2[c("Sub009", "Sub072"), ]
       Gender      Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2
Sub009      m Treatment2           131 19.4          117          215
Sub072      m    Control           116 17.4          118          165
       Aneurisms_q3 Aneurisms_q4
Sub009          181          272
Sub072          220          227
Note that row names must be unique!
For instance, if we try to read in the data setting the Group column as row names, R will throw an error because values in that column are duplicated:
dat2 <- read.csv(file = 'data/sample.csv', header = TRUE, stringsAsFactors = FALSE, row.names = 3)
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
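The uniqueness rule can be reproduced without the sample file. This sketch assumes three made-up lines of CSV text and uses `tryCatch()` to capture the error message instead of stopping the script.

```r
# Hypothetical CSV text: ID values are unique, Group values repeat
csv_text <- "ID,Group\nSub001,Control\nSub002,Treatment1\nSub003,Control"

# Column 1 (ID) is unique, so it works as row names
ok <- read.csv(text = csv_text, header = TRUE, row.names = 1)
rownames(ok)

# Column 2 (Group) repeats "Control", so read.csv() stops with an error
msg <- tryCatch(
  {
    read.csv(text = csv_text, header = TRUE, row.names = 2)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Capturing the message this way is only a convenience for demonstration; in interactive use the error is simply printed to the console.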
Addressing by Logical Vector
A logical vector contains only the special values TRUE and FALSE.
c(TRUE, TRUE, FALSE, FALSE, TRUE)
[1]  TRUE  TRUE FALSE FALSE  TRUE
Truth and Its Opposite
Note the values TRUE and FALSE are all capital letters and are not quoted.
Logical vectors can be created using relational operators, e.g. <, >, ==, !=, %in%.
x <- c(1, 2, 3, 11, 12, 13)
x < 10
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
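The other relational operators listed above behave the same way, returning one logical value per element. A brief sketch on the same numbers (the variable names are chosen here for illustration):

```r
x <- c(1, 2, 3, 11, 12, 13)

greater   <- x > 10           # FALSE FALSE FALSE TRUE TRUE TRUE
equal     <- x == 11          # TRUE only at the fourth position
not_equal <- x != 11          # the element-wise negation of `equal`
member    <- x %in% c(2, 12)  # TRUE where x is one of the listed values

which(member)  # positions 2 and 5
```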
We can use logical vectors to select data from a data frame. This is often referred to as logical indexing.
index <- dat$Group == 'Control'
dat[index, ]$BloodPressure
 [1] 132 173 129  77 158  81 137 111 135 108 133 139 126 125  99 122 155 133  94
[20]  98  74 116  97 104 117  90 150 116 108 102
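For a version that runs without the sample file, the following sketch applies the same logical indexing to a hypothetical four-row data frame (`toy` and its values are assumptions made up for illustration).

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  Group = c("Control", "Treatment1", "Control", "Treatment2"),
  BloodPressure = c(120, 135, 110, 140),
  stringsAsFactors = FALSE
)

is_control <- toy$Group == "Control"
is_control                                # TRUE FALSE TRUE FALSE

via_rows   <- toy[is_control, ]$BloodPressure  # filter rows, then take the column
via_column <- toy$BloodPressure[is_control]    # take the column, then filter it

via_rows         # 120 110
sum(is_control)  # number of matching rows: 2
```

Both forms return the same values; indexing the column vector directly avoids building an intermediate data frame.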
Often this operation is written as one line of code:
plot(dat[dat$Group == 'Control', ]$BloodPressure)
Using Logical Indexes
- Create a scatterplot showing BloodPressure for subjects not in the control group.
- How many ways are there to index this set of subjects?
Solution
The code for such a plot:
plot(dat[dat$Group != 'Control', ]$BloodPressure)
In addition to dat$Group != 'Control', one could use dat$Group %in% c("Treatment1", "Treatment2").
Combining Addressing and Assignment
The assignment operator <- can be combined with addressing.
x <- c(1, 2, 3, 11, 12, 13)
x[x < 10] <- 0
x
[1]  0  0  0 11 12 13
Updating a Subset of Values
In this dataset, values for Gender have been recorded as both uppercase M, F and lowercase m, f. Combine the addressing and assignment operations to convert all values to lowercase.
Solution
dat[dat$Gender == 'M', ]$Gender <- 'm'
dat[dat$Gender == 'F', ]$Gender <- 'f'
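As a cross-check, the same clean-up can be sketched on a hypothetical one-column data frame. The second half swaps in base R's `tolower()`, which is an alternative named here for comparison and not the approach the exercise asks for.

```r
# Hypothetical toy data frame with mixed-case values
toy <- data.frame(
  Gender = c("m", "M", "f", "F", "m"),
  stringsAsFactors = FALSE
)

# Addressing combined with assignment, as in the solution
by_index <- toy
by_index[by_index$Gender == "M", ]$Gender <- "m"
by_index[by_index$Gender == "F", ]$Gender <- "f"

# Swapped-in alternative: tolower() converts every value at once
by_tolower <- toy
by_tolower$Gender <- tolower(by_tolower$Gender)

identical(by_index$Gender, by_tolower$Gender)  # TRUE
```

The indexed form generalises to any replacement value, whereas `tolower()` only fits this particular case of changing letter case.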
Key Points
Data in data frames can be addressed by index (subsetting), by logical vector, or by name (columns only).
Use the
$
operator to address a column by name.
Source: https://swcarpentry.github.io/r-novice-inflammation/10-supp-addressing-data/