R Read First 4 Digit of Cell

Addressing Data

Overview

Teaching: 20 min
Exercises: 0 min

Questions

  • What are the different methods for accessing parts of a data frame?

Objectives

  • Understand the three different ways R can address data inside a data frame.

  • Combine different methods for addressing data with the assignment operator to update subsets of data.

R is a powerful language for data manipulation. There are three main ways for addressing data inside R objects.

  • By index (subsetting)
  • By logical vector
  • By name

Let's start by loading some sample data:

    dat <- read.csv(file = 'data/sample.csv', header = TRUE, stringsAsFactors = FALSE)

The first row of this csv file is a list of column names. We used the header = TRUE argument to read.csv so that R can interpret the file correctly. We are using the stringsAsFactors = FALSE argument to override R's default behaviour (before R 4.0.0, strings were converted to factors by default). Using factors in R is covered in a separate lesson.
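A minimal sketch of what these two arguments do, assuming a tiny inline CSV. The csv_text string and the toy and toy_f names are made up for illustration and are not the lesson's data file; the sketch uses the text argument of read.csv so that no file is needed.

```r
# Hypothetical three-line CSV; stands in for 'data/sample.csv'
csv_text <- "ID,Gender,Age
Sub001,m,16.0
Sub002,f,17.2"

# header = TRUE: the first line supplies the column names
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
colnames(toy)

# stringsAsFactors = FALSE keeps text columns as character vectors
class(toy$Gender)

# With stringsAsFactors = TRUE the same column is read as a factor
toy_f <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
class(toy_f$Gender)
```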

Let's take a look at this data.

R has loaded the contents of the .csv file into a variable called dat, which is a data frame.

We can compactly display the internal structure of a data frame using the structure function str.

    str(dat)

    'data.frame':	100 obs. of  9 variables:
     $ ID           : chr  "Sub001" "Sub002" "Sub003" "Sub004" ...
     $ Gender       : chr  "m" "m" "m" "f" ...
     $ Group        : chr  "Control" "Treatment2" "Treatment2" "Treatment1" ...
     $ BloodPressure: int  132 139 130 105 125 112 173 108 131 129 ...
     $ Age          : num  16 17.2 19.5 15.7 19.9 14.3 17.7 19.8 19.4 18.8 ...
     $ Aneurisms_q1 : int  114 148 196 199 188 260 135 216 117 188 ...
     $ Aneurisms_q2 : int  140 209 251 140 120 266 98 238 215 144 ...
     $ Aneurisms_q3 : int  202 248 122 233 222 320 154 279 181 192 ...
     $ Aneurisms_q4 : int  237 248 177 220 228 294 245 251 272 185 ...

The str function tells us that the data has 100 rows and 9 columns. It also tells us that the data frame is made up of character (chr), integer (int) and numeric (num) vectors.

    head(dat)

          ID Gender      Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2
    1 Sub001      m    Control           132 16.0          114          140
    2 Sub002      m Treatment2           139 17.2          148          209
    3 Sub003      m Treatment2           130 19.5          196          251
    4 Sub004      f Treatment1           105 15.7          199          140
    5 Sub005      m Treatment1           125 19.9          188          120
    6 Sub006      M Treatment2           112 14.3          260          266
      Aneurisms_q3 Aneurisms_q4
    1          202          237
    2          248          248
    3          122          177
    4          233          220
    5          222          228
    6          320          294

The data are the results of a (fictional) experiment looking at the number of aneurysms that formed in the eyes of patients who underwent 3 different treatments.

Addressing by Index

Data can be accessed by index. We have already seen how square brackets [ can be used to subset data (sometimes also called "slicing"). The generic format is dat[row_numbers, column_numbers].

Selecting Values

What will be returned by dat[1, 1]? Think about the number of rows and columns you would expect as the result.

Solution

    dat[1, 1]

    [1] "Sub001"

If we leave out a dimension, R will interpret this as a request for all values in that dimension.
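To make the effect of an omitted dimension concrete, here is a small sketch on a toy data frame. The toy object and its values are invented for illustration and are not the lesson's dataset.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  ID  = c("Sub001", "Sub002", "Sub003"),
  Age = c(16.0, 17.2, 19.5),
  stringsAsFactors = FALSE
)

single_value <- toy[1, 1]   # one row, one column: a single value
whole_row    <- toy[2, ]    # column index omitted: all columns of row 2
whole_column <- toy[, 2]    # row index omitted: all rows of column 2, as a vector

single_value
whole_row
whole_column
```

Note that a single column selected this way comes back as a plain vector, while a single row is still a data frame.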

Selecting More Values

What will be returned by dat[, 2]?

Solution

     [1] "m" "m" "m" "f" "m" "M" "f" "m" "m" "f" "m" "f" "f" "m" "m" "m" "f" "m"
    [19] "m" "F" "f" "m" "f" "f" "m" "M" "M" "f" "m" "f" "f" "m" "m" "m" "m" "f"
    [37] "f" "m" "M" "m" "f" "m" "m" "m" "f" "f" "M" "M" "m" "m" "m" "f" "f" "f"
    [55] "m" "f" "m" "m" "m" "f" "f" "f" "f" "M" "f" "m" "f" "f" "M" "m" "m" "m"
    [73] "F" "m" "m" "f" "M" "M" "M" "f" "m" "M" "M" "m" "m" "f" "f" "f" "m" "m"
    [91] "f" "m" "F" "f" "m" "m" "F" "m" "M" "M"

The colon : can be used to create a sequence of integers.

    6:9

    [1] 6 7 8 9

Creates a vector of numbers from 6 to 9.

This can be very useful for addressing data.
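As a brief sketch of how a sequence can act as an index, the example below uses a made-up toy data frame (not the lesson's data) and selects a run of rows rather than columns.

```r
six_to_nine <- 6:9   # integer vector 6, 7, 8, 9
countdown   <- 3:1   # the colon also counts down: 3, 2, 1

# Toy data frame, invented for illustration
toy <- data.frame(a = 1:4, b = 5:8, c = 9:12)

# Rows 2 to 3, all columns
middle_rows <- toy[2:3, ]
middle_rows
```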

Subsetting with Sequences

Use the colon operator to index just the aneurism count data (columns 6 to 9).

Solution

    dat[, 6:9]

        Aneurisms_q1 Aneurisms_q2 Aneurisms_q3 Aneurisms_q4
    1            114          140          202          237
    2            148          209          248          248
    3            196          251          122          177
    4            199          140          233          220
    5            188          120          222          228
    6            260          266          320          294
    7            135           98          154          245
    8            216          238          279          251
    9            117          215          181          272
    10           188          144          192          185
    11           134          155          247          223
    12           152          177          323          245
    13           112          220          225          195
    14           109          150          177          189
    15           146          140          239          223
    16            97          172          203          207
    17           165          157          200          193
    18           158          265          243          187
    19           178          109          206          182
    20           107          188          167          218
    21           174          160          203          183
    22            97          110          194          133
    23           187          239          281          214
    24           188          191          256          265
    25           114          199          242          195
    26           115          160          158          228
    27           128          249          294          315
    28           112          230          281          126
    29           136          109          105          155
    30           103          148          219          228
    31           132          151          234          162
    32           118          154          260          160
    33           166          176          253          233
    34           152          105          197          299
    35           191          148          166          185
    36           152          178          158          170
    37           161          270          232          284
    38           239          184          317          269
    39           132          137          193          206
    40           168          255          273          274
    41           140          184          239          202
    42           166           85          179          196
    43           141          160          179          239
    44           161          168          212          181
    45           103          111          254          126
    46           231          240          260          310
    47           192          141          180          225
    48           178          180          169          183
    49           167          123          236          224
    50           135          150          208          279
    51           150          166          153          204
    52           192           80          138          222
    53           153          153          236          216
    54           205          264          269          207
    55           117          194          216          211
    56           199          119          183          251
    57           182          129          226          218
    58           180          196          250          294
    59           111          111          244          201
    60           101           98          178          116
    61           166          167          232          241
    62           158          171          237          212
    63           189          178          177          238
    64           189          101          193          172
    65           239          189          297          300
    66           185          224          151          182
    67           224          112          304          288
    68           104          139          211          204
    69           222          199          280          196
    70           107           98          204          138
    71           153          255          218          234
    72           118          165          220          227
    73           102          184          246          222
    74           188          125          191          157
    75           180          283          204          298
    76           178          214          291          240
    77           168          184          184          229
    78           118          170          249          249
    79           169          114          248          233
    80           156          138          218          258
    81           232          211          219          246
    82           188          108          180          136
    83           169          168          180          211
    84           241          233          292          182
    85            65          207          234          235
    86           225          185          195          235
    87           104          116          173          221
    88           179          158          216          244
    89           103          140          209          186
    90           112          130          175          191
    91           226          170          307          244
    92           228          221          316          259
    93           209          142          199          184
    94           153          104          194          214
    95           111          118          173          191
    96           148          132          200          194
    97           141          196          322          273
    98           193          112          123          181
    99           130          226          286          281
    100          126          157          129          160

Finally we can use the c() (combine) function to address non-sequential rows and columns.

    dat[c(1, 5, 7, 9), 1:5]

          ID Gender      Group BloodPressure  Age
    1 Sub001      m    Control           132 16.0
    5 Sub005      m Treatment1           125 19.9
    7 Sub007      f    Control           173 17.7
    9 Sub009      m Treatment2           131 19.4

Returns the first 5 columns for patients in rows 1, 5, 7 and 9.
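The same idea in a self-contained sketch, using a toy data frame whose values are invented for illustration. It also shows that the order of the numbers inside c() determines the order of the result.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  ID     = c("Sub001", "Sub002", "Sub003", "Sub004"),
  Gender = c("m", "m", "m", "f"),
  Age    = c(16.0, 17.2, 19.5, 15.7),
  stringsAsFactors = FALSE
)

# Rows 1 and 4, columns 1 and 3; the original row labels are kept
picked <- toy[c(1, 4), c(1, 3)]
picked

# The order given to c() is the order returned
reordered <- toy[c(4, 1), ]
reordered
```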

Subsetting Non-Sequential Data

Write code to return the age and gender values for the first five patients.

Solution

    dat[1:5, c(5, 2)]

       Age Gender
    1 16.0      m
    2 17.2      m
    3 19.5      m
    4 15.7      f
    5 19.9      m

Addressing by Name

Columns in an R data frame are named.

    colnames(dat)

    [1] "ID"            "Gender"        "Group"         "BloodPressure"
    [5] "Age"           "Aneurisms_q1"  "Aneurisms_q2"  "Aneurisms_q3"
    [9] "Aneurisms_q4"

Default Names

If column names are not specified, e.g. by using header = FALSE in a read.csv() call, R assigns default names V1, V2, ..., Vn.
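A quick sketch of the default naming, assuming a two-line inline CSV with no header row. The csv_text string and the no_header name are invented for illustration.

```r
# Hypothetical CSV content without a header line
csv_text <- "Sub001,m,16.0\nSub002,f,17.2"

# header = FALSE: every line is data, so R has to invent column names
no_header <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)
colnames(no_header)
```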

We usually use the $ operator to address a column by name:

    dat$Gender

     [1] "m" "m" "m" "f" "m" "M" "f" "m" "m" "f" "m" "f" "f" "m" "m" "m" "f" "m"
    [19] "m" "F" "f" "m" "f" "f" "m" "M" "M" "f" "m" "f" "f" "m" "m" "m" "m" "f"
    [37] "f" "m" "M" "m" "f" "m" "m" "m" "f" "f" "M" "M" "m" "m" "m" "f" "f" "f"
    [55] "m" "f" "m" "m" "m" "f" "f" "f" "f" "M" "f" "m" "f" "f" "M" "m" "m" "m"
    [73] "F" "m" "m" "f" "M" "M" "M" "f" "m" "M" "M" "m" "m" "f" "f" "f" "m" "m"
    [91] "f" "m" "F" "f" "m" "m" "F" "m" "M" "M"

When we extract a single column from a data frame using the $ operator, R returns a vector of that column's class, not a data frame.
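The difference can be checked with class(). The sketch below uses a toy data frame invented for illustration and contrasts $ with single square brackets given only a column name, which keeps the data frame structure.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  Gender = c("m", "f", "m"),
  Age    = c(16.0, 15.7, 19.9),
  stringsAsFactors = FALSE
)

gender_vec <- toy$Gender     # character vector
age_vec    <- toy$Age        # numeric vector
gender_df  <- toy["Gender"]  # one-column data frame

class(gender_vec)
class(age_vec)
class(gender_df)
```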

Named addressing can also be used in square brackets.

    head(dat[, c('Age', 'Gender')])

       Age Gender
    1 16.0      m
    2 17.2      m
    3 19.5      m
    4 15.7      f
    5 19.9      m
    6 14.3      M

Best Practice

Best practice is to address columns by name. Often, you will create or delete columns and the column position will change.
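A small sketch of why positions are fragile, assuming a toy data frame and a hypothetical Site column added in front of the existing ones. None of these values come from the lesson's data.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  Gender = c("m", "f"),
  Age    = c(16.0, 17.2),
  stringsAsFactors = FALSE
)

before_by_position <- toy[, 2]   # column 2 is Age

# Adding a (hypothetical) Site column at the front shifts every position
toy <- data.frame(Site = c("A", "B"), toy, stringsAsFactors = FALSE)

after_by_position <- toy[, 2]    # column 2 is now Gender
after_by_name     <- toy[, "Age"]  # the name still finds Age

before_by_position
after_by_position
after_by_name
```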

Rows in an R data frame can also be named, and rows can also be addressed by their names.
By default, row names are indices (i.e. the position of each row in the data frame):

    rownames(dat)

     [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"
    [13] "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"
    [25] "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"  "36"
    [37] "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44"  "45"  "46"  "47"  "48"
    [49] "49"  "50"  "51"  "52"  "53"  "54"  "55"  "56"  "57"  "58"  "59"  "60"
    [61] "61"  "62"  "63"  "64"  "65"  "66"  "67"  "68"  "69"  "70"  "71"  "72"
    [73] "73"  "74"  "75"  "76"  "77"  "78"  "79"  "80"  "81"  "82"  "83"  "84"
    [85] "85"  "86"  "87"  "88"  "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96"
    [97] "97"  "98"  "99"  "100"

We can add row names as we read in the file with the row.names parameter in read.csv.
In the following example, we choose the first column ID to become the vector of row names of the data frame, with row.names = 1.

    dat2 <- read.csv(file = 'data/sample.csv', header = TRUE, stringsAsFactors = FALSE, row.names = 1)
    rownames(dat2)

     [1] "Sub001" "Sub002" "Sub003" "Sub004" "Sub005" "Sub006" "Sub007" "Sub008"
     [9] "Sub009" "Sub010" "Sub011" "Sub012" "Sub013" "Sub014" "Sub015" "Sub016"
    [17] "Sub017" "Sub018" "Sub019" "Sub020" "Sub021" "Sub022" "Sub023" "Sub024"
    [25] "Sub025" "Sub026" "Sub027" "Sub028" "Sub029" "Sub030" "Sub031" "Sub032"
    [33] "Sub033" "Sub034" "Sub035" "Sub036" "Sub037" "Sub038" "Sub039" "Sub040"
    [41] "Sub041" "Sub042" "Sub043" "Sub044" "Sub045" "Sub046" "Sub047" "Sub048"
    [49] "Sub049" "Sub050" "Sub051" "Sub052" "Sub053" "Sub054" "Sub055" "Sub056"
    [57] "Sub057" "Sub058" "Sub059" "Sub060" "Sub061" "Sub062" "Sub063" "Sub064"
    [65] "Sub065" "Sub066" "Sub067" "Sub068" "Sub069" "Sub070" "Sub071" "Sub072"
    [73] "Sub073" "Sub074" "Sub075" "Sub076" "Sub077" "Sub078" "Sub079" "Sub080"
    [81] "Sub081" "Sub082" "Sub083" "Sub084" "Sub085" "Sub086" "Sub087" "Sub088"
    [89] "Sub089" "Sub090" "Sub091" "Sub092" "Sub093" "Sub094" "Sub095" "Sub096"
    [97] "Sub097" "Sub098" "Sub099" "Sub100"

We can now extract one or more rows using those row names:

    dat2["Sub072", ]

           Gender   Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2 Aneurisms_q3
    Sub072      m Control           116 17.4          118          165          220
           Aneurisms_q4
    Sub072          227

    dat2[c("Sub009", "Sub072"), ]

           Gender      Group BloodPressure  Age Aneurisms_q1 Aneurisms_q2
    Sub009      m Treatment2           131 19.4          117          215
    Sub072      m    Control           116 17.4          118          165
           Aneurisms_q3 Aneurisms_q4
    Sub009          181          272
    Sub072          220          227

Note that row names must be unique!
For instance, if we try to read in the data setting the Group column as row names, R will throw an error because values in that column are duplicated:

    dat2 <- read.csv(file = 'data/sample.csv', header = TRUE, stringsAsFactors = FALSE, row.names = 3)

    Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
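The uniqueness requirement can be reproduced without the data file. The sketch below assumes a two-row inline CSV (the csv_text string is invented) and wraps the failing call in tryCatch so the error message is captured as a string instead of stopping the script.

```r
# Hypothetical CSV where the Group column repeats a value
csv_text <- "ID,Group\nSub001,Control\nSub002,Control"

# Column 2 (Group) has duplicates, so it cannot serve as row names
result <- tryCatch(
  read.csv(text = csv_text, row.names = 2),
  error = function(e) conditionMessage(e)
)
result

# Column 1 (ID) is unique, so this call succeeds
ok <- read.csv(text = csv_text, row.names = 1)
rownames(ok)
```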

Addressing by Logical Vector

A logical vector contains only the special values TRUE and FALSE.

    c(TRUE, TRUE, FALSE, FALSE, TRUE)

    [1]  TRUE  TRUE FALSE FALSE  TRUE

Truth and Its Opposite

Note that the values TRUE and FALSE are all capital letters and are not quoted.

Logical vectors can be created using relational operators, e.g. <, >, ==, !=, %in%.

    x <- c(1, 2, 3, 11, 12, 13)
    x < 10

    [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
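For comparison, here is a short sketch that applies the other relational operators listed above to a vector with the same values. The result names are chosen only for illustration.

```r
x <- c(1, 2, 3, 11, 12, 13)

greater   <- x > 10       # FALSE FALSE FALSE  TRUE  TRUE  TRUE
equal     <- x == 11      # TRUE only where the value is exactly 11
not_equal <- x != 11      # the opposite of the line above
in_range  <- x %in% 1:10  # TRUE where the value appears in 1, 2, ..., 10

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_small <- sum(x < 10)
n_small
```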

We can use logical vectors to select data from a data frame. This is often referred to as logical indexing.

    index <- dat$Group == 'Control'
    dat[index, ]$BloodPressure

     [1] 132 173 129  77 158  81 137 111 135 108 133 139 126 125  99 122 155 133  94
    [20]  98  74 116  97 104 117  90 150 116 108 102

Often this operation is written as one line of code:

    plot(dat[dat$Group == 'Control', ]$BloodPressure)

[Figure: plot of BloodPressure for subjects in the Control group]
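Logical indexing can be traced step by step on a smaller example. The toy data frame below is invented for illustration; it shows the intermediate logical vector, the rows it selects, and an equivalent form that indexes the column vector directly.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  Group         = c("Control", "Treatment1", "Control", "Treatment2"),
  BloodPressure = c(132, 105, 173, 112),
  stringsAsFactors = FALSE
)

index <- toy$Group == "Control"   # TRUE FALSE TRUE FALSE
which(index)                      # positions of the TRUE values: 1 and 3

control_bp <- toy[index, ]$BloodPressure  # filter rows, then take the column
same_bp    <- toy$BloodPressure[index]    # take the column, then filter it

control_bp
```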

Using Logical Indexes

  1. Create a scatterplot showing BloodPressure for subjects not in the control group.
  2. How many ways are there to index this set of subjects?

Solution

  1. The code for such a plot:

         plot(dat[dat$Group != 'Control', ]$BloodPressure)

     [Figure: plot of BloodPressure for subjects not in the control group]

  2. In addition to dat$Group != 'Control', one could use dat$Group %in% c("Treatment1", "Treatment2").
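A hedged check that these forms select the same subjects, using a made-up group vector rather than the lesson's data. The third form, negating an equality test with !, is one more possibility and is an addition to the two named above.

```r
# Made-up group labels for illustration
group <- c("Control", "Treatment1", "Treatment2", "Control", "Treatment1")

by_not_equal <- group != "Control"
by_in        <- group %in% c("Treatment1", "Treatment2")
by_negation  <- !(group == "Control")

identical(by_not_equal, by_in)
identical(by_not_equal, by_negation)
```

The forms agree here because the vector has no missing values. With an NA in group, != and == return NA for that element while %in% returns FALSE.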

Combining Addressing and Assignment

The assignment operator <- can be combined with addressing.

    x <- c(1, 2, 3, 11, 12, 13)
    x[x < 10] <- 0
    x

    [1]  0  0  0 11 12 13
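The same pattern works on data frames. The sketch below assumes a toy data frame with invented values and updates one cell by row index and column name, then a subset of a column by logical vector.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  ID            = c("Sub001", "Sub002", "Sub003"),
  Age           = c(16.0, 17.2, 19.5),
  BloodPressure = c(132, 139, 130),
  stringsAsFactors = FALSE
)

# Address one cell by row index and column name, then assign to it
toy[2, "Age"] <- 18.0

# Address part of a column with a logical vector, then assign to it
toy$BloodPressure[toy$BloodPressure > 135] <- NA

toy
```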

Updating a Subset of Values

In this dataset, values for Gender have been recorded as both uppercase M, F and lowercase m, f. Combine the addressing and assignment operations to convert all values to lowercase.

Solution

    dat[dat$Gender == 'M', ]$Gender <- 'm'
    dat[dat$Gender == 'F', ]$Gender <- 'f'
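As a sketch of the same update on a plain vector, with an alternative for comparison: the gender values below are made up, and the tolower() call is a different technique from the addressing-and-assignment approach the exercise asks for, shown only to confirm that both give the same result.

```r
# Made-up values mixing upper and lower case
gender <- c("m", "M", "f", "F", "m")

# Addressing plus assignment, one replacement per uppercase value
fixed <- gender
fixed[fixed == "M"] <- "m"
fixed[fixed == "F"] <- "f"

# Alternative (not the exercise's method): lowercase everything at once
lowered <- tolower(gender)

fixed
identical(fixed, lowered)
```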

Key Points

  • Data in data frames can be addressed by index (subsetting), by logical vector, or by name (columns only).

  • Use the $ operator to address a column by name.


Source: https://swcarpentry.github.io/r-novice-inflammation/10-supp-addressing-data/
