
I'm pretty new to R and programming in general. I'm working on an assignment in R and I'm at a dead end with my current knowledge.

My data looks like this: my table

I'm using the tidyverse and I want to create a new table containing only the rows where the "Kennziffer" (first column) ranges from 1 to 10 in the first two numbers.

My attempt is to use this command:

new_object <- table_name %>% 
filter(table_name, Kennziffer == and I don't know what to put here to get values starting with 1 to 10

Any help would be greatly appreciated.

Thanks for taking the time to read and answer.

I tried:

new_object <- table_name %>% filter(table_name, Kennziffer == 1,2,3,4,5,6,7,8,9,10)

but this doesn't work as the Kennziffer value is 4 or 5 characters long.

Gregor Thomas
GeoNerd
  • Welcome to StackOverflow. Please don't paste images of data. Instead, please make your questions reproducible by pasting a minimal sample of your data as text into your question so others can recreate your issue and provide a working solution. For example you could provide the result of `dput(head(table_name))`. – Dan Adams Nov 22 '22 at 20:01
  • Also what type of data is the `Kennziffer` column? If it's `numeric` you can do different things than if you're treating it as `character`. If you only care about the first two digits, I'd suggest splitting them off as a separate column and `filter`ing on that. – Dan Adams Nov 22 '22 at 20:02
  • Basic set membership is `.. %>% filter(Kennziffer %in% c(1, 5, 99, 19293))` (c.f., https://stackoverflow.com/q/15358006/3358272, https://stackoverflow.com/q/42637099/3358272). You can use `between` as well for your range, such as `.. %>% filter(between(Kennziffer, 100, 1099))` (note that it is closed-ends, inclusive of both the `100` and `1099` in that example). – r2evans Nov 22 '22 at 20:11
  • What does "ranging from 1 to 10 in the first two numbers" mean exactly? Can you give examples of numbers that should and should not be included? – MrFlick Nov 22 '22 at 20:13
  • I don't quite know what you mean by *"ranging from 1 to 10 in the first two numbers"* - 1 to 9 is only one number, so the "first two numbers" part is confusing. But it seems like you might be interested in the `>=` and `<=` operators. – Gregor Thomas Nov 22 '22 at 20:14
  • Hi! Sorry for not being clear. I want to filter so that I get a table of the Kennziffer values starting with the 4-digit numbers beginning with 1 (e.g. 1XXX) and ending with the 5-digit numbers beginning with 10 (e.g. 10XXX). After about 100 rows the Kennziffer switches to 5 digits beginning with 11, 12, 13, etc., and I only want to include the Kennziffer values from 1XXX to 10XXX. – GeoNerd Nov 22 '22 at 20:38
  • And what is the form of the numbers you want to exclude? How many digits, what do they start with etc.? – Dan Adams Nov 22 '22 at 20:48
  • The numbers I want to exclude have 5 digits and start with 11 or higher, like 11XXX or 16XXX. I only want to keep the ascending numbers from the 4-digit 1XXX up to the 5-digit 10XXX. – GeoNerd Nov 22 '22 at 20:51

1 Answer

You can use stringr::str_sub() to drop the last 3 digits and then keep only the rows whose remaining leading digits match your list of accepted start values (e.g. 1 to 10).

library(tidyverse)

d <- structure(list(Kennziffer = c(1001L, 1002L, 1003L, 1004L, 1051L, 1053L, 1054L, 1055L, 1056L, 1057L, 1058L, 1059L), Raumeinheit = c("Flensburg. Stadt", "Kiel. Stadt", "Lübeck. Stadt", "Neumünster. Stadt", "Dithmarschen", "Herzogtum Lauenburg", "Nordfriesland", "Ostholstein", "Pinneberg", "Plön", "Rendsburg-Eckernförde", "Schleswig-Flensburg"), Aggregat = c("kreisfreie Stadt", "kreisfreie Stadt", "kreisfreie Stadt", "kreisfreie Stadt", "Landkreis", "Landkreis", "Landkreis", "Landkreis", "Landkreis", "Landkreis", "Landkreis", "Landkreis"), Langzeitarbeitslose = c(30.58, 36.47, 34.28, 35.49, 28.1, 33.43, 37.16, 30.58, 27.15, 27.38, 27.48, 30.12)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))

# create list of "first two digits" you want to match
digits2keep <- as.character(1:10)

# extract the leading digits (everything except the last 3) and filter to matches
d %>% 
  mutate(start_digits = str_sub(Kennziffer, 1, nchar(Kennziffer) - 3)) %>% 
  filter(start_digits %in% digits2keep)
#>    Kennziffer           Raumeinheit         Aggregat Langzeitarbeitslose
#> 1        1001      Flensburg. Stadt kreisfreie Stadt               30.58
#> 2        1002           Kiel. Stadt kreisfreie Stadt               36.47
#> 3        1003         Lübeck. Stadt kreisfreie Stadt               34.28
#> 4        1004     Neumünster. Stadt kreisfreie Stadt               35.49
#> 5        1051          Dithmarschen        Landkreis               28.10
#> 6        1053   Herzogtum Lauenburg        Landkreis               33.43
#> 7        1054         Nordfriesland        Landkreis               37.16
#> 8        1055           Ostholstein        Landkreis               30.58
#> 9        1056             Pinneberg        Landkreis               27.15
#> 10       1057                  Plön        Landkreis               27.38
#> 11       1058 Rendsburg-Eckernförde        Landkreis               27.48
#> 12       1059   Schleswig-Flensburg        Landkreis               30.12
#>    start_digits
#> 1             1
#> 2             1
#> 3             1
#> 4             1
#> 5             1
#> 6             1
#> 7             1
#> 8             1
#> 9             1
#> 10            1
#> 11            1
#> 12            1

Created on 2022-11-22 with reprex v2.0.2

That said, this seems unnecessarily complicated. Since Kennziffer is already numeric, I can't see why d %>% filter(Kennziffer < 11000) wouldn't work.
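As a rough sketch of that numeric route, the two filters below should keep the same rows. The sample values are made up for illustration (they are not from the asker's table), and the sketch assumes Kennziffer is stored as an integer or numeric column. If it were character, it would need converting with as.integer() first.

```r
library(dplyr)

# Hypothetical sample: 4-digit codes (1XXX), 5-digit codes up to 10XXX,
# and 5-digit codes from 11XXX onwards that should be dropped.
d2 <- data.frame(
  Kennziffer = c(1001L, 1059L, 9564L, 10041L, 10046L, 11000L, 16077L)
)

# Option 1: plain numeric comparison on the whole code
kept_range <- d2 %>%
  filter(Kennziffer >= 1000, Kennziffer < 11000)

# Option 2: integer division by 1000 drops the last three digits,
# leaving the leading part (1 to 10 for the rows to keep)
kept_prefix <- d2 %>%
  filter((Kennziffer %/% 1000) %in% 1:10)

kept_range$Kennziffer
kept_prefix$Kennziffer
```

Option 2 is the numeric counterpart of the str_sub() approach: it avoids the conversion to character but still lets you list the accepted leading values explicitly.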

Dan Adams