I am new to web scraping. After practicing on a couple of Wikipedia pages, I found this page, from which I want to extract the tables for all the portfolio managers. None of the approaches I found online worked for me. I thought it would be easy since it is just a table, but I cannot extract even a single table after filling out the form. Can someone please tell me how to do this in R? I have added an image to this post, but it appears only as a link.

https://www.sebi.gov.in/sebiweb/other/OtherAction.do?doPmr=yes

library(tidyverse)
library(rvest)
library(httr)
library(RCurl)

url    <- "https://www.sebi.gov.in/sebiweb/other/OtherAction.do?doPmr=yes"
result <- postForm(url,
                   pmrId="RIGHT HORIZONS PORTFOLIO MANAGEMENT PRIVATE LIMITED",
                   year="2022",
                   month="August")
attr(result,"Content-Type")
result

[Screenshot: SEBI website]

mathplyr

1 Answer

The form expects the value attribute of each selected option, not its visible label. If you replace the values you pass with the corresponding value attributes (for example "8" instead of "August", given <option value="8">August</option>), you should be all set. You can also inspect the actual payload of the POST request in your browser's DevTools:

[Screenshot: payload of the POST request]
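To illustrate the label-to-value mapping, here is a minimal sketch that parses a dropdown with rvest and looks up the value for a given label. The HTML fragment is a hypothetical stand-in for the form's month dropdown (only the August option is taken from the page; the other options are assumed), and month_values is a name made up for this example. On the real page you would start from read_html(url) instead of minimal_html().

```r
library(rvest)

# Hypothetical fragment mimicking the month dropdown of the form.
# Only the August option is known from the page; the rest is assumed.
page <- minimal_html('
  <select name="month">
    <option value="">Select Month</option>
    <option value="7">July</option>
    <option value="8">August</option>
  </select>')

# All <option> nodes inside the dropdown named "month"
month_options <- html_elements(page, "select[name='month'] option")

# Named vector: visible label -> value attribute that the form submits
month_values <- setNames(html_attr(month_options, "value"),
                         html_text2(month_options))

# Value to pass in the POST body instead of the label "August"
month_values[["August"]]
```

The same lookup should work for the pmrId dropdown, assuming it is also a plain select element in the page source.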

A lazy approach is to use "Copy as cURL" in DevTools and paste the result into https://curlconverter.com/r/ to convert it to an httr request.

library(rvest)
resp <- httr::POST("https://www.sebi.gov.in/sebiweb/other/OtherAction.do?doPmr=yes", 
           body = list(
             pmrId="INP000004417@@INP000004417@@AEQUITAS INVESTMENT CONSULTANCY PRIVATE LIMITED",
             year="2022",
             month="8")) 

tables <- resp %>% 
  read_html() %>% 
  html_elements("table") %>% 
  html_table()

# first table:
tables[[1]]
#> # A tibble: 11 × 2
#>    X1                                                                      X2   
#>    <chr>                                                                   <chr>
#>  1 Name of the Portfolio Manager                                           "Aeq…
#>  2 Registration Number                                                     "INP…
#>  3 Date of Registration                                                    "201…
#>  4 Registered Address of the Portfolio Manager                             ",,,…
#>  5 Name of Principal Officer                                               ""   
#>  6 Email ID of the Principal Officer                                       ""   
#>  7 Contact Number (Direct) of the Principal Officer                        ""   
#>  8 Name of Compliance Officer                                              ""   
#>  9 Email ID of the Compliance Officer                                      ""   
#> 10 No. of clients as on last day of the month                              "124…
#> 11 Total Assets under Management (AUM) as on last day of the month (Amoun… "143…

Created on 2022-10-11 with reprex v2.0.2
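Since the question asks for the tables of all portfolio managers, the request above could be wrapped in a function and repeated per manager. This is only a sketch under assumptions: fetch_pmr_tables() and fetch_all() are hypothetical helper names, the pause between requests is an arbitrary courtesy delay, and the vector of pmrId values has to be collected from the form's dropdown first.

```r
library(httr)
library(rvest)

url <- "https://www.sebi.gov.in/sebiweb/other/OtherAction.do?doPmr=yes"

# Fetch every table of one manager's report for a given year and month.
# pmr_id must be the option's value attribute, not its visible label.
fetch_pmr_tables <- function(pmr_id, year, month) {
  resp <- POST(url, body = list(pmrId = pmr_id, year = year, month = month))
  stop_for_status(resp)  # fail early on HTTP errors
  resp %>%
    read_html() %>%
    html_elements("table") %>%
    html_table()
}

# Repeat the request for several managers, pausing between calls.
# Returns a list of table lists, named by pmrId value.
fetch_all <- function(pmr_ids, year, month, pause = 1) {
  out <- lapply(pmr_ids, function(id) {
    Sys.sleep(pause)
    fetch_pmr_tables(id, year, month)
  })
  setNames(out, pmr_ids)
}

# Example call (not run here because it needs network access):
# ids <- "INP000004417@@INP000004417@@AEQUITAS INVESTMENT CONSULTANCY PRIVATE LIMITED"
# all_tables <- fetch_all(ids, year = "2022", month = "8")
```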

margusl