Databricks
  • Videos: 2,956
  • Views: 16,105,448
LakeFlow Demo
Databricks LakeFlow is a new solution that contains everything you need to build and operate production data pipelines. It includes new native, highly scalable connectors for databases including MySQL, Postgres, SQL Server and Oracle and enterprise applications like Salesforce, Microsoft Dynamics, NetSuite, Workday, ServiceNow and Google Analytics. Users can transform data in batch and streaming using standard SQL and Python.
Learn about Data Engineering: www.databricks.com/solutions/data-engineering
Views: 1,665

Videos

Say goodbye to messy JSON headaches with VARIANT
2.2K views · 22 hours ago
Try it out today on Databricks: docs.databricks.com/en/semi-structured/variant.html Read more about it on our blog: www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark If you're curious about the implementation check out the talk: ruclips.net/video/jtjOfggD4YY/видео.html Or read about it on GitHub: github.com/apache/spark/blob/master/common/variant/README.md
Data Intelligence Day Seoul 2024
514 views · 1 day ago
Data Intelligence Day Seoul, Korea took place on 23 April 2024 and gathered over 1,200 industry leaders and data and AI experts. Watch Data Intelligence Day Seoul On Demand: events.databricks.com/KoreaDIDays2024
An Introduction to DBRX
3.7K views · 1 day ago
Learn from Naveen Rao, VP of Generative AI at Databricks, as he explains DBRX, a new open source foundation model that sets the standard for production quality and price/performance. With up to 3x faster inference, DBRX outperforms all other open models in quality benchmarks, allowing enterprises to quickly and efficiently build their own custom LLMs with full control. Read more about ...
Demo: How Do I Use DBRX?
1.5K views · 1 day ago
Watch how DBRX uses Databricks to build and customize GenAI applications using your own enterprise data. Read more about DBRX here: www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms?
What's Next for Apache Spark™ Including the Upcoming Release of Apache Spark 4.0
6K views · 1 day ago
Reynold Xin, Co-founder and Chief Architect, Databricks, shares the latest innovation coming out of the Apache Spark™ open source project, including a preview of the anticipated release of Spark 4.0. Speakers: Reynold Xin, Co-founder and Chief Architect, Databricks; Tareef Kawaf, President, Posit Software, PBC
The Evolution of Delta Lake from Data + AI Summit 2024
1.8K views · 1 day ago
Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks explains why Delta Lake is the most adopted open lakehouse format. Includes: - Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta) - Delta Lake Liquid Clustering - Delta Lake production-ready catalog (Iceberg REST API) - The growth and strength of the Delta ecosystem - Delta Kernel - D...
Setting up PAT and Secret Scope
340 views · 1 day ago
Quick video on how to set up a Personal Access Token, a Secret Scope and a Secret with Azure Key Vault.
Increase your column sizes without rewriting the entire table
721 views · 1 day ago
Docs: docs.databricks.com/en/delta/type-widening.html
Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit
4.4K views · 1 day ago
Shant Hovsepian, CTO of Data Warehousing at Databricks, announced the biggest Delta Lake release to date, Delta 4.0, during the Data + AI Summit 2024 in San Francisco. Speaker: Shant Hovsepian, Chief Technology Officer of Data Warehousing, Databricks
Open Sourcing Unity Catalog Live Onstage with Matei Zaharia at Data + AI Summit 2024
887 views · 1 day ago
Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks. Matei Zaharia, Original Creator of Apache Spark™ and MLflow and Chief Technologist at Databricks, open sourced Unity Catalog live onstage at the Data + AI Summit 2024 in San Francisco.
Databricks LakeFlow: A Unified, Intelligent Solution for Data Engineering. Presented by Bilal Aslam
7K views · 1 day ago
Speaker: Bilal Aslam, Sr. Director of Product Management, Databricks. Bilal explains that everything starts with good data and outlines the three steps to good data: ingesting, transforming and orchestrating your data. Then Bilal announces Databricks LakeFlow - a unified solution for data engineering. With LakeFlow you can ingest data from databases, enterprise apps and cloud sources, ...
Recap of Announcements at Data + AI Summit 2024 with Ali Ghodsi, Co-Founder and CEO, Databricks
826 views · 1 day ago
Ali Ghodsi, Co-founder and CEO of Databricks, closes the 2024 Data + AI Summit with a recap of Databricks and open source innovation announced during the 4-day conference in San Francisco. Speaker: Ali Ghodsi, Co-founder and CEO, Databricks @Databricks
Announcing Databricks Clean Rooms with Live Demo. Presented by Matei Zaharia and Darshana Sivakumar
993 views · 1 day ago
Speakers: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks Darshana Sivakumar, Staff Product Manager, Databricks Organizations are looking for ways to securely exchange their data and collaborate with external partners to foster data-driven innovations. In the past, organizations had limited data sharing solutions, relinquishing control over how their ...
Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit
327 views · 1 day ago
Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks Summary: Data sharing and collaboration are important aspects of the data space. Matei Zaharia explains the evolution of the Databricks data platform to facilitate data sharing and collaboration for customers and their partners. Delta Sharing allows you to share parts of your table with third pa...
Announcing Unity Catalog Metrics with Live Demo. Matei Zaharia and Zeashan Pappa at Data + AI Summit
863 views · 1 day ago
Evolving Data Governance With Unity Catalog Presented by Matei Zaharia at Data + AI Summit 2024
2.6K views · 1 day ago
Unity Catalog Demo of New Features with Zeashan Pappa at Data + AI Summit 2024
905 views · 1 day ago
How Data Intelligence is Delivering Big Wins at Texas Rangers. Alexander Booth at Data + AI Summit
324 views · 1 day ago
The Future of Lakehouse Format Interoperability with Ali Ghodsi and Ryan Blue at Data + AI Summit
531 views · 1 day ago
How to Make Small Language Models Work. Yejin Choi Presents at Data + AI Summit 2024
3.3K views · 1 day ago
Data + AI Summit Keynote 2024 - Day 2 Opening Remarks with Ali Ghodsi
238 views · 1 day ago
Building an Enterprise Data & AI Catalog with Databricks Unity Catalog
1.2K views · 1 day ago
Patrick Wendell, Co-founder and VP of Engineering on Building Production-Quality AI Systems
1.4K views · 1 day ago
The Best Data Warehouse is a Lakehouse
4.1K views · 1 day ago
Building and Deploying GenAI Apps at Block with Jackie Brosamer, Head of AI, Data & Analytics
616 views · 1 day ago
Fei Fei Li, Professor, Stanford University on the History and Future of AI at Data + AI Summit 2024
16K views · 1 day ago
Building an Insights Factory at General Motors - Data + AI Summit 2024
434 views · 1 day ago
Jensen Huang, Founder and CEO of NVIDIA with Ali Ghodsi, Co-founder and CEO of Databricks
37K views · 1 day ago
Ali Ghodsi, Databricks Co-founder and CEO Closes the Keynote with a Summary of Product Announcements
323 views · 1 day ago

Comments

  • @yao5261
    @yao5261 18 hours ago

    Got it: cyber pulse-taking!

  • @FullEvent5678
    @FullEvent5678 1 day ago

    Very inspiring! My mind is going at 1000 miles an hour with ideas for our startup and clients from this!

  • @subedi04
    @subedi04 1 day ago

    Where can I access your code or workbook? It would be nice to run your code.

  • @AadidevSooknananNXS
    @AadidevSooknananNXS 1 day ago

    Holden and team are incredibly engaging and very easy to understand!

  • @ia6906
    @ia6906 1 day ago

    Great feature. Please also include low-code features, as Data Factory has for ETL, to make it more beneficial.

  • @Naraharisettiraviteja
    @Naraharisettiraviteja 1 day ago

    awesome

  • @brento2890
    @brento2890 2 days ago

    Excellent presentation, beginning 3.5-4.0 Billion years ago and explaining all the way to now (AI, non-physical-spatial). Excellent. Thank you. 👏

  • @TheDataArchitect
    @TheDataArchitect 2 days ago

    Who's the speaker?

    • @Databricks
      @Databricks 1 day ago

      Holly Smith - FYI it's also me in the comments for my videos, so fire away with any technical follow-on questions - Holly

    • @TheDataArchitect
      @TheDataArchitect 1 day ago

      @@Databricks Awesome thanks

  • @muhammadibrahimabdullahi3840
    @muhammadibrahimabdullahi3840 2 days ago

    AI can do everything you need to do in times of studying and understanding AI.

  • @benim1917
    @benim1917 2 days ago

    Awesome 👏🏾

  • @Thegameplay2
    @Thegameplay2 2 days ago

    🎉

  • @gravenguan
    @gravenguan 2 days ago

    How does parse_json handle schema evolution? From my knowledge, parsing the schema on the fly is not recommended for prod tables; it's safer to define the schema first.

    • @Databricks
      @Databricks 2 days ago

      I agree, but with a lot of JSON data you don't know the schema upfront and so can't define it. It's worth noting this is different from inferring the schema, which looks at the first 1000 rows and is brittle to upstream changes - Holly

    • @gravenguan
      @gravenguan 2 days ago

      @@Databricks We used parse_json for dev and exploration purposes as well, thanks for the clarification

    • @Databricks
      @Databricks 2 days ago

      @@gravenguan No worries! Hope this clarifies for other users too
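
The distinction drawn in this thread, inferring a schema from a sample of rows versus parsing every value as it arrives, can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the standard `json` module rather than any Databricks or Spark API, the helper names `infer_schema` and `parse_all` and the example feed are hypothetical, and it says nothing about how VARIANT is implemented. The 1,000-row sample size is taken from the reply above.

```python
import json


def infer_schema(rows, sample_size=1000):
    """Collect field names and Python type names from the first `sample_size` rows only."""
    schema = {}
    for raw in rows[:sample_size]:
        for key, value in json.loads(raw).items():
            schema.setdefault(key, type(value).__name__)
    return schema


def parse_all(rows):
    """Parse every row independently, keeping whatever fields each row carries."""
    return [json.loads(raw) for raw in rows]


# Hypothetical feed: the upstream producer adds a "coupon" field after 1,000 records.
rows = [json.dumps({"id": i, "amount": 9.5}) for i in range(1000)]
rows.append(json.dumps({"id": 1000, "amount": 3.0, "coupon": "SUMMER"}))

sampled_schema = infer_schema(rows)  # built from the sample, so it never sees "coupon"
parsed = parse_all(rows)             # the late field is still present on the last row
```

In this sketch the sampled schema misses the field that appears only after the sample window, while per-row parsing keeps it, which is the brittleness to upstream changes that the reply describes.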

  • @nagendrasrinivas-cj7sr
    @nagendrasrinivas-cj7sr 2 days ago

    this is clearly copied from snowflake

    • @Databricks
      @Databricks 2 days ago

      Variants in their various forms have been around for many decades. We're big fans of open source so anyone can use the implementation in other projects or products.

  • @TheDataArchitect
    @TheDataArchitect 2 days ago

    That's awesome.

  • @matthiasmueller9340
    @matthiasmueller9340 2 days ago

    How can I specify the required runtime version when using serverless sql warehouse?

    • @Databricks
      @Databricks 2 days ago

      Variant types will be coming to serverless early/mid July, no need to select a runtime - Holly

  • @afrikaniz3d
    @afrikaniz3d 2 days ago

    Only note for these videos, since they're not Shorts, is that it would be more beneficial to use the full wide (1920 x 1080) format, so it's more readable at all resolutions.

    • @Databricks
      @Databricks 1 day ago

      I completely hear you, trying to figure out the best way to film for multiple platforms at once when some define 'short' as <10 mins and YouTube graces me with a mere 60 seconds - Holly

  • @EranM
    @EranM 3 days ago

    Can't you get the score (ranking score | similarity score) while fetching items from the Vector DB? ..

  • @EranM
    @EranM 3 days ago

    can someone explain to me, how come you calculate USER embedding when training. And when searching for similar embeddings, you actually get ITEMS embeddings???

  • @LQDEN
    @LQDEN 4 days ago

    Still didn't explain what it is exactly

  • @gybob100
    @gybob100 4 days ago

    The shovel company telling you how valuable the gold is

  • @user-he1hs5vx3d
    @user-he1hs5vx3d 5 days ago

    She is creepy because she is not an honest person. She keeps stealing others works and ideas to pretend she is an expert. To make her greater, she belittles others, including her student (5:30).

  • @jianguo8233
    @jianguo8233 5 days ago

    Is 4.0 a release or preview today?

  • @uchechukwumadu9625
    @uchechukwumadu9625 5 days ago

    Insightful!

  • @slavenlulic7736
    @slavenlulic7736 5 days ago

    Powerful

  • @SnatrWhamo
    @SnatrWhamo 6 days ago

    Great video and very very useful! While implementing, I got stuck uploading the pdf to a Volume in the Unity Catalog. I am the "Owner" of my Databricks Workspace and Azure account although I don't seem to have the option to add a Volume to a Catalog and thus don't have the option to add the pdf to a Volume. This seems to have to do with permissions and possibly setting up a metastore between DataBricks and Azure Blob Storage? Might you have any insights, ideas, solutions or workarounds? Thanks again for a great video and all the resources to implement this super useful technology!

    • @jasondrew2087
      @jasondrew2087 6 days ago

      Couple of things, you need USE SCHEMA and CREATE VOLUME permissions on the Schema and USE CATALOG on the catalog. Also you need CREATE EXTERNAL VOLUME permissions on the External Location you plan on using for your Volume.
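
The permissions listed in this reply could be written out as Unity Catalog GRANT statements. Below is a minimal Python sketch that only builds the statement text. The function name `volume_grants` and all argument values are hypothetical, and the statement shapes assume the standard `GRANT <privilege> ON <securable> <name> TO <principal>` form, so check them against the Databricks privileges documentation before running them in a workspace.

```python
def volume_grants(principal, catalog, schema, external_location):
    """Build GRANT statements for the privileges named in the reply above.

    Assumption: Unity Catalog SQL accepts these statement shapes. The
    privilege names themselves come from the reply.
    """
    quoted = f"`{principal}`"  # backticks allow principals such as e-mail addresses
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted}",
        f"GRANT USE SCHEMA, CREATE VOLUME ON SCHEMA {catalog}.{schema} TO {quoted}",
        f"GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION {external_location} TO {quoted}",
    ]


# Hypothetical names, for illustration only.
for statement in volume_grants("user@example.com", "main", "docs", "pdf_landing"):
    print(statement)
```

Each string could then be run by someone who holds sufficient privileges on those objects, for example in a SQL editor.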

  • @BlizzardzRS
    @BlizzardzRS 6 days ago

    While I appreciate the contributions Databricks makes to the open source community, *this video is incredibly misleading*. DBRX is *not* the highest production quality open-source model nor the best in price per performance. The graph you showed is incredibly misleading, not least because you compared your models to LLaMa2-70B. No one in their right mind at the time of this video's recording is using LLaMa2-70B. Everyone has moved on to LLaMa3, with many providers even disabling LLaMa2 on their platforms because it is more expensive and less performant than LLaMa3. A fairer comparison would be between DBRX and LLaMa3-70B and LLaMa3-8B. You didn’t show that because DBRX gets roasted in these comparisons. You talked about the cost associated with training your LLMs and how the cost has come down substantially. Really, this is an argument that the $10M Mosaic/Databricks have spent on DBRX is already redundant. You guys are losing credibility by posting stuff like this. Databricks does some great work. Don’t tarnish your reputation with borderline fraudulent content like this.

  • @georges7298
    @georges7298 7 days ago

    Thanks - for the open sourcing, and for the summit.

  • @BeginnerAlchemist
    @BeginnerAlchemist 7 days ago

    I have a question: why do we research small LMs just to avoid using GPUs? If we want to save money on training, we could research how to make GPUs or models more efficient, rather than avoid using higher tech.

    • @DamaruM
      @DamaruM 6 days ago

      GPU= power consumption

    • @tulikabose5120
      @tulikabose5120 3 days ago

      It's not just for GPUs...Small-LM has its own market for on-device or on-edge processing, where there are concerns of privacy and customers would not want their data to go to clouds, and secondly in many industrial use-cases where internet and cloud access isn't accessible due to the remote nature of the use-case, and model inference needs to be done on device...The demand for SLMs is increasing in such use cases...Many big tech companies are not just working on LLMs but also on SLMs under the hood as both of them have to co-exist to cater to different user requirements.

    • @BeginnerAlchemist
      @BeginnerAlchemist 3 days ago

      @@tulikabose5120 Thank you, I see. It is useful for small devices with limited compute hardware, and for privacy. That's true. Many LLMs need huge amounts of data to train, and they collect people's private info to become stronger. Most people hate that.

  • @mc.pretzel
    @mc.pretzel 7 days ago

    Boomshakalaka!

  • @plartoo
    @plartoo 7 days ago

    :D Show us how to do more complex data transformations than just a simple join you demo-ed and what the actual limitations are (because that's where the reality meets the demo). While you are at it, tell us how to automate (schedule) this pipeline and set up notifications and data quality checks. Next, let us know how to QA that dashboard you let GenAI create (to make sure it's not hallucinating and spitting out bullshit while destroying our firm's reputation), and how to surface it to customers via URL in a secure way (without paying you through our noses). Finally, tell us how much it costs to process GBs of data per month. This is the unbearably condescending demo that assumes the attendees are stupid and don't know what is entailed in serious, real-world data wrangling. And I know a couple of my clients who are leaving Databricks because they are freaking expensive.

    • @ser1ification
      @ser1ification 3 days ago

      Exactly. I’m tired of these hype machines. Everything is in beta. Customers are the beta testers. Only thing these guys did good is the Unity Catalog. Of course Spark and Delta as well.

  • @gopi4841
    @gopi4841 8 days ago

    Nice one, Darshana.

  • @xiaoyu2270
    @xiaoyu2270 8 days ago

    jensen from china wenzhou

    • @chima6291
      @chima6291 2 days ago

      bullshit. He was born in Taiwan

  • @forrestbajbek3900
    @forrestbajbek3900 8 days ago

    Wow, this is a huge improvement.

  • @AleksandarKrumov-pm4tk
    @AleksandarKrumov-pm4tk 8 days ago

    wow

  • @cobrider2
    @cobrider2 8 days ago

    2 reactions: - By querying the table with DuckDB, the authentication and permission is handled only by Unity Catalog, and not by the underlying storage solution (AWS S3, Azure ADLS, ...). Right? - Applying column masks will only work for hosted compute like the Databricks clusters, because querying with a local self-hosted compute like DuckDB requires downloading the parquet files (containing the PII data) locally and only then executing the query... meaning you actually have PII data downloaded on your local machine. Right?

  • @cobrider2
    @cobrider2 8 days ago

    had a laugh, thank you

  • @subhroitmecse
    @subhroitmecse 8 days ago

    Examples are not clear about Delta lake ACID properties.

  • @Clammer999
    @Clammer999 8 days ago

    One of my favourite AI legends. Her passion for humanity and how AI can be leveraged to help improve people’s lives is admirable and astounding.

  • @WonkaTruck
    @WonkaTruck 8 days ago

    I still can't read Iceberg in Databricks, stop hoping for adoption and just fix that...

  • @DCC72
    @DCC72 8 days ago

    And from nothing, a college professor just evolved from the bacteria. Rubbish.

  • @sunnychabbi3639
    @sunnychabbi3639 8 days ago

    Pls provide notebook. It is not available in dbdemos

  • @GerardInnes
    @GerardInnes 9 days ago

    As a new YouTuber you are doing very well. He is teaching us option trading nicely. Just need to be consistent with this process of trading on binary options...

  • @henryebube3576
    @henryebube3576 9 days ago

    I followed your tutorial. I get stuck at 9:38. I type databricks-bge-large-en as the embedding model, but the Create button is disabled and I'm not sure why.

    • @jasondrew2087
      @jasondrew2087 8 days ago

      You shouldn't have to type it in, rather it should be an option in the drop down. If you go to Serving do you see it listed as a Foundational model?